[DP-203] Data Engineering on Microsoft Azure

https://learn.microsoft.com/ko-kr/credentials/certifications/resources/study-guides/dp-203#skills-at-a-glance

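Design and implement data storage (15–20%)
- Partition strategy
- Data exploration layer
Develop data processing (40–45%)
- Ingest and transform data
- Batch processing solution
- Stream processing solution
- Manage batches and pipelines
Secure, monitor, and optimize data storage and data processing (30–35%)
- Data security
- Monitor data storage and data processing
- Optimize and troubleshoot data storage and data processing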

Goal of data engineering: to provide accessible, clean data in a usable format

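Azure Blob Storage: primary storage service; Azure Data Lake Storage is built on top of it
Azure Data Factory: Azure's service for building and orchestrating data pipelines
Azure Synapse Analytics: a platform optimized for working with structured data
Azure Stream Analytics: real-time stream processing with light transformation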
Azure Databricks: provides ETL, analytics, and machine learning at a massive scale

Structured vs. Unstructured Data
- Structured: relational, fixed schema, supports complex queries, scales vertically (more RAM, more CPU power)
- Unstructured: non-relational, dynamic schema, not suited to complex queries, scales horizontally (more machines)
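Horizontal scaling typically means spreading data across more machines by key. A minimal sketch, assuming simple hash partitioning on a chosen key field: `partition_for` and `distribute` are hypothetical helpers, and each bucket only stands in for a separate node. The sample events also illustrate a dynamic schema, since each record can carry different fields.

```python
import zlib


def partition_for(key: str, num_nodes: int) -> int:
    """Map a record key to one of num_nodes partitions with a stable hash."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash() for str
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(records: list[dict], key_field: str, num_nodes: int) -> dict[int, list[dict]]:
    """Spread records over num_nodes buckets (each bucket stands in for one machine)."""
    nodes: dict[int, list[dict]] = {i: [] for i in range(num_nodes)}
    for record in records:
        nodes[partition_for(str(record[key_field]), num_nodes)].append(record)
    return nodes


# Hypothetical semi-structured events: no fixed schema, fields differ per record
events = [
    {"device_id": "sensor-1", "temp": 21.5},
    {"device_id": "sensor-2", "humidity": 40},
    {"device_id": "sensor-3", "temp": 19.0, "battery": 0.8},
    {"device_id": "sensor-1", "temp": 22.1},
]

if __name__ == "__main__":
    placement = distribute(events, "device_id", num_nodes=2)
    for node, recs in placement.items():
        print(node, [r["device_id"] for r in recs])
```

Because the same key always hashes to the same bucket, all events for one device stay together; adding nodes adds capacity instead of requiring a bigger single machine.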
Azure Blob Storage: general-purpose object store
Data Lake with Blob storage
- Hierarchical namespace
  - When enabled, the storage account can be used as a data lake
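Enabling the hierarchical namespace matters mostly for directory operations: with it, renaming or deleting a directory is a single metadata operation, while in a flat Blob namespace a "directory" is just a name prefix and every blob under it has to be handled individually. The toy model below illustrates that cost difference with plain Python dicts; it is a hypothetical sketch, not the Azure Storage SDK, and `rename_flat` / `rename_hierarchical` are made-up names.

```python
def rename_flat(store: dict[str, bytes], old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: a 'directory' is only a key prefix, so renaming it
    rewrites every object key under that prefix. Returns objects touched."""
    to_move = [key for key in store if key.startswith(old_prefix)]  # snapshot keys first
    for key in to_move:
        store[new_prefix + key[len(old_prefix):]] = store.pop(key)
    return len(to_move)


def rename_hierarchical(tree: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: a directory is a real node, so renaming a
    top-level directory is one move of that node, however many files sit
    beneath it. Returns nodes touched."""
    tree[new_name] = tree.pop(old_name)
    return 1


if __name__ == "__main__":
    flat = {"raw/2024/a.csv": b"1", "raw/2024/b.csv": b"2",
            "raw/2025/c.csv": b"3", "curated/x.parquet": b"4"}
    tree = {"raw": {"2024": {"a.csv": b"1", "b.csv": b"2"}, "2025": {"c.csv": b"3"}},
            "curated": {"x.parquet": b"4"}}
    print(rename_flat(flat, "raw/", "landing/"))        # 3 objects rewritten
    print(rename_hierarchical(tree, "raw", "landing"))  # 1 node moved
```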
Data Lake Architecture
Data Source >>> Ingestion >>> Data Lake (Raw - Processed - Curated)
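The Raw, Processed, and Curated zones can be sketched as three small steps. This is a hypothetical, in-memory illustration assuming CSV input with `order_id`, `region`, and `amount` columns; in a real lake each zone would be a separate container or folder, and the transformations would usually run in Data Factory, Synapse, or Databricks.

```python
import csv
import io
from collections import defaultdict

# Raw zone: data exactly as ingested, including a row with a bad amount
RAW_CSV = """order_id,region,amount
1,east,100.0
2,west,not_a_number
3,east,50.5
4,west,20.0
"""


def to_processed(raw_csv: str) -> list[dict]:
    """Processed zone: parse, cast types, normalize text, drop invalid rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            rows.append({
                "order_id": int(row["order_id"]),
                "region": row["region"].strip().lower(),
                "amount": float(row["amount"]),
            })
        except ValueError:
            continue  # a real pipeline might route bad rows to a quarantine area instead
    return rows


def to_curated(processed: list[dict]) -> dict[str, float]:
    """Curated zone: business-ready aggregate (total amount per region)."""
    totals: dict[str, float] = defaultdict(float)
    for row in processed:
        totals[row["region"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    processed = to_processed(RAW_CSV)
    print(processed)
    print(to_curated(processed))
```

Keeping the raw copy untouched means the processed and curated zones can always be rebuilt if the cleaning rules change.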

Azure Data Factory
- cloud-based data integration service
  - create data-driven workflows in the cloud
    - that orchestrate and automate data movement and transformation
- data pipeline orchestration
Pipeline: logical grouping of activities
- the activities together perform a task
Activity: consumes and produces datasets; runs on a linked service
- a processing step in a pipeline
- three types of activities: data movement, data transformation, control
Dataset: represents data items stored in a linked service
- the structure of the data within a data store
- where the input/output data lives
Linked Services
- much like connection strings: the connection information needed to connect to external data
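The relationships between these Data Factory concepts can be modeled as plain Python dataclasses. This is only a conceptual sketch: the class names, fields, and the `linked_services()` helper are hypothetical and are neither the Data Factory SDK nor its JSON pipeline schema, which is where these objects are actually defined.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActivityType(Enum):
    """The three activity categories from the notes."""
    DATA_MOVEMENT = "movement"
    DATA_TRANSFORMATION = "transformation"
    CONTROL = "control"


@dataclass
class LinkedService:
    """Connection information for an external data store or compute service."""
    name: str
    connection_string: str  # placeholder only; never hard-code real secrets


@dataclass
class Dataset:
    """Named reference to data (folder, file, table) inside a linked service."""
    name: str
    linked_service: LinkedService
    path: str


@dataclass
class Activity:
    """One processing step; may read input datasets and write output datasets."""
    name: str
    kind: ActivityType
    inputs: list[Dataset] = field(default_factory=list)
    outputs: list[Dataset] = field(default_factory=list)


@dataclass
class Pipeline:
    """Logical grouping of activities that together perform a task."""
    name: str
    activities: list[Activity] = field(default_factory=list)

    def linked_services(self) -> set[str]:
        """Names of all linked services reached through the activities' datasets."""
        return {ds.linked_service.name
                for act in self.activities
                for ds in act.inputs + act.outputs}


if __name__ == "__main__":
    blob = LinkedService("ls_blob", "<connection string placeholder>")
    lake = LinkedService("ls_lake", "<connection string placeholder>")
    raw_sales = Dataset("ds_raw_sales", blob, "sales/raw/")
    lake_sales = Dataset("ds_lake_sales", lake, "sales/processed/")
    copy_sales = Activity("CopySales", ActivityType.DATA_MOVEMENT, [raw_sales], [lake_sales])
    pipeline = Pipeline("pl_ingest_sales", [copy_sales])
    print(pipeline.linked_services())
```

The chain reads the same way as the notes: a pipeline groups activities, activities consume and produce datasets, and datasets point into linked services that hold the connection details.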

Azure Synapse Analytics: SQL at its core, but more than just SQL
- Ingest: Data Factory
- Store: Data Lake, Blob Storage, SQL Database
- Prep & Train: Databricks, Azure Machine Learning
- Model & Serve: SQL Data Warehouse (now dedicated SQL pools in Synapse)

Azure Stream Analytics
- Input: Event Hubs, IoT Hub, Blob Storage
- Query: SQL-like query that transforms the incoming stream
- Output: destination where the results are written and stored
Windowing: sliding, tumbling, hopping
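The three window types differ in how events are grouped over time. The sketch below counts events per window over integer timestamps in plain Python, assuming half-open intervals for simplicity. In Stream Analytics itself you would express this in the SQL-like query language (e.g. `GROUP BY TumblingWindow(second, 10)`), and the service's exact boundary rules and sliding-window output timing differ from this simplified model. Function names and sample data are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def tumbling_counts(times: list[int], size: int) -> dict[int, int]:
    """Tumbling: fixed-size, contiguous, non-overlapping windows, so every event
    falls into exactly one window. Keys are window start times; each window is
    treated here as the half-open interval [start, start + size)."""
    return dict(Counter((t // size) * size for t in times))


def hopping_counts(times: list[int], size: int, hop: int) -> dict[int, int]:
    """Hopping: windows of length `size` starting every `hop` units. When
    hop < size they overlap, so one event can be counted in several windows.
    Only windows starting at time >= 0 are considered."""
    counts: Counter = Counter()
    for t in times:
        # smallest multiple of hop whose window [start, start + size) still covers t
        start = max(0, ((t - size) // hop + 1) * hop)
        while start <= t:
            counts[start] += 1
            start += hop
    return dict(counts)


def sliding_counts(times: list[int], size: int) -> dict[int, int]:
    """Sliding (simplified): for each event at time t, count the events in the
    window of length `size` that ends at t, i.e. the interval (t - size, t]."""
    ordered = sorted(times)
    return {t: bisect_right(ordered, t) - bisect_right(ordered, t - size) for t in ordered}


if __name__ == "__main__":
    times = [1, 3, 7, 12, 14, 21]  # hypothetical event times in seconds
    print(tumbling_counts(times, size=10))
    print(hopping_counts(times, size=10, hop=5))
    print(sliding_counts(times, size=10))
```

With these sample times and a 10-second size, `tumbling_counts` returns `{0: 3, 10: 2, 20: 1}`, while `hopping_counts` with a 5-second hop returns `{0: 3, 5: 3, 10: 2, 15: 1, 20: 1}`: the event at 7 is counted in both the window starting at 0 and the one starting at 5, which is the overlap that distinguishes hopping from tumbling.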

I want to keep improving 👩🏻‍💻