[DP-203] Data Storage: Full Review


Azure Data Lake Storage Gen2 provides scalable, flexible, and highly available storage for a variety of data formats.
Distribution, partitioning, sharding, and pruning are distinct methods of breaking data up into workable subsets.

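Distribution relates to the 60 underlying databases (distributions) that Azure Synapse Analytics creates automatically.

Partitioning divides a single database instance into multiple parts.

Sharding, in contrast, spreads the data across multiple machines.

Pruning lets a query load only the specific portions it needs, even within a single partition, based on the query filter.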
Know your data well in order to make wise decisions on folder structure, file formats, and partition keys.
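
To make the partitioning and pruning ideas above concrete, here is a minimal Python sketch. It is not how Azure Synapse or Databricks implement these features; the sample rows and the `partition_by` and `query` helpers are hypothetical, and pruning is shown at partition granularity only. The point is that a good partition key lets a filter skip data it cannot match.

```python
from collections import defaultdict

# Hypothetical sample rows; "year" plays the role of the partition key.
rows = [
    {"year": 2021, "region": "KR", "amount": 10},
    {"year": 2021, "region": "US", "amount": 20},
    {"year": 2022, "region": "KR", "amount": 30},
    {"year": 2023, "region": "US", "amount": 40},
]

def partition_by(rows, key):
    """Group rows into partitions keyed by the value of `key`."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

def query(partitions, wanted_keys):
    """Read only the partitions whose key matches the filter (pruning)."""
    scanned = [k for k in partitions if k in wanted_keys]
    result = [row for k in scanned for row in partitions[k]]
    return scanned, result

partitions = partition_by(rows, "year")
scanned, result = query(partitions, {2022})
print(scanned)  # keys of the partitions that were actually read
print(result)   # rows returned by the filtered query
```

If most queries filtered on `region` instead, `year` would be a poor partition key, because every partition would still have to be scanned.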

The storage solution for unstructured and semi-structured files that is most often used in Azure analytics solutions

Azure Data Lake Storage Gen2
The Azure Storage access tier which is intended for long-term backup and regulatory compliance

Archive
The type of compression which allows you to reduce the size of rowstore objects by looking for patterns in the data and making replacements with smaller values

Page Compression
The sharding strategy which utilizes a map to direct queries to the appropriate shard

The Lookup Strategy
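
A rough Python sketch of the lookup idea, assuming a hypothetical tenant-to-shard map and in-memory dicts standing in for separate databases. The names `shard_map`, `shard_for`, `write`, and `read` are made up for illustration.

```python
# Hypothetical shard map: tenant ID -> shard name.
shard_map = {
    "tenant-a": "shard-0",
    "tenant-b": "shard-1",
    "tenant-c": "shard-0",
}

# Each shard is modeled as a dict standing in for a separate database.
shards = {"shard-0": {}, "shard-1": {}}

def shard_for(tenant_id):
    """Look up which shard holds this tenant's data."""
    try:
        return shard_map[tenant_id]
    except KeyError:
        raise KeyError(f"no shard mapping for {tenant_id!r}") from None

def write(tenant_id, key, value):
    """Route the write to the shard named in the map."""
    shards[shard_for(tenant_id)][(tenant_id, key)] = value

def read(tenant_id, key):
    """Route the read to the shard named in the map."""
    return shards[shard_for(tenant_id)][(tenant_id, key)]

write("tenant-a", "plan", "premium")
write("tenant-b", "plan", "basic")
print(shard_for("tenant-b"))
print(read("tenant-a", "plan"))
```

Because routing goes through the map, a tenant can be moved to another shard by copying its data and updating a single map entry.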
The Azure Storage redundancy option which creates synchronous copies across 3 availability zones in the primary region.

ZRS
The Databricks feature which allows you to skip files within a partition based on query filters.

Dynamic File Pruning
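
The general idea of skipping files can be sketched in Python as below. This is not the Databricks implementation; it only assumes that each file carries min/max statistics for a column, and the `FileStats` class, file names, and `files_to_read` helper are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    min_id: int  # smallest id stored in the file
    max_id: int  # largest id stored in the file

# Hypothetical files inside one partition, with per-file statistics.
files = [
    FileStats("part-000.parquet", 1, 100),
    FileStats("part-001.parquet", 101, 200),
    FileStats("part-002.parquet", 201, 300),
]

def files_to_read(files, wanted_ids):
    """Keep only files whose [min_id, max_id] range can contain a wanted id."""
    return [
        f.path
        for f in files
        if any(f.min_id <= i <= f.max_id for i in wanted_ids)
    ]

selected = files_to_read(files, {150, 160})
print(selected)  # paths of the files that still need to be scanned
```

Files whose range cannot contain any wanted value are never opened, which is where the savings come from.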
The file type which allows you to store nested data in a columnar format.

Parquet

See: [DP-203] Data Storage: Basic Data Storage Concepts (1) > Available File Types
The partitioning method which allows you to improve isolation and data access performance by identifying a bounded context for a distinct business area

Functional Partitioning

See: [DP-203] Data Storage: Basic Data Storage Concepts (2): Partitioning > Partitioning Options
In Azure Synapse Analytics, how many underlying distributions are automatically assigned?

60
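
As an illustration of how rows might be spread over 60 distributions, here is a small Python sketch of hash and round-robin assignment. The SHA-256 hash is only a stand-in, since the hash function Synapse actually uses is not covered here, and `distribution_for` and `round_robin` are hypothetical helper names.

```python
import hashlib

NUM_DISTRIBUTIONS = 60  # the number of distributions from the answer above

def distribution_for(key):
    """Hash-distribute: the same key always maps to the same distribution."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS

def round_robin(rows):
    """Round-robin: rows are dealt out in turn, regardless of their content."""
    return [(i % NUM_DISTRIBUTIONS, row) for i, row in enumerate(rows)]

# Equal keys land in the same distribution under hash distribution.
print(distribution_for("customer-42") == distribution_for("customer-42"))

# With round-robin, the 61st row wraps back around to distribution 0.
assigned = round_robin([f"row-{i}" for i in range(61)])
print(assigned[60][0])
```

Hash distribution keeps rows with equal keys together, which helps joins and aggregations on that key, while round-robin spreads rows evenly without looking at their content.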

I want to improve more 👩🏻‍💻