External merge sort
Basic algorithm: “Load-Sort-Store”
Some optimizations: blocked & overlapped I/O
DBMS Sorting 필요한 경우
Data requested in sorted order: order by
Sorting is the first step in bulk loading B+ tree index.
Sort-merge join algorithm involves sorting.
And, alternative implementation for “distinct” and “group by”
vs. hashing
Sorting Problem
Problem: e.g. sort 1Gb of data with 1MB of RAM
such as Oracle and DB2, use separate main memory, not buffer cache, for sorting!
Approaches
− 2-way merge sort
− N-way merge sort
− Three optimizations to reduce # of IOs
PGA
Program Global Area, 프로그램당 사용할 수 있는 메모리 공간
2-Way External Merge Sort (with 3 Buffers)
cf) #N = 2^s
x 2N 이유 : 패스별로 N개 읽고 N개 써야함.
General External Merge Sort (with B pages)
Cost of External Merge Sort
An improvement for CPU-intensive comparison
To use a bottom-up selection tree to support merging
External Merge-Sort: Blocked I/O
Optimal Cluster Size
External Merge-Sort: Double Buffering
To reduce wait time for I/O request to complete, can prefetch into `shadow block’.