Databricks - Delta Lake

no-glass-otacku · June 18, 2026

MS data school

Delta Lake

Open source!

Schema evolution

  • The table schema can adjust automatically as the data changes!

Schema enforcement

  • Checks that incoming data matches the defined schema

🌟 Time travel: covered in detail below

Delta Lake

Versioning, optimization, and VACUUM: the key points

Notes I put together by asking questions about the concepts that confused me during the Databricks labs


1. DESCRIBE HISTORY: viewing the transaction log

DESCRIBE HISTORY beans

Delta Lake records every operation that modifies the table as a new version.

version  operation
0        CREATE TABLE
1        WRITE
2        WRITE
3        UPDATE
4        UPDATE
5        DELETE
6        MERGE

  • operationParameters: shows detailed parameters such as the WHERE condition
  • operationMetrics: shows how many rows and files were added or removed
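
To make the "every change becomes a version" idea concrete, here is a minimal Python sketch of a toy log directory. It assumes a heavily simplified commit layout (one JSON file per version, one action per line, with a commitInfo action naming the operation); the helper names write_commit and describe_history are hypothetical, and this is not how Databricks implements the command.

```python
import json
import tempfile
from pathlib import Path


def write_commit(log_dir: Path, version: int, actions: list[dict]) -> None:
    """Write one commit as newline-delimited JSON, one action per line."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions))


def describe_history(log_dir: Path) -> list[dict]:
    """Build one row per commit from its commitInfo action, newest first."""
    rows = []
    for path in sorted(log_dir.glob("*.json")):
        version = int(path.stem)
        for line in path.read_text().splitlines():
            action = json.loads(line)
            if "commitInfo" in action:
                info = action["commitInfo"]
                rows.append({
                    "version": version,
                    "operation": info["operation"],
                    "operationParameters": info.get("operationParameters", {}),
                })
    return sorted(rows, key=lambda r: r["version"], reverse=True)


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / "_delta_log"
    log_dir.mkdir()
    # v0: table creation, no data files yet (toy data)
    write_commit(log_dir, 0, [{"commitInfo": {"operation": "CREATE TABLE"}}])
    # v1: an append that adds one data file (toy data)
    write_commit(log_dir, 1, [
        {"commitInfo": {"operation": "WRITE",
                        "operationParameters": {"mode": "Append"}}},
        {"add": {"path": "part-0001.parquet"}},
    ])
    history = describe_history(log_dir)
    for row in history:
        print(row["version"], row["operation"], row["operationParameters"])
```

The point of the sketch is only that the history is derived from the log files, not from the data files.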

2. Time Travel: querying and restoring earlier versions

-- Query by version
SELECT * FROM beans VERSION AS OF 1;
SELECT * FROM beans@v1;   -- shorthand syntax

-- Query by timestamp
SELECT * FROM beans TIMESTAMP AS OF '2024-01-01';

-- Register as a temporary view
CREATE OR REPLACE TEMP VIEW pre_delete_vw AS
SELECT * FROM beans VERSION AS OF 4;

-- Restore to a specific version
RESTORE TABLE beans TO VERSION AS OF 5;

📌 What I asked: why doesn't CREATE OR REPLACE TEMP VIEW show up in the transaction log?

A VIEW never touches the actual data files. It only stores a definition of "how to look at the data", so nothing is written to the Delta transaction log. This has nothing to do with TEMP; a regular VIEW behaves the same way.
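
One way to picture time travel is as replaying the log up to the requested version and seeing which data files are active at that point. The Python sketch below assumes a toy log where each commit is just a list of add/remove actions on file names; files_at_version is a hypothetical helper, not a Delta API. Defining a view would append nothing to this list, which is why it never shows up as a version.

```python
def files_at_version(log: list[list[dict]], version: int) -> set[str]:
    """Replay commits 0..version and return the data files active at that version."""
    active: set[str] = set()
    for actions in log[: version + 1]:
        for action in actions:
            if "add" in action:
                active.add(action["add"])
            elif "remove" in action:
                active.discard(action["remove"])
    return active


# Toy log (hypothetical file names): index in the list == version number
log = [
    [],                                                               # v0: CREATE TABLE
    [{"add": "part-0001.parquet"}],                                   # v1: WRITE
    [{"remove": "part-0001.parquet"}, {"add": "part-0002.parquet"}],  # v2: UPDATE rewrites the file
]

for v in range(len(log)):
    print(f"v{v}:", sorted(files_at_version(log, v)))
```

Reading an old version only works while the files it points to still exist, which is what section 5 is about.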


3. OPTIMIZE + Z-Order: file compaction and data rearrangement

OPTIMIZE beans ZORDER BY (name)

This one command does two things at once:

① Compaction (merging files)
Merges hundreds of small files into a few large ones

② Z-Ordering (data rearrangement)
Collects rows with similar name values into the same file

Before: File 1 [Alice, Zoe, Bob]    File 2 [Carol, Alice, Zoe]
After:  File 1 [Alice, Alice, Bob]  File 2 [Carol, Zoe, Zoe]

→ A query with WHERE name = 'Alice' only has to read 1 file (data skipping)

📌 What I asked: how do I check the number of files?

DESCRIBE DETAIL beans
-- check the numFiles and sizeInBytes columns

The OPTIMIZE output reports the change as numFilesAdded and numFilesRemoved. In this lab the beans table has numFiles: 1 and sizeInBytes: 1685 (~1.6 KB), which is so small that Z-Order has almost no effect. The larger the data, the more dramatic the benefit.
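
The data skipping idea from this section can be sketched in a few lines of Python. The sketch assumes each file keeps only a min/max statistic for name, and it approximates Z-ordering on a single column with a plain sort (real Z-ordering interleaves several columns); file_stats and files_to_read are hypothetical helpers.

```python
def file_stats(files: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Per-file (min, max) of the name column, like the stats used for skipping."""
    return {fname: (min(values), max(values)) for fname, values in files.items()}


def files_to_read(stats: dict[str, tuple[str, str]], target: str) -> list[str]:
    """A file can be skipped when target falls outside its [min, max] range."""
    return [fname for fname, (lo, hi) in stats.items() if lo <= target <= hi]


# The "Before" layout from the notes above
before = {"file1": ["Alice", "Zoe", "Bob"], "file2": ["Carol", "Alice", "Zoe"]}

# Rewrite: sort all rows by name, then split them back into two files
rows = sorted(v for values in before.values() for v in values)
after = {"file1": rows[:3], "file2": rows[3:]}

print("before:", files_to_read(file_stats(before), "Alice"))
print("after: ", files_to_read(file_stats(after), "Alice"))
```

Before the rewrite both files span Alice..Zoe, so neither can be skipped; afterwards only the first file's range contains 'Alice'.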


4. VACUUM: permanently deleting old files

-- Preview the files that would be deleted (nothing is actually deleted)
VACUUM beans RETAIN 0 HOURS DRY RUN;

-- Actually delete
VACUUM beans RETAIN 0 HOURS;

📌 What I asked: what exactly is DRY RUN?

It means "don't actually run it, just show me what you would do". It only prints the list of files that would be deleted. RETAIN 0 HOURS is a dangerous option that deletes every file not needed by the current version, so checking with DRY RUN first matters.

The default retention period is 7 days. The first setting below turns off the check that rejects shorter retention periods (be careful in production); the second makes VACUUM runs get recorded in the table history:

SET spark.databricks.delta.retentionDurationCheck.enabled = false;
SET spark.databricks.delta.vacuum.logging.enabled = true;
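
The retention rule and DRY RUN can be modeled roughly as follows. This Python sketch assumes a toy table where each file has a single timestamp and where the set of files referenced by the current version is already known; the vacuum function and its parameters are hypothetical, not the real command's implementation.

```python
from datetime import datetime, timedelta


def vacuum(files: dict[str, datetime], referenced: set[str], now: datetime,
           retain_hours: int = 168, dry_run: bool = False) -> list[str]:
    """Return files that are unreferenced and older than the retention window.

    With dry_run=True nothing is deleted; the candidates are only reported.
    """
    cutoff = now - timedelta(hours=retain_hours)
    candidates = sorted(
        path for path, mtime in files.items()
        if path not in referenced and mtime < cutoff
    )
    if not dry_run:
        for path in candidates:
            del files[path]
    return candidates


now = datetime(2026, 6, 18, 12, 0)
files = {  # toy data: file name -> timestamp
    "part-0001.parquet": now - timedelta(days=1),
    "part-0002.parquet": now - timedelta(days=10),
    "part-0003.parquet": now - timedelta(hours=1),
}
referenced = {"part-0003.parquet"}  # files the current version still needs

default_preview = vacuum(files, referenced, now, dry_run=True)
zero_preview = vacuum(files, referenced, now, retain_hours=0, dry_run=True)
print("7-day dry run:", default_preview)
print("0-hour dry run:", zero_preview)

deleted = vacuum(files, referenced, now, retain_hours=0)
print("remaining:", sorted(files))
```

With the default 168 hours (7 days) only the 10-day-old file qualifies; with 0 hours everything the current version does not need goes away.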

5. Why Time Travel stops working after VACUUM

SELECT * FROM beans@v1
-- FileReadException: file not found → error

📌 What I asked: why is DESCRIBE HISTORY still visible after VACUUM?

VACUUM deletes data files; it does not delete the transaction log.

beans/
├── _delta_log/          ← log files (VACUUM leaves these alone)
│   ├── 00000.json       ← "v0: table created"
│   └── 00001.json       ← "v1: data added"
│
└── part-0001.parquet    ← actual data file (VACUUM deletes it)

  • DESCRIBE HISTORY → reads the log → still works after VACUUM
  • Querying beans@v1 → looks for the actual files → FileReadException after VACUUM

In other words, an error on beans@v1 after VACUUM is the expected behavior. The error is the sign that VACUUM did its job.
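
The split between "log survives, data files do not" can be reproduced with a small Python toy. It assumes a log of add/remove lists and a set standing in for object storage, and it uses Python's FileNotFoundError as a stand-in for the FileReadException seen on Databricks; all names here are hypothetical.

```python
# Toy log (hypothetical): index == version number
log = [
    {"operation": "CREATE TABLE", "add": [], "remove": []},
    {"operation": "WRITE", "add": ["part-0001.parquet"], "remove": []},
    {"operation": "UPDATE", "add": ["part-0002.parquet"], "remove": ["part-0001.parquet"]},
]
# Data files that physically exist in storage
storage = {"part-0001.parquet", "part-0002.parquet"}


def describe_history(log: list[dict]) -> list[tuple[int, str]]:
    """Reads only the log, never the data files."""
    return [(version, commit["operation"]) for version, commit in enumerate(log)]


def read_version(log: list[dict], storage: set[str], version: int) -> list[str]:
    """Resolve the files for a version, then require them to exist in storage."""
    active: set[str] = set()
    for commit in log[: version + 1]:
        active |= set(commit["add"])
        active -= set(commit["remove"])
    missing = active - storage
    if missing:
        raise FileNotFoundError(f"missing data files: {sorted(missing)}")
    return sorted(active)


# Simulate VACUUM ... RETAIN 0 HOURS: keep only files the latest version needs
current_files = set(read_version(log, storage, len(log) - 1))
storage &= current_files

print(describe_history(log))  # the log is untouched, so history still lists v0..v2

try:
    read_version(log, storage, 1)
    v1_readable = True
except FileNotFoundError as err:
    v1_readable = False
    print("v1 read failed:", err)
```

History stays complete because it never needed the data files, while the v1 read fails because its file was removed from storage.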


Summary: what each command does at a glance

Command                     Role                                    Logged
DESCRIBE HISTORY            View the transaction history            -
SELECT ... VERSION AS OF    Query an earlier version                -
RESTORE TABLE               Restore to an earlier version           ✅
OPTIMIZE ZORDER BY          File compaction + data rearrangement    ✅
VACUUM DRY RUN              Preview the files to be deleted         -
VACUUM                      Permanently delete old files            ✅ (when configured)