# 🗂️ What Is DBFS in Databricks? (Databricks File System)

GarionNachal · February 23, 2026


DBFS (Databricks File System) is a distributed file system that makes cloud storage easy to work with inside Databricks.
This post covers DBFS in one place: the concept, the structure, practical usage, and a comparison with its current successor, Unity Catalog.


## 📌 Table of Contents

  1. What is DBFS?
  2. Core structure of DBFS
  3. DBFS root directory layout
  4. Supported file formats
  5. Key benefits of DBFS
  6. Hands-on usage: file upload and queries
  7. DBFS vs HDFS
  8. DBFS vs Unity Catalog Volumes
  9. Wrap-up and recommendations

## 🔍 What Is DBFS?

DBFS (Databricks File System) is a distributed file system built into every Databricks workspace.
It acts as an abstraction layer that lets Apache Spark clusters access cloud object storage.

Put simply, DBFS plays the following roles:

  • ☁️ A unified interface on top of cloud storage such as AWS S3, Azure Blob Storage / ADLS Gen2, and Google Cloud Storage
  • 🗂️ A layer that lets you use that storage like a local file system instead of calling complex cloud storage APIs
  • ⚡ High-performance file I/O optimized for Spark workloads

![Databricks architecture overview](https://sspark.genspark.ai/cfimages?u1=gsidLQ4LZwpIWjvH5zCHCelW9%2Bmp11HeHLbBfJquSDMtY7OxFe7z4obIhqCQGBsK99LU0OcvzIpW%2FIIv49XnqGwI9Xl0H4I6nBmEmSVYSoku1rRlTHco&u2=WvvWf8WJLqhdHzpm&width=2560)
*▲ Overall Databricks platform architecture: DBFS sits in the storage layer of the data plane*

---

> 💡 **Key point**  
> DBFS lets you manipulate cloud storage with the same commands as a Unix-like file system (`ls`, `cp`, `rm`, and so on).  
> Internally, these commands are translated into cloud storage API calls.

---

## 🏗️ Core Structure of DBFS

DBFS is divided into two main concepts:

### 1. DBFS Root (`dbfs:/`)
The default storage location that is **provisioned automatically** when a workspace is created.  
It is accessed with the `dbfs:/` scheme and lives in the workspace's cloud storage account.

```python
# DBFS root access (Spark API format)
df = spark.read.csv("dbfs:/FileStore/mydata.csv", header=True)

# Or the file API format (local path under /dbfs)
import os
files = os.listdir("/dbfs/FileStore/")
```
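
The two path forms above point at the same file: Spark APIs take the `dbfs:/` URI, while local file APIs such as `os` go through the `/dbfs/` path. A minimal sketch of converting between the two follows. The helper names `to_file_api_path` and `to_spark_path` are hypothetical and not part of any Databricks library.

```python
def to_file_api_path(spark_path: str) -> str:
    """Convert 'dbfs:/a/b' (Spark API form) to '/dbfs/a/b' (file API form)."""
    prefix = "dbfs:/"
    if not spark_path.startswith(prefix):
        raise ValueError(f"not a dbfs:/ path: {spark_path!r}")
    return "/dbfs/" + spark_path[len(prefix):].lstrip("/")


def to_spark_path(file_api_path: str) -> str:
    """Convert '/dbfs/a/b' (file API form) to 'dbfs:/a/b' (Spark API form)."""
    prefix = "/dbfs/"
    if not file_api_path.startswith(prefix):
        raise ValueError(f"not a /dbfs/ path: {file_api_path!r}")
    return "dbfs:/" + file_api_path[len(prefix):]


print(to_file_api_path("dbfs:/FileStore/mydata.csv"))  # /dbfs/FileStore/mydata.csv
print(to_spark_path("/dbfs/FileStore/mydata.csv"))     # dbfs:/FileStore/mydata.csv
```

Both helpers raise `ValueError` on paths outside DBFS, so a mixed-up path fails loudly instead of silently reading from the driver's local disk.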

### 2. DBFS Mounts (`/mnt/`)

Mounts attach external cloud storage to DBFS so that it can be used like a local directory.

```python
# Example: mounting an S3 bucket
# ACCESS_KEY and SECRET_KEY must be defined beforehand;
# avoid hard-coding credentials in the notebook.
dbutils.fs.mount(
  source = "s3a://my-s3-bucket/",
  mount_point = "/mnt/my-data",
  extra_configs = {"fs.s3a.access.key": ACCESS_KEY,
                   "fs.s3a.secret.key": SECRET_KEY}
)

# Access the mounted path
df = spark.read.parquet("/mnt/my-data/sales/2024/")
```
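
Calling `dbutils.fs.mount` on a mount point that is already in use fails, so notebooks often check first. The sketch below shows that check in plain Python. In a notebook the list would come from `dbutils.fs.mounts()`, whose entries expose `mountPoint` and `source`. Here a `MountInfo` named tuple stands in for those entries so the logic runs outside Databricks. `find_mount` is a hypothetical helper name and the sample entries are made up.

```python
from typing import Iterable, NamedTuple, Optional


class MountInfo(NamedTuple):
    """Stand-in for the entries returned by dbutils.fs.mounts() (assumption)."""
    mountPoint: str
    source: str


def find_mount(mounts: Iterable[MountInfo], mount_point: str) -> Optional[MountInfo]:
    """Return the existing mount at mount_point, or None if nothing is mounted there."""
    wanted = mount_point.rstrip("/")
    for m in mounts:
        if m.mountPoint.rstrip("/") == wanted:
            return m
    return None


# Sample data for illustration only; in a notebook use dbutils.fs.mounts()
existing = [
    MountInfo("/mnt/my-data", "s3a://my-s3-bucket/"),
    MountInfo("/databricks-datasets", "databricks-datasets"),
]

if find_mount(existing, "/mnt/my-data") is None:
    print("not mounted yet: safe to call dbutils.fs.mount(...)")
else:
    print("already mounted: skip dbutils.fs.mount(...)")
```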

> ⚠️ **Important notice (2024 onward):** DBFS root and DBFS mounts are deprecated.  
> New accounts cannot access these features by default, and Databricks recommends using Unity Catalog Volumes instead.


## 📁 DBFS Root Directory Layout

DBFS root (`dbfs:/`) contains the following default directories:

| Directory | Description |
|-----------|-------------|
| `/FileStore` | Data, libraries, and generated plots uploaded through the UI |
| `/databricks-datasets` | Open-source sample datasets provided by Databricks |
| `/databricks-results` | Files produced when query results are downloaded |
| `/databricks/init` | Legacy global init scripts (no longer recommended) |
| `/user/hive/warehouse` | Default location for Hive metastore managed tables |
| `/mnt` | Mount points for external cloud storage |

```python
# List the directories under DBFS root
display(dbutils.fs.ls("dbfs:/"))
```
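
Each entry returned by `dbutils.fs.ls` carries a `path`, a `name`, and a `size`, and directory names end with `/`. As a rough sketch of working with such a listing, the code below separates directories from files and totals the file sizes. The `FileInfo` named tuple is a stand-in so the example runs outside Databricks. `split_listing` is a hypothetical helper and the sample entries are made up.

```python
from typing import List, NamedTuple, Tuple


class FileInfo(NamedTuple):
    """Stand-in for the entries returned by dbutils.fs.ls() (assumption)."""
    path: str
    name: str
    size: int


def split_listing(entries: List[FileInfo]) -> Tuple[List[str], List[str], int]:
    """Return (directory names, file names, total file size in bytes)."""
    dirs = [e.name for e in entries if e.name.endswith("/")]
    files = [e.name for e in entries if not e.name.endswith("/")]
    total = sum(e.size for e in entries if not e.name.endswith("/"))
    return dirs, files, total


# Sample listing for illustration only; in a notebook use dbutils.fs.ls("dbfs:/FileStore/")
listing = [
    FileInfo("dbfs:/FileStore/tables/", "tables/", 0),
    FileInfo("dbfs:/FileStore/mydata.csv", "mydata.csv", 2048),
    FileInfo("dbfs:/FileStore/sales_data.csv", "sales_data.csv", 4096),
]

dirs, files, total = split_listing(listing)
print(dirs, files, total)
```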

## 📦 Supported File Formats

DBFS can hold files in a wide range of big data formats:


![DBFS file system structure diagram](https://sspark.genspark.ai/cfimages?u1=6PKkX40DLKMwI0KUnuX86U8THI7v5%2BsXG40lzp1NhvL6zb6wMHwDznexe%2B%2FAaQzsceFsrsX9v32CrG%2FV4j8ggDQ%2FmUHZqH%2FSH13AjzRYrj%2B%2Fvqz1&u2=jv3RbziBhMlPHIvG&width=2560)
*▲ DBFS structure: an abstraction layer on top of cloud object storage*

---

| Format | Type | Characteristics |
|--------|------|-----------------|
| **Parquet** ⭐ | Columnar, binary | High compression ratio; optimized for analytical queries |
| **Delta Lake** ⭐ | Parquet + transaction log | ACID transactions, time travel |
| **Avro** | Row-based, binary | Supports schema evolution |
| **JSON** | Text, semi-structured | Well suited to NoSQL-style data |
| **ORC** | Columnar, binary | Optimized for Hive workloads |
| **CSV** | Text, structured | Simple and universal |
| **Images/audio/PDF** | Binary | Storage for ML training data |

> 📊 **Performance tip**: For analytical workloads, **Parquet** or **Delta Lake** is strongly recommended.  
> Columnar storage gives both fast queries and good compression.
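
To see why a columnar layout helps analytical queries, the plain-Python sketch below stores the same records row by row and column by column. It is a conceptual illustration only, not how Parquet is implemented, and the sample values are made up. Summing one column in the columnar layout reads a single list, while the row layout has to visit every full record.

```python
# Row-oriented layout: one dict per record (similar in spirit to CSV or JSON)
rows = [
    {"region": "APAC", "revenue": 120, "units": 3},
    {"region": "EMEA", "revenue": 200, "units": 5},
    {"region": "APAC", "revenue": 80, "units": 2},
]

# Column-oriented layout: one list per column (similar in spirit to Parquet or ORC)
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Row layout: every record is touched even though only one field is needed
total_from_rows = sum(row["revenue"] for row in rows)

# Columnar layout: only the 'revenue' column is read
total_from_columns = sum(columns["revenue"])

print(total_from_rows, total_from_columns)  # 400 400
```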

---

## ✅ Key Benefits of DBFS

### ① Unified Interface
There is no need to learn **a different API for each cloud** (AWS S3, Azure ADLS, GCS); everything is reachable through the single DBFS interface.

### ② Optimized for Spark Workloads
DBFS provides I/O performance tuned for **Spark-based workloads** such as ETL, machine learning, and ad hoc analysis.

### ③ Virtually Unlimited Scalability
Because DBFS is backed by cloud object storage, there is **no practical limit on storage capacity**.  
Capacity grows and shrinks automatically with the amount of data.

### ④ Data Sharing Across Clusters
Multiple Spark clusters can **access the same DBFS data at the same time**.  
Data persists even after a cluster is terminated, because compute and storage are separated.

### ⑤ Multiple API Options
```python
# Spark DataFrame API
df = spark.read.parquet("dbfs:/data/sales/")

# Python OS API
import os
files = os.listdir("/dbfs/data/")

# dbutils API
dbutils.fs.ls("dbfs:/data/")
dbutils.fs.cp("dbfs:/source/", "dbfs:/dest/", recurse=True)

# CLI
# databricks fs ls dbfs:/
```

## 🛠️ Hands-On Usage: File Upload and Queries

### Step 1: Enable the DBFS file browser

Admin Settings → Workspace Settings → Advanced → check **Enable DBFS File Browser**, then refresh the page.

### Step 2: Upload a file

Catalog → DBFS tab → click **Upload** to upload a local file.
Default save location: `/FileStore/`

### Step 3: Create a DataFrame and analyze it

```python
# Read the CSV file
df = spark.read.format("csv") \
    .option("header", "true") \
    .option("inferSchema", "true") \
    .load("dbfs:/FileStore/sales_data.csv")

# Create a temporary view
df.createOrReplaceTempView("sales_data")
```

Then analyze it with SQL. Run this in a separate notebook cell, because `%sql` must be the first line of its cell:

```sql
%sql
SELECT region, SUM(revenue) AS total_revenue
FROM sales_data
GROUP BY region
ORDER BY total_revenue DESC
LIMIT 10
```
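
To make clear what the query above computes, here is a plain-Python sketch of the same aggregation: sum `revenue` per `region`, sort descending, and keep the top 10. The CSV content is made-up sample data standing in for `sales_data.csv`. Only the `region` and `revenue` column names come from the query.

```python
import csv
import io
from collections import defaultdict

# Made-up sample standing in for sales_data.csv
sample_csv = """region,revenue
APAC,12000
EMEA,8000
APAC,5000
AMER,15000
"""

totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample_csv)):
    # SUM(revenue) ... GROUP BY region
    totals[row["region"]] += float(row["revenue"])

# ORDER BY total_revenue DESC LIMIT 10
top10 = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]

for region, total_revenue in top10:
    print(region, total_revenue)
```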

### Step 4: Query with the DataFrame API as well

```python
# Show the first 10 rows
display(df.limit(10))

# Filter on a specific column
df.filter(df["revenue"] > 10000).show()
```

### Downloading files (CLI)

```bash
# Download from DBFS to the local machine with the Databricks CLI
databricks fs cp dbfs:/FileStore/result.csv ./local/result.csv

# Download an entire directory
databricks fs cp -r dbfs:/FileStore/results/ ./local/results/
```

## ⚔️ DBFS vs HDFS

| Item | DBFS | HDFS |
|------|------|------|
| Storage backend | Cloud object storage (S3, ADLS, GCS) | Block storage on local servers |
| Architecture | Serverless, compute and storage separated | NameNode-DataNode master-slave |
| Scalability | ✅ Virtually unlimited (automatic cloud scaling) | ⚠️ Limited (nodes must be added manually) |
| Random access | ✅ Good | ❌ Weak (optimized for sequential reads) |
| On cluster termination | ✅ Data persists | ❌ Data may be lost along with the cluster |
| Cost model | Pay-as-you-go (cloud storage costs) | Hardware purchase and maintenance costs |
| Security | SSL, IAM, and RBAC integration | Kerberos, ACL |
| Typical environment | Cloud-native data platforms | On-premises Hadoop ecosystem |

## 🆚 DBFS vs Unity Catalog Volumes


![Unity Catalog vs DBFS Mount comparison](https://sspark.genspark.ai/cfimages?u1=D2skshvWYtVideHCaTxxCVJUQtxXLZPfRMQpwJ7eqgYUxfaguZJeuLzjD4f9NipQ3wwhzeWnBL%2FbUFLOJS3DuQIoaKqMcQr%2BGrP1vRUp6l%2BkvRTVD2nFOfStQ%2B4lyOH8RwxJetfKb7uLkx317EqsxexiVsMPDzhoAgIo%2B49rRQA8R2M51gg3%2Bh1%2BvvA3h9pskXgNsalsH4F1QLWluybGvgrbI5CScSE%2FDqp2OJU%3D&u2=5WAu9Y5htsS3%2FZ5G&width=2560)
*▲ Unity Catalog Volumes vs DBFS Mount: the architecture Databricks currently recommends*

---

Since 2023, Databricks has officially recommended **Unity Catalog Volumes** as the place to store files.

| Item | DBFS | Unity Catalog Volumes |
|------|------|----------------------|
| **Governance** | Basic access control | ✅ Fine-grained RBAC, data lineage tracking |
| **Multi-workspace** | Per workspace | ✅ Managed centrally across the organization |
| **Audit logs** | Limited | ✅ Detailed audit logs |
| **Data lineage** | ❌ Not supported | ✅ Automatic lineage tracking |
| **Recommendation status** | ⚠️ Being deprecated | ✅ Officially recommended by Databricks |
| **Access path** | `dbfs:/` or `/dbfs/` | `/Volumes/catalog/schema/volume/` |

```python
# Unity Catalog Volumes access (currently recommended)
df = spark.read.parquet(
    "/Volumes/main/sales_data/raw/2024/transactions.parquet"
)

# DBFS access (legacy)
df = spark.read.parquet("dbfs:/mnt/sales-data/2024/transactions.parquet")
```
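
Volume paths always follow the pattern `/Volumes/<catalog>/<schema>/<volume>/<path>`, so they can be assembled from their parts rather than typed by hand. A small sketch follows. `volume_path` is a hypothetical helper, and the catalog, schema, and volume names are the ones from the example above.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build '/Volumes/<catalog>/<schema>/<volume>/<parts...>'."""
    for name in (catalog, schema, volume):
        if not name or "/" in name:
            raise ValueError(f"invalid name: {name!r}")
    # Strip stray slashes so a leading '/' in a part cannot reset the joined path
    cleaned = [p.strip("/") for p in parts]
    return posixpath.join("/Volumes", catalog, schema, volume, *cleaned)


print(volume_path("main", "sales_data", "raw", "2024", "transactions.parquet"))
# /Volumes/main/sales_data/raw/2024/transactions.parquet
```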

> 🚀 **Migration recommendation:** If you are currently using DBFS mounts,
> a phased migration to Unity Catalog External Locations & Volumes is recommended.
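
One step of such a migration is rewriting hard-coded mount paths in existing code. A minimal sketch follows, assuming you maintain your own mapping from mount points to volume roots. The mapping below and the `to_volume_path` helper are hypothetical examples, not something Databricks generates for you.

```python
from typing import Mapping

# Hypothetical mapping: DBFS mount point -> Unity Catalog volume root
MOUNT_TO_VOLUME = {
    "/mnt/sales-data": "/Volumes/main/sales_data/raw",
}


def to_volume_path(path: str, mapping: Mapping[str, str] = MOUNT_TO_VOLUME) -> str:
    """Rewrite a DBFS mount path to its volume path; return it unchanged if unmapped."""
    # Normalize 'dbfs:/mnt/x' and '/dbfs/mnt/x' to '/mnt/x'
    normalized = path
    if normalized.startswith("dbfs:/"):
        normalized = "/" + normalized[len("dbfs:/"):].lstrip("/")
    elif normalized.startswith("/dbfs/"):
        normalized = normalized[len("/dbfs"):]

    for mount_point, volume_root in mapping.items():
        # Match the mount point itself or anything beneath it, not sibling names
        if normalized == mount_point or normalized.startswith(mount_point + "/"):
            return volume_root + normalized[len(mount_point):]
    return path


print(to_volume_path("dbfs:/mnt/sales-data/2024/transactions.parquet"))
# /Volumes/main/sales_data/raw/2024/transactions.parquet
```

Paths that are not under a mapped mount point are returned unchanged, so the helper can be applied to a whole codebase and the leftovers reviewed by hand.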


## 🎯 Wrap-Up and Recommendations

### Key points to remember when using DBFS

| Situation | Recommendation |
|-----------|----------------|
| Starting a new project | Use Unity Catalog Volumes |
| Already using DBFS | Plan a gradual migration to Unity Catalog |
| Storing production data | Avoid DBFS root |
| ML training checkpoints | DBFS or the MLflow artifact store |
| Temporary data or practice | DBFS FileStore is acceptable |

### In summary

```
DBFS (Databricks File System)
  ├── Unified file system abstraction layer on top of cloud storage
  ├── Accessed with the dbfs:/ scheme (Spark API)
  ├── Accessed through the /dbfs/ path (file API)
  ├── DBFS Root → default workspace storage
  ├── DBFS Mounts → mounted external storage (deprecated)
  └── Recommended alternative → Unity Catalog Volumes
```

DBFS has been a core storage layer of the Databricks platform, and for years it let data engineers and data scientists work with cloud storage easily. Today, however, the platform is moving to Unity Catalog, which provides stronger governance and security.

If you already use DBFS, your current workflows keep working, but for new projects, try building a future-proof data architecture on Unity Catalog Volumes!




✍️ If this post was helpful, please share it! If you have any questions, leave them in the comments. 🙏
