Bigdata, datalake / plan

Jeonghak Cho · April 13, 2025


📗 Planning a Hadoop → Kubernetes migration

🏳️‍🌈 [Questions]

  • A template for the migration plan

🔗 [Contents]

1. Preparation and analysis

1-1. Current-state assessment

Identify the components of the existing Hadoop environment

  • HDFS
  • YARN: resource manager role
  • MapReduce / Spark: main jobs and usage patterns
  • Hive / HBase / Oozie, etc.: metadata, workflows, query scripts

Check data size and distribution

  • HDFS: total storage footprint
  • Directory structure
  • File size distribution

์›Œํฌ๋กœ๋“œ ์œ ํ˜• ํ™•์ธ

  • ๋ฐฐ์น˜ ์ฒ˜๋ฆฌ ์œ„์ฃผ ์—ฌ๋ถ€, ์ŠคํŠธ๋ฆฌ๋ฐ ์œ ๋ฌด
  • ์ผ์ผ/์ฃผ๊ฐ„ ์ฒ˜๋ฆฌ๋Ÿ‰ ๋ฐ ์Šค์ผ€์ค„๋ง ์ •์ฑ…
  • ์‚ฌ์šฉ์ž ์ˆ˜ ๋ฐ ์ ‘๊ทผ ํŒจํ„ด

์˜์กด์„ฑ๊ณผ ์—ฐ๊ณ„ ์‹œ์Šคํ…œ ์กฐ์‚ฌ

ํ˜„์žฌ Hadoop ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ์ด ์–ด๋–ค ๋‹ค๋ฅธ ์‹œ์Šคํ…œ์ด๋‚˜ ์„œ๋น„์Šค์™€ ์—ฐ๊ฒฐ๋˜์–ด ์žˆ๋Š”์ง€๋ฅผ ํŒŒ์•…ํ•˜๊ณ , ์ด๋“ค์ด ์ด๊ด€ ํ›„์—๋„ ์ •์ƒ ์ž‘๋™ํ•  ์ˆ˜ ์žˆ๋„๋ก ๋ณด์žฅํ•˜๋Š” ์ž‘์—…์ด๋‹ค.

| Item | Description | Examples |
| --- | --- | --- |
| Data sources | External systems that Hadoop collects or reads data from | DB, Kafka, FTP, API, etc. |
| Data sinks | External systems that Hadoop delivers data to | RDBMS, DWH, Elasticsearch, S3, etc. |
| Workflow tools | Tools that trigger or schedule Hadoop-based ETL | Airflow, Oozie, Control-M |
| Access/authentication method | Authentication used for connections to external systems | Kerberos, Basic Auth, OAuth |
| Messaging integration | Whether streams are linked with Kafka, RabbitMQ, etc. | Kafka topic names, message formats |
| Reporting tools | BI tools that consume Hadoop results | Tableau, Superset, PowerBI, etc. |
| User-defined scripts | Custom code that users run on top of Hadoop | HiveQL, PySpark, shell scripts |
| Batch/streaming split | Which processing is batch and which is streaming | Check whether a move to Flink is needed |
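
One way to keep this inventory actionable is to record each dependency as structured data and list the entries that still block the cutover. The sketch below is only an illustration: the `Dependency` fields, category names, and sample entries are hypothetical.

```python
from dataclasses import dataclass

# Category names loosely follow the inventory table; adjust as needed.
CATEGORIES = {
    "data source", "data sink", "workflow tool", "auth method",
    "messaging", "reporting tool", "user script", "batch/streaming",
}


@dataclass
class Dependency:
    name: str               # e.g. "orders-db" (hypothetical)
    category: str           # one of CATEGORIES
    auth: str               # e.g. "Kerberos", "OAuth"
    target: str = ""        # planned counterpart after migration; empty if undecided
    verified: bool = False  # True once the link was tested on the new platform

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def open_items(inventory):
    """Return the names of dependencies that have no verified migration target yet."""
    return [d.name for d in inventory if not (d.target and d.verified)]


inventory = [
    Dependency("orders-db", "data source", "Basic Auth",
               target="JDBC from Spark on K8s", verified=True),
    Dependency("clickstream", "messaging", "Kerberos", target="Strimzi Kafka"),
    Dependency("daily-report", "reporting tool", "OAuth"),
]
print(open_items(inventory))
```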

1-2. Requirements overview

Define the reasons for migrating to Kubernetes

  • Flexible scalability, cost efficiency, CI/CD adoption
  • Separation and elastic use of storage and compute resources
  • A simpler user interface (centered on Trino + Iceberg)

Define expected workloads and SLOs

An SLO is a target value that states quantitatively what level of service must be maintained.

| SLO item | Target (example) | Description |
| --- | --- | --- |
| API availability | 99.9% / month | Success rate across all requests |
| Spark job success rate | 99.5% / quarter | Share of ETL jobs that complete without errors |
| Kafka message processing latency | < 5 s | Time from Kafka to completed Flink processing |
| DAG run duration | 10 min or less | Target run time for an Airflow DAG |
| Iceberg query response time | p95 of 2 s or less | Response time of user queries (95th percentile) |
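
To make these targets concrete, the arithmetic behind a few of them can be sketched as follows. The sketch assumes a 30-day month and uses the nearest-rank method for the percentile. The sample latencies and job counts are made up.

```python
def allowed_downtime_minutes(slo_percent, period_days):
    """Error budget expressed as minutes of full outage within the period."""
    return (1 - slo_percent / 100) * period_days * 24 * 60


def success_rate(succeeded, total):
    """Share of successful requests or jobs, in percent."""
    return 100.0 * succeeded / total


def p95(samples):
    """95th percentile by the nearest-rank method."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n), 1-based
    return ordered[rank - 1]


# 99.9% availability over an assumed 30-day month.
print(f"{allowed_downtime_minutes(99.9, 30):.1f} minutes of downtime allowed")

# Made-up quarter: 1990 of 2000 Spark jobs succeeded.
print(f"{success_rate(1990, 2000):.1f}% job success rate")

# Made-up query latencies in seconds.
latencies = [0.4] * 18 + [1.8, 3.5]
print("p95 within target:", p95(latencies) <= 2.0)
```

With these sample numbers, a single 3.5 s outlier among 20 queries does not break the p95 target, while a second one would.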

Establish security, network, and data access policies

// Authentication

  • Choose the user authentication method
    • Build Kubernetes authentication with OIDC + Keycloak, Dex, or similar
    • SSO integration (Google Workspace, Azure AD, etc.)
  • Authentication between components
    • Configure mTLS or SASL authentication for Kafka, MinIO, etc.
    • Use ServiceAccount + RBAC between internal services

// Authorization policy - permissions/roles (RBAC)

  • Design Kubernetes RBAC policies
    • Role separation: platform-admin, data-engineer, analyst, etc.
    • Per-namespace permission separation
  • Separate data access permissions
    • Access control per Iceberg table (Apache Ranger or a custom policy)
    • Roles can also be separated per Airflow DAG
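
As an illustration of namespace-scoped role separation, the sketch below builds a read-only `Role` and a `RoleBinding` as plain dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace `analytics`, the group name `analysts`, and the chosen resources are assumptions for the example.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def read_only_role(namespace, role_name="analyst"):
    """Role that can only read pods, pod logs, and configmaps in one namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log", "configmaps"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_group(namespace, role_name, group):
    """RoleBinding that grants the Role to a group supplied by the identity provider."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API},
    }


manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        read_only_role("analytics"),
        bind_group("analytics", "analyst", "analysts"),
    ],
}
print(json.dumps(manifests, indent=2))
```

Binding to a group rather than to individual users keeps the role assignment in the identity provider (for example the OIDC setup mentioned above), so cluster manifests do not change when people join or leave a team.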

// Secrets management

  • Use Kubernetes Secret objects or Vault
  • Do not commit sensitive values to Git directly; integrate sealed-secrets or external-secrets

// Network policy

  • CNI plugin choice: Calico, Cilium, etc. → NetworkPolicy support is mandatory
  • Default-deny policy for pod-to-pod traffic: block everything by default and allow only the traffic that is needed
  • Apply TLS on the Ingress controller
  • Publicly exposed services get an IP allowlist plus authentication
  • Grafana, Airflow, etc. must sit on the internal network or require authentication
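
The default-deny approach can be sketched as two `NetworkPolicy` objects: one that blocks all traffic in a namespace, and one that opens a single path. The manifests are built as dictionaries and printed as JSON. The namespace, the pod labels, and the Airflow-to-Trino path on port 8080 are assumptions for illustration.

```python
import json


def default_deny(namespace):
    """Select every pod in the namespace and allow no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_ingress(namespace, name, to_labels, from_labels, port):
    """Allow TCP traffic on one port from pods with from_labels to pods with to_labels."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": to_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": from_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


policies = [
    default_deny("prod"),
    # Hypothetical labels: Airflow workers may reach the Trino coordinator.
    allow_ingress("prod", "airflow-to-trino",
                  {"app": "trino"}, {"app": "airflow-worker"}, 8080),
]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

Because the deny policy also covers egress, the calling pods need matching egress rules, including one for DNS, before this pair works end to end.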

// ๋ฐ์ดํ„ฐ ์ ‘๊ทผ ์ •์ฑ…(Data Access Policy)

  • MinIO ๋ฒ„ํ‚ท๋ณ„ ๊ถŒํ•œ ๊ด€๋ฆฌ
  • ์ •์ฑ… ๊ธฐ๋ฐ˜: Read/Write ์ œํ•œ ๊ฐ€๋Šฅ
  • ์‚ฌ์šฉ์ž ๋˜๋Š” ์„œ๋น„์Šค๊ณ„์ •์— ํ• ๋‹น
  • Iceberg Table ๊ถŒํ•œ
  • Apache Ranger, LakeFS, Sentry ๋“ฑ ์—ฐ๋™ ๊ณ ๋ ค
  • ์ตœ์†Œ ๊ถŒํ•œ ์›์น™ ์ ์šฉ
  • Airflow, Trino, Spark ์‹คํ–‰ ๋กœ๊ทธ โ†’ Loki
  • Iceberg ์ฝ๊ธฐ/์“ฐ๊ธฐ ๋กœ๊น… ์ •์ฑ… ๊ณ ๋ ค
  • kubectl audit ์„ค์ •
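
For the per-bucket permissions, MinIO accepts policies written in the AWS IAM JSON format. The sketch below generates a read-only or read-write policy for a single bucket. The bucket name `lake-raw` and the exact action list are assumptions, so compare them with what each engine actually needs.

```python
import json


def bucket_policy(bucket, read_only=True):
    """Build an IAM-style policy document limited to one bucket."""
    object_actions = ["s3:GetObject"]
    if not read_only:
        object_actions += ["s3:PutObject", "s3:DeleteObject"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions apply to the keys inside the bucket.
                "Effect": "Allow",
                "Action": object_actions,
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Hypothetical bucket: analysts read, ETL jobs read and write.
print(json.dumps(bucket_policy("lake-raw", read_only=True), indent=2))
print(json.dumps(bucket_policy("lake-raw", read_only=False), indent=2))
```

Each generated document would then be registered in MinIO and attached to a user or service account, in line with the least-privilege item above.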

2. Architecture design

2-1. Kubernetes cluster design

  • Cluster sizing (node count, resource requirements)
  • Namespace/ResourceQuota design (separate dev, prod, batch, realtime, etc.)
  • Storage design (integration with Ceph, PowerFlex, S3A, etc.)
  • Selection and design of the main components

| Component | Tech stack | Description |
| --- | --- | --- |
| Storage | MinIO | Replaces HDFS, supports Iceberg |
| ETL engine | Spark on K8s | Batch processing |
| Stream engine | Flink on K8s | Real-time processing |
| Workflow | Apache Airflow | DAG-based |
| Query | Trino + Iceberg | Replaces Hive |
| Message broker | Kafka (Strimzi) | Real-time data buffering |
| GitOps | ArgoCD | Automated deployment |
| Image registry | Harbor | Secure image repository |
| CNI / CSI | Cilium / PowerFlex CSI | Networking and storage |
| Observability | Prometheus, Grafana, Loki, Jaeger | Monitoring/logs/tracing |
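
The namespace and ResourceQuota split from the cluster design list can be sketched as one `ResourceQuota` per namespace. The CPU, memory, and pod budgets below are placeholders, not sizing recommendations.

```python
import json

# Hypothetical per-namespace budgets: (CPU cores, memory, max pods).
BUDGETS = {
    "dev": (16, "64Gi", 50),
    "prod": (64, "256Gi", 200),
    "batch": (128, "512Gi", 400),
    "realtime": (32, "128Gi", 100),
}


def resource_quota(namespace, cpu, memory, pods):
    """ResourceQuota capping the total requests and pod count of one namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu),
                "requests.memory": memory,
                "pods": str(pods),
            }
        },
    }


manifests = [resource_quota(ns, *budget) for ns, budget in BUDGETS.items()]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Once a quota on `requests.cpu` or `requests.memory` exists, pods in that namespace must declare those requests, so Spark and Flink pod templates need them set explicitly or through a LimitRange.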

2-2. Redesign of the data processing platform

  • Adopt Spark on Kubernetes
  • Hive → Apache Iceberg conversion
  • Workflow management: Oozie → Airflow or Argo Workflows
  • Consider moving from Kafka/Flume to Apache Flink/StreamNative

2-3. Monitoring and logging

  • Integrated monitoring with Prometheus + Grafana, Jaeger, Loki, etc.
  • Design of log and trace collection

3. PoC and pilot build

3-1. Test cluster setup

  • Set up Spark, Hive, Iceberg, and MinIO/S3 in a small test environment
  • Migrate part of the existing data and test processing performance

3-2. PoC

| Item | Goal |
| --- | --- |
| Infrastructure | Hadoop → Kubernetes-based platform |
| Data storage | HDFS → MinIO (S3 compatible) |
| Workflow | Oozie → Airflow |
| Query | Hive → Trino + Iceberg |
| Execution | YARN → Spark on K8s / Flink on K8s |
| Messaging | Kafka → Strimzi-based Kafka on K8s |
| Deployment automation | Adopt GitOps (ArgoCD) |
| Operational convenience | Includes Harbor, CNI/CSI, and observability tools |

  • Spark on Kubernetes test: run existing Spark applications
  • Test Spark executor/driver memory settings
  • Iceberg + S3 integration: create and query Iceberg tables from Trino/Hive
  • Test partitioning, snapshots, and schema evolution
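
For the Spark on Kubernetes test, a small helper that assembles the `spark-submit` command makes it easy to vary executor and driver memory between runs. The API server URL, image name, service account, and application path below are hypothetical. The `spark.*` keys are standard Spark configuration properties.

```python
import shlex


def spark_submit_args(api_server, image, app, namespace="batch",
                      executors=2, executor_memory="4g", driver_memory="2g",
                      service_account="spark"):
    """Return the argv list for a cluster-mode spark-submit against Kubernetes."""
    conf = {
        "spark.kubernetes.namespace": namespace,
        "spark.kubernetes.container.image": image,
        "spark.kubernetes.authenticate.driver.serviceAccountName": service_account,
        "spark.executor.instances": executors,
        "spark.executor.memory": executor_memory,
        "spark.driver.memory": driver_memory,
    }
    args = ["spark-submit", "--master", f"k8s://{api_server}",
            "--deploy-mode", "cluster", "--name", "poc-job"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)  # "local://" refers to a path inside the container image
    return args


args = spark_submit_args(
    "https://k8s.example.internal:6443",        # hypothetical API server
    "harbor.example.internal/data/spark:poc",   # hypothetical image in Harbor
    "local:///opt/app/job.py",                  # hypothetical application path
)
print(shlex.join(args))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.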

3-3. Automation validation

  • Validate deployment automation through Helm, Kustomize, GitOps (ArgoCD), etc.
  • Validate workflow execution and the monitoring setup
  • Prepare automation scripts (Kubernetes installation with Kubespray, Trino HA setup with Helm, Spark cluster installation with Helm)

4. Migration

4-1. Data migration

  • DistCp + S3A, or a direct HDFS → S3 transfer
  • Apply Iceberg table conversion scripts
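
The DistCp + S3A path can be sketched as a command builder. `-update` and `-m` are DistCp options, and `fs.s3a.endpoint` and `fs.s3a.path.style.access` are S3A properties commonly needed for MinIO. The NameNode address, bucket, and endpoint are hypothetical, and credentials are deliberately left out on the assumption that they come from a credential provider rather than the command line.

```python
import shlex


def distcp_args(src, dst, endpoint, maps=20, update=True):
    """Return the argv list for copying an HDFS path to an S3A destination."""
    if not src.startswith("hdfs://"):
        raise ValueError("source must be an hdfs:// URI")
    if not dst.startswith("s3a://"):
        raise ValueError("destination must be an s3a:// URI")
    args = [
        "hadoop", "distcp",
        # Generic -D options go right after the command name.
        f"-Dfs.s3a.endpoint={endpoint}",
        "-Dfs.s3a.path.style.access=true",
    ]
    if update:
        args.append("-update")  # skip files that already match at the target
    args += ["-m", str(maps), src, dst]
    return args


args = distcp_args(
    "hdfs://namenode.example.internal:8020/warehouse/sales",  # hypothetical source
    "s3a://lake-raw/warehouse/sales",                         # hypothetical bucket
    "http://minio.example.internal:9000",                     # hypothetical endpoint
)
print(shlex.join(args))
```

Running the copy one directory tree at a time with `-update` makes reruns cheap, which helps while Hadoop and Kubernetes operate in parallel.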

4-2. Application migration

  • Rewrite Spark jobs to fit Kubernetes
  • Hive queries → Iceberg SQL compatibility testing
  • Rewrite workflows and migrate schedules
  • Hybrid operation (Hadoop + K8s in parallel) → gradual cutover
  • Move high-priority workloads first

4-3. CI/CD pipeline setup

  • Automate work with Git + ArgoCD
  • GitOps-based operation of Spark jobs, DAGs, etc.
  • Operations automation and observability
    • Monitoring: Prometheus, Grafana, Jaeger, Loki
    • Alerting: Alertmanager → Slack/Teams
    • Collect Spark + Iceberg metrics

5. Operations cutover and optimization

5-1. Operational stabilization

  • Write an incident response manual
  • Resource optimization (Spark executor tuning, etc.)

5-2. Training and documentation

  • Provide technical guides for operators and users
  • Document the platform user manual
  • Write the operations manual: incident response, scaling, etc.
  • User guide: how to run Spark, write queries, and register data
  • Dual run during the transition period (Hadoop + K8s in parallel)
  • Provide operations documents and a user query guide
  • Training sessions on Spark, Trino, and Airflow

5-3. Performance optimization

  • Tune the number of Spark executors and their memory size
  • Iceberg table file maintenance (compaction, etc.)
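
When tuning executor count and memory on Kubernetes, the pod's memory request is larger than `spark.executor.memory` because Spark adds an overhead. The sketch below assumes Spark's documented defaults: an overhead factor of 0.10 for JVM jobs and a 384 MiB minimum. It ignores CPU, and the node size is made up. Check the defaults against the Spark version in use.

```python
MIN_OVERHEAD_MIB = 384  # assumed Spark default minimum overhead


def pod_memory_mib(executor_memory_mib, overhead_factor=0.10):
    """Executor pod memory request: heap plus max(factor * heap, minimum overhead)."""
    overhead = max(int(executor_memory_mib * overhead_factor), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead


def executors_per_node(node_allocatable_mib, executor_memory_mib, overhead_factor=0.10):
    """How many executor pods fit on one node by memory alone."""
    return node_allocatable_mib // pod_memory_mib(executor_memory_mib, overhead_factor)


# Hypothetical node with 60 GiB allocatable memory.
node_mib = 60 * 1024
for heap_mib in (2048, 4096, 8192):
    print(f"{heap_mib} MiB heap -> {pod_memory_mib(heap_mib)} MiB pod, "
          f"{executors_per_node(node_mib, heap_mib)} executors per node")
```

For PySpark or other non-JVM jobs the factor is typically higher, so pass a larger `overhead_factor` when estimating those workloads.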

Phased migration plan

A 16-week example; adjust it to staffing and the nature of the workloads.

| Week | Phase | Main tasks |
| --- | --- | --- |
| 1-2 | Requirements analysis | Hadoop component inventory, usage pattern survey |
| 3-4 | Architecture design | K8s infrastructure layout, network/storage design |
| 5-6 | Cluster installation | K8s installation, CNI/CSI/Harbor setup |
| 7-8 | Core deployment | Deploy MinIO, Trino, Spark, Iceberg, ArgoCD |
| 9-10 | Kafka & Flink | Strimzi Kafka + Flink Operator setup |
| 11-12 | Airflow migration | Write DAGs, integrate Spark/Flink |
| 13-14 | Data migration | HDFS → MinIO transfer, Iceberg table creation |
| 15 | Testing & operations readiness | Parallel operation, performance/stability tests |
| 16 | Cutover wrap-up | Switch to production operation, user training and documentation |
