Spark Execution Options
1. Spark Program Execution Environments
● Spark program execution environments
  ○ Development/test/learning environments (Interactive Clients)
    ■ Notebooks (Jupyter, Zeppelin)
    ■ Spark Shell
  ○ Production environments (Submit Job)
    ■ spark-submit (command-line utility): the most widely used option
    ■ Databricks notebooks:
      ▪ Notebook code can be scheduled to run periodically
    ■ REST API:
      ▪ Available only in Spark Standalone mode
      ▪ Spark jobs are launched through the API
      ▪ The code to run must already be uploaded to a file system such as HDFS
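
As a rough illustration of the spark-submit route, the sketch below only assembles the command line and prints it; it does not launch anything. The helper name `build_spark_submit`, the file `etl_job.py`, and its date argument are hypothetical and chosen for illustration, while `--master` and `--deploy-mode` are standard spark-submit options.

```python
import shlex


def build_spark_submit(app_path, master, deploy_mode, app_args=()):
    """Return the argv list for a spark-submit invocation (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    return [
        "spark-submit",
        "--master", master,            # e.g. "yarn" or "local[*]"
        "--deploy-mode", deploy_mode,  # where the Driver runs
        app_path,                      # application file
        *app_args,                     # arguments passed on to the application
    ]


# Hypothetical application file and argument, for illustration only.
command = build_spark_submit("etl_job.py", "yarn", "cluster", ["2024-01-01"])
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues if one chooses to execute it.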
2. Structure of a Spark Program
- Driver
  - Acts as the master for the running code (on YARN in cluster mode, it runs inside the Application Master)
- Executor
  - Runs the actual tasks (on YARN, a container)

3. Structure of a Spark Program
- Driver:
  - Runs the user code; where it runs depends on the deploy mode (client or cluster).
  - Specifies the resources needed to run the code.
    - --num-executors, --executor-cores, --executor-memory
  - Creates a SparkSession and uses it to communicate with the Spark cluster.
    - Cluster Manager (the Resource Manager on YARN)
    - Executors (Containers on YARN)
  - Converts the user code into actual Spark tasks and runs them on the Spark cluster.
- Executor:
  - Runs the actual tasks (a JVM process): Transformations, Actions
  - Runs as a Container on YARN
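
To make the three resource flags concrete, here is a small sketch of the arithmetic behind them. The helper name `executor_resources` and the numbers are assumptions for illustration. The totals count executor memory only (no per-container overhead), and the core total equals the number of concurrent tasks only if each task uses one core, which is Spark's default.

```python
def executor_resources(num_executors, executor_cores, executor_memory_gb):
    """Return the spark-submit flags plus the totals they request (hypothetical helper)."""
    flags = [
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        "--executor-memory", f"{executor_memory_gb}g",
    ]
    # Each executor contributes executor_cores cores and executor_memory_gb of memory.
    total_cores = num_executors * executor_cores
    total_memory_gb = num_executors * executor_memory_gb
    return flags, total_cores, total_memory_gb


# Illustrative values: 4 executors, each with 2 cores and 8 GB of memory.
flags, cores, memory_gb = executor_resources(4, 2, 8)
print(" ".join(flags))
print(f"{cores} cores and {memory_gb} GB of executor memory in total")
```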
4. Spark Cluster Manager Options
- local[n]
- YARN
- Kubernetes
- Mesos
- Standalone
5. Spark Cluster Manager Options
- local[n]:
  - For development and testing
    - Spark Shell, IDE, notebooks
  - A single JVM acts as the cluster
    - Runs the Driver and one Executor
  - n is the number of cores
    - It becomes the number of Executor threads
  - What is local[*]?
    - It uses all the cores on the machine
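
The meaning of `n` and `*` can be sketched as a small parser for the master string. This is not Spark's own parsing code: the function name `local_thread_count` is hypothetical, and the sketch handles only the plain `local`, `local[n]`, and `local[*]` forms.

```python
import os
import re


def local_thread_count(master):
    """Return how many worker threads a local master string asks for (hypothetical helper)."""
    if master == "local":
        return 1  # plain "local" runs with a single thread
    match = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if match is None:
        raise ValueError(f"not a supported local master string: {master!r}")
    value = match.group(1)
    if value == "*":
        # "*" means every core the machine reports; fall back to 1 if unknown.
        return os.cpu_count() or 1
    return int(value)


print(local_thread_count("local[4]"))
print(local_thread_count("local[*]"))
```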

6. Spark Cluster Manager Options
- YARN
  - Two deploy modes exist: Client vs. Cluster
  - Client mode: the Driver runs outside the Spark cluster
    - Used for development, testing, and similar work against a YARN-based Spark cluster
  - Cluster mode: the Driver runs inside the Spark cluster
    - Occupies one Container slot
    - The mode used in actual production operation

7. Summary of Spark Cluster Managers and Execution Models
| Cluster Manager | Deploy mode | How the program is run |
|---|---|---|
| local[n] | Client | Spark Shell, IDE, Notebook |
| YARN | Client | Spark Shell, IDE, Notebook |
| YARN | Cluster | spark-submit |
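
The combinations covered in these notes can also be written as a small lookup that reports where the Driver ends up. The function name `driver_location` and the wording of the returned strings are assumptions for illustration, and only the local and YARN cases discussed here are handled.

```python
def driver_location(master, deploy_mode="client"):
    """Describe where the Driver runs for a master / deploy mode pair (hypothetical helper)."""
    if master.startswith("local"):
        # local[n]: one JVM hosts both the Driver and the single Executor.
        return "same JVM as the Executor"
    if master == "yarn":
        if deploy_mode == "client":
            return "outside the cluster, on the submitting machine"
        if deploy_mode == "cluster":
            return "inside the cluster, in a YARN Container"
    raise ValueError(f"combination not covered here: {master!r}, {deploy_mode!r}")


for master, mode in [("local[4]", "client"), ("yarn", "client"), ("yarn", "cluster")]:
    print(f"{master:10} {mode:8} -> {driver_location(master, mode)}")
```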