📒 Spark(2)

Kimdongki · June 17, 2024

Spark

λͺ©λ‘ 보기
2/22

📌 Spark Execution Options

1. Spark Program Execution Environments

◆ Spark program execution environments
❖ Development/testing/learning environments (Interactive Clients)
● Notebooks (Jupyter, Zeppelin)
● Spark Shell
❖ Production environments (Submit Job)
● spark-submit (command-line utility): the most widely used option
● Databricks notebooks:
▪ Notebook code can be scheduled to run periodically
● REST API:
▪ Available only in Spark Standalone mode
▪ Spark jobs are launched through the API
▪ The code to run must already be uploaded to a file system such as HDFS
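As a rough sketch of the REST API route, the Python below only builds the request for Spark's Standalone REST submission server; it sends nothing. The endpoint path `/v1/submissions/create`, the default port 6066, and the JSON field names follow that server as it is commonly used, but treat them as assumptions to check against your Spark version. The host, jar path, class name, app name, and version string are all hypothetical.

```python
import json

# Assumption: default port of the Standalone REST submission server.
REST_PORT = 6066


def build_rest_submission(master_host, app_resource, main_class,
                          app_name, spark_version, app_args=None):
    """Return (url, json_body) for a Standalone REST job submission (sketch)."""
    # The REST API does not upload code: the resource must already sit on a
    # file system the cluster can read, such as HDFS.
    if "://" not in app_resource:
        raise ValueError("app_resource must be a URI such as hdfs://...")
    url = f"http://{master_host}:{REST_PORT}/v1/submissions/create"
    body = {
        "action": "CreateSubmissionRequest",
        "appResource": app_resource,
        "mainClass": main_class,
        "appArgs": list(app_args or []),
        "clientSparkVersion": spark_version,
        "environmentVariables": {},
        "sparkProperties": {
            "spark.app.name": app_name,
            # Hypothetical master address (7077 is the usual Standalone port).
            "spark.master": f"spark://{master_host}:7077",
            "spark.submit.deployMode": "cluster",
        },
    }
    return url, json.dumps(body)


# Hypothetical values for illustration only.
url, body = build_rest_submission(
    "spark-master", "hdfs:///jobs/my-job.jar", "com.example.MyJob",
    "my-job", "3.5.0", ["2024-06-17"])
print(url)
```

The body would then be POSTed to that URL (for example with curl) as `application/json`. On recent Spark releases the REST server is typically disabled by default and has to be enabled on the master with `spark.master.rest.enabled=true`.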

2. Spark ν”„λ‘œκ·Έλž¨μ˜ ꡬ쑰

  • Driver
    • Acts as the master for the running code (comparable to YARN's Application Master)
  • Executor
    • Runs the actual tasks (comparable to a YARN container)

3. Spark ν”„λ‘œκ·Έλž¨μ˜ ꡬ쑰

  • Driver:
    • Runs the user code; where it runs depends on the deploy mode (client or cluster).
    • Specifies the resources needed to run the code:
      • --num-executors, --executor-cores, --executor-memory
    • Creates a SparkSession and uses it to communicate with the Spark cluster:
      • Cluster Manager (the Resource Manager in YARN)
      • Executors (Containers in YARN)
    • Converts the user code into actual Spark tasks and runs them on the Spark cluster.
  • Executor:
    • Runs the actual tasks as a JVM process: Transformations and Actions
    • In YARN, each Executor runs in a Container

4. Spark ν΄λŸ¬μŠ€ν„° λ§€λ‹ˆμ € μ˜΅μ…˜

  • local[n]
  • YARN
  • Kubernetes
  • Mesos
  • Standalone
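Each of these managers is selected through the --master value given to spark-submit (or to the SparkSession builder). The Python sketch below shows the usual URL shape for each one; the function name and host names are hypothetical, and the default ports (7077 for Standalone, 5050 for Mesos) are the conventional ones, which may differ in a real deployment.

```python
def master_url(manager, host=None, port=None, n=None):
    """Return a --master value for a cluster manager (hypothetical helper)."""
    if manager == "local":
        # local[n] uses n threads; local[*] uses every core on the machine.
        return "local[*]" if n is None else f"local[{n}]"
    if manager == "yarn":
        # YARN locates the ResourceManager via HADOOP_CONF_DIR / YARN_CONF_DIR.
        return "yarn"
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if manager == "kubernetes":
        # The Kubernetes API server port must always be written out explicitly.
        return f"k8s://https://{host}:{port or 443}"
    raise ValueError(f"unknown cluster manager: {manager}")


print(master_url("standalone", "master-host"))
```

Note that Mesos support has been deprecated since Spark 3.2, so new clusters usually pick YARN, Kubernetes, or Standalone.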

5. Spark ν΄λŸ¬μŠ€ν„° λ§€λ‹ˆμ €

μ˜΅μ…˜

  • local[n]:
    • For development and testing
      • Spark Shell, IDEs, notebooks
    • A single JVM acts as the whole cluster
      • Runs the Driver and a single Executor
    • n is the number of cores to use
      • It becomes the number of threads in the Executor
    • What does local[*] mean?
      • Use every core on the machine
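The local[n] rule can be written down as a small Python function. This is a sketch of the behavior described above, not Spark's own parser: plain local means one thread, local[n] means n threads, and local[*] means one thread per core. It uses `os.cpu_count()`, which may report more cores than a container is actually allowed to use, and it ignores other accepted forms such as local[n,F].

```python
import os
import re


def local_threads(master):
    """Number of task threads a local master string implies (sketch)."""
    if master == "local":
        return 1  # plain "local" runs with a single worker thread
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m is None:
        raise ValueError(f"not a local master string: {master}")
    if m.group(1) == "*":
        # local[*]: one thread per core the OS reports
        return os.cpu_count() or 1
    n = int(m.group(1))
    if n <= 0:
        raise ValueError("local[n] needs n >= 1")
    return n


print(local_threads("local[4]"))
```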

6. Spark ν΄λŸ¬μŠ€ν„° λ§€λ‹ˆμ € μ˜΅μ…˜

  • YARN
    • Two deploy modes exist: Client vs. Cluster
    • Client mode: the Driver runs outside the Spark cluster
      • Used for development and testing on a YARN-based Spark cluster
    • Cluster mode: the Driver runs inside the Spark cluster
      • Occupies one Container slot
      • The mode used for real production workloads

7. Summary of Spark Cluster Managers and Execution Models

Cluster Manager | Deploy Mode | How the Program Is Run
local[n] | Client | Spark Shell, IDE, Notebook
YARN | Client | Spark Shell, IDE, Notebook
YARN | Cluster | spark-submit
