[Spark] Scheduling Across Applications

Woong · July 6, 2022

Scheduling across applications

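Standalone mode

Applications are scheduled in FIFO order by default
- Each application tries to use all available nodes
spark.cores.max : caps the number of CPU cores an application may use
spark.executor.memory : memory per executor, which bounds each application's memory use
spark.deploy.defaultCores : core limit applied to applications that do not set spark.cores.max
- Default: unlimited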
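
The way the core limit is resolved on a standalone cluster can be sketched as a small model. This is only an illustration of the precedence (the application's spark.cores.max first, then the master's spark.deploy.defaultCores, then no limit); the function name effective_core_limit and the two plain dicts are hypothetical and not part of Spark.

```python
import math


def effective_core_limit(app_conf, master_conf):
    """Illustrative model (not Spark code) of the per-application core cap."""
    # 1. A limit set by the application itself takes precedence.
    if "spark.cores.max" in app_conf:
        return int(app_conf["spark.cores.max"])
    # 2. Otherwise fall back to the cluster-wide default on the master.
    if "spark.deploy.defaultCores" in master_conf:
        return int(master_conf["spark.deploy.defaultCores"])
    # 3. Neither is set: the application may take every available core.
    return math.inf


print(effective_core_limit({"spark.cores.max": "8"}, {"spark.deploy.defaultCores": "4"}))
print(effective_core_limit({}, {"spark.deploy.defaultCores": "4"}))
print(effective_core_limit({}, {}))
```

The last call shows why a single application can occupy the whole cluster under FIFO when neither property is set.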

Mesos

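spark.mesos.coarse : when set to true, Spark runs on Mesos in coarse-grained mode, which gives static partitioning
spark.cores.max : caps the number of CPU cores an application may use
- Same as Standalone
spark.executor.memory : memory per executor, which bounds each application's memory use
- Same as Standalone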
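
As a rough sketch, the three properties above could be passed on the command line with spark-submit's --conf option. The helper spark_submit_command, the master URL mesos-master.example.com:5050, the resource values, and my_app.py are all placeholders chosen for illustration. The script only prints the command and does not run it.

```python
import shlex


def spark_submit_command(master, conf, app):
    """Build a spark-submit argument list (hypothetical helper, stdlib only)."""
    args = ["spark-submit", "--master", master]
    for key, value in conf.items():
        # Each property becomes one "--conf key=value" pair.
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


cmd = spark_submit_command(
    "mesos://mesos-master.example.com:5050",  # placeholder master URL
    {
        "spark.mesos.coarse": "true",     # coarse-grained mode (static partitioning)
        "spark.cores.max": "8",           # example core cap for this application
        "spark.executor.memory": "4g",    # example memory per executor
    },
    "my_app.py",  # placeholder application file
)
print(shlex.join(cmd))
```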

YARN

--num-executors : number of executors to allocate on the YARN cluster
- As a property: spark.executor.instances
--executor-memory : memory per executor
- As a property: spark.executor.memory
--executor-cores : CPU cores per executor
- As a property: spark.executor.cores
※ Other settings are covered in a separate post
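
The flag-to-property correspondence listed above can be written down as a lookup table. The sketch below translates a set of spark-submit flags into the equivalent property lines. The names FLAG_TO_PROPERTY and flags_to_properties are hypothetical, and the values are examples only.

```python
# spark-submit flag -> equivalent Spark property, as listed in this section.
FLAG_TO_PROPERTY = {
    "--num-executors": "spark.executor.instances",
    "--executor-memory": "spark.executor.memory",
    "--executor-cores": "spark.executor.cores",
}


def flags_to_properties(flags):
    """Translate {flag: value} into {property: value} (hypothetical helper)."""
    return {FLAG_TO_PROPERTY[flag]: value for flag, value in flags.items()}


props = flags_to_properties(
    {"--num-executors": "4", "--executor-memory": "2g", "--executor-cores": "2"}
)
for key, value in props.items():
    # Same "key value" layout as a properties file such as spark-defaults.conf.
    print(f"{key}  {value}")
```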

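Dynamic resource allocation

Basic setup

Use one of the following two setups
- 1. In the application, set spark.dynamicAllocation.enabled and spark.dynamicAllocation.shuffleTracking.enabled to true
- 2. Set up an external shuffle service on each worker node, then in the application set spark.dynamicAllocation.enabled and spark.shuffle.service.enabled to true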
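
To make the two alternatives concrete, here is a minimal sketch that inspects a plain dict of properties and reports which setup it matches. The helper names is_true and dynamic_allocation_setup are hypothetical. The check covers only the property combinations described above and cannot tell whether a shuffle service is actually running on the workers.

```python
def is_true(conf, key):
    """Treat a missing key as false; compare case-insensitively."""
    return str(conf.get(key, "false")).lower() == "true"


def dynamic_allocation_setup(conf):
    """Classify a property dict against the two setups (illustrative only)."""
    if not is_true(conf, "spark.dynamicAllocation.enabled"):
        return "disabled"
    if is_true(conf, "spark.shuffle.service.enabled"):
        return "external shuffle service"
    if is_true(conf, "spark.dynamicAllocation.shuffleTracking.enabled"):
        return "shuffle tracking"
    # Enabled, but neither companion property is set.
    return "incomplete"


tracking = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}
service = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}
print(dynamic_allocation_setup(tracking))
print(dynamic_allocation_setup(service))
print(dynamic_allocation_setup({"spark.dynamicAllocation.enabled": "true"}))
```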

Standalone

Start the workers with spark.shuffle.service.enabled set to true

Mesos

Run $SPARK_HOME/sbin/start-mesos-shuffle-service.sh on all worker nodes
- Run it with spark.shuffle.service.enabled set to true

YARN

※ The YARN setup is covered in a separate post

Other settings

All other options live in the spark.dynamicAllocation.* and spark.shuffle.service.* namespaces
