Spark on Kubernetes Attempts

Purpose of this document

This document collects everything I have tried so far.

Attempt 1 (failed)

Steps taken

  • Download Spark
    wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
    tar zxvf spark-3.3.0-bin-hadoop3.tgz # extract
  • cd spark-3.3.0-bin-hadoop3/
  • Build the Docker images
    # base image
    docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
    # pyspark image
    docker build -t sparkpy:latest --build-arg base_img=spark:latest -f kubernetes/dockerfiles/spark/bindings/python/Dockerfile .
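  • Sanity check (a sketch, not part of the original steps): verify that the PySpark image actually ships a python3 binary, since that is exactly what the driver error further down complains about
    # bypass the Spark entrypoint and invoke python3 directly;
    # this should print a Python 3.x version for the PySpark image
    docker run --rm --entrypoint python3 sparkpy:latest --version
    # the JVM-only base image is expected to fail the same check
    docker run --rm --entrypoint python3 spark:latest --version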
  • Apply the YAML
    • rbac.yaml
      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: spark-sa
      ---
      kind: ClusterRole
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        namespace: default
        name: spark-role
      rules:
       - apiGroups: [""]
         resources: ["pods", "services", "configmaps"]
         verbs: ["create", "get", "watch", "list", "delete"]
      ---
      kind: ClusterRoleBinding
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        namespace: default
        name: spark-role-binding
      subjects:
       - kind: ServiceAccount
         namespace: default
         name: spark-sa
      roleRef:
        kind: ClusterRole
        name: spark-role
        apiGroup: rbac.authorization.k8s.io
    • apply
      kubectl apply -f rbac.yaml
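    • verify (a sketch; kubectl auth can-i checks RBAC without running anything)
      # should print "yes" if the ClusterRoleBinding took effect
      kubectl auth can-i create pods --as=system:serviceaccount:default:spark-sa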
  • Get the API server address (used for --master below) with kubectl cluster-info
  • As a test, run spark-submit
    ./bin/spark-submit \
    --deploy-mode cluster \
    --master k8s://https://kubernetes.docker.internal:6443 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
    --name spark-pi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.driver.container.image=spark:latest \
    --conf spark.kubernetes.executor.container.image=spark:latest \
    local:///opt/spark/examples/src/main/python/pi.py

    Here the following error occurred... I have not yet found a fix (see the note after this list):
    Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory

  • To tail the driver log: kubectl logs -f spark-pi-619b12838a29574a-driver
    • spark-pi-619b12838a29574a-driver is the driver pod name
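
A likely cause, in hindsight: the submit above points both container images at spark:latest, the JVM-only base image, while the Python-capable image was built as sparkpy:latest. A sketch of the same submit against the PySpark image (assuming sparkpy:latest is visible to the cluster's container runtime):

./bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://kubernetes.docker.internal:6443 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--name spark-pi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.driver.container.image=sparkpy:latest \
--conf spark.kubernetes.executor.container.image=sparkpy:latest \
local:///opt/spark/examples/src/main/python/pi.py

Equivalently, spark.kubernetes.container.image sets both the driver and executor image in one conf.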

Additional attempt!

  • Try packing up a virtual environment and shipping it along..!

    conda create --name py_env --channel conda-forge --no-default-packages python=3.8
    conda activate py_env
    pip install pyspark==3.3.0
    
    mkdir -p ./envs
    conda install -c conda-forge -n py_env conda-pack # required for `conda pack` below
    conda pack -f -o ./envs/test-env.tar.gz
    
    ./bin/spark-submit \
    --deploy-mode cluster \
    --master k8s://http://127.0.0.1:8001 \
    --conf spark.yarn.dist.archives="./envs/test-env.tar.gz#py_env" \
    --conf spark.pyspark.python="./py_env/bin/python" \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
    --name spark-pi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.driver.container.image=spark:latest \
    --conf spark.kubernetes.executor.container.image=spark:latest \
    local:///opt/spark/examples/src/main/python/pi.py
    
  • Yep... failed... (see the note below)
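
Two things look suspicious here in hindsight. First, spark.yarn.dist.archives only applies when running on YARN; on Kubernetes the equivalent (Spark 3.1+) is spark.archives, or the --archives flag. Second, shipping a local archive in cluster mode on Kubernetes needs spark.kubernetes.file.upload.path pointing at shared storage (S3, HDFS, ...) reachable by both the client and the pods. A sketch with those two changes (the upload path s3a://some-bucket/spark-upload is a placeholder, not from the original, and the image would still need python3 as noted above):

./bin/spark-submit \
--deploy-mode cluster \
--master k8s://http://127.0.0.1:8001 \
--archives ./envs/test-env.tar.gz#py_env \
--conf spark.kubernetes.file.upload.path=s3a://some-bucket/spark-upload \
--conf spark.pyspark.python=./py_env/bin/python \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--name spark-pi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.driver.container.image=sparkpy:latest \
--conf spark.kubernetes.executor.container.image=sparkpy:latest \
local:///opt/spark/examples/src/main/python/pi.py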

Attempt 2 (failed)

Built the images as described in the official documentation, then ran the submit.

./bin/docker-image-tool.sh -t "3.3.0" -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build # docker.io/library/spark-py:3.3.0

The same error occurred...
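
Assuming the submit from Attempt 1 was reused, note that the image built above is tagged spark-py:3.3.0 (per the comment); if the submit still pointed at spark:latest, the same python3 error is exactly what one would expect. A sketch against the newly built tag (spark.kubernetes.container.image sets both driver and executor images at once):

./bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://kubernetes.docker.internal:6443 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--name spark-pi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=spark-py:3.3.0 \
local:///opt/spark/examples/src/main/python/pi.py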
