This document was written to collect everything I tried along the way.
wget https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz
tar zxvf spark-3.3.0-bin-hadoop3.tgz # extract the archive
cd spark-3.3.0-bin-hadoop3/
# base image
docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .
# pyspark image
docker build -t sparkpy:latest --build-arg base_img=spark:latest -f kubernetes/dockerfiles/spark/bindings/python/Dockerfile .
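Before going further, a quick sanity check I'd add here: the PySpark image should ship a Python interpreter while the base image does not (image tags are the ones built above).

# should print a Python 3.x version
docker run --rm --entrypoint python3 sparkpy:latest --version
# expected to fail on the JVM-only base image
docker run --rm --entrypoint python3 spark:latest --version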
# rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-sa
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spark-role  # cluster-scoped, so no namespace field
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["create", "get", "watch", "list", "delete"]  # "post" is not an RBAC verb; HTTP POST maps to "create"
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spark-role-binding
subjects:
  - kind: ServiceAccount
    namespace: default
    name: spark-sa
roleRef:
  kind: ClusterRole
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
kubectl apply -f rbac.yaml
kubectl cluster-info
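kubectl cluster-info prints the API server endpoint, which becomes the --master value below once prefixed with k8s://. To confirm the RBAC objects actually took effect, kubectl auth can-i can impersonate the service account (assuming everything lives in the default namespace, as in the manifest above):

# both should print "yes"
kubectl auth can-i create pods --as=system:serviceaccount:default:spark-sa
kubectl auth can-i watch pods --as=system:serviceaccount:default:spark-sa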
./bin/spark-submit \
--deploy-mode cluster \
--master k8s://https://kubernetes.docker.internal:6443 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--name spark-pi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.driver.container.image=spark:latest \
--conf spark.kubernetes.executor.container.image=spark:latest \
local:///opt/spark/examples/src/main/python/pi.py
At this point, the following error occurs... I haven't been able to find a fix yet..
Exception in thread "main" java.io.IOException: Cannot run program "python3": error=2, No such file or directory
kubectl logs -f spark-pi-619b12838a29574a-driver # follow the driver logs; the pod name comes from the submit output
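Reading the error, the likely cause (my reading, not something confirmed at this point in the log): pi.py needs a Python interpreter inside the driver container, but the submit above points both the driver and executor at the JVM-only spark:latest image. Swapping in the PySpark image built earlier would look roughly like this; spark.kubernetes.container.image sets the driver and executor images in one go:

./bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://kubernetes.docker.internal:6443 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --name spark-pi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=sparkpy:latest \
  local:///opt/spark/examples/src/main/python/pi.py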
Next idea: pack up a virtual environment and ship it with the job..!
conda create --name py_env --channel conda-forge --no-default-packages python=3.8
conda activate py_env
pip install pyspark==3.3.0
mkdir -p ./envs
conda install -c conda-forge -n py_env conda-pack # required for "conda pack" below
conda pack -f -o ./envs/test-env.tar.gz
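A quick way to check that the archive is self-contained, i.e. it carries the interpreter that ./py_env/bin/python will resolve to after extraction:

# the packed env should contain its own interpreter
tar -tzf ./envs/test-env.tar.gz | grep 'bin/python'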
./bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://http://127.0.0.1:8001 \
  --conf spark.yarn.dist.archives="./envs/test-env.tar.gz#py_env" \
  --conf spark.pyspark.python="./py_env/bin/python" \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --name spark-pi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.driver.container.image=spark:latest \
  --conf spark.kubernetes.executor.container.image=spark:latest \
  local:///opt/spark/examples/src/main/python/pi.py
Yep... that failed...
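Two things stand out about that submit in hindsight (assumptions worth testing, not confirmed fixes): spark.yarn.dist.archives is read only by the YARN backend, so on Kubernetes it is ignored — the portable equivalent is spark.archives (or the --archives flag); and in cluster mode a locally built archive has to be staged somewhere the driver can fetch it, via spark.kubernetes.file.upload.path. A sketch of what that might look like, with the upload path as a placeholder and the PySpark image swapped in:

./bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://http://127.0.0.1:8001 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --name spark-pi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=sparkpy:latest \
  --conf spark.kubernetes.file.upload.path=s3a://some-bucket/spark-uploads \
  --archives ./envs/test-env.tar.gz#py_env \
  --conf spark.pyspark.python=./py_env/bin/python \
  local:///opt/spark/examples/src/main/python/pi.py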
Built the image the way the official docs describe, then submitted again.
./bin/docker-image-tool.sh -t "3.3.0" -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build # docker.io/library/spark-py:3.3.0
Same error occurred...
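One assumption worth ruling out before digging deeper: docker-image-tool.sh builds spark-py:3.3.0, but the submit still has to reference that image explicitly — if the conf keeps pointing at spark:latest, the driver keeps coming up without python3 and the error is identical. A minimal sketch:

# point the driver and executors at the image docker-image-tool.sh actually built
./bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://kubernetes.docker.internal:6443 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
  --name spark-pi \
  --conf spark.kubernetes.container.image=spark-py:3.3.0 \
  local:///opt/spark/examples/src/main/python/pi.py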