Provisioning - Spark on Kubernetes
[Open questions]
Table of Contents
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm delete myspark -n myspark
helm install myspark spark-operator/spark-operator \
--namespace myspark \
--create-namespace \
--set webhook.enable=true
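Enabling the webhook lets the operator mutate driver/executor pods (volumes, env vars, tolerations). Beyond plain spark-submit, the operator's native interface is the SparkApplication custom resource; a minimal sketch for the SparkPi job used below (field names follow the kubeflow operator's v1beta2 API — verify the apiVersion against the CRD your chart version installed):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: mysubmit
  namespace: myspark
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.5
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
  sparkVersion: "3.5.5"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: myspark-sa
  executor:
    instances: 1
    cores: 1
    memory: 512m
```

Apply it with `kubectl apply -f`, then watch progress with `kubectl get sparkapplications -n myspark`.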
acho@DESKTOP-SCOK45O:~/.kube$ curl -LO "https://dl.k8s.io/release/v1.30.1/bin/linux/amd64/kubectl"
acho@DESKTOP-SCOK45O:~/.kube$ chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --client
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
k delete sa myspark-sa -n myspark
kubectl create serviceaccount myspark-sa -n myspark
cd ~
wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
tar -xzf spark-3.5.5-bin-hadoop3.tgz
vi ~/.bashrc
export SPARK_HOME=~/spark-3.5.5-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bashrc
k delete clusterrolebinding myspark-cluster-admin-binding
kubectl create clusterrolebinding myspark-cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=myspark:myspark-sa
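Binding cluster-admin works, but it grants far more than the driver needs. A namespace-scoped Role would be tighter; the resource list below is an assumption based on what the Spark driver typically creates (executor pods, the driver service, and configmaps):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-driver-role
  namespace: myspark
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "delete", "deletecollection"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-driver-rolebinding
  namespace: myspark
subjects:
  - kind: ServiceAccount
    name: myspark-sa
    namespace: myspark
roleRef:
  kind: Role
  name: spark-driver-role
  apiGroup: rbac.authorization.k8s.io
```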
Verify that the myspark-sa service account in the myspark namespace has permission to create pods:
acho@DESKTOP-SCOK45O:~$ kubectl auth can-i create pods --as=system:serviceaccount:myspark:myspark-sa
yes
vagrant@master:~$ helm status --namespace myspark myspark
NAME: myspark
LAST DEPLOYED: Sun Apr 20 08:42:56 2025
NAMESPACE: myspark
STATUS: deployed
REVISION: 1
TEST SUITE: None
vagrant@master:~$ k get pod -n myspark
NAME READY STATUS RESTARTS AGE
myspark-spark-operator-controller-9b884b965-9dwdr 1/1 Running 0 84s
myspark-spark-operator-webhook-6784dd785-sz7p8 1/1 Running 0 84s
spark-submit --master k8s://192.168.56.10:6443 --name mysubmit \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 512m \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 512m \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=myspark \
--conf spark.kubernetes.container.image=spark:3.5.5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=myspark-sa \
local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
acho@DESKTOP-SCOK45O:~$ k get po -n myspark
NAME READY STATUS RESTARTS AGE
myspark-spark-operator-controller-57f699456d-8jzp2 1/1 Running 0 117m
myspark-spark-operator-webhook-7655df59db-t9m2k 1/1 Running 0 117m
mysubmit-47ec4c96e1e691cb-driver 0/1 Completed 0 9m24s
mysubmit-fe84a896e1e84519-driver 0/1 Completed 0 7m32s
k logs mysubmit-fe84a896e1e84519-driver -n myspark
...
Pi is roughly 3.138275691378457
...
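SparkPi estimates π by Monte Carlo: sample random points in the unit square and count how many fall inside the quarter circle. The same idea without Spark, in plain Python for intuition only (the distributed version simply splits the sampling across executors and sums the counts):

```python
import random

def estimate_pi(n: int, seed: int = 42) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that land inside the unit circle, scaled by 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

# The estimate tightens as n grows, just as SparkPi's result above
# ("Pi is roughly 3.138...") is only approximate.
print(estimate_pi(100_000))
```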
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm pull spark-operator/spark-operator --untar
acho@DESKTOP-SCOK45O:~/mysparkjob$ mv spark-operator myspark
acho@DESKTOP-SCOK45O:~/mysparkjob$ ls
myspark
helm install myspark ./myspark --namespace myspark
acho@DESKTOP-SCOK45O:~/mysparkjob$ k get po -n myspark
NAME READY STATUS RESTARTS AGE
myspark-spark-operator-controller-57f699456d-7hvks 1/1 Running 0 4m
myspark-spark-operator-webhook-7655df59db-w5fcb 1/1 Running 0 4m
spark-submit --master k8s://192.168.56.10:6443 --name mysubmit \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 512m \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 512m \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=myspark \
--conf spark.kubernetes.container.image=spark:3.5.5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=myspark-sa \
local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
acho@DESKTOP-SCOK45O:~/mysparkjob$ k get po -n myspark
NAME READY STATUS RESTARTS AGE
myspark-spark-operator-controller-57f699456d-7hvks 1/1 Running 0 6m13s
myspark-spark-operator-webhook-7655df59db-w5fcb 1/1 Running 0 6m13s
mysubmit-bef8ce96e295fa27-driver 0/1 Completed 0 21s
k logs mysubmit-bef8ce96e295fa27-driver -n myspark
...
Pi is roughly 3.138275691378457
...
References:
https://github.com/kubeflow/spark-operator/
https://github.com/kubeflow/spark-operator/blob/master/Dockerfile
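The linked Dockerfile builds the operator image itself; for shipping your own job, a thin layer on top of the base Spark image is usually enough. A sketch (the jar path and name are placeholders):

```dockerfile
# Same base image used by the spark-submit examples above
FROM spark:3.5.5

# Copy the application jar into the image; a path under /opt/spark/
# matches the local:/// URI scheme passed to spark-submit
COPY target/my-spark-job.jar /opt/spark/examples/jars/my-spark-job.jar
```

Build and push it to a registry your cluster can pull from, then point `spark.kubernetes.container.image` at the new tag.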