Provisioning, Spark/Custom Helm

Jeonghak Cho · May 18, 2025

Provisioning

📗 Provisioning - Spark on Kubernetes

🏳️‍🌈 [Questions]

  • Creating a custom Helm chart for installing Spark on Kubernetes

Contents

Setting up the Spark installation environment

HELM REPO SETUP

helm repo add spark-operator https://kubeflow.github.io/spark-operator

SPARK-ON-KUBERNETES INSTALLATION

helm delete myspark -n myspark

helm install myspark spark-operator/spark-operator \
    --namespace myspark \
    --create-namespace \
    --set webhook.enable=true
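
The delete-then-install pair above can also be collapsed into one idempotent command. The following is a minimal sketch, assuming a Helm 3 client; the function name `install_operator` and the `HELM` override variable are illustrative and do not come from the original post.

```shell
# HELM can be overridden (for example HELM=echo) to preview the command
# without touching the cluster. This variable is an assumption of the sketch.
HELM="${HELM:-helm}"

# Install the release if it is missing, or upgrade it in place if it exists.
# $1: release name (default myspark), $2: namespace (default myspark)
install_operator() {
  "$HELM" upgrade --install "${1:-myspark}" spark-operator/spark-operator \
    --namespace "${2:-myspark}" \
    --create-namespace \
    --set webhook.enable=true
}
```

Calling `install_operator` with no arguments targets the same release and namespace as the commands above. Because `helm upgrade --install` handles both the first install and later re-runs, the preceding `helm delete` is not needed with this variant.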

Installing kubectl

acho@DESKTOP-SCOK45O:~/.kube$ curl -LO "https://dl.k8s.io/release/v1.30.1/bin/linux/amd64/kubectl"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   138  100   138    0     0    664      0 --:--:-- --:--:-- --:--:--   666
100 49.0M  100 49.0M    0     0  25.0M      0  0:00:01  0:00:01 --:--:-- 31.9M
acho@DESKTOP-SCOK45O:~/.kube$ chmod +x kubectl
sudo mv kubectl /usr/local/bin/
kubectl version --client
Client Version: v1.30.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
  • Save the Kubernetes cluster's config file as ~/.kube/config
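
Putting the config file in place could look like the sketch below. The source path passed to the function is hypothetical (the post does not say where the file comes from), and the `chmod 600` step is an added precaution rather than something the original describes.

```shell
# Copy a kubeconfig file to <home>/.kube/config and restrict its permissions.
# $1: path to the config file obtained from the cluster (hypothetical location)
# $2: home directory to install into (defaults to $HOME)
install_kubeconfig() {
  src="$1"
  home_dir="${2:-$HOME}"
  mkdir -p "$home_dir/.kube" || return 1
  cp "$src" "$home_dir/.kube/config" || return 1
  # Only the owner should be able to read cluster credentials.
  chmod 600 "$home_dir/.kube/config"
}
```

For example, `install_kubeconfig ./admin.conf` followed by `kubectl get nodes` would confirm that the client can reach the cluster.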

Creating the service account

  • The myspark namespace already exists because it was created when Spark was installed. In the commands that follow, k is assumed to be a shell alias for kubectl.
k delete sa myspark-sa -n myspark
kubectl create serviceaccount myspark-sa -n myspark

Installing Spark

cd ~
wget https://archive.apache.org/dist/spark/spark-3.5.5/spark-3.5.5-bin-hadoop3.tgz
tar -xzf spark-3.5.5-bin-hadoop3.tgz

vi ~/.bashrc
export SPARK_HOME=~/spark-3.5.5-bin-hadoop3
export PATH=$PATH:$SPARK_HOME/bin
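
Instead of editing ~/.bashrc by hand, the two export lines can be appended from a script. The sketch below only adds a line when it is not already present, so it is safe to re-run; the function names `append_once` and `add_spark_env` are illustrative.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
  line="$1"
  target="$2"
  touch "$target"
  grep -qxF -- "$line" "$target" || printf '%s\n' "$line" >> "$target"
}

# Register SPARK_HOME and the Spark bin directory in an rc file (default ~/.bashrc).
add_spark_env() {
  rc_file="${1:-$HOME/.bashrc}"
  # Single quotes keep ~ and $PATH unexpanded so the shell resolves them at login.
  append_once 'export SPARK_HOME=~/spark-3.5.5-bin-hadoop3' "$rc_file"
  append_once 'export PATH=$PATH:$SPARK_HOME/bin' "$rc_file"
}
```

After `add_spark_env`, running `source ~/.bashrc` and then `spark-submit --version` is one way to confirm that the PATH change took effect.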

Cluster role binding

k delete clusterrolebinding myspark-cluster-admin-binding
kubectl create clusterrolebinding myspark-cluster-admin-binding --clusterrole=cluster-admin --serviceaccount=myspark:myspark-sa
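
Binding cluster-admin is convenient for a lab cluster but gives the service account full control of the whole cluster. A narrower alternative is a namespaced Role, sketched below. The role and binding names are hypothetical, and the resource and verb lists are an assumption about what the Spark driver needs; they may have to be widened for a particular job.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Grant the service account a namespaced role instead of cluster-admin.
# $1: namespace (default myspark), $2: service account (default myspark-sa)
bind_spark_role() {
  namespace="${1:-myspark}"
  sa="${2:-myspark-sa}"
  # Assumed minimum: the driver manages executor pods, services and configmaps.
  "$KUBECTL" create role spark-driver-role \
    --namespace "$namespace" \
    --verb=get,list,watch,create,delete \
    --resource=pods,services,configmaps,persistentvolumeclaims || return 1
  "$KUBECTL" create rolebinding spark-driver-rolebinding \
    --namespace "$namespace" \
    --role=spark-driver-role \
    --serviceaccount="$namespace:$sa"
}
```

If a job then fails with a "forbidden" error in the driver log, the missing verb or resource named in that error is what would need to be added to the role.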

Permission check

Check whether the myspark-sa service account in the myspark namespace has permission to create pods.

acho@DESKTOP-SCOK45O:~$ kubectl auth can-i create pods --as=system:serviceaccount:myspark:myspark-sa
yes
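
The same check can be repeated for the other resources a Spark driver typically creates. This is a small sketch; the list of resources (pods, services, configmaps) is an assumption, and the function name is illustrative.

```shell
# KUBECTL can be overridden to point at a different client binary.
KUBECTL="${KUBECTL:-kubectl}"

# Print "<resource>: yes|no" for each resource the service account may need to create.
# $1: namespace (default myspark), $2: service account (default myspark-sa)
check_create_permissions() {
  namespace="${1:-myspark}"
  sa="${2:-myspark-sa}"
  for resource in pods services configmaps; do
    # can-i prints yes or no; a "no" also sets a non-zero exit code, which is ignored here.
    answer="$("$KUBECTL" auth can-i create "$resource" \
      --namespace "$namespace" \
      --as="system:serviceaccount:$namespace:$sa")"
    printf '%s: %s\n' "$resource" "$answer"
  done
}
```

Unlike the command above, this passes `--namespace` explicitly, so the answer reflects the myspark namespace rather than whatever namespace the current context points at.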

Checking the Helm release status

vagrant@master:~$ helm status --namespace myspark myspark
NAME: myspark
LAST DEPLOYED: Sun Apr 20 08:42:56 2025
NAMESPACE: myspark
STATUS: deployed
REVISION: 1
TEST SUITE: None

Checking the operator pods

vagrant@master:~$ k get pod -n myspark
NAME                                                READY   STATUS    RESTARTS   AGE
myspark-spark-operator-controller-9b884b965-9dwdr   1/1     Running   0          84s
myspark-spark-operator-webhook-6784dd785-sz7p8      1/1     Running   0          84s

Creating a Spark job

SPARK-SUBMIT

spark-submit --master k8s://192.168.56.10:6443 --name mysubmit \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 512m \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 512m \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=myspark \
--conf spark.kubernetes.container.image=spark:3.5.5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=myspark-sa \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar
  • On the first run the "spark:3.5.5" image has to be pulled, so the pod stays in Pending for a while as the container is created.
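
The `--master` value is the Kubernetes API server address prefixed with `k8s://`. Rather than hard-coding the IP, it can be read from the current kubeconfig context, as in this sketch (the function name `spark_master_url` is illustrative):

```shell
# KUBECTL can be overridden to point at a different client binary.
KUBECTL="${KUBECTL:-kubectl}"

# Build the spark-submit master URL from the API server of the current context.
spark_master_url() {
  server="$("$KUBECTL" config view --minify \
    -o jsonpath='{.clusters[0].cluster.server}')" || return 1
  # Fail instead of printing a bare "k8s://" when no server is configured.
  [ -n "$server" ] || return 1
  printf 'k8s://%s\n' "$server"
}
```

With this in place the submit command could start with `spark-submit --master "$(spark_master_url)" ...`. The result keeps the `https://` scheme from the kubeconfig, for example `k8s://https://192.168.56.10:6443`, which spark-submit accepts as well as the shorter form used above.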

CHECKING THE MYSUBMIT POD

acho@DESKTOP-SCOK45O:~$ k get po -n myspark
NAME                                                 READY   STATUS      RESTARTS   AGE
myspark-spark-operator-controller-57f699456d-8jzp2   1/1     Running     0          117m
myspark-spark-operator-webhook-7655df59db-t9m2k      1/1     Running     0          117m
mysubmit-47ec4c96e1e691cb-driver                     0/1     Completed   0          9m24s
mysubmit-fe84a896e1e84519-driver                     0/1     Completed   0          7m32s
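
Instead of polling `get po` until the driver shows Completed, a script can block until the pod finishes. A minimal sketch, assuming a kubectl version that supports `--for=jsonpath` in `kubectl wait` (the v1.30.1 client installed above does); the function name and default timeout are illustrative.

```shell
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# Block until the named driver pod reports phase Succeeded, or the timeout expires.
# $1: pod name, $2: namespace (default myspark), $3: timeout (default 300s)
wait_for_driver() {
  pod="$1"
  namespace="${2:-myspark}"
  timeout="${3:-300s}"
  "$KUBECTL" wait "pod/$pod" \
    --namespace "$namespace" \
    --for=jsonpath='{.status.phase}'=Succeeded \
    --timeout="$timeout"
}
```

For example, `wait_for_driver mysubmit-fe84a896e1e84519-driver` returns once that driver has completed, and exits non-zero if it has not succeeded within the timeout.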
  • Checking the logs
    Check the driver pod's log to confirm that the pod ran and computed the value of Pi successfully.
k logs mysubmit-fe84a896e1e84519-driver -n myspark
...
Pi is roughly 3.138275691378457
...
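
To pick the result out of a long driver log, the "Pi is roughly" line can be filtered from the log stream. A small sketch; the function name `pi_from_log` is illustrative.

```shell
# Read a driver log on stdin and print the estimated value of Pi.
# Returns non-zero if the log contains no "Pi is roughly" line.
pi_from_log() {
  value="$(sed -n 's/^.*Pi is roughly \([0-9.][0-9.]*\).*$/\1/p' | head -n 1)"
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}
```

Usage: `k logs mysubmit-fe84a896e1e84519-driver -n myspark | pi_from_log`. The non-zero exit code when the line is missing makes the function usable as a pass/fail check in a script.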

Creating a custom Spark Helm chart

Pulling the Spark chart

helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm pull spark-operator/spark-operator --untar
acho@DESKTOP-SCOK45O:~/mysparkjob$ mv spark-operator myspark
acho@DESKTOP-SCOK45O:~/mysparkjob$ ls
myspark
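
With the chart unpacked locally, its defaults can be overridden and the result rendered before anything is installed. The sketch below writes a small override file and renders the local chart with `helm template`. The file name `myvalues.yaml` and both function names are hypothetical; `webhook.enable` is the only value shown because it is the one setting used earlier in this post.

```shell
# HELM can be overridden (for example HELM=echo) to preview the command.
HELM="${HELM:-helm}"

# Write a values override file (the default file name is hypothetical).
write_overrides() {
  out_file="${1:-myvalues.yaml}"
  cat > "$out_file" <<'EOF'
# Overrides for the local copy of the spark-operator chart.
webhook:
  enable: true
EOF
}

# Render the local chart with the overrides without installing it.
# $1: chart directory (default ./myspark), $2: values file (default myvalues.yaml)
render_chart() {
  "$HELM" template myspark "${1:-./myspark}" \
    --namespace myspark \
    -f "${2:-myvalues.yaml}"
}
```

Running `write_overrides && render_chart | less` prints the manifests the chart would create, which makes it easy to see the effect of an edit to the chart or the override file before installing.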

Installing Spark

helm install myspark ./myspark --namespace myspark --create-namespace

acho@DESKTOP-SCOK45O:~/mysparkjob$ k get po -n myspark
NAME                                                 READY   STATUS      RESTARTS   AGE
myspark-spark-operator-controller-57f699456d-7hvks   1/1     Running     0          4m
myspark-spark-operator-webhook-7655df59db-w5fcb      1/1     Running     0          4m

Running Spark

spark-submit --master k8s://192.168.56.10:6443 --name mysubmit \
--deploy-mode cluster \
--driver-cores 1 \
--driver-memory 512m \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 512m \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.namespace=myspark \
--conf spark.kubernetes.container.image=spark:3.5.5 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=myspark-sa \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.5.5.jar

Verifying Spark

acho@DESKTOP-SCOK45O:~/mysparkjob$ k get po -n myspark
NAME                                                 READY   STATUS      RESTARTS   AGE
myspark-spark-operator-controller-57f699456d-7hvks   1/1     Running     0          6m13s
myspark-spark-operator-webhook-7655df59db-w5fcb      1/1     Running     0          6m13s
mysubmit-bef8ce96e295fa27-driver                     0/1     Completed   0          21s

k logs mysubmit-bef8ce96e295fa27-driver -n myspark
...
Pi is roughly 3.138275691378457
...

References

Spark chart location

https://github.com/kubeflow/spark-operator/

Docker image

https://github.com/kubeflow/spark-operator/blob/master/Dockerfile
