[k8s] Installing the Spark Operator on EKS and granting Spark IRSA to Airflow workers

Woong · January 22, 2026


Installing the Spark Operator

Install the Spark operator with Helm:

helm repo add spark-operator https://kubeflow.github.io/spark-operator
  • Set nodeSelector and tolerations, and attach the IRSA annotation to the service account:
spark:
  jobNamespaces:
    - spark
  serviceAccount:
    create: true
    name: spark-job-sa
    ## grant the IRSA role
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/<IRSA_ROLE>
  rbac:
    create: true

controller:
  nodeSelector:
    nodegroup: <NODE_GROUP>
  tolerations:
    - key: "nodegroup"
      operator: "Equal"
      value: "<NODE_GROUP>"
      effect: "NoSchedule"

webhook:
  enable: true
  nodeSelector:
    nodegroup: <NODE_GROUP>
  tolerations:
    - key: "nodegroup"
      operator: "Equal"
      value: "<NODE_GROUP>"
      effect: "NoSchedule"
kubectl create namespace spark
helm upgrade --install spark-operator spark-operator/spark-operator \
  -n spark-operator --create-namespace \
  -f values.yaml
  • Verify the installation:
% kubectl get pods -n spark-operator
NAME                                        READY   STATUS    RESTARTS   AGE
spark-operator-controller-dbd665897-cq9hs   1/1     Running   0          87s
spark-operator-webhook-547c75d9d9-l9t9d     1/1     Running   0          87s
% kubectl get crd | grep spark
scheduledsparkapplications.sparkoperator.k8s.io                2026-01-15T05:18:42Z
sparkapplications.sparkoperator.k8s.io                         2026-01-15T05:18:44Z
sparkconnects.sparkoperator.k8s.io                             2026-01-15T05:18:45Z
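To smoke-test the operator end to end, you can submit the upstream spark-pi example as a SparkApplication with `kubectl apply -f spark-pi.yaml`. This is a sketch: the image, Spark version, jar path, and resource sizes are assumptions and should be adjusted to match your environment.

```yaml
# spark-pi.yaml -- minimal smoke test (image/version values are placeholders)
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.3            # assumed; use an image available to your cluster
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar
  sparkVersion: 3.5.3
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-job-sa   # the IRSA-annotated SA from values.yaml
  executor:
    instances: 1
    cores: 1
    memory: 512m
```

`kubectl get sparkapplications -n spark` should then show the application reaching the COMPLETED state once the driver finishes.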

Applying IRSA

  • Create an IRSA (IAM Roles for Service Accounts) role
PROFILE="<AWS_PROFILE>"
CLUSTER_NAME="<CLUSTER_NAME>"
REGION="ap-northeast-1"
  • Check the cluster's OIDC issuer:
OIDC_ISSUER=$(aws eks describe-cluster \
  --name "${CLUSTER_NAME}" --region "${REGION}" \
  --query "cluster.identity.oidc.issuer" --profile ${PROFILE} --output text)

echo "${OIDC_ISSUER}"
# https://oidc.eks.ap-northeast-1.amazonaws.com/id/<...>
  • Derive the OIDC provider ARN
    • strip the https:// prefix from the issuer first
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --profile ${PROFILE} --output text)
OIDC_HOSTPATH=${OIDC_ISSUER#https://}

OIDC_PROVIDER_ARN="arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_HOSTPATH}"
echo "${OIDC_PROVIDER_ARN}"
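The `${OIDC_ISSUER#https://}` expansion strips the URL scheme so the host/path can be embedded in the provider ARN. A self-contained sketch of the same derivation, using a placeholder issuer and account ID:

```shell
# Placeholder values -- the real ones come from the AWS CLI calls above
OIDC_ISSUER="https://oidc.eks.ap-northeast-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B7"
ACCOUNT_ID="123456789012"

# "#https://" removes the shortest leading match of "https://"
OIDC_HOSTPATH=${OIDC_ISSUER#https://}
OIDC_PROVIDER_ARN="arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_HOSTPATH}"

echo "${OIDC_PROVIDER_ARN}"
# arn:aws:iam::123456789012:oidc-provider/oidc.eks.ap-northeast-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B7
```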
  • Confirm the provider ARN is registered:
aws iam list-open-id-connect-providers --query "OpenIDConnectProviderList[].Arn" --profile ${PROFILE} --output text | tr '\t' '\n' | grep -F "${OIDC_PROVIDER_ARN}"
  • Create trust-policy.json:
NAMESPACE="spark"
SA_NAME="spark-job-sa"

cat > trust-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Federated": "${OIDC_PROVIDER_ARN}" },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_HOSTPATH}:aud": "sts.amazonaws.com",
          "${OIDC_HOSTPATH}:sub": "system:serviceaccount:${NAMESPACE}:${SA_NAME}"
        }
      }
    }
  ]
}
EOF
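The `sub` condition is what pins the role to a single ServiceAccount, while `aud` restricts tokens to STS. A hedged Python sketch that assembles the same document (the host/path and account ID are placeholders, not real values):

```python
import json

# Placeholder inputs -- substitute the values derived above
oidc_hostpath = "oidc.eks.ap-northeast-1.amazonaws.com/id/EXAMPLE"
provider_arn = f"arn:aws:iam::123456789012:oidc-provider/{oidc_hostpath}"
namespace, sa_name = "spark", "spark-job-sa"

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # aud: tokens must be issued for STS
                    f"{oidc_hostpath}:aud": "sts.amazonaws.com",
                    # sub: only this namespace/ServiceAccount may assume the role
                    f"{oidc_hostpath}:sub": f"system:serviceaccount:{namespace}:{sa_name}",
                }
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```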
  • Create the role:
ROLE_NAME="spark-job-irsa-role"

aws iam create-role \
  --role-name "${ROLE_NAME}" \
  --assume-role-policy-document file://trust-policy.json \
  --description "IRSA role for Spark jobs in ${NAMESPACE}/${SA_NAME}" \
  --profile ${PROFILE}
  • Inspect the created role:
ROLE_ARN=$(aws iam get-role --role-name "${ROLE_NAME}" --query 'Role.Arn' --output text --profile ${PROFILE})
echo "${ROLE_ARN}"
  • Attach policies to the role:
aws iam attach-role-policy --role-name "${ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess --profile ${PROFILE}
aws iam attach-role-policy --role-name "${ROLE_NAME}" --policy-arn arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess --profile ${PROFILE}
  • Annotate the Kubernetes ServiceAccount with the role ARN:
kubectl -n "${NAMESPACE}" annotate sa "${SA_NAME}" \
  eks.amazonaws.com/role-arn="${ROLE_ARN}" \
  --overwrite
  • Verify the role annotation was applied:

kubectl -n "$NAMESPACE" get sa "$SA_NAME" -o yaml | sed -n '1,80p'
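In the output of the command above, the role annotation should appear under `metadata.annotations`; a trimmed sketch of the expected shape (ARN value is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-job-sa
  namespace: spark
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS_ACCOUNT_ID>:role/spark-job-irsa-role
```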


Building a custom Airflow image

  • Build a custom Airflow image with the extra dependencies and push it to ECR:
FROM apache/airflow:2.11.0

RUN pip install --no-cache-dir \
    "apache-airflow==2.11.0" \
    "apache-airflow-providers-cncf-kubernetes==10.5.0" \
    "apache-airflow-providers-standard==1.10.3" \
    "apache-airflow-providers-common-compat>=1.7.2"
docker build .
AWS_PROFILE="<profile>"
AWS_REGION="ap-northeast-1"
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text --profile "$AWS_PROFILE")

aws ecr get-login-password \
  --region "${AWS_REGION}" \
  --profile "${AWS_PROFILE}" \
| docker login --username AWS --password-stdin \
  "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"


REGISTRY_NAME="apache/airflow"
IMAGE_TAG=2.11.0-custom
REPO_NAME="${AWS_ACCOUNT_ID}.dkr.ecr.ap-northeast-1.amazonaws.com/${REGISTRY_NAME}"

docker build --platform linux/amd64 -t "${REPO_NAME}:${IMAGE_TAG}" .

docker push "${REPO_NAME}:${IMAGE_TAG}"
  • Check the sha256 digest
    • it is also printed by docker push; use this command to re-check
aws ecr describe-images \
  --region "${AWS_REGION}" \
  --registry-id "${AWS_ACCOUNT_ID}" \
  --repository-name "${REGISTRY_NAME}" \
  --image-ids imageTag="${IMAGE_TAG}" \
  --query 'imageDetails[0].imageDigest' \
  --output text
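Pinning by digest keeps the deployment immutable even if the tag is later re-pushed; a digest-pinned image reference takes the form `<repo>@sha256:<digest>`. A sketch with placeholder values:

```shell
# Placeholder values -- the real digest comes from the describe-images call above
REPO_NAME="123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/apache/airflow"
IMAGE_TAG="2.11.0-custom"
DIGEST="sha256:0000000000000000000000000000000000000000000000000000000000000000"

echo "by tag:    ${REPO_NAME}:${IMAGE_TAG}"
echo "by digest: ${REPO_NAME}@${DIGEST}"
```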

Installing Airflow with the Helm chart

  • Point the chart at the custom Airflow image pushed to ECR
    • include the digest if one is pinned
...
images:
  airflow:
    # dev
    repository: <ECR_REPO>
    tag: 2.11.0-custom
    # update when bumping the version
    digest: sha256:<hash>
...
helm repo add apache-airflow https://airflow.apache.org
kubectl apply -f secret.yaml
helm upgrade --install airflow apache-airflow/airflow --namespace airflow --create-namespace -f values.yaml

Using Spark from Airflow

  • Check the worker's service account:
% kubectl get pod -n airflow -l component=worker -o jsonpath='{.items[0].spec.serviceAccountName}{"\n"}'
airflow-worker
  • Grant the Airflow worker permission to create, get, and delete SparkApplication CRs:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: airflow-worker-spark-access
  namespace: spark
rules:
  # SparkKubernetesOperator needs to read pods to track the driver/job
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "patch"]

  # needed when using get_logs/attach_log
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get"]

  # create/get/delete SparkApplication CRs
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications"]
    verbs: ["create", "get", "list", "watch", "delete"]
  
  # status subresource access
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications/status"]
    verbs: ["get"]

  - apiGroups: [""]
    resources: ["events"]
    verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: airflow-worker-spark-access
  namespace: spark
subjects:
  - kind: ServiceAccount
    name: airflow-worker
    namespace: airflow
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: airflow-worker-spark-access
kubectl apply -f airflow-worker-spark-access.yaml
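As a sanity check on the Role above, its rules can be compared offline against the verbs the Airflow-driven workflow needs. The required-verb set here is an assumption based on the comments in the manifest, not an official compatibility matrix:

```python
# Rules mirrored from the Role in airflow-worker-spark-access.yaml
rules = [
    {"apiGroups": [""], "resources": ["pods"],
     "verbs": ["get", "list", "watch", "patch"]},
    {"apiGroups": [""], "resources": ["pods/log"], "verbs": ["get"]},
    {"apiGroups": ["sparkoperator.k8s.io"], "resources": ["sparkapplications"],
     "verbs": ["create", "get", "list", "watch", "delete"]},
    {"apiGroups": ["sparkoperator.k8s.io"],
     "resources": ["sparkapplications/status"], "verbs": ["get"]},
    {"apiGroups": [""], "resources": ["events"],
     "verbs": ["get", "list", "watch"]},
]

# Assumed minimum verb set, taken from the comments in the manifest above
required = {
    ("", "pods"): {"get", "list", "watch", "patch"},
    ("", "pods/log"): {"get"},
    ("sparkoperator.k8s.io", "sparkapplications"): {"create", "get", "delete"},
    ("sparkoperator.k8s.io", "sparkapplications/status"): {"get"},
}

def allowed(group: str, resource: str) -> set:
    """Union of all verbs the rules grant for (group, resource)."""
    verbs: set = set()
    for rule in rules:
        if group in rule["apiGroups"] and resource in rule["resources"]:
            verbs |= set(rule["verbs"])
    return verbs

for (group, resource), need in required.items():
    missing = need - allowed(group, resource)
    assert not missing, f"{resource} is missing verbs: {missing}"

print("all required verbs are covered by the Role")
```

This checks pure rule data, so it runs anywhere; the kubectl auth can-i checks below confirm the same thing against the live cluster.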
  • Verify the permissions took effect:
# can the worker list pods?
% kubectl auth can-i list pods -n spark --as system:serviceaccount:airflow:airflow-worker
yes

# can it create SparkApplications?
% kubectl auth can-i create sparkapplications.sparkoperator.k8s.io -n spark --as system:serviceaccount:airflow:airflow-worker
yes

% kubectl auth can-i get pods/log -n spark --as system:serviceaccount:airflow:airflow-worker
yes

% kubectl auth can-i get sparkapplications/status -n spark --as system:serviceaccount:airflow:airflow-worker
yes

% kubectl auth can-i patch pods -n spark --as system:serviceaccount:airflow:airflow-worker
yes
