If a Pod stays in an Init:~ status, or its status shows an error message, an Init Container has failed to start.
Kubernetes uses the Pod as its smallest unit of deployment, and each Pod contains one or more Containers.
These Containers can be divided into the following types.
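An Init Container is an initialization container that runs before the Pod's Runtime Containers start.
A Pod status of Init:1/2 means that 1 of the 2 Init Containers has completed successfully.
If an Init Container fails, k8s by default keeps restarting it until it succeeds (unless the Pod's restartPolicy is Never, in which case the Pod is treated as failed).
In other words, if the Init:~ status persists or the Status shows an error, you need to investigate which Container in the Pod is failing.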
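As a rough illustration of how to read that STATUS column, here is a minimal shell sketch. The function name init_progress is made up for this example, and the status strings it handles (Init:N/M, Init:Error, Init:CrashLoopBackOff) are the ones kubectl get pods typically shows for Pods stuck in the init phase.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl): explain a Pod STATUS value.
init_progress() {
  status="$1"
  case "$status" in
    Init:[0-9]*/[0-9]*)
      counts="${status#Init:}"      # e.g. "1/2"
      finished="${counts%%/*}"      # init containers that completed
      total="${counts##*/}"         # init containers defined in the Pod
      echo "$finished of $total init containers completed"
      ;;
    Init:*)
      # e.g. Init:Error or Init:CrashLoopBackOff
      echo "init container problem: ${status#Init:}"
      ;;
    *)
      echo "not in init phase: $status"
      ;;
  esac
}
```

For example, init_progress "Init:1/2" prints "1 of 2 init containers completed".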
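First, check the logs of the failing Pod.
kubectl logs <pod_id>
The error message says that the scheduler Container is waiting in the PodInitializing state. It appears to be waiting because an earlier step failed.
Pod logs alone only go so far, though, so let's look at the Pod in more detail.
kubectl describe pod <pod_id>
The following details are printed.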
Name: airflow-scheduler-5c9d5d7d69-qssbr
Namespace: airflow
Priority: 0
Service Account: airflow-scheduler
Node: airflow-cluster-worker3/172.18.0.4
Start Time: Fri, 03 Feb 2023 00:45:35 +0900
Labels: component=scheduler
pod-template-hash=5c9d5d7d69
release=airflow
tier=airflow
Annotations: checksum/airflow-config: 7c087ba34ba46da1bc27e008d659d87d9afe6d39dccd0b7ddcf7287caa66e105
checksum/extra-configmaps: 2e44e493035e2f6a255d08f8104087ff10d30aef6f63176f1b18f75f73295598
checksum/extra-secrets: bb91ef06ddc31c0c5a29973832163d8b0b597812a793ef911d33b622bc9d1655
checksum/metadata-secret: dcbb26b06a9d686bf5fedceff6d4024447053fded58a37271cdfef14f8c8c800
checksum/pgbouncer-config-secret: da52bd1edfe820f0ddfacdebb20a4cc6407d296ee45bcb500a6407e2261a5ba2
checksum/result-backend-secret: 74e3e99feee51248d44224665d60fab543dd6b25ba95f04e6fcb0e5758342056
cluster-autoscaler.kubernetes.io/safe-to-evict: true
Status: Pending
IP: 10.244.1.4
IPs:
IP: 10.244.1.4
Controlled By: ReplicaSet/airflow-scheduler-5c9d5d7d69
Init Containers:
wait-for-airflow-migrations:
Container ID: containerd://55cbaa8d8a05cf937488701aa144959c0997f2a7ae0983a003cb4580e431f612
Image: airflow-custom:1.0.0
Image ID: docker.io/library/import-2023-02-02@sha256:f3854eb3d766f2b7814942d41403e064d9b61674a76ea7f8945a8b42c77c1308
Port: <none>
Host Port: <none>
Args:
airflow
db
check-migrations
--migration-wait-timeout=60
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 03 Feb 2023 14:18:50 +0900
Finished: Fri, 03 Feb 2023 14:19:49 +0900
Ready: True
Restart Count: 1
Environment Variables from:
airflow-variables ConfigMap Optional: false
Environment:
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'airflow-fernet-key'> Optional: false
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'> Optional: false
AIRFLOW__CELERY__BROKER_URL: <set to the key 'connection' in secret 'airflow-broker-url'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4w45 (ro)
git-sync-init:
Container ID: containerd://13a385b88ec5c5549f6f057c53e2758c9894606a7ab455457ed9c4c0a51a3683
Image: k8s.gcr.io/git-sync/git-sync:v3.4.0
Image ID: k8s.gcr.io/git-sync/git-sync@sha256:a470676e946f1060815f89dadad4f2c3e4f9d1ab36a46f4423e00f44170fc80c
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 03 Feb 2023 14:41:00 +0900
Finished: Fri, 03 Feb 2023 14:41:00 +0900
Ready: False
Restart Count: 9
Environment:
GIT_SSH_KEY_FILE: /etc/git-secret/ssh
GIT_SYNC_SSH: true
GIT_KNOWN_HOSTS: false
GIT_SYNC_REV: HEAD
GIT_SYNC_BRANCH: main
GIT_SYNC_REPO: ssh://git@github.com:jeongseok912/airflow_dags.git
GIT_SYNC_DEPTH: 1
GIT_SYNC_ROOT: /git
GIT_SYNC_DEST: repo
GIT_SYNC_ADD_USER: true
GIT_SYNC_WAIT: 60
GIT_SYNC_MAX_SYNC_FAILURES: 0
GIT_SYNC_ONE_TIME: true
Mounts:
/etc/git-secret/ssh from git-sync-ssh-key (ro,path="gitSshKey")
/git from dags (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4w45 (ro)
Containers:
scheduler:
Container ID:
Image: airflow-custom:1.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
bash
-c
exec airflow scheduler
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Liveness: exec [sh -c CONNECTION_CHECK_MAX_COUNT=0 AIRFLOW__LOGGING__LOGGING_LEVEL=ERROR exec /entrypoint \
airflow jobs check --job-type SchedulerJob --hostname $(hostname)
] delay=10s timeout=20s period=60s #success=1 #failure=5
Environment Variables from:
airflow-variables ConfigMap Optional: false
Environment:
AIRFLOW__CORE__FERNET_KEY: <set to the key 'fernet-key' in secret 'airflow-fernet-key'> Optional: false
AIRFLOW__CORE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW_CONN_AIRFLOW_DB: <set to the key 'connection' in secret 'airflow-airflow-metadata'> Optional: false
AIRFLOW__WEBSERVER__SECRET_KEY: <set to the key 'webserver-secret-key' in secret 'airflow-webserver-secret-key'> Optional: false
AIRFLOW__CELERY__BROKER_URL: <set to the key 'connection' in secret 'airflow-broker-url'> Optional: false
Mounts:
/opt/airflow/airflow.cfg from config (ro,path="airflow.cfg")
/opt/airflow/config/airflow_local_settings.py from config (ro,path="airflow_local_settings.py")
/opt/airflow/dags from dags (ro)
/opt/airflow/logs from logs (rw)
/opt/airflow/pod_templates/pod_template_file.yaml from config (ro,path="pod_template_file.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4w45 (ro)
git-sync:
Container ID:
Image: k8s.gcr.io/git-sync/git-sync:v3.4.0
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
GIT_SSH_KEY_FILE: /etc/git-secret/ssh
GIT_SYNC_SSH: true
GIT_KNOWN_HOSTS: false
GIT_SYNC_REV: HEAD
GIT_SYNC_BRANCH: main
GIT_SYNC_REPO: ssh://git@github.com:jeongseok912/airflow_dags.git
GIT_SYNC_DEPTH: 1
GIT_SYNC_ROOT: /git
GIT_SYNC_DEST: repo
GIT_SYNC_ADD_USER: true
GIT_SYNC_WAIT: 60
GIT_SYNC_MAX_SYNC_FAILURES: 0
Mounts:
/etc/git-secret/ssh from git-sync-ssh-key (ro,path="gitSshKey")
/git from dags (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4w45 (ro)
scheduler-log-groomer:
Container ID:
Image: airflow-custom:1.0.0
Image ID:
Port: <none>
Host Port: <none>
Args:
bash
/clean-logs
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
AIRFLOW__LOG_RETENTION_DAYS: 15
Mounts:
/opt/airflow/logs from logs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j4w45 (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: airflow-airflow-config
Optional: false
dags:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
git-sync-ssh-key:
Type: Secret (a volume populated by a Secret)
SecretName: airflow-ssh-git-secret
Optional: false
logs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-j4w45:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 13h kubelet Successfully pulled image "k8s.gcr.io/git-sync/git-sync:v3.4.0" in 1m4.951487006s
Normal Started 13h (x4 over 13h) kubelet Started container git-sync-init
Normal Created 13h (x5 over 13h) kubelet Created container git-sync-init
Normal Pulled 13h (x4 over 13h) kubelet Container image "k8s.gcr.io/git-sync/git-sync:v3.4.0" already present on machine
Warning BackOff 12h (x271 over 13h) kubelet Back-off restarting failed container
Normal Pulled 26m kubelet Container image "airflow-custom:1.0.0" already present on machine
Normal SandboxChanged 26m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Created 26m kubelet Created container wait-for-airflow-migrations
Normal Started 26m kubelet Started container wait-for-airflow-migrations
Normal Started 24m (x4 over 25m) kubelet Started container git-sync-init
Normal Pulled 23m (x5 over 25m) kubelet Container image "k8s.gcr.io/git-sync/git-sync:v3.4.0" already present on machine
Normal Created 23m (x5 over 25m) kubelet Created container git-sync-init
Warning BackOff 68s (x111 over 25m) kubelet Back-off restarting failed container
The output has an Init Containers section and a Containers section, each listing the following Containers.
Init Containers: wait-for-airflow-migrations, git-sync-init
Containers: scheduler, git-sync, scheduler-log-groomer
Looking at each Container's State and Reason, the git-sync-init Container is in Waiting/CrashLoopBackOff, with Ready: False and Restart Count: 9.
Because that Init Container never completes, the scheduler Container is stuck in Waiting/PodInitializing.
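Scanning a long describe dump by eye is error-prone, so the same check can be sketched as a small filter. This is only an illustration: container_states is a hypothetical name, and the awk script assumes the usual layout of kubectl describe pod output, where container names are indented two spaces under the Init Containers and Containers headings.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl describe pod` output on stdin and
# print "<section> <container> <state> <reason>" separated by tabs.
container_states() {
  awk '
    # A non-indented line starts a new top-level section.
    /^[^ \t]/ {
      in_containers = ($0 ~ /^(Init )?Containers:/)
      kind = $0; sub(/:.*$/, "", kind)
      next
    }
    !in_containers { next }
    # Assumption: container names are indented two spaces and end with a colon.
    /^  [^ ]/ && NF == 1 && $1 ~ /:$/ { name = $1; sub(/:$/, "", name); next }
    # Current state only; "Last State:" lines start with "Last" and are skipped.
    $1 == "State:" {
      state = $2
      if (state == "Running") { print kind "\t" name "\t" state "\t-"; state = "" }
      next
    }
    # The first Reason after a State line belongs to that state.
    $1 == "Reason:" && state != "" { print kind "\t" name "\t" state "\t" $2; state = "" }
  '
}

# Usage (requires access to the cluster):
#   kubectl describe pod <pod_id> | container_states
```

On the output above, this kind of filter would surface git-sync-init with Waiting and CrashLoopBackOff next to the three regular Containers that are still in PodInitializing.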
Querying the logs of the failing Container inside the failing Pod reveals the cause of the error.
kubectl logs <pod_id> -c <container_name>
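When a Container is in CrashLoopBackOff, the attempt that actually failed may already have exited, so it can help to look at the previous attempt as well. The wrapper below is a minimal sketch: init_container_logs is a hypothetical name, while -n, -c, and --previous are standard kubectl logs flags.

```shell
#!/bin/sh
# Hypothetical wrapper: show logs of one container for the current
# and the previous attempt.
# Arguments: <namespace> <pod_name> <container_name>
init_container_logs() {
  ns="$1"
  pod="$2"
  container="$3"
  echo "== current attempt =="
  kubectl logs "$pod" -n "$ns" -c "$container"
  echo "== previous attempt =="
  # --previous prints the logs of the last terminated instance of the container
  kubectl logs "$pod" -n "$ns" -c "$container" --previous
}

# Usage with the names from the describe output above:
#   init_container_logs airflow airflow-scheduler-5c9d5d7d69-qssbr git-sync-init
```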
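Cloning the repo from GitHub over SSH worked, so the SSH key had been registered correctly.
Something else looked suspicious.
When cloning over SSH you normally don't add the ssh:// prefix. Could the ssh:// prefix, added as the guide and comments instructed, be the problem?
Sure enough, it was.
After removing ssh:// and redeploying, the Pod status returned to Running and the DAGs on GitHub synced correctly.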
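A likely explanation, which this post did not verify against the git-sync logs, lies in Git's two SSH address forms. In the URL form ssh://[user@]host[:port]/path, whatever follows the colon is read as a port, whereas the scp-like form [user@]host:path takes no scheme at all. The value ssh://git@github.com:jeongseok912/airflow_dags.git mixes the two. The sketch below, with the hypothetical name normalize_git_ssh_url, drops the scheme only in that mixed case and leaves valid addresses untouched.

```shell
#!/bin/sh
# Hypothetical helper: turn "ssh://user@host:owner/repo.git" (mixed syntax)
# into the scp-like "user@host:owner/repo.git". Other inputs are echoed as-is.
normalize_git_ssh_url() {
  url="$1"
  case "$url" in
    ssh://*) rest="${url#ssh://}" ;;
    *) printf '%s\n' "$url"; return 0 ;;      # no scheme: nothing to fix
  esac
  hostpart="${rest%%/*}"                      # text before the first slash
  case "$hostpart" in
    *:*)
      after_colon="${hostpart#*:}"
      case "$after_colon" in
        *[!0-9]*) printf '%s\n' "$rest" ;;    # not a port number: drop ssh://
        *)        printf '%s\n' "$url" ;;     # numeric port: valid ssh:// URL
      esac
      ;;
    *) printf '%s\n' "$url" ;;                # ssh://host/path: already valid
  esac
}
```

With the value from the describe output, normalize_git_ssh_url "ssh://git@github.com:jeongseok912/airflow_dags.git" prints git@github.com:jeongseok912/airflow_dags.git, which matches the fix described above. It is a sketch only and does not handle unusual inputs such as credentials embedded in the URL.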
https://kubernetes.io/ko/docs/tasks/debug/debug-application/debug-init-containers/
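Thank you, this helped. I'm not sure whether this is the actual cause, but when I connected to a repo that uses master I had included "ssh://", and I ran into the problem after creating a new repo with main and setting it up. It was all the more confusing because "ssh://" had worked before.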