Kubernetes Pod

Chang-__- · April 25, 2023

Lifecycle

After you create a Pod, the bottom of its output contains a section called status.

status:
  phase: Pending
  conditions:
  - type: Initialized
    ....

This section carries the following information.

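Phase: the attribute that summarizes the overall state of the Pod (Pending, Running, Succeeded, Failed)
Conditions: the attribute that reports which steps the Pod has passed through while being created (Initialized, ContainersReady, PodScheduled, Ready)
ContainerStatuses: the attribute that reports the state of each container inside the Pod (Waiting, Running, Terminated)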
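
As a rough illustration of how these three pieces fit together, the status of a healthy single-container Pod might look like the sketch below. The container name `app` and all values are made up for illustration; real output from `kubectl get pod <name> -o yaml` carries more fields, such as timestamps and container IDs.

```yaml
# Illustrative status block only; the container name "app" is hypothetical.
status:
  phase: Running                # overall state of the Pod
  conditions:                   # steps the Pod has passed through
  - type: PodScheduled
    status: "True"
  - type: Initialized
    status: "True"
  - type: ContainersReady
    status: "True"
  - type: Ready
    status: "True"
  containerStatuses:            # state of each container
  - name: app
    ready: true
    restartCount: 0
    state:
      running: {}               # real output also includes startedAt
```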

Pending

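This is the Pod's initial phase. An init container holds whatever setup has to finish before the actual containers start.
If you need to prepare a volume or apply security settings, for example, you can put the initialization script under initContainers.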

apiVersion: v1
kind: Pod
...
spec:
  containers:
  ...
  initContainers:
  - name: init-myservice
    image: busybox:1.28
    command: ['sh', '-c', 'echo The app is running!']

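If the initContainers scripts finish successfully, or if no init containers are defined at all, the Initialized condition is True; if they fail, it is False.
For Node placement, a Pod that names a Node directly is started on that Node; otherwise k8s picks a Node itself based on the resources available on each one. Once the Node is decided, PodScheduled becomes True. (In practice the Pod has to be assigned to a Node before its init containers can run there.) The container images are then downloaded, and when that finishes the Pod moves to Running.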

Running

Usually a Pod in the Running phase has started normally, but one container, or all of them, may hit a problem during startup and get restarted. In that case the container's state is CrashLoopBackOff, and the Pod still counts this as Running.
The conditions, however, report ContainersReady: False and Ready: False.
Once every container starts properly and runs smoothly, they change to ContainersReady: True and Ready: True.
If the Pod was created by a Job, it is Running while the Job is executing, and it becomes Succeeded or Failed when the Job completes or fails.
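
The crash-looping case described above can be sketched as a status block. This is an assumed example, not captured output: the container name `app` and the restart count are hypothetical.

```yaml
# Illustrative only: the Pod phase stays Running while a container crash-loops.
status:
  phase: Running
  conditions:
  - type: ContainersReady
    status: "False"
  - type: Ready
    status: "False"
  containerStatuses:
  - name: app                   # hypothetical container name
    ready: false
    restartCount: 4             # made-up value
    state:
      waiting:
        reason: CrashLoopBackOff
```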

Succeeded

Even when the Job has succeeded, the Pod's conditions change to ContainersReady: False and Ready: False. What distinguishes this from Failed is that the container's state is Terminated with reason Completed.

Failed

If the Job fails while running, the conditions are ContainersReady: False and Ready: False, and the container's Terminated state has the reason Error.
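
The difference between the two terminal phases shows up in the container's terminated state. The sketch below puts them side by side; the container name `job` and the exit codes are illustrative assumptions.

```yaml
# Illustrative only: a Pod whose Job finished successfully.
status:
  phase: Succeeded
  containerStatuses:
  - name: job                   # hypothetical container name
    ready: false
    state:
      terminated:
        reason: Completed
        exitCode: 0
---
# Illustrative only: a Pod whose Job failed.
status:
  phase: Failed
  containerStatuses:
  - name: job
    ready: false
    state:
      terminated:
        reason: Error
        exitCode: 1             # any non-zero exit code
```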

Health Check

Suppose a Service has two Pods behind it. Each Pod then receives about 50% of the traffic.
If Pod2 dies, 100% of the traffic goes to Pod1. If Pod1 holds up, auto-healing recreates Pod2 in the meantime. At that point the new Pod and its container are Running and are attached to the Service. But while the Pod and container are Running and the application inside is still starting up, users whose traffic is routed to Pod2 get errors. If you set a ReadinessProbe, the Pod is not connected to the Service until the app is up; it is attached only once the application has fully started.

There is another case: Pod2 and its container are Running normally, but the application inside is dead. This can happen in an out-of-memory (OOM) situation, among others, and traffic sent to Pod2 keeps returning errors. A LivenessProbe detects this kind of application failure, and when the application has a problem it makes the kubelet restart the container.

A probe can check the application in three ways.

httpGet: request a specific path to check whether the application is up
exec: run a command such as cat /tmp/ready.txt to check whether the application is actually up
tcpSocket: open a TCP connection to check whether the application is up
You must choose one of these three so the probe knows how to tell whether the application is up.
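
The exec and httpGet variants appear in the examples later in this post, so here is a minimal sketch of the tcpSocket variant. The Pod name is hypothetical, and the sketch assumes the kubetm/app image listens on port 8080 as in the other manifests.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-readiness-tcp1      # hypothetical name
  labels:
    app: readiness
spec:
  containers:
  - name: readiness
    image: kubetm/app
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080              # probe succeeds if a TCP connection can be opened
      initialDelaySeconds: 5
      periodSeconds: 10
  terminationGracePeriodSeconds: 0
```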

There are further options, as follows.

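initialDelaySeconds: the delay before the first probe is run. (default 0)
periodSeconds: the interval between probe checks. (default 10)
timeoutSeconds: how long to wait for a probe result before counting it as a failure. (default 1)
successThreshold: how many consecutive successes are needed before the probe counts as successful. (default 1)
failureThreshold: how many consecutive failures are needed before the probe counts as failed. For a liveness probe the container is then restarted; for a readiness probe the Pod is marked not ready. (default 3)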
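
To see how the options combine, the fragment below spells out all five on a liveness probe. It is a sketch that belongs under a container entry in spec.containers; the /health path and port are taken from the LivenessProbe example in this post. With these values, a container that stops responding would be restarted after roughly three consecutive failed checks, about 30 seconds.

```yaml
# Fragment of a container spec; values are illustrative.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5   # wait 5s after the container starts before the first probe
  periodSeconds: 10        # then probe every 10s
  timeoutSeconds: 1        # a probe with no answer within 1s counts as a failure
  successThreshold: 1      # must be 1 for liveness probes
  failureThreshold: 3      # restart after 3 consecutive failures
```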

ReadinessProbe

For the ReadinessProbe, let's use the exec option to check whether the application is up.

apiVersion: v1
kind: Pod
metadata:
  name: pod-readiness-exec1
  labels:
    app: readiness  
spec:
  containers:
  - name: readiness
    image: kubetm/app
    ports:
    - containerPort: 8080
    readinessProbe:
      exec:
        command: ["cat", "/readiness/ready.txt"]
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 3
    volumeMounts:
    - name: host-path
      mountPath: /readiness
  volumes:
  - name: host-path
    hostPath:
      path: /tmp/readiness
      type: DirectoryOrCreate
  terminationGracePeriodSeconds: 0

Here spec.containers.readinessProbe uses the exec option, so readiness is decided by checking that an actual file exists.

LivenessProbe

For the LivenessProbe, let's do the health check through an HTTP API.

apiVersion: v1
kind: Pod
metadata:
  name: pod-liveness-httpget1
  labels:
    app: liveness
spec:
  containers:
  - name: liveness
    image: kubetm/app
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      failureThreshold: 3
  terminationGracePeriodSeconds: 0

Here spec.containers.livenessProbe uses the httpGet option, so the probe calls the API to check whether the application is still running normally.

Node Scheduling

By default a Pod is assigned to a Node by the k8s cluster's Scheduler, but an operator or developer can also choose the Node explicitly.

Selecting a Node directly

Suppose we have the following Nodes.

Node1: Cpu 70, Label KR:az-1
Node2: Cpu 50, Label KR:az-1
Node3: Cpu 30, Label KR:az-2
Node4: Cpu 50, Label US:az-1
Node5: Cpu 30, Label US:az-2

NodeName

If a Pod selects Node1 through nodeName, the Pod is created directly on Node1. This is explicit, but Nodes can be deleted or renamed, so it is not used much in practice.
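
A minimal sketch of a Pod pinned with nodeName might look like this. The Pod name is hypothetical, and k8s-node1 is assumed to be the real name of Node1 (it matches the node name used in the kubectl taint command later in this post).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-name           # hypothetical name
spec:
  nodeName: k8s-node1           # assumed node name; must match an existing Node exactly
  containers:
  - name: container
    image: kubetm/app
  terminationGracePeriodSeconds: 0
```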

NodeSelector

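Now suppose the Pod is created with a nodeSelector whose key and value are KR: az-2. The Pod is then created on Node3. The interesting case is KR: az-1: Node1 and Node2 carry the same key and value, so the Pod goes to whichever of them has more resources. The one inconvenience is that if no Node matches the key and value exactly, the Pod is not scheduled at all.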
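
The KR: az-2 case above could be written as the following manifest. This is a sketch: the Pod name is hypothetical, and it assumes the Nodes really carry a label with key KR as in the list above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-node-selector       # hypothetical name
spec:
  nodeSelector:
    KR: az-2                    # only Nodes labeled KR=az-2 (Node3 in the example) qualify
  containers:
  - name: container
    image: kubetm/app
  terminationGracePeriodSeconds: 0
```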

NodeAffinity

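With NodeAffinity, setting only a key on the Pod is enough, and the Pod is assigned to the matching Node with the most resources. For example, a Pod created with the key US lands on Node4. And with the preferred option, even a Pod that asks for a key no Node has, such as CH, is still created on the Node with the most free resources.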

matchExpressions: with the key set to kr, the Pod can be created on Node1, Node2, or Node3.

apiVersion: v1
kind: Pod
metadata:
 name: pod-match-expressions1
spec:
 affinity:
  nodeAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:   
    nodeSelectorTerms:
    - matchExpressions:
      -  {key: kr, operator: Exists}
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0

required: if you set the key to ch and try to create the Pod, the required option leaves it unscheduled (the only keys on the Nodes are KR and US).

apiVersion: v1
kind: Pod
metadata:
 name: pod-required
spec:
 affinity:
  nodeAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
    - matchExpressions:
      - {key: ch, operator: Exists}
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0

preferred: in this case the Pod is created even with the key set to ch.

apiVersion: v1
kind: Pod
metadata:
 name: pod-preferred
spec:
 affinity:
  nodeAffinity:
   preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
       matchExpressions:
       - {key: ch, operator: Exists}
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0
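
The examples above only use the Exists operator. As a sketch of matching on values as well, the In operator can narrow the choice to specific zones. The Pod name is hypothetical, and the key is written as KR to follow the Node list at the start of this section; label keys are case-sensitive, so it has to match however the Nodes are actually labeled.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-match-in            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Matches Nodes labeled KR=az-1 (Node1 and Node2 in the example list).
          - {key: KR, operator: In, values: [az-1]}
  containers:
  - name: container
    image: kubetm/app
  terminationGracePeriodSeconds: 0
```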

Co-locating / spreading Pods

Suppose we have the following Nodes.

Node1: Label a-team: 1
Node2: Label a-team: 2

Pod Affinity

Suppose Server1 and Server2 must always be on the same Node. That calls for Pod Affinity. Say the Server1 Pod, labeled type: server, is created on Node1. If the Server2 Pod has a Pod Affinity rule that selects the type: server label, it is always created on the same Node as the Server1 Pod.

Server1 has the following yaml.

apiVersion: v1
kind: Pod
metadata:
 name: server1
 labels:
  type: server
spec:
 nodeSelector:
  a-team: '1'
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0

The Pod that must always be created on the same Node as Server1 uses Pod Affinity as follows.

apiVersion: v1
kind: Pod
metadata:
 name: server2
spec:
 affinity:
  podAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:   
   - topologyKey: a-team
     labelSelector:
      matchExpressions:
      -  {key: type, operator: In, values: [server]}
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0

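Through spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution with labelSelector.matchExpressions, the Pod is always created on the same Node as the Server1 Pod (more precisely, on a Node with the same value for the topologyKey a-team).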

Pod Anti Affinity

This is the opposite of Pod Affinity. If there is a Master Pod labeled type: master, and you create a Slave Pod whose Pod Anti Affinity rule selects the type: master label, the Slave Pod is never created on the Node where the Master Pod runs.

Let's create the Master Pod as follows.

apiVersion: v1
kind: Pod
metadata:
  name: master
  labels:
    type: master
spec:
  nodeSelector:
    a-team: '1'
  containers:
  - name: container
    image: kubetm/app
  terminationGracePeriodSeconds: 0

The Slave Pod must not be created on the Node that hosts the Master Pod, so its yaml can be written as follows.

apiVersion: v1
kind: Pod
metadata:
 name: slave
spec:
 affinity:
  podAntiAffinity:
   requiredDuringSchedulingIgnoredDuringExecution:   
   - topologyKey: a-team
     labelSelector:
      matchExpressions:
      -  {key: type, operator: In, values: [master]}
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0

Through spec.affinity.podAntiAffinity, the slave pod is always created on a different Node from the master pod.
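
The rule above is a hard requirement: if every Node already hosts a master Pod, the slave Pod is not scheduled. As a sketch of a softer variant, the preferred form asks the scheduler to avoid those Nodes when it can. The Pod name and the weight of 100 are illustrative assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slave-preferred         # hypothetical name
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100             # 1-100; higher means a stronger preference
        podAffinityTerm:
          topologyKey: a-team
          labelSelector:
            matchExpressions:
            - {key: type, operator: In, values: [master]}
  containers:
  - name: container
    image: kubetm/app
  terminationGracePeriodSeconds: 0
```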

Restricting Node assignment

This is a constraint that keeps arbitrary Pods off a particular Node. For example, a Node with a GPU should only run AI-related Pods, so you can restrict assignment to keep web server Pods off it.

Suppose we have the following Nodes.

Node1: has a GPU
Node2
Node3

Taint

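Once a Taint is set on Node1, a Pod is not scheduled there even if it explicitly targets Node1, for example with a nodeSelector. (A Pod that sets nodeName bypasses the scheduler, so a NoSchedule taint alone does not stop it.) To get a Pod scheduled onto Node1, the Pod must declare a Toleration.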

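A Taint consists of a key, a value, and an effect. The key and value work like the labels we have used so far, and the effect is one of:
NoSchedule: Pods are not scheduled on the Node at all
PreferNoSchedule: the scheduler avoids the Node, but places the Pod here if no other Node has resources left
These are the two options covered here. (A third effect, NoExecute, additionally evicts Pods already running on the Node that do not tolerate the taint.)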

kubectl taint nodes k8s-node1 hw=gpu:NoSchedule
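
After the command above, the taint is stored on the Node object under spec.taints. The excerpt below is a sketch of what that part of the Node would look like, with all other fields omitted.

```yaml
# Excerpt of the Node object; other fields omitted.
apiVersion: v1
kind: Node
metadata:
  name: k8s-node1
spec:
  taints:
  - key: hw
    value: gpu
    effect: NoSchedule
```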

Toleration

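The Toleration on a Pod must match the Node's Taint (key, value, and effect) for the Pod to be allowed onto that Node. A Toleration only permits scheduling onto the tainted Node; the nodeSelector in the example below is what steers the Pod to it.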

apiVersion: v1
kind: Pod
metadata:
 name: pod-with-toleration
spec:
 nodeSelector:
  gpu: no1
 tolerations:
 - effect: NoSchedule
   key: hw
   operator: Equal
   value: gpu
 containers:
 - name: container
   image: kubetm/app
 terminationGracePeriodSeconds: 0
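
The example above matches the taint's value with operator: Equal. As a sketch of the alternative, operator: Exists tolerates any taint with the given key and effect regardless of its value, in which case value is left out. This is a fragment of a Pod spec, not a full manifest.

```yaml
# Fragment of a Pod spec; tolerates hw=<anything>:NoSchedule.
tolerations:
- key: hw
  operator: Exists              # no value field when using Exists
  effect: NoSchedule
```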