Autoscaling in Kubernetes

김현수 · March 26, 2024

CPU based HPA

HPA (Horizontal Pod Autoscaling) works in three stages.

Obtaining pod metrics

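On each node, cAdvisor, which runs inside the kubelet, collects pod metrics and node metrics. Heapster, which runs at the cluster level, aggregates the collected metrics, and HPA fetches them by querying a REST API. Note that Heapster has since been retired; in current clusters metrics-server plays the aggregator role and serves the data through the Metrics API.

Pod, node -> cAdvisor -> Heapster (now metrics-server) -> HPA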

Calculating the number of pods

With the fetched metrics, HPA calculates how many pods are needed. The formula below gives the number of pods required to bring a metric such as CPU utilization or QPS to its desired value.

desired pods = ceil[current pods * (current metric / desired metric)]

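Ex) Suppose Pods 1, 2, and 3 have CPU utilization of 60%, 70%, and 80%, so the current metric is their average, 70%. To bring CPU utilization to 50%, we need ceil[3 * (70 / 50)] = ceil[(60 + 70 + 80) / 50] = 5 pods. The average utilization per pod then becomes (60 + 70 + 80) / 5 = 42%.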
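
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula as written in this post; the function name `desired_replicas` is made up for illustration, and the real controller also applies a tolerance and other safeguards that are left out here.

```python
import math

def desired_replicas(current_replicas, current_metric, desired_metric):
    """ceil[current pods * (current metric / desired metric)]"""
    return math.ceil(current_replicas * (current_metric / desired_metric))

# CPU utilization (%) of Pod 1, 2, 3 from the example
utilizations = [60, 70, 80]
average = sum(utilizations) / len(utilizations)  # 70.0

replicas = desired_replicas(len(utilizations), average, 50)
print(replicas)                       # 5
print(sum(utilizations) / replicas)   # 42.0
```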

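If some metrics are missing, for example because a pod is not ready yet, HPA estimates conservatively so that the pod count changes as little as possible.
If the example above also had a Pod 4 whose metric was missing, Pod 4 would be assumed to have 0% CPU utilization so that the scale-up is as small as possible. For a scale-down the assumption is reversed: the missing pod is treated as using 100% of the desired value.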
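
To see how this conservative guess plays out, here is a simplified sketch. The function name and the exact control flow are my own assumptions for illustration, not the controller's actual code: it fills missing pods with 0% when scaling up and with the target value when scaling down, and keeps the current pod count if that guess flips the scaling direction.

```python
import math

def desired_replicas_with_missing(reported, missing, target):
    """reported: utilization (%) of pods that have metrics.
    missing: number of pods without metrics.
    target: desired utilization (%)."""
    current = len(reported) + missing
    ratio = (sum(reported) / len(reported)) / target
    if ratio == 1:
        return current  # already on target, nothing to do
    if missing == 0:
        return math.ceil(sum(reported) / target)

    # Conservative guess: 0% when scaling up, 100% of target when scaling down
    fill = 0 if ratio > 1 else target
    total = sum(reported) + fill * missing
    new_ratio = (total / current) / target

    # If the guess reverses the direction, do not scale at all
    if (ratio > 1) != (new_ratio > 1):
        return current
    return math.ceil(total / target)

print(desired_replicas_with_missing([60, 70, 80], 1, 50))  # 5
```

With Pod 4 counted as 0%, the four pods average 52.5%, so only one pod is added (4 -> 5) rather than the two that the three reporting pods alone would suggest.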

Updating the number of pods

In the example above there are currently 3 pods and 5 are needed, so 2 pods are added. HPA sends the request for 2 more pods to the workload resource (Deployment, StatefulSet, ...).
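
As a small sketch of this last step: the calculated pod count is kept within the minReplicas/maxReplicas bounds set in the HPA definition, and the difference from the current count is what gets added or removed. The helper name `replica_change` is hypothetical.

```python
def replica_change(current, desired, min_replicas, max_replicas):
    """Clamp the desired count to [min_replicas, max_replicas] and
    return (final count, pods to add or remove)."""
    bounded = max(min_replicas, min(desired, max_replicas))
    return bounded, bounded - current

print(replica_change(3, 5, 1, 10))   # (5, 2)
print(replica_change(3, 12, 1, 10))  # (10, 7)
```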

Implementation example

The following YAML defines an HPA.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:               # workload resource to scale
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1                # minimum and maximum number of replicas
  maxReplicas: 10
  metrics:                      # desired state
  - type: Resource              # here the target is 50% CPU utilization
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
status:                         # reported by the cluster, not written by hand
  observedGeneration: 1
  lastScaleTime: <some-time>
  currentReplicas: 1
  desiredReplicas: 1
  currentMetrics:
  - type: Resource
    resource:
      name: cpu
      current:
        averageUtilization: 0
        averageValue: 0

You can also create the same HPA with a single command.

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
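
To make the link between the command and the YAML explicit, here is a sketch that builds the same definition as a Python dict and prints it as JSON. The function name `cpu_hpa_manifest` is made up, and I am assuming the HPA takes the same name as the Deployment, as in the YAML above.

```python
import json

def cpu_hpa_manifest(name, cpu_percent, min_replicas, max_replicas):
    """Build an autoscaling/v2 HPA definition targeting average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_percent,
                    },
                },
            }],
        },
    }

# Same values as: --cpu-percent=50 --min=1 --max=10
manifest = cpu_hpa_manifest("php-apache", 50, 1, 10)
print(json.dumps(manifest, indent=2))
```

Since kubectl also accepts JSON, the printed output could in principle be piped to `kubectl apply -f -`.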

HPA based on other metrics

Memory based Autoscaling

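Memory-based autoscaling was introduced in Kubernetes 1.8. It is harder to implement than CPU-based scaling, because scaling up only helps if the existing pods actually release memory. I tried to find more material on this but could not find any.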

Pod based Autoscaling

Pod-based autoscaling needs a metric to scale on. Instead of Utilization, it uses a per-pod metric such as QPS (queries per second) or packets-per-second. It is defined in the same way:

spec:
  metrics:
  - type: Pods
    pods:
      metric:
        name: qps
      target:
        type: AverageValue
        averageValue: 100

Object based Autoscaling

The metric is defined through some object in the same namespace.
Unlike the previous types, object-based autoscaling does not collect metrics from every pod. It fetches the metric from one single object and calculates from that.

type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
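
As a rough sketch of how the calculation works for this definition: with target type Value, the single metric reported for the object is compared directly with the target, and the formula from the CPU section is reused. The function name and the sample numbers (2 pods, 5000 requests per second) are made up for illustration; `2k` means 2000.

```python
import math

def desired_replicas_for_object(current_replicas, metric_value, target_value):
    """Target type Value: compare the object's metric directly with the target."""
    return math.ceil(current_replicas * (metric_value / target_value))

TARGET = 2000  # value: 2k in the definition above

# Hypothetical reading: the Ingress reports 5000 requests-per-second with 2 pods
print(desired_replicas_for_object(2, 5000, TARGET))  # 5
```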

External resource based autoscaling

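You can also autoscale on a metric that comes from outside the cluster. The following example is written so that one pod is added for every 30 messages in the message queue (the queue of pending requests).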

- type: External
  external:
    metric:
      name: queue_messages_ready
      selector:
        matchLabels:
          queue: "worker_tasks"
    target:
      type: AverageValue
      averageValue: 30
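
A quick sketch of what "one pod per 30 messages" means in numbers: with target type AverageValue, the total reported by the external metric is divided by the target per pod and rounded up. The function name and the queue lengths below are hypothetical.

```python
import math

def desired_replicas_for_queue(queue_length, average_value):
    """Target type AverageValue: total metric / target per pod, rounded up."""
    return math.ceil(queue_length / average_value)

print(desired_replicas_for_queue(100, 30))  # 4
print(desired_replicas_for_queue(30, 30))   # 1
```

The result is still limited by minReplicas and maxReplicas, so an empty queue does not take the workload below its minimum.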