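Vertical Pod Autoscaling (VPA) provides a way to set container resource requests and limits automatically. Its goal is to minimize the risk of performance degradation caused by a CPU or memory shortage while also reducing wasted resources.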
VPA admission hook
Every pod submitted to the cluster goes through this webhook automatically. The webhook checks whether a VerticalPodAutoscaler object references the pod or one of its parents (a ReplicaSet, a Deployment, etc.).
VPA recommender
Connects to the metrics-server application in the cluster, fetches historical and current usage data (CPU and memory) for each VPA-enabled pod and generates recommendations for scaling up or down the requests and limits of these pods.
VPA updater
Runs once a minute. If a pod is not running within the calculated recommendation range, the updater evicts the currently running version of the pod so that it restarts and goes through the VPA admission webhook, which changes its CPU and memory settings before it starts.
VPA example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: prometheus-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: StatefulSet
    name: prometheus
  updatePolicy:
    updateMode: Auto
    evictionRequirements:
      - resources: ["cpu", "memory"]
        changeRequirement: TargetHigherThanRequests
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 0m
          memory: 0Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
updatePolicy.updateMode
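Over time, the VPA recommender stores the recommended resource amounts in the Status of the VPA object. VPA uses the lower bound and upper bound to decide whether to evict a pod. Eviction can occur when the current resource request is below the lower bound or above the upper bound, and the request differs from the target estimate by 10% or more.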
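The eviction rule described above can be sketched in Python as follows. This is only an illustration of the rule as stated in this section, not the updater's actual code. The function name `may_evict` and the `min_change` parameter are hypothetical, and measuring the 10% change relative to the target is an assumption. The sample numbers are the CPU bounds (in millicores) from the Status output in this section.

```python
def may_evict(request: float, lower: float, target: float, upper: float,
              min_change: float = 0.10) -> bool:
    """Return True if eviction may occur under the rule stated in the text.

    All values must share one unit (for example CPU millicores).
    min_change is the 10% threshold from the text; the name is hypothetical.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    # Condition 1: the current request lies outside [lower, upper].
    outside_bounds = request < lower or request > upper
    # Condition 2 (assumption): change is measured relative to the target.
    relative_change = abs(target - request) / target
    return outside_bounds and relative_change >= min_change


if __name__ == "__main__":
    # CPU bounds from the Status output: lower 25m, target 410m, upper 704m.
    print(may_evict(20, 25, 410, 704))   # True: below the lower bound
    print(may_evict(400, 25, 410, 704))  # False: inside the bounds
    print(may_evict(800, 25, 410, 704))  # True: above the upper bound
```

A request that is only slightly outside the bounds but within 10% of the target returns False, which matches the "and" in the rule above.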
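If the initial request and limit are both set to the same minimal value and a sudden load spike gets the container OOM-killed, the VPA recommender may not have enough data to learn the resource usage. In that case the pod is sometimes not restarted with updated resources as expected.
For a CronJob, if no resources are configured initially and the VPA object is set to Initial mode, appropriate resources are set automatically after two or three learning runs, and the job runs without any eviction or interruption. This makes Initial mode worth using for CronJobs.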
Status:
  Conditions:
    Last Transition Time: 2020-12-23T08:03:07Z
    Status: True
    Type: RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name: prometheus
      Lower Bound:
        Cpu: 25m
        Memory: 380220488
      Target:
        Cpu: 410m
        Memory: 380258472
      Uncapped Target:
        Cpu: 410m
        Memory: 380258472
      Upper Bound:
        Cpu: 704m
        Memory: 464927423
memory-aggregation-interval: The length of a single interval for which the peak memory usage is computed. Memory usage peaks are aggregated in multiples of this interval. In other words, there is one memory usage sample per interval (the maximum usage over that interval).
memory-aggregation-interval-count: The number of consecutive memory-aggregation-intervals that make up the MemoryAggregationWindowLength, which is the period over which VPA aggregates memory usage.
That is, MemoryAggregationWindowLength = memory-aggregation-interval * memory-aggregation-interval-count
memory-histogram-decay-half-life: The amount of time it takes a historical memory usage sample to lose half of its weight. In other words, a fresh usage sample is twice as 'important' as one with age equal to the half life period.
oom-bump-up-ratio: The ratio by which memory is bumped up when an OOM occurs. The default is 1.2.
oom-min-bump-up-bytes: The minimum memory increase, in bytes, when an OOM occurs. The default is 100 * 1024 * 1024 (100 MiB).
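As a worked illustration of the flags above, the following Python sketch computes the aggregation window, the weight of an aged sample, and the memory value after an OOM. It is not the recommender's code. All function names are hypothetical, the 24-hour interval, count of 8, and 24-hour half-life are illustrative values chosen here, and combining the two OOM flags by taking the larger result is an assumption (the descriptions above only give their defaults).

```python
from datetime import timedelta

MIB = 1024 * 1024


def aggregation_window(interval: timedelta, interval_count: int) -> timedelta:
    """MemoryAggregationWindowLength = interval * interval_count."""
    return interval * interval_count


def sample_weight(age: timedelta, half_life: timedelta) -> float:
    """Relative weight of a sample: it halves every half_life of age."""
    return 0.5 ** (age / half_life)


def memory_after_oom(used_bytes: int, bump_up_ratio: float = 1.2,
                     min_bump_up_bytes: int = 100 * MIB) -> int:
    """Assumed rule: the larger of the ratio bump and the minimum bump."""
    by_ratio = int(used_bytes * bump_up_ratio)
    by_minimum = used_bytes + min_bump_up_bytes
    return max(by_ratio, by_minimum)


if __name__ == "__main__":
    # Illustrative values: 24h interval, 8 intervals.
    print(aggregation_window(timedelta(hours=24), 8))  # 8 days, 0:00:00
    # A sample exactly one half-life old weighs half as much as a fresh one.
    print(sample_weight(timedelta(hours=24), timedelta(hours=24)))  # 0.5
    # Below 500 MiB the 100 MiB minimum dominates the 1.2 ratio.
    print(memory_after_oom(200 * MIB) // MIB)  # 300
```

With these defaults the ratio only takes over above 500 MiB of usage, since 20% of 500 MiB equals the 100 MiB minimum bump.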
References