kubectl top is a tool for checking CPU and memory usage; it reports the resource usage of nodes and pods.
Ref :: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_top/
kubectl top requires metrics-server to be installed in the cluster.
github :: https://github.com/kubernetes-sigs/metrics-server/
FAQ: https://github.com/kubernetes-sigs/metrics-server/blob/master/FAQ.md

It can be installed as follows.
$ sudo kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml
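Before running kubectl top, it can help to confirm that the Deployment has rolled out and that the aggregated Metrics API reports as available. The following is a minimal sketch, assuming the manifest's defaults (namespace kube-system, Deployment metrics-server, APIService v1beta1.metrics.k8s.io). The function name and the 120s timeout are illustrative, and kubectl may need a sudo prefix as in the rest of this post.

```shell
# Hypothetical helper: succeeds once metrics-server is rolled out and the
# Metrics API (v1beta1.metrics.k8s.io) reports Available=True.
metrics_api_ready() {
  # Wait for the Deployment created by the manifest to finish rolling out.
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s || return 1
  # Read the Available condition of the APIService that the manifest registers.
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}')
  [ "$status" = "True" ]
}

# Usage:
#   metrics_api_ready && kubectl top node
```

Even after this check passes, metrics for freshly created pods may take a little while to appear.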
The usage of the kubectl top command is shown below.
Ref :: https://kubernetes.io/docs/reference/kubectl/generated/kubectl_top/kubectl_top_pod/
Node usage in my personal test environment:
$ sudo kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
compute1 163m 2% 2413Mi 31%
compute2 170m 2% 2515Mi 33%
control1 2498m 31% 8795Mi 56%
control2 1213m 15% 8447Mi 53%
control3 807m 10% 9208Mi 58%
Pod usage:
$ sudo kubectl top pod -n openstack
NAME CPU(cores) MEMORY(bytes)
barbican-api-8696645bb9-6mx4s 1m 95Mi
barbican-api-8696645bb9-tfm6g 1m 101Mi
btx-0 1m 14Mi
cinder-api-848659d6b9-h2mxs 2m 418Mi
cinder-api-848659d6b9-xs8w7 4m 430Mi
cinder-scheduler-5cb7b687f8-4c9pk 2m 140Mi
cinder-scheduler-5cb7b687f8-b5d8z 3m 126Mi
cinder-volume-0 10m 257Mi
cinder-volume-1 11m 238Mi
glance-api-0 46m 235Mi
glance-api-1 31m 235Mi
horizon-6ccbcc8cbb-527sv 10m 491Mi
horizon-6ccbcc8cbb-j4p28 3m 497Mi
ingress-0 4m 179Mi
ingress-1 2m 205Mi
ingress-2 2m 198Mi
ingress-error-pages-54759f98b7-2l2nz 1m 6Mi
keystone-api-f5bbcfc86-slfh6 27m 421Mi
keystone-api-f5bbcfc86-th4k2 19m 424Mi
libvirt-libvirt-default-wnkh5 0m 8Mi
libvirt-libvirt-default-zzhnv 4m 8Mi
mariadb-ingress-6f58cbbd6d-pns7q 4m 159Mi
mariadb-ingress-6f58cbbd6d-wh8b8 4m 148Mi
mariadb-ingress-error-pages-7585b86565-hn2cx 1m 5Mi
mariadb-server-0 62m 439Mi
mariadb-server-1 27m 352Mi
mariadb-server-2 23m 344Mi
memcached-memcached-55bf4f86d5-ck79k 2m 348Mi
neutron-dhcp-agent-default-4fb6t 154m 125Mi
neutron-dhcp-agent-default-65s9f 119m 127Mi
neutron-dhcp-agent-default-qkk82 205m 122Mi
neutron-lb-agent-default-bbxn4 20m 354Mi
neutron-lb-agent-default-d59ww 20m 367Mi
neutron-lb-agent-default-fkw4k 25m 356Mi
neutron-lb-agent-default-rjtmm 21m 348Mi
neutron-lb-agent-default-z8f7w 3m 351Mi
neutron-metadata-agent-default-2wmw8 2m 150Mi
neutron-metadata-agent-default-5pcb8 2m 152Mi
neutron-metadata-agent-default-l592l 2m 154Mi
neutron-netns-cleanup-cron-default-m2k5f 0m 200Mi
neutron-netns-cleanup-cron-default-qhhvs 0m 202Mi
neutron-netns-cleanup-cron-default-vnp2t 0m 297Mi
neutron-server-75b9f9fd5b-9mqjb 31m 745Mi
neutron-server-75b9f9fd5b-mfd2m 28m 726Mi
nova-api-metadata-6455bbbbf9-bb84k 1m 383Mi
nova-api-metadata-6455bbbbf9-kjgc6 1m 382Mi
nova-api-osapi-74478f9597-69cm8 2m 443Mi
nova-api-osapi-74478f9597-sb5wr 2m 433Mi
nova-compute-default-7fnrh 51m 166Mi
nova-compute-default-p48rz 213m 159Mi
nova-conductor-5868db877-727zs 93m 106Mi
nova-conductor-5868db877-dwjjz 83m 109Mi
nova-novncproxy-5c897647df-4gzn7 1m 106Mi
nova-novncproxy-5c897647df-p7zqk 1m 103Mi
nova-scheduler-59ddc55fdc-gs7bd 534m 298Mi
nova-scheduler-59ddc55fdc-hl9sz 152m 299Mi
placement-api-7759748b99-44vkc 2m 336Mi
placement-api-7759748b99-xn6sp 4m 340Mi
Note that although installation is a single command, the image and the YAML are fetched from external sources, so the cluster needs outbound network access.
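For clusters without outbound access, one common approach is to download the manifest on a connected machine, mirror the image to an internal registry, and point the manifest at that mirror. The sketch below only covers the manifest rewrite. The registry host registry.example.internal, the function name, and the file names are all assumptions for illustration; mirroring the image itself is left to whatever tooling the environment uses.

```shell
# Hypothetical filter: rewrite registry.k8s.io image references to an internal mirror.
# $1 is the mirror registry host, e.g. registry.example.internal (an assumption).
use_mirror_registry() {
  sed "s#image: registry\.k8s\.io/#image: $1/#"
}

# Usage, on a machine with internet access (file names are arbitrary):
#   curl -fsSL -o metrics-server-ha.yaml https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml
#   use_mirror_registry registry.example.internal < metrics-server-ha.yaml > metrics-server-ha.local.yaml
# Then, from a host that can reach the cluster:
#   kubectl apply -f metrics-server-ha.local.yaml
```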
$ kubectl top
Display resource (CPU/memory) usage.
The top command allows you to see the resource consumption for nodes or pods.
This command requires Metrics Server to be correctly configured and working on the server.
Available Commands:
node Display resource (CPU/memory) usage of nodes
pod Display resource (CPU/memory) usage of pods
Usage:
kubectl top [flags] [options]
Use "kubectl top <command> --help" for more information about a given command.
Use "kubectl options" for a list of global command-line options (applies to all commands).
$ kubectl top node --help
Display resource (CPU/memory) usage of nodes.
The top-node command allows you to see the resource consumption of nodes.
Aliases:
node, nodes, no
Examples:
# Show metrics for all nodes
kubectl top node
# Show metrics for a given node
kubectl top node NODE_NAME
Options:
--no-headers=false:
If present, print output without headers
-l, --selector='':
Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2). Matching
objects must satisfy all of the specified label constraints.
--show-capacity=false:
Print node resources based on Capacity instead of Allocatable(default) of the nodes.
--sort-by='':
If non-empty, sort nodes list using specified field. The field can be either 'cpu' or 'memory'.
--use-protocol-buffers=true:
Enables using protocol-buffers to access Metrics API.
Usage:
kubectl top node [NAME | -l label] [options]
Use "kubectl options" for a list of global command-line options (applies to all commands).
Next, I checked how to show only the nodes that carry a specific label.
$ sudo kubectl top node --selector=openstack-control-plane=enabled
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
control1 1182m 14% 9275Mi 59%
control2 921m 11% 8821Mi 56%
control3 922m 11% 9630Mi 61%
Multiple labels can be combined with commas; a node must match all of them.
$ sudo kubectl top node --selector=openstack-control-plane=enabled,openstack-network-plane=enabled
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
control1 1970m 24% 9154Mi 58%
control2 726m 9% 8785Mi 56%
control3 799m 9% 9574Mi 61%
$ kubectl top pods -h
Display resource (CPU/memory) usage of pods.
The 'top pod' command allows you to see the resource consumption of pods.
Due to the metrics pipeline delay, they may be unavailable for a few minutes since pod creation.
Aliases:
pod, pods, po
Examples:
# Show metrics for all pods in the default namespace
kubectl top pod
# Show metrics for all pods in the given namespace
kubectl top pod --namespace=NAMESPACE
# Show metrics for a given pod and its containers
kubectl top pod POD_NAME --containers
# Show metrics for the pods defined by label name=myLabel
kubectl top pod -l name=myLabel
Options:
-A, --all-namespaces=false:
If present, list the requested object(s) across all namespaces. Namespace in current context is ignored even
if specified with --namespace.
--containers=false:
If present, print usage of containers within a pod.
--field-selector='':
Selector (field query) to filter on, supports '=', '==', and '!='.(e.g. --field-selector
key1=value1,key2=value2). The server only supports a limited number of field queries per type.
--no-headers=false:
If present, print output without headers.
-l, --selector='':
Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2). Matching
objects must satisfy all of the specified label constraints.
--sort-by='':
If non-empty, sort pods list using specified field. The field can be either 'cpu' or 'memory'.
--sum=false:
Print the sum of the resource usage
--use-protocol-buffers=true:
Enables using protocol-buffers to access Metrics API.
Usage:
kubectl top pod [NAME | -l label] [options]
$ kubectl top pods --containers=true -n openstack
POD NAME CPU(cores) MEMORY(bytes)
barbican-api-8696645bb9-6mx4s barbican-api 1m 101Mi
barbican-api-8696645bb9-tfm6g barbican-api 1m 101Mi
btx-0 btx 11m 14Mi
cinder-api-848659d6b9-h2mxs cinder-api 2m 415Mi
cinder-api-848659d6b9-xs8w7 cinder-api 3m 433Mi
cinder-scheduler-5cb7b687f8-4c9pk cinder-scheduler 2m 140Mi
cinder-scheduler-5cb7b687f8-b5d8z cinder-scheduler 2m 126Mi
...
$ sudo kubectl top pod --selector='component=libvirt' -n openstack
NAME CPU(cores) MEMORY(bytes)
libvirt-libvirt-default-wnkh5 0m 8Mi
libvirt-libvirt-default-zzhnv 0m 8Mi
Using --sum on its own:
$ sudo kubectl top pod --sum -n openstack
...
nova-scheduler-59ddc55fdc-hl9sz 10m 297Mi
placement-api-7759748b99-44vkc 3m 336Mi
placement-api-7759748b99-xn6sp 2m 341Mi
________ ________
588m 15835Mi
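--sum gives one total for everything listed. When a per-service subtotal is more useful, the plain output can be post-processed. The sketch below assumes that the text before the first hyphen in a pod name identifies the service (true for names like cinder-api-... or nova-compute-... here, but only a naming convention), and that CPU is reported in millicores (m) and memory in Mi, as in the output above. The function name is hypothetical.

```shell
# Hypothetical helper: sum CPU (m) and memory (Mi) per service, where the
# "service" is assumed to be the pod-name text before the first hyphen.
# Reads `kubectl top pod --no-headers` output (NAME CPU MEMORY) on stdin.
sum_by_service() {
  awk '
    {
      split($1, parts, "-"); svc = parts[1]
      cpu[svc] += $2 + 0      # "46m"   -> 46  (awk keeps the leading number)
      mem[svc] += $3 + 0      # "235Mi" -> 235
    }
    END {
      for (svc in cpu) printf "%s %dm %dMi\n", svc, cpu[svc], mem[svc]
    }
  ' | sort
}

# Usage:
#   kubectl top pod -n openstack --no-headers | sum_by_service
```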
To sum only specific pods, combine --sum with a label selector:
$ sudo kubectl top pod --sum -l 'component=libvirt' -n openstack
NAME CPU(cores) MEMORY(bytes)
libvirt-libvirt-default-wnkh5 0m 8Mi
libvirt-libvirt-default-zzhnv 0m 8Mi
________ ________
0m 16Mi
One shortcoming of kubectl top is that it reports only each pod's resource usage, with no indication of which node the pod is scheduled on.
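One way around this is to join the usage figures with the node names that kubectl get already knows (kubectl get pod -o wide shows a NODE column). The following is a rough sketch, not a built-in feature: the function name is hypothetical, and it assumes both commands return the same set of pods.

```shell
# Hypothetical helper: print "POD NODE CPU MEMORY" for every pod in a namespace
# by joining `kubectl get pods` (node placement) with `kubectl top pod` (usage).
# $1 is the namespace, e.g. openstack
pods_with_nodes() {
  {
    kubectl get pods -n "$1" --no-headers \
      -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
    echo "---"   # separator between the two listings; never a valid pod name
    kubectl top pod -n "$1" --no-headers
  } | awk '
    $1 == "---" { second = 1; next }
    !second     { node[$1] = $2; next }
    { printf "%s %s %s %s\n", $1, (($1 in node) ? node[$1] : "<unknown>"), $2, $3 }
  '
}

# Usage:
#   pods_with_nodes openstack
```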
What happens if we put load on control1 with stress and monitor it with kubectl top? The baseline before the test:
$ sudo kubectl top node -l openstack-control-plane=enabled
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
control1 1706m 21% 9429Mi 60%
control2 688m 8% 8804Mi 56%
control3 877m 10% 9626Mi 61%
Run stress with 4 CPU workers and 2 memory workers allocating 512M each:
$ stress --cpu 4 --vm 2 --vm-bytes 512M --timeout 120s
The load on the node could then be observed as follows.
$ sudo kubectl top node -l openstack-control-plane=enabled
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
control1 6636m 82% 9983Mi 63%
control2 873m 10% 8961Mi 57%
control3 653m 8% 9627Mi 61%
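When watching several nodes during a test like this, it can be convenient to print only the ones above a CPU threshold. A minimal sketch, assuming the five-column layout shown above (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%); the function name and the threshold of 80 are illustrative.

```shell
# Hypothetical helper: print nodes whose CPU% is at or above a threshold.
# Reads `kubectl top node --no-headers` output on stdin:
#   NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# $1 is the threshold in percent, e.g. 80
nodes_over_cpu() {
  awk -v limit="$1" '($3 + 0) >= (limit + 0) { print $1, $3 }'
}

# Usage:
#   kubectl top node -l openstack-control-plane=enabled --no-headers | nodes_over_cpu 80
```

Fed the three lines above, a threshold of 80 would print only control1 with 82%.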
With the node under load, the pods running on it also appear to be affected:
$ sudo kubectl top pod -A --sort-by=cpu
NAMESPACE NAME CPU(cores) MEMORY(bytes)
openstack nova-scheduler-59ddc55fdc-gs7bd 540m 295Mi
openstack nova-compute-default-7fnrh 156m 156Mi
kube-system kube-apiserver-control2 102m 849Mi
kube-system kube-apiserver-control3 72m 708Mi
openstack neutron-server-75b9f9fd5b-9mqjb 63m 747Mi
cloudpc redis-0 58m 35Mi
...
To put load on a pod instead, run stress inside a pod as follows.
$ vi stress-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: stress-pod
spec:
  containers:
  - name: stress-container
    image: polinux/stress
    command: ["stress"]
    args: ["--cpu", "4", "--vm", "2", "--vm-bytes", "512M", "--timeout", "120s"]
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsNonRoot: true
      runAsUser: 1000
      runAsGroup: 3000
      seccompProfile:
        type: RuntimeDefault
$ sudo kubectl apply -f stress-pod.yaml
top pod
$ sudo kubectl top pod -A --sort-by=cpu
NAMESPACE NAME CPU(cores) MEMORY(bytes)
default stress-pod 5935m 188Mi
openstack nova-compute-default-p48rz 236m 156Mi
openstack nova-compute-default-7fnrh 192m 163Mi
openstack nova-conductor-5868db877-727zs 143m 106Mi
kube-system kube-apiserver-control3 80m 708Mi
cloudpc redis-2 63m 37Mi
openstack glance-api-0 60m 235Mi
kube-system kube-apiserver-control2 58m 805Mi
cloudpc redis-1 56m 28Mi
kube-system kube-apiserver-control1 53m 712Mi
cloudpc redis-0 51m 35Mi
kube-system calico-node-pd49p 47m 208Mi
top node
$ sudo kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
compute1 181m 2% 2428Mi 31%
compute2 6133m 76% 2764Mi 36%
control1 1693m 21% 9202Mi 58%
control2 775m 9% 8527Mi 54%
control3 717m 8% 9364Mi 59%
The resource overhead of metrics-server itself is described on GitHub as follows.
Resource efficiency, using 1 mili core of CPU and 2 MB of memory for each node in a cluster.
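As a worked example of that rule of thumb, the sketch below multiplies the quoted per-node figures by a node count. It is only arithmetic on the README's statement, not a measurement, and the function name is hypothetical.

```shell
# Rough estimate from the quoted figures: 1 millicore of CPU and 2 MB of memory per node.
# $1 is the number of nodes in the cluster.
estimate_metrics_server_usage() {
  nodes="$1"
  echo "$((nodes * 1))m $((nodes * 2))MB"
}

# Usage, for the 5-node cluster used in this post:
#   estimate_metrics_server_usage 5    # prints "5m 10MB"
```

By the same arithmetic, 100 nodes come to about 100m of CPU and 200 MB of memory.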
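The metrics-server manifest also shows how often metrics are collected: see the --metric-resolution=15s argument.
metrics-server provides a high-availability YAML file as well, which is the one applied earlier in this post: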
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: metrics-server
            namespaces:
            - kube-system
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 10250
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
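The APIService at the end of the manifest registers the metrics.k8s.io/v1beta1 group with the API server, and this is the API that kubectl top reads. It can also be queried directly, which is sometimes useful for debugging. The helper names below are hypothetical; the paths follow from the group and version in the manifest.

```shell
# Hypothetical helpers: query the Metrics API directly through the API server.
raw_node_metrics() {
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}

# $1 is the namespace, e.g. openstack
raw_pod_metrics() {
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/$1/pods"
}

# Usage (the response is JSON):
#   raw_node_metrics
#   raw_pod_metrics openstack
```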