Helm-based installation - Prometheus stack

문정환 · September 25, 2023

#0. Preparing the environment

## K8S Spec
1. OS: Ubuntu 20.04.5 LTS
2. k8s provisioning: kubespray-v2.22.0
3. k8s version:
[spkr@ubun20-01 ~ (⎈|ubun01:default)]$ kgn
NAME        STATUS   ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ubun20-01   Ready    control-plane   26d   v1.25.0   192.168.119.131   <none>        Ubuntu 20.04.5 LTS   5.4.0-156-generic   containerd://1.7.1
ubun20-02   Ready    control-plane   26d   v1.25.0   192.168.119.132   <none>        Ubuntu 20.04.5 LTS   5.4.0-156-generic   containerd://1.7.1
ubun20-03   Ready    control-plane   26d   v1.25.0   192.168.119.133   <none>        Ubuntu 20.04.5 LTS   5.4.0-156-generic   containerd://1.7.1

## Prerequisites
1. OpenEBS - install a storage class
[spkr@ubun20-01 ~ (⎈|ubun01:nginx)]$ k get sc
NAME               PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
openebs-device     openebs.io/local   Delete          WaitForFirstConsumer   false                  22h
openebs-hostpath   openebs.io/local   Delete          WaitForFirstConsumer   false                  22h

2. MetalLB - needed for LoadBalancer-type Services
[spkr@ubun20-01 metallb-0.12.1 (⎈|ubun01:metallb)]$ kgp
NAME                                  READY   STATUS    RESTARTS   AGE   IP                NODE        NOMINATED NODE   READINESS GATES
metallb-controller-7898b886f6-jsdp8   1/1     Running   0          51s   10.233.104.224    ubun20-01   <none>           <none>
metallb-speaker-4mnqq                 1/1     Running   0          51s   192.168.119.132   ubun20-02   <none>           <none>
metallb-speaker-j8wpm                 1/1     Running   0          51s   192.168.119.133   ubun20-03   <none>           <none>
metallb-speaker-s6lgz                 1/1     Running   0          51s   192.168.119.131   ubun20-01   <none>           <none>

#1. Monitoring overview

1. Three monitoring targets in a Kubernetes environment

https://www.cncf.io/announcements/2018/08/09/prometheus-graduates/

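Node and container resource usage
- CPU, memory, network, and storage usage of nodes and containers
Cluster monitoring
- Overall cluster state, such as the number and kinds of Kubernetes objects, plus failure-related signals such as pod restarts and event messages
Application monitoring
- Control-plane pods (etcd, apiserver, CoreDNS) and applications that developers install themselves, such as web servers and databases
- Page response time, session counts, database query latency, and so on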

2. Features of Prometheus

https://prometheus.io/docs/introduction/overview/#features

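Service discovery
- In a k8s environment that scales out and in dynamically, targets are registered as service endpoints, so changes are detected automatically
Pull model
- Instead of installing agents that send (push) monitoring data to a central server, the central Prometheus server fetches the data directly from each monitoring target
Wide range of application exporters
- Exporters exist for most widely used applications, such as HAProxy, MySQL, and Elastic, and expose metrics in a form Prometheus can scrape
Rich label support
- Labels can be attached to metrics so that users can filter queries down to just the metrics they want
PromQL, a dedicated query language
- Prometheus provides its own query language for querying metrics by their labels
- Grafana uses the same PromQL (Prometheus Query Language) to query data and render it as graphs
- Loki, a logging solution, uses a similar query language called LogQL
Time-series database (TSDB)
- Samples are stored in a time-series database, in order of time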
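
To make the label-filtering idea concrete, here is a minimal shell sketch that mimics what a PromQL selector such as node_cpu_seconds_total{mode="idle"} does. It runs on a few hand-written lines in the Prometheus text format. The sample values and the helper names sample_metrics and select_series are invented for illustration, and the match is a naive substring match; real queries go through PromQL, not grep.

```shell
#!/usr/bin/env bash
# Hypothetical sample in the Prometheus text exposition format (values invented).
sample_metrics() {
  cat <<'EOF'
node_cpu_seconds_total{cpu="0",mode="idle"} 1200
node_cpu_seconds_total{cpu="0",mode="user"} 300
node_cpu_seconds_total{cpu="1",mode="idle"} 1100
node_memory_MemFree_bytes 4096
EOF
}

# select_series NAME LABEL VALUE
# Print only the series of metric NAME whose label set contains LABEL="VALUE",
# roughly what the selector NAME{LABEL="VALUE"} does in PromQL.
select_series() {
  grep -F "$1{" | grep -F "$2=\"$3\""
}

# Keeps the two idle series (cpu="0" and cpu="1") and drops everything else.
sample_metrics | select_series node_cpu_seconds_total mode idle
```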

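#2. Installing prometheus-stack from the Helm chart

The services being monitored expose performance-related information as metrics
The central monitoring system collects and stores them
The stack also includes Grafana for visualization and Alertmanager, which forwards service alerts to the people responsible through notification channels
With the chart, users can build a monitoring system that is close to production-ready without adding metrics or building custom dashboards by hand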

# Add the prometheus-stack repository
[spkr@ubun20-01 ~ (⎈|ubun01:default)]$ cd prometheus-stack/
[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories

[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ helm repo list
NAME                    URL
bitnami                 https://charts.bitnami.com/bitnami
prometheus-community    https://prometheus-community.github.io/helm-charts

# Download the prometheus-stack chart
[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ helm pull prometheus-community/kube-prometheus-stack
[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ tar xvfz kube-prometheus-stack-50.3.1.tgz
[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ rm -f kube-prometheus-stack-50.3.1.tgz
[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ mv kube-prometheus-stack/ kube-prometheus-stack-50.3.1
[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ ll
total 12
drwxrwxr-x  3 spkr spkr 4096 Sep  6 15:10 ./
drwxr-xr-x 11 spkr spkr 4096 Sep  6 15:06 ../
drwxrwxr-x  4 spkr spkr 4096 Sep  6 15:09 kube-prometheus-stack-50.3.1/

[spkr@ubun20-01 prometheus-stack (⎈|ubun01:default)]$ cd kube-prometheus-stack-50.3.1/
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:default)]$ ll
total 232
drwxrwxr-x 4 spkr spkr   4096 Sep  6 15:09 ./
drwxrwxr-x 3 spkr spkr   4096 Sep  6 15:10 ../
-rw-r--r-- 1 spkr spkr    615 Sep  5 10:54 Chart.lock
drwxrwxr-x 7 spkr spkr   4096 Sep  6 15:09 charts/
-rw-r--r-- 1 spkr spkr   2073 Sep  5 10:54 Chart.yaml
-rw-r--r-- 1 spkr spkr    656 Sep  5 10:54 CONTRIBUTING.md
-rw-r--r-- 1 spkr spkr    398 Sep  5 10:54 .helmignore
-rw-r--r-- 1 spkr spkr  62965 Sep  5 10:54 README.md
drwxrwxr-x 8 spkr spkr   4096 Sep  6 15:09 templates/
-rw-r--r-- 1 spkr spkr 138754 Sep  5 10:54 values.yaml
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:default)]$ cp values.yaml my-values.yaml
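
The transcript above pulls whatever chart version is current, so the archive name will differ when the steps are repeated later. One way to reproduce the same release is to pin the version and let Helm unpack it. This is a sketch rather than the author's exact procedure: the helper name pull_chart is hypothetical, and with --untardir the chart lands one level deeper than in the listing above.

```shell
#!/usr/bin/env bash
# pull_chart VERSION
# Fetch a pinned kube-prometheus-stack release and unpack it in one step.
# --version, --untar and --untardir are standard `helm pull` flags.
pull_chart() {
  local version="$1"
  helm pull prometheus-community/kube-prometheus-stack \
    --version "$version" \
    --untar --untardir "kube-prometheus-stack-$version"
}

# Usage (needs helm, the repo added above, and network access):
#   pull_chart 50.3.1
#   ls kube-prometheus-stack-50.3.1/kube-prometheus-stack/
```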

# Edit the prometheus-stack template values
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:default)]$ vi my-values.yaml
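## Excerpts only. The NodePort and storageSpec settings are shown under their
## parent keys (prometheus.service, prometheus.prometheusSpec); this nesting is
## inferred from the Service list later in this post.
alertmanager:
  ## Deploy alertmanager
  ##
  enabled: true

kubeApiServer:
  enabled: true
  tlsConfig:
    serverName: kubernetes
    insecureSkipVerify: false
  serviceMonitor:

prometheus:
  service:
    ## Service type
    ##
    type: NodePort
  prometheusSpec:
    storageSpec:
      ## Using PersistentVolumeClaim
      ##
      volumeClaimTemplate:
        spec:
          storageClassName: openebs-hostpath
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 15Gi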

# Edit the Grafana template values
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:openebs)]$ ls charts/
crds  grafana  kube-state-metrics  prometheus-node-exporter  prometheus-windows-exporter
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:openebs)]$ vi charts/grafana/values.yaml
service:
  enabled: true
  type: LoadBalancer
  port: 80
  targetPort: 3000

persistence:
  type: pvc
  enabled: false
  storageClassName: openebs-hostpath

[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:openebs)]$ k create ns monitoring
namespace/monitoring created
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:openebs)]$ k ns monitoring
Context "ubun01" modified.
Active namespace is "monitoring".

[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ helm install prometheus -f my-values.yaml .
NAME: prometheus
LAST DEPLOYED: Wed Sep  6 15:20:53 2023
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ helm list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
prometheus      monitoring      1               2023-09-06 15:20:53.138685417 +0000 UTC deployed        kube-prometheus-stack-50.3.1    v0.67.1
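
As an alternative to editing a full copy of values.yaml and the Grafana subchart, the same settings could live in a small override file, since Helm merges -f files over the chart defaults. The sketch below assumes the key layout of the 50.x chart (prometheus.service, prometheus.prometheusSpec.storageSpec, and a top-level grafana key for the subchart). The file name override-values.yaml and both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# write_overrides FILE
# Write only the settings this post changes; everything else keeps chart defaults.
write_overrides() {
  cat > "$1" <<'EOF'
prometheus:
  service:
    type: NodePort
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: openebs-hostpath
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 15Gi
grafana:            # values for the bundled Grafana subchart
  service:
    type: LoadBalancer
    port: 80
  persistence:
    enabled: false  # as in the post; the storage class only matters once enabled
    storageClassName: openebs-hostpath
EOF
}

# install_stack FILE
# Install the release into the monitoring namespace, creating it if needed.
install_stack() {
  helm install prometheus prometheus-community/kube-prometheus-stack \
    --version 50.3.1 \
    --namespace monitoring --create-namespace \
    -f "$1"
}

# Usage (needs helm and cluster access):
#   write_overrides override-values.yaml
#   install_stack override-values.yaml
```

Keeping only the changed keys makes later chart upgrades easier to review, because the override file does not freeze a copy of every default.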

1. Prometheus stack - chart description

Chart.yaml - information about the chart
- https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/Chart.yaml
values.yaml - default values for the templates
- https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml

2. Prometheus stack - architecture

https://prometheus.io/docs/introduction/overview/#architecture

[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ kgp
NAME                                                     READY   STATUS    RESTARTS   AGE     IP                NODE        NOMINATED NODE   READINESS GATES
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          4m20s   10.233.104.211    ubun20-01   <none>           <none>
prometheus-grafana-7dd9ccd99b-zwdgv                      3/3     Running   0          4m35s   10.233.109.140    ubun20-02   <none>           <none>
prometheus-kube-prometheus-operator-6bdbffdc8b-5dvqh     1/1     Running   0          4m35s   10.233.104.209    ubun20-01   <none>           <none>
prometheus-kube-state-metrics-648b666689-bjbv6           1/1     Running   0          4m35s   10.233.70.11      ubun20-03   <none>           <none>
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          4m19s   10.233.70.13      ubun20-03   <none>           <none>
prometheus-prometheus-node-exporter-kd7pd                1/1     Running   0          4m35s   192.168.119.131   ubun20-01   <none>           <none>
prometheus-prometheus-node-exporter-rnxx7                1/1     Running   0          4m34s   192.168.119.132   ubun20-02   <none>           <none>
prometheus-prometheus-node-exporter-svsbz                1/1     Running   0          4m34s   192.168.119.133   ubun20-03   <none>           <none>

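alertmanager
- Alerts are generated from predefined alerting rules
- They are delivered to Alertmanager, post-processed, and then sent to the configured channels such as email or Slack
grafana
- Grafana, the visualization layer, renders the metrics as graphs and charts
prometheus-0
- Deployed as a StatefulSet. Monitored pods expose their metrics through an "exporter", typically a separate sidecar-style container
- The Prometheus pod pulls those metrics and stores them in its internal TSDB
node-exporter
- Installed as a DaemonSet, so it is deployed automatically on every monitored node
- Converts each node's resource usage into metrics and exposes them
prometheus-operator
- Provides Custom Resources that simplify tasks such as defining alerting rules and adding application monitoring targets
kube-state-metrics
- A pod that converts k8s state into metrics
- It talks to the k8s API server and turns the state of each object into metrics that Prometheus can scrape.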
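
The controller kinds mentioned above (a StatefulSet for Prometheus, a DaemonSet for node-exporter) can be checked directly on the cluster. A minimal sketch, assuming the monitoring namespace used in this post; the helper name list_stack_workloads is hypothetical.

```shell
#!/usr/bin/env bash
# list_stack_workloads [NAMESPACE]
# List the controllers behind the stack's pods, grouped by kind.
list_stack_workloads() {
  kubectl --namespace "${1:-monitoring}" get statefulsets,daemonsets,deployments
}

# Usage (needs kubectl and cluster access):
#   list_stack_workloads
```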

[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ kgs
NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                         AGE   SELECTOR
alertmanager-operated                     ClusterIP      None            <none>            9093/TCP,9094/TCP,9094/UDP      37m   app.kubernetes.io/name=alertmanager
prometheus-grafana                        LoadBalancer   10.233.33.147   192.168.119.151   80:30173/TCP                    37m   app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=grafana
prometheus-kube-prometheus-alertmanager   ClusterIP      10.233.37.149   <none>            9093/TCP,8080/TCP               37m   alertmanager=prometheus-kube-prometheus-alertmanager,app.kubernetes.io/name=alertmanager
prometheus-kube-prometheus-operator       ClusterIP      10.233.34.231   <none>            443/TCP                         37m   app=kube-prometheus-stack-operator,release=prometheus
prometheus-kube-prometheus-prometheus     NodePort       10.233.31.130   <none>            9090:30090/TCP,8080:31734/TCP   37m   app.kubernetes.io/name=prometheus,operator.prometheus.io/name=prometheus-kube-prometheus-prometheus
prometheus-kube-state-metrics             ClusterIP      10.233.6.21     <none>            8080/TCP                        37m   app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=kube-state-metrics
prometheus-operated                       ClusterIP      None            <none>            9090/TCP                        37m   app.kubernetes.io/name=prometheus
prometheus-prometheus-node-exporter       ClusterIP      10.233.26.92    <none>            9100/TCP                        37m   app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=prometheus-node-exporter

[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ k port-forward svc/prometheus-prometheus-node-exporter 8080:9100
Forwarding from 127.0.0.1:8080 -> 9100
Forwarding from [::1]:8080 -> 9100

[spkr@ubun20-01 ~ (⎈|ubun01:monitoring)]$ curl localhost:8080/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.498e-05
go_gc_duration_seconds{quantile="0.25"} 3.4287e-05
go_gc_duration_seconds{quantile="0.5"} 3.7335e-05
go_gc_duration_seconds{quantile="0.75"} 4.4596e-05
go_gc_duration_seconds{quantile="1"} 0.000162875
go_gc_duration_seconds_sum 0.004808126
go_gc_duration_seconds_count 106
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.6"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 2.559208e+06

Each target exposes its own metrics at the /metrics endpoint of its web server
The Prometheus server fetches them with HTTP GET
Prometheus reaches node exporter through its Service (prometheus-prometheus-node-exporter) on port 9100
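
Because the /metrics payload is plain text, a single value can be picked out with standard tools. The sketch below reads a few lines copied from the node-exporter output above and prints one unlabelled sample. The helper names metric_value and sample are hypothetical, and this is only a quick inspection aid, not how Prometheus itself parses the format.

```shell
#!/usr/bin/env bash
# metric_value NAME
# Read exposition-format text on stdin and print the value of the
# unlabelled sample NAME, skipping "# HELP" / "# TYPE" comment lines.
metric_value() {
  awk -v name="$1" '$1 == name { print $2 }'
}

# Sample copied from the node-exporter output shown above.
sample() {
  cat <<'EOF'
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.6"} 1
EOF
}

sample | metric_value go_goroutines   # prints 8

# Against the live port-forward it would be:
#   curl -s localhost:8080/metrics | metric_value go_goroutines
```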

[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ kgs
NAME                                      TYPE           CLUSTER-IP      EXTERNAL-IP       PORT(S)                         AGE   SELECTOR
alertmanager-operated                     ClusterIP      None            <none>            9093/TCP,9094/TCP,9094/UDP      43m   app.kubernetes.io/name=alertmanager
prometheus-grafana                        LoadBalancer   10.233.33.147   192.168.119.151   80:30173/TCP                    43m   app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=grafana
prometheus-kube-prometheus-alertmanager   ClusterIP      10.233.37.149   <none>            9093/TCP,8080/TCP               43m   alertmanager=prometheus-kube-prometheus-alertmanager,app.kubernetes.io/name=alertmanager
prometheus-kube-prometheus-operator       ClusterIP      10.233.34.231   <none>            443/TCP                         43m   app=kube-prometheus-stack-operator,release=prometheus
prometheus-kube-prometheus-prometheus     NodePort       10.233.31.130   <none>            9090:30090/TCP,8080:31734/TCP   43m   app.kubernetes.io/name=prometheus,operator.prometheus.io/name=prometheus-kube-prometheus-prometheus
prometheus-kube-state-metrics             ClusterIP      10.233.6.21     <none>            8080/TCP                        43m   app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=kube-state-metrics
prometheus-operated                       ClusterIP      None            <none>            9090/TCP                        43m   app.kubernetes.io/name=prometheus
prometheus-prometheus-node-exporter       ClusterIP      10.233.26.92    <none>            9100/TCP                        43m   app.kubernetes.io/instance=prometheus,app.kubernetes.io/name=prometheus-node-exporter
[spkr@ubun20-01 kube-prometheus-stack-50.3.1 (⎈|ubun01:monitoring)]$ kgn -o wide
NAME        STATUS   ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
ubun20-01   Ready    control-plane   26d   v1.25.0   192.168.119.131   <none>        Ubuntu 20.04.5 LTS   5.4.0-156-generic   containerd://1.7.1
ubun20-02   Ready    control-plane   26d   v1.25.0   192.168.119.132   <none>        Ubuntu 20.04.5 LTS   5.4.0-156-generic   containerd://1.7.1
ubun20-03   Ready    control-plane   26d   v1.25.0   192.168.119.133   <none>        Ubuntu 20.04.5 LTS   5.4.0-156-generic   containerd://1.7.

The Prometheus web UI is served on NodePort 30090 of every node, for example http://192.168.119.131:30090.

[Status] → [Configuration]
- /etc/prometheus/config_out/prometheus.env.yaml
- Settings can be added or changed through the Helm values file (my-values.yaml) used at install time

[Status] → [Targets]
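- Every target that Prometheus is currently scraping
- The default Helm chart covers node-exporter, cAdvisor, and overall Kubernetes state (kube-state-metrics)
- It also includes the Kubernetes control-plane components (apiserver, coredns, scheduler, and so on), Grafana, Alertmanager, and more
- When application metrics are added, they appear on this Targets page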

[Graph] → 1 - avg(rate(node_cpu_seconds_total{mode="idle"}[1m]))
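- 1 - (idle CPU fraction)
  - To get overall CPU usage, subtract the idle (mode="idle") fraction from 1
- avg(rate(...[1m]))
  - A duration in [ ] selects the samples in that time window; rate() gives the per-second rate of change over the window, and avg() averages it across the matched series
- node_cpu_seconds_total{mode="idle"}
  - Of the total CPU time recorded per node, keep only the series whose mode label is "idle"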
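
The arithmetic behind this query can be worked through by hand. The sketch below takes the idle counter of each core at the start and end of a window, computes a simplified per-core rate, averages it, and subtracts from 1. The counter values and the helper name cpu_usage are invented, and the real rate() also handles counter resets and extrapolation, which this ignores.

```shell
#!/usr/bin/env bash
# cpu_usage WINDOW_SECONDS
# stdin: one line per CPU core, "<idle counter at start> <idle counter at end>".
# rate() ~ (end - start) / window per core; avg() ~ mean over cores;
# usage = 1 - average idle rate.
cpu_usage() {
  awk -v window="$1" '
    { idle += ($2 - $1) / window; cores++ }
    END { if (cores > 0) printf "%.2f\n", 1 - idle / cores }
  '
}

# Invented counters for two cores sampled 60 seconds apart:
# idle rates are 54/60 = 0.9 and 48/60 = 0.8, so usage is 1 - 0.85.
printf '%s\n' "1000 1054" "2000 2048" | cpu_usage 60   # prints 0.15
```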
