
Monitoring:
The process of continuously observing and analyzing a server's status, performance, and other metrics.
docker-compose.yml
prometheus:
  container_name: prometheus
  image: prom/prometheus
  volumes:
    - ./prometheus:/etc/prometheus
  ports:
    - "9090:9090"
  command:
    - '--config.file=/etc/prometheus/prometheus.yml'
  depends_on:
    - backend
  networks:
    - teamd
grafana:
  container_name: grafana
  image: grafana/grafana
  ports:
    - "3000:3000"
  volumes:
    - ./grafana:/var/lib/grafana
  depends_on:
    - prometheus
    - backend
  networks:
    - teamd
node_exporter:
  container_name: node-exporter
  image: prom/node-exporter
  ports:
    - "9100:9100"
  volumes:
    - "/proc:/host/proc:ro"
    - "/sys:/host/sys:ro"
    - "/:/rootfs:ro"
  command:
    - '--path.procfs=/host/proc'
    - '--path.sysfs=/host/sys'
    - '--collector.filesystem.ignored-mount-points'
    # "$$" escapes a literal "$" so Compose does not treat it as variable interpolation
    - '^/(sys|proc|dev|host|etc)($$|/)'
  networks:
    - teamd
cadvisor:
  image: gcr.io/cadvisor/cadvisor:v0.46.0
  container_name: cadvisor
  volumes:
    - /:/rootfs:ro
    - /var/run:/var/run:rw
    - /var/run/docker.sock:/var/run/docker.sock:rw
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    - /dev/disk/:/dev/disk:ro
  ports:
    - 8080:8080
  networks:
    - teamd
alertmanager:
  container_name: alertmanager
  image: prom/alertmanager
  ports:
    - 9093:9093
  volumes:
    - ./alertmanager/:/etc/alertmanager/
  restart: always
  command:
    - '--config.file=/etc/alertmanager/config.yml'
    - '--storage.path=/alertmanager'
  networks:
    - teamd
This sets up Prometheus, Grafana, and cAdvisor, which are needed for monitoring,
and Alertmanager, which sends notifications to Slack.
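Once the containers are up, a quick way to confirm that each one is reachable is to request a known HTTP path on its published port. The sketch below is only a rough helper: the host ports come from the compose file above, while the health paths and the `localhost` host are assumptions based on the endpoints these tools usually expose, so adjust them if your versions differ.

```python
import urllib.request

# Host ports are taken from the compose file; the paths are assumed
# (commonly documented health or metrics endpoints for each tool).
SERVICES = {
    "prometheus": (9090, "/-/healthy"),
    "grafana": (3000, "/api/health"),
    "node-exporter": (9100, "/metrics"),
    "cadvisor": (8080, "/healthz"),
    "alertmanager": (9093, "/-/healthy"),
}


def health_urls(host="localhost"):
    """Build one URL per service from the published ports."""
    return {
        name: f"http://{host}:{port}{path}"
        for name, (port, path) in SERVICES.items()
    }


def check(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Calling `check(url)` for every entry of `health_urls()` gives a simple up/down overview before opening the Grafana or Prometheus UI.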
urls.py
from django.urls import include, path

urlpatterns = [
    # Exposes the Prometheus metrics endpoint at /metrics
    path("", include('django_prometheus.urls')),
]
At first I had not registered this URL, so requests to localhost:8080/metrics kept failing with a Not Found error over and over. Make sure to register the URL.
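With the URL registered, the backend answers /metrics in the Prometheus text exposition format. As a rough illustration of what Prometheus reads from that endpoint, the sketch below parses such text into a dictionary. The metric names in `SAMPLE` are hypothetical, and the parser assumes samples carry no trailing timestamp.

```python
import urllib.request

# Hypothetical sample output; real django_prometheus metric names will differ.
SAMPLE = """\
# HELP demo_requests_total Hypothetical counter, for illustration only.
# TYPE demo_requests_total counter
demo_requests_total{method="GET"} 3.0
demo_up 1
"""


def parse_metrics(text):
    """Map each series (name plus labels) to its float value.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Assumes no timestamp follows the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split from the right so spaces inside label values stay intact.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


def fetch_metrics(url, timeout=5.0):
    """Download and parse a metrics page, e.g. the backend's /metrics URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

If `fetch_metrics` raises an HTTP 404 error for the backend address, the URL include is most likely still missing.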
prometheus/prometheus.yml
global:
  scrape_interval: 15s
  scrape_timeout: 15s
rule_files:
  - "alert.yml"
alerting:
  alertmanagers:
    - scheme: http
      static_configs:
        - targets:
            - "alertmanager:9093"
scrape_configs:
  - job_name: 'backend'
    static_configs:
      - targets: ['backend:8000']
  - job_name: 'prometheus'
    static_configs:
      - targets: ['prometheus:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node_exporter:9100']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
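For every scrape target, Prometheus records an `up` series that is 1 when the last scrape succeeded and 0 when it failed. The sketch below shows one way to list failed targets through the Prometheus HTTP query API. The base URL `http://localhost:9090` is an assumption taken from the published port in the compose file, and the helper names are made up for this example.

```python
import json
import urllib.parse
import urllib.request


def down_targets(response):
    """Return (job, instance) pairs whose `up` sample is 0.

    `response` is the decoded JSON body of an instant query for `up`.
    """
    down = []
    for item in response["data"]["result"]:
        _timestamp, value = item["value"]  # value arrives as a string
        if float(value) == 0:
            labels = item["metric"]
            down.append((labels.get("job"), labels.get("instance")))
    return down


def query_up(base_url="http://localhost:9090", timeout=5.0):
    """Run the instant query `up` against /api/v1/query (base URL assumed)."""
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

`down_targets(query_up())` should return an empty list while all four jobs are being scraped successfully.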
prometheus/alert.yml
groups:
  - name: alert.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: "critical"
        annotations:
          summary: "Endpoint {{ $labels.instance }}"
          identifier: "{{ $labels.instance }}"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
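The `for: 1m` clause means the alert does not fire on the first failed check. It stays pending until `up == 0` has held for a full minute, and it resets if the target recovers in between. The sketch below simulates that behaviour for a single target. The evaluation timestamps are purely illustrative, and the function is a simplified model rather than Prometheus's actual implementation.

```python
def alert_states(samples, hold_seconds=60):
    """Simulate the state of an `up == 0` rule with a `for` duration.

    samples: list of (timestamp_seconds, up_value), one per rule evaluation.
    Returns "inactive", "pending", or "firing" for each evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for ts, up in samples:
        if up == 0:
            if active_since is None:
                active_since = ts
            held = ts - active_since
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            # Recovery resets the timer, so a later outage starts from zero.
            active_since = None
            states.append("inactive")
    return states


# Illustrative evaluations every 15 seconds: the target goes down at t=15.
timeline = [(0, 1), (15, 0), (30, 0), (45, 0), (60, 0), (75, 0), (90, 1)]
print(alert_states(timeline))
```

In this timeline the alert only reaches the firing state at t=75, once the condition has held for 60 seconds, which is why a short blip does not produce a Slack message.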
alertmanager/config.yml
global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/****'
route:
  receiver: 'slack-notifications'
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - send_resolved: true
To send notifications to Slack, set up an Incoming Webhook in Slack and put its URL in slack_api_url.
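Before relying on Alertmanager, it can help to confirm that the webhook itself works. Slack Incoming Webhooks accept a JSON body with a `text` field sent by HTTP POST. The sketch below builds and sends such a request with the standard library. The function names are hypothetical, and the webhook URL must be your own (the one in the config above is masked).

```python
import json
import urllib.request


def build_request(webhook_url, text):
    """Create a POST request carrying a minimal Slack message payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(webhook_url, text, timeout=5.0):
    """Send the message; returns True if Slack answers with HTTP 200."""
    request = build_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status == 200
```

If `send(...)` delivers a test message to the channel, any remaining delivery problem is more likely in the Alertmanager configuration than in the webhook.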
InstanceDown Slack alert raised when node-exporter was stopped

Grafana dashboard
