First written 24.01.28
Node exporter: a tool that helps Prometheus collect system metrics.
Create a Vagrantfile, then run vagrant up followed by vagrant ssh prometheus to connect to the virtual machine.
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/focal64"
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 1024
  end
  if Vagrant.has_plugin?("vagrant-vbguest")
    config.vbguest.auto_update = false
  end
  config.vm.synced_folder ".", "/vagrant", type: "rsync", rsync__exclude: [".git/"]
  config.vm.provision "shell", inline: <<-SHELL
    export DEBIAN_FRONTEND=noninteractive
    sudo apt -y update
    sudo apt install -y ca-certificates curl gnupg libnss-mdns
    sudo install -m 0755 -d /etc/apt/keyrings
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
      sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    sudo chmod a+r /etc/apt/keyrings/docker.gpg
    echo "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] \
      https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt -y update
    sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
    sudo usermod -aG docker vagrant
  SHELL
  config.vm.define "prometheus", primary: true do |prometheus|
    prometheus.vm.hostname = "prometheus.local"
    prometheus.vm.network "private_network", ip: "192.168.56.200"
  end
end
Create a docker-compose.yml file that runs the Prometheus, Node exporter, and Grafana services. (Prometheus uses port 9090 and Grafana uses port 3000; port forwarding is required.)
version: '3.2'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
    ports:
      - 9090:9090
  nodeexporter:
    image: prom/node-exporter:latest
    # Read-only mounts that let the container read these host paths
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
      - ./textfile_collector:/etc/node_exporter/textfile_collector
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      # Newer node_exporter releases name this flag --collector.filesystem.mount-points-exclude;
      # switch to it if the container rejects the flag below.
      - '--collector.filesystem.ignored-mount-points="^/(sys|proc|dev|host|etc)($$|/)"'
      - '--collector.textfile.directory=/etc/node_exporter/textfile_collector'
  grafana:
    image: grafana/grafana:latest
    ports:
      - 3000:3000
Create prometheus.yml, then start everything with docker compose up.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nodeexporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['nodeexporter:9100']
Open Prometheus and confirm it works: enter 100 * (1 - ((node_memory_MemFree_bytes + node_memory_Cached_bytes + node_memory_Buffers_bytes) / node_memory_MemTotal_bytes)) in the Expression field.
Open Grafana and confirm it works (the default login is admin/admin).
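The memory expression above can be checked by hand. The sketch below reproduces the same arithmetic in Python. The function name and the sample byte counts are illustrative assumptions, not values read from a real host.

```python
def memory_used_percent(mem_total, mem_free, cached, buffers):
    """Mirror the PromQL expression:
    100 * (1 - ((MemFree + Cached + Buffers) / MemTotal))
    """
    if mem_total <= 0:
        raise ValueError("mem_total must be positive")
    reclaimable = mem_free + cached + buffers
    return 100 * (1 - reclaimable / mem_total)


# Hypothetical sample: a 1024 MiB VM with 256 MiB free, 192 MiB cached, 64 MiB buffers.
MIB = 1024 * 1024
usage = memory_used_percent(1024 * MIB, 256 * MIB, 192 * MIB, 64 * MIB)
print(f"{usage:.1f}% used")  # 50.0% used
```

Cached and buffer memory count as available here because the kernel can reclaim them, which is why the expression subtracts them along with free memory.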

In Slack, search for webhook in the Apps menu and install Incoming Webhooks in the channel.
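Before wiring the webhook into Alertmanager, it can help to confirm that the URL accepts messages. The following is a minimal sketch using only the Python standard library; the URL is a placeholder, and the helper names are made up for this example. Slack Incoming Webhooks accept a JSON body with a text field.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack Incoming Webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder URL: replace it with the one Slack shows after installing the app.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"
request = build_slack_request(WEBHOOK_URL, "Test message from the monitoring host")
print(request.get_method(), request.full_url)
```

Calling send_slack_message(WEBHOOK_URL, "hello") with a real URL should post the text to the channel the webhook was installed in.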
Create alertmanager.yml and alert.rules.yml in the same directory as prometheus.yml.
# alertmanager.yml
route:
  receiver: 'slack-notifications'
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - send_resolved: true
        username: 'msh123' # user name shown on the message
        channel: '#project' # channel the message is posted to
        # URL of the Incoming Webhook
        api_url: 'https://hooks.slack.com/services/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'
        text: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{end}}"

# alert.rules.yml
groups:
  - name: example
    rules:
      - alert: HighCPUUsage # name of the alert
        # CPU usage threshold
        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[1m])) * 100) > 80
        for: 1m
        annotations:
          summary: "High CPU usage" # message text
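The alert expression can be read as "100 minus the average idle rate across cores, as a percentage". The sketch below walks through that arithmetic in Python with made-up counter samples; it is an illustration of the formula, not how Prometheus evaluates it internally, and it ignores counter resets.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, value) samples of a counter."""
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_samples_per_core):
    """Mirror 100 - (avg by (instance) (irate(idle[1m])) * 100) for one instance."""
    rates = [irate(samples) for samples in idle_samples_per_core]
    return 100 - (sum(rates) / len(rates)) * 100


# Hypothetical idle-counter samples: (time in seconds, cumulative idle seconds).
idle = [
    [(0, 1000.0), (5, 1000.5)],  # core 0 was idle for 0.5 s out of 5 s
    [(0, 2000.0), (5, 2001.0)],  # core 1 was idle for 1.0 s out of 5 s
]
usage = cpu_usage_percent(idle)
print(f"{usage:.1f}% busy, above threshold: {usage > 80}")
```

With for: 1m in the rule, the condition has to stay true for a full minute before the alert fires, so a single spike like this one would not notify Slack on its own.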
Update prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'nodeexporter'
    scrape_interval: 5s
    static_configs:
      - targets: ['nodeexporter:9100']
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
rule_files:
  - "alert.rules.yml"
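Add the following to docker-compose.yml. If the prometheus service does not already mount alert.rules.yml into /etc/prometheus, add that volume too, because rule_files is resolved relative to the configuration file.
...
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      --config.file=/etc/alertmanager/alertmanager.yml
...
volumes:
  alertmanager-data: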

In the Grafana menu, select Connections > Add new connection and search for Prometheus.
Under Add new data source, enter the URL of the server running Prometheus in the Connection field.
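The same data source can also be described as a JSON document, which is the form Grafana's HTTP API (POST /api/datasources) accepts. The sketch below only builds that document. The URL http://prometheus:9090 is an assumption based on the Compose service name used earlier, and the helper name is made up for this example.

```python
import json


def prometheus_datasource_payload(prometheus_url, name="Prometheus"):
    """Return a JSON-serializable body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend makes the requests
        "isDefault": True,
    }


# Assumed URL: inside the Compose network, Grafana reaches Prometheus by service name.
payload = prometheus_datasource_payload("http://prometheus:9090")
print(json.dumps(payload, indent=2))
```

When Grafana runs in a container, localhost refers to the Grafana container itself, so the service name or the VM's IP address is the safer value for the URL field.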



Select + > Import dashboard.
Under Find and import dashboards ~, choose the template you want (1860).
Select Prometheus as the data source and create the dashboard.



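Add a second machine definition to the Vagrantfile, then create the virtual machine and connect to it with vagrant up nodex followed by vagrant ssh nodex.
...
  config.vm.define "nodex", primary: true do |nodex|
    nodex.vm.hostname = "nodex.local"
    nodex.vm.network "private_network", ip: "192.168.56.201"
  end
end
Start Node exporter on the new virtual machine:
docker run -d --net="host" --pid="host" -v "/:/host:ro,rslave" prom/node-exporter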
Add the following to prometheus.yml:
...
  - job_name: 'remote-node-exporter'
    static_configs:
      - targets: ['192.168.56.201:9100']
...
Restart Prometheus, then check the new target under Job in the Grafana dashboard.
docker compose restart prometheus
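Besides the Grafana dashboard, the new job can be confirmed from Prometheus itself, which lists its scrape targets at /api/v1/targets. The sketch below summarizes such a response per job. The sample payload is hand-written in the shape of that API so the code runs offline, and the helper names are assumptions.

```python
import json
import urllib.request


def summarize_targets(payload):
    """Map each job name to a list of (instance, health) pairs."""
    summary = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        summary.setdefault(labels["job"], []).append(
            (labels["instance"], target["health"])
        )
    return summary


def fetch_targets(base_url, timeout=10):
    """Download the target list from a running Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=timeout) as resp:
        return json.load(resp)


# Hand-written sample; a live call would be fetch_targets("http://192.168.56.200:9090").
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "nodeexporter", "instance": "nodeexporter:9100"},
             "health": "up"},
            {"labels": {"job": "remote-node-exporter", "instance": "192.168.56.201:9100"},
             "health": "down"},
        ]
    },
}
for job, targets in summarize_targets(sample).items():
    print(job, targets)
```

A target that stays "down" usually points to a firewall, a wrong IP address, or an exporter that is not running on the remote machine.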

Add the following to docker-compose.yml, then start it with docker compose up -d cadvisor.
...
  cadvisor:
    image: zcube/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - 8080:8080
...
Create a dashboard for cAdvisor the same way as the Prometheus dashboard (14282).
On the virtual machine running Node exporter, run the following command:
docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --publish=8080:8080 --detach=true --name=cadvisor google/cadvisor:latest
Add the following to prometheus.yml, then restart with docker compose restart prometheus.
...
  - job_name: 'remote-cAdvisor'
    scrape_interval: 5s
    static_configs:
      - targets: ['192.168.56.201:8080']
...

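Write app.py:
import io

import torch
from flask import Flask, request
from PIL import Image
from torchvision import models, transforms

app = Flask(__name__)

# Load a pretrained MobileNetV2 once at startup and switch it to inference mode.
model = models.mobilenet_v2(pretrained=True)
model.eval()

# Standard ImageNet preprocessing: resize, convert to a tensor, normalize.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])


@app.route('/predict', methods=['POST'])
def predict():
    # The image arrives as a multipart form upload in the "file" field.
    file = request.files['file']
    image_bytes = file.read()
    # Force three channels so grayscale or RGBA uploads match the normalization.
    image = Image.open(io.BytesIO(image_bytes)).convert('RGB')
    batch = transform(image).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(batch)
    _, predicted = torch.max(outputs, 1)
    # Return the predicted ImageNet class index as text.
    return str(predicted.item())


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)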
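The torch.max(..., 1) call in predict() returns the largest score in each row together with its position, and the position is the class index the endpoint sends back. The following pure-Python sketch shows the same idea without PyTorch; the function name and the scores are invented for illustration.

```python
def max_per_row(rows):
    """Return (max_values, max_indices) for each row of a 2-D list of scores."""
    values, indices = [], []
    for row in rows:
        best = max(range(len(row)), key=lambda i: row[i])
        values.append(row[best])
        indices.append(best)
    return values, indices


# One "batch" holding a single row of three made-up class scores.
scores = [[0.1, 2.5, -1.0]]
_, predicted = max_per_row(scores)
print(predicted[0])  # 1
```

In the real service the row has 1000 entries, one per ImageNet class, so the response is an integer from 0 to 999 and not a class name.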
Write requirements.txt:
flask
torch
torchvision
pillow
Write the Dockerfile:
FROM python:3.8-slim-buster
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py ./
CMD ["python", "./app.py"]
Build the image with docker build -t imageclassification:latest .
Run docker swarm init, then verify the node with docker node ls.
docker service create --replicas 2 --name my_service --publish published=5000,target=5000 imageclassification:latest
Create run.sh:
#!/bin/bash
# Collect the test images, pick one at random, and post it to the service.
IMAGES=(./test_image/*.jpg)
RANDOM_IMAGE="${IMAGES[RANDOM % ${#IMAGES[@]}]}"
curl -X POST -F "file=@${RANDOM_IMAGE}" http://<IP address>:5000/predict
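The same request can be sent from Python when curl is not available. The sketch below is a rough equivalent of run.sh using only the standard library: it picks a random .jpg and encodes it as a multipart upload in the file field. All function names are assumptions made for this example, and the demo at the end uses fake bytes so that it runs without a server.

```python
import random
import urllib.request
import uuid
from pathlib import Path


def pick_random_image(directory, pattern="*.jpg"):
    """Return one randomly chosen file matching the pattern."""
    images = sorted(Path(directory).glob(pattern))
    if not images:
        raise FileNotFoundError(f"no {pattern} files in {directory}")
    return random.choice(images)


def build_multipart(field, filename, data, boundary=None):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(url, image_path, timeout=30):
    """POST the image to the /predict endpoint and return the response text."""
    body, content_type = build_multipart("file", image_path.name, image_path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Offline demo with fake image bytes and a fixed boundary.
body, content_type = build_multipart("file", "cat.jpg", b"abc", boundary="demo")
print(content_type)
```

Against a running service, predict("http://<IP address>:5000/predict", pick_random_image("./test_image")) would return the class index as a string, with the placeholder replaced by the address of a swarm node.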