



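Role:
The Scheduler is the "brain" of the P2P network. When a client (dfdaemon) requests an image, it decides which peers (dfdaemon instances on other nodes) should serve each data chunk and assigns the best download path.
Operational considerations:
It determines the performance and stability of the entire P2P network, so run at least two Pods for high availability (HA).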
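Role:
The Seed Peer is the "first supplier" of the P2P network. When an image that no node in the cluster holds yet is requested, the Seed Peer is the first to download it from the origin registry (nexus.com) and "seeds" it into the P2P network.
Operational considerations:
It communicates with the origin registry frequently, so place it where network connectivity is stable. It typically runs as 1 to 2 Pods.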
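Role:
dfdaemon is the "local proxy" deployed on every worker node. containerd is configured to pull images through the dfdaemon on its own node (usually 127.0.0.1:65001) rather than directly from the origin registry.
dfdaemon obtains a peer list from the Scheduler, downloads data chunks from several nodes in parallel, and hands the assembled image to containerd.
Operational considerations:
It must run on every node, so it is deployed as a DaemonSet. Setting hostNetwork: true is recommended so that it uses the node's network directly.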
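To make the redirection concrete, the sketch below prints the kind of containerd hosts.toml that sends pulls for one registry to the node-local dfdaemon proxy and keeps the origin registry as the fallback server. It assumes containerd's certs.d host-configuration layout. The function name render_hosts_toml and its arguments are hypothetical, and the file that dfinit actually writes may carry extra fields (such as request headers).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a containerd hosts.toml that routes pulls for
# one registry through the node-local dfdaemon proxy. The upstream registry
# stays configured as the default server.
render_hosts_toml() {
  local upstream="$1"   # e.g. https://nexus.com
  local proxy="$2"      # e.g. http://127.0.0.1:65001
  cat <<EOF
server = "${upstream}"

[host."${proxy}"]
  capabilities = ["pull", "resolve"]
EOF
}

# Example: preview the file for nexus.com (nothing is written to disk).
render_hosts_toml "https://nexus.com" "http://127.0.0.1:65001"
```

On a node, such a file would normally live under /etc/containerd/certs.d/nexus.com/hosts.toml, the same directory tree the values file mounts into the client Pods.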
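Role:
The Manager is the "operations and monitoring hub" of the Dragonfly cluster. Its web UI lets you monitor and manage P2P transfer status, traffic statistics, and peer state in real time.
Operational considerations:
Installing it is strongly recommended for ease of operation. As with the Scheduler, run at least two Pods for high availability.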
[Kubernetes Cluster - 100 Nodes]
├── Dragonfly
│   ├── Manager (3 replicas) → External MySQL
│   ├── Scheduler (3 replicas) → External Redis
│   ├── Seed Peer (5 replicas)
│   └── Client (DaemonSet, 100 Pods)
├── External Services
│   ├── MySQL (StatefulSet)
│   └── Redis (StatefulSet)
└── Monitoring
    ├── Prometheus
    └── Grafana
# 1. Create the namespace
kubectl create namespace dragonfly-infra
# 2. Deploy MySQL
kubectl apply -f mysql-deployment.yaml
# 3. Deploy Redis
kubectl apply -f redis-deployment.yaml
# 4. Check the rollout
kubectl get pods -n dragonfly-infra -w
# Expected output:
# NAME READY STATUS RESTARTS AGE
# mysql-0 1/1 Running 0 2m
# redis-0 1/1 Running 0 2m
# mysql-exporter-* 1/1 Running 0 2m
# redis-exporter-* 1/1 Running 0 2m
# 5. Test the MySQL connection
kubectl exec -it mysql-0 -n dragonfly-infra -- mysql -udragonfly -pDragonflyPassword123! dragonfly -e "SELECT 1;"
# 6. Test the Redis connection
kubectl exec -it redis-0 -n dragonfly-infra -- redis-cli -a RedisPassword123! ping
# Output: PONG
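Right after deployment the databases may need a few seconds before they accept connections, so a one-shot connection test can fail spuriously. The following is a minimal sketch of a retry wrapper for such checks. The function name retry and its argument order are hypothetical and not part of the original scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Example (assumes the pod name and password used in the steps above):
# retry 10 5 kubectl exec redis-0 -n dragonfly-infra -- redis-cli -a 'RedisPassword123!' ping
```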
# ==========================================
# Dragonfly Helm Chart v1.4.15
# App Version: 2.3.3
# Kubernetes: v1.33.3
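# Production air-gapped environment (all monitoring enabled)
# ==========================================
# ==========================================
# Global settings
# ==========================================
global:
# Private registry for the air-gapped network
# Note: if the chart prepends imageRegistry to each image.repository, the nexus.com/
# prefix in the repository values below would be duplicated; check the rendered
# image names with "helm template" before installing.
imageRegistry: "nexus.com"
imagePullSecrets: []
# ==========================================
# Manager (central management)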
# ==========================================
manager:
enable: true
replicas: 3
image:
repository: nexus.com/dragonflyoss/manager
tag: v2.3.3
pullPolicy: IfNotPresent
# Resources
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
# Service
service:
type: ClusterIP
ports:
- name: http
port: 8080
targetPort: 8080
- name: grpc
port: 65003
targetPort: 65003
# Enable metrics (monitoring)
metrics:
enable: true
port: 8000
path: /metrics
serviceMonitor:
enable: true
interval: 30s
scrapeTimeout: 10s
labels:
release: prometheus
# High availability
podDisruptionBudget:
minAvailable: 2
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-manager
topologyKey: kubernetes.io/hostname
# ==========================================
# Scheduler
# ==========================================
scheduler:
enable: true
replicas: 3
image:
repository: nexus.com/dragonflyoss/scheduler
tag: v2.3.3
pullPolicy: IfNotPresent
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
service:
type: ClusterIP
ports:
- name: http
port: 8002
targetPort: 8002
# Enable metrics
metrics:
enable: true
port: 8000
path: /metrics
serviceMonitor:
enable: true
interval: 30s
scrapeTimeout: 10s
labels:
release: prometheus
# Scheduler settings
config:
scheduler:
algorithm: default
backSourceCount: 3
filterParentLimit: 40
manager:
schedulerClusterID: 1
podDisruptionBudget:
minAvailable: 2
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-scheduler
topologyKey: kubernetes.io/hostname
# ==========================================
# Seed Peer
# ==========================================
seedClient:
enable: true
replicas: 5
image:
repository: nexus.com/dragonflyoss/client
tag: v0.1.118
pullPolicy: IfNotPresent
resources:
requests:
cpu: 2000m
memory: 4Gi
limits:
cpu: 4000m
memory: 8Gi
# Persistence is required
persistence:
enable: true
size: 200Gi
storageClass: "local-path"
accessModes:
- ReadWriteOnce
service:
type: ClusterIP
# Enable metrics
metrics:
enable: true
port: 8000
path: /metrics
serviceMonitor:
enable: true
interval: 30s
scrapeTimeout: 10s
labels:
release: prometheus
config:
seedPeer:
enable: true
type: "super"
clusterID: 1
proxy:
registryMirror:
addr: https://nexus.com
disableBackToSource: false
security:
insecure: false
cacert: "/etc/containerd/certs.d/nexus.com/ca.crt"
cert: "/etc/containerd/certs.d/nexus.com/client.crt"
key: "/etc/containerd/certs.d/nexus.com/client.key"
download:
concurrentPieceCount: 16
pieceDownloadTimeout: 60s
rateLimit: 0
upload:
rateLimit: 0
maxConcurrency: 200
storage:
dir: /var/lib/dragonfly
taskExpireTime: 24h
diskGCThreshold: 85
diskGCInterval: 30s
writeBufferSize: 16777216
readBufferSize: 16777216
volumeMounts:
- name: containerd-certs
mountPath: /etc/containerd/certs.d
readOnly: true
volumes:
- name: containerd-certs
hostPath:
path: /etc/containerd/certs.d
type: Directory
podDisruptionBudget:
minAvailable: 3
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-seed-client
topologyKey: kubernetes.io/hostname
# ==========================================
# Client (DaemonSet)
# ==========================================
client:
enable: true
image:
repository: nexus.com/dragonflyoss/client
tag: v0.1.118
pullPolicy: IfNotPresent
resources:
requests:
cpu: 1000m
memory: 1Gi
limits:
cpu: 2000m
memory: 2Gi
persistence:
enable: true
size: 30Gi
storageClass: "local-path"
accessModes:
- ReadWriteOnce
# Enable metrics
metrics:
enable: true
port: 8000
path: /metrics
serviceMonitor:
enable: true
interval: 30s
scrapeTimeout: 10s
labels:
release: prometheus
config:
proxy:
registryMirror:
addr: https://nexus.com
listenAddress: "0.0.0.0:65001"
disableBackToSource: false
security:
insecure: false
cacert: "/etc/containerd/certs.d/nexus.com/ca.crt"
cert: "/etc/containerd/certs.d/nexus.com/client.crt"
key: "/etc/containerd/certs.d/nexus.com/client.key"
# Support for multiple registries
proxies:
- regx: "nexus.com/*"
useHTTPS: true
direct: true
- regx: "docker.io/*"
useHTTPS: true
direct: true
- regx: "gcr.io/*"
useHTTPS: true
direct: true
- regx: "ghcr.io/*"
useHTTPS: true
direct: true
- regx: "k8s.gcr.io/*"
useHTTPS: true
direct: true
- regx: "quay.io/*"
useHTTPS: true
direct: true
- regx: "registry.k8s.io/*"
useHTTPS: true
direct: true
download:
concurrentPieceCount: 10
pieceDownloadTimeout: 30s
downloadTimeout: 10m
downloadRetryCount: 3
downloadRetryBackoff: 1s
storage:
dir: /var/lib/dragonfly
taskExpireTime: 6h
diskGCThreshold: 90
diskGCInterval: 15s
writeBufferSize: 8388608
readBufferSize: 8388608
volumeMounts:
- name: containerd-certs
mountPath: /etc/containerd/certs.d
readOnly: true
volumes:
- name: containerd-certs
hostPath:
path: /etc/containerd/certs.d
type: Directory
enableHost: true
# ==========================================
# dfinit (automatic containerd configuration)
# ==========================================
dfinit:
enable: true
restartContainerRuntime: true
image:
repository: nexus.com/dragonflyoss/dfinit
tag: v0.1.118
pullPolicy: IfNotPresent
config:
containerRuntime:
containerd:
configPath: /etc/containerd/config.toml
registries:
- hostNamespace: nexus.com
serverAddr: https://nexus.com
capabilities: ['pull', 'resolve']
- hostNamespace: docker.io
serverAddr: https://registry-1.docker.io
capabilities: ['pull', 'resolve']
- hostNamespace: gcr.io
serverAddr: https://gcr.io
capabilities: ['pull', 'resolve']
- hostNamespace: ghcr.io
serverAddr: https://ghcr.io
capabilities: ['pull', 'resolve']
- hostNamespace: k8s.gcr.io
serverAddr: https://k8s.gcr.io
capabilities: ['pull', 'resolve']
- hostNamespace: quay.io
serverAddr: https://quay.io
capabilities: ['pull', 'resolve']
- hostNamespace: registry.k8s.io
serverAddr: https://registry.k8s.io
capabilities: ['pull', 'resolve']
# ==========================================
# External MySQL connection
# ==========================================
mysql:
enable: false
externalMysql:
migrate: true
host: mysql.dragonfly-infra.svc.cluster.local
port: 3306
username: dragonfly
password: "DragonflyPassword123!"
database: dragonfly
maxOpenConns: 200
maxIdleConns: 50
connMaxLifetime: 3600
# ==========================================
# External Redis connection
# ==========================================
redis:
enable: false
externalRedis:
addrs:
- redis.dragonfly-infra.svc.cluster.local:6379
password: "RedisPassword123!"
db: 0
brokerDB: 1
backendDB: 2
# ==========================================
# Security
# ==========================================
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
#!/bin/bash
# ==========================================
# Dragonfly install script (variable-driven)
# ==========================================
set -e
# ==========================================
# Configuration variables (edit only this section)
# ==========================================
# Basic settings
NAMESPACE="dragonfly-system"
INFRA_NAMESPACE="dragonfly-infra"
RELEASE_NAME="dragonfly"
CHART_VERSION="1.4.15"
CHART_PATH="./dragonfly-1.4.15.tgz"
# Image settings
IMAGE_REGISTRY="nexus.com"
MANAGER_IMAGE="${IMAGE_REGISTRY}/dragonflyoss/manager"
SCHEDULER_IMAGE="${IMAGE_REGISTRY}/dragonflyoss/scheduler"
CLIENT_IMAGE="${IMAGE_REGISTRY}/dragonflyoss/client"
DFINIT_IMAGE="${IMAGE_REGISTRY}/dragonflyoss/dfinit"
IMAGE_TAG_MANAGER="v2.3.3"
IMAGE_TAG_SCHEDULER="v2.3.3"
IMAGE_TAG_CLIENT="v0.1.118"
IMAGE_TAG_DFINIT="v0.1.118"
# Storage settings
STORAGE_CLASS="local-path"
SEED_PEER_STORAGE_SIZE="200Gi"
CLIENT_STORAGE_SIZE="30Gi"
# Resource settings
MANAGER_REPLICAS=3
SCHEDULER_REPLICAS=3
SEED_PEER_REPLICAS=5
# Manager resources
MANAGER_CPU_REQUEST="1000m"
MANAGER_CPU_LIMIT="2000m"
MANAGER_MEM_REQUEST="2Gi"
MANAGER_MEM_LIMIT="4Gi"
# Scheduler resources
SCHEDULER_CPU_REQUEST="1000m"
SCHEDULER_CPU_LIMIT="2000m"
SCHEDULER_MEM_REQUEST="2Gi"
SCHEDULER_MEM_LIMIT="4Gi"
# Seed Peer resources
SEED_CPU_REQUEST="2000m"
SEED_CPU_LIMIT="4000m"
SEED_MEM_REQUEST="4Gi"
SEED_MEM_LIMIT="8Gi"
# Client resources
CLIENT_CPU_REQUEST="1000m"
CLIENT_CPU_LIMIT="2000m"
CLIENT_MEM_REQUEST="1Gi"
CLIENT_MEM_LIMIT="2Gi"
# MySQL settings
MYSQL_HOST="mysql.${INFRA_NAMESPACE}.svc.cluster.local"
MYSQL_PORT="3306"
MYSQL_USERNAME="dragonfly"
MYSQL_PASSWORD="DragonflyPassword123!"
MYSQL_DATABASE="dragonfly"
# Redis settings
REDIS_HOST="redis.${INFRA_NAMESPACE}.svc.cluster.local"
REDIS_PORT="6379"
REDIS_PASSWORD="RedisPassword123!"
REDIS_DB="0"
REDIS_BROKER_DB="1"
REDIS_BACKEND_DB="2"
# Registry settings
REGISTRY_ADDR="https://nexus.com"
REGISTRY_CERT_PATH="/etc/containerd/certs.d/nexus.com"
# containerd settings
CONTAINERD_CONFIG_PATH="/etc/containerd/config.toml"
CONTAINERD_CERTS_PATH="/etc/containerd/certs.d"
# Monitoring
ENABLE_METRICS="true"
METRICS_PORT="8000"
METRICS_PATH="/metrics"
SERVICE_MONITOR_ENABLED="true"
SERVICE_MONITOR_INTERVAL="30s"
PROMETHEUS_LABEL="prometheus"
# dfinit settings
DFINIT_ENABLED="true"
DFINIT_RESTART_CONTAINERD="true"
# ==========================================
# Function definitions
# ==========================================
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
}
check_prerequisites() {
log "Checking prerequisites..."
# Check for kubectl
if ! command -v kubectl &> /dev/null; then
log "ERROR: kubectl not found"
exit 1
fi
# Check for helm
if ! command -v helm &> /dev/null; then
log "ERROR: helm not found"
exit 1
fi
# Check that the StorageClass exists (fatal, so it is logged as an error)
if ! kubectl get storageclass "${STORAGE_CLASS}" &> /dev/null; then
log "ERROR: StorageClass '${STORAGE_CLASS}' not found"
log "Please create StorageClass or update STORAGE_CLASS variable"
exit 1
fi
log "Prerequisites check passed"
}
create_namespace() {
log "Creating namespaces..."
kubectl create namespace "${INFRA_NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace "${NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f -
log "Namespaces created"
}
deploy_mysql() {
log "Deploying MySQL..."
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mysql-config
namespace: ${INFRA_NAMESPACE}
data:
my.cnf: |
[mysqld]
default-storage-engine=INNODB
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
max_connections=500
max_allowed_packet=256M
innodb_buffer_pool_size=2G
innodb_log_file_size=512M
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
---
apiVersion: v1
kind: Secret
metadata:
name: mysql-secret
namespace: ${INFRA_NAMESPACE}
type: Opaque
stringData:
MYSQL_ROOT_PASSWORD: "RootPassword123!"
MYSQL_PASSWORD: "${MYSQL_PASSWORD}"
---
apiVersion: v1
kind: Service
metadata:
name: mysql
namespace: ${INFRA_NAMESPACE}
spec:
type: ClusterIP
ports:
- port: ${MYSQL_PORT}
targetPort: ${MYSQL_PORT}
selector:
app: mysql
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: mysql-pvc
namespace: ${INFRA_NAMESPACE}
spec:
accessModes:
- ReadWriteOnce
storageClassName: ${STORAGE_CLASS}
resources:
requests:
storage: 50Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: ${INFRA_NAMESPACE}
spec:
serviceName: mysql
replicas: 1
selector:
matchLabels:
app: mysql
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: ${IMAGE_REGISTRY}/mysql:8.0
ports:
- containerPort: ${MYSQL_PORT}
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: MYSQL_ROOT_PASSWORD
- name: MYSQL_DATABASE
value: ${MYSQL_DATABASE}
- name: MYSQL_USER
value: ${MYSQL_USERNAME}
- name: MYSQL_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: MYSQL_PASSWORD
volumeMounts:
- name: mysql-data
mountPath: /var/lib/mysql
- name: mysql-config
mountPath: /etc/mysql/conf.d
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
volumes:
- name: mysql-data
persistentVolumeClaim:
claimName: mysql-pvc
- name: mysql-config
configMap:
name: mysql-config
EOF
log "Waiting for MySQL to be ready..."
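# rollout status also covers the moment before the Pod object exists,
# where "kubectl wait ... pod -l" can fail with "no matching resources found".
kubectl rollout status statefulset/mysql -n "${INFRA_NAMESPACE}" --timeout=300s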
log "MySQL deployed successfully"
}
deploy_redis() {
log "Deploying Redis..."
cat <<EOF | kubectl apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-config
namespace: ${INFRA_NAMESPACE}
data:
redis.conf: |
bind 0.0.0.0
protected-mode no
port ${REDIS_PORT}
requirepass "${REDIS_PASSWORD}"
maxmemory 2gb
maxmemory-policy allkeys-lru
appendonly yes
---
apiVersion: v1
kind: Service
metadata:
name: redis
namespace: ${INFRA_NAMESPACE}
spec:
type: ClusterIP
ports:
- port: ${REDIS_PORT}
targetPort: ${REDIS_PORT}
selector:
app: redis
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-pvc
namespace: ${INFRA_NAMESPACE}
spec:
accessModes:
- ReadWriteOnce
storageClassName: ${STORAGE_CLASS}
resources:
requests:
storage: 20Gi
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis
namespace: ${INFRA_NAMESPACE}
spec:
serviceName: redis
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: ${IMAGE_REGISTRY}/redis:7.2
ports:
- containerPort: ${REDIS_PORT}
command:
- redis-server
- /usr/local/etc/redis/redis.conf
volumeMounts:
- name: redis-data
mountPath: /data
- name: redis-config
mountPath: /usr/local/etc/redis
resources:
requests:
cpu: 500m
memory: 2Gi
limits:
cpu: 1000m
memory: 4Gi
volumes:
- name: redis-data
persistentVolumeClaim:
claimName: redis-pvc
- name: redis-config
configMap:
name: redis-config
EOF
log "Waiting for Redis to be ready..."
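# rollout status also covers the moment before the Pod object exists,
# where "kubectl wait ... pod -l" can fail with "no matching resources found".
kubectl rollout status statefulset/redis -n "${INFRA_NAMESPACE}" --timeout=300s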
log "Redis deployed successfully"
}
install_dragonfly() {
log "Installing Dragonfly..."
helm upgrade --install "${RELEASE_NAME}" "${CHART_PATH}" \
--namespace "${NAMESPACE}" \
--set global.imageRegistry="${IMAGE_REGISTRY}" \
\
--set manager.enable=true \
--set manager.replicas="${MANAGER_REPLICAS}" \
--set manager.image.repository="${MANAGER_IMAGE}" \
--set manager.image.tag="${IMAGE_TAG_MANAGER}" \
--set manager.resources.requests.cpu="${MANAGER_CPU_REQUEST}" \
--set manager.resources.requests.memory="${MANAGER_MEM_REQUEST}" \
--set manager.resources.limits.cpu="${MANAGER_CPU_LIMIT}" \
--set manager.resources.limits.memory="${MANAGER_MEM_LIMIT}" \
--set manager.metrics.enable="${ENABLE_METRICS}" \
--set manager.metrics.port="${METRICS_PORT}" \
--set manager.metrics.serviceMonitor.enable="${SERVICE_MONITOR_ENABLED}" \
--set manager.metrics.serviceMonitor.interval="${SERVICE_MONITOR_INTERVAL}" \
--set manager.metrics.serviceMonitor.labels.release="${PROMETHEUS_LABEL}" \
\
--set scheduler.enable=true \
--set scheduler.replicas="${SCHEDULER_REPLICAS}" \
--set scheduler.image.repository="${SCHEDULER_IMAGE}" \
--set scheduler.image.tag="${IMAGE_TAG_SCHEDULER}" \
--set scheduler.resources.requests.cpu="${SCHEDULER_CPU_REQUEST}" \
--set scheduler.resources.requests.memory="${SCHEDULER_MEM_REQUEST}" \
--set scheduler.resources.limits.cpu="${SCHEDULER_CPU_LIMIT}" \
--set scheduler.resources.limits.memory="${SCHEDULER_MEM_LIMIT}" \
--set scheduler.metrics.enable="${ENABLE_METRICS}" \
--set scheduler.metrics.serviceMonitor.enable="${SERVICE_MONITOR_ENABLED}" \
--set scheduler.metrics.serviceMonitor.labels.release="${PROMETHEUS_LABEL}" \
\
--set seedClient.enable=true \
--set seedClient.replicas="${SEED_PEER_REPLICAS}" \
--set seedClient.image.repository="${CLIENT_IMAGE}" \
--set seedClient.image.tag="${IMAGE_TAG_CLIENT}" \
--set seedClient.persistence.enable=true \
--set seedClient.persistence.size="${SEED_PEER_STORAGE_SIZE}" \
--set seedClient.persistence.storageClass="${STORAGE_CLASS}" \
--set seedClient.resources.requests.cpu="${SEED_CPU_REQUEST}" \
--set seedClient.resources.requests.memory="${SEED_MEM_REQUEST}" \
--set seedClient.resources.limits.cpu="${SEED_CPU_LIMIT}" \
--set seedClient.resources.limits.memory="${SEED_MEM_LIMIT}" \
--set seedClient.metrics.enable="${ENABLE_METRICS}" \
--set seedClient.metrics.serviceMonitor.enable="${SERVICE_MONITOR_ENABLED}" \
--set seedClient.metrics.serviceMonitor.labels.release="${PROMETHEUS_LABEL}" \
\
--set client.enable=true \
--set client.image.repository="${CLIENT_IMAGE}" \
--set client.image.tag="${IMAGE_TAG_CLIENT}" \
--set client.persistence.enable=true \
--set client.persistence.size="${CLIENT_STORAGE_SIZE}" \
--set client.persistence.storageClass="${STORAGE_CLASS}" \
--set client.resources.requests.cpu="${CLIENT_CPU_REQUEST}" \
--set client.resources.requests.memory="${CLIENT_MEM_REQUEST}" \
--set client.resources.limits.cpu="${CLIENT_CPU_LIMIT}" \
--set client.resources.limits.memory="${CLIENT_MEM_LIMIT}" \
--set client.metrics.enable="${ENABLE_METRICS}" \
--set client.metrics.serviceMonitor.enable="${SERVICE_MONITOR_ENABLED}" \
--set client.metrics.serviceMonitor.labels.release="${PROMETHEUS_LABEL}" \
--set client.enableHost=true \
\
--set dfinit.enable="${DFINIT_ENABLED}" \
--set dfinit.restartContainerRuntime="${DFINIT_RESTART_CONTAINERD}" \
--set dfinit.image.repository="${DFINIT_IMAGE}" \
--set dfinit.image.tag="${IMAGE_TAG_DFINIT}" \
\
--set mysql.enable=false \
--set externalMysql.migrate=true \
--set externalMysql.host="${MYSQL_HOST}" \
--set externalMysql.port="${MYSQL_PORT}" \
--set externalMysql.username="${MYSQL_USERNAME}" \
--set externalMysql.password="${MYSQL_PASSWORD}" \
--set externalMysql.database="${MYSQL_DATABASE}" \
\
--set redis.enable=false \
--set "externalRedis.addrs[0]=${REDIS_HOST}:${REDIS_PORT}" \
--set externalRedis.password="${REDIS_PASSWORD}" \
--set externalRedis.db="${REDIS_DB}" \
--set externalRedis.brokerDB="${REDIS_BROKER_DB}" \
--set externalRedis.backendDB="${REDIS_BACKEND_DB}" \
\
--wait \
--timeout 15m
log "Dragonfly installed successfully"
}
verify_installation() {
log "Verifying installation..."
log "MySQL status:"
kubectl get pods -n "${INFRA_NAMESPACE}" -l app=mysql
log "Redis status:"
kubectl get pods -n "${INFRA_NAMESPACE}" -l app=redis
log "Dragonfly status:"
kubectl get pods -n "${NAMESPACE}"
log "Verification complete"
}
# ==========================================
# Main
# ==========================================
main() {
log "Starting Dragonfly deployment..."
log "Configuration:"
log " Namespace: ${NAMESPACE}"
log " Infra Namespace: ${INFRA_NAMESPACE}"
log " Image Registry: ${IMAGE_REGISTRY}"
log " Storage Class: ${STORAGE_CLASS}"
log " Manager Replicas: ${MANAGER_REPLICAS}"
log " Scheduler Replicas: ${SCHEDULER_REPLICAS}"
log " Seed Peer Replicas: ${SEED_PEER_REPLICAS}"
check_prerequisites
create_namespace
deploy_mysql
deploy_redis
install_dragonfly
verify_installation
log "========================================="
log "Dragonfly deployment completed!"
log "========================================="
log "Next steps:"
log "1. Check pods: kubectl get pods -n ${NAMESPACE}"
log "2. Check services: kubectl get svc -n ${NAMESPACE}"
log "3. Check metrics: kubectl port-forward svc/dragonfly-manager -n ${NAMESPACE} 8000:8000"
log "4. Test image pull: kubectl run test --image=${IMAGE_REGISTRY}/nginx:latest"
}
# Run the script
main "$@"
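# 1. Make the script executable
chmod +x install-with-variables.sh
# 2. Edit the variables
# Adjust the variable section at the top of install-with-variables.sh
# 3. Run the installation
./install-with-variables.sh
# 4. Alternatively, run step 3 with the output also saved to a log file
./install-with-variables.sh 2>&1 | tee install.log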
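One caveat with piping into tee is that the pipeline reports tee's exit status, so a failed installation can look successful to a calling script. The sketch below shows one way to keep the log and still propagate the command's own status. The function name run_logged is hypothetical, and the approach relies on bash's PIPESTATUS array.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command, copy stdout and stderr to a log file,
# and return the command's exit status instead of tee's.
run_logged() {
  local logfile="$1"
  shift
  "$@" 2>&1 | tee "$logfile"
  # PIPESTATUS[0] holds the exit status of the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}

# Example (not executed here):
# run_logged install.log ./install-with-variables.sh || echo "install failed, see install.log"
```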
# dragonfly-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: dragonfly-alerts
namespace: dragonfly-system
labels:
release: prometheus
spec:
groups:
- name: dragonfly
interval: 30s
rules:
# Manager Down
- alert: DragonflyManagerDown
expr: up{job="dragonfly-manager"} == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Dragonfly Manager down"
description: "Manager {{ $labels.pod }} is down for >5min"
# Scheduler Down
- alert: DragonflySchedulerDown
expr: up{job="dragonfly-scheduler"} == 0
for: 5m
labels:
severity: critical
annotations:
summary: "Dragonfly Scheduler down"
# Seed Peer Low Count
- alert: LowSeedPeerCount
expr: sum(up{job="dragonfly-seed-client"}) < 3
for: 10m
labels:
severity: warning
annotations:
summary: "Low Seed Peer count (<3)"
# Low Cache Hit Rate
- alert: LowCacheHitRate
expr: |
(sum(rate(dragonfly_client_cache_hit_total[5m]))
/
(sum(rate(dragonfly_client_cache_hit_total[5m])) + sum(rate(dragonfly_client_cache_miss_total[5m]))))
< 0.5
for: 30m
labels:
severity: warning
annotations:
summary: "Cache hit rate <50%"
# High Task Failure Rate
- alert: HighTaskFailureRate
expr: |
(sum(rate(dragonfly_scheduler_tasks_total{state="failed"}[5m]))
/
sum(rate(dragonfly_scheduler_tasks_total[5m])))
> 0.1
for: 10m
labels:
severity: warning
annotations:
summary: "Task failure rate >10%"
# High Disk Usage
- alert: HighDiskUsage
expr: |
dragonfly_client_disk_usage_bytes / dragonfly_client_disk_capacity_bytes > 0.9
for: 10m
labels:
severity: warning
annotations:
summary: "Disk usage >90%"
# Low P2P Efficiency
- alert: LowP2PEfficiency
expr: |
(sum(rate(dragonfly_client_download_piece_total{source_type="peer"}[5m]))
/
sum(rate(dragonfly_client_download_piece_total[5m])))
< 0.7
for: 30m
labels:
severity: info
annotations:
summary: "P2P efficiency <70%"
# MySQL Connection Errors
- alert: MySQLConnectionError
expr: increase(mysql_global_status_aborted_connects[5m]) > 10
for: 5m
labels:
severity: warning
annotations:
summary: "MySQL connection errors"
# Redis High Memory
- alert: RedisHighMemory
expr: redis_memory_used_bytes / redis_memory_max_bytes > 0.9
for: 10m
labels:
severity: warning
annotations:
summary: "Redis memory usage >90%"
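The ratio alerts above all follow the same pattern: one counter divided by a total, compared with a threshold. As a small worked example, the sketch below computes hit / (hit + miss), the same arithmetic the LowCacheHitRate expression applies to the two rates. The function name hit_ratio is hypothetical and the numbers are illustrative, not measurements.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hit / (hit + miss), printed with two decimals.
hit_ratio() {
  local hit="$1" miss="$2"
  awk -v h="$hit" -v m="$miss" 'BEGIN {
    total = h + m
    if (total == 0) { print "nan"; exit }   # no traffic: the ratio is undefined
    printf "%.2f\n", h / total
  }'
}

# 40 hits and 60 misses give a ratio below the 0.5 alert threshold.
hit_ratio 40 60   # prints 0.40
```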
# Grafana Dashboard Import
# 1. Access Grafana
kubectl port-forward svc/prometheus-grafana -n monitoring 3000:80
# 2. Open in a browser: http://localhost:3000
# ID: admin
# PW: (look it up in the prometheus-grafana secret)
# 3. Import the dashboard
# - Left menu > Dashboards > Import
# - Dashboard ID: build your own, or refer to the layout below
Main panel layout:
Row 1: Overview
- Total Nodes
- Active Tasks
- Cache Hit Rate
- P2P Download Ratio
Row 2: Manager & Scheduler
- Manager Request Rate
- Manager Error Rate
- Scheduler Task Rate
- Scheduler Duration
Row 3: Seed Peer
- Seed Peer Count
- Seed Peer Disk Usage
- Upload Traffic
- Cache Size
Row 4: Client
- Client Count
- Download Speed
- Cache Hit Rate
- Disk Usage
Row 5: Infrastructure
- MySQL Connections
- MySQL Query Rate
- Redis Memory
- Redis Commands/sec
# 1. Check all Pods
kubectl get pods -n dragonfly-infra
kubectl get pods -n dragonfly-system
# 2. Check Services
kubectl get svc -n dragonfly-infra
kubectl get svc -n dragonfly-system
# 3. Check ServiceMonitors
kubectl get servicemonitor -n dragonfly-system
# 4. Test MySQL
kubectl exec -it mysql-0 -n dragonfly-infra -- mysql -udragonfly -pDragonflyPassword123! -e "SHOW DATABASES;"
# 5. Test Redis
kubectl exec -it redis-0 -n dragonfly-infra -- redis-cli -a RedisPassword123! ping
# 6. Check metrics (port-forward blocks, so run curl from a second terminal)
kubectl port-forward svc/dragonfly-manager -n dragonfly-system 8000:8000
curl http://localhost:8000/metrics
# 7. Check Prometheus targets (open the URL while the port-forward is running)
kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090
# http://localhost:9090/targets
# 8. Functional test (same client label selector as the test scripts)
kubectl run test-nginx --image=nexus.com/nginx:latest
kubectl logs -f -n dragonfly-system -l component=client | grep download
Files provided:
1. ✅ dragonfly-values.yaml - complete Helm values
2. ✅ install-with-variables.sh - variable-driven install script
3. ✅ mysql-deployment.yaml - MySQL deployment
4. ✅ redis-deployment.yaml - Redis deployment
5. ✅ dragonfly-alerts.yaml - Prometheus alerts
Key features:
5-node test → staged testing ahead of the full rollout
Phase 1: Build the infrastructure (MySQL/Redis)
  └─ Built once, reused afterwards
Phase 2: Test deployment (5 nodes)
  ├─ Manager (1 replica)
  ├─ Scheduler (1 replica)
  ├─ Seed Peer (2 replicas)
  └─ Client (5 nodes only)
Phase 3: Validation and tuning
  └─ Monitor for 1 to 2 weeks
Phase 4: Full rollout (100 nodes)
  ├─ Manager (3 replicas)
  ├─ Scheduler (3 replicas)
  ├─ Seed Peer (5 replicas)
  └─ Client (100 nodes)
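Phase 2 and Phase 4 install the same chart and differ only in the values file passed to Helm. A minimal sketch of that mapping follows. The function name values_for_phase and the phase keywords test and full are hypothetical, while the two file names are the ones used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a rollout phase to the Helm values file used for it.
values_for_phase() {
  case "$1" in
    test) echo "./dragonfly-test-values.yaml" ;;   # Phase 2: 5 labeled nodes
    full) echo "./dragonfly-values.yaml" ;;        # Phase 4: all 100 nodes
    *)    echo "unknown phase: $1" >&2; return 1 ;;
  esac
}

# Example: pick the file for the test phase, then pass it to helm with --values.
values_for_phase test
```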
# Deploy the infrastructure only once
# It is shared by the test and production deployments
# 1. Create the namespace
kubectl create namespace dragonfly-infra
# 2. Deploy MySQL
kubectl apply -f mysql-deployment.yaml
# 3. Deploy Redis
kubectl apply -f redis-deployment.yaml
# 4. Verify
kubectl get pods -n dragonfly-infra -w
# Expected output:
# NAME READY STATUS RESTARTS AGE
# mysql-0 1/1 Running 0 2m
# redis-0 1/1 Running 0 2m
# Select 5 test nodes
TEST_NODES=(
  "worker-node-1"
  "worker-node-2"
  "worker-node-3"
  "worker-node-4"
  "worker-node-5"
)
# Add the label (--overwrite keeps re-runs from failing on already-labeled nodes)
for node in "${TEST_NODES[@]}"; do
  kubectl label node "$node" dragonfly-phase=test --overwrite
done
# Verify
kubectl get nodes -l dragonfly-phase=test
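Because the node label decides where the client and dfinit run (and dfinit restarts containerd there), it can help to review the exact commands before applying them. The sketch below only prints the kubectl commands for a list of nodes. The function name print_label_cmds is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "kubectl label" command per node so the
# change can be reviewed before it is applied. Nothing is executed.
print_label_cmds() {
  local label="$1"
  shift
  local node
  for node in "$@"; do
    printf 'kubectl label node %s %s --overwrite\n' "$node" "$label"
  done
}

# Example: preview the commands for two of the test nodes.
print_label_cmds "dragonfly-phase=test" worker-node-1 worker-node-2
```

To take a node out of the test group again, kubectl removes a label when the key is given with a trailing dash, for example `kubectl label node worker-node-1 dragonfly-phase-`.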
# dragonfly-test-values.yaml
# ==========================================
# Phase 2: Test deployment (5 nodes)
# ==========================================
global:
imageRegistry: "nexus.com"
# ==========================================
# Manager (small scale)
# ==========================================
manager:
enable: true
replicas: 1 # one replica for the test
image:
repository: nexus.com/dragonflyoss/manager
tag: v2.3.3
pullPolicy: IfNotPresent
resources:
requests:
cpu: 500m # half of production for the test
memory: 1Gi
limits:
cpu: 1000m
memory: 2Gi
service:
type: ClusterIP
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
# ==========================================
# Scheduler (small scale)
# ==========================================
scheduler:
enable: true
replicas: 1 # one replica for the test
image:
repository: nexus.com/dragonflyoss/scheduler
tag: v2.3.3
pullPolicy: IfNotPresent
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 1000m
memory: 2Gi
service:
type: ClusterIP
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
config:
scheduler:
algorithm: default
backSourceCount: 3
manager:
schedulerClusterID: 1
# ==========================================
# Seed Peer (small scale)
# ==========================================
seedClient:
enable: true
replicas: 2 # two replicas for the test
image:
repository: nexus.com/dragonflyoss/client
tag: v0.1.118
pullPolicy: IfNotPresent
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
persistence:
enable: true
size: 50Gi # smaller for the test
storageClass: "local-path"
accessModes:
- ReadWriteOnce
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
config:
seedPeer:
enable: true
type: "super"
clusterID: 1
proxy:
registryMirror:
addr: https://nexus.com
disableBackToSource: false
security:
insecure: false
cacert: "/etc/containerd/certs.d/nexus.com/ca.crt"
cert: "/etc/containerd/certs.d/nexus.com/client.crt"
key: "/etc/containerd/certs.d/nexus.com/client.key"
download:
concurrentPieceCount: 10
pieceDownloadTimeout: 60s
upload:
rateLimit: 0
maxConcurrency: 100
storage:
dir: /var/lib/dragonfly
taskExpireTime: 12h # shorter for the test period
diskGCThreshold: 85
volumeMounts:
- name: containerd-certs
mountPath: /etc/containerd/certs.d
readOnly: true
volumes:
- name: containerd-certs
hostPath:
path: /etc/containerd/certs.d
type: Directory
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-seed-client
topologyKey: kubernetes.io/hostname
# ==========================================
# Client (test nodes only!) 🔥
# ==========================================
client:
enable: true
# Important: select the test nodes only!
nodeSelector:
dragonfly-phase: test # 5 nodes only!
image:
repository: nexus.com/dragonflyoss/client
tag: v0.1.118
pullPolicy: IfNotPresent
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 1000m
memory: 1Gi
persistence:
enable: true
size: 20Gi # smaller for the test
storageClass: "local-path"
accessModes:
- ReadWriteOnce
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
config:
proxy:
registryMirror:
addr: https://nexus.com
listenAddress: "0.0.0.0:65001"
disableBackToSource: false
security:
insecure: false
cacert: "/etc/containerd/certs.d/nexus.com/ca.crt"
cert: "/etc/containerd/certs.d/nexus.com/client.crt"
key: "/etc/containerd/certs.d/nexus.com/client.key"
proxies:
- regx: "nexus.com/*"
useHTTPS: true
direct: true
- regx: "docker.io/*"
useHTTPS: true
direct: true
download:
concurrentPieceCount: 10
pieceDownloadTimeout: 30s
downloadTimeout: 10m
storage:
dir: /var/lib/dragonfly
taskExpireTime: 6h
diskGCThreshold: 90
volumeMounts:
- name: containerd-certs
mountPath: /etc/containerd/certs.d
readOnly: true
volumes:
- name: containerd-certs
hostPath:
path: /etc/containerd/certs.d
type: Directory
enableHost: true
# ==========================================
# dfinit (test nodes only!)
# ==========================================
dfinit:
enable: true
restartContainerRuntime: true
# Important: test nodes only!
nodeSelector:
dragonfly-phase: test
image:
repository: nexus.com/dragonflyoss/dfinit
tag: v0.1.118
pullPolicy: IfNotPresent
config:
containerRuntime:
containerd:
configPath: /etc/containerd/config.toml
registries:
- hostNamespace: nexus.com
serverAddr: https://nexus.com
capabilities: ['pull', 'resolve']
- hostNamespace: docker.io
serverAddr: https://registry-1.docker.io
capabilities: ['pull', 'resolve']
# ==========================================
# External MySQL (reuse the existing deployment)
# ==========================================
mysql:
enable: false
externalMysql:
migrate: true
host: mysql.dragonfly-infra.svc.cluster.local
port: 3306
username: dragonfly
password: "DragonflyPassword123!"
database: dragonfly
maxOpenConns: 50 # smaller for the test
maxIdleConns: 10
# ==========================================
# External Redis (reuse the existing deployment)
# ==========================================
redis:
enable: false
externalRedis:
addrs:
- redis.dragonfly-infra.svc.cluster.local:6379
password: "RedisPassword123!"
db: 0
brokerDB: 1
backendDB: 2
# install-test.sh
#!/bin/bash
set -e
echo "================================================"
echo "Phase 2: Dragonfly 테스트 배포 (5대 노드)"
echo "================================================"
# 변수
NAMESPACE="dragonfly-system"
RELEASE_NAME="dragonfly"
CHART_PATH="./dragonfly-1.4.15.tgz"
VALUES_FILE="./dragonfly-test-values.yaml"
# Namespace 생성
echo "[1/5] Creating namespace..."
kubectl create namespace ${NAMESPACE} --dry-run=client -o yaml | kubectl apply -f -
# Helm 설치
echo "[2/5] Installing Dragonfly..."
helm upgrade --install ${RELEASE_NAME} ${CHART_PATH} \
--namespace ${NAMESPACE} \
--values ${VALUES_FILE} \
--wait \
--timeout 10m
# Check status
echo "[3/5] Checking pod status..."
kubectl get pods -n ${NAMESPACE} -o wide
# Verify that exactly 5 Client pods were deployed
echo "[4/5] Verifying Client DaemonSet (should be 5 pods)..."
CLIENT_COUNT=$(kubectl get pods -n ${NAMESPACE} -l component=client --no-headers | wc -l)
echo "Client pods: ${CLIENT_COUNT}"
if [ "${CLIENT_COUNT}" -ne 5 ]; then
echo "WARNING: Expected 5 Client pods, but found ${CLIENT_COUNT}"
fi
# Check dfinit
echo "[5/5] Checking dfinit job..."
kubectl get job -n ${NAMESPACE} -l component=dfinit
echo "================================================"
echo "Phase 2 test deployment complete!"
echo "================================================"
echo ""
echo "Next steps:"
echo "1. Functional test: ./test-functionality.sh"
echo "2. Performance measurement: ./test-performance.sh"
echo "3. Monitor for 1-2 weeks"
echo "4. If no problems are found, proceed to the Phase 4 full rollout"
# Make the script executable
chmod +x install-test.sh
# Install
./install-test.sh
# Verify
kubectl get pods -n dragonfly-system -o wide
# Expected output:
# NAME READY STATUS NODE
# dragonfly-manager-0 1/1 Running worker-node-1
# dragonfly-scheduler-0 1/1 Running worker-node-2
# dragonfly-seed-client-0 1/1 Running worker-node-3
# dragonfly-seed-client-1 1/1 Running worker-node-4
# dragonfly-client-xxxxx 1/1 Running worker-node-1 # only 5 of these
# dragonfly-client-xxxxx 1/1 Running worker-node-2
# dragonfly-client-xxxxx 1/1 Running worker-node-3
# dragonfly-client-xxxxx 1/1 Running worker-node-4
# dragonfly-client-xxxxx 1/1 Running worker-node-5
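Counting pods confirms the total but not which node is missing one. As a sketch, the helper below lists the test nodes that have no Running Client pod. It assumes the `component=client` pod label and the `dragonfly-phase=test` node label used in this guide; `missing_client_nodes` is a hypothetical name.

```bash
#!/bin/bash
# Sketch: print every test node that has no Running Client pod.
# Assumptions: pods carry component=client, test nodes carry dragonfly-phase=test.
missing_client_nodes() {
  local ns="$1" covered node
  # Node names that currently host a Running Client pod, one per line.
  covered=$(kubectl get pods -n "${ns}" -l component=client \
    --field-selector=status.phase=Running \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}')
  # Walk the labelled nodes and report the ones not in the covered list.
  kubectl get nodes -l dragonfly-phase=test -o name | sed 's|^node/||' |
    while read -r node; do
      printf '%s\n' "${covered}" | grep -qxF "${node}" || echo "${node}"
    done
}

# Example:
# missing_client_nodes dragonfly-system
```

An empty result means every labelled node is covered.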
# test-functionality.sh
#!/bin/bash
set -e
NAMESPACE="dragonfly-system"
echo "================================================"
echo "Dragonfly functional test"
echo "================================================"
# 1. Basic operation test
echo "[Test 1/5] Basic image pull test..."
kubectl run test-nginx-1 --image=nexus.com/nginx:latest \
  --overrides='{"spec":{"nodeSelector":{"dragonfly-phase":"test"}}}'
sleep 10
kubectl wait --for=condition=Ready pod/test-nginx-1 --timeout=120s
echo "✅ Test 1 passed"
# 2. Verify P2P operation
# The pipeline runs inside the if condition so that a non-match does not trip set -e.
echo "[Test 2/5] Verifying P2P operation..."
if kubectl logs -n ${NAMESPACE} -l component=client --tail=50 | grep -i "download from peer"; then
  echo "✅ Test 2 passed (P2P operation confirmed)"
else
  echo "⚠️ Test 2: could not confirm P2P operation (this may be the first download)"
fi
# 3. Cache hit test
echo "[Test 3/5] Cache hit test..."
kubectl delete pod test-nginx-1
sleep 5
kubectl run test-nginx-2 --image=nexus.com/nginx:latest \
  --overrides='{"spec":{"nodeSelector":{"dragonfly-phase":"test"}}}'
sleep 10
if kubectl logs -n ${NAMESPACE} -l component=client --tail=50 | grep -E "(cache hit|download from peer)"; then
  echo "✅ Test 3 passed (cache hit confirmed)"
else
  echo "⚠️ Test 3: cache behaviour needs to be checked"
fi
# 4. Fallback test
echo "[Test 4/5] Fallback test (Seed Peer stopped)..."
# The seed pods are named dragonfly-seed-client-N in this guide, so they are scaled as a StatefulSet.
kubectl scale statefulset dragonfly-seed-client --replicas=0 -n ${NAMESPACE}
sleep 10
kubectl run test-nginx-fallback --image=nexus.com/busybox:latest \
  --overrides='{"spec":{"nodeSelector":{"dragonfly-phase":"test"}}}' \
  -- sleep 3600
sleep 20
if kubectl wait --for=condition=Ready pod/test-nginx-fallback --timeout=120s; then
  echo "✅ Test 4 passed (fallback works)"
else
  echo "❌ Test 4 failed (fallback problem)"
fi
# Restore the Seed Peer
kubectl scale statefulset dragonfly-seed-client --replicas=2 -n ${NAMESPACE}
sleep 20
# 5. Multiple-registry test
echo "[Test 5/5] Multiple-registry test..."
kubectl run test-docker-io --image=nexus.com/library/alpine:latest \
  --overrides='{"spec":{"nodeSelector":{"dragonfly-phase":"test"}}}'
sleep 10
# Cleanup
echo "[Cleanup] Removing test pods..."
kubectl delete pod test-nginx-2 test-nginx-fallback test-docker-io --ignore-not-found=true
echo "================================================"
echo "Functional test complete!"
echo "================================================"
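The fixed `sleep` calls in the functional test can make the log checks flaky, because the expected log line may appear later than the sleep allows. One possible alternative is to poll with a timeout, as sketched below. The helper name `wait_for_client_log` is hypothetical, and the log patterns are the same assumed strings the script already greps for.

```bash
#!/bin/bash
# Sketch: poll the Client logs until a pattern shows up or a timeout expires.
# Arguments: <extended regex> [timeout seconds, default 60] [interval seconds, default 5]
# Assumption: NAMESPACE is set by the calling script (defaults to dragonfly-system).
wait_for_client_log() {
  local pattern="$1" timeout="${2:-60}" interval="${3:-5}" elapsed=0
  while [ "${elapsed}" -lt "${timeout}" ]; do
    if kubectl logs -n "${NAMESPACE:-dragonfly-system}" -l component=client --tail=200 2>/dev/null |
      grep -qiE "${pattern}"; then
      return 0
    fi
    sleep "${interval}"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Example:
# if wait_for_client_log "download from peer" 120 5; then echo "P2P confirmed"; fi
```

Because the helper is called inside an `if`, a timeout does not abort a script that runs under `set -e`.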
# test-performance.sh
#!/bin/bash
set -e
NAMESPACE="dragonfly-system"
IMAGE="nexus.com/test-app:large" # large image (1 GB+)
TEST_NODES=5
echo "================================================"
echo "Dragonfly performance measurement (5 nodes)"
echo "================================================"
# Clear the cache
# The glob has to be expanded by a shell inside the container, hence sh -c.
echo "[Prep] Clearing the cache..."
for pod in $(kubectl get pods -n ${NAMESPACE} -l component=client -o name); do
  kubectl exec -n ${NAMESPACE} ${pod} -- sh -c 'rm -rf /var/lib/dragonfly/storage/tasks/*' 2>/dev/null || true
done
# Test 1: Cold Start
echo ""
echo "[Test 1/2] Cold Start (first download)"
START=$(date +%s)
for i in $(seq 1 ${TEST_NODES}); do
  kubectl run perf-test-cold-${i} --image=${IMAGE} --labels=perf-test=cold \
    --overrides='{"spec":{"nodeSelector":{"dragonfly-phase":"test"}}}' &
done
wait
# kubectl run labels each pod run=<pod name>, so the pods are selected by the shared perf-test label.
kubectl wait --for=condition=Ready pod -l perf-test=cold --timeout=600s
END=$(date +%s)
COLD_TIME=$((END - START))
echo "Cold Start Time: ${COLD_TIME}s"
# Delete the pods
kubectl delete pod -l perf-test=cold
sleep 10
# Test 2: Cache Hit
# Note: deleting the pods leaves the image in containerd's local store on each node,
# so this run only exercises Dragonfly if the image is actually pulled again.
echo ""
echo "[Test 2/2] Cache Hit (cache in use)"
START=$(date +%s)
for i in $(seq 1 ${TEST_NODES}); do
  kubectl run perf-test-cache-${i} --image=${IMAGE} --labels=perf-test=cache \
    --overrides='{"spec":{"nodeSelector":{"dragonfly-phase":"test"}}}' &
done
wait
kubectl wait --for=condition=Ready pod -l perf-test=cache --timeout=600s
END=$(date +%s)
CACHE_TIME=$((END - START))
echo "Cache Hit Time: ${CACHE_TIME}s"
# Clean up and report
kubectl delete pod -l perf-test=cache
echo ""
echo "================================================"
echo "Performance results"
echo "================================================"
echo "Cold Start: ${COLD_TIME}s"
echo "Cache Hit: ${CACHE_TIME}s"
# Guard against division by zero when the cache-hit run finishes within a second.
if [ "${CACHE_TIME}" -gt 0 ]; then
  echo "Speedup: $((COLD_TIME / CACHE_TIME))x"
else
  echo "Speedup: n/a (cache-hit run took under 1s)"
fi
echo "================================================"
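The speedup line in the script uses shell integer arithmetic, so a cold start of 90s against a cache hit of 40s is reported as 2x instead of 2.25x. A small sketch of a fractional calculation with `awk` is shown below; the function name `speedup` is hypothetical.

```bash
#!/bin/bash
# Sketch: fractional speedup with a guard for a zero-second cache-hit run.
# Arguments: <cold start seconds> <cache hit seconds>
speedup() {
  local cold="$1" cache="$2"
  if [ "${cache}" -le 0 ]; then
    echo "n/a"
    return 0
  fi
  awk -v c="${cold}" -v h="${cache}" 'BEGIN { printf "%.2fx\n", c / h }'
}

speedup 90 40   # prints 2.25x
```

In the performance script this could replace the arithmetic expansion, for example `echo "Speedup: $(speedup "${COLD_TIME}" "${CACHE_TIME}")"`.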
# daily-check.sh
#!/bin/bash
NAMESPACE="dragonfly-system"
INFRA_NS="dragonfly-infra"
echo "========== Dragonfly daily check $(date) =========="
# 1. Pod status
echo ""
echo "=== Pod Status ==="
kubectl get pods -n ${NAMESPACE} -o wide
# 2. Resource usage
echo ""
echo "=== Resource Usage ==="
kubectl top pods -n ${NAMESPACE} 2>/dev/null || echo "Metrics server not available"
# 3. Cache hit ratio
echo ""
echo "=== Cache Metrics ==="
for pod in $(kubectl get pods -n ${NAMESPACE} -l component=client -o name | head -1); do
  kubectl exec -n ${NAMESPACE} ${pod} -- curl -s http://localhost:8000/metrics 2>/dev/null | \
    grep -E "dragonfly_client_cache_(hit|miss)_total" || echo "Metrics not available"
done
# 4. Disk usage
echo ""
echo "=== Disk Usage ==="
kubectl exec -n ${NAMESPACE} dragonfly-seed-client-0 -- df -h /var/lib/dragonfly 2>/dev/null || echo "N/A"
# 5. Recent errors
# The pipeline's exit status is that of tail, so the result is captured and tested for emptiness instead.
echo ""
echo "=== Recent Errors ==="
ERRORS=$(kubectl logs -n ${NAMESPACE} --tail=50 -l component=client 2>/dev/null | grep -i error | tail -10)
echo "${ERRORS:-No errors}"
# 6. MySQL/Redis status
echo ""
echo "=== Infrastructure Status ==="
kubectl get pods -n ${INFRA_NS}
echo ""
echo "================================================"
# Run daily via cron
chmod +x daily-check.sh
# crontab -e
# 0 9 * * * /path/to/daily-check.sh >> /var/log/dragonfly-daily.log 2>&1
# Checklist after the 1-2 week test period
# ✅ Pod stability (no restarts)
# ✅ Cache hit ratio > 50%
# ✅ P2P operation confirmed
# ✅ Fallback works correctly
# ✅ Performance improvement confirmed
# ✅ No errors in the logs
# ✅ Resource usage within the normal range
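The checklist asks for a cache hit ratio above 50%, but the daily check only prints the raw counters. The sketch below turns the two counters into a percentage. It assumes the metric names that `daily-check.sh` greps for (`dragonfly_client_cache_hit_total` and `dragonfly_client_cache_miss_total`) and that the sample value is the last field on each line; `cache_hit_ratio` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: compute the cache hit ratio (percent) from Prometheus text-format metrics on stdin.
# Assumption: the metric names match those used in daily-check.sh.
cache_hit_ratio() {
  awk '
    # Lines starting with "#" (HELP/TYPE) do not match the anchored patterns.
    /^dragonfly_client_cache_hit_total/  { hit  += $NF }
    /^dragonfly_client_cache_miss_total/ { miss += $NF }
    END {
      total = hit + miss
      if (total == 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * hit / total
    }'
}

# Example:
# kubectl exec -n dragonfly-system <client-pod> -- curl -s http://localhost:8000/metrics | cache_hit_ratio
```

Series with labels are summed, so the result is the ratio across all label combinations exposed by that one pod.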
# dragonfly-production-values.yaml
# ==========================================
# Phase 4: Full rollout (100 nodes)
# ==========================================
global:
imageRegistry: "nexus.com"
# ==========================================
# Manager (production)
# ==========================================
manager:
enable: true
replicas: 3 # test: 1 → production: 3
image:
repository: nexus.com/dragonflyoss/manager
tag: v2.3.3
pullPolicy: IfNotPresent
resources:
requests:
cpu: 1000m # test: 500m → production: 1000m
memory: 2Gi # test: 1Gi → production: 2Gi
limits:
cpu: 2000m
memory: 4Gi
service:
type: ClusterIP
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
podDisruptionBudget:
minAvailable: 2
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-manager
topologyKey: kubernetes.io/hostname
# ==========================================
# Scheduler (production)
# ==========================================
scheduler:
enable: true
replicas: 3 # test: 1 → production: 3
image:
repository: nexus.com/dragonflyoss/scheduler
tag: v2.3.3
pullPolicy: IfNotPresent
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
service:
type: ClusterIP
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
config:
scheduler:
algorithm: default
backSourceCount: 3
filterParentLimit: 40
manager:
schedulerClusterID: 1
podDisruptionBudget:
minAvailable: 2
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-scheduler
topologyKey: kubernetes.io/hostname
# ==========================================
# Seed Peer (production)
# ==========================================
seedClient:
enable: true
replicas: 5 # test: 2 → production: 5
image:
repository: nexus.com/dragonflyoss/client
tag: v0.1.118
pullPolicy: IfNotPresent
resources:
requests:
cpu: 2000m # test: 1000m → production: 2000m
memory: 4Gi # test: 2Gi → production: 4Gi
limits:
cpu: 4000m
memory: 8Gi
persistence:
enable: true
size: 200Gi # test: 50Gi → production: 200Gi
storageClass: "local-path"
accessModes:
- ReadWriteOnce
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
config:
seedPeer:
enable: true
type: "super"
clusterID: 1
proxy:
registryMirror:
addr: https://nexus.com
disableBackToSource: false
security:
insecure: false
cacert: "/etc/containerd/certs.d/nexus.com/ca.crt"
cert: "/etc/containerd/certs.d/nexus.com/client.crt"
key: "/etc/containerd/certs.d/nexus.com/client.key"
download:
concurrentPieceCount: 16 # test: 10 → production: 16
pieceDownloadTimeout: 60s
rateLimit: 0
upload:
rateLimit: 0
maxConcurrency: 200 # test: 100 → production: 200
storage:
dir: /var/lib/dragonfly
taskExpireTime: 24h # test: 12h → production: 24h
diskGCThreshold: 85
diskGCInterval: 30s
writeBufferSize: 16777216
readBufferSize: 16777216
volumeMounts:
- name: containerd-certs
mountPath: /etc/containerd/certs.d
readOnly: true
volumes:
- name: containerd-certs
hostPath:
path: /etc/containerd/certs.d
type: Directory
podDisruptionBudget:
minAvailable: 3
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- dragonfly-seed-client
topologyKey: kubernetes.io/hostname
# ==========================================
# Client (all nodes!) 🔥
# ==========================================
client:
enable: true
# Important: nodeSelector removed → deployed to every node!
# nodeSelector:
# dragonfly-phase: test # delete this line!
image:
repository: nexus.com/dragonflyoss/client
tag: v0.1.118
pullPolicy: IfNotPresent
resources:
requests:
cpu: 1000m # test: 500m → production: 1000m
memory: 1Gi # test: 512Mi → production: 1Gi
limits:
cpu: 2000m
memory: 2Gi
persistence:
enable: true
size: 30Gi # test: 20Gi → production: 30Gi
storageClass: "local-path"
accessModes:
- ReadWriteOnce
metrics:
enable: true
port: 8000
serviceMonitor:
enable: true
interval: 30s
labels:
release: prometheus
config:
proxy:
registryMirror:
addr: https://nexus.com
listenAddress: "0.0.0.0:65001"
disableBackToSource: false
security:
insecure: false
cacert: "/etc/containerd/certs.d/nexus.com/ca.crt"
cert: "/etc/containerd/certs.d/nexus.com/client.crt"
key: "/etc/containerd/certs.d/nexus.com/client.key"
proxies:
- regx: "nexus.com/*"
useHTTPS: true
direct: true
- regx: "docker.io/*"
useHTTPS: true
direct: true
- regx: "gcr.io/*"
useHTTPS: true
direct: true
- regx: "ghcr.io/*"
useHTTPS: true
direct: true
- regx: "k8s.gcr.io/*"
useHTTPS: true
direct: true
- regx: "quay.io/*"
useHTTPS: true
direct: true
- regx: "registry.k8s.io/*"
useHTTPS: true
direct: true
download:
concurrentPieceCount: 10
pieceDownloadTimeout: 30s
downloadTimeout: 10m
downloadRetryCount: 3
storage:
dir: /var/lib/dragonfly
taskExpireTime: 6h
diskGCThreshold: 90
diskGCInterval: 15s
writeBufferSize: 8388608
readBufferSize: 8388608
volumeMounts:
- name: containerd-certs
mountPath: /etc/containerd/certs.d
readOnly: true
volumes:
- name: containerd-certs
hostPath:
path: /etc/containerd/certs.d
type: Directory
enableHost: true
# ==========================================
# dfinit (all nodes!)
# ==========================================
dfinit:
enable: true
restartContainerRuntime: true
# nodeSelector removed → applies to every node!
# nodeSelector:
# dragonfly-phase: test # delete this line!
image:
repository: nexus.com/dragonflyoss/dfinit
tag: v0.1.118
pullPolicy: IfNotPresent
config:
containerRuntime:
containerd:
configPath: /etc/containerd/config.toml
registries:
- hostNamespace: nexus.com
serverAddr: https://nexus.com
capabilities: ['pull', 'resolve']
- hostNamespace: docker.io
serverAddr: https://registry-1.docker.io
capabilities: ['pull', 'resolve']
- hostNamespace: gcr.io
serverAddr: https://gcr.io
capabilities: ['pull', 'resolve']
- hostNamespace: ghcr.io
serverAddr: https://ghcr.io
capabilities: ['pull', 'resolve']
- hostNamespace: k8s.gcr.io
serverAddr: https://k8s.gcr.io
capabilities: ['pull', 'resolve']
- hostNamespace: quay.io
serverAddr: https://quay.io
capabilities: ['pull', 'resolve']
- hostNamespace: registry.k8s.io
serverAddr: https://registry.k8s.io
capabilities: ['pull', 'resolve']
# ==========================================
# External MySQL (same as the test phase)
# ==========================================
mysql:
enable: false
externalMysql:
migrate: true
host: mysql.dragonfly-infra.svc.cluster.local
port: 3306
username: dragonfly
password: "DragonflyPassword123!"
database: dragonfly
maxOpenConns: 200 # test: 50 → production: 200
maxIdleConns: 50 # test: 10 → production: 50
connMaxLifetime: 3600
# ==========================================
# External Redis (same as the test phase)
# ==========================================
redis:
enable: false
externalRedis:
addrs:
- redis.dragonfly-infra.svc.cluster.local:6379
password: "RedisPassword123!"
db: 0
brokerDB: 1
backendDB: 2
# ==========================================
# Security
# ==========================================
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
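The production values pair each replica count with a `podDisruptionBudget.minAvailable`. If `minAvailable` ever equals or exceeds the replica count, voluntary evictions such as a node drain are blocked for that component. The sketch below is a quick arithmetic check using the numbers from the values file above; the function names `pdb_headroom` and `check_pdb` are hypothetical.

```bash
#!/bin/bash
# Sketch: how many pods of a component can be evicted at once under its PDB.
pdb_headroom() {
  local replicas="$1" min_available="$2"
  echo $((replicas - min_available))
}

# Print a one-line verdict; returns non-zero when the PDB leaves no headroom.
check_pdb() {
  local name="$1" replicas="$2" min_available="$3" headroom
  headroom=$(pdb_headroom "${replicas}" "${min_available}")
  if [ "${headroom}" -le 0 ]; then
    echo "${name}: minAvailable=${min_available} with ${replicas} replicas blocks every voluntary eviction"
    return 1
  fi
  echo "${name}: ${headroom} pod(s) may be evicted at a time"
}

# Values taken from dragonfly-production-values.yaml above.
check_pdb manager 3 2
check_pdb scheduler 3 2
check_pdb seed-client 5 3
```

With these values the Manager and Scheduler tolerate one eviction at a time and the Seed Peer tolerates two.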
# upgrade-to-production.sh
#!/bin/bash
set -e
echo "================================================"
echo "Phase 4: Full rollout (100 nodes)"
echo "================================================"
NAMESPACE="dragonfly-system"
RELEASE_NAME="dragonfly"
CHART_PATH="./dragonfly-1.4.15.tgz"
VALUES_FILE="./dragonfly-production-values.yaml"
# Confirmation prompt
echo ""
echo "⚠️ Warning: this expands the deployment to all nodes (100)."
echo ""
echo "Current test state:"
# Select by label so that seed-client pods are not counted as Client pods.
echo "Client pods: $(kubectl get pods -n ${NAMESPACE} -l component=client --no-headers | wc -l)"
echo ""
read -p "Continue? (yes/no): " CONFIRM
if [ "$CONFIRM" != "yes" ]; then
  echo "Cancelled."
  exit 1
fi
# Backup
echo ""
echo "[1/5] Backing up the current values..."
BACKUP_FILE="dragonfly-test-backup-$(date +%Y%m%d).yaml"
helm get values ${RELEASE_NAME} -n ${NAMESPACE} > "${BACKUP_FILE}"
echo "✅ Backup complete: ${BACKUP_FILE}"
# Check node labels
echo ""
echo "[2/5] Cleaning up node labels..."
echo "Nodes carrying the test label:"
kubectl get nodes -l dragonfly-phase=test
read -p "Remove the test label? (yes/no): " REMOVE_LABEL
if [ "$REMOVE_LABEL" = "yes" ]; then
  kubectl label nodes --all dragonfly-phase-
  echo "✅ Labels removed"
else
  echo "⚠️ Labels kept (harmless, because the nodeSelector is being removed)"
fi
# Helm Upgrade
echo ""
echo "[3/5] Running Helm upgrade..."
echo "Manager: 1→3, Scheduler: 1→3, Seed Peer: 2→5, Client: 5→100"
echo ""
helm upgrade ${RELEASE_NAME} ${CHART_PATH} \
  --namespace ${NAMESPACE} \
  --values ${VALUES_FILE} \
  --wait \
  --timeout 20m
echo "✅ Upgrade complete"
# Verify
echo ""
echo "[4/5] Checking deployment status..."
sleep 10
echo "Manager:"
kubectl get pods -n ${NAMESPACE} -l app=dragonfly-manager
echo ""
echo "Scheduler:"
kubectl get pods -n ${NAMESPACE} -l app=dragonfly-scheduler
echo ""
echo "Seed Peer:"
kubectl get pods -n ${NAMESPACE} -l app=dragonfly-seed-client
echo ""
echo "Client (DaemonSet):"
CLIENT_COUNT=$(kubectl get pods -n ${NAMESPACE} -l component=client --no-headers | wc -l)
echo "Total Client pods: ${CLIENT_COUNT}"
# Warn below 90 pods to leave some slack for nodes that are cordoned or not ready.
if [ "${CLIENT_COUNT}" -lt 90 ]; then
  echo "⚠️ Warning: fewer Client pods than expected (${CLIENT_COUNT} of 100)"
  echo "Some nodes may not have a Client deployed."
fi
# Check dfinit
echo ""
echo "[5/5] Checking dfinit status..."
kubectl get job -n ${NAMESPACE} -l component=dfinit
echo ""
echo "================================================"
echo "Phase 4 full rollout complete!"
echo "================================================"
echo ""
echo "Next steps:"
echo "1. Monitor the status of all nodes"
echo "2. Measure performance"
echo "3. If issues occur, roll back: ./rollback-to-test.sh"
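A raw pod count does not show how far the DaemonSet rollout has progressed. As a sketch, the helper below compares the DaemonSet's ready count with its desired count. The DaemonSet name `dragonfly-client` is an assumption inferred from the pod names in this guide, and both function names are hypothetical.

```bash
#!/bin/bash
# Sketch: report DaemonSet rollout progress as "ready/desired".
# Assumption: the Client DaemonSet is called dragonfly-client (inferred from the pod names).
daemonset_progress() {
  local ns="$1" name="$2"
  kubectl get daemonset "${name}" -n "${ns}" \
    -o jsonpath='{.status.numberReady} {.status.desiredNumberScheduled}'
}

# Returns 0 only when every desired pod is ready.
daemonset_is_ready() {
  local status ready desired
  status=$(daemonset_progress "$1" "$2")
  ready=${status%% *}
  desired=${status##* }
  echo "ready ${ready:-0}/${desired:-0}"
  [ "${desired:-0}" -gt 0 ] && [ "${ready:-0}" -eq "${desired:-0}" ]
}

# Example:
# until daemonset_is_ready dragonfly-system dragonfly-client; do sleep 10; done
```

`kubectl rollout status daemonset/<name> -n <namespace>` gives a similar blocking check without custom code.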