In Kubernetes, a sandbox is the isolated execution environment of a Pod.
Pod (Kubernetes abstraction)
└── Sandbox (containerd implementation)
    ├── Pause Container (holds the network/IPC namespaces)
    ├── App Container 1
    ├── App Container 2
    └── Sidecar Container
Number of sandbox create, start, stop, and delete operations
# TYPE containerd_sandbox_controller_operations_total counter
# Sandbox create operations
containerd_sandbox_controller_operations_total{
operation="create"
}
# Sandbox start operations
containerd_sandbox_controller_operations_total{
operation="start"
}
# Sandbox stop operations
containerd_sandbox_controller_operations_total{
operation="stop"
}
# Sandbox delete operations
containerd_sandbox_controller_operations_total{
operation="delete"
}
Example values:
containerd_sandbox_controller_operations_total{operation="create"} 1245
containerd_sandbox_controller_operations_total{operation="start"} 1245
containerd_sandbox_controller_operations_total{operation="stop"} 1189
containerd_sandbox_controller_operations_total{operation="delete"} 1189
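To make the example values concrete, the following is a minimal sketch that parses those four lines and derives a rough count of sandboxes that were created but not yet deleted. The helper name `parse_operation_counters` and the regular expression are illustrative assumptions; a real scraper would normally use a Prometheus client library rather than hand-rolled parsing.

```python
import re

# Sample lines copied from the example values above.
SAMPLE = """\
containerd_sandbox_controller_operations_total{operation="create"} 1245
containerd_sandbox_controller_operations_total{operation="start"} 1245
containerd_sandbox_controller_operations_total{operation="stop"} 1189
containerd_sandbox_controller_operations_total{operation="delete"} 1189
"""

# Matches only the simple single-label form used in this document.
LINE_RE = re.compile(r'^(\w+)\{operation="(\w+)"\}\s+(\d+(?:\.\d+)?)$')


def parse_operation_counters(text):
    """Return {operation: value} for single-label counter lines (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and "# TYPE" / comment lines
        match = LINE_RE.match(line)
        if match:
            counters[match.group(2)] = float(match.group(3))
    return counters


counters = parse_operation_counters(SAMPLE)
# Created minus deleted approximates the sandboxes that still exist.
live_estimate = counters["create"] - counters["delete"]
print(live_estimate)  # 56.0
```

With the sample numbers, 1245 creations and 1189 deletions leave an estimated 56 sandboxes outstanding. The estimate is only meaningful while the counters have not been reset, for example by a containerd restart.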
Sandbox operation duration (histogram)
# TYPE containerd_sandbox_controller_operations_duration_seconds histogram
# Sandbox creation time
containerd_sandbox_controller_operations_duration_seconds_bucket{
operation="create",
le="0.1"
} 850
containerd_sandbox_controller_operations_duration_seconds_bucket{
operation="create",
le="0.5"
} 1200
containerd_sandbox_controller_operations_duration_seconds_sum{
operation="create"
} 456.78
containerd_sandbox_controller_operations_duration_seconds_count{
operation="create"
} 1245
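The histogram sample can be read by hand. The sketch below computes the mean creation time from `_sum` and `_count` and estimates quantiles by linear interpolation inside cumulative buckets, which is similar in spirit to what PromQL's `histogram_quantile` does. The function `estimate_quantile` is a hypothetical helper, and it works on the raw cumulative counts shown above rather than on rates, so treat it as an illustration of the arithmetic only.

```python
# Cumulative bucket counts, sum, and count copied from the sample above.
BUCKETS = [(0.1, 850), (0.5, 1200)]  # (upper bound in seconds, cumulative count)
DURATION_SUM = 456.78
DURATION_COUNT = 1245


def estimate_quantile(q, buckets, total_count):
    """Estimate the q-quantile by linear interpolation within a bucket.

    `buckets` must be sorted by upper bound and hold cumulative counts.
    Returns None when the quantile lies beyond the last bucket listed.
    """
    rank = q * total_count
    lower_bound, lower_count = 0.0, 0
    for upper_bound, cumulative in buckets:
        if rank <= cumulative:
            if cumulative == lower_count:
                return upper_bound  # empty bucket, nothing to interpolate
            fraction = (rank - lower_count) / (cumulative - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return None


mean_seconds = DURATION_SUM / DURATION_COUNT
p50 = estimate_quantile(0.50, BUCKETS, DURATION_COUNT)
p95 = estimate_quantile(0.95, BUCKETS, DURATION_COUNT)
p99 = estimate_quantile(0.99, BUCKETS, DURATION_COUNT)
print(f"mean={mean_seconds:.3f}s p50={p50:.3f}s p95={p95:.3f}s p99={p99}")
```

For the sample data, the median estimate falls inside the first bucket while the mean is several times larger, which suggests that the 45 observations above 0.5 seconds contribute a large share of the total time. The 99th percentile cannot be estimated from the two buckets shown, so the helper returns `None`.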
Number of failed sandbox operations
# TYPE containerd_sandbox_controller_operations_errors_total counter
containerd_sandbox_controller_operations_errors_total{
operation="create"
} 12
containerd_sandbox_controller_operations_errors_total{
operation="start"
} 5
Sandbox metadata store operations
# TYPE containerd_sandbox_store_operations_total counter
# Store sandbox metadata
containerd_sandbox_store_operations_total{
operation="create"
}
# Look up sandbox metadata
containerd_sandbox_store_operations_total{
operation="get"
}
# Update sandbox metadata
containerd_sandbox_store_operations_total{
operation="update"
}
# Delete sandbox metadata
containerd_sandbox_store_operations_total{
operation="delete"
}
# List sandboxes
containerd_sandbox_store_operations_total{
operation="list"
}
It is important to understand how these concepts relate to each other:
┌─────────────── Pod (Kubernetes) ────────────────┐
│                                                 │
│  ┌─────────── Sandbox (containerd) ──────────┐  │
│  │                                           │  │
│  │  ┌──── Pause Container ────┐              │  │
│  │  │ Network Namespace       │              │  │
│  │  │ IPC Namespace           │              │  │
│  │  └─────────────────────────┘              │  │
│  │                                           │  │
│  │  ┌──── App Container (Task) ────┐         │  │
│  │  │ Container ID: abc123         │         │  │
│  │  │ Image: nginx:1.25            │         │  │
│  │  └──────────────────────────────┘         │  │
│  │                                           │  │
│  │  ┌──── Sidecar (Task) ──────────┐         │  │
│  │  │ Container ID: def456         │         │  │
│  │  │ Image: envoy:1.28            │         │  │
│  │  └──────────────────────────────┘         │  │
│  │                                           │  │
│  └───────────────────────────────────────────┘  │
│                                                 │
└─────────────────────────────────────────────────┘
Terminology:
# 1. Sandbox creation
kubelet → containerd: RunPodSandbox()
↓
containerd: sandbox_controller.Create()
↓
containerd_sandbox_controller_operations_total{operation="create"} ++
# 2. Container creation and start (inside the sandbox)
kubelet → containerd: CreateContainer()
kubelet → containerd: StartContainer()
# 3. Sandbox stop
kubelet → containerd: StopPodSandbox()
↓
containerd: sandbox_controller.Stop()
↓
containerd_sandbox_controller_operations_total{operation="stop"} ++
# 4. Sandbox deletion
kubelet → containerd: RemovePodSandbox()
↓
containerd: sandbox_controller.Delete()
↓
containerd_sandbox_controller_operations_total{operation="delete"} ++
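The flow above can be mirrored in a toy model to show how each lifecycle call moves the counters. This is a sketch only: `ToySandboxController` is a hypothetical class written for illustration and does not reflect containerd's actual implementation or API.

```python
from collections import Counter


class ToySandboxController:
    """Toy model of the create/stop/delete flow; not containerd's real code."""

    def __init__(self):
        # Plays the role of containerd_sandbox_controller_operations_total.
        self.operations_total = Counter()
        self.sandboxes = {}  # sandbox id -> state

    def create(self, sandbox_id):
        # Step 1: RunPodSandbox() leads to a controller create.
        self.sandboxes[sandbox_id] = "ready"
        self.operations_total["create"] += 1

    def stop(self, sandbox_id):
        # Step 3: StopPodSandbox() leads to a controller stop.
        self.sandboxes[sandbox_id] = "stopped"
        self.operations_total["stop"] += 1

    def delete(self, sandbox_id):
        # Step 4: RemovePodSandbox() leads to a controller delete.
        del self.sandboxes[sandbox_id]
        self.operations_total["delete"] += 1


controller = ToySandboxController()
controller.create("pod-a")
controller.create("pod-b")
controller.stop("pod-a")
controller.delete("pod-a")

# One sandbox remains, which equals create minus delete.
remaining = len(controller.sandboxes)
print(remaining, dict(controller.operations_total))
```

In this model the number of sandboxes still present always equals the create counter minus the delete counter, which is the relationship that create/delete comparisons of these counters rely on.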
# Sandbox creations per second
rate(containerd_sandbox_controller_operations_total{
operation="create"
}[5m])
# Average sandbox creation time (seconds)
rate(containerd_sandbox_controller_operations_duration_seconds_sum{
operation="create"
}[5m])
/
rate(containerd_sandbox_controller_operations_duration_seconds_count{
operation="create"
}[5m])
# Sandbox creation failure rate (%)
(
rate(containerd_sandbox_controller_operations_errors_total{
operation="create"
}[5m])
/
rate(containerd_sandbox_controller_operations_total{
operation="create"
}[5m])
) * 100
# 95th percentile sandbox creation time (seconds)
histogram_quantile(0.95,
rate(containerd_sandbox_controller_operations_duration_seconds_bucket{
operation="create"
}[5m])
)
# Created sandboxes minus deleted sandboxes
(
containerd_sandbox_controller_operations_total{operation="create"}
-
containerd_sandbox_controller_operations_total{operation="delete"}
)
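Queries like these can also be sent to Prometheus programmatically through its instant-query endpoint, `/api/v1/query`. The sketch below only builds the request URL for the failure-rate expression and performs no network call. The server address and both helper names are assumptions made for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: a Prometheus server is reachable at this address.
PROMETHEUS_URL = "http://localhost:9090"


def failure_rate_query(operation, window="5m"):
    """Build the failure-rate (%) expression for one sandbox operation (hypothetical helper)."""
    selector = f'{{operation="{operation}"}}'
    errors = f"rate(containerd_sandbox_controller_operations_errors_total{selector}[{window}])"
    total = f"rate(containerd_sandbox_controller_operations_total{selector}[{window}])"
    return f"({errors} / {total}) * 100"


def instant_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the URL for an instant query; the expression is URL-encoded."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


query = failure_rate_query("create")
url = instant_query_url(query)
print(url)

# Decoding the URL gives back the original expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Encoding the expression with `urlencode` matters because PromQL uses braces, quotes, and brackets that are not safe in a raw URL. The resulting URL can be fetched with any HTTP client.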
| Sandbox metric | Corresponding gRPC metric | Description |
|---|---|---|
| sandbox_controller_operations{operation="create"} | grpc_server_handled_total{grpc_method="RunPodSandbox"} | Pod sandbox creation |
| sandbox_controller_operations{operation="stop"} | grpc_server_handled_total{grpc_method="StopPodSandbox"} | Pod sandbox stop |
| sandbox_controller_operations{operation="delete"} | grpc_server_handled_total{grpc_method="RemovePodSandbox"} | Pod sandbox deletion |

Differences:
panels:
  - title: "Sandbox Creation Rate"
    expr: |
      rate(containerd_sandbox_controller_operations_total{
        operation="create"
      }[5m])
    unit: "ops/s"
  - title: "Average Sandbox Creation Time"
    expr: |
      rate(containerd_sandbox_controller_operations_duration_seconds_sum{
        operation="create"
      }[5m])
      /
      rate(containerd_sandbox_controller_operations_duration_seconds_count{
        operation="create"
      }[5m])
    unit: "seconds"
  - title: "Sandbox Creation Failures"
    expr: |
      rate(containerd_sandbox_controller_operations_errors_total{
        operation="create"
      }[5m])
    alert: "if value > 0.1 for 5m"
  - title: "Sandbox Lifecycle Operations"
    expr: |
      sum by (operation) (
        rate(containerd_sandbox_controller_operations_total[5m])
      )
    legend: "{{operation}}"
groups:
  - name: containerd_sandbox
    rules:
      # Sandbox creation is slow
      - alert: SlowSandboxCreation
        expr: |
          rate(containerd_sandbox_controller_operations_duration_seconds_sum{
            operation="create"
          }[5m])
          /
          rate(containerd_sandbox_controller_operations_duration_seconds_count{
            operation="create"
          }[5m]) > 5
        for: 10m
        annotations:
          summary: "Sandbox creation taking over 5 seconds"
      # Sandbox creation failure rate is high
      - alert: HighSandboxCreationFailures
        expr: |
          (
            rate(containerd_sandbox_controller_operations_errors_total{
              operation="create"
            }[5m])
            /
            rate(containerd_sandbox_controller_operations_total{
              operation="create"
            }[5m])
          ) > 0.05
        for: 5m
        annotations:
          summary: "Sandbox creation failure rate over 5%"
      # Sandbox create/delete imbalance: more than 10 net creations in the last hour
      - alert: SandboxLeakDetected
        expr: |
          (
            increase(containerd_sandbox_controller_operations_total{
              operation="create"
            }[1h])
            -
            increase(containerd_sandbox_controller_operations_total{
              operation="delete"
            }[1h])
          ) > 10
        for: 30m
        annotations:
          summary: "Potential sandbox leak detected"
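The leak rule compares how many sandboxes were created and deleted over a window. The sketch below shows that arithmetic on raw counter samples, including the usual treatment of a counter reset, where a drop in value is taken to mean the process restarted. The sample values, the threshold constant, and the helper `counter_increase` are hypothetical. Prometheus additionally extrapolates to the window boundaries, which this sketch does not attempt.

```python
def counter_increase(samples):
    """Total increase across consecutive counter samples.

    A drop between two samples is treated as a counter reset (for example
    after a containerd restart), so the post-reset value counts as new increase.
    """
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter restarted from zero
    return increase


# Hypothetical scrapes over one window; both counters reset once.
created_samples = [100, 130, 5, 20]
deleted_samples = [90, 110, 2, 8]

LEAK_THRESHOLD = 10  # assumption: net sandboxes per window considered suspicious

net_created = counter_increase(created_samples) - counter_increase(deleted_samples)
leak_suspected = net_created > LEAK_THRESHOLD
print(net_created, leak_suspected)  # 22.0 True
```

Here 50 creations against 28 deletions leave 22 net sandboxes in the window, which exceeds the threshold. Workloads that legitimately scale up will also trip such a check, so the `for:` duration in the rule is what separates a sustained imbalance from a normal rollout.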
The containerd_sandbox_* metrics:
These are key metrics for diagnosing Pod-level performance problems in Kubernetes environments.