[K8S] SandBox

진웅 · February 5, 2026

K8S Basics

Sandbox Concept

In Kubernetes, a sandbox is the isolated execution environment of a Pod.

Pod (Kubernetes abstraction)
└── Sandbox (containerd implementation)
    ├── Pause Container (holds the network/IPC namespaces)
    ├── App Container 1
    ├── App Container 2
    └── Sidecar Container

containerd_sandbox_* Metrics in Detail

The metrics exposed vary by containerd version, so confirm the names and labels below against your node's /v1/metrics output.

1. containerd_sandbox_controller_operations_total

Number of sandbox create, start, stop, and delete operations.

# TYPE containerd_sandbox_controller_operations_total counter

# Sandbox create operations
containerd_sandbox_controller_operations_total{
  operation="create"
}

# Sandbox start operations
containerd_sandbox_controller_operations_total{
  operation="start"
}

# Sandbox stop operations
containerd_sandbox_controller_operations_total{
  operation="stop"
}

# Sandbox delete operations
containerd_sandbox_controller_operations_total{
  operation="delete"
}

Example values:

containerd_sandbox_controller_operations_total{operation="create"} 1245
containerd_sandbox_controller_operations_total{operation="start"} 1245
containerd_sandbox_controller_operations_total{operation="stop"} 1189
containerd_sandbox_controller_operations_total{operation="delete"} 1189
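
As a rough illustration of how these sample lines can be read, the sketch below parses the example values above and computes how many created sandboxes have not yet been deleted. The regular expression only handles the single-label lines shown here; it is not a full exposition-format parser, and `parse_operations` is a hypothetical helper name.

```python
import re

# The example values from this post, in Prometheus text exposition format.
SAMPLE = '''\
containerd_sandbox_controller_operations_total{operation="create"} 1245
containerd_sandbox_controller_operations_total{operation="start"} 1245
containerd_sandbox_controller_operations_total{operation="stop"} 1189
containerd_sandbox_controller_operations_total{operation="delete"} 1189
'''

# Matches only lines of the form: name{operation="..."} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{operation="(?P<op>[^"]+)"\}\s+(?P<value>\S+)$'
)


def parse_operations(text):
    """Return {operation: value} for the single-label sample lines in text."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            counts[match.group("op")] = float(match.group("value"))
    return counts


counts = parse_operations(SAMPLE)
not_yet_deleted = counts["create"] - counts["delete"]
print(not_yet_deleted)  # 56.0
```

With these values, 1245 creations minus 1189 deletions leaves 56 sandboxes that have been created but not deleted since the counters started.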

2. containerd_sandbox_controller_operations_duration_seconds

Duration of sandbox operations (histogram).

# TYPE containerd_sandbox_controller_operations_duration_seconds histogram

# Sandbox creation time
containerd_sandbox_controller_operations_duration_seconds_bucket{
  operation="create",
  le="0.1"
} 850

containerd_sandbox_controller_operations_duration_seconds_bucket{
  operation="create",
  le="0.5"
} 1200

containerd_sandbox_controller_operations_duration_seconds_sum{
  operation="create"
} 456.78

containerd_sandbox_controller_operations_duration_seconds_count{
  operation="create"
} 1245
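
To make the histogram values concrete, here is a minimal sketch of how an average and an estimated quantile can be derived from them. The interpolation follows the same linear-within-bucket idea as PromQL's `histogram_quantile`, but it is a simplified stand-in, not Prometheus's implementation. The `+Inf` bucket is an assumption: it is not shown above, so it is set equal to the `_count` value.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper bound
    and ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # fall back to the highest finite bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return float("nan")


# Values from the example above; the +Inf bucket is assumed to equal _count.
buckets = [(0.1, 850), (0.5, 1200), (math.inf, 1245)]
duration_sum, duration_count = 456.78, 1245

average = duration_sum / duration_count
p95 = histogram_quantile(0.95, buckets)
print(round(average, 4), round(p95, 4))
```

Because only two finite buckets are shown, any quantile that falls above the 0.5 s bucket can only be reported as 0.5 s; finer buckets would be needed for a sharper estimate.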

3. containerd_sandbox_controller_operations_errors_total

Number of failed sandbox operations.

# TYPE containerd_sandbox_controller_operations_errors_total counter

containerd_sandbox_controller_operations_errors_total{
  operation="create"
} 12

containerd_sandbox_controller_operations_errors_total{
  operation="start"
} 5

4. containerd_sandbox_store_operations_total

Operations on the sandbox metadata store.

# TYPE containerd_sandbox_store_operations_total counter

# Store sandbox metadata
containerd_sandbox_store_operations_total{
  operation="create"
}

# Read sandbox metadata
containerd_sandbox_store_operations_total{
  operation="get"
}

# Update sandbox metadata
containerd_sandbox_store_operations_total{
  operation="update"
}

# Delete sandbox metadata
containerd_sandbox_store_operations_total{
  operation="delete"
}

# List sandboxes
containerd_sandbox_store_operations_total{
  operation="list"
}

Sandbox vs Task vs Container

It is important to understand how these concepts relate:

┌─────────────── Pod (Kubernetes) ───────────────┐
│                                                 │
│  ┌─────────── Sandbox (containerd) ─────────┐ │
│  │                                           │ │
│  │  ┌──── Pause Container ────┐            │ │
│  │  │ Network Namespace        │            │ │
│  │  │ IPC Namespace            │            │ │
│  │  └──────────────────────────┘            │ │
│  │                                           │ │
│  │  ┌──── App Container (Task) ────┐       │ │
│  │  │ Container ID: abc123          │       │ │
│  │  │ Image: nginx:1.25             │       │ │
│  │  └───────────────────────────────┘       │ │
│  │                                           │ │
│  │  ┌──── Sidecar (Task) ──────────┐       │ │
│  │  │ Container ID: def456          │       │ │
│  │  │ Image: envoy:1.28             │       │ │
│  │  └───────────────────────────────┘       │ │
│  │                                           │ │
│  └───────────────────────────────────────────┘ │
│                                                 │
└─────────────────────────────────────────────────┘

Terminology:

  • Pod: the Kubernetes-level abstraction
  • Sandbox: containerd's implementation of a Pod (pause container + namespaces)
  • Container: an application container (image + configuration)
  • Task: the running process of a container
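
The relationship between these four terms can be sketched as a small data model. This is only an illustration of the hierarchy described above, not containerd's actual types; the class names, the pod name, and the PID are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Task:
    """The running process of a container."""
    pid: int


@dataclass
class Container:
    """An application container: image plus configuration."""
    container_id: str
    image: str
    task: Optional[Task] = None  # None until the container has been started

    def is_running(self) -> bool:
        return self.task is not None


@dataclass
class Sandbox:
    """The pause container's namespaces plus the containers that share them."""
    pause_namespaces: Tuple[str, ...] = ("network", "ipc")
    containers: List[Container] = field(default_factory=list)


@dataclass
class Pod:
    """The Kubernetes-level abstraction that the sandbox implements."""
    name: str
    sandbox: Sandbox


pod = Pod(
    name="web",  # hypothetical pod name
    sandbox=Sandbox(containers=[
        Container("abc123", "nginx:1.25", task=Task(pid=4321)),  # hypothetical PID
        Container("def456", "envoy:1.28"),  # created but not started: no task yet
    ]),
)
running = [c.container_id for c in pod.sandbox.containers if c.is_running()]
print(running)  # ['abc123']
```

The point of the model is that a container can exist without a task, while a task always belongs to exactly one container inside a sandbox.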

Sandbox Lifecycle

# 1. Create the sandbox
kubelet → containerd: RunPodSandbox()
          ↓
containerd: sandbox_controller.Create()
          ↓
containerd_sandbox_controller_operations_total{operation="create"} ++

# 2. Create and start containers (inside the sandbox)
kubelet → containerd: CreateContainer()
kubelet → containerd: StartContainer()

# 3. Stop the sandbox
kubelet → containerd: StopPodSandbox()
          ↓
containerd: sandbox_controller.Stop()
          ↓
containerd_sandbox_controller_operations_total{operation="stop"} ++

# 4. Delete the sandbox
kubelet → containerd: RemovePodSandbox()
          ↓
containerd: sandbox_controller.Delete()
          ↓
containerd_sandbox_controller_operations_total{operation="delete"} ++
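
The lifecycle above can be mimicked with a tiny simulation to see how the counters move. `SandboxControllerSim` is a hypothetical stand-in that only counts calls; it does not reflect containerd's real controller interface.

```python
from collections import Counter


class SandboxControllerSim:
    """Hypothetical stand-in for the sandbox controller; it only counts calls."""

    def __init__(self):
        self.operations_total = Counter()  # mirrors the operation label values
        self.sandboxes = {}                # sandbox id -> state

    def create(self, sandbox_id):
        self.sandboxes[sandbox_id] = "ready"
        self.operations_total["create"] += 1

    def stop(self, sandbox_id):
        self.sandboxes[sandbox_id] = "stopped"
        self.operations_total["stop"] += 1

    def delete(self, sandbox_id):
        del self.sandboxes[sandbox_id]
        self.operations_total["delete"] += 1


sim = SandboxControllerSim()
for sandbox_id in ("pod-a", "pod-b", "pod-c"):
    sim.create(sandbox_id)   # corresponds to RunPodSandbox
sim.stop("pod-a")            # corresponds to StopPodSandbox
sim.delete("pod-a")          # corresponds to RemovePodSandbox

remaining = sim.operations_total["create"] - sim.operations_total["delete"]
print(dict(sim.operations_total), remaining)
```

In this run, three creations and one deletion leave two sandboxes, which matches the number of entries still tracked by the simulation.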

Practical Monitoring Queries

Sandbox creation rate

# Sandboxes created per second
rate(containerd_sandbox_controller_operations_total{
  operation="create"
}[5m])
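
For intuition about what `rate()` returns, the sketch below computes a per-second increase from raw counter samples and treats any drop in the value as a counter reset. It is a simplification: the real PromQL `rate()` also extrapolates to the window boundaries, which is omitted here. The sample values are made up for illustration.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the given samples.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A value lower than its predecessor is treated as a counter reset.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed
    increase = 0.0
    prev = samples[0][1]
    for _, value in samples[1:]:
        if value < prev:
            increase += value         # counter restarted from 0
        else:
            increase += value - prev
        prev = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes of the create counter, one per minute over 5 minutes.
steady = [(0, 1200), (60, 1206), (120, 1212), (180, 1218), (240, 1224), (300, 1230)]
print(simple_rate(steady))  # 0.1 sandboxes per second

# Hypothetical scrapes where containerd restarted between the 2nd and 3rd sample.
with_reset = [(0, 100), (60, 110), (120, 5), (180, 15)]
print(simple_rate(with_reset))
```

Handling resets this way is why `rate()` stays meaningful across containerd restarts, whereas subtracting raw counter values does not.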

Average sandbox creation time

rate(containerd_sandbox_controller_operations_duration_seconds_sum{
  operation="create"
}[5m])
/
rate(containerd_sandbox_controller_operations_duration_seconds_count{
  operation="create"
}[5m])

Sandbox creation failure rate

# Sandbox creation failure rate (%)
(
  rate(containerd_sandbox_controller_operations_errors_total{
    operation="create"
  }[5m])
  /
  rate(containerd_sandbox_controller_operations_total{
    operation="create"
  }[5m])
) * 100

Sandbox creation P95 latency

histogram_quantile(0.95,
  rate(containerd_sandbox_controller_operations_duration_seconds_bucket{
    operation="create"
  }[5m])
)

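Estimating the number of active sandboxes

# Sandboxes created minus sandboxes deleted.
# ignoring(operation) is needed because the two sides carry different
# operation label values and would otherwise never match.
# Counters reset when containerd restarts, so this is only an estimate.
(
  containerd_sandbox_controller_operations_total{operation="create"}
  - ignoring(operation)
  containerd_sandbox_controller_operations_total{operation="delete"}
)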
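
A query like this can be sent to Prometheus through its HTTP API instant-query endpoint, `/api/v1/query`. The sketch below only builds the request URL; the base URL is an assumption for a local Prometheus, and `instant_query_url` is a hypothetical helper. The expression uses `ignoring(operation)` so that the create and delete series match despite their different `operation` label values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PROMETHEUS_URL = "http://localhost:9090"  # assumption: a locally reachable Prometheus

ACTIVE_SANDBOXES = (
    'containerd_sandbox_controller_operations_total{operation="create"}'
    ' - ignoring(operation) '
    'containerd_sandbox_controller_operations_total{operation="delete"}'
)


def instant_query_url(base_url, expr):
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


url = instant_query_url(PROMETHEUS_URL, ACTIVE_SANDBOXES)
print(url)

# Decoding the URL again gives back the original expression unchanged.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == ACTIVE_SANDBOXES)  # True
```

Fetching the URL (for example with `urllib.request.urlopen`) returns a JSON document whose `data.result` array holds one entry per matching series.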

Sandbox vs gRPC Metrics Comparison

| Sandbox metric | Corresponding gRPC metric | Description |
|---|---|---|
| containerd_sandbox_controller_operations_total{operation="create"} | grpc_server_handled_total{grpc_method="RunPodSandbox"} | Pod sandbox creation |
| containerd_sandbox_controller_operations_total{operation="stop"} | grpc_server_handled_total{grpc_method="StopPodSandbox"} | Pod sandbox stop |
| containerd_sandbox_controller_operations_total{operation="delete"} | grpc_server_handled_total{grpc_method="RemovePodSandbox"} | Pod sandbox deletion |

Differences:

  • gRPC metrics: CRI API level (kubelet → containerd communication)
  • Sandbox metrics: containerd internal implementation level

Grafana Dashboard Panel Example

panels:
  - title: "Sandbox Creation Rate"
    expr: |
      rate(containerd_sandbox_controller_operations_total{
        operation="create"
      }[5m])
    unit: "ops/s"
  
  - title: "Average Sandbox Creation Time"
    expr: |
      rate(containerd_sandbox_controller_operations_duration_seconds_sum{
        operation="create"
      }[5m])
      /
      rate(containerd_sandbox_controller_operations_duration_seconds_count{
        operation="create"
      }[5m])
    unit: "seconds"
  
  - title: "Sandbox Creation Failures"
    expr: |
      rate(containerd_sandbox_controller_operations_errors_total{
        operation="create"
      }[5m])
    alert: "if value > 0.1 for 5m"
  
  - title: "Sandbox Lifecycle Operations"
    expr: |
      sum by (operation) (
        rate(containerd_sandbox_controller_operations_total[5m])
      )
    legend: "{{operation}}"

Alerting Rule Examples

groups:
- name: containerd_sandbox
  rules:
  # Sandbox creation is slow
  - alert: SlowSandboxCreation
    expr: |
      rate(containerd_sandbox_controller_operations_duration_seconds_sum{
        operation="create"
      }[5m])
      /
      rate(containerd_sandbox_controller_operations_duration_seconds_count{
        operation="create"
      }[5m]) > 5
    for: 10m
    annotations:
      summary: "Sandbox creation taking over 5 seconds"
  
  # Sandbox creation failure rate is high
  - alert: HighSandboxCreationFailures
    expr: |
      (
        rate(containerd_sandbox_controller_operations_errors_total{
          operation="create"
        }[5m])
        /
        rate(containerd_sandbox_controller_operations_total{
          operation="create"
        }[5m])
      ) > 0.05
    for: 5m
    annotations:
      summary: "Sandbox creation failure rate over 5%"
  
  # Sandbox create/delete imbalance
  # (ignoring(operation) lets the create and delete series match)
  - alert: SandboxLeakDetected
    expr: |
      (
        rate(containerd_sandbox_controller_operations_total{
          operation="create"
        }[1h])
        - ignoring(operation)
        rate(containerd_sandbox_controller_operations_total{
          operation="delete"
        }[1h])
      ) > 10
    for: 30m
    annotations:
      summary: "Potential sandbox leak detected"
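
The `for:` clause in these rules means an alert fires only after its expression has stayed true for the whole duration; until then it is pending. The sketch below walks through that state logic for the 5% failure-rate rule. It is a simplified model of the behavior, not Prometheus code, and the evaluation interval and sample values are hypothetical.

```python
def alert_states(values, threshold, for_seconds, interval_seconds):
    """Return the alert state at each evaluation: inactive, pending, or firing."""
    states = []
    active_since = None  # time at which the condition last became true
    for i, value in enumerate(values):
        now = i * interval_seconds
        if value is not None and value > threshold:
            if active_since is None:
                active_since = now
            held_for = now - active_since
            states.append("firing" if held_for >= for_seconds else "pending")
        else:
            active_since = None  # condition broke: the pending timer restarts
            states.append("inactive")
    return states


# Hypothetical failure ratios, evaluated once per minute.
ratios = [0.01, 0.06, 0.07, 0.08, 0.09, 0.06, 0.07, 0.02]
states = alert_states(ratios, threshold=0.05, for_seconds=300, interval_seconds=60)
print(states)
```

Here the ratio exceeds 5% from the second evaluation onward, so the alert stays pending for five minutes and fires on the seventh evaluation, then returns to inactive once the ratio drops.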

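Summary

The containerd_sandbox_* metrics support:

  • 🏗️ Monitoring the Pod sandbox lifecycle
  • ⏱️ Measuring Pod creation/deletion performance
  • 🐛 Debugging Pod startup failures
  • 📊 Tracking cluster Pod churn

They are key metrics for diagnosing Pod-level performance problems in a Kubernetes environment.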
