We noticed that some of the ceph OSD pods had an abnormally high restart count.
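The pod list below was gathered with a command along these lines (assuming Rook's default rook-ceph namespace):
kubectl get pods -n rook-ceph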
NAME READY STATUS RESTARTS
csi-cephfsplugin-fkqc5 3/3 Running 0
csi-cephfsplugin-h2r9z 3/3 Running 2
csi-cephfsplugin-provisioner-6c6bc95d4b-kjl8f 6/6 Running 16
csi-cephfsplugin-provisioner-6c6bc95d4b-rxk6d 6/6 Running 0
csi-cephfsplugin-qn4jp 3/3 Running 3
csi-rbdplugin-hknnk 3/3 Running 0
csi-rbdplugin-kk6m6 3/3 Running 3
csi-rbdplugin-provisioner-6ff4d774d4-6bwn5 6/6 Running 0
csi-rbdplugin-provisioner-6ff4d774d4-vttsx 6/6 Running 17
csi-rbdplugin-sq8ws 3/3 Running 1
rook-ceph-crashcollector-k8snode01ps-7bf76fb968-6r6j7 1/1 Running 0
rook-ceph-crashcollector-k8snode02ps-6564549d8f-vxxrj 1/1 Running 0
rook-ceph-crashcollector-k8snode03ps-7b5c88744-j97dh 1/1 Running 0
rook-ceph-mds-myfs-a-69cbd955cc-zg6mc 1/1 Running 0
rook-ceph-mds-myfs-b-67b559dd77-56rlq 1/1 Running 0
rook-ceph-mgr-a-5b98749d8-sx8hl 1/1 Running 0
rook-ceph-mon-a-54dc9758cc-b74gc 1/1 Running 1
rook-ceph-operator-6df54ddc6b-9zgwf 1/1 Running 0
rook-ceph-osd-0-7cbb8c7b86-x27n5 1/1 Running 0
rook-ceph-osd-1-6976f8fb4b-s8v8b 1/1 Running 1463
rook-ceph-osd-2-74cf858d5c-94jtp 1/1 Running 0
rook-ceph-osd-3-566dbbb44d-tk6xq 1/1 Running 3172
rook-ceph-osd-prepare-devshic01ps-xpq5d 0/1 Completed 0
rook-ceph-osd-prepare-devshic02ps-6n95c 0/1 Completed 0
rook-ceph-tools-676879fd44-slsrn 1/1 Running 0
rook-discover-2xbf7 1/1 Running 0
rook-discover-9hxqw 1/1 Running 1
rook-discover-ldvks 1/1 Running 0
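To find out why an OSD pod such as rook-ceph-osd-3 keeps restarting, the previous container's logs and the pod events are worth a first look, for example (again assuming the rook-ceph namespace; the pod name is taken from the listing above):
kubectl logs -n rook-ceph rook-ceph-osd-3-566dbbb44d-tk6xq --previous
kubectl describe pod -n rook-ceph rook-ceph-osd-3-566dbbb44d-tk6xq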
kubectl exec -it -n rook-ceph rook-ceph-tools-676879fd44-slsrn -- /bin/bash
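If the exact toolbox pod name is not at hand, the exec can also target the deployment directly (assuming the standard rook-ceph-tools deployment name):
kubectl exec -it -n rook-ceph deploy/rook-ceph-tools -- /bin/bash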
Inside the toolbox, the cluster status showed HEALTH_WARN.
# ceph -s
  cluster:
    id:     a7
    health: HEALTH_WARN
            **1 daemons have recently crashed**

  services:
    mon: 1 daemons, quorum a (age 4d)
    mgr: a(active, since 2w)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 4 osds: 4 up (since 4d), 4 in (since 4d)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 7.93k objects, 19 GiB
    usage:   42 GiB used, 3.9 TiB / 3.9 TiB avail
    pgs:     97 active+clean

  io:
    client: 16 KiB/s rd, 117 KiB/s wr, 3 op/s rd, 3 op/s wr
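To see exactly which daemon the warning refers to and when it crashed, the detailed health output can be checked:
# ceph health detail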
Ceph's crash module collects daemon crashdumps and stores the information in the Ceph cluster.
# ceph crash ls
ID ENTITY NEW
2023-01-02T00:31:54.551063Z_bfde5f0b-3510-4c38-831f-e43833fc3f79 mon.a *
2023-01-20T04:20:42.360745Z_8e8f9804-2d7a-45f6-859f-cdd6bf4c7e28 mon.a *
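Before archiving, the details of an individual report (daemon, timestamp, backtrace) can be inspected with ceph crash info, using an ID from the list above:
# ceph crash info 2023-01-20T04:20:42.360745Z_8e8f9804-2d7a-45f6-859f-cdd6bf4c7e28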
Archiving a crash report acknowledges it, so it no longer counts as new toward the health warning:
# ceph crash archive 2023-01-20T04:20:42.360745Z_8e8f9804-2d7a-45f6-859f-cdd6bf4c7e28
# ceph crash ls
ID ENTITY NEW
2023-01-02T00:31:54.551063Z_bfde5f0b-3510-4c38-831f-e43833fc3f79 mon.a *
2023-01-20T04:20:42.360745Z_8e8f9804-2d7a-45f6-859f-cdd6bf4c7e28 mon.a
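If many new crash reports have accumulated, they can all be acknowledged in one go:
# ceph crash archive-all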
# ceph health
HEALTH_OK
---
# ceph -s
  cluster:
    id:     a7
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum a (age 4d)
    mgr: a(active, since 2w)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 4 osds: 4 up (since 4d), 4 in (since 4d)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 7.93k objects, 19 GiB
    usage:   42 GiB used, 3.9 TiB / 3.9 TiB avail
    pgs:     97 active+clean

  io:
    client: 17 KiB/s rd, 137 KiB/s wr, 3 op/s rd, 8 op/s wr
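Archived reports remain stored in the cluster. The crash module's warn_recent_interval option controls how long a crash counts as "recent" (two weeks by default), and old reports can be removed once they are no longer needed, for example keeping only the last 30 days (the number of days here is just an example):
# ceph crash prune 30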