ETCD Backup and Restore Test

Nam_JU · October 9, 2025

k8s

  • Backup setup for recovering the etcd cluster if it breaks

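1. What to know before running etcdctl commands

Item | Description | How to check
etcd endpoint address (IP:Port) | Address and port that etcd exposes to clients (2379 in this cluster) | cat /etc/etcd/etcd.conf or `ps -ef`
Certificate paths (CA / admin cert / admin key) | pem files required for TLS authentication | /etc/ssl/etcd/ssl/ or /etc/kubernetes/pki/etcd/
etcdctl API version | API version that etcdctl uses | ETCDCTL_API=3 (always set to v3)
etcd version | Installed etcd version (to check command option compatibility) | etcdctl version or etcd --version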

2. Checking ETCD

  • Check the etcd service
root@thk-master-1:~# systemctl status etcd.service 
● etcd.service - etcd
     Loaded: loaded (/etc/systemd/system/etcd.service; enabled; preset: enabled)
     Active: active (running) since Fri 2025-10-03 06:09:13 UTC; 4 days ago
   Main PID: 9085 (etcd)
      Tasks: 12 (limit: 19147)
     Memory: 667.9M (peak: 779.4M)
        CPU: 5h 42min 3.431s
     CGroup: /system.slice/etcd.service
             └─9085 /usr/local/bin/etcd
  • Check the etcd process
root@thk-master-1:~# ps -ef | grep etcd
root        9085       1  4 Oct03 ?        05:42:03 /usr/local/bin/etcd
root       16125   15986  7 Oct03 ?        08:25:58 kube-apiserver --advertise-address=172.10.10.200 --allow-privileged=true --anonymous-auth=True --apiserver-count=3 --authorization-mode=Node,RBAC --bind-address=:: --client-ca-file=/etc/kubernetes/ssl/ca.crt --default-not-ready-toleration-seconds=300 --default-unreachable-toleration-seconds=300 --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=False --enable-bootstrap-token-auth=true --endpoint-reconciler-type=lease --etcd-cafile=/etc/ssl/etcd/ssl/ca.pem --etcd-certfile=/etc/ssl/etcd/ssl/node-thk-master-1.pem --etcd-compaction-interval=5m0s --etcd-keyfile=/etc/ssl/etcd/ssl/node-thk-master-1-key.pem --etcd-servers=https://172.10.10.200:2379,https://172.10.10.123:2379,https://172.10.10.202:2379 --event-ttl=1h0m0s --kubelet-client-certificate=/etc/kubernetes/ssl/apiserver-kubelet-client.crt 
  • Check the etcd systemd unit file
root@thk-master-1:~# cat /etc/systemd/system/etcd.service
[Unit]
Description=etcd
After=network.target

[Service]
Type=notify
User=root
EnvironmentFile=/etc/etcd.env
ExecStart=/usr/local/bin/etcd
NotifyAccess=all
Restart=always
RestartSec=10s
LimitNOFILE=40000

[Install]
WantedBy=multi-user.target
  • Check the etcd configuration file
root@thk-master-1:~#  cat /etc/etcd.env
# Environment file for etcd 3.5.16
ETCD_DATA_DIR=/var/lib/etcd
ETCD_ADVERTISE_CLIENT_URLS=https://172.10.10.200:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.10.10.200:2380
ETCD_INITIAL_CLUSTER_STATE=existing
ETCD_METRICS=basic
ETCD_LISTEN_CLIENT_URLS=https://172.10.10.200:2379,https://127.0.0.1:2379
ETCD_ELECTION_TIMEOUT=5000
ETCD_HEARTBEAT_INTERVAL=250
ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
ETCD_LISTEN_PEER_URLS=https://172.10.10.200:2380
ETCD_NAME=thk-master-1
ETCD_PROXY=off
ETCD_INITIAL_CLUSTER=thk-master-1=https://172.10.10.200:2380,thk-master-2=https://172.10.10.123:2380,thk-master-3=https://172.10.10.202:2380
ETCD_AUTO_COMPACTION_RETENTION=8
ETCD_SNAPSHOT_COUNT=100000
# Flannel need etcd v2 API
ETCD_ENABLE_V2=true

# TLS settings
ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-thk-master-1.pem
ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-thk-master-1-key.pem
ETCD_CLIENT_CERT_AUTH=true

ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-thk-master-1.pem
ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-thk-master-1-key.pem
ETCD_PEER_CLIENT_CERT_AUTH=True

# CLI settings
ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem
ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-thk-master-1.pem

# ETCD 3.5.x issue
# https://groups.google.com/a/kubernetes.io/g/dev/c/B7gJs88XtQc/m/rSgNOzV2BwAJ?utm_medium=email&utm_source=footer
ETCD_EXPERIMENTAL_INITIAL_CORRUPT_CHECK=True

ETCD_EXPERIMENTAL_WATCH_PROGRESS_NOTIFY_INTERVAL=5s
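The CLI settings at the end of /etc/etcd.env already hold the endpoint and certificate paths that etcdctl needs. As a rough sketch, the helper below exports only those ETCDCTL_* lines into the current shell so the flags do not have to be retyped on every command. The name load_etcdctl_env is made up for illustration, and the sketch assumes the file holds plain KEY=value lines without quoting, as in the listing above.

```shell
# Sketch: export only the ETCDCTL_* lines from an etcd env file.
# load_etcdctl_env is a hypothetical helper, not part of etcd.
load_etcdctl_env() {
  local env_file line
  env_file="${1:-/etc/etcd.env}"
  [ -r "$env_file" ] || { echo "cannot read $env_file" >&2; return 1; }
  # "|| [ -n "$line" ]" keeps a last line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ETCDCTL_*=*) export "$line" ;;   # ETCD_* server settings are skipped
    esac
  done < "$env_file"
}

# Usage on a master node (root is needed to read the key file):
#   load_etcdctl_env /etc/etcd.env
#   etcdctl endpoint status -w table
```

Only the client-side variables are exported on purpose: loading the ETCD_* server settings into an interactive shell would also affect any etcd binary started from that shell.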
  • Check the existing backup snapshot
root@thk-master-1:~# cd /var/backups/
root@thk-master-1:/var/backups# ll
total 772
drwxr-xr-x  3 root root   4096 Oct  4 00:00 ./
drwxr-xr-x 13 root root   4096 Oct  3 03:45 ../
-rw-r--r--  1 root root  40960 Oct  4 00:00 alternatives.tar.0
-rw-r--r--  1 root root  37736 Oct  3 05:37 apt.extended_states.0
-rw-r--r--  1 root root      0 Oct  4 00:00 dpkg.arch.0
-rw-r--r--  1 root root   1518 Jun 26 12:55 dpkg.diversions.0
-rw-r--r--  1 root root    100 Jun 26 12:52 dpkg.statoverride.0
-rw-r--r--  1 root root 685461 Oct  3 05:37 dpkg.status.0
drw-------  3 root root   4096 Oct  3 06:09 etcd-2025-10-03_06:09:10/
root@thk-master-1:/var/backups# cd etcd-2025-10-03_06\:09\:10/
root@thk-master-1:/var/backups/etcd-2025-10-03_06:09:10# ls
member  snapshot.db

root@thk-master-1:/var/backups/etcd-2025-10-03_06:09:10# cat snapshot.db
(binary output omitted: snapshot.db is not a text file, so cat prints mostly unreadable bytes. The only legible fragments are clusterVersion 3.5.0 and the three members, thk-master-1, thk-master-2, and thk-master-3, with their peer URLs on port 2380.)

root@thk-master-1:/var/backups/etcd-2025-10-03_06:09:10# etcdutl snapshot status /var/backups/etcd-2025-10-03_06\:09\:10/snapshot.db  -w table
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 84c36aa6 |        0 |          8 |      20 kB |
+----------+----------+------------+------------+
root@thk-master-1:/var/backups/etcd-2025-10-03_06:09:10# 
  • Check etcd status
root@thk-master-1:/var/backups/etcd-2025-10-03_06:09:10# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/admin-thk-master-1.pem --key=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem endpoint status -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | 165748cefbffad9d |  3.5.16 |   59 MB |     false |      false |         5 |    3494579 |            3494579 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
root@thk-master-1:/var/backups/etcd-2025-10-03_06:09:10# 
ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.10.10.200:2379,https://172.10.10.123:2379,https://172.10.10.202:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-thk-master-1.pem \
  --key=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem \
  endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.10.10.200:2379 | 165748cefbffad9d |  3.5.16 |   59 MB |     false |      false |         5 |    4334275 |            4334275 |        |
| https://172.10.10.123:2379 | 58c4e282fe1a3b29 |  3.5.16 |   59 MB |     false |      false |         5 |    4334275 |            4334275 |        |
| https://172.10.10.202:2379 | 7c7b975a758bd6b9 |  3.5.16 |   59 MB |      true |      false |         5 |    4334275 |            4334275 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
root@thk-master-1:~# 

Check the etcd leader

root@thk-master-1:~# ETCDCTL_API=3 etcdctl   --endpoints=https://172.10.10.200:2379,https://172.10.10.123:2379,https://172.10.10.202:2379   --cacert=/etc/ssl/etcd/ssl/ca.pem   --cert=/etc/ssl/etcd/ssl/admin-thk-master-1.pem   --key=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem   endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.10.10.200:2379 | 165748cefbffad9d |  3.5.16 |   59 MB |     false |      false |         5 |    4335173 |            4335173 |        |
| https://172.10.10.123:2379 | 58c4e282fe1a3b29 |  3.5.16 |   59 MB |     false |      false |         5 |    4335173 |            4335173 |        |
| https://172.10.10.202:2379 | 7c7b975a758bd6b9 |  3.5.16 |   59 MB |      true |      false |         5 |    4335173 |            4335173 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
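With three members the leader is easy to spot by eye, but a script needs it as a value. A minimal sketch, assuming the table layout printed above: the helper (hypothetical name leader_endpoint) locates the IS LEADER column from the header row and prints the endpoint of the row marked true.

```shell
# Sketch: print the endpoint whose IS LEADER column is "true".
# Reads `etcdctl endpoint status -w table` output on stdin.
leader_endpoint() {
  awk -F'|' '
    /^\|/ {
      # Trim the padding around every cell of a table row.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      # The first row is the header: remember where IS LEADER sits.
      if (!col) {
        for (i = 1; i <= NF; i++) if ($i == "IS LEADER") col = i
        next
      }
      if ($col == "true") print $2
    }'
}

# Usage (ETCDCTL_* variables or the TLS flags must already be set):
#   etcdctl endpoint status -w table | leader_endpoint
```

Looking the column up by name rather than by position keeps the sketch from silently breaking if another etcd version prints additional columns.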

Check etcd cluster health

root@thk-master-1:~# ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.10.10.200:2379,https://172.10.10.123:2379,https://172.10.10.202:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-thk-master-1.pem \
  --key=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem \
  endpoint health -w table
+----------------------------+--------+------------+-------+
|          ENDPOINT          | HEALTH |    TOOK    | ERROR |
+----------------------------+--------+------------+-------+
| https://172.10.10.202:2379 |   true | 8.443392ms |       |
| https://172.10.10.123:2379 |   true | 8.440635ms |       |
| https://172.10.10.200:2379 |   true | 8.528177ms |       |
+----------------------------+--------+------------+-------+
root@thk-master-1:~# 


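3. Running the backup

“Take the snapshot from one specific master (thk-master-1 here; the status output above shows that the current leader is actually 172.10.10.202, thk-master-3). For endpoints it is safer to give the cluster-internal IP (e.g. 172.20.10.122) than the local 127.0.0.1:2379. The node does not have to be the leader: every healthy etcd member holds the same data, so any of them can be backed up.”

  • Taking the backup on the leader (the node currently in charge within etcd) is the most intuitive choice, but it is not required. etcd keeps its members synchronized internally, so the snapshot stays valid even if the leader changes in the meantime.
  • 127.0.0.1 means "this machine itself", but an etcd server certificate may list only the real IP (e.g. 172.20.10.122) and not 127.0.0.1. In that case, connecting to 127.0.0.1 fails with a certificate name mismatch. 127.0.0.1 is also useless when the backup script runs on a different server. For both reasons, specifying the real address is the safer choice. (In this cluster the 127.0.0.1 endpoint did work in the status check above, so its certificate evidently covers that address.)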
root@thk-master-1:~# ETCDCTL_API=3 etcdctl --endpoints=https://172.10.10.200:2379 --cacert=/etc/ssl/etcd/ssl/ca.pem --cert=/etc/ssl/etcd/ssl/node-thk-master-1.pem --key=/etc/ssl/etcd/ssl/node-thk-master-1-key.pem snapshot save /var/backups/etcd-stanpshot-$(date +%Y%m%d%H%M).db
{"level":"info","ts":"2025-10-09T05:29:56.349161Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/var/backups/etcd-stanpshot-202510090529.db.part"}
{"level":"info","ts":"2025-10-09T05:29:56.355473Z","logger":"client","caller":"v3@v3.5.16/maintenance.go:212","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":"2025-10-09T05:29:56.355540Z","caller":"snapshot/v3_snapshot.go:73","msg":"fetching snapshot","endpoint":"https://172.10.10.200:2379"}
{"level":"info","ts":"2025-10-09T05:29:56.808742Z","logger":"client","caller":"v3@v3.5.16/maintenance.go:220","msg":"completed snapshot read; closing"}
{"level":"info","ts":"2025-10-09T05:29:57.503369Z","caller":"snapshot/v3_snapshot.go:88","msg":"fetched snapshot","endpoint":"https://172.10.10.200:2379","size":"59 MB","took":"1 second ago"}
{"level":"info","ts":"2025-10-09T05:29:57.503496Z","caller":"snapshot/v3_snapshot.go:97","msg":"saved","path":"/var/backups/etcd-stanpshot-202510090529.db"}
Snapshot saved at /var/backups/etcd-stanpshot-202510090529.db
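Whether 127.0.0.1 is safe to use as an endpoint depends on what the server certificate lists, and that can be checked rather than guessed. The sketch below is one way to do it, assuming the usual `openssl x509 -noout -text` output in which IP entries appear as "IP Address:<ip>" in the Subject Alternative Name line. san_has_ip is a hypothetical helper name, and the certificate path in the usage comment is the ETCD_CERT_FILE value from /etc/etcd.env.

```shell
# Sketch: succeed if the given IP is listed as a Subject Alternative Name.
# Reads `openssl x509 -noout -text` output on stdin.
san_has_ip() {
  local ip
  ip="$1"
  # Put each comma-separated SAN entry on its own line, trim it, and
  # require an exact fixed-string match so 127.0.0.1 does not match 127.0.0.10.
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "IP Address:${ip}"
}

# Usage:
#   openssl x509 -in /etc/ssl/etcd/ssl/member-thk-master-1.pem -noout -text \
#     | san_has_ip 127.0.0.1 && echo "127.0.0.1 is covered by the certificate"
```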


Verify the snapshot

root@thk-master-1:~# etcdctl --endpoints=$ETCDCTL_ENDPOINTS snapshot status /var/backups/etcd-stanpshot-202510090529.db -w table
Deprecated: Use `etcdutl snapshot status` instead.

+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 10f4598d |  3999901 |       6693 |      59 MB |
+----------+----------+------------+------------+
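Field | Meaning | In plain terms
HASH | Checksum unique to the file | A number for checking that the backup is not corrupted (the file's fingerprint)
REVISION | etcd's internal data version | etcd's last store revision at backup time; higher means newer
TOTAL KEYS | Number of stored keys (data items) | Roughly the number of Kubernetes resources (pods, services, secrets, etc.)
TOTAL SIZE | File size | Total size of the etcd data (tens of MB is usually normal)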
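These fields also make a cheap sanity check. The older snapshot under /var/backups/etcd-2025-10-03_06:09:10 reported REVISION 0 with only 8 keys, while the fresh one reports REVISION 3999901 with 6693 keys, so a backup job could refuse to count a near-empty snapshot as a success. The following is a sketch only: snapshot_looks_populated is a hypothetical name, the default threshold of 100 keys is an arbitrary assumption, and the column positions assume the four-column table shown above.

```shell
# Sketch: fail when a snapshot has revision 0 or too few keys.
# Reads `snapshot status -w table` output on stdin.
# $1 = minimum acceptable key count (arbitrary default: 100).
snapshot_looks_populated() {
  local min_keys
  min_keys="${1:-100}"
  awk -F'|' -v min="$min_keys" '
    /^\|/ {
      rev = $3; keys = $4
      gsub(/[ \t]/, "", rev); gsub(/[ \t]/, "", keys)
      # The header row is skipped because its cells are not numeric.
      if (rev ~ /^[0-9]+$/ && keys ~ /^[0-9]+$/) {
        found = 1
        ok = (rev + 0 > 0 && keys + 0 >= min + 0)
      }
    }
    END { if (found && ok) exit 0; exit 1 }'
}

# Usage:
#   etcdutl snapshot status /var/backups/etcd-stanpshot-202510090529.db -w table \
#     | snapshot_looks_populated 100 || echo "snapshot looks empty" >&2
```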

Temporary snapshot restore

  • The etcdctl snapshot restore command unpacks the backed-up snapshot.db file and rebuilds it into a data directory that etcd can run from again
ETCDCTL_API=3 etcdctl snapshot restore /var/backups/etcd-stanpshot-202510090529.db --data-dir /tmp/etcd-from-snapshot

  • The snapshot is converted into the internal data layout that etcd uses at runtime
/tmp/etcd-from-snapshot/
└── member/
    ├── snap/    ← etcd snapshot (data storage)
    └── wal/     ← WAL (transaction log)

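Running a temporary ETCD

  • Start a single standalone etcd from the restored backup data (client port changed to 23790)
  • The /tmp/etcd-from-snapshot folder is a completely separate data directory created by unpacking the snapshot file (snapshot.db)
"msg":"serving client traffic insecurely","address":"127.0.0.1:23790"
"msg":"skipped leadership transfer for single voting member cluster"
  • "127.0.0.1:23790" → listening on localhost only (no external access)
  • "single voting member cluster" → a single-node cluster made up of this member alone

Why this does not affect production

  1. Different client port (23790) → no clash with the production etcd on 2379. The command below does not set a peer URL, so the peer listener stays at etcd's default (localhost:2380); production binds 172.10.10.200:2380, so the two do not collide here
  2. Different data directory → the original files are neither read nor written
  3. Different cluster ID → it cannot talk to the production raft cluster
  4. Listens on localhost (127.0.0.1) only → unreachable from the external network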
etcd --data-dir /tmp/etcd-from-snapshot \
  --listen-client-urls http://127.0.0.1:23790 \
  --advertise-client-urls http://127.0.0.1:23790


Listing all keys

root@thk-master-1:~# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:23790 get "" --prefix --keys-only

ETCDCTL_API=3 etcdctl \
  --endpoints=http://127.0.0.1:23790 \
  get /registry/ --prefix --keys-only
  • A practical verification step: confirm that the etcd backup actually contains Kubernetes resource data (Pods in particular)
  • etcd is the database in which Kubernetes stores all of its state as key-value pairs, and all of that data lives under the /registry/ path
Kubernetes resource | Example etcd key path
Namespace | /registry/namespaces/default
Pod | /registry/pods/<namespace>/<pod name>
Service | /registry/services/specs/<namespace>/<service name>
ConfigMap | /registry/configmaps/<namespace>/<configmap name>
Secret | /registry/secrets/<namespace>/<secret name>
root@thk-master-1:~# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:23790   get /registry/ --prefix --keys-only > etcd-keys.txt
root@thk-master-1:~# ls
certs_new.sh  etcd-keys.txt  kube-manifests
root@thk-master-1:~# grep "/registry/pods/" etcd-keys.txt
/registry/pods/auth/keycloak-694fc9d848-2h2zl
/registry/pods/auth/keycloak-694fc9d848-xn6d9
/registry/pods/auth/mariadb-keycloak-0
/registry/pods/auth/oauth2-proxy-admin-76db4f4f7d-d4ll5
/registry/pods/auth/oauth2-proxy-admin-76db4f4f7d-lk54g
/registry/pods/auth/oauth2-proxy-user-55bbf8579b-5cxm7
/registry/pods/auth/oauth2-proxy-user-55bbf8579b-fbtjt
/registry/pods/auth/oauth2-redis-admin-0
/registry/pods/auth/oauth2-redis-admin-1
/registry/pods/auth/oauth2-redis-admin-2
/registry/pods/auth/oauth2-redis-admin-3
/registry/pods/auth/oauth2-redis-admin-4
/registry/pods/auth/oauth2-redis-admin-5
/registry/pods/auth/oauth2-redis-user-0
/registry/pods/auth/oauth2-redis-user-1
/registry/pods/auth/oauth2-redis-user-2
/registry/pods/auth/oauth2-redis-user-3
/registry/pods/auth/oauth2-redis-user-4
  • This matches the pod list reported by kubectl
root@thk-deploy:~# k get po -n auth
NAME                                                    READY   STATUS      RESTARTS       AGE
keycloak-694fc9d848-2h2zl                               1/1     Running     0              5d2h
keycloak-694fc9d848-xn6d9                               1/1     Running     0              5d2h
mariadb-keycloak-0                                      1/1     Running     0              5d2h
oauth2-proxy-admin-76db4f4f7d-d4ll5                     1/1     Running     5 (5d2h ago)   5d2h
oauth2-proxy-admin-76db4f4f7d-lk54g                     1/1     Running     4 (5d2h ago)   5d2h
oauth2-proxy-user-55bbf8579b-5cxm7                      1/1     Running     5 (5d2h ago)   5d2h
oauth2-proxy-user-55bbf8579b-fbtjt                      1/1     Running     5 (5d2h ago)   5d2h
oauth2-redis-admin-0                                    1/1     Running     0              5d2h
oauth2-redis-admin-1                                    1/1     Running     0              5d2h
oauth2-redis-admin-2                                    1/1     Running     0              5d2h
oauth2-redis-admin-3                                    1/1     Running     0              5d2h
oauth2-redis-admin-4                                    1/1     Running     0              5d2h
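Comparing the two listings by eye works for one namespace, but the same check can be scripted. A minimal sketch, assuming the key layout /registry/pods/<namespace>/<pod name> described above; pods_in_keys is a hypothetical helper name, and the /tmp file names in the usage comment are placeholders.

```shell
# Sketch: print the pod names stored under /registry/pods/<namespace>/.
# Reads a key listing (one key per line, blank lines allowed) on stdin.
pods_in_keys() {
  local ns
  ns="$1"
  # A pod key splits on "/" into: "", registry, pods, <namespace>, <pod name>.
  awk -F'/' -v ns="$ns" \
    '$2 == "registry" && $3 == "pods" && $4 == ns && NF == 5 { print $5 }' \
    | sort
}

# Usage:
#   pods_in_keys auth < etcd-keys.txt > /tmp/pods-from-snapshot.txt
#   kubectl get pods -n auth -o name | sed 's|^pod/||' | sort > /tmp/pods-from-api.txt
#   diff /tmp/pods-from-snapshot.txt /tmp/pods-from-api.txt && echo "pod lists match"
```

A difference is not necessarily a problem: the snapshot reflects the moment it was taken, so pods recreated since then will show up under new names in the live cluster.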

4. Automating the backup

Script

#!/bin/bash
# ===============================================
# etcd automatic backup script
# -----------------------------------------------
# 1. Set etcdctl environment variables (TLS authentication)
# 2. Create a snapshot file (date-based name)
# 3. Automatically delete backup files older than 7 days
# -----------------------------------------------
# Location    : /usr/local/bin/etcd_backup.sh
# Run with    : sudo bash /usr/local/bin/etcd_backup.sh
# Cron example: automatic backup every day at 03:00
# 0 3 * * * /usr/local/bin/etcd_backup.sh
# ===============================================

set -e  # exit immediately on error

# etcdctl environment variables
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-thk-master-1.pem  # ★ change per node
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem   # ★ change per node
# snapshot save accepts exactly one endpoint, so point at a single member
export ETCDCTL_ENDPOINTS="https://172.10.10.200:2379"

# Backup directory and file name
BACKUP_DIR="/var/backups"
TIMESTAMP=$(date +%Y%m%d%H%M)
SNAPSHOT_FILE="$BACKUP_DIR/etcd-snapshot-$TIMESTAMP.db"

# Run the backup
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting etcd snapshot backup..." >> /var/log/etcd_backup.log
etcdctl snapshot save "$SNAPSHOT_FILE" >> /var/log/etcd_backup.log 2>&1
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup completed: $SNAPSHOT_FILE" >> /var/log/etcd_backup.log

# Delete old backups (more than 7 days old)
find "$BACKUP_DIR" -maxdepth 1 -type f -name "etcd-snapshot-*.db" -mtime +7 -exec rm -f {} \;
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Old backups deleted (older than 7 days)" >> /var/log/etcd_backup.log
  • To back up from a single node only (there is no need to do all three), set one endpoint. etcdctl snapshot save takes exactly one endpoint in any case. This version also sets PATH explicitly
set -e  # exit immediately on error

# etcdctl environment variables
export PATH=/usr/local/bin:/usr/bin:/bin
export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/ssl/etcd/ssl/ca.pem
export ETCDCTL_CERT=/etc/ssl/etcd/ssl/admin-thk-master-1.pem  # ★ change per node
export ETCDCTL_KEY=/etc/ssl/etcd/ssl/admin-thk-master-1-key.pem   # ★ change per node
export ETCDCTL_ENDPOINTS="https://172.10.10.200:2379"

# Backup directory and file name
BACKUP_DIR="/var/backups"
TIMESTAMP=$(date +%Y%m%d%H%M)
SNAPSHOT_FILE="$BACKUP_DIR/etcd-snapshot-$TIMESTAMP.db"

# Run the backup
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting etcd snapshot backup..." >> /var/log/etcd_backup.log
etcdctl snapshot save "$SNAPSHOT_FILE" >> /var/log/etcd_backup.log 2>&1
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Backup completed: $SNAPSHOT_FILE" >> /var/log/etcd_backup.log

# Delete old backups (more than 7 days old)
find "$BACKUP_DIR" -maxdepth 1 -type f -name "etcd-snapshot-*.db" -mtime +7 -exec rm -f {} \;
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Old backups deleted (older than 7 days)" >> /var/log/etcd_backup.log
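The retention step is worth trying on a scratch directory before trusting it with real backups. The sketch below wraps the same find expression in a function (prune_old_snapshots is a hypothetical name) and prints each file it removes. Note that find counts age in whole 24-hour periods, so -mtime +7 only matches files that are at least 8 days old.

```shell
# Sketch: delete snapshot files older than the given number of days.
# $1 = backup directory, $2 = retention in days (default 7).
prune_old_snapshots() {
  local dir days
  dir="$1"
  days="${2:-7}"
  # -print lists each match before -delete removes it, which gives the
  # backup log a record of what was pruned.
  find "$dir" -maxdepth 1 -type f -name 'etcd-snapshot-*.db' \
    -mtime +"$days" -print -delete
}

# Dry run on a scratch directory (GNU touch is assumed for -d):
#   d=$(mktemp -d)
#   touch -d '10 days ago' "$d/etcd-snapshot-old.db"
#   touch "$d/etcd-snapshot-new.db"
#   prune_old_snapshots "$d" 7    # prints only the path of the old file
```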

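Creating the service files

  • systemd follows a separation of responsibilities
    • .service = defines what to run

    • .timer = defines when to run it

      A timer does no work itself; it always has to invoke a service unit to run the actual job

      [Timer]
      OnCalendar=*-*-* 00:00
       ↓
      [systemd runs automatically]
      systemctl start etcd-backup.service
       ↓
      [Service]
      ExecStart=/usr/local/bin/etcd_backup.sh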
  • service1 : /etc/systemd/system/etcd-backup.service
[Unit]
Description=etcd Snapshot Backup Service
After=network-online.target

[Service]
Type=oneshot
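# Path to the etcd backup script; adjust to the actual location
ExecStart=/usr/local/bin/etcd_backup.sh
# Explicitly set where the logs go (optional). The log file can fill up, so the rotation settings in section 5 are applied as well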
StandardOutput=append:/var/log/etcd_backup.log
StandardError=append:/var/log/etcd_backup.log
  • service2: /etc/systemd/system/etcd-backup.timer
[Unit]
Description=Timer for etcd Snapshot Backup (runs at 00:00 and 12:00)

[Timer]
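# Run every day at 00:00:00 and 12:00:00
OnCalendar=*-*-* 00:00:00
OnCalendar=*-*-* 12:00:00
# If the system was powered off at a scheduled time, run the missed backup right after boot
Persistent=true
# Link to the service unit (detected automatically, but can be stated explicitly)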
Unit=etcd-backup.service

[Install]
WantedBy=timers.target
  • Apply
root@thk-master-1:~# vi /etc/systemd/system/etcd-backup.service
root@thk-master-1:~# vi /etc/systemd/system/etcd-backup.timer
root@thk-master-1:~# systemctl daemon-reload
root@thk-master-1:~# systemctl enable etcd-backup.timer 
Created symlink /etc/systemd/system/timers.target.wants/etcd-backup.timer → /etc/systemd/system/etcd-backup.timer.
root@thk-master-1:~# systemctl start etcd-backup.timer 
root@thk-master-1:~# systemctl status etcd-backup.timer 
● etcd-backup.timer - Timer for etcd Snapshot Backup (runs at 00:00 and 12:00)
     Loaded: loaded (/etc/systemd/system/etcd-backup.timer; enabled; preset: enabled)
     Active: active (waiting) since Thu 2025-10-09 06:34:44 UTC; 14s ago
    Trigger: Thu 2025-10-09 12:00:00 UTC; 5h 25min left
   Triggers: ● etcd-backup.service

Oct 09 06:34:44 thk-master-1 systemd[1]: Started etcd-backup.timer - Timer for etcd Snapshot Backup (runs at>

root@thk-master-1:~# systemctl daemon-reload
root@thk-master-1:~# systemctl start etcd-backup.service
root@thk-master-1:~# systemctl status etcd-backup.service
○ etcd-backup.service - etcd Snapshot Backup Service
     Loaded: loaded (/etc/systemd/system/etcd-backup.service; static)
     Active: inactive (dead) since Thu 2025-10-09 06:45:23 UTC; 3s ago
TriggeredBy: ● etcd-backup.timer
    Process: 3450601 ExecStart=/usr/local/bin/etcd_backup.sh (code=exited, status=0/SUCCESS)
   Main PID: 3450601 (code=exited, status=0/SUCCESS)
        CPU: 418ms

Oct 09 06:45:22 thk-master-1 systemd[1]: Starting etcd-backup.service - etcd Snapshot Backup Service...
Oct 09 06:45:23 thk-master-1 systemd[1]: etcd-backup.service: Deactivated successfully.
Oct 09 06:45:23 thk-master-1 systemd[1]: Finished etcd-backup.service - etcd Snapshot Backup Service.
  • If a run fails, check the detailed error log: tail -n 50 /var/log/etcd_backup.log

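5. Configuring rotation for the backup log

  • Set a rotation policy so that the /var/log/etcd_backup.log file does not grow without bound
    The /etc/logrotate.d/ directory holds the configuration files that define each log file's rotation policy (interval, retention, compression, and so on)
    - Keep only the most recent N days of logs (e.g. 7 days)
    - Delete older logs automatically
    - Keep the log manageable even though every backup run from the systemd timer appends to it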
root@thk-master-1:/etc/logrotate.d# pwd
/etc/logrotate.d
root@thk-master-1:/etc/logrotate.d# ll
total 56
drwxr-xr-x   2 root root 4096 Jun 26 12:54 ./
drwxr-xr-x 111 root root 4096 Oct  5 05:24 ../
-rw-r--r--   1 root root  120 Feb  5  2024 alternatives
-rw-r--r--   1 root root  126 Apr 22  2022 apport
-rw-r--r--   1 root root  173 Mar 22  2024 apt
-rw-r--r--   1 root root   91 Jan  4  2024 bootlog
-rw-r--r--   1 root root  130 Oct 14  2019 btmp
-rw-r--r--   1 root root  144 May 19 20:00 cloud-init
-rw-r--r--   1 root root  112 Feb  5  2024 dpkg
-rw-r--r--   1 root root  248 Mar 22  2024 rsyslog
-rw-r--r--   1 root root  270 Apr  2  2024 ubuntu-pro-client
-rw-r--r--   1 root root  209 May 16  2023 ufw
-rw-r--r--   1 root root  235 Feb 12  2024 unattended-upgrades
-rw-r--r--   1 root root  145 Oct 14  2019 wtmp
  • vi /etc/logrotate.d/etcd_backup
/var/log/etcd_backup.log {
    # rotate the log every day
    daily
    # keep at most 7 rotated files (about one week)
    rotate 7
    # gzip older logs
    compress
    # leave the most recent rotated log uncompressed so it is easy to read
    delaycompress
    # do not raise an error if the log file is missing
    missingok
    # do not rotate an empty log
    notifempty
    # mode and owner for the newly created log file
    create 640 root root
    postrotate
        systemctl reload etcd-backup.timer > /dev/null 2>&1 || true
    endscript
}
  • Debugging and testing
root@thk-master-1:/etc/logrotate.d# logrotate -fv /etc/logrotate.d/etcd_backup
reading config file /etc/logrotate.d/etcd_backup
acquired lock on state file /var/lib/logrotate/statusReading state from file: /var/lib/logrotate/status
Allocating hash table for state file, size 64 entries
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state
Creating new state

Handling 1 logs

rotating pattern: /var/log/etcd_backup.log  forced from command line (7 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/etcd_backup.log
error: skipping "/var/log/etcd_backup.log" because parent directory has insecure permissions (It's world writable or writable by group which is not "root") Set "su" directive in config file to tell logrotate which user/group should be used for rotation.
Creating new state
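Note that the verbose run above ends with an error, not a rotation: logrotate skipped /var/log/etcd_backup.log because its parent directory is world-writable or writable by a non-root group, and it asks for a "su" directive. The sketch below reproduces that permission test so the directory can be inspected directly. parent_dir_is_insecure is a hypothetical name, GNU stat is assumed, and the rule is taken from the wording of the error message rather than from logrotate's source.

```shell
# Sketch: return 0 if a directory is world-writable, or group-writable
# with a group other than gid 0; return 1 if neither applies.
parent_dir_is_insecure() {
  local dir mode gid other rest grp
  dir="$1"
  mode=$(stat -c '%a' "$dir") || return 2   # e.g. 775 or 1777
  gid=$(stat -c '%g' "$dir") || return 2
  other=${mode#"${mode%?}"}                 # last octal digit
  rest=${mode%?}
  grp=${rest#"${rest%?}"}                   # second-to-last octal digit
  # Octal digits 2, 3, 6 and 7 have the write bit set.
  case "$other" in 2|3|6|7) return 0 ;; esac
  case "$grp" in 2|3|6|7) [ "$gid" -ne 0 ] && return 0 ;; esac
  return 1
}

# Usage:
#   parent_dir_is_insecure /var/log && echo "logrotate needs a su directive here"
```

Two ways forward, both to be verified on the host: add a su directive naming the user and group that own /var/log to the stanza, or test through the main configuration with logrotate -d /etc/logrotate.conf. The second option relies on an assumption not shown in this article, namely that the distribution's /etc/logrotate.conf already sets a global su directive (Ubuntu's default file usually does), which a run against /etc/logrotate.d/etcd_backup alone never reads.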
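Rotation | Files kept | Notes
Day 1 | etcd_backup.log, etcd_backup.log.1 | New log created
Day 2 | .log, .1, .2.gz | Previous log compressed
Day 3 | .log, .1, .2.gz, .3.gz |
... | ... | ...
Day 7 | .log, .1, .2.gz, .3.gz, .4.gz, .5.gz, .6.gz, .7.gz | 7 files kept
Day 8 | .log, .1, .2.gz, .3.gz, .4.gz, .5.gz, .6.gz, .7.gz | The oldest file (the previous .7.gz) is deleted; old files are removed automatically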


