Cloud Native PostgreSQL #2

kimchigood · June 23, 2022

PostgreSQL kubernetes operator

DOIK Study

This post continues from Cloud Native PostgreSQL #1 and covers failure testing, scaling (not the dental kind), and rolling updates for Cloud Native PostgreSQL.

1. Failure test setup

First, let's prepare for the failure test.


# Create a table made up of ID and Name columns.
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# cat ~/DOIK/5/query.sql
CREATE DATABASE test;
\c test;
CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 TEXT NOT NULL);
INSERT INTO t1 VALUES (1, 'Luis');

# Run the SQL file (through the PostgreSQL client pod)
kubectl cp ~/DOIK/5/query.sql myclient1:/tmp
kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -f /tmp/query.sql

# Verify
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl exec -it myclient1 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"
 count
-------
     1
(1 row)

2. Reproducing the failure

Let's reproduce what happens when the Primary pod dies in a real production situation.

# Check which pod is the primary
kubectl cnpg status mycluster

# [Terminal 1] Monitoring
watch kubectl get pod -l cnpg.io/cluster=mycluster

# [Terminal 2] Monitoring
while true; do kubectl exec -it myclient2 -- psql -U postgres -h mycluster-ro -p 5432 -d test -c "SELECT COUNT(*) FROM t1"; date;sleep 1; done

# [Terminal 3] INSERT a large batch of rows into the test database
for ((i=301; i<=800; i++)); do kubectl exec -it myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES ($i, 'Luis$i');";echo; done


# [Terminal 4] Delete the primary's pod and PVC >> does the INSERT get cut off midway?
kubectl delete pvc/mycluster-1 pod/mycluster-1

(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl cnpg status mycluster
Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-2  57 MB          0/10000000   Primary           OK      BestEffort  1.15.1
mycluster-3  57 MB          0/10000000   Standby (async)   OK      BestEffort  1.15.1
mycluster-4  57 MB          0/10000000   Standby (async)   OK      BestEffort  1.15.1

# Check the pods
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl get pod -l cnpg.io/cluster=mycluster
NAME          READY   STATUS    RESTARTS   AGE
mycluster-2   1/1     Running   0          5h48m
mycluster-3   1/1     Running   0          5h48m
mycluster-4   1/1     Running   0          12m

Recording of the failure test

To explain the situation above:
1. In a cluster where mycluster-1 is the Primary and mycluster-2 and mycluster-3 are Standbys, the Primary pod and its PVC were deleted while 500 rows were being inserted.

After the Primary pod is deleted, the operator elects mycluster-2 as the new Primary and creates mycluster-4 as a new Standby.
About 8 inserts failed while the Primary pod was gone, but inserts succeeded again once a Primary was available.
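
Rather than estimating the lost rows from the running count, one option is to compare t1 against the id range the loop tried to insert. The following is a minimal sketch, assuming the 301..800 range and the myclient1 pod used above; the function names missing_ids_sql and list_missing_ids are made up for illustration.

```shell
# Hypothetical helper: prints SQL listing the ids in [$1, $2] that are absent from t1.
missing_ids_sql() {
  printf 'SELECT g AS missing_id FROM generate_series(%d, %d) AS g LEFT JOIN t1 ON t1.c1 = g WHERE t1.c1 IS NULL ORDER BY g;' "$1" "$2"
}

# Hypothetical helper: runs that query through the myclient1 pod against the rw service.
list_missing_ids() {
  kubectl exec myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "$(missing_ids_sql "$1" "$2")"
}
```

Calling `list_missing_ids 301 800` after the loop finishes should list the ids whose INSERT never landed.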

As mentioned in the previous post, Cloud Native PostgreSQL #1, the PV reclaim policy was Delete, so both the PVC and the PV were deleted.

(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# k get pvc
NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mycluster-2   Bound    pvc-6e9a6d44-bee8-49cb-b619-7b14f3d9af3f   3Gi        RWO            local-path     6h9m
mycluster-3   Bound    pvc-403d00c6-eb92-4309-b9cf-8742342cb23e   3Gi        RWO            local-path     6h7m
mycluster-4   Bound    pvc-78215420-a1bf-4c47-9616-3813a9f76f2c   3Gi        RWO            local-path     30m
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# k get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                 STORAGECLASS   REASON   AGE
pvc-403d00c6-eb92-4309-b9cf-8742342cb23e   3Gi        RWO            Delete           Bound    default/mycluster-3   local-path              6h7m
pvc-6e9a6d44-bee8-49cb-b619-7b14f3d9af3f   3Gi        RWO            Delete           Bound    default/mycluster-2   local-path              6h8m
pvc-78215420-a1bf-4c47-9616-3813a9f76f2c   3Gi        RWO            Delete           Bound    default/mycluster-4   local-path              30m

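Of course mycluster-4 was created, and the earlier data had already been replicated from the Primary to the Standbys, but this still feels a bit risky. There are other safeguards, such as running multiple clusters, but personally I would set the reclaim policy to Retain so that the PV does not get deleted.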
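
Switching an existing PV to Retain can be sketched as below. The JSON patch is the one the Kubernetes documentation gives for changing a PV's reclaim policy; the helper name retain_pv is hypothetical, and the PV name has to be taken from the `k get pv` output above.

```shell
# JSON patch for changing a PersistentVolume's reclaim policy to Retain.
RETAIN_PATCH='{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Hypothetical helper: switch one PV (name passed as $1) to Retain.
retain_pv() {
  kubectl patch pv "$1" -p "$RETAIN_PATCH"
}
```

For example, `retain_pv pvc-6e9a6d44-bee8-49cb-b619-7b14f3d9af3f` would cover the volume behind mycluster-2. With Retain, deleting the PVC leaves the PV in the Released state with its data intact, so it has to be cleaned up or reclaimed by hand. Whether the operator can reattach such a volume was not tested here.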

3. Scale out test

You may need to scale out PostgreSQL while it is running. It is very simple.

# Current number of instances: 3
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl get cluster mycluster
NAME        AGE     INSTANCES   READY   STATUS                     PRIMARY
mycluster   8m31s   3           3       Cluster in healthy state   mycluster-1

# Scale out!
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl patch cluster mycluster --type=merge -p '{"spec":{"instances":5}}' && kubectl get pod -l postgresql=mycluster -w
cluster.postgresql.cnpg.io/mycluster patched
NAME          READY   STATUS    RESTARTS   AGE
mycluster-1   1/1     Running   0          9m4s
mycluster-2   1/1     Running   0          8m42s
mycluster-3   1/1     Running   0          8m32s
mycluster-4   0/1     Pending   0          0s
mycluster-4   0/1     Pending   0          0s
mycluster-4   0/1     Init:0/1   0          0s
mycluster-4   0/1     Init:0/1   0          1s
mycluster-4   0/1     PodInitializing   0          2s
mycluster-4   0/1     Running           0          3s
mycluster-4   1/1     Running           0          3s
mycluster-4   1/1     Running           0          3s
mycluster-5   0/1     Pending           0          0s
mycluster-5   0/1     Pending           0          0s
mycluster-5   0/1     Init:0/1          0          0s
mycluster-5   0/1     Init:0/1          0          0s
mycluster-5   0/1     PodInitializing   0          1s
mycluster-5   0/1     Running           0          2s
mycluster-5   1/1     Running           0          2s
mycluster-5   1/1     Running           0          3s

We can see that mycluster-4 and mycluster-5 were created.

4. Rolling Update

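CloudNativePG supports rolling updates when upgrading the PostgreSQL version, but only up to minor version upgrades. Upgrading the operator itself is a separate, two-step process, which the official docs describe as follows: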

Upgrading CloudNativePG operator is a two-step process:

upgrade the controller and the related Kubernetes resources
upgrade the instance manager running in every PostgreSQL pod

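In other words, the controller and its Kubernetes resources are upgraded first, and then the instance manager inside each PostgreSQL pod. See the official docs for details.

The update settings include primaryUpdateStrategy, which has two options.

unsupervised (default): the update proceeds automatically
supervised: manual update (the administrator finishes it with the kubectl cnpg promote/restart commands)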
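
Switching to supervised mode could look roughly like this. It is a sketch that reuses the merge-patch pattern from the scale out step; the field name and value come from the options listed above, while the helper names set_supervised and promote_instance are hypothetical.

```shell
# Merge patch that turns on supervised primary updates.
SUPERVISED_PATCH='{"spec":{"primaryUpdateStrategy":"supervised"}}'

# Hypothetical helper: apply the patch to the cluster named in $1.
set_supervised() {
  kubectl patch cluster "$1" --type=merge -p "$SUPERVISED_PATCH"
}

# Hypothetical helper: promote instance $2 of cluster $1 once the replicas
# have been updated, which is the manual step supervised mode waits for.
promote_instance() {
  kubectl cnpg promote "$1" "$2"
}
```

The idea would be to run `set_supervised mycluster` before changing the image, and then `promote_instance mycluster mycluster-2` at a moment when a short write interruption is acceptable.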

Now, let's try it in practice.

# Current image version: 14.2
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl cnpg status mycluster | grep Image
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.2

# Primary Pod : mycluster-1
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl cnpg status mycluster 
Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-1  33 MB          0/9000060    Primary           OK      BestEffort  1.15.1
mycluster-2  33 MB          0/9000110    Standby (async)   OK      BestEffort  1.15.1
mycluster-3  33 MB          0/A000000    Standby (async)   OK      BestEffort  1.15.1
mycluster-4  33 MB          0/A000000    Standby (async)   OK      BestEffort  1.15.1
mycluster-5  33 MB          0/A000000    Standby (async)   OK      BestEffort  1.15.1

# Rolling Update
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl patch cluster mycluster --type=merge -p '{"spec":{"imageName":"ghcr.io/cloudnative-pg/postgresql:14.3"}}' && kubectl get pod -l postgresql=mycluster -w
cluster.postgresql.cnpg.io/mycluster patched

# Image version after the upgrade: 14.3
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl cnpg status mycluster | grep Image
PostgreSQL Image:   ghcr.io/cloudnative-pg/postgresql:14.3

# Primary Pod : mycluster-2
(🚴|kubernetes-admin@kubernetes:default) root@k8s-m:~# kubectl cnpg status mycluster 
Instances status
Name         Database Size  Current LSN  Replication role  Status  QoS         Manager Version
----         -------------  -----------  ----------------  ------  ---         ---------------
mycluster-1  33 MB          0/A004CA0    Standby (async)   OK      BestEffort  1.15.1
mycluster-2  33 MB          0/A004CA0    Primary           OK      BestEffort  1.15.1
mycluster-3  33 MB          0/A004CA0    Standby (async)   OK      BestEffort  1.15.1
mycluster-4  33 MB          0/A004CA0    Standby (async)   OK      BestEffort  1.15.1
mycluster-5  33 MB          0/A004CA0    Standby (async)   OK      BestEffort  1.15.1

After the rolling update, the image version has changed and so has the Primary pod. Because restarting the Primary pod takes time, the Primary role was moved to the already updated mycluster-2 to minimize the brief interruption of write transactions.
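
Since writes can fail for a moment while the Primary role moves, a client can soften the blip by retrying. Below is a minimal sketch of a generic retry wrapper in plain shell. `retry` is a hypothetical helper written for this post, not part of kubectl or the cnpg plugin.

```shell
# retry MAX DELAY CMD...: rerun CMD until it succeeds or MAX attempts are used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

It could wrap a single write like `retry 10 1 kubectl exec myclient1 -- psql -U postgres -h mycluster-rw -p 5432 -d test -c "INSERT INTO t1 VALUES (801, 'Luis801');"`. One caveat: if an INSERT commits but the connection drops before the client sees the result, the retry will hit the primary key and keep failing, so this pattern fits idempotent statements best.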

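In Kubernetes, a rolling update usually implies a zero-downtime deployment, but here reads keep working while write transactions may be briefly affected. I have not yet found a way to run two or more Primary pods.

The official docs define it as follows: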

It defines a new Kubernetes resource called Cluster representing a PostgreSQL cluster made up of a single primary and an optional number of replicas that co-exist in a chosen Kubernetes namespace for High Availability and offloading of read-only queries.

It says "single primary and an optional number of replicas", so there may simply be no way around it. I should also look into whether a zero-downtime deployment is possible with multiple clusters.

That wraps up the Cloud Native PostgreSQL posts.
