Redis Cluster failover

잭잭이 · June 29, 2021

Overview

What happens when a node fails in a Redis Cluster?
This post walks through two scenarios.

  • 1 master fails in a cluster of 3 masters (no slaves)
  • 1 master fails in a cluster of 3 masters and 3 slaves

1. Fail (master: 3)

Let's see what happens when a master that has no slave fails.
Redis masters are running on ports 7001, 7002, and 7003; we stop the node on port 7002.

1.1. node down

$ docker exec -it redis-master-2 bash
$ redis-cli -c -p 7002
127.0.0.1:7002> debug segfault
Error: Connection reset by peer
(0.86s)
  • debug segfault
    • Crashes the Redis server process with a segmentation fault, which takes the node down.

1.2. check

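Check the Docker log of a surviving node (redis-master-1 here, since the FAIL message below arrives from the 7003 node and is about the 7002 node).
The cluster state has changed to fail.

$ docker logs -f redis-master-1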

1:M 21 Jun 2021 07:28:21.434 * FAIL message received from 3c349984f0bb61490c170ab68f2617a35d9581d6 about 79816979a6dd4b226e476121dd385ed6c25e5151
1:M 21 Jun 2021 07:28:21.435 # Cluster state changed: fail

Now connect to the node on port 7001 and look at the cluster information.

$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001

127.0.0.1:7001> set a b
(error) CLUSTERDOWN The cluster is down

127.0.0.1:7001> cluster info
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:10922
cluster_slots_pfail:0
cluster_slots_fail:5462
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:3
cluster_my_epoch:1
cluster_stats_messages_ping_sent:435
cluster_stats_messages_pong_sent:461
cluster_stats_messages_sent:896
cluster_stats_messages_ping_received:459
cluster_stats_messages_pong_received:434
cluster_stats_messages_meet_received:2
cluster_stats_messages_fail_received:1
cluster_stats_messages_received:896

127.0.0.1:7001> cluster nodes
027a002ecc012b61a5997f151ad01bccbb65d1c0 127.0.0.1:7001@17001 myself,master - 0 1624260599000 1 connected 0-5460
3c349984f0bb61490c170ab68f2617a35d9581d6 127.0.0.1:7003@17003 master - 0 1624260599580 3 connected 10923-16383
79816979a6dd4b226e476121dd385ed6c25e5151 127.0.0.1:7002@17002 master,fail - 1624260495388 1624260493352 2 disconnected 5461-10922
  • 127.0.0.1:7001> set a b
    • The write is rejected with CLUSTERDOWN. Slots 5461-10922 belonged to the failed master and there is no slave to take them over, and by default (cluster-require-full-coverage yes) the cluster stops serving requests unless all 16384 slots are covered.
  • cluster_state:fail
    • The cluster state is fail, so the cluster cannot be used.
  • 127.0.0.1:7001> cluster nodes
    • The node list shows that the 7002 master is flagged fail and is disconnected.
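The slot counters in the `cluster info` output above already tell the story. As a minimal sketch (the helper names `parse_cluster_info` and `uncovered_slots` are made up for this illustration, and the sample text is copied from the output above), the following parses those lines and computes how many slots are currently not being served:

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in a Redis Cluster


def parse_cluster_info(text):
    """Turn the 'key:value' lines printed by CLUSTER INFO into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info


def uncovered_slots(info):
    """Number of slots that are not in the 'ok' state."""
    return TOTAL_SLOTS - int(info["cluster_slots_ok"])


# Sample copied from the session above.
sample = """\
cluster_state:fail
cluster_slots_assigned:16384
cluster_slots_ok:10922
cluster_slots_pfail:0
cluster_slots_fail:5462
"""

info = parse_cluster_info(sample)
print(info["cluster_state"], uncovered_slots(info))  # fail 5462
```

The 5462 missing slots are exactly the range 5461-10922 that the 7002 master owned.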

1.3. restart node

Restart the 7002 master.

$ docker restart redis-master-2
redis-master-2

$ docker ps | grep redis-master-2
de5e52fb0428        redis:6.2.3         "docker-entrypoint.s…"   9 minutes ago       Up 8 seconds                            redis-master-2

1.4. check

Check the Docker log.

$ docker logs redis-master-2
1:M 21 Jun 2021 07:30:56.966 * Node configuration loaded, I'm 79816979a6dd4b226e476121dd385ed6c25e5151
  • Node configuration loaded
    • The existing node configuration is loaded, so the node comes back with the same node ID.

The node list shows that all three masters are working normally again.

$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> cluster nodes
027a002ecc012b61a5997f151ad01bccbb65d1c0 127.0.0.1:7001@17001 myself,master - 0 1624260765000 1 connected 0-5460
3c349984f0bb61490c170ab68f2617a35d9581d6 127.0.0.1:7003@17003 master - 0 1624260767224 3 connected 10923-16383
79816979a6dd4b226e476121dd385ed6c25e5151 127.0.0.1:7002@17002 master - 0 1624260766218 2 connected 5461-10922

The cluster is back to normal and accepts writes again.

127.0.0.1:7001> set a b
-> Redirected to slot [15495] located at 127.0.0.1:7003
OK
127.0.0.1:7003> get a
"b"
127.0.0.1:7003> set b c
-> Redirected to slot [3300] located at 127.0.0.1:7001
OK
127.0.0.1:7001> set c d
-> Redirected to slot [7365] located at 127.0.0.1:7002
OK
127.0.0.1:7002> get c
"d"
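The redirects above follow from how Redis Cluster maps keys to slots: the slot is the CRC16 (XMODEM variant) of the key modulo 16384. The sketch below reproduces that mapping with Python's standard `binascii.crc_hqx`. The names `key_slot`, `owner_port`, and `OWNERS` are hypothetical helpers for this illustration, and the slot ranges are copied from the `cluster nodes` output above.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384


def key_slot(key):
    """Hash slot for a key: CRC16 (XMODEM) of the key modulo 16384."""
    data = key.encode()
    # Hash tags: if the key has a non-empty {...} section, only that part is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:
            data = data[start + 1:end]
    return crc_hqx(data, 0) % HASH_SLOTS


# Slot ranges as reported by `cluster nodes` in this post (port -> slots).
OWNERS = {
    7001: range(0, 5461),
    7002: range(5461, 10923),
    7003: range(10923, 16384),
}


def owner_port(key):
    """Port of the master that serves the key's slot in this 3-master layout."""
    slot = key_slot(key)
    return next(port for port, slots in OWNERS.items() if slot in slots)


for k in "abc":
    print(k, key_slot(k), owner_port(k))
# a 15495 7003
# b 3300 7001
# c 7365 7002
```

These are the same slots and target nodes that redis-cli reported in the redirect messages.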

2. Fail (master: 3, slave: 3)

Redis Cluster uses a master-slave structure to keep the cluster available.
A slave replicates its master's data, and when the master fails the slave is promoted to master so that the cluster keeps running.
Let's look at a failure scenario in a master-slave setup.

2.1. scenario

The scenario is as follows.

  • The cluster has 3 masters and 3 slaves.
  • Name   Address         Cluster
    node1  192.168.56.100  redis-master-1, redis-slave-3
    node2  192.168.56.101  redis-master-2, redis-slave-1
    node3  192.168.56.102  redis-master-3, redis-slave-2
  • redis-master-2 fails.

2.2. node down

Connect to 192.168.56.101 and stop the container.

$ docker stop redis-master-2

2.3. check

Connect to 192.168.56.102 and check the slave's log. (The slave of the master on 192.168.56.101 runs on the 192.168.56.102 server.)

$ docker logs -f redis-slave-2
1:S 23 Jun 2021 07:49:33.581 # Connection with master lost.
1:S 23 Jun 2021 07:49:33.582 * Caching the disconnected master state.
1:S 23 Jun 2021 07:49:33.582 * Reconnecting to MASTER 192.168.56.101:7001
1:S 23 Jun 2021 07:49:33.582 * MASTER <-> REPLICA sync started
1:S 23 Jun 2021 07:49:33.583 # Error condition on socket for SYNC: Connection refused
...
1:S 23 Jun 2021 07:49:38.756 # Failover election won: I'm the new master.
1:M 23 Jun 2021 07:49:38.757 # Cluster state changed: ok
  • Failover election won: I'm the new master.
    • redis-master-2 has failed and redis-slave-2 has been promoted to master.
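The log timestamps show how long the cluster was without a master for those slots. As a rough sketch (the `log_time` helper is hypothetical, and the two lines are copied from the log above), the gap between losing the master and winning the election can be computed like this:

```python
from datetime import datetime


def log_time(line):
    """Parse the timestamp of a Redis log line such as
    '1:S 23 Jun 2021 07:49:33.581 # Connection with master lost.'"""
    # Fields: 'pid:role', day, month, year, time, then level marker and message.
    _, day, month, year, clock = line.split()[:5]
    return datetime.strptime(f"{day} {month} {year} {clock}", "%d %b %Y %H:%M:%S.%f")


lost = "1:S 23 Jun 2021 07:49:33.581 # Connection with master lost."
won = "1:S 23 Jun 2021 07:49:38.756 # Failover election won: I'm the new master."

elapsed = (log_time(won) - log_time(lost)).total_seconds()
print(f"failover took about {elapsed:.3f}s")  # failover took about 5.175s
```

How long this takes depends largely on the cluster-node-timeout setting, which is not shown in this post.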

Now look at the cluster node information.

$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> cluster nodes
30a99d668af3ddda16e2a9d3ee97fb53a5ebfa6d 192.168.56.100:7002@17002 myself,slave 22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 0 1624434594000 3 connected
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434596533 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434595524 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 master,fail - 1624434574580 1624434573000 2 disconnected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434595524 1 connected 0-5460
e0d9ee09b593889cd093d217a16a0b535e6abef2 192.168.56.101:7002@17002 slave 5b56d458a0d8e64d5f40ece0a99713dcb9c70723 0 1624434595626 1 connected
  • 094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master
    • 192.168.56.102:7002, which used to be a slave, has been promoted to master and now owns slots 5461-10922.
  • c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 master,fail
    • The master that went down is in the fail state and is disconnected.
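Reading `cluster nodes` output by eye gets tedious as the cluster grows. Here is a small sketch of a parser, assuming the field layout seen in the output above (id, address, flags, master id, ping sent, pong received, config epoch, link state, slots). `parse_cluster_nodes` and `owner_of` are hypothetical helper names, and the sample lines are copied from the session above.

```python
def parse_cluster_nodes(text):
    """Parse CLUSTER NODES output into a list of dicts, one per node."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        nodes.append({
            "id": parts[0],
            "addr": parts[1].split("@")[0],  # drop the cluster bus port
            "flags": set(parts[2].split(",")),
            "master_id": None if parts[3] == "-" else parts[3],
            "link": parts[7],
            "slots": parts[8:],  # e.g. ['5461-10922']; empty for slaves
        })
    return nodes


def owner_of(nodes, slot):
    """Address of the healthy node serving `slot`, or None if it is uncovered."""
    for node in nodes:
        if "fail" in node["flags"]:
            continue
        for rng in node["slots"]:
            if rng.startswith("["):
                continue  # migrating/importing markers, not plain ranges
            lo, _, hi = rng.partition("-")
            if int(lo) <= slot <= int(hi or lo):
                return node["addr"]
    return None


sample = """\
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434596533 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434595524 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 master,fail - 1624434574580 1624434573000 2 disconnected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434595524 1 connected 0-5460
"""

nodes = parse_cluster_nodes(sample)
failed = [n["addr"] for n in nodes if "fail" in n["flags"]]
print(failed)                 # ['192.168.56.101:7001']
print(owner_of(nodes, 7365))  # 192.168.56.102:7002
```

Slot 7365 (the slot of key `c`) is now served by the promoted node, so every slot is still covered and the cluster state stays ok.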

2.4. master node restart

Restart redis-master-2, the node that "used to be" the master.

$ docker restart redis-master-2

2.5. check

Even after the restart, redis-master-2 is not promoted back to master. It rejoins the cluster as a slave of 192.168.56.102:7002.
redis-slave-2 remains the master.

$ docker exec -it redis-master-1 bash
$ redis-cli -c -p 7001
127.0.0.1:7001> cluster nodes
30a99d668af3ddda16e2a9d3ee97fb53a5ebfa6d 192.168.56.100:7002@17002 myself,slave 22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 0 1624434646000 3 connected
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434646576 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434646071 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 slave 094af2ab1db0d147d7f475f3954429ae7d18dee0 0 1624434647079 7 connected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434646575 1 connected 0-5460
e0d9ee09b593889cd093d217a16a0b535e6abef2 192.168.56.101:7002@17002 slave 5b56d458a0d8e64d5f40ece0a99713dcb9c70723 0 1624434646576 1 connected
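To see the resulting topology at a glance, the output above can be reduced to a master-to-slaves mapping. This is only an illustrative sketch: `replication_map` is a hypothetical helper, and it assumes the same `cluster nodes` field layout as the output above, from which the sample is copied.

```python
from collections import defaultdict


def replication_map(cluster_nodes_text):
    """Map each master address to the addresses of its slaves."""
    addr_by_id = {}
    master_of = {}  # slave node id -> master node id
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        node_id, addr, master_id = parts[0], parts[1].split("@")[0], parts[3]
        addr_by_id[node_id] = addr
        if master_id != "-":
            master_of[node_id] = master_id
    topology = defaultdict(list)
    for slave_id, master_id in master_of.items():
        # Fall back to the raw id if the master is missing from the listing.
        topology[addr_by_id.get(master_id, master_id)].append(addr_by_id[slave_id])
    return dict(topology)


sample = """\
30a99d668af3ddda16e2a9d3ee97fb53a5ebfa6d 192.168.56.100:7002@17002 myself,slave 22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 0 1624434646000 3 connected
22110f4ea10f11a8cb6ea283dedfc27c6ffabc07 192.168.56.102:7001@17001 master - 0 1624434646576 3 connected 10923-16383
094af2ab1db0d147d7f475f3954429ae7d18dee0 192.168.56.102:7002@17002 master - 0 1624434646071 7 connected 5461-10922
c952f5ef4783b5c19129bc630b88e8e3bf602622 192.168.56.101:7001@17001 slave 094af2ab1db0d147d7f475f3954429ae7d18dee0 0 1624434647079 7 connected
5b56d458a0d8e64d5f40ece0a99713dcb9c70723 192.168.56.100:7001@17001 master - 0 1624434646575 1 connected 0-5460
e0d9ee09b593889cd093d217a16a0b535e6abef2 192.168.56.101:7002@17002 slave 5b56d458a0d8e64d5f40ece0a99713dcb9c70723 0 1624434646576 1 connected
"""

topology = replication_map(sample)
for master, slaves in topology.items():
    print(master, "<-", slaves)
```

In the mapping, the restarted node 192.168.56.101:7001 appears as the slave of 192.168.56.102:7002, so the master and slave roles for slots 5461-10922 have simply been swapped.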
