Redis Cluster (Scaling)
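Redis Cluster supports both scaling out and scaling in. Nodes can be added to expand the cluster, or taken offline to shrink it, without interrupting service to clients. In principle, both operations come down to moving slots, together with their data, between nodes.
Scaling out the cluster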
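In Redis Cluster (Setup), a 6-node cluster was built in which 3 masters each maintain their own slots and data. For the tests below, fill it with some test data: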
$ for i in $(seq 1 70000); do redis-cli -p 6879 -c set key:migrate:test:${i} ${i}; done
$ redis-trib.rb info 127.0.0.1:6879
127.0.0.1:6879 (90cb860b...) -> 20304 keys | 5461 slots | 1 slaves.
127.0.0.1:6881 (fa2acce2...) -> 20201 keys | 5461 slots | 1 slaves.
127.0.0.1:6880 (3f121a67...) -> 20174 keys | 5462 slots | 1 slaves.
60679 keys in 3 masters.
3.70 keys per slot on average.
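To expand the cluster by adding a node, part of the slots and their data must be migrated to the new node with the appropriate commands. This is done in three stages.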
1. Prepare the new nodes
Prepare two new nodes on ports 6885 and 6886, both running in cluster mode.
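Cluster mode is switched on in each node's configuration file. As a minimal sketch, the helper below renders such a file for the two new ports. cluster_node_conf is a hypothetical name, and the file naming and the 15000 ms node timeout (the Redis default) are assumptions to adapt. The cluster config file is written and maintained by Redis itself, so every node needs a distinct one.

```python
def cluster_node_conf(port: int, node_timeout_ms: int = 15000) -> str:
    """Render a minimal redis.conf body for one cluster-mode node (hypothetical helper)."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",                      # start the instance in cluster mode
        f"cluster-config-file nodes-{port}.conf",   # state file Redis writes itself; one per node
        f"cluster-node-timeout {node_timeout_ms}",  # ms a node may be unreachable before it is suspected failing
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for port in (6885, 6886):
        print(f"# redis-{port}.conf")
        print(cluster_node_conf(port))
```

Each node would then be started with its own file, for example redis-server redis-6885.conf.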
2. Join the cluster
127.0.0.1:6879> cluster meet 127.0.0.1 6885
127.0.0.1:6879> cluster meet 127.0.0.1 6886
127.0.0.1:6879> cluster nodes
e090ec47b8e66e415d69e9452c9c8e7deccd3624 127.0.0.1:6886 master - 0 1532846452277 0 connected
79c8cf3c11ede4962a2d690c2a2545b86c2f56ed 127.0.0.1:6885 master - 0 1532846453289 0 connected
...
90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-5460
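A newly joined node starts out as a master, but because it owns no slots it cannot accept reads or writes. What happens next is usually one of two things: slots and data are migrated to it to expand capacity, or it becomes a replica of another master to provide failover.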
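Whether a node can serve traffic can be read directly from cluster nodes: the third field holds the flags, the fourth the master a replica follows, and everything from the ninth field on lists the slots the node owns. A hedged sketch of a parser (parse_cluster_nodes_line and ClusterNode are hypothetical names) that works on lines like the ones above:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ClusterNode:
    node_id: str
    addr: str
    flags: Set[str]
    master_id: Optional[str]
    slots: List[Tuple[int, int]] = field(default_factory=list)  # inclusive (start, end) ranges

    def is_master(self) -> bool:
        return "master" in self.flags

    def serves_keys(self) -> bool:
        # A master that owns no slots owns no keys, so it cannot serve reads or writes.
        return self.is_master() and bool(self.slots)


def parse_cluster_nodes_line(line: str) -> ClusterNode:
    """Parse one CLUSTER NODES line: id addr flags master ping-sent pong-recv epoch link-state slots..."""
    parts = line.split()
    node_id, addr, flags, master = parts[:4]
    slots = []
    for token in parts[8:]:
        if token.startswith("["):
            continue  # [slot-<-id] / [slot->-id] mark a migration in progress, not ownership
        start, _, end = token.partition("-")
        slots.append((int(start), int(end or start)))
    return ClusterNode(
        node_id=node_id,
        addr=addr.split("@")[0],  # Redis 4.0+ appends @<cluster-bus-port>
        flags=set(flags.split(",")),
        master_id=None if master == "-" else master,
        slots=slots,
    )
```

Applied to the 6886 line above, it yields a master with an empty slot list, that is, a node that cannot yet accept reads or writes.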
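In production, adding new nodes with the redis-trib.rb add-node command is recommended, because it first checks whether the new node already holds data or has already joined another cluster.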
$ redis-trib.rb add-node 127.0.0.1:6885 127.0.0.1:6879
>>> Adding node 127.0.0.1:6885 to cluster 127.0.0.1:6879
...
All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:6885 to make it join the cluster.
New node added correctly.
$ redis-trib.rb add-node 127.0.0.1:6886 127.0.0.1:6879
>>> Adding node 127.0.0.1:6886 to cluster 127.0.0.1:6879
...
M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885
slots: (0 slots) master
0 additional replica(s)
...
All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
All 16384 slots covered.
>>> Send CLUSTER MEET to node 127.0.0.1:6886 to make it join the cluster.
New node added correctly.
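3. Migrate slots and data
Once the new node has joined, slots and their data must be migrated to it. Migration is the core of scaling out and is done in three steps.
(1) Slot migration plan
After 6885 joins, the number of slots each original master owns drops from about 5461 (16384/3) to 4096 (16384/4).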
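The numbers in the plan follow from simple arithmetic: 16384 slots over 3 masters is 5461 with a remainder of 1 (hence the one master with 5462 slots in the info output above), and over 4 masters exactly 4096, so each original master has to give up about 1365 slots. A small sketch with hypothetical helper names:

```python
TOTAL_SLOTS = 16384  # Redis Cluster always has 16384 hash slots


def even_share(num_masters: int) -> list:
    """Slot count per master for an even split; the first masters absorb any remainder."""
    base, extra = divmod(TOTAL_SLOTS, num_masters)
    return [base + 1 if i < extra else base for i in range(num_masters)]


def slots_to_give(current: list, num_masters_after: int) -> list:
    """Slots each existing master hands to the new node so that all end near the even share."""
    target = TOTAL_SLOTS // num_masters_after
    return [max(count - target, 0) for count in current]


if __name__ == "__main__":
    print(even_share(3))                         # [5462, 5461, 5461]
    print(even_share(4))                         # [4096, 4096, 4096, 4096]
    print(slots_to_give([5461, 5461, 5462], 4))  # 6879, 6881, 6880 -> [1365, 1365, 1366]
```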
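(2) Migrate the data
Data is migrated one slot at a time. The flow for each slot is:
1) Send cluster setslot {slot} importing {sourceNodeId} to the target node so that it prepares to import the slot's data.
2) Send cluster setslot {slot} migrating {targetNodeId} to the source node so that it prepares to export the slot's data.
3) On the source node, run cluster getkeysinslot {slot} {count} repeatedly, each time fetching up to count keys that belong to slot {slot}.
4) On the source node, run migrate {targetIp} {targetPort} "" 0 {timeout} keys {keys...} to move the fetched keys to the target node in one batch over a pipeline. With the keys option, the empty string "" takes the place of the single-key argument, and 0 is the destination database.
5) Repeat steps 3) and 4) until every key in the slot has been migrated to the target node.
6) Send cluster setslot {slot} node {targetNodeId} to all masters in the cluster to announce that the slot now belongs to the target node.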
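To make the order of the six steps concrete, the sketch below only assembles the command sequence for one slot; it does not connect to Redis. Node and plan_slot_migration are hypothetical names, and key_batches stands in for the successive cluster getkeysinslot replies that a real tool would fetch from the source.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Node(NamedTuple):
    node_id: str
    host: str
    port: int

    @property
    def addr(self) -> str:
        return f"{self.host}:{self.port}"


def plan_slot_migration(slot: int, source: Node, target: Node,
                        key_batches: Iterable[List[str]], masters: List[Node],
                        timeout_ms: int = 5000) -> List[Tuple[str, List[str]]]:
    """Return (address, command) pairs for moving one slot, following steps 1) to 6)."""
    s = str(slot)
    plan = [
        (target.addr, ["CLUSTER", "SETSLOT", s, "IMPORTING", source.node_id]),  # 1)
        (source.addr, ["CLUSTER", "SETSLOT", s, "MIGRATING", target.node_id]),  # 2)
    ]
    # 3) to 5): each batch stands for one CLUSTER GETKEYSINSLOT reply; an empty one ends the loop.
    for keys in key_batches:
        if not keys:
            break
        plan.append((source.addr, ["MIGRATE", target.host, str(target.port), "", "0",
                                   str(timeout_ms), "KEYS", *keys]))  # 4)
    # 6): every master records the target as the new owner of the slot.
    for master in masters:
        plan.append((master.addr, ["CLUSTER", "SETSLOT", s, "NODE", target.node_id]))
    return plan


if __name__ == "__main__":
    src = Node("90cb860b7f4ff516304c577bc1e514dc95ecd09b", "127.0.0.1", 6879)
    dst = Node("99ea0df1d9683affb1271a5092fc8b15b378adba", "127.0.0.1", 6885)
    batches = [["key:migrate:test:13752", "key:migrate:test:16020"], []]
    for addr, cmd in plan_slot_migration(4096, src, dst, batches, [src, dst]):
        print(addr, cmd)
```

Marking the target as importing before the source as migrating means that any ASK redirect the source hands out already points at a node prepared to accept it.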
Following this flow, migrate slot 4096 from source node 6879 to target node 6885 by hand.
1) The target node prepares to import the data of slot 4096.
127.0.0.1:6885> cluster setslot 4096 importing 90cb860b7f4ff516304c577bc1e514dc95ecd09b
Confirm that slot 4096 is now in the importing state.
127.0.0.1:6885> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 0 connected [4096-
2) The source node prepares to export the data of slot 4096.
127.0.0.1:6879> cluster setslot 4096 migrating 99ea0df1d9683affb1271a5092fc8b15b378adba
Confirm that slot 4096 is now in the migrating state.
127.0.0.1:6879> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532850180009 0 connected
...
90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-5460
127.0.0.1:6879>
3) Fetch the keys of slot 4096 in batches; here, 4 keys belonging to the slot are fetched.
127.0.0.1:6879> cluster getkeysinslot 4096 4
1) "key:migrate:test:13752"
2) "key:migrate:test:16020"
3) "key:migrate:test:20791"
4) "key:migrate:test:5512"
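Which slot a key belongs to is fixed by the cluster specification: the CRC16 (XMODEM variant) of the key modulo 16384, where only the part inside the first non-empty {...} is hashed if the key contains such a hash tag. A self-contained sketch (crc16_xmodem and key_hash_slot are hypothetical names):

```python
HASH_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot of a key: CRC16 of the key, or of its non-empty {hash tag}, modulo 16384."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # an empty tag "{}" falls back to hashing the whole key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS
```

As a cross-check, key_hash_slot("key:migrate:test:13752") should return 4096, the slot under which Redis listed that key above.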
Confirm that these 4 keys exist on the source node and not on the target node.
127.0.0.1:6879> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
1) "13752"
2) "16020"
3) "20791"
4) "5512"
$ redis-cli -p 6885 -c
127.0.0.1:6885> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
-> Redirected to slot located at 127.0.0.1:6879
1) "13752"
2) "16020"
3) "20791"
4) "5512"
Migrate the 4 keys in one batch.
127.0.0.1:6879> migrate 127.0.0.1 6885 "" 0 5000 keys key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
Querying the 4 keys again shows that they are no longer on the source node.
127.0.0.1:6879> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
(error) ASK 4096 127.0.0.1:6885
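The ASK reply means the slot is in the middle of a migration: the keys are no longer on 6879 and, for this request only, should be looked up on 6885. A cluster-aware client sends ASKING to 6885 followed by the original command and, unlike with a MOVED reply, does not update its slot map. A sketch with hypothetical helper names; note that redis-cli adds the (error) prefix for display, while the raw error text starts with ASK.

```python
from typing import List, NamedTuple, Optional


class Redirect(NamedTuple):
    kind: str  # "ASK" (temporary, this request only) or "MOVED" (slot has a new owner)
    slot: int
    host: str
    port: int


def parse_redirect(error: str) -> Optional[Redirect]:
    """Parse a raw ASK/MOVED error string; return None for any other error."""
    parts = error.split()
    if len(parts) != 3 or parts[0] not in ("ASK", "MOVED"):
        return None
    host, _, port = parts[2].rpartition(":")
    return Redirect(parts[0], int(parts[1]), host, int(port))


def commands_for_retry(redirect: Redirect, command: List[str]) -> List[List[str]]:
    """Commands to send to redirect.host:redirect.port in order to retry command."""
    if redirect.kind == "ASK":
        # ASKING lets the importing node serve the next command for a slot it does not own yet.
        return [["ASKING"], command]
    # MOVED: retry as-is (a real client would also refresh its slot-to-node map).
    return [command]
```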
Notify all masters that slot 4096 is now assigned to target node 6885.
127.0.0.1:6879> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
127.0.0.1:6880> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
127.0.0.1:6881> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
127.0.0.1:6885> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
Confirm that source node 6879 no longer owns slot 4096 and that target node 6885 now does.
127.0.0.1:6879> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532851434584 9 connected 4096
...
90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-4095 4097-5460
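A real migration involves a large number of slots, and each slot may hold a great many keys, so redis-trib.rb offers a resharding command, redis-trib.rb reshard, which automates the per-slot flow above. The remaining slots are migrated with it.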
$ redis-trib.rb reshard 127.0.0.1:6879
...
M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885
slots:4096 (1 slots) master
0 additional replica(s)
M: 558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886
slots: (0 slots) master
0 additional replica(s)
...
All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID?
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:90cb860b7f4ff516304c577bc1e514dc95ecd09b
Source node #2:3f121a67fab0d74f0d31b69326259e687902e1b3
Source node #3:fa2acce219d088e2b33756dac2e85ca92936a8dd
Source node #4:done
Ready to move 4096 slots.
Source nodes:
M: 90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879
slots:0-4095,4097-5460 (5460 slots) master
1 additional replica(s)
M: 3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881
slots:10923-16383 (5461 slots) master
1 additional replica(s)
Destination node:
M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885
slots:4096 (1 slots) master
0 additional replica(s)
Resharding plan:
Moving slot 5461 from 3f121a67fab0d74f0d31b69326259e687902e1b3
...
Moving slot 11090 from fa2acce219d088e2b33756dac2e85ca92936a8dd
...
Moving slot 1364 from 90cb860b7f4ff516304c577bc1e514dc95ecd09b
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 5461 from 127.0.0.1:6880 to 127.0.0.1:6885: ..
...
Moving slot 12177 from 127.0.0.1:6881 to 127.0.0.1:6885: ....
...
Moving slot 1364 from 127.0.0.1:6879 to 127.0.0.1:6885: .....
Check the new mapping between nodes and slots.
127.0.0.1:6879> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532852546583 9 connected 0-1364 4096 5461-6826 10923-12287
558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 master - 0 1532852550630 8 connected
fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881 master - 0 1532852548097 3 connected 12288-16383
3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532852549619 2 connected 6827-10922
...
90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 1365-4095 4097-5460
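The resulting ranges (0-1364 from 6879, 5461-6826 from 6880, 10923-12287 from 6881) are consistent with a proportional split: each source gives slots in proportion to how many it owns, the largest source rounds its share up, the others round down, and each gives its lowest-numbered slots first. The Python port below is an assumption modeled on compute_reshard_table in redis-trib.rb's Ruby source, not a copy of it; for the slot counts above it reproduces the 1365/1366/1365 split.

```python
import math
from typing import Dict, List, Tuple


def compute_reshard_table(sources: Dict[str, List[int]], num_slots: int) -> List[Tuple[str, int]]:
    """Return (source, slot) moves, taking from each source in proportion to the slots it owns."""
    # Largest source first: it rounds its share up, every other source rounds down.
    ordered = sorted(sources.items(), key=lambda item: len(item[1]), reverse=True)
    total = sum(len(slots) for _, slots in ordered)
    moves: List[Tuple[str, int]] = []
    for i, (name, slots) in enumerate(ordered):
        share = num_slots / total * len(slots)
        n = math.ceil(share) if i == 0 else math.floor(share)
        for slot in sorted(slots)[:n]:  # lowest-numbered slots go first
            if len(moves) < num_slots:  # never move more than requested
                moves.append((name, slot))
    return moves


if __name__ == "__main__":
    before = {
        "6879": list(range(0, 4096)) + list(range(4097, 5461)),  # 4096 was already moved by hand
        "6880": list(range(5461, 10923)),
        "6881": list(range(10923, 16384)),
    }
    taken: Dict[str, List[int]] = {}
    for source, slot in compute_reshard_table(before, 4096):
        taken.setdefault(source, []).append(slot)
    for source, slots in taken.items():
        print(source, f"{slots[0]}-{slots[-1]}", len(slots))
```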
After the migration, use the redis-trib.rb rebalance command to check whether slots are evenly balanced across the nodes.
$ redis-trib.rb rebalance 127.0.0.1:6879
...
All 16384 slots covered.
*** No rebalancing needed! All nodes are within the 2.0% threshold.
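With 0-1364, 4096, 5461-6826 and 10923-12287, node 6885 now holds 4097 slots, 6879 holds 4095 and the other two masters 4096 each, so every master is within a fraction of a percent of the even 4096. The sketch below is a simplified version of such a threshold check: it assumes equal node weights and measures deviation relative to the even share. redis-trib.rb rebalance also supports per-node weights and may compute its percentage differently; deviation_pct and needs_rebalance are hypothetical names.

```python
TOTAL_SLOTS = 16384


def deviation_pct(slot_counts: dict) -> dict:
    """Percentage by which each master's slot count differs from an even share (equal weights assumed)."""
    expected = TOTAL_SLOTS / len(slot_counts)
    return {node: abs(count - expected) / expected * 100 for node, count in slot_counts.items()}


def needs_rebalance(slot_counts: dict, threshold_pct: float = 2.0) -> bool:
    """True if any master deviates from the even share by more than threshold_pct percent."""
    return any(pct > threshold_pct for pct in deviation_pct(slot_counts).values())


if __name__ == "__main__":
    after = {"6885": 4097, "6879": 4095, "6880": 4096, "6881": 4096}
    print(needs_rebalance(after))  # False: the largest deviation is 1/4096, about 0.02%
```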
(3) Add a replica
Make 6886 a replica of 6885 to keep the whole cluster highly available.
127.0.0.1:6886> cluster replicate 99ea0df1d9683affb1271a5092fc8b15b378adba
The node list on 6886 now shows it as a replica of 6885, which completes the expansion.
127.0.0.1:6886> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532852938340 9 connected 0-1364 4096 5461-6826 10923-12287
558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 myself,slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 0 8 connected
...
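Scaling in the cluster
Scaling in means safely taking some nodes out of an existing cluster. First determine whether the nodes to be removed own any slots. If they do, those slots must be migrated to other nodes so that the slot-to-node mapping of the cluster stays complete after the nodes go offline. Once a node owns no slots, or is a replica, the other nodes in the cluster can be told to forget it, and once every node has forgotten it, it can be shut down normally. This is done in two stages.
(1) Migrate the slots of the departing node
The departing node must migrate its slots to other nodes, following the same principle as slot migration during expansion. For example, to take 6881 and 6884 offline: 6881 is a master owning slots 12288-16383, and 6884 is its replica. Before 6881 goes offline, its slots must be migrated to the three nodes 6879, 6880 and 6885. Because each reshard run accepts only one target node, reshard has to be run three times, moving 1365, 1365 and 1366 slots respectively.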
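The 1365/1365/1366 split is 4096 divided by 3, with the remainder given to the last target. Assuming reshard hands over the source's lowest-numbered slots first, the three runs cut 12288-16383 into consecutive ranges, which matches what the cluster reports afterwards (12288-13652 on 6879, 13653-15017 on 6880, 15018-16383 on 6885). A sketch with hypothetical helper names:

```python
from typing import List, Tuple


def even_counts(total: int, parts: int) -> List[int]:
    """Split total into parts sizes differing by at most 1, with the larger sizes last."""
    base, extra = divmod(total, parts)
    return [base] * (parts - extra) + [base + 1] * extra


def split_range(start: int, end: int, counts: List[int]) -> List[Tuple[int, int]]:
    """Cut the inclusive range start..end into consecutive inclusive chunks of the given sizes."""
    if sum(counts) != end - start + 1:
        raise ValueError("counts must add up to the size of the range")
    chunks = []
    for n in counts:
        chunks.append((start, start + n - 1))
        start += n
    return chunks


if __name__ == "__main__":
    counts = even_counts(16383 - 12288 + 1, 3)  # [1365, 1365, 1366]
    print(split_range(12288, 16383, counts))    # to 6879, 6880, 6885 in that order
```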
$ redis-trib.rb reshard 127.0.0.1:6879
...
How many slots do you want to move (from 1 to 16384)? 1365
What is the receiving node ID?
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd
Source node #2:done
Ready to move 1365 slots.
Source nodes:
M: fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881
slots:12288-16383 (4096 slots) master
1 additional replica(s)
Destination node:
M: 90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879
slots:1365-4095,4097-5460 (4095 slots) master
1 additional replica(s)
Resharding plan:
Moving slot 12288 from fa2acce219d088e2b33756dac2e85ca92936a8dd
Moving slot 12289 from fa2acce219d088e2b33756dac2e85ca92936a8dd
Do you want to proceed with the proposed reshard plan (yes/no)? yes
...
Moving slot 13651 from 127.0.0.1:6881 to 127.0.0.1:6879: ..
Moving slot 13652 from 127.0.0.1:6881 to 127.0.0.1:6879: ...
When the migration finishes, 6879 has taken over 1365 of 6881's slots, 12288-13652.
Next, migrate another 1365 slots to 6880 and the remaining 1366 slots to 6885.
$ redis-trib.rb reshard 127.0.0.1:6879
...
How many slots do you want to move (from 1 to 16384)? 1365
What is the receiving node ID?
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd
Source node #2:done
...
Do you want to proceed with the proposed reshard plan (yes/no)? yes
...
$ redis-trib.rb reshard 127.0.0.1:6879
...
How many slots do you want to move (from 1 to 16384)? 1366
What is the receiving node ID?
Please enter all the source node IDs.
Type 'all' to use all the nodes as source nodes for the hash slots.
Type 'done' once you entered all the source nodes IDs.
Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd
Source node #2:done
...
Do you want to proceed with the proposed reshard plan (yes/no)? yes
...
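At this point all of 6881's slots have been migrated away. The cluster state is shown below; note that 6884 now appears as a replica of 6885: once 6881 lost its last slot, 6884 switched to following 6885, the node that took that slot over.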
127.0.0.1:6885> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 12 connected 0-1364 4096 5461-6826 10923-12287 15018-16383
...
fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881 master - 0 1532862908561 3 connected
904db05d81825413702f7eac960cd2f656b217f7 127.0.0.1:6884 slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 1532862911596 12 connected
90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532862913617 10 connected 1365-4095 4097-5460 12288-13652
3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532862907549 11 connected 6827-10922 13653-15017
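Before forgetting 6881, it is worth confirming what the listing shows: 6881 owns no slots, and the three remaining masters still cover every slot from 0 to 16383 exactly once (the condition redis-trib.rb reports as "All 16384 slots covered"). A sketch of that check, with the slot fields copied from the listing above and hypothetical function names:

```python
from typing import Dict, List

TOTAL_SLOTS = 16384


def expand(tokens: List[str]) -> List[int]:
    """Turn slot fields such as '0-1364' or '4096' into individual slot numbers."""
    slots = []
    for token in tokens:
        if token.startswith("["):
            continue  # migration markers are not owned slots
        start, _, end = token.partition("-")
        slots.extend(range(int(start), int(end or start) + 1))
    return slots


def check_coverage(owners: Dict[str, List[str]]) -> Dict[str, int]:
    """Raise ValueError on an unowned, doubly owned or out-of-range slot; else return slots per node."""
    owner_of = {}
    for node, tokens in owners.items():
        for slot in expand(tokens):
            if not 0 <= slot < TOTAL_SLOTS:
                raise ValueError(f"slot {slot} is out of range")
            if slot in owner_of:
                raise ValueError(f"slot {slot} is owned by both {owner_of[slot]} and {node}")
            owner_of[slot] = node
    if len(owner_of) != TOTAL_SLOTS:
        raise ValueError(f"{TOTAL_SLOTS - len(owner_of)} slots are not covered")
    counts = {node: 0 for node in owners}
    for node in owner_of.values():
        counts[node] += 1
    return counts


if __name__ == "__main__":
    print(check_coverage({
        "6885": "0-1364 4096 5461-6826 10923-12287 15018-16383".split(),
        "6879": "1365-4095 4097-5460 12288-13652".split(),
        "6880": "6827-10922 13653-15017".split(),
        "6881": [],
    }))
```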
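(2) Forget the nodes
If a departing master has replicas, those replicas need to be pointed at another master. When a master and its replica are both being removed, remove the replica first and the master second, to avoid an unnecessary failover. For taking 6881 and 6884 offline, the commands are: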
$ redis-trib.rb del-node 127.0.0.1:6879 904db05d81825413702f7eac960cd2f656b217f7
>>> Removing node 904db05d81825413702f7eac960cd2f656b217f7 from cluster 127.0.0.1:6879
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
$ redis-trib.rb del-node 127.0.0.1:6879 fa2acce219d088e2b33756dac2e85ca92936a8dd
...
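The two rules of this stage, that a master may only be forgotten once it owns no slots and that replicas go before their masters, can be expressed as a small planning function. This is a sketch: NodeInfo and plan_removal are hypothetical, nodes are keyed by port for readability, and the function also reports surviving replicas whose master is being removed, since those must be pointed at another master first.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class NodeInfo(NamedTuple):
    role: str                 # "master" or "slave", as printed by CLUSTER NODES
    master_id: Optional[str]  # the master a replica follows, None for masters
    num_slots: int


def plan_removal(nodes: Dict[str, NodeInfo], to_remove: List[str]) -> Tuple[List[str], List[str]]:
    """Return (del-node order, surviving replicas that must be re-pointed to another master)."""
    for name in to_remove:
        info = nodes[name]
        if info.role == "master" and info.num_slots > 0:
            raise ValueError(f"{name} still owns {info.num_slots} slots; reshard them away first")
    removing = set(to_remove)
    # Replicas first, masters last, to avoid an unnecessary failover.
    order = [n for n in to_remove if nodes[n].role == "slave"]
    order += [n for n in to_remove if nodes[n].role == "master"]
    orphans = [n for n, info in nodes.items()
               if info.role == "slave" and info.master_id in removing and n not in removing]
    return order, orphans


if __name__ == "__main__":
    cluster = {
        "6881": NodeInfo("master", None, 0),
        "6884": NodeInfo("slave", "6881", 0),
        "6879": NodeInfo("master", None, 5460),
    }
    print(plan_removal(cluster, ["6881", "6884"]))  # (['6884', '6881'], [])
```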
The final state of the cluster after the nodes are removed:
127.0.0.1:6885> cluster nodes
99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 12 connected 0-1364 4096 5461-6826 10923-12287 15018-16383
558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 1532863551984 12 connected
90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532863552997 10 connected 1365-4095 4097-5460 12288-13652
c88a8bbe719e337e9015aa84aab40db06878b728 127.0.0.1:6882 slave 90cb860b7f4ff516304c577bc1e514dc95ecd09b 0 1532863555528 10 connected
3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532863556034 11 connected 6827-10922 13653-15017
200da7d61d40c384a3e55b74434bf229333a5fe8 127.0.0.1:6883 slave 3f121a67fab0d74f0d31b69326259e687902e1b3 0 1532863555023 11 connected
If you are interested, follow the WeChat subscription account "数据库最佳实践" (DBBestPractice).