心心失意 posted on 2018-11-2 09:03:03

Redis Cluster (Scaling)

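  Redis Cluster supports scaling out and scaling in without interrupting the service it provides: nodes can be added to expand the cluster or taken offline to shrink it. In both cases the underlying mechanism is the same: slots, together with the data stored in them, are moved between nodes.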
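  To make "slots and their data" concrete, here is a minimal sketch of how a key maps to one of the 16384 slots, assuming the rule from the Redis Cluster specification (CRC16 of the key, modulo 16384, with optional {hash tag} handling). The function names are illustrative and not part of any Redis client.

```python
TOTAL_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster slot for a key, honouring a non-empty {hash tag}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % TOTAL_SLOTS


if __name__ == "__main__":
    # Keys used later in this post; compare with `cluster keyslot <key>`.
    print(key_slot("key:migrate:test:13752"))
```

  Moving a slot therefore means moving every key that hashes to it; the key-to-slot mapping itself never changes.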
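  Scaling Out the Cluster
  The earlier post Redis的集群(搭建) (Redis Cluster: Setup) built a cluster of 6 nodes, 3 of which are masters, each responsible for its own slots and data. To support the tests that follow, load some test data first.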
  $ for i in $(seq 1 70000); do redis-cli -p 6879 -c set key:migrate:test:${i} ${i}; done
  $ redis-trib.rb info 127.0.0.1:6879
  127.0.0.1:6879 (90cb860b...) -> 20304 keys | 5461 slots | 1 slaves.
  127.0.0.1:6881 (fa2acce2...) -> 20201 keys | 5461 slots | 1 slaves.
  127.0.0.1:6880 (3f121a67...) -> 20174 keys | 5462 slots | 1 slaves.
   60679 keys in 3 masters.
  3.70 keys per slot on average.
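  To scale out by adding one master, part of the slots and their data must be migrated to the new node. The process has 3 stages.
  1. Prepare the new nodes
  Prepare two new nodes on ports 6885 and 6886 and start them in cluster mode.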
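  The post does not show the configuration of the new nodes. As a rough sketch, the helper below renders a minimal cluster-mode configuration for one port; the file naming and the 15000 ms timeout are assumptions, not values taken from the original setup.

```python
def render_cluster_node_conf(port: int) -> str:
    """Render a minimal redis.conf for a node running in cluster mode."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",                    # run in cluster mode
        f"cluster-config-file nodes-{port}.conf",  # node state file (assumed name)
        "cluster-node-timeout 15000",              # assumed value, in milliseconds
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for port in (6885, 6886):
        # Hypothetical file names; start each node with: redis-server redis-<port>.conf
        with open(f"redis-{port}.conf", "w") as conf:
            conf.write(render_cluster_node_conf(port))
```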
  2. Join the cluster
  127.0.0.1:6879> cluster meet 127.0.0.1 6885
  127.0.0.1:6879> cluster meet 127.0.0.1 6886
  127.0.0.1:6879> cluster nodes
  e090ec47b8e66e415d69e9452c9c8e7deccd3624 127.0.0.1:6886 master - 0 1532846452277 0 connected
  79c8cf3c11ede4962a2d690c2a2545b86c2f56ed 127.0.0.1:6885 master - 0 1532846453289 0 connected
  ...
  90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-5460
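  New nodes always start out as masters. Because they are not responsible for any slots, they cannot serve reads or writes. There are generally two options for what to do with a new node next: migrate slots and data to it to expand the cluster, or make it a replica of another master so that it can take part in failover.
  In production, it is recommended to add nodes with the redis-trib.rb add-node command, which checks whether the new node already contains data or already belongs to another cluster.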
  $ redis-trib.rb add-node 127.0.0.1:6885 127.0.0.1:6879
  >>> Adding node 127.0.0.1:6885 to cluster 127.0.0.1:6879
  ...
   All nodes agree about slots configuration.
  >>> Check for open slots...
  >>> Check slots coverage...
   All 16384 slots covered.
  >>> Send CLUSTER MEET to node 127.0.0.1:6885 to make it join the cluster.
   New node added correctly.
  $ redis-trib.rb add-node 127.0.0.1:6886 127.0.0.1:6879
  >>> Adding node 127.0.0.1:6886 to cluster 127.0.0.1:6879
  ...
  M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885
  slots: (0 slots) master
  0 additional replica(s)
  ...
   All nodes agree about slots configuration.
  >>> Check for open slots...
  >>> Check slots coverage...
   All 16384 slots covered.
  >>> Send CLUSTER MEET to node 127.0.0.1:6886 to make it join the cluster.
   New node added correctly.
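  3. Migrate slots and data
  Once the new node has joined the cluster, slots and their data must be migrated to it. Migration is the core of scaling out and has 3 steps.
  (1) Slot migration plan
  After node 6885 joins, the number of slots each existing master is responsible for drops from about 5461 (16384 / 3) to 4096 (16384 / 4).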
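  The plan can be worked out with simple arithmetic. The sketch below takes the per-master slot counts reported by redis-trib.rb info earlier (5461, 5462 and 5461) and computes how many slots each master would hand over so that all four masters end up near 16384 / 4. The function name and the "give up the surplus" rule are illustrative assumptions, not the tool's own algorithm.

```python
TOTAL_SLOTS = 16384


def scale_out_plan(owned: dict, new_masters: int = 1) -> dict:
    """Slots each existing master gives up so every master nears the new average."""
    target = TOTAL_SLOTS // (len(owned) + new_masters)
    return {node: max(count - target, 0) for node, count in owned.items()}


if __name__ == "__main__":
    owned = {"127.0.0.1:6879": 5461, "127.0.0.1:6880": 5462, "127.0.0.1:6881": 5461}
    plan = scale_out_plan(owned)
    for node, count in plan.items():
        print(f"{node} gives up {count} slots")
    print("new node receives", sum(plan.values()))
```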
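  (2) Migrate data
  Data is migrated one slot at a time. The procedure for each slot is:
  1) Send cluster setslot {slot} importing {sourceNodeId} to the target node so that it prepares to import the slot's data.
  2) Send cluster setslot {slot} migrating {targetNodeId} to the source node so that it prepares to migrate the slot's data out.
  3) On the source node, run cluster getkeysinslot {slot} {count} to fetch up to count keys that belong to slot {slot}.
  4) On the source node, run migrate {targetIp} {targetPort} "" 0 {timeout} keys {keys...} to move the fetched keys to the target node as one pipelined batch.
  5) Repeat steps 3) and 4) until all key-value data in the slot has been moved to the target node.
  6) Send cluster setslot {slot} node {targetNodeId} to every master in the cluster to announce that the slot is now assigned to the target node.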
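  The six steps above can be sketched as one function. To keep the example self-contained it does not use a real client library: `run` is a hypothetical callable that sends one command to a node and returns the reply (for instance by wrapping redis-cli), and the node dictionaries with `id`, `host` and `port` are likewise an assumption for illustration.

```python
def migrate_slot(run, slot, source, target, masters, batch=100, timeout_ms=5000):
    """Move one slot from source to target using the per-slot procedure.

    run(node, *args) is a hypothetical transport that executes one command
    on `node` and returns its reply.
    """
    # 1) Target prepares to import, 2) source prepares to migrate.
    run(target, "CLUSTER", "SETSLOT", slot, "IMPORTING", source["id"])
    run(source, "CLUSTER", "SETSLOT", slot, "MIGRATING", target["id"])

    # 3)-5) Fetch keys in batches and move them until the slot is empty.
    while True:
        keys = run(source, "CLUSTER", "GETKEYSINSLOT", slot, batch)
        if not keys:
            break
        run(source, "MIGRATE", target["host"], target["port"], "", 0,
            timeout_ms, "KEYS", *keys)

    # 6) Tell every master that the slot now belongs to the target.
    for node in masters:
        run(node, "CLUSTER", "SETSLOT", slot, "NODE", target["id"])
```

  Injecting `run` keeps the control flow visible and lets the sequence be exercised against a fake before it is pointed at a live cluster.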
  Following this procedure, manually migrate slot 4096 from source node 6879 to target node 6885.
  1) Prepare the target node to import slot 4096.
  127.0.0.1:6885> cluster setslot 4096 importing 90cb860b7f4ff516304c577bc1e514dc95ecd09b
  Confirm that slot 4096 is now in the importing state.
  127.0.0.1:6885> cluster nodes
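  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 0 connected [4096-<-90cb860b7f4ff516304c577bc1e514dc95ecd09b]
  2) Prepare the source node to export slot 4096.
  127.0.0.1:6879> cluster setslot 4096 migrating 99ea0df1d9683affb1271a5092fc8b15b378adba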
  Confirm that slot 4096 is now in the migrating state.
  127.0.0.1:6879> cluster nodes
  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532850180009 0 connected
  ...
  90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-5460
  127.0.0.1:6879>
  3) Fetch a batch of keys belonging to slot 4096; here, 4 keys in that slot are fetched.
  127.0.0.1:6879> cluster getkeysinslot 4096 4
  1) "key:migrate:test:13752"
  2) "key:migrate:test:16020"
  3) "key:migrate:test:20791"
  4) "key:migrate:test:5512"
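  Confirm that these 4 keys exist on the source node and not on the target node (the query sent to 6885 with -c is redirected back to 6879).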
  127.0.0.1:6879> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
  1) "13752"
  2) "16020"
  3) "20791"
  4) "5512"
  $ redis-cli -p 6885 -c
  127.0.0.1:6885> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
  -> Redirected to slot located at 127.0.0.1:6879
  1) "13752"
  2) "16020"
  3) "20791"
  4) "5512"
  Migrate these 4 keys in one batch.
  127.0.0.1:6879> migrate 127.0.0.1 6885 "" 0 5000 keys key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
  Check the 4 keys again: they are no longer on the source node.
  127.0.0.1:6879> mget key:migrate:test:13752 key:migrate:test:16020 key:migrate:test:20791 key:migrate:test:5512
  (error) ASK 4096 127.0.0.1:6885
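  The ASK reply above is a temporary redirect: while slot 4096 is migrating, the source node answers for keys it no longer holds by pointing at the target. Per the Redis Cluster specification, a client then sends ASKING to that node and retries the command there once, without updating its slot map; MOVED is the permanent counterpart. A minimal sketch of parsing such a reply follows; the `Redirect` type and function name are illustrative.

```python
from typing import NamedTuple


class Redirect(NamedTuple):
    kind: str   # "ASK" (one-off, slot still migrating) or "MOVED" (permanent)
    slot: int
    host: str
    port: int


def parse_redirect(message: str) -> Redirect:
    """Parse an error payload such as 'ASK 4096 127.0.0.1:6885'."""
    kind, slot, address = message.split()
    if kind not in ("ASK", "MOVED"):
        raise ValueError(f"not a redirect: {message!r}")
    host, _, port = address.rpartition(":")
    return Redirect(kind, int(slot), host, int(port))


if __name__ == "__main__":
    redirect = parse_redirect("ASK 4096 127.0.0.1:6885")
    if redirect.kind == "ASK":
        print(f"send ASKING, then retry on {redirect.host}:{redirect.port}")
```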
  Notify all masters that slot 4096 is now assigned to target node 6885.
  127.0.0.1:6879> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
  127.0.0.1:6880> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
  127.0.0.1:6881> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
  127.0.0.1:6885> cluster setslot 4096 node 99ea0df1d9683affb1271a5092fc8b15b378adba
  Confirm that source node 6879 no longer owns slot 4096 and that target node 6885 now does.
  127.0.0.1:6879> cluster nodes
  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532851434584 9 connected 4096
  ...
  90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 0-4095 4097-5460
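  A real migration involves many slots, each of which may hold a very large number of keys, so redis-trib.rb provides the reshard command to automate the slot migration procedure. The remaining slots are migrated with redis-trib.rb.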
  $ redis-trib.rb reshard 127.0.0.1:6879
  ...
  M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885
  slots:4096 (1 slots) master
  0 additional replica(s)
  M: 558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886
  slots: (0 slots) master
  0 additional replica(s)
  ...
   All nodes agree about slots configuration.
  >>> Check for open slots...
  >>> Check slots coverage...
   All 16384 slots covered.
  How many slots do you want to move (from 1 to 16384)? 4096

  What is the receiving node ID? 99ea0df1d9683affb1271a5092fc8b15b378adba
  Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
  Source node #1:90cb860b7f4ff516304c577bc1e514dc95ecd09b
  Source node #2:3f121a67fab0d74f0d31b69326259e687902e1b3
  Source node #3:fa2acce219d088e2b33756dac2e85ca92936a8dd
  Source node #4:done
  Ready to move 4096 slots.
  Source nodes:
  M: 90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879
  slots:0-4095,4097-5460 (5460 slots) master
  1 additional replica(s)
  M: 3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880
  slots:5461-10922 (5462 slots) master
  1 additional replica(s)
  M: fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881
  slots:10923-16383 (5461 slots) master
  1 additional replica(s)
  Destination node:
  M: 99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885
  slots:4096 (1 slots) master
  0 additional replica(s)
  Resharding plan:
  Moving slot 5461 from 3f121a67fab0d74f0d31b69326259e687902e1b3
  ...
  Moving slot 11090 from fa2acce219d088e2b33756dac2e85ca92936a8dd
  ...
  Moving slot 1364 from 90cb860b7f4ff516304c577bc1e514dc95ecd09b
  Do you want to proceed with the proposed reshard plan (yes/no)? yes
  Moving slot 5461 from 127.0.0.1:6880 to 127.0.0.1:6885: ..
  ...
  Moving slot 12177 from 127.0.0.1:6881 to 127.0.0.1:6885: ....
  ...
  Moving slot 1364 from 127.0.0.1:6879 to 127.0.0.1:6885: .....
  Check the new mapping between nodes and slots.
  127.0.0.1:6879> cluster nodes
  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532852546583 9 connected 0-1364 4096 5461-6826 10923-12287
  558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 master - 0 1532852550630 8 connected
  fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881 master - 0 1532852548097 3 connected 12288-16383
  3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532852549619 2 connected 6827-10922
  ...
  90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 0 1 connected 1365-4095 4097-5460
  After the migration, use the redis-trib.rb rebalance command to check how evenly the slots are spread across the nodes.
  $ redis-trib.rb rebalance 127.0.0.1:6879
  ...
   All 16384 slots covered.
  *** No rebalancing needed! All nodes are within the 2.0% threshold.
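  The balance can also be cross-checked by counting slots straight from cluster nodes output. The sketch below assumes the line layout shown in the transcripts above, where slot ranges start at the ninth whitespace-separated field and bracketed entries mark slots in the importing or migrating state; the function names are illustrative.

```python
def count_slots(line: str) -> int:
    """Count the slots owned by the node described by one `cluster nodes` line."""
    total = 0
    for token in line.split()[8:]:
        if token.startswith("["):
            continue  # importing/migrating marker, not an owned range
        first, _, last = token.partition("-")
        total += int(last or first) - int(first) + 1
    return total


def slot_counts(output: str) -> dict:
    """Map each node address to its slot count."""
    counts = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            counts[fields[1]] = count_slots(line)
    return counts


if __name__ == "__main__":
    sample = (
        "99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 "
        "1532852546583 9 connected 0-1364 4096 5461-6826 10923-12287\n"
        "90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 myself,master - 0 "
        "0 1 connected 1365-4095 4097-5460\n"
    )
    for addr, count in slot_counts(sample).items():
        print(addr, count)
```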
  (3) Add a replica
  Make node 6886 a replica of 6885 to keep the whole cluster highly available.
  127.0.0.1:6886> cluster replicate 99ea0df1d9683affb1271a5092fc8b15b378adba
  Node 6886 is now a replica of 6885. This completes the scale-out.
  127.0.0.1:6886> cluster nodes
  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 master - 0 1532852938340 9 connected 0-1364 4096 5461-6826 10923-12287
  558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 myself,slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 0 8 connected
  ...
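  Scaling In the Cluster
  Scaling in means safely taking some nodes out of an existing cluster. First determine whether the node being taken offline owns any slots; if it does, migrate them to other nodes so that the slot-to-node mapping stays complete after the node leaves. Once the node no longer owns any slots, or if it is a replica, the other nodes in the cluster can be told to forget it; after every node has forgotten it, the node can be shut down normally. The process has 2 stages.
  (1) Migrate slots off the node
  The node being taken offline must migrate its slots to other nodes; this works the same way as slot migration during scale-out. For example, to take nodes 6881 and 6884 offline: 6881 is a master responsible for slots 12288-16383, and 6884 is its replica. Before 6881 goes offline, its slots must be migrated to nodes 6879, 6880 and 6885. Each run of the reshard command accepts only one target node, so reshard must be run 3 times, moving 1365, 1365 and 1366 slots respectively.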
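  The three-way split and the three reshard runs can be scripted. This sketch divides the 4096 slots across the three targets and prints one non-interactive command per target; it assumes the --from, --to, --slots and --yes options of redis-trib.rb reshard are available in the installed version, and the helper names are illustrative.

```python
def split_slots(total: int, parts: int) -> list:
    """Split `total` slots into `parts` near-equal counts, extras going last."""
    base, extra = divmod(total, parts)
    return [base] * (parts - extra) + [base + 1] * extra


def reshard_commands(source_id, target_ids, total, entry="127.0.0.1:6879"):
    """Build one reshard command line per target node."""
    counts = split_slots(total, len(target_ids))
    return [
        f"redis-trib.rb reshard --from {source_id} --to {target_id} "
        f"--slots {count} --yes {entry}"
        for target_id, count in zip(target_ids, counts)
    ]


if __name__ == "__main__":
    source = "fa2acce219d088e2b33756dac2e85ca92936a8dd"      # 6881
    targets = [
        "90cb860b7f4ff516304c577bc1e514dc95ecd09b",          # 6879
        "3f121a67fab0d74f0d31b69326259e687902e1b3",          # 6880
        "99ea0df1d9683affb1271a5092fc8b15b378adba",          # 6885
    ]
    for command in reshard_commands(source, targets, 4096):
        print(command)
```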
  $ redis-trib.rb reshard 127.0.0.1:6879
  ...
  How many slots do you want to move (from 1 to 16384)? 1365

  What is the receiving node ID? 90cb860b7f4ff516304c577bc1e514dc95ecd09b
  Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
  Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd
  Source node #2:done
  Ready to move 1365 slots.
  Source nodes:
  M: fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881
  slots:12288-16383 (4096 slots) master
  1 additional replica(s)
  Destination node:
  M: 90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879
  slots:1365-4095,4097-5460 (4095 slots) master
  1 additional replica(s)
  Resharding plan:
  Moving slot 12288 from fa2acce219d088e2b33756dac2e85ca92936a8dd
  Moving slot 12289 from fa2acce219d088e2b33756dac2e85ca92936a8dd
  Do you want to proceed with the proposed reshard plan (yes/no)? yes
  ...
  Moving slot 13651 from 127.0.0.1:6881 to 127.0.0.1:6879: ..
  Moving slot 13652 from 127.0.0.1:6881 to 127.0.0.1:6879: ...
  After the migration, node 6879 has taken over 1365 of node 6881's slots: 12288-13652.
  Next, migrate 1365 slots to node 6880 and 1366 slots to node 6885.
  $ redis-trib.rb reshard 127.0.0.1:6879
  ...
  How many slots do you want to move (from 1 to 16384)? 1365

  What is the receiving node ID? 3f121a67fab0d74f0d31b69326259e687902e1b3
  Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
  Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd
  Source node #2:done
  ...
  Do you want to proceed with the proposed reshard plan (yes/no)? yes
  ...
  $ redis-trib.rb reshard 127.0.0.1:6879
  ...
  How many slots do you want to move (from 1 to 16384)? 1366

  What is the receiving node ID? 99ea0df1d9683affb1271a5092fc8b15b378adba
  Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
  Source node #1:fa2acce219d088e2b33756dac2e85ca92936a8dd
  Source node #2:done
  ...
  Do you want to proceed with the proposed reshard plan (yes/no)? yes
  ...
  At this point all of node 6881's slots have been migrated off it. The cluster state is as follows:
  127.0.0.1:6885> cluster nodes
  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 12 connected 0-1364 4096 5461-6826 10923-12287 15018-16383
  ...
  fa2acce219d088e2b33756dac2e85ca92936a8dd 127.0.0.1:6881 master - 0 1532862908561 3 connected
  904db05d81825413702f7eac960cd2f656b217f7 127.0.0.1:6884 slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 1532862911596 12 connected
  90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532862913617 10 connected 1365-4095 4097-5460 12288-13652
  3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532862907549 11 connected 6827-10922 13653-15017
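  (2) Forget the node
  If the master being taken offline has a replica, that replica must be pointed at another master. When both a master and its replica are being taken offline, remove the replica first and then the master, to avoid an unnecessary failover. The commands for taking 6881 and 6884 offline are: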
  $ redis-trib.rb del-node 127.0.0.1:6879 904db05d81825413702f7eac960cd2f656b217f7
  >>> Removing node 904db05d81825413702f7eac960cd2f656b217f7 from cluster 127.0.0.1:6879
  >>> Sending CLUSTER FORGET messages to the cluster...
  >>> SHUTDOWN the node.
  $ redis-trib.rb del-node 127.0.0.1:6879 fa2acce219d088e2b33756dac2e85ca92936a8dd
  ...
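  The ordering rule used above (replica first, then its master, and only once the master owns no slots) can be written down as a small pre-flight check. The node dictionaries and function names are illustrative assumptions; the generated lines use the same del-node form as the transcript.

```python
def removal_order(nodes):
    """Return nodes in a safe removal order: replicas first, then masters.

    Raises ValueError if a master in the list still owns slots.
    """
    busy = [n["addr"] for n in nodes if n["role"] == "master" and n["slots"] > 0]
    if busy:
        raise ValueError(f"masters still own slots: {busy}")
    # sorted() is stable, so nodes with the same role keep their input order.
    return sorted(nodes, key=lambda n: 0 if n["role"] == "slave" else 1)


def del_node_commands(nodes, entry="127.0.0.1:6879"):
    """Build one del-node command per node, in safe order."""
    return [f"redis-trib.rb del-node {entry} {n['id']}" for n in removal_order(nodes)]


if __name__ == "__main__":
    offline = [
        {"id": "fa2acce219d088e2b33756dac2e85ca92936a8dd",
         "addr": "127.0.0.1:6881", "role": "master", "slots": 0},
        {"id": "904db05d81825413702f7eac960cd2f656b217f7",
         "addr": "127.0.0.1:6884", "role": "slave", "slots": 0},
    ]
    for command in del_node_commands(offline):
        print(command)
```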
  The final state of the cluster after the nodes have been taken offline:
  127.0.0.1:6885> cluster nodes
  99ea0df1d9683affb1271a5092fc8b15b378adba 127.0.0.1:6885 myself,master - 0 0 12 connected 0-1364 4096 5461-6826 10923-12287 15018-16383
  558b0fb8d44933e694b46c15d05e595ce5ae4fab 127.0.0.1:6886 slave 99ea0df1d9683affb1271a5092fc8b15b378adba 0 1532863551984 12 connected
  90cb860b7f4ff516304c577bc1e514dc95ecd09b 127.0.0.1:6879 master - 0 1532863552997 10 connected 1365-4095 4097-5460 12288-13652
  c88a8bbe719e337e9015aa84aab40db06878b728 127.0.0.1:6882 slave 90cb860b7f4ff516304c577bc1e514dc95ecd09b 0 1532863555528 10 connected
  3f121a67fab0d74f0d31b69326259e687902e1b3 127.0.0.1:6880 master - 0 1532863556034 11 connected 6827-10922 13653-15017
  200da7d61d40c384a3e55b74434bf229333a5fe8 127.0.0.1:6883 slave 3f121a67fab0d74f0d31b69326259e687902e1b3 0 1532863555023 11 connected

