cike0415 posted on 2018-11-7 11:41:46

Redis manual failover - ylw6006

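  This post walks through a manual failover in a Redis master-slave setup, including the troubleshooting along the way. When the master instance goes down, the slave is promoted to master so that writes can continue; once the original master is back, it is brought up to date with the data from the former slave, and the original master and slave roles are restored.
  Environment
  
Operating system on both hosts: rhel5.4 64bit
  
redis version: 2.6.4
  
redis instance port on both hosts: 6379
  
redis instance password on both hosts: 123
  
Master instance: server11 (192.168.1.112)
  
Slave instance: server12 (192.168.1.113)
  I: Manual switchover without persistence configured
  
1: Under normal conditions server11 is the master, server12 is the slave, and replication works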
  


[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:1
[*]slave0:192.168.1.113,6379,online
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 config get save
[*]1) "save"
[*]2) ""
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 config get save
[*]1) "save"
[*]2) ""
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 set 5 e
[*]OK
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
[*]"e"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
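The role checks in this step read the INFO output by eye. For scripting the same check, a minimal sketch of pulling one field out of that output could look like this. The function name `info_field` is hypothetical and not from the original post; it only assumes the `field:value` line format visible in the output above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original post.
# info_field NAME: read INFO output on stdin and print the value of NAME.
info_field() {
    # INFO lines end in CRLF, so strip carriage returns before matching.
    # Everything after the first colon is printed, so values that contain
    # colons or commas (such as the slave0 line) come through whole.
    tr -d '\r' | awk -v key="$1" '
        index($0, key ":") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Usage (paths, host and password as used in this post):
#   /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info | info_field role
```

Matching on the exact field name avoids depending on how many lines `grep -A` happens to show.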
  

  2: When the master goes down, the slave still answers reads but rejects writes
  


[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 shutdown
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
[*]Could not connect to Redis at 192.168.1.112:6379: Connection refused
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
[*](error) READONLY You can't write against a read only slave.
  

  3: Promote the slave to master so that writes are accepted again
  


[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 SLAVEOF NO ONE
[*]OK
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:0
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
[*]OK
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
[*]"f"
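The promotion in step 3 can be wrapped in a small guard so that the slave is only promoted when the master does not answer. This is a rough sketch under assumptions: the function name `promote_if_master_down` and the `REDIS_CLI` variable are hypothetical, and a single failed PING is a weak liveness test that a real setup would probably repeat a few times.

```shell
#!/bin/sh
# Hypothetical sketch, not from the original post.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# promote_if_master_down MASTER_HOST SLAVE_HOST PASSWORD
# Sends PING to the master. If the reply is PONG, nothing is changed and the
# function returns 1. Otherwise it issues SLAVEOF NO ONE on the slave and
# returns the exit status of that redis-cli call.
promote_if_master_down() {
    master=$1 slave=$2 pass=$3
    # A refused connection makes redis-cli fail; treat that as "no reply".
    reply=$("$REDIS_CLI" -h "$master" -a "$pass" ping 2>/dev/null) || reply=""
    if [ "$reply" = "PONG" ]; then
        echo "master $master is alive, nothing to do"
        return 1
    fi
    "$REDIS_CLI" -h "$slave" -a "$pass" slaveof no one
}

# Usage with the hosts from this post:
#   promote_if_master_down 192.168.1.112 192.168.1.113 123
```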
  

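  4: After the original master comes back, try to pull the latest data from server12. Testing shows that this approach does not work: server11 and server12 end up with inconsistent data, and forcing the original roles back at this point would lose data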
  


[*]# /usr/local/redis2/bin/redis-server /usr/local/redis2/etc/redis.conf
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:0
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
[*](nil)
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 6
[*](nil)
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
[*]"f"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -p 6379 -a 123 SLAVEOF 192.168.1.113 6379
[*]OK
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 10 'Replication'
[*]# Replication
[*]role:slave
[*]master_host:192.168.1.113
[*]master_port:6379
[*]master_link_status:down
[*]master_last_io_seconds_ago:-1
[*]master_sync_in_progress:0
[*]master_link_down_since_seconds:517
[*]slave_priority:100
[*]slave_read_only:1
[*]connected_slaves:0
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:0
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
[*](nil)
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 6
[*](nil)
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
[*]"f"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
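The post does not explain the `master_link_status:down` seen above. One possible cause, and this is only an assumption because server11's redis.conf is not shown, is that server11 has no `masterauth` set and therefore cannot authenticate against server12, which requires the password 123. The (nil) replies on server11 are consistent with it restarting empty, since no persistence was configured. A rough sketch of setting `masterauth` at runtime before rejoining follows; the function name `rejoin_as_slave` and the `REDIS_CLI` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not from the original post.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# rejoin_as_slave LOCAL_HOST NEW_MASTER_HOST PASSWORD
# Assumes the new master listens on port 6379 and uses the same password.
# Sets masterauth on LOCAL_HOST, points it at NEW_MASTER_HOST, then prints
# the master_link_status line so the result can be checked.
rejoin_as_slave() {
    local_host=$1 new_master=$2 pass=$3
    "$REDIS_CLI" -h "$local_host" -a "$pass" config set masterauth "$pass" || return 1
    "$REDIS_CLI" -h "$local_host" -a "$pass" slaveof "$new_master" 6379 || return 1
    "$REDIS_CLI" -h "$local_host" -a "$pass" info | tr -d '\r' | grep '^master_link_status:'
}

# Usage with the hosts from this post:
#   rejoin_as_slave 192.168.1.112 192.168.1.113 123
```

Right after SLAVEOF the status may still read down while the initial sync runs, so it is worth checking again a few seconds later before drawing a conclusion.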
  

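  II: Test with snapshot persistence enabled on the slave
  
1: After restoring the original test environment, enable snapshot persistence on the slave. Since this is a test environment, it is configured to save a snapshot if at least 1 key changes within 60 seconds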
  


[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 config get save
[*]1) "save"
[*]2) ""
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 config get save
[*]1) "save"
[*]2) "60 1"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:1
[*]slave0:192.168.1.113,6379,online
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:slave
[*]master_host:192.168.1.112
[*]master_port:6379
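The listing above shows the resulting `save` setting but not how it was applied. It could come from a `save 60 1` line in the slave's redis.conf, or it could be set at runtime. A minimal sketch of the runtime variant is below; the function name `enable_snapshot` and the `REDIS_CLI` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, not from the original post.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# enable_snapshot HOST PASSWORD SECONDS CHANGES
# Sets a single save point with CONFIG SET and prints the resulting setting.
enable_snapshot() {
    host=$1 pass=$2 secs=$3 changes=$4
    "$REDIS_CLI" -h "$host" -a "$pass" config set save "$secs $changes" || return 1
    "$REDIS_CLI" -h "$host" -a "$pass" config get save
}

# Usage with the slave from this post:
#   enable_snapshot 192.168.1.113 123 60 1
```

A value set with CONFIG SET is not written back to redis.conf, so it is gone after a restart unless the config file is edited as well.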
  

  2: Write test data and check that the master and slave are replicating correctly
  


[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 set 5 e
[*]OK
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
[*]"e"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
  

  3: Simulate a master crash, manually promote the slave to master, and keep writing new data
  


[*]# killall -9 redis-server
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
[*]Could not connect to Redis at 192.168.1.112:6379: Connection refused
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
[*](error) READONLY You can't write against a read only slave.
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 slaveof no one
[*]OK
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:0
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 set 6 f
[*]OK
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
[*]"f"
  

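  4: Data sync and role restoration once the original master is back. Here the data is synchronized by copying the slave's snapshot file over to the master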
  


[*]# scp /usr/local/redis2/slave_dump.rdb server11:/usr/local/redis2/master_dump.rdb
[*]# /usr/local/redis2/bin/redis-server /usr/local/redis2/etc/redis.conf
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 2 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:0
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 5
[*]"e"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 get 6
[*]"f"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 slaveof 192.168.1.112 6379
[*]OK
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 info |grep -A 10 'Replication'
[*]# Replication
[*]role:slave
[*]master_host:192.168.1.112
[*]master_port:6379
[*]master_link_status:up
[*]master_last_io_seconds_ago:1
[*]master_sync_in_progress:0
[*]slave_priority:100
[*]slave_read_only:1
[*]connected_slaves:0
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 5
[*]"e"
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.113 -a 123 get 6
[*]"f"
[*]
[*]# /usr/local/redis2/bin/redis-cli -h 192.168.1.112 -a 123 info |grep -A 3 'Replication'
[*]# Replication
[*]role:master
[*]connected_slaves:1
[*]slave0:192.168.1.113,6379,online
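Two things in step 4 are easy to get wrong: the snapshot on server12 can be up to 60 seconds old under `save 60 1`, and any write that reaches server12 after the copy is lost once it becomes a slave again. A rough sketch of a sanity check to run just before the final `slaveof` is shown here. The function name `same_dbsize` and the `REDIS_CLI` variable are hypothetical, and comparing key counts is only a coarse check, not proof that the datasets match.

```shell
#!/bin/sh
# Hypothetical sketch, not from the original post.
REDIS_CLI=${REDIS_CLI:-/usr/local/redis2/bin/redis-cli}

# same_dbsize HOST_A HOST_B PASSWORD
# Succeeds only if both instances answer DBSIZE with the same non-empty value.
same_dbsize() {
    a=$("$REDIS_CLI" -h "$1" -a "$3" dbsize 2>/dev/null) || return 1
    b=$("$REDIS_CLI" -h "$2" -a "$3" dbsize 2>/dev/null) || return 1
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Possible order of operations with the hosts from this post:
#   redis-cli -h 192.168.1.113 -a 123 save      # force a fresh snapshot first
#   ... scp the dump file and start server11 as shown above ...
#   same_dbsize 192.168.1.112 192.168.1.113 123 &&
#       redis-cli -h 192.168.1.113 -a 123 slaveof 192.168.1.112 6379
```

Running SAVE explicitly before the copy removes the dependency on when the last automatic snapshot happened; stopping client writes to server12 for the duration of the switch keeps the two sides from drifting apart again.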
  

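  Follow-up: the failover flow in this post, up to and including the promotion of the slave to master, can be automated by deploying keepalived; the final data sync to the original master and the role restoration can be driven by a shell script. The next post covers this in detail.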
  


