[Experience Sharing] Mastering the MHA High-Availability Cluster

1. Introduction

MHA (Master High Availability) is currently one of the more mature MySQL high-availability solutions, adopted by many large e-commerce sites (Taobao and JD, for example). It is an excellent piece of software for failover and master promotion in a MySQL high-availability environment. When the master fails, MHA can complete the database failover within roughly 0 to 30 seconds, either manually or automatically (automatic failover requires a helper script), and during the failover it preserves data consistency as far as possible, which is what makes it high availability in the real sense. These characteristics are why many large e-commerce sites favor it and have built their own derivatives on top of it.

The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage several master-slave clusters, or it can run on one of the slave nodes. MHA Node runs on every MySQL server. The manager periodically probes the master of the cluster; when the master fails, it promotes the slave holding the most recent data to the new master and repoints all the other slaves to it. The whole failover is completely transparent to the application.

During an automatic failover, MHA tries to save the binary logs from the crashed master so as to lose as little data as possible, but that is not always feasible: if the master's hardware has failed or it cannot be reached over SSH, MHA cannot save the binary logs and fails over anyway, losing the most recent transactions. Semi-synchronous replication, available since MySQL 5.5, greatly reduces this risk and can be combined with MHA: as long as at least one slave has received the latest binary log events, MHA can apply them to all the other slaves, keeping every node consistent. You can also deliberately configure one slave to lag behind the master, so that data lost through an accidental deletion can be recovered from that slave's binary logs.
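A minimal sketch of turning on semi-synchronous replication alongside MHA, assuming the stock MySQL/MariaDB 5.5 semisync plugins (add the same settings to my.cnf if you want them to survive a restart):

-- on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 1000;  -- fall back to async after 1 second without an ack

-- on each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE IO_THREAD; START SLAVE IO_THREAD;     -- restart the IO thread so the slave registers as semi-sync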
MHA mainly supports a one-master, multiple-slave architecture. A replication cluster built for MHA needs at least three database servers: one master, one slave acting as the candidate master, and one more slave. Because of this three-server minimum and the hardware cost it implies, Taobao modified MHA; its TMHA variant already supports a one-master, one-slave setup.

[Figure: MHA high-availability cluster architecture diagram]



2. Lab Deployment and Configuration Requirements

IP address plan:

Server           IP address     Hostname
MySQL Manager    10.1.10.65     node1.alen.com
master           10.1.10.66     node2.alren.com
slave01          10.1.10.67     node3.alren.com
slave02          10.1.10.68     node4.alren.com


Configuration requirements:
    ① Every node must be able to reach every other node by hostname (a simple /etc/hosts or DNS entry takes care of this; see the sketch after this list).

    ② The MHA manager node needs both the mha4mysql-manager and mha4mysql-node packages installed (each MySQL node also needs mha4mysql-node).

    ③ The configuration directory and configuration file have to be created by hand.
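As a minimal sketch for requirements ① and ②, assuming the /etc/hosts approach and locally downloaded rpm packages (version 0.56 matches the logs later in this post, but the exact rpm file names are only illustrative):

# append to /etc/hosts on every node
10.1.10.65   node1.alen.com   node1
10.1.10.66   node2.alren.com  node2
10.1.10.67   node3.alren.com  node3
10.1.10.68   node4.alren.com  node4

# on the manager node (node1): both packages
yum -y localinstall mha4mysql-manager-0.56-0.noarch.rpm mha4mysql-node-0.56-0.noarch.rpm
# on every MySQL node (node2, node3, node4): the node package only
yum -y localinstall mha4mysql-node-0.56-0.noarch.rpm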




3. MHA Hands-On Configuration

Configure MySQL on every node and start the service:
node2 (MySQL master) configuration:
Edit /etc/my.cnf and add:
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 1
Start the service:
systemctl start mariadb.service
Grant an account with all privileges for MHA's management user:
grant all on *.* to 'mhauser'@'10.1.10.%' identified by 'cncn';
Grant an account with replication privileges:
grant replication slave,replication client on *.* to 'repluser'@'10.1.10.%' identified by 'cncn';
flush privileges;
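The MASTER_LOG_FILE and MASTER_LOG_POS values used in the CHANGE MASTER TO statements below are taken from the master. Run SHOW MASTER STATUS there and use whatever it reports; the output below simply mirrors the coordinates this walkthrough uses, and yours may differ:

MariaDB [(none)]> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000001 |      245 |              |                  |
+-------------------+----------+--------------+------------------+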
node3 (MySQL slave1) configuration:
Edit /etc/my.cnf and add:
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 3
read-only = 1
relay-log-purge = 0
Start the service and point the slave at the master:
systemctl start mariadb.service
change master to MASTER_HOST='10.1.10.66',MASTER_USER='repluser',MASTER_PASSWORD='cncn',MASTER_LOG_FILE='master-bin.000001',MASTER_LOG_POS=245;
start slave;
node4 (MySQL slave2) configuration:
Edit /etc/my.cnf and add:
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server-id = 4
read-only = 1
relay-log-purge = 0
Start the service and point the slave at the master:
systemctl start mariadb.service
change master to MASTER_HOST='10.1.10.66',MASTER_USER='repluser',MASTER_PASSWORD='cncn',MASTER_LOG_FILE='master-bin.000001',MASTER_LOG_POS=245;
start slave;
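After start slave, it is worth confirming on node3 and node4 that both replication threads are running; the two fields below are the ones to check (only the relevant part of the output is shown):

MariaDB [(none)]> SHOW SLAVE STATUS\G
...
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
...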




Manually create the MHA working directory and write the configuration file
mkdir /etc/mastermha
vim /etc/mastermha/app1.conf
[server default]
user=mhauser
password=cncn
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser
repl_password=cncn
ping_interval=1
[server1]
hostname=10.1.10.66
candidate_master=1
[server2]
hostname=10.1.10.67
candidate_master=1
[server3]
hostname=10.1.10.68
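The manager logs into every node as root over SSH (ssh_user=root above), and masterha_check_ssh below also verifies that each MySQL node can reach every other one. A minimal sketch for distributing a single key pair to all four hosts, assuming password authentication is still available at this point:

# on node1: generate a key pair and push the public key to every host (including node1 itself)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for host in 10.1.10.65 10.1.10.66 10.1.10.67 10.1.10.68; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@$host
done
# copy the same key pair to the other nodes so they can reach each other as well
for host in 10.1.10.66 10.1.10.67 10.1.10.68; do
    scp -p ~/.ssh/id_rsa ~/.ssh/id_rsa.pub root@$host:~/.ssh/
done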




Check that all nodes can reach each other over SSH and that replication health on every node is OK:

[iyunv@node1 ~]# masterha_check_ssh --conf=/etc/mastermha/app1.conf
Fri Nov 25 21:46:49 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Nov 25 21:46:49 2016 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Fri Nov 25 21:46:49 2016 - [info] Reading server configuration from /etc/mastermha/app1.conf..
Fri Nov 25 21:46:49 2016 - [info] Starting SSH connection tests..
Fri Nov 25 21:46:51 2016 - [debug]
Fri Nov 25 21:46:49 2016 - [debug]  Connecting via SSH from root@10.1.10.66(10.1.10.66:22) to root@10.1.10.67(10.1.10.67:22)..
Fri Nov 25 21:46:50 2016 - [debug]   ok.
Fri Nov 25 21:46:50 2016 - [debug]  Connecting via SSH from root@10.1.10.66(10.1.10.66:22) to root@10.1.10.68(10.1.10.68:22)..
Fri Nov 25 21:46:51 2016 - [debug]   ok.
Fri Nov 25 21:46:51 2016 - [debug]
Fri Nov 25 21:46:49 2016 - [debug]  Connecting via SSH from root@10.1.10.67(10.1.10.67:22) to root@10.1.10.66(10.1.10.66:22)..
Fri Nov 25 21:46:51 2016 - [debug]   ok.
Fri Nov 25 21:46:51 2016 - [debug]  Connecting via SSH from root@10.1.10.67(10.1.10.67:22) to root@10.1.10.68(10.1.10.68:22)..
Fri Nov 25 21:46:51 2016 - [debug]   ok.
Fri Nov 25 21:46:52 2016 - [debug]
Fri Nov 25 21:46:50 2016 - [debug]  Connecting via SSH from root@10.1.10.68(10.1.10.68:22) to root@10.1.10.66(10.1.10.66:22)..
Fri Nov 25 21:46:51 2016 - [debug]   ok.
Fri Nov 25 21:46:51 2016 - [debug]  Connecting via SSH from root@10.1.10.68(10.1.10.68:22) to root@10.1.10.67(10.1.10.67:22)..
Fri Nov 25 21:46:52 2016 - [debug]   ok.
Fri Nov 25 21:46:52 2016 - [info] All SSH connection tests passed successfully. # SSH connectivity between all nodes is fine
[iyunv@node1 ~]# masterha_check_repl --conf=/etc/mastermha/app1.conf
Fri Nov 25 22:14:12 2016 - [warning] Global configuration file /etc/masterha_default.c
Fri Nov 25 22:14:12 2016 - [info] Reading application default configuration from /etc/
Fri Nov 25 22:14:12 2016 - [info] Reading server configuration from /etc/mastermha/app
Fri Nov 25 22:14:12 2016 - [info] MHA::MasterMonitor version 0.56.
Fri Nov 25 22:14:13 2016 - [info] GTID failover mode = 0
Fri Nov 25 22:14:13 2016 - [info] Dead Servers:
Fri Nov 25 22:14:13 2016 - [info] Alive Servers:
Fri Nov 25 22:14:13 2016 - [info]   10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info]   10.1.10.67(10.1.10.67:3306)
Fri Nov 25 22:14:13 2016 - [info]   10.1.10.68(10.1.10.68:3306)
Fri Nov 25 22:14:13 2016 - [info] Alive Slaves:
Fri Nov 25 22:14:13 2016 - [info]   10.1.10.67(10.1.10.67:3306)  Version=5.5.44-MariaD
Fri Nov 25 22:14:13 2016 - [info]     Replicating from 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info]     Primary candidate for the new Master (candidate_
Fri Nov 25 22:14:13 2016 - [info]   10.1.10.68(10.1.10.68:3306)  Version=5.5.44-MariaD
Fri Nov 25 22:14:13 2016 - [info]     Replicating from 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info] Current Alive Master: 10.1.10.66(10.1.10.66:3306)
Fri Nov 25 22:14:13 2016 - [info] Checking slave configurations..
Fri Nov 25 22:14:13 2016 - [warning]  relay_log_purge=0 is not set on slave 10.1.10.67
Fri Nov 25 22:14:13 2016 - [warning]  relay_log_purge=0 is not set on slave 10.1.10.68
Fri Nov 25 22:14:13 2016 - [info] Checking replication filtering settings..
Fri Nov 25 22:14:13 2016 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Nov 25 22:14:13 2016 - [info]  Replication filtering check ok.
Fri Nov 25 22:14:13 2016 - [info] GTID (with auto-pos) is not supported
Fri Nov 25 22:14:13 2016 - [info] Starting SSH connection tests..
Fri Nov 25 22:14:16 2016 - [info] All SSH connection tests passed successfully.
Fri Nov 25 22:14:16 2016 - [info] Checking MHA Node version..
Fri Nov 25 22:14:18 2016 - [info]  Version check ok.
Fri Nov 25 22:14:18 2016 - [info] Checking SSH publickey authentication settings on th
Fri Nov 25 22:14:18 2016 - [info] HealthCheck: SSH to 10.1.10.66 is reachable.
Fri Nov 25 22:14:19 2016 - [info] Master MHA Node version is 0.56.
Fri Nov 25 22:14:19 2016 - [info] Checking recovery script configurations on 10.1.10.6
Fri Nov 25 22:14:19 2016 - [info]   Executing command: save_binary_logs --command=testterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000003
Fri Nov 25 22:14:19 2016 - [info]   Connecting to root@10.1.10.66(10.1.10.66:22)..
  Creating /data/masterha/app1 if not exists.. Creating directory /data/masterha/app1.
   ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to master-bin.000003
Fri Nov 25 22:14:19 2016 - [info] Binlog setting check done.
Fri Nov 25 22:14:19 2016 - [info] Checking SSH publickey authentication and checking r
Fri Nov 25 22:14:19 2016 - [info]   Executing command : apply_diff_relay_logs --commanve_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.44-MariaDB-log --managlib/mysql/  --slave_pass=xxx
Fri Nov 25 22:14:19 2016 - [info]   Connecting to root@10.1.10.67(10.1.10.67:22)..
Creating directory /data/masterha/app1.. done.
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-bin.000002
    Temporary relay log file is /var/lib/mysql/relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Nov 25 22:14:20 2016 - [info]   Executing command : apply_diff_relay_logs --commanve_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.44-MariaDB-log --managlib/mysql/  --slave_pass=xxx
Fri Nov 25 22:14:20 2016 - [info]   Connecting to root@10.1.10.68(10.1.10.68:22)..
Creating directory /data/masterha/app1.. done.
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to relay-bin.000002
    Temporary relay log file is /var/lib/mysql/relay-bin.000002
    Testing mysql connection and privileges.. done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Fri Nov 25 22:14:21 2016 - [info] Slaves settings check done.
Fri Nov 25 22:14:21 2016 - [info]
10.1.10.66(10.1.10.66:3306) (current master)
+--10.1.10.67(10.1.10.67:3306)
+--10.1.10.68(10.1.10.68:3306)
Fri Nov 25 22:14:21 2016 - [info] Checking replication health on 10.1.10.67..
Fri Nov 25 22:14:21 2016 - [info]  ok.
Fri Nov 25 22:14:21 2016 - [info] Checking replication health on 10.1.10.68..
Fri Nov 25 22:14:21 2016 - [info]  ok.
Fri Nov 25 22:14:21 2016 - [warning] master_ip_failover_script is not defined.
Fri Nov 25 22:14:21 2016 - [warning] shutdown_script is not defined.
Fri Nov 25 22:14:21 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK. # replication health across all nodes is fine




Start the cluster manager. In normal operation it runs in the background; while debugging it is convenient to run it in the foreground. Note that the first attempt below fails only because the configuration file path is mistyped (/etc/masterha instead of /etc/mastermha); the second attempt reads the configuration correctly.
[iyunv@node1 ~]# masterha_manager --conf=/etc/masterha/app1.conf
Fri Nov 25 22:15:37 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln280] Configuration file /etc/masterha/app1.conf not found!
Fri Nov 25 22:15:37 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations.  at /usr/bin/masterha_manager line 50.
Fri Nov 25 22:15:37 2016 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Fri Nov 25 22:15:37 2016 - [info] Got exit code 1 (Not master dead).
[iyunv@node1 ~]# masterha_manager  --conf=/etc/mastermha/app1.conf
Fri Nov 25 22:16:40 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Nov 25 22:16:40 2016 - [info] Reading application default configuration from /etc/mastermha/app1.conf..
Fri Nov 25 22:16:40 2016 - [info] Reading server configuration from /etc/mastermha/app1.conf..
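Once the foreground run behaves, one reasonable way (not the only one) to keep the manager running in the background and to check on it is to use MHA's own helper scripts:

nohup masterha_manager --conf=/etc/mastermha/app1.conf > /data/masterha/app1/manager.log 2>&1 &
masterha_check_status --conf=/etc/mastermha/app1.conf   # reports the app name and current master while the manager is running
masterha_stop --conf=/etc/mastermha/app1.conf           # stops monitoring cleanly when needed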






Summary: try this yourself — simulate a node failure, check whether the master role migrates, bring a new node online, and verify that its status comes back healthy. In real production, MHA can greatly reduce downtime and improve database availability. For sensitive data, however, combining MHA with scripts that automatically repair the failed node is not recommended; handling that step manually is usually the safer choice.
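A minimal way to run the simulation suggested above, assuming the manager is running on node1 and 10.1.10.66 is still the master:

# on node2 (current master): simulate a crash
systemctl stop mariadb.service

# on node1: watch the manager detect the failure, promote 10.1.10.67, and repoint 10.1.10.68
tail -f /data/masterha/app1/manager.log

# afterwards, confirm the new topology from any host (mhauser was granted access from 10.1.10.%)
mysql -u mhauser -pcncn -h 10.1.10.68 -e 'SHOW SLAVE STATUS\G' | grep Master_Host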



