MySQL之高可用架构—MHA

q4561231 发表于 2018-10-9 09:40:02

　　MySQL高可用目前有heartbeat+drbd、MHA、MySQL复制等几种较成熟的方案，heartbeat+drbd的方案可扩展性较差，而且读写都由主服务器负责，从库并不提供读功能，适合于数据增长量不大、一致性要求很高的环境，如银行、金融业等。今天重点讲下MHA的高可用架构。
　　MHA是一款优秀的高可用环境下故障切换和主从提升的高可用软件。在MySQL故障切换过程中，MHA能做到0-30秒之内自动完成数据库的故障切换，并且在切换的过程中，最大限度的保证数据的一致性，以达到真正意义上的高可用。MHA高可用建立在MySQL主从复制的基础上，先了解下MySQL复制最常见的两种方式：

[*]　　异步复制：主库写入并提交事务之后，把记录写进主库二进制日志即返回客户端，主库和从库的数据存在一定的延迟，这样就存在一定的隐患，当主库提交了一个事务，并且写入了二进制日志，而从库尚未得到主库推送的二进制日志时，此时主库宕机，将造成主从服务器的数据不一致。
[*]　　半同步复制：主库在每次提交事务成功时，并不及时反馈给客户端，而是等待其中一个从库也接收到二进制日志并写入中继日志之后，才返回操作成功给客户端。
　　MHA组成：

[*]　　MHA Manager：管理节点，可以单独的部署在一台独立的服务器上，管理多个master-slave集群，也可以部署在一台Slave上。
[*]　　MHA Node：数据节点，运行在每台MySQL服务器上。
　　MHA Manager会定时探测集群中的master节点，当master出现故障时，它可以自动将最新数据的slave提升为新master，然后将其它所有的slave重新指向新的master。整个故障转移过程对应用程序是完全透明的。
　　MHA工作原理：
　　1）从宕机的master保存二进制日志事件
　　2）识别含有最新更新的Slave
　　3) 应用差异的中继日志到其它从服务器
　　4）应用从master保存的二进制日志事件
　　5）提升一个新的Slave为master
　　6）使其它的Slave连接到新的master并复制
　　示例：MHA高可用架构（如果在内网可以关闭防火墙，否则请开启相应的端口）
　　Manager:node1：192.168.154.128
　　Master：node2：192.168.154.156
　　Slave：node3：192.168.154.130
　　Slave：node4：192.168.154.154
　　一配置主从复制：
　　1）主节点：
　　# vim /etc/my.cnf
　　innodb_file_per_table=1    #开启独立的表空间
　　skip_name_resolve          #禁止域名解析
　　log-bin=master-bin
　　relay-log=relay-bin
　　server-id=1
　　# service mysqld restart
　　查看二进制日志信息
　　mysql> show master status;
　　+-------------------+----------+--------------+------------------+
　　| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB |
　　+-------------------+----------+--------------+------------------+
　　| master-bin.000001 |    106 |          |                |
　　+-------------------+----------+--------------+------------------+
　　1 row in set (0.00 sec)
　　建立授权用户：

　　mysql> grant replication slave,replication client on *.* to 'slave'@'192.168.154.%'>　　Query OK, 0 rows affected (0.06 sec)
　　mysql> flush privileges;
　　Query OK, 0 rows affected (0.00 sec)
　　2）从节点：
　　# vim /etc/my.cnf
　　innodb_file_per_table=1
　　skip_name_resolve
　　log-bin=slave-bin
　　relay-log=relay-bin
　　server_id=2
　　read_only=1
　　relay_log_purge=0
　　# service mysqld restart
　　# vim /etc/my.cnf
　　innodb_file_per_table=1
　　skip_name_resolve
　　log-bin=slave-bin
　　relay-log=relay-bin
　　server_id=3
　　read_only=1                #开启只读模式
　　relay_log_purge=0          #关闭自动清理中继日志
　　# service mysqld restart
　　设置同步：
　　mysql> change master to master_host='192.168.154.156',master_user='slave',master_password='slave',master_log_file='master-bin.000001',master_log_pos=106;
　　Query OK, 0 rows affected (0.03 sec)
　　mysql> start slave;
　　Query OK, 0 rows affected (0.01 sec)
　　mysql> show slave status\G
　　*************************** 1. row ***************************
　　Slave_IO_State: Waiting for master to send event
　　Master_Host: 192.168.154.156
　　Master_User: slave
　　Master_Port: 3306
　　Connect_Retry: 60
　　Master_Log_File: master-bin.000001
　　Read_Master_Log_Pos: 354

　　Relay_Log_File:>
　　>
　　>　　Slave_IO_Running: Yes
　　Slave_SQL_Running: Yes
　　Replicate_Do_DB:
　　Replicate_Ignore_DB:
　　Replicate_Do_Table:
　　Replicate_Ignore_Table:
　　Replicate_Wild_Do_Table:
　　Replicate_Wild_Ignore_Table:
　　Last_Errno: 0
　　Last_Error:
　　Skip_Counter: 0
　　Exec_Master_Log_Pos: 354

　　>　　Until_Condition: None
　　Until_Log_File:
　　Until_Log_Pos: 0
　　Master_SSL_Allowed: No
　　Master_SSL_CA_File:
　　Master_SSL_CA_Path:
　　Master_SSL_Cert:
　　Master_SSL_Cipher:
　　Master_SSL_Key:
　　Seconds_Behind_Master: 0
　　Master_SSL_Verify_Server_Cert: No
　　Last_IO_Errno: 0
　　Last_IO_Error:
　　Last_SQL_Errno: 0
　　Last_SQL_Error:
　　1 row in set (0.00 sec)
　　3）在master节点上创建具有管理权限的账号

　　mysql> grant all on *.* to 'zwj'@'192.168.154.%'>　　Query OK, 0 rows affected (0.00 sec)
　　二配置集群间的密钥登陆
　　在node1上：
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.156
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.130
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.154
　　# ssh 192.168.154.154 'ifconfig'          #验证
　　eth0    Link encap:EthernetHWaddr 00:0C:29:67:65:ED
　　inet addr:192.168.154.154Bcast:192.168.154.255Mask:255.255.255.0
　　inet6 addr: fe80::20c:29ff:fe67:65ed/64 Scope:Link
　　UP BROADCAST RUNNING MULTICASTMTU:1500Metric:1
　　RX packets:26253 errors:0 dropped:0 overruns:0 frame:0
　　TX packets:42416 errors:0 dropped:0 overruns:0 carrier:0
　　collisions:0 txqueuelen:1000
　　RX bytes:23453164 (22.3 MiB)TX bytes:2514457 (2.3 MiB)
　　Interrupt:19 Base address:0x2024
　　在node2上：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.128
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.130
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.154
　　在node3上：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.128
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.156
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.154
　　在node4上：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.128
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.156
　　# ssh-copy-id -i ./.ssh/id_rsa.pub root@192.168.154.130
　　三安装MHA Manager，在node1上：
# yum install perl-DBD-MySQL -y　　# tar -zxf mha4mysql-node-0.56.tar.gz
　　# cd mha4mysql-node-0.56
　　# perl Makefile.PL
　　# make
　　# make install
　　# yum install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y    #安装MHA Manger依赖的perl模块
　　# tar -zxf mha4mysql-manager-0.56.tar.gz
　　# cd mha4mysql-manager-0.56
　　# perl Makefile.PL
　　# make
　　# make install
　　四安装MySQL node（在所有MySQL服务器上）
#yum install perl-DBD-MySQL -y　　# cd mha4mysql-node-0.56/
　　# perl Makefile.PL
　　# make
　　# make install
　　五创建工作目录，配置MHA:
　　# mkdir -pv /etc/masterha
　　# vim /etc/masterha/appl.cnf
　　
　　user=zwj
　　password=zwj
　　manager_workdir=/etc/masterha/appl
　　manager_log=/etc/masterha/appl/manager.log
　　remote_workdir=/etc/masterha/appl
　　ssh_user=root
　　repl_user=slave
　　repl_password=slave
　　ping_interval=1
　　
　　hostname=192.168.154.156
　　
　　hostname=192.168.154.130
　　candidate_master=1                      #设置为备选的master
　　
　　hostname=192.168.154.154
　　六检查SSH连接状态：
　　# masterha_check_ssh --conf=/etc/masterha/appl.cnf
　　Wed May 10 00:12:58 2017 - Global configuration file /etc/masterha_default.cnf not found. Skipping.
　　Wed May 10 00:12:58 2017 - Reading application default configuration from /etc/masterha/appl.cnf..
　　Wed May 10 00:12:58 2017 - Reading server configuration from /etc/masterha/appl.cnf..
　　Wed May 10 00:12:58 2017 - Starting SSH connection tests..
　　Wed May 10 00:13:15 2017 -
　　Wed May 10 00:12:59 2017 - Connecting via SSH from root@192.168.154.154(192.168.154.154:22) to root@192.168.154.156(192.168.154.156:22)..
　　Wed May 10 00:13:05 2017 - ok.
　　Wed May 10 00:13:05 2017 - Connecting via SSH from root@192.168.154.154(192.168.154.154:22) to root@192.168.154.130(192.168.154.130:22)..
　　Wed May 10 00:13:15 2017 - ok.
　　Wed May 10 00:13:20 2017 -
　　Wed May 10 00:12:58 2017 - Connecting via SSH from root@192.168.154.130(192.168.154.130:22) to root@192.168.154.156(192.168.154.156:22)..
　　Wed May 10 00:13:11 2017 - ok.
　　Wed May 10 00:13:11 2017 - Connecting via SSH from root@192.168.154.130(192.168.154.130:22) to root@192.168.154.154(192.168.154.154:22)..
　　Wed May 10 00:13:20 2017 - ok.
　　Wed May 10 00:13:35 2017 -
　　Wed May 10 00:12:58 2017 - Connecting via SSH from root@192.168.154.156(192.168.154.156:22) to root@192.168.154.130(192.168.154.130:22)..
　　Wed May 10 00:13:15 2017 - ok.
　　Wed May 10 00:13:15 2017 - Connecting via SSH from root@192.168.154.156(192.168.154.156:22) to root@192.168.154.154(192.168.154.154:22)..
　　Wed May 10 00:13:35 2017 - ok.
　　Wed May 10 00:13:35 2017 - All SSH connection tests passed successfully.
　　七检查整个复制环境：
　　# masterha_check_repl --conf=/etc/masterha/appl.cnf
　　...
　　192.168.154.156(192.168.154.156:3306) (current master)
　　+--192.168.154.130(192.168.154.130:3306)
　　+--192.168.154.154(192.168.154.154:3306)
　　Wed May 10 00:33:36 2017 - Checking replication health on 192.168.154.130..
　　Wed May 10 00:33:36 2017 - ok.
　　Wed May 10 00:33:36 2017 - Checking replication health on 192.168.154.154..
　　Wed May 10 00:33:36 2017 - ok.
　　Wed May 10 00:33:36 2017 - master_ip_failover_script is not defined.
　　Wed May 10 00:33:36 2017 - shutdown_script is not defined.
　　Wed May 10 00:33:36 2017 - Got exit code 0 (Not master dead).
　　MySQL Replication Health is OK.
　　八开启MHA Manager监控：
　　# nohup masterha_manager --conf=/etc/masterha/appl.cnf > /etc/masterha/appl/manager.log 2>&1 &
　　 8300
　　查看MHA Manager监控：
　　# masterha_check_status --conf=/etc/masterha/appl.cnf
　　appl (pid:8300) is running(0:PING_OK), master:192.168.154.156
　　关闭MHA Manager监控：
　　# masterha_stop --conf=/etc/masterha/appl.cnf
　　Stopped appl successfully.
　　+Exit 1                nohup masterha_manager --conf=/etc/masterha/appl.cnf > /etc/masterha/appl/manager.log 2>&1
　　九模拟主库宕机：
　　# service mysqld stop
　　Stopping mysqld:
　　查看slave（node4），可见master已发生变化，
　　...
　　mysql> show slave status\G
　　*************************** 1. row ***************************
　　Slave_IO_State: Waiting for master to send event
　　Master_Host: 192.168.154.130
　　Master_User: slave
　　Master_Port: 3306
　　Connect_Retry: 60
　　Master_Log_File: slave-bin.000003
　　Read_Master_Log_Pos: 106

　　Relay_Log_File:>
　　>
　　>　　Slave_IO_Running: Yes
　　Slave_SQL_Running: Yes

页: [1]

运维网's Archiver

MySQL之高可用架构—MHA