MySQL MHA高可用环境搭建

bestjoe · 发表于 2018-10-1 10:50:41

一、安装MHA基本环境
1. 安装MHA node
(1) 基本环境说明，本文参考互联网文章学习，搭建MHA与测试如下。
　　参考文档：http://www.cnblogs.com/xuanzhi201111/p/4231412.html
　　角色             IP地址          主机名
　　=============================================
　　Master             192.168.1.121    node1
　　Slave             192.168.1.122    node2
　　Slave             192.168.1.123    node3
　　Monitor host       192.168.1.125    node5
(2) 在node1,node2,node3,node5操作：
　　# vi /etc/hosts
　　192.168.1.121  node1
　　192.168.1.122  node2
　　192.168.1.123  node3
　　192.168.1.125  node5
安装MHA node节点软件包：
　　# rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
　　# yum install perl-DBD-MySQL perl-CPAN -y
　　# tar xf mha4mysql-node-0.56.tar.gz
　　# cd mha4mysql-node-0.56
　　# perl Makefile.PL
　　# make && make install
2. 安装MHA Manager
　　在node5管理节点上操作：注：MHA Manager主机也是需要安装MHA Node,MHA Manger
　　# yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
　　# tar xf mha4mysql-manager-0.56.tar.gz
　　# cd mha4mysql-manager-0.56
　　# perl Makefile.PL
　　# make && make install
　　#说明：安装的脚本程序都在/usr/local/bin/目录下。
3. 节点间配置SSH登录无密码验证（MHA主机之间使用key登录）
　　在node5（Monitor）：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
　　在node1（Master）：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
　　在node2 (slave)：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
　　在node3 (slave)：
　　# ssh-keygen -t rsa
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
　　# ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
二、搭建主从复制环境
1. 主从复制环境配置过程
(1) mysql安装过程略,但是三节点要创建如下链接
　　node1(主)，node2(主备从）,node3(从)
　　注意：创建如下链接：
　　ln -s /usr/local/mysql/bin/* /usr/local/bin/
node1 my.cnf
　　server-id             = 1
　　binlog-format       = ROW
　　log-bin             = master-bin
　　log-bin-index       = master-bin.index
　　log-slave-updates    = true
　　relay_log_purge       = 0
node2 my.cnf
　　server-id             = 2
　　binlog-format       = ROW
　　log-bin             = master-bin
　　log-bin-index       = master-bin.index
　　log-slave-updates    = true
　　relay_log_purge       = 0
node3 my.cnf
　　binlog-format       = ROW
　　log-bin             = mysql-bin
　　relay-log             = slave-relay-bin
　　relay-log-index       = slave-relay-bin.index
　　log-slave-updates    = true
　　server-id             = 11
　　skip-name-resolve
　　relay_log_purge       = 0
(2) 在node1 (Master)上备份一份完整的数据：
　　# mysqldump -uroot -p123456 --master-data=2 --single-transaction -R --triggers -A > all.sql
　　其中--master-data=2代表备份时刻记录master的Binlog位置和Position。
(3) 在node1 (Master)上创建复制用户:

　　mysql> grant replication slave on *.* to 'repl'@'192.168.1.%'>　　mysql> flush privileges;
(4) 查看主库备份时的binlog名称和位置，MASTER_LOG_FILE和MASTER_LOG_POS：
　　# head -n 30 all.sql | grep 'CHANGE MASTER TO'
　　-- CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000004', MASTER_LOG_POS=120;
(5) 把备份复制到192.168.1.122和192.168.1.123
　　# scp all.sql 192.168.1.122:/root/
　　# scp all.sql 192.168.1.123:/root/
(6) 分别在两台服务器上导入备份，执行复制相关命令
　　在node2、node3 主机上操作：
　　# mysql -uroot -p123456 < all.sql
　　mysql> stop slave;
　　CHANGE MASTER TO
　　MASTER_HOST='192.168.1.121',
　　MASTER_USER='repl',
　　MASTER_PASSWORD='123456',
　　MASTER_LOG_FILE='master-bin.000004',
　　MASTER_LOG_POS=120;
　　mysql> start slave;
　　mysql> show slave status\G
2. 创建MHA管理用户,在master上创建。

　　mysql> grant all privileges on *.* to 'root'@'192.168.1.%'>　　mysql> flush  privileges;
三、配置Keepalived VIP
　　vip配置可以采用两种方式，一种通过keepalived的方式管理虚拟ip的浮动；一人是通过脚本方式，本文通过keepalived方式实现
1. 在node1(Master)与node2（备选主节点）安装keepalived。
　　# wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
　　# tar xf keepalived-1.2.12.tar.gz
　　# cd keepalived-1.2.12
　　# ./configure --prefix=/usr/local/keepalived
　　# make &&  make install
　　# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
　　# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
　　# mkdir /etc/keepalived
　　# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
　　# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
2. 配置keepalived的配置文件，在node1(master)上配置操作如下:
　　注：keepalived配置成backup->backup，即IP地址切换后，主起来后IP地址不切换，本文监控脚本由MHA提供，keepalived不提供对mysqld的监控。
　　# vi /etc/keepalived/keepalived.conf
　　! Configuration File for keepalived
　　global_defs {
　　notification_email {
　　abc@163.com
　　}
　　notification_email_from dba@dbserver.com
　　smtp_server 127.0.0.1
　　smtp_connect_timeout 30
　　router_id MySQL-HA
　　}
　　vrrp_instance VI_1 {
　　state BACKUP
　　interface eth0
　　virtual_router_id 51
　　priority 150
　　advert_int 1
　　nopreempt
　　authentication {
　　auth_type PASS
　　auth_pass 1111
　　}
　　virtual_ipaddress {
　　192.168.1.130
　　}
　　}
3. 配置keepalived的配置文件，在node2(备用节点)上配置操作如下:
　　# vi /etc/keepalived/keepalived.conf
　　! Configuration File for keepalived
　　global_defs {
　　notification_email {
　　abc@163.com
　　}
　　notification_email_from dba@dbserver.com
　　smtp_server 127.0.0.1
　　smtp_connect_timeout 30
　　router_id MySQL-HA
　　}
　　vrrp_instance VI_1 {
　　state BACKUP
　　interface eth0
　　virtual_router_id 51
　　priority 120
　　advert_int 1
　　nopreempt
　　authentication {
　　auth_type PASS
　　auth_pass 1111
　　}
　　virtual_ipaddress {
　　192.168.1.130
　　}
　　}
4. node1,node2启动keepalived服务
　　# service keepalived start
　　# chkconfig keepalived on
5. node1查看VIP启动情况
　　[root@node1 ~]# ip a
　　1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
　　link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
　　inet 127.0.0.1/8 scope host lo
　　inet6 ::1/128 scope host
　　valid_lft forever preferred_lft forever
　　2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
　　link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff
　　inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0
　　inet 192.168.1.130/32 scope global eth0
　　inet6 fe80::20c:29ff:fe4e:5371/64 scope link
　　valid_lft forever preferred_lft forever
四、配置MHA
1. monitor创建MHA的工作目录，并且创建相关配置文件（在软件包解压后的目录里面有样例配置文件）。
　　# mkdir -p /etc/masterha
　　# mkdir -p /var/log/masterha/app1
　　# cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
　　修改app1.cnf配置文件，修改后的文件内容如下:
　　# cat /etc/masterha/app1.cnf
　　[server default]
　　manager_workdir=/var/log/masterha/app1
　　manager_log=/var/log/masterha/app1/manager.log
　　master_binlog_dir=/usr/local/mysql/data/
　　master_ip_failover_script= /usr/local/bin/master_ip_failover
　　master_ip_online_change_script= /usr/local/bin/master_ip_online_change
　　password=123456
　　user=root
　　ping_interval=1
　　remote_workdir=/tmp
　　repl_password=123456
　　repl_user=repl
　　report_script=/usr/local/bin/send_report
　　ssh_user=root
　　[server1]
　　hostname=192.168.1.121
　　port=3306
　　[server2]
　　hostname=192.168.1.122
　　port=3306
　　candidate_master=1
　　check_repl_delay=0
　　[server3]
　　hostname=192.168.1.123
　　port=3306
　　说明:
　　master_ip_failover_script= /usr/local/bin/master_ip_failover          #MHA自动切换执行的脚本，需要修改
　　master_ip_online_change_script= /usr/local/bin/master_ip_online_change  #手动切换需要执行的脚本，需要修改
　　report_script=/usr/local/bin/send_report                               #切换时发送邮件进行报告，需要修改
2. 设置relay log的清除方式（在每个slave节点上）：
(1)在node2,node3 从节点上操作：
　　将relay_log_purge=0加入my.cnf配置文件，前面已经配置。
(2) 设置定期清理relay脚本（node2,node3上操作）：
　　# cat purge_relay_log.sh
　　#!/bin/bash
　　user=root
　　passwd=123456
　　port=3306
　　log_dir='/data/masterha/log'
　　work_dir='/data'
　　purge='/usr/local/bin/purge_relay_logs'
　　if [ ! -d $log_dir ]
　　then
　　mkdir $log_dir -p
　　fi
　　$purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
　　配置定时计划任务
　　# crontab -e
　　0 4 * * * /bin/bash /root/purge_relay_log.sh
3. 要想把keepalived服务引入MHA，我们只需要修改切换是触发的脚本文件master_ip_failover即可，在该脚本中添加在master发生宕机时对keepalived的处理。
(1) 编辑脚本/usr/local/bin/master_ip_failover，修改后如下:
　　# vi /usr/local/bin/master_ip_failover
　　#!/usr/bin/env perl
　　use strict;
　　use warnings FATAL => 'all';
　　use Getopt::Long;
　　my (
　　$command,       $ssh_user,       $orig_master_host, $orig_master_ip,
　　$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
　　);
　　my $vip = '192.168.1.130';
　　my $ssh_start_vip = "/etc/init.d/keepalived start";
　　my $ssh_stop_vip = "/etc/init.d/keepalived stop";
　　GetOptions(
　　'command=s'       => \$command,
　　'ssh_user=s'       => \$ssh_user,
　　'orig_master_host=s' => \$orig_master_host,
　　'orig_master_ip=s' => \$orig_master_ip,
　　'orig_master_port=i' => \$orig_master_port,
　　'new_master_host=s'  => \$new_master_host,
　　'new_master_ip=s' => \$new_master_ip,
　　'new_master_port=i'  => \$new_master_port,
　　);
　　exit &main();
　　sub main {
　　print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
　　if ( $command eq "stop" || $command eq "stopssh" ) {
　　my $exit_code = 1;
　　eval {
　　print "Disabling the VIP on old master: $orig_master_host \n";
　　&stop_vip();
　　$exit_code = 0;
　　};
　　if ($@) {
　　warn "Got Error: $@\n";
　　exit $exit_code;
　　}
　　exit $exit_code;
　　}
　　elsif ( $command eq "start" ) {
　　my $exit_code = 10;
　　eval {
　　print "Enabling the VIP - $vip on the new master - $new_master_host \n";
　　&start_vip();
　　$exit_code = 0;
　　};
　　if ($@) {
　　warn $@;
　　exit $exit_code;
　　}
　　exit $exit_code;
　　}
　　elsif ( $command eq "status" ) {
　　print "Checking the Status of the script.. OK \n";
　　exit 0;
　　}
　　else {
　　&usage();
　　exit 1;
　　}
　　}
　　sub start_vip() {
　　`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
　　}
　　# A simple system call that disable the VIP on the old_master
　　sub stop_vip() {
　　`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
　　}
　　sub usage {
　　print
　　"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
　　}
(2) 编辑脚本master_ip_online_change，修改后如下:
　　#!/usr/bin/env perl
　　#  Copyright (C) 2011 DeNA Co.,Ltd.
　　#
　　#  This program is free software; you can redistribute it and/or modify
　　#  it under the terms of the GNU General Public License as published by
　　#  the Free Software Foundation; either version 2 of the License, or
　　#  (at your option) any later version.
　　#
　　#  This program is distributed in the hope that it will be useful,
　　#  but WITHOUT ANY WARRANTY; without even the implied warranty of
　　#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
　　#  GNU General Public License for more details.
　　#
　　#  You should have received a copy of the GNU General Public License
　　# along with this program; if not, write to the Free Software
　　#  Foundation, Inc.,
　　#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
　　## Note: This is a sample script and is not complete. Modify the script based on your environment.
　　use strict;
　　use warnings FATAL => 'all';
　　use Getopt::Long;
　　use MHA::DBHelper;
　　use MHA::NodeUtil;
　　use Time::HiRes qw( sleep gettimeofday tv_interval );
　　use Data::Dumper;
　　my $_tstart;
　　my $_running_interval = 0.1;
　　my (
　　$command,             $orig_master_is_new_slave, $orig_master_host,
　　$orig_master_ip,    $orig_master_port,       $orig_master_user,
　　$orig_master_password, $orig_master_ssh_user,    $new_master_host,
　　$new_master_ip,       $new_master_port,       $new_master_user,
　　$new_master_password,  $new_master_ssh_user
　　);
　　my $vip = '192.168.1.130/24';
　　my $key = '1';
　　my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
　　my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
　　my $orig_master_ssh_port = 22;
　　my $new_master_ssh_port = 22;
　　GetOptions(
　　'command=s'             => \$command,
　　'orig_master_is_new_slave' => \$orig_master_is_new_slave,
　　'orig_master_host=s'    => \$orig_master_host,
　　'orig_master_ip=s'       => \$orig_master_ip,
　　'orig_master_port=i'    => \$orig_master_port,
　　'orig_master_user=s'    => \$orig_master_user,
　　'orig_master_password=s' => \$orig_master_password,
　　'orig_master_ssh_user=s' => \$orig_master_ssh_user,
　　'new_master_host=s'       => \$new_master_host,
　　'new_master_ip=s'       => \$new_master_ip,
　　'new_master_port=i'       => \$new_master_port,
　　'new_master_user=s'       => \$new_master_user,
　　'new_master_password=s' => \$new_master_password,
　　'new_master_ssh_user=s' => \$new_master_ssh_user,
　　'orig_master_ssh_port=i' => \$orig_master_ssh_port,
　　'new_master_ssh_port=i' => \$new_master_ssh_port,
　　);
　　exit &main();
　　sub current_time_us {
　　my ( $sec, $microsec ) = gettimeofday();
　　my $curdate = localtime($sec);
　　return $curdate . " " . sprintf( "%06d", $microsec );
　　}
　　sub sleep_until {
　　my $elapsed = tv_interval($_tstart);
　　if ( $_running_interval > $elapsed ) {
　　sleep( $_running_interval - $elapsed );
　　}
　　}
　　sub get_threads_util {
　　my $dbh                   = shift;
　　my $my_connection_id    = shift;
　　my $running_time_threshold = shift;
　　my $type                = shift;
　　$running_time_threshold = 0 unless ($running_time_threshold);
　　$type                = 0 unless ($type);
　　my @threads;
　　my $sth = $dbh->prepare("SHOW PROCESSLIST");
　　$sth->execute();
　　while ( my $ref = $sth->fetchrow_hashref() ) {
　　my $id       = $ref->{Id};
　　my $user    = $ref->{User};
　　my $host    = $ref->{Host};
　　my $command = $ref->{Command};
　　my $state    = $ref->{State};
　　my $query_time = $ref->{Time};
　　my $info    = $ref->{Info};
　　$info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
　　next if ( $my_connection_id == $id );
　　next if ( defined($query_time) && $query_time < $running_time_threshold );
　　next if ( defined($command) && $command eq "Binlog Dump" );
　　next if ( defined($user)    && $user eq "system user" );
　　next
　　if ( defined($command)
　　&& $command eq "Sleep"
　　&& defined($query_time)
　　&& $query_time >= 1 );
　　if ( $type >= 1 ) {
　　next if ( defined($command) && $command eq "Sleep" );
　　next if ( defined($command) && $command eq "Connect" );
　　}
　　if ( $type >= 2 ) {
　　next if ( defined($info) && $info =~ m/^select/i );
　　next if ( defined($info) && $info =~ m/^show/i );
　　}
　　push @threads, $ref;
　　}
　　return @threads;
　　}
　　sub main {
　　if ( $command eq "stop" ) {
　　## Gracefully killing connections on the current master
　　# 1. Set read_only= 1 on the new master
　　# 2. DROP USER so that no app user can establish new connections
　　# 3. Set read_only= 1 on the current master
　　# 4. Kill current queries
　　# * Any database access failure will result in script die.
　　my $exit_code = 1;
　　eval {
　　## Setting read_only=1 on the new master (to avoid accident)
　　my $new_master_handler = new MHA::DBHelper();
　　# args: hostname, port, user, password, raise_error(die_on_error)_or_not
　　$new_master_handler->connect( $new_master_ip, $new_master_port,
　　$new_master_user, $new_master_password, 1 );
　　print current_time_us() . " Set read_only on the new master.. ";
　　$new_master_handler->enable_read_only();
　　if ( $new_master_handler->is_read_only() ) {
　　print "ok.\n";
　　}
　　else {
　　die "Failed!\n";
　　}
　　$new_master_handler->disconnect();
　　# Connecting to the orig master, die if any database error happens
　　my $orig_master_handler = new MHA::DBHelper();
　　$orig_master_handler->connect( $orig_master_ip, $orig_master_port,
　　$orig_master_user, $orig_master_password, 1 );
　　## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
　　$orig_master_handler->disable_log_bin_local();
　　print current_time_us() . " Drpping app user on the orig master..\n";
　　#FIXME_xxx_drop_app_user($orig_master_handler);
　　## Waiting for N * 100 milliseconds so that current connections can exit
　　my $time_until_read_only = 15;
　　$_tstart = [gettimeofday];
　　my @threads = get_threads_util( $orig_master_handler->{dbh},
　　$orig_master_handler->{connection_id} );
　　while ( $time_until_read_only > 0 && $#threads >= 0 ) {
　　if ( $time_until_read_only % 5 == 0 ) {
　　printf
　　"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
　　current_time_us(), $#threads + 1, $time_until_read_only * 100;
　　if ( $#threads < 5 ) {
　　print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
　　foreach (@threads);
　　}
　　}
　　sleep_until();
　　$_tstart = [gettimeofday];
　　$time_until_read_only--;
　　@threads = get_threads_util( $orig_master_handler->{dbh},
　　$orig_master_handler->{connection_id} );
　　}
　　## Setting read_only=1 on the current master so that nobody(except SUPER) can write
　　print current_time_us() . " Set read_only=1 on the orig master.. ";
　　$orig_master_handler->enable_read_only();
　　if ( $orig_master_handler->is_read_only() ) {
　　print "ok.\n";
　　}
　　else {
　　die "Failed!\n";
　　}
　　## Waiting for M * 100 milliseconds so that current update queries can complete
　　my $time_until_kill_threads = 5;
　　@threads = get_threads_util( $orig_master_handler->{dbh},
　　$orig_master_handler->{connection_id} );
　　while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
　　if ( $time_until_kill_threads % 5 == 0 ) {
　　printf
　　"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
　　current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
　　if ( $#threads < 5 ) {
　　print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
　　foreach (@threads);
　　}
　　}
　　sleep_until();
　　$_tstart = [gettimeofday];
　　$time_until_kill_threads--;
　　@threads = get_threads_util( $orig_master_handler->{dbh},
　　$orig_master_handler->{connection_id} );
　　}
　　## Terminating all threads
　　print current_time_us() . " Killing all application threads..\n";
　　$orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
　　print current_time_us() . " done.\n";
　　$orig_master_handler->enable_log_bin_local();
　　$orig_master_handler->disconnect();
　　## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
　　eval {
　　`ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
　　};
　　if ($@) {
　　warn $@;
　　}
　　$exit_code = 0;
　　};
　　if ($@) {
　　warn "Got Error: $@\n";
　　exit $exit_code;
　　}
　　exit $exit_code;
　　}
　　elsif ( $command eq "start" ) {
　　## Activating master ip on the new master
　　# 1. Create app user with write privileges
　　# 2. Moving backup script if needed
　　# 3. Register new master's ip to the catalog database
　　# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
　　# If exit code is 0 or 10, MHA does not abort
　　my $exit_code = 10;
　　eval {
　　my $new_master_handler = new MHA::DBHelper();
　　# args: hostname, port, user, password, raise_error_or_not
　　$new_master_handler->connect( $new_master_ip, $new_master_port,
　　$new_master_user, $new_master_password, 1 );
　　## Set read_only=0 on the new master
　　$new_master_handler->disable_log_bin_local();
　　print current_time_us() . " Set read_only=0 on the new master.\n";
　　$new_master_handler->disable_read_only();
　　## Creating an app user on the new master
　　print current_time_us() . " Creating app user on the new master..\n";
　　#FIXME_xxx_create_app_user($new_master_handler);
　　$new_master_handler->enable_log_bin_local();
　　$new_master_handler->disconnect();
　　## Update master ip on the catalog database, etc
　　`ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
　　$exit_code = 0;
　　};
　　if ($@) {
　　warn "Got Error: $@\n";
　　exit $exit_code;
　　}
　　exit $exit_code;
　　}
　　elsif ( $command eq "status" ) {
　　# do nothing
　　exit 0;
　　}
　　else {
　　&usage();
　　exit 1;
　　}
　　}
　　sub usage {
　　print
　　"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
　　die;
　　}
(3) 编辑脚本send_report，修改后如下:
　　#!/usr/bin/perl
　　use strict;
　　use warnings FATAL => 'all';
　　use Mail::Sender;
　　use Getopt::Long;
　　#new_master_host and new_slave_hosts are set only when recovering master succeeded
　　my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
　　my $smtp='smtp.163.com';
　　my $mail_from='xxxx';
　　my $mail_user='xxxxx';
　　my $mail_pass='xxxxx';
　　my $mail_to=['xxxx','xxxx'];
　　GetOptions(
　　'orig_master_host=s' => \$dead_master_host,
　　'new_master_host=s'  => \$new_master_host,
　　'new_slave_hosts=s'  => \$new_slave_hosts,
　　'subject=s'       => \$subject,
　　'body=s'          => \$body,
　　);
　　mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
　　sub mailToContacts {
　　my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
　　open my $DEBUG, "> /tmp/monitormail.log"
　　or die "Can't open the debug    file:$!\n";
　　my $sender = new Mail::Sender {
　　ctype    => 'text/plain; ',
　　encoding => 'utf-8',
　　smtp       => $smtp,
　　from       => $mail_from,
　　auth       => 'LOGIN',
　　TLS_allowed => '0',
　　authid    => $user,
　　authpwd    => $passwd,
　　to       => $mail_to,
　　subject    => $subject,
　　debug    => $DEBUG
　　};
　　$sender->MailMsg(
　　{ msg => $msg,
　　debug => $DEBUG
　　}
　　) or print $Mail::Sender::Error;
　　return 1;
　　}
　　# Do whatever you want here
　　exit 0;
五、MHA的日常管理
1. 检查SSH配置(node5 Monitor 监控节点上操作)，如下：
　　# masterha_check_ssh --conf=/etc/masterha/app1.cnf
　　Sun May  1 22:05:12 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
　　Sun May  1 22:05:12 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
　　Sun May  1 22:05:12 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
　　Sun May  1 22:05:12 2016 - [info] Starting SSH connection tests..
　　Sun May  1 22:05:14 2016 - [debug]
　　Sun May  1 22:05:12 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..
　　Sun May  1 22:05:13 2016 - [debug] ok.
　　Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..
　　Sun May  1 22:05:13 2016 - [debug] ok.
　　Sun May  1 22:05:14 2016 - [debug]
　　Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..
　　Sun May  1 22:05:13 2016 - [debug] ok.
　　Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..
　　Sun May  1 22:05:14 2016 - [debug] ok.
　　Sun May  1 22:05:14 2016 - [debug]
　　Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..
　　Sun May  1 22:05:14 2016 - [debug] ok.
　　Sun May  1 22:05:14 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..
　　Sun May  1 22:05:14 2016 - [debug] ok.
　　Sun May  1 22:05:14 2016 - [info] All SSH connection tests passed successfully.
2. 检查整个复制环境状况(node5 监控节点上操作)，如下：
　　# masterha_check_repl --conf=/etc/masterha/app1.cnf
　　Sun May  1 22:46:44 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
　　Sun May  1 22:46:44 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
　　Sun May  1 22:46:44 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
　　Sun May  1 22:46:44 2016 - [info] MHA::MasterMonitor version 0.56.
　　Sun May  1 22:46:45 2016 - [info] GTID failover mode = 0
　　Sun May  1 22:46:45 2016 - [info] Dead Servers:
　　Sun May  1 22:46:45 2016 - [info] Alive Servers:
　　Sun May  1 22:46:45 2016 - [info] 192.168.1.121(192.168.1.121:3306)
　　Sun May  1 22:46:45 2016 - [info] 192.168.1.122(192.168.1.122:3306)
　　Sun May  1 22:46:45 2016 - [info] 192.168.1.123(192.168.1.123:3306)
　　Sun May  1 22:46:45 2016 - [info] Alive Slaves:
　　Sun May  1 22:46:45 2016 - [info] 192.168.1.122(192.168.1.122:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
　　Sun May  1 22:46:45 2016 - [info]    Replicating from 192.168.1.121(192.168.1.121:3306)
　　Sun May  1 22:46:45 2016 - [info]    Primary candidate for the new Master (candidate_master is set)
　　Sun May  1 22:46:45 2016 - [info] 192.168.1.123(192.168.1.123:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
　　Sun May  1 22:46:45 2016 - [info]    Replicating from 192.168.1.121(192.168.1.121:3306)
　　Sun May  1 22:46:45 2016 - [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)
　　Sun May  1 22:46:45 2016 - [info] Checking slave configurations..
　　Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).

　　Sun May  1 22:46:45 2016 - [warning] >　　Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).

　　Sun May  1 22:46:45 2016 - [warning] >　　Sun May  1 22:46:45 2016 - [info] Checking replication filtering settings..
　　Sun May  1 22:46:45 2016 - [info]  binlog_do_db= , binlog_ignore_db=
　　Sun May  1 22:46:45 2016 - [info]  Replication filtering check ok.
　　Sun May  1 22:46:45 2016 - [info] GTID (with auto-pos) is not supported
　　Sun May  1 22:46:45 2016 - [info] Starting SSH connection tests..
　　Sun May  1 22:46:46 2016 - [info] All SSH connection tests passed successfully.
　　Sun May  1 22:46:46 2016 - [info] Checking MHA Node version..
　　Sun May  1 22:46:47 2016 - [info]  Version check ok.
　　Sun May  1 22:46:47 2016 - [info] Checking SSH publickey authentication settings on the current master..
　　Sun May  1 22:46:47 2016 - [info] HealthCheck: SSH to 192.168.1.121 is reachable.
　　Sun May  1 22:46:47 2016 - [info] Master MHA Node version is 0.56.
　　Sun May  1 22:46:47 2016 - [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..
　　Sun May  1 22:46:47 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
　　Sun May  1 22:46:47 2016 - [info] Connecting to root@192.168.1.121(192.168.1.121:22)..
　　Creating /tmp if not exists.. ok.
　　Checking output directory is accessible or not..
　　ok.
　　Binlog found at /usr/local/mysql/data/, up to master-bin.000008
　　Sun May  1 22:46:48 2016 - [info] Binlog setting check done.
　　Sun May  1 22:46:48 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
　　Sun May  1 22:46:48 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
　　Sun May  1 22:46:48 2016 - [info] Connecting to root@192.168.1.122(192.168.1.122:22)..
　　Checking slave recovery environment settings..
　　Opening /usr/local/mysql/data/relay-log.info ... ok.

　　>
　　Temporary>　　Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
　　done.
　　Testing mysqlbinlog output.. done.
　　Cleaning up test file(s).. done.
　　Sun May  1 22:46:48 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
　　Sun May  1 22:46:48 2016 - [info] Connecting to root@192.168.1.123(192.168.1.123:22)..
　　Checking slave recovery environment settings..
　　Opening /usr/local/mysql/data/relay-log.info ... ok.

　　>
　　Temporary>　　Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
　　done.
　　Testing mysqlbinlog output.. done.
　　Cleaning up test file(s).. done.
　　Sun May  1 22:46:48 2016 - [info] Slaves settings check done.
　　Sun May  1 22:46:48 2016 - [info]
　　192.168.1.121(192.168.1.121:3306) (current master)
　　+--192.168.1.122(192.168.1.122:3306)
　　+--192.168.1.123(192.168.1.123:3306)
　　Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.122..
　　Sun May  1 22:46:48 2016 - [info]  ok.
　　Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.123..
　　Sun May  1 22:46:48 2016 - [info]  ok.
　　Sun May  1 22:46:48 2016 - [info] Checking master_ip_failover_script status:
　　Sun May  1 22:46:48 2016 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306
　　IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
　　Checking the Status of the script.. OK
　　Sun May  1 22:46:48 2016 - [info]  OK.
　　Sun May  1 22:46:48 2016 - [warning] shutdown_script is not defined.
　　Sun May  1 22:46:48 2016 - [info] Got exit code 0 (Not master dead).
　　MySQL Replication Health is OK.
　　[root@node5 masterha]#
3. 开启MHA Manager监控(node5操作)如下：
　　# mkdir -p  /var/log/masterha/app1/
　　# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
　　参数说明：
　　--remove_dead_master_conf    #该参数代表当发生主从切换后，老的主库的ip将会从配置文件中移除。
　　--manger_log                #日志存放位置
　　--ignore_last_failover       #在缺省情况下，如果MHA检测到连续发生宕机，会生成app1.failover.complete文件，会造成MHA管理进程无法启动。
4. 查看MHA Manager监控是否正常：
　　# masterha_check_status --conf=/etc/masterha/app1.cnf
　　app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121
5. 查看启动日志(node5操作)如下：
　　# tail -n20 /var/log/masterha/app1/manager.log
6. 关闭MHA Manage监控：
(1) 关闭
　　# masterha_stop --conf=/etc/masterha/app1.cnf
(2) 启动
　　# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
六、MHA Failover切换
1. 自动Failover切换
　　(1) 模拟master mysql关闭
　　(2) VIP将会切换到node2
　　(3) /etc/masterha/app1.cnf中将原主服务器配置文件清掉。
　　(4) masterha_manager监控进程会自动退出关闭，并在/var/log/masterha/app1下生成app1.failover.complete文件，manager.log会记录全过程，从服务器会自动从新的主服务器复制。
　　(5) 原主服务器mysqld启动的，需要清掉/var/log/masterha/app1下生成app1.failover.complete文件，添加node1配置文件到/etc/masterha/app1.cnf，通过manager.log中的记录的故障点，重新同步主服务器，成为从节点。
2. 手动Failover切换
　　(1) 先停MHA Manager进程。
　　masterha_stop --conf=/etc/masterha/app1.cnf
　　(2) 停掉master mysqld
　　(3) 手动切换，在Manager主机上操作如下：
　　# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover
　　通过观察日志可以观察切换全过程。
　　(4) 如上节方式恢复节点为从服务器。
3.  正常运行情况下切换(Master正在运行)
　　等补充。
4. 小结
　　通过对MMM，MHA的环境搭建测试，MHA由于采用复制架构，原理简单，在一些对数据要求比较高的环境，为了保证可靠性，最好与半同步结合使用。

账号		自动登录	找回密码
密码			立即注册

大疆运维招人啦，

C++ :try 语句块和异常处理

C++的多态

Red Hat RHCE 8 (EX294) Cert Guide

Java/C++ 区别：看完这一篇，就够用！

别再用过时库了！这 13 个顶级 C++ 库才是

c++ size_t 和 int 的区别

[经验分享] MySQL MHA高可用环境搭建

浏览过的版块

扫码加入运维网微信交流群