设为首页 收藏本站
查看: 1394|回复: 0

[经验分享] MySQL MHA高可用环境搭建

[复制链接]

尚未签到

发表于 2018-10-1 10:50:41 | 显示全部楼层 |阅读模式
一、安装MHA基本环境
1. 安装MHA node
(1) 基本环境说明,本文参考互联网文章学习,搭建MHA与测试如下。
  参考文档:http://www.cnblogs.com/xuanzhi201111/p/4231412.html
  角色                IP地址            主机名
  =============================================
  Master              192.168.1.121     node1
  Slave               192.168.1.122     node2
  Slave               192.168.1.123     node3
  Monitor host        192.168.1.125     node5
(2) 在node1,node2,node3,node5操作:
  # vi /etc/hosts
  192.168.1.121  node1
  192.168.1.122  node2
  192.168.1.123  node3
  192.168.1.125  node5
安装MHA node节点软件包:
  # rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
  # yum install perl-DBD-MySQL perl-CPAN -y
  # tar xf mha4mysql-node-0.56.tar.gz
  # cd mha4mysql-node-0.56
  # perl Makefile.PL
  # make && make install
2. 安装MHA Manager
  在node5管理节点上操作:注:MHA Manager主机也是需要安装MHA Node,MHA Manger
  # yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
  # tar xf mha4mysql-manager-0.56.tar.gz
  # cd mha4mysql-manager-0.56
  # perl Makefile.PL
  # make && make install
  #说明:安装的脚本程序都在/usr/local/bin/目录下。
3. 节点间配置SSH登录无密码验证(MHA主机之间使用key登录)
  在node5(Monitor):
  # ssh-keygen -t rsa
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
  在node1(Master):
  # ssh-keygen -t rsa
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
  在node2 (slave):
  # ssh-keygen -t rsa
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
  在node3 (slave):
  # ssh-keygen -t rsa
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
  # ssh-copy-id -i /root/.ssh/id_rsa.pub root@node5
二、搭建主从复制环境
1. 主从复制环境配置过程
(1) mysql安装过程略,但是三节点要创建如下链接
  node1(主),node2(主备从),node3(从)
  注意:创建如下链接:
  ln -s /usr/local/mysql/bin/* /usr/local/bin/
node1 my.cnf
  server-id              = 1
  binlog-format          = ROW
  log-bin                = master-bin
  log-bin-index          = master-bin.index
  log-slave-updates      = true
  relay_log_purge        = 0
node2 my.cnf
  server-id              = 2
  binlog-format          = ROW
  log-bin                = master-bin
  log-bin-index          = master-bin.index
  log-slave-updates      = true
  relay_log_purge        = 0
node3 my.cnf
  binlog-format          = ROW
  log-bin                = mysql-bin
  relay-log              = slave-relay-bin
  relay-log-index        = slave-relay-bin.index
  log-slave-updates      = true
  server-id              = 11
  skip-name-resolve
  relay_log_purge        = 0
(2) 在node1 (Master)上备份一份完整的数据:
  # mysqldump -uroot -p123456 --master-data=2 --single-transaction -R --triggers -A > all.sql
  其中--master-data=2代表备份时刻记录master的Binlog位置和Position。
(3) 在node1 (Master)上创建复制用户:

  mysql> grant replication slave on *.* to 'repl'@'192.168.1.%'>  mysql> flush privileges;
(4) 查看主库备份时的binlog名称和位置,MASTER_LOG_FILE和MASTER_LOG_POS:
  # head -n 30 all.sql | grep 'CHANGE MASTER TO'
  -- CHANGE MASTER TO MASTER_LOG_FILE='master-bin.000004', MASTER_LOG_POS=120;
(5) 把备份复制到192.168.1.122和192.168.1.123
  # scp all.sql 192.168.1.122:/root/
  # scp all.sql 192.168.1.123:/root/
(6) 分别在两台服务器上导入备份,执行复制相关命令
  在node2、node3 主机上操作:
  # mysql -uroot -p123456 < all.sql
  mysql> stop slave;
  CHANGE MASTER TO
  MASTER_HOST='192.168.1.121',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_LOG_FILE='master-bin.000004',
  MASTER_LOG_POS=120;
  mysql> start slave;
  mysql> show slave status\G
2. 创建MHA管理用户,在master上创建。

  mysql> grant all privileges on *.* to 'root'@'192.168.1.%'>  mysql> flush  privileges;
三、配置Keepalived VIP
  vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;一人是通过脚本方式,本文通过keepalived方式实现
1. 在node1(Master)与node2(备选主节点)安装keepalived。
  # wget http://www.keepalived.org/software/keepalived-1.2.12.tar.gz
  # tar xf keepalived-1.2.12.tar.gz
  # cd keepalived-1.2.12
  # ./configure --prefix=/usr/local/keepalived
  # make &&  make install
  # cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
  # cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
  # mkdir /etc/keepalived
  # cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
  # cp /usr/local/keepalived/sbin/keepalived /usr/sbin/
2. 配置keepalived的配置文件,在node1(master)上配置操作如下:
  注:keepalived配置成backup->backup,即IP地址切换后,主起来后IP地址不切换,本文监控脚本由MHA提供,keepalived不提供对mysqld的监控。
  # vi /etc/keepalived/keepalived.conf
  ! Configuration File for keepalived
  global_defs {
  notification_email {
  abc@163.com
  }
  notification_email_from dba@dbserver.com
  smtp_server 127.0.0.1
  smtp_connect_timeout 30
  router_id MySQL-HA
  }
  vrrp_instance VI_1 {
  state BACKUP
  interface eth0
  virtual_router_id 51
  priority 150
  advert_int 1
  nopreempt
  authentication {
  auth_type PASS
  auth_pass 1111
  }
  virtual_ipaddress {
  192.168.1.130
  }
  }
3. 配置keepalived的配置文件,在node2(备用节点)上配置操作如下:
  # vi /etc/keepalived/keepalived.conf
  ! Configuration File for keepalived
  global_defs {
  notification_email {
  abc@163.com
  }
  notification_email_from dba@dbserver.com
  smtp_server 127.0.0.1
  smtp_connect_timeout 30
  router_id MySQL-HA
  }
  vrrp_instance VI_1 {
  state BACKUP
  interface eth0
  virtual_router_id 51
  priority 120
  advert_int 1
  nopreempt
  authentication {
  auth_type PASS
  auth_pass 1111
  }
  virtual_ipaddress {
  192.168.1.130
  }
  }
4. node1,node2启动keepalived服务
  # service keepalived start
  # chkconfig keepalived on
5. node1查看VIP启动情况
  [root@node1 ~]# ip a
  1: lo:  mtu 65536 qdisc noqueue state UNKNOWN
  link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
  inet 127.0.0.1/8 scope host lo
  inet6 ::1/128 scope host
  valid_lft forever preferred_lft forever
  2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
  link/ether 00:0c:29:4e:53:71 brd ff:ff:ff:ff:ff:ff
  inet 192.168.1.121/24 brd 192.168.1.255 scope global eth0
  inet 192.168.1.130/32 scope global eth0
  inet6 fe80::20c:29ff:fe4e:5371/64 scope link
  valid_lft forever preferred_lft forever
四、配置MHA
1. monitor创建MHA的工作目录,并且创建相关配置文件(在软件包解压后的目录里面有样例配置文件)。
  # mkdir -p /etc/masterha
  # mkdir -p /var/log/masterha/app1
  # cp mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/masterha/
  修改app1.cnf配置文件,修改后的文件内容如下:
  # cat /etc/masterha/app1.cnf
  [server default]
  manager_workdir=/var/log/masterha/app1
  manager_log=/var/log/masterha/app1/manager.log
  master_binlog_dir=/usr/local/mysql/data/
  master_ip_failover_script= /usr/local/bin/master_ip_failover
  master_ip_online_change_script= /usr/local/bin/master_ip_online_change
  password=123456
  user=root
  ping_interval=1
  remote_workdir=/tmp
  repl_password=123456
  repl_user=repl
  report_script=/usr/local/bin/send_report
  ssh_user=root
  [server1]
  hostname=192.168.1.121
  port=3306
  [server2]
  hostname=192.168.1.122
  port=3306
  candidate_master=1
  check_repl_delay=0
  [server3]
  hostname=192.168.1.123
  port=3306
  说明:
  master_ip_failover_script= /usr/local/bin/master_ip_failover            #MHA自动切换执行的脚本,需要修改     
  master_ip_online_change_script= /usr/local/bin/master_ip_online_change  #手动切换需要执行的脚本,需要修改     
  report_script=/usr/local/bin/send_report                                #切换时发送邮件进行报告,需要修改
2. 设置relay log的清除方式(在每个slave节点上):
(1)在node2,node3 从节点上操作:
  将relay_log_purge=0加入my.cnf配置文件,前面已经配置。
(2) 设置定期清理relay脚本(node2,node3上操作):
  # cat purge_relay_log.sh
  #!/bin/bash
  user=root
  passwd=123456
  port=3306
  log_dir='/data/masterha/log'
  work_dir='/data'
  purge='/usr/local/bin/purge_relay_logs'
  if [ ! -d $log_dir ]
  then
  mkdir $log_dir -p
  fi
  $purge --user=$user --password=$passwd --disable_relay_log_purge --port=$port --workdir=$work_dir >> $log_dir/purge_relay_logs.log 2>&1
  配置定时计划任务
  # crontab -e
  0 4 * * * /bin/bash /root/purge_relay_log.sh
3. 要想把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。
(1) 编辑脚本/usr/local/bin/master_ip_failover,修改后如下:
  # vi /usr/local/bin/master_ip_failover
  #!/usr/bin/env perl
  use strict;
  use warnings FATAL => 'all';
  use Getopt::Long;
  my (
  $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
  $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
  );
  my $vip = '192.168.1.130';
  my $ssh_start_vip = "/etc/init.d/keepalived start";
  my $ssh_stop_vip = "/etc/init.d/keepalived stop";
  GetOptions(
  'command=s'          => \$command,
  'ssh_user=s'         => \$ssh_user,
  'orig_master_host=s' => \$orig_master_host,
  'orig_master_ip=s'   => \$orig_master_ip,
  'orig_master_port=i' => \$orig_master_port,
  'new_master_host=s'  => \$new_master_host,
  'new_master_ip=s'    => \$new_master_ip,
  'new_master_port=i'  => \$new_master_port,
  );
  exit &main();
  sub main {
  print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
  if ( $command eq "stop" || $command eq "stopssh" ) {
  my $exit_code = 1;
  eval {
  print "Disabling the VIP on old master: $orig_master_host \n";
  &stop_vip();
  $exit_code = 0;
  };
  if ($@) {
  warn "Got Error: $@\n";
  exit $exit_code;
  }
  exit $exit_code;
  }
  elsif ( $command eq "start" ) {
  my $exit_code = 10;
  eval {
  print "Enabling the VIP - $vip on the new master - $new_master_host \n";
  &start_vip();
  $exit_code = 0;
  };
  if ($@) {
  warn $@;
  exit $exit_code;
  }
  exit $exit_code;
  }
  elsif ( $command eq "status" ) {
  print "Checking the Status of the script.. OK \n";
  exit 0;
  }
  else {
  &usage();
  exit 1;
  }
  }
  sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
  }
  # A simple system call that disable the VIP on the old_master
  sub stop_vip() {
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
  }
  sub usage {
  print
  "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  }
(2) 编辑脚本master_ip_online_change,修改后如下:
  #!/usr/bin/env perl
  #  Copyright (C) 2011 DeNA Co.,Ltd.
  #
  #  This program is free software; you can redistribute it and/or modify
  #  it under the terms of the GNU General Public License as published by
  #  the Free Software Foundation; either version 2 of the License, or
  #  (at your option) any later version.
  #
  #  This program is distributed in the hope that it will be useful,
  #  but WITHOUT ANY WARRANTY; without even the implied warranty of
  #  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  #  GNU General Public License for more details.
  #
  #  You should have received a copy of the GNU General Public License
  #   along with this program; if not, write to the Free Software
  #  Foundation, Inc.,
  #  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
  ## Note: This is a sample script and is not complete. Modify the script based on your environment.
  use strict;
  use warnings FATAL => 'all';
  use Getopt::Long;
  use MHA::DBHelper;
  use MHA::NodeUtil;
  use Time::HiRes qw( sleep gettimeofday tv_interval );
  use Data::Dumper;
  my $_tstart;
  my $_running_interval = 0.1;
  my (
  $command,              $orig_master_is_new_slave, $orig_master_host,
  $orig_master_ip,       $orig_master_port,         $orig_master_user,
  $orig_master_password, $orig_master_ssh_user,     $new_master_host,
  $new_master_ip,        $new_master_port,          $new_master_user,
  $new_master_password,  $new_master_ssh_user
  );
  my $vip = '192.168.1.130/24';
  my $key = '1';
  my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
  my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";
  my $orig_master_ssh_port = 22;
  my $new_master_ssh_port = 22;
  GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
  'orig_master_ssh_port=i'    => \$orig_master_ssh_port,
  'new_master_ssh_port=i'    => \$new_master_ssh_port,
  );
  exit &main();
  sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
  }
  sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
  sleep( $_running_interval - $elapsed );
  }
  }
  sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;
  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();
  while ( my $ref = $sth->fetchrow_hashref() ) {
  my $id         = $ref->{Id};
  my $user       = $ref->{User};
  my $host       = $ref->{Host};
  my $command    = $ref->{Command};
  my $state      = $ref->{State};
  my $query_time = $ref->{Time};
  my $info       = $ref->{Info};
  $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
  next if ( $my_connection_id == $id );
  next if ( defined($query_time) && $query_time < $running_time_threshold );
  next if ( defined($command)    && $command eq "Binlog Dump" );
  next if ( defined($user)       && $user eq "system user" );
  next
  if ( defined($command)
  && $command eq "Sleep"
  && defined($query_time)
  && $query_time >= 1 );
  if ( $type >= 1 ) {
  next if ( defined($command) && $command eq "Sleep" );
  next if ( defined($command) && $command eq "Connect" );
  }
  if ( $type >= 2 ) {
  next if ( defined($info) && $info =~ m/^select/i );
  next if ( defined($info) && $info =~ m/^show/i );
  }
  push @threads, $ref;
  }
  return @threads;
  }
  sub main {
  if ( $command eq "stop" ) {
  ## Gracefully killing connections on the current master
  # 1. Set read_only= 1 on the new master
  # 2. DROP USER so that no app user can establish new connections
  # 3. Set read_only= 1 on the current master
  # 4. Kill current queries
  # * Any database access failure will result in script die.
  my $exit_code = 1;
  eval {
  ## Setting read_only=1 on the new master (to avoid accident)
  my $new_master_handler = new MHA::DBHelper();
  # args: hostname, port, user, password, raise_error(die_on_error)_or_not
  $new_master_handler->connect( $new_master_ip, $new_master_port,
  $new_master_user, $new_master_password, 1 );
  print current_time_us() . " Set read_only on the new master.. ";
  $new_master_handler->enable_read_only();
  if ( $new_master_handler->is_read_only() ) {
  print "ok.\n";
  }
  else {
  die "Failed!\n";
  }
  $new_master_handler->disconnect();
  # Connecting to the orig master, die if any database error happens
  my $orig_master_handler = new MHA::DBHelper();
  $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
  $orig_master_user, $orig_master_password, 1 );
  ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
  $orig_master_handler->disable_log_bin_local();
  print current_time_us() . " Drpping app user on the orig master..\n";
  #FIXME_xxx_drop_app_user($orig_master_handler);
  ## Waiting for N * 100 milliseconds so that current connections can exit
  my $time_until_read_only = 15;
  $_tstart = [gettimeofday];
  my @threads = get_threads_util( $orig_master_handler->{dbh},
  $orig_master_handler->{connection_id} );
  while ( $time_until_read_only > 0 && $#threads >= 0 ) {
  if ( $time_until_read_only % 5 == 0 ) {
  printf
  "%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
  current_time_us(), $#threads + 1, $time_until_read_only * 100;
  if ( $#threads < 5 ) {
  print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
  foreach (@threads);
  }
  }
  sleep_until();
  $_tstart = [gettimeofday];
  $time_until_read_only--;
  @threads = get_threads_util( $orig_master_handler->{dbh},
  $orig_master_handler->{connection_id} );
  }
  ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
  print current_time_us() . " Set read_only=1 on the orig master.. ";
  $orig_master_handler->enable_read_only();
  if ( $orig_master_handler->is_read_only() ) {
  print "ok.\n";
  }
  else {
  die "Failed!\n";
  }
  ## Waiting for M * 100 milliseconds so that current update queries can complete
  my $time_until_kill_threads = 5;
  @threads = get_threads_util( $orig_master_handler->{dbh},
  $orig_master_handler->{connection_id} );
  while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
  if ( $time_until_kill_threads % 5 == 0 ) {
  printf
  "%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
  current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
  if ( $#threads < 5 ) {
  print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
  foreach (@threads);
  }
  }
  sleep_until();
  $_tstart = [gettimeofday];
  $time_until_kill_threads--;
  @threads = get_threads_util( $orig_master_handler->{dbh},
  $orig_master_handler->{connection_id} );
  }
  ## Terminating all threads
  print current_time_us() . " Killing all application threads..\n";
  $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
  print current_time_us() . " done.\n";
  $orig_master_handler->enable_log_bin_local();
  $orig_master_handler->disconnect();
  ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
  eval {
  `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
  };
  if ($@) {
  warn $@;
  }
  $exit_code = 0;
  };
  if ($@) {
  warn "Got Error: $@\n";
  exit $exit_code;
  }
  exit $exit_code;
  }
  elsif ( $command eq "start" ) {
  ## Activating master ip on the new master
  # 1. Create app user with write privileges
  # 2. Moving backup script if needed
  # 3. Register new master's ip to the catalog database
  # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
  # If exit code is 0 or 10, MHA does not abort
  my $exit_code = 10;
  eval {
  my $new_master_handler = new MHA::DBHelper();
  # args: hostname, port, user, password, raise_error_or_not
  $new_master_handler->connect( $new_master_ip, $new_master_port,
  $new_master_user, $new_master_password, 1 );
  ## Set read_only=0 on the new master
  $new_master_handler->disable_log_bin_local();
  print current_time_us() . " Set read_only=0 on the new master.\n";
  $new_master_handler->disable_read_only();
  ## Creating an app user on the new master
  print current_time_us() . " Creating app user on the new master..\n";
  #FIXME_xxx_create_app_user($new_master_handler);
  $new_master_handler->enable_log_bin_local();
  $new_master_handler->disconnect();
  ## Update master ip on the catalog database, etc
  `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
  $exit_code = 0;
  };
  if ($@) {
  warn "Got Error: $@\n";
  exit $exit_code;
  }
  exit $exit_code;
  }
  elsif ( $command eq "status" ) {
  # do nothing
  exit 0;
  }
  else {
  &usage();
  exit 1;
  }
  }
  sub usage {
  print
  "Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
  }
(3) 编辑脚本send_report,修改后如下:
  #!/usr/bin/perl
  use strict;
  use warnings FATAL => 'all';
  use Mail::Sender;
  use Getopt::Long;
  #new_master_host and new_slave_hosts are set only when recovering master succeeded
  my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
  my $smtp='smtp.163.com';
  my $mail_from='xxxx';
  my $mail_user='xxxxx';
  my $mail_pass='xxxxx';
  my $mail_to=['xxxx','xxxx'];
  GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
  );
  mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
  sub mailToContacts {
  my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
  open my $DEBUG, "> /tmp/monitormail.log"
  or die "Can't open the debug      file:$!\n";
  my $sender = new Mail::Sender {
  ctype       => 'text/plain; ',
  encoding    => 'utf-8',
  smtp        => $smtp,
  from        => $mail_from,
  auth        => 'LOGIN',
  TLS_allowed => '0',
  authid      => $user,
  authpwd     => $passwd,
  to          => $mail_to,
  subject     => $subject,
  debug       => $DEBUG
  };
  $sender->MailMsg(
  {   msg   => $msg,
  debug => $DEBUG
  }
  ) or print $Mail::Sender::Error;
  return 1;
  }
  # Do whatever you want here
  exit 0;
五、MHA的日常管理
1. 检查SSH配置(node5 Monitor 监控节点上操作),如下:
  # masterha_check_ssh --conf=/etc/masterha/app1.cnf
  Sun May  1 22:05:12 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  Sun May  1 22:05:12 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
  Sun May  1 22:05:12 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
  Sun May  1 22:05:12 2016 - [info] Starting SSH connection tests..
  Sun May  1 22:05:14 2016 - [debug]
  Sun May  1 22:05:12 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..
  Sun May  1 22:05:13 2016 - [debug]   ok.
  Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..
  Sun May  1 22:05:13 2016 - [debug]   ok.
  Sun May  1 22:05:14 2016 - [debug]
  Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..
  Sun May  1 22:05:13 2016 - [debug]   ok.
  Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..
  Sun May  1 22:05:14 2016 - [debug]   ok.
  Sun May  1 22:05:14 2016 - [debug]
  Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..
  Sun May  1 22:05:14 2016 - [debug]   ok.
  Sun May  1 22:05:14 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..
  Sun May  1 22:05:14 2016 - [debug]   ok.
  Sun May  1 22:05:14 2016 - [info] All SSH connection tests passed successfully.
2. 检查整个复制环境状况(node5 监控节点上操作),如下:
  # masterha_check_repl --conf=/etc/masterha/app1.cnf
  Sun May  1 22:46:44 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
  Sun May  1 22:46:44 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
  Sun May  1 22:46:44 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
  Sun May  1 22:46:44 2016 - [info] MHA::MasterMonitor version 0.56.
  Sun May  1 22:46:45 2016 - [info] GTID failover mode = 0
  Sun May  1 22:46:45 2016 - [info] Dead Servers:
  Sun May  1 22:46:45 2016 - [info] Alive Servers:
  Sun May  1 22:46:45 2016 - [info]   192.168.1.121(192.168.1.121:3306)
  Sun May  1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)
  Sun May  1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)
  Sun May  1 22:46:45 2016 - [info] Alive Slaves:
  Sun May  1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
  Sun May  1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)
  Sun May  1 22:46:45 2016 - [info]     Primary candidate for the new Master (candidate_master is set)
  Sun May  1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled
  Sun May  1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)
  Sun May  1 22:46:45 2016 - [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)
  Sun May  1 22:46:45 2016 - [info] Checking slave configurations..
  Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).

  Sun May  1 22:46:45 2016 - [warning] >  Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).

  Sun May  1 22:46:45 2016 - [warning] >  Sun May  1 22:46:45 2016 - [info] Checking replication filtering settings..
  Sun May  1 22:46:45 2016 - [info]  binlog_do_db= , binlog_ignore_db=
  Sun May  1 22:46:45 2016 - [info]  Replication filtering check ok.
  Sun May  1 22:46:45 2016 - [info] GTID (with auto-pos) is not supported
  Sun May  1 22:46:45 2016 - [info] Starting SSH connection tests..
  Sun May  1 22:46:46 2016 - [info] All SSH connection tests passed successfully.
  Sun May  1 22:46:46 2016 - [info] Checking MHA Node version..
  Sun May  1 22:46:47 2016 - [info]  Version check ok.
  Sun May  1 22:46:47 2016 - [info] Checking SSH publickey authentication settings on the current master..
  Sun May  1 22:46:47 2016 - [info] HealthCheck: SSH to 192.168.1.121 is reachable.
  Sun May  1 22:46:47 2016 - [info] Master MHA Node version is 0.56.
  Sun May  1 22:46:47 2016 - [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..
  Sun May  1 22:46:47 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008
  Sun May  1 22:46:47 2016 - [info]   Connecting to root@192.168.1.121(192.168.1.121:22)..
  Creating /tmp if not exists..    ok.
  Checking output directory is accessible or not..
  ok.
  Binlog found at /usr/local/mysql/data/, up to master-bin.000008
  Sun May  1 22:46:48 2016 - [info] Binlog setting check done.
  Sun May  1 22:46:48 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
  Sun May  1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
  Sun May  1 22:46:48 2016 - [info]   Connecting to root@192.168.1.122(192.168.1.122:22)..
  Checking slave recovery environment settings..
  Opening /usr/local/mysql/data/relay-log.info ... ok.

  >
  Temporary>  Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
  Sun May  1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx
  Sun May  1 22:46:48 2016 - [info]   Connecting to root@192.168.1.123(192.168.1.123:22)..
  Checking slave recovery environment settings..
  Opening /usr/local/mysql/data/relay-log.info ... ok.

  >
  Temporary>  Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.
  done.
  Testing mysqlbinlog output.. done.
  Cleaning up test file(s).. done.
  Sun May  1 22:46:48 2016 - [info] Slaves settings check done.
  Sun May  1 22:46:48 2016 - [info]
  192.168.1.121(192.168.1.121:3306) (current master)
  +--192.168.1.122(192.168.1.122:3306)
  +--192.168.1.123(192.168.1.123:3306)
  Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.122..
  Sun May  1 22:46:48 2016 - [info]  ok.
  Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.123..
  Sun May  1 22:46:48 2016 - [info]  ok.
  Sun May  1 22:46:48 2016 - [info] Checking master_ip_failover_script status:
  Sun May  1 22:46:48 2016 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306
  IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===
  Checking the Status of the script.. OK
  Sun May  1 22:46:48 2016 - [info]  OK.
  Sun May  1 22:46:48 2016 - [warning] shutdown_script is not defined.
  Sun May  1 22:46:48 2016 - [info] Got exit code 0 (Not master dead).
  MySQL Replication Health is OK.
  [root@node5 masterha]#
3. 开启MHA Manager监控(node5操作)如下:
  # mkdir -p  /var/log/masterha/app1/
  # nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
  参数说明:
  --remove_dead_master_conf     #该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。
  --manger_log                  #日志存放位置
  --ignore_last_failover        #在缺省情况下,如果MHA检测到连续发生宕机,会生成app1.failover.complete文件,会造成MHA管理进程无法启动。
4. 查看MHA Manager监控是否正常:
  # masterha_check_status --conf=/etc/masterha/app1.cnf
  app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121
5. 查看启动日志(node5操作)如下:
  # tail -n20 /var/log/masterha/app1/manager.log
6. 关闭MHA Manage监控:
(1) 关闭
  # masterha_stop --conf=/etc/masterha/app1.cnf
(2) 启动
  # nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1
六、MHA Failover切换
1. 自动Failover切换
  (1) 模拟master mysql关闭
  (2) VIP将会切换到node2
  (3) /etc/masterha/app1.cnf中将原主服务器配置文件清掉。
  (4) masterha_manager监控进程会自动退出关闭,并在/var/log/masterha/app1下生成app1.failover.complete文件,manager.log会记录全过程,从服务器会自动从新的主服务器复制。
  (5) 原主服务器mysqld启动的,需要清掉/var/log/masterha/app1下生成app1.failover.complete文件,添加node1配置文件到/etc/masterha/app1.cnf,通过manager.log中的记录的故障点,重新同步主服务器,成为从节点。
2. 手动Failover切换
  (1) 先停MHA Manager进程。
  masterha_stop --conf=/etc/masterha/app1.cnf
  (2) 停掉master mysqld
  (3) 手动切换,在Manager主机上操作如下:
  # masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover
  通过观察日志可以观察切换全过程。
  (4) 如上节方式恢复节点为从服务器。
3.  正常运行情况下切换(Master正在运行)
  等补充。
4. 小结
  通过对MMM,MHA的环境搭建测试,MHA由于采用复制架构,原理简单,在一些对数据要求比较高的环境,为了保证可靠性,最好与半同步结合使用。


运维网声明 1、欢迎大家加入本站运维交流群:群②:261659950 群⑤:202807635 群⑦870801961 群⑧679858003
2、本站所有主题由该帖子作者发表,该帖子作者与运维网享有帖子相关版权
3、所有作品的著作权均归原作者享有,请您和我们一样尊重他人的著作权等合法权益。如果您对作品感到满意,请购买正版
4、禁止制作、复制、发布和传播具有反动、淫秽、色情、暴力、凶杀等内容的信息,一经发现立即删除。若您因此触犯法律,一切后果自负,我们对此不承担任何责任
5、所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其内容的准确性、可靠性、正当性、安全性、合法性等负责,亦不承担任何法律责任
6、所有作品仅供您个人学习、研究或欣赏,不得用于商业或者其他用途,否则,一切后果均由您自己承担,我们对此不承担任何法律责任
7、如涉及侵犯版权等问题,请您及时通知我们,我们将立即采取措施予以解决
8、联系人Email:admin@iyunv.com 网址:www.yunweiku.com

所有资源均系网友上传或者通过网络收集,我们仅提供一个展示、介绍、观摩学习的平台,我们不对其承担任何法律责任,如涉及侵犯版权等问题,请您及时通知我们,我们将立即处理,联系人Email:kefu@iyunv.com,QQ:1061981298 本贴地址:https://www.yunweiku.com/thread-607051-1-1.html 上篇帖子: Corosync+Pacemaker+DRBD+Mysql高可用HA配置 下篇帖子: tomcat+mysql+xwiki搭建开源wiki系统
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

扫码加入运维网微信交流群X

扫码加入运维网微信交流群

扫描二维码加入运维网微信交流群,最新一手资源尽在官方微信交流群!快快加入我们吧...

扫描微信二维码查看详情

客服E-mail:kefu@iyunv.com 客服QQ:1061981298


QQ群⑦:运维网交流群⑦ QQ群⑧:运维网交流群⑧ k8s群:运维网kubernetes交流群


提醒:禁止发布任何违反国家法律、法规的言论与图片等内容;本站内容均来自个人观点与网络等信息,非本站认同之观点.


本站大部分资源是网友从网上搜集分享而来,其版权均归原作者及其网站所有,我们尊重他人的合法权益,如有内容侵犯您的合法权益,请及时与我们联系进行核实删除!



合作伙伴: 青云cloud

快速回复 返回顶部 返回列表