warning: pssh-2.3.1-4.2.x86_64.rpm: Header V3 RSA/SHA1 Signature, key>
Preparing...                ########################################### [100%]
1:python-pssh ########################################### [ 33%]
2:pssh ########################################### [ 67%]
3:crmsh ########################################### [100%]
[root@app1 crm]#
3. Create the corosync configuration file (identical on app1 and app2)
cd /etc/corosync/
cp corosync.conf.example corosync.conf
vi /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
        version: 2
        secauth: on          # enable authentication (requires /etc/corosync/authkey)
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0   # network address of the heartbeat segment
                mcastaddr: 226.94.8.8     # multicast address for cluster messaging
                mcastport: 5405
                ttl: 1
        }
}
logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        to_syslog: no
        logfile: /var/log/cluster/corosync.log
        debug: off
        timestamp: on
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}
service {
        ver: 1               # ver: 1 means pacemaker runs as its own service and is not spawned by the corosync plugin
        name: pacemaker
}
aisexec {
        user: root
        group: root
}
4. Create the authentication key (identical on app1 and app2)
Communication between the nodes must be authenticated, which requires a shared key. Once generated, it is saved in the current directory as authkey with permission 400.
[root@app1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 128).
Press keys on your keyboard to generate entropy (bits = 192).
Press keys on your keyboard to generate entropy (bits = 256).
Press keys on your keyboard to generate entropy (bits = 320).
Press keys on your keyboard to generate entropy (bits = 384).
Press keys on your keyboard to generate entropy (bits = 448).
Press keys on your keyboard to generate entropy (bits = 512).
Press keys on your keyboard to generate entropy (bits = 576).
Press keys on your keyboard to generate entropy (bits = 640).
Press keys on your keyboard to generate entropy (bits = 704).
Press keys on your keyboard to generate entropy (bits = 768).
Press keys on your keyboard to generate entropy (bits = 832).
Press keys on your keyboard to generate entropy (bits = 896).
Press keys on your keyboard to generate entropy (bits = 960).
Writing corosync key to /etc/corosync/authkey.
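A quick check that the key was written with the expected 400 permission (just a verification step, not part of the original procedure; size and timestamp will differ):
[root@app1 corosync]# ls -l /etc/corosync/authkey
The listing should show mode -r-------- (400) and owner root.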
[root@app1 corosync]#
5. Sync the files just created (corosync.conf and authkey) to app2; if the heartbeat network address (bindnetaddr) differs between nodes, adjust it in corosync.conf after copying
# scp authkey corosync.conf root@app2:/etc/corosync/
6. Start the corosync and pacemaker services and test that they work
Node 1:
[root@app1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [OK]
[root@app1 ~]# service pacemaker start
Starting Pacemaker Cluster Manager [OK]
Enable the services at boot:
chkconfig corosync on
chkconfig pacemaker on
Node 2:
[root@app2 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [OK]
[root@app2 ~]# service pacemaker start
Starting Pacemaker Cluster Manager [OK]
Enable the services at boot:
chkconfig corosync on
chkconfig pacemaker on
7. Verify the corosync, pacemaker and crmsh installation
(1) Check the node status
[root@app1 ~]# crm status
Last updated: Tue Jan 26 13:13:19 2016
Last change: Mon Jan 25 17:46:04 2016 via cibadmin on app1
Stack: classic openais (with plugin)
Current DC: app1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ app1 app2 ]
(2) Check the listening ports
# netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 10.10.10.25:5404 0.0.0.0:* 2828/corosync
udp 0 0 10.10.10.25:5405 0.0.0.0:* 2828/corosync
udp 0 0 226.94.8.8:5405 0.0.0.0:* 2828/corosync
(3) Check the logs
[root@app1 corosync]# tail -f /var/log/cluster/corosync.log
Key messages to look for in the log:
Jan 23 16:09:30 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Jan 23 16:09:30 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
....
Jan 23 16:09:30 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Jan 23 16:09:31 corosync [TOTEM ] The network interface [10.10.10.24] is now up.
Jan 23 16:09:31 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jan 23 16:09:48 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
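These TOTEM/MAIN messages can also be filtered out directly instead of tailing the whole file (just a convenience, not part of the original steps):
# grep -E 'TOTEM|MAIN|ERROR' /var/log/cluster/corosync.log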
[root@app1 corosync]#
V. Configure Pacemaker
1. Basic configuration
Pacemaker enables the stonith feature by default, but the cluster we are building has no stonith device, so stonith must be disabled in the cluster's global properties.
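Before disabling it, the complaint produced by an enabled-but-unconfigured stonith can be seen with crm_verify (an optional check, not part of the original steps):
# crm_verify -L -V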
# crm
crm(live)# configure ## enter configure mode
crm(live)configure# property stonith-enabled=false ## disable stonith
crm(live)configure# property no-quorum-policy=ignore ## what to do when the partition has no quorum: ignore it
crm(live)configure# rsc_defaults resource-stickiness=100 ## default resource stickiness: resources prefer to stay on their current node
crm(live)configure# verify ## validate the configuration
crm(live)configure# commit ## commit only after verify reports no errors
crm(live)configure# show ## show the current configuration
node app1
node app2
property cib-bootstrap-options: \
dc-version=1.1.11-97629de \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes=2 \
stonith-enabled=false \
default-resource-stickiness=100 \
no-quorum-policy=ignore
2. Resource configuration
# Usage tip: if verify reports an error, you can either quit without committing or use edit to fix the configuration until it is correct.
# crm configure edit opens the configuration directly in an editor
(1) Add the VIP
Do not commit each resource individually; commit once all resources and constraints have been created.
crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=192.168.0.26 cidr_netmask=24 nic=eth0:1 op monitor interval=30s timeout=20s on-fail=restart
crm(live)configure# verify
(2) Add the DRBD resource
crm(live)configure# primitive mydrbd ocf:linbit:drbd params drbd_resource=data op monitor role=Master interval=20 timeout=30 op monitor role=Slave interval=30 timeout=30 op start timeout=240 op stop timeout=100
crm(live)configure# verify
Define DRBD as a master/slave resource:
crm(live)configure# ms ms_mydrbd mydrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(live)configure# verify
(3) Filesystem mount resource:
crm(live)configure# primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/data fstype=ext4 op start timeout=60s op stop timeout=60s op monitor interval=30s timeout=40s on-fail=restart
crm(live)configure# verify
(4) Create the constraints. This step is critical: the VIP, DRBD and the filesystem mount must all run on the same node, and both the VIP and the mount depend on the DRBD master.
Create a group resource containing vip and mystore.
crm(live)configure# group g_service vip mystore
crm(live)configure# verify
Create a colocation constraint: the group resource runs on the DRBD master node
crm(live)configure# colocation c_g_service inf: g_service ms_mydrbd:Master
Create a colocation constraint: the mystore mount depends on the DRBD master node
crm(live)configure# colocation mystore_with_drbd_master inf: mystore ms_mydrbd:Master
Create an order constraint: promote DRBD first, then start the g_service group
crm(live)configure# order o_g_service inf: ms_mydrbd:promote g_service:start
crm(live)configure# verify
crm(live)configure# commit
(5) Add the mysql resource
crm(live)# configure
crm(live)configure# primitive mysqld lsb:mysqld op monitor interval=20 timeout=20 on-fail=restart
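Because this is an LSB resource, pacemaker drives /etc/init.d/mysqld directly, so it is worth making sure MySQL is not also started by init on either node (an extra precaution, not part of the original steps):
# chkconfig mysqld off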
Colocate the mysql service with the g_service group
crm(live)configure# colocation mysqld_with_g_service inf: mysqld g_service
crm(live)configure# verify
crm(live)configure# show
Create an order constraint: start the mysql service after the g_service group
crm(live)configure# order mysqld_after_g_service mandatory: g_service mysqld
crm(live)configure# verify
crm(live)configure# show
crm(live)configure# commit
3. Check the status once configuration is complete
[root@app1 ~]# crm status
Last updated: Fri Apr 29 14:59:14 2016
Last change: Fri Apr 29 14:59:05 2016 via cibadmin on app1
Stack: classic openais (with plugin)
Current DC: app1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured
Online: [ app1 app2 ]
Master/Slave Set: ms_mydrbd [mydrbd]
Masters: [ app1 ]
Slaves: [ app2 ]
mysqld (lsb:mysqld): Started app1
Resource Group: g_service
vip (ocf::heartbeat:IPaddr): Started app1
mystore (ocf::heartbeat:Filesystem): Started app1
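On the active node, the VIP added earlier should now be visible on the interface (a quick spot-check, not in the original steps; the interface is the one given by the nic= parameter above):
[root@app1 ~]# ip addr show eth0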
[root@app1 ~]#
4. Simulate a failover
(1) Put app1 into standby
[root@app1 mysql]# crm node standby app1
(2) Check the status again on app1: all resources have migrated successfully.
[root@app1 ~]# crm status
Last updated: Fri Apr 29 15:12:01 2016
Last change: Fri Apr 29 15:01:49 2016 via crm_attribute on app1
Stack: classic openais (with plugin)
Current DC: app1 - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
5 Resources configured
Node app1: standby
Online: [ app2 ]
Master/Slave Set: ms_mydrbd [mydrbd]
Masters: [ app2 ]
Stopped: [ app1 ]
mysqld (lsb:mysqld): Started app2
Resource Group: g_service
vip (ocf::heartbeat:IPaddr): Started app2
mystore (ocf::heartbeat:Filesystem): Started app2
[root@app1 ~]#
(3) Test the MySQL login on app2:
[root@app2 ~]# mysql -uroot -padmin
Warning: Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is ...
Server version: 5.6.29-log MySQL Community Server (GPL)
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> \q
Bye
(4) Check the DRBD mount directory on app2
[root@app2 ~]# df -h
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/vg_app2-lv_root   36G  5.0G   29G  16% /
tmpfs 1004M 29M 976M 3% /dev/shm
/dev/sda1 485M 39M 421M 9% /boot
/dev/drbd0 5.0G 249M 4.5G 6% /data
[root@app2 ~]#
[root@app2 ~]#
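After the standby test, app1 can be brought back into the cluster (the reverse of step (1); not shown in the original output):
[root@app1 ~]# crm node online app1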
# Note: during failover tests a warning may sometimes appear and obscure the real status. It can be cleared as follows: clean up whichever resource is reported as failed, then run crm status again and the display returns to normal.
Failed actions:
mystore_stop_0 on app1 'unknown error' (1): call=97, status=complete, last-rc-change='Tue Jan 26 14:39:21 2016', queued=6390ms, exec=0ms
[root@app1 ~]# crm resource cleanup mystore
Cleaning up mystore on app1
Cleaning up mystore on app2
Waiting for 2 replies from the CRMd.. OK
[root@app1 ~]#
5. Configuration summary
The biggest problem during failover is DRBD synchronization. Since the data lives on disk, any loss of synchronization leads to inconsistent data. The standby test above does not truly simulate a DRBD failover: after a real failure, once DRBD has been stopped on the old master, the former slave takes over as master and the stopped node can come back in an Unknown state, at which point the data has to be re-initialized and resynchronized.
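A quick way to check whether DRBD is back in sync after such a failover is to look at its connection and disk state (a standard DRBD status check, not part of the original text):
[root@app2 ~]# cat /proc/drbd
A healthy pair shows cs:Connected with ds:UpToDate/UpToDate.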