This experiment builds a highly available LAMP stack on Corosync and NFS: when one node goes down, the other takes over. Much of the underlying theory was covered in the earlier post "HA Cluster - heartbeat v2 configured with crm" and is not repeated here; we go straight to the configuration.
1. Prepare the environment
Server            | IP            | Hostname
httpd+php+mysql   | 192.168.0.111 | node1.soul.com
httpd+php+mysql   | 192.168.0.112 | node2.soul.com
NFS               | 192.168.0.113 | nfs.soul.com
VIP               | 192.168.0.222 |
Synchronize the time
#For convenience a separate ansible machine is used here; it is not required for the experiment
[iyunv@ansible ~]# ansible nodes -a "date"
node1.soul.com | success | rc=0 >>
Wed Apr 23 09:36:53 CST 2014
node2.soul.com | success | rc=0 >>
Wed Apr 23 09:36:53 CST 2014
nfs.soul.com | success | rc=0 >>
Wed Apr 23 09:36:53 CST 2014
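The date check above only confirms the clocks agree; a minimal sketch of actually syncing them via ansible, assuming ntpdate is installed on the nodes and 192.168.0.1 is a reachable NTP server (both are assumptions):
[iyunv@ansible ~]# ansible nodes -a "ntpdate -u 192.168.0.1"
[iyunv@ansible ~]# ansible nodes -a "date"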
Install the software on the corresponding machines
[iyunv@node1 ~]# rpm -q httpd php
httpd-2.2.15-29.el6.centos.x86_64
php-5.3.3-26.el6.x86_64
[iyunv@node2 ~]# rpm -q httpd php
httpd-2.2.15-29.el6.centos.x86_64
php-5.3.3-26.el6.x86_64
#Needed on both nodes
[iyunv@node1 ~]# chkconfig httpd off
[iyunv@node1 ~]# chkconfig --list httpd
httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
#Install MySQL on both nodes
#Note: if the database is initialized on node1, do NOT initialize it again on node2
#The MySQL data directory must point at the NFS share, and the share has to be mounted before the database is initialized
[iyunv@node1 ~]# mount
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.0.113:/webstore on /share type nfs (rw,vers=4,addr=192.168.0.113,clientaddr=192.168.0.111)
#As shown above; both nodes mount the share the same way
Install MySQL and NFS
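A minimal sketch of the MySQL setup described above, assuming /share/data is used as the data directory and the stock CentOS 6 mysql packages (the paths and the mysql uid/gid of 27 are assumptions, adjust to your layout):
#On the NFS server: create the data directory and let the mysql user own it
[iyunv@nfs ~]# mkdir -p /webstore/data && chown -R 27:27 /webstore/data
#On node1 only (node2 must NOT initialize again), with /share already mounted:
[iyunv@node1 ~]# yum -y install mysql-server
[iyunv@node1 ~]# sed -i 's#^datadir=.*#datadir=/share/data#' /etc/my.cnf
[iyunv@node1 ~]# mysql_install_db --user=mysql --datadir=/share/data
[iyunv@node1 ~]# service mysqld start && service mysqld stop    #verify it starts, then stop it again
#On node2: install mysql-server and make the same /etc/my.cnf change, but skip mysql_install_db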
[iyunv@nfs ~]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
[iyunv@nfs ~]#
[iyunv@nfs ~]# exportfs -v
/webstore 192.168.0.111(rw,wdelay,no_root_squash,no_subtree_check)
/webstore 192.168.0.112(rw,wdelay,no_root_squash,no_subtree_check)
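The /etc/exports behind that output would look roughly like this (reconstructed from the exportfs -v output above):
[iyunv@nfs ~]# cat /etc/exports
/webstore    192.168.0.111(rw,no_root_squash) 192.168.0.112(rw,no_root_squash)
[iyunv@nfs ~]# exportfs -arv    #re-export after editing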
#Check from the web nodes
[iyunv@node1 ~]# showmount -e 192.168.0.113
Export list for 192.168.0.113:
/webstore 192.168.0.112,192.168.0.111
#Once all the services have been set up and tested, stop them and disable them from starting at boot; the cluster will manage them from here on (a sketch follows)
#The exception is the NFS service on the NFS server: it must start at boot, otherwise the nodes cannot mount the share
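A sketch of that cleanup, assuming the standard CentOS 6 service names:
#On node1 and node2: mysqld must also be stopped and disabled (httpd was already disabled above)
[iyunv@node1 ~]# service mysqld stop && chkconfig mysqld off
#On the NFS server: the share must already be available when the cluster mounts it
[iyunv@nfs ~]# chkconfig rpcbind on && chkconfig nfs on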
2. Install and configure Corosync and Pacemaker
#The following is done on node1
[iyunv@node1 ~]# rpm -q corosync pacemaker
corosync-1.4.1-17.el6.x86_64
pacemaker-1.1.10-14.el6.x86_64
#Configure corosync
[iyunv@node1 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
[iyunv@node1 ~]# vim /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
        version: 2                              #config version
        secauth: on                             #enable message authentication
        threads: 0                              #threads used for authentication
        interface {
                ringnumber: 0                   #ring number
                bindnetaddr: 192.168.0.0        #network to bind to
                mcastaddr: 226.94.40.1          #multicast address
                mcastport: 5405                 #multicast port
                ttl: 1                          #multicast TTL (hop limit)
        }
}
logging {
        fileline: off
        to_stderr: no                           #do not log to standard error
        to_logfile: yes                         #log to a file
        to_syslog: no                           #do not send to syslog
        logfile: /var/log/cluster/corosync.log  #log file path
        debug: off                              #debug logging
        timestamp: on                           #timestamp log entries
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}
service {
        ver: 0                                  #plugin version 0: corosync starts pacemaker itself
        name: pacemaker                         #start pacemaker automatically
}
aisexec {
        user: root                              #user to run as
        group: root
}
Generate the authentication key
[iyunv@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
#You may be asked to keep typing because there is not enough entropy; you can either keep pressing keys or fall back to the pseudo-random generator
#Typing it out is the recommended route: it keeps the key properly random and gives your fingers some exercise
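If the machine really cannot gather enough entropy, one common workaround is to point /dev/random at /dev/urandom temporarily (a trade-off: the key is less random, which is usually tolerable on an isolated cluster network; this sketch is not part of the original run):
[iyunv@node1 corosync]# mv /dev/random /dev/random.bak
[iyunv@node1 corosync]# ln -s /dev/urandom /dev/random
[iyunv@node1 corosync]# corosync-keygen
[iyunv@node1 corosync]# rm -f /dev/random && mv /dev/random.bak /dev/random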
#Copy authkey and corosync.conf to node2
[iyunv@node1 corosync]# ls
authkey corosync.conf.example service.d
corosync.conf corosync.conf.example.udpu uidgid.d
[iyunv@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 520 0.5KB/s 00:00
[iyunv@node1 corosync]#
#Mind the file permissions: authkey must stay mode 0400 (scp -p preserves it)
Start and test
[iyunv@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
[iyunv@node1 ~]# ssh node2 'service corosync start'
Starting Corosync Cluster Engine (corosync): [ OK ]
[iyunv@node1 ~]#
#Verify that the cluster engine started and the configuration file was read correctly
[iyunv@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Apr 23 11:48:40 corosync [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Apr 23 11:48:40 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
#Verify that the initial membership notifications went out properly
[iyunv@node1 ~]# grep TOTEM /var/log/cluster/corosync.log
Apr 23 11:48:40 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Apr 23 11:48:40 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 23 11:48:40 corosync [TOTEM ] The network interface [192.168.0.111] is now up.
Apr 23 11:48:41 corosync [TOTEM ] Process pause detected for 879 ms, flushing membership messages.
Apr 23 11:48:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 23 11:48:53 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
#Check that pacemaker started properly
[iyunv@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Apr 23 11:48:40 corosync [pcmk ] Logging: Initialized pcmk_startup
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: Service: 9
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.soul.com
[iyunv@node1 ~]#
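It is also worth making sure no unexpected errors were logged; unpack_resources warnings about missing STONITH resources can be ignored here, since stonith gets disabled in the configuration later:
[iyunv@node1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources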
Install the crmsh and pssh packages
[iyunv@node1 ~]# scp -p pssh-2.3.1-2.el6.x86_64.rpm crmsh-1.2.6-4.el6.x86_64.rpm node2:/root
pssh-2.3.1-2.el6.x86_64.rpm 100% 49KB 48.8KB/s 00:00
crmsh-1.2.6-4.el6.x86_64.rpm 100% 484KB 483.7KB/s 00:00
[iyunv@node1 ~]#
[iyunv@node1 ~]# yum -y install crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
[iyunv@node2 ~]# yum -y install crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
#Once installed, the crm command is available for managing the cluster
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 11:57:29 2014
Last change: Wed Apr 23 11:49:04 2014 via crmd on node2.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.soul.com node2.soul.com ]
#Basic usage of crm:
[iyunv@node1 ~]# crm
crm(live)# help
#There is a lot in here; it cannot be covered in a sentence or two and takes some time to explore
This is crm shell, a Pacemaker command line interface.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
history CRM cluster history
site Geo-cluster support
ra resource agents information center
status show cluster status
help,? show help (help topics for list of topics)
end,cd,up go back one level
quit,bye,exit exit the program
With everything in place, the next step is configuring the resources, which is also the most tedious part.
3. Configure the HA cluster resources
First, a simple plan of the resources to configure; the order matters when they are started later:
1. Configure the VIP
2. Configure the NFS shared storage
3. Configure the httpd service
4. Configure the mysql service
5. Configure a resource group and add the above resources to it
The commands and resource agents involved are roughly as follows:
#Resource classes:
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)ra#
#Resource agents; the other classes can be listed the same way
crm(live)ra# list lsb
NetworkManager abrt-ccpp abrt-oops abrtd acpid
atd auditd autofs blk-availability bluetooth
corosync corosync-notifyd cpuspeed crond cups
dnsmasq firstboot haldaemon halt htcacheclean
httpd ip6tables iptables irqbalance kdump
killall libvirt-guests lvm2-lvmetad lvm2-monitor mdmonitor
messagebus mysqld netconsole netfs network
nfs nfslock ntpd ntpdate pacemaker
php-fpm portreserve postfix psacct quota_nld
rdisc restorecond rngd rpcbind rpcgssd
rpcidmapd rpcsvcgssd rsyslog sandbox saslauthd
single smartd spice-vdagentd sshd svnserve
sysstat udev-post wdaemon winbind wpa_supplicant
#Detailed information about an agent
crm(live)ra# info lsb:nfs
lsb:nfs
NFS is a popular protocol for file sharing across networks.
This service provides NFS server functionality, which is \
configured via the /etc/exports file.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor timeout=15 interval=15
With that overview, add the resources:
#First set a few global cluster properties
#Disable STONITH, since no fencing device is available here
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify #check the change
#With only two nodes the cluster loses quorum as soon as one node fails, so ignore the no-quorum condition instead of stopping resources
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit #commit once verified
#Show the resulting configuration
crm(live)configure# show
node node1.soul.com
node node2.soul.com
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
Configure the VIP
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.0.222 op monitor interval=30s timeout=30s on-fail=restart
#The available parameters can be checked with help
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node node1.soul.com
node node2.soul.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.0.222" \
op monitor interval="30s" timeout="30s" on-fail="restart"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
#After commit, the status can be checked
crm(live)# status
Last updated: Wed Apr 23 12:26:20 2014
Last change: Wed Apr 23 12:25:22 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1.soul.com node2.soul.com ]
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
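The IPaddr agent should now have added 192.168.0.222 on the node running webip (node1 here); a quick check:
[iyunv@node1 ~]# ip addr show | grep 192.168.0.222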
Configure the NFS shared storage
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem \
params device="192.168.0.113:/webstore" \
directory="/share" fstype="nfs" \
op monitor interval=40s timeout=40s \
op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# show
node node1.soul.com
node node2.soul.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.0.222" \
op monitor interval="30s" timeout="30s" on-fail="restart"
primitive webstore ocf:heartbeat:Filesystem \
params device="192.168.0.113:/webstore" directory="/share" fstype="nfs" \
op monitor interval="40s" timeout="40s" \
op start timeout="60s" interval="0" \
op stop timeout="60s" interval="0"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
Configure the httpd service
crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=30s on-fail=restart
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status
Last updated: Wed Apr 23 13:20:54 2014
Last change: Wed Apr 23 13:20:46 2014 via crmd on node2.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.soul.com node2.soul.com ]
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node2.soul.com
webserver (lsb:httpd): Started node1.soul.com
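Since webstore started on node2 at this point, the Filesystem agent should have mounted the NFS share there; a quick check (run on node2):
[iyunv@node2 ~]# mount | grep webstore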
Configure the mysql service
crm(live)configure# primitive webdb lsb:mysqld op monitor interval=30s timeout=30s on-fail=restart
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Apr 23 13:25:38 2014
Last change: Wed Apr 23 13:25:17 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node2.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node2.soul.com
Configure a resource group and add the above resources to it
#As the status above shows, the resources get spread across the two nodes by default
#So they all need to be placed in a single group to keep them on the same node
crm(live)configure# group webcluster webip webstore webserver webdb
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Apr 23 13:30:25 2014
Last change: Wed Apr 23 13:29:55 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node1.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node1.soul.com
#Once the group is added, the resources all move to the same node
Now define an order constraint so the resources start and stop in the specified order
crm(live)configure# help order
Usage:
...............
order <id> {kind|<score>}: <rsc>[:<action>] <rsc>[:<action>] ...
[symmetrical=<bool>]
kind :: Mandatory | Optional | Serialize
crm(live)configure# order ip_store_http_db Mandatory: webip webstore webserver webdb
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib num_updates="4" dc-uuid="node2.soul.com" update-origin="node1.soul.com" crm_feature_set="3.0.7" validate-with="pacemaker-1.2" update-client="cibadmin" epoch="14" admin_epoch="0" cib-last-written="Wed Apr 23 13:37:27 2014" have-quorum="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-14.el6-368c726"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
<nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1398230446"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.soul.com" uname="node2.soul.com"/>
<node id="node1.soul.com" uname="node1.soul.com"/>
</nodes>
<resources>
<group id="webcluster">
<primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair name="ip" value="192.168.0.222" id="webip-instance_attributes-ip"/>
</instance_attributes>
<operations>
<op name="monitor" interval="30s" timeout="30s" on-fail="restart" id="webip-monitor-30s"/>
</operations>
</primitive>
<primitive id="webstore" class="ocf" provider="heartbeat" type="Filesystem">
<instance_attributes id="webstore-instance_attributes">
<nvpair name="device" value="192.168.0.113:/webstore" id="webstore-instance_attributes-device"/>
<nvpair name="directory" value="/share" id="webstore-instance_attributes-directory"/>
<nvpair name="fstype" value="nfs" id="webstore-instance_attributes-fstype"/>
</instance_attributes>
<operations>
<op name="monitor" interval="40s" timeout="40s" id="webstore-monitor-40s"/>
<op name="start" timeout="60s" interval="0" id="webstore-start-0"/>
<op name="stop" timeout="60s" interval="0" id="webstore-stop-0"/>
</operations>
</primitive>
<primitive id="webserver" class="lsb" type="httpd">
<operations>
<op name="monitor" interval="30s" timeout="30s" on-fail="restart" id="webserver-monitor-30s"/>
</operations>
</primitive>
<primitive id="webdb" class="lsb" type="mysqld">
<operations>
<op name="monitor" interval="30s" timeout="30s" on-fail="restart" id="webdb-monitor-30s"/>
</operations>
</primitive>
</group>
</resources>
<constraints>
<rsc_order id="ip_store_http_db" kind="Mandatory">
<resource_set id="ip_store_http_db-0">
<resource_ref id="webip"/>
<resource_ref id="webstore"/>
<resource_ref id="webserver"/>
<resource_ref id="webdb"/>
</resource_set>
</rsc_order>
</constraints>
</configuration>
</cib>
4. Install a forum and test
#First check which node the resources are currently running on
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 13:52:07 2014
Last change: Wed Apr 23 13:37:27 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node1.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node1.soul.com
#Everything is running on node1, so adjust the httpd configuration
#Point the web document root at /share/www on the NFS mount (a sketch follows)
#Then create a file under /share/www to test with
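A minimal sketch of that change, assuming the stock CentOS httpd.conf; note that httpd.conf is local to each node (it is not on the shared storage), so node2 needs the same edit for failover to work:
[iyunv@node1 ~]# mkdir -p /share/www
[iyunv@node1 ~]# sed -i 's#^DocumentRoot .*#DocumentRoot "/share/www"#' /etc/httpd/conf/httpd.conf
[iyunv@node2 ~]# sed -i 's#^DocumentRoot .*#DocumentRoot "/share/www"#' /etc/httpd/conf/httpd.conf
#Depending on your httpd.conf you may also need a matching <Directory "/share/www"> block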
[iyunv@node1 www]# vim /share/www/index.php
<h1>Page!!!!</h1>
<?php
phpinfo();
?>
#Save and test
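The page can be checked through the VIP rather than a node address, from any machine that can reach 192.168.0.222:
[iyunv@ansible ~]# curl http://192.168.0.222/index.php    #should return the test page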
Access works as expected; a forum could be installed here for a fuller test. Next, test resource migration:
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 15:15:01 2014
Last change: Wed Apr 23 15:14:42 2014 via crm_attribute on node2.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node1.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node1.soul.com
#Everything is on node1 at the moment; now put node1 into standby:
[iyunv@node1 ~]# crm node standby node1.soul.com
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 15:20:51 2014
Last change: Wed Apr 23 15:20:41 2014 via crm_attribute on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Node node1.soul.com: standby
Online: [ node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node2.soul.com
webstore (ocf::heartbeat:Filesystem): Started node2.soul.com
webserver (lsb:httpd): Started node2.soul.com
webdb (lsb:mysqld): Started node2.soul.com
#All resources have migrated to node2
#Refresh the web page to confirm the site is still being served
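When the failover test is done, node1 can be taken out of standby; with no stickiness or location constraints configured, whether the resources move back is up to the policy engine, so it is worth watching the status afterwards:
[iyunv@node1 ~]# crm node online node1.soul.com
[iyunv@node1 ~]# crm status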
Everything tests fine; with that, the LAMP high-availability setup is complete. If you run into problems, feel free to leave a comment.