This experiment builds a highly available LAMP stack on Corosync and NFS: when one node goes down, the other takes over. Much of the underlying theory was covered in the earlier post "HA Cluster - heartbeat v2 configured with crm" and is not repeated here; we go straight to the configuration.
1. Prepare the environment
Server            | IP            | Hostname
httpd+php+mysql   | 192.168.0.111 | node1.soul.com
httpd+php+mysql   | 192.168.0.112 | node2.soul.com
NFS               | 192.168.0.113 | nfs.soul.com
VIP               | 192.168.0.222 |
Synchronize the time
#For convenience a separate ansible machine is used here; it is not required for the experiment
[iyunv@ansible ~]# ansible nodes -a "date"
node1.soul.com | success | rc=0 >>
Wed Apr 23 09:36:53 CST 2014
node2.soul.com | success | rc=0 >>
Wed Apr 23 09:36:53 CST 2014
nfs.soul.com | success | rc=0 >>
Wed Apr 23 09:36:53 CST 2014
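The date check above only confirms the clocks agree; a minimal sketch of actually syncing them via ansible, assuming ntpdate is installed on the nodes and 192.168.0.1 is a reachable NTP server (both are assumptions):
[iyunv@ansible ~]# ansible nodes -a "ntpdate -u 192.168.0.1"
[iyunv@ansible ~]# ansible nodes -a "date"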
Install the software on the corresponding machines
[iyunv@node1 ~]# rpm -q httpd php
httpd-2.2.15-29.el6.centos.x86_64
php-5.3.3-26.el6.x86_64
[iyunv@node2 ~]# rpm -q httpd php
httpd-2.2.15-29.el6.centos.x86_64
php-5.3.3-26.el6.x86_64
#Needed on both nodes
[iyunv@node1 ~]# chkconfig httpd off
[iyunv@node1 ~]# chkconfig --list httpd
httpd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
#Install MySQL on both nodes
#Note: if the database is initialized on node1, do NOT initialize it again on node2
#The MySQL data directory must point at the NFS share, and the share has to be mounted before the database is initialized
[iyunv@node1 ~]# mount
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
192.168.0.113:/webstore on /share type nfs (rw,vers=4,addr=192.168.0.113,clientaddr=192.168.0.111)
#As shown above; both nodes mount the share the same way
Install MySQL and NFS
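A minimal sketch of the MySQL setup described above, assuming /share/data is used as the data directory and the stock CentOS 6 mysql packages (the paths and the mysql uid/gid of 27 are assumptions, adjust to your layout):
#On the NFS server: create the data directory and let the mysql user own it
[iyunv@nfs ~]# mkdir -p /webstore/data && chown -R 27:27 /webstore/data
#On node1 only (node2 must NOT initialize again), with /share already mounted:
[iyunv@node1 ~]# yum -y install mysql-server
[iyunv@node1 ~]# sed -i 's#^datadir=.*#datadir=/share/data#' /etc/my.cnf
[iyunv@node1 ~]# mysql_install_db --user=mysql --datadir=/share/data
[iyunv@node1 ~]# service mysqld start && service mysqld stop    #verify it starts, then stop it again
#On node2: install mysql-server and make the same /etc/my.cnf change, but skip mysql_install_db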
[iyunv@nfs ~]# service nfs start
Starting NFS services: [ OK ]
Starting NFS quotas: [ OK ]
Starting NFS mountd: [ OK ]
Starting NFS daemon: [ OK ]
Starting RPC idmapd: [ OK ]
[iyunv@nfs ~]#
[iyunv@nfs ~]# exportfs -v
/webstore 192.168.0.111(rw,wdelay,no_root_squash,no_subtree_check)
/webstore 192.168.0.112(rw,wdelay,no_root_squash,no_subtree_check)
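The /etc/exports behind that output would look roughly like this (reconstructed from the exportfs -v output above):
[iyunv@nfs ~]# cat /etc/exports
/webstore    192.168.0.111(rw,no_root_squash) 192.168.0.112(rw,no_root_squash)
[iyunv@nfs ~]# exportfs -arv    #re-export after editing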
#Check from the web nodes
[iyunv@node1 ~]# showmount -e 192.168.0.113
Export list for 192.168.0.113:
/webstore 192.168.0.112,192.168.0.111
#Once all the services have been set up and tested, stop them and disable them from starting at boot; the cluster will manage them from here on (a sketch follows)
#The exception is the NFS service on the NFS server: it must start at boot, otherwise the nodes cannot mount the share
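A sketch of that cleanup, assuming the standard CentOS 6 service names:
#On node1 and node2: mysqld must also be stopped and disabled (httpd was already disabled above)
[iyunv@node1 ~]# service mysqld stop && chkconfig mysqld off
#On the NFS server: the share must already be available when the cluster mounts it
[iyunv@nfs ~]# chkconfig rpcbind on && chkconfig nfs on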
2. Install and configure Corosync and Pacemaker
#The following is done on node1
[iyunv@node1 ~]# rpm -q corosync pacemaker
corosync-1.4.1-17.el6.x86_64
pacemaker-1.1.10-14.el6.x86_64
#Configure corosync
[iyunv@node1 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
[iyunv@node1 ~]# vim /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
        version: 2                              #config version
        secauth: on                             #enable message authentication
        threads: 0                              #threads used for authentication
        interface {
                ringnumber: 0                   #ring number
                bindnetaddr: 192.168.0.0        #network to bind to
                mcastaddr: 226.94.40.1          #multicast address
                mcastport: 5405                 #multicast port
                ttl: 1                          #multicast TTL (hop limit)
        }
}
logging {
        fileline: off
        to_stderr: no                           #do not log to standard error
        to_logfile: yes                         #log to a file
        to_syslog: no                           #do not send to syslog
        logfile: /var/log/cluster/corosync.log  #log file path
        debug: off                              #debug logging
        timestamp: on                           #timestamp log entries
        logger_subsys {
                subsys: AMF
                debug: off
        }
}
amf {
        mode: disabled
}
service {
        ver: 0                                  #plugin version 0: corosync starts pacemaker itself
        name: pacemaker                         #start pacemaker automatically
}
aisexec {
        user: root                              #user to run as
        group: root
}
Generate the authentication key
[iyunv@node1 corosync]# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Writing corosync key to /etc/corosync/authkey.
#You may be asked to keep typing because there is not enough entropy; you can either keep pressing keys or fall back to the pseudo-random generator
#Typing it out is the recommended route: it keeps the key properly random and gives your fingers some exercise
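If the machine really cannot gather enough entropy, one common workaround is to point /dev/random at /dev/urandom temporarily (a trade-off: the key is less random, which is usually tolerable on an isolated cluster network; this sketch is not part of the original run):
[iyunv@node1 corosync]# mv /dev/random /dev/random.bak
[iyunv@node1 corosync]# ln -s /dev/urandom /dev/random
[iyunv@node1 corosync]# corosync-keygen
[iyunv@node1 corosync]# rm -f /dev/random && mv /dev/random.bak /dev/random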
#Copy authkey and corosync.conf to node2
[iyunv@node1 corosync]# ls
authkey corosync.conf.example service.d
corosync.conf corosync.conf.example.udpu uidgid.d
[iyunv@node1 corosync]# scp -p authkey corosync.conf node2:/etc/corosync/
authkey 100% 128 0.1KB/s 00:00
corosync.conf 100% 520 0.5KB/s 00:00
[iyunv@node1 corosync]#
#Mind the file permissions: authkey must stay mode 0400 (scp -p preserves it)
Start and test
[iyunv@node1 ~]# service corosync start
Starting Corosync Cluster Engine (corosync): [ OK ]
[iyunv@node1 ~]# ssh node2 'service corosync start'
Starting Corosync Cluster Engine (corosync): [ OK ]
[iyunv@node1 ~]#
#Verify that the cluster engine started and the configuration file was read correctly
[iyunv@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Apr 23 11:48:40 corosync [MAIN ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Apr 23 11:48:40 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
#Verify that the initial membership notifications went out properly
[iyunv@node1 ~]# grep TOTEM /var/log/cluster/corosync.log
Apr 23 11:48:40 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Apr 23 11:48:40 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Apr 23 11:48:40 corosync [TOTEM ] The network interface [192.168.0.111] is now up.
Apr 23 11:48:41 corosync [TOTEM ] Process pause detected for 879 ms, flushing membership messages.
Apr 23 11:48:41 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Apr 23 11:48:53 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
#Check that pacemaker started properly
[iyunv@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Apr 23 11:48:40 corosync [pcmk ] Logging: Initialized pcmk_startup
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: Service: 9
Apr 23 11:48:40 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.soul.com
[iyunv@node1 ~]#
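It is also worth making sure no unexpected errors were logged; unpack_resources warnings about missing STONITH resources can be ignored here, since stonith gets disabled in the configuration later:
[iyunv@node1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources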
Install the crmsh and pssh packages
[iyunv@node1 ~]# scp -p pssh-2.3.1-2.el6.x86_64.rpm crmsh-1.2.6-4.el6.x86_64.rpm node2:/root
pssh-2.3.1-2.el6.x86_64.rpm 100% 49KB 48.8KB/s 00:00
crmsh-1.2.6-4.el6.x86_64.rpm 100% 484KB 483.7KB/s 00:00
[iyunv@node1 ~]#
[iyunv@node1 ~]# yum -y install crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
[iyunv@node2 ~]# yum -y install crmsh-1.2.6-4.el6.x86_64.rpm pssh-2.3.1-2.el6.x86_64.rpm
#Once installed, the crm command is available for managing the cluster
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 11:57:29 2014
Last change: Wed Apr 23 11:49:04 2014 via crmd on node2.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.soul.com node2.soul.com ]
#Basic usage of crm:
[iyunv@node1 ~]# crm
crm(live)# help
#There is a lot in here; it cannot be covered in a sentence or two and takes some time to explore
This is crm shell, a Pacemaker command line interface.
Available commands:
cib manage shadow CIBs
resource resources management
configure CRM cluster configuration
node nodes management
options user preferences
history CRM cluster history
site Geo-cluster support
ra resource agents information center
status show cluster status
help,? show help (help topics for list of topics)
end,cd,up go back one level
quit,bye,exit exit the program
With everything in place, the next step is configuring the resources, which is also the most tedious part.
3. Configure the HA cluster resources
First, a simple plan of the resources to configure; the order matters when they are started later:
1. Configure the VIP
2. Configure the NFS shared storage
3. Configure the httpd service
4. Configure the mysql service
5. Configure a resource group and add the above resources to it
The commands and resource agents involved are roughly as follows:
#Resource classes:
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)ra#
#Resource agents; the other classes can be listed the same way
crm(live)ra# list lsb
NetworkManager abrt-ccpp abrt-oops abrtd acpid
atd auditd autofs blk-availability bluetooth
corosync corosync-notifyd cpuspeed crond cups
dnsmasq firstboot haldaemon halt htcacheclean
httpd ip6tables iptables irqbalance kdump
killall libvirt-guests lvm2-lvmetad lvm2-monitor mdmonitor
messagebus mysqld netconsole netfs network
nfs nfslock ntpd ntpdate pacemaker
php-fpm portreserve postfix psacct quota_nld
rdisc restorecond rngd rpcbind rpcgssd
rpcidmapd rpcsvcgssd rsyslog sandbox saslauthd
single smartd spice-vdagentd sshd svnserve
sysstat udev-post wdaemon winbind wpa_supplicant
#Detailed information about an agent
crm(live)ra# info lsb:nfs
lsb:nfs
NFS is a popular protocol for file sharing across networks.
This service provides NFS server functionality, which is \
configured via the /etc/exports file.
Operations' defaults (advisory minimum):
start timeout=15
stop timeout=15
status timeout=15
restart timeout=15
force-reload timeout=15
monitor timeout=15 interval=15
With that overview, add the resources:
#First set a few global cluster properties
#Disable STONITH, since no fencing device is available here
crm(live)configure# property stonith-enabled=false
crm(live)configure# verify #check the change
#With only two nodes the cluster loses quorum as soon as one node fails, so ignore the no-quorum condition instead of stopping resources
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit #commit once verified
#Show the resulting configuration
crm(live)configure# show
node node1.soul.com
node node2.soul.com
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
Configure the VIP
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.0.222 op monitor interval=30s timeout=30s on-fail=restart
#The available parameters can be checked with help
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show
node node1.soul.com
node node2.soul.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.0.222" \
op monitor interval="30s" timeout="30s" on-fail="restart"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
#After commit, the status can be checked
crm(live)# status
Last updated: Wed Apr 23 12:26:20 2014
Last change: Wed Apr 23 12:25:22 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node1.soul.com node2.soul.com ]
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
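The IPaddr agent should now have added 192.168.0.222 on the node running webip (node1 here); a quick check:
[iyunv@node1 ~]# ip addr show | grep 192.168.0.222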
Configure the NFS shared storage
crm(live)configure# primitive webstore ocf:heartbeat:Filesystem \
params device="192.168.0.113:/webstore" \
directory="/share" fstype="nfs" \
op monitor interval=40s timeout=40s \
op start timeout=60s op stop timeout=60s
crm(live)configure# verify
crm(live)configure# show
node node1.soul.com
node node2.soul.com
primitive webip ocf:heartbeat:IPaddr \
params ip="192.168.0.222" \
op monitor interval="30s" timeout="30s" on-fail="restart"
primitive webstore ocf:heartbeat:Filesystem \
params device="192.168.0.113:/webstore" directory="/share" fstype="nfs" \
op monitor interval="40s" timeout="40s" \
op start timeout="60s" interval="0" \
op stop timeout="60s" interval="0"
property $id="cib-bootstrap-options" \
dc-version="1.1.10-14.el6-368c726" \
cluster-infrastructure="classic openais (with plugin)" \
expected-quorum-votes="2" \
stonith-enabled="false" \
no-quorum-policy="ignore"
Configure the httpd service
crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=30s on-fail=restart
crm(live)configure# verify
crm(live)configure# commit
crm(live)# status
Last updated: Wed Apr 23 13:20:54 2014
Last change: Wed Apr 23 13:20:46 2014 via crmd on node2.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ node1.soul.com node2.soul.com ]
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node2.soul.com
webserver (lsb:httpd): Started node1.soul.com
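Since webstore started on node2 at this point, the Filesystem agent should have mounted the NFS share there; a quick check (run on node2):
[iyunv@node2 ~]# mount | grep webstore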
Configure the mysql service
crm(live)configure# primitive webdb lsb:mysqld op monitor interval=30s timeout=30s on-fail=restart
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Apr 23 13:25:38 2014
Last change: Wed Apr 23 13:25:17 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node2.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node2.soul.com
Configure a resource group and add the above resources to it
#As the status above shows, the resources get spread across the two nodes by default
#So they all need to be placed in a single group to keep them on the same node
crm(live)configure# group webcluster webip webstore webserver webdb
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Wed Apr 23 13:30:25 2014
Last change: Wed Apr 23 13:29:55 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node1.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node1.soul.com
#Once the group is added, the resources all move to the same node
Now define an order constraint so the resources start and stop in the specified order
crm(live)configure# help order
Usage:
...............
order <id> {kind|<score>}: <rsc>[:<action>] <rsc>[:<action>] ...
[symmetrical=<bool>]
kind :: Mandatory | Optional | Serialize
crm(live)configure# order ip_store_http_db Mandatory: webip webstore webserver webdb
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show xml
<?xml version="1.0" ?>
<cib num_updates="4" dc-uuid="node2.soul.com" update-origin="node1.soul.com" crm_feature_set="3.0.7" validate-with="pacemaker-1.2" update-client="cibadmin" epoch="14" admin_epoch="0" cib-last-written="Wed Apr 23 13:37:27 2014" have-quorum="1">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.10-14.el6-368c726"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="classic openais (with plugin)"/>
<nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
<nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
<nvpair name="no-quorum-policy" value="ignore" id="cib-bootstrap-options-no-quorum-policy"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1398230446"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="node2.soul.com" uname="node2.soul.com"/>
<node id="node1.soul.com" uname="node1.soul.com"/>
</nodes>
<resources>
<group id="webcluster">
<primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
<instance_attributes id="webip-instance_attributes">
<nvpair name="ip" value="192.168.0.222" id="webip-instance_attributes-ip"/>
</instance_attributes>
<operations>
<op name="monitor" interval="30s" timeout="30s" on-fail="restart" id="webip-monitor-30s"/>
</operations>
</primitive>
<primitive id="webstore" class="ocf" provider="heartbeat" type="Filesystem">
<instance_attributes id="webstore-instance_attributes">
<nvpair name="device" value="192.168.0.113:/webstore" id="webstore-instance_attributes-device"/>
<nvpair name="directory" value="/share" id="webstore-instance_attributes-directory"/>
<nvpair name="fstype" value="nfs" id="webstore-instance_attributes-fstype"/>
</instance_attributes>
<operations>
<op name="monitor" interval="40s" timeout="40s" id="webstore-monitor-40s"/>
<op name="start" timeout="60s" interval="0" id="webstore-start-0"/>
<op name="stop" timeout="60s" interval="0" id="webstore-stop-0"/>
</operations>
</primitive>
<primitive id="webserver" class="lsb" type="httpd">
<operations>
<op name="monitor" interval="30s" timeout="30s" on-fail="restart" id="webserver-monitor-30s"/>
</operations>
</primitive>
<primitive id="webdb" class="lsb" type="mysqld">
<operations>
<op name="monitor" interval="30s" timeout="30s" on-fail="restart" id="webdb-monitor-30s"/>
</operations>
</primitive>
</group>
</resources>
<constraints>
<rsc_order id="ip_store_http_db" kind="Mandatory">
<resource_set id="ip_store_http_db-0">
<resource_ref id="webip"/>
<resource_ref id="webstore"/>
<resource_ref id="webserver"/>
<resource_ref id="webdb"/>
</resource_set>
</rsc_order>
</constraints>
</configuration>
</cib>
4. Install a forum and test
#First check which node the resources are currently running on
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 13:52:07 2014
Last change: Wed Apr 23 13:37:27 2014 via cibadmin on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node2.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node1.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node1.soul.com
#Everything is running on node1, so adjust the httpd configuration
#Point the web document root at /share/www on the NFS mount (a sketch follows)
#Then create a file under /share/www to test with
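A minimal sketch of that change, assuming the stock CentOS httpd.conf; note that httpd.conf is local to each node (it is not on the shared storage), so node2 needs the same edit for failover to work:
[iyunv@node1 ~]# mkdir -p /share/www
[iyunv@node1 ~]# sed -i 's#^DocumentRoot .*#DocumentRoot "/share/www"#' /etc/httpd/conf/httpd.conf
[iyunv@node2 ~]# sed -i 's#^DocumentRoot .*#DocumentRoot "/share/www"#' /etc/httpd/conf/httpd.conf
#Depending on your httpd.conf you may also need a matching <Directory "/share/www"> block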
[iyunv@node1 www]# vim /share/www/index.php
<h1>Page!!!!</h1>
<?php
phpinfo();
?>
#Save and test
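The page can be checked through the VIP rather than a node address, from any machine that can reach 192.168.0.222:
[iyunv@ansible ~]# curl http://192.168.0.222/index.php    #should return the test page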
Access works as expected; a forum could be installed here for a fuller test. Next, test resource migration:
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 15:15:01 2014
Last change: Wed Apr 23 15:14:42 2014 via crm_attribute on node2.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Online: [ node1.soul.com node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node1.soul.com
webstore (ocf::heartbeat:Filesystem): Started node1.soul.com
webserver (lsb:httpd): Started node1.soul.com
webdb (lsb:mysqld): Started node1.soul.com
#Everything is on node1 at the moment; now put node1 into standby:
[iyunv@node1 ~]# crm node standby node1.soul.com
[iyunv@node1 ~]# crm status
Last updated: Wed Apr 23 15:20:51 2014
Last change: Wed Apr 23 15:20:41 2014 via crm_attribute on node1.soul.com
Stack: classic openais (with plugin)
Current DC: node1.soul.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
4 Resources configured
Node node1.soul.com: standby
Online: [ node2.soul.com ]
Resource Group: webcluster
webip (ocf::heartbeat:IPaddr): Started node2.soul.com
webstore (ocf::heartbeat:Filesystem): Started node2.soul.com
webserver (lsb:httpd): Started node2.soul.com
webdb (lsb:mysqld): Started node2.soul.com
#All resources have migrated to node2
#Refresh the web page to confirm the site is still being served
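When the failover test is done, node1 can be taken out of standby; with no stickiness or location constraints configured, whether the resources move back is up to the policy engine, so it is worth watching the status afterwards:
[iyunv@node1 ~]# crm node online node1.soul.com
[iyunv@node1 ~]# crm status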
Everything tests fine; with that, the LAMP high-availability setup is complete. If you run into problems, feel free to leave a comment.