2331 posted on 2016-1-7 09:53:31

High-availability web cluster with corosync & pacemaker


Basic topology:
Two high-availability nodes:
  node1: 192.168.191.112
  node2: 192.168.191.113
NFS server: 192.168.191.111
Floating IP for the web service: 192.168.191.199
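I. Preparation
1). node1 and node2 communicate by hostname
1. Edit /etc/hosts and add the following:
192.168.191.112 node1.liaobin.com node1
192.168.191.113 node2.liaobin.com node2
2. Edit /etc/sysconfig/network on each node and set the hostname to node1.liaobin.com and node2.liaobin.com respectively
3. Reboot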
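The /etc/hosts edit above can also be scripted so that it is safe to run more than once. This is only a sketch: the add_host helper and the HOSTS_FILE variable are names made up for illustration, and by default the script writes to a scratch file rather than the real /etc/hosts.

```shell
#!/bin/sh
# HOSTS_FILE is a hypothetical variable: it defaults to a scratch file so the
# sketch can be tried safely. Set HOSTS_FILE=/etc/hosts (as root) to apply it.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# add_host is an illustrative helper, not part of any package.
# It appends "ip fqdn shortname" only if the FQDN is not already listed.
add_host() {
    ip="$1"; fqdn="$2"; short="$3"
    touch "$HOSTS_FILE"
    # -F: fixed string (dots are literal), -w: whole-word match
    if ! grep -qwF "$fqdn" "$HOSTS_FILE"; then
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$HOSTS_FILE"
    fi
}

add_host 192.168.191.112 node1.liaobin.com node1
add_host 192.168.191.113 node2.liaobin.com node2
```

Run the same script on both nodes so that each can resolve the other by name.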
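2). Time synchronization. An ntpd server is the proper way to do this (for convenience I simply used the date command to set both nodes to the same time)
# date -s 11:11:11
3). Passwordless ssh login between node1 and node2
node1:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
node2:
# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
4). Install corosync and pacemaker (pointing the yum repository at CD1 is enough)
# yum install -y corosync pacemaker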
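II. Configure corosync (on node1)
1). Copy the sample configuration to create the configuration file
# cd /etc/corosync
# cp corosync.conf.example corosync.conf
2). Edit /etc/corosync/corosync.conf (only the settings that need to be changed or added are listed)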
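------------ Modify ------------
# enable authentication and encryption (if on, a key must be generated with corosync-keygen)
secauth: on
# network address of the cluster subnet; it must be the network address, not a host address
bindnetaddr: 192.168.191.0
# multicast address used to carry heartbeat messages
mcastaddr: 239.25.11.12
# log to a local file
to_logfile: yes
# location of the log file
logfile: /var/log/cluster/corosync.log
# do not log through rsyslog
to_syslog: no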
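------------ Add ------------
# pacemaker runs as a corosync plugin and starts together with corosync
service {
    ver: 0
    name: pacemaker
    # use_mgmtd: yes    (run as a daemon; it did not seem to have any effect, so it is optional)
}
# optional: run as the root user
aisexec {
    user: root
    group: root
}
3). Run corosync-keygen to generate the key file authkey (just run it as is)
# corosync-keygen
4). Copy the corosync configuration file and authkey to the other node, node2
# cd /etc/corosync/
# scp corosync.conf authkey node2:/etc/corosync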
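The bindnetaddr value in step 2) has to be the network address, which is the node's IP address AND-ed with its netmask. A minimal sketch of that calculation follows. It assumes a netmask of 255.255.255.0, which the post does not state, and network_addr is an illustrative helper name.

```shell
#!/bin/sh
# network_addr is a hypothetical helper: it ANDs each octet of an IPv4
# address with the matching octet of the netmask.
network_addr() {
    ip="$1"; mask="$2"
    # Split both dotted quads on "." into eight positional parameters.
    old_ifs="$IFS"; IFS=.
    set -- $ip $mask
    IFS="$old_ifs"
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# node1's address with the assumed /24 netmask
network_addr 192.168.191.112 255.255.255.0    # prints 192.168.191.0
```

With a different netmask the result changes, so check the real mask with ifconfig or ip addr before copying the value into corosync.conf.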
III. Verify that corosync starts correctly (run the checks on both node1 and node2)
1). Check that the corosync engine started:
# service corosync start; ssh node2 'service corosync start'
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Mar 26 21:30:29 corosync Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Mar 26 21:30:29 corosync Successfully read main configuration file '/etc/corosync/corosync.conf'.
Mar 26 21:31:06 corosync Corosync Cluster Engine exiting with status 0 at main.c:2055.
2). Check that the initial membership notifications were sent:
# grep TOTEM /var/log/cluster/corosync.log
Mar 26 21:30:29 corosync Initializing transport (UDP/IP Multicast).
Mar 26 21:30:29 corosync Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Mar 26 21:30:29 corosync The network interface is now up.
3). Check whether any errors occurred during startup. The error messages below mean that pacemaker will soon no longer run as a corosync plugin, so cman is recommended as the cluster infrastructure service; they can be safely ignored here.
# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Mar 26 15:41:56 corosync ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Mar 26 15:41:56 corosync ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN
4). Check that pacemaker started:
# grep pcmk_startup /var/log/cluster/corosync.log
Mar 26 15:41:56 corosync info: pcmk_startup: CRM: Initialized
Mar 26 15:41:56 corosync Logging: Initialized pcmk_startup
Mar 26 15:41:56 corosync info: pcmk_startup: Maximum core file size is: 18446744073709551615
Mar 26 15:41:56 corosync info: pcmk_startup: Service: 9
Mar 26 15:41:56 corosync info: pcmk_startup: Local hostname: node1.liaobin.com
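The grep checks in this section can be bundled into one pass over the log. The sketch below looks only for the three marker strings used in the greps above; check_corosync_log is an illustrative name, and the function does not look for ERROR lines.

```shell
#!/bin/sh
# check_corosync_log is a hypothetical helper: it reports whether each
# startup marker from steps 1), 2) and 4) appears in the given log file.
# It returns 0 only if all markers are present.
check_corosync_log() {
    log="$1"
    status=0
    for marker in "Corosync Cluster Engine" "TOTEM" "pcmk_startup"; do
        if grep -q "$marker" "$log"; then
            echo "ok:      $marker"
        else
            echo "missing: $marker"
            status=1
        fi
    done
    return "$status"
}

# Usage on a cluster node:
#   check_corosync_log /var/log/cluster/corosync.log
```

Run it on each node after starting corosync; a "missing" line points to the step that needs a closer look.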
IV. Install crmsh (on both nodes, so the status can be viewed from either)
Note: crmsh depends on pssh, so download both. Package versions: pssh-2.3.1-2.el6.x86_64.rpm, crmsh-2.1-1.6.x86_64.rpm
1). Install:
# yum -y --nogpgcheck localinstall crmsh*.rpm pssh*.rpm
2). Check node status:
# crm status
Last updated: Thu Mar 26 21:45:07 2015
Last change: Thu Mar 26 17:21:29 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum      (the DC is node2)
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ node1.liaobin.com node2.liaobin.com ]      (node1 and node2 are both online)
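Reading the Online line by eye works for two nodes, but the same check can be scripted. This is a sketch only: nodes_online is an illustrative name, and it assumes the "Online: [ ... ]" line format shown in the status output above.

```shell
#!/bin/sh
# nodes_online is a hypothetical helper: it reads `crm status` output on
# stdin and succeeds only if every node given as an argument is listed
# on the "Online:" line.
nodes_online() {
    online=$(grep '^Online:')
    for node in "$@"; do
        case "$online" in
            *" $node "*) ;;    # node name found between spaces
            *) echo "not online: $node"; return 1 ;;
        esac
    done
    return 0
}

# Usage on a cluster node (crm must be installed):
#   crm status | nodes_online node1.liaobin.com node2.liaobin.com
```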
V. Configure the NFS service and the httpd service on node1 and node2
1). NFS server:
# mkdir /shared
# echo "/shared   192.168.191.0/24(rw)" >> /etc/exports
# service nfs restart
2). node1 (httpd is stopped and disabled at boot because the cluster will start it):
# echo "nfs" > /var/www/html/index.html
# chkconfig httpd off
# service httpd stop
3). node2:
# echo "nfs" > /var/www/html/index.html
# chkconfig httpd off
# service httpd stop

VI. Configure the cluster (on node1)
1). Disable STONITH. It is enabled by default, but no STONITH device is configured here, so the default configuration cannot be used as is.
Verify the configuration (if the errors below appear, STONITH needs to be disabled):
# crm_verify -L -V
   error: unpack_resources:   Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:   Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:   NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
  -V may provide more details
Disable it:
# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
(no-quorum-policy=ignore is needed only in a two-node cluster; do not set it when there are more than two nodes.)
2). View the current configuration:
# crm configure show
3). Add the resources
# crm
Enter configure mode
crm(live)# configure
Configure the IP address, monitored every 10s with a 20s timeout
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.191.199 op monitor interval=10s timeout=20s
After each change, run verify to check for errors
crm(live)configure# verify
Configure the NFS mount, with a 60s timeout for start and a 60s timeout for stop
crm(live)configure# primitive nfsserver ocf:heartbeat:Filesystem params device=192.168.191.111:/shared directory=/var/www/html fstype=nfs op monitor interval=20s timeout=40s op start timeout=60s op stop timeout=60s
crm(live)configure# verify
Configure the httpd service, monitored every 10s with a 20s timeout
crm(live)configure# primitive webserver lsb:httpd op monitor interval=10s timeout=20s
crm(live)configure# verify
Create a group named webservice containing the webip, nfsserver and webserver resources. The order matters, because group members are started in the order listed.
crm(live)configure# group webservice webip nfsserver webserver
Give the webservice group a location preference of 100 for node1, so that node1 is the default node when the group starts
crm(live)configure# location web_on_node1 webservice rule 100: #uname eq node1.liaobin.com
Set the stickiness to 50, so that after node1 goes offline and the resources move to node2, node1 does not grab them back when it returns. If node1 is much more powerful than node2, you can leave this unset and let node1 take the resources back.
crm(live)configure# property default-resource-stickiness=50
Show the defined resources
crm(live)configure# show
Commit the changes so that they take effect
crm(live)configure# commit
Use cd .. to return to the upper level
crm(live)configure# cd ..
Use status to check the state; the resources are now running on node1
crm(live)# status
Last updated: Thu Mar 26 22:34:05 2015
Last change: Thu Mar 26 22:22:51 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ node1.liaobin.com node2.liaobin.com ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):      Started node1.liaobin.com
     nfsserver      (ocf::heartbeat:Filesystem):      Started node1.liaobin.com
     webserver      (lsb:httpd):      Started node1.liaobin.com
Test access from a browser at the floating IP, 192.168.191.199.
Fail over to node2:
Enter the node menu
crm(live)# node
Use the standby command to put node1 into standby mode (run on node1; with no argument, standby applies to the local node)
crm(live)node# standby
crm(live)node# cd ..
crm(live)# status
Last updated: Thu Mar 26 22:37:14 2015
Last change: Thu Mar 26 22:35:46 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured

Node node1.liaobin.com: standby
Online: [ node2.liaobin.com ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):      Started node2.liaobin.com
     nfsserver      (ocf::heartbeat:Filesystem):      Started node2.liaobin.com
     webserver      (lsb:httpd):      Started node2.liaobin.com
The resources have now moved to node2.
Browser test: the page loads successfully, which shows that the high-availability cluster is working.

Switch back to node1:
Use the online command to bring node1 back online
crm(live)# node
crm(live)node# online
crm(live)node# cd ..
crm(live)# status
Last updated: Thu Mar 26 22:39:06 2015
Last change: Thu Mar 26 22:38:58 2015
Stack: classic openais (with plugin)
Current DC: node2.liaobin.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured

Online: [ node1.liaobin.com node2.liaobin.com ]

 Resource Group: webservice
     webip      (ocf::heartbeat:IPaddr):      Started node2.liaobin.com
     nfsserver      (ocf::heartbeat:Filesystem):      Started node2.liaobin.com
     webserver      (lsb:httpd):      Started node2.liaobin.com
The resources are still on node2; they did not move back to node1 despite its higher location preference.
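A likely reason the group stays on node2 even though 50 is less than 100: Pacemaker's documentation describes a group's stickiness as the sum of the stickiness of its active members. Under that assumption, the comparison is 3 x 50 against the location score of 100. The sketch below only does this arithmetic; it is not a reimplementation of Pacemaker's scoring.

```shell
#!/bin/sh
# Compare the group's total stickiness with the location preference.
# Assumes each active group member contributes its own stickiness.
stickiness=50         # default-resource-stickiness set in section VI
members=3             # webip, nfsserver, webserver
location_score=100    # score of the web_on_node1 location rule

stay_score=$(( stickiness * members ))
echo "score for staying on the current node: $stay_score"
echo "score for moving back to node1:        $location_score"

if [ "$stay_score" -gt "$location_score" ]; then
    echo "stickiness wins: the group stays where it is"
else
    echo "location preference is not outweighed: the group may move back to node1"
fi
```

By the same reasoning, a stickiness of 30 would give 90, which no longer outweighs the location score of 100, so the group would be expected to return to node1.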


