heartbeat v1 + NFS: building a highly available web cluster (Part 1)

Posted by cxg518 on 2019-1-7 11:47:21

http://s3.运维网.com/wyfs02/M02/24/88/wKioL1NRNRniGhz1AADz945EOfw699.jpg

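Part 1: The layers involved in cluster decision-making

· Messaging Layer: carries heartbeat messages between the nodes.
· ha_aware: cluster-aware software that can use the underlying messaging layer directly, calling its API to make cluster decisions on its own.
· DC (Designated Coordinator): a coordinator elected by the nodes, so that the remaining nodes do not fight over resources after the primary node goes down.
· CRM (Cluster Resource Manager, "the chairman"): makes the decisions. In an HA cluster no resource should start by itself; the CRM decides whether it starts.
· LRM (Local Resource Manager, "the general manager"): carries out the CRM's decisions. It does the actual management of local resources: starting, stopping, and monitoring them.
· RA (Resource Agent): a tool, usually a script, that the CRM invokes to manage one resource on a node. Every configured resource depends on such a script or program. It must accept the four arguments {start|stop|status|restart}, and status may only report running or stopped.
· failover: moving resources away from a failed node.
· failback: moving resources back once the failed node has recovered.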
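The RA contract described above (four arguments, with status printing only running or stopped) can be sketched roughly as follows. This is only an illustration of the interface, not one of the agents shipped with heartbeat: the function name ra and the state file that stands in for a real daemon are hypothetical.

```shell
#!/bin/sh
# Hypothetical resource agent skeleton. A state file stands in for a real daemon.
STATE_FILE="${STATE_FILE:-${TMPDIR:-/tmp}/demo-ra.state}"

ra() {
    case "$1" in
        start)   : > "$STATE_FILE" ;;      # mark the resource as running
        stop)    rm -f "$STATE_FILE" ;;    # mark the resource as stopped
        status)
            # Only two answers are allowed: running or stopped.
            if [ -e "$STATE_FILE" ]; then echo running; else echo stopped; fi ;;
        restart) ra stop && ra start ;;
        *)       echo "Usage: ra {start|stop|status|restart}" >&2; return 1 ;;
    esac
}
```

Saved as a script, the file would end with a line such as ra "$@" so that the CRM can call it with one of the four arguments.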

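Software used at each layer
· Messaging Layer:
heartbeat v1, v2, v3
corosync (OpenAIS): a service-availability standards body defines open industry standards; to make the ideas familiar, OpenAIS published a sample implementation, and corosync came out of it.
cman (Red Hat): cluster manager.
· CRM (cluster resource manager): can be used independently as long as its interface is compatible with the messaging layer.
heartbeat v1: haresources, a configuration file; it is only a configuration interface.
heartbeat v2: crm (every node runs the crmd process; port 5566; the client crmsh (shell) is unpleasant to use). Because it is hard to configure, a front end, heartbeat-GUI, was provided.
heartbeat v3: heartbeat + pacemaker + cluster-glue (the glue layer):
pacemaker: split out into an independent project.
Configuration interfaces: third-party open configuration tools.
CLI: command-line tools, crm (SUSE) and pcs; written in Python.
GUI: hawk (web interface), LCMC (desktop interface), pacemaker-mgmt
cman + rgmanager:
resource group manager, with failover domains
Configuration interface:
RHCS: Red Hat Cluster Suite
Configuration interface: Conga (covers the full lifecycle: installation, configuration, resources, startup)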
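· RA types:
heartbeat legacy: heartbeat's traditional type
LSB: scripts under /etc/rc.d/init.d/*
OCF (Open Cluster Framework)
provider: pacemaker (the provider is the organization that supplies the agent scripts)
   linbit: another provider of resource agents
STONITH: node fencing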
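· keepalived: lightweight, and different in style from the tools above. It uses VRRP (Virtual Router Redundancy Protocol) to move IP address resources between nodes, and provides high availability through its own interface for calling scripts.
Typical uses:
   keepalived+ipvs
   keepalived+haproxy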
· HA cluster solutions on RHEL or CentOS:
RHEL 5:
   Bundled: RHCS (cman+rgmanager)
   Third-party options: corosync+pacemaker, heartbeat (v1 or v2), keepalived
RHEL 6:
   Bundled: RHCS (cman+rgmanager)
      corosync+rgmanager
      cman+pacemaker
      heartbeat v3 + pacemaker
      keepalived
How to choose:
   HA for front-end load balancers: keepalived
   Large HA clusters: corosync or (cman+pacemaker), which support up to 100 nodes

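With shared storage, file system metadata is kept in memory for speed. If two cluster nodes write to the same data at the same time, the file system can be corrupted.

Ways to implement resource fencing:
Hardware may be required: 1. a hardware management chip (needs an authentication mechanism); 2. a power switch that cuts the node's power; 3. ssh.

A two-node cluster is a special case: when the link between the nodes breaks, neither side can decide on its own. A ping node can be used to break the tie.
Quorum devices:
1 ping node, ping group
2 qdisk (Red Hat): a chosen disk
When the standby node takes over, the other cluster nodes must be prevented from accessing the resources, otherwise the file system may be corrupted. They therefore have to be killed off completely.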
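The ping-node tie-break for a two-node cluster can be sketched as a small decision function. This only illustrates the reasoning; heartbeat has its own mechanism for it, and the name decide_action and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# decide_action PEER_ALIVE PINGNODE_REACHABLE   (each argument is "yes" or "no")
decide_action() {
    peer=$1
    pingnode=$2
    if [ "$peer" = yes ]; then
        echo keep-current-role    # heartbeats still arrive, nothing to decide
    elif [ "$pingnode" = yes ]; then
        echo take-over            # peer is silent but we still see the network
    else
        echo stand-down           # we are probably the isolated node
    fi
}
```

In practice the second argument could come from something like ping -c 1 172.16.0.1 (the address that shows up as a ping node in the logs further down), but that wiring is left out here.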

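Part 2: Configuration and deployment
   1. Make sure both nodes have correct hostnames and name resolution, and that their clocks agree. An NTP server can keep the time in sync; install one yourself. Apply the same configuration on both nodes.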

# ntpdate 172.16.0.1
17 Apr 22:12:41 ntpdate: adjust time server 172.16.0.1 offset -0.001233 sec
# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1.nyist.com
# uname -n
node1.nyist.com
# vim /etc/hosts
172.16.20.31 node1.nyist.com node1
172.16.20.32 node2.nyist.com node2
# ping node1
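PING node1.nyist.com (172.16.20.31) 56(84) bytes of data.
64 bytes from node1.nyist.com (172.16.20.31): icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from node1.nyist.com (172.16.20.31): icmp_seq=2 ttl=64 time=0.0
   2. Set up key-based ssh authentication between the two nodes, so that each node can manage the other with ssh. This just saves effort; if you do not mind typing the password every time, you can skip this step. (Do the following on both nodes.)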
# ssh-keygen -t rsa -P ''
# ls .ssh/
authorized_keys  id_rsa  id_rsa.pub
# ssh-copy-id -i .ssh/id_rsa.pub root@node2.nyist.com
# ssh node2
Last login: Thu Apr 17 22:24:51 2014 from 172.16.20.55
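Before going further it may be worth confirming that each node really has both names in its hosts file. A minimal sketch, assuming a plain hosts(5)-style file; has_host_entry is a hypothetical helper, not a standard tool.

```shell
#!/bin/sh
# has_host_entry FILE IP NAME: succeed if FILE maps IP to NAME (hosts(5) format).
has_host_entry() {
    awk -v ip="$2" -v name="$3" '
        /^[[:space:]]*#/ { next }      # skip comment lines
        $1 == ip {
            # every field after the address is a name or alias
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example, has_host_entry /etc/hosts 172.16.20.31 "$(uname -n)" should succeed on node1 if the hostname and the hosts file agree.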
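   3. Now install the packages (on both nodes).
# yum install perl-TimeDate PyXML libnet net-snmp-libs
Resolve the dependencies with yum first, then install heartbeat with rpm. Do not install heartbeat itself with yum: because of version differences, yum would replace some of the packages heartbeat depends on.
# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
   4. Now configure it. Installing heartbeat does not generate any configuration files, but it ships sample files, which need to be copied into place. Note that authkeys must have mode 600, because it is security-sensitive.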
# cp /usr/share/doc/heartbeat-2.1.4/authkeys /etc/ha.d
# cp /usr/share/doc/heartbeat-2.1.4/ha.cf /etc/ha.d
# cp /usr/share/doc/heartbeat-2.1.4/haresources /etc/ha.d
   5. Edit the configuration files to suit your requirements.
# vim authkeys
auth 2
#1 crc
2 sha1 redhat
#3 md5 Hello!
# chmod 600 authkeys
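The authkeys above uses a fixed word as the shared secret. A random key is probably a better idea on a real network. A minimal sketch, assuming /dev/urandom and od are available; write_authkeys is a hypothetical helper and takes the target directory as its argument, so it can be tried outside /etc/ha.d first.

```shell
#!/bin/sh
# write_authkeys DIR: create DIR/authkeys with a random sha1 key, mode 600.
write_authkeys() {
    # 16 random bytes rendered as 32 hex characters
    key=$(od -An -N16 -tx1 /dev/urandom | tr -d ' \n')
    # the restrictive umask keeps the file private from the moment it is created
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$key" > "$1/authkeys" )
    chmod 600 "$1/authkeys"
}
```

The same file then has to be copied to the other node, since both nodes must share the key.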
# vim ha.cf--------main configuration file
#debugfile /var/log/ha-debug------enable when debugging
#    File to write other messages to
#
logfile /var/log/ha-log---------------log file location
#logfacility local0--------------syslog facility; local7 would also work
keepalive 1500ms-------------------interval between heartbeats
#
#    deadtime: how long-to-declare-host-dead?
deadtime 6--------------------how long before a silent node is declared dead
warntime 3-------------------how long before a late heartbeat triggers a warning
#    serial  serialportname ...---used with a serial heartbeat cable
#serial /dev/ttyS0    # Linux
#serial /dev/cuaa0    # FreeBSD
#serial /dev/cuad0    # FreeBSD 6.x
#serial /dev/cua/a    # Solaris
mcast eth0 225.0.100.1 694 1 0---------send heartbeats by multicast
auto_failback on-------------------whether resources move back automatically
#stonith baytech /etc/ha.d/conf/stonith.baytech-------STONITH device; enable it if you have one
node node1.nyist.com-------------------the cluster nodes, one per line
node node2.nyist.com
#debug 1------------------------------------debug level
#compression bz2-------------compression
#compression_threshold 2
Give each of the two nodes a simple test page, then check that httpd serves it correctly. Remember that the service must not be started automatically at boot.
# curl node1
172.16.20.31@@@node1
# curl node2
curl: (7) couldn't connect to host
# curl node2
172.16.20.32@@@node2
# chkconfig httpd off
# ssh node2 'chkconfig httpd off'
   6. Define the resources so the cluster can start providing the service.
# vim haresources
node1.nyist.com 172.16.20.100/16/eth0 httpd
This defines the virtual IP 172.16.20.100 and two resources: 1. the IP address, 2. the httpd service.
   IP: heartbeat looks in /etc/ha.d/resource.d for the script that configures the address.
   httpd: heartbeat looks for the service script in the same directory and in /etc/rc.d/init.d/.
   # ls /etc/ha.d/resource.d/
   apache          IPaddr             OCF
   AudibleAlarm    IPaddr2            portblock
   db2             IPsrcaddr          Raid1
   Delay           IPv6addr           SendArp
   Filesystem      LinuxSCSI          ServeRAID
   hto-mapfuncs    LVM                WAS
   ICP             LVSSyncDaemonSwap  WinPopup
   ids             MailTo             Xinetd
   Copy the files to node2:
   # scp ha.cf authkeys haresources root@node2:/etc/ha.d
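After copying, the three files are expected to be identical on both nodes, and a quick comparison can catch a forgotten copy. A rough sketch: report_mismatch is a hypothetical helper that compares two directories, so the peer's /etc/ha.d would first have to be fetched into a scratch directory (for example with scp).

```shell
#!/bin/sh
# report_mismatch DIR_A DIR_B: print the name of each cluster config file
# that is missing on one side or differs between the two directories.
report_mismatch() {
    for f in ha.cf authkeys haresources; do
        cmp -s "$1/$f" "$2/$f" || echo "$f"
    done
}
```

No output means the two copies match.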
   7. Start the service
# service heartbeat start
logd is already running
Starting High-Availability services:
2014/04/17_23:46:35 INFO:Resource is stopped
2014/04/17_23:46:35 INFO:Resource is stopped
Done.
# ssh node2 'service heartbeat start'
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:02:06:8E
          inet addr:172.16.20.31  Bcast:172.16.255.255  Mask:255.255.0.0
          inet6 addr: fe80::20c:29ff:fe02:68e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:33997 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6394 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:24361024 (23.2 MiB)  TX bytes:777306 (759.0 KiB)
eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:02:06:8E
          inet addr:172.16.20.100  Bcast:172.16.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:78 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:9562 (9.3 KiB)  TX bytes:9562 (9.3 KiB)
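To tell which node currently holds the virtual IP without reading the whole listing, the output can be filtered. A small sketch: holds_vip is a hypothetical helper that reads interface output on standard input and accepts both the older "inet addr:" form shown above and the "inet a.b.c.d/nn" form printed by ip addr.

```shell
#!/bin/sh
# holds_vip IP: succeed if the interface listing on stdin contains IP.
holds_vip() {
    awk -v ip="$1" '
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)    # net-tools style: "inet addr:1.2.3.4"
            sub(/\/.*/, "", addr)      # iproute2 style:  "inet 1.2.3.4/16"
            if (addr == ip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: ifconfig | holds_vip 172.16.20.100 && echo "VIP is on this node"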

   8. The log shows one node trying to reach its peer; once the timeout expires, the peer is automatically treated as dead.
# tail /var/log/ha-log -f
heartbeat: 2014/04/18_18:41:12 info: **************************
heartbeat: 2014/04/18_18:41:12 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2014/04/18_18:41:12 info: heartbeat: version 2.1.4
heartbeat: 2014/04/18_18:41:12 info: Heartbeat generation: 1397817331
heartbeat: 2014/04/18_18:41:12 info: glib: UDP multicast heartbeat started for group 228.15.100.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat: 2014/04/18_18:41:12 info: glib: ping heartbeat started.
heartbeat: 2014/04/18_18:41:12 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2014/04/18_18:41:12 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2014/04/18_18:41:12 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2014/04/18_18:41:12 info: Local status now set to: 'up'
heartbeat: 2014/04/18_18:41:13 info: Link 172.16.0.1:172.16.0.1 up.
heartbeat: 2014/04/18_18:41:13 info: Status update for node 172.16.0.1: status ping
heartbeat: 2014/04/18_18:41:31 WARN: node node2.nyist.com: is dead
heartbeat: 2014/04/18_18:41:31 info: Comm_now_up(): updating status to active
heartbeat: 2014/04/18_18:41:31 info: Local status now set to: 'active'
heartbeat: 2014/04/18_18:41:31 WARN: No STONITH device configured.
heartbeat: 2014/04/18_18:41:31 WARN: Shared disks are not protected.
heartbeat: 2014/04/18_18:41:31 info: Resources being acquired from node2.nyist.com.
harc:2014/04/18_18:41:31 info: Running /etc/ha.d/rc.d/status status
mach_down:2014/04/18_18:41:31 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down:2014/04/18_18:41:31 info: mach_down takeover complete for node node2.nyist.com.
heartbeat: 2014/04/18_18:41:31 info: mach_down takeover complete.
heartbeat: 2014/04/18_18:41:31 info: Initial resource acquisition complete (mach_down)
IPaddr:2014/04/18_18:41:31 INFO:Resource is stopped
heartbeat: 2014/04/18_18:41:31 info: Local Resource acquisition completed.
harc:2014/04/18_18:41:31 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp:2014/04/18_18:41:31 received ip-request-resp 172.16.20.100/16/eth0 OK yes
ResourceManager:2014/04/18_18:41:31 info: Acquiring resource group: node1.nyist.com 172.16.20.100/16/eth0 httpd
IPaddr:2014/04/18_18:41:31 INFO:Resource is stopped
ResourceManager:2014/04/18_18:41:31 info: Running /etc/ha.d/resource.d/IPaddr 172.16.20.100/16/eth0 start
IPaddr:2014/04/18_18:41:31 INFO: Using calculated netmask for 172.16.20.100: 255.255.0.0
IPaddr:2014/04/18_18:41:31 INFO: eval ifconfig eth0:0 172.16.20.100 netmask 255.255.0.0 broadcast 172.16.255.255
IPaddr:2014/04/18_18:41:31 INFO:Success
ResourceManager:2014/04/18_18:41:31 info: Running /etc/init.d/httpd  start
heartbeat: 2014/04/18_18:41:41 info: Local Resource acquisition completed. (none)
heartbeat: 2014/04/18_18:41:41 info: local resource transition completed.
heartbeat: 2014/04/18_18:43:42 info: Link node2.nyist.com:eth0 up.
heartbeat: 2014/04/18_18:43:42 info: Status update for node node2.nyist.com: status init
heartbeat: 2014/04/18_18:43:42 info: Status update for node node2.nyist.com: status up
harc:2014/04/18_18:43:42 info: Running /etc/ha.d/rc.d/status status
harc:2014/04/18_18:43:42 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2014/04/18_18:43:43 info: Status update for node node2.nyist.com: status active
harc:2014/04/18_18:43:43 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2014/04/18_18:43:44 info: remote resource transition completed.
heartbeat: 2014/04/18_18:43:44 info: node1.nyist.com wants to go standby
heartbeat: 2014/04/18_18:43:44 info: standby: node2.nyist.com can take our foreign resources
heartbeat: 2014/04/18_18:43:44 info: give up foreign HA resources (standby).
heartbeat: 2014/04/18_18:43:45 info: foreign HA resource release completed (standby).
heartbeat: 2014/04/18_18:43:45 info: Local standby process completed .
heartbeat: 2014/04/18_18:43:45 WARN: 1 lost packet(s) for
heartbeat: 2014/04/18_18:43:45 info: remote resource transition completed.
heartbeat: 2014/04/18_18:43:45 info: No pkts missing from node2.nyist.com!
heartbeat: 2014/04/18_18:43:45 info: Other node completed standby takeover of foreign resources.
Ourselves
# tail /var/log/ha-log
heartbeat: 2014/04/18_18:43:41 info: **************************
heartbeat: 2014/04/18_18:43:41 info: Configuration validated. Starting heartbeat 2.1.4
heartbeat: 2014/04/18_18:43:41 info: heartbeat: version 2.1.4
heartbeat: 2014/04/18_18:43:41 info: Heartbeat generation: 1397749464
heartbeat: 2014/04/18_18:43:41 info: glib: UDP multicast heartbeat started for group 228.15.100.1 port 694 interface eth0 (ttl=1 loop=0)
heartbeat: 2014/04/18_18:43:41 info: glib: ping heartbeat started.
heartbeat: 2014/04/18_18:43:41 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2014/04/18_18:43:41 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat: 2014/04/18_18:43:41 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat: 2014/04/18_18:43:42 info: Local status now set to: 'up'
heartbeat: 2014/04/18_18:43:42 info: Link 172.16.0.1:172.16.0.1 up.
heartbeat: 2014/04/18_18:43:42 info: Status update for node 172.16.0.1: status ping
heartbeat: 2014/04/18_18:43:43 info: Link node1.nyist.com:eth0 up.
heartbeat: 2014/04/18_18:43:43 info: Status update for node node1.nyist.com: status active
harc:2014/04/18_18:43:43 info: Running /etc/ha.d/rc.d/status status
heartbeat: 2014/04/18_18:43:43 info: Comm_now_up(): updating status to active
heartbeat: 2014/04/18_18:43:43 info: Local status now set to: 'active'
heartbeat: 2014/04/18_18:43:44 info: remote resource transition completed.
heartbeat: 2014/04/18_18:43:44 info: remote resource transition completed.
heartbeat: 2014/04/18_18:43:44 info: Local Resource acquisition completed. (none)
heartbeat: 2014/04/18_18:43:44 info: node1.nyist.com wants to go standby
heartbeat: 2014/04/18_18:43:45 info: standby: acquire resources from node1.nyist.com
heartbeat: 2014/04/18_18:43:45 info: acquire local HA resources (standby).
heartbeat: 2014/04/18_18:43:45 info: local HA resource acquisition completed (standby).
heartbeat: 2014/04/18_18:43:45 info: Standby resource acquisition done .
heartbeat: 2014/04/18_18:43:45 info: Initial resource acquisition complete (auto_failback)
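heartbeat: 2014/04/18_18:43:45 info: remote resource transition completed.
   9. Verify failover in a browser.
   When node1 is shut down, the resources move to node2. When node1 comes back up, they move back to node1, because node1 is named as the preferred node in haresources and auto_failback is on.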
http://s3.运维网.com/wyfs02/M00/24/87/wKiom1NRDaTA7LwEAAD6tOUwPAM934.jpg
http://s3.运维网.com/wyfs02/M02/24/87/wKioL1NRDX3TytxdAACIECf6x9c361.jpg
http://s3.运维网.com/wyfs02/M01/24/87/wKiom1NRDaejk_4tAABF8dWnd0U662.jpg
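The same check can be done from a terminal by polling the virtual IP while one node is stopped. A rough sketch, assuming curl is installed; probe_vip and the FETCH variable are hypothetical names, and FETCH exists only so the fetch command can be swapped out.

```shell
#!/bin/sh
# FETCH is the command used to retrieve the page; curl is an assumption.
FETCH="${FETCH:-curl -s --max-time 2}"

# probe_vip URL COUNT: fetch URL COUNT times, one second apart,
# printing the page body each time, or DOWN when the fetch fails.
probe_vip() {
    i=0
    while [ "$i" -lt "$2" ]; do
        $FETCH "$1" 2>/dev/null || echo DOWN
        i=$((i + 1))
        [ "$i" -lt "$2" ] && sleep 1
    done
    return 0
}
```

Running probe_vip http://172.16.20.100/ 30 while stopping heartbeat on node1 should show the point at which the answers start coming from the other node.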
　　
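   Using NFS as shared storage for the web service (assuming the share must not be mounted by both nodes at the same time)

· Set up the server that provides NFS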
　　

# mkdir /www/htdoc -p
# setfacl -m u:apache:rwx /www/htdoc/
# vim /etc/exports
/www/htdoc 172.16.0.0/16(rw)
# vim /www/htdoc/index.html
from NFS server
# service nfs start
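One detail in /etc/exports is easy to get wrong: whitespace between a client and its option list changes the meaning, because the options then no longer belong to that client. A small sketch that flags such lines; find_detached_options is a hypothetical helper name.

```shell
#!/bin/sh
# find_detached_options FILE: print (with line numbers) the exports lines in
# which an option list "(...)" is separated from the preceding word by whitespace.
find_detached_options() {
    grep -nE '[^[:space:]][[:space:]]+\(' "$1"
}
```

Running find_detached_options /etc/exports before starting the service prints nothing when every option list is attached to its client.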

· Edit the resources on each node:
　　

# vim haresources
node1.nyist.com 172.16.20.100/16/eth0 Filesystem::172.16.20.32:/www/htdoc::/var/www/html::nfs httpd
This defines three resources, and their order matters:
172.16.20.100/16/eth0
172.16.20.32:/www/htdoc
httpd　　
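The ordering can be made explicit with a short sketch. As far as I know, heartbeat v1 starts the resources on a haresources line from left to right and stops them from right to left, so the address and the mount are in place before httpd starts. start_order and stop_order are hypothetical helpers that only split the line; they do not talk to heartbeat.

```shell
#!/bin/sh
# start_order LINE: print the resources of one haresources line, one per line,
# in the order they are started (left to right).
start_order() {
    set -- $1        # split on whitespace; the first word is the preferred node
    shift            # drop the node name, keep the resources
    for r in "$@"; do echo "$r"; done
}

# stop_order LINE: the same resources in reverse, the order they are stopped.
stop_order() {
    start_order "$1" | awk '{ a[NR] = $0 } END { for (i = NR; i > 0; i--) print a[i] }'
}
```

Feeding it the haresources line above lists the IP address first and httpd last for start, and the reverse for stop.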

　　

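· Restart the heartbeat service to see the result. In testing, stopping one node not only releases the VIP but also unmounts the NFS share automatically, so that the stopped node no longer holds on to the data.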
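Whether the share is really mounted (or really gone) on a node can be checked against the mount table. A minimal sketch, assuming a Linux /proc/mounts; is_nfs_mounted is a hypothetical helper, and its optional second argument exists only so a different mounts file can be supplied.

```shell
#!/bin/sh
# is_nfs_mounted MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTPOINT is an NFS mount.
is_nfs_mounted() {
    # fields in /proc/mounts: device, mount point, file system type, options, ...
    awk -v mp="$1" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit (found ? 0 : 1) }' \
        "${2:-/proc/mounts}"
}
```

For example, is_nfs_mounted /var/www/html should succeed on the active node and fail on the node that was stopped.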
　　

　　

　　

   Log messages from the failover

IPaddr:2014/04/18_22:10:40 INFO:Resource is stopped
ResourceManager:2014/04/18_22:10:40 info: Running /etc/ha.d/resource.d/IPaddr 172.16.20.100/16/eth0 start
IPaddr:2014/04/18_22:10:40 INFO: Using calculated netmask for 172.16.20.100: 255.255.0.0
IPaddr:2014/04/18_22:10:40 INFO: eval ifconfig eth0:0 172.16.20.100 netmask 255.255.0.0 broadcast 172.16.255.255
IPaddr:2014/04/18_22:10:40 INFO:Success
Filesystem:2014/04/18_22:10:40 INFO:Resource is stopped
ResourceManager:2014/04/18_22:10:40 info: Running /etc/ha.d/resource.d/Filesystem 172.16.20.62:/www/htdoc /var/www/html nfs start
Filesystem:2014/04/18_22:10:41 INFO: Running start for 172.16.20.62:/www/htdoc on /var/www/html
Filesystem:2014/04/18_22:10:41 INFO:Success
ResourceManager:2014/04/18_22:10:41 info: Running /etc/init.d/httpd  start
mach_down:2014/04/18_22:10:41 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down:2014/04/18_22:10:41 info: mach_down takeover complete for node node1.nyist.com.
heartbeat: 2014/04/18_22:10:41 info: mach_down takeover complete.
heartbeat: 2014/04/18_22:10:41 info: Initial resource acquisition complete (mach_down)
heartbeat: 2014/04/18_22:10:51 info: Local Resource acquisition completed. (none)
heartbeat: 2014/04/18_22:10:51 info: local resource transition completed.
   Test page
　　

http://s3.运维网.com/wyfs02/M00/24/89/wKiom1NRNeWjm3RWAABptmMllV8144.jpg
　　

　　
