Linux High Availability (HA) Clusters: Resource Management with heartbeat and the CRM, Explained

Posted by gaojinguan on 2019-1-6 15:46:09

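Outline
I. Environment Preparation
II. Topology
III. Prerequisites
IV. Installing the Software
V. Configuring heartbeat (CRM Resource Manager)
VI. The CRM Resource Manager
VII. The CRM GUI in Detail
VIII. HA Cluster Architecture Recap
IX. Configuring Resources with the CRM
X. Resource Constraints in the CRM
XI. Summary of CRM Resource Configuration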
　　

　　
　　
　　
I. Environment Preparation
1. Operating system
CentOS 5.5 x86_64, minimal install
Note: Heartbeat v2.x is generally used on CentOS 5.x, while CentOS 6.x generally uses Heartbeat v3.x.
2. Software (downloads)

[*]　　Heartbeat 2.1.4
[*]　　Heartbeat-gui 2.1.4
[*]　　Apache 2.2.3
[*]　　Xmanager Enterprise 4
3. Configure the EPEL YUM repository (on both nodes)
　　node1,node2:
# wget http://download.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
# rpm -ivh epel-release-5-4.noarch.rpm
warning: epel-release-5-4.noarch.rpm: Header V3 DSA signature: NOKEY, key ID 217521f6
Preparing...             ###########################################
1:epel-release       ###########################################
# rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-5
# yum list

4. Disable the firewall and SELinux (on both nodes)
　　node1,node2:
# service iptables stop
# vim /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#    enforcing - SELinux security policy is enforced.
#    permissive - SELinux prints warnings instead of enforcing.
#    disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#    targeted - Only targeted network daemons are protected.
#    strict - Full SELinux protection.
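SELINUXTYPE=targeted

Note: the SELinux setting takes effect after a reboot; run setenforce 0 to switch the running system to permissive mode right away, and chkconfig iptables off to keep the firewall off after a reboot.
II. Topology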
　　http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201042RZML.png
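Note: there are two nodes, node1 and node2. The VIP is 192.168.1.200, the test client is a Windows 7 host, and the NFS server is 192.168.1.208.
III. Prerequisites
1. Hostname resolution between the nodes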
　　node1,node2:
# vim /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1             localhost.localdomain localhost
::1          localhost6.localdomain6 localhost6
192.168.1.201   node1.test.com   node1
192.168.1.202   node2.test.com   node2

2. Time synchronization between the nodes
　　node1,node2:
# yum -y install ntp
# ntpdate 210.72.145.44
# date
Wed Aug  7 16:06:30 CST 2013

3. SSH trust between the nodes (passwordless login)
　　node1:
# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2.test.com
# ssh node2
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:EA:CE:79
          inet addr:192.168.1.202  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feea:ce79/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2591 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1931 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:556754 (543.7 KiB)  TX bytes:465692 (454.7 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)

node2:
# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1.test.com
# ssh node1
# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0C:29:23:76:4D
          inet addr:192.168.1.201  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe23:764d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4603 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3914 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:771007 (752.9 KiB)  TX bytes:728391 (711.3 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:32 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2672 (2.6 KiB)  TX bytes:2672 (2.6 KiB)

IV. Installing the Software
1. Client software (Windows 7 test machine)
Important: install Xmanager Enterprise 4, as shown below:
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201043CpVP.png
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201043l4nX.png
Note: install the complete Xmanager Enterprise 4 suite, not just Xshell; otherwise the CRM GUI used later will not open. Don't skip this!
2. Install heartbeat
heartbeat packages (* = must be installed):

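[*]　　heartbeat: core component *
[*]　　heartbeat-devel: development package
[*]　　heartbeat-gui: graphical management interface * (required here, since this article focuses on the graphical resource manager)
[*]　　heartbeat-ldirectord: for LVS high availability; generates the LVS rules automatically and health-checks the back-end real servers
[*]　　heartbeat-pils: plugin loading interface *
[*]　　heartbeat-stonith: STONITH ("shoot the other node in the head") interface *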
　　node1,node2:
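# yum -y install 'heartbeat*'

3. Install httpd (this article demonstrates a highly available web service because it is simple and easy to follow; a later article covers MySQL high availability)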
　　node1:
# yum install -y httpd
# service httpd start
Starting httpd:                                            [  OK  ]
# echo "node1.test.com" > /var/www/html/index.html

Test:
　　http://img1.运维网.com/attachment/201308/8/2033581_1375927894xfgl.png
# service httpd stop
Stopping httpd:                                            [  OK  ]
# chkconfig httpd off
# chkconfig httpd --list
httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off

Note: after testing, stop the service and disable it at boot, because httpd will be managed by heartbeat.
　　node2:
# yum install -y httpd
# service httpd start
Starting httpd:                                            [  OK  ]
# echo "node2.test.com" > /var/www/html/index.html

Test:
　　http://img1.运维网.com/attachment/201308/8/2033581_1375927895tJNh.png
# service httpd stop
Stopping httpd:                                            [  OK  ]
# chkconfig httpd off
# chkconfig httpd --list
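httpd           0:off   1:off   2:off   3:off   4:off   5:off   6:off

Note: after testing, stop the service and disable it at boot, because httpd will be managed by heartbeat.
V. Configuring heartbeat (CRM Resource Manager)
1. Configuration files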
# cd /etc/ha.d/
# ls
harc  rc.d  README.config  resource.d  shellfuncs  shellfuncs.rpmsave

Note: a fresh heartbeat install has no configuration files, but it ships sample files:
# cd /usr/share/doc/heartbeat-2.1.4/
# ls
apphbd.cf  COPYING           faqntips.txt         HardwareGuide.html  hb_report.txt       README             rsync.txt
authkeys   COPYING.LGPL      GettingStarted.html  HardwareGuide.txt   heartbeat_api.html  Requirements.html  startstop
AUTHORS    DirectoryMap.txt  GettingStarted.txt   haresources         heartbeat_api.txt   Requirements.txt
ChangeLog  faqntips.html     ha.cf                hb_report.html      logd.cf             rsync.html
# cp authkeys ha.cf /etc/ha.d/

Note: two of these sample files are the ones we need, authkeys and ha.cf:

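[*]　　authkeys: the authentication key file shared by the nodes. Not just any server may join the cluster; every member must authenticate
[*]　　ha.cf: heartbeat's main configuration file
2. Configure authkeys
# dd if=/dev/random bs=512 count=1 | openssl md5   # generate a random key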
0+1 records in
0+1 records out
128 bytes (128 B) copied, 0.000214 seconds, 598 kB/s
a4d20b0dd3d5e35e0f87ce4266d1dd64
# vim authkeys
#auth 1
#1 crc
#2 sha1 HI!
#3 md5 Hello!
auth 1
1 md5 a4d20b0dd3d5e35e0f87ce4266d1dd64
# chmod 600 authkeys   # the key file must have mode 600
# ll
total 56
-rw------- 1 root root   691 08-07 16:45 authkeys
-rw-r--r-- 1 root root 10539 08-07 16:42 ha.cf
-rwxr-xr-x 1 root root   745 2010-03-21 harc
drwxr-xr-x 2 root root  4096 08-07 16:21 rc.d
-rw-r--r-- 1 root root   692 2010-03-21 README.config
drwxr-xr-x 2 root root  4096 08-07 16:21 resource.d
-rw-r--r-- 1 root root  7862 2010-03-21 shellfuncs
-rw-r--r-- 1 root root  7862 2010-03-21 shellfuncs.rpmsave

3. Configure ha.cf
# vim ha.cf
Three changes are needed (everything else can stay at its default):
(1). How heartbeat messages are sent (multicast here)
mcast eth0 225.100.100.100 694 1 0
(2). The cluster members
node node1.test.com
node node2.test.com
(3). Enable the CRM resource manager
crm respawn　　
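Steps 2 and 3 can also be scripted so both files come out the same way on every run. This is a minimal sketch, assuming bash and GNU coreutils; the helper name write_ha_configs is hypothetical. It swaps the dd | openssl md5 pipeline above for md5sum over /dev/urandom, which also yields 32 random hex characters but does not block when entropy is low. It writes into a scratch directory and emits only the three ha.cf directives discussed above, so merge them into the sample ha.cf rather than replacing it, because the sample file carries other settings you may want to keep.

```shell
#!/usr/bin/env bash
# Sketch: generate authkeys and the CRM-related ha.cf lines into a directory.
# write_ha_configs is a hypothetical helper, not a heartbeat tool.

write_ha_configs() {
  local dir=$1 key
  mkdir -p "$dir"

  # 32 hex characters: md5 digest of 512 random bytes
  key=$(head -c 512 /dev/urandom | md5sum | cut -d' ' -f1)

  # authkeys: use method 1, which is md5 with the random key
  printf 'auth 1\n1 md5 %s\n' "$key" > "$dir/authkeys"
  chmod 600 "$dir/authkeys"   # heartbeat refuses to start if others can read the key file

  # ha.cf: mcast <dev> <group> <port> <ttl> <loop>, the two members, CRM on
  cat > "$dir/ha.cf" <<'EOF'
mcast eth0 225.100.100.100 694 1 0
node node1.test.com
node node2.test.com
crm respawn
EOF
}
```

Usage: write_ha_configs /tmp/ha.d, review both files, then copy them into /etc/ha.d/ on both nodes; the authkeys file must be identical on every node.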
4. Copy both configuration files to node2
# scp authkeys ha.cf node2:/etc/ha.d/
authkeys                                     100%  691     0.7KB/s   00:00
ha.cf                                        100%   10KB  10.4KB/s   00:00
# ssh node2
Last login: Wed Aug  7 16:13:44 2013 from node1.test.com
# ll /etc/ha.d/
total 56
-rw------- 1 root root   691 08-07 17:12 authkeys
-rw-r--r-- 1 root root 10614 08-07 17:12 ha.cf
-rwxr-xr-x 1 root root   745 2010-03-21 harc
drwxr-xr-x 2 root root  4096 08-07 16:24 rc.d
-rw-r--r-- 1 root root   692 2010-03-21 README.config
drwxr-xr-x 2 root root  4096 08-07 16:24 resource.d
-rw-r--r-- 1 root root  7862 2010-03-21 shellfuncs
-rw-r--r-- 1 root root  7862 2010-03-21 shellfuncs.rpmsave

5. Start heartbeat on node1 and node2
# ssh node2 "service heartbeat start"
Starting High-Availability services:
[  OK  ]
logd is already stopped
# service heartbeat start
Starting High-Availability services:
2013/08/07_17:19:22 INFO:Resource is stopped
[  OK  ]

6. Check the listening ports
　　node1:
# netstat -ntulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address             Foreign Address          State    PID/Program name
tcp    0    0 0.0.0.0:615             0.0.0.0:*                LISTEN    2553/rpc.statd
tcp    0    0 0.0.0.0:111             0.0.0.0:*                LISTEN    2514/portmap
tcp    0    0 127.0.0.1:631             0.0.0.0:*                LISTEN    2848/cupsd
tcp    0    0 0.0.0.0:5560             0.0.0.0:*                LISTEN    4592/mgmtd
tcp    0    0 127.0.0.1:25             0.0.0.0:*                LISTEN    3009/sendmail: acce
tcp    0    0 :::22                   :::*                      LISTEN    2835/sshd
udp    0    0 225.100.100.100:694       0.0.0.0:*                            4582/heartbeat: wri
udp    0    0 0.0.0.0:609             0.0.0.0:*                            2553/rpc.statd
udp    0    0 0.0.0.0:612             0.0.0.0:*                            2553/rpc.statd
udp    0    0 0.0.0.0:5353             0.0.0.0:*                            3138/avahi-daemon:
udp    0    0 0.0.0.0:41709             0.0.0.0:*                            4582/heartbeat: wri
udp    0    0 0.0.0.0:111             0.0.0.0:*                            2514/portmap
udp    0    0 0.0.0.0:631             0.0.0.0:*                            2848/cupsd
udp    0    0 0.0.0.0:53247             0.0.0.0:*                            3138/avahi-daemon:
udp    0    0 :::50187                :::*                                  3138/avahi-daemon:
udp    0    0 :::5353                   :::*                                  3138/avahi-daemon:

node2:
# netstat -ntulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address             Foreign Address          State    PID/Program name
tcp    0    0 0.0.0.0:618             0.0.0.0:*                LISTEN    2556/rpc.statd
tcp    0    0 0.0.0.0:111             0.0.0.0:*                LISTEN    2517/portmap
tcp    0    0 127.0.0.1:631             0.0.0.0:*                LISTEN    2852/cupsd
tcp    0    0 0.0.0.0:5560             0.0.0.0:*                LISTEN    3718/mgmtd
tcp    0    0 127.0.0.1:25             0.0.0.0:*                LISTEN    3014/sendmail: acce
tcp    0    0 :::22                   :::*                      LISTEN    2839/sshd
udp    0    0 0.0.0.0:50222             0.0.0.0:*                            3143/avahi-daemon:
udp    0    0 225.100.100.100:694       0.0.0.0:*                            3708/heartbeat: wri
udp    0    0 0.0.0.0:41152             0.0.0.0:*                            3708/heartbeat: wri
udp    0    0 0.0.0.0:612             0.0.0.0:*                            2556/rpc.statd
udp    0    0 0.0.0.0:615             0.0.0.0:*                            2556/rpc.statd
udp    0    0 0.0.0.0:5353             0.0.0.0:*                            3143/avahi-daemon:
udp    0    0 0.0.0.0:111             0.0.0.0:*                            2517/portmap
udp    0    0 0.0.0.0:631             0.0.0.0:*                            2852/cupsd
udp    0    0 :::44582                :::*                                  3143/avahi-daemon:
udp    0    0 :::5353                   :::*                                  3143/avahi-daemon:

Note: the ports above show that the CRM is up: mgmtd, the CRM management daemon that the GUI connects to, listens on its default TCP port 5560.
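To check this from a script instead of by eye, the netstat output can be filtered for a TCP listener on port 5560. A minimal sketch; the helper name mgmtd_listening is hypothetical, and it reads netstat text on standard input so it also works on saved output.

```shell
#!/usr/bin/env bash
# Succeeds if the netstat text on stdin shows a TCP socket on :5560 in LISTEN
# state (mgmtd in this article). Field 4 is the local address, field 6 the state.
mgmtd_listening() {
  awk '$1 == "tcp" && $4 ~ /:5560$/ && $6 == "LISTEN" { found = 1 }
       END { exit !found }'
}

# On a node:
#   netstat -ntulp | mgmtd_listening && echo "mgmtd is up" || echo "mgmtd is not listening"
```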
7. Check the cluster status
　　node1,node2:
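# crm_mon
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201044oUtK.png
Note: the status shows two nodes, node1 and node2, both online, and the DC is node2. Both nodes have 0 resources, because no resources have been configured in the cluster yet. Let's look at this in detail.
VI. The CRM Resource Manager in Detail
1. The haresources resource manager
haresources is the resource manager built into heartbeat v1. It is fairly simple and has no graphical management. Heartbeat v2 introduced the much more capable CRM, but kept haresources for compatibility with v1, so heartbeat v2 ships both. The previous article introduced haresources (http://freeloda.blog.运维网.com/2033581/1266552); this one focuses on the CRM.
2. The CRM resource manager
Note: the CRM does not read the haresources configuration file, so resources defined in /etc/ha.d/haresources cannot be used and have to be configured again. Here are the ways to configure the CRM.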
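(1). Ways to configure the CRM
Command line (the crm_* tools; type crm and press Tab twice to list them):
# crm
crmadmin       crm_diff        crm_master  crm_resource  crm_standby  crm_verify
crm_attribute  crm_failcount   crm_mon     crm_sh        crm_uuid

Graphical interface:
# hb_gui

Note: this article focuses on the GUI; the next article covers the command line. (In heartbeat v2 the command-line tools are still fairly limited. In heartbeat v3 the crm command-line shell is very powerful, and from then on we will configure everything from the command line instead of the GUI.)
(2). The CRM configuration file: the CIB
Location: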
# cd /var/lib/heartbeat/crm/
# ll
total 16
-rw------- 2 hacluster haclient 885 08-10 10:34 cib.xml
-rw------- 2 hacluster haclient 885 08-10 10:34 cib.xml.last
-rw-r--r-- 2 hacluster haclient  32 08-10 10:34 cib.xml.sig
-rw-r--r-- 2 hacluster haclient  32 08-10 10:34 cib.xml.sig.last

View it:
# vim cib.xml

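Note: the CIB (Cluster Information Base) file is in XML format. That makes it hard to write by hand if you don't know XML, and fortunately the GUI can do it for us, but reading it should not be difficult.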
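For orientation, the CIB has a fixed top-level layout: a configuration section holding cluster options, nodes, resources and constraints, and a status section the cluster maintains at runtime. The function below (cib_skeleton is a hypothetical name) only prints a simplified, empty skeleton of that layout. A real cib.xml carries more attributes, such as epoch counters, and is rewritten by the cluster itself, so it should not be edited by hand while heartbeat is running.

```shell
#!/usr/bin/env bash
# Print a simplified, empty CIB skeleton for orientation only.
cib_skeleton() {
  cat <<'EOF'
<cib>
  <configuration>
    <crm_config/>   <!-- cluster-wide options (stickiness, STONITH, ...) -->
    <nodes/>        <!-- cluster members -->
    <resources/>    <!-- primitives, groups, clones, master/slave -->
    <constraints/>  <!-- location, order, colocation -->
  </configuration>
  <status/>         <!-- runtime state, written by the cluster -->
</cib>
EOF
}
```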
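(3). Set up CRM access
Note: to use the CRM GUI we need a user name and password. Installing heartbeat creates a hacluster user for managing cluster resources; here we give it the password hacluster, so both the user name and the password are hacluster. (Note: CRM configuration takes effect on the DC.)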
# tail /etc/passwd
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:4294967294:4294967294:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
avahi:x:70:70:Avahi daemon:/:/sbin/nologin
xfs:x:43:43:X Font Server:/etc/X11/fs:/sbin/nologin
haldaemon:x:68:68:HAL daemon:/:/sbin/nologin
avahi-autoipd:x:100:104:avahi-autoipd:/var/lib/avahi-autoipd:/sbin/nologin
hacluster:x:101:105:heartbeat user:/var/lib/heartbeat/cores/hacluster:/sbin/nologin
apache:x:48:48:Apache:/var/www:/sbin/nologin
ntp:x:38:38::/etc/ntp:/sbin/nologin

node1:

# passwd hacluster
Changing password for user hacluster.
New UNIX password:
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

node2:

# passwd hacluster
Changing password for user hacluster.
New UNIX password:
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

(4). Start the CRM GUI
# hb_gui
Traceback (most recent call last):
File "/usr/bin/hb_gui", line 41, in ?
import gtk, gtk.glade, gobject
File "/usr/lib64/python2.4/site-packages/gtk-2.0/gtk/__init__.py", line 76, in ?
_init()
File "/usr/lib64/python2.4/site-packages/gtk-2.0/gtk/__init__.py", line 64, in _init
_gtk.init_check()
RuntimeError: could not open display

Starting hb_gui from Xshell fails with the error above. Here is the fix:
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201044pDjO.png
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201045hhJ5.png
Steps: File –> Properties –> SSH –> Tunneling –> enable Forward X11 connections to: Xmanager. Then restart Xshell and try again.
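Before launching hb_gui it is cheap to check that a forwarded display exists, which is exactly what the "could not open display" traceback complains about. A minimal sketch; check_x11_display is a hypothetical helper. With plain OpenSSH instead of Xshell, ssh -X requests X11 forwarding, provided the server allows it (X11Forwarding in sshd_config) and usually has xauth installed.

```shell
#!/usr/bin/env bash
# Fail early, with a hint, when no X11 display is available to GTK.
check_x11_display() {
  if [ -z "${DISPLAY:-}" ]; then
    echo "DISPLAY is not set: enable X11 forwarding (Xshell tunneling or ssh -X) and reconnect" >&2
    return 1
  fi
  echo "DISPLAY=$DISPLAY"
}

# Usage:
#   check_x11_display && hb_gui &
```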
# hb_gui &
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010462u81.png
Note: the CRM GUI now opens, and we can move on to configuring resources. Before that, one more concept that has come up several times: the DC.
(5). The DC in the cluster
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201047WR89.png
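Note: the figure above shows that the DC of this cluster is node2. So what is a DC?
Consensus Cluster Membership (CCM) gives the cluster an election mechanism that lets the nodes choose a Designated Coordinator (DC), which helps establish quorum and manages cluster membership and resource allocation. The DC maintains the cluster state and the management policy; other nodes must forward state-change requests to the DC for processing. The Heartbeat service checks node and link status to decide whether a failure has occurred, and the cluster logging service (ha_logd) provides logging for all services in the suite. In short, a multi-node cluster needs a "leader", the Designated Coordinator, and all other nodes follow its decisions. Closely related is quorum: each node holds a number of votes (which can be weighted by hand, for example giving more votes to more capable servers), and a partition only counts as a working cluster when it holds more than 50% of the votes. When a failure occurs, resources fail over to another node; setting different priorities makes the failover follow those priorities, so that a more capable server takes over. In this cluster, node2 was elected DC.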
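The majority rule in the paragraph above is simple arithmetic: a partition has quorum only when its votes are strictly more than half of all votes. A minimal sketch; has_quorum is a hypothetical helper that illustrates the rule, not heartbeat's CCM code.

```shell
#!/usr/bin/env bash
# has_quorum PRESENT TOTAL: succeeds if PRESENT votes are a strict majority.
has_quorum() {
  local present=$1 total=$2
  # present/total > 1/2 is the same as 2*present > total (integer-safe)
  [ $((2 * present)) -gt "$total" ]
}

for present in 1 2 3; do
  if has_quorum "$present" 5; then
    echo "$present of 5 votes: quorum"
  else
    echo "$present of 5 votes: no quorum"
  fi
done
```

With 5 votes in total, 1 or 2 votes are not enough and 3 are. With weighted votes, pass the sums of the weights instead of node counts. In a two-node cluster a single surviving node holds 1 of 2 votes, which is not a strict majority, so two-node setups are a special case worth checking in the documentation for your version.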
　　七、crm图形界面配置详解
　　
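1. Log in to the DC node
Note: when configuring resources in the GUI we normally connect to the DC; once all resources are configured there, the DC propagates them to the other nodes automatically.
# hb_gui &
5565

(1). Click Connection –> Login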
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010475hNr.png
(2). Enter the DC's host name or IP address and the password, then click OK
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201048ACmU.png
(3). The default view after logging in
http://blog.运维网.com/attachment/201308/141324775.png
2. A quick tour of the GUI
(1). Basic cluster information
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201050ebPj.png
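Note: the right-hand panel shows basic information about the cluster, such as the heartbeat version, port number, heartbeat messages and alert messages.
(2). Global cluster configuration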
http://blog.运维网.com/attachment/201308/141416584.png
Note: the first five options matter most; skim the others to get an idea of them.

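[*]　　No Quorum Policy: what to do when the cluster partition loses quorum, for example after node failures. The default is stop, which stops the resources in the partition that lost quorum.
[*]　　Symmetric Cluster: resources may run on any node, so when a node fails they can move to any other node.
[*]　　Stonith Enabled: tick this if you have a STONITH device. We have none here, so it is left unticked.
[*]　　Stonith Action: the default is reboot, meaning a failed node is rebooted (fenced) by default.
[*]　　Default Resource Stickiness: the default is 0, meaning a resource has no preference for staying where it is and may run on any node.
(3). Advanced cluster configuration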
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201053LjRL.png
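Note: I won't go through these in detail. The GUI here is set to Chinese, so the labels are self-explanatory; have a look yourself. We keep the defaults.
(4). DC node properties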
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201054rTxQ.png
Note: the DC of this cluster is node2, and the figure shows it running normally.
(5). node1 properties
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201055AH4p.png
(6). Resources
http://blog.运维网.com/attachment/201308/141507235.png
Note: the cluster has no resources yet. Put simply, resources are services; click the + button above to add one. Resource types are explained below.
(7). Resource constraints
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010576Saq.png
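Note: constraints are the various restrictions placed on resources; they are explained in detail below.
VIII. HA Cluster Architecture Recap
Note: the two previous articles (Linux High Availability (HA) Clusters: Basic Concepts Explained, http://freeloda.blog.运维网.com/2033581/1265304; Linux High Availability (HA) Clusters: Heartbeat Explained, http://freeloda.blog.运维网.com/2033581/1265808) described the three-layer architecture of HA clusters: the Messaging Layer, the CRM (resource management layer) and the RA (resource agent layer). Below are some additions about the RA layer that are relevant to this article.
1. Resource Agent Types (put simply, a resource is a service, such as httpd, mysqld or nfs)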

[*]　　Legacy heartbeat v1 RA
[*]　　LSB (/etc/rc.d/init.d/*)
[*]　　OCF (Open Cluster Framework)
pacemaker provider
linbit (drbd) provider

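[*]　　STONITH (hardware)
PDU (Power Distribution Units): switchable power strips
UPS (uninterruptible power supply)
Blade Power Control Devices: power control for blade servers
Lights-out Devices: out-of-band management modules (IBM, HP, Dell)
Testing Devices: for testing only, e.g. ssh and meatware (manual)
Note: there are four main resource agent types. The first is legacy heartbeat v1 resources, i.e. what goes into /etc/ha.d/haresources as discussed above. The second is LSB resources, scripts that follow the Linux Standard Base: everything under /etc/rc.d/init.d/*, including familiar ones such as httpd, mysqld and nfs. The third is OCF resources, built on the same idea as LSB. LSB resources are hard to monitor, so vendors developed new agents that follow the OCF framework and are more capable than LSB scripts; providers such as pacemaker and linbit each ship their own OCF agents. The fourth is STONITH devices, which are also a kind of resource.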
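2. Resource Classes

[*]　　Primitive (native): a basic resource that runs on only one node at a time (much as there is only one DC)
[*]　　clone: a primitive cloned N times so that it runs on several nodes, e.g. STONITH
[*]　　group: a set of resources managed together, e.g. vip, httpd and filesystem
[*]　　master/slave: a special clone with master and slave roles, e.g. drbd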
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201058ndgS.png
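3. Resource Stickiness
Resource stickiness (a resource's preference for a node, which can reflect server performance) is how strongly a resource is attached to a node, more precisely to the node it is currently running on. It is defined by a score: the higher a node's score, the more the resource prefers that node.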
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201058qXQi.png
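4. Resource Constraints (preferences between resources)
(1). Location constraints: how strongly a resource prefers a node, defined by a score (e.g. reflecting server performance)

[*]　　Positive: prefer this node
[*]　　Negative: avoid this node
Note: location constraints are often combined with resource stickiness to decide where a resource runs. In the following example the resource will always end up on node2:

[*]　　node1: stickiness –> 100, location –> 200
[*]　　node2: stickiness –> 100, location –> inf (positive infinity)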
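The arithmetic behind the node1/node2 example can be sketched as follows. This is a simplified model, assuming, as the example does, that stickiness and location scores are simply added per node; in the real policy engine stickiness only counts for the node where the resource is currently running, and many more scores contribute. INFINITY is represented as 1000000, the value the CRM uses internally: sums that reach it saturate, and -INFINITY always wins. add_scores and best_node are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Simplified score arithmetic (illustration only, not the policy engine).
INF=1000000   # the CRM's internal value for INFINITY

# Convert a score word to an integer.
to_num() {
  case "$1" in
    INFINITY|+INFINITY|inf) echo "$INF" ;;
    -INFINITY|-inf)         echo "-$INF" ;;
    *)                      echo "$1" ;;
  esac
}

# Add two scores with saturation; -INFINITY dominates everything.
add_scores() {
  local a b sum
  a=$(to_num "$1"); b=$(to_num "$2")
  if [ "$a" -le "-$INF" ] || [ "$b" -le "-$INF" ]; then
    echo "-INFINITY"; return
  fi
  sum=$((a + b))
  if [ "$sum" -ge "$INF" ]; then echo "INFINITY"
  elif [ "$sum" -le "-$INF" ]; then echo "-INFINITY"
  else echo "$sum"
  fi
}

# Pick the highest-scoring node from "node=score" pairs, skipping -INFINITY.
best_node() {
  local pair node score best="" best_score=""
  for pair in "$@"; do
    node=${pair%%=*}
    score=$(to_num "${pair#*=}")
    [ "$score" -le "-$INF" ] && continue
    if [ -z "$best" ] || [ "$score" -gt "$best_score" ]; then
      best=$node; best_score=$score
    fi
  done
  echo "$best"
}

# The example from the text
n1=$(add_scores 100 200)        # 300
n2=$(add_scores 100 INFINITY)   # INFINITY
echo "node1=$n1 node2=$n2 -> $(best_node "node1=$n1" "node2=$n2")"
# prints: node1=300 node2=INFINITY -> node2
```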
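(2). Order constraints: the order in which resources start or stop

[*]　　start: vip, then ipvs
[*]　　stop: ipvs, then vip
Note: in an LVS HA cluster, for example, we can define the start order of the VIP and ipvs.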
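The start/stop pair above follows one rule: resources start in the constrained order and stop in the reverse order (order constraints are symmetrical by default). A tiny sketch of that rule; print_order is a hypothetical helper and the names are the ones from the example.

```shell
#!/usr/bin/env bash
# Print the start sequence as given and the stop sequence reversed.
print_order() {
  local -a rsc=("$@")
  local i
  echo "start: ${rsc[*]}"
  printf 'stop: '
  for ((i = ${#rsc[@]} - 1; i >= 0; i--)); do
    printf '%s' "${rsc[i]}"
    [ "$i" -gt 0 ] && printf ' '
  done
  printf '\n'
}

print_order vip ipvs
# start: vip ipvs
# stop: ipvs vip
```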
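(3). Colocation constraints: whether resources may run on the same node, i.e. dependencies between resources, defined by a score

[*]　　Positive: run together (INFINITY: must run together)
[*]　　Negative: run apart (-INFINITY: must never run together)
Note: in a web HA cluster, for example, we define whether httpd and the filesystem (NFS) run on the same node.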
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201059GgC9.png
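5. Resource Fencing
Purpose: fencing has a single goal, preventing split brain. In a split brain, nodes compete for the shared storage and files get corrupted, the last thing we want, so fencing is required. When a node in the cluster fails we can either "shoot it in the head" or block the failed node from accessing the shared storage.
(1). Node level

[*]　　STONITH (shoot the other node in the head)
(2). Resource level

[*]　　Control whether a node can access a resource (e.g. through an FC switch)
IX. Configuring Resources with the CRM
Note: we covered a lot of ground above, all with one goal: configuring resources. Let's now do that in detail. The example is the HA web cluster from the previous article (http://freeloda.blog.运维网.com/2033581/1266552), which was configured with haresources; this time we build the same HA web cluster with the CRM GUI.
1. Resources in the HA web cluster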

[*]　　VIP
[*]　　httpd
[*]　　filesystem
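Note: the HA web cluster has three resources: the VIP, the httpd service and the filesystem (NFS, which stores the web files). Let's configure them.
2. Adding resources in the CRM
(1). Add a resource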
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201059JFLs.png
(2). Add a group resource (the VIP, httpd and filesystem all belong to the HA web service, so they go into one group; choose group here)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201060Lbme.png
(3). Give the group an ID
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201061cb5O.png
(4). Add the VIP: give it the ID webip and set ip to 192.168.1.200. Note that the VIP belongs to the Web Service group
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201061ONBK.png
(5). Add VIP parameters such as the netmask
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201062HIf7.png
(6). Choose the network interface whose alias will carry the VIP
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201063pmtc.png
(7). Set the VIP's netmask; either the prefix length or the full dotted mask works
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010647uvT.png
(8). The finished VIP attributes; click "+Add" at the bottom right
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201065adMq.png
(9). The new webip resource is not running yet; right-click it to start it
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201066ktpq.png
(10). Start webip
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201067PnNT.png
(11). webip is now running on the DC node
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201069thqq.png
(12). Next, add the httpd service
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_137620107011VF.png
(13). This time we are adding into the group, so choose native: a native resource inside the group
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201071LI7f.png
(14). Add the httpd service
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201073UCKH.png
(15). httpd and webip are running; let's test
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201074Ka8r.png
(16). Browsing to http://192.168.1.200 shows the httpd page; the basic HA web cluster is in place
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010749bQj.png
(17). Now simulate a failure: put node2 into standby and check whether the service and IP move to node1
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201075XMZN.png
(18). With node2 in standby, node1 successfully runs the resources
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201076jidC.png
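While running the standby test in steps (17) and (18), it helps to poll the VIP continuously and watch the answering node change. A minimal sketch, assuming curl is available on a Linux machine on the same network (the article itself tests from a browser on Windows 7) and that each node's index.html still contains its own host name, as set up when httpd was installed; watch_vip and VIP_URL are hypothetical names.

```shell
#!/usr/bin/env bash
# Poll the VIP once per second and print which node answered.
VIP_URL=${VIP_URL:-http://192.168.1.200/}

watch_vip() {
  local count=${1:-10} i body
  for ((i = 1; i <= count; i++)); do
    # -s: silent; -m 2: give up after 2 seconds so an unreachable VIP shows quickly
    if body=$(curl -s -m 2 "$VIP_URL"); then
      printf '%s %s\n' "$(date +%T)" "$body"
    else
      printf '%s %s\n' "$(date +%T)" "NO RESPONSE"
    fi
    if [ "$i" -lt "$count" ]; then sleep 1; fi
  done
}

# Usage: watch_vip 60   (then put node2 into standby in the GUI)
```

A short run of NO RESPONSE lines followed by node1.test.com is what a working failover looks like; once the NFS share is added, both nodes serve the same page, so this check only distinguishes nodes before that step.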
(19). Testing again shows that the page now comes from node1. Next, add the shared storage (NFS)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010774GLR.png
(20). Add the NFS resource
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010784HAM.png
(21). The NFS resource has been added
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201079X1eC.png
(22). Browse to http://192.168.1.200 again: this time we see the home page stored on the NFS share
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201080Ukg4.png
(23). Now make node2 active again; all resources move back to node2
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201081mKUq.png
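Note: the resources return to node2 because auto_failback on is set in the main configuration file ha.cf. All three resources of the HA web cluster are now configured; next, let's see how resource constraints are added in the CRM GUI.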
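The same group can also be described in CIB XML, which is what the GUI writes for you. This sketch uses the Heartbeat 2.x syntax as I understand it (instance attributes wrapped in an attributes element, which Pacemaker 1.0 later dropped), so verify it against the crm.dtd of your version. The IDs webservice and webnfs, the NFS export path /web and the mount point /var/www/html are hypothetical; only the VIP, the NFS server address and the names webip and httpd come from the article. Members are listed as webip, NFS, httpd because a group starts its members in the listed order and stops them in reverse.

```shell
#!/usr/bin/env bash
# Print the web group as Heartbeat 2.x CIB XML (web_group_xml is hypothetical).
web_group_xml() {
  cat <<'EOF'
<group id="webservice">
  <primitive id="webip" class="ocf" provider="heartbeat" type="IPaddr">
    <instance_attributes id="webip_attrs">
      <attributes>
        <nvpair id="webip_ip" name="ip" value="192.168.1.200"/>
        <nvpair id="webip_nic" name="nic" value="eth0"/>
        <nvpair id="webip_mask" name="cidr_netmask" value="24"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="webnfs" class="ocf" provider="heartbeat" type="Filesystem">
    <instance_attributes id="webnfs_attrs">
      <attributes>
        <nvpair id="webnfs_dev" name="device" value="192.168.1.208:/web"/>
        <nvpair id="webnfs_dir" name="directory" value="/var/www/html"/>
        <nvpair id="webnfs_fs" name="fstype" value="nfs"/>
      </attributes>
    </instance_attributes>
  </primitive>
  <primitive id="httpd" class="lsb" type="httpd"/>
</group>
EOF
}
```

To load it, something like web_group_xml > webservice.xml followed by cibadmin -C -o resources -x webservice.xml on a cluster node should work, but check cibadmin --help on your version first.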
X. Resource Constraints in the CRM
1. Resources
The HA web cluster has three resources:

[*]　　VIP
[*]　　httpd
[*]　　nfs
2. Add the resources
(1). Add a resource
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201082ydhP.png
(2). Add a native resource
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201083WE97.png
(3). Add webip
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201084Lx66.png
(4). Add the nfs and httpd resources
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201085g7pb.png
(5). Start all resources
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201087AWHj.png
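3. The problem
As shown above, this time we added three native resources and no group, so vip, httpd and nfs are not in one group. As a result the three resources end up on different nodes: webip starts on node2, nfs on node1, and httpd on node2 again. Without a group, the resources are spread across the nodes. That is not what we want: all three should run on the same node. Since there is no group, what can we do? This is where resource constraints come in. Let's configure them, as shown below: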
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201087eDrS.png
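The figure shows three kinds of constraints, matching the explanation above: location, order and colocation constraints.
4. Analysis
We have three resources, webip (the VIP), httpd and nfs. How should we constrain them? A quick analysis:
(1). Colocation constraints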

[*]　　httpd and nfs must run on the same node, and nfs and webip must also run on the same node
(2). Order constraints

[*]　　nfs must start before httpd
[*]　　webip must start before httpd
(3). Location constraints

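[*]　　Make one of the three resources prefer a particular node
Note: now let's implement these constraints.
5. Setting resource constraints in the CRM
Colocation constraints
(1). Create a colocation constraint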
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201088mIJC.png
(2). Create the httpd and nfs constraint: httpd must run together with nfs (see the Description in the figure for details)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201088N8x3.png
(3). Create the nfs and webip constraint: nfs must run together with webip
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201089JXZX.png
(4). The finished colocation constraints
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010905pQc.png
Order constraints
(1). Create an order constraint
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010906qTt.png
(2). Order constraint between nfs and httpd (see the Description in the figure for details)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201091wzr5.png
(3). Order constraint between webip and httpd (see the Description in the figure for details)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201091CHxk.png
(4). Order constraint between webip and nfs (see the Description in the figure for details)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201092bJ2V.png
(5). The finished order constraints
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_13762010939RMb.png
Location constraints
(1). Create a location constraint
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201093RZWr.png
(2). Click Add Expression at the bottom right to add the condition (webip prefers node2)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201094EW7s.png
(3). The added location condition
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201096G6dj.png
(4). The finished location constraint
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201097KUVo.png
6. All constraints
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201098AKGw.png
Note: with the constraints in place, all resources run on node2.
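For reference, the constraints built in the GUI end up in the constraints section of cib.xml. The fragment below is my reconstruction in Heartbeat 2.x CIB syntax, where colocation and order use from/to attributes (Pacemaker 1.0 and later renamed them to rsc/with-rsc and first/then), so compare it with the crm.dtd shipped with your heartbeat before relying on it. The constraint IDs and the location score of 200 are hypothetical, since the article shows those values only in screenshots; web_constraints_xml is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the article's constraints as Heartbeat 2.x CIB XML (a sketch).
web_constraints_xml() {
  cat <<'EOF'
<constraints>
  <!-- colocation: httpd with nfs, nfs with webip -->
  <rsc_colocation id="httpd_with_nfs" from="httpd" to="nfs" score="INFINITY"/>
  <rsc_colocation id="nfs_with_webip" from="nfs" to="webip" score="INFINITY"/>
  <!-- order: "from" starts after "to" -->
  <rsc_order id="httpd_after_nfs" from="httpd" type="after" to="nfs"/>
  <rsc_order id="httpd_after_webip" from="httpd" type="after" to="webip"/>
  <rsc_order id="nfs_after_webip" from="nfs" type="after" to="webip"/>
  <!-- location: webip prefers node2 -->
  <rsc_location id="webip_on_node2" rsc="webip">
    <rule id="webip_on_node2_rule" score="200">
      <expression id="webip_on_node2_expr" attribute="#uname" operation="eq" value="node2.test.com"/>
    </rule>
  </rsc_location>
</constraints>
EOF
}
```

Because webip is colocated with nfs and nfs with httpd, webip's preference for node2 pulls all three resources onto node2, which matches the result in the figure.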
7. Test http://192.168.1.200 (success)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201098DDNO.png
8. Failure demo (put the DC into standby and test again)
　　
　　http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201099FB1r.png
Check whether http://192.168.1.200 is still reachable (it is, as the figure shows)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201100HF4F.png
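XI. Summary of CRM Resource Configuration
1. Resource configuration
The two approaches above show two ways to keep resources on the same node. One is to create a group resource first and then add native resources to it. Pay attention to the order in which you add them, because a group starts its members in that order and stops them in reverse: for the HA web cluster the order should be webip –> nfs –> httpd. The other is to add native resources directly and then use resource constraints to keep them on the same node.
2. View the cib.xml file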
# cat /var/lib/heartbeat/crm/cib.xml

3. Node information (all resources are running on node1)
http://freeloda.blog.运维网.com/attachment/201308/11/2033581_1376201101ucRy.png
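4. Wrap-up
After two days of work, "Linux High Availability (HA) Clusters: Resource Management with heartbeat and the CRM, Explained" is finished. I hope it gives you some ideas. To help you get more familiar with HA clusters, the next post covers a highly available MySQL cluster built with heartbeat + mysql + nfs. ^_^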
　　
