corosync + pacemaker + crmsh: configuration files and common commands

y45t4r3 posted on 2015-1-4 09:19:24

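I. What are corosync and pacemaker?
Corosync overview:
   Corosync lets you define how cluster messages are transmitted (transport, protocol and so on) in one simple configuration file. It was released in 2008, but it is not really brand-new software: the OpenAIS project, started in 2002, grew too large and was split into two subprojects, and the part that carries HA heartbeat messaging became Corosync. About 60% of its code comes from OpenAIS. Corosync provides complete HA messaging on its own; for more, and more complex, functionality you still need OpenAIS. Corosync is the direction things are moving, and new projects generally adopt it. For management, hb_gui offers a graphical interface for HA administration; another graphical option is the luci + ricci suite from RHCS.

pacemaker is an open-source high-availability cluster resource manager (CRM). In the HA cluster architecture it sits at the resource management and resource agent (RA) layer. It cannot pass heartbeat messages itself; to talk to the peer node it relies on an underlying messaging service to deliver its messages. It is usually combined with corosync in one of two ways: started by corosync as a plugin (the approach used in this post), or run as a standalone service on top of corosync.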

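Preparation:
1. Keep the clocks of the two hosts synchronized, preferably with an NTP service for better accuracy;
2. Each host's configured hostname must match the output of uname -n;
3. Configure local name resolution in /etc/hosts, consistent with uname -n;
4. The root users on the two hosts must be able to reach each other over SSH with key-based authentication;
   Note: once high availability is configured, resources are controlled by the CRM, so disable start-at-boot for every service that will become a cluster resource.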
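
Item 3 of the preparation list can be checked with a small helper. This is only a sketch: the function name hosts_has_name is made up for illustration, and it looks at a hosts-style file only, not at DNS.

```shell
# hosts_has_name FILE NAME
# Succeeds when NAME appears as a hostname or alias in a hosts-style FILE.
hosts_has_name() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                      # ignore comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example use on a node (not executed here):
#   hosts_has_name /etc/hosts "$(uname -n)" || echo "add $(uname -n) to /etc/hosts"
```

Run it on both nodes, once for the local name and once for the peer's name.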

Lab environment:
IP: 172.16.4.22 hostname: node2
IP: 172.16.4.33 hostname: node3

Note: to keep resources consistent between the two nodes, the clock difference between the hosts should not exceed 1s.
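
The 1s limit can be checked roughly from the shell. A minimal sketch, assuming the key-based ssh from the preparation list is already working; clock_skew is a hypothetical helper name, and the ssh round trip itself adds some error to the measurement.

```shell
# clock_skew EPOCH_A EPOCH_B
# Prints the absolute difference, in seconds, between two Unix timestamps.
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    echo "$diff"
}

# Example use from node2 (not executed here):
#   skew=$(clock_skew "$(date +%s)" "$(ssh node3 date +%s)")
#   [ "$skew" -le 1 ] || echo "clock skew is ${skew}s, check NTP"
```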

Install corosync and pacemaker (run on node2; the ssh part repeats the install on node3):
yum -y install corosync pacemaker; ssh node3 'yum -y install corosync pacemaker'

The corosync configuration files live in /etc/corosync/:

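mv corosync.conf.example corosync.conf # a template is shipped by default; we only need to adjust it
# Please read the corosync.conf.5 manual page
compatibility: whitetank # stay compatible with releases before 0.8
totem {
version: 2 # version of the totem configuration; must not be changed
secauth: off # message authentication; consumes a lot of CPU when aisexec is used
threads: 0 # number of parallel threads used for authentication when it is enabled
interface {
   ringnumber: 0 # ring number; with several NICs on one host it keeps heartbeat traffic from looping back
   bindnetaddr: 172.16.4.0 # network the heartbeat binds to; corosync works out which local IP address belongs to this network and uses that interface for multicast heartbeat traffic
   mcastaddr: 239.245.4.1 # multicast address for heartbeat messages (must be identical on all nodes)
   mcastport: 5405 # port used for multicast
   ttl: 1 # heartbeat packets travel one hop only, which avoids multicast loops
}
}

# The totem section defines how the cluster nodes communicate. Totem is the protocol corosync uses between nodes, and the protocol is versioned.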
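
The bindnetaddr value is a network address, not a host address. A minimal sketch of how to derive it from a node's IP and netmask follows; network_address is a hypothetical helper, and the /24 mask in the usage note is an assumption, so substitute the mask actually configured on your heartbeat interface.

```shell
# network_address IP NETMASK
# Prints the network address: each octet of IP ANDed with the same octet of NETMASK.
# The body runs in a subshell so the IFS change does not leak to the caller.
network_address() (
    IFS=.
    set -- $1 $2        # split both dotted quads into eight positional fields
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
)
```

For example, network_address 172.16.4.22 255.255.255.0 gives the 172.16.4.0 used above.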

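logging {
fileline: off # whether to print the source file and line of each message
to_stderr: no # whether to send log messages to standard error (no is recommended)
to_logfile: yes # whether to write to a log file
to_syslog: yes # whether to log to syslog as well; these messages end up in /var/log/messages
logfile: /var/log/cluster/corosync.log # log file location
debug: off # leave off unless troubleshooting; debug output is very verbose and causes heavy disk I/O
timestamp: on # print timestamps; helps locate errors but costs extra system calls and CPU
logger_subsys {
   subsys: AMF
   debug: off
}
}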

To have corosync start pacemaker as a plugin, add the following to corosync.conf:

service {
ver: 0 # version number
name: pacemaker # module name; pacemaker is started together with corosync
}
# once corosync is up it starts pacemaker automatically (as a plugin)
aisexec {
user: root
group: root
}
# the identity ais runs as; root is the default, so the aisexec section may be omitted
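
If the stanza is added from a script instead of an editor, something along these lines would do. It is a sketch under assumptions: append_pacemaker_plugin is a hypothetical helper, and it only checks whether a "name: pacemaker" line already exists before appending.

```shell
# append_pacemaker_plugin CONF
# Appends the pacemaker service stanza to CONF unless one is already present.
append_pacemaker_plugin() {
    conf=$1
    if grep -q 'name:[[:space:]]*pacemaker' "$conf"; then
        return 0        # already configured, leave the file untouched
    fi
    cat >> "$conf" <<'EOF'

service {
    ver: 0
    name: pacemaker
}
EOF
}

# Example use (not executed here):
#   append_pacemaker_plugin /etc/corosync/corosync.conf
```

Calling it twice leaves a single stanza in the file.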

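Generate the key used to authenticate multicast messages:

corosync-keygen # generates the pre-shared key used for heartbeat messages; it reads 1024 bits from /dev/random
   # the key is written to a file named authkey in the configuration directory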

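Note:
corosync-keygen reads from /dev/random when generating the key.
/dev/random is the Linux random number generator. It draws on an in-kernel buffer called the entropy pool, which is filled from system interrupts. Encryption and key-generation programs consume a lot of random data, so the pool can run dry; when that happens, /dev/random blocks the calling process until new interrupts have produced more entropy.
   Because a 1024-bit key is needed here, the pool may not hold enough entropy and key generation may hang. Two ways around this:
   1. Type a lot of characters on the keyboard to generate interrupts (slow, not recommended);

   2. Download a large file from the Internet or an FTP server (generates interrupts quickly, recommended).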
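
Before running corosync-keygen you can look at how much entropy the kernel reports. A minimal sketch, assuming a Linux kernel that exposes /proc/sys/kernel/random/entropy_avail; entropy_ready is a hypothetical helper, and on recent kernels the reported value behaves differently from the 2015-era systems described here.

```shell
# entropy_ready NEEDED_BITS [FILE]
# Succeeds when the entropy counter in FILE is at least NEEDED_BITS.
# FILE defaults to the kernel's entropy counter.
entropy_ready() {
    needed=$1
    file=${2:-/proc/sys/kernel/random/entropy_avail}
    avail=$(cat "$file")
    [ "$avail" -ge "$needed" ]
}

# Example use (not executed here):
#   entropy_ready 1024 || echo "entropy is low, corosync-keygen may block"
```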

Key generation:

# corosync-keygen
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 320).
Press keys on your keyboard to generate entropy (bits = 384).
Press keys on your keyboard to generate entropy (bits = 448).
Press keys on your keyboard to generate entropy (bits = 616).
Press keys on your keyboard to generate entropy (bits = 680).
Press keys on your keyboard to generate entropy (bits = 752).
Press keys on your keyboard to generate entropy (bits = 816).
Press keys on your keyboard to generate entropy (bits = 936).
Press keys on your keyboard to generate entropy (bits = 1000).
Writing corosync key to /etc/corosync/authkey. # key generated successfully

# chmod 400 authkey # the key file mode must be 400 or 600

scp -p authkey corosync.conf node3:/etc/corosync/ # copy the new key and the configuration file to the second node, preserving permissions
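
The mode requirement on authkey is easy to verify on each node after the copy. A minimal sketch, assuming GNU stat (as on CentOS); authkey_mode_ok is a hypothetical helper name.

```shell
# authkey_mode_ok FILE
# Succeeds when FILE has mode 400 or 600, the modes required for the key file.
authkey_mode_ok() {
    mode=$(stat -c '%a' "$1") || return 1
    case $mode in
        400|600) return 0 ;;
        *)       return 1 ;;
    esac
}

# Example use (not executed here):
#   authkey_mode_ok /etc/corosync/authkey || chmod 400 /etc/corosync/authkey
```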

# service corosync start; ssh node3 'service corosync start'
Starting Corosync Cluster Engine (corosync):
Starting Corosync Cluster Engine (corosync):
# ssh sends the start command to node3 so the service starts on both nodes at once; this is not explained again below

ss -unlp # check that the corosync service started properly
UNCONN    0    0    172.16.4.22:5404    *:*    users:(("corosync",2314,13))
UNCONN    0    0    172.16.4.22:5405    *:*    users:(("corosync",2314,14))
UNCONN    0    0    239.255.4.1:5405    *:*    users:(("corosync",2314,10))

   # corosync listens on 172.16.4.22 ports 5404 and 5405, and heartbeats are sent to port 5405 of the multicast address 239.255.4.1
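
The same check can be scripted so that a wrong bind address stands out. A minimal sketch, assuming the `ss -unlp` column layout shown above (local address in the fourth column); both function names are hypothetical. If corosync turns out to be bound to 127.0.0.1, a common cause is that bindnetaddr matches no local interface or that the hostname resolves to the loopback address, though that is an assumption to verify on your own system.

```shell
# corosync_bind_addrs: reads `ss -unlp` output on stdin and prints the
# distinct local addresses (port stripped) of sockets owned by corosync.
corosync_bind_addrs() {
    awk '/corosync/ { sub(/:[0-9]+$/, "", $4); print $4 }' | sort -u
}

# corosync_on_loopback: succeeds when any corosync socket is bound to 127.x.
corosync_on_loopback() {
    corosync_bind_addrs | grep -q '^127\.'
}

# Example use (not executed here):
#   ss -unlp | corosync_on_loopback && echo "corosync is bound to loopback"
```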

crm_mon # show the state of each node (can be run on both nodes)
Last updated: Sat Jan  3 16:50:33 2015
Last change: Sat Jan  3 16:49:53 2015
Stack: classic openais (with plugin)
Current DC: node3.zhangjian.com - partition with quorum # node3 is the DC and the partition has quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes # two nodes, two votes
0 Resources configured # no resources configured yet
Online: [ node2.test.com node3.test.com ] # node2 and node3 are both online

tail -40 /var/log/cluster/corosync.log # read the log on each node to confirm corosync is working

After startup, run a series of checks to confirm that each component works:

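grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
# run the same command on the other node to confirm the Cluster Engine started properly
tail /var/log/cluster/corosync.log
# check that membership between the two corosync nodes has been initialized; the nodes should start synchronizing cluster state
grep TOTEM /var/log/cluster/corosync.log
# check that the initial membership notifications were sent

grep pcmk_startup /var/log/cluster/corosync.log
# check that the pcmk (short for pacemaker) plugin works

grep ERROR /var/log/cluster/corosync.log
# check for errors during startup
# a message saying that pacemaker should not run as a plugin can be ignored
# the log may also report that the PE (policy engine) is not working properly and suggest running crm_verify -L -V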
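
The greps above can be rolled into one summary per node. A minimal sketch; corosync_log_summary is a hypothetical helper that only counts matching lines, so you still need to read the ERROR lines themselves.

```shell
# corosync_log_summary LOGFILE
# Prints "<pattern>: <number of matching lines>" for each startup check.
corosync_log_summary() {
    log=$1
    for pattern in "Corosync Cluster Engine" "TOTEM" "pcmk_startup" "ERROR"; do
        # grep -c exits non-zero when the count is 0, so ignore its status
        count=$(grep -c -e "$pattern" "$log" || true)
        echo "$pattern: $count"
    done
}

# Example use (not executed here):
#   corosync_log_summary /var/log/cluster/corosync.log
```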

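# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
# STONITH is enabled by default, but this cluster has no STONITH device yet, so the default configuration is not usable as it stands, which is what the check above reports;
# for this experiment the missing STONITH device can be ignored.

   Note: STONITH stands for "Shoot The Other Node In The Head" and is a component of the Heartbeat package. It lets the cluster automatically reset a failed server through a remote power device attached to a healthy server. Put simply, a STONITH device accepts a signal from one host and cuts power to a node that has stopped sending heartbeats, which prevents the nodes from competing for resources.
If we now stop node2, it can no longer send heartbeats, so node3 assumes node2 has failed and immediately takes over as DC, but the surviving partition no longer has quorum (partition WITHOUT quorum). Once node2 is started again, the partition has quorum again (partition with quorum).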
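
The quorum rule behind this behaviour is a strict majority of the expected votes. A minimal sketch of the arithmetic; has_quorum is a hypothetical helper, not a cluster command.

```shell
# has_quorum ONLINE_VOTES EXPECTED_VOTES
# Succeeds when the online votes are more than half of the expected votes.
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# Example use:
#   has_quorum 1 2 || echo "one node out of two: WITHOUT quorum"
```

With two expected votes, a single surviving node holds exactly half and therefore never has quorum, which is why two-node clusters need special handling.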

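Installing the crmsh package:
What is crmsh?
pacemaker itself is only a resource manager; we need an interface to define and manage resources in pacemaker, and crmsh is that configuration interface. Starting with pacemaker 1.1.8, crmsh became an independent project and is no longer shipped with pacemaker. crmsh provides an interactive command-line interface for managing a Pacemaker cluster; it is powerful, easy to use and widely deployed. pcs is a similar tool.
   Note: configuration made through the crm interface is synchronized to every node.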

CentOS 6 does not officially ship a crmsh package.
Download location for corosync 2.x and crmsh for CentOS 6:

http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
# point a yum repository at the address above

Note: crmsh depends on four packages:

pssh.noarch 0:2.3.1-5.el6       python-dateutil.noarch 0:1.4.1-6.el6
python-lxml.x86_64 0:2.2.3-1.1.el6 redhat-rpm-config.noarch 0:9.0.3-42.el6.centos

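Characteristics of crm:
1. Configuration changes take effect only after a commit;

2. A resource must be stopped before it can be deleted;

3. help COMMAND shows the help for that command;

4. As in a Linux shell, TAB completion is supported.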

crm can be used in two ways:
1. Command-line mode:

# crm status

2. Interactive mode:

crm # enter the interactive shell
crm(live)# status

# crm
crm(live)# help # list the commands available at this level
# first-level subcommands
This is crm shell, a Pacemaker command line interface.
Available commands:
cib          manage shadow CIBs # CIB sandbox
resource       resources management # manage the state of resources
configure    CRM cluster configuration # edit the cluster configuration
node          nodes management
options       user preferences
history       CRM cluster history
site          Geo-cluster support
ra             resource agents information center # everything related to resource agents lives under this command
status       show cluster status
help,?       show help (help topics for list of topics) # commands available at the current level
end,cd,up    go back one level
quit,bye,exit exit the program # leave the interactive crm(live) shell

Common subcommands:
resource subcommand # manages the state of resources

crm(live)resource# help
Available commands:
   status       show status of resources
   start          start a resource
   stop          stop a resource
   restart       restart a resource
   promote       promote a master-slave resource
   demote       demote a master-slave resource
   manage       put a resource into managed mode
   unmanage       put a resource into unmanaged mode
   migrate       migrate a resource to another node
   unmigrate    unmigrate a resource to another node
   param          manage a parameter of a resource
   secret       manage sensitive parameters
   meta          manage a meta attribute
   utilization    manage a utilization attribute
   failcount    manage failcounts # failure counters
   cleanup       cleanup resource status
   refresh       refresh CIB from the LRM status # LRM: local resource manager; CIB: cluster information base
   reprobe       probe for resources not started by the CRM
   trace          start RA tracing # RA: resource agent
   untrace       stop RA tracing
   help          show help (help topics for list of topics)
   end          go back one level # back to crm(live)#
   quit          exit the program

configure subcommand # resource stickiness, resource types, resource constraints

crm(live)configure# help
Available commands:
   node          define a cluster node
   primitive    define a resource
   monitor       add monitor operation to a primitive # monitoring options such as the timeout and what to do on failure
   group          define a group # combines several resources into one unit
   clone          define a clone # you can set the total number of clones and how many may run on each node
   ms             define a master-slave resource # only one node in the cluster runs the master; the others run as standby
   rsc_template define a resource template
   location       a location preference # which node a resource prefers to run on; the node with the higher score wins
   colocation    colocate resources # whether resources must run together
   order          order resources # the order in which resources start
   rsc_ticket    resources ticket dependency
   property       set a cluster property
   rsc_defaults set resource defaults # for example stickiness
   fencing_topology node fencing order
   role          define role access rights
   user          define user access rights
   op_defaults    set resource operations defaults
   schema       set or display current CIB RNG schema
   show          display CIB objects
   edit          edit CIB objects # opens the objects in vim
   filter       filter CIB objects
   delete       delete CIB objects
   default-timeouts set timeouts for operations to minimums from the meta-data
   rename       rename a CIB object
   modgroup       modify group
   refresh       refresh from CIB # re-read the CIB
   erase          erase the CIB
   ptest          show cluster actions if changes were committed
   rsctest       test resources as currently configured
   cib          CIB shadow management
   cibstatus    CIB status management and editing
   template       edit and import a configuration from a template
   commit       commit the changes to the CIB
   verify       verify the CIB with crm_verify # syntax check of the CIB
   upgrade       upgrade the CIB to version 1.0
   save          save the CIB to a file # the file is written to the directory you were in before starting crm
   load          import the CIB from a file
   graph          generate a directed graph
   xml          raw xml
   help          show help (help topics for list of topics)
   end          go back one level # back to crm(live)#

node subcommand # node management and status

crm(live)# node
crm(live)node# help
Node management and status commands.
Available commands:
status       show nodes status as XML
show          show node # node status in plain-text form
standby       put node into standby # simulates the node going offline (the node name given after standby must be the FQDN)
online       set node online # bring the node back online
maintenance    put node into maintenance mode
ready          put node into ready mode
fence          fence node
clearstate    Clear node state
delete       delete node
attribute    manage attributes
utilization    manage utilization attributes
status-attr    manage status attributes
help          show help (help topics for list of topics)
end          go back one level
quit          exit the program

ra subcommand # resource agent classes are listed here

crm(live)# ra
crm(live)ra# help
Available commands:
   classes       list classes and providers
   list          list RA for a class (and provider) # the resource agents available in one class
   meta          show meta data for a RA # the parameters a resource agent accepts (e.g. meta ocf:heartbeat:IPaddr2)
   providers    show providers for a RA and a class
   help          show help (help topics for list of topics)
   end          go back one level
   quit          exit the program

show xml prints the complete configuration in XML format

crm(live)configure# show
node node2.test.com
node node3.test.com # the cluster currently has two nodes
property cib-bootstrap-options: \
dc-version=1.1.11-97629de \ # version of the DC
cluster-infrastructure="classic openais (with plugin)" \ # underlying stack (classic openais, with pacemaker running as a plugin)
expected-quorum-votes=2 \ # the cluster expects two votes in total
stonith-enabled=false # STONITH has been disabled

Disable STONITH:

crm(live)# configure
crm(live)configure# property stonith-enabled=false
crm(live)configure# commit

crm_verify -L -V # the check no longer complains about missing STONITH devices
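
To confirm a property from a script rather than by eye, the text printed by `crm configure show` can be filtered. A minimal sketch, assuming the name=value layout shown earlier; cib_property is a hypothetical helper and does not handle values that contain spaces, such as cluster-infrastructure.

```shell
# cib_property NAME
# Reads `crm configure show` output on stdin and prints the value of NAME.
cib_property() {
    # one token per line (split on spaces, tabs and continuation backslashes),
    # then keep the first NAME=value token and strip any quotes
    tr -s ' \t\\' '\n' | grep "^$1=" | head -n 1 | cut -d= -f2- | tr -d '"'
}

# Example use (not executed here):
#   [ "$(crm configure show | cib_property stonith-enabled)" = "false" ]
```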

1. Try configuring a VIP:

crm(live)# configure
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.4.88 nic='eth0' cidr_netmask='16' broadcast='172.16.255.255'
      # ocf:heartbeat can be omitted as long as IPaddr does not exist under more than one resource agent class
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat Jan  3 18:40:23 2015
Last change: Sat Jan  3 18:40:19 2015
Stack: classic openais (with plugin)
Current DC: node3.zhangjian.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured
Online: [ node2.test.com node3.zhangjian.com ]

 webip (ocf::heartbeat:IPaddr): Started node2.test.com # the VIP has been configured successfully

# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:38:49:4b brd ff:ff:ff:ff:ff:ff
inet 172.16.4.22/16 brd 172.16.255.255 scope global eth0
inet 172.16.4.88/16 brd 172.16.255.255 scope global secondary eth0
inet6 fe80::20c:29ff:fe38:494b/64 scope link
   valid_lft forever preferred_lft forever

# The VIP is up. Resources defined in crm are propagated to every node and take effect cluster-wide. If node2 is now put into standby, the VIP moves to the other node:

crm(live)# node
crm(live)node# standby
crm(live)node# cd ..
crm(live)# status
Last updated: Sat Jan  3 18:47:37 2015
Last change: Sat Jan  3 18:46:17 2015
Stack: classic openais (with plugin)
Current DC: node3.zhangjian.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured
Node node2.zhangjian.com: standby # this node is now in standby

Online: [ node3.zhangjian.com ] # only one host remains online

 webip (ocf::heartbeat:IPaddr): Started node3.zhangjian.com # note that the resource has moved to node3

# Checking on node3 now shows that the VIP has moved there.

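crm node online # bring the node back online; the resource does not return, which shows that no failback (moving the resource back) takes place

service corosync stop

# Stopping one of the nodes makes the resource disappear instead of moving to the other node. This is a two-node cluster, so when either node fails the remaining one cannot win a vote, and status shows WITHOUT quorum. There are two ways to solve this:
1. Configure a quorum (tie-breaker) node;

2. Ignore the loss of quorum.

   Note: ignoring quorum can split the cluster (split-brain) and is not recommended in production.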

# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status
Last updated: Sat Jan  3 20:51:19 2015
Last change: Sat Jan  3 20:51:08 2015
Stack: classic openais (with plugin)
Current DC: node2.zhangjian.com - partition WITHOUT quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
1 Resources configured

Online: [ node2.zhangjian.com ]
OFFLINE: [ node3.zhangjian.com ] # the resource keeps running even with one node offline and no quorum

 webip (ocf::heartbeat:IPaddr): Started node2.zhangjian.com

   Note: no-quorum-policy={stop|freeze|suicide|ignore}; the default is stop, so change it to ignore.
   ip addr show eth0 confirms that the resource is active.

Deleting a resource:

crm(live)# resource
crm(live)resource# stop webip # a resource must be stopped before it is deleted
crm(live)resource# cd ..
crm(live)# configure
crm(live)configure# delete webip # delete a CIB object
crm(live)configure# commit # commit for the change to take effect
crm(live)configure# cd ..
crm(live)# status
Last updated: Sat Jan  3 21:01:44 2015
Last change: Sat Jan  3 21:01:39 2015
Stack: classic openais (with plugin)
Current DC: node2.zhangjian.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node2.zhangjian.com node3.zhangjian.com ]

Note: every entry shown by show is a CIB object; you can also run edit to open the configuration in vim and delete entries directly.

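Experiment: define a highly available cluster (with the following three resources)
   1. VIP: 172.16.4.88
   2. The httpd service
   3. Filesystem (NFS)
   4. Constraints that enforce the start order and keep all three resources on the same node

monitor: monitoring a resource
monitor <rsc> [:<role>] <interval>[:<timeout>]

   That is: which resource, which role, how often to check, and how long before a check times out.

Example:
monitor apcfence 60m:60s # check the apcfence resource every 60 minutes with a 60s timeout

   Note: every resource agent has advisory default values for monitoring; the timings we define should not be lower than those defaults.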
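
The interval:timeout notation is easier to compare against the advisory minimums once both parts are in seconds. A minimal sketch; to_seconds and parse_monitor are hypothetical helpers that handle only plain numbers and the s, m and h suffixes.

```shell
# to_seconds VALUE
# Converts "20s", "60m", "2h" or a bare number of seconds to seconds.
to_seconds() {
    value=$1
    number=${value%[smh]}           # strip one trailing unit letter, if any
    case $value in
        *h) echo $(( number * 3600 )) ;;
        *m) echo $(( number * 60 )) ;;
        *)  echo "$number" ;;
    esac
}

# parse_monitor INTERVAL:TIMEOUT
# Prints both parts of a monitor spec in seconds.
parse_monitor() {
    spec=$1
    echo "interval=$(to_seconds "${spec%%:*}") timeout=$(to_seconds "${spec#*:}")"
}
```

For the apcfence example, parse_monitor 60m:60s prints interval=3600 timeout=60.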

For example (getting the default monitoring values of the IPaddr resource agent):

crm(live)# ra
crm(live)ra# info IPaddr
Operations' defaults (advisory minimum):
start       timeout=20s # timeout for start
stop       timeout=20s # timeout for stop
status    timeout=20s interval=10s # status check, run every 10s
monitor    timeout=20s interval=10s # monitor every 10s with a 20s timeout

Define the IP:

crm(live)configure# primitive webip IPaddr params ip=172.16.4.88 op monitor interval=10s timeout=20s
crm(live)configure# verify
crm(live)configure# commit

2. Install httpd on both nodes, give each node a different home page, and disable httpd at boot:

# chkconfig --level 2345 httpd off

Define the httpd resource:

# crm
crm(live)# configure
crm(live)configure# primitive webserver lsb:httpd op monitor interval=30s timeout=15s
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat Jan  3 21:25:49 2015
Last change: Sat Jan  3 21:25:45 2015
Stack: classic openais (with plugin)
Current DC: node2.zhangjian.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node2.zhangjian.com node3.zhangjian.com ]
 webip (ocf::heartbeat:IPaddr): Started node2.zhangjian.com # webip runs on node2

 webserver (lsb:httpd): Started node3.zhangjian.com # webserver runs on node3

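Note: webip and webserver are now running on different nodes; by default the cluster spreads resources as evenly as possible across the nodes.

Two ways to fix this:
group: define the two resources as a group so that they run together as one unit;

colocation: define a colocation constraint so that the two resources must run on the same node.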

crm(live)# configure
crm(live)configure# colocation webserver_with_webip inf: webserver webip # keep the two resources together
crm(live)configure# show # check that the new constraint is present
crm(live)configure# commit
crm(live)configure# cd ..
crm(live)# status

Last updated: Sat Jan  3 21:30:14 2015
Last change: Sat Jan  3 21:30:08 2015
Stack: classic openais (with plugin)
Current DC: node2.zhangjian.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node2.zhangjian.com node3.zhangjian.com ]

 webip (ocf::heartbeat:IPaddr): Started node2.zhangjian.com
 webserver (lsb:httpd): Started node2.zhangjian.com # both resources now run on node2

Define an order constraint:

crm(live)configure# order webip_before_webserver mandatory: webip webserver
crm(live)configure# commit

Note: mandatory means the order is enforced; webip and webserver must start in the order given.
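
The Filesystem (NFS) resource from the experiment list, and the group alternative to the colocation and order constraints, could be sketched as a small crmsh configuration file. Everything specific here is an assumption for illustration: the resource names webstore and webservice, the NFS server nfs.example.com, the export /web, the mount point and the timeouts. Because a group already keeps its members together and starts them in order, remove the two constraints defined above before loading it.

```shell
# write_web_group_config FILE
# Writes a crmsh configuration snippet with a hypothetical NFS Filesystem
# resource and a group that holds webip, the filesystem and webserver.
write_web_group_config() {
    cat > "$1" <<'EOF'
primitive webstore ocf:heartbeat:Filesystem \
    params device="nfs.example.com:/web" directory="/var/www/html" fstype="nfs" \
    op start timeout=60s op stop timeout=60s \
    op monitor interval=20s timeout=40s
group webservice webip webstore webserver
EOF
}

# Example use (not executed here; needs a running cluster where webip and
# webserver are already defined):
#   write_web_group_config /tmp/webservice.crm
#   crm configure load update /tmp/webservice.crm
```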

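A client can now test the setup: browsing to 172.16.4.88 returns the web page from node2.
crm node standby # put node2 into standby
Browse to the address again; the page now comes from node3.

crm node online # bring node2 back online; the resources still do not move back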

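Define a node preference:
crm(live)# configure
crm(live)configure# help location # usage help for location

crm(live)configure# location webip_prefer_node1 webip rule 100: #uname eq node2.zhangjian.com # constraint name, resource name, score 100, node name node2
crm(live)configure# commit

Browsing to the site now returns node2's page because the resource prefers node2. Even if node2 is put into standby and the resource moves away, it comes straight back when node2 returns online: webip has a preference of 100 for node2, and the default preference for every node is 0, so whenever node2 is available the resource runs there.

Stickiness: how strongly a resource prefers to stay on the node where it is currently running.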
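
How a location score and stickiness interact can be illustrated with simple arithmetic. This is a deliberately simplified model, not Pacemaker's actual placement algorithm: pick_node is a hypothetical helper that adds the stickiness to the score of the node currently running the resource and picks the highest total, ignoring every other constraint.

```shell
# pick_node CURRENT STICKINESS NODE=SCORE [NODE=SCORE...]
# Prints the node with the highest total score; the first listed node wins ties.
pick_node() {
    current=$1 stickiness=$2
    shift 2
    best= best_score=
    for pair in "$@"; do
        node=${pair%%=*}
        score=${pair#*=}
        if [ "$node" = "$current" ]; then
            score=$(( score + stickiness ))   # staying put earns the stickiness bonus
        fi
        if [ -z "$best" ] || [ "$score" -gt "$best_score" ]; then
            best=$node best_score=$score
        fi
    done
    echo "$best"
}
```

With the resource on node3 and no stickiness, pick_node node3 0 node2=100 node3=0 prints node2, matching the failback described above. With a stickiness above 100, for instance set through rsc_defaults resource-stickiness=200, pick_node node3 200 node2=100 node3=0 prints node3 and the resource stays where it is.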

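bailulu posted on 2017-8-20 09:44:32

Very detailed write-up.

h378984295 posted on 2018-6-29 10:27:58

After starting corosync and checking the ports, I see every corosync port bound to 127.0.0.1 rather than the host's IP address, so the cluster fails to form and each node can only see itself. What causes this?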
