Building a High-Availability Cluster on CentOS 7 with corosync + pacemaker + crmsh
I. HA cluster framework
Resource types:
primitive (native): a primary/basic resource
group: a group resource that contains several primitives
clone: a cloned resource
master/slave: a master/slave resource
Resource constraint types:
Location constraints: define a resource's preference for particular nodes
Colocation constraints: define whether resources prefer to run together on the same node
Ordering constraints: define dependencies on the order in which multiple resources start
Common HA cluster working models:
A/P: two nodes, active/passive, a primary/standby model
A/A: two nodes, active/active, a dual-primary model
N-M: N > M, N nodes running M services; assuming one service per node, N nodes are active and N-M are standby
When the cluster partitions (split-brain), resource isolation is required. There are two isolation levels:
STONITH: node-level isolation, by cutting a node's power or restarting it
fencing: resource-level isolation, e.g. sending a signal to a switch to deliberately block traffic through a given port
When a partition's vote count falls below half of the total votes, the cluster applies the configured resource-control policy.
II. Building an HA cluster on CentOS 7
Full-lifecycle management tools for a CentOS 7 (corosync v2 + pacemaker) cluster:
pcs: agent-based (pcsd)
crmsh: agentless (pssh)
1. Prerequisites for cluster configuration: time synchronization, mutual access using the hostnames currently in use, and deciding whether a quorum device will be used
#Change the hostname; do this on both hosts (192.168.1.114 is ns2.xinfeng.com, 192.168.1.113 is ns3.xinfeng.com)
# hostnamectl set-hostname ns2.xinfeng.com
# uname -n
ns2.xinfeng.com
# vim /etc/hosts
192.168.1.114 ns2.xinfeng.com
192.168.1.113 ns3.xinfeng.com
#Synchronize the time
# ntpdate s1a.time.edu.cn
# ssh 192.168.1.113 'date';date
The authenticity of host '192.168.1.113 (192.168.1.113)' can't be established.
ECDSA key fingerprint is 09:f9:39:8c:35:4d:ba:2d:13:4f:3c:9c:b1:58:54:ec.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.113' (ECDSA) to the list of known hosts.
root@192.168.1.113's password:
Sat May 28 13:18:07 CST 2016
Sat May 28 13:18:07 CST 2016
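The summary at the end also calls for key-based (passwordless) SSH between the nodes, which the later ansible steps benefit from. A minimal sketch, assuming root SSH logins are permitted (run the mirror-image commands on the other node):
# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# ssh-copy-id root@ns3.xinfeng.com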
2. Install pcs and start the cluster
#192.168.1.113
# yum install pcs
#192.168.1.114
# yum install pcs
#Use ansible to start the service and enable it at boot
# vim /etc/ansible/hosts
[ha]
192.168.1.114
192.168.1.113
# ansible ha -m service -a 'name=pcsd state=started enabled=yes'
192.168.1.113 | SUCCESS => {
"changed": false,
"enabled": true,
"name": "pcsd",
"state": "started"
}
192.168.1.114 | SUCCESS => {
"changed": true,
"enabled": true,
"name": "pcsd",
"state": "started"
}
#Use ansible to check whether the service is running
# ansible ha -m shell -a 'systemctl status pcsd'
192.168.1.114 | SUCCESS | rc=0 >>
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2016-05-28 13:36:19 CST; 2min 32s ago
Main PID: 2736 (pcsd)
CGroup: /system.slice/pcsd.service
├─2736 /bin/sh /usr/lib/pcsd/pcsd start
├─2740 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
└─2741 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
May 28 13:36:16 ns2.xinfeng.com systemd: Starting PCS GUI and remote configuration interface...
May 28 13:36:19 ns2.xinfeng.com systemd: Started PCS GUI and remote configuration interface.
192.168.1.113 | SUCCESS | rc=0 >>
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2016-05-28 13:35:26 CST; 3min 24s ago
Main PID: 2620 (pcsd)
CGroup: /system.slice/pcsd.service
├─2620 /bin/sh /usr/lib/pcsd/pcsd start
├─2624 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
└─2625 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
May 28 13:35:24 ns3.xinfeng.com systemd: Starting PCS GUI and remote configuration interface...
May 28 13:35:26 ns3.xinfeng.com systemd: Started PCS GUI and remote configuration interface.
#Set a password for the hacluster user
# ansible ha -m shell -a 'echo "123" | passwd --stdin hacluster'
192.168.1.113 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
192.168.1.114 | SUCCESS | rc=0 >>
Changing password for user hacluster.
passwd: all authentication tokens updated successfully.
#Authenticate the nodes with the hacluster user and the password 123 set above; mind your iptables rules, or the nodes may be unreachable
# pcs cluster auth ns2.xinfeng.com ns3.xinfeng.com
Username: hacluster
Password:
ns2.xinfeng.com: Authorized
ns3.xinfeng.com: Authorized
#Create the cluster, named xinfengcluster, with 2 nodes
# pcs cluster setup --name xinfengcluster ns2.xinfeng.com ns3.xinfeng.com
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
ns2.xinfeng.com: Succeeded
ns3.xinfeng.com: Succeeded
Synchronizing pcsd certificates on nodes ns2.xinfeng.com, ns3.xinfeng.com...
ns2.xinfeng.com: Success
ns3.xinfeng.com: Success
Restarting pcsd on the nodes in order to reload the certificates...
ns2.xinfeng.com: Success
ns3.xinfeng.com: Success
#Take a look at the configuration file
# cat /etc/corosync/corosync.conf
totem { #cluster communication settings
version: 2 #config version
secauth: off #whether authentication/encryption is enabled
cluster_name: xinfengcluster #cluster name
transport: udpu #transport protocol; udpu, or udp for multicast
}
nodelist { #all nodes in the cluster
node {
ring0_addr: ns2.xinfeng.com
nodeid: 1 #node ID
}
node {
ring0_addr: ns3.xinfeng.com
nodeid: 2
}
}
quorum { #quorum voting
provider: corosync_votequorum #voting system
two_node: 1 #whether this is a 2-node cluster
}
logging { #logging
to_logfile: yes #whether to write a log file
logfile: /var/log/cluster/corosync.log #log file location
to_syslog: yes #whether to also log to syslog
}
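As the comment on transport notes, multicast udp is also possible. For reference only, a hedged sketch of what the totem section might look like in multicast mode; the mcastaddr and mcastport values here are illustrative assumptions, not taken from this setup:
totem {
version: 2
secauth: off
cluster_name: xinfengcluster
transport: udp
interface {
ringnumber: 0
bindnetaddr: 192.168.1.0 #network address of the cluster interface
mcastaddr: 239.255.1.1 #multicast group (assumed value)
mcastport: 5405 #corosync's default multicast port
}
}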
#Start the cluster
# pcs cluster start --all
ns3.xinfeng.com: Starting Cluster...
ns2.xinfeng.com: Starting Cluster...
#Check whether the ns2.xinfeng.com node is up
# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id= 192.168.1.114
status= ring 0 active with no faults
#Check whether the ns3.xinfeng.com node is up
# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id= 192.168.1.113
status= ring 0 active with no faults
#View cluster membership information
# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.1.114)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.1.113)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
# pcs status
Cluster name: xinfengcluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sat May 28 14:38:23 2016 Last change: Sat May 28 14:33:15 2016 by hacluster via crmd on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
#The DC (Designated Controller) is the node that coordinates cluster-wide decisions
2 nodes and 0 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
PCSD Status:
ns2.xinfeng.com: Online
ns3.xinfeng.com: Online
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
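Note that the Daemon Status above reports corosync and pacemaker as active/disabled, meaning the stack will not start after a reboot. If you want it to come up automatically, pcs can enable it on every node in one step:
# pcs cluster enable --all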
3. Configure the cluster with crmsh
Install the yum repository published on openSUSE's build service.
CentOS 6:
cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/network:ha-clustering:Stable.repo
cd
yum -y install crmsh
CentOS 7:
cd /etc/yum.repos.d/
wget http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo
cd
yum -y install crmsh
Configure crmsh on one of the hosts (192.168.1.114):
#Show the current cluster status
# crm status
Last updated: Sat May 28 16:52:52 2016 Last change: Sat May 28 14:33:15 2016 by hacluster via crmd on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 0 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
4. Install httpd on both nodes
# ansible ha -m shell -a 'yum install httpd -y'
#on ns2.xinfeng.com:
# echo "<h1>ns2.xinfeng.com</h1>" > /var/www/html/index.html
#on ns3.xinfeng.com:
# echo "<h1>ns3.xinfeng.com</h1>" > /var/www/html/index.html
#Test that the service starts and the pages look right
# ansible ha -m service -a 'name=httpd state=started enabled=yes'
#On CentOS 6 the service must be stopped and disabled at boot; it is about to become a cluster resource, so make sure it neither runs nor starts on boot
# ansible ha -m service -a 'name=httpd state=stopped enabled=no'
#On CentOS 7 the service must be stopped but left enabled at boot
# ansible ha -m service -a 'name=httpd state=stopped enabled=yes'
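Before handing httpd over to the cluster, it is worth confirming that each node serves its own test page (do this while httpd is still running, i.e. before the stop step above); each request should return the <h1> line written to that node's index.html:
# curl http://192.168.1.114
# curl http://192.168.1.113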
5. Configure the cluster: the VIP is 192.168.1.91 and the service is httpd; both are defined as cluster resources
# crm
crm(live)# ra #enter the resource agent section
crm(live)ra# classes #list the available resource agent classes
lsb
ocf / .isolation heartbeat openstack pacemaker
service
stonith
systemd
crm(live)ra# list systemd #list the services the systemd class can manage; httpd is among them
NetworkManager NetworkManager-wait-online auditd
brandbot corosync cpupower
crond dbus display-manager
dm-event dracut-shutdown ebtables
emergency exim firewalld
getty@tty1 httpd ip6tables
iptables irqbalance kdump
kmod-static-nodes ldconfig libvirtd
lvm2-activation lvm2-lvmetad lvm2-lvmpolld
lvm2-monitor lvm2-pvscan@8:2 microcode
network pacemaker pcsd
plymouth-quit plymouth-quit-wait plymouth-read-write
plymouth-start polkit postfix
rc-local rescue rhel-autorelabel
rhel-autorelabel-mark rhel-configure rhel-dmesg
rhel-import-state rhel-loadmodules rhel-readonly
rsyslog sendmail sshd
sshd-keygen syslog systemd-ask-password-console
systemd-ask-password-plymouth systemd-ask-password-wall systemd-binfmt
systemd-firstboot systemd-fsck-root systemd-hwdb-update
systemd-initctl systemd-journal-catalog-update systemd-journal-flush
systemd-journald systemd-logind systemd-machine-id-commit
systemd-modules-load systemd-random-seed systemd-random-seed-load
systemd-readahead-collect systemd-readahead-done systemd-readahead-replay
systemd-reboot systemd-remount-fs systemd-shutdownd
systemd-sysctl systemd-sysusers systemd-tmpfiles-clean
systemd-tmpfiles-setup systemd-tmpfiles-setup-dev systemd-udev-trigger
systemd-udevd systemd-update-done systemd-update-utmp
systemd-update-utmp-runlevel systemd-user-sessions systemd-vconsole-setup
tuned wpa_supplicant
crm(live)ra# cd
crm(live)# configure
#Define the resource: name webip, IP 192.168.1.91
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=192.168.1.91
crm(live)configure# show
node 1: ns2.xinfeng.com
node 2: ns3.xinfeng.com
primitive webip IPaddr \
params ip=192.168.1.91
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.13-10.el7_2.2-44eb2dd \
cluster-infrastructure=corosync \
cluster-name=xinfengcluster
crm(live)configure# verify #validation fails because no fencing (STONITH) device is defined
ERROR: error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
crm(live)configure# property stonith-enabled=false #disable STONITH since there is no fencing device
crm(live)configure# verify #validate again
crm(live)configure# commit #commit so the configuration takes effect
crm(live)configure# cd
crm(live)# status
Last updated: Sat May 28 17:41:46 2016 Last change: Sat May 28 17:41:31 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 1 resource configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns2.xinfeng.com #the VIP is now active on ns2
crm(live)# quit
bye
# ip addr #verify that the VIP has been added to the NIC
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:91:57:d1 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.114/24 brd 192.168.1.255 scope global dynamic eno16777728
valid_lft 5918sec preferred_lft 5918sec
inet 192.168.1.91/24 brd 192.168.1.255 scope global secondary eno16777728
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe91:57d1/64 scope link
valid_lft forever preferred_lft forever
#Switch the current node to standby
# crm
crm(live)# node
crm(live)node# standby
crm(live)node# cd
crm(live)# status
Last updated: Sat May 28 17:45:41 2016 Last change: Sat May 28 17:45:34 2016 by root via crm_attribute on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 1 resource configured
Node ns2.xinfeng.com: standby
Online: [ ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns3.xinfeng.com
#Bring the node back online
crm(live)# node online
crm(live)# status
Last updated: Sat May 28 17:46:40 2016 Last change: Sat May 28 17:46:37 2016 by root via crm_attribute on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 1 resource configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns3.xinfeng.com
#Define the httpd resource, named webser
# crm
crm(live)# configure
crm(live)configure# primitive webser systemd:httpd
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat May 28 17:50:15 2016 Last change: Sat May 28 17:49:56 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns3.xinfeng.com
webser (systemd:httpd): Started ns2.xinfeng.com
#Put the two resources into group webhttp; startup order is webip, then webser
crm(live)# configure
crm(live)configure# group webhttp webip webser
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat May 28 17:52:48 2016 Last change: Sat May 28 17:52:41 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
Resource Group: webhttp
webip (ocf::heartbeat:IPaddr): Started ns3.xinfeng.com
webser (systemd:httpd): Started ns3.xinfeng.com
With only two nodes, a loss of quorum can leave resources stranded (no failover happens because the surviving partition has no majority). There are four ways around this: 1. add a ping node; 2. add a quorum disk; 3. keep the node count odd; 4. simply ignore the lack of quorum. Here I use the fourth approach; option 1 is sketched after the code block below.
# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
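For comparison, option 1 (a ping node) is typically built with the ocf:pacemaker:ping agent cloned to all nodes, so each node continuously scores its connectivity to a reference host; a location rule on the resulting pingd attribute then steers resources away from disconnected nodes. A rough sketch, assuming the gateway 192.168.1.1 as the ping target:
crm(live)configure# primitive pingd ocf:pacemaker:ping params host_list=192.168.1.1 multiplier=100 op monitor interval=30s
crm(live)configure# clone cl_pingd pingd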
This is still not enough: without monitoring, a resource failure will not trigger a failover; all we can check right now is that the resources started on ns3.xinfeng.com.
Monitoring must be defined together with the resource in the primitive command, so first delete the previously defined resources and recreate them.
# crm
crm(live)# resource
crm(live)resource# show
Resource Group: webhttp
webip (ocf::heartbeat:IPaddr): Started
webser (systemd:httpd): Started
crm(live)resource# stop webhttp #stop all resources
crm(live)resource# show
Resource Group: webhttp
webip (ocf::heartbeat:IPaddr): (target-role:Stopped) Stopped
webser (systemd:httpd): (target-role:Stopped) Stopped
crm(live)configure# edit #edit the resource definitions by hand
node 1: ns2.xinfeng.com \
attributes standby=off
node 2: ns3.xinfeng.com
primitive webip IPaddr \ #delete
params ip=192.168.1.91 #delete
primitive webser systemd:httpd #delete
group webhttp webip webser \ #delete
meta target-role=Stopped #delete
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.13-10.el7_2.2-44eb2dd \
cluster-infrastructure=corosync \
cluster-name=xinfengcluster \
stonith-enabled=false \
no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat May 28 23:32:01 2016 Last change: Sat May 28 23:31:49 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 0 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
6. Redefine the resources, this time with monitoring
crm(live)# configure
#Monitor every 60 seconds with a 20-second timeout; the timeout must not be shorter than the advised minimum, or verify will warn
crm(live)configure# primitive webip ocf:IPaddr params ip=192.168.1.91 op monitor timeout=20s interval=60s
crm(live)configure# primitive webser systemd:httpd op monitor timeout=20s interval=60s
crm(live)configure# group webhttp webip webser
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sat May 28 23:41:03 2016 Last change: Sat May 28 23:40:36 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns2.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
Resource Group: webhttp
webip (ocf::heartbeat:IPaddr): Started ns2.xinfeng.com
webser (systemd:httpd): Started ns2.xinfeng.com
Test it: stop the service, and shortly afterwards (within the monitor interval) it is started again automatically
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno16777728: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:91:57:d1 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.114/24 brd 192.168.1.255 scope global dynamic eno16777728
valid_lft 6374sec preferred_lft 6374sec
inet 192.168.1.91/24 brd 192.168.1.255 scope global secondary eno16777728
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe91:57d1/64 scope link
valid_lft forever preferred_lft forever
# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
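To watch the cluster notice the failure and pull the service back up, pacemaker's crm_mon can snapshot the status; with the 60-second monitor interval defined above, the restart should show up within roughly a minute:
# crm_mon -1 #one-shot status snapshot; omit -1 for the continuously refreshing view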
Now edit the config file and add a few junk lines so that the service cannot start
# vim /etc/httpd/conf/httpd.conf
# service httpd stop
Redirecting to /bin/systemctl stop httpd.service
The service fails over to ns3 as expected
7. [Note] After the httpd service is repaired, remember to clear the resource's failure records, or the resource cannot be started
crm(live)# resource
crm(live)resource# cleanup webser #clear webser's previous failure records
Cleaning up webser on ns2.xinfeng.com, removing fail-count-webser
Cleaning up webser on ns3.xinfeng.com, removing fail-count-webser
Waiting for 2 replies from the CRMd.. OK
crm(live)resource# show
webip (ocf::heartbeat:IPaddr): Started
webser (systemd:httpd): Started
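For the record, you can inspect how often a resource has failed on a given node before cleaning up; crmsh exposes pacemaker's fail counters, for example:
crm(live)resource# failcount webser show ns2.xinfeng.com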
8. Define resource constraints
Delete the group resource
# crm
crm(live)# configure
crm(live)configure# delete webhttp #delete the webhttp group
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sun May 29 09:22:57 2016 Last change: Sun May 29 09:22:48 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns3.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns2.xinfeng.com
webser (systemd:httpd): Started ns3.xinfeng.com
Colocation constraint
# crm
crm(live)# configure
crm(live)configure# colocation webser_with_webip inf: webser webip #webser and webip must run on the same node
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sun May 29 09:40:36 2016 Last change: Sun May 29 09:40:28 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns3.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns2.xinfeng.com #both resources now run on ns2
webser (systemd:httpd): Started ns2.xinfeng.com
Ordering constraint
crm(live)# configure
#webip starts before webser; mandatory means this order is strictly enforced
crm(live)configure# order webip_before_webser mandatory: webip webser
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# show xml #inspect the full configuration in XML
Location constraint
#First, check where the resources currently run
crm(live)# status
Last updated: Sun May 29 09:48:42 2016 Last change: Sun May 29 09:44:35 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns3.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns2.xinfeng.com
webser (systemd:httpd): Started ns2.xinfeng.com
#Define a location constraint so the resources prefer ns3
crm(live)# configure
#Define location constraint webservice_pref_node2: resource webip prefers ns3.xinfeng.com with a score of 100
crm(live)configure# location webservice_pref_node2 webip 100: ns3.xinfeng.com
crm(live)configure# verify
crm(live)configure# commit
crm(live)configure# cd
crm(live)# status
Last updated: Sun May 29 10:12:12 2016 Last change: Sun May 29 10:12:04 2016 by root via cibadmin on ns2.xinfeng.com
Stack: corosync
Current DC: ns3.xinfeng.com (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
2 nodes and 2 resources configured
Online: [ ns2.xinfeng.com ns3.xinfeng.com ]
Full list of resources:
webip (ocf::heartbeat:IPaddr): Started ns3.xinfeng.com #as expected, the resources have moved to ns3
webser (systemd:httpd): Started ns3.xinfeng.com
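Scores can also be negative; -inf means "never run here". As an illustrative example (assumed, not part of the original setup), this would keep webip off ns2 entirely:
crm(live)configure# location webip_never_ns2 webip -inf: ns2.xinfeng.com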
III. Summary
1. After the service behind a resource is repaired, always clear the resource's failure records, or the resource cannot be started
2. For a two-node corosync + pacemaker cluster, set the global properties that disable STONITH and ignore the no-quorum state
3. Watch out for selinux and iptables interfering with the services
4. Make sure the nodes resolve each other through /etc/hosts
5. Keep the nodes' time synchronized
6. Set up key-based (passwordless) communication between the nodes