高可用集群之heartbeat

535234 发表于 2016-8-10 10:00:09

高可用集群的基本概念一、什么是高可用集群：
所谓高可用集群就是在其出现故障时，可以把业务自动转移到其他主机上并让服务正常运行的集群架构。

二、heartbeat的概念
Linux-HA的全称是High-Availability Linux，它是一个开源项目，这个开源项目的目标是：通过社区开发者的共同努力，提供一个增强LInux可靠性（reliability）、可用性（availability）和可服务性（serviceability）（RAS）的群集解决方案。其中heartbeat就是Linux-HA项目中的一个组件，也是目前开源HA项目中比较成功的一个例子，它提供了所有HA软件所需要的基本功能，比如心跳检测和资源接管、监测集群中的系统服务、在集群中的节点间转移共享IP地址的所有者等，自1999年开始到现在，Heartbeat在行业内得到了广泛的使用，也发行了很多的版本，可以从linux-HA的官方网站www.linux-ha.org下载到Heartbeat的最新版本。
三、高可用集群的构架层次
1、后端主机层：这一层主要是正在运行在物理主机上的服务。
2、Message Layer：信息传递层，主要传递心跳信息。
3、Cluster Resources Manager(CRM)：集群资源管理器层，这一层是心跳信息传递层管理器。用于管理信息的传递和收集。
4、Local Resources Manager(LRM)：本地资源管理器，用于对于收集到的心跳信息进行资源决策调整，是否转移服务等等。
5、Resource Agent(RA)：资源代理层，这一层主要是具体启动或停止具体资源的脚本。遵循｛start|stop|restart|status｝服务脚本使用格式。
四、HA集群的工作模式
1、A/P：two node：主被模型，要借助于ping node工作

2、N-M：N个节点M个服务，N>M，活动节点为N，备用节点为N-M
3、N-N：N个节点N个服务，N节点都有服务，如果一个坏了，会有一个节点运行多个服务。
4、A/A：双主模型，两个节点都是活动的，两个节点运行两个不同的服务。也可以是同一个服务，比如ipvs,前端基于DNS轮训。
五、配置Heartbeat集群
1、基于Heartbeat V1版harousce配置httpd高可用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
#安装前准备：
1、ssh多节点互信
# ssh-keygen -t rsa
# ssh-copy-id -i root@node2
2、节点时间一致检查
# date ;ssh node2 'date'
2016年 08月 09日星期二 10:57:30 CST
2016年 08月 09日星期二 10:57:30 CST
3、多节点名称解析
# uname -n;ssh node2 'uname -n'
node1.bjwf.com
node2.bjwf.com
4、安装heartbeat
# yum install perl-TimeDate PyXML libnet net-snmp-libs libtool-ltdl -y
# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm
            heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
Preparing...             ###########################################
1:heartbeat-pils       ########################################### [ 33%]
2:heartbeat-stonith    ########################################### [ 67%]
3:heartbeat          ###########################################
#另一个节点一样的方法
5、准备Heartbeat的配置文件
# cd /usr/share/doc/heartbeat-2.1.4/
# cp authkeys ha.cfharesources /etc/ha.d/
# cd /etc/ha.d/
# vim authkeys #配置节点信息通信密码
auth 2
2 sha1 woleigequ@xyoaq`!;q
# chmod 600 authkeys
# vim ha.cf
logfile /var/log/ha-log #定义日志格式
keepalive 2 #指定心跳使用间隔时间为2秒（即每两秒钟发送一次广播）
deadtime 30#指定备用节点在30s内没有收到主节点的心跳信息号，则立即接管主节点的服务资源
warntime 10#指定心跳延迟的时间为10s，当10s钟内备份节点不能接收到主节点的心跳信号时，就会
往日志中写入一个警告日志，但此时不会切换服务。
udpport 694             #监听端口
mcast eth0 225.7.1.1 694 1 0 #节点多播方式传递心跳信息
auto_failback on#用来定义主节点恢复后，是否将服务自动切回
node node1.bjwf.com #Heartbeat集群节点
node node2.bjwf.com
ping 192.168.120.254 #ping节点仲裁
compression bz2
compression_threshold 2
# vi haresources
node1 192.168.120.10/24/eth0 httpd #以node1为dc，定义集群VIP及资源httpd服务
# scp authkeys ha.cf haresources node2:/etc/ha.d
authkeys          100% 659 0.6KB/s 00:00
ha.cf             100%    10KB10.4KB/s 00:00
haresources       100% 5940 5.8KB/s 00:00
6、安装httpd服务
# yum -y install httpd
# echo "node1.bjwf.com" > /var/www/html/index.html
# chkconfig httpd off
# service httpd stop
停止 httpd：                                           [确定]
# yum -y install httpd
# echo "node2.bjwf.com" > /var/www/html/index.html
# chkconfig httpd off
# service httpd stop
停止 httpd：                                           [确定]
7、启动Heartbeat服务，并测试
# service heartbeat start;ssh node2 'service heartbeat start'
logd is already running
Starting High-Availability services:
2016/08/09_11:55:15 INFO:Resource is stopped
Done.

logd is already running
Starting High-Availability services:
2016/08/09_11:55:15 INFO:Resource is stopped
Done.

# netstat -tunlp;ssh node2 'netstat -tunlp'
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address             Foreign Address          State    PID/Program name
tcp    0    0 0.0.0.0:22                0.0.0.0:*                LISTEN    1331/sshd
tcp    0    0 :::80                   :::*                      LISTEN    2292/httpd
tcp    0    0 :::22                   :::*                      LISTEN    1331/sshd
udp    0    0 0.0.0.0:36099             0.0.0.0:*                            1900/heartbeat: wri
udp    0    0 225.7.1.1:694             0.0.0.0:*                            1900/heartbeat: wri
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address             Foreign Address          State    PID/Program name
tcp    0    0 0.0.0.0:22                0.0.0.0:*                LISTEN    1331/sshd
tcp    0    0 :::80                   :::*                      LISTEN    1642/httpd
tcp    0    0 :::22                   :::*                      LISTEN    1331/sshd
udp    0    0 0.0.0.0:45834             0.0.0.0:*                            1886/heartbeat: wri
udp    0    0 225.7.1.1:694             0.0.0.0:*                            1886/heartbeat: wri
# curl http://192.168.120.10 #在node3上测试
node1.bjwf.com
# ifconfig #查看VIP是否在node1上
eth0    Link encap:EthernetHWaddr 00:0C:29:20:EC:07
      inet addr:192.168.120.100Bcast:192.168.120.255Mask:255.255.255.0

eth0:0 Link encap:EthernetHWaddr 00:0C:29:20:EC:07
      inet addr:192.168.120.10Bcast:192.168.120.255Mask:255.255.255.0
# service heartbeat stop #停掉node1，查看是否会自动转移
Stopping High-Availability services:
Done.
# curl http://192.168.120.10
node2.bjwf.com    #服务已经转移
# ifconfig #VIP已经转移
eth0    Link encap:EthernetHWaddr 00:0C:29:25:07:08
      inet addr:192.168.120.101Bcast:192.168.120.255Mask:255.255.255.0

eth0:0 Link encap:EthernetHWaddr 00:0C:29:25:07:08
      inet addr:192.168.120.10Bcast:192.168.120.255Mask:255.255.255.0
      UP BROADCAST RUNNING MULTICASTMTU:1500Metric:1
8、在Node3上安装nfs服务，实现节点间自动切换
# yum -y install nfs-utils #安装服务
# vim /etc/exports #修改配置文件
/data/html    192.168.120.0/24(rw,no_root_squash,async)
# echo "NFS server" > /data/html/index.html
# service rpcbind start #nfs依赖这个服务
# service nfs start    #启动nfs
# showmount -e 192.168.120.102;ssh node2 'showmount -e 192.168.120.102'
Export list for 192.168.120.102:
/data/html 192.168.120.0/24          #node1上测试
Export list for 192.168.120.102:
/data/html 192.168.120.0/24          #node2上测试
# mount -t nfs 192.168.120.102:/data/html/ /var/www/html
# curl http://192.168.120.100
NFS server
# mount -t nfs 192.168.120.102:/data/html/ /var/www/html
# curl http://192.168.120.101
NFS server
#修改node1和node2的haresources配置文件，是Heartbeat能自动挂载目录
# vim /etc/ha.d/haresources
node1.bjwf.com 192.168.120.10/24/eth0 Filesystem::192.168.120.102:/data/html::/var/www/html::nfs httpd
# scp /etc/ha.d/haresources node2:/etc/ha.d/ #改完后传给node2一份
haresources    100% 6008 5.9KB/s 00:00
停止heartbeat，httpd服务并卸载nfs共享目录之后在重启Heartbeat服务
# umount /var/www/html;ssh node2 'umount /var/www/html’
# service httpd stop;ssh node2 'service httpd stop'
# service heartbeat start;ssh node2 'service heartbeat start'
Starting High-Availability services:
2016/08/09_14:42:59 INFO:Resource is stopped
Done.

Starting High-Availability services:
2016/08/09_14:42:59 INFO:Resource is stopped
Done.                         #重启Heartbeat服务
# ifconfig       #节点一上查看资源情况
eth0    Link encap:EthernetHWaddr 00:0C:29:20:EC:07
      inet addr:192.168.120.100Bcast:192.168.120.255Mask:255.255.255.0
eth0:0 Link encap:EthernetHWaddr 00:0C:29:20:EC:07
      inet addr:192.168.120.10Bcast:192.168.120.255Mask:255.255.255.0
# mount |grep html #文件系统已经挂载
192.168.120.102:/data/html on /var/www/html type nfs (rw,vers=4,addr=192.168.120.102,clientaddr=192.168.120.100)
# curl http://192.168.120.10
NFS server       #服务也正常
# service heartbeat stop #关掉节点1，查看节点2是否已经切换
Stopping High-Availability services:
Done.
# ifconfig
eth0    Link encap:EthernetHWaddr 00:0C:29:25:07:08
      inet addr:192.168.120.101Bcast:192.168.120.255Mask:255.255.255.0
eth0:0 Link encap:EthernetHWaddr 00:0C:29:25:07:08
      inet addr:192.168.120.10Bcast:192.168.120.255Mask:255.255.255.0
# mount |grep html
192.168.120.102:/data/html on /var/www/html type nfs (rw,vers=4,addr=192.168.120.102,clientaddr=192.168.120.101)
# netstat -tnlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address             Foreign Address          State    PID/Program name
tcp    0    0 0.0.0.0:22                0.0.0.0:*                LISTEN    1331/sshd
tcp    0    0 0.0.0.0:48665             0.0.0.0:*                LISTEN    -
tcp    0    0 :::80                   :::*                      LISTEN    3370/httpd
tcp    0    0 :::22                   :::*                      LISTEN    1331/sshd
tcp    0    0 :::43097                :::*                      LISTEN    -
# curl http://192.168.120.10
NFS server
#资源已经正常切换至节点2

2、基于Heartbeat V2版配置httpd高可用（crm、heartbeat-gui工具）

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
1、关闭Heartbeat服务，并修改配置
# service heartbeat stop
# service heartbeat stop
# vim ha.cf
crm on
# scp ha.cf node2:/etc/ha.d
ha.cf                100% 10KB10.4KB/s 00:00
2、安装Heartbeat-gui工具
# yum -y install pygtk2-libglade
# rpm -ivh heartbeat-gui-2.1.4-12.el6.x86_64.rpm
# yum -y install pygtk2-libglade
# rpm -ivh heartbeat-gui-2.1.4-12.el6.x86_64.rpm
3、启动Heartbeat服务
# passwd hacluster
# service heartbeat start
# yum -y install xorg-x11-xauth
# passwd hacluster
# service heartbeat start
# yum -y install xorg-x11-xauth
# hb_gui&    #启动图形报错，这块需要安装xmanager
Traceback (most recent call last):
File "/usr/bin/hb_gui", line 41, in <module>
import gtk, gtk.glade, gobject
File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 64, in <module>
_init()
File "/usr/lib64/python2.6/site-packages/gtk-2.0/gtk/__init__.py", line 52, in _init
_gtk.init_check()
RuntimeError: could not open display
#Xshell里面设置-文件-属性-SSH-隧道-勾上转发X11连接到xmanager

（1）连接上gui工具

（2）查看状态信息

（3）定义一个组资源

#定义一个VIP

#定义一个服务

#启动组资源

1
2
3
#查看运行节点
# curl http://192.168.120.10
node1.bjwf.com

#切换资源

1
2
# curl http://192.168.120.10
node2.bjwf.com

（4）重新定义组资源（NFS）
#注意：这块需要把刚才定义的组资源里面的webserver删了，先添加filesystem资源

#测试一下

1
2
3
4
5
6
7
8
9
10
11
# curl http://192.168.120.10
NFS server
# ifconfig #查看资源
eth0    Link encap:EthernetHWaddr 00:0C:29:25:07:08
      inet addr:192.168.120.101Bcast:192.168.120.255Mask:255.255.255.0
eth0:0 Link encap:EthernetHWaddr 00:0C:29:25:07:08
      inet addr:192.168.120.10Bcast:192.168.120.138Mask:255.255.255.255
# mount |grep --color "html"
192.168.120.102:/data/html on /var/www/html type nfs (rw,vers=4,addr=192.168.120.102,clientaddr=192.168.120.101)
# netstat -tnlp|grep 80
tcp    0    0 :::80                   :::*                      LISTEN    4652/httpd

#切换一下机器
#查看资源使用及web服务

1
2
3
4
5
6
7
8
9
10
11
# curl http://192.168.120.10
NFS server
# ifconfig #可以看到资源已经切换
eth0    Link encap:EthernetHWaddr 00:0C:29:20:EC:07
      inet addr:192.168.120.100Bcast:192.168.120.255Mask:255.255.255.0
eth0:0 Link encap:EthernetHWaddr 00:0C:29:20:EC:07
      inet addr:192.168.120.10Bcast:192.168.120.138Mask:255.255.255.255
# mount |grep --color "html"
192.168.120.102:/data/html on /var/www/html type nfs (rw,vers=4,addr=192.168.120.102,clientaddr=192.168.120.100)
# netstat -tnlp|grep 80
tcp    0    0 :::80                   :::*                      LISTEN    5494/httpd

###今天先写到这，还有高可用mysql，其实是一个道理，所以这就不写了！以上仅为个人学习整理，如有错漏，大神勿喷~~~

页: [1]

运维网's Archiver

高可用集群之heartbeat