087988 posted on 2016-9-23 09:03:52

Deploying Heartbeat on CentOS 6.5

Environment:

Hostname                 Role              IP addresses                                                VIP
heartbeat01.contoso.com  Heartbeat node 1  eth0: 192.168.49.133, eth1: 172.16.49.133 (heartbeat link)  172.16.49.100
heartbeat02.contoso.com  Heartbeat node 2  eth0: 192.168.49.134, eth1: 172.16.49.134 (heartbeat link)  172.16.49.100

I. Preparation
Unless stated otherwise, run the following steps on both servers.

# Stop the iptables firewall and disable SELinux
/etc/init.d/iptables stop
chkconfig iptables off
sed -i '/^SELINUX/s/enforcing/disabled/' /etc/selinux/config
setenforce 0

# Set up time synchronization
crontab -e    # add the scheduled job below, then save and quit with :wq
0 * * * * /usr/sbin/ntpdate 210.72.145.44 64.147.116.229 time.nist.gov
# Or append the job directly to root's crontab file:
echo '0 * * * * /usr/sbin/ntpdate 210.72.145.44 64.147.116.229 time.nist.gov' >>/var/spool/cron/root
crontab -l    # check that the scheduled job exists
0 * * * * /usr/sbin/ntpdate 210.72.145.44 64.147.116.229 time.nist.gov

# Set the hostname (shown for heartbeat01; do the same on heartbeat02 with its own name)
sed -i '/^HOSTNAME/s/^/#/' /etc/sysconfig/network
sed -i '/#HOSTNAME/aHOSTNAME=heartbeat01.contoso.com' /etc/sysconfig/network
grep HOSTNAME /etc/sysconfig/network
hostname heartbeat01.contoso.com
# Or:
sed -i '/^HOSTNAME/d' /etc/sysconfig/network
echo 'HOSTNAME=heartbeat01.contoso.com' >>/etc/sysconfig/network
grep HOSTNAME /etc/sysconfig/network
hostname heartbeat01.contoso.com

# Edit /etc/hosts (the IP address and the hostname must be separated by whitespace)
echo -e '192.168.49.133\theartbeat01.contoso.com\n192.168.49.134\theartbeat02.contoso.com' >>/etc/hosts
tail -2 /etc/hosts

# Add a host route to the peer over the heartbeat link
/sbin/route add -host 172.16.49.134 dev eth1 # run on heartbeat01
echo '/sbin/route add -host 172.16.49.134 dev eth1' >>/etc/rc.local # run on heartbeat01
/sbin/route add -host 172.16.49.133 dev eth1 # run on heartbeat02
echo '/sbin/route add -host 172.16.49.133 dev eth1' >>/etc/rc.local # run on heartbeat02
route -n # afterwards, check the routing table on both heartbeat01 and heartbeat02
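
Heartbeat depends on both node names resolving correctly, so the /etc/hosts entries are worth checking before moving on. The following is a minimal sketch, assuming a helper named check_hosts_file (a hypothetical name, not part of Heartbeat) that only accepts a name when it appears as its own whitespace-separated field after an address:

```shell
# check_hosts_file FILE NAME...
# Hypothetical helper: succeeds only if every NAME appears as a separate
# field after the address on a non-comment line of FILE.
check_hosts_file() {
    local file name
    file=$1
    shift
    for name in "$@"; do
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit(found ? 0 : 1) }' "$file"; then
            echo "missing or malformed entry: $name" >&2
            return 1
        fi
    done
}
```

It could be called as: check_hosts_file /etc/hosts heartbeat01.contoso.com heartbeat02.contoso.com
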




II. Installing Heartbeat

rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum -y install 'heartbeat*'




III. Editing the Heartbeat configuration files
1) Copy the sample configuration files

cp /usr/share/doc/heartbeat-3.0.4/{ha.cf,haresources,authkeys} /etc/ha.d/
ll /etc/ha.d/
cd /etc/ha.d/
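
The copy command above hard-codes version 3.0.4 in the documentation path. If a different package version is installed, the directory name will differ. A minimal sketch, assuming a hypothetical helper find_sample_dir that searches /usr/share/doc by default, could locate the samples instead:

```shell
# find_sample_dir [BASE]
# Hypothetical helper: prints the first BASE/heartbeat-* directory that
# contains all three sample files. BASE defaults to /usr/share/doc.
find_sample_dir() {
    local base dir f ok
    base=${1:-/usr/share/doc}
    for dir in "$base"/heartbeat-*/; do
        ok=1
        for f in ha.cf haresources authkeys; do
            [ -f "$dir$f" ] || ok=0
        done
        if [ "$ok" -eq 1 ]; then
            printf '%s\n' "${dir%/}"
            return 0
        fi
    done
    return 1
}
```

For example: src=$(find_sample_dir) && cp "$src"/ha.cf "$src"/haresources "$src"/authkeys /etc/ha.d/
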




2) Configure authkeys
# egrep -v "#|^$" authkeys
auth 2
2 sha1 c6091592594cd14c
# egrep -v "#|^$" authkeys
auth 2
2 sha1 c6091592594cd14c
# The configuration is identical on both nodes
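
Note that Heartbeat checks the permissions of authkeys and will not start unless the file is readable by its owner only (mode 600). The key itself is an arbitrary shared secret. A minimal sketch for generating one, assuming a hypothetical helper write_authkeys and a random 16-hex-digit key (the same length as the example key above):

```shell
# write_authkeys FILE
# Hypothetical helper: writes an "auth 2 / 2 sha1 <key>" file with a
# random 16-hex-digit key and restricts it to mode 600.
write_authkeys() {
    local file key
    file=$1
    # 8 random bytes rendered as 16 hex digits
    key=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
    ( umask 077; printf 'auth 2\n2 sha1 %s\n' "$key" >"$file" )
    chmod 600 "$file"
}
```

Run it on one node only (write_authkeys /etc/ha.d/authkeys) and copy the resulting file to the other node, since both nodes must share the same key.
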
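3) Configure ha.cf
The two nodes' configurations are shown below, using unicast as the example:
# egrep -v "#|^$" ha.cf
debugfile /var/log/ha-debug                       # location of the debug log
logfile        /var/log/ha-log                    # location of the log file
logfacility        local1                         # syslog facility used for logging
keepalive 2                                       # interval between heartbeat packets, in seconds
deadtime 30                                       # time after which the peer is declared dead
warntime 10                                       # time after which a late-heartbeat warning is logged
initdead 60                                       # grace period after startup before a node can be declared dead
ucast eth1 172.16.49.134                          # heartbeat interface and the peer's IP address on that link
auto_failback on                                  # when the preferred owner of a resource recovers, it takes the resource back
node        heartbeat01.contoso.com               # node 1; the name must match the output of uname -n
node        heartbeat02.contoso.com               # node 2
ping 172.16.49.1                                  # third-party ping (arbitration) node
respawn hacluster /usr/lib64/heartbeat/ipfail     # run ipfail as user hacluster; it uses the ICMP ping node to detect loss of connectivity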
# egrep -v "#|^$" ha.cf
debugfile /var/log/ha-debug
logfile        /var/log/ha-log
logfacility        local1
keepalive 2
deadtime 30
warntime 10
initdead 60
ucast eth1 172.16.49.133
auto_failback on
node        heartbeat01.contoso.com
node        heartbeat02.contoso.com
ping 172.16.49.1
respawn hacluster /usr/lib64/heartbeat/ipfail
# The only difference between the two nodes is the unicast peer IP; everything else is identical
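
Because the two ha.cf files differ only in the ucast peer address, both can be produced from one template. A minimal sketch, assuming a hypothetical helper render_ha_cf that takes the peer's heartbeat-link IP as its only argument and otherwise reproduces the settings listed above:

```shell
# render_ha_cf PEER_IP
# Hypothetical helper: prints the ha.cf from this post with the ucast
# peer address filled in.
render_ha_cf() {
    local peer_ip
    peer_ip=$1
    cat <<EOF
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local1
keepalive 2
deadtime 30
warntime 10
initdead 60
ucast eth1 $peer_ip
auto_failback on
node heartbeat01.contoso.com
node heartbeat02.contoso.com
ping 172.16.49.1
respawn hacluster /usr/lib64/heartbeat/ipfail
EOF
}
```

On heartbeat01 this would be render_ha_cf 172.16.49.134 >/etc/ha.d/ha.cf, and on heartbeat02 render_ha_cf 172.16.49.133 >/etc/ha.d/ha.cf.
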
4) Configure haresources
echo 'heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1' >>/etc/ha.d/haresources
# egrep -v "#|^$" haresources
heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
# egrep -v "#|^$" haresources
heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
# The configuration is identical on both nodes
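
A haresources line names the preferred node first, then the resource and its arguments. For the IPaddr resource used here, the arguments are the VIP, the prefix length and the interface. A minimal sketch that splits such a line into its parts, assuming a hypothetical helper describe_ipaddr_resource and a single space between the node name and the resource:

```shell
# describe_ipaddr_resource LINE
# Hypothetical helper: splits a haresources line of the form
#   <node> IPaddr::<ip>/<prefix>/<device>
# into its parts and prints them one per line.
describe_ipaddr_resource() {
    local node spec args ip rest
    node=${1%% *}
    spec=${1#* }
    case $spec in
        IPaddr::*/*/*) ;;
        *) echo "unexpected resource spec: $spec" >&2; return 1 ;;
    esac
    args=${spec#IPaddr::}
    ip=${args%%/*}
    rest=${args#*/}
    printf 'node=%s\nip=%s\nprefix=%s\ndevice=%s\n' \
        "$node" "$ip" "${rest%%/*}" "${rest#*/}"
}
```

The helper rejects a line in which the node name and the resource have run together, which is an easy mistake to make when the file is written with echo.
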
IV. Starting Heartbeat and testing

/etc/init.d/heartbeat start # run on both heartbeat01 and heartbeat02




Check for the VIP on each of the two nodes:
# ip addr |grep 172.16.49
    inet 172.16.49.133/24 brd 172.16.49.255 scope global eth1
    inet 172.16.49.100/24 brd 172.16.49.255 scope global secondary eth1
# ip addr |grep 172.16.49
    inet 172.16.49.134/24 brd 172.16.49.255 scope global eth1
Next, stop the heartbeat service on heartbeat01 and check again:
# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.

# ip addr |grep 172.16.49
    inet 172.16.49.133/24 brd 172.16.49.255 scope global eth1
# ip addr |grep 172.16.49
    inet 172.16.49.134/24 brd 172.16.49.255 scope global eth1
    inet 172.16.49.100/24 brd 172.16.49.255 scope global secondary eth1
As shown, the VIP has moved from heartbeat01 to heartbeat02.

While the VIP fails over, pinging it from another host shows only a very brief interruption.
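
To put a rough number on that interruption, the VIP can be probed from a third host while the service is stopped on the active node. A minimal sketch, assuming a hypothetical helper count_lost_pings and the Linux iputils ping options -c (count) and -W (reply timeout in seconds):

```shell
# count_lost_pings VIP COUNT [INTERVAL]
# Hypothetical helper: sends COUNT single pings to VIP, INTERVAL seconds
# apart (default 1), and prints how many of them got no reply.
count_lost_pings() {
    local vip count interval lost i
    vip=$1
    count=$2
    interval=${3:-1}
    lost=0
    i=0
    while [ "$i" -lt "$count" ]; do
        ping -c 1 -W 1 "$vip" >/dev/null 2>&1 || lost=$((lost + 1))
        sleep "$interval"
        i=$((i + 1))
    done
    echo "$lost"
}
```

For example, start count_lost_pings 172.16.49.100 30 on the third host, then stop heartbeat on heartbeat01. With the default interval, the printed number approximates the outage in seconds.
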
V. Checking the logs

/var/log/ha-log
================================================
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Comm_now_up(): updating status to active
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Local status now set to: 'active'
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,499)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498gid 499 (pid 7312)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Status update for node heartbeat02.contoso.com: status active
harc(default):        2016/09/22_05:44:26 info: Running /etc/ha.d//rc.d/status status
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: : info: Asking other side for ping node count.
Sep 22 05:44:36 heartbeat01.contoso.com ipfail: : info: No giveup timer to abort.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: Initial resource acquisition complete (T_RESOURCES(us))
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:37 INFO:Resource is stopped
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: Local Resource acquisition completed.
harc(default):        2016/09/22_05:44:37 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default):        2016/09/22_05:44:37 received ip-request-resp IPaddr::172.16.49.100/24/eth1 OK yes
ResourceManager(default):        2016/09/22_05:44:37 info: Acquiring resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:37 INFO:Resource is stopped
ResourceManager(default):        2016/09/22_05:44:37 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 start
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO: Adding inet address 172.16.49.100/24 with broadcast address 172.16.49.255 to device eth1
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO: Bringing device eth1 up
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.49.100 eth1 172.16.49.100 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO:Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: : info: Heartbeat shutdown in progress. (7284)
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: : info: Giving up all HA resources.
ResourceManager(default):        2016/09/22_05:44:40 info: Releasing resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
ResourceManager(default):        2016/09/22_05:44:40 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 stop
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:40 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:40 INFO:Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: : info: All HA resources relinquished.
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: : WARN: 1 lost packet(s) for
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: : info: No pkts missing from heartbeat02.contoso.com!
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: : info: killing /usr/lib64/heartbeat/ipfail process group 7312 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBFIFO process 7288 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBWRITE process 7289 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBREAD process 7290 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBWRITE process 7291 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBREAD process 7292 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7292 exited. 5 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7289 exited. 4 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7290 exited. 3 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7291 exited. 2 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7288 exited. 1 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: heartbeat01.contoso.com Heartbeat shutdown complete.

/var/log/ha-debug
================================================
Sep 22 05:44:14 heartbeat01.contoso.com heartbeat: : info: **************************
Sep 22 05:44:14 heartbeat01.contoso.com heartbeat: : info: Configuration validated. Starting heartbeat 3.0.4
Sep 22 05:44:14 heartbeat01.contoso.com heartbeat: : info: heartbeat: version 3.0.4
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: Heartbeat generation: 1474533038
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ucast: bound send socket to device: eth1
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ucast: set SO_REUSEPORT(w)
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ucast: bound receive socket to device: eth1
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ucast: set SO_REUSEPORT(w)
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ucast: started on port 694 interface eth1 to 172.16.49.134
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: glib: ping heartbeat started.
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: G_main_add_TriggerHandler: Added signal manual handler
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: G_main_add_TriggerHandler: Added signal manual handler
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: G_main_add_SignalHandler: Added signal handler for signal 17
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: Local status now set to: 'up'
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: Link 172.16.49.1:172.16.49.1 up.
Sep 22 05:44:15 heartbeat01.contoso.com heartbeat: : info: Status update for node 172.16.49.1: status ping
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Link heartbeat02.contoso.com:eth1 up.
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Status update for node heartbeat02.contoso.com: status up
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default):        2016/09/22_05:44:26 info: Running /etc/ha.d//rc.d/status status
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Comm_now_up(): updating status to active
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Local status now set to: 'active'
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,499)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : debug: get_delnodelist: delnodelist=
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Starting "/usr/lib64/heartbeat/ipfail" as uid 498gid 499 (pid 7312)
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : info: Status update for node heartbeat02.contoso.com: status active
Sep 22 05:44:26 heartbeat01.contoso.com heartbeat: : debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default):        2016/09/22_05:44:26 info: Running /etc/ha.d//rc.d/status status
Sep 22 05:44:26 heartbeat01.contoso.com ipfail: : debug: PID=7312
Sep 22 05:44:26 heartbeat01.contoso.com ipfail: : debug: Signing in with heartbeat
Sep 22 05:44:27 heartbeat01.contoso.com ipfail: : debug:
Sep 22 05:44:27 heartbeat01.contoso.com ipfail: : debug: auto_failback -> 1 (on)
Sep 22 05:44:27 heartbeat01.contoso.com ipfail: : debug: Setting message filter mode
Sep 22 05:44:28 heartbeat01.contoso.com ipfail: : debug: Starting node walk
Sep 22 05:44:29 heartbeat01.contoso.com ipfail: : debug: Cluster node: 172.16.49.1: status: ping
Sep 22 05:44:29 heartbeat01.contoso.com ipfail: : debug: Cluster node: heartbeat02.contoso.com: status: active
Sep 22 05:44:30 heartbeat01.contoso.com ipfail: : debug:
Sep 22 05:44:30 heartbeat01.contoso.com ipfail: : debug: Cluster node: heartbeat01.contoso.com: status: active
Sep 22 05:44:31 heartbeat01.contoso.com ipfail: : debug: Setting message signal
Sep 22 05:44:31 heartbeat01.contoso.com ipfail: : debug: Waiting for messages...
Sep 22 05:44:32 heartbeat01.contoso.com ipfail: : debug: Got join message from another ipfail client. (heartbeat02.contoso.com)
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: : debug: Found ping node 172.16.49.1!
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: : info: Asking other side for ping node count.
Sep 22 05:44:33 heartbeat01.contoso.com ipfail: : debug: Message sent.
Sep 22 05:44:36 heartbeat01.contoso.com ipfail: : info: No giveup timer to abort.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: remote resource transition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: Initial resource acquisition complete (T_RESOURCES(us))
Sep 22 05:44:37 heartbeat01.contoso.com ipfail: : debug: Other side is now stable.
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:37 INFO:Resource is stopped
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : info: Local Resource acquisition completed.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : debug: StartNextRemoteRscReq(): child count 1
Sep 22 05:44:37 heartbeat01.contoso.com ipfail: : debug: Other side is now stable.
Sep 22 05:44:37 heartbeat01.contoso.com heartbeat: : debug: notify_world: setting SIGCHLD Handler to SIG_DFL
harc(default):        2016/09/22_05:44:37 info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
ip-request-resp(default):        2016/09/22_05:44:37 received ip-request-resp IPaddr::172.16.49.100/24/eth1 OK yes
ResourceManager(default):        2016/09/22_05:44:37 info: Acquiring resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:37 INFO:Resource is stopped
ResourceManager(default):        2016/09/22_05:44:37 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 start
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO: Adding inet address 172.16.49.100/24 with broadcast address 172.16.49.255 to device eth1
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO: Bringing device eth1 up
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-172.16.49.100 eth1 172.16.49.100 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:38 INFO:Success
INFO:Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: : info: Heartbeat shutdown in progress. (7284)
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: : info: Giving up all HA resources.
ResourceManager(default):        2016/09/22_05:44:40 info: Releasing resource group: heartbeat01.contoso.com IPaddr::172.16.49.100/24/eth1
ResourceManager(default):        2016/09/22_05:44:40 info: Running /etc/ha.d/resource.d/IPaddr 172.16.49.100/24/eth1 stop
IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:40 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_172.16.49.100):        2016/09/22_05:44:40 INFO:Success
INFO:Success
Sep 22 05:44:40 heartbeat01.contoso.com heartbeat: : info: All HA resources relinquished.
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: : WARN: 1 lost packet(s) for
Sep 22 05:44:41 heartbeat01.contoso.com ipfail: : debug: Other side is now stable.
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: : info: No pkts missing from heartbeat02.contoso.com!
Sep 22 05:44:41 heartbeat01.contoso.com heartbeat: : info: killing /usr/lib64/heartbeat/ipfail process group 7312 with signal 15
ARPING 172.16.49.100 from 172.16.49.100 eth1
Sent 5 probes (5 broadcast(s))
Received 0 response(s)
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBFIFO process 7288 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBWRITE process 7289 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBREAD process 7290 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBWRITE process 7291 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: killing HBREAD process 7292 with signal 15
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7292 exited. 5 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7289 exited. 4 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7290 exited. 3 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7291 exited. 2 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: Core process 7288 exited. 1 remaining
Sep 22 05:44:43 heartbeat01.contoso.com heartbeat: : info: heartbeat01.contoso.com Heartbeat shutdown complete.
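
In logs this long, the lines that matter most for a failover test are the ResourceManager entries that acquire or release the resource group. A minimal sketch for pulling out just those, assuming a hypothetical helper resource_events and the log line layout shown above:

```shell
# resource_events FILE
# Hypothetical helper: prints "<timestamp> <Acquiring|Releasing>" for each
# ResourceManager resource-group transition found in a Heartbeat log.
resource_events() {
    awk '/^ResourceManager/ && /(Acquiring|Releasing) resource group/ {
        for (i = 3; i <= NF; i++)
            if ($i == "Acquiring" || $i == "Releasing") print $(i - 2), $i
    }' "$1"
}
```

For example: resource_events /var/log/ha-log
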

