[iyunv@node1 ~]# tail -f /var/log/messages
Nov 26 07:52:21 node1 heartbeat: [3688]: info: Configuration validated. Starting heartbeat 2.0.8
Nov 26 07:52:21 node1 heartbeat: [3689]: info: heartbeat: version 2.0.8
Nov 26 07:52:21 node1 heartbeat: [3689]: info: Heartbeat generation: 3
Nov 26 07:52:21 node1 heartbeat: [3689]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 26 07:52:21 node1 heartbeat: [3689]: info: G_main_add_TriggerHandler: Added signal manual handler
Nov 26 07:52:21 node1 heartbeat: [3689]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
Nov 26 07:52:21 node1 heartbeat: [3689]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 - Status: 1
Nov 26 07:52:21 node1 heartbeat: [3689]: info: glib: ping heartbeat started.
Nov 26 07:52:21 node1 heartbeat: [3689]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Nov 26 07:52:21 node1 heartbeat: [3689]: info: Local status now set to: 'up'
Nov 26 07:52:22 node1 heartbeat: [3689]: info: Link node1:eth1 up.
Nov 26 07:52:23 node1 heartbeat: [3689]: info: Link 192.168.60.1:192.168.60.1 up.
Nov 26 07:52:23 node1 heartbeat: [3689]: info: Status update for node 192.168.60.1: status ping
此段日志是Heartbeat在进行初始化配置,例如,Heartbeat的心跳时间间隔、UDP广播端口和ping节点的运行状态等,日志信息到 这里会暂停,等待120秒之后,Heartbeat会继续输出日志,而这个120秒刚好是ha.cf中"initdead"选项的设定时间。此时 Heartbeat的输出信息如下:
Nov 26 07:54:22 node1 heartbeat: [3689]: WARN: node node2: is dead
Nov 26 07:54:22 node1 heartbeat: [3689]: info: Comm_now_up(): updating status to active
Nov 26 07:54:22 node1 heartbeat: [3689]: info: Local status now set to: 'active'
Nov 26 07:54:22 node1 heartbeat: [3689]: info: Starting child client "/usr/lib/heartbeat/ipfail" (694,694)
Nov 26 07:54:22 node1 heartbeat: [3689]: WARN: No STONITH device configured.
Nov 26 07:54:22 node1 heartbeat: [3689]: WARN: Shared disks are not protected.
Nov 26 07:54:22 node1 heartbeat: [3689]: info: Resources being acquired from node2.
Nov 26 07:54:22 node1 heartbeat: [3712]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 694 gid 694 (pid 3712)
在上面这段日志中,由于node2还没有启动,因此会给出"node2: is dead"的警告信息,接下来启动了Heartbeat插件ipfail。由于我们在ha.cf文件中没有配置STONITH,因此日志里也给出了"No STONITH device configured"的警告提示。
继续看下面的日志:
Nov 26 07:54:23 node1 harc[3713]: info: Running /etc/ha.d/rc.d/status status
Nov 26 07:54:23 node1 mach_down[3735]: info: /usr/lib/ heartbeat/mach_down: nice_failback: foreign resources acquired
Nov 26 07:54:23 node1 mach_down[3735]: info: mach_down takeover complete for node node2.
Nov 26 07:54:23 node1 heartbeat: [3689]: info: mach_down takeover complete.
Nov 26 07:54:23 node1 heartbeat: [3689]: info: Initial resource acquisition complete (mach_down)
Nov 26 07:54:24 node1 IPaddr[3768]: INFO: Resource is stopped
Nov 26 07:54:24 node1 heartbeat: [3714]: info: Local Resource acquisition completed.
Nov 26 07:54:24 node1 harc[3815]: info: Running /etc/ha. d/rc.d/ip-request-resp ip-request-resp
Nov 26 07:54:24 node1 ip-request-resp[3815]: received ip- request-resp 192.168.60.200/24/eth0 OK yes
Nov 26 07:54:24 node1 ResourceManager[3830]: info: Acquiring resource group: node1 192.168.60.200/24/eth0 Filesystem: :/dev/sdb5::/webdata::ext3
Nov 26 07:54:24 node1 IPaddr[3854]: INFO: Resource is stopped
Nov 26 07:54:25 node1 ResourceManager[3830]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.60.200/24/eth0 start
Nov 26 07:54:25 node1 IPaddr[3932]: INFO: Using calculated netmask for 192.168.60.200: 255.255.255.0
Nov 26 07:54:25 node1 IPaddr[3932]: DEBUG: Using calculated broadcast for 192.168.60.200: 192.168.60.255
Nov 26 07:54:25 node1 IPaddr[3932]: INFO: eval /sbin/ifconfig eth0:0 192.168.60.200 netmask 255.255.255.0 broadcast 192.168.60.255
Nov 26 07:54:25 node1 avahi-daemon[1854]: Registering new address record for 192.168.60.200 on eth0.
Nov 26 07:54:25 node1 IPaddr[3932]: DEBUG: Sending Gratuitous Arp for 192.168.60.200 on eth0:0 [eth0]
Nov 26 07:54:26 node1 IPaddr[3911]: INFO: Success
Nov 26 07:54:26 node1 Filesystem[4021]: INFO: Resource is stopped
Nov 26 07:54:26 node1 ResourceManager[3830]: info: Running /etc/ha.d/resource.d/ Filesystem/dev/sdb5 /webdata ext3 start
Nov 26 07:54:26 node1 Filesystem[4062]: INFO: Running start for /dev/sdb5 on /webdata
Nov 26 07:54:26 node1 kernel: kjournald starting. Commit interval 5 seconds
Nov 26 07:54:26 node1 kernel: EXT3 FS on sdb5, internal journal
Nov 26 07:54:26 node1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov 26 07:54:26 node1 Filesystem[4059]: INFO: Success
Nov 26 07:54:33 node1 heartbeat: [3689]: info: Local Resource acquisition completed. (none)
Nov 26 07:54:33 node1 heartbeat: [3689]: info: local resource transition completed
上面这段日志是进行资源的监控和接管,主要完成haresources文件中的设置,在这里是启用集群虚拟IP和挂载磁盘分区。
Nov 26 07:57:15 node2 heartbeat: [2110]: info: Link node1:eth1 up.
Nov 26 07:57:15 node2 heartbeat: [2110]: info: Status update for node node1: status active
Nov 26 07:57:15 node2 heartbeat: [2110]: info: Link node1:eth0 up.
Nov 26 07:57:15 node2 harc[2123]: info: Running /etc/ha.d/rc.d/status status
Nov 26 07:57:15 node2 heartbeat: [2110]: info: Comm_now_up(): updating status to active
Nov 26 07:57:15 node2 heartbeat: [2110]: info: Local status now set to: 'active'
Nov 26 07:57:15 node2 heartbeat: [2110]: info: Starting child client "/usr/lib/heartbeat/ipfail" (694,694)
Nov 26 07:57:15 node2 heartbeat: [2110]: WARN: G _CH_dispatch_int: Dispatch function for read child took too long to execute: 70 ms (> 50 ms) (GSource: 0x8f62080)
Nov 26 07:57:15 node2 heartbeat: [2134]: info: Starting "/usr/lib/heartbeat/ipfail" as uid 694 gid 694 (pid 2134)
备份节点检测到node1处于活动状态,没有可以接管的资源,因此仅仅启动了网络监听插件ipfail,监控主节点的心跳。
Nov 26 09:04:09 node1 heartbeat: [3689]: info: Link node2:eth0 dead.
Nov 26 09:04:09 node1 heartbeat: [3689]: info: Link 192.168.60.1:192.168.60.1 dead.
Nov 26 09:04:09 node1 ipfail: [3712]: info: Status update: Node 192.168.60.1 now has status dead
Nov 26 09:04:09 node1 harc[4279]: info: Running /etc/ha.d/rc.d/status status
Nov 26 09:04:10 node1 ipfail: [3712]: info: NS: We are dead. :<
Nov 26 09:04:10 node1 ipfail: [3712]: info: Link Status update: Link node2/eth0 now has status dead
…… 中间部分省略 ……
Nov 26 09:04:20 node1 heartbeat: [3689]: info: node1 wants to go standby [all]
Nov 26 09:04:20 node1 heartbeat: [3689]: info: standby: node2 can take our all resources
Nov 26 09:04:20 node1 heartbeat: [4295]: info: give up all HA resources (standby).
Nov 26 09:04:21 node1 ResourceManager[4305]: info: Releasing resource group: node1 192.168.60.200/24/eth0 Filesystem::/dev/sdb5::/webdata::ext3
Nov 26 09:04:21 node1 ResourceManager[4305]: info: Running /etc/ha.d/resource.d/ Filesystem/dev/sdb5 /webdata ext3 stop
Nov 26 09:04:21 node1 Filesystem[4343]: INFO: Running stop for /dev/sdb5 on /webdata
Nov 26 09:04:21 node1 Filesystem[4343]: INFO: Trying to unmount /webdata
Nov 26 09:04:21 node1 Filesystem[4343]: INFO: unmounted /webdata successfully
Nov 26 09:04:21 node1 Filesystem[4340]: INFO: Success
Nov 26 09:04:22 node1 ResourceManager[4305]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.60.200/24/eth0 stop
Nov 26 09:04:22 node1 IPaddr[4428]: INFO: /sbin/ifconfig eth0:0 192.168.60.200 down
Nov 26 09:04:22 node1 avahi-daemon[1854]: Withdrawing address record for 192.168.60.200 on eth0.
Nov 26 09:04:22 node1 IPaddr[4407]: INFO: Success
备用节点在接管主节点资源时的日志信息如下:
Nov 26 09:02:58 node2 heartbeat: [2110]: info: Link node1:eth0 dead.
Nov 26 09:02:58 node2 ipfail: [2134]: info: Link Status update: Link node1/eth0 now has status dead
Nov 26 09:02:59 node2 ipfail: [2134]: info: Asking other side for ping node count.
Nov 26 09:02:59 node2 ipfail: [2134]: info: Checking remote count of ping nodes.
Nov 26 09:03:02 node2 ipfail: [2134]: info: Telling other node that we have more visible ping nodes.
Nov 26 09:03:09 node2 heartbeat: [2110]: info: node1 wants to go standby [all]
Nov 26 09:03:10 node2 heartbeat: [2110]: info: standby: acquire [all] resources from node1
Nov 26 09:03:10 node2 heartbeat: [2281]: info: acquire all HA resources (standby).
Nov 26 09:03:10 node2 ResourceManager[2291]: info: Acquiring resource group: node1 192.168.60.200/24/eth0 Filesystem::/dev/sdb5::/webdata::ext3
Nov 26 09:03:10 node2 IPaddr[2315]: INFO: Resource is stopped
Nov 26 09:03:11 node2 ResourceManager[2291]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.60.200/24/eth0 start
Nov 26 09:03:11 node2 IPaddr[2393]: INFO: Using calculated netmask for 192.168.60.200: 255.255.255.0
Nov 26 09:03:11 node2 IPaddr[2393]: DEBUG: Using calculated broadcast for 192.168.60.200: 192.168.60.255
Nov 26 09:03:11 node2 IPaddr[2393]: INFO: eval /sbin/ifconfig eth0:0 192.168.60.200 netmask 255.255.255.0 broadcast 192.168.60.255
Nov 26 09:03:12 node2 avahi-daemon[1844]: Registering new address record for 192.168.60.200 on eth0.
Nov 26 09:03:12 node2 IPaddr[2393]: DEBUG: Sending Gratuitous Arp for 192.168.60.200 on eth0:0 [eth0]
Nov 26 09:03:12 node2 IPaddr[2372]: INFO: Success
Nov 26 09:03:12 node2 Filesystem[2482]: INFO: Resource is stopped
Nov 26 09:03:12 node2 ResourceManager[2291]: info: Running /etc/ha.d/resource.d/ Filesystem/dev/sdb5 /webdata ext3 start
Nov 26 09:03:13 node2 Filesystem[2523]: INFO: Running start for /dev/sdb5 on /webdata
Nov 26 09:03:13 node2 kernel: kjournald starting. Commit interval 5 seconds
Nov 26 09:03:13 node2 kernel: EXT3 FS on sdb5, internal journal
Nov 26 09:03:13 node2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Nov 26 09:03:13 node2 Filesystem[2520]: INFO: Success
3.在主节点上拔去电源线