Using Watchdog in Heartbeat
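The kernel has its own way of dealing with a hung system, called the watchdog. The watchdog is simply a kernel module that checks a timer to determine whether the system is healthy. If the watchdog decides that the kernel has hung, it can respond drastically, for example by rebooting the system. If you want to protect your high-availability server configuration from a server hang that interrupts service without Heartbeat detecting it, you should enable the watchdog in your kernel.
Note that we are talking about the server hanging, not about application problems. Heartbeat (in versions before Heartbeat 2, which was not yet available when this book was written) does not monitor the resources or applications it controls to see whether they are healthy. For that you need another package, such as Mon, which is discussed in detail in Part 4. Normally, a watchdog device connected to the system lets the kernel determine whether the system has hung: when the kernel sees that the external timer device is not being updated correctly, it knows something has gone wrong.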
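The watchdog code also supports replacing the external hardware timer with software, called softdog. Softdog maintains an internal timer that is updated whenever another process writes to the /dev/watchdog device file. If softdog does not see a process write to /dev/watchdog before the timer runs out, it assumes the kernel must have failed and initiates a kernel panic. Normally a kernel panic halts the system, but you can change this default behavior so that the system reboots instead.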
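Here is a minimal bash sketch of the "another process writes to /dev/watchdog" side. The function `pet_watchdog` and its arguments are hypothetical and are not part of Heartbeat or the watchdog package. The function opens the device once, keeps the descriptor open, and writes one byte per interval. The driver treats each of those writes as an update.

It also follows the Linux watchdog driver convention called "magic close": writing the character `V` just before closing asks the driver to stop the timer. A process killed with SIGKILL never gets to write it. In that case the driver logs "Unexpected close, not stopping watchdog!" and keeps counting down. If softdog was loaded with `nowayout=1`, even a magic close does not stop the timer.

```bash
#!/bin/bash
# Hypothetical keepalive client for a Linux watchdog device (sketch only).
# Usage: pet_watchdog DEVICE INTERVAL_SECONDS COUNT
pet_watchdog() {
    local dev=$1 interval=$2 count=$3 i
    # Open the device once and keep the descriptor; every write on it
    # counts as a keepalive and restarts the driver's countdown.
    exec 3>"$dev" || return 1
    for ((i = 0; i < count; i++)); do
        printf '.' >&3
        sleep "$interval"
    done
    # Magic close: writing 'V' right before closing asks the driver to
    # stop the timer (ignored when the module was loaded with nowayout=1).
    printf 'V' >&3
    exec 3>&-
}
```

Do not point this at /dev/watchdog while Heartbeat is running. Softdog accepts only one opener at a time, so the open fails with "Device or resource busy". To see what the function writes, run it against an ordinary file, for example `pet_watchdog /tmp/fake-watchdog 0 3`.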
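When you enable the watchdog option in /etc/ha.d/ha.cf, Heartbeat writes to the /dev/watchdog file (or device) at an interval equal to deadtime. As a result, if anything causes Heartbeat to fail to update the watchdog device, the watchdog initiates a kernel panic once the watchdog timeout period (one minute by default) expires.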
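Suppose Heartbeat writes once per deadtime and the watchdog times out after one minute by default, as described above. Then deadtime must be shorter than the watchdog timeout, or the timer could expire between two normal writes.

The hypothetical helper below checks this against an ha.cf file. It makes two assumptions. First, deadtime is given as a plain number of seconds. Second, the timeout is softdog's `soft_margin`, which defaults to 60 seconds.

```bash
#!/bin/bash
# Hypothetical sanity check for /etc/ha.d/ha.cf (sketch only).
# Usage: check_hacf FILE [MARGIN_SECONDS]   (MARGIN defaults to 60)
check_hacf() {
    local cf=$1 margin=${2:-60} wd dt
    # Commented-out lines start with '#', so their first field never
    # equals the bare keyword.
    wd=$(awk '$1 == "watchdog" { print $2; exit }' "$cf")
    dt=$(awk '$1 == "deadtime" { print $2; exit }' "$cf")
    if [ -z "$wd" ]; then
        echo "watchdog disabled"
        return 1
    fi
    # Assumption: deadtime is an integer number of seconds (no "ms" suffix).
    case $dt in
        '' | *[!0-9]*) echo "cannot parse deadtime: '$dt'"; return 1 ;;
    esac
    if [ "$dt" -ge "$margin" ]; then
        echo "deadtime ${dt}s is not below the ${margin}s watchdog timeout"
        return 1
    fi
    echo "ok: $wd, deadtime ${dt}s < ${margin}s"
}
```

For example, `check_hacf /etc/ha.d/ha.cf 60` prints a one-line verdict and returns non-zero when the watchdog line is still commented out or deadtime is too long.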
The configuration steps are as follows:
1. Install the watchdog package:
# yum install watchdog*
Loaded plugins: rhnplugin, security
This system is not registered with RHN.
RHN support will be disabled.
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package watchdog.i386 0:5.6-1.el5 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
====================================================================================================================================
Package Arch Version Repository Size
====================================================================================================================================
Installing:
watchdog i386 5.6-1.el5 base 66k
Transaction Summary
====================================================================================================================================
Install 1 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 66 k
Is this ok [y/N]: y
Downloading Packages:
watchdog-5.6-1.el5.i386.rpm                              |  66 kB     00:00
Running rpm_check_debug
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : watchdog 1/1
Installed:
watchdog.i386 0:5.6-1.el5
2. Enable the watchdog option:
# vi /etc/ha.d/ha.cf
Remove the leading # from the following line:
watchdog /dev/watchdog
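Uncommenting the watchdog line can also be scripted instead of done in vi. This is a minimal sketch that assumes GNU sed (for `-i` and `\+`) and a line in the commented-out form `#watchdog /dev/watchdog`. The function name `enable_watchdog` is hypothetical.

The sed expression only uncomments a `watchdog` line whose argument starts with `/`. Prose comments that merely mention the watchdog are left alone.

```bash
#!/bin/bash
# Hypothetical helper: uncomment the "watchdog /dev/..." line in ha.cf.
# Requires GNU sed (-i for in-place editing, \+ in a basic regex).
enable_watchdog() {
    local cf=${1:-/etc/ha.d/ha.cf}
    # Edit in place, keeping the original as FILE.bak.
    sed -i.bak 's|^#[[:space:]]*\(watchdog[[:space:]]\+/\)|\1|' "$cf"
}
```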
Confirm that the softdog module is loaded:
# lsmod
Module                  Size  Used by
softdog                 9941  2
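The lsmod output above shows softdog already loaded, but the steps do not say how it was loaded. One way, assumed here, is to load it by hand with `modprobe softdog soft_margin=60`. `soft_margin` is softdog's timeout in seconds, and 60 is its default, matching the one-minute timeout mentioned earlier.

The helper name below is hypothetical. It reads lsmod-style text on stdin, so it can be checked without root.

```bash
#!/bin/bash
# Hypothetical helper: succeed if module NAME appears in lsmod output on stdin.
is_module_loaded() {
    # Skip the "Module Size Used by" header; match the first column exactly.
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Typical use on a real system (needs root):
#   lsmod | is_module_loaded softdog || modprobe softdog soft_margin=60
```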
3. Test (simulate a sudden Heartbeat crash):
# ps -ef|grep heartbeat
root      6384     1  0 15:04 ?        00:00:00 heartbeat: master control process
nobody    6387  6384  0 15:04 ?        00:00:00 heartbeat: FIFO reader
nobody    6388  6384  0 15:04 ?        00:00:00 heartbeat: write: ucast eth0
nobody    6389  6384  0 15:04 ?        00:00:00 heartbeat: read: ucast eth0
nobody    6390  6384  0 15:04 ?        00:00:00 heartbeat: write: ucast eth1
nobody    6391  6384  0 15:04 ?        00:00:00 heartbeat: read: ucast eth1
nobody    6392  6384  0 15:04 ?        00:00:00 heartbeat: write: ping 172.18.4.50
nobody    6393  6384  0 15:04 ?        00:00:00 heartbeat: read: ping 172.18.4.50
root      6420  5515  0 15:06 pts/1    00:00:00 grep heartbeat
# killall -9 heartbeat
# ps -ef|grep heartbeat
root      6430  5515  0 15:09 pts/1    00:00:00 grep heartbeat
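`ps -ef | grep heartbeat` always matches the grep command itself, which is why one line remains after the kill. Below is a minimal sketch of a check that ignores that line. It assumes, as in the output above, that every Heartbeat process has a title beginning with `heartbeat:`. The helper name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: count Heartbeat processes in `ps -ef` output on stdin.
# Heartbeat process titles look like "heartbeat: master control process";
# the "grep heartbeat" line has no colon, so it is not counted.
count_heartbeat_procs() {
    # grep -c still prints 0 when nothing matches; "|| true" keeps the
    # exit status at 0 so callers running under "set -e" are not aborted.
    grep -c 'heartbeat: ' || true
}

# On a live system:
#   ps -ef | count_heartbeat_procs
```

On a live system, `pgrep -f 'heartbeat: '` is a shorter alternative, because pgrep never lists itself.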
4. Watch the log:
# tail -f /var/log/messages
Aug  2 15:09:39 Server kernel: SoftDog: Unexpected close, not stopping watchdog!
We found that the system had rebooted.
Reference:
http://www.ixdba.net/article/97/2036.html (Using Watchdog and Softdog in Heartbeat)