
[Experience Sharing] A full record of troubleshooting frequent, random server reboots caused by a motherboard fault

Posted on 2014-11-17 13:22:59
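Server: HP DL385 G7
Operating system: SUSE 10 SP3
Database: Oracle 11gR2
Cluster software: VCS, two-node active/standby
Environment: two servers running an Oracle database with active/standby failover managed by VCS
Symptoms:
1. Both database hosts rebooted frequently at irregular intervals, and the operating system's messages log recorded nothing at the time of any reboot;
2. At system startup, hardware-related error messages appeared in the messages log.
Messages log contents: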
-------------------------------------------------------------------------------------------------------------
Oct 27 17:51:01 linux10 /usr/sbin/cron[5968]: (CRON) STARTUP (V5.0)
Oct 27 17:51:02 linux10 sshd[6047]: Server listening on 0.0.0.0 port 22.
Oct 27 17:51:02 linux10 rcpowersaved: CPU frequency scaling is not supported by your processor.
Oct 27 17:51:02 linux10 rcpowersaved: enter 'CPUFREQ_ENABLED=no' in /etc/powersave/cpufreq to avoid this warning.
Oct 27 17:51:02 linux10 rcpowersaved: Cannot load cpufreq governors - No cpufreq driver available
Oct 27 17:51:03 linux10 rcpowersaved: s2ram does not know your machine. See 's2ram -i' for details. (127)
Oct 27 17:51:03 linux10 rcpowersaved: Use SUSPEND2RAM_FORCE=yes to override this detection.
Oct 27 17:51:03 linux10 modprobe: FATAL: Error running install command for binfmt_misc
Oct 27 17:51:06 linux10 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 27 17:51:06 linux10 kernel: Floppy drive(s): fd0 is 1.44M
Oct 27 17:51:06 linux10 syslog-ng[5762]: Changing permissions on special file /dev/xconsole
Oct 27 17:51:06 linux10 syslog-ng[5762]: Changing permissions on special file /dev/tty10
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-10 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-11 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-12 - disabling barriers
Oct 27 17:51:06 linux10 kernel: AppArmor: AppArmor initialized
Oct 27 17:51:06 linux10 kernel: audit(1414403451.182:2): info="AppArmor initialized" pid=4403
Oct 27 17:51:06 linux10 kernel: floppy0: no floppy controllers found
Oct 27 17:51:06 linux10 kernel: ACPI: Power Button (FF) [PWRF]
Oct 27 17:51:06 linux10 kernel: rdac: device handler unregistered
Oct 27 17:51:06 linux10 kernel: No dock devices found.
Oct 27 17:51:06 linux10 kernel: bnx2: eth0: using MSI
Oct 27 17:51:06 linux10 kernel: bnx2: eth1: using MSI
Oct 27 17:51:06 linux10 kernel: Ethernet Channel Bonding Driver: v3.2.5 (March 21, 2008)
Oct 27 17:51:06 linux10 kernel: bonding: Warning: either miimon or arp_interval and arp_ip_target module parameters must be specified, otherwise bonding will not detect link failures! see bonding.txt for details.
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-8 - disabling barriers
Oct 27 17:51:06 linux10 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Oct 27 17:51:06 linux10 kernel: bonding: bond0: setting mode to active-backup (1).
Oct 27 17:51:06 linux10 kernel: bonding: bond0: Setting MII monitoring interval to 100.
Oct 27 17:51:06 linux10 kernel: bonding: bond0: Setting use_carrier to 0.
Oct 27 17:51:06 linux10 kernel: bnx2: eth0: using MSI
Oct 27 17:51:06 linux10 kernel: bonding: bond0: enslaving eth0 as a backup interface with a down link.
Oct 27 17:51:06 linux10 kernel: bnx2: eth1: using MSI
Oct 27 17:51:06 linux10 kernel: bonding: bond0: enslaving eth1 as a backup interface with a down link.
Oct 27 17:51:06 linux10 kernel: audit(1414403461.814:3): audit_pid=5906 old=0 by auid=4294967295
Oct 27 17:51:06 linux10 kernel: llt: module not supported by Novell, setting U taint flag.
Oct 27 17:51:06 linux10 kernel: LLT INFO V-14-1-10009 LLT 5.1.100.000-SP1GA Protocol available
Oct 27 17:51:06 linux10 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Oct 27 17:51:06 linux10 kernel: bonding: bond0: link status definitely up for interface eth1.
Oct 27 17:51:06 linux10 kernel: bonding: bond0: making interface eth1 the new active one.
Oct 27 17:51:06 linux10 kernel: bonding: bond0: first active interface up!
Oct 27 17:51:06 linux10 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
Oct 27 17:51:06 linux10 kernel: powernow-k8: Found 4 AMD Opteron(tm) Processor 6134 processors (16 cpu cores) (version 2.20.00)
Oct 27 17:51:06 linux10 kernel: powernow-k8: MP systems not supported by PSB BIOS structure
…………
Oct 28 18:10:01 linux10 /usr/sbin/cron[17099]: (root) CMD (/usr/sbin/ntpdate 172.29.141.162)
Oct 28 18:11:14 linux10 zmd: ShutdownManager (WARN): Preparing to sleep...
Oct 28 18:11:15 linux10 zmd: ShutdownManager (WARN): Going to sleep, waking up at 10/29/2014 17:41:08
Oct 28 18:11:49 linux10 syslog-ng[5762]: Error connecting to remote host AF_INET(172.29.141.162:5140), reattempting in 60 seconds
…………
-----------------------------------------------------------------------
The log above shows two issues:
Issue 1: zmd: ShutdownManager (WARN): Preparing to sleep...
Issue 2:
Oct 27 17:51:02 linux10 rcpowersaved: CPU frequency scaling is not supported by your processor.
Oct 27 17:51:02 linux10 rcpowersaved: enter 'CPUFREQ_ENABLED=no' in /etc/powersave/cpufreq to avoid this warning.
Oct 27 17:51:02 linux10 rcpowersaved: Cannot load cpufreq governors - No cpufreq driver available
…………
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-10 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-11 - disabling barriers
Oct 27 17:51:06 linux10 kernel: JBD: barrier-based sync failed on dm-12 - disabling barriers
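
On a busy host these messages are buried among thousands of other lines. The following is a minimal sketch, not the method used in this post, of how the two groups of messages above could be pulled out of a messages log with Python. The group names (`zmd_sleep`, `cpufreq`, `jbd_barrier`) and the function name `group_messages` are made up for illustration; the patterns match only the message text quoted above, so they do not depend on the timestamp or hostname columns.

```python
import re
from collections import defaultdict

# Patterns copied from the messages quoted in this post.
# The dictionary keys are labels invented for this sketch.
PATTERNS = {
    "zmd_sleep": re.compile(r"zmd: ShutdownManager \(WARN\)"),
    "cpufreq": re.compile(r"rcpowersaved: .*(cpufreq|CPU frequency)"),
    "jbd_barrier": re.compile(r"JBD: barrier-based sync failed"),
}


def group_messages(lines):
    """Return {label: [matching lines]} for the patterns above.

    Each line is assigned to the first pattern that matches it;
    lines matching no pattern are ignored.
    """
    groups = defaultdict(list)
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                groups[label].append(line.rstrip("\n"))
                break
    return dict(groups)
```

A typical use would be to open the log file and pass the file object directly, for example `group_messages(open("/var/log/messages"))`, then print the number of lines in each group.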

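Issue 1 is caused by the ZMD service. Novell describes the zmd service as follows:
The zmd daemon performs software management functions on the ZENworks managed device, including updating, installing, and removing software, and performing basic queries of the device's package management database. Typically, these management tasks are initiated through the ZENworks Control Center or the rug, zen-installer, zen-updater, or zen-remover utilities, which means you should not need to interact directly with zmd.
The ZMD service mainly handles software update and installation management and starts automatically at boot. Once started, it goes online to check for updates every six hours by default and occupies port 80 while updating, so it often causes port conflicts with servers such as Tomcat. Once software has been installed or updated, the service can therefore be stopped: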
# /etc/init.d/novell-zmd status
Checking for ZENworks Management Daemon:                             running
# /etc/init.d/novell-zmd stop
Shutting down ZENworks Management Daemon                              done
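Note: with this service stopped, installing software becomes more cumbersome, so start it again when it is needed. While updating, the service may also hold a lock on /etc/mtab for a long time, so keep that in mind.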
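
To check whether a port conflict of the kind described above is actually happening, it helps to test whether something is already listening on port 80 before starting Tomcat or a similar server. The following is a minimal Python sketch; the function name `port_in_use` and its defaults are assumptions made for illustration. It only reports whether a TCP connection succeeds and says nothing about which process owns the port.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds.

    A successful connection means some process is listening there.
    connect_ex() returns 0 on success and an error code otherwise,
    so a refused or timed-out connection yields False.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(80)` on the affected host before and after stopping novell-zmd would show whether the port is freed.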
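Solution:
After the novell-zmd service was stopped, these log entries no longer appeared.
To speed up booting, the novell-zmd service is sometimes also removed from the boot sequence:
chkconfig --del novell-zmd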

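Issue 2:
Taken on their own, the log messages only say that the CPU does not support frequency scaling. Since neither the operating system logs nor the VCS logs showed any other anomaly, I suspected a server hardware problem. A look in the machine room showed that the fault LED with the electrical symbol on the server's front panel was lit orange-red. At that point I could relax, because the hardware was clearly at fault. I collected the hardware logs and contacted HP, who confirmed a motherboard fault. After the motherboard was replaced, the server did not reboot again.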
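
The key clue in this case was that each reboot left no trace in the messages log. The following is a minimal Python sketch, not the method used in this post, of how such unannounced reboots could be located in a log. The boot marker is the klogd line from the log quoted in this post. The shutdown marker `shutdown[` is a hypothetical placeholder, because this post does not show what a clean shutdown logs on this system; replace it with the text your own system writes.

```python
# Boot marker as it appears in the log quoted in this post.
BOOT_MARKER = "klogd 1.4.1, log source = /proc/kmsg started."
# Hypothetical placeholder: replace with whatever your system
# actually logs during a clean shutdown.
SHUTDOWN_MARKER = "shutdown["


def find_unclean_boots(lines, lookback=50,
                       boot_marker=BOOT_MARKER,
                       shutdown_marker=SHUTDOWN_MARKER):
    """Return indexes of boot lines not preceded by a shutdown marker.

    For each boot line, only the lines since the previous boot line
    (at most `lookback` of them) are searched, so a clean shutdown
    from an earlier boot cycle cannot hide a later crash.
    """
    unclean = []
    previous_boot = -1
    for i, line in enumerate(lines):
        if boot_marker not in line:
            continue
        start = max(previous_boot + 1, i - lookback)
        if not any(shutdown_marker in prev for prev in lines[start:i]):
            unclean.append(i)
        previous_boot = i
    return unclean
```

Note that the first boot in a log file is reported as unclean whenever the file does not contain the shutdown that preceded it, so the first index may be a false positive after log rotation.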



Original thread: https://www.yunweiku.com/thread-30446-1-1.html