Implementing a Nagios script to monitor disk and RAID status on hardware RAID

9780j · Posted on 2016-2-15 15:14:30

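Once a Linux server is running on hardware RAID, the state of the individual disks is hard to determine from the operating system, whereas on Windows it can be checked with MegaRAID. This script uses MegaCli to identify which disk in the array is failing or has bad blocks. Hooked into Nagios, it runs periodically and raises alerts by email, SMS, and so on. I have tested the script on several physical machines without problems, so I am sharing it here and hope we can discuss it and learn from each other.

1. Install MegaCli: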

rpm -ivh megacli-8.00.46-2.x86_64.rpm

2. Add the script to Nagios monitoring:
Run visudo and, below the line root ALL=(ALL) ALL, add the following line:

nagios ALL=(ALL) NOPASSWD: /usr/local/nagios/libexec/check_raid.sh

Also comment out the following line, so that sudo works without a terminal:

#Defaults requiretty

Put the script in the /usr/local/nagios/libexec directory, make it executable with chmod +x check_raid.sh, then edit /usr/local/nagios/etc/nrpe.cfg and add:

command[check_raid]=/usr/bin/sudo /usr/local/nagios/libexec/check_raid.sh

Restart nrpe (the exact commands may differ depending on how it was installed):

# pkill nrpe
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
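
After editing nrpe.cfg, a quick sanity check can confirm that the command definition is really active (not commented out) before going on. This is only a minimal sketch: has_nrpe_command is a hypothetical helper name, not something NRPE provides, and the demo runs against a temporary file rather than the real config.

```shell
# has_nrpe_command FILE NAME
# Hypothetical helper (not part of NRPE): succeeds if FILE contains an
# active, uncommented "command[NAME]=" line.
has_nrpe_command() {
    grep -q "^command\[$2]=" "$1"
}

# Demo against a throwaway file holding the line added to nrpe.cfg.
cfg=$(mktemp)
printf '%s\n' 'command[check_raid]=/usr/bin/sudo /usr/local/nagios/libexec/check_raid.sh' > "$cfg"
if has_nrpe_command "$cfg" check_raid; then
    echo "check_raid is defined"
else
    echo "check_raid is missing"
fi
rm -f "$cfg"
```

On a real host the call would be has_nrpe_command /usr/local/nagios/etc/nrpe.cfg check_raid.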

3. The monitoring script:

#!/bin/bash
# Program:
#   Monitor RAID disk state for Nagios
# History:
# ------ First release
# Runs under bash because OUTPUT+= is not available in plain sh.

# Check whether the controller is an LSI card
rcexist=$(dmesg | grep RAID | grep LSI)
if [ -z "$rcexist" ]; then
    echo "not LSI or no raid"
    exit 2
fi

OUTPUT=''

# Determine the RAID type
R1=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-1, Secondary-0, RAID Level Qualifier-0")
R0=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-0, Secondary-0, RAID Level Qualifier-0")
R5=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-5, Secondary-0, RAID Level Qualifier-3")
R10=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-1, Secondary-3, RAID Level Qualifier-0")
if [ "$R1" -ge 2 ]; then
    OUTPUT+="RAID10 "
elif [ "$R1" -eq 1 ]; then
    OUTPUT+="RAID1 "
fi
if [ "$R0" -ne 0 ]; then
    OUTPUT+="RAID0 "
fi
if [ "$R5" -ne 0 ]; then
    OUTPUT+="RAID5 "
fi
if [ "$R10" -ne 0 ]; then
    OUTPUT+="RAID10 "
fi
# The if tests above were fine-tuned based on documentation and real-world results

# Total number of disks in the RAID
DiskNum=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep -c "Non Coerced Size")
OUTPUT+="Total Disk: $DiskNum "

# Number of healthy (online) disks in the RAID
OnlineDisk=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep -c "Online")
OUTPUT+="online: $OnlineDisk"
if [ "$DiskNum" -ne "$OnlineDisk" ]; then
    echo "CRITICAL: $OUTPUT"
    exit 2
fi

# Are there any failed disks?
FailDisk=$(/usr/sbin/MegaCli -AdpAllInfo -aALL | grep "Failed Disks" | awk '{print $4}')
if [ "$FailDisk" -eq 0 ]; then
    OUTPUT+=" failed disk:0 "
else
    OUTPUT+=" failed disk:$FailDisk"
    echo "CRITICAL: $OUTPUT"
    exit 2
fi

# Disks with predictive-failure warnings, and their positions
CriticalDisk=$(/usr/sbin/MegaCli -AdpAllInfo -aALL | grep "Critical Disks" | awk '{print $4}')
if [ "$CriticalDisk" -eq 0 ]; then
    OUTPUT+="critidisk is 0"
else
    # Join the three matching lines per disk into one record, then print the
    # position (slot number + 1) when the predictive failure count ($4) is non-zero
    CriDisk=$(/usr/sbin/MegaCli -cfgdsply -aALL | grep -E 'Predictive|Slot' | awk \
        '{if(NR%3){printf "%s:", $0}else{print $0}}' | awk -F':' '{if($4!=0){print $2+1}}')
    OUTPUT+=" critidisk in $CriDisk slot"
    echo "WARNING: $OUTPUT"
    exit 1
fi

# Sum the media error and other error counters across all disks
MediaErrcount=$(/usr/sbin/MegaCli -pdlist -aALL | grep -E "Media Error" | awk -F':' -v errcount=0 \
    '{errcount+=$2}END{print errcount}')
OtherErrcount=$(/usr/sbin/MegaCli -pdlist -aALL | grep -E "Other Error" | awk -F':' -v errcount=0 \
    '{errcount+=$2}END{print errcount}')
# Positions of the disks with errors
if [ "$MediaErrcount" -ne 0 ] || [ "$OtherErrcount" -ne 0 ]; then
    mDoD=$(/usr/sbin/MegaCli -pdlist -aALL | grep -E "Media Error|Other Error|Slot" | awk \
        '{if(NR%3){printf "%s:", $0}else{print $0}}' | awk -F':' '{if($4!=0||$6!=0){print $2+1}}')
    OUTPUT+=" bad block in $mDoD"
    echo "CRITICAL: $OUTPUT"
    exit 2
else
    OUTPUT+=" mediaerr:0 othererr:0"
fi

# Is every logical drive in the Optimal state?
raidstate=$(/usr/sbin/MegaCli -LDInfo -Lall -aAll | grep 'State' | awk -F':' '{print $2}' | \
    sort | uniq | sed -e "s/^[ ]*//" | awk '{if($0 != "Optimal"){print "bad"}}')
# Empty means no non-Optimal state was found; this also covers the case where
# several different bad states each print "bad"
if [ -z "$raidstate" ]; then
    OUTPUT+=" raidstate:ok"
else
    OUTPUT+=" raidstate:bad"
    echo "CRITICAL: $OUTPUT"
    exit 2
fi
# Remove the log file MegaCli leaves in the working directory
rm -f ./MegaSAS.log
echo "$OUTPUT"
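
The least obvious part of the script is the pair of awk commands that turn three consecutive lines per disk into one record. The sketch below replays that step on made-up sample text, so it can be tried on a machine without a RAID controller. The sample_pdlist data and the bad_slots function name are assumptions for illustration; the text only imitates the three field names the script greps for and is not real MegaCli output.

```shell
# Hypothetical sample: the three lines per disk that
# grep -E "Media Error|Other Error|Slot" would keep.
sample_pdlist() {
    cat <<'EOF'
Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Slot Number: 1
Media Error Count: 12
Other Error Count: 0
Slot Number: 2
Media Error Count: 0
Other Error Count: 3
EOF
}

# Join every three input lines into one colon-separated record, then print
# slot number + 1 for each disk whose media ($4) or other ($6) error
# counter is non-zero.
bad_slots() {
    awk '{ if (NR % 3) { printf "%s:", $0 } else { print $0 } }' |
        awk -F':' '{ if ($4 != 0 || $6 != 0) { print $2 + 1 } }'
}

sample_pdlist | bad_slots
```

With this sample it prints 2 and 3: slot 1 has media errors and slot 2 has other errors, and the script reports positions counted from 1 (that is the $2 + 1). Note that the join only works while exactly three lines per disk match the grep pattern.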

The check result looks like this:

RAID5 Total Disk: 4 online: 4 failed disk:0 critidisk is 0 mediaerr:0 othererr:0 raidstate:ok
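
Nagios decides the state from the plugin's exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) rather than from the text, which is why the script exits with 1 or 2 on problems. To see what Nagios would conclude when running the check by hand, something like the following could be used. run_plugin is a hypothetical helper name, the stub echo command stands in for the real check, and any status other than 0, 1 or 2 is simply labelled UNKNOWN here.

```shell
# run_plugin COMMAND [ARGS...]
# Hypothetical helper: runs a plugin command, prints the Nagios state name
# that matches its exit status, and returns that same status.
run_plugin() {
    plugin_out=$("$@" 2>&1)
    plugin_rc=$?
    case "$plugin_rc" in
        0) plugin_state=OK ;;
        1) plugin_state=WARNING ;;
        2) plugin_state=CRITICAL ;;
        *) plugin_state=UNKNOWN ;;
    esac
    printf '%s (exit %s): %s\n' "$plugin_state" "$plugin_rc" "$plugin_out"
    return "$plugin_rc"
}

# Stub standing in for the real check so the sketch runs anywhere.
run_plugin echo "stub check output"
```

On the monitored host, as root, the real check could be wrapped the same way: run_plugin sudo -u nagios /usr/bin/sudo /usr/local/nagios/libexec/check_raid.sh. This also exercises the sudoers rule added earlier.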
