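Once a Linux server is configured with hardware RAID, it is hard to tell the state of the individual disks; on Windows this can be checked with MegaRAID. This script uses MegaCli to identify which disk in a RAID array has failed or has bad blocks. Hooked into Nagios, it can run periodically and send alerts by email or SMS. I have tested the script on several physical machines without problems. I am sharing it here and hope it leads to discussion and learning.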

1. Install MegaCli:

rpm -ivh megacli-8.00.46-2.x86_64.rpm
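
The script later in this article calls /usr/sbin/MegaCli, but the package may place the binary somewhere else. The following is a minimal sketch for locating it; the /opt/MegaRAID paths are assumptions about where the RPM installs, and first_executable is a hypothetical helper name, so check the real location with rpm -ql megacli.

```shell
#!/bin/bash
# Print the first executable path among the candidates given as arguments.
first_executable() {
    local candidate
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations are assumptions; adjust them to what `rpm -ql megacli` reports.
if found=$(first_executable /usr/sbin/MegaCli \
                            /opt/MegaRAID/MegaCli/MegaCli64 \
                            /opt/MegaRAID/MegaCli/MegaCli); then
    echo "MegaCli found at $found"
else
    echo "MegaCli not found in the candidate paths"
fi
```

If the binary turns up outside /usr/sbin, a symlink such as ln -s <found path> /usr/sbin/MegaCli would let the monitoring script run without changes.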

2. Add the script to Nagios monitoring:

Run visudo, then add the following line below the line root ALL=(ALL) ALL:

nagios ALL=(ALL) NOPASSWD: /usr/local/nagios/libexec/check_raid.sh

Also comment out the following line, so that it reads:

#Defaults requiretty

Put the script in the /usr/local/nagios/libexec directory, make it executable with chmod +x check_raid.sh, then edit /usr/local/nagios/etc/nrpe.cfg and add:

command[check_raid]=/usr/bin/sudo /usr/local/nagios/libexec/check_raid.sh

Restart nrpe (the exact commands may differ depending on how it was installed):

# pkill nrpe
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
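
Before relying on the check, it can help to confirm that nrpe.cfg really defines the command as intended. This is a rough sketch; nrpe_command_for is a hypothetical helper name, and the config path is the one used in this article.

```shell
#!/bin/bash
# Print the command line that an nrpe.cfg-style file defines for a check name.
# Usage: nrpe_command_for <cfg-file> <check-name>
nrpe_command_for() {
    local cfg="$1" name="$2"
    # Match lines of the form: command[<name>]=<command line>
    sed -n "s/^command\[${name}]=//p" "$cfg"
}

cfg=/usr/local/nagios/etc/nrpe.cfg   # path used in this article
if [ -r "$cfg" ]; then
    nrpe_command_for "$cfg" check_raid
fi
```

From the Nagios server, the same check can then be exercised end to end with check_nrpe -H <host> -c check_raid, assuming the check_nrpe plugin is installed there.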

3. The monitoring script:

#!/bin/bash
# Program:
#   monitor RAID disk state
# history:
#   ------ First release
# Note: bash is required because the script uses the += string append.

# Check whether an LSI RAID controller is present
rcexist=`dmesg | grep RAID | grep LSI`
if [ ! -n "$rcexist" ]; then
    echo "not LSI or no raid"
    exit 2
fi
OUTPUT=''

# Determine the RAID level
R1=`/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-1, Secondary-0, RAID Level Qualifier-0"`
R0=`/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-0, Secondary-0, RAID Level Qualifier-0"`
R5=`/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-5, Secondary-0, RAID Level Qualifier-3"`
R10=`/usr/sbin/MegaCli -cfgdsply -aALL | grep "RAID Level" | awk -F: '{print $2}' | sed -e "s/^[ ]*//" | grep -c "Primary-1, Secondary-3, RAID Level Qualifier-0"`
if [ $R1 -ge 2 ]; then
    OUTPUT+="RAID10 "
elif [ $R1 -eq 1 ]; then
    OUTPUT+="RAID1 "
fi
if [ $R0 -ne 0 ]; then
    OUTPUT+="RAID0 "
fi
if [ $R5 -ne 0 ]; then
    OUTPUT+="RAID5 "
fi
if [ $R10 -ne 0 ]; then
    OUTPUT+="RAID10 "
fi
# The if tests above were fine-tuned based on documentation and on what real machines report

# Total number of disks in the RAID configuration
DiskNum=`/usr/sbin/MegaCli -cfgdsply -aALL | grep -c "Non Coerced Size"`
OUTPUT+="Total Disk:$DiskNum "

# Number of healthy (online) disks in the RAID configuration
OnlineDisk=`/usr/sbin/MegaCli -cfgdsply -aALL | grep "Online" | wc -l`
OUTPUT+="online:$OnlineDisk "
if [ $DiskNum -ne $OnlineDisk ]; then
    echo "CRITICAL:$OUTPUT"
    exit 2
fi

# Are there any failed disks?
FailDisk=`/usr/sbin/MegaCli -AdpAllInfo -aALL | grep "Failed Disks" | awk '{print $4}'`
if [ $FailDisk -eq 0 ]; then
    OUTPUT+="failed disk:0 "
else
    OUTPUT+="failed disk:$FailDisk "
    echo "CRITICAL:$OUTPUT"
    exit 2
fi

# Disks with predictive-failure warnings, and their positions
CriticalDisk=`/usr/sbin/MegaCli -AdpAllInfo -aALL | grep "Critical Disks" | awk '{print $4}'`
if [ $CriticalDisk -eq 0 ]; then
    OUTPUT+="critiDisk is 0 "
else
    CriDisk=`/usr/sbin/MegaCli -cfgdsply -aALL | grep -E 'Predictive|Slot' | awk \
        '{if(NR%3){printf $0":"}else{print $0}}' | awk -F ':' '{if($4!=0){print $2+1}}'`
    OUTPUT+="criti disk in $CriDisk slot "
    echo "WARNING:$OUTPUT"
    exit 1
fi

# Sum the media and other error counters to detect bad blocks
MediaErrcount=`/usr/sbin/MegaCli -pdlist -aALL | grep -E "Media Error" | awk -F ':' -v errcount=0 \
    '{errcount+=$2} END {print errcount}'`
OtherErrcount=`/usr/sbin/MegaCli -pdlist -aALL | grep -E "Other Error" | awk -F ':' -v errcount=0 \
    '{errcount+=$2} END {print errcount}'`

# Position of the bad disk(s)
if [ $MediaErrcount -ne 0 -o $OtherErrcount -ne 0 ]; then
    mDoD=`/usr/sbin/MegaCli -pdlist -aALL | grep -E "Media Error|Other Error|Slot" | awk \
        '{if(NR%3){printf $0":"}else{print $0}}' | awk -F ':' '{if($4!=0||$6!=0){print $2+1}}'`
    OUTPUT+="bad block in $mDoD "
    echo "CRITICAL:$OUTPUT"
    exit 2
else
    OUTPUT+="mediaerr:0 othererr:0 "
fi

# Is the state of every logical drive Optimal?
raidstate=`/usr/sbin/MegaCli -LDInfo -Lall -aAll | grep 'State' | awk -F ':' '{print $2}' | \
    sort | uniq | sed -e "s/^[ ]*//" | awk '{if($0!="Optimal"){print "bad"}}'`
if [ "$raidstate" != "bad" ]; then
    OUTPUT+="raidstate:ok"
else
    OUTPUT+="raidstate:bad"
    echo "CRITICAL:$OUTPUT"
    exit 2
fi

# Remove the log file that MegaCli leaves in the working directory
rm -rf ./MegaSAS.log
echo $OUTPUT
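
The step that locates a bad disk depends on joining every three matched lines (slot number, media error count, other error count) into one record and then splitting that record on colons. Here is a small sketch of that technique on hand-written input; the sample text is a hypothetical, trimmed stand-in for MegaCli -pdlist output, and bad_slots is a name introduced only for illustration.

```shell
#!/bin/bash
# Sketch: print the 1-based position of every disk whose error counters are
# non-zero, reading text shaped like `MegaCli -pdlist -aALL` output on stdin.
bad_slots() {
    grep -E "Media Error|Other Error|Slot" |
        # Join each group of three lines into one colon-separated record.
        awk '{ if (NR % 3) { printf "%s:", $0 } else { print $0 } }' |
        # Fields: $2 = slot number, $4 = media errors, $6 = other errors.
        awk -F ':' '{ if ($4 != 0 || $6 != 0) { print $2 + 1 } }'
}

# Hypothetical sample, reduced to the three fields the filter uses.
sample='Slot Number: 0
Media Error Count: 0
Other Error Count: 0
Slot Number: 1
Media Error Count: 3
Other Error Count: 0'

printf '%s\n' "$sample" | bad_slots
```

Because the script prints $2+1, the reported position is the slot number counted from 1 rather than from 0. The grouping also assumes that exactly three lines per disk match the grep pattern; if a MegaCli version prints an extra matching line, the records would shift.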

The check produces output like this:

RAID5 Total Disk:4 online:4 failed disk:0 critiDisk is 0 mediaerr:0 othererr:0 raidstate:ok
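
Nagios decides the service state from the plugin's exit code rather than from the text it prints: 0 is OK, 1 is WARNING, 2 is CRITICAL, and anything else is treated as UNKNOWN. The sketch below runs the plugin by hand and labels the result; nagios_state is a hypothetical helper, and the plugin path is the one used in this article.

```shell
#!/bin/bash
# Map a Nagios plugin exit code to its state name.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

plugin=/usr/local/nagios/libexec/check_raid.sh   # path used in this article
if [ -x "$plugin" ]; then
    output=$("$plugin")
    code=$?          # exit status of the plugin run above
    echo "$(nagios_state "$code"): $output"
fi
```

Running it as the nagios user through sudo, in the same way NRPE will, also exercises the sudoers rule.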