Nagios Installation and Setup
The company keeps adding servers. We used to check them with a single home-grown script, but we have now switched to Nagios.
Ubuntu client (NRPE) installation script
#!/bin/bash
# Installs the Nagios plugins and the NRPE daemon on an Ubuntu client; run as root
tmp_dir=/tmp/nagios
nagios_ser="192.168.1.3"    # IP of the Nagios server that may query this host

groupadd nagios
useradd -g nagios -s /sbin/nologin nagios
if [ ! -d "$tmp_dir" ]; then
    mkdir "$tmp_dir"
fi
cd "$tmp_dir"
wget http://downloads.sourceforge.net/project/nagios/nrpe-2.x/nrpe-2.15/nrpe-2.15.tar.gz
wget http://nagios-plugins.org/download/nagios-plugins-2.0.1.tar.gz

#---- install
for i in `ls -1`
do
    tar xf $i
done
apt-get -y --force-yes install openssl ruby1.9.1 build-essential
apt-get -y --force-yes install libssl-dev lm-sensors

tar xvf nagios-plugins-2.0.1.tar.gz
cd nagios-plugins-2.0.1
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make
make install
cd ../

tar xvf nrpe-2.15.tar.gz
cd ./nrpe-2.15
./configure --with-ssl-lib=/usr/lib/x86_64-linux-gnu    # 64-bit Ubuntu library path
make all
make install-plugin
make install-daemon
make install-daemon-config

#mv ./check_* /usr/local/nagios/libexec
#chmod 755 -R /usr/local/nagios/libexec
chown -R nagios:nagios /usr/local/nagios/

cat >/usr/local/nagios/etc/nrpe.cfg<<EOF
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,$nagios_ser
dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_alldisk]=/usr/local/nagios/libexec/check_alldisk -w 90 -c 95
command[check_http]=/usr/local/nagios/libexec/check_http -H 127.0.0.1 -w 5 -c 10
command[check_ping]=/usr/local/nagios/libexec/check_ping -H 127.0.0.1 -w 3000.0,80% -c 5000.0,100% -p 5
command[check_ssh]=/usr/local/nagios/libexec/check_ssh -4 127.0.0.1
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 30% -c 10%
command[check_sensors]=/usr/local/nagios/libexec/check_sensors
command[check_mdadm]=/usr/local/nagios/libexec/check_mdadm
command[check_smart]=/usr/local/nagios/libexec/check_smart
command[check_drbd]=/usr/local/nagios/libexec/check_drbd
EOF

# Start NRPE at boot. Ubuntu's default /etc/rc.local ends with "exit 0",
# so the command has to go before that line, otherwise it never runs.
if grep -q '^exit 0' /etc/rc.local; then
    sed -i '/^exit 0/i /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d' /etc/rc.local
else
    echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
fi
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
rm -rf "$tmp_dir"
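The `mv ./check_*` line in the install script is commented out, so the custom plugins named in `nrpe.cfg` (check_alldisk, check_mdadm, check_smart, check_drbd) have to be copied into `/usr/local/nagios/libexec` separately. A small sketch for spotting a plugin that was forgotten, assuming the `command[name]=path args` layout written by the script; `nrpe_commands` and `missing_plugins` are hypothetical helper names, not part of NRPE.

```ruby
# Hypothetical helpers (not part of NRPE): parse the command[...] lines of an
# nrpe.cfg and report plugins that are missing or not executable.

# Returns a Hash of command name => plugin path, e.g. the line
#   command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
# becomes {"check_users" => "/usr/local/nagios/libexec/check_users"}
def nrpe_commands(cfg_text)
  cfg_text.each_line.with_object({}) do |line, cmds|
    next unless line =~ /\Acommand\[([^\]]+)\]=(\S+)/
    cmds[$1] = $2
  end
end

# Returns the names of commands whose plugin file is absent or not
# executable for the user running this check.
def missing_plugins(cmds)
  cmds.reject { |_name, path| File.file?(path) && File.executable?(path) }.keys
end
```

For example, `missing_plugins(nrpe_commands(File.read("/usr/local/nagios/etc/nrpe.cfg")))` lists every command that `check_nrpe` would fail to run; an empty array means every referenced plugin is in place.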
Ruby check scripts I put together myself:
1: check_smart, disk health (SMART) check
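#!/usr/bin/env ruby
# Exit codes: 0 OK; 1 WARNING; 2 CRITICAL; 3 UNKNOWN
# smartctl needs root, so let the nagios user run it through sudo:
#   echo "nagios ALL=NOPASSWD:/usr/sbin/smartctl" >> /etc/sudoers
# On CentOS, also stop sudo from requiring a tty for nagios:
#   sed -i 's/^Defaults[[:space:]]*requiretty/Defaults:nagios !requiretty/' /etc/sudoers
# Called from the Nagios server as: check_nrpe!check_smart
health = ""
# Whole disks only (/dev/sda, /dev/sdb, ...), not partitions such as /dev/sda1
Dir.glob("/dev/sd[a-z]*").sort.select { |dev| dev =~ /[a-z]\z/ }.each do |hdd|
  status = `sudo /usr/sbin/smartctl -H #{hdd} | grep result | awk -F: '{print $2}'`
  if status.match(/PASSED/)
    health = health + hdd + " OK\n"
  else
    health = health + hdd + " Fail\n"
  end
end
if health.include? "Fail"
  puts health
  exit 2
end
puts health
exit 0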
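The `grep result | awk -F:` pipeline in check_smart can also be done in Ruby, which makes the parsing testable against saved `smartctl -H` output without touching real disks. A minimal sketch, assuming ATA-style output where the verdict sits on a line containing `result:` (as the grep above expects); `smart_passed?` and `smart_report` are hypothetical names.

```ruby
# Hypothetical helper: decide whether `smartctl -H` output reports a healthy disk.
# Like the original pipeline, it finds the line containing "result", takes the
# text after the first ':' and looks for PASSED.
def smart_passed?(output)
  line = output.each_line.find { |l| l.include?("result") }
  return false if line.nil?   # no verdict line at all: treat as a failure
  line.split(":", 2)[1].to_s.include?("PASSED")
end

# Builds the plugin output and exit code from {device => smartctl output}.
def smart_report(outputs)
  lines = outputs.map { |dev, out| "#{dev} #{smart_passed?(out) ? 'OK' : 'Fail'}" }
  code = outputs.values.all? { |out| smart_passed?(out) } ? 0 : 2
  [lines.join("\n"), code]
end
```

In a plugin, each hash value would be the output of `sudo /usr/sbin/smartctl -H` for that device, and the script would `puts` the text and `exit` with the code.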
2: check_mdadm, software RAID check
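#!/usr/bin/env ruby
# Exit codes: 0 OK; 1 WARNING; 2 CRITICAL; 3 UNKNOWN
# A healthy array shows one "U" per active member, e.g. [UU]. Comparing the
# number of "U"s with twice the number of "md" entries assumes every array
# is a two-member mirror (RAID1).
status = File.read("/proc/mdstat")
if status.scan('U').size == status.scan('md').size * 2
  puts "Soft Raid OK"
  exit 0
else
  puts "Soft Raid Fail"
  exit 2
end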
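The U-counting in check_mdadm only balances when every array has exactly two members; a healthy three-disk RAID5 shows `[UUU]` and would be reported as failed. A possible alternative, sketched under the assumption that arrays appear in `/proc/mdstat` as `mdN : ...` followed by a status line ending in a field such as `[UU]` or `[UU_]`, where `_` marks a missing or failed member. `degraded_arrays` is a hypothetical name; inactive arrays, which have no status field, are not reported by this sketch.

```ruby
# Hypothetical helper: list md arrays whose member-status field (e.g. [UU_])
# contains "_", i.e. a missing or failed member. Works for any member count.
def degraded_arrays(mdstat)
  degraded = []
  current = nil
  mdstat.each_line do |line|
    if line =~ /\A(md\d+)\s*:/
      current = $1                                  # start of a new array block
    elsif current && (flags = line[/\[([U_]+)\]/, 1])
      degraded << current if flags.include?("_")    # e.g. "UU_" -> degraded
    end
  end
  degraded
end
```

A plugin would call `degraded_arrays(File.read("/proc/mdstat"))`, print the names it returns, and exit 2 when the list is not empty.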
3: check_drbd, DRBD check
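#!/usr/bin/env ruby
# Exit codes: 0 OK; 1 WARNING; 2 CRITICAL; 3 UNKNOWN
# A healthy resource reports ds:UpToDate/UpToDate in /proc/drbd, i.e. two
# "UpToDate" entries per DRBD block device found under /dev.
drbd_devices = Dir.glob("/dev/drbd*").count { |f| File.blockdev?(f) }
if File.read("/proc/drbd").scan("UpToDate").count == drbd_devices * 2
  puts "DRBD OK"
  exit 0
else
  puts "DRBD Critical"
  exit 2
end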
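check_drbd ties the result to how many `/dev/drbd*` device nodes exist. A sketch that instead inspects each resource line, assuming the DRBD 8.x `/proc/drbd` layout (DRBD 9 no longer lists resources there); `drbd_problems` is a hypothetical name, and resources without a `ds:` field (for example unconfigured minors) are skipped.

```ruby
# Hypothetical helper for the DRBD 8.x /proc/drbd format, where each configured
# resource has a line like
#   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
# Returns "minor cs:... ds:..." strings for resources whose disk state is not
# UpToDate on both sides.
def drbd_problems(proc_drbd)
  proc_drbd.each_line.map { |line|
    m = line.match(/\A\s*(\d+):\s+cs:(\S+).*\bds:(\S+)/)
    next nil unless m                       # not a resource status line
    next nil if m[3] == "UpToDate/UpToDate" # healthy resource
    "#{m[1]} cs:#{m[2]} ds:#{m[3]}"
  }.compact
end
```

Printing the returned strings gives the on-call admin the minor number and states directly, rather than only "DRBD Critical".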
4: check_alldisk, disk space check
#!/usr/bin/env ruby
# Usage: check_alldisk -w 90 -c 95
#   ARGV[1] = warning threshold (% used), ARGV[3] = critical threshold (% used)
# Exit codes: 0 OK; 1 WARNING; 2 CRITICAL; 3 UNKNOWN
# -P (POSIX output) keeps each filesystem on one line even when the device
# name is long, so every entry is exactly 6 fields.
status = `df -hlP -x tmpfs -x devtmpfs | grep -v ^Filesystem`.split
if status.size < 6
  puts "UNKNOWN"
  exit 3
end
min_use, max_use = ARGV[1].to_i, ARGV[3].to_i
space = ''
# Fields per line: Filesystem Size Used Avail Use% Mounted-on
status.each_slice(6) do |fs, _size, _used, _avail, use_pct, mount|
  current_use = use_pct.to_i   # "87%" -> 87
  if current_use > max_use
    space << "#{fs} #{use_pct} #{mount} Critical\n"
  elsif current_use > min_use
    space << "#{fs} #{use_pct} #{mount} Warning\n"
  else
    space << "#{fs} #{use_pct} #{mount} OK\n"
  end
end
puts space
if space.include?("Critical")
  exit 2
elsif space.include?("Warning")
  exit 1
else
  exit 0
end
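Deciding the exit code by searching the output for the words `Critical` and `Warning` only works if those strings are spelled identically in two places. A minimal alternative sketch that classifies each filesystem numerically and exits with the worst code; `disk_status`, `disk_report` and `STATUS_NAMES` are hypothetical names, and the thresholds keep the script's rule that a value equal to a threshold does not exceed it.

```ruby
# Hypothetical helpers: numeric status per filesystem, worst code wins.
STATUS_NAMES = { 0 => "OK", 1 => "Warning", 2 => "Critical" }

# use, warn and crit are percentages of space used.
def disk_status(use, warn, crit)
  if use > crit then 2
  elsif use > warn then 1
  else 0
  end
end

# rows: [[filesystem, use_percent, mount_point], ...]; returns [text, exit_code]
def disk_report(rows, warn, crit)
  codes = rows.map { |_fs, use, _mount| disk_status(use, warn, crit) }
  text = rows.zip(codes).map { |(fs, use, mount), code|
    "#{fs} #{use}% #{mount} #{STATUS_NAMES[code]}"
  }.join("\n")
  [text, codes.max || 0]   # no filesystems at all -> 0
end
```

Since OK, WARNING and CRITICAL are 0, 1 and 2, taking the maximum of the per-filesystem codes gives the overall state without any string matching.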
Server installation reference