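Nagios Monitoring Setup and Configuration (Notes)
No preamble: this post is purely a set of personal notes, so it may read as somewhat disorganized. I recorded each problem as I ran into it, both for my own later reference and to help anyone still stuck on similar problems.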
System version and information
cat /etc/redhat-release
CentOS release 6.2 (Final)
uname -a
Linux 2.6.32-220.el6.x86_64 x86_64 x86_64 x86_64 GNU/Linux
ifconfig | sed -n 1,2p
eth0 Link encap:Ethernet HWaddr 40:F2:E9:29:5F:EA
inet addr:192.168.0.2 Bcast:192.168.69.255 Mask:255.255.255.0
iptables and SELinux are both turned off.
Software versions
The LAMP/LNMP stack itself is not covered here; either environment works. Mine is an LNMP environment installed with yum.
nagios-4.0.5.tar.gz
nagios-plugins-1.4.16.tar.gz
nrpe-2.15.tar.gz
pnp4nagios-0.6.19.tar.gz
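Preparing to install Nagios
Make sure yum works; a network yum repository is recommended. Install the libraries the system needs:
yum groupinstall "Compatibility libraries" "Base" "Development tools"
Install LAMP and the required packages:
yum -y install http* php* mysql* perl* net-snmp* openssl* glibc rrdtool rrdtool-devel rrdtool-perl rrdtool-php
chkconfig mysqld on
chkconfig httpd on
chkconfig snmpd on
service httpd start
service mysqld start
service snmpd start
Confirm that everything is running before moving on. Check http, mysql and snmp one at a time, then test access to the web page:
ps -ef | grep -v grep | grep http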
Installing Nagios
1. Create the nagios user and group
[root@nagios ~]# useradd -s /sbin/nologin nagios
[root@nagios ~]# mkdir /usr/local/nagios
[root@nagios ~]# chown -R nagios.nagios /usr/local/nagios/
2. Compile and install nagios
[root@nagios tools]# tar zxf nagios-4.0.5.tar.gz
[root@nagios tools]# cd nagios-4.0.5
[root@nagios nagios-4.0.5]# ./configure --prefix=/usr/local/nagios
[root@nagios nagios-4.0.5]# make all && make install && make install-init && make install-commandmode && make install-config && make install-webconf
[root@nagios nagios-4.0.5]# echo $?
0
3. Enable start at boot
chkconfig --add nagios
chkconfig nagios on
chkconfig --list nagios
Installing the nagios-plugins package
[root@nagios tools]# tar zxf nagios-plugins-1.4.16.tar.gz
[root@nagios tools]# cd nagios-plugins-1.4.16
[root@nagios nagios-plugins-1.4.16]# ./configure --prefix=/usr/local/nagios/
[root@nagios nagios-plugins-1.4.16]# make
[root@nagios nagios-plugins-1.4.16]# make install
[root@nagios nagios-plugins-1.4.16]# echo $?
0
Editing the httpd.conf configuration file
cd /etc/httpd/conf
cp -a httpd.conf httpd.conf.bak
vim httpd.conf
# Append the following at the end of the file
####### setting for nagios #######
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    AuthType Basic
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    AuthType Basic
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios access"
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>
Change
DirectoryIndex index.html index.html.var
to
DirectoryIndex index.php index.html index.html.var
Change
Options Indexes FollowSymLinks
to
Options FollowSymLinks
(this stops the site from listing directories), then restart Apache:
service httpd restart
Next, add the Nagios login authentication file. The default user is nagiosadmin; any other user name requires changes to other files, as done below in cgi.cfg for the user admin. Back up files before modifying them; I skip the backup here.
[root@nagios etc]# cd /usr/local/nagios/etc
[root@nagios etc]# sed -i s@nagiosadmin@nagiosadmin\,admin@g cgi.cfg
[root@nagios etc]# sed -i s@\#default_user_name=guest@default_user_name=admin@g cgi.cfg
[root@nagios nagios]# htpasswd -c /usr/local/nagios/etc/htpasswd admin
New password: ******
Re-type new password: ******
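The first sed above appends ",admin" after every occurrence of nagiosadmin, so running it twice duplicates the entry. A minimal sketch of a repeatable variant follows; the function name grant_cgi_user is hypothetical, and it assumes the permission lines in cgi.cfg have the form authorized_for_...=user1,user2.

```bash
#!/bin/bash
# Hypothetical helper: add a user to every authorized_for_* line of a
# cgi.cfg-style file, skipping lines that already list that user.
grant_cgi_user() {
    local file=$1 user=$2 tmp
    tmp=$(mktemp) || return 1
    awk -v u="$user" '
        /^authorized_for_[a-z_]+=/ {
            split($0, kv, "=")               # kv[2] holds the comma-separated users
            n = split(kv[2], names, ",")
            found = 0
            for (i = 1; i <= n; i++) if (names[i] == u) found = 1
            if (!found) $0 = $0 "," u        # append only when missing
        }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage would be something like grant_cgi_user /usr/local/nagios/etc/cgi.cfg admin, after taking a backup of cgi.cfg.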
Installing NRPE
[root@nagios tools]# tar zxf nrpe-2.15.tar.gz
[root@nagios tools]# cd nrpe-2.15
[root@nagios nrpe-2.15]# ./configure; make all; make install-plugin; make install-daemon; make install-daemon-config
Start NRPE
[root@nagios nrpe-2.15]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@nagios nrpe-2.15]# netstat -antl | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
[root@nagios libexec]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.15
Stop NRPE
[root@nagios libexec]# ps -ef | grep -v grep | grep nrpe
[root@nagios libexec]# kill -9 <PID>
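The netstat check above can be scripted so it returns an exit status instead of needing to be read by eye. This is a small sketch; port_listening is a hypothetical name, and it assumes the `netstat -antl` column layout shown above (local address in the fourth column, state in the last).

```bash
#!/bin/bash
# Hypothetical helper: read `netstat -antl` style lines on stdin and
# succeed only if some socket is in LISTEN state on the given port.
port_listening() {
    local port=${1:-5666}
    awk -v p=":$port" '
        # Compare the tail of the local address with ":<port>"
        $NF == "LISTEN" && substr($4, length($4) - length(p) + 1) == p { ok = 1 }
        END { exit !ok }
    '
}
```

For example: netstat -antl | port_listening 5666 && echo "nrpe is listening".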
Verifying the Nagios configuration
[root@nagios etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Zero warnings and zero errors means the configuration is OK.
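When the check is run from a script (for example before every restart), the two summary lines can be parsed automatically. A minimal sketch, assuming the "Total Warnings:" and "Total Errors:" lines look like the output above; preflight_ok is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read `nagios -v` output on stdin and succeed only
# if both the warning and the error totals are exactly 0.
preflight_ok() {
    awk '
        /^Total Warnings:/ { w = $NF }
        /^Total Errors:/   { e = $NF }
        END { exit !(w == "0" && e == "0") }
    '
}
```

For example: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok && service nagios restart.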
Starting Nagios
[root@nagios etc]# service nagios start|stop|restart
(start, stop and restart respectively.) Then open http://IP/nagios in a browser.
Installing pnp4nagios
[root@nagios tools]# tar zxf pnp4nagios-0.6.19.tar.gz
[root@nagios tools]# cd pnp4nagios-0.6.19
[root@nagios pnp4nagios-0.6.19]# ./configure
make all
make install
make install-config
make install-init
make install-webconf
Create the default configuration files
cd /usr/local/pnp4nagios/etc
cp misccommands.cfg-sample misccommands.cfg
cp nagios.cfg-sample nagios.cfg
cp rra.cfg-sample rra.cfg
cd pages
cp web_traffic.cfg-sample web_traffic.cfg
cd ../check_commands/
cp check_all_local_disks.cfg-sample check_all_local_disks.cfg
cp check_nrpe.cfg-sample check_nrpe.cfg
cp check_nwstat.cfg-sample check_nwstat.cfg
cp /usr/local/pnp4nagios/libexec/* /usr/local/nagios/libexec/
vim /usr/local/nagios/etc/nagios.cfg
Check the following settings:
enable_environment_macros=1
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
Note: with Nagios 4.x, the configuration above means no graphs are generated later on, and the following error is reported:
PNP4Nagios Version 0.6.19
Please check the documentation for information about the following error.
perfdata directory "/usr/local/pnp4nagios/var/perfdata/localhost" for host "localhost" does not exist.
Read FAQ online
file [line]: application/models/data.php [148]:
back
The reference explaining the cause of this error is not included in these notes.
The fix is to use Bulk Mode.
vim /usr/local/nagios/etc/nagios.cfg
Check the following settings:
enable_environment_macros=1
process_performance_data=1
Then append the following at the end of the file:
# service performance data
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
# host performance data starting with Nagios 3.0
host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file
Save the file.
vim /usr/local/nagios/etc/objects/commands.cfg
# This block can go near the top of the file
define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
The bulk mode commands below can simply be appended to the end of the file. Note that this configuration file already has a default definition for them; find it and comment it out before adding the definitions below, otherwise the Nagios configuration check reports an error.
define command {
    command_name    process-service-perfdata-file
    command_line    /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/service-perfdata
}
define command {
    command_name    process-host-perfdata-file
    command_line    /usr/local/pnp4nagios/libexec/process_perfdata.pl --bulk=/usr/local/pnp4nagios/var/host-perfdata
}
Define two pnp templates, one for hosts and one for services, at the end of templates.cfg:
vim /usr/local/nagios/etc/objects/templates.cfg
define host {
    name        host-pnp
    action_url  /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
    register    0
}
define service {
    name        service-pnp
    action_url  /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
    register    0
}
Alternatively, add action_url directly to the existing host and service templates (the other parameters are omitted below). This saves a lot of time when enabling pnp for many hosts.
vim /usr/local/nagios/etc/objects/templates.cfg
define host {
    action_url  /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=_HOST_
}
define service {
    action_url  /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
}
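In bulk mode, each line Nagios writes to the perfdata file follows the template above: KEY::VALUE pairs separated by tabs (the \t in the template). When graphs are missing, it can help to look at what actually lands in that file. The following is a small sketch for pulling one field out of such a line; perf_field is a hypothetical name and the sample values in the usage note are made up.

```bash
#!/bin/bash
# Hypothetical helper: read bulk-mode perfdata lines on stdin and print
# the value of one KEY::VALUE field from the first line that contains it.
perf_field() {
    local key=$1
    awk -F'\t' -v k="$key" '{
        for (i = 1; i <= NF; i++) {
            sep = index($i, "::")            # position of the key/value separator
            if (sep && substr($i, 1, sep - 1) == k) {
                print substr($i, sep + 2)
                exit
            }
        }
    }'
}
```

For example: tail -n 1 /usr/local/pnp4nagios/var/service-perfdata | perf_field SERVICEPERFDATA. An empty result would suggest the plugin returned no performance data for that check.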
First run the pnp4nagios environment test. Add the following at the end of httpd.conf:
vim /etc/httpd/conf/httpd.conf
Alias /pnp4nagios "/usr/local/pnp4nagios/share"
<Directory "/usr/local/pnp4nagios/share">
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
    # The rewrite rules sit inside the <Directory> block because RewriteBase is only valid in per-directory context
    <IfModule mod_rewrite.c>
        RewriteEngine On
        Options FollowSymLinks
        RewriteBase /pnp4nagios
        RewriteRule ^(application|modules|system) - [F,L]
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule .* index.php/$0 [PT,L]
    </IfModule>
</Directory>
service httpd restart
Open http://IP/pnp4nagios in a browser. Once the environment test page reports no problems, rename install.php:
cd /usr/local/pnp4nagios/share/
mv install.php install.php.bak
Editing nagios.cfg
vim /usr/local/nagios/etc/nagios.cfg
Method 1:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroup.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
Or, method 2:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_dir=/usr/local/nagios/etc/objects/app
Note: this only enables monitoring of Linux hosts. Windows and switch monitoring are not enabled; uncomment the corresponding lines if you need them. Either method works. The difference is that with the first method all hosts share a common set of configuration files, while with the second each host gets its own file. I demonstrate both below, labelled method 1 and method 2.
Adding host configuration, method 1
By default nagios/etc/objects/ does not contain services.cfg, hosts.cfg or hostgroup.cfg; create them by hand.
vim hosts.cfg
define host {
    use             linux-server,host-pnp   ; templates from templates.cfg; if action_url was added directly to the existing host and service templates, host-pnp can be dropped because linux-server already carries it
    host_name       cacti                   ; must be the host name of the monitored machine
    alias           cacti-web               ; alias, anything you like
    address         192.168.0.3             ; host IP address
    contact_groups  admins                  ; contact group for mail alerts, shown below
}
define host {
    use             linux-server,host-pnp
    host_name       nginx
    alias           nginx-web
    address         192.168.0.4
    contact_groups  admins
}
Add one such block per machine.
vim hostgroup.cfg
define hostgroup {
    hostgroup_name  servers         ; group name
    alias           servers_group   ; alias
    members         cacti,nginx     ; host names, comma-separated
}
vim services.cfg
# Every host shares this one file, which gets messy
#### set cacti host
define service {
    use                     local-service,service-pnp
    host_name               cacti
    service_description     http
    check_command           check_http
    contact_groups          admins
    flap_detection_enabled  0
}
define service {
    use                     local-service,service-pnp
    host_name               cacti
    service_description     SSH_port
    check_command           check_tcp!22
    contact_groups          admins
    flap_detection_enabled  0
}
define service {
    use                     local-service,service-pnp
    host_name               cacti
    service_description     check_/
    check_command           check_nrpe!check_/   ; checked through nrpe; the command must be defined on the client
    contact_groups          admins
    flap_detection_enabled  0
}
#### set nginx host
define service {
    use                     local-service,service-pnp
    host_name               nginx
    service_description     Check_free_mem
    check_command           check_nrpe!check_free_mem
    contact_groups          admins
    flap_detection_enabled  0
}
define service {
    use                     local-service,service-pnp
    host_name               nginx
    service_description     check_/
    check_command           check_nrpe!check_/   ; checked through nrpe; the command must be defined on the client
    contact_groups          admins
    flap_detection_enabled  0
}
Add one block for every service you need. End of method 1.
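Because every monitored machine needs a host block of the same shape, the blocks can be generated rather than typed. A minimal sketch: gen_host is a hypothetical helper, and it assumes the linux-server and host-pnp templates and the admins contact group used in these notes.

```bash
#!/bin/bash
# Hypothetical helper: print one Nagios host definition.
# Arguments: host name, alias, IP address.
gen_host() {
    local name=$1 alias=$2 addr=$3
    cat <<EOF
define host {
    use             linux-server,host-pnp
    host_name       $name
    alias           $alias
    address         $addr
    contact_groups  admins
}
EOF
}
```

For example, gen_host nginx nginx-web 192.168.0.4 >> hosts.cfg would append the second host shown above; run the Nagios configuration check afterwards as usual.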
Adding host configuration, method 2
cd /usr/local/nagios/etc/objects/
mkdir app
cd app
vim 192.168.0.4.cfg
# All monitored objects for one host are defined in a standalone file. No host group is defined here, since it adds little.
### define host
define host {
    use             linux-server,host-pnp   ; templates from templates.cfg; if action_url was added directly to the existing host and service templates, host-pnp can be dropped because linux-server already carries it
    host_name       nginx                   ; must be the host name of the monitored machine
    alias           nginx-web               ; alias, anything you like
    address         192.168.0.4             ; host IP address
    contact_groups  admins                  ; contact group for mail alerts, shown below
}
### define service
define service {
    use                     local-service,service-pnp
    host_name               nginx
    service_description     Check_free_mem
    check_command           check_nrpe!check_free_mem
    contact_groups          admins
    flap_detection_enabled  0
}
define service {
    use                     local-service,service-pnp
    host_name               nginx
    service_description     check_/
    check_command           check_nrpe!check_/   ; checked through nrpe; the command must be defined on the client
    contact_groups          admins
    flap_detection_enabled  0
}
vim 192.168.0.3.cfg
### define host
define host {
    use             linux-server,host-pnp   ; same remarks as for the nginx host above
    host_name       cacti                   ; must be the host name of the monitored machine
    alias           cacti-web               ; alias, anything you like
    address         192.168.0.3             ; host IP address
    contact_groups  admins                  ; contact group for mail alerts, shown below
}
### define service
define service {
    use                     local-service,service-pnp
    host_name               cacti
    service_description     Check_free_mem
    check_command           check_nrpe!check_free_mem
    contact_groups          admins
    flap_detection_enabled  0
}
This approach is far more convenient than the first. End of the two methods for adding hosts.
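With one file per host, it is easy to lose track of which hosts a cfg_dir actually defines. The following sketch lists them; list_hosts is a hypothetical name, and it assumes each host block opens with a "define host {" line and closes with "}" on its own line, as in the files above.

```bash
#!/bin/bash
# Hypothetical helper: print the host_name of every "define host" block
# found in the *.cfg files of a directory (service blocks are ignored).
list_hosts() {
    local dir=$1
    cat "$dir"/*.cfg | awk '
        /^[ \t]*define[ \t]+host[ \t]*\{/ { in_host = 1; next }
        /\}/                              { in_host = 0 }
        in_host && $1 == "host_name"      { print $2 }
    ' | sort -u
}
```

For example: list_hosts /usr/local/nagios/etc/objects/app.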
Nagios email alert settings
[root@nagios objects]# vim contacts.cfg
# Look up the individual parameters online for details
define contact {
    contact_name                    nagiosadmin
    use                             generic-contact
    alias                           Nagios Admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-service-by-email
    host_notification_commands      notify-host-by-email
    email                           xxxx@163.com
}
define contactgroup {
    contactgroup_name   admins   ; the admins group referenced in the host and service definitions
    alias               Nagios Administrators
    members             nagiosadmin
}
Checking the configuration files for errors
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
service nagios restart
End of the server-side configuration.
Client installation and configuration
net-snmp must be installed; if other errors appear, resolve them according to the messages.
yum -y install net-snmp*
1. Create the nagios user and group
[root@nagios ~]# useradd -s /sbin/nologin nagios
[root@nagios ~]# mkdir /usr/local/nagios
[root@nagios ~]# chown -R nagios.nagios /usr/local/nagios/
2. Install the nagios-plugins package
[root@nagios tools]# tar zxf nagios-plugins-1.4.16.tar.gz
[root@nagios tools]# cd nagios-plugins-1.4.16
[root@nagios nagios-plugins-1.4.16]# ./configure --prefix=/usr/local/nagios/
[root@nagios nagios-plugins-1.4.16]# make
[root@nagios nagios-plugins-1.4.16]# make install
[root@nagios nagios-plugins-1.4.16]# echo $?
0
3. Install NRPE
[root@nagios tools]# tar zxf nrpe-2.15.tar.gz
[root@nagios tools]# cd nrpe-2.15
[root@nagios nrpe-2.15]# ./configure; make all; make install-plugin; make install-daemon; make install-daemon-config
Edit nrpe.cfg so the Nagios server (192.168.0.2) is allowed to connect:
sed -i 's/allowed_hosts=127.0.0.1/allowed_hosts=127.0.0.1,192.168.0.2/g' /usr/local/nagios/etc/nrpe.cfg
vim /usr/local/nagios/etc/nrpe.cfg
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_data]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /data
command[check_/]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
Save the file, then make NRPE start at boot:
echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d" >> /etc/rc.local
Start NRPE
[root@nagios nrpe-2.15]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@nagios nrpe-2.15]# netstat -antl | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
Run the next command on the server to confirm the connection works. If it fails, check that the client firewall and the network allow the traffic.
[root@nagios libexec]# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.3
NRPE v2.15
Stop NRPE
[root@nagios libexec]# ps -ef | grep -v grep | grep nrpe
[root@nagios libexec]# kill -9 <PID>
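The sed that edits allowed_hosts only matches when the line is exactly allowed_hosts=127.0.0.1. A slightly more forgiving sketch is shown below; allow_nrpe_host is a hypothetical name, it relies on GNU sed's -i option, and it assumes allowed_hosts is a single comma-separated line in nrpe.cfg.

```bash
#!/bin/bash
# Hypothetical helper: append an address to the allowed_hosts line of an
# nrpe.cfg-style file unless that address is already listed.
allow_nrpe_host() {
    local file=$1 ip=$2
    # Split the current list on commas and look for an exact match.
    if sed -n 's/^allowed_hosts=//p' "$file" | tr ',' '\n' | grep -Fxq "$ip"; then
        return 0
    fi
    sed -i "s/^allowed_hosts=.*/&,${ip}/" "$file"
}
```

For example: allow_nrpe_host /usr/local/nagios/etc/nrpe.cfg 192.168.0.2, followed by a restart of nrpe.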
When pnp draws no graphs, check the log
vim /usr/local/pnp4nagios/etc/process_perfdata.cfg
Change
LOG_LEVEL = 0
to
LOG_LEVEL = 2
more /usr/local/pnp4nagios/var/perfdata.log
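Since the log level is usually raised for debugging and lowered again afterwards, the change can be made without opening an editor. A minimal sketch, assuming GNU sed and a "LOG_LEVEL = N" line as shown above; set_pnp_log_level is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: set LOG_LEVEL in a process_perfdata.cfg-style file,
# keeping the spacing around "=" as it is.
set_pnp_log_level() {
    local file=$1 level=$2
    sed -i -E "s/^(LOG_LEVEL[[:space:]]*=[[:space:]]*)[0-9]+/\1${level}/" "$file"
}
```

For example: set_pnp_log_level /usr/local/pnp4nagios/etc/process_perfdata.cfg 2 while debugging, and the same call with 0 once the graphs work.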
Note: when Nagios monitors processes, no graph is drawn even if pnp is configured correctly. For example:
Service           Status  Last Check           Duration        Attempt  Status Information
Total Processes   OK      10-20-2014 16:44:45  83d 1h 9m 16s   1/3      PROCS OK: 503 processes
zombie_procs      OK      10-20-2014 16:46:00  83d 1h 7m 58s   1/3      PROCS OK: 0 processes with STATE = Z
PNP4Nagios Version 0.6.19
Please check the documentation for information about the following error.
XML file "/usr/local/pnp4nagios/var/perfdata/app-11/Total_Processes.xml" not found.
Read FAQ online
file [line]: application/models/data.php [312]:
back
For the cause, see the following article, which is very detailed:
http://storysky.blog.51cto.com/628458/583787/
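PNP4Nagios builds its graphs from the performance data a plugin prints after the "|" character, so one thing worth checking in this situation is whether the plugin emits any perfdata at all. This is only a sketch of that check, not a statement of what the linked article concludes; has_perfdata is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a line of plugin output carries a
# non-empty performance data section after the "|" separator.
has_perfdata() {
    local output=$1
    case $output in
        *"|"*[![:space:]]*) return 0 ;;   # something other than blanks follows "|"
        *)                  return 1 ;;
    esac
}
```

For example: has_perfdata "$(/usr/local/nagios/libexec/check_procs -w 150 -c 200)" || echo "no perfdata, so nothing to graph".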
If the bundled Nagios plugins do not cover your needs, you can write your own
Below is a memory monitoring plugin as an example. I found it online and it works well, so I am borrowing it here.
vim /usr/local/nagios/libexec/check_mem
#!/bin/bash
# Nagios plugin: report memory usage as a percentage of total memory.
STAT_OK=0
STAT_WARNING=1
STAT_CRITICAL=2
STAT_UNKNOWN=3
total_mem=`free -m | awk 'NR==2{print $2}'`
used_mem=`free -m | awk 'NR==3{print $3}'`   # memory actually used by the system
free_mem=`free -m | awk 'NR==3{print $4}'`   # free + cache
use_per=$(( used_mem * 100 / total_mem ))    # integer percentage; avoids the bc/sed string trimming
help() {
    echo "USAGE: `basename $0` [-w] <used percent> [-c] <used percent> [-h]"
    exit $STAT_UNKNOWN
}
while getopts ":w:c:h" opt
do
    case $opt in
        w) warning=$OPTARG ;;
        c) critical=$OPTARG ;;
        h) help ;;
        ?) unknown=$OPTARG
           echo "error, please check for help, USAGE: ./`basename $0` -h"
           exit $STAT_UNKNOWN ;;
    esac
done
# The text after "|" is the performance data that pnp4nagios graphs.
if [[ $use_per -lt $warning ]]; then
    echo "OK - total:${total_mem}MB, used:${used_mem}MB, free:${free_mem}MB | total_mem=$total_mem used_mem=$used_mem free_mem=$free_mem"
    exit $STAT_OK
elif [[ $use_per -ge $warning ]] && [[ $use_per -lt $critical ]]; then
    echo "WARNING - total:${total_mem}MB, used:${used_mem}MB, free:${free_mem}MB | total_mem=$total_mem used_mem=$used_mem free_mem=$free_mem"
    exit $STAT_WARNING
else
    echo "CRITICAL - total:${total_mem}MB, used:${used_mem}MB, free:${free_mem}MB | total_mem=$total_mem used_mem=$used_mem free_mem=$free_mem"
    exit $STAT_CRITICAL
fi
Save the file.
chown nagios.nagios check_mem
chmod +x check_mem
./check_mem -w 80 -c 90
OK - total:15926MB, used:1839MB, free:14086MB | total_mem=15926 used_mem=1839 free_mem=14086
vim /usr/local/nagios/etc/nrpe.cfg
Add:
command[check_free_mem]=/usr/local/nagios/libexec/check_mem -w 80 -c 90
Restart nrpe. Then, on the Nagios server, edit the host's file under /usr/local/nagios/etc/objects/app/ and add:
define service {
    use                     local-service,service-pnp
    host_name               cacti
    service_description     Check_free_mem
    check_command           check_nrpe!check_free_mem
    contact_groups          admins
    flap_detection_enabled  0
}
Check the Nagios configuration, then restart Nagios.
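The threshold logic of such a plugin can be isolated in a function, which makes it easy to try out with fixed numbers instead of live `free` output. A minimal sketch under that assumption; mem_status is a hypothetical name, and the return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```bash
#!/bin/bash
# Hypothetical helper: classify memory usage against two thresholds.
# Arguments: used MB, total MB, warning percent, critical percent.
mem_status() {
    local used=$1 total=$2 warn=$3 crit=$4
    local pct=$(( used * 100 / total ))   # integer percentage, rounded down
    if (( pct >= crit )); then
        echo "CRITICAL $pct%"; return 2
    elif (( pct >= warn )); then
        echo "WARNING $pct%"; return 1
    else
        echo "OK $pct%"; return 0
    fi
}
```

For example, mem_status 1839 15926 80 90 prints "OK 11%", matching the sample run above where 1839 MB of 15926 MB is in use.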
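Monitoring Windows hosts and switches is not hard to configure either; with a clear plan you will get it working. Nagios configuration is not really difficult, just somewhat tedious. Once you understand how the configuration files relate to each other, everything becomes simple.
That is everything.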