Based on netseek's PDF; CentOS 6, 64-bit

Nagios installation steps

1. Prerequisites

Before installing, make sure you have root privileges on the machine. Also make sure the following packages are installed on your Linux system before continuing: Apache, the GCC compiler, and the GD library with its development files.

    yum -y install httpd gcc glibc glibc-common gd gd-devel

2. Create the nagios account

    /usr/sbin/useradd nagios && passwd nagios

Create a group named nagcmd, which allows external commands to be submitted through the web interface, and add both the nagios user and the Apache user to it. The -a option appends the group; without it, usermod -G replaces the user's existing supplementary groups.

    /usr/sbin/groupadd nagcmd
    /usr/sbin/usermod -a -G nagcmd nagios
    /usr/sbin/usermod -a -G nagcmd apache

3. Download Nagios and the plugins

Download the Nagios and Nagios plugin packages (see http://www.nagios.org/download/ for the latest versions):

    cd /usr/local/src
    wget http://nchc.dl.sourceforge.net/sourceforge/nagios/nagios-3.0.6.tar.gz
    wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz

4. Compile and install Nagios

    cd /usr/local/src
    tar zxvf nagios-3.0.6.tar.gz
    cd nagios-3.0.6
    ./configure --with-command-group=nagcmd --prefix=/usr/local/nagios
    make all
    make install
    make install-init
    make install-config
    make install-commandmode

Verify the installation: change to the installation directory (here /usr/local/nagios) and check that the five directories etc, bin, sbin, share and var exist. If they do, Nagios has been installed correctly. Their roles are:

    bin     Nagios executables
    etc     configuration files
    sbin    CGI programs used by the web interface
    share   web interface files (HTML pages, images, documentation)
    var     log file, status data and other runtime files

5. Compile and install the Nagios plugins (nagios-plugins)

    cd /usr/local/src
    tar zxvf nagios-plugins-1.4.13.tar.gz
    cd nagios-plugins-1.4.13
    ./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios
    make && make install

Verify:

    ls /usr/local/nagios/libexec

This lists the installed plugin files; all plugins are installed in the libexec directory.

6. Configure the web interface

Method 1: when installing Nagios, also run the following in the Nagios source directory:

    make install-webconf

Create a nagiosadmin user for logging in to the Nagios web interface. Write down the password you set; you will need it shortly.

    htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Restart Apache so the settings take effect:

    service httpd restart

Method 2: append the following to the end of httpd.conf:

    #for nagios
    ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
    <Directory "/usr/local/nagios/sbin">
        Options ExecCGI
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>
    Alias /nagios /usr/local/nagios/share
    <Directory "/usr/local/nagios/share">
        Options None
        AllowOverride None
        Order allow,deny
        Allow from all
        AuthName "Nagios Access"
        AuthType Basic
        AuthUserFile /usr/local/nagios/etc/htpasswd
        Require valid-user
    </Directory>

Then create the password file:

    htpasswd -c /usr/local/nagios/etc/htpasswd test
    New password: (enter 123456)
    Re-type new password: (enter the password again)
    Adding password for user test

View the contents of the password file:

    less /usr/local/nagios/etc/htpasswd
    test:OmWGEsBnoGpIc

The first part is the user name (test); the rest is the encrypted password.

This example adds a user named test, so cgi.cfg must be edited to authorize it:

    vi /usr/local/nagios/etc/cgi.cfg
    authorized_for_system_information=test
    authorized_for_configuration_information=test
    authorized_for_system_commands=test
    authorized_for_all_services=test
    authorized_for_all_hosts=nagiosadmin,test
    authorized_for_all_service_commands=test
    authorized_for_all_host_commands=test

7. Start Nagios

Add Nagios to the service list so that it starts automatically at boot:

    chkconfig --add nagios
    chkconfig nagios on

Verify the sample configuration files:

    /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

This may fail with:

    Nagios 3.0.6
    Copyright (c) 1999-2008 Ethan Galstad (http://www.nagios.org)
    Last Modified: 12-01-2008
    License: GPL
    Error: Cannot open main configuration file '/usr/local/‐' for reading!

In my case changing permissions did not help, but restarting the nagios service directly worked. If the configuration files contain errors, starting the service fails with output like:

    Nagios 3.0.6 starting... (PID=2821)
    Local time is Thu Feb 16 14:24:25 CST 2012
    Bailing out due to one or more errors encountered in the configuration files. Run Nagios from the command line with the -v option to verify your config before restarting. (PID=2821)

If no errors are reported, start the Nagios and Apache services:

    service nagios start
    service httpd start

8. SELinux

Put SELinux into permissive mode (running this command is enough):

    setenforce 0

To make the change permanent, edit the setting in /etc/selinux/config and reboot. If you would rather not disable SELinux or change it permanently, label the CGI and web directories so they can run under the enforcing targeted policy:

    chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
    chcon -R -t httpd_sys_content_t /usr/local/nagios/share/

9. Test the login

Open http://localhost/nagios/ and log in with user name test and password 123456.

10. Monitoring a remote host

1) On the monitored host, add a user and set its password:

    useradd nagios
    passwd nagios

Install the Nagios plugins:

    wget http://nchc.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.13.tar.gz
    tar zxvf nagios-plugins-1.4.13.tar.gz
    cd nagios-plugins-1.4.13
    ./configure
    make
    make install
    chown nagios.nagios /usr/local/nagios/
    chown -R nagios.nagios /usr/local/nagios/libexec/

2) Install NRPE (on both the monitoring server and the monitored host):

    tar -zxvf nrpe-2.8.1.tar.gz
    cd nrpe-2.8.1
    ./configure
    make all
    make install-plugin
    make install-daemon
    make install-daemon-config

3) Allow the monitoring server (192.168.1.130) to connect:

    vim /usr/local/nagios/etc/nrpe.cfg
    #allowed_hosts=127.0.0.1
    allowed_hosts=127.0.0.1,192.168.1.130

Also add the monitoring server's IP to /etc/hosts.allow:

    echo 'nrpe:192.168.1.130' >> /etc/hosts.allow

4) Start the service:

    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Test whether NRPE works (test with 127.0.0.1, not localhost):

    /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
    NRPE v2.8.1

5) Test from the monitoring server (192.168.1.130). Stop iptables first, or add a rule that allows the monitoring server to collect data (NRPE listens on TCP port 5666 by default). Seeing the version string, as below, means it works:

    /etc/init.d/iptables stop
    /usr/local/nagios/libexec/check_nrpe -H 192.168.1.129
    NRPE v2.8.1

Then, on the monitoring server:

1) Create the host file /usr/local/nagios/etc/objects/129.cfg:

    vim /usr/local/nagios/etc/objects/129.cfg

with the following content:

    define host{
        use                  linux-server
        host_name            129
        alias                129
        address              192.168.1.129
    }
    define service{
        use                  generic-service
        host_name            129
        service_description  load
        check_command        check_nrpe!check_load
        # with custom arguments:
        #check_command       check_nrpe!check_load!6.0,5.0,4.0!15.0,8.0,6.0
    }

The commented-out variant passes thresholds as extra arguments. The check_nrpe command defined below only passes $ARG1$, so those arguments would be ignored unless the command is extended and NRPE is configured to accept command arguments.

Register the file in nagios.cfg by adding:

    vim /usr/local/nagios/etc/nagios.cfg
    # Definitions for monitoring 192.168.1.129
    cfg_file=/usr/local/nagios/etc/objects/129.cfg

Define the check_nrpe command:

    vim /usr/local/nagios/etc/objects/commands.cfg
    # 'check_nrpe' command definition
    define command{
        command_name  check_nrpe
        command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }

Reload Nagios on the monitoring server:

    service nagios reload

Open http://192.168.1.130/nagios and you should see that host 129 has been added.

Monitoring swap with Nagios

On the monitored host, edit /usr/local/nagios/etc/nrpe.cfg and add:

    command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

Restart nrpe: find its PID, kill it and start it again:

    [root@localhost libexec]# ps -ef | grep nrpe
    nagios    2332     1  0 14:24 ?        00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
    kill -9 2332
    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

On the monitoring server, add to /usr/local/nagios/etc/objects/commands.cfg:

    # 'check_swap' command definition
    define command{
        command_name  check_swap
        command_line  $USER1$/check_swap -w $ARG1$ -c $ARG2$
    }

(The service below runs the check remotely through check_nrpe, so this definition is only used for local swap checks.)

Add to /usr/local/nagios/etc/objects/129.cfg:

    define service{
        use                  generic-service
        host_name            129
        service_description  swap
        check_command        check_nrpe!check_swap
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring disk space with Nagios

On the monitored host, edit /usr/local/nagios/etc/nrpe.cfg and add:

    command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /

Without a % sign, check_disk treats these thresholds as free space in MB; use -w 20% -c 10% for percentages.

Restart nrpe as described in the swap section. On the monitoring server, add to /usr/local/nagios/etc/objects/commands.cfg:

    define command{
        command_name  check_disk
        command_line  $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
    }

Add to /usr/local/nagios/etc/objects/129.cfg:

    define service{
        use                  generic-service
        host_name            129
        service_description  disk
        check_command        check_nrpe!check_disk
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring memory with Nagios

The memory check script:

    #!/bin/bash
    # check memory script
    # Reads total, used and free memory (MB) from the "Mem:" line of free -m
    TOTAL=`free -m | head -2 | tail -1 | gawk '{print $2}'`
    USED=`free -m | head -2 | tail -1 | gawk '{print $3}'`
    FREE=`free -m | head -2 | tail -1 | gawk '{print $4}'`
    # to calculate the free percentage
    # use the expression free*100/total
    FREETMP=`expr $FREE \* 100`
    PERCENT=`expr $FREETMP / $TOTAL`
    echo "$TOTAL MB Total Memory"
    echo "$USED MB Used Memory"
    echo "$FREE MB ($PERCENT%) Free Memory"
    exit 0

As written, the script ignores the -w/-c options and always exits 0, so Nagios always reports this service as OK; it only displays the memory figures.

On the monitored host, edit /usr/local/nagios/etc/nrpe.cfg and add:

    command[check_mem]=/usr/local/nagios/libexec/check_mem -w 150 -c 200

Put the check_mem script in /usr/local/nagios/libexec/ and make it executable:

    chmod +x /usr/local/nagios/libexec/check_mem
    chown nagios.nagios /usr/local/nagios/libexec/check_mem

Restart nrpe as described in the swap section. On the monitoring server, add to /usr/local/nagios/etc/objects/commands.cfg:

    define command{
        command_name  check_mem
        command_line  $USER1$/check_mem -w $ARG1$ -c $ARG2$
    }

Add to /usr/local/nagios/etc/objects/129.cfg:

    define service{
        use                  generic-service
        host_name            129
        service_description  memory
        check_command        check_nrpe!check_mem
    }

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring HTTP availability with Nagios

Nothing needs to be done on the monitored host, because check_http does not go through NRPE. The monitoring server's /usr/local/nagios/etc/objects/commands.cfg already contains the check_http command, so nothing is needed there either. Add to /usr/local/nagios/etc/objects/129.cfg:

    define service{
        use                  generic-service
        host_name            129
        service_description  http
        check_command        check_http
    }

Note that the check_command is check_http, not check_nrpe!check_http.

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Troubleshooting: httpd was installed with yum, so the document root is /var/www/html by default. Running the check by hand:

    /usr/local/nagios/libexec/check_http -I 192.168.1.129

reports:

    HTTP WARNING: HTTP/1.1 403 Forbidden

This happens because /var/www/html contains no files. Create an index page:

    cd /var/www/html
    echo 123 > index.html

After a while the Nagios check turns OK.

Monitoring MySQL availability with Nagios

On the monitored host, log in to the database and grant access to the monitoring server:

    mysql> grant all privileges on *.* to xxxxx@192.168.1.130 identified by '123456';
    Query OK, 0 rows affected (0.09 sec)
    mysql> flush privileges;
    Query OK, 0 rows affected (0.08 sec)

On the monitoring server, add the following to /usr/local/nagios/etc/objects/commands.cfg (the liuyu PDF gets this definition wrong):

    # 'check_mysql' command definition
    define command{
        command_name  check_mysql
        command_line  $USER1$/check_mysql -H $HOSTADDRESS$ -P $ARG1$ -u $ARG2$ -p $ARG3$
    }

Add to /usr/local/nagios/etc/objects/129.cfg:

    define service{
        use                    generic-service
        host_name              129
        service_description    mysql
        check_command          check_mysql!3306!xxxx!123456
        notifications_enabled  0
    }

The host address is taken from $HOSTADDRESS$, so the arguments are the port, user name and password, in that order. Note that the check_command is check_mysql, not check_nrpe!check_mysql.

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

Monitoring Tomcat availability with Nagios

Nothing needs to be done on the monitored host, because check_tcp!8080 does not go through NRPE. The monitoring server's /usr/local/nagios/etc/objects/commands.cfg already contains the check_tcp command, so nothing is needed there either. Add to /usr/local/nagios/etc/objects/hong221.cfg:

    define service{
        use                  generic-service
        host_name            hong221
        service_description  tomcat
        check_command        check_tcp!8080!xxxxx
    }

Check manually by running:

    [root@nagios objects]# /usr/local/nagios/libexec/check_tcp -H xxxxx -p 8080
    TCP OK - 0.141 second response time on port 8080|time=0.141140s;;;0.000000;10.000000

Restart the nagios and httpd services:

    service nagios restart
    service httpd restart

The service then appears on the monitoring page.

Configuring Nagios alerts to a 139 mailbox

Fixing mail sent with mail that never reaches the 139 mailbox. Watch the mail log:

    tail -f /var/log/maillog

It showed the following error:

    Feb 21 17:20:49 localhost postfix/qmgr[2072]: A296612227F: from=<root@localhost.localdomain>, size=700, nrcpt=1 (queue active)
    Feb 21 17:20:49 localhost sendmail[2275]: q1L9KmDa002275: to=xxxxx@139.com, ctladdr=root (0/0), delay=00:00:01, xdelay=00:00:00, mailer=relay, pri=30221, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (Ok: queued as A296612227F)
    Feb 21 17:20:49 localhost postfix/smtpd[2276]: disconnect from localhost.localdomain[127.0.0.1]
    Feb 21 17:20:50 localhost postfix/smtp[2280]: A296612227F: to=<xxxxx@139.com>, relay=mx1.mail.139.com[221.176.9.178]:25, delay=0.53, delays=0.05/0.01/0.24/0.23, dsn=5.0.0, status=bounced (host mx1.mail.139.com[221.176.9.178] said: 550 985a4f43618db72-3c5de Mail rejected (in reply to end of DATA command))
    Feb 21 17:20:50 localhost postfix/cleanup[2279]: 43FB812227E: message-id=<20120221092050.43FB812227E@localhost.localdomain>
    Feb 21 17:20:50 localhost postfix/qmgr[2072]: 43FB812227E: from=<>, size=2697, nrcpt=1 (queue active)
    Feb 21 17:20:50 localhost postfix/bounce[2281]: A296612227F: sender non-delivery notification: 43FB812227E
    Feb 21 17:20:50 localhost postfix/qmgr[2072]: A296612227F: removed

As someone pointed out to me, the cause was the hostname (localhost.localdomain): 139 may treat such mail as spam.

    [root@nagios objects]# cat /etc/sysconfig/network
    NETWORKING=yes
    #HOSTNAME=localhost.localdomain
    HOSTNAME=nagios.localdomain
    [root@nagios objects]# cat /etc/hosts
    192.168.1.130   nagios.localdomain nagios
    # Added by NetworkManager
    127.0.0.1       localhost.localdomain localhost
    ::1             nagios.localdomain nagios localhost6.localdomain6 localhost6

So I changed the hostname to something else and rebooted the server. Mail then worked, and the 139 mailbox received it.

Nagios-side alert configuration

On the monitoring server:

    vim /usr/local/nagios/etc/objects/contacts.cfg
    define contact{
        contact_name                   nagiosadmin              ; Short name of user
        use                            generic-contact          ; Inherit default values from generic-contact template (defined above)
        alias                          Nagios Admin             ; Full name of user
        service_notification_period    24x7
        host_notification_period       24x7
        service_notification_options   w,u,c,r
        host_notification_options      d,u,r
        service_notification_commands  notify-service-by-email
        host_notification_commands     notify-host-by-email
        email                          xxxxx@139.com            ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
    }
    define contactgroup{
        contactgroup_name  admins
        alias              Nagios Administrators
        members            nagiosadmin
    }

Set email to the address alerts should go to (a 139 mailbox is a must-have for ops work). Then restart the nagios service:

    service nagios restart

Notes:
Alerts are sent only for services with notifications_enabled 1 (1 = send alerts, 0 = do not). The generic-service template below sets it to 1, so services inherit it unless they override it, as the mysql service above does with 0.
When registering the 139 mailbox, choose the long SMS format and set mail-arrival notifications to 24 hours.

In templates.cfg (in the same objects directory), notification_interval sets how often an alert is repeated; here it was changed to 10:

    vim templates.cfg
    define service{
        name                          generic-service  ; The 'name' of this service template
        active_checks_enabled         1                ; Active service checks are enabled
        passive_checks_enabled        1                ; Passive service checks are enabled/accepted
        parallelize_check             1                ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service           1                ; We should obsess over this service (if necessary)
        check_freshness               0                ; Default is to NOT check service 'freshness'
        notifications_enabled         1                ; Service notifications are enabled
        event_handler_enabled         1                ; Service event handler is enabled
        flap_detection_enabled        1                ; Flap detection is enabled
        failure_prediction_enabled    1                ; Failure prediction is enabled
        process_perf_data             1                ; Process performance data
        retain_status_information     1                ; Retain status information across program restarts
        retain_nonstatus_information  1                ; Retain non-status information across program restarts
        is_volatile                   0                ; The service is not volatile
        check_period                  24x7             ; The service can be checked at any time of the day
        max_check_attempts            3                ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval         10               ; Check the service every 10 minutes under normal conditions
        retry_check_interval          2                ; Re-check the service every two minutes until a hard state can be determined
        contact_groups                admins           ; Notifications get sent out to everyone in the 'admins' group
        notification_options          w,u,c,r          ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval         10               ; Re-notify about service problems every 10 minutes
        notification_period           24x7             ; Notifications can be sent out at any time
        register                      0                ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }

Nagios errors and solutions

Error 1

After adding a new monitored host (for example, the load service on 129), clicking Scheduling Queue -> 129 load shows this Status Information:

    CHECK_NRPE: Socket timeout after 10 seconds

Checks:

1) On the monitoring server, run the following and see whether it returns the NRPE version; then check whether iptables is blocking the connection:

    /usr/local/nagios/libexec/check_nrpe -H 192.168.1.129

2) Check the file permissions:

    cd /usr/local/nagios/etc/objects
    [root@localhost objects]# ll
    total 52
    -rw-r--r-- 1 root   root     314 Feb 16 15:58 129.cfg
    -rwxrwxrwx 1 nagios nagios  7856 Feb 16 16:06 commands.cfg
    -rwxrwxrwx 1 nagios nagios  2166 Feb 16 13:58 contacts.cfg
    -rwxrwxrwx 1 nagios nagios  5403 Feb 16 13:58 localhost.cfg
    -rwxrwxrwx 1 nagios nagios  3124 Feb 16 13:58 printer.cfg
    -rwxrwxrwx 1 nagios nagios  3293 Feb 16 13:58 switch.cfg
    -rwxrwxrwx 1 nagios nagios 10812 Feb 16 13:58 templates.cfg
    -rwxrwxrwx 1 nagios nagios  3209 Feb 16 13:58 timeperiods.cfg
    -rwxrwxrwx 1 nagios nagios  4007 Feb 16 13:58 windows.cfg

Check whether the newly added host file can be read and written by the nagios user. If not, change its owner and mode to match the other files. Afterwards the 129.cfg entry reads (the other files are unchanged):

    -rwxrwxrwx 1 nagios nagios   314 Feb 16 15:58 129.cfg
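Step 2 of the troubleshooting above fixes a host file whose owner differs from the other object files. When many object files have been added by hand, a small loop can list the mismatched ones first. This is a minimal sketch, assuming GNU stat (as shipped with CentOS) and the /usr/local/nagios/etc/objects layout used in this guide; the function name find_bad_cfg is hypothetical, not part of Nagios.

```shell
#!/bin/bash
# Sketch: list Nagios object definition files whose owner is not the
# expected user. The default user "nagios" and the directory passed in
# are assumptions based on this guide's layout; adjust for your install.
# Requires GNU stat (coreutils on CentOS/RHEL).

# find_bad_cfg DIR [USER]
#   Prints one line per *.cfg file in DIR that is not owned by USER
#   (default: nagios). Prints nothing when every file is owned by USER.
find_bad_cfg() {
    local dir="$1"
    local want_user="${2:-nagios}"
    local f owner
    for f in "$dir"/*.cfg; do
        # If DIR has no *.cfg files the glob stays literal; skip it.
        [ -e "$f" ] || continue
        owner=$(stat -c %U "$f")   # user name of the file's owner
        if [ "$owner" != "$want_user" ]; then
            echo "$f (owner: $owner)"
        fi
    done
}
```

For example, running find_bad_cfg /usr/local/nagios/etc/objects on the server above would report 129.cfg as owned by root, which can then be fixed with chown nagios.nagios 129.cfg. The check only compares owners; it does not test whether the nagios user can actually read each file.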