Installing and Configuring Nagios

lbdbzj110 · Posted on 2017-4-20 09:02:06

　　This guide assumes CentOS as the operating system.
　　I. Install the Nagios dependencies
　　Nagios depends on php, gcc, glibc, glibc-common, gd, and gd-devel.
　　yum install php
　　yum install gcc glibc glibc-common
　　yum install gd gd-devel
　　If Apache is not installed yet, install it as well:
　　yum install httpd
　　II. Set up the user and group
　　/usr/sbin/useradd -m nagios
　　passwd nagios
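　　The official quickstart creates a dedicated nagcmd group for submitting external commands through the web interface. This walkthrough uses the nagios group for that purpose instead, so the apache user must be added to it. On CentOS, useradd normally creates a group with the same name as the user, so groupadd may report that the group already exists; that message is harmless here.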
　　/usr/sbin/groupadd nagios
　　/usr/sbin/usermod -a -G nagios nagios
　　/usr/sbin/usermod -a -G nagios apache
　　III. Download and install Nagios Core
　　wget http://prdownloads.sourceforge.net/sourceforge/nagios/nagios-3.2.3.tar.gz
　　tar xzf nagios-3.2.3.tar.gz
　　cd nagios-3.2.3
　　Run the Nagios configure script, passing the name of the group you created earlier like so:
　　./configure --with-command-group=nagios
　　Compile the Nagios source code.
　　make all
　　Install binaries, init script, sample config files and set permissions on the external command directory.
　　make install
　　make install-init
　　make install-config
　　make install-commandmode
　　Don't start Nagios yet - there's still more that needs to be done...
　　Sample configuration files have now been installed in the /usr/local/nagios/etc directory. These sample files should work fine for getting started with Nagios. You'll need to make just one change before you proceed...
　　Edit the /usr/local/nagios/etc/objects/contacts.cfg config file with your favorite editor and change the email address associated with the nagiosadmin contact definition to the address you'd like to use for receiving alerts.
　　vi /usr/local/nagios/etc/objects/contacts.cfg
　　IV. Configure the web interface
　　Install the Nagios web config file in the Apache conf.d directory.
　　make install-webconf
　　Run this command while still inside the nagios-3.2.3 directory. Afterwards, nagios.conf appears in httpd's /etc/httpd/conf.d directory.
　　Create a nagiosadmin account for logging into the Nagios web interface. Remember the password you assign to this account - you'll need it later.
　　htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
　　Restart Apache to make the new settings take effect.
　　service httpd restart
　　V. Install the plugins
　　wget https://www.nagios-plugins.org/download/nagios-plugins-1.5.tar.gz
　　tar zxvf nagios-plugins-1.5.tar.gz
　　cd nagios-plugins-1.5
　　Compile and install the plugins.
　　./configure --with-nagios-user=nagios --with-nagios-group=nagios
　　make
　　make install
　　VI. Start Nagios
　　Add Nagios to the list of system services and have it automatically start when the system boots.
　　chkconfig --add nagios
　　chkconfig nagios on
　　Verify the sample Nagios configuration files.
　　/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
　　If there are no errors, start Nagios.
　　service nagios start
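The verify-then-start sequence above can be wrapped in a small helper so that a broken configuration never reaches a running daemon. This is a minimal sketch, not part of the Nagios distribution: restart_if_valid is a hypothetical name, and it assumes that nagios -v exits with a non-zero status when the configuration has errors.

```shell
# Hypothetical helper: run a verification command and, only if it
# succeeds, run the (re)start command.
restart_if_valid() {
    verify_cmd=$1    # e.g. "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg"
    restart_cmd=$2   # e.g. "service nagios restart"
    if sh -c "$verify_cmd" >/dev/null 2>&1; then
        sh -c "$restart_cmd"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

It would be called as restart_if_valid "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" "service nagios restart". Passing both commands as arguments keeps the helper usable for other daemons too.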
　　VII. Modify SELinux Settings
　　CentOS, like Fedora, ships with SELinux (Security Enhanced Linux) installed and in Enforcing mode by default. This can result in "Internal Server Error" messages when you attempt to access the Nagios CGIs.
　　See if SELinux is in Enforcing mode.
　　getenforce
　　Put SELinux into Permissive mode.
　　setenforce 0
　　To make this change permanent, you'll have to modify the settings in /etc/selinux/config and reboot.
　　Instead of disabling SELinux or setting it to permissive mode, you can use the following command to run the CGIs under SELinux enforcing/targeted mode:
　　chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
　　chcon -R -t httpd_sys_content_t /usr/local/nagios/share/
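　　VIII. View the web interface
　　http://yourip:port/nagios
　　At this point only the simplest default status of the local machine is shown.
　　IX. Installing NRPE
　　Official documentation: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
　　NRPE is only one of the ways for the Nagios server to monitor other hosts. Nagios can also use other Nagios Addon Projects for this, such as: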
　　DNX
　　NRDP
　　NSCA
　　NSClient++
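　　See http://www.nagios.org/download/addons
　　Client installation (assuming the client IP is 192.168.90.111)
　　1. useradd nagios
　　passwd nagios (set a password)
　　2. Install xinetd
　　CentOS 6.3 may not have xinetd installed by default, although, oddly, the /etc/xinetd.d directory already exists.
　　yum install -y xinetd
　　After installation, the /etc/xinetd.conf configuration file is present.
　　3. Install nagios-plugins
　　Download nagios-plugins from the official site.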
　　tar -zxvf nagios-plugins-1.5.tar.gz
　　cd nagios-plugins-1.5
　　./configure
　　make
　　make install
　　chown nagios:nagios /usr/local/nagios
　　chown -R nagios:nagios /usr/local/nagios/libexec
　　4. Download NRPE from the official site
　　tar -zxvf nrpe-2.15.tar.gz
　　cd nrpe-2.15
　　./configure
　　make all
　　make install-plugin
　　make install-daemon
　　make install-daemon-config
　　make install-xinetd
　　If ./configure fails during the NRPE installation with
　　checking for SSL headers... configure: error: Cannot find ssl headers
　　install the OpenSSL packages and run it again:
　　yum install -y openssl openssl-devel
　　5. NRPE configuration and system settings
　　vi /etc/xinetd.d/nrpe
　　Change this line (entries are separated by spaces):
　　only_from = 127.0.0.1 nagios_server_ip
　　Save and exit.
　　vi /usr/local/nagios/etc/nrpe.cfg
　　Set this line (entries are separated by ASCII commas):
　　allowed_hosts=127.0.0.1,nagios_server_ip
　　Save and exit.
　　vi /etc/services
　　Add the following line:
　　nrpe 5666/tcp # NRPE
　　Save and exit.
　　vi /etc/sysconfig/iptables
　　Add the following line:
　　-A RH-Firewall-1-INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
　　Save and exit.
　　service iptables restart
　　service xinetd restart
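A forgotten or mistyped allowed_hosts entry is an easy mistake in this step, so it can help to check the file before restarting xinetd. The following is a minimal sketch; nrpe_allows_host is a hypothetical helper, and it assumes the last allowed_hosts line in the file is the effective one.

```shell
# Hypothetical helper: succeed if the given IP appears in the
# allowed_hosts line of an nrpe.cfg-style file.
nrpe_allows_host() {
    cfg=$1
    ip=$2
    grep -E '^[[:space:]]*allowed_hosts[[:space:]]*=' "$cfg" \
        | tail -n 1 \
        | cut -d= -f2 \
        | tr ',' '\n' \
        | tr -d ' \t' \
        | grep -Fxq "$ip"
}
```

For example, nrpe_allows_host /usr/local/nagios/etc/nrpe.cfg 192.168.90.101 would succeed only when that exact address is listed. The -x flag on the final grep avoids partial matches such as 192.168.90.10.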
　　6. Check that NRPE was installed correctly on the client
　　/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
　　NRPE v2.15
　　/usr/local/nagios/libexec/check_nrpe -H localhost
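　　fails with
　　CHECK_NRPE: Error - Could not complete SSL handshake.
　　This is usually caused by a misconfigured /etc/hosts. Opening it with vi /etc/hosts shows this line:
　　::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
　　After commenting out or deleting that line, the check works.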
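Before editing /etc/hosts by hand, the same condition can be detected with a short script. This is only a sketch: localhost_maps_to_ipv6 is a hypothetical helper that reports whether an active (uncommented) ::1 entry names localhost; it does not change the file.

```shell
# Hypothetical helper: succeed if the hosts file has an uncommented
# "::1" entry that lists "localhost" as one of its names.
localhost_maps_to_ipv6() {
    hosts_file=${1:-/etc/hosts}
    awk '
        $1 == "::1" {
            for (i = 2; i <= NF; i++)
                if ($i == "localhost") found = 1
        }
        END { exit !found }
    ' "$hosts_file"
}
```

Running localhost_maps_to_ipv6 && echo "localhost resolves to ::1" on the client shows whether the entry described above is present.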
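　　7. To add more monitoring commands on the client, edit /usr/local/nagios/etc/nrpe.cfg.
　　Installing NRPE on the server
　　1. On the server, skip the nagios-plugins step (the plugins were installed earlier) and install NRPE directly.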
　　yum install openssl openssl-devel
　　tar -zxvf nrpe-2.15.tar.gz
　　cd nrpe-2.15
　　./configure
　　make all
　　make install-plugin
　　2. Test the client that was just set up
　　/usr/local/nagios/libexec/check_nrpe -H 192.168.90.111
　　NRPE v2.15
　　This shows that the server can reach the client.
　　3. Edit the server configuration to monitor the client
　　vi /usr/local/nagios/etc/objects/commands.cfg
　　define command{
　　command_name check_nrpe
　　command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
　　}
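　　Save and exit.
　　Using localhost.cfg as a template, create a new file remotehost.cfg:
　　define host{
　　use linux-server ; Inherit default values from a template (the NRPE docs use a linux-box template, which must be defined before it can be used)
　　host_name remotehost ; The name we're giving to this server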
　　alias CentOS 6.3 ; A longer name for the server
　　address 192.168.90.111 ; IP address of the server
　　}
　　define service{
　　use generic-service
　　host_name remotehost
　　service_description CPU Load
　　check_command check_nrpe!check_load
　　}
　　The following service will monitor the number of currently logged in users on the remote host.
　　define service{
　　use generic-service
　　host_name remotehost
　　service_description Current Users
　　check_command check_nrpe!check_users
　　}
　　The following service will monitor the free drive space on /dev/hda1 on the remote host.
　　define service{
　　use generic-service
　　host_name remotehost
　　service_description /dev/hda1 Free Space
　　check_command check_nrpe!check_hda1
　　}
　　The following service will monitor the total number of processes on the remote host.
　　define service{
　　use generic-service
　　host_name remotehost
　　service_description Total Processes
　　check_command check_nrpe!check_total_procs
　　}
　　The following service will monitor the number of zombie processes on the remote host.
　　define service{
　　use generic-service
　　host_name remotehost
　　service_description Zombie Processes
　　check_command check_nrpe!check_zombie_procs
　　}
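　　Save and exit.
　　vi /usr/local/nagios/etc/nagios.cfg
　　Add one line so that the new configuration takes effect:
　　cfg_file=/usr/local/nagios/etc/objects/remotehost.cfg
　　Save and exit.
　　4. /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
　　This checks whether the configuration is valid. If it reports problems, fix them and rerun the check until it passes.
　　5. Restart the service
　　service nagios restart
　　Open http://192.168.90.101:8080/nagios/ again.
　　The new host and its services now appear.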
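　　X. Nagios configuration in detail. Official reference: http://nagios.sourceforge.net/docs/nagioscore/3/en/config.html
　　1. List of configuration files
　　Under /usr/local/nagios/etc:
　　cgi.cfg //controls access to the CGIs
　　nagios.cfg //the main Nagios configuration file
　　resource.cfg //resource file that defines macros, such as $USER1$, for other configuration files to reference
　　objects //directory holding the configuration files and templates that define Nagios objects
　　objects/commands.cfg //command definitions, which other configuration files can reference
　　objects/contacts.cfg //defines contacts and contact groups
　　objects/localhost.cfg //defines monitoring of the local host
　　objects/printer.cfg //sample template for monitoring printers; not enabled by default
　　objects/switch.cfg //sample template for monitoring switches and routers; not enabled by default
　　objects/templates.cfg //host and service templates that other configuration files can reference
　　objects/timeperiods.cfg //defines the time periods used for monitoring
　　objects/windows.cfg //sample template for monitoring Windows hosts; not enabled by default
　　2. How the configuration files relate to each other
　　Configuring Nagios involves several kinds of definitions: hosts and host groups, services and service groups, contacts and contact groups, time periods, commands, and so on.
　　As these definitions suggest, the Nagios configuration files are interrelated and reference one another.
　　To build a working Nagios monitoring system, you need to understand which file depends on which. Four points matter most:
　　First: define which hosts [host] [hostescalation] [hostdependency], host groups [hostgroup], services [service] [serviceescalation] [servicedependency], and service groups [servicegroup] to monitor.
　　Second: define which command [command] performs each check.
　　Third: define the time periods [timeperiod] during which monitoring runs.
　　Fourth: define the contacts [contact] and contact groups [contactgroup] to notify when a host or service has a problem.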
　　All Nagios object types:
　　Services
　　Service Groups
　　Hosts
　　Host Groups
　　Contacts
　　Contact Groups
　　Commands
　　Time Periods
　　Notification Escalations
　　Notification and Execution Dependencies
　　See the documentation at http://nagios.sourceforge.net/docs/nagioscore/3/en/configobject.html
　　http://nagios.sourceforge.net/docs/nagioscore/3/en/objectdefinitions.html#retain_state_information
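　　3. To keep the objects easy to follow, put each kind of Nagios object definition in its own configuration file:
　　Create hosts.cfg to define hosts and host groups.
　　Create services.cfg to define services.
　　Use the default contacts.cfg to define contacts and contact groups.
　　Use the default commands.cfg to define commands.
　　Use the default timeperiods.cfg to define time periods.
　　Use the default templates.cfg as the shared template file.
　　Nagios mainly monitors host resources and services, which its configuration calls objects. To avoid defining the same settings over and over, Nagios provides templates: common attributes are defined once in a template and then reused. For clarity, these generic objects are kept in templates.cfg.
　　When other configuration files define concrete objects, they can inherit from a generic object with the use directive.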
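Because a concrete host only needs a template name plus its own host_name and address, host definitions are easy to generate. The sketch below is an illustration only: render_host is a hypothetical helper, and the linux-server default is assumed to be a template available in templates.cfg.

```shell
# Hypothetical generator: print a host object that inherits its common
# settings from a template through the "use" directive.
render_host() {
    name=$1
    address=$2
    template=${3:-linux-server}   # assumed default template name
    printf 'define host{\n'
    printf '    use        %s\n' "$template"
    printf '    host_name  %s\n' "$name"
    printf '    address    %s\n' "$address"
    printf '}\n'
}
```

The output can be appended to a hosts file, for instance render_host remotehost 192.168.90.111 >> /usr/local/nagios/etc/objects/hosts.cfg, provided that file is listed with cfg_file in nagios.cfg.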
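　　XI. Command syntax
　　The plugins that ship with Nagios are in /usr/local/nagios/libexec, for example check_ping. Run check_ping -h to see the detailed parameters of a plugin.
　　Commands are usually defined on top of these plugins in commands.cfg, for example: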
　　define command{
　　command_name check_ping
　　command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
　　}
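　　The $HOSTADDRESS$ macro is filled in at run time from the address of the host named by host_name, so a service that uses this command only needs to pass the two remaining arguments: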
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description PING
　　check_command check_ping!100.0,20%!500.0,60%
　　}
　　The value after the first ! is the first argument ($ARG1$).
　　The value after the second ! is the second argument ($ARG2$).
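To see how the !-separated values end up in the command line, the substitution can be imitated in a few lines of shell. This is a rough sketch of the idea, not how Nagios implements it: expand_check_command is a hypothetical helper, it only handles $ARGn$ macros, and it assumes the argument values contain none of the characters |, & or \.

```shell
# Hypothetical helper: substitute $ARG1$, $ARG2$, ... in a command_line
# template with the !-separated values of a check_command string.
expand_check_command() {
    template=$1        # command_line containing $ARGn$ macros
    check_command=$2   # e.g. check_ping!100.0,20%!500.0,60%
    old_ifs=$IFS
    IFS='!'
    set -f                      # no filename globbing while splitting
    set -- $check_command       # split on "!" into positional parameters
    set +f
    IFS=$old_ifs
    shift                       # drop the command name itself
    n=1
    result=$template
    for arg in "$@"; do
        result=$(printf '%s' "$result" | sed "s|\\\$ARG${n}\\\$|${arg}|g")
        n=$((n + 1))
    done
    printf '%s\n' "$result"
}
```

Called with the check_ping command_line and check_ping!100.0,20%!500.0,60%, it prints the line with -w 100.0,20% -c 500.0,60% filled in, while $USER1$ and $HOSTADDRESS$ are left for Nagios to resolve.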
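　　XII. Basic service monitoring
　　1. Whether the machine is reachable on the internal network (if it is not, something is seriously wrong, though the cause may be a network device or the machine itself)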
　　define host{
　　use linux-server
　　host_name hxdbmaster
　　address ip
　　hostgroups linux-servers
　　}
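　　This defines the machine to monitor on the server. If the client does not have nagios-plugins installed and you still want to monitor its network status, you can also ping it from the server: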
　　define command{
　　command_name check-remotehost-alive
　　command_line $USER1$/check_ping -H $ARG1$ -w 3000.0,80% -c 5000.0,100% -p 5
　　}
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description PING192.168.90.167
　　check_command check-remotehost-alive!192.168.90.167
　　}
　　2. Disk space
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description Root Partition
　　check_command check_local_disk!20%!10%!/
　　}
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description HOME Partition
　　check_command check_local_disk!20%!10%!/home
　　}
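　　The above raises a warning when free space on the / root partition (a logical volume) drops to 20%, and a critical alert at 10%.
　　The same thresholds apply to the /home partition (another logical volume).
　　3. CPU load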
　　# 'check_local_load' command definition
　　define command{
　　command_name check_local_load
　　command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
　　}
　　# Define a service to check the load on the local machine.
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description Current Load
　　check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
　　}
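　　The state becomes WARNING when the 1-minute load average exceeds 5, the 5-minute average exceeds 4, or the 15-minute average exceeds 3.
　　The state becomes CRITICAL when the 1-minute load average exceeds 10, the 5-minute average exceeds 6, or the 15-minute average exceeds 4.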
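The threshold logic described above can be written out to make the three-value comparison concrete. This is a sketch under assumptions, not the check_load source: classify_load is a hypothetical helper, it treats "exceeds" as a strict greater-than, and it reuses the usual plugin exit codes 0, 1 and 2 for OK, WARNING and CRITICAL.

```shell
# Hypothetical helper: compare a "1min,5min,15min" load triple against
# warning and critical triples; any single value over its threshold
# is enough to raise the state.
classify_load() {
    loads=$1   # e.g. 0.50,0.40,0.30
    warn=$2    # e.g. 5.0,4.0,3.0
    crit=$3    # e.g. 10.0,6.0,4.0
    awk -v l="$loads" -v w="$warn" -v c="$crit" 'BEGIN {
        split(l, L, ","); split(w, W, ","); split(c, C, ",")
        state = 0
        for (i = 1; i <= 3; i++) {
            if (L[i] + 0 > C[i] + 0) state = 2
            else if (L[i] + 0 > W[i] + 0 && state < 1) state = 1
        }
        print (state == 2 ? "CRITICAL" : (state == 1 ? "WARNING" : "OK"))
        exit state
    }'
}
```

With the thresholds from the service definition, classify_load 6.0,1.0,1.0 5.0,4.0,3.0 10.0,6.0,4.0 reports WARNING because only the 1-minute value is over its warning limit.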
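　　4. Disk I/O
　　The standard Nagios plugin package does not include a plugin for checking disk I/O directly, such as check_iostat.
　　It can be downloaded from the Nagios Exchange site. The download page is:
　　http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_iostat--2D-I-2FO-statistics/details
　　After downloading, upload it to /usr/local/nagios/libexec/ on both the monitoring server and the monitored host.
　　Make it executable:
　　chmod +x check_iostat
　　View its help:
　　[root@localhost libexec]# ./check_iostat -help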
　　This plugin shows the I/O usage of the specified disk, using the iostat external program.
　　It prints three statistics: Transactions per second (tps), Kilobytes per second
　　read from the disk (KB_read/s) and written to the disk (KB_written/s)
　　./check_iostat:
　　-d Device to be checked (without the full path, eg. sda)
　　-c Sets the CRITICAL level for tps, KB_read/s and KB_written/s, respectively
　　-w Sets the WARNING level for tps, KB_read/s and KB_written/s, respectively
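　　As the help shows, it checks the disk's transactions and the data read and written per second.
　　The parameters are:
　　-d: name of the device to check, without the full path
　　-c: thresholds for tps, KB_read/s, and KB_written/s at which a CRITICAL alert is raised
　　-w: thresholds for tps, KB_read/s, and KB_written/s at which a WARNING alert is raised
　　Check the disks on this machine:
　　[root@localhost libexec]# df -h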
　　Filesystem Size Used Avail Use% Mounted on
　　/dev/mapper/VolGroup00-LogVol00
　　128G 27G 95G 22% /
　　/dev/sda1 99M 13M 82M 14% /boot
　　tmpfs 4.0G 0 4.0G 0% /dev/shm
　　The partition above is sda1, so pass sda to -d.
　　The device may also be something other than sda, for example:
　　[root@li387-161 ~]# df -h
　　Filesystem Size Used Avail Use% Mounted on
　　/dev/xvda 79G 38G 40G 49% /
　　tmpfs 1009M 108K 1009M 1% /dev/shm
　　In that case, pass xvda to -d.
　　Check that the plugin runs:
　　[root@localhost libexec]# ./check_iostat -d sda -w 1000 -c 2000
　　OK - I/O stats tps=1.71 KB_read/s=2.77 KB_written/s=26.77 | 'tps'=1.71; 'KB_read/s'=2.77; 'KB_written/s'=26.77;
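　　If it fails with an error, first install sysstat on the machine:
　　[root@localhost libexec]# yum install sysstat
　　If it still fails, work through the error messages one at a time.
　　For example, I ran into "bc: command not found", which yum install bc fixed.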
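The part of the check_iostat output after the | character is performance data, and pulling a single value out of it is handy when choosing thresholds. The sketch below is tailored to the output line shown above: perf_value is a hypothetical helper, and it assumes each entry has the simple 'label'=value; form without extra warn/crit fields.

```shell
# Hypothetical helper: read one plugin output line on stdin and print
# the performance-data value for the given label.
perf_value() {
    label=$1
    sed 's/^[^|]*|//' \
        | tr ';' '\n' \
        | tr -d "' " \
        | awk -F= -v l="$label" '$1 == l { print $2 }'
}
```

For example, ./check_iostat -d sda -w 1000 -c 2000 | perf_value KB_written/s prints only the write rate.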
　　Original article: http://blog.sina.com.cn/s/blog_5f54f0be0101ch4p.html
　　For example, suppose a machine has two disks, sda and sdb:
　　# 'check_iostat' command definition
　　define command{
　　command_name check_local_iostat
　　command_line $USER1$/check_iostat -d $ARG1$ -w $ARG2$ -c $ARG3$
　　}
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description sda io
　　check_command check_local_iostat!sda!1000!2000
　　}
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description sdb io
　　check_command check_local_iostat!sdb!1000!2000
　　}
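　　Note that the author's first attempt at this configuration returned exit code 255 once the service ran, which suggested that check_iostat only works through NRPE. A mistyped macro or check_command name produces a similar failure, so verify those before switching.
　　Configuring it through NRPE:
　　Edit nrpe.cfg and add two command definitions: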
　　command[check_iostatsda]=/usr/local/nagios/libexec/check_iostat -d sda -w 1000 -c 2000
　　command[check_iostatsdb]=/usr/local/nagios/libexec/check_iostat -d sdb -w 1000 -c 2000
　　Change the service definitions above to go through NRPE:
　　define service{
　　use generic-service
　　host_name localhost
　　service_description sda io[localhost]
　　check_command check_nrpe!check_iostatsda
　　}
　　define service{
　　use generic-service
　　host_name localhost
　　service_description sdb io[localhost]
　　check_command check_nrpe!check_iostatsdb
　　}
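Every name used after check_nrpe! on the server must exist as a command[...] entry in the client's nrpe.cfg, and a mismatch is easy to miss. A minimal sketch for listing what the client actually defines follows; list_nrpe_commands and nrpe_has_command are hypothetical helper names.

```shell
# Hypothetical helpers: list the command names defined in an
# nrpe.cfg-style file, and test whether one specific name exists.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)].*/\1/p' "$1"
}

nrpe_has_command() {
    list_nrpe_commands "$1" | grep -Fxq "$2"
}
```

On the client, nrpe_has_command /usr/local/nagios/etc/nrpe.cfg check_iostatsda confirms that the name used in the service definition is really available.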
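　　5. Memory usage
　　Nagios does not ship a plugin that checks memory directly, but it does have check_swap for swap space. Swap usage is related to memory pressure to some degree, so check_swap can serve as a rough memory check. To check memory directly, download the plugin from Nagios Exchange:
　　http://exchange.nagios.org/directory/Plugins/Operating-Systems/Linux/check_mem/details
　　Download check_mem.zip and unpack it to get the check_mem.pl script, then put it in the /usr/local/nagios/libexec directory.
　　chown nagios:nagios check_mem.pl
　　chmod +x check_mem.pl
　　./check_mem.pl -w 90,25 -c 95,50
　　Here -w 90,25 sets the warning thresholds (memory usage above 90%, swap usage above 25%) and -c 95,50 sets the critical thresholds (memory usage above 95%, swap usage above 50%).
　　The configuration is as follows: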
　　# 'check_memory' command definition
　　define command{
　　command_name check_memory
　　command_line $USER1$/check_mem.pl -w $ARG1$ -c $ARG2$
　　}
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description Memory Usage
　　check_command check_memory!90,80!95,90
　　}
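To get a feel for what a "memory usage" percentage means before picking thresholds, it can be computed by hand from /proc/meminfo. This is a sketch with an assumed formula (total minus free, buffers and cache); check_mem.pl may calculate usage differently, and mem_used_pct is a hypothetical helper name.

```shell
# Hypothetical helper: print used memory as a whole-number percentage,
# counting buffers and page cache as available (an assumption).
mem_used_pct() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "MemTotal:" { total = $2 }
        $1 == "MemFree:"  { free = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached = $2 }
        END {
            if (total <= 0) exit 1
            printf "%d\n", (total - free - buffers - cached) * 100 / total
        }
    ' "$meminfo"
}
```

Running mem_used_pct with no argument reads the live /proc/meminfo; passing a file path makes it easy to try the calculation on a saved copy.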
　　A simple swap usage check:
　　# 'check_local_swap' command definition
　　define command{
　　command_name check_local_swap
　　command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
　　}
　　# Define a service to check the swap usage the local machine.
　　# Critical if less than 10% of swap is free, warning if less than 20% is free
　　# (these values are the free space remaining, the opposite of check_memory, which takes usage)
　　define service{
　　use local-service ; Name of service template to use
　　host_name localhost
　　service_description Swap Usage
　　check_command check_local_swap!20!10
　　}
　　6. Whether a web page responds correctly, and its response time
　　Use check_http.
　　8. Traffic monitoring
　　This monitors network interface traffic. Download the plugin from Nagios Exchange:
　　http://exchange.nagios.org/directory/Plugins/Network-Connections,-Stats-and-Bandwidth/check_traffic-2Esh/details
　　Before installing, check whether SNMP is already installed:
　　rpm -qa |grep snmp
　　If it is not, install it first:
　　yum -y install net-snmp*
　　Download check_traffic.sh into /usr/local/nagios/libexec/:
　　chown nagios:nagios check_traffic.sh
　　chmod +x check_traffic.sh
　　Start snmpd:
　　service snmpd restart
　　./check_traffic.sh -V 2c -C public -H 10.2.112.xx -L
　　This produces the error
　　List Interface for host 127.0.0.1.
　　Interface index = No Such Object available on this agent at this OID
　　or the error
　　Timeout: No Response from 127.0.0.1
　　Both mean that /etc/snmp/snmpd.conf is not configured correctly.
　　Delete its existing contents and paste the following into snmpd.conf:
　　com2sec notConfigUser 127.0.0.1 public
　　com2sec notConfigUser 192.168.90.xx public
　　# Second, map the security name into a group name:
　　# groupName securityModel securityName
　　group notConfigGroup v1 notConfigUser
　　group notConfigGroup v2c notConfigUser
　　# Third, create a view for us to let the group have rights to:
　　# Make at least snmpwalk -v 1 localhost -c public system fast again.
　　# name incl/excl subtree mask(optional)
　　view systemview included .1.3.6.1.2.1.1
　　view systemview included .1.3.6.1.2.1.2
　　view systemview included .1.3.6.1.2.1.25.1.1
　　view all included .1
　　# Finally, grant the group read-only access to the systemview view.
　　# group context sec.model sec.level prefix read write notif
　　#access notConfigGroup "" any noauth exact mib2 none none
　　access notConfigGroup "" any noauth exact all none none
　　## sec.name source community
　　#com2sec local localhost COMMUNITY
　　#com2sec mynetwork NETWORK/24 COMMUNITY
　　com2sec notConfigUser default public
　　com2sec *.*.*.0 192.168.90.0/24 public
　　Replace 192.168.90. with your own subnet and xx with your actual IP.
　　service snmpd restart
　　First test directly with snmpwalk -v 2c -c public 192.168.90.xx interfaces
　　The output looks like:
　　IF-MIB::ifNumber.0 = INTEGER: 2
　　IF-MIB::ifIndex.1 = INTEGER: 1
　　IF-MIB::ifIndex.2 = INTEGER: 2
　　IF-MIB::ifDescr.1 = STRING: lo
　　IF-MIB::ifDescr.2 = STRING: eth0
　　...
　　SNMP is now reachable.
　　Then run:
　　./check_traffic.sh -V 2c -C public -H localhost -L
　　List Interface for host localhost.
　　Interface index 1 orresponding to lo
　　Interface index 2 orresponding to eth0
　　Problem solved. For SNMP troubleshooting, see http://blog.renren.com/share/222193096/10751098321
　　# 'check_traffic' command definition
　　define command{
　　command_name check_traffic
　　command_line $USER1$/check_traffic.sh -V 2c -C public -H $HOSTADDRESS$ -I $ARG1$ -w $ARG2$ -c $ARG3$ -K -B
　　}
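　　-I selects the interface by its index.
　　Running /usr/local/nagios/libexec/check_traffic.sh -V 2c -C public -H localhost -L lists the interface indexes; on this machine, index 2 is the first network card, eth0.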
　　define service{
　　use local-service
　　host_name localhost
　　service_description net traffic eth0
　　check_command check_traffic!2!700,600!1000,900
　　}
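The interface index passed as the first argument of check_traffic can also be looked up from the snmpwalk output shown earlier instead of being read off by eye. The following is a small sketch; ifindex_for is a hypothetical helper, and it assumes ifDescr lines in the IF-MIB::ifDescr.N = STRING: name form.

```shell
# Hypothetical helper: read snmpwalk "interfaces" output on stdin and
# print the index whose ifDescr matches the given interface name.
ifindex_for() {
    name=$1
    awk -v n="$name" '
        $1 ~ /^IF-MIB::ifDescr\./ && $NF == n {
            sub(/^IF-MIB::ifDescr\./, "", $1)
            print $1
        }
    '
}
```

For instance, snmpwalk -v 2c -c public localhost interfaces | ifindex_for eth0 prints the number to use with -I.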
　　9. Email alerts
　　10. SMS alerts (not configured)
　　11. Other
　　check_procs usage:
　　Usage: check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid] [-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
　　[-C command] [-t timeout] [-v]
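　　The parameters are explained one by one below.
　　-w and -c set the warning and critical ranges. Usually each is a single number, in which case an alert is raised only when the process count is greater than that number, for example:
　　[root@udb151k libexec]# ./check_procs -w 84 -c 90
　　PROCS OK: 83 processes
　　There is also another form:
　　[root@udb151k libexec]# ./check_procs -w 84: -c :90
　　PROCS WARNING: 83 processes
　　The colon marks a lower or upper bound: here a warning is raised below 84 and a critical alert above 90.
　　-m the metric to alert on; the possible values are:
　　PROCS - number of processes (default)
　　VSZ - virtual memory size
　　RSS - resident set memory size (physical memory in use)
　　CPU - percentage CPU
　　-s filter by process state; there are many states, see ps -exX for details
　　-p parent process ID
　　-u UID of the process owner
　　-r physical memory actually in use (RSS)
　　-z virtual memory (VSZ)
　　-P CPU usage
　　-a argument string to match
　　-C command name of the process
　　-t timeout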
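The 84, 84: and :90 forms of -w and -c follow the plugin range convention, which is easier to remember once written as code. This is a simplified sketch for whole numbers only: range_alerts is a hypothetical helper, and it deliberately ignores the ~ and @ prefixes that the full range syntax also allows.

```shell
# Hypothetical helper: succeed (exit 0) when the value falls OUTSIDE the
# range, which is when a plugin would raise the alert.
#   "N"   -> alert if value > N (lower bound defaults to 0)
#   "N:"  -> alert if value < N
#   ":N"  -> alert if value > N
#   "A:B" -> alert if value < A or value > B
range_alerts() {
    value=$1
    range=$2
    case $range in
        *:*) min=${range%%:*}; max=${range#*:} ;;
        *)   min=0; max=$range ;;
    esac
    if [ -z "$min" ]; then min=0; fi
    if [ "$value" -lt "$min" ]; then return 0; fi
    if [ -n "$max" ] && [ "$value" -gt "$max" ]; then return 0; fi
    return 1
}
```

With 83 processes, range_alerts 83 84: succeeds, matching the WARNING shown above, while range_alerts 83 :90 does not, so no critical alert is raised.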
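　　The drawback of -a: to check whether a process is healthy, many people use -a with the name that appears in the process arguments. This easily causes false alarms,
　　because it matches every process whose arguments contain the string, for example when someone is editing a file of the same name in vi or listing files in that directory:
　　[root@udb151k libexec]# ./check_procs -w 1: -c :2 -a mysqld
　　PROCS CRITICAL: 3 processes with args 'mysqld'
　　In this case -C is more accurate:
　　[root@udb151k libexec]# ./check_procs -w 1: -c :2 -C mysqld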
　　PROCS OK: 1 process with command name 'mysqld'
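　　Original article: http://hi.baidu.com/zjx416/item/44474b1004b33038b831802f
　　XIII. Some open questions
　　1. Chinese text in Status Information appears garbled. Not solved.
　　2. A host definition works without check_command. What checks the host's status in that case?
　　The host inherits check_command from its template; the sample linux-server template uses check-host-alive, which is built on check_ping. You can also set it explicitly: for example, after setting check_command check_http the author saw the Status change to DOWN.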
