Nagios Monitoring: Installation, Configuration, and Email Alerts

q986 · Posted 2019-1-12 15:06:40

　　I. Introduction
　　■1. Features
　　___________________________________________________________
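　　Nagios is a free, open-source network monitoring tool. It monitors the state of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and so on. When a system or service enters an abnormal state it sends email or SMS alerts so that operations staff hear about it right away, and it sends a recovery notice by email or SMS once the state returns to normal.
　　Nagios focuses on monitoring services and resources and suits setups with many services spread across many servers. Graphing is not its focus, but its alerting is much stronger than Cacti's. The Nagios core performs no checks itself; every check is carried out by a plugin (a script).
　　Cacti collects data from monitored servers over SNMP, whereas Nagios has its own agent for collecting it (NRPE).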
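
　　Since every check is a plugin, it helps to see how small a plugin can be. The sketch below is not from the original post: check_threshold is a hypothetical helper that prints one status line and returns the standard plugin codes (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), which is how Nagios learns the result of a check.

```shell
#!/bin/sh
# Hypothetical helper (not part of nagios-plugins): classify a numeric reading
# against warning/critical thresholds using the standard plugin return codes.
check_threshold() {
    value=$1; warn=$2; crit=$3
    # Anything that is not a non-negative integer is reported as UNKNOWN.
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - invalid value '$value'"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}
```

　　A real plugin script would end with something like: check_threshold "$reading" 80 90; exit $?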
　　■2. How it works

　　■3. Components and add-ons
　　___________________________________________________________
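　　All Nagios checks are implemented by plugins. A typical setup consists of one main program (nagios), one plugin package (nagios-plugins),
　　and four optional add-ons (NRPE and NSCA, which run on the monitored client; NSClient++, which involves both the server and the client; NDOUtils, which runs on the server).
　　NRPE: runs check scripts on the monitored Linux/Unix hosts so that the services and resources of those hosts can be monitored.
　　NSCA: lets monitored Linux/Unix hosts actively push their check results to the Nagios server.
　　NSClient++: the component installed on a Windows host so that Nagios can monitor it (NSClient++ plugin).
　　NDOUtils: stores Nagios configuration and event data in a database so that the data can be retrieved and processed quickly; it is used for integrating with Cacti.
　　Integrating Cacti with Nagios requires these plugins:
　　ndoutils download: http://sourceforge.net/projects/nagios/files/ndoutils-2.x/ndoutils-2.0.0/ndoutils-2.0.0.tar.gz
　　npc download: http://dl.cactifans.org/plugins/npc-2.0.4.tar.gz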
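
　　To make the NSCA description above more concrete: a monitored host pushes a result by piping one tab-separated line into send_nsca. The sketch below only builds that line. format_passive_result is a hypothetical helper, and the service name, server address, and config path in the commented example are assumptions rather than values from this post.

```shell
# Hypothetical helper: print one passive service result in the tab-separated
# form that send_nsca reads on stdin:
#   host_name<TAB>service_description<TAB>return_code<TAB>plugin_output
format_passive_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example (service name, server IP and config path are assumptions):
# format_passive_result 192.168.1.201 check_backup 0 "OK - backup finished" \
#     | send_nsca -H 192.168.1.222 -c /etc/nagios/send_nsca.cfg
```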
　　■4. Downloads

　　___________________________________________________________

　　Monitoring host:
　　wget http://nchc.dl.sourceforge.net/project/nagios/nagios-3.x/nagios-3.4.4/nagios-3.4.4.tar.gz
　　wget http://nchc.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.16/nagios-plugins-1.4.16.tar.gz
　　wget http://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
　　Monitored Linux host:
　　wget http://nchc.dl.sourceforge.net/project/nagiosplug/nagiosplug/1.4.16/nagios-plugins-1.4.16.tar.gz
　　wget http://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.14/nrpe-2.14.tar.gz
　　Monitored Windows host:
　　http://nsclient.org/nscp/downloads
　　http://files.nsclient.org/0.3.x/NSClient%2B%2B-0.3.9-Win32.zip
　　http://files.nsclient.org/0.3.x/NSClient%2B%2B-0.3.9-x64.zip
　　

　　II. Installation

　　■1. Set up the Aliyun yum repositories
　　___________________________________________________________
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-6.repo
wget -P /etc/yum.repos.d/ http://mirrors.aliyun.com/repo/epel-6.repo
yum clean all
yum makecache

　　■2. Install Apache and PHP
　　___________________________________________________________
yum install -y httpd php

　　
　　

　　■3. Install Nagios

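yum install -y httpd nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe
#Create the web login password (the packaged Apache config reads /etc/nagios/passwd)
htpasswd -c /etc/nagios/passwd nagiosadmin
#For a source install under /usr/local/nagios the password file is instead:
#htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
service httpd start ; service nagios start
Open the Nagios web page: http://ip/nagios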


　　■4. Overview of the cfg configuration files

　　III. Monitoring a Linux host with Nagios
　　■1. Install the Nagios plugins and NRPE
yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe
　　■2. Configure NRPE

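　　vim /etc/nagios/nrpe.cfg
　　Find "allowed_hosts=127.0.0.1" and change it to "allowed_hosts=127.0.0.1,192.168.1.222"; the second address is the Nagios server's IP. Find "dont_blame_nrpe=0" and change it to "dont_blame_nrpe=1" (this lets the server pass arguments to NRPE commands).
　　/etc/init.d/nrpe start
　　#Start the client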
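
　　When there are many clients, the allowed_hosts edit can be scripted instead of done by hand in vim. The following is a minimal sketch: allow_nrpe_host is a hypothetical helper that appends an address to allowed_hosts unless it is already listed. Try it on a copy of nrpe.cfg first.

```shell
# Hypothetical helper: append an address to allowed_hosts in an nrpe.cfg-style
# file unless it is already listed. Rewrites the file through a temp copy.
allow_nrpe_host() {
    cfg=$1; ip=$2
    # Escape the dots so the address is matched literally.
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')
    # Already present as a whole comma-separated entry? Nothing to do.
    if grep -Eq "^allowed_hosts=(.*,)?${ip_re}(,.*)?\$" "$cfg"; then
        return 0
    fi
    sed "s/^allowed_hosts=\(.*\)\$/allowed_hosts=\1,$ip/" "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}

# Example, on the monitored host:
# allow_nrpe_host /etc/nagios/nrpe.cfg 192.168.1.222 && /etc/init.d/nrpe restart
```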
　　■3. Add services (ssh, ping, http)

vim  /etc/nagios/conf.d/192.168.1.201.cfg
define host{
      use                   linux-server          ; Name of host template to use
                                                      ; This host definition will inherit all variables that are defined
                                                      ; in (or inherited by) the linux-server host template definition.
      host_name             192.168.1.201
      alias                1.201
      address                192.168.1.201
      }
define service{
      use                   generic-service
      host_name             192.168.1.201
      service_description    check_ping
      check_command          check_ping!100.0,20%!200.0,50%
      max_check_attempts 5
      normal_check_interval 1
}
define service{
      use                   generic-service
      host_name             192.168.1.201
      service_description    check_ssh
      check_command          check_ssh
      max_check_attempts    5
      normal_check_interval 1
}
define service{
      use                   generic-service
      host_name             192.168.1.201
      service_description    check_http
      check_command          check_http
      max_check_attempts    5
      normal_check_interval 1
}
　　■4. Notes on the configuration file

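　　The configuration file above monitors three services: ssh, ping, and http. These three are checked by the Nagios server's own plugins connecting to the remote machine, so they work even if the client has neither nagios-plugins nor nrpe installed. Other services such as load and disk usage require the server to go through NRPE to fetch the information from the remote host, so the remote host needs the nrpe service and the corresponding check scripts (nagios-plugins) installed.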
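
　　Because these agentless checks look the same for every host, the per-host file can be generated rather than typed. A minimal sketch, assuming one file per host under /etc/nagios/conf.d as in this post; gen_host_cfg is a hypothetical function and the second address in the example is a placeholder.

```shell
# Hypothetical generator: print a host definition plus a ping service for one
# address, in the same shape as the hand-written 192.168.1.201.cfg.
gen_host_cfg() {
    ip=$1
    cat <<EOF
define host{
      use                   linux-server
      host_name             $ip
      alias                 $ip
      address               $ip
      }
define service{
      use                   generic-service
      host_name             $ip
      service_description   check_ping
      check_command         check_ping!100.0,20%!200.0,50%
      max_check_attempts    5
      normal_check_interval 1
      }
EOF
}

# Example: one file per host (192.168.1.202 is a placeholder address)
# for ip in 192.168.1.201 192.168.1.202; do
#     gen_host_cfg "$ip" > "/etc/nagios/conf.d/$ip.cfg"
# done
```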

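　　max_check_attempts 5  #When Nagios detects a problem, it alerts only after 5 consecutive checks have failed; with a value of 1 it alerts as soon as a problem is detected
　　normal_check_interval 1  #Interval between regular checks, in minutes; when omitted, the value is inherited from the template (generic-service here)
　　notification_interval 60  #How long Nagios waits before notifying again while a problem remains unresolved, in minutes. Set it to 0 if one notification per event is enough.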
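
　　A quick way to reason about these settings is to estimate the delay between the first failed check and the first alert. The sketch below assumes the default interval_length of 60 seconds (so intervals are in minutes) and takes the retry interval as a parameter, since this post does not set retry_check_interval and it is inherited from the template. minutes_to_alert is a hypothetical helper.

```shell
# Hypothetical helper: minutes between the first failed check and the first
# notification. After the first failure Nagios re-checks at the retry interval
# until max_check_attempts checks in a row have failed.
minutes_to_alert() {
    max_attempts=$1; retry_interval=$2
    echo $(( ($max_attempts - 1) * $retry_interval ))
}

# max_check_attempts 5 with an assumed retry interval of 1 minute:
minutes_to_alert 5 1
```

　　With max_check_attempts 1 the result is 0, which matches the "alert immediately" behaviour described above.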
　　

　　■5. Add services (disk, load)

vim /etc/nagios/objects/commands.cfg    #Find the define command blocks and add the following:
define command{
      command_name check_nrpe
      command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
      }
vim /etc/nagios/conf.d/192.168.1.201.cfg #Add the following
define service{
      use    generic-service
      host_name    192.168.1.201
      service_description    check_load
      check_command          check_nrpe!check_load
      max_check_attempts 5
      normal_check_interval 1
}
define service{
      use    generic-service
      host_name    192.168.1.201
      service_description    check_disk_sda1
      check_command          check_nrpe!check_sda1
      max_check_attempts 5
      normal_check_interval 1
}
define service{
      use    generic-service
      host_name    192.168.1.201
      service_description    check_disk_sda2
      check_command          check_nrpe!check_sda2
      max_check_attempts 5
      normal_check_interval 1
}

　　#check_nrpe!check_load: check_nrpe is the command just defined in commands.cfg, and check_load is the name of a check command on the remote host
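
　　The way check_nrpe!check_load turns into an actual command line can be shown with a small sketch. It mimics, in plain shell, how the text after the "!" becomes $ARG1$ in the command_line defined in commands.cfg. expand_check_nrpe is a hypothetical function, and the plugin directory passed for $USER1$ is an assumption (it is set in resource.cfg).

```shell
# Hypothetical illustration of how Nagios fills in the check_nrpe command_line:
# the text before the first "!" names the command, and the pieces after it
# become $ARG1$, $ARG2$, ... (only $ARG1$ is handled here).
expand_check_nrpe() {
    check_command=$1; user1=$2; hostaddress=$3
    arg1=${check_command#*!}      # text after the first "!"
    arg1=${arg1%%!*}              # drop any further arguments
    printf '%s/check_nrpe -H %s -c %s\n' "$user1" "$hostaddress" "$arg1"
}

# The plugin directory below is an assumed value for $USER1$.
expand_check_nrpe 'check_nrpe!check_load' /usr/lib64/nagios/plugins 192.168.1.201
```

　　Running the printed command by hand on the server is a convenient way to test the NRPE connection before restarting Nagios.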

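　　vim /etc/nagios/nrpe.cfg
　　#On the remote host, search for check_load. That line is the command NRPE runs on the remote host when the server asks for check_load; it can also be run by hand to test it.
　　Rename the check_hda1 entry to check_sda1 and change /dev/hda1 to /dev/sda1:
　　command[check_sda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
　　Then add one more line; add as many lines as there are partitions (on 64-bit systems the plugin directory is /usr/lib64/nagios/plugins):
　　command[check_sda2]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda2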
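
　　A common mistake at this step is a service that asks for an NRPE command the client does not define. The sketch below compares the two files; missing_nrpe_commands is a hypothetical helper. The service file lives on the server and nrpe.cfg on the client, so one of them has to be copied over before running it.

```shell
# Hypothetical cross-check: print every NRPE command that the service file
# requests via check_nrpe!NAME but that the given nrpe.cfg does not define.
missing_nrpe_commands() {
    services_cfg=$1; nrpe_cfg=$2
    sed -n 's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([^![:space:]]*\).*/\1/p' \
        "$services_cfg" | sort -u |
    while read -r name; do
        grep -q "^command\[$name\]=" "$nrpe_cfg" || echo "$name"
    done
}

# Example (nrpe.cfg copied from the client to /tmp on the server):
# missing_nrpe_commands /etc/nagios/conf.d/192.168.1.201.cfg /tmp/nrpe.cfg
```

　　No output means every requested command has a matching command[...] line.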
　　

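　　■6. Restart the services and test
　　nagios -v /etc/nagios/nagios.cfg
　　#Check the configuration files for syntax errors (on the server)
　　service nagios restart
　　/etc/init.d/nrpe restart
　　#Restart nagios on the server and nrpe on the monitored host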
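
　　Restarting with a broken configuration stops monitoring altogether, so it is worth chaining the check and the restart. A minimal sketch: restart_if_valid is a hypothetical wrapper that takes both commands as strings, which also makes it easy to try out with harmless stand-ins.

```shell
# Hypothetical wrapper: run a validation command and only run the restart
# command when validation succeeds.
restart_if_valid() {
    validate_cmd=$1; restart_cmd=$2
    if sh -c "$validate_cmd"; then
        sh -c "$restart_cmd"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# On the server:
# restart_if_valid 'nagios -v /etc/nagios/nagios.cfg' 'service nagios restart'
```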

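　　IV. Monitoring a Windows host with Nagios
　　1. How communication works
　　The Nagios server talks to NSClient++ through its check_nt plugin, which connects to the NSClient++ service on the Windows host.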

　　2. Load the Windows object file
vim /etc/nagios/nagios.cfg
#Uncomment the following line:
cfg_file=/etc/nagios/objects/windows.cfg

　　3. Define the check_nt command in commands.cfg (already defined by default; nothing to configure)
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
　　4. Define what to monitor (default configuration)
define host{
use windows-server
host_name winserver
alias My Windows Server
address 192.168.1.105 ; IP of the monitored host
}
define hostgroup{
hostgroup_name windows-servers
alias Windows Servers
}
define service{
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
define service{
use generic-service
host_name winserver
service_description Uptime
check_command check_nt!UPTIME
}
define service{
use generic-service
host_name winserver
service_description CPU Load
check_command check_nt!CPULOAD!-l 5,80,90
}
define service{
use generic-service
host_name winserver
service_description Memory Usage
check_command check_nt!MEMUSE!-w 80 -c 90
}
define service{
use generic-service
host_name winserver
service_description C:\ Drive Space
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
define service{
use generic-service
host_name winserver
service_description W3SVC
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
}
define service{
use generic-service
host_name winserver
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
　　5. Restart the service and test

　　nagios -v /etc/nagios/nagios.cfg
　　#Check the configuration files for syntax errors
　　service nagios restart
　　

　　

　　V. Nagios email alerts

　　1. Configure contacts.cfg
vim /etc/nagios/objects/contacts.cfg       #Append the following at the bottom:
define contact{
   contact_name    123
   use             generic-contact
   alias          szk
   email          szk5043@139.com
}
define contact{
   contact_name    456
   use             generic-contact
   alias          szk
   email          szk5043@139.com
}
define  contactgroup{
   contactgroup_name common
   alias common
   members          123,456
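}
　　#The default contact group near the bottom of the file can be extended the same way: append contact names to its members line, separated by commas (for example members nagiosadmin,123). members takes contact_name values, not aliases such as szk.
　　2. Configure 192.168.1.201.cfg
　　#Add the contact group to each service that should send alerts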

vim /etc/nagios/conf.d/192.168.1.201.cfg
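define service{
      use    generic-service
      host_name    192.168.1.201
      service_description    check_load
      check_command          check_nrpe!check_load
      max_check_attempts 5
      normal_check_interval 1
      contact_groups common          ; common is the recipient group defined in /etc/nagios/objects/contacts.cfg
      notifications_enabled 1
      notification_period 24x7
      notification_options c,r       ; add these four lines to every service that should send alert email
}
#notifications_enabled: whether notifications are on; 1 enables them, 0 disables them. Notifications are normally also switched on globally in the main configuration file (nagios.cfg, enable_notifications).
#notification_period: the time period during which notifications are sent. I use 24x7 for very important hosts (services) and working hours for ordinary ones. Outside the defined period no notification is sent, whatever happens.
#notification_options: the service states that trigger a notification: w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovery. Hosts have a matching option with d,u,r (d = DOWN, u = UNREACHABLE, r = recovered to UP), which goes in the host definition.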
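
　　The effect of notification_options can be sketched as a simple lookup. should_notify is a hypothetical helper that covers only the four service letters explained here (w, u, c, r); it ignores everything else Nagios considers, such as the notification period and downtime.

```shell
# Hypothetical helper: would a service with the given notification_options
# notify for this state? Returns 0 for yes, 1 for no, 2 for an unknown state.
should_notify() {
    options=$1; state=$2
    case "$state" in
        WARNING)  letter=w ;;
        UNKNOWN)  letter=u ;;
        CRITICAL) letter=c ;;
        OK)       letter=r ;;   # a return to OK is the recovery case
        *)        return 2 ;;
    esac
    # Pad with commas so each letter is matched as a whole list entry.
    case ",$options," in
        *",$letter,"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# With "c,r" as configured above, a WARNING state sends nothing:
# should_notify c,r WARNING || echo "no email for WARNING"
```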
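nagios -v /etc/nagios/nagios.cfg #Always check the configuration first
service nagios restart  #Restart the service
　　3. If no email arrives
　　a. On the server, check whether /var/log/nagios/nagios.log contains lines with SERVICE NOTIFICATION.
　　b. On the server, check that the mail software (sendmail or postfix) is running; it listens on port 25 by default (netstat -lnp). Check the mail log /var/log/maillog to see whether alert mail was sent.
　　c. Test by hand whether mail can be sent: mail -s "test" szk5043@139.com < /etc/inittab
　　d. vim /etc/nagios/conf.d/192.168.1.201.cfg and check that, in the server-side config file for the client, each service that should send alert email has these four lines:
　　contact_groups common
　　notifications_enabled 1
　　notification_period 24x7
　　notification_options c,r
　　If they were missing, add them, check the configuration, and restart the nagios service.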
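
　　Step a can be narrowed down to a single contact. The sketch below counts notification lines per contact; count_notifications is a hypothetical helper, and the line layout in the comment is the usual nagios.log format, stated here as an assumption.

```shell
# Hypothetical helper: count SERVICE NOTIFICATION log lines for one contact.
# Assumed line layout:
#   [epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output
count_notifications() {
    logfile=$1; contact=$2
    grep -c "SERVICE NOTIFICATION: $contact;" "$logfile"
}

# Example, using the contact names defined in contacts.cfg above:
# count_notifications /var/log/nagios/nagios.log 123
```

　　A count of 0 means Nagios never tried to notify that contact, so the problem is in the Nagios configuration rather than in the mail system.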
　　
