Limiting Nagios alerts with escalations.cfg and Contacts.cfg

yesn · posted on 2015-9-8 10:12:52

　　
　　http://blog.iyunv.com/chen3888015/article/details/7361870
　　

The official documentation gives the following example of what happens when a server has a problem:
　　Notification escalation definitions can have notification ranges that overlap. Take the following example:
　　define serviceescalation{
host_name webserver
service_description HTTP
first_notification 3
last_notification 5
notification_interval 20
contact_groups nt-admins,managers
}
define serviceescalation{
host_name webserver
service_description HTTP
first_notification 4
last_notification 0
notification_interval 30
contact_groups on-call-support
}
　　In the example above:

The nt-admins and managers contact groups get notified on the third notification

All three contact groups get notified on the fourth and fifth notifications

Only the on-call-support contact group gets notified on the sixth (or higher) notification

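The official escalations documentation explains this clearly, especially with this example.
So if I want my phone to receive only the first three notifications, and have the fourth and every later notification (until the server returns to a normal state) go to email instead, I can do this: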
define serviceescalation{
host_name webserver
service_description HTTP
first_notification 0
last_notification 3
notification_interval 20
contact_groups sms_contract
}
define serviceescalation{
host_name webserver
service_description HTTP
first_notification 4
last_notification 0
notification_interval 30
contact_groups mail_contract
}
This achieves the behavior described above.
……………………………………………………………………………………………………………………………………
An official example covering recovery:

　　Recovery Notifications
　　Recovery notifications are slightly different than problem notifications when it comes to escalations. Take the following example:
define serviceescalation{
host_name webserver
service_description HTTP
first_notification 3
last_notification 5
notification_interval 20
contact_groups nt-admins,managers
}
define serviceescalation{
host_name webserver
service_description HTTP
first_notification 4
last_notification 0
notification_interval 30
contact_groups on-call-support
}
　　If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.
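
The recovery rule in the paragraph above can be sketched as follows. This is a simplified model of the documented behavior, using the nt-admins/managers and on-call-support escalations from the official example; the helper names `groups_for` and `recovery_groups` are invented for illustration.

```python
ESCALATIONS = [
    {"first": 3, "last": 5, "groups": {"nt-admins", "managers"}},
    {"first": 4, "last": 0, "groups": {"on-call-support"}},  # 0 = no upper limit
]


def groups_for(number):
    """Contact groups of every escalation whose range covers `number`."""
    matched = set()
    for esc in ESCALATIONS:
        if number >= esc["first"] and (esc["last"] == 0 or number <= esc["last"]):
            matched |= esc["groups"]
    return sorted(matched)


def recovery_groups(problem_notifications_sent):
    """The recovery is matched against the last problem notification number."""
    return groups_for(problem_notifications_sent)


print(recovery_groups(3))   # recovery after three problem notifications
print(groups_for(4))        # what a fourth *problem* notification would reach
```

The two printed lists differ: the recovery reaches only the groups that saw the third problem notification, even though it is the fourth message sent.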

　　
　　http://bbs.linuxtone.org/thread-2765-1-1.html
　　

　　

OK, after testing this myself, this is how it should be done (the example on the official site is not precise enough):
define hostescalation{
host_name                192.168.3.250
first_notification       7
last_notification       0
notification_interval    20
escalation_options       d,u    ; host escalations accept d,u,r (w,c are service states, used in serviceescalation)
contact_groups          email_admins
}
In other words, you need to add escalation_options and leave out r (recovery); the recovery notification then goes to the default contact_groups.
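
A minimal sketch of the behavior the author reports, using the host state letters d (down), u (unreachable), and r (recovery). It models only the range check plus the escalation_options filter, and `default_admins` is a made-up stand-in for the host's own contact_groups.

```python
# Model of the hostescalation above.
ESCALATION = {
    "first": 7,
    "last": 0,                  # 0 = no upper limit
    "options": {"d", "u"},      # no "r": recoveries are not escalated
    "groups": ["email_admins"],
}
DEFAULT_GROUPS = ["default_admins"]  # hypothetical default contact group


def recipients(escalation, default_groups, number, state):
    """Pick the groups for a notification with the given number and state."""
    in_range = number >= escalation["first"] and (
        escalation["last"] == 0 or number <= escalation["last"]
    )
    if in_range and state in escalation["options"]:
        return escalation["groups"]
    return default_groups


print(recipients(ESCALATION, DEFAULT_GROUPS, 3, "d"))  # before the escalation starts
print(recipients(ESCALATION, DEFAULT_GROUPS, 7, "d"))  # escalated problem notification
print(recipients(ESCALATION, DEFAULT_GROUPS, 7, "r"))  # recovery after 7 problem notifications
```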

　　http://vincent-nan.blogbus.com/logs/36900981.html
　　

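Copyright notice: when reposting, please cite the original source, the author, and this notice with a hyperlink: http://vincent-nan.blogbus.com/logs/36900981.html
Nagios is a very powerful monitoring tool, especially its alerting. Many delivery methods are described online, such as China Mobile's 139 mailbox, Fetion, and MSN. But if a server fails and the fault is not fixed promptly, Nagios keeps sending alerts, which quickly becomes a headache. The method below limits how many alerts Nagios sends.
System environment: CentOS 5.2
Nagios version: 3.0.6
Nagios install path: /usr/local/nagios
Configuration file contents (the basic settings are not annotated here):
hosts.cfg
define host{
      host_name                      WWW-Server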
      alias                               WWW-Server
      address                         193.1.16.100
      check_command             check-host-alive
      max_check_attempts       5
      check_period                   24x7
      notification_interval          10
      notification_period          24x7
      notification_options          d,u,r
      notifications_enabled       1
      contact_groups                chengnan
      }
　　Services.cfg
define service{
      host_name                      WWW-Server
      service_description          Check_HTTP
      check_command                check_http
      max_check_attempts       10
      normal_check_interval       3
      retry_check_interval          2
      check_period                   24x7
      notification_interval          5
      notification_period             24x7
      notification_options          w,u,c,r
      contact_groups                admin
      }
define service{
      host_name                   WWW-Server
      service_description          Check_Jetty
      check_command             check_tcp!8080
      max_check_attempts       10
      normal_check_interval    3
      retry_check_interval       2
      check_period                   24x7
      notification_interval          5
      notification_period          24x7
      notification_options       w,u,c,r
      contact_groups             admin
      }
　　Contacts.cfg
define contact{
      contact_name                         chengnan
      alias                                     chengnan
      service_notification_period       24x7
      host_notification_period          24x7
      service_notification_options       w,u,c,r
      host_notification_options          d,u,r
      service_notification_commands notify-service-by-email
      host_notification_commands    notify-host-by-email
      email                                     chengnan@139.com          ; mobile mailbox
      }
define contactgroup{
      contactgroup_name    chengnan
      alias                            Nagios Administrators
      members                   chengnan
      }
In addition, define a second contact:
define contact{
      contact_name                         chengnan_cor
      alias                                        chengnan_cor
      service_notification_period       24x7
      host_notification_period          24x7
      service_notification_options       w,u,c,r
      host_notification_options          d,u,r
      service_notification_commands notify-service-by-email
      host_notification_commands    notify-host-by-email
      email                                     chengnan@company.com    ; company mailbox
      }
define contactgroup{
      contactgroup_name    sysadmin
      alias                            sysadmin
      members                   chengnan_cor
      }
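Then create a new configuration file:
vi escalations.cfg
"Escalation" carries the sense of automatic adjustment, continual increase, or a stepwise rise. The purpose of this file is: when a service has not recovered by a given notification number, the notification interval changes to the one defined in the escalation and the alerts are sent to the designated contacts.
Its contents are: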
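define hostescalation{
host_name             WWW-Server          ; monitored host name, must match hosts.cfg
first_notification       4                         ; the escalation takes effect from this notification number
last_notification       0                         ; the escalation stops applying after this notification number; 0 means it never stops
notification_interval 30                         ; notification interval (minutes)
contact_groups       sysadmin
}
Note: from the 4th alert until the host recovers, alerts are sent to the contacts in the sysadmin group, one every 30 minutes.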
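
To make the timing concrete, here is a rough sketch of when notifications would go out for this host, assuming the host's own notification_interval of 10 minutes from hosts.cfg and the escalation above. It is a simplified model: it assumes the first notification is sent at minute 0, ignores check scheduling, and uses the smallest interval when several escalations match. The function name `notification_schedule` is invented for this sketch.

```python
def notification_schedule(default_interval, escalations, count):
    """Return (number, minute) pairs for the first `count` notifications.

    escalations: list of (first_notification, last_notification, interval).
    """
    schedule = []
    minute = 0
    for n in range(1, count + 1):
        schedule.append((n, minute))
        # Intervals of the escalations that cover notification n.
        matching = [
            interval
            for first, last, interval in escalations
            if n >= first and (last == 0 or n <= last)
        ]
        # Smallest matching interval, else the host's own notification_interval.
        minute += min(matching) if matching else default_interval
    return schedule


for number, minute in notification_schedule(10, [(4, 0, 30)], 6):
    print(f"notification {number} at minute {minute}")
```

In this model the first four notifications are 10 minutes apart, and the gap grows to 30 minutes once the escalation applies.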
　　
define serviceescalation{
host_name             WWW-Server                         ; monitored host name, must match hosts.cfg
service_description    Check_HTTP,Check_Jetty          ; monitored service names, must match Services.cfg
first_notification       4
last_notification       0
notification_interval 30
contact_groups       sysadmin
}
Save the file.
Edit nagios.cfg:
vi nagios.cfg
Add:
cfg_file=/usr/local/nagios/etc/objects/escalations.cfg
Check that the Nagios configuration is valid:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Restart the Nagios service:
service nagios restart
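Test:
After Nagios is back up, stop the corresponding service on the monitored test machine and confirm that the alerts reach the different mailboxes as configured.
Postscript
The official documentation defines escalations as an extension of notifications that makes them more flexible and convenient. The approach in this article is a bit of a trick: every alert from the fourth onward goes to my company mailbox until the server recovers (the recovery message is still sent to the phone), which limits the number of alerts sent to the phone.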
Official documentation: http://nagios.sourceforge.net/docs/3_0/escalations.html

　　
　　http://linux.chinaitlab.com/safe/897089.html
　　
　　

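At night, some unimportant alerts do not affect the system or the business, but if nobody handles them they keep firing until morning. Fortunately, Nagios provides an extension to alerting: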

vi /usr/local/nagios/etc/objects/escalations.cfg

define serviceescalation{
host_name                   192.168.1.1  ; monitored host name(s), comma-separated, must match hosts.cfg
service_description       SSH             ; monitored service name(s), comma-separated, must match services.cfg
first_notification          4                   ; the escalation takes effect from the 4th notification
last_notification          0                   ; last notification the escalation applies to; 0 means no upper limit
notification_interval       30             ; notification interval (minutes)
contact_groups       admins
}

define serviceescalation{
host_name                192.168.1.1
service_description       SSH
first_notification          10
last_notification          0
notification_interval       30
contact_groups          boss
}

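Finally, edit the nagios.cfg file:

#vi /usr/local/nagios/etc/nagios.cfg

Add:

cfg_file=/usr/local/nagios/etc/objects/escalations.cfg

Check that the Nagios configuration is valid:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If there are no problems, restart the Nagios service:

service nagios restart

Note: from the 4th alert onward, alerts are sent every 30 minutes to the admins group. From the 10th alert onward, both the admins and boss groups receive alerts. The interval follows each escalation's own setting, which is 30 minutes in this example.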

　　
　　
　　http://yytian.blog.iyunv.com/535845/564942
　　
　　

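Nagios email alerts
Nagios's alerting is very capable. Besides its rich monitoring features, alerting is the part I like most. The two common alert channels are email and SMS. This article covers Nagios email alerts; a later article covers configuring and implementing SMS alerts.
1 Install the sendmail components
First make sure the sendmail components are fully installed. The following command installs sendmail:
# yum install -y sendmail*
Then restart the sendmail service:
# service sendmail restart
Then send a test email to verify that sendmail works:
# echo "Hello World" | mail abc@abc.com
2 Configuring email alerts
We only need to edit the contact.cfg file under /usr/local/nagios/etc/objects and add the administrator's mailbox after email. In general, if monitoring duties are not finely divided, that is, the administrators are responsible for all monitoring and its handling, you can simply list one or more administrator addresses, separated by spaces or commas.
But if the servers have their own administrators and the network has its own administrators, we can define multiple contacts and then group them with contactgroups.
For example, with two people managing the network and two managing the servers, we can define two contactgroups and four administrator contacts. The following is the contact.cfg I currently use, with two server administrators and two network administrators:
2.1 Configuring contact.cfg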
#######################################################################################
#######################################################################################
############## NETWORK ADMINISTRATOR MEMBERS
#######################################################################################
#######################################################################################
define contact{
      contact_name                      zhang1
      use                                     generic-contact
      alias                                     zhang1
      service_notification_period    24x7
      host_notification_period       24x7
      service_notification_options w,u,c,r,f,s
      host_notification_options       d,u,r,f,s
      service_notification_commands notify-service-by-email
      host_notification_commands       notify-host-by-email
      email                                        zhang1@test.com
      }
define contact{
      contact_name                      zhang2
      use                                     generic-contact
      alias                                     zhang2
      service_notification_period    24x7
      host_notification_period       24x7
      service_notification_options w,u,c,r,f,s
      host_notification_options       d,u,r,f,s
      service_notification_commands notify-service-by-email
      host_notification_commands    notify-host-by-email
      email                                     zhang2@test.com
      }
#######################################################################################
#######################################################################################
############## SYSTEM ADMINISTRATOR MEMBERS
#######################################################################################
#######################################################################################
define contact{
      contact_name                      li1
      use                                     generic-contact
      alias                                     li1
      service_notification_period    24x7
      host_notification_period       24x7
      service_notification_options w,u,c,r,f,s
      host_notification_options       d,u,r,f,s
      service_notification_commands notify-service-by-email
      host_notification_commands    notify-host-by-email
      email                                     li1@test.com
      }
define contact{
      contact_name                         li2
      use                                        generic-contact
      alias                                        li2
      service_notification_period    24x7
      host_notification_period       24x7
      service_notification_options    w,u,c,r,f,s
      host_notification_options       d,u,r,f,s
      service_notification_commands notify-service-by-email
      host_notification_commands    notify-host-by-email
      email                                        li2@test.com
      }
　　
#######################################################################################
#######################################################################################
############## NETWORK ADMINISTRATOR GROUP
#######################################################################################
#######################################################################################
define contactgroup{
      contactgroup_name          network
      alias                               network
      members                         zhang1,zhang2
      }
　　
　　
#######################################################################################
#######################################################################################
############## SYSTEM ADMINISTRATOR GROUP
#######################################################################################
#######################################################################################
define contactgroup{
      contactgroup_name             system
      alias                                  system
      members                            li1,li2
      }
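2.2 Configuring hosts and monitored services
The contacts and contact groups are now created. Next, add the contacts to notify on failure to the monitored services. Take the host and services defined below as an example: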
define host{
      use                                  linux-server
      host_name                         SKLuDB1
      alias                                  skludb1
      address                            192.168.19.142
      }
define service{
      use                                  generic-service       ; Name of service template to use
      host_name                      SKLuDB1
      service_description          PING
      check_command             check_ping!100.0,20%!500.0,60%
      contact_groups                network
      }
define service{
      use                                  generic-service
      host_name                         SKLuDB1
      service_description          Uptime
      check_command             check_nt!UPTIME
      contact_groups                system
      }

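As configured above, when ping to the monitored host has a problem, Nagios looks up the contacts in the network contact group defined in contact.cfg and reads each contact's email address, so a network fault is mailed directly to zhang1 and zhang2. Likewise, when the server has a problem, mail goes to the members of the system group.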
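
The lookup chain described above (service, then contact group, then members, then email addresses) can be sketched with plain dictionaries. This only mirrors the relationships in the configuration shown earlier; it is not how Nagios stores objects internally, the function name `alert_recipients` is invented, and the test.com addresses are assumed for all four contacts.

```python
# Data copied from the definitions above (test.com domain assumed throughout).
CONTACT_EMAILS = {
    "zhang1": "zhang1@test.com",
    "zhang2": "zhang2@test.com",
    "li1": "li1@test.com",
    "li2": "li2@test.com",
}
CONTACT_GROUPS = {
    "network": ["zhang1", "zhang2"],
    "system": ["li1", "li2"],
}
SERVICE_CONTACT_GROUPS = {
    ("SKLuDB1", "PING"): ["network"],
    ("SKLuDB1", "Uptime"): ["system"],
}


def alert_recipients(host, service):
    """Resolve a service to the email addresses of its contact group members."""
    emails = []
    for group in SERVICE_CONTACT_GROUPS[(host, service)]:
        for contact in CONTACT_GROUPS[group]:
            email = CONTACT_EMAILS[contact]
            if email not in emails:
                emails.append(email)
    return emails


print(alert_recipients("SKLuDB1", "PING"))
print(alert_recipients("SKLuDB1", "Uptime"))
```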
　　

This article is from the blog "艳阳天的小窝"; please keep this attribution: http://yytian.blog.iyunv.com/535845/564942

　　
　　http://weiruoyu.blog.iyunv.com/951650/911711
　　
　　

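At work we set up Nagios monitoring and found that when a disk fills up, the alerts continue for a long time and the mailbox fills quickly. After some searching, I used escalations.cfg to limit how many emails and SMS messages Nagios sends.
1. Add escalations.cfg

#vi /usr/local/nagios/etc/nagios.cfg

Add one line:

cfg_file=/usr/local/nagios/etc/objects/escalations.cfg

2. Edit escalations.cfg

[iyunv@localhost objects]# vi /usr/local/nagios/etc/objects/escalations.cfg

Add the following: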

define serviceescalation{
      host_name                      syq_211.103.155.246
      service_description          check-disk
      first_notification             5
      last_notification             0
      notification_interval          180
      contact_groups                hsgroup
}

define hostescalation{
   host_name                      test_time
   first_notification             3
   last_notification             0
   notification_interval          180
   contact_groups                hsgroup
}

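host_name: must match the name in hosts.cfg
service_description: the name of the service
first_notification: the notification number from which the escalation takes effect
last_notification: the last notification number the escalation applies to; 0 means it never ends
notification_interval: notification interval (minutes)
contact_groups: must match a group defined in contactgroups.cfg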
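
As a quick sanity check on the directives listed above, the following sketch parses `define ...{ }` blocks and flags missing fields. It is a simplification and not a replacement for `nagios -v`: it treats the directives listed above as mandatory, which is stricter than Nagios itself (for example, other directives such as hostgroup_name or contacts can be used instead). The function names are invented for this sketch.

```python
import re


def parse_definitions(text):
    """Parse 'define <type>{ ... }' blocks into (type, {directive: value}) pairs."""
    definitions = []
    for kind, body in re.findall(r"define\s+(\w+)\s*\{(.*?)\}", text, re.S):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop inline ';' comments
            if line:
                parts = line.split(None, 1)
                directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        definitions.append((kind, directives))
    return definitions


def check_escalation(kind, directives):
    """Return a list of problems found in one escalation definition."""
    required = ["host_name", "first_notification", "last_notification",
                "notification_interval", "contact_groups"]
    if kind == "serviceescalation":
        required.append("service_description")
    problems = [f"missing {name}" for name in required if name not in directives]
    if not problems:
        first = int(directives["first_notification"])
        last = int(directives["last_notification"])
        if last != 0 and last < first:
            problems.append("last_notification is before first_notification")
    return problems


SAMPLE = """
define serviceescalation{
      host_name                      syq_211.103.155.246
      service_description          check-disk
      first_notification             5
      last_notification             0
      notification_interval          180
      contact_groups                hsgroup
}
define hostescalation{
   host_name                      test_time
   first_notification             3
   last_notification             0
   notification_interval          180
   contact_groups                hsgroup
}
"""

for kind, directives in parse_definitions(SAMPLE):
    print(kind, directives["host_name"], check_escalation(kind, directives) or "ok")
```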
########################
A note on the difference between serviceescalation and hostescalation, which took me a long time to verify by experiment:
serviceescalation controls notify-service-by-email
hostescalation controls notify-host-by-email
Reference:
http://vincent-nan.blogbus.com/logs/36900981.html
This article is from the blog "魏若愚--专注Linux"; please keep this attribution: http://weiruoyu.blog.iyunv.com/951650/911711
