Nagios Configuration Explained

Posted on 2015-11-27 11:28:21
Preface:
    Unless stated otherwise, every configuration shown here is made on nagios_server.


Configuration flow:

nagios.cfg-->hosts.cfg-->services.cfg-->commands.cfg


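  • Create hosts.cfg to define hosts and host groups
  • Create services.cfg to define services
  • Use the default contacts.cfg to define contacts and contact groups
  • Use the default commands.cfg to define commands
  • Use the default timeperiods.cfg to define monitoring time periods
  • Use the default templates.cfg as the template file that other definitions reference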


/usr/local/nagios/etc/ directory structure

[iyunv@chboc etc]# tree /usr/local/nagios/etc
/usr/local/nagios/etc
|-- cgi.cfg
|-- htpasswd.users
|-- nagios.cfg
|-- nagios.cfg.bak
|-- nrpe.cfg
|-- objects
|   |-- commands.cfg
|   |-- contacts.cfg
|   |-- hosts.cfg      defines the monitored remote hosts and their host group
|   |-- hosts.cfg.bak
|   |-- localhost.cfg
|   |-- printer.cfg
|   |-- services.cfg    defines the passive-mode (NRPE) service checks for the local resources of remote_linux
|   |-- switch.cfg
|   |-- templates.cfg
|   |-- timeperiods.cfg
|   `-- windows.cfg
|-- resource.cfg
`-- services        defines the active-mode service checks for the services remote_linux exposes
    `-- web.cfg
Note: after creating hosts.cfg, services.cfg, and the services directory, change their owner and group to the nagios user and group!
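
Files created as root are easy to forget. One quick way to spot them is to list everything under the config directory that is not owned by the Nagios account. This is only a sketch: the helper name `list_not_owned_by` is made up for illustration, and it assumes Nagios runs as the `nagios` user, as in this setup.

```shell
# list_not_owned_by DIR USER
# Print every file or directory under DIR whose owner is not USER.
# (Hypothetical helper name; USER may be a login name or a numeric uid.)
list_not_owned_by() {
    find "$1" ! -user "$2" -print
}

# Intended use on nagios_server (not run here):
#   list_not_owned_by /usr/local/nagios/etc nagios
#   chown -R nagios:nagios /usr/local/nagios/etc/objects/hosts.cfg \
#       /usr/local/nagios/etc/objects/services.cfg \
#       /usr/local/nagios/etc/services
```

Some files in that directory may belong to another account on purpose, so treat the list as a hint rather than as a to-do list.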

nagios.cfg

[iyunv@chboc etc]# diff nagios.cfg nagios.cfg.bak
34,35d33
< cfg_file=/usr/local/nagios/etc/objects/services.cfg
< cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
38c36
< #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
---
> cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
54c52
< cfg_dir=/usr/local/nagios/etc/services
---
> #cfg_dir=/usr/local/nagios/etc/servers

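  • Put simply, nagios.cfg only specifies which files and directories Nagios loads at startup.

  • Edit the main configuration file nagios.cfg and add services.cfg and hosts.cfg so that Nagios loads their contents automatically at startup.
  • Nagios is generally used to monitor services on remote_linux, so the line cfg_file=/usr/local/nagios/etc/objects/localhost.cfg is commented out here.
  • Nagios checks fall roughly into two modes: active mode and passive mode (NRPE). In official Nagios terminology a check run through NRPE is still an active check, because the Nagios server initiates it; this post uses "passive mode" as shorthand for checks executed on the remote host through NRPE.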


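Active mode: generally used for services the host exposes to the network, such as web servers and databases, for example httpd, mysqld, and sshd.
Passive mode: generally used for local resources such as load, memory, disk, virtual memory (swap), disk I/O, temperature, and fans (some system resources can also be monitored through SNMP).
The two modes are interchangeable; neither is mandatory for a given check.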
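
To make the difference concrete, here is a small sketch that prints the plugin command line nagios_server would run in each mode. The function names `active_tcp_cmd` and `nrpe_cmd` are made up for illustration; the plugin directory and the host address are the ones used in this post.

```shell
LIBEXEC=/usr/local/nagios/libexec   # plugin directory used throughout this post

# Active mode: the server probes the remote service port directly.
active_tcp_cmd() {   # usage: active_tcp_cmd HOST PORT
    printf '%s/check_tcp -H %s -p %s\n' "$LIBEXEC" "$1" "$2"
}

# "Passive mode" as used in this post: the server asks the remote NRPE
# daemon to run a command that is defined in the remote nrpe.cfg.
nrpe_cmd() {         # usage: nrpe_cmd HOST NRPE_COMMAND
    printf '%s/check_nrpe -H %s -c %s\n' "$LIBEXEC" "$1" "$2"
}

active_tcp_cmd 192.168.1.198 3306
nrpe_cmd 192.168.1.198 check_load
```

Port 80 in this setup is an example of the interchangeability: it is checked once directly with check_tcp and once through NRPE with check_port_80.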



hosts.cfg

[iyunv@chboc objects]# egrep -v "^$|^#" hosts.cfg
define host{
        use                     linux-server            
        host_name               lnmp
        alias                   198-lnmp
        address                 192.168.1.198
        }
define host{
        use                     linux-server            
        host_name               lamp
        alias                   199-lamp
        address                 192.168.1.199
        }
define hostgroup{
        hostgroup_name  linux-servers ; The name of the hostgroup
        alias           Linux Servers ; Long name of the group
        members         lnmp,lamp     ; Comma separated list of hosts that belong to this group
        }

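  • Define the remote_linux hosts to monitor and put them in a group.
  • Generate hosts.cfg by copying the first 51 lines of localhost.cfg, then edit the copy to match the definitions above:
    head -51 localhost.cfg >hosts.cfg

    chown nagios:nagios /usr/local/nagios/etc/objects/hosts.cfg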
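
When more hosts have to be added later, typing each block by hand gets tedious. A minimal sketch of a generator follows, assuming every host uses the linux-server template like the two above. The function name `host_block` and the sample host web3 / 192.168.1.200 are invented for illustration.

```shell
# host_block NAME ALIAS ADDRESS
# Print a host definition in the same layout as the entries in hosts.cfg.
host_block() {
    printf 'define host{\n'
    printf '        use                     linux-server\n'
    printf '        host_name               %s\n' "$1"
    printf '        alias                   %s\n' "$2"
    printf '        address                 %s\n' "$3"
    printf '        }\n'
}

# "web3" and 192.168.1.200 are made-up values, not part of this setup.
host_block web3 200-web3 192.168.1.200
```

Appending the output to hosts.cfg is not enough on its own: the new host_name also has to be added to the members line of the hostgroup if it should appear in linux-servers.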



services.cfg -->commands.cfg
       |  
       +--->nrpe.cfg(remote_linux)

[iyunv@chboc objects]# egrep -v "^$|^#" services.cfg
define service {
        use                             generic-service
        host_name                       lnmp
        service_description             Disk Partition
        check_command                   check_nrpe!check_disk
}
define service {
        use                             generic-service
        host_name                       lnmp
        service_description             load
        check_command                   check_nrpe!check_load
}
define service {
        use                             generic-service
        host_name                       lnmp
        service_description             mem
        check_command                   check_nrpe!check_mem
}
define service {
        use                             generic-service
        host_name                       lnmp
        service_description             swap
        check_command                   check_nrpe!check_swap
}
define service {
        use                             generic-service
        host_name                       lnmp
        service_description             iostat
        check_command                   check_nrpe!check_iostat
}

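  • For services.cfg I use NRPE in passive mode: the check_nrpe plugin on the nagios_server host calls the NRPE daemon running on remote_linux, which checks the local resources of remote_linux.
  • How NRPE works
  • [Figure: NRPE architecture diagram]
  • NRPE consists of two parts:

    the check_nrpe plugin, which lives on the monitoring host
    the NRPE daemon, which runs on the remote Linux host (usually the monitored machine)
  • Following the figure, the whole monitoring process is as follows.

  • When Nagios needs to check a service or resource on a remote Linux host:

    Nagios runs the check_nrpe plugin and tells it what to check;
    the check_nrpe plugin connects to the remote NRPE daemon over SSL;
    the NRPE daemon runs the corresponding Nagios plugin to perform the check;
    the NRPE daemon returns the result to the check_nrpe plugin, which hands it to Nagios for processing.
  • Note: the NRPE daemon needs the Nagios plugins installed on the remote Linux host; without them the daemon cannot check anything.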



commands.cfg
--------------------------------------------------------------------------
[iyunv@chboc objects]# tail -4 commands.cfg
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
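
The link between `check_nrpe!check_disk` in services.cfg and this command_line is macro substitution: the text after `!` becomes `$ARG1$`, and `$HOSTADDRESS$` comes from the host's address. A rough sketch of that expansion is below. The function name `expand_nrpe_check` is hypothetical, and the value of `$USER1$` is assumed to be the plugin directory that a default resource.cfg sets.

```shell
USER1=/usr/local/nagios/libexec   # assumed value of $USER1$ from resource.cfg

# expand_nrpe_check CHECK_COMMAND HOST_ADDRESS
# CHECK_COMMAND is the check_command value of a service,
# for example check_nrpe!check_disk
expand_nrpe_check() {
    arg1=${1#*!}                  # text after the first "!" plays the role of $ARG1$
    printf '%s/check_nrpe -H %s -c %s\n' "$USER1" "$2" "$arg1"
}

expand_nrpe_check 'check_nrpe!check_disk' 192.168.1.198
```

The printed line is what can be pasted into a shell on nagios_server to reproduce a single check by hand.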

nrpe.cfg(remote_linux)
--------------------------------------------------------------------------
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.1.201

dont_blame_nrpe=0
debug=0
command_timeout=60
connection_timeout=300
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,6 -c 30,25,20
command[check_mem]=/usr/local/nagios/libexec/check_memory.pl -w 6% -c 3%
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 8% -p /
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 6 -c 10
command[check_port_80]=/usr/local/nagios/libexec/check_tcp -H localhost -p80
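
On the remote side, the name passed with `-c` is simply looked up among these `command[...]` lines. The sketch below imitates that lookup so a service definition can be cross-checked against nrpe.cfg before Nagios is restarted. It is not how the daemon is implemented, and the function name `nrpe_lookup` is made up.

```shell
# nrpe_lookup NRPE_CFG NAME
# Print the command line that NRPE_CFG maps to command[NAME];
# return 1 if there is no such (uncommented) entry.
nrpe_lookup() {
    line=$(grep "^command\[$2]=" "$1" | head -n 1)
    [ -n "$line" ] || return 1
    printf '%s\n' "${line#*=}"    # keep everything after the first "="
}

# Demo on a throwaway copy of two entries from the listing above.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,6 -c 30,25,20
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
EOF
nrpe_lookup "$demo_cfg" check_swap
nrpe_lookup "$demo_cfg" check_mem || echo "check_mem is not defined in this file"
rm -f "$demo_cfg"
```

Every `check_nrpe!NAME` in services.cfg needs a matching `command[NAME]` on the remote host; otherwise the daemon has nothing to run for that check.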


/usr/local/nagios/etc/services/web.cfg

[iyunv@chboc etc]# egrep -v "^$|^#" services/web.cfg
define service{
        use                          generic-service
        host_name                    lnmp
        service_description          blog_url
        check_command                check_http!-I 192.168.1.198
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
define service{
        use                          generic-service
        host_name                    lnmp
        service_description          blog_port80
        check_command                check_tcp!80
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
define service{
        use                          generic-service
        host_name                    lnmp
        service_description          mysqld_port3306
        check_command                check_tcp!3306
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }
define service{
        use                          generic-service
        host_name                    lnmp
        service_description          blog_port_80_beidong
        check_command                check_nrpe!check_port_80
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    1
        check_period            24x7
        notification_interval   30
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins
        }

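  • Active mode does not need the check_nrpe plugin; it uses the commands defined in commands.cfg directly. (The last service above, blog_port_80_beidong, is the exception: it checks the same port 80 through NRPE.)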
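
Before referencing a command such as check_tcp or check_http from a service, it helps to confirm that commands.cfg really defines it. A small sketch that prints one `define command` block by name follows; the function name `show_command_def` is made up, and the demo file only contains the check_nrpe definition shown earlier in this post.

```shell
# show_command_def COMMANDS_CFG NAME
# Print the "define command{ ... }" block whose command_name is NAME.
# Exit status is 0 if the block was found, 1 otherwise.
show_command_def() {
    awk -v name="$2" '
        $1 == "define" && $2 ~ /^command/ { inside = 1; found = 0; block = "" }
        inside { block = block $0 "\n" }
        inside && $1 == "command_name" && $2 == name { found = 1 }
        inside && $1 == "}" {
            if (found) { printf "%s", block; hit = 1; exit }
            inside = 0
        }
        END { exit (hit ? 0 : 1) }
    ' "$1"
}

# Demo on a throwaway file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF
show_command_def "$demo" check_nrpe
show_command_def "$demo" check_foo || echo "check_foo is not defined"
rm -f "$demo"
```

On nagios_server the same function would be pointed at /usr/local/nagios/etc/objects/commands.cfg, for example with check_tcp as the name.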



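Checking the configuration files

Edit the init script /etc/init.d/nagios so that checkconfig prints the full check output. In the checkconfig branch (around line 178), remove the "> /dev/null 2>&1" redirect from the $NagiosBin -v line:

    vim /etc/init.d/nagios +178
    checkconfig)
                    printf "Running configuration check..."
                    $NagiosBin -v $NagiosCfgFile
                    if [ $? -eq 0 ]; then
                            echo " OK."
                    else
                            echo " CONFIG ERROR!  Check your Nagios configuration."
                            exit 1
                    fi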
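
The same check can also be run without touching the init script, by calling the Nagios binary with -v directly. The wrapper below is a sketch: the function name `verify_config` is made up, and the binary path is the usual source-install location, which is an assumption about this machine.

```shell
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}      # assumed install path
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# verify_config: run the pre-flight check with its full output and
# report success or failure through the exit status.
verify_config() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "configuration OK"
    else
        echo "configuration has errors, not safe to restart" >&2
        return 1
    fi
}

# Intended use (not run here):
#   verify_config && /etc/init.d/nagios restart
```

Chaining the restart behind the check keeps a broken configuration from taking the running monitor down.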

For example:
Comment out the check_nrpe command defined in commands.cfg:
[iyunv@chboc objects]# tail -4 commands.cfg
#define command{
#        command_name    check_nrpe
#        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
#        }
Run the check:
[iyunv@chboc objects]# /etc/init.d/nagios checkconfig
Running configuration check...
Nagios Core 3.5.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-30-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/services.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/hosts.cfg'...
Processing object config directory '/usr/local/nagios/etc/services'...
Processing object config file '/usr/local/nagios/etc/services/web.cfg'...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking services...
Error: Service check command 'check_nrpe' specified in service 'Disk Partition' for host 'lnmp' not defined anywhere!
Error: Service check command 'check_nrpe' specified in service 'blog_port_80_beidong' for host 'lnmp' not defined anywhere!
Error: Service check command 'check_nrpe' specified in service 'iostat' for host 'lnmp' not defined anywhere!
Error: Service check command 'check_nrpe' specified in service 'load' for host 'lnmp' not defined anywhere!
Error: Service check command 'check_nrpe' specified in service 'mem' for host 'lnmp' not defined anywhere!
Error: Service check command 'check_nrpe' specified in service 'swap' for host 'lnmp' not defined anywhere!
        Checked 9 services.
Checking hosts...
Warning: Host 'lamp' has no services associated with it!
        Checked 2 hosts.
Checking host groups...
        Checked 1 host groups.
Checking service groups...
        Checked 0 service groups.
Checking contacts...
        Checked 1 contacts.
Checking contact groups...
        Checked 1 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...
        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 24 commands.
Checking time periods...
        Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 1
Total Errors:   6

***> One or more problems was encountered while running the pre-flight check...

     Check your configuration file(s) to ensure that they contain valid
     directives and data defintions.  If you are upgrading from a previous
     version of Nagios, you should be aware that some variables/definitions
     may have been removed or modified in this version.  Make sure to read
     the HTML documentation regarding the config files, as well as the
     'Whats New' section to find out what has changed.

CONFIG ERROR!  Check your Nagios configuration.
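
With this much output the interesting lines are easy to miss. A minimal filter that keeps only the Error:/Warning: lines and the two totals is sketched below; the function name `summarize_check` is made up.

```shell
# summarize_check: read the pre-flight check output on stdin and print
# only the Error:/Warning: lines plus the "Total ..." summary lines.
summarize_check() {
    grep -E '^(Error|Warning):|^Total (Warnings|Errors):'
}

# Intended use (not run here):
#   /etc/init.d/nagios checkconfig | summarize_check
```

For the run above this would leave the six check_nrpe errors, the warning about host lamp, and the totals.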
Add the check_nrpe command definition back to commands.cfg and check again:
[iyunv@chboc objects]# /etc/init.d/nagios checkconfig
Running configuration check...
Nagios Core 3.5.1
Copyright (c) 2009-2011 Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-30-2013
License: GPL

Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
Processing object config file '/usr/local/nagios/etc/objects/commands.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/contacts.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/timeperiods.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/templates.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/services.cfg'...
Processing object config file '/usr/local/nagios/etc/objects/hosts.cfg'...
Processing object config directory '/usr/local/nagios/etc/services'...
Processing object config file '/usr/local/nagios/etc/services/web.cfg'...
   Read object config files okay...

Running pre-flight check on configuration data...

Checking services...
        Checked 10 services.
Checking hosts...
        Checked 2 hosts.
Checking host groups...
        Checked 1 host groups.
Checking service groups...
        Checked 0 service groups.
Checking contacts...
        Checked 1 contacts.
Checking contact groups...
        Checked 1 contact groups.
Checking service escalations...
        Checked 0 service escalations.
Checking service dependencies...
        Checked 0 service dependencies.
Checking host escalations...
        Checked 0 host escalations.
Checking host dependencies...
        Checked 0 host dependencies.
Checking commands...
        Checked 25 commands.
Checking time periods...
        Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
OK.
  • Note that the files checked are exactly the ones specified in nagios.cfg.
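
To see that list without running a full check, the active cfg_file and cfg_dir entries can be pulled straight out of nagios.cfg. A small sketch, with a made-up function name `list_object_sources`:

```shell
# list_object_sources MAIN_CFG
# Print the active (uncommented) cfg_file= and cfg_dir= lines of MAIN_CFG.
list_object_sources() {
    grep -E '^[[:space:]]*cfg_(file|dir)=' "$1"
}

# Intended use (not run here):
#   list_object_sources /usr/local/nagios/etc/nagios.cfg
```

Commented-out entries such as the localhost.cfg line are skipped, so the result should match the "Processing object config ..." lines in the check output.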




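/usr/local/nagios/libexec/

    nagios_server monitors remote_linux mainly through the plugins under /usr/local/nagios/libexec/. When testing, run them by hand first: this is step one, and if it fails there will certainly be no monitoring results.
  • Passive mode (NRPE):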
    [iyunv@chboc libexec]# ./check_nrpe -H 192.168.1.198 -c check_port_80

    TCP OK - 0.000 second response time on port 80|time=0.000166s;;;0.000000;10.000000
    [iyunv@chboc libexec]# ./check_nrpe -H 192.168.1.198 -c check_load   
    OK - load average: 0.05, 0.01, 0.00|load1=0.050;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;6.000;20.000;0;
    [iyunv@chboc libexec]# ./check_nrpe -H 192.168.1.198 -c check_iostat
    IOSTAT OK - user 0.04 nice 0.00 sys 0.19 iowait 0.18 idle 0.00  | iowait=0.18%;; idle=0.00%;; user=0.04%;; nice=0.00%;; sys=0.19%;;
    [iyunv@chboc libexec]# ./check_nrpe -H 192.168.1.198 -c check_disk  
    DISK OK - free space: / 4702 MB (57% inode=81%);| /=3437MB;6860;7889;0;8575
    [iyunv@chboc libexec]# ./check_nrpe -H 192.168.1.198 -c check_mem
    CHECK_MEMORY OK - 847M free | free=888504320b;62198906.88:;31099453.44:
    [iyunv@chboc libexec]# ./check_nrpe -H 192.168.1.198 -c check_swap
    SWAP OK - 100% free (1023 MB out of 1023 MB) |swap=1023MB;204;102;0;1023
  • Active mode:
    [iyunv@chboc libexec]# ./check_tcp -H 192.168.1.198 -p3306               
    TCP OK - 0.000 second response time on port 3306|time=0.000478s;;;0.000000;10.000000
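
When a plugin is run by hand, Nagios would look at two things: the exit code, which sets the state, and the text after the `|`, which is performance data. The sketch below shows both; the function names `status_name` and `perfdata` are made up for illustration.

```shell
# status_name CODE: translate a plugin exit code into its Nagios state.
# 0, 1, 2 and 3 are the standard plugin return codes; this sketch also
# maps any other value to UNKNOWN.
status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# perfdata OUTPUT: print the part after the first "|" (empty line if none).
perfdata() {
    case "$1" in
        *'|'*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

status_name 2
perfdata 'TCP OK - 0.000 second response time on port 3306|time=0.000478s;;;0.000000;10.000000'
```

On nagios_server, running `./check_tcp -H 192.168.1.198 -p3306; status_name $?` right after a manual test shows the state Nagios would record for it.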

Note:

    Run any plugin with --help to see how to use it.

[iyunv@chboc libexec]# ./check_http --help
......
Usage:
check_http -H <vhost> | -I <IP-address> [-u <uri>] [-p <port>]
       [-w <warn time>] [-c <critical time>] [-t <timeout>] [-L] [-a auth]
       [-b proxy_auth] [-f <ok|warning|critcal|follow|sticky|stickyport>]
       [-e <expect>] [-s string] [-l] [-r <regex> | -R <case-insensitive regex>]
       [-P string] [-m <min_pg_size>:<max_pg_size>] [-4|-6] [-N] [-M <age>]
       [-A string] [-k string] [-S <version>] [--sni] [-C <warn_age>[,<crit_age>]]
       [-T <content-type>] [-j method]
NOTE: One or both of -H and -I must be specified

Options:
-h, --help
    Print detailed help screen
-V, --version
    Print version information
-H, --hostname=ADDRESS
    Host name argument for servers using host headers (virtual host)
    Append a port to include it in the header (eg: example.com:5000)
-I, --IP-address=ADDRESS
    IP address or name (use numeric address if possible to bypass DNS lookup).
-p, --port=INTEGER
    Port number (default: 80)
-4, --use-ipv4
    Use IPv4 connection
-6, --use-ipv6
    Use IPv6 connection
-S, --ssl=VERSION
    Connect via SSL. Port defaults to 443. VERSION is optional, and prevents
    auto-negotiation (1 = TLSv1, 2 = SSLv2, 3 = SSLv3).
--sni
    Enable SSL/TLS hostname extension support (SNI)
-C, --certificate=INTEGER
    Minimum number of days a certificate has to be valid. Port defaults to 443
    (when this option is used the URL is not checked.)

-e, --expect=STRING
    Comma-delimited list of strings, at least one of them is expected in
    the first (status) line of the server response (default: HTTP/1.)
    If specified skips all other status line logic (ex: 3xx, 4xx, 5xx processing)
-s, --string=STRING
    String to expect in the content
-u, --url=PATH
    URL to GET or POST (default: /)
-P, --post=STRING
    URL encoded http POST data
-j, --method=STRING  (for example: HEAD, OPTIONS, TRACE, PUT, DELETE)
    Set HTTP method.
-N, --no-body
    Don't wait for document body: stop reading after headers.
    (Note that this still does an HTTP GET or POST, not a HEAD.)
-M, --max-age=SECONDS
    Warn if document is more than SECONDS old. the number can also be of
    the form "10m" for minutes, "10h" for hours, or "10d" for days.
-T, --content-type=STRING
    specify Content-Type header media type when POSTing

-l, --linespan
    Allow regex to span newlines (must precede -r or -R)
-r, --regex, --ereg=STRING
    Search page for regex STRING
-R, --eregi=STRING
    Search page for case-insensitive regex STRING
--invert-regex
    Return CRITICAL if found, OK if not

-a, --authorization=AUTH_PAIR
    Username:password on sites with basic authentication
-b, --proxy-authorization=AUTH_PAIR
    Username:password on proxy-servers with basic authentication
-A, --useragent=STRING
    String to be sent in http header as "User Agent"
-k, --header=STRING
    Any other tags to be sent in http header. Use multiple times for additional headers
-L, --link
    Wrap output in HTML link (obsoleted by urlize)
-f, --onredirect=<ok|warning|critical|follow|sticky|stickyport>
    How to handle redirected pages. sticky is like follow but stick to the
    specified IP address. stickyport also ensures port stays the same.
-m, --pagesize=INTEGER<:INTEGER>
    Minimum page size required (bytes) : Maximum page size required (bytes)
-w, --warning=DOUBLE
    Response time to result in warning status (seconds)
-c, --critical=DOUBLE
    Response time to result in critical status (seconds)
-t, --timeout=INTEGER
    Seconds before connection times out (default: 10)
-v, --verbose
    Show details for command-line debugging (Nagios may truncate output)
......
Examples:
CHECK CONTENT: check_http -w 5 -c 10 --ssl -H www.verisign.com



