1--Disable the firewall and SELinux
[iyunv@localhost-150~]# chkconfig ip6tables off
[iyunv@localhost-150~]# chkconfig iptables off
[iyunv@localhost-150~]# chkconfig --list|grep iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[iyunv@localhost-150~]# chkconfig --list|grep ip6tables
ip6tables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[iyunv@localhost-150~]# vi /etc/selinux/config
Change the SELinux setting:
SELINUX=disabled
Reboot the server.

2--Nagios installation - server (192.168.1.151)
Install the EPEL extra YUM repository (not supported on versions before 6.7):
[iyunv@64 ~]# yum install epel-release
Install the main components and tools:
[iyunv@64 ~]# yum install -y httpd nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe
Create the user and password:
[iyunv@64 ~]# htpasswd -c /etc/nagios/passwd nagiosadmin
[iyunv@64 ~]# vim /etc/nagios/nagios.cfg
Check the configuration file:
[iyunv@64 ~]# nagios -v /etc/nagios/nagios.cfg
Total Warnings: 0
Total Errors: 0
Start the services:
[iyunv@64 ~]# service httpd start
[iyunv@64 ~]# service nagios start

2--Nagios installation - client (192.168.1.88)
Disable the firewall and SELinux:
[iyunv@localhost~]# chkconfig ip6tables off
[iyunv@localhost~]# chkconfig iptables off
[iyunv@localhost~]# chkconfig --list|grep iptables
iptables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[iyunv@localhost~]# chkconfig --list|grep ip6tables
ip6tables 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[iyunv@localhost~]# vi /etc/selinux/config
SELINUX=disabled
Reboot the server.
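Before rebooting, the SELinux edit can be double-checked without opening the editor again. The following is only a minimal sketch: the function name selinux_config_mode is hypothetical (it is not part of any package), and it assumes the file uses the usual SELINUX=value line format.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the SELINUX= line in a config file.
# $1 = path to the file (defaults to /etc/selinux/config)
selinux_config_mode() {
    # -n suppresses normal output; p prints only lines where the substitution
    # matched. "SELINUXTYPE=" is not matched because "=" must follow "SELINUX".
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Example: warn if the file has not been changed yet
# [ "$(selinux_config_mode)" = "disabled" ] || echo "SELinux is not disabled in the config file"
```

After the reboot, running getenforce should report Disabled if the change took effect.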
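Install the EPEL extra YUM repository (not supported on versions before 6.7):
[iyunv@64 ~]# yum install epel-release
Install the plugins and NRPE:
[iyunv@localhost ~]# yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe
[iyunv@64 ~]# vim /etc/nagios/nrpe.cfg
Find "allowed_hosts=127.0.0.1" and change it to "allowed_hosts=127.0.0.1,192.168.1.108". The added address must be the one the Nagios server connects from; this guide gives the server as 192.168.1.151, so adjust the value to match your monitoring host.
allowed_hosts=127.0.0.1,192.168.1.108
Find "dont_blame_nrpe=0" and change it to "dont_blame_nrpe=1":
dont_blame_nrpe=1
Start NRPE:
[iyunv@localhost ~]# /etc/init.d/nrpe start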
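The two nrpe.cfg changes above can also be applied without an editor. This is a rough sketch, not a packaged tool: patch_nrpe_cfg is a hypothetical name, and it assumes both settings already exist in the file as uncommented lines starting at column one, as they do in the edit described above.

```shell
#!/bin/sh
# Hypothetical helper: apply the allowed_hosts and dont_blame_nrpe changes.
# $1 = path to nrpe.cfg
# $2 = address of the Nagios server that may query this client
patch_nrpe_cfg() {
    cfg="$1"
    server="$2"
    tmp="${cfg}.tmp.$$"
    # Rewrite both settings into a temporary file, then copy the result back
    # with cat so the original file keeps its owner and permissions.
    sed -e "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,${server}/" \
        -e "s/^dont_blame_nrpe=.*/dont_blame_nrpe=1/" \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg" && rm -f "$tmp"
}

# Example (as root on the client; restart nrpe afterwards):
# patch_nrpe_cfg /etc/nagios/nrpe.cfg 192.168.1.108
```

Once NRPE is running, the connection can be checked from the server with /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.88, assuming the plugin is installed under the same path used elsewhere in this guide.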
3--Add the monitored host (192.168.1.88) on the monitoring server (192.168.1.151)
[iyunv@nagios~]# cd /etc/nagios/conf.d/
Add the following definitions to a configuration file in this directory:
define host{
        use                     linux-server
        host_name               192.168.1.88
        alias                   1.88
        address                 192.168.1.88
}
define service{
        use                     generic-service
        host_name               192.168.1.88
        service_description     check_ping
        check_command           check_ping!100.0,20%!200.0,50%
        max_check_attempts      5
        normal_check_interval   1
}
define service{
        use                     generic-service
        host_name               192.168.1.88
        service_description     check_ssh
        check_command           check_ssh
        max_check_attempts      5
        normal_check_interval   1
        notification_interval   60
}
define service{
        use                     generic-service
        host_name               192.168.1.88
        service_description     check_http
        check_command           check_http
        max_check_attempts      5
        normal_check_interval   1
}
Notes:
max_check_attempts 5 ; when Nagios detects a problem, it alerts only after 5 consecutive checks have failed. If the value is 1, it alerts as soon as a problem is detected.
normal_check_interval 1 ; the interval between checks, in minutes. The default is 3 minutes.
notification_interval 60 ; if a service stays in a problem state, the time, in minutes, before Nagios notifies the contacts again. If one notification per event is enough, set this option to 0.

4--Monitor the client's disk and memory
# Every check command that Nagios calls must be defined in commands.cfg. check_nrpe is not in the default commands.cfg, so it has to be added there.
[iyunv@nagios conf.d]# vim /etc/nagios/objects/commands.cfg
Define check_nrpe by adding the following to the file:
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Add the following to the configuration file for the monitored client:
define service{
        use                     generic-service
        host_name               192.168.1.88
        service_description     check_load
        check_command           check_nrpe!check_load
        max_check_attempts      5
        normal_check_interval   1
}
define service{
        use                     generic-service
        host_name               192.168.1.88
        service_description     check_disk_hda1
        check_command           check_nrpe!check_hda1
        max_check_attempts      5
        normal_check_interval   1
}
On the client, check the check_load and check_hda1 commands:
[iyunv@localhost ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3        18G  1.5G   16G  10% /
tmpfs           932M     0  932M   0% /dev/shm
/dev/sda1       190M   32M  149M  18% /boot
/dev/sdb1        50G   16G   32G  33% /data
[iyunv@localhost ~]# vi /etc/nagios/nrpe.cfg
[iyunv@localhost ~]# /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sdb1
[iyunv@localhost~]# vi /etc/nagios/nrpe.cfg
Add the following command definitions:
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sdb1

5--Configure email alerts on the server
[iyunv@64 ~]# vim /etc/nagios/objects/contacts.cfg
define contact{
        contact_name    123
        use             generic-contact
        alias           xin
}
define contact{
        contact_name    456
        use             generic-contact
        alias           aaa
}
define contactgroup{
        contactgroup_name       common
        alias                   common
        members                 123,456
}
Notes:
define contactgroup defines a group.
define contact defines a user.
Then add the contact group to each service that should send alerts:
define service{
        use                     generic-service
        host_name               192.168.1.88
        service_description     check_ping
        check_command           check_ping!100.0,20%!200.0,50%
        max_check_attempts      5
        normal_check_interval   1
        contact_groups          common
}
Add the defined contact group to every service that needs alerts.
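Adding contact_groups to many services by hand is repetitive, so the definitions can be generated instead. The sketch below is illustrative only: service_def is a hypothetical function name, and it hard-codes the max_check_attempts and normal_check_interval values used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: print a service definition that includes contact_groups.
# $1 = host_name, $2 = service_description, $3 = check_command, $4 = contact group
service_def() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        check_command           $3
        max_check_attempts      5
        normal_check_interval   1
        contact_groups          $4
}
EOF
}

# Example: print definitions for the two NRPE checks with the "common" group.
# Redirect the output into a file under /etc/nagios/conf.d/ of your choosing.
# service_def 192.168.1.88 check_load 'check_nrpe!check_load' common
# service_def 192.168.1.88 check_disk_hda1 'check_nrpe!check_hda1' common
```

Quote the check_command argument with single quotes so that the ! and % characters are passed through unchanged. After writing new definitions, run nagios -v /etc/nagios/nagios.cfg again before restarting the service.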