Using Nagios event handlers, you can perform an action based on the result of a Nagios check. A very straightforward example would be restarting a service. However, it is not as simple as you might think.
I use "check_system_procs" on the localhost of my Nagios server itself to check a few services and restart them all should any one of them no longer be running. Since my Nagios server is a VPS with limited resources, it sometimes runs out of memory and well... things die.
We need to configure the check and the check's event-handler like so:
define service{
    use                  local-service
    host_name            localhost
    service_description  daemons
    check_command        check_nrpe!check_daemons
    event_handler        restart-services
}
In your nagios.cfg, make sure you have "enable_event_handlers=1" to enable event handlers. There are several other values in the config file you may wish to alter, such as event_handler_timeout.
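For reference, the two nagios.cfg settings mentioned above might look like this. The timeout value shown is the one shipped in the sample configuration as far as I know; treat it as an assumption and adjust it to however long your restarts take.

```
# nagios.cfg (fragment)
# Turn event handlers on globally.
enable_event_handlers=1
# Seconds an event handler may run before Nagios kills it (assumed default).
event_handler_timeout=30
```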
In your commands.cfg file, make sure the event handler command is defined, something like:
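The definition itself is missing here, so what follows is only a sketch of what such a command could look like. The command_name matches the event_handler value used in the service definition; the script name and its location under $USER1$ (usually the plugin directory) are hypothetical. The three macros pass the service state, state type and check attempt to the script.

```
# commands.cfg (fragment) - script path and name are assumptions
define command{
    command_name    restart-services
    command_line    $USER1$/eventhandlers/restart-services.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
```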
The problem we have is that the event handler runs as the nagios user, which typically will not be able to restart a service. To test this, just "su - nagios" and try to restart sendmail or apache. We can work around this by using sudo. Edit the sudoers file (with visudo) and add something like the lines below to the end of the file.
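The exact lines are not preserved, so this is a sketch under assumptions: the alias names are made up, and the /sbin/service path and service names (sendmail, httpd) depend on your distribution.

```
# /etc/sudoers (fragment, edit with visudo) - names and paths are assumptions
User_Alias  NAGIOS = nagios
Cmnd_Alias  NAGIOSCOMMANDS = /sbin/service sendmail restart, /sbin/service httpd restart

# Let the nagios user run sudo without a terminal attached.
Defaults:NAGIOS !requiretty

# Allow only the listed commands, as root, without a password.
NAGIOS ALL=(root) NOPASSWD: NAGIOSCOMMANDS
```

Running "sudo -l -U nagios" as root afterwards should list the commands the nagios user is allowed to run.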
Essentially we're defining the users and commands that can be run via sudo, without a password and without a session.
Attached is the script I use (found it on the web) for the scenario described above. Do not forget to make sure the script has the appropriate ownership and permissions. Try executing the script as the nagios user to test it before setting up the event_handler in Nagios.
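The attachment is not included here, so below is a minimal sketch of what such a handler could look like, not the original script. It assumes Nagios passes $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as the three arguments, and it follows a common pattern: restart on the third SOFT CRITICAL attempt, or as soon as the state goes HARD CRITICAL. The service names, the /sbin/service path and the function names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical event handler sketch: restart-services.sh
# Usage: restart-services.sh <SERVICESTATE> <SERVICESTATETYPE> <SERVICEATTEMPT>

# Services to restart and the init wrapper to call; both are assumptions
# and must match what the sudoers entry permits.
SERVICES="${SERVICES:-sendmail httpd}"
SERVICE_CMD="${SERVICE_CMD:-/sbin/service}"

# Return 0 (true) when the reported state warrants a restart:
# CRITICAL and either HARD, or SOFT on the third check attempt.
should_restart() {
    state="$1"; state_type="$2"; attempt="$3"
    [ "$state" = "CRITICAL" ] || return 1
    case "$state_type" in
        HARD) return 0 ;;
        SOFT) [ "$attempt" = "3" ] && return 0 ;;
    esac
    return 1
}

# Restart every configured service through sudo (-n: never prompt).
restart_all() {
    for svc in $SERVICES; do
        sudo -n "$SERVICE_CMD" "$svc" restart
    done
}

# Entry point: restart only when the state calls for it.
handle_event() {
    if should_restart "$@"; then
        restart_all
    fi
    return 0
}

handle_event "$@"
```

To try it by hand as the nagios user, something like "./restart-services.sh CRITICAL HARD 1" should trigger the restarts, while "./restart-services.sh OK HARD 1" should do nothing.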