Nagios - Checks and States

Once hosts and services are configured in Nagios, checks are used to determine whether they are working as expected. Let us see an example of performing checks on a host −

Suppose your host definitions are in the host1.cfg file in the /usr/local/nagios/etc/objects directory.

cd /usr/local/nagios/etc/objects
gedit host1.cfg

This is how your host definition currently looks −

define host {
   host_name host1
}

Now let us add the check_interval directive. This directive sets how often regular scheduled checks of the host are performed. Its value is counted in time units, which are minutes by default (interval_length=60 seconds in nagios.cfg). With the definition below, the host is checked every 3 minutes.

define host {
   host_name host1
   check_interval 3
}
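Because check_interval is counted in time units, and the length of one unit is set by the interval_length directive in nagios.cfg (60 seconds by default), the real period is check_interval × interval_length seconds. The function below is a minimal sketch of that conversion; the function name is hypothetical, and it assumes the main config file sits at the default source-install path /usr/local/nagios/etc/nagios.cfg, falling back to 60 seconds when the file or directive is missing.

```shell
# Hypothetical helper: convert a check_interval value (time units) to seconds.
# Assumes the default source-install location of nagios.cfg unless a path is given.
check_interval_seconds() {
    local check_interval="$1"
    local cfg="${2:-/usr/local/nagios/etc/nagios.cfg}"
    local interval_length=60   # Nagios default: one time unit = 60 seconds
    local value=""

    if [ -r "$cfg" ]; then
        # Take the last interval_length=N line, if the file sets one.
        value=$(sed -n 's/^interval_length=\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1)
    fi
    if [ -n "$value" ]; then
        interval_length="$value"
    fi
    echo $((check_interval * interval_length))
}

check_interval_seconds 3   # prints 180 when interval_length is the default 60
```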

Nagios performs two types of checks on hosts and services −

  • Active Checks
  • Passive Checks

Active Checks

Active checks are initiated by the Nagios process and run on a regular schedule. The check logic inside the Nagios process starts each active check. To monitor hosts and services, including those running on remote machines, Nagios executes plugins and tells them what information to collect. The plugin then collects the required information (from the remote machine, when the check targets one) and sends it back to the Nagios daemon. Depending on the status received for the hosts and services, appropriate action is taken.
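Nagios derives the result of an active check from the plugin's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and shows the first line of the plugin's output as the status text. The function below is a hypothetical, minimal disk-usage check written to that convention. It is a sketch for illustration, not one of the standard Nagios plugins; the name check_disk_sketch and its warning/critical percentage arguments are assumptions.

```shell
# Hypothetical plugin-style check, for illustration only (not a standard plugin).
# Nagios maps the exit code to a state and shows the first output line:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
check_disk_sketch() {
    local path="$1" warn="$2" crit="$3"
    local usage

    # Field 5 of POSIX-format df output is the used capacity, e.g. "45%".
    usage=$(df -P "$path" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')

    # No number means the usage could not be read: report UNKNOWN.
    case "$usage" in
        ''|*[!0-9]*)
            echo "DISK UNKNOWN - cannot read usage for $path"
            return 3 ;;
    esac

    if [ "$usage" -ge "$crit" ]; then
        echo "DISK CRITICAL - ${usage}% used on $path"
        return 2
    elif [ "$usage" -ge "$warn" ]; then
        echo "DISK WARNING - ${usage}% used on $path"
        return 1
    fi
    echo "DISK OK - ${usage}% used on $path"
    return 0
}

# Run it the way Nagios would, then inspect the exit code.
status=0
check_disk_sketch / 80 90 || status=$?
echo "exit code: $status"
```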

The figure shown below shows an active check −

[Figure: Active check]

Active checks are executed at regular intervals, as defined by check_interval and retry_interval.
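Which of the two intervals applies depends on the current state: while a problem has not yet been confirmed (a soft state, described below), the next check is scheduled after retry_interval; when the host or service is OK, or its problem has already become a hard state, check_interval is used. The function below is a hypothetical sketch of that scheduling decision only, not Nagios's own code; its name and argument order are assumptions.

```shell
# Hypothetical sketch: choose the delay (in time units) before the next check.
# Arguments: status (OK/UP, or a problem such as DOWN/CRITICAL),
#            state type (SOFT or HARD), check_interval, retry_interval.
next_check_delay() {
    local status="$1" state_type="$2"
    local check_interval="$3" retry_interval="$4"

    case "$status" in
        OK|UP)
            # Healthy: stay on the normal schedule.
            echo "$check_interval" ;;
        *)
            if [ "$state_type" = "SOFT" ]; then
                # Unconfirmed problem: re-check sooner.
                echo "$retry_interval"
            else
                # Problem already confirmed as HARD: back to the normal schedule.
                echo "$check_interval"
            fi ;;
    esac
}

next_check_delay OK HARD 3 1         # prints 3
next_check_delay CRITICAL SOFT 3 1   # prints 1
next_check_delay CRITICAL HARD 3 1   # prints 3
```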

Passive Checks

Passive checks are performed by external processes, and the results are given back to Nagios for processing.

Passive checks work as explained here −

An external application checks the status of hosts/services and writes the result to the external command file. When the Nagios daemon reads the external command file, it places all the passive check results in a queue to be processed later. When these queued results are processed periodically, notifications or alerts are sent depending on the information in each check result.
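An external application submits a passive result by writing one line, in the PROCESS_SERVICE_CHECK_RESULT external command format, to the external command file. The sketch below assumes the default source-install path /usr/local/nagios/var/rw/nagios.cmd, that check_external_commands=1 is set in nagios.cfg, and that the target service accepts passive checks. The helper name, the NAGIOS_CMD_FILE override variable, and the service name "Disk Usage" are hypothetical.

```shell
# Hypothetical helper: queue one passive service check result for Nagios.
# Assumes the default source-install command file; NAGIOS_CMD_FILE overrides it.
submit_passive_service_result() {
    local host="$1" service="$2" return_code="$3" output="$4"
    local cmd_file="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"
    local now

    if [ ! -w "$cmd_file" ]; then
        echo "cannot write to external command file: $cmd_file" >&2
        return 1
    fi

    now=$(date +%s)   # each external command starts with a Unix timestamp
    # Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$now" "$host" "$service" "$return_code" "$output" >> "$cmd_file"
}

# Report that the (illustrative) service "Disk Usage" on host1 is OK.
submit_passive_service_result host1 "Disk Usage" 0 "DISK OK - 45% used" \
    || echo "passive result not submitted" >&2
```

The return code uses the same 0 to 3 scale as plugin exit codes, so Nagios can process the queued result as if a plugin had produced it.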

The figure shown below shows a passive check −

[Figure: Passive check]

Thus, the difference between active and passive checks is that active checks are run by Nagios and passive checks are run by external applications.

Passive checks are useful for hosts/services that Nagios cannot actively check on a regular basis.

Nagios stores the status of the hosts and services it monitors to determine whether they are working properly. In many cases, failures happen randomly and are only temporary; hence Nagios uses states to track the current status of a host or service.

There are two types of states −

  • Soft state
  • Hard state

Soft State

Soft states are used when a host or service is down for a very short time, or when its status is unknown or differs from the previous one. The host or service is rechecked repeatedly until its status is confirmed.

Hard State

When the check has been attempted max_check_attempts times and the status of the host or service is still not OK, a hard state is used. Nagios executes event handlers to handle hard states, and notifications are sent only for hard states.
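The soft-to-hard transition can be traced with a small simulation. The function below is a simplified, hypothetical sketch, not Nagios's implementation: it takes max_check_attempts and a sequence of check results and prints the state type assigned to each one. It follows the documented rule that a problem becomes hard on the max_check_attempts-th consecutive failure, and that an OK result ending a still-soft problem is a "soft recovery"; it ignores details such as flapping and host dependencies.

```shell
# Simplified, hypothetical sketch of soft/hard state assignment (not Nagios code).
# Usage: simulate_states MAX_CHECK_ATTEMPTS RESULT...
# A result of OK or UP counts as healthy; anything else counts as a problem.
simulate_states() {
    local max_attempts="$1"
    shift
    local attempt=0   # consecutive problem results seen so far
    local result type

    for result in "$@"; do
        case "$result" in
            OK|UP)
                if [ "$attempt" -gt 0 ] && [ "$attempt" -lt "$max_attempts" ]; then
                    type="SOFT"   # soft recovery: the problem never became hard
                else
                    type="HARD"
                fi
                attempt=0 ;;
            *)
                attempt=$((attempt + 1))
                if [ "$attempt" -ge "$max_attempts" ]; then
                    type="HARD"   # max_check_attempts reached
                else
                    type="SOFT"   # not yet confirmed; Nagios will retry
                fi ;;
        esac
        echo "$result $type"
    done
}

simulate_states 3 OK CRITICAL CRITICAL CRITICAL OK UNKNOWN OK
# OK HARD
# CRITICAL SOFT
# CRITICAL SOFT
# CRITICAL HARD
# OK HARD
# UNKNOWN SOFT
# OK SOFT
```

With max_check_attempts set to 3, the third consecutive CRITICAL result is the first one reported as HARD, while the UNKNOWN that clears on the next check never leaves the soft state.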

The following figure shows soft states and hard states.

[Figure: Soft and hard states]