Important: Nagios can be configured to use different object configuration file formats by passing arguments to the configure script. This documentation describes how to configure object definitions if you've compiled Nagios with support for default object data routines (i.e. using the --with-default-objects argument to the configure script). Please note that this configuration format is provided primarily for backward compatibility. It offers neither the flexibility nor the clarity of the template-based object definitions. I highly suggest that you consider moving to the template-based config file format, as it will be the standard in the future.
Notes
When creating and/or editing configuration files, keep the following in mind:
Index
Host definitions
Host group definitions
Contact definitions
Contact group definitions
Command definitions
Service definitions
Time period definitions
Service escalation definitions
Hostgroup escalation definitions
Service dependency definitions
Host Definition |
Format: | host[<host_name>]=<host_alias>;<address>;<parent_hosts>;<host_check_command>;<max_attempts>;<notification_interval>;<notification_period>;<notify_recovery>;<notify_down>;<notify_unreachable>;<event_handler> |
Example: | host[novell1]=Novell Server #1;192.168.0.1;;check-host-alive;3;120;24x7;1;1;1; |
A host definition is used to define a physical server, workstation, device, etc. that resides on your network. The different arguments to a host definition are described below.
<host_name> | This is a short name used to identify the host. It is used in host group and service definitions to reference this particular host. Hosts can have multiple services (which are monitored) associated with them. When used properly, the $HOSTNAME$ macro will contain this short name. |
<host_alias> | This is a longer name or description used to identify the host. It is provided in order to allow you to more easily identify a particular host. When used properly, the $HOSTALIAS$ macro will contain this alias/description. |
<address> | This is the IP address of the host. You can use a FQDN to identify the host, but if DNS services are not available this could cause problems. When used properly, the $HOSTADDRESS$ macro will contain this address. |
<parent_hosts> | This is a comma-delimited list of short names of the "parent" hosts for this particular host. Parent hosts are typically routers, switches, firewalls, etc. that lie between the monitoring host and a remote host. A router, switch, etc. which is closest to the remote host is considered to be that host's "parent". Read the "Determining Status and Reachability of Network Hosts" document located here for more information. If this host is on the same network segment as the host doing the monitoring (without any intermediate routers, etc.) the host is considered to be on the local network and will not have a parent host. Leave this value blank if the host does not have a parent host (i.e. it is on the same segment as the Nagios host). The order in which you specify parent hosts has no effect on how things are monitored. |
<host_check_command> | This is the short name of the command that should be used to check if the host is up or down. Typically, this command would try to ping the host to see if it is "alive". The command must return a status of OK (0) or Nagios will assume the host is down. If you leave this argument blank, the host will not be checked - Nagios will always assume the host is up. This is useful if you are monitoring printers or other devices that are frequently turned off. The maximum amount of time that the host check command can run is controlled by the host_check_timeout option. |
<max_attempts> | This is the number of times that Nagios will retry the host check command if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert without retrying the host check again. Note: If you do not want to check the status of the host, you must still set this to a minimum value of 1. To bypass the host check, just leave the <host_check_command> option blank. |
<notification_interval> | This is the number of "time units" to wait before re-notifying a contact that this server is still down or unreachable. Unless you've changed the interval_length value in the main configuration file from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this host - only one problem notification will be sent out. |
<notification_period> | This is the short name of the time period during which notifications of events for this host can be sent out to contacts. If a host goes down, becomes unreachable, or recovers during a time which is not covered by the time period, no notifications will be sent out. Read the "Time Periods" document located here for more information. |
<notify_recovery> | This value determines whether or not notifications should be sent to any contacts if the host is in a RECOVERY state. Set this value to 1 if notifications should be sent out about recovery states, 0 if they shouldn't. Note: If a contact is configured to not receive notifications of host recoveries, they will not be notified, regardless of this setting. |
<notify_down> | This value determines whether or not notifications should be sent to any contacts if the host is in a DOWN state. Set this value to 1 if notifications should be sent out when the host goes down, 0 if they shouldn't. Note: If a contact is configured to not receive notifications about hosts that go down, they will not be notified, regardless of this setting. |
<notify_unreachable> | This value determines whether or not notifications should be sent to any contacts if the host is in an UNREACHABLE state. Set this value to 1 if notifications should be sent out when the host becomes unreachable, 0 if they shouldn't. Note: If a contact is configured to not receive notifications about unreachable hosts, they will not be notified, regardless of this setting. |
<event_handler> | This is the short name of the command that should be run whenever a change in the state of the host is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. If you do not wish to define an event handler for the host, leave this option blank (as shown in the example above). The maximum amount of time that the event handler command can run is controlled by the event_handler_timeout option. |
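As a rough illustration of the field order described above, the following Python sketch splits a host definition line into named fields. The names `HOST_FIELDS` and `parse_host_definition` are hypothetical helpers invented for this example; they are not part of Nagios, and the sketch assumes that no field value contains a semicolon.

```python
import re

# Field order taken from the host definition format above.
HOST_FIELDS = [
    "host_alias", "address", "parent_hosts", "host_check_command",
    "max_attempts", "notification_interval", "notification_period",
    "notify_recovery", "notify_down", "notify_unreachable", "event_handler",
]

def parse_host_definition(line):
    """Split one host[...] line into a dict of named string fields (hypothetical helper)."""
    match = re.match(r"^host\[([^\]]+)\]=(.*)$", line.strip())
    if match is None:
        raise ValueError("not a host definition: %r" % line)
    host_name, rest = match.groups()
    values = rest.split(";")
    if len(values) != len(HOST_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(HOST_FIELDS), len(values)))
    fields = dict(zip(HOST_FIELDS, values))
    fields["host_name"] = host_name
    # parent_hosts is a comma-delimited list; a blank value means no parent host.
    fields["parent_hosts"] = [p for p in fields["parent_hosts"].split(",") if p]
    return fields

example = "host[novell1]=Novell Server #1;192.168.0.1;;check-host-alive;3;120;24x7;1;1;1;"
host = parse_host_definition(example)
```

In the example line, both `<parent_hosts>` and `<event_handler>` are left blank, so the parsed host has an empty parent list and an empty event handler name.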
Host Group Definition |
Format: | hostgroup[<group_name>]=<group_alias>;<contact_groups>;<hosts> |
Example: | hostgroup[nt-servers]=All NT Servers;nt-admins;nt1,nt2,nt3 |
A host group definition is used to group one or more hosts together for the purposes of simplifying notifications. Each host that you define must be a member of at least one host group - even if it is the only host in that group. Hosts can be in more than one host group. When a host goes down, becomes unreachable, or recovers, Nagios will find which host group(s) the host is a member of, get the contact group for each of those hostgroups, and notify all contacts associated with those contact groups. This may sound complex, but for most people it doesn't have to be. It does, however, allow for flexibility in determining who gets paged for what kind of problems. The different arguments to a host group definition are outlined below.
<group_name> | This is a short name used to identify the host group. |
<group_alias> | This is a longer name or description used to identify the host group. It is provided in order to allow you to more easily identify a particular host group. |
<contact_groups> | This is a list of the short names of the contact groups that should be notified whenever there are problems (or recoveries) with any of the hosts in this host group. Multiple contact groups should be separated by commas. |
<hosts> | This is a list of the short names of hosts that should be included in this group. Multiple host names should be separated by commas. |
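The lookup described above (host, then host groups, then contact groups, then contacts) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the dictionaries and the function `contacts_for_host` are hypothetical, and the sketch ignores notification periods and per-contact notification options.

```python
def contacts_for_host(host_name, hostgroups, contactgroups):
    """Return the sorted contacts reachable from a host via its host groups.

    hostgroups:    {group_name: {"contact_groups": [...], "hosts": [...]}}
    contactgroups: {group_name: [contact_name, ...]}
    """
    contacts = set()
    for group in hostgroups.values():
        if host_name in group["hosts"]:
            for cg_name in group["contact_groups"]:
                contacts.update(contactgroups.get(cg_name, []))
    return sorted(contacts)

# Data mirrors the hostgroup and contactgroup examples in this document.
hostgroups = {
    "nt-servers": {"contact_groups": ["nt-admins"], "hosts": ["nt1", "nt2", "nt3"]},
}
contactgroups = {"nt-admins": ["bbarker", "jdoe"]}

notified = contacts_for_host("nt1", hostgroups, contactgroups)
```

A host that belongs to several host groups simply accumulates the contacts of every contact group attached to those host groups.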
Contact Definition |
Format: | contact[<contact_name>]=<contact_alias>;<svc_notification_period>;<host_notification_period>;<svc_notify_recovery>;<svc_notify_critical>;<svc_notify_warning>;<host_notify_recovery>;<host_notify_down>;<host_notify_unreachable>;<service_notify_commands>;<host_notify_commands>;<email_address>;<pager> |
Example: | contact[nagiosadmin]=Nagios Administrator;24x7;24x7;1;1;1;1;1;1;notify-by-email,notify-by-epager;host-notify-by-epager;nagiosadmin@localhost.localdomain;pageadmin@localhost.localdomain |
A contact definition is used to identify someone who should be contacted in the event of a problem on your network. The different arguments to a contact definition are described below.
<contact_name> | This is the short name used to identify the contact. It is referenced in contact group definitions. Under the right circumstances, the $CONTACTNAME$ macro will contain this value. |
<contact_alias> | This is a longer name or description for the contact. Under the right circumstances, the $CONTACTALIAS$ macro will contain this value. |
<svc_notification_period> | This is the short name of the time period during which the contact can be notified about service problems or recoveries. You can think of this as an "on call" time for service notifications for the contact. Read the "Time Periods" document located here for more information on how this works and potential problems that may result from improper use. |
<host_notification_period> | This is the short name of the time period during which the contact can be notified about host problems or recoveries. You can think of this as an "on call" time for host notifications for the contact. Read the "Time Periods" document located here for more information on how this works and potential problems that may result from improper use. |
<svc_notify_recovery> | This value determines whether or not the contact will be notified of service recoveries. Set this value to 1 if the contact should be notified, 0 if they shouldn't. Note: If a service is configured to not send out notifications upon recovery, contacts will not be notified about recoveries for that service, regardless of this setting. |
<svc_notify_critical> | This value determines whether or not the contact will be notified if a service is in a critical state. Set this value to 1 if the contact should be notified of critical states, 0 if they shouldn't. Note: If a service is configured to not send out notifications for critical states, contacts will not be notified about critical states for that service, regardless of this setting. |
<svc_notify_warning> | This value determines whether or not the contact will be notified if a service is in either a warning or an unknown state. Set this value to 1 if the contact should be notified of warning/unknown states, 0 if they shouldn't. Note: If a service is configured to not send out notifications for warning/unknown states, contacts will not be notified about warning/unknown states for that service, regardless of this setting. |
<host_notify_recovery> | This value determines whether or not the contact will be notified if any host recovers. Set this value to 1 if the contact should be notified of hosts that recover, 0 if they shouldn't. Note: If a host is configured to not send out notifications for recoveries, contacts will not be notified when the host recovers, regardless of this setting. |
<host_notify_down> | This value determines whether or not the contact will be notified if any host goes down. Set this value to 1 if the contact should be notified of hosts that go down, 0 if they shouldn't. Note: If a host is configured to not send out notifications for down states, contacts will not be notified when the host goes down, regardless of this setting. |
<host_notify_unreachable> | This value determines whether or not the contact will be notified if any host becomes unreachable. Set this value to 1 if the contact should be notified of hosts that become unreachable, 0 if they shouldn't. Note: If a host is configured to not send out notifications for unreachable states, contacts will not be notified when the host becomes unreachable, regardless of this setting. |
<service_notify_commands> | This is a list of the short names of the commands used to notify the contact of a service problem or recovery. Multiple notification commands should be separated by commas. All notification commands are executed when the contact needs to be notified. The maximum amount of time that a notification command can run is controlled by the notification_timeout option. |
<host_notify_commands> | This is a list of the short names of the commands used to notify the contact of a host problem or recovery. Multiple notification commands should be separated by commas. All notification commands are executed when the contact needs to be notified. The maximum amount of time that a notification command can run is controlled by the notification_timeout option. |
<email_address> | This is the email address for the contact. Depending on how you configure your notification commands, it can be used to send out an alert email to the contact. Under the right circumstances, the $CONTACTEMAIL$ macro will contain this value. |
<pager> | This is the pager number for the contact. It can also be an email address to a pager gateway (e.g. pagejoe@pagenet.com). Depending on how you configure your notification commands, it can be used to send out an alert page to the contact. Under the right circumstances, the $CONTACTPAGER$ macro will contain this value. |
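The notes attached to the notify options say that a notification goes out only when both the host (or service) and the contact allow it. A minimal Python sketch of that rule follows; the function `should_notify` and the flag dictionaries are hypothetical, and the sketch deliberately leaves out the time period checks.

```python
def should_notify(event, object_flags, contact_flags):
    """Return True only if both the host/service and the contact allow this event type."""
    return object_flags.get(event, 0) == 1 and contact_flags.get(event, 0) == 1

# Flags from the host example (notify_recovery;notify_down;notify_unreachable = 1;1;1).
host_flags = {"recovery": 1, "down": 1, "unreachable": 1}
# A hypothetical contact who has opted out of unreachable notifications.
contact_flags = {"recovery": 1, "down": 1, "unreachable": 0}

notify_down = should_notify("down", host_flags, contact_flags)
notify_unreachable = should_notify("unreachable", host_flags, contact_flags)
```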
Contact Group Definition |
Format: | contactgroup[<group_name>]=<group_alias>;<contacts> |
Example: | contactgroup[nt-admins]=NT Administrators;bbarker,jdoe |
A contact group definition is used to group one or more contacts together for the purpose of sending out alert/recovery notifications. When a host or service has a problem or recovers, Nagios will find the appropriate contact groups to send notifications to, and notify all contacts in those contact groups. This may sound complex, but for most people it doesn't have to be. It does, however, allow for flexibility in determining who gets notified for particular events. The different arguments to a contact group definition are outlined below.
<group_name> | This is a short name used to identify the contact group. |
<group_alias> | This is a longer name or description used to identify the contact group. |
<contacts> | This is a list of the short names of contacts that should be included in this group. Multiple contact names should be separated by commas. |
Command Definition |
Format: | command[<command_name>]=<command_line> |
Example 1: | command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w 1000.0,80% -c 2000.0,100% |
Example 2: | command[check_pop]=/usr/local/nagios/libexec/check_pop -H $HOSTADDRESS$ |
Example 3: | command[check_local_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p $ARG1$ |
A command definition is just that. It defines a command. Commands that can be defined include service checks, service notifications, service event handlers, host checks, host notifications, and host event handlers. Command definitions can contain macros, but you must make sure that you include only those macros that are "valid" for the circumstances when the command will be used. More information on what macros are available and when they are "valid" can be found here. The different arguments to a command definition are outlined below.
<command_name> | This is a short name used to identify the command. It is referenced in contact, host, and service definitions. |
<command_line> | This is what is actually executed by Nagios when the command is used for service or host checks, notifications, or event handlers. Before the command line is executed, all valid macros are replaced with their respective values. See the documentation on macros for determining when you can use different macros. Note that the command line is not surrounded in quotes. |
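To make the macro replacement step concrete, here is a small Python sketch that substitutes `$NAME$` tokens in a command line. It is not how Nagios implements macro processing: the function `expand_macros` is a hypothetical helper, and leaving unknown macros untouched is an assumption made for this example.

```python
import re

def expand_macros(command_line, macros):
    """Replace each $NAME$ token that has a value in `macros`; leave others untouched."""
    def replace(match):
        name = match.group(1)
        # Assumption for this sketch: macros without a value are left as-is.
        return macros.get(name, match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", replace, command_line)

command_line = "/usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w 1000.0,80% -c 2000.0,100%"
expanded = expand_macros(command_line, {"HOSTADDRESS": "192.168.0.1"})
```

With the address from the host example, `expanded` holds the check_ping command line with `192.168.0.1` in place of `$HOSTADDRESS$`.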
Service Definition |
Format: | service[<host>]=<description>;<volatile>;<check_period>;<max_attempts>;<check_interval>;<retry_interval>;<contactgroups>;<notification_interval>;<notification_period>;<notify_recovery>;<notify_critical>;<notify_warning>;<event_handler>;<check_command> |
Example 1: | service[nt1]=FTP;0;24x7;3;5;1;nt-admins;120;24x7;1;1;1;;check_ftp |
Example 2: | service[nt1]=HTTP;0;24x7;3;5;1;nt-admins;240;24x7;1;1;1;;check_http2!192.168.0.2!/!88 |
Example 3: | service[linux1]=Zombie Processes;0;24x7;3;5;1;linux-admins;240;24x7;1;1;1;;check_procs!5!10!Z |
A service definition is used to identify a "service" that runs on a host. The term "service" is used very loosely. It can mean an actual service that runs on the host (POP, SMTP, HTTP, etc.) or some other type of metric associated with the host (response to a ping, number of logged in users, free disk space, etc.). The different arguments to a service definition are outlined below.
<host> | This is the short name of the host that the service "runs" on or is associated with. |
<description> | A description of the service, which may contain spaces, dashes, and colons (semicolons, apostrophes, and quotation marks should be avoided). No two services associated with the same host can have the same description. |
<volatile> | This field is used to denote whether the service is "volatile". Services are normally not volatile. More information on volatile services and how they differ from normal services can be found here. Set this field to 1 to mark the service as being volatile, 0 to mark it as a normal service. |
<check_period> | This is the short name of the time period that identifies when this service can be checked. Service checks are scheduled so that they are only run (or rerun) during times that are valid within the specified service check time period. See the "Time Periods" documentation located here for more information on how time periods work and the potential problems with using them improperly. |
<max_attempts> | This is the number of times that Nagios will retry the service check if it returns any state other than an OK state. Setting this value to 1 will cause Nagios to generate an alert (if the service check detected a problem) without retrying the service check again. More information on this value can be found in the check scheduling documentation. |
<check_interval> | This is the number of "time units" to wait before scheduling the next "regular" check of the service. "Regular" checks are those that occur when the service is in an OK state or when the service is in a non-OK state, but has already been rechecked max_attempts number of times. Unless you've changed the interval_length value in the main configuration file from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation. |
<retry_interval> | This is the number of "time units" to wait before scheduling a re-check of the service. Services are rescheduled at the retry interval when they have changed to a non-OK state. Once the service has been retried max_attempts times without a change in its status, it will revert to being scheduled at its "normal" rate as defined by the check_interval value. Unless you've changed the interval_length value in the main configuration file from the default value of 60, this number will mean minutes. More information on this value can be found in the check scheduling documentation. |
<contactgroups> | This is a comma-delimited list of the short names of contact groups that should be notified about problems or recoveries for this service. If a problem or recovery occurs for this service, Nagios will attempt to notify all the contacts in each contact group (depending on the notification options that are set below). |
<notification_interval> | This is the number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Unless you've changed the interval_length value in the main configuration file from the default value of 60, this number will mean minutes. If you set this value to 0, Nagios will not re-notify contacts about problems for this service - only one problem notification will be sent out. |
<notification_period> | This is the short name of the time period that identifies when notifications about problems or recoveries for this service may be sent out. If a service problem or recovery occurs outside valid times within this time period, notifications will not be sent out. See the "Time Periods" documentation located here for more information on how time periods work and the potential problems with using them improperly. |
<notify_recovery> | This value determines whether or not alert notifications will be generated if the service recovers from a non-OK state. Set this value to 1 if the service should generate alerts for recoveries, 0 if it shouldn't. Note: If a contact is configured to not receive recovery notifications, they will not be notified of any recoveries for this service, regardless of this setting. |
<notify_critical> | This value determines whether or not alert notifications will be generated if the service is in a CRITICAL state. Set this value to 1 if the service should generate alerts for critical states, 0 if it shouldn't. Note: If a contact is configured to not receive critical notifications, they will not be notified of any critical states for this service, regardless of this setting. |
<notify_warning> | This value determines whether or not alert notifications will be generated if the service is in a WARNING or UNKNOWN state. Set this value to 1 if the service should generate alerts for warning/unknown states, 0 if it shouldn't. Note: If a contact is configured to not receive warning/unknown notifications, they will not be notified of any warning/unknown states for this service, regardless of this setting. |
<event_handler> | This is the short name of the command that should be run whenever a change in the status of the service is detected (i.e. whenever it goes down or recovers). Read the documentation on event handlers for a more detailed explanation of how to write scripts for handling events. If you do not wish to define an event handler for the service, leave this option blank (as shown in the examples above). The maximum amount of time that the event handler command can run is controlled by the event_handler_timeout option. |
<check_command> |
This is the command that Nagios will run in order to check the status of the service. There are three command formats that can be used:
|
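The interplay of `<check_interval>`, `<retry_interval>`, `<max_attempts>`, and the interval_length setting described above can be sketched as follows. This is a simplified Python illustration, not the Nagios scheduler: the function `next_check_delay` and its `attempt` counter are hypothetical, and check periods are ignored.

```python
def next_check_delay(is_ok, attempt, max_attempts,
                     check_interval, retry_interval, interval_length=60):
    """Return the delay in seconds before the next check of a service.

    attempt is the number of checks already made during the current non-OK run
    (an assumption of this sketch about how attempts are counted).
    """
    if is_ok or attempt >= max_attempts:
        units = check_interval   # "regular" check
    else:
        units = retry_interval   # re-check after a change to a non-OK state
    return units * interval_length

# Values from the FTP service example: max_attempts=3, check_interval=5, retry_interval=1.
regular = next_check_delay(True, 0, 3, 5, 1)
retry = next_check_delay(False, 1, 3, 5, 1)
exhausted = next_check_delay(False, 3, 3, 5, 1)
```

With the default interval_length of 60, the FTP example is checked every 300 seconds while OK and re-checked after 60 seconds once it changes to a non-OK state.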
Time Period Definition |
Format: | timeperiod[<timeperiod_name>]=<timeperiod_alias>;<sunday_ranges>;<monday_ranges>;<tuesday_ranges>;<wednesday_ranges>;<thursday_ranges>;<friday_ranges>;<saturday_ranges>; |
Example 1: | timeperiod[24x7]=All Day, Every Day;00:00-24:00;00:00-24:00;00:00-24:00;00:00-24:00;00:00-24:00;00:00-24:00;00:00-24:00 |
Example 2: | timeperiod[workhours]="Normal" Working Hours;;09:00-17:00;09:00-17:00;09:00-17:00;09:00-17:00;09:00-17:00; |
Example 3: | timeperiod[none]=No Time Is A Good Time;;;;;;; |
Example 4: | timeperiod[nonworkhours]=Non-Work Hours;00:00-24:00;00:00-09:00,17:00-24:00;00:00-09:00,17:00-24:00;00:00-09:00,17:00-24:00;00:00-09:00,17:00-24:00;00:00-09:00,17:00-24:00;00:00-24:00 |
A time period is a list of times during various days that are considered to be "valid" times for notifications and service checks. It consists of one or more time ranges for each day of the week that "rotate" once the week has come to an end. Exceptions to the normal weekly time range rotations are not supported.
<timeperiod_name> | This is a short name used to identify the time period. |
<timeperiod_alias> | This is a longer name or description used to identify the time period. |
<xday_ranges> | This is a comma-delimited list of time ranges that are "valid" times for a particular day of the week. Notice that there are seven different days for which you must define time ranges (Sunday through Saturday). Each time range is in the form of HH:MM-HH:MM, where hours are specified on a 24 hour clock. For example, 00:15-24:00 means 12:15am in the morning for this day until 12:00am midnight (a 23 hour, 45 minute total time range). If you leave a particular day's time range blank, it means that there are no "valid" times for that day. |
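The HH:MM-HH:MM range syntax can be checked with a short Python sketch. The helper names below are hypothetical, and treating the end of each range as exclusive is an assumption of this example rather than documented Nagios behavior.

```python
def to_minutes(hhmm):
    """Convert "HH:MM" on a 24 hour clock to minutes since the start of the day."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def parse_ranges(ranges):
    """Parse "HH:MM-HH:MM,HH:MM-HH:MM" into a list of (start, end) minute pairs."""
    result = []
    for item in ranges.split(","):
        item = item.strip()
        if not item:
            continue  # a blank day has no valid times
        start_text, end_text = item.split("-")
        result.append((to_minutes(start_text), to_minutes(end_text)))
    return result

def is_valid_time(ranges, hhmm):
    """Return True if hhmm falls inside one of the ranges (end treated as exclusive)."""
    now = to_minutes(hhmm)
    return any(start <= now < end for start, end in parse_ranges(ranges))

def total_minutes(ranges):
    """Total number of valid minutes covered by the ranges for one day."""
    return sum(end - start for start, end in parse_ranges(ranges))

weekday_nonwork = "00:00-09:00,17:00-24:00"
length = total_minutes("00:15-24:00")
```

Here `length` is 1425 minutes, which matches the 23 hour, 45 minute range mentioned above, and a time such as 12:30 is not valid in the `nonworkhours` weekday ranges.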
Service Escalation Definition |
Format: | serviceescalation[<host>;<description>]=<first_notification>-<last_notification>;<contact_groups>;<notification_interval> |
Examples: |
serviceescalation[linux1;Zombie Processes]=3-5;linux-admins,managers;0
serviceescalation[nt1;HTTP]=6-0;nt-admins,managers,everyone;30 |
A service escalation definition is completely optional and is used to escalate notifications for a particular service. More information on how notification escalations work can be found here.
<host> | This is the short name of the host that the service "runs" on or is associated with. |
<description> | A description of the service, which may contain spaces, dashes, and colons (semicolons, parentheses, and apostrophes are not allowed). No two services associated with the same host can have the same description. |
<first_notification> | This is a number that identifies the first notification for which this escalation is effective. For instance, if you set this value to 3, this escalation will only be used if the service is in a non-OK state long enough for a third notification to go out. |
<last_notification> | This is a number that identifies the last notification for which this escalation is effective. For instance, if you set this value to 5, this escalation will not be used if more than five notifications are sent out for the specified service. Setting this value to 0 means to keep using this escalation entry forever (no matter how many notifications go out). |
<contact_groups> | This is a list of the short names of the contact groups that should be notified when a service notification is escalated. Multiple contact groups should be separated by commas. |
<notification_interval> | The interval at which notifications should be made while this escalation is valid. If you specify a value of 0 for the interval, Nagios will send the first notification when this escalation definition is valid, but will then prevent any more problem notifications from being sent out for the service. Problem notifications are not sent out again until the service recovers. This is useful if you want to stop having notifications sent out after a certain amount of time. Note: If multiple escalation entries for a service overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used. |
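The range matching and the "smallest interval wins" rule for overlapping entries can be sketched in Python as follows. The functions `escalation_applies` and `effective_interval` are hypothetical names used only for this illustration, which models each escalation entry as a (first, last, interval) tuple.

```python
def escalation_applies(first, last, notification_number):
    """Return True if the escalation covers this notification; last == 0 means no upper bound."""
    if notification_number < first:
        return False
    return last == 0 or notification_number <= last

def effective_interval(escalations, notification_number):
    """Return the smallest interval among matching (first, last, interval) entries, or None."""
    intervals = [
        interval
        for first, last, interval in escalations
        if escalation_applies(first, last, notification_number)
    ]
    return min(intervals) if intervals else None

# Hypothetical overlapping entries for one service: 3-5 at 60 units, 4-0 at 30 units.
entries = [(3, 5, 60), (4, 0, 30)]
third = effective_interval(entries, 3)
fourth = effective_interval(entries, 4)
ninth = effective_interval(entries, 9)
```

Only the first entry covers the third notification, both entries cover the fourth (so the smaller interval is used), and only the open-ended entry still applies to the ninth.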
Host Group Escalation Definition |
Format: | hostgroupescalation[<group_name>]=<first_notification>-<last_notification>;<contact_groups>;<notification_interval> |
Examples: |
hostgroupescalation[nt-servers]=3-5;nt-admins,managers;0
hostgroupescalation[nt-servers]=6-0;nt-admins,managers,everyone;60 |
A host group escalation definition is completely optional and is used to escalate notifications for hosts in a particular hostgroup. More information on how notification escalations work can be found here.
<group_name> | This is a short name used to identify the host group (as previously defined in a hostgroup definition) that the escalation should apply to. |
<first_notification> | This is a number that identifies the first notification for which this escalation is effective. For instance, if you set this value to 3, this escalation will only be used if a host in the hostgroup is down or unreachable long enough for a third notification to go out. |
<last_notification> | This is a number that identifies the last notification for which this escalation is effective. For instance, if you set this value to 5, this escalation will not be used if more than five notifications are sent out for any particular host in the specified hostgroup. Setting this value to 0 means to keep using this escalation entry forever (no matter how many notifications go out). |
<contact_groups> | This is a list of the short names of the contact groups that should be notified when a host notification is escalated. Multiple contact groups should be separated by commas. |
<notification_interval> | The interval at which notifications should be made while this escalation is valid. If you specify a value of 0 for the interval, Nagios will send the first notification when this escalation definition is valid, but will then prevent any more problem notifications from being sent out for the host. Problem notifications are not sent out again until the host recovers. This is useful if you want to stop having notifications sent out after a certain amount of time. Note: If multiple escalation entries for a hostgroup overlap for one or more notification ranges, the smallest notification interval from all escalation entries is used. |
Service Dependency Definition |
Format: | servicedependency[<dependent_host>;<dependent_description>]=<host>;<description>;<execution_failure_options>;<notification_failure_options> |
Examples: |
servicedependency[nt1;WWW1 Website]=nt1;HTTP;;wc
servicedependency[nt1;WWW2 Website]=nt1;HTTP;wcu;wcu
servicedependency[nt1;WWW2 Website]=nt2;SQL Server;c; |
Service dependency definitions are completely optional. They are used to control both the execution of services and notifications for services based on the status of other services that are being monitored. Service dependencies are mainly targeted at advanced users who have complicated monitoring setups. More information on how service dependencies work (read this!) can be found here.
<dependent_host> | This is the short name of the host that the dependent service "runs" on or is associated with. |
<dependent_description> | This is the description of the dependent service. |
<host> | This is the short name of the host that the service we are depending on "runs" on or is associated with. |
<description> | This is the description of the service we are depending on. |
<execution_failure_options> | These options are used to define situations where the dependent service should not be executed. If the service we are depending on is in one of the failure states we specify, the dependent service will not be executed. Valid options are a combination of one or more of the following: o = fail on an OK state, w = fail on a WARNING state, u = fail on an UNKNOWN state, and c = fail on a CRITICAL state. Example: If you specify ocu in this field, the dependency will fail if the service we're depending on is in either an OK, a CRITICAL, or an UNKNOWN state and the dependent service will not be executed. You do not have to specify any failure options in this field. |
<notification_failure_options> | These options are used to define situations where notifications for the dependent service should not be sent out. If the service we are depending on is in one of the failure states we specify, notifications for the dependent service will not be sent to contacts. Valid options are a combination of one or more of the following: o = fail on an OK state, w = fail on a WARNING state, u = fail on an UNKNOWN state, and c = fail on a CRITICAL state. Example: If you specify w in this field, the dependency will fail if the service we're depending on is in a WARNING state and notifications for the dependent service will not be sent out. You do not have to specify any failure options in this field. |
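The failure option letters can be read as a simple membership test, sketched here in Python. The mapping `STATE_LETTERS` and the function `dependency_fails` are hypothetical names for this example and do not come from Nagios itself.

```python
# Option letters as listed in the field descriptions above.
STATE_LETTERS = {"OK": "o", "WARNING": "w", "UNKNOWN": "u", "CRITICAL": "c"}

def dependency_fails(failure_options, master_state):
    """Return True if the state of the service depended upon is listed in the options."""
    return STATE_LETTERS[master_state] in failure_options

# From the first example: servicedependency[nt1;WWW1 Website]=nt1;HTTP;;wc
execution_options = ""
notification_options = "wc"

skip_check = dependency_fails(execution_options, "CRITICAL")
suppress_notification = dependency_fails(notification_options, "CRITICAL")
```

In that first example the dependent service is still checked when HTTP is CRITICAL, because no execution failure options are given, but its notifications are suppressed.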