Notes
When creating and/or editing configuration files, keep the following in mind:
Sample Configuration
A sample main configuration file is created in the base directory of the Nagios distribution when you run the configure script. The default name of the main configuration file is nagios.cfg - it's usually placed in the etc/ subdirectory of your Nagios installation (i.e. /usr/local/nagios/etc/).
Index
Log file
Object configuration file
Object configuration directory
Resource file
Temp file
Status file
Aggregated status updates option
Aggregated status data update interval
Nagios user
Nagios group
Notifications option
Service check execution option
Passive service check acceptance option
Event handler option
Log rotation method
Log archive path
External command check option
External command check interval
External command file
Comment file
Downtime file
Lock file
State retention option
State retention file
Automatic state retention update interval
Use retained program state option
Syslog logging option
Notification logging option
Service check retry logging option
Host retry logging option
Event handler logging option
Initial state logging option
External command logging option
Passive service check logging option
Global host event handler
Global service event handler
Inter-check sleep time
Inter-check delay method
Service interleave factor
Maximum concurrent service checks
Service reaper frequency
Timing interval length
Aggressive host checking option
Flap detection option
Low service flap threshold
High service flap threshold
Low host flap threshold
High host flap threshold
Soft service dependencies option
Service check timeout
Host check timeout
Event handler timeout
Notification timeout
Obsessive compulsive service processor timeout
Performance data processor command timeout
Obsess over services option
Obsessive compulsive service processor command
Performance data processing option
Orphaned service check option
Service freshness checking option
Service freshness check interval
Illegal object name characters
Illegal macro output characters
Administrator email address
Administrator pager
Log File |
Format: | log_file=<file_name> |
Example: | log_file=/usr/local/nagios/var/nagios.log |
This variable specifies where Nagios should create its main log file. This should be the first variable that you define in your configuration file, as Nagios will try to write errors that it finds in the rest of your configuration data to this file. If you have log rotation enabled, this file will automatically be rotated every hour, day, week, or month.
Object Configuration File |
Format: | cfg_file=<file_name> |
Example: |
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg |
This directive is used to specify an object configuration file that Nagios should use for monitoring. This file has traditionally been called the "host" config file, even though it may contain more than just host definitions. Object configuration files contain definitions for hosts, host groups, contacts, contact groups, services, commands, etc. You can separate your configuration information into several files and specify multiple cfg_file= statements to have each of them processed.
Object Configuration Directory |
Format: | cfg_dir=<directory_name> |
Example: |
cfg_dir=/usr/local/nagios/etc/commands
cfg_dir=/usr/local/nagios/etc/services
cfg_dir=/usr/local/nagios/etc/hosts |
This directive is used to specify a directory which contains object configuration files that Nagios should use for monitoring. All files in the directory with a .cfg extension are processed as object config files. You can separate your configuration files into different directories and specify multiple cfg_dir= statements to have all config files in each directory processed.
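The way cfg_file= and cfg_dir= statements combine can be sketched in a few lines of Python. This is only an illustration, not how Nagios itself is implemented: the function names are hypothetical, comment lines are assumed to start with "#", and directories are assumed to be scanned without descending into subdirectories.

```python
import glob
import os


def read_directives(path):
    """Return (name, value) pairs from a main config file, in file order."""
    pairs = []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines and comments (assumed here to start with '#').
            if not line or line.startswith("#"):
                continue
            name, sep, value = line.partition("=")
            if sep:
                pairs.append((name.strip(), value.strip()))
    return pairs


def object_config_files(main_cfg):
    """Collect every object config file named by cfg_file= and cfg_dir=."""
    files = []
    for name, value in read_directives(main_cfg):
        if name == "cfg_file":
            files.append(value)
        elif name == "cfg_dir":
            # Assumption: only *.cfg files directly inside the directory count.
            files.extend(sorted(glob.glob(os.path.join(value, "*.cfg"))))
    return files
```

Because the directives are kept as a list of pairs rather than a dictionary, repeated cfg_file= and cfg_dir= statements all survive parsing.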
Resource File |
Format: | resource_file=<file_name> |
Example: | resource_file=/usr/local/nagios/etc/resource.cfg |
This is used to specify an optional resource file that can contain $USERn$ macro definitions. $USERn$ macros are useful for storing usernames, passwords, and items commonly used in command definitions (like directory paths). The CGIs will not attempt to read resource files, so you can set restrictive permissions (600 or 660) on them to protect sensitive information. You can include multiple resource files by adding multiple resource_file statements to the main config file - Nagios will process them all. See the sample resource.cfg file in the base of the Nagios directory for an example of how to define $USERn$ macros.
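As a rough illustration of what $USERn$ macros do, the Python sketch below parses resource-file lines of the form `$USER1$=<value>` and substitutes them into a command line. The function names are hypothetical and the parsing rules (one definition per line, "#" comments) are assumptions made for the example; the sample resource.cfg remains the authoritative reference.

```python
import re

USER_MACRO = re.compile(r"\$USER\d+\$")


def load_user_macros(text):
    """Parse lines such as '$USER1$=/usr/local/nagios/libexec' into a dict."""
    macros = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and USER_MACRO.fullmatch(name):
            macros[name] = value.strip()
    return macros


def expand_user_macros(command_line, macros):
    """Replace each known $USERn$ macro; unknown macros are left untouched."""
    return USER_MACRO.sub(lambda m: macros.get(m.group(0), m.group(0)), command_line)
```

Keeping paths and credentials in macros means command definitions can be shared or displayed without exposing the values themselves.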
Temp File |
Format: | temp_file=<file_name> |
Example: | temp_file=/usr/local/nagios/var/nagios.tmp |
This is a temporary file that Nagios periodically creates to use when updating comment data, status data, etc. The file is deleted when it is no longer needed.
Status File |
Format: | status_file=<file_name> |
Example: | status_file=/usr/local/nagios/var/status.log |
This is the file that Nagios uses to store the current status of all monitored services. The status of all hosts associated with the services you monitor is also recorded here. This file is used by the CGIs so that current monitoring status can be reported via a web interface. The CGIs must have read access to this file in order to function properly. This file is deleted every time Nagios stops and recreated when it starts.
Aggregated Status Updates Option |
Format: | aggregate_status_updates=<0/1> |
Example: | aggregate_status_updates=1 |
This option determines whether or not Nagios will aggregate updates of host, service, and program status data. If you do not enable this option, status data is updated every time a host or service check occurs. This can result in high CPU loads and file I/O if you are monitoring a lot of services. If you want Nagios to only update status data (in the status file) every few seconds (as determined by the status_update_interval option), enable this option. If you want immediate updates, disable it. I would highly recommend using aggregated updates (even at short intervals) unless you have good reason not to. Values are as follows:
0 = Disable aggregated updates
1 = Enable aggregated updates
Aggregated Status Update Interval |
Format: | status_update_interval=<seconds> |
Example: | status_update_interval=15 |
This setting determines how often (in seconds) Nagios will update status data in the status file. The minimum update interval is five seconds. If you have disabled aggregated status updates (with the aggregate_status_updates option), this option has no effect.
Nagios User |
Format: | nagios_user=<username/UID> |
Example: | nagios_user=nagios |
This is used to set the effective user that the Nagios process should run as. After initial program startup and before starting to monitor anything, Nagios will drop its effective privileges and run as this user. You may specify either a username or a UID.
Nagios Group |
Format: | nagios_group=<groupname/GID> |
Example: | nagios_group=nagios |
This is used to set the effective group that the Nagios process should run as. After initial program startup and before starting to monitor anything, Nagios will drop its effective privileges and run as this group. You may specify either a groupname or a GID.
Notifications Option |
Format: | enable_notifications=<0/1> |
Example: | enable_notifications=1 |
This option determines whether or not Nagios will send out notifications when it initially (re)starts. If this option is disabled, Nagios will not send out notifications for any host or service. Note: If you have state retention enabled, Nagios will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:
0 = Disable notifications
1 = Enable notifications
Service Check Execution Option |
Format: | execute_service_checks=<0/1> |
Example: | execute_service_checks=1 |
This option determines whether or not Nagios will execute service checks when it initially (re)starts. If this option is disabled, Nagios will not actively execute any service checks and will remain in a sort of "sleep" mode (it can still accept passive checks unless you've disabled them). This option is most often used when configuring backup monitoring servers, as described in the documentation on redundancy, or when setting up a distributed monitoring environment. Note: If you have state retention enabled, Nagios will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:
0 = Don't execute service checks
1 = Execute service checks
Passive Service Check Acceptance Option |
Format: | accept_passive_service_checks=<0/1> |
Example: | accept_passive_service_checks=1 |
This option determines whether or not Nagios will accept passive service checks when it initially (re)starts. If this option is disabled, Nagios will not accept any passive service checks. Note: If you have state retention enabled, Nagios will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:
0 = Don't accept passive service checks
1 = Accept passive service checks
Event Handler Option |
Format: | enable_event_handlers=<0/1> |
Example: | enable_event_handlers=1 |
This option determines whether or not Nagios will run event handlers when it initially (re)starts. If this option is disabled, Nagios will not run any host or service event handlers. Note: If you have state retention enabled, Nagios will ignore this setting when it (re)starts and use the last known setting for this option (as stored in the state retention file), unless you disable the use_retained_program_state option. If you want to change this option when state retention is active (and the use_retained_program_state option is enabled), you'll have to use the appropriate external command or change it via the web interface. Values are as follows:
0 = Disable event handlers
1 = Enable event handlers
Log Rotation Method |
Format: | log_rotation_method=<n/h/d/w/m> |
Example: | log_rotation_method=d |
This is the rotation method that you would like Nagios to use for your log file. Values are as follows:
n = None (don't rotate the log)
h = Hourly
d = Daily
w = Weekly
m = Monthly
Log Archive Path |
Format: | log_archive_path=<path> |
Example: | log_archive_path=/usr/local/nagios/var/archives/ |
This is the directory where Nagios should place log files that have been rotated. This option is ignored if you choose to not use the log rotation functionality.
External Command Check Option |
Format: | check_external_commands=<0/1> |
Example: | check_external_commands=1 |
This option determines whether or not Nagios will check the command file for external commands it should execute. This option must be enabled if you plan on using the command CGI to issue commands via the web interface. Third party programs can also issue commands to Nagios by writing to the command file, provided proper rights to the file have been granted as outlined in this FAQ. More information on external commands can be found here.
External Command Check Interval |
Format: | command_check_interval=<xxx>[s] |
Example: | command_check_interval=1 |
If you specify a number with an "s" appended to it (e.g. 30s), this is the number of seconds to wait between external command checks. If you leave off the "s", this is the number of "time units" to wait between external command checks. Unless you've changed the interval_length value (as defined below) from the default value of 60, this number will mean minutes.
Note: By setting this value to -1, Nagios will check for external commands as often as possible. Each time Nagios checks for external commands it will read and process all commands present in the command file before continuing on with its other duties. More information on external commands can be found here.
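The three forms of command_check_interval described above can be summarized with a small Python sketch. The function name and the choice of returning None for "as often as possible" are assumptions made for the illustration.

```python
def command_check_seconds(value, interval_length=60):
    """Translate a command_check_interval value into seconds.

    Returns None for -1, meaning "check as often as possible".
    """
    value = value.strip()
    if value == "-1":
        return None
    if value.endswith("s"):
        # An "s" suffix means the number is already in seconds.
        return int(value[:-1])
    # Otherwise the number counts time units of interval_length seconds each.
    return int(value) * interval_length
```

With the default interval_length of 60, a value of 1 therefore means one check per minute, while 30s means one check every 30 seconds regardless of interval_length.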
External Command File |
Format: | command_file=<file_name> |
Example: | command_file=/usr/local/nagios/var/rw/nagios.cmd |
This is the file that Nagios will check for external commands to process. The command CGI writes commands to this file. Other third party programs can write to this file if proper file permissions have been granted as outlined here. The external command file is implemented as a named pipe (FIFO), which is created when Nagios starts and removed when it shuts down. If the file exists when Nagios starts, the Nagios process will terminate with an error message. More information on external commands can be found here.
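A third party program that writes to the command file might look roughly like the Python sketch below. The line layout used here, `[<unix timestamp>] COMMAND;arg;arg`, follows the external command documentation rather than anything stated on this page, so treat it as an assumption and check it against that documentation. The function names are hypothetical.

```python
import time


def format_external_command(name, args=(), now=None):
    """Build one command line: '[<unix time>] NAME;arg1;arg2'."""
    stamp = int(time.time() if now is None else now)
    fields = ";".join([name] + [str(arg) for arg in args])
    return "[%d] %s\n" % (stamp, fields)


def submit_external_command(command_file, name, args=()):
    """Write one command to the command file (a named pipe on a live system)."""
    # Opening a FIFO for writing blocks until a reader (Nagios) has it open.
    with open(command_file, "w") as pipe:
        pipe.write(format_external_command(name, args))
```

For example, `submit_external_command("/usr/local/nagios/var/rw/nagios.cmd", "DISABLE_NOTIFICATIONS")` would ask a running Nagios process to stop sending notifications, provided the calling user has write permission on the pipe.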
Downtime File |
Format: | downtime_file=<file_name> |
Example: | downtime_file=/usr/local/nagios/var/downtime.log |
This is the file that Nagios will use for storing scheduled host and service downtime information. Scheduled downtime can be viewed for both hosts and services through the extended information CGI.
Comment File |
Format: | comment_file=<file_name> |
Example: | comment_file=/usr/local/nagios/var/comment.log |
This is the file that Nagios will use for storing service and host comments. Comments can be viewed and added for both hosts and services through the extended information CGI.
Lock File |
Format: | lock_file=<file_name> |
Example: | lock_file=/tmp/nagios.lock |
This option specifies the location of the lock file that Nagios should create when it runs as a daemon (when started with the -d command line argument). This file contains the process id (PID) number of the running Nagios process.
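Since the lock file holds the PID of the running daemon, a script can use it to check whether Nagios is still alive. The following Python sketch assumes a POSIX system and a lock file containing nothing but the PID; the function name is hypothetical.

```python
import os


def running_pid(lock_file):
    """Return the PID recorded in the lock file if that process exists, else None."""
    try:
        with open(lock_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return None  # no readable lock file, or contents are not a number
    if pid <= 0:
        return None  # never signal a process group by accident
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only tests for existence
    except ProcessLookupError:
        return None  # stale lock file: no such process
    except PermissionError:
        pass  # the process exists but belongs to another user
    return pid
```

A result of None with the lock file still present suggests the daemon exited without cleaning up after itself.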
State Retention Option |
Format: | retain_state_information=<0/1> |
Example: | retain_state_information=1 |
This option determines whether or not Nagios will retain state information for hosts and services between program restarts. If you enable this option, you should supply a value for the state_retention_file variable. When enabled, Nagios will save all state information for hosts and services before it shuts down (or restarts) and will read in previously saved state information when it starts up again.
State Retention File |
Format: | state_retention_file=<file_name> |
Example: | state_retention_file=/usr/local/nagios/var/status.sav |
This is the file that Nagios will use for storing service and host state information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. This file is deleted after Nagios reads in initial state information when it (re)starts. In order to make Nagios retain state information between program restarts, you must enable the retain_state_information option.
Automatic State Retention Update Interval |
Format: | retention_update_interval=<minutes> |
Example: | retention_update_interval=60 |
This setting determines how often (in minutes) Nagios will automatically save retention data during normal operation. If you set this value to 0, Nagios will not save retention data at regular intervals, but it will still save retention data before shutting down or restarting. If you have disabled state retention (with the retain_state_information option), this option has no effect.
Use Retained Program State Option |
Format: | use_retained_program_state=<0/1> |
Example: | use_retained_program_state=1 |
This setting determines whether or not Nagios will set various program-wide state variables based on the values saved in the retention file. Some of these program-wide state variables that are normally saved across program restarts if state retention is enabled include the enable_notifications, enable_flap_detection, enable_event_handlers, execute_service_checks, and accept_passive_service_checks options. If you do not have state retention enabled, this option has no effect.
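The interaction between the main config file, state retention, and use_retained_program_state can be hard to keep straight, so here is a small Python sketch of the decision as this page describes it. It is a model of the documented behaviour, not Nagios source code; the function name is hypothetical, and treating a missing option as 0 is an assumption of the sketch.

```python
def effective_setting(name, config, retained):
    """Pick the value Nagios would start with for a program-wide option.

    config   -- values from the main config file, e.g. {"enable_notifications": 1}
    retained -- values read from the retention file (empty if none was read)
    """
    use_retained = (
        config.get("retain_state_information", 0) == 1
        and config.get("use_retained_program_state", 0) == 1
    )
    if use_retained and name in retained:
        # The last known setting wins over the main config file.
        return retained[name]
    return config[name]
```

This is why editing enable_notifications in the main config file appears to have no effect after a restart when both retention options are enabled: the retained value takes precedence.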
Syslog Logging Option |
Format: | use_syslog=<0/1> |
Example: | use_syslog=1 |
This variable determines whether messages are logged to the syslog facility on your local host. Values are as follows:
0 = Don't use the syslog facility
1 = Use the syslog facility
Notification Logging Option |
Format: | log_notifications=<0/1> |
Example: | log_notifications=1 |
This variable determines whether or not notification messages are logged. If you have a lot of contacts or regular service failures your log file will grow relatively quickly. Use this option to keep contact notifications from being logged.
Service Check Retry Logging Option |
Format: | log_service_retries=<0/1> |
Example: | log_service_retries=1 |
This variable determines whether or not service check retries are logged. Service check retries occur when a service check results in a non-OK state, but you have configured Nagios to retry the service more than once before responding to the error. Services in this situation are considered to be in "soft" states. Logging service check retries is mostly useful when attempting to debug Nagios or test out service event handlers.
Host Check Retry Logging Option |
Format: | log_host_retries=<0/1> |
Example: | log_host_retries=1 |
This variable determines whether or not host check retries are logged. Logging host check retries is mostly useful when attempting to debug Nagios or test out host event handlers.
Event Handler Logging Option |
Format: | log_event_handlers=<0/1> |
Example: | log_event_handlers=1 |
This variable determines whether or not service and host event handlers are logged. Event handlers are optional commands that can be run whenever a service or host changes state. Logging event handlers is most useful when debugging Nagios or first trying out your event handler scripts.
Initial States Logging Option |
Format: | log_initial_states=<0/1> |
Example: | log_initial_states=1 |
This variable determines whether or not Nagios will force all initial host and service states to be logged, even if they result in an OK state. Initial service and host states are normally only logged when there is a problem on the first check. Enabling this option is useful if you are using an application that scans the log file to determine long-term state statistics for services and hosts.
External Command Logging Option |
Format: | log_external_commands=<0/1> |
Example: | log_external_commands=1 |
This variable determines whether or not Nagios will log external commands that it receives from the external command file. Note: This option does not control whether or not passive service checks (which are a type of external command) get logged. To enable or disable logging of passive checks, use the log_passive_service_checks option.
Passive Service Check Logging Option |
Format: | log_passive_service_checks=<0/1> |
Example: | log_passive_service_checks=1 |
This variable determines whether or not Nagios will log passive service checks that it receives from the external command file. If you are setting up a distributed monitoring environment or plan on handling a large number of passive checks on a regular basis, you may wish to disable this option so your log file doesn't get too large.
Global Host Event Handler Option |
Format: | global_host_event_handler=<command> |
Example: | global_host_event_handler=log-host-event-to-db |
This option allows you to specify a host event handler command that is to be run for every host state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each host definition. The command argument is the short name of a command that you define in your object configuration file. The maximum amount of time that this command can run is controlled by the event_handler_timeout option. More information on event handlers can be found here.
Global Service Event Handler Option |
Format: | global_service_event_handler=<command> |
Example: | global_service_event_handler=log-service-event-to-db |
This option allows you to specify a service event handler command that is to be run for every service state change. The global event handler is executed immediately prior to the event handler that you have optionally specified in each service definition. The command argument is the short name of a command that you define in your object configuration file. The maximum amount of time that this command can run is controlled by the event_handler_timeout option. More information on event handlers can be found here.
Inter-Check Sleep Time |
Format: | sleep_time=<seconds> |
Example: | sleep_time=1 |
Inter-Check Delay Method |
Format: | inter_check_delay_method=<n/d/s/x.xx> |
Example: | inter_check_delay_method=s |
Service Interleave Factor |
Format: | service_interleave_factor=<s|x> |
Example: | service_interleave_factor=s |
Maximum Concurrent Service Checks |
Format: | max_concurrent_checks=<max_checks> |
Example: | max_concurrent_checks=20 |
Service Reaper Frequency |
Format: | service_reaper_frequency=<frequency_in_seconds> |
Example: | service_reaper_frequency=10 |
Timing Interval Length |
Format: | interval_length=<seconds> |
Example: | interval_length=60 |
Important: The default value for this is set to 60, which means that a "unit value" of 1 in the host configuration file will mean 60 seconds (1 minute). I have not really tested other values for this variable, so proceed at your own risk if you decide to do so!
Aggressive Host Checking Option |
Format: | use_agressive_host_checking=<0/1> |
Example: | use_agressive_host_checking=0 |
Flap Detection Option |
Format: | enable_flap_detection=<0/1> |
Example: | enable_flap_detection=0 |
Low Service Flap Threshold |
Format: | low_service_flap_threshold=<percent> |
Example: | low_service_flap_threshold=25.0 |
High Service Flap Threshold |
Format: | high_service_flap_threshold=<percent> |
Example: | high_service_flap_threshold=50.0 |
Low Host Flap Threshold |
Format: | low_host_flap_threshold=<percent> |
Example: | low_host_flap_threshold=25.0 |
High Host Flap Threshold |
Format: | high_host_flap_threshold=<percent> |
Example: | high_host_flap_threshold=50.0 |
Soft Service Dependencies Option |
Format: | soft_state_dependencies=<0/1> |
Example: | soft_state_dependencies=0 |
Service Check Timeout |
Format: | service_check_timeout=<seconds> |
Example: | service_check_timeout=60 |
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each service check normally finishes executing within this time limit. If a service check runs longer than this limit, Nagios will kill it off thinking it is a runaway process.
Host Check Timeout |
Format: | host_check_timeout=<seconds> |
Example: | host_check_timeout=60 |
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off plugins which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each host check normally finishes executing within this time limit. If a host check runs longer than this limit, Nagios will kill it off thinking it is a runaway process.
Event Handler Timeout |
Format: | event_handler_timeout=<seconds> |
Example: | event_handler_timeout=60 |
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each event handler command normally finishes executing within this time limit. If an event handler runs longer than this limit, Nagios will kill it off thinking it is a runaway process.
Notification Timeout |
Format: | notification_timeout=<seconds> |
Example: | notification_timeout=60 |
There is often widespread confusion as to what this option really does. It is meant to be used as a last ditch mechanism to kill off commands which are misbehaving and not exiting in a timely manner. It should be set to something high (like 60 seconds or more), so that each notification command normally finishes executing within this time limit. If a notification command runs longer than this limit, Nagios will kill it off thinking it is a runaway process.
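The idea behind the timeout options above (service_check_timeout, host_check_timeout, event_handler_timeout, and notification_timeout) can be illustrated with a short Python sketch: run a command, and kill it only if it overstays a generous limit. This shows the general technique using Python's subprocess module; it is not how Nagios implements its timeouts, and the function name is hypothetical.

```python
import subprocess


def run_with_timeout(argv, timeout_seconds):
    """Run a command; return its exit code, or None if it had to be killed."""
    try:
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return None
    return completed.returncode
```

A well-behaved command finishes long before the limit and its exit code is returned unchanged; the limit only matters for a runaway process, which is why it should be set high.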
Obsessive Compulsive Service Processor Timeout |
Format: | ocsp_timeout=<seconds> |
Example: | ocsp_timeout=5 |
Performance Data Processor Command Timeout |
Format: | perfdata_timeout=<seconds> |
Example: | perfdata_timeout=5 |
Obsess Over Services Option |
Format: | obsess_over_services=<0/1> |
Example: | obsess_over_services=1 |
Obsessive Compulsive Service Processor Command |
Format: | ocsp_command=<command> |
Example: | ocsp_command=obsessive_service_handler |
This option allows you to specify a command to be run after every service check, which can be useful in distributed monitoring. This command is executed after any event handler or notification commands. The command argument is the short name of a command definition that you define in your host configuration file. The maximum amount of time that this command can run is controlled by the ocsp_timeout option. More information on distributed monitoring can be found here.
Performance Data Processing Option |
Format: | process_performance_data=<0/1> |
Example: | process_performance_data=1 |
Orphaned Service Check Option |
Format: | check_for_orphaned_services=<0/1> |
Example: | check_for_orphaned_services=0 |
This option allows you to enable or disable checks for orphaned service checks. Orphaned service checks are checks which have been executed and have been removed from the event queue, but have not had any results reported in a long time. Since no results have come back in for the service, it is not rescheduled in the event queue. This can cause service checks to stop being executed. Normally it is very rare for this to happen - it might happen if an external user or process killed off the process that was being used to execute a service check. If this option is enabled and Nagios finds that results for a particular service check have not come back, it will log an error message and reschedule the service check. If you start seeing service checks that never seem to get rescheduled, enable this option and see if you notice any log messages about orphaned services.
Service Freshness Checking Option |
Format: | check_service_freshness=<0/1> |
Example: | check_service_freshness=0 |
This option determines whether or not Nagios will periodically check the "freshness" of service checks. Enabling this option is useful for helping to ensure that passive service checks are received in a timely manner. More information on freshness checking can be found here.
Service Freshness Check Interval |
Format: | freshness_check_interval=<seconds> |
Example: | freshness_check_interval=60 |
This setting determines how often (in seconds) Nagios will periodically check the "freshness" of service check results. If you have disabled service freshness checking (with the check_service_freshness option), this option has no effect. More information on freshness checking can be found here.
Illegal Object Name Characters |
Format: | illegal_object_name_chars=<chars...> |
Example: | illegal_object_name_chars=`~!$%^&*"|'<>?,()= |
This option allows you to specify illegal characters that cannot be used in host names, service descriptions, or names of other object types. Nagios will allow you to use most characters in object definitions, but I recommend not using the characters shown in the example above. Doing so may give you problems in the web interface, notification commands, etc.
Illegal Macro Output Characters |
Format: | illegal_macro_output_chars=<chars...> |
Example: | illegal_macro_output_chars=`~$^&"|'<> |
This option allows you to specify illegal characters that should be stripped from macros before being used in notifications, event handlers, and other commands. This DOES NOT affect macros used in service or host check commands. You can choose to not strip out the characters shown in the example above, but I recommend you do not do this. Some of these characters are interpreted by the shell (e.g. the backtick) and can lead to security problems. The following macros are stripped of the characters you specify:
$OUTPUT$, $PERFDATA$
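The stripping itself amounts to deleting every listed character from the macro value before it reaches a command line. A minimal Python sketch, with a hypothetical function name, might look like this:

```python
def strip_illegal_chars(text, illegal_chars):
    """Remove every character listed in illegal_chars from text."""
    # Mapping a code point to None tells str.translate() to delete it.
    return text.translate({ord(ch): None for ch in illegal_chars})
```

For instance, with the character set from the example above, plugin output containing backticks or a pipe would have those characters removed, so the shell never sees them when the output is substituted into a notification command.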
Administrator Email Address |
Format: | admin_email=<email_address> |
Example: | admin_email=root@localhost.localdomain |
Administrator Pager |
Format: | admin_pager=<pager_number_or_pager_email_gateway> |
Example: | admin_pager=pageroot@localhost.localdomain |