Understanding the Issue
What Does PENDING State Mean?
In Nagios, the PENDING state indicates that a host or service check has not yet been executed. While this is expected during initial startup or configuration reloads, prolonged PENDING status suggests deeper scheduling or configuration issues.
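To see which services are currently PENDING without opening the web UI, the status file can be scanned directly. The sketch below is a minimal example that assumes the Nagios Core `status.dat` layout, where each service is a `servicestatus { ... }` block of `key=value` lines and a service that has never been checked carries `has_been_checked=0`. The function name `list_pending_services` is hypothetical, and the file path depends on your installation.

```shell
# List services that have never been checked (PENDING) in a Nagios status file.
# Assumes "servicestatus { ... }" blocks made of key=value lines.
# Usage: list_pending_services /usr/local/nagios/var/status.dat
list_pending_services() {
    awk '
        # Start of a service block: reset the fields we collect.
        /^[ \t]*servicestatus[ \t]*[{]/ {
            in_svc = 1; host = ""; svc = ""; checked = ""; active = ""
            next
        }
        # End of a block: report the service if it was never checked.
        in_svc && /^[ \t]*[}]/ {
            if (checked == "0")
                print host ";" svc ";active_checks_enabled=" active
            in_svc = 0
            next
        }
        # Inside a service block: split each line at the first "=".
        in_svc {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "has_been_checked") checked = val
            else if (key == "active_checks_enabled") active = val
        }
    ' "$1"
}
```

Each output line has the form `host;service;active_checks_enabled=N`, so a PENDING service with active checks switched off stands out immediately.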
Impact on Monitoring Operations
Services stuck in PENDING trigger no notifications, state-change log entries, or escalations. In large deployments, this can silently remove entire layers of monitoring visibility and delay incident response.
Root Causes
- Improper check_interval or retry_interval: Extremely high intervals or missing values can delay or prevent execution.
- Disabled active checks: A missing or invalid `check_command`, or active checks turned off globally (`execute_service_checks=0` in nagios.cfg) or per service (`active_checks_enabled 0`).
- Scheduler overload: Large environments with too many checks and insufficient Nagios daemon resources.
- Corrupted retention files: Inconsistencies in `retention.dat` or `status.dat` after abrupt service restarts.
- Missing or misconfigured timeperiods: Services scheduled outside valid time windows won’t run.
Diagnostic Workflow
1. Check Service Definition
Ensure each service has a valid `check_command` and intervals set:
    define service {
        host_name               myserver
        service_description     CPU Load
        check_command           check_load
        check_interval          5
        retry_interval          1
        active_checks_enabled   1
    }
2. Review Nagios Scheduler Load
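Use the Nagios web UI or standard OS tools on the monitoring host to compare check volume against available capacity:
    ps -ef | grep nagios
    top
    iostat -xz 1
If CPU or I/O is saturated, reduce check frequency or distribute the load across additional workers or hosts. If the host has headroom but checks still queue up, confirm that `max_concurrent_checks` is not set too low (a value of 0 means no limit).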
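Scheduler pressure also shows up as check latency, the gap between when a check was due and when it actually ran. The sketch below summarises that figure for all services, assuming each `servicestatus` block in `status.dat` contains a `check_latency=` line with a value in seconds. The function name `latency_summary` is hypothetical.

```shell
# Summarise service check latency from a Nagios status file.
# Assumes each servicestatus block holds a "check_latency=<seconds>" line.
# Usage: latency_summary /usr/local/nagios/var/status.dat
latency_summary() {
    awk -F= '
        /^[ \t]*servicestatus[ \t]*[{]/ { in_svc = 1; next }
        /^[ \t]*[}]/                    { in_svc = 0; next }
        # Only count latency lines that belong to service blocks.
        in_svc && $1 ~ /^[ \t]*check_latency$/ {
            n++
            sum += $2
            if ($2 + 0 > max) max = $2 + 0
        }
        END {
            if (n == 0) { print "services=0"; exit }
            printf "services=%d avg_latency=%.3f max_latency=%.3f\n", n, sum / n, max
        }
    ' "$1"
}
```

An average that keeps climbing between runs suggests the scheduler cannot keep up with the configured check volume.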
3. Inspect nagios.log
Key entries like "Warning: The check of service ... was not executed" may indicate config issues:
/usr/local/nagios/var/nagios.log
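Because the log also records every state change and notification, it helps to filter it down to warnings and errors first. This is a minimal sketch that assumes log lines begin with an epoch timestamp in square brackets followed by `Warning:` or `Error:`. The function name `recent_log_problems` and the default of 20 lines are illustrative choices.

```shell
# Show the most recent Warning/Error lines from a Nagios log.
# $1 = log file, $2 = number of matching lines to show (default 20)
# Usage: recent_log_problems /usr/local/nagios/var/nagios.log 50
recent_log_problems() {
    grep -E '^\[[0-9]+\] (Warning|Error):' "$1" | tail -n "${2:-20}"
}
```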
4. Validate Time Periods
Ensure defined time periods are not excluding checks unintentionally:
    define timeperiod {
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        ...
    }
5. Restart Nagios Cleanly
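Stop Nagios and clear the state files. Removing retention.dat discards retained state such as acknowledgements, comments, and scheduled downtime, so copy the file elsewhere first if you need that data:
    service nagios stop
    rm -f /usr/local/nagios/var/status.dat
    rm -f /usr/local/nagios/var/retention.dat
    service nagios start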
Monitor whether services begin transitioning from PENDING.
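A more cautious variant of the clean restart moves the state files aside instead of deleting them, so they can be restored if the restart does not help. The sketch below assumes both files live in a single var directory. The function name `backup_state_files` and the `state-backup-<timestamp>` directory name are hypothetical.

```shell
# Move Nagios state files into a timestamped backup directory.
# $1 = Nagios var directory (assumed layout), e.g. /usr/local/nagios/var
# Prints the backup directory path on success.
backup_state_files() {
    var_dir=$1
    stamp=$(date +%Y%m%d%H%M%S)
    backup_dir="$var_dir/state-backup-$stamp"
    mkdir -p "$backup_dir" || return 1
    for f in status.dat retention.dat; do
        # Skip files that do not exist rather than failing.
        if [ -f "$var_dir/$f" ]; then
            mv "$var_dir/$f" "$backup_dir/$f" || return 1
        fi
    done
    echo "$backup_dir"
}

# Intended use, with Nagios stopped while the files are moved:
#   service nagios stop
#   backup_state_files /usr/local/nagios/var
#   service nagios start
```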
Best Practices and Long-Term Fixes
- Use proper intervals: Avoid extreme check/retry intervals. Stick to 5-10 min checks and 1-2 min retries for most services.
- Scale out with distributed monitoring: Use Nagios Remote Plugin Executor (NRPE) to run plugins on the monitored hosts, or mod_gearman to spread check execution across worker nodes.
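- Validate all objects post-edit: Always run `nagios -v /etc/nagios/nagios.cfg` after config changes, adjusting the path to your installation (source installs typically use `/usr/local/nagios/etc/nagios.cfg`).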
- Monitor Nagios itself: Set alerts for scheduler latency, check queue depth, and daemon uptime.
- Version upgrades: Ensure you are not affected by known bugs in older versions related to the event scheduler.
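The validation practice above can be enforced by reloading only when the pre-flight check passes. This is a minimal sketch under several assumptions: the `nagios` binary is on the PATH (or named in the hypothetical `NAGIOS_BIN` variable), it exits non-zero when validation fails, and the init system accepts `service nagios reload`. The function name `validate_then_reload` is also hypothetical.

```shell
# Reload Nagios only if the configuration passes validation.
# $1 = path to the main config file, e.g. /usr/local/nagios/etc/nagios.cfg
# NAGIOS_BIN (hypothetical override) selects the nagios binary to run.
validate_then_reload() {
    if ! output=$("${NAGIOS_BIN:-nagios}" -v "$1" 2>&1); then
        # Show the validator output so the offending object can be found.
        printf '%s\n' "$output" >&2
        echo "Configuration check failed; not reloading." >&2
        return 1
    fi
    service nagios reload
}
```

On systemd hosts the last line would likely be `systemctl reload nagios` instead, depending on how the service unit is defined.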
Conclusion
When services remain in a PENDING state in Nagios, the cause is usually systemic: resource constraints, misconfiguration, or scheduling conflicts. A structured diagnosis that moves from service definitions and check intervals to scheduler load and time periods lets engineers restore full monitoring functionality. In mission-critical environments, applying these best practices proactively keeps Nagios a reliable part of your DevOps toolchain.
FAQs
1. Is it normal for services to be in PENDING after a Nagios restart?
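Yes, briefly. Nagios spreads initial checks out after startup, so services should transition to OK/WARNING/CRITICAL within a few minutes, and in any case within the window set by `max_service_check_spread` (30 minutes in the sample configuration).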
2. Can I force Nagios to immediately check all PENDING services?
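Yes. For individual services, use the "Re-schedule the next check" option in the web UI with "Force Check" selected, or write the `SCHEDULE_FORCED_SVC_CHECK` external command to the Nagios command file from the CLI. To cover every service on a host in one step, use `SCHEDULE_FORCED_HOST_SVC_CHECKS`.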
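From a shell, a forced check is requested by writing a `SCHEDULE_FORCED_SVC_CHECK` line to the Nagios external command file. The sketch below assumes external commands are enabled and that the calling user may write to the command file, which is commonly `/usr/local/nagios/var/rw/nagios.cmd` on source installs. The function name `force_service_check` is hypothetical.

```shell
# Ask Nagios to run a forced check of one service right away.
# $1 = external command file, $2 = host_name, $3 = service_description
# Usage: force_service_check /usr/local/nagios/var/rw/nagios.cmd myserver "CPU Load"
force_service_check() {
    now=$(date +%s)
    # Format: [timestamp] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$now" "$2" "$3" "$now" > "$1"
}
```

Combined with a list of PENDING services, this can be called in a loop, though forcing many checks at once adds to the scheduler load the diagnosis is trying to reduce.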
3. Do passive checks affect PENDING status?
Yes. If a service relies only on passive checks and none have been submitted, it will remain in PENDING until one arrives.
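To confirm that a passive-only service can leave PENDING, a test result can be submitted by hand with the `PROCESS_SERVICE_CHECK_RESULT` external command. This is a sketch that assumes passive checks are accepted both globally and for the service, and that the command file path matches your installation. The function name `submit_passive_result` is hypothetical.

```shell
# Submit one passive check result for a service.
# $1 = external command file, $2 = host_name, $3 = service_description,
# $4 = return code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN), $5 = plugin output
# Usage: submit_passive_result /usr/local/nagios/var/rw/nagios.cmd myserver "CPU Load" 0 "manual test"
submit_passive_result() {
    # Format: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$2" "$3" "$4" "$5" > "$1"
}
```

If the service stays PENDING after a result is submitted, check that `accept_passive_service_checks` is enabled in nagios.cfg and `passive_checks_enabled` is set on the service.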
4. What logs help the most when troubleshooting PENDING states?
The main Nagios log (`nagios.log`) and configuration validation output are the most helpful for identifying root causes.
5. Is there a risk in deleting status.dat or retention.dat?
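Yes, some. Always stop the service before deleting these files. status.dat is regenerated on restart with no loss, but deleting retention.dat discards retained state such as acknowledgements, comments, and scheduled downtime, so keep a backup copy if that data matters.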