We have a Nagios instance. Alerts for testnodes are sent to ceph-infra AT redhat DOT com, while alerts for production services are sent to dmick and dgalloway. Alfredo and Andrew also receive alerts for the Jenkins CI services.

NRPE is configured on nagios-monitored hosts using the common role in ceph-cm-ansible.
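For orientation, NRPE maps each check name to a command line on the monitored host through `command[<name>]=<command line>` entries in its configuration. The following is a minimal sketch of reading such entries. The sample entries, plugin paths, and the `parse_nrpe_commands` helper are hypothetical illustrations and are not taken from ceph-cm-ansible.

```python
import re

# NRPE command definitions look like: command[<name>]=<command line>
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]=(.*)$")


def parse_nrpe_commands(text):
    """Return a dict mapping check names to their command lines."""
    commands = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        match = COMMAND_RE.match(line)
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands


# Hypothetical sample entries; paths and thresholds are assumptions.
sample = """
# example entries for illustration only
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
"""

commands = parse_nrpe_commands(sample)
for name, command_line in commands.items():
    print(name, "->", command_line)
```

A helper like this can be handy for auditing which checks a host actually exposes before adding a matching service on the Nagios side.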


Load, Disk Space, and HTTP are built-in Nagios checks performed on applicable hosts. Some of these checks are configurable, and their configuration is found in /etc/nagios-plugins/config/.
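As a rough illustration of what these checks do, a threshold-based plugin compares a measured value against warning and critical limits and reports the result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below assumes a simple "higher is worse" metric. The `threshold_status` helper and the example thresholds are hypothetical, and real plugins support a richer range syntax.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def threshold_status(value, warn, crit):
    """Map a measurement to a Nagios status, assuming higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


# Hypothetical thresholds: warn at a load of 5, go critical at 10.
for load in (0.5, 7.0, 12.0):
    print(load, threshold_status(load, warn=5, crit=10))
```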


Calls /usr/libexec/ on applicable hosts.


Calls /usr/libexec/ on applicable hosts.

LRC Health

This check queries mira021, where a custom Nagios plugin is installed. The plugin currently whitelists 'failing to respond to cache pressure' when the cluster returns anything but HEALTH_OK.

root@mira021:~# tail -n 1 /etc/nagios/nrpe_local.cfg
command[check_ceph_health]=/usr/lib/nagios/plugins/ceph-nagios-plugins/src/check_ceph_health --name client.nagios -k /etc/ceph/client.nagios.keyring --whitelist 'failing to respond to cache pressure'
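The whitelist behaviour can be sketched as follows. This is not the actual check_ceph_health code. It is a minimal illustration that assumes whitelisting is a substring match on each health message and that HEALTH_WARN maps to WARNING while any other non-OK state maps to CRITICAL. The `health_status` function and the sample messages are hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def health_status(overall, messages, whitelist):
    """Return a Nagios status for a Ceph health report.

    Assumption: a message is ignored if it contains any whitelisted string.
    """
    if overall == "HEALTH_OK":
        return OK
    # Drop messages that match a whitelisted string.
    remaining = [m for m in messages if not any(w in m for w in whitelist)]
    if not remaining:
        # Everything that was reported is whitelisted, so do not alert.
        return OK
    if overall == "HEALTH_WARN":
        return WARNING
    return CRITICAL


whitelist = ["failing to respond to cache pressure"]

# Hypothetical health messages for illustration.
only_cache_pressure = ["1 clients failing to respond to cache pressure"]
mixed = only_cache_pressure + ["1 osds down"]

print(health_status("HEALTH_WARN", only_cache_pressure, whitelist))
print(health_status("HEALTH_WARN", mixed, whitelist))
```

Under these assumptions, a cluster whose only complaint is cache pressure is still reported as OK, while any additional message brings the alert back.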
services/nagios.txt · Last modified: 2017/05/19 20:07 by dgalloway