check_oomkiller

Description

LINUX ONLY - Check for OOM-Killer (out of memory killer) Activity

Project Details

Current Version

2.0

Last Release Date

2010-09-07

Owner

John Chivian

License

GPL

Compatible With

Nagios 3.x

Project Files

File	Description
check_oomkiller.pl.txt	check_oomkiller client plugin
check_oomkiller.c.txt	check_oomkiller suid wrapper

Project Notes

The Linux kernel will kill ("assassinate") "big memory" processes during extreme memory shortages as a self-defense measure; this is the OOM-Killer. This plugin checks for such activity and returns a critical status if any has occurred since the previous check.

This plugin was written on and for RHEL4 using Nagios and NRPE and may need to be tweaked for other distributions.

IMPORTANT - This check requires read-only access to the system messages file /var/log/messages, which by default is not readable by unprivileged accounts. You therefore have two options. You can make /var/log/messages readable by the nagios user account (DON'T DO THIS). Alternatively, you can write a small compiled wrapper program around the script and install the wrapper as a root-owned SUID executable (DO THIS!). The wrapper must be compiled because Linux ignores the SUID bit on interpreted scripts such as the Perl plugin.

Installation Instructions

1) Put the Perl script and the C program in the nagios/libexec directory on the client system that will be checked.
2) Edit the C program if needed, changing the REAL_PATH definition for your environment.
3) Compile the C program and install it as an SUID application (chown root first, then chmod 4555; on Linux, chown clears the SUID bit, so running chown after chmod would undo it).
4) Use the following plugin definition on the client system in the nrpe.cfg configuration file. As with the C program, edit the path if needed for your environment.

command[check_oomkiller]=/usr/local/nagios/libexec/check_oomkiller

5) Use the following service check definition on the Nagios server to perform the check on monitored systems.

define service{
    use                  generic-service
    host_name            possible-oom-killer-victim
    service_description  OOM Killer
    check_command        check_nrpe60!check_oomkiller
    max_check_attempts   1
}

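Each run of the OOM-Killer check only reports activity since the previous run, so a CRITICAL result normally reverts to OK at the very next check. For this reason the service check definition on the Nagios server MUST contain "max_check_attempts 1", which makes the first CRITICAL result a hard state that triggers a notification. If you don't do this, a single burst of OOM-Killer activity will NEVER generate a notification.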

Also notice that I am using a custom check_command called check_nrpe60. The only difference between check_nrpe60 and the standard check_nrpe is the addition of a 60-second timeout specification (see below); check_nrpe's default timeout is 10 seconds. The longer timeout is necessary because on systems with large /var/log/messages files (or busy systems with few CPU cycles to spare), the standard NRPE check on the server can time out before the plugin has actually completed on the client.

define command{
    command_name  check_nrpe60
    command_line  $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -c $ARG1$
}

The plugin returns a warning status if it can't perform its task, and a critical status if any OOM-Killer activity has taken place. If the status is critical, it also returns extended status information detailing the PIDs and users affected.

Finally, it is worth noting that on a properly tuned system this activity will probably not occur. We discovered it "by accident" when a physical server was converted to a virtual machine and not given the same amount of memory it had previously. When we identified and applied the correct memory tuning parameter (in this case vm.lower_zone_protection, set in /etc/sysctl.conf), the OOM-Killer activity ceased.

User Reviews (1)

March 31, 2017

Appears to work still on Centos/RHEL7 but having trouble with NRPE

by: bmoreitdan

I've got the compiled C wrapper working on the command line as ./check_oomkiller, but when I run it through NRPE with ./check_nrpe -H 127.0.0.1 -c check_oomkiller, I keep getting "NRPE: Unable to read output". I'm using NRPE 3.0.1. I have also updated my nrpe.cfg file with the matching command and restarted NRPE.
