box293_check_vmware

Rating: 14 votes
Favoured: 2
Current Version: 2015-03-03
Last Release Date: 2015-03-03
Compatible With:
  • Nagios 3.x
  • Nagios 4.x
  • Nagios XI
Owner
E-mail
License: GPL
Hits: 24127
Files:
  • box293_check_vmware.zip: Plugin and Manual
  • Manual.pdf: Manual

This Plugin allows you to monitor a VMware vCenter / ESX(i) environment using your Nagios monitoring solution.

IMPORTANT:
This Plugin is NOT designed to be run on your Nagios host, instead it is offloaded to the VMware vSphere Management Assistant (vMA). This is due to some performance issues that occur with the VMware SDK which can easily overload your Nagios host.

How all of this works is explained in the manual including full instructions to get you up and running as quickly as possible.

The manual is included with the plugin.

The plugin allows you to monitor the following:
Cluster_CPU_Usage
Cluster_DRS_Status
Cluster_EVC_Status
Cluster_HA_Status
Cluster_Memory_Usage
Cluster_Resource_Info
Cluster_Swapfile_Status
Cluster_vMotion_Info
Datastore_Cluster_Status
Datastore_Cluster_Usage
Datastore_Performance
Datastore_Performance_Overall
Datastore_Usage
Guest_CPU_Info
Guest_CPU_Usage
Guest_Disk_Performance
Guest_Disk_Usage
Guest_Host
Guest_Memory_Info
Guest_Memory_Usage
Guest_NIC_Usage
Guest_Snapshot
Guest_Status
Host_CPU_Info
Host_CPU_Usage
Host_License_Status
Host_Memory_Usage
Host_OS_Name_Version
Host_pNIC_Status
Host_pNIC_Usage
Host_Status
Host_Storage_Adapter_Info
Host_Storage_Adapter_Performance
Host_Switch_Status
Host_vNIC_Status
vCenter_License_Status
vCenter_Name_Version
To Do / Wish List
Here is a list of items that are going to be addressed sometime in the future:
* A request was made to output usage checks with percentage values as well, for checks like Cluster_CPU_Usage, Cluster_Memory_Usage, Datastore_Usage and Host_CPU_Usage, and to be able to alert on percentages
* For any host related checks where the host is in standby mode, return the status as OK instead of critical
* Create a Host Up/Down check to replace the standard ping check for scenarios where ESX(i) hosts are in standby mode, returning the status as OK instead of critical or unknown
* Query the events of a VM to look for a specific event (such as a backup) and trigger alerts if it is not found
* Being able to connect to multiple vCenter servers when in linked mode
* Create Host checks to determine the state of services on the ESX(i) hosts like ntp, ssh, syslog
* Create host NTP checks to determine the difference between their defined NTP server and the local clock
* Look at the viability of checking the vCenter NTP time against the NTP time of all hosts to detect any specific drift
* Support for NFS Datastores
* Look at the viability of checking the internal disks of the guest operating systems
NOTE: new suggestions are added to the bottom of the list and priority is given to the top list items.
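
The first wish-list item (percentage values and percentage-based alerting) can be illustrated with a minimal Python sketch. This is not the plugin's code (the plugin is written in Perl) and the feature is not yet implemented; the function names and the threshold values are hypothetical. Only the exit codes (0 = OK, 1 = WARNING, 2 = CRITICAL) are the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def usage_percent(used, total):
    """Return used/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * used / total, 1)


def state_for_percent(percent, warn, crit):
    """Map a percentage to a Nagios state (warn/crit are hypothetical thresholds)."""
    if percent >= crit:
        return CRITICAL
    if percent >= warn:
        return WARNING
    return OK


if __name__ == "__main__":
    # Illustrative numbers only: 750 GB used out of 1000 GB.
    pct = usage_percent(750, 1000)
    print(pct, state_for_percent(pct, warn=70, crit=90))
```

The same two helpers would apply to any of the usage checks named in that item, since each reports a used value and a total.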


I have a mailing list that I will send an email to when I update this plugin. This way you can find out as soon as a new version of this plugin is available.
To Subscribe:
* Send an email to updates+subscribe@box293.com
* You will receive an email with a link you need to follow to create a subscription request
* Click the link to open it in a web browser
* You will need to type your email address and click submit
* You will receive another email with a link you need to follow to complete your subscription
* You will now be subscribed!
* Check your spam folder if the emails are not received


Twitter: @Box293

Version Notes:
2014-04-15
* Official release version

2014-05-07
* Fixed bug where hosts were incorrectly reporting they are in Maintenance Mode (reported by Marvin Holze and Steven Miller)
* Added functions for upcoming Nagios XI Wizard

2014-05-09
* Fixed bug in Cluster_Memory_Usage check where the Memory Used was not being correctly reported (reported by Vitaly Burshteyn). This also affected the Cluster_Resource_Info check.

2014-05-10
* Fixed bug in Cluster_CPU_Usage check where the CPU Used was not being correctly reported (reported by Vitaly Burshteyn). This also affected the Cluster_Resource_Info check.

2014-08-24
* Improved debugging, creates a debugging file when in debug mode
* All checks that output performance data now have the name of the check appended to the end of the performance data surrounded by square brackets. This makes the use of templates in PNP easy
* Fixed bug in Host_pNIC_Status where an incorrect number of pNICs was being calculated when specifying which pNICs to check
* Fixed bug in Host_pNIC_Status where --nic_state was not correctly triggering a CRITICAL state
* Fixed bug in Host_pNIC_Status where the phrase "NOT Connected" was appearing twice on a disconnected pNIC
* Fixed bug with Host_Switch_Status check, only the first switch was being reported and would not find more than one switch if the host had more than one
* Fixed bug with Guest_Disk_Usage where the "Disk Usage" was reported as 0 when the guest had snapshots
* Added a Version argument to report the plugin version
* Added check Guest_Status which reports on Power State, Uptime, VMware Tools Version and Status, IP Address, Hostname, ESX(i) Host Guest Is Running On, Consolidation State and Guest Version
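
For anyone consuming the performance data outside PNP, the check name appended in square brackets (see the 2014-08-24 notes above) can be separated from the metrics. The Python sketch below assumes the name appears as a single "[Check_Name]" token at the very end of the string; the exact layout is described in the manual, and the sample string in the usage line is made up for illustration.

```python
import re

# Assumed shape: "<perfdata> [Check_Name]" at the very end of the string.
_SUFFIX = re.compile(r"^(?P<perfdata>.*?)\s*\[(?P<check>[^\[\]]+)\]\s*$")


def split_check_name(perfdata):
    """Return (perfdata_without_suffix, check_name), or (perfdata, None)."""
    match = _SUFFIX.match(perfdata)
    if match is None:
        # No trailing "[...]" token, so return the string unchanged.
        return perfdata.strip(), None
    return match.group("perfdata"), match.group("check")


if __name__ == "__main__":
    # Hypothetical perfdata string, not real plugin output.
    print(split_check_name("'CPU Usage'=1200MHz;;;; [Cluster_CPU_Usage]"))
```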

2014-12-13
* Added option AlwaysOK for drs_automation_level so the check will always return an OK state (requested by Willem D’Haese)
* Added option AlwaysOK for drs_dpm_level so the check will always return an OK state (requested by Willem D’Haese)
* Added option AlwaysOK for ha_host_monitoring so the check will always return an OK state
* Added option AlwaysOK for ha_admission_control so the check will always return an OK state
* Added check Datastore_Performance_Overall which will return the Datastore Performance for ALL connected hosts to the datastore (requested by Willem D’Haese)
* Added check Datastore_Cluster_Usage (requested by snapon_admin)
* Added check Datastore_Cluster_Status (requested by snapon_admin)
* Updated the Nagios XI Wizard checks List_Datastores, List_Guest, List_Hosts and List_vCenter_Objects with improved encoding to allow UTF-8 characters (reported by DingGuo Xiao)
* Fixed bug in Datastore_Usage to limit the amount of decimal places returned for the Used Space value
* Fixed bug in certain checks like Guest_Snapshot where guests have special characters like a backslash (reported by Dennis Peere)
* Updated Host_Status checks to report Triggered Alarms and trigger warning and critical states if the alarms have not been acknowledged in vCenter (requested by Pierre-François Gallic, Ian Bergeron, Jacob Estrin, Brice Courault)
* Added argument --perfdata_option which allows you to disable the check name being appended to the end of the performance data string in square brackets, as some monitoring systems like Centreon do not like this (reported/requested by Bruno Guerpillon)
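
The Host_Status behaviour described above (triggered alarms raise warning or critical states unless they have been acknowledged in vCenter) can be sketched in Python as follows. This is an illustration, not the plugin's Perl code: the dictionary keys and the colour-to-state mapping (yellow to WARNING, red to CRITICAL) are assumptions made for the example.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Assumed mapping from vCenter alarm colour to Nagios state.
_STATE_BY_COLOUR = {"yellow": WARNING, "red": CRITICAL}


def host_state_from_alarms(alarms):
    """Return the worst Nagios state among unacknowledged alarms.

    `alarms` is a list of dicts with hypothetical keys
    "status" (for example "yellow" or "red") and "acknowledged" (bool).
    """
    state = OK
    for alarm in alarms:
        if alarm["acknowledged"]:
            continue  # acknowledged alarms do not raise the state
        state = max(state, _STATE_BY_COLOUR.get(alarm["status"], OK))
    return state


if __name__ == "__main__":
    alarms = [
        {"status": "red", "acknowledged": True},
        {"status": "yellow", "acknowledged": False},
    ]
    print(host_state_from_alarms(alarms))
```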

2015-01-29
* Fixed bug in Guest_CPU_Usage where high CPU usage could result in a negative free value
* Added --modifier argument to allow request and response data to be modified for Host and Guest checks (requested by Willem D’Haese). An example of how this is used: your Nagios host objects have the address serverxx.box293.local but they are named in the vCenter inventory as serverxx. The --modifier argument will allow you to remove the '.box293.local'. This allows for the use of more generic service definitions in Nagios, which means fewer configurations are required. Detailed examples are provided in the manual
* Added Guest_Host check for determining if the ESX(i) host the guest is running on matches the parent_hosts defined in Nagios (requested by Virgil Hoover and other attendees at the Nagios World Conference 2014). This check will work in conjunction with the upcoming box293_event_handler plugin to run on the Nagios host ... stay tuned!
* Added the --query_url, --query_username, --query_password and --service_status_info arguments to allow the plugin to query Nagios for checks like Guest_Host to determine Nagios parent object directive
* Added more debugging to the Nagios XI Wizard List_xxx checks
* --debug option will now show how long the plugin ran for
* All cluster checks now report the name of the cluster at the beginning of the status output (requested by Willem D’Haese)
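
The --modifier example above (a Nagios address of serverxx.box293.local versus a vCenter inventory name of serverxx) comes down to removing a suffix before the lookup. The Python sketch below shows only that idea; it does not reproduce the syntax of the --modifier argument, which is documented in the manual, and the function name is hypothetical.

```python
def strip_suffix(name, suffix):
    """Remove `suffix` from the end of `name` if it is present."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


if __name__ == "__main__":
    # The Nagios address from the example above becomes the vCenter name.
    print(strip_suffix("serverxx.box293.local", ".box293.local"))
```

Names that do not carry the suffix pass through unchanged, which is what lets one generic service definition cover every host.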

2015-03-03
* Added argument --exclude_snapshot to be used with the Guest_Snapshot check. This allows you to exclude snapshots that contain specific text in the NAME of the snapshot (requested by Pierre-François Gallic)
* Changed --perfdata_option to allow you to specify what metrics you want the specific check to use / report on, applies to all checks that return performance data. See manual for full details for each check (requested by Bruno Guerpillon)
* Fixed bug in Cluster_HA_Status that caused check to fail when the Slot Size had been defined using vSphere Web Interface (reported by Daniel Vleeshakker)
* Re-fixed bug in Datastore_Usage to limit the amount of decimal places returned for the Used Space value
* Fixed bug in Datastore_Cluster_Usage to limit the amount of decimal places returned for the Used Space value
* Manual now recommends using the 'nice' command to execute box293_check_vmware. This makes the plugin run at a lower scheduling priority, which makes the vMA more stable
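
The --exclude_snapshot behaviour from the 2015-03-03 notes (skip snapshots whose NAME contains specific text) amounts to a substring filter. The Python sketch below assumes a case-sensitive match, which the notes do not specify; the function name and the snapshot names are made up for illustration.

```python
def filter_snapshots(snapshot_names, exclude_text=None):
    """Return the snapshot names that do not contain `exclude_text`.

    Assumption: the match is a case-sensitive substring test.
    """
    if not exclude_text:
        return list(snapshot_names)
    return [name for name in snapshot_names if exclude_text not in name]


if __name__ == "__main__":
    # Hypothetical snapshot names.
    names = ["pre-upgrade", "BACKUP-nightly", "test"]
    print(filter_snapshots(names, "BACKUP"))
```
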
Reviews (7)
We are using these checks for a couple of months now. Only issue was some timeouts on the check due to a high system load. Glad to see this is fixed on the latest version!

Had some issues with the check once. Found out it was a locked user account. Great support from Troy!

All in all: keep up the good work! :)
by gerpion, March 3, 2015
This is exactly what we needed to monitor our VMware infrastructure.
Troy is a great developer who tries to take into consideration any idea that could improve his work.
Thanks!
by Pikmin, February 18, 2015
The manual was very easy to follow and the amount of stuff that can be monitored is amazing
This plugin was simple to install and I had it running checks against two vCenters and several ESXi hosts in no time. What questions I did have, the developer answered almost immediately. If you are looking to use Nagios to give an eye into VMware you have found the right tool.
Obviously a lot of work has gone into the implementation of this service check and the accompanying documentation. A qualified VMware and Nagios systems administrator will have no issue getting things up and running with relative ease.

In my case I am using a single Nagios server and a single VMA appliance (new for this purpose) to monitor two vCenter systems and the underlying clusters, hosts and guests.

I did initially have an issue in which some instances of the service check failed on one of the vCenter servers, but this was traced to a problem with the vCenter embedded database, and the issue "magically" resolved itself after a database rebuild.
by fusfeld, June 27, 2014
The plugin had a couple of hiccups during installation, but the author was incredibly helpful and we were able to work past them. Still by far the best documented plugin I've seen, and it works as advertised. Very, very helpful.
by ssmiller_gfsu, May 6, 2014
This plugin seems far more complete than most others. I found a few minor bugs with Cluster_Memory_Usage, Cluster_CPU_Usage, and Host_Status. The following diff fixes them (as of 5/6/2014):

vi-admin@mpvmat:~> diff -u orig/box293_check_vmware.pl box293_check_vmware.pl
--- orig/box293_check_vmware.pl 2014-05-06 10:09:44.000000000 -0400
+++ box293_check_vmware.pl 2014-05-06 14:55:18.000000000 -0400
@@ -714,7 +714,7 @@
my $host_maintenance_mode = $cluster_current_host->get_property('summary.runtime.inMaintenanceMode');

# See if the host is in maintenance mode
- if ($host_maintenance_mode eq 'true') {
+ if ($host_maintenance_mode eq 'false') {
# Get the overall CPU used by the current host
my $cluster_cpu_usage_current_host = $cluster_current_host->get_property('summary.quickStats.overallCpuUsage');
# Convert the $cluster_cpu_usage_current_host to SI
@@ -726,7 +726,7 @@
# Get how many cores this host has
$cpu_cores_available = $cpu_cores_available + $cluster_current_host->get_property('summary.hardware.numCpuCores');

- } # End if ($host_maintenance_mode eq 'true') {
+ } # End if ($host_maintenance_mode eq 'false') {
} # End if ($host_uptime_state_flag == 0) {
} # End if ($host_connection_state_flag == 0) {
} # End foreach (@{$cluster_hosts}) {
@@ -1181,7 +1181,7 @@
my $host_maintenance_mode = $cluster_current_host->get_property('summary.runtime.inMaintenanceMode');

# See if the host is in maintenance mode
- if ($host_maintenance_mode eq 'true') {
+ if ($host_maintenance_mode eq 'false') {
# Get the overall memory used by the current host
my $cluster_memory_usage_current_host = $cluster_current_host->get_property('summary.quickStats.overallMemoryUsage');
# Convert the $cluster_memory_usage_current_host to SI
@@ -1189,7 +1189,7 @@

# Add this to cluster_memory_usage
$cluster_memory_usage = $cluster_memory_usage + $cluster_memory_usage_current_host;
- } # End if ($host_maintenance_mode eq 'true') {
+ } # End if ($host_maintenance_mode eq 'false') {
} # End if ($host_uptime_state_flag == 0) {
} # End if ($host_connection_state_flag == 0) {
} # End foreach (@{$cluster_hosts}) {
@@ -4167,7 +4167,7 @@
} # End if (defined($host_issues)) {

# See if the host is in maintenance mode
- if ($host_maintenance_mode eq 'false') {
+ if ($host_maintenance_mode eq 'true') {
$exit_message = Build_Exit_Message('Exit', $exit_message, 'Host in Maintenance Mode');
$exit_state = Build_Exit_State($exit_state, 'OK');
$host_status_flag = 1;
Owner's reply

Thanks for that ssmiller_gfsu, these have all been fixed in release 2014-05-10.
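
The logic that the reviewer's diff corrects can be restated in a short Python sketch: cluster usage should be summed only over hosts that are NOT in maintenance mode, and Host_Status should report the maintenance message only when the host IS in maintenance mode. This is an illustration, not the plugin's Perl code; the dictionary keys are hypothetical stand-ins for the vSphere properties summary.quickStats.overallCpuUsage and summary.runtime.inMaintenanceMode seen in the diff.

```python
def cluster_cpu_usage_mhz(hosts):
    """Sum CPU usage over hosts that are not in maintenance mode.

    `hosts` is a list of dicts with hypothetical keys
    "cpu_usage_mhz" (number) and "in_maintenance_mode" (bool).
    """
    return sum(
        host["cpu_usage_mhz"]
        for host in hosts
        if not host["in_maintenance_mode"]
    )


def host_status_message(host):
    """Return the maintenance message only for hosts in maintenance mode."""
    if host["in_maintenance_mode"]:
        return "Host in Maintenance Mode"
    return None


if __name__ == "__main__":
    # Illustrative values only.
    hosts = [
        {"cpu_usage_mhz": 1200, "in_maintenance_mode": False},
        {"cpu_usage_mhz": 800, "in_maintenance_mode": True},
    ]
    print(cluster_cpu_usage_mhz(hosts))
```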