box293_check_vmware

Rating: 34 votes
Favoured: 8
Current Version: 2016-10-02
Last Release Date: 2016-10-02
Compatible With:
  • Nagios 3.x
  • Nagios 4.x
  • Nagios XI
License: GPL
Hits: 93276
Files:
  • box293_check_vmware.zip (Plugin and Manual)
  • Manual.pdf (Manual)
This Plugin allows you to monitor a VMware vCenter / ESX(i) environment using your Nagios monitoring solution.

IMPORTANT:
This Plugin is NOT designed to be run on your Nagios host. Instead, it is offloaded to the VMware vSphere Management Assistant (vMA), because performance issues in the VMware SDK can easily overload your Nagios host. Although the vMA has been deprecated by VMware, it is still available for download. I am looking at making a similar appliance available to replace the vMA. Rest assured, an easy solution will be made available.

How all of this works is explained in the manual including full instructions to get you up and running as quickly as possible.

The manual is included with the plugin.

The plugin allows you to monitor the following:
Cluster_CPU_Usage
Cluster_DRS_Status
Cluster_EVC_Status
Cluster_HA_Status
Cluster_Memory_Usage
Cluster_Resource_Info
Cluster_Swapfile_Status
Cluster_Time_Drift
Cluster_vMotion_Info
Datastore_Cluster_Status
Datastore_Cluster_Usage
Datastore_Performance
Datastore_Performance_Overall
Datastore_Usage
Guest_CPU_Info
Guest_CPU_Usage
Guest_Disk_Performance
Guest_Disk_Usage
Guest_Host
Guest_Memory_Info
Guest_Memory_Usage
Guest_NIC_Usage
Guest_Snapshot
Guest_Status
Host_CPU_Info
Host_CPU_Usage
Host_License_Status
Host_Memory_Usage
Host_OS_Name_Version
Host_pNIC_Status
Host_pNIC_Usage
Host_Service
Host_Status
Host_Storage_Adapter_Info
Host_Storage_Adapter_Performance
Host_Switch_Status
Host_Up_Down_State
Host_vNIC_Status
Tasks_Events
vCenter_License_Status
vCenter_Name_Version
To Do / Wish List
Here is a list of items that are going to be addressed sometime in the future:
* Look at the viability of checking the internal disks of the guest operating systems
* A host swap usage check
* For Guest_Snapshot check, only show snapshots that exceed the defined thresholds (instead of all snapshots)
* For Guest_Snapshot check, the snapshot generating a critical should be at the beginning of the service output
* Allow the Datastore_Usage check to work for all datastores instead of specifically needing to define them
* Allow Cluster_DRS_Status to return a warning instead of a critical for things like DPM state?
* CPU ready (and CPU load) of ESXi servers
* vCenter top-level alarms
* Performance data output without units. Instead of '23GHz' output '23'.
* For the Host_Status check see if the name of the problem can be included in the status
* Host_pNIC_Status / Host_pNIC_Usage - Option to skip pNICs not added to any vSwitch
* Make it possible to split the Guest_Status in different "subchecks" (Tools, GuestIp, etc.)
* New check(s) "Cluster_Datastore_*" .. list all datastores in a specific cluster, like the Datastore_* checks today, but without having to specify datastore names manually

NOTE: new suggestions are added to the bottom of the list and priority is given to the top list items.
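
The wish-list item about performance data without units ('23GHz' becoming '23') can be illustrated with a minimal sketch. This is not code from the plugin: the helper name strip_unit and the regular expression are assumptions made purely for illustration.

```python
import re

# Hypothetical helper, not part of box293_check_vmware.
# Matches a number (optionally negative / decimal) followed by an optional unit.
_VALUE_WITH_UNIT = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def strip_unit(value: str) -> str:
    """Return only the numeric part of a value such as '23GHz'."""
    match = _VALUE_WITH_UNIT.match(value.strip())
    if match is None:
        # Leave anything that does not look like "number + unit" untouched.
        return value
    return match.group(1)


print(strip_unit("23GHz"))
```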


I have a mailing list that I will send an email to when I update this plugin. This way you can find out as soon as a new version of this plugin is available.
To Subscribe:
* Send an email to updates+subscribe@box293.com
* You will receive an email with a link you need to follow to create a subscription request
* Click the link to open it in a web browser
* You will need to type your email address and click submit
* You will receive another email with a link you need to follow to complete your subscription
* You will now be subscribed!
* Check your spam folder if the emails are not received


Twitter: @Box293

Version Notes:
2014-04-15
* Official release version

2014-05-07
* Fixed bug where hosts were incorrectly reporting they are in Maintenance Mode (reported by Marvin Holze and Steven Miller)
* Added functions for upcoming Nagios XI Wizard

2014-05-09
* Fixed bug in Cluster_Memory_Usage check where the Memory Used was not being correctly reported (reported by Vitaly Burshteyn). This also affected the Cluster_Resource_Info check.

2014-05-10
* Fixed bug in Cluster_CPU_Usage check where the CPU Used was not being correctly reported (reported by Vitaly Burshteyn). This also affected the Cluster_Resource_Info check.

2014-08-24
* Improved debugging, creates a debugging file when in debug mode
* All checks that output performance data now have the name of the check appended to the end of the performance data surrounded by square brackets. This makes the use of templates in PNP easy
* Fixed bug in Host_pNIC_Status where an incorrect number of pNICs was being calculated when specifying which pNICs to check
* Fixed bug in Host_pNIC_Status where --nic_state was not correctly triggering a CRITICAL state
* Fixed bug in Host_pNIC_Status where the phrase "NOT Connected" was appearing twice on a disconnected pNIC
* Fixed bug with Host_Switch_Status check, only the first switch was being reported and would not find more than one switch if the host had more than one
* Fixed bug with Guest_Disk_Usage where the "Disk Usage" was reported as 0 when the guest had snapshots
* Added a Version argument to report the plugin version
* Added check Guest_Status which reports on Power State, Uptime, VMware Tools Version and Status, IP Address, Hostname, ESX(i) Host Guest Is Running On, Consolidation State and Guest Version

2014-12-13
* Added option AlwaysOK for drs_automation_level so the check will always return an OK state (requested by Willem D’Haese)
* Added option AlwaysOK for drs_dpm_level so the check will always return an OK state (requested by Willem D’Haese)
* Added option AlwaysOK for ha_host_monitoring so the check will always return an OK state
* Added option AlwaysOK for ha_admission_control so the check will always return an OK state
* Added check Datastore_Performance_Overall which will return the Datastore Performance for ALL connected hosts to the datastore (requested by Willem D’Haese)
* Added check Datastore_Cluster_Usage (requested by snapon_admin)
* Added check Datastore_Cluster_Status (requested by snapon_admin)
* Updated the Nagios XI Wizard checks List_Datastores, List_Guest, List_Hosts and List_vCenter_Objects with improved encoding to allow UTF-8 characters (reported by DingGuo Xiao)
* Fixed bug in Datastore_Usage to limit the amount of decimal places returned for the Used Space value
* Fixed bug in certain checks like Guest_Snapshot where guests have special characters like a backslash (reported by Dennis Peere)
* Updated Host_Status checks to report Triggered Alarms and trigger warning and critical states if the alarms have not been acknowledged in vCenter (requested by Pierre-François Gallic, Ian Bergeron, Jacob Estrin, Brice Courault)
* Added argument --perfdata_option which allows you to disable the check name being appended to the end of the performance data string in square brackets, as some monitoring systems like Centreon do not like this (reported/requested by Bruno Guerpillon)

2015-01-29
* Fixed bug in Guest_CPU_Usage where high CPU usage could result in a negative free value
* Added --modifier argument to allow request and response data to be modified for Host and Guest checks (requested by Willem D’Haese). An example of how this is used: your Nagios host objects have the address serverxx.box293.local but they are named in the vCenter inventory as serverxx. The --modifier argument will allow you to remove the '.box293.local'. This allows for the use of more generic service definitions in Nagios, which means less configuration is required. Detailed examples are provided in the manual
* Added Guest_Host check for determining if the ESX(i) host the guest is running on matches the parent_hosts defined in Nagios (requested by Virgil Hoover and other attendees at the Nagios World Conference 2014). This check will work in conjunction with the upcoming box293_event_handler plugin to run on the Nagios host ... stay tuned!
* Added the --query_url, --query_username, --query_password and --service_status_info arguments to allow the plugin to query Nagios for checks like Guest_Host to determine Nagios parent object directive
* Added more debugging to the Nagios XI Wizard List_xxx checks
* --debug option will now show how long the plugin ran for
* All cluster checks now report the name of the cluster at the beginning of the status output (requested by Willem D’Haese)
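
To make the --modifier example above concrete, here is a minimal sketch of the name transformation it describes (turning serverxx.box293.local into serverxx). It does not show the actual --modifier syntax, which is documented in the manual; the function name strip_suffix is hypothetical.

```python
def strip_suffix(name: str, suffix: str) -> str:
    """Remove `suffix` from the end of `name` if it is present.

    Hypothetical helper used only to illustrate the effect described
    for --modifier; it is not the plugin's implementation.
    """
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# Nagios address -> name as it appears in the vCenter inventory
print(strip_suffix("serverxx.box293.local", ".box293.local"))  # serverxx
```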

2015-03-03
* Added argument --exclude_snapshot to be used with the Guest_Snapshot check. This allows you to exclude snapshots that contain specific text in the NAME of the snapshot (requested by Pierre-François Gallic)
* Changed --perfdata_option to allow you to specify what metrics you want the specific check to use / report on, applies to all checks that return performance data. See manual for full details for each check (requested by Bruno Guerpillon)
* Fixed bug in Cluster_HA_Status that caused check to fail when the Slot Size had been defined using vSphere Web Interface (reported by Daniel Vleeshakker)
* Re-fixed bug in Datastore_Usage to limit the amount of decimal places returned for the Used Space value
* Fixed bug in Datastore_Cluster_Usage to limit the amount of decimal places returned for the Used Space value
* Manual now recommends using the 'nice' command to execute box293_check_vmware. This makes the plugin run at a lower scheduling priority and makes the vMA more stable
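
The --exclude_snapshot behaviour described above (ignoring snapshots whose NAME contains specific text) amounts to a simple filter. The sketch below is an illustration only: the function name is hypothetical, and case-sensitive matching is an assumption, since the changelog does not say how the plugin compares text.

```python
def filter_snapshots(snapshot_names, exclude_text):
    """Return the snapshot names that do NOT contain `exclude_text`.

    Assumption: matching is a case-sensitive substring test.
    """
    if not exclude_text:
        # Nothing to exclude, keep every snapshot.
        return list(snapshot_names)
    return [name for name in snapshot_names if exclude_text not in name]


names = ["pre-upgrade", "backup-nightly", "test"]
print(filter_snapshots(names, "backup"))
```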

2015-05-21
* Updated all host related checks to return an OK status IF the host is in Standby Mode. Specifically applies to the checks Datastore_Performance, Datastore_Performance_Overall, Host_CPU_Info, Host_CPU_Usage, Host_License_Status, Host_Memory_Usage, Host_OS_Name_Version, Host_pNIC_Status, Host_pNIC_Usage, Host_Status, Host_Storage_Adapter_Info, Host_Storage_Adapter_Performance, Host_Switch_Status, Host_vNIC_Status
* Created the check Host_Up_Down_State to be used as a host object check, helpful for hosts that are in Standby Mode and you don't want to be alerted about this as Standby Mode is normal behaviour. This check also introduced the argument --standby_exit_state which allows you to report a DOWN state if the host is in standby mode
* Standby checks Requested by Willem D'Haese and Hans Bos
* Fixed bug in Datastore_Cluster_Status check where it was not returning any output, reported by Luc Lesouef

2015-08-03
* Updated all checks to work with vSphere API versions v 4.0 onwards. Some features get introduced by VMware in different API releases and the plugin was not allowing for these differences. API problem reported by Andrea Setti. Specific checks updated are:
** Cluster_CPU_Usage
** Cluster_Memory_Usage
** Datastore_Cluster_Status (only valid in vSphere 5.0 onwards)
** Datastore_Cluster_Usage (only valid in vSphere 5.0 onwards)
** Guest_CPU_Info (# of cores only reported in vSphere 5.0 onwards, CPU Reservation only reported on directly connected ESXi hosts v 5.0 onwards ... via vCenter works for 4.0 onwards)
** Guest_CPU_Usage
** Guest_Disk_Performance
** Guest_Disk_Usage
** Guest_Memory_Info (Memory Reservation only reported on directly connected ESXi hosts v 5.0 onwards ... via vCenter works for 4.0 onwards)
** Guest_Memory_Usage
** Guest_NIC_Usage (Packets only reported for VMs running on ESXi hosts 5.0 onwards)
** Guest_Status (Uptime only reported for guests running on ESXi hosts 4.1 onwards, consolidation state only reported for VMs running on ESXi hosts 5.0 onwards)
** Host_CPU_Info
** Host_CPU_Usage
** Host_License_Status
** Host_Memory_Usage
** Host_OS_Name_Version
** Host_pNIC_Status
** Host_pNIC_Usage
** Host_Status
** Host_Storage_Adapter_Info
** Host_Storage_Adapter_Performance (will not work on hosts less than 4.1)
** Host_Switch_Status
** Host_Up_Down_State (no uptime or perfdata on hosts less than 4.1)
** Host_vNIC_Status
* Fixed List_Hosts check used by Nagios XI wizard so that it correctly detects if a host has storage adapters or datastores. Reported by maddev
* Fixed some issues with guest consolidation detection
* Updated Guest_Status to alert if guestToolsNotRunning is detected, critical by default. Reported by Olivier Cheron

2016-03-24
* Fixed RAW disk mapping for Guest_Disk_Usage, reported by Wibo Lammerts.
* Fixed bug with Guest_Status check where the IP Address objects were not being accessed correctly. Reported by Sebastian Hutter and Peter Stavanja.
* Fixed bug where Guest Consolidation was not returning the correct exit state. Reported by Olivier Cheron, Richard Temple and Jonathan Young.
* Fixed a bug in the --debug option that was overwriting the debug log when the plugin reached the end.
* Updated --debug option so it would create the debug log file in the directory which the plugin is run from.
* Added Percentages as a metric to check, including warning and critical thresholds. Optional and not included by default. Requested by Jeroen van Schelt. Applies to:
** Cluster_CPU_Usage
** Cluster_Memory_Usage
** Cluster_Resource_Info
** Datastore_Cluster_Usage
** Datastore_Usage
** Guest_CPU_Usage
** Guest_Disk_Usage
** Guest_Memory_Usage
** Host_CPU_Usage
** Host_Memory_Usage
* Added the ability to use a config file for storing plugin preferences, currently --concurrent_checks, --server, --timeout can now be defined in config file ~/.visdkrc as a way of reducing check command complexity. Refer to the manual on how to use the config file.
* Added check Tasks_Events to allow you to search the Tasks and Events and match/nomatch a string. Requested by Andrew Haynes.
* Fixed bug in List_Datastores check which was not correctly detecting if a datastore has offline hosts, causing the Nagios XI wizard to report "No datastores found!". Reported by dlukinski.
* Updated Perfdata_Process function to better detect if the timestamp value exists. Reported by Alexander Golikov.
* Added error checking to correctly report username/password issues instead of 'returned status 1'.
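
As a rough illustration of the percentage metric with warning and critical thresholds mentioned in this release, the sketch below turns a used/total pair into a percentage and a Nagios-style state. The function name, argument names and the "greater than or equal" comparison are assumptions; the manual describes how the plugin actually evaluates thresholds.

```python
def usage_state(used: float, total: float, warn_pct: float, crit_pct: float):
    """Return (percent_used, state) for a usage value.

    Assumption: a value equal to a threshold already triggers that state.
    """
    if total <= 0:
        raise ValueError("total must be greater than zero")
    percent = used * 100.0 / total
    if percent >= crit_pct:
        state = "CRITICAL"
    elif percent >= warn_pct:
        state = "WARNING"
    else:
        state = "OK"
    return round(percent, 2), state


print(usage_state(40, 50, 80, 90))
```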

2016-05-10
* Major performance improvements to the script due to switch statements being replaced with if/else statements.
** Expect a twofold decrease in CPU usage of the plugin and significantly reduced execution times.
** Plugin is now compatible with the Centreon Perl Connector (the reason behind the plugin overhaul). NOTE: Centreon Perl Connector is not officially supported by me, end user was responsible for overhauling the plugin to allow it to work with the Centreon Perl Connector.
** Plugin overhaul undertaken by CPF-Informatique.
* Datastore_Performance and Datastore_Performance_Overall checks have been improved, they now work with NFS datastores. Requested by Branislav7, Nicola Bianchi, David Beck, Christoph Leitl. Improvements performed by CPF-Informatique.
* Script now retries when communication with the VMware API fails and properly exits when it did not succeed. Default number of retries is 2. This helps preventing empty script outputs. Improvements performed by CPF-Informatique.
* Uncommented some code I had commented out for testing and forgot about, guest checks are now correctly optimized.
* Added new check vSphere_Desktop_License to query the Desktop Host Licenses and allow thresholds to be triggered for used or free. Requested by Jason Dunn.
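
The retry behaviour described in this release (retry when communication with the VMware API fails, default of 2 retries, then exit properly) follows a common pattern, sketched below. This is not the plugin's code: the function name, the delay parameter, and the reading of "2 retries" as one initial attempt plus two further attempts are all assumptions.

```python
import time


def call_with_retries(operation, retries=2, delay_seconds=0.0):
    """Call `operation()` and retry on failure.

    Assumption: `retries` counts attempts made AFTER the first one,
    so retries=2 means up to three calls in total.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose for the sketch
            last_error = error
            if attempt < retries:
                time.sleep(delay_seconds)
    # Every attempt failed: report it instead of returning empty output.
    raise RuntimeError(
        f"communication failed after {retries} retries"
    ) from last_error
```

A caller would wrap the API request in a function and pass it in, for example call_with_retries(fetch_host_data), where fetch_host_data is a placeholder for whatever performs the request.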

2016-10-02
* Corrected a bug where UP status was returned instead of WARNING/CRITICAL, and corrected some undefined variables. Corrections performed by CPF-Informatique.
* Plugin now checks for pipe symbols before the performance data string and removes them if present, reported by Pavel Novotný when using the Host_Storage_Adapter_Performance check (fix applies to any check with performance data).
* Resolved some issues with Tasks_Events checks that ended up failing with "communication with the VMware API failed after 2 retries", reported by Yann Renard.
* Fixed bug where the plugin was not reporting that it could not find an object (like in the case where the end user incorrectly typed the object), it was instead reporting "communication with the VMware API failed after 2 retries".
* Fixed bug in Cluster_Memory_Usage if there were no hosts in the cluster.
* Replaced typographical quotes with normal quotes in pod help to stop wide character error, reported by Kent Johannessen and Sebastian Schneider.
* Fixed bug in Guest_CPU_Usage where status output was displaying the value for each core as the total value. This only occurred in the status output and NOT the performance data string, hence all existing collected performance data is valid.
* Fixed a bug in the --debug option that was overwriting the debug log when the plugin reached the end for the checks List_Datastore_Clusters, List_Datastores, List_Guests, List_Hosts, List_vCenter_Objects.
* Guest_Snapshot performance improvements when querying a larger amount of guests, code improvements supplied by Aaron Cheeseman.
* Added check Host_Service to check the services running on a Host, the startup policy or if they are running, requested by John Chivian.
* Added check Cluster_Time_Drift to check for a NTP time drift for all the hosts in a cluster, requested by John Chivian.
* Added some extra object testing in the Host_vNIC_Status check to prevent check from stalling and consuming 100% CPU, reported by Willem D’Haese.
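
The pipe-symbol fix in this release matters because Nagios treats the first '|' in plugin output as the separator between the status text and the performance data. A minimal sketch of the idea follows; the function name is hypothetical and the plugin's own implementation may differ.

```python
def build_output(status_text: str, perfdata: str) -> str:
    """Join status text and perfdata with a single '|' separator.

    Any '|' already present in the status text is removed first so that
    it cannot be mistaken for the perfdata separator.
    """
    clean_status = status_text.replace("|", "")
    if perfdata:
        return f"{clean_status}|{perfdata}"
    return clean_status


print(build_output("Adapter vmhba1 | OK", "'read'=5"))
```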
Reviews (23)
Obviously a lot of work has gone into the implementation of this service check and the accompanying documentation. A qualified VMware and Nagios systems administrator will have no issue getting things up and running with relative ease.

In my case I am using a single Nagios server and a single VMA appliance (new for this purpose) to monitor two vCenter systems and the underlying clusters, hosts and guests.

I did initially have an issue in which some instances of the service check failed on one of the vCenter servers, but this was traced to a problem with the vCenter embedded database, and the issue "magically" resolved itself after a database rebuild.
by fusfeld, June 27, 2014
Plugin had a couple of hiccups to get installed but the author was incredibly helpful and we were able to work past it. Still by far the best documented plugin I've seen, and works as advertised. Very very helpful.
by ssmiller_gfsu, May 6, 2014
This plugin seems far more complete than most others. I found a few minor bugs with Cluster_Memory_Usage, Cluster_CPU_Usage, and Host_Status. The following diff fixes them (as of 5/6/2014):

vi-admin@mpvmat:~> diff -u orig/box293_check_vmware.pl box293_check_vmware.pl
--- orig/box293_check_vmware.pl 2014-05-06 10:09:44.000000000 -0400
+++ box293_check_vmware.pl 2014-05-06 14:55:18.000000000 -0400
@@ -714,7 +714,7 @@
my $host_maintenance_mode = $cluster_current_host->get_property('summary.runtime.inMaintenanceMode');

# See if the host is in maintenance mode
- if ($host_maintenance_mode eq 'true') {
+ if ($host_maintenance_mode eq 'false') {
# Get the overall CPU used by the current host
my $cluster_cpu_usage_current_host = $cluster_current_host->get_property('summary.quickStats.overallCpuUsage');
# Convert the $cluster_cpu_usage_current_host to SI
@@ -726,7 +726,7 @@
# Get how many cores this host has
$cpu_cores_available = $cpu_cores_available + $cluster_current_host->get_property('summary.hardware.numCpuCores');

- } # End if ($host_maintenance_mode eq 'true') {
+ } # End if ($host_maintenance_mode eq 'false') {
} # End if ($host_uptime_state_flag == 0) {
} # End if ($host_connection_state_flag == 0) {
} # End foreach (@{$cluster_hosts}) {
@@ -1181,7 +1181,7 @@
my $host_maintenance_mode = $cluster_current_host->get_property('summary.runtime.inMaintenanceMode');

# See if the host is in maintenance mode
- if ($host_maintenance_mode eq 'true') {
+ if ($host_maintenance_mode eq 'false') {
# Get the overall memory used by the current host
my $cluster_memory_usage_current_host = $cluster_current_host->get_property('summary.quickStats.overallMemoryUsage');
# Convert the $cluster_memory_usage_current_host to SI
@@ -1189,7 +1189,7 @@

# Add this to cluster_memory_usage
$cluster_memory_usage = $cluster_memory_usage + $cluster_memory_usage_current_host;
- } # End if ($host_maintenance_mode eq 'true') {
+ } # End if ($host_maintenance_mode eq 'false') {
} # End if ($host_uptime_state_flag == 0) {
} # End if ($host_connection_state_flag == 0) {
} # End foreach (@{$cluster_hosts}) {
@@ -4167,7 +4167,7 @@
} # End if (defined($host_issues)) {

# See if the host is in maintenance mode
- if ($host_maintenance_mode eq 'false') {
+ if ($host_maintenance_mode eq 'true') {
$exit_message = Build_Exit_Message('Exit', $exit_message, 'Host in Maintenance Mode');
$exit_state = Build_Exit_State($exit_state, 'OK');
$host_status_flag = 1;
Owner's reply

Thanks for that ssmiller_gfsu, these have all been fixed in release 2014-05-10.
