Check SMART status modified
Prerequisite: smartmontools >= 5.39 (for megaraid)
Warning: bug with smartctl 5.9, see http://sourceforge.net/apps/trac/smartmontools/ticket/46
This is a modified version of Kurt Yoder's "check_smart" plugin, for monitoring SCSI/SAS disks behind LSI MegaRAID controllers. It also works with Dell controllers (PERC2/3/4/5/6).
This plugin does SMART monitoring of both ATA and SCSI disks, has an easy usage syntax, and automatically produces perfdata for all applicable metrics.
To use, download file and make it executable (Linux/Unix: `chmod 755 check_smart`). Run with `./check_smart -h` to get usage information. Run with `--debug` flag to see exactly what it checks.
Tested under Nagios 3 and nrpe.
Warning: for testing only. I don't know anything about Perl, and this is the first time I have modified a script.
Reviews (14)
If you see an error like:
UNKNOWN: No health status line found
add this line to sudoers:
nagios ALL=(root) NOPASSWD: /usr/sbin/smartctl
by rafamiga, August 4, 2013
Here's the patch for SATA drives within the array [found on IBM x3630 M3 server]
--- check_smart.pl.orig 2013-08-05 12:54:09.409283146 +0200
+++ check_smart.pl 2013-08-05 12:57:24.580906887 +0200
@@ -7,7 +7,13 @@
# Changes and Modifications
# =========================
# Feb 3, 2009: Kurt Yoder - initial version of script 1.0
-# Jan 27, 2010: Philippe Genonceaux - modifications for compatibility with megaraid, use smartmontool version >= 5.39
+# Jan 27, 2010: Philippe Genonceaux - modifications for compatibility with
+# megaraid, use smartmontool version >= 5.39
+# Aug 5, 2013: Rafal Frühling - added --parse-as-ata switch to enable
+# checking of megaraid-compatible array drives which are SATA and
+# report ATA-like status [found on IBM x3630 M3 server with LSI Logic
+# / Symbios Logic MegaRAID SAS 2108 Liberator controller]
+#
# Add this line to /etc/sudoers: "nagios ALL=(root) NOPASSWD: /usr/sbin/smartctl"
use strict;
@@ -25,7 +31,7 @@
$ENV{'BASH_ENV'}='';
$ENV{'ENV'}='';
-use vars qw($opt_d $opt_debug $opt_h $opt_i $opt_n $opt_v);
+use vars qw($opt_d $opt_debug $opt_h $opt_i $opt_n $opt_v $opt_force_ata);
Getopt::Long::Configure('bundling');
GetOptions(
"debug" => \$opt_debug,
@@ -34,6 +40,7 @@
"i=s" => \$opt_i, "interface=s" => \$opt_i,
"n=s" => \$opt_n, "number=s" => \$opt_n,
"v" => \$opt_v, "version" => \$opt_v,
+ "parse-as-ata" => \$opt_force_ata,
);
if ($opt_v) {
@@ -107,7 +114,7 @@
my $line_str = 'SMART overall-health self-assessment test result: '; # ATA SMART line
my $ok_str = 'PASSED'; # ATA SMART OK string
-if ($interface eq 'megaraid'.",".$number or 'scsi'){
+if (!$opt_force_ata && ($interface eq 'megaraid'.",".$number or 'scsi')){
$line_str = 'SMART Health Status: '; # SCSI OR MEGARAID SMART line
$ok_str = 'OK'; #SCSI OR MEGARAID SMART OK string
}
@@ -202,7 +209,7 @@
my @perfdata = qw//;
# separate metric-gathering and output analysis for ATA vs SCSI SMART output
-if ($interface eq 'ata'){
+if ($interface eq 'ata' || $opt_force_ata){
foreach my $line(@output){
# get lines that look like this:
# 9 Power_On_Minutes 0x0032 241 241 000 Old_age Always - 113h+12m
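The comment at the end of the patch shows the kind of ATA attribute row the ATA branch loops over. As a rough illustration only, such a row could be split into fields as below. The helper name `parse_attribute_line` and the returned hash keys are made up for this sketch and are not taken from the plugin; the column order is assumed to be the usual `smartctl -A` layout.

```perl
use strict;
use warnings;

# Hypothetical helper, not the plugin's code: split one ATA attribute row.
# Assumed column order: ID, name, flag, value, worst, thresh, type,
# updated, when_failed, raw value (the raw value may contain spaces).
sub parse_attribute_line {
    my ($line) = @_;
    $line =~ s/\s+$//;    # drop trailing newline/whitespace
    # Only accept rows that start with "<id> <name> 0x<flag>".
    return unless $line =~ /^\s*\d+\s+\S+\s+0x[0-9a-fA-F]+\s/;
    # Limit of 10 keeps everything after the ninth column in the raw field.
    my @f = split ' ', $line, 10;
    return unless @f == 10;
    return +{
        id     => $f[0], name        => $f[1], value => $f[3],
        worst  => $f[4], thresh      => $f[5],
        when_failed => $f[8], raw    => $f[9],
    };
}

my $attr = parse_attribute_line(
    '  9 Power_On_Minutes 0x0032 241 241 000 Old_age Always - 113h+12m'
);
print "$attr->{name} raw=$attr->{raw}\n" if $attr;
```

Splitting with a field limit rather than a fixed-width regex is a choice made for this sketch, so that raw values containing spaces stay in one piece.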
by slugsshell, December 19, 2012
a) Package "sudo" is required
b) Dangerous settings in /etc/sudoers are required
c) User "nagios" needs a shell like /bin/bash
d) SELinux complaints, gosh why could that be xD
e) The script itself needs fiddling until it works
Don't use this script if you don't want to compromise your system security!
Try using the official check_ide_smart plugin!
Sorry, only my opinion.
Anyway thanks to the creator of this script for investing time.
by Miron, November 23, 2012
If you receive the NRPE error "UNKNOWN: No health status line found", check that you have added "/usr/bin/sudo " before the command in nrpe.cfg, like this:
command[check_smart]=/usr/bin/sudo /usr/local/nagios/libexec/check_smart -i $ARG1$ -d $ARG2$
Then add this line at the end of /etc/sudoers:
nagios ALL=(ALL) NOPASSWD:/usr/local/nagios/libexec/check_smart
Now try:
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_smart -a ata /dev/sda
and you will get:
OK: no SMART errors detected|Raw_Read_Error_............
:)
by john.barlow@guru.com.au, September 30, 2012
To handle SAT interface drives, modify line 65 to:
if(grep {$opt_i eq $_} ('ata', 'sat', 'scsi', 'megaraid')){
Cool - now trying to set up graphs (anybody got any examples ??)
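On the graphing question: everything after the `|` in the plugin output is perfdata, so a graphing script mostly needs to split that part into name/value pairs. A minimal sketch, assuming the plain `label=value` pairs seen in the sample outputs on this page (no units or thresholds); `parse_perfdata` is a made-up helper name, not part of the plugin:

```perl
use strict;
use warnings;

# Hypothetical helper: split plugin output at '|' and turn the
# "label=value" pairs that follow into a hash.
sub parse_perfdata {
    my ($plugin_output) = @_;
    my (undef, $perf) = split /\|/, $plugin_output, 2;
    my %metric;
    return %metric unless defined $perf;    # no perfdata section at all
    for my $pair (split ' ', $perf) {
        my ($label, $value) = split /=/, $pair, 2;
        # Keep only plain numeric values; anything else is skipped.
        next unless defined $value && $value =~ /^-?\d+(?:\.\d+)?$/;
        $metric{$label} = $value;
    }
    return %metric;
}

my %m = parse_perfdata(
    'OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=3175 Temperature_Celsius=42'
);
print "$_=$m{$_}\n" for sort keys %m;
```

The resulting hash can then be handed to whatever graphing tool is in use; which tool that is lies outside this sketch.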
Always check whether SELinux is enabled. I was getting "NRPE: Unable to read output" until I figured out that SELinux was enabled. After disabling SELinux, the plugin works just fine!
Requires a little bit of fiddling.
If you get a "No health status line found" error with NRPE, there is a fix for this. What I did to resolve the issue was edit /etc/passwd.
My nagios user was:
nagios:x:108:119::/var/lib/nagios:/bin/false
I changed it to:
nagios:x:108:119::/var/lib/nagios:/bin/sh
And now it's working like a charm. Hopefully this helps someone. It took me forever to figure out why it wasn't working auto-magically, but worked when I ran the command manually.
by ufreier, February 18, 2012
Hi,
first of all, many thanks for your effort in modifying this plugin to make it usable with MegaRAID controllers. Could it be that the plugin is sometimes confused about its own results?
On my RAID1 on a PERC/5, running this on the command line
check_smart -d /dev/sda -n [0|1] -i megaraid --debug
prints the right SMART status for both HDDs on the screen, so up to that point the script works completely okay. But as the final result I get:
##################################################
(debug) FINAL STATUS: UNKNOWN
##################################################
(debug) final status/output:
UNKNOWN: No health status line found|
although both HDDs are okay according to the other results of check_smartd.
Any help would be appreciated.
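One possible cause, consistent with the --parse-as-ata patch earlier on this page: the script looks for one specific marker line in the smartctl output, and a SATA disk behind the controller may print the ATA-style line while the script is searching for the SCSI-style one. The sketch below uses the two marker strings quoted in that patch; `health_status` is a hypothetical helper written for illustration and is not the plugin's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: scan smartctl output for the health marker line and
# map what follows it to a Nagios-style status word.
sub health_status {
    my ($line_str, $ok_str, @output) = @_;
    for my $line (@output) {
        # \Q...\E makes the marker match literally.
        if ($line =~ /\Q$line_str\E(.+)/) {
            my $value = $1;
            $value =~ s/\s+$//;
            return $value eq $ok_str ? 'OK' : 'CRITICAL';
        }
    }
    # Marker never seen: this is the "No health status line found" case.
    return 'UNKNOWN';
}

# An ATA-style health line, as a SATA disk would report it.
my @ata_output = ("SMART overall-health self-assessment test result: PASSED\n");

# Searching with the SCSI marker misses the line; the ATA marker finds it.
print health_status('SMART Health Status: ', 'OK', @ata_output), "\n";
print health_status('SMART overall-health self-assessment test result: ',
                    'PASSED', @ata_output), "\n";
```

Comparing the marker the script expects with what `--debug` actually prints should show whether this is what is happening on a given controller.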
Actually using the scsi interface instead of the ata.
The caveat with /etc/sudoers: check for the "Defaults requiretty" line and comment it out.
by Majed, August 29, 2011
First I was getting a sudo error, so I installed sudo. Then I had the line 110 error, so I added the line if (!defined($number)) { $number = 0; } at line 110, but I started to get an UNKNOWN state on the command line. Then I replaced line 110 with these two lines:
if (!defined($number)) { $number = 0; }
if ($interface eq 'megaraid'.",".$number or $interface eq 'scsi'){
So when I run:
./check_smart -i ata -d /dev/sda
I get:
OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=3175 Start_Stop_Count=0 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Power_On_Hours=30537 Spin_Retry_Count=0 Calibration_Retry_Count=0 Power_Cycle_Count=37 Power-Off_Retract_Count=25 Load_Cycle_Count=37 Temperature_Celsius=42 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0 Multi_Zone_Error_Rate=0
But then when I define it in NRPE as:
command[check_smart]=/usr/local/nagios/libexec/check_smart -i ata -d /dev/sda
and in Nagios as:
define service{
use local-service
host_name storage
service_description smart WD
check_command check_nrpe!check_smart!
notifications_enabled 1
}
I get this output in Nagios:
critical: NRPE: Unable to read output
Has anyone got it working through NRPE?
How do I configure this for NRPE?
Nagios now tells me "UNKNOWN: No health status line found"
But when I run the command like:
/usr/local/nagios/libexec/check_smart -i ata -d /dev/sda1
I get:
OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=5258 Start_Stop_Count=58 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Power_On_Hours=31969 Spin_Retry_Count=0 Calibration_Retry_Count=0 Power_Cycle_Count=58 Temperature_Celsius=28 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=17 Multi_Zone_Error_Rate=0
NRPE config:
command[check_smart]=/usr/local/nagios/libexec/check_smart -i ata -d $ARG1$
Nagios Service:
define service{
use generic-service
host_name XXXX
service_description S.M.A.R.T. /dev/sda1
check_command check_nrpe!check_smart!/dev/sda1
}
The plugin did not process the status correctly for me: it always selected the SCSI/MEGARAID line_str to match on, because of a comparison error. The bare 'scsi' after "or" is a non-empty string, so the condition is always true. Line 110 is as follows in the download:
if ($interface eq 'megaraid'.",".$number or 'scsi'){
Replace this line with the following two lines:
if (!defined($number)) { $number = 0; }
if ($interface eq 'megaraid'.",".$number or $interface eq 'scsi'){
Aside from this, the plugin works brilliantly!
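To see the difference between the two conditions outside the plugin, here is a small stand-alone sketch. The two sub names are invented for the comparison; only the conditions themselves come from line 110 and the fix above.

```perl
use strict;
use warnings;

# Condition as shipped: 'scsi' stands alone on the right of `or`, and a
# non-empty string is always true in Perl.
sub uses_scsi_strings_buggy {
    my ($interface, $number) = @_;
    my $result = ($interface eq 'megaraid' . "," . $number or 'scsi') ? 1 : 0;
    return $result;
}

# Corrected condition: both sides of `or` compare against $interface.
sub uses_scsi_strings_fixed {
    my ($interface, $number) = @_;
    $number = 0 if !defined $number;    # avoids the "uninitialized value" warning
    my $result = ($interface eq 'megaraid' . "," . $number
                  or $interface eq 'scsi') ? 1 : 0;
    return $result;
}

for my $iface ('ata', 'scsi', 'megaraid,0') {
    printf "%-10s buggy=%d fixed=%d\n", $iface,
        uses_scsi_strings_buggy($iface, 0),
        uses_scsi_strings_fixed($iface, 0);
}
```

With the shipped condition every interface, including ata, is treated as SCSI/MegaRAID; with the corrected one only scsi and megaraid,0 are.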
Regarding the warning: just add the following line at line 110:
if (!defined($number)) { $number = 0; }
Hi, there is a small bug in the plugin. When I run it on RHEL with
./check_smart -d /dev/sda -i ata --debug
I get
Use of uninitialized value in concatenation (.) or string at ./check_smart line 110.
Quite obviously the problem is that the $number variable in this line is not set. So to get it working I had to comment out the megaraid check. Everything else works great. Thanks