bmalynovytch
by bmalynovytch, August 2, 2012
This plugin works as expected.
Great job.
Below is a small patch I wrote so that the "active, checking" state is no longer reported as critical.
Every week, all my servers using MD are checked at night, which triggers a flood of unwanted "raid recovery" notifications.
The patch also includes a workaround for the incorrect state ("recovering") that mdadm reports for raid10 arrays while a check is running.
Regards,
Benjamin
--- check_md_raid 2012-08-02 12:31:25.900899840 +0200
+++ check_md_raid.new 2012-08-02 14:08:01.873932844 +0200
@@ -35,6 +35,9 @@
# Full path to the mdadm utility check on the Raid state
BIN = "/sbin/mdadm"
+SYNCACTION = "/sys/block/%s/md/sync_action"
+READLINK = "/bin/readlink"
+CAT = "/bin/cat"
def end(status, message):
"""exits the plugin with first arg as the return code and the second
@@ -119,7 +122,13 @@
# This happens when the array is under heavy usage but it's \
# normal and the array recovers within seconds
continue
- elif "recovering" in state:
+ elif "recovering" in state or "check" in state:
+ real_array_path = os.popen("%s -f %s " % (READLINK, array) ).readlines()[0].split()[0]
+ real_array_id = real_array_path.split("/")[-1]
+ real_state = os.popen( ( "%s " + SYNCACTION ) % (CAT,real_array_id) ).readlines()[0].split()[0]
+ if "check" in real_state:
+ message += 'Array "%s" is in state "checking", ' % shortname
+ continue
extra_info = None
for line in detailed_output:
if "Rebuild Status" in line:
@@ -141,8 +150,8 @@
message += 'Array %s is in state "%s" (%s), ' \
% (shortname, state, raidlevel)
status = CRITICAL
-
- message = message.rstrip(", ")
+ if not status == OK and message:
+ message = message.rstrip(", ")
if status == OK:
message += "All arrays OK"
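The workaround in the patch asks the kernel for the real state through sysfs instead of trusting the mdadm output. As a minimal sketch of that same lookup without spawning readlink and cat, the following reads sync_action directly. The function names and the sysfs_template parameter are hypothetical and not part of check_md_raid; the sysfs path is the one used in the patch.

```python
import os

# Same sysfs path as SYNCACTION in the patch above.
SYNC_ACTION = "/sys/block/%s/md/sync_action"


def read_sync_action(array, sysfs_template=SYNC_ACTION):
    """Return the kernel's sync_action for an MD device, or None if unreadable.

    Hypothetical helper, not part of the plugin. `array` is a device path
    such as /dev/md0 or a symlink that points to one.
    """
    # Resolve symlinks (the patch uses `readlink -f`) and keep the last
    # path component, e.g. "md0".
    device = os.path.basename(os.path.realpath(array))
    try:
        with open(sysfs_template % device) as handle:
            return handle.read().strip()
    except (IOError, OSError):
        # Not an MD device, or sysfs is unavailable.
        return None


def is_scheduled_check(array, sysfs_template=SYNC_ACTION):
    """True when the array is running a consistency check, not a rebuild."""
    return read_sync_action(array, sysfs_template) == "check"
```

A plugin could call is_scheduled_check(array) before raising CRITICAL for a "recovering" or "checking" state, and fall back to the existing behaviour when the function returns False.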
Thank you for this excellent plugin!
Below is a patch of my own that adds an option to suppress the warning raised when ad_num differs from the number of registered slaves.
In most cases people would want this warning, but in my setup the same 802.3ad bond is connected to 2 different switches, which yields 2x2 802.3ad links: one pair is the "master", while the other 2 links wait for a failure before becoming active.
The plugin therefore warns that 2 of the 4 slaves seem to be missing from the active 802.3ad aggregate, which is only half true.
I therefore don't want to be warned.
Regards,
Benjamin
--- check_linux_bonding.orig 2012-07-24 10:52:55.973316334 +0200
+++ check_linux_bonding 2012-07-24 11:10:44.681319464 +0200
@@ -78,6 +78,7 @@
-n, --no-bonding Alert level if no bonding interfaces found [ok]
--slave-down Alert level if a slave is down [warning]
--disable-sysfs Don't use sysfs (default), use procfs
+ --ignore-num-ad Don't warn if num_ad_ports != num_slaves
-b, --blacklist Blacklist failed interfaces
-d, --debug Debug output, reports everything
-h, --help Display this help text
@@ -110,6 +111,7 @@
'linebreak' => undef,
'verbose' => 0,
'disable_sysfs' => 0,
+ 'ignore_num_ad' => 0,
'slave_down' => 'warning',
);
@@ -124,6 +126,7 @@
'linebreak=s' => \$opt{linebreak},
'v|verbose' => \$opt{verbose},
'disable-sysfs' => \$opt{disable_sysfs},
+ 'ignore-num-ad' => \$opt{ignore_num_ad},
'slave-down=s' => \$opt{slave_down},
) or do { print $USAGE; exit $E_UNKNOWN };
@@ -490,7 +493,7 @@
$b, $bonding{$b}{mode};
report($msg, $E_CRITICAL);
}
- elsif (defined $bonding{$b}{ad_num} and $bonding{$b}{ad_num} != scalar keys %slave) {
+ elsif ($opt{ignore_num_ad} == 0 and defined $bonding{$b}{ad_num} and $bonding{$b}{ad_num} != scalar keys %slave) {
my $msg = sprintf 'Bonding interface %s [%s]: Number of AD ports (%d) does not equal the number of slaves (%d)',
$b, $bonding{$b}{mode}, $bonding{$b}{ad_num}, scalar keys %slave;
report($msg, $E_WARNING);
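The condition changed by the last hunk can be summarised as a small decision function. This is only an illustrative sketch in Python (the plugin itself is Perl); the function name and its arguments are hypothetical and mirror $bonding{$b}{ad_num}, the number of keys in %slave, and $opt{ignore_num_ad}.

```python
def ad_port_mismatch(ad_num, slave_count, ignore_num_ad=False):
    """Return True when the AD port count warning should be raised.

    Hypothetical helper mirroring the patched elsif condition:
    ad_num is None when the bonding mode reports no AD port count.
    """
    if ignore_num_ad:
        # --ignore-num-ad was given: never warn about the mismatch.
        return False
    if ad_num is None:
        # Equivalent to `defined $bonding{$b}{ad_num}` being false.
        return False
    return ad_num != slave_count


# The setup described in the review: 4 slaves, 2 in the active aggregate.
print(ad_port_mismatch(2, 4))                      # warning without the flag
print(ad_port_mismatch(2, 4, ignore_num_ad=True))  # suppressed with the flag
```

Without the flag the behaviour is unchanged, so existing users keep the warning by default.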
Owner's reply
Thanks for the patch. It has been included in version 1.3.2 of the plugin.