@bmalynovytch
This plugin works as expected. Great job.

Below is a small patch I made so that the "active, checking" state is no longer reported as critical. Every week, all my servers using MD are checked at night, which triggers tons of unwanted notifications about "raid recovery". The patch also includes a workaround for the incorrect state ("recovering") that mdadm reports for RAID10 arrays while a check is running.

Regards,
Benjamin

--- check_md_raid 2012-08-02 12:31:25.900899840 +0200
+++ check_md_raid.new 2012-08-02 14:08:01.873932844 +0200
@@ -35,6 +35,9 @@
 # Full path to the mdadm utility check on the Raid state
 BIN = "/sbin/mdadm"
 
+SYNCACTION = "/sys/block/%s/md/sync_action"
+READLINK = "/bin/readlink"
+CAT = "/bin/cat"
 
 def end(status, message):
     """exits the plugin with first arg as the return code and the second
@@ -119,7 +122,13 @@
             # This happens when the array is under heavy usage but it's
             # normal and the array recovers within seconds
             continue
-        elif "recovering" in state:
+        elif "recovering" in state or "check" in state:
+            real_array_path = os.popen("%s -f %s " % (READLINK, array) ).readlines()[0].split()[0]
+            real_array_id = real_array_path.split("/")[-1]
+            real_state = os.popen( ( "%s " + SYNCACTION ) % (CAT,real_array_id) ).readlines()[0].split()[0]
+            if "check" in real_state:
+                message += 'Array "%s" is in state "checking", ' % shortname
+                continue
             extra_info = None
             for line in detailed_output:
                 if "Rebuild Status" in line:
@@ -141,8 +150,8 @@
 
             message += 'Array %s is in state "%s" (%s), ' % (shortname, state, raidlevel)
             status = CRITICAL
-
-    message = message.rstrip(", ")
+    if not status == OK and message:
+        message = message.rstrip(", ")
     if status == OK:
         message += "All arrays OK"
 
Reviewed 13 years ago
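The workaround in the check_md_raid patch resolves the array's real device name and then reads its sync_action file from sysfs, spawning readlink and cat to do so. The same lookup can probably be done without external commands. The following is only a minimal sketch under that assumption, written for Python 3: the function names and the sysfs_root parameter are hypothetical and are not part of the plugin, and only the /sys/block/<device>/md/sync_action path comes from the patch.

```python
import os


def sync_action(array, sysfs_root="/sys/block"):
    """Return the kernel's sync_action for an MD array, or None if unreadable.

    `array` is a device path such as /dev/md0 or a symlink to one.
    `sysfs_root` is a hypothetical parameter, present only so the lookup
    can be pointed at a fake directory tree for testing.
    """
    # Resolve symlinks (the patch uses "readlink -f") and keep the
    # last path component, e.g. "md0".
    real_id = os.path.basename(os.path.realpath(array))
    path = os.path.join(sysfs_root, real_id, "md", "sync_action")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        # Missing or unreadable file: let the caller decide what to do.
        return None


def is_scheduled_check(array, sysfs_root="/sys/block"):
    """True when the kernel reports the array as running a check."""
    return sync_action(array, sysfs_root) == "check"
```

Returning None for an unreadable file means the caller can fall back to the state reported by mdadm instead of failing on an empty result.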
Thank you for this excellent plugin!

Below is a patch of my own that makes it possible to suppress the warning raised when ad_num differs from the number of registered slaves. In most cases people would prefer to be warned, but in my case the same 802.3ad bond is attached to 2 different switches, which produces 2x2 802.3ad links: one pair is the "master", and the 2 other links are "waiting" for a failure before becoming active. This triggers a warning because 2 of the 4 slaves appear to be missing from the active 802.3ad aggregator, which is half true and half false. I therefore don't want to be warned.

Regards,
Benjamin

--- check_linux_bonding.orig 2012-07-24 10:52:55.973316334 +0200
+++ check_linux_bonding 2012-07-24 11:10:44.681319464 +0200
@@ -78,6 +78,7 @@
   -n, --no-bonding    Alert level if no bonding interfaces found [ok]
   --slave-down        Alert level if a slave is down [warning]
   --disable-sysfs     Don't use sysfs (default), use procfs
+  --ignore-num-ad     Don't warn if num_ad_ports != num_slaves
   -b, --blacklist     Blacklist failed interfaces
   -d, --debug         Debug output, reports everything
   -h, --help          Display this help text
@@ -110,6 +111,7 @@
     'linebreak'     => undef,
     'verbose'       => 0,
     'disable_sysfs' => 0,
+    'ignore_num_ad' => 0,
     'slave_down'    => 'warning',
 );
 
@@ -124,6 +126,7 @@
     'linebreak=s'   => \$opt{linebreak},
     'v|verbose'     => \$opt{verbose},
     'disable-sysfs' => \$opt{disable_sysfs},
+    'ignore-num-ad' => \$opt{ignore_num_ad},
     'slave-down=s'  => \$opt{slave_down},
 ) or do { print $USAGE; exit $E_UNKNOWN };
 
@@ -490,7 +493,7 @@
                 $b, $bonding{$b}{mode};
             report($msg, $E_CRITICAL);
         }
-        elsif (defined $bonding{$b}{ad_num} and $bonding{$b}{ad_num} != scalar keys %slave) {
+        elsif ($opt{ignore_num_ad} == 0 and defined $bonding{$b}{ad_num} and $bonding{$b}{ad_num} != scalar keys %slave) {
             my $msg = sprintf 'Bonding interface %s [%s]: Number of AD ports (%d) does not equal the number of slaves (%d)',
                 $b, $bonding{$b}{mode}, $bonding{$b}{ad_num}, scalar keys %slave;
             report($msg, $E_WARNING);
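
To make the effect of --ignore-num-ad concrete, here is a small Python sketch of the condition the last hunk changes. It is an illustration only, not the plugin's Perl code: the function name and arguments are hypothetical, and the interface names in the usage note are made up to mirror the 2x2 setup described above.

```python
def ad_port_warning(ad_num, slaves, ignore_num_ad=False):
    """Return a warning message, or None when nothing should be reported.

    ad_num        -- number of ports in the active aggregator (None if unknown)
    slaves        -- list of slave interface names registered on the bond
    ignore_num_ad -- mirrors the --ignore-num-ad switch added by the patch
    """
    # With the switch set, or without aggregator information, stay silent.
    if ignore_num_ad or ad_num is None:
        return None
    if ad_num != len(slaves):
        return ("Number of AD ports (%d) does not equal "
                "the number of slaves (%d)" % (ad_num, len(slaves)))
    return None
```

With four registered slaves and two ports in the active aggregator, ad_port_warning(2, ["eth0", "eth1", "eth2", "eth3"]) returns the warning text, while the same call with ignore_num_ad=True returns None.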