Nagios Simple Automate Tools

OVERVIEW:

nsat is a Nagios command wrapper, distributed under the GPL. It sits between
the Nagios daemon and a check command, collects numerical data from the
command's output and stores it in files using rrdtool. The collected data can
then be viewed through a PHP script, called either directly or from the Nagios
web frontend.

REQUIREMENTS:

  apache
  php
  nagios
  perl
  rrdtool

INSTALLATION
------------

DISTRIBUTION CONTENT:

etc/

  sat.rrd.cfg - config file. check_rrdv2 needs it to determine the type of
  -----------   the command being run and to parse its output, and the web
                frontend needs it to work correctly.

  sat.pwd.cfg - password file for hosts.
  -----------

  sat.cfg     - environment variables config file, interpreted by both Perl
  -------       and PHP. Change it if you change the default paths; see the
                comments inside.

  sat.php     - PHP config processor. Adds some PHP-specific variables, then
  -------       loads and executes sat.cfg. Variables:

                  'color'         - array of graph colors.
                  'def'           - array of data source names. If you want
                                    to collect more than 5 data sources per
                                    file, you must add entries to this array.
                  $width, $height - size of the generated graphs.

libexec/

  check_rrdv2 - main command wrapper.
  -----------

  sat.func.pl - function library.
  -----------

libexec/cisco/ - Cisco checkers:

  cpu               - CPU usage checker (SNMP)
  memory            - free memory checker (SNMP)
  intf-isdn-service - VoIP trunk and layer 2 status checker (rsh)
  intf-rsh          - interface status checker (rsh)
  intf-snmp         - interface status checker (SNMP)
  ifindex           - SNMP ifTable collector, needed by the interface
                      checker tools

  check_nts   - NSClient wrapper that transparently inserts the
  ---------     authentication phrase.

share/

  graph.php   - script that returns a PNG image built from the arguments
  ---------     posted by sat.php.

  sat.php     - main PHP frontend file.
  -------

HOWTO RUN:

1) Copy all files to the nagios directory. Make sure that all config files are
   readable by both the nagios and the web server users.
   Then, if your installation path is not the default (/usr/local/nagios),
   change the variables NAGIOS and NAGETC at the top of libexec/check_rrdv2
   and share/sat.php.

2) Create the directory 'sat/rrd' in nagios/var (the default) or elsewhere;
   in the latter case, don't forget to update the config file. Check that the
   directory is writable by nagios and readable by apache.

3) Insert text like this into the Nagios config file 'checkcommands.cfg':

------
define command{
        command_name    rrd
        command_line    $USER1$/check_rrdv2 -h $HOSTADDRESS$ -c $ARG1$ -a $ARG2$ -e $ARG3$ -l $ARG4$
        }
------

4) Add an entry for your command to sat.rrd.cfg. The config file format is
   shown below with an example for my own command, which prints interface
   status like this:

   "State: OK (up/up), Rate 1.0/3.9 (500) , IErr/iCRC/OErr 0/0/0"

   The 1st and 2nd numbers are the transfer rates (input/output); the 3rd,
   4th and 5th are the error counters (IErr/iCRC/OErr).

------
{
command  check_intf_full!.*
parser   (\d+\.\d).(\d+\.\d).*(\d+).(\d+).(\d+)
inter    300
legend   Input/Kbps,Output/Kbps,iErr,iCRC,oERR
extexec  rsh HOST -l username show interface ARG
graph    1,2!3,4,5
gperiod  6h,24h
comment  interface_stat
}
------

   Each line begins with a keyword, one of:

   command - Perl regexp split into two parts by '!'. The first part matches
             the command name passed with '-c', the second the argument
             passed with '-a'.
   parser  - Perl regexp applied to the output of the executed command to
             collect data. Every () group is one data source (DS); this
             example has 5 sources.
   inter   - interval, in seconds, at which Nagios runs the wrapper; here,
             every 5 minutes.
   legend  - labels for the data sources, separated by ',' and in the same
             order as in the 'parser' line.
   extexec - additional command, run by the web frontend (WF) and printed
             along with the generated diagrams. HOST is automatically
             replaced with the host IP, ARG with the argument, RSHLOGIN with
             the rshlogin parameter from sat.pwd.cfg, and COMMUNITY with the
             community parameter. Here it runs a remote shell that retrieves
             the Cisco interface status.
   graph   - diagram definition. DS numbers aggregated into one diagram are
             separated by commas, and diagram groups are separated by '!'.
             Two diagrams are defined here: the first shows the input/output
             rate of the interface, the second its error rates.
   gperiod - rrdtool time offset values (graph periods).
             Here every defined diagram is created twice: first with a 6h
             period, then with a 24h period.
   comment - comment shown on the graphs.

5) Run check_rrdv2 from the console as the nagios user and read the help
   carefully. Add parameters, run it again with arguments, and check that an
   .rrd file appears in the host IP subdirectory of var/sat/rrd. If there is
   any problem, run check_rrdv2 with the -d flag and read the debug output.

6) If all is OK, open share/sat.php in a web browser. If the script is called
   without parameters, it scans the rrd directory and lists all .rrd files it
   finds. Just click one, and graphs built from the selected file appear.

7) Done. Now add text like this to nagios.cfg:

----------------
define service{
        use                     generic-service ; Name of service template to use
        host_name               your_host_name
        service_description     FastEthernet0/1
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    3
        active_checks_enabled   1
        passive_checks_enabled  1
        contact_groups          nagios
        notification_interval   240
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           rrd!your_command!arguments!extras!limits
        }
----------------

Look at the last line. Nagios runs the check_rrdv2 command and passes your
original command and its arguments to it. check_rrdv2 then executes your
command as '$USER/your_command 127.0.0.1 argument', processes the output, and
returns all of the output and the exit code to Nagios.

Some notes about 'extras' etc.

The program has a bug which I cannot resolve. If you set check_command to
something like this:

   rrd!command!argument!!1:lt40:lt30

it won't work; it's a bug in the parameter mechanism. If you need to leave the
'extras' argument empty (because you don't need it), put the string '.' in
its position. The wrapper detects this and correctly sets the argument to
empty. This works for the 'argument' parameter too. In short, you cannot set
a later argument if any earlier one is left empty; use '.' instead.
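The '-e $ARG3$ -l $ARG4$' part of the 'rrd' command definition suggests one plausible cause of this bug, offered as an assumption rather than a description of check_rrdv2 internals: when $ARG3$ is empty, the expanded command line contains '-e -l 1:lt40:lt30', and a getopt-style parser takes '-l' as the value of -e. The Python sketch below reproduces this with the standard getopt module. expand() and parse() are hypothetical helpers, the host address and $USER1$ path are made-up example values, and shell word splitting is approximated with shlex.

```python
import getopt
import shlex

# command_line of the 'rrd' command from checkcommands.cfg (step 3).
COMMAND_LINE = ("$USER1$/check_rrdv2 -h $HOSTADDRESS$ "
                "-c $ARG1$ -a $ARG2$ -e $ARG3$ -l $ARG4$")


def expand(check_command, user1="/usr/local/nagios/libexec", host="10.0.0.1"):
    """Roughly mimic Nagios macro expansion plus shell word splitting.

    Everything after the first '!' in check_command becomes $ARG1$..$ARG4$;
    missing arguments expand to empty strings. user1 and host are
    hypothetical example values.
    """
    args = check_command.split("!")[1:]
    line = COMMAND_LINE.replace("$USER1$", user1)
    line = line.replace("$HOSTADDRESS$", host)
    for i in range(1, 5):
        value = args[i - 1] if i <= len(args) else ""
        line = line.replace(f"$ARG{i}$", value)
    # An empty $ARGn$ leaves two spaces, which word splitting collapses,
    # so the option loses its value slot entirely.
    return shlex.split(line)


def parse(argv):
    """Parse the wrapper's options with getopt (an assumed parsing style)
    and map the '.' placeholder to an empty string."""
    opts, rest = getopt.getopt(argv[1:], "h:c:a:e:l:d")
    values = {flag: ("" if value == "." else value) for flag, value in opts}
    return values, rest


if __name__ == "__main__":
    # Empty 'extras': '-e' swallows '-l' and the limits are left over.
    print(parse(expand("rrd!command!argument!!1:lt40:lt30")))
    # '.' placeholder: every option keeps its own value.
    print(parse(expand("rrd!command!argument!.!1:lt40:lt30")))
```

Passing '.' keeps every option paired with its own value; turning '.' back into an empty string then happens inside the wrapper.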
Some examples of the workaround:

   rrd!command                     - OK
   rrd!command!argument!!limit     - WRONG
   rrd!command!!!limit             - WRONG
   rrd!command!.!.!limit           - OK
   rrd!command!argument!.!limit    - OK

EXTENDED INFORMATION LINK
-------------------------

You can add a link to sat.php from the Nagios web interface using the Nagios
extended information feature. For example: 'link'

The rrd file name is generated from your_command_name, followed by '___' and
the argument with some characters substituted; see the 'strrpl' function in
the check_rrdv2 file. In version 0.2 and later you can call sat.php without
the rrdfile variable and instead set 'command' to your own command (in my
case check_intf_full) and 'argument' (FastEthernet0/0) without any
conversion. If there is any trouble, uncomment the string and compare the
generated rrd file name with the real one.

cisco devices
-------------

libexec/cisco contains some of my scripts integrated into the nsat
architecture. If you use the interface status check module, don't forget to
add an 'ifindex' service to Nagios for the same host. This module generates
an .idx file containing ifTable information, to reduce overall SNMP traffic.
I set it to run about every 12h, and execute it from the Nagios console if
changes are needed immediately. To control CPU/memory limits, just use the
universal limit checker in the check_rrdv2 module.

rsh with timeout support
------------------------

On heavily loaded networks, rsh occasionally hangs for a long time, which is
undesirable. The file inetutils-1.4.1.rsh.diff contains a diff against rsh.c
that adds timeout support. Just apply it to rsh.c from the same package and
put the resulting rsh into the nagios/libexec directory. The default timeout
is 15 seconds, which is enough for my requests. I'm not experienced in C/C++,
so I could not make the timeout safely configurable.

Sparc/x86 architecture notice
-----------------------------

I have run this wrapper on Sparc servers and found the following strange
thing: when the wrapper executes an external program, the exit code it gets
is much larger than normal. If the program exits with code 1, the wrapper
gets 256; 2 becomes 512, and so on. Maybe it depends on the reversed order of
the high/low byte.
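These numbers also fit a simpler explanation than byte order, offered as a likely cause rather than a confirmed diagnosis: on Unix, the status returned by wait() (which Perl stores in $? after system() or backticks) keeps a normally exiting program's exit code in bits 8 to 15, so 1 shows up as 256 and 2 as 512. Perl's documented way to extract the exit code is $? >> 8. The Python sketch below assumes a Unix system with /bin/sh and shows the raw status next to a portable decoding; the helper names are hypothetical, and the state names follow the standard Nagios plugin return codes.

```python
import os
import subprocess

# Standard Nagios plugin return codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def raw_status(code):
    """Run a shell that exits with `code` and return the raw wait status.

    On Unix, os.system() returns the wait status, like Perl's system() and $?.
    """
    return os.system(f"exit {code}")


def exit_code_from_status(status):
    """Decode a wait status portably instead of patching per architecture."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)   # equals status >> 8 for a normal exit
    if os.WIFSIGNALED(status):
        return 3                        # design choice: report a signal as UNKNOWN
    raise ValueError(f"unexpected wait status {status}")


def exit_code_via_subprocess(code):
    """subprocess already decodes the wait status into returncode."""
    return subprocess.run(["/bin/sh", "-c", f"exit {code}"]).returncode


if __name__ == "__main__":
    for code in range(4):
        status = raw_status(code)
        decoded = exit_code_from_status(status)
        print(code, status, decoded, NAGIOS_STATES[decoded])
```

If the exit code is decoded this way, the same code should behave identically on Sparc and x86, with no per-architecture line to toggle.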
If you have trouble with exit codes, look for the following line in the
check_rrdv2 script: "uncoment this line for x86, and comment out next", and
make the corresponding change. If anyone can explain how to solve this
problem natively, please tell me. I haven't had time to investigate it, so I
just made a workaround.

Password manager
----------------

Since version 0.4.3, a password manager is implemented in rrdwrapper. You can
now put the string 'COMMUNITY' anywhere in the parameters, and if sat.pwd.cfg
contains information about the host, this string is replaced with the
'community' parameter from that file. This was added for the NSClient plugin
with authentication enabled, but may be useful for other plugins too. For
example:

services.cfg
----------------
define service{
        use                     generic-service ; Name of service template to use
        host_name               ans-k16
        service_description     CPU
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        contact_groups          me
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           rrd!check_nt!"-v CPULOAD"!"-l 5,80,90 -s COMMUNITY"
        }
----------------

Nagios will run check_rrdv2 .... -s COMMUNITY, and the wrapper replaces the
last string with the 'community' variable for this host.

TROUBLESHOOTING:

I have no problems :)

vlad f halilow. vfh _at_ ratel.ru
Any comments or questions by email.

"Nagios is a registered trademark of Ethan Galstad."