Background
===============================================================================
BGP is a fairly hairy, medium-sized beast that can be hard for new admins to
handle. This script was developed to catch some of the problems that can burn
an admin when using and monitoring BGP. Several different things are checked
to determine the health and status of the BGP connections with neighbors.
Often these problems only become obvious when other services start to fail
because of BGP; this script is meant to catch them before they cause larger
problems in the network.

This script is written for BGP4 on a Cisco router, but it could easily be
modified for use with other vendors if the correct OIDs were identified.

Architecture
===============================================================================
The check_bgp_neighbors script checks 4 different aspects of BGP.

1. BGP Connection Status - This is a basic check which relies on Cisco's
   internal implementation of BGP to determine the BGP connection status.
   This is read from a specific MIB on the Cisco router.

2. Number of prefixes in memory - This value is polled for each BGP neighbor
   from a specific Cisco MIB. In older books this value is usually polled
   through remote commands, but now it can be retrieved over SNMP. It gives
   the total number of prefixes in memory which have been received from each
   neighbor.

3. BGP Messages received during the Nagios polling period - This is a counter
   of the total number of BGP specific messages received from each neighbor.
   The reported value is calculated by saving the previous counter reading
   and subtracting it from the current reading. This provides a differential
   number which can be averaged.

4. BGP Messages sent during the Nagios polling period - This is a counter of
   the total number of BGP specific messages sent to each neighbor. The
   reported value is calculated by saving the previous counter reading and
   subtracting it from the current reading. This provides a differential
   number which can be averaged.

Installation
===============================================================================
The following is an example installation for people who may not be completely
comfortable creating new Nagios commands/checks.

1. Copy this script to your Nagios server, e.g. /usr/local/nagios/libexec/

2. Add a Nagios command definition like the one below.

   # 'check_bgp_all' command definition
   define command{
       command_name    check_bgp_all
       command_line    $USER1$/check_bgp_neighbors -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ -n $ARG2$
   }

3. Optional: Add Nagios hostgroups like the examples below.

   # Associated in svc-bgp.cfg
   define hostgroup{
       hostgroup_name  svc-bgp1
       alias           BGP Check 1
   }

   # Associated in svc-bgp.cfg
   define hostgroup{
       hostgroup_name  svc-bgp2
       alias           BGP Check 2
   }

4. Optional: Add a specific file with the service checks. In the first
   service, 10.0.0.1 is the eBGP neighbor and 172.16.0.2 is the iBGP
   neighbor. The 172.16.0.0/12 network is what connects the two local iBGP
   peers together.

   define service{
       use                  server-service
       hostgroup_name       svc-bgp1
       service_description  BGP Check 1
       check_command        check_bgp_all!10.0.0.1!172.16.0.2
   }

   define service{
       use                  server-service
       hostgroup_name       svc-bgp2
       service_description  BGP Check 2
       check_command        check_bgp_all!192.168.0.1!172.16.0.1
   }

5. Optional: Finally, attach the service checks to your routers by adding the
   hostgroups to their host definitions.

   define host{
       use         network-host
       host_name   router1
       hostgroups  svc-bgp1
   }

   define host{
       use         network-host
       host_name   router2
       hostgroups  svc-bgp2
   }

Tuning & Tips
===============================================================================
1. We have found that a 2 minute Nagios polling period works fairly well.

2. As of the time of this writing, doing eBGP with full tables should put
   about 230K-250K prefixes in memory for each peer, so the total with one
   eBGP and one iBGP partner will be about 500K.

3. Set the thresholds for prefixes, rx messages, and tx messages low enough
   that you are not paged during low update times. Having a threshold, even
   a low one, will at least tell you whether you are still talking to your
   neighbors. This has burned us before: we had prefixes in memory, but they
   slowly dwindled because we were not receiving messages, even though the
   BGP connection to our eBGP neighbor was not flagged as down.

4. Graphing these OIDs in Cacti or MRTG can provide you with good baselines,
   which can be used to set lower bounds for tx messages, rx messages, and
   number of prefixes in memory.
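
The differential message counts described under Architecture, and the lower
bounds suggested in tip 3, can be sketched roughly as follows. This is an
illustrative Python sketch only, not the code of check_bgp_neighbors itself.
The function names (counter_delta, check_lower_bound) and the threshold
parameters are hypothetical, and it assumes the message counters are 32-bit
SNMP counters that may wrap between polls. The return values follow the
standard Nagios plugin exit codes.

```python
from typing import Optional

# Assumption: the polled message counters are 32-bit SNMP counters.
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous: Optional[int], current: int) -> Optional[int]:
    """Return the messages counted since the last poll, or None on the first poll."""
    if previous is None:
        # No saved reading yet, so no difference can be computed.
        return None
    # Modular subtraction also covers a single counter wrap between polls.
    return (current - previous) % COUNTER32_MODULUS


def check_lower_bound(value: Optional[int], warn_below: int, crit_below: int) -> int:
    """Map a value onto Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN."""
    if value is None:
        return 3
    if value < crit_below:
        return 2
    if value < warn_below:
        return 1
    return 0
```

For example, counter_delta(1200, 1215) gives 15, and
check_lower_bound(15, warn_below=5, crit_below=1) gives 0 (OK). A delta of 0
with the same thresholds gives 2 (CRITICAL), which is the "connection up but
no messages arriving" case from tip 3. Note that a counter reset on the router
(as opposed to a wrap) would show up as a very large delta with this approach,
so a real check would probably want to guard against that as well.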