Network Monitoring: SNMP knows stuff

My Multicast Packet Generator

As a consultant I have the pleasure of being exposed to many customer networks. Each network has its own challenges, and each keeps growing as new packet-generating devices are deployed. Some packet generators are more important than others, and while a packet is just a packet to us network people, if the packets stop moving around the network the way they should it can be a great cause for alarm. I have a customer with one such packet-generating device: it chucks out a 600k UDP multicast stream (a single UDP payload) every 5 seconds, and if this screws up for any reason the application is not happy and hundreds of users are very upset. If the packets stop for any length of time, a Severity 1 incident is raised and the customer needs an ass to kick.

Note: We had to have the server's MTU reduced to 1470 so that when a packet hit the MPLS network it would fit through the GRE tunnel used for multicast. Without this the packets were fragmented and process switched until the CPU had had enough and started to drop packets, and the application just can't handle packet loss.
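To see why 1470 is small enough, here is a rough back-of-the-envelope check in Python. It assumes a 1500-byte MTU along the path and the usual 24 bytes of GRE-over-IPv4 overhead (a 20-byte outer IP header plus a 4-byte basic GRE header); neither figure comes from the customer network, and the function name is just for illustration.

```python
# Assumed values, not measured on the customer network.
PATH_MTU = 1500       # typical Ethernet MTU along the path
GRE_OVERHEAD = 24     # 20-byte outer IPv4 header + 4-byte basic GRE header


def fits_in_tunnel(ip_packet_size, path_mtu=PATH_MTU, overhead=GRE_OVERHEAD):
    """Return True if the packet still fits in the path MTU once GRE-encapsulated."""
    return ip_packet_size + overhead <= path_mtu


for size in (1500, 1476, 1470):
    verdict = "fits" if fits_in_tunnel(size) else "must be fragmented"
    print(f"{size}-byte packet: {verdict}")
```

Under these assumptions anything up to 1476 bytes gets through unfragmented, so 1470 leaves a few bytes to spare.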

My #1 Rule of Network Monitoring

One of the best pieces of advice I can give about network monitoring is this: detecting an issue before the customer comes to you with it, and being able to tell them you are already working on it, is so much better than having them come to you to let you know you have a problem. If I were writing 10 rules of network support, #1 would be:

  1. Know about an issue before the customer realizes they have an issue.

SNMP+Perl (or anything for that matter)

So I have a customer with this multicast app, which means the “ip mroute” table is very important to them. The NOC would run “show ip mroute” on a daily basis for key sites to ensure the presence of the S,G route needed for this multicast application to function. Recently the customer was not best pleased to find the application not working, and sure enough, when the NOC checked “show ip mroute” the S,G was missing. Now we were in reactive mode, which is never a nice place to be. The issue was resolved, but there were no free, out-of-the-box tools that could help.

Now, we use Nagios extensively for monitoring the status of the network and find it a fabulous tool for showing current network status, and it can be extended with your own custom scripts/plugins. We also use other tools for more in-depth analysis of issues, e.g. CiscoWorks. Nagios has built-in scripts written in Perl, so it is easy for me to steal existing code in the same language, but generally I would use anything to hand.

SNMP knows stuff

I had no idea whether SNMP could help me here, but having a programming background I already had the outline of a program in my head.

  1. Connect to device
  2. Read multicast information
  3. Detect the S,G route
  4. If present then report OK
  5. If not present then report Problem
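Steps 3 to 5 above boil down to counting how many entries match the S,G and turning that count into a status. A minimal sketch of that decision in Python follows; it uses the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL), while the function name `sg_state` and the exact message wording are only illustrative. It is not the Perl plugin listed further down, just the same idea in isolation.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def sg_state(match_count, host, source, group):
    """Map the number of matching S,G entries to (exit code, status line)."""
    sg = f"({source},{group})"
    if match_count == 1:
        return OK, f"OK: Multicast Route Good {host} has SG route {sg}"
    if match_count == 0:
        return CRITICAL, f"CRITICAL: Multicast Route MISSING {host} has NO SG route {sg}"
    # More than one entry is unexpected, so ask a human to look.
    return WARNING, f"WARNING: {host} has {match_count} entries for SG route {sg}, please check router"


# A real plugin would print the message and exit with the code.
code, message = sg_state(0, "SiteA_1002_001P", "10.16.1.141", "239.0.3.22")
print(code, message)
```

Keeping the decision separate from the SNMP query makes it easy to try out without a router to hand.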

The first step, connecting to the device, I can steal from an existing code snippet off the web or on the Nagios server.

Read multicast information? OK, let's Google “cisco multicast mib”. I find some results but they mean not a jot to me, so I take some OIDs and use snmpwalk. I use a router I know has the S,G route and look for entries that correspond to the S or the G. Eventually I find an OID which does not necessarily equate directly to the “show ip mroute” command, but does show whether or not the S,G is present, which is all I need.
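The trick is that the S and the G show up inside the OIDs that snmpwalk returns, group first and then source, which is how the script further down builds its search string. Here is a small Python sketch of that matching step. The sample lines are made up to show the general shape of numeric snmpwalk output, not captured from a real router, and the helper names are hypothetical.

```python
import re


def sg_index(source, group):
    """Build the OID fragment for an (S,G): group octets, then source octets."""
    return f"{group}.{source}"


def find_sg(walk_lines, source, group):
    """Return the snmpwalk output lines whose OID contains the (S,G) fragment."""
    # Escape the dots and refuse to match in the middle of a longer number.
    fragment = re.escape(sg_index(source, group))
    pattern = re.compile(r"(?<![0-9])" + fragment + r"(?![0-9])")
    return [line for line in walk_lines if pattern.search(line)]


# Made-up sample output, for illustration only.
sample = [
    "SNMPv2-SMI::enterprises.9.10.2.1.1.3.1.9.239.0.3.22.10.16.1.141.255.255.255.255.5.239.0.3.22 = INTEGER: 0",
    "SNMPv2-SMI::enterprises.9.10.2.1.1.3.1.9.239.0.3.23.10.16.1.150.255.255.255.255.5.239.0.3.23 = INTEGER: 0",
]

matches = find_sg(sample, "10.16.1.141", "239.0.3.22")
print(len(matches))
```

Escaping the dots matters: a plain grep treats "." as "any character", which is usually harmless here but can match more than you intended.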

Now I can plug this into Nagios, which will automatically check for the presence of this specific item. It also reduces the morning check for the NOC, as they only need to check the status screen and not log on to the router.

Final Thought

So now we will know if the multicast is broken, and the NOC can start escalating before the customer calls. I hope this is useful, not so much for how to monitor multicast but to highlight my #1 rule of network support, “Know about an issue before the customer realizes they have an issue”, and to show that if you want to monitor something you don't need to go off and buy a tool: make it yourself.

Here is the code and the Nagios command configuration, just in case you want to use it yourself:

#!/usr/bin/perl
# Author: John McManus
# Email:
# Nagios plugin: check that a multicast (S,G) route is present on a router.
# Usage: check_MCAST_SGRoute.pl <host> <community> <source> <group>
use strict;
use warnings;

if (@ARGV != 4) {
    print "UNKNOWN: usage: $0 <host> <community> <source> <group>\n";
    exit(3);
}
my ($hostname, $community, $sourceserver, $mroute) = @ARGV;

#ciscoIpMRouteNextHopOutLimit
my $oid_MCAST_info = "1.3.6.1.4.1.9.10.2.1.1.3.1.9";

# The walked OIDs contain the group followed by the source,
# e.g. 239.0.3.22.10.16.1.141
my $mcastSG = $mroute . "." . $sourceserver;
#print "Searching for S,G : ($sourceserver,$mroute)\n";

# Walk the table and keep only the lines that mention this S,G.
# The list form of open keeps the arguments away from the shell.
open(my $walk, '-|', 'snmpwalk', '-v', '2c', '-c', $community,
     $hostname, $oid_MCAST_info)
    or do { print "UNKNOWN: cannot run snmpwalk: $!\n"; exit(3); };
my @SGEntries = grep { /\Q$mcastSG\E/ } <$walk>;
close($walk);
chomp(@SGEntries);
#print "size: " . @SGEntries . "\n";

# Check state and output (Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL)
if (@SGEntries == 1) {
    print "OK: Multicast Route Good $hostname has SG route ($sourceserver,$mroute)\n";
    exit(0);
}
elsif (@SGEntries == 0) {
    print "CRITICAL: Multicast Route MISSING $hostname has NO SG route ($sourceserver,$mroute)\n";
    exit(2);
}
else {
    print "WARNING: Multicast Route in UNKNOWN state, please check router; looks like multiple SG entries. $hostname has SG route ($sourceserver,$mroute)\n";
    exit(1);
}

Here is the command.cfg for nagios


define command{
        command_name    check_MCAST_SGRoute
        command_line    $USER1$/check_MCAST_SGRoute.pl $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
        }

Here is the switch.cfg for nagios


define service{
        use                     generic-service ; Inherit values from a template
        host_name               SiteA_1002_001P,Siteb_3825_001P,SiteC_3745
        service_description     CheckMulticastRoute
        normal_check_interval   1               ; Check the service every 5 minutes under nor
        retry_check_interval    0.5             ; Re-check the service every minute until its
        check_command           check_MCAST_SGRoute!verysecret!10.16.1.141!239.0.3.22
        contact_groups          NOC
        servicegroups           CheckMulticastRoute
        }
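If it is not obvious how the two definitions fit together: Nagios splits the check_command on "!" into the command name and $ARG1$, $ARG2$, $ARG3$, then substitutes those and the other macros into the command_line. The Python sketch below imitates that expansion purely for illustration (Nagios does this itself). The plugin directory and the host address are assumed values; $USER1$ is normally set in the Nagios resource file to wherever your plugins live, which is also where the Perl script would be saved as check_MCAST_SGRoute.pl and made executable.

```python
def expand_command(command_line, check_command, host_address, user1):
    """Imitate how Nagios fills in macros for one service check."""
    # "name!arg1!arg2!..." -> command name plus $ARG1$, $ARG2$, ...
    name, *args = check_command.split("!")
    macros = {"$USER1$": user1, "$HOSTADDRESS$": host_address}
    for number, value in enumerate(args, start=1):
        macros[f"$ARG{number}$"] = value
    for macro, value in macros.items():
        command_line = command_line.replace(macro, value)
    return name, command_line


name, expanded = expand_command(
    "$USER1$/check_MCAST_SGRoute.pl $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$",
    "check_MCAST_SGRoute!verysecret!10.16.1.141!239.0.3.22",
    host_address="192.0.2.1",               # assumed router address
    user1="/usr/local/nagios/libexec",      # assumed plugin directory
)
print(name)
print(expanded)
```

So the community string arrives as the second argument of the script, the source server as the third and the multicast group as the fourth, matching the order the script reads them in.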

  • Kal

    Your blog reminded me of a presentation by the ISC folks (BIND and DHCP fame) to nanog a few years back. Found here http://www.isc.org/community/presentations under “Managing IP Networks with Free Software”. I’ve often found myself following what I think to be the best quote regarding network monitoring anywhere: “Do Something - Store stuff, compare stuff, ping stuff, pipe stuff through awk”. It’s old and says things that sound obvious, but still very relevant.

    It’s why I always encourage aspiring network engineers to learn basic scripting. The power of an engineer with a few scripting tools can be truly awesome. Just make sure you document what you do :) And if you manage any kind of network filter or firewall, please learn to use NMAP.

  • Paul B

    Good stuff. I like the way Nagios emails me when a site is unreachable or not too. Would you have a preferred source for me to read up on snmpwalk, before I start wading through Google results? Thanks.

  • Serge

    SNMP should be your first choice for getting info from the router. If you run into a scenario where it can’t be used and you actually have to CLI to the router, use Rancid. It’s free and can automate the login/logout part of the logic. You can also tell Rancid what commands to run while it’s logged in (ex: show ip mroute). All you have to do then is call rancid from either a Unix shell script or perl (or others) and parse the data returned. If your route is not present, trigger an email.

    Serge

  • Pietro

    Hi John, GREAT SCRIPT !!!  I’ve been looking for something like this for ages and finally found it.  I had almost lost hope.
    Thanks very much for posting this script.  However I just have one slight problem and that is I don’t know what to do with the first script.
    Please let me explain… you’ve given clear instructions on the command.cfg and switch.cfg but I don’t know what to do with the first part of your script.
    Do I need to implement the script into a brand new file? (if so, what do I name it? and how do I get Nagios to run it?) or do I insert it into an existing nagios file?
    Your help would be greatly appreciated.
    Many Thanks.
    Pete
