My Multicast Packet Generator
As a consultant I have the pleasure of being exposed to many customer networks. Each network has its own challenges, and each keeps growing as new packet-generating devices are deployed. Some packet generators are more important than others, and while a packet is just a packet to us network people, it is a great cause for alarm when packets stop moving around the network the way they should. I have a customer with one such packet-generating device: it chucks out a 600k UDP multicast stream (a single UDP payload) every 5 seconds, and if this screws up for any reason the application is not happy and hundreds of users are very upset. If the packets stop for any length of time, a Severity 1 incident is raised and the customer needs an ass to kick.
Note: We had to have the server's MTU reduced to 1470 so that when the packets hit the MPLS network they would fit through the GRE tunnel used for multicast. Without this the packets were fragmented and process switched until the CPU had had enough and started to drop packets, and the application just can't handle the packet loss.
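The arithmetic behind that MTU figure can be sketched as follows. The numbers are assumptions for illustration, not values taken from the customer's network: a 1500-byte path MTU and plain GRE over IPv4, which adds a 20-byte outer IP header and a 4-byte GRE header. Any extra encapsulation on the path (GRE keys, for example) would shrink the headroom further.

```python
# Sketch: does a packet of a given size pass through a GRE tunnel unfragmented?
# PATH_MTU and the overhead values are assumptions for illustration.
PATH_MTU = 1500        # assumed MTU of the links carrying the tunnel
OUTER_IP_HEADER = 20   # IPv4 header added by the tunnel
GRE_HEADER = 4         # basic GRE header, no key/checksum/sequence options


def tunnel_payload_mtu(path_mtu=PATH_MTU):
    """Largest inner IP packet that fits in the tunnel without fragmentation."""
    return path_mtu - OUTER_IP_HEADER - GRE_HEADER


def fits_in_tunnel(server_mtu, path_mtu=PATH_MTU):
    """True if a full-size packet from the server avoids fragmentation."""
    return server_mtu <= tunnel_payload_mtu(path_mtu)


if __name__ == "__main__":
    print(tunnel_payload_mtu())    # 1476
    print(fits_in_tunnel(1500))    # False: a default Ethernet MTU is too big
    print(fits_in_tunnel(1470))    # True: the reduced server MTU fits
```

Under these assumptions a server MTU of 1470 leaves a few bytes to spare, which is consistent with the fragmentation going away once the MTU was lowered.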
My #1 Rule of Network Monitoring
One of the best pieces of advice I can give about network monitoring is this: detecting an issue before the customer comes to you with it, and being able to tell them you are already working on it, is so much better than having them come to you to let you know you have a problem. If I were writing 10 rules of network support, #1 would be:
- Know about an issue before the customer realizes they have an issue.
SNMP+Perl (or anything for that matter)
So I have a customer with this multicast app, which means the “ip mroute” table is very important to them. The NOC would run “show ip mroute” on a daily basis for key sites to ensure the presence of the S,G route needed for this multicast application to function. Recently the customer was not best pleased to find the application not working, and sure enough, when the NOC checked “show ip mroute” the S,G was missing. Now we were in reactive mode, which is never a nice place to be. The issue was resolved, but there were no free out-of-the-box tools that could help.
Now, we use Nagios extensively for monitoring the status of the network. It is a fabulous tool for showing current network status, and it can be extended with your own scripts and plugins. We also use other tools, e.g. CiscoWorks, for more in-depth analysis of issues. Nagios ships with scripts written in Perl, so it is easy for me to steal existing code in the same language, but generally I would use anything to hand.
SNMP knows stuff
I had no idea if SNMP could help me here, but having a programming background I already had the outline of a program in my head:
- Connect to device
- Read multicast information
- Detect the S,G route
- If present then report OK
- If not present then report Problem
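The outline above can be sketched in a few small functions. This is an illustration in Python rather than the Perl used for the real plugin, and the function names are hypothetical. It assumes the Net-SNMP `snmpwalk` command is installed and that the table to read is passed in as `oid`.

```python
import subprocess

OK, WARNING, CRITICAL = 0, 1, 2   # Nagios plugin exit codes


def snmp_walk(host, community, oid):
    """Steps 1-2: connect to the device and read a table with snmpwalk."""
    result = subprocess.run(
        ["snmpwalk", "-v", "2c", "-c", community, host, oid],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.splitlines()


def find_sg(lines, source, group):
    """Step 3: keep the rows that mention both the group and the source."""
    return [line for line in lines if group in line and source in line]


def report(matches, source, group):
    """Steps 4-5: turn the matches into a status line and exit code."""
    if matches:
        return OK, f"OK: ({source},{group}) present"
    return CRITICAL, f"CRITICAL: ({source},{group}) missing"


def check(host, community, oid, source, group, walk=snmp_walk):
    """Run the whole outline; `walk` can be swapped out for testing."""
    lines = walk(host, community, oid)
    return report(find_sg(lines, source, group), source, group)
```

Keeping the SNMP call separate from the matching and reporting means the logic can be exercised with canned output before it is pointed at a live router.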
So, the first step, connecting to the device, I can steal from an existing code snippet off the web or on the Nagios server.
Read multicast information? OK, let's google “cisco multicast mib”. I find some results but they mean not a jot to me, so I take some OIDs and use snmpwalk. I use a router I know has the S,G route and look for entries that correspond to the S or the G. Eventually I find an OID which does not necessarily equate directly to the “show ip mroute” command, but does show whether or not the S,G is present, which is all I need.
Now I can plug this into Nagios, which will automatically check for the presence of this specific item. It also shortens the morning check for the NOC, as they only need to look at the status screen and not log on to the router.
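Searching snmpwalk output for the S and the G as plain text works, but a substring search can also match a longer address that merely starts with the one you want. A slightly stricter approach is to compare the index part of each returned OID against the group and source octets. The sketch below assumes numeric OID output and that each row's index begins with the group address followed by the source address; the sample line is made up for illustration, not captured from a router.

```python
def sg_index(group, source):
    """Index prefix for an (S,G) row: group octets followed by source octets."""
    return [int(p) for p in (group + "." + source).split(".")]


def row_index(line, base_oid):
    """Extract the index part of one numeric snmpwalk line, or None."""
    oid = line.split(" = ", 1)[0].strip().lstrip(".")
    base = base_oid.lstrip(".")
    if not oid.startswith(base + "."):
        return None
    try:
        return [int(p) for p in oid[len(base) + 1:].split(".")]
    except ValueError:
        return None   # not a purely numeric OID


def has_sg(lines, base_oid, group, source):
    """True if any row's index starts with exactly group.source."""
    want = sg_index(group, source)
    for line in lines:
        idx = row_index(line, base_oid)
        if idx is not None and idx[:len(want)] == want:
            return True
    return False


if __name__ == "__main__":
    BASE = "1.3.6.1.4.1.9.10.2.1.1.3.1.9"
    # Hypothetical snmpwalk line, for illustration only.
    sample = [".1.3.6.1.4.1.9.10.2.1.1.3.1.9.239.0.3.22.10.16.1.141"
              ".255.255.255.255.5.0.0.0.0 = INTEGER: 0"]
    print(has_sg(sample, BASE, "239.0.3.22", "10.16.1.141"))  # True
    print(has_sg(sample, BASE, "239.0.3.22", "10.16.1.14"))   # False
```

The second call is the case a plain text search gets wrong: 10.16.1.14 is a prefix of 10.16.1.141, so a substring match would report a route that is not there.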
Final Thought
So now we will know if the multicast is broken, and the NOC can start escalating before the customer calls. I hope this is useful, not so much for how to monitor multicast but to highlight my #1 rule of network support, “Know about an issue before the customer realizes they have an issue”, and to show you that if you want to monitor something you don't need to go off and buy a tool: make it yourself.
Here is the code and the Nagios command configuration, just in case you want to use it yourself:
#!/usr/bin/perl
# Author: John McManus
# Email:
# Nagios plugin: checks that a router holds a given multicast (S,G) route.
# Usage: check_MCAST_SGRoute.pl <host> <community> <source> <group>
use strict;
use warnings;

if (@ARGV != 4) {
    print "UNKNOWN: usage: $0 <host> <community> <source> <group>\n";
    exit(3);
}
my ($hostname, $community, $sourceserver, $mroute) = @ARGV;

#ciscoIpMRouteNextHopOutLimit
my $oid_MCAST_info = "1.3.6.1.4.1.9.10.2.1.1.3.1.9";

# The OID index of a matching row contains the group address followed by the source address.
my $mcastSG = $mroute . "." . $sourceserver;
#print "Searching for S,G : ($sourceserver,$mroute)\n";

# Walk the table and keep only the rows whose OID contains the group.source index.
# grep -F matches the index literally, so the dots are not treated as regex wildcards.
my @SGEntries = `snmpwalk -v 2c -c '$community' '$hostname' $oid_MCAST_info | grep -F '$mcastSG'`;
chomp(@SGEntries);
#print "$_\n" foreach @SGEntries;
#print "size: " . @SGEntries . "\n";

# Check state and output (Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL)
if (@SGEntries == 1) {
    print "OK: Multicast Route Good $hostname has SG route ($sourceserver,$mroute)\n";
    exit(0);
}
elsif (@SGEntries == 0) {
    print "CRITICAL: Multicast Route MISSING $hostname has NO SG route ($sourceserver,$mroute)\n";
    exit(2);
}
else {
    print "WARNING: Multicast Route in UNKNOWN state, please check router; looks like multiple SG entries for ($sourceserver,$mroute) on $hostname\n";
    exit(1);
}
Here is the command.cfg for Nagios:
define command{
command_name    check_MCAST_SGRoute
command_line    $USER1$/check_MCAST_SGRoute.pl $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$
}
Here is the switch.cfg for Nagios:
define service{
use                     generic-service ; Inherit values from a template
host_name               SiteA_1002_001P,Siteb_3825_001P,SiteC_3745
service_description     CheckMulticastRoute
normal_check_interval   1               ; Check the service every minute under normal conditions
retry_check_interval    0.5             ; Re-check the service every 30 seconds until its hard state is determined
check_command           check_MCAST_SGRoute!verysecret!10.16.1.141!239.0.3.22
contact_groups          NOC
servicegroups           CheckMulticastRoute
}
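To see how the two definitions fit together, here is a rough sketch of the macro expansion Nagios performs: the `check_command` value is split on `!`, the first part names the command, and the rest become `$ARG1$`, `$ARG2$` and so on. This is a simplification written in Python for illustration, not Nagios code, and the `USER1` and `HOSTADDRESS` values are hypothetical.

```python
def expand_check_command(check_command, command_line, macros):
    """Return (command name, expanded command line) for a service check."""
    name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    for key, value in values.items():
        command_line = command_line.replace(f"${key}$", value)
    return name, command_line


if __name__ == "__main__":
    # Hypothetical macro values for illustration.
    macros = {"USER1": "/usr/local/nagios/libexec", "HOSTADDRESS": "192.0.2.1"}
    name, line = expand_check_command(
        "check_MCAST_SGRoute!verysecret!10.16.1.141!239.0.3.22",
        "$USER1$/check_MCAST_SGRoute.pl $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$",
        macros,
    )
    print(name)
    print(line)
```

The expanded line shows why the order in `check_command` matters: the community string, source and group arrive as the script's second, third and fourth arguments, after the host address.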