This isn’t a fully formed post, more of an inline thinking thing. Apologies for the lack of rigour.
I have been digging through a Data Centre Interconnect project where I need to deliver Layer 2 and Layer 3 connectivity between Data Centre sites. One of the Cisco Design Guides for VPLS clearly refers to several design options that use EEM to detect an upstream failure and then make configuration changes that activate redundancy configuration.
So far so good.
Not So Good
However, EEM is a low priority process in IOS and could easily be affected during a critical event. If that critical event caused a meltdown, such as a broadcast storm, a spanning tree loop gone wrong, or a pseudowire failure that induced a loop, the CPU might not have resources available for the EEM script to run. If the EEM script doesn’t execute, the failover doesn’t occur. Worse, the excession event (( Excession – 1 definition – something so technologically superior that it appears as magic to the viewer )) is likely to remain in effect, since no other rectifying action has taken (or can take) place.
Not so good.
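To make that worry concrete, here is a toy model in Python. It is a sketch under loud assumptions: it is not how the IOS scheduler actually works, and `run_scheduler`, `budget_per_tick`, `storm_load` and `failover_cost` are all names I made up for illustration. High-priority work is always served first, and the failover job (standing in for an EEM script) only gets whatever CPU is left over.

```python
def run_scheduler(ticks, budget_per_tick, storm_load, failover_cost=1):
    """Toy strict-priority scheduler (hypothetical, NOT the real IOS scheduler).

    Each tick offers `budget_per_tick` units of CPU. High-priority work
    (standing in for punted packets and control-plane churn) arrives at
    `storm_load` units per tick and is always served first. The low-priority
    failover job (standing in for an EEM script) runs only on leftover CPU.

    Returns the tick on which the failover job finishes, or None if it
    never completes within `ticks`.
    """
    high_backlog = 0
    failover_remaining = failover_cost
    for tick in range(ticks):
        # New high-priority work arrives and is served before anything else.
        high_backlog += storm_load
        served = min(high_backlog, budget_per_tick)
        high_backlog -= served

        # Only leftover CPU reaches the low-priority failover job.
        leftover = budget_per_tick - served
        if leftover > 0:
            failover_remaining -= min(leftover, failover_remaining)
            if failover_remaining == 0:
                return tick
    return None


# Quiet network: plenty of spare CPU, so the failover job runs immediately.
quiet = run_scheduler(ticks=100, budget_per_tick=10, storm_load=3)

# Storm: high-priority load exceeds the budget, so the failover job starves.
storm = run_scheduler(ticks=100, budget_per_tick=10, storm_load=15)

print("quiet network, failover finished at tick:", quiet)
print("storm, failover finished at tick:", storm)
```

In this model the failover job completes on the first tick when the network is quiet and never completes during the storm, which is exactly the moment it is needed. Real routers are messier than this, but the shape of the problem is the same.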
I don’t feel in control of this process and would rather rely on Spanning Tree or MPLS LDP to reconverge in this sort of event rather than a subsystem that doesn’t seem to be very important to Cisco.
There are some mitigations, such as implementing CPU Hog protection, Control Plane Policing, Event Damping, Syslog rate limiting etc., but I don’t feel comfortable about these processes at all. They require strong configuration discipline within the operational team, and that doesn’t normally happen as a recurrent activity. I might get good discipline for a year or so, but long term? Not so likely, eh?
I like the idea of Embedded Event Manager, and the possibilities that it offers for flexibility and enhanced capabilities. The question is whether EEM is ready for highly critical services, or only suitable for some twiddling around the edge of my network where it’s not so vital.
I attended a Cisco EEM session at Networkers in 2009 and was mightily impressed. But some comments during the session indicated that Cisco doesn’t take EEM seriously. If Cisco can’t, then it’s hard for me to put it into the core of my network.
So, for now, EEM can’t be trusted. Why and how Cisco put it into these Design Guides as a Validated Solution is a mystery, because I can’t see how it’s reliable.