I take the view that many people don’t appreciate Spanning Tree Protocol for its unique ability. It’s certainly a protocol designed in a different time and for different reasons. Today, STP has scalability problems, and they are well explained in Ivan Pepelnjak’s Transparent Bridging (aka L2 Switching) Scalability Issues just this week.
There are very few mitigation techniques to solve the BUM (broadcast, unknown unicast and multicast) problem, and some of the current STP optimisations will break or fail unexpectedly in large East/West network designs. For example, the use of Port Fast [1] means that some traffic loops can occur before the loop is detected via BPDUs and the ports are shut down. It’s not common, but it can happen, and in very fast networks it can get out of control and cause the usual STP meltdown. Thus, we still need to address the limits of spanning tree as a technology.
The TRILL Effect
While there are a number of advantages to TRILL, the short-term gain is to reduce the impact of the STP domains. If you can agree that a single, very large STP domain is a problem, then you should also agree that several smaller STP domains would be an improvement. Let’s assume that a typical network would look something like this. We have a core of six switches with a sample of six access layer switches. One pair of the core switches are the root switches, and the cabling looks approximately like this:
Let’s replace STP in the core with TRILL. Now that we have a loop-free core, we can choose to create a full mesh (or a partial mesh, according to your needs), so the cabling between switches can be shown like so:
Let’s add back the Access Layer switches to each of the TRILL core switches, and map the STP domains:
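As a rough check on what a full mesh means for cabling, here is a minimal Python sketch that enumerates the links between the six core switches. The switch names are hypothetical placeholders, not taken from the diagram.

```python
from itertools import combinations


def full_mesh_links(switches):
    """Return one link per unordered pair of switches."""
    return list(combinations(switches, 2))


# Hypothetical names for the six core switches in the example network.
core = [f"core{i}" for i in range(1, 7)]
links = full_mesh_links(core)

# A full mesh of n switches needs n * (n - 1) / 2 links.
print(len(links))  # 15
```

Fifteen inter-switch links for six switches is one reason a partial mesh may suit some networks better.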
This stylised diagram shows how TRILL reduces the size of each STP domain. Remember that there is no routing in this discussion, only switching at Layer 2, so poor technologies like VMware vMotion and Microsoft’s NLB will still work even if they are not connected to the same STP domain.
Of course, there is one weakness here. Consider if someone connects two access switches in two different STP domains. It’s my understanding that TRILL will still handle this loop by interoperating with STP but I need to do some more research here before I could be confident about that.
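To make the domain mapping and the cross-connect concern concrete, here is a small Python sketch that treats the network as a graph and groups the access switches into STP domains. It rests on a stated assumption: the TRILL core switches do not relay BPDUs, so only a link between two non-TRILL switches extends an STP domain. The switch names and the single uplink per access switch are hypothetical. The sketch shows only that the two domains merge; it says nothing about how TRILL deals with the resulting loop through the core.

```python
from collections import defaultdict


def stp_domains(links, trill_switches):
    """Group non-TRILL switches into STP domains (connected components).

    Assumption: TRILL switches do not relay BPDUs, so only links between
    two non-TRILL switches join those switches into one STP domain.
    """
    neighbours = defaultdict(set)
    stp_switches = set()
    for a, b in links:
        for sw in (a, b):
            if sw not in trill_switches:
                stp_switches.add(sw)
        if a not in trill_switches and b not in trill_switches:
            neighbours[a].add(b)
            neighbours[b].add(a)

    domains, seen = [], set()
    for start in sorted(stp_switches):
        if start in seen:
            continue
        # Depth-first walk over STP-only links from this switch.
        domain, stack = set(), [start]
        while stack:
            sw = stack.pop()
            if sw in domain:
                continue
            domain.add(sw)
            stack.extend(neighbours[sw] - domain)
        seen |= domain
        domains.append(sorted(domain))
    return domains


# Hypothetical topology: six TRILL core switches, one access switch each.
core = {f"core{i}" for i in range(1, 7)}
uplinks = [(f"acc{i}", f"core{i}") for i in range(1, 7)]

before = stp_domains(uplinks, core)
# Someone cables two access switches together across domains.
after = stp_domains(uplinks + [("acc1", "acc2")], core)

print(len(before), len(after))  # 6 5
```

Under these assumptions, the one extra cable turns two small domains into a single larger one, which is exactly the situation that needs checking against real TRILL behaviour.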
Two Layer Model
The three layer switch model is well established using access / distribution / core layers. This was only necessary when silicon was slow and expensive. Today, two layers is more than enough to troubleshoot and maintain so don’t add more complexity, just keep it simple.
The EtherealMind View
It’s difficult to avoid STP meltdowns in certain scenarios but, with careful design and attention to detail on your STP enhancements, you can build a very safe L2 network. You can also mitigate the impact of this risk by creating smaller STP domains, and using TRILL / SPB in the core does exactly that.
Today, this design isn’t very practical because vendors want to charge a hefty premium for TRILL / SPB features. In my view, for almost all networks TRILL / SPB isn’t worth the price that vendors want to charge. Therefore I’d recommend waiting another year or two before committing. In the meantime, you can start discussion and planning around the future of your Data Centre or Campus LAN and how you can take advantage of Equal Cost Multipathing in your network core with TRILL / SPB.
I have nothing to disclose in this article. My full disclosure statement is here
[1] Because Port Fast assumes that no BPDUs will be received on the interface, the port moves to forwarding state immediately. If BPDUs are received, then the port will move to blocking state... usually, but not always, before a loop has paralysed the network.