L2 MultiPath Basic Design Differences

With all the talk about Layer 2 Multipath (L2MP) designs going on, it might be worth highlighting a fundamental change in the way people approach high density network design. It’s possible that this point has been lost somewhere in the discussion of protocols.

The Spanning Tree Protocol blocks looped paths, and in a typical network this means that bandwidth is unevenly distributed. Of course, we might use PVST or MST to provide a rough sharing of load by splitting the spanning tree preferences for different VLANs, but the design still doesn’t change overall. I’ve talked about this in East West and North South network designs. The basic point is that there is a LOT of bandwidth that is never evenly utilised, and that means wasted power, space and cooling (which costs more than the equipment itself).
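
To make the blocked-path point concrete, here is a rough sketch in Python. It assumes a made-up topology (two core, two distribution and four access switches, every switch dual-homed upward) and it stands in for STP with a plain breadth-first tree from a chosen root. Real STP elects the root and picks ports using bridge IDs and path costs, so treat the names and the numbers as illustration only.

```python
from collections import deque

def spanning_tree_links(links, root):
    """Return the links kept by a breadth-first tree rooted at `root`.

    A simplification: real STP uses bridge IDs and path costs, but the
    result has the same shape, one loop-free tree with every other link idle.
    """
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    visited = {root}
    tree = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                tree.add(frozenset((node, neighbour)))
                queue.append(neighbour)
    return tree

# Hypothetical topology, not taken from the diagrams in this post.
links = [("core1", "core2")]
links += [(d, c) for d in ("dist1", "dist2") for c in ("core1", "core2")]
links += [(f"acc{i}", d) for i in range(1, 5) for d in ("dist1", "dist2")]

forwarding = spanning_tree_links(links, "core1")
blocked = [link for link in links if frozenset(link) not in forwarding]

print(f"links in total:   {len(links)}")
print(f"links forwarding: {len(forwarding)}")
print(f"links blocked:    {len(blocked)}")
```

In this toy network six of the thirteen links carry no traffic at all. PVST or MST would move the blocked links around per VLAN, but any single tree still leaves them idle.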

For many years, Campus and Data Centre design has focused on the three layer hierarchy of Access/Distribution/Core. It looks something like this:

L2mp grumble 1

There are Core Switches at the top, Distribution in the middle and Access switches at the edge. Ok, it’s not perfect but you get the idea.

When you move to a Layer 2 MultiPath network, the purpose is to remove the unused switches in your network and increase the utilisation of all your assets. This can be most effectively done by getting rid of the distribution layer. Of course, you could use them as Access Switches like this:

L2MP and no distribution layer

In this rather simple (and overstated) example, you have moved from a network of twelve switches with only FOUR usable devices for connecting servers, desktops and printers to a network of twelve switches with EIGHT usable devices for connectivity.
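
The arithmetic behind that comparison is trivial, but it is worth writing down. A minimal sketch, where the function name is mine and the only inputs are the switch counts from the example above:

```python
def usable_fraction(total_switches, access_switches):
    """Share of the switches that actually connect end devices."""
    if total_switches <= 0:
        raise ValueError("total_switches must be positive")
    return access_switches / total_switches

three_tier = usable_fraction(12, 4)  # Access/Distribution/Core design
l2mp = usable_fraction(12, 8)        # distribution switches reused as access

print(f"three-tier: {three_tier:.0%} of switches connect end devices")
print(f"L2MP:       {l2mp:.0%} of switches connect end devices")
```

Same twelve boxes, twice the share doing useful edge work.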

That’s just one impact of L2MP in network design. Here are some more:

  • We don’t have to manually configure STP for redundant paths; multipathing is built into the protocol.
  • We need less equipment to do the same job.
  • We still get a lot more bandwidth at the end of the day.
  • The newer protocols take advantage of faster hardware (which STP cannot really do) and can converge around a failure in sub-second times.
  • We get better OAM support for features such as L2 Traceroute because they have been “baked in”.

This all means that there is significant momentum to move to these protocols, not only in the data center but also in the campus. It’s the end of the Distribution layer.


  • Daniel G

    Alternate title?
    Ding Dong the Distros Dead

  • Rob Horrigan

    Great Post. I look forward to Trill/FabricPath.

    Still… it will feel very odd cross-connecting TOR switches in a data center.

    • http://etherealmind.com Greg Ferro

      Today, I’m more looking forward to QFabric for proprietary solution instead of FabricPath, and SPB for standards based.

      But that might change by next week.


  • Jay E.

    This is extremely interesting. I echo Rob’s point.

    Greg, do you foresee TOR switches directly inter-connected at the access layer?

    While I do think “why not?” especially if a Server in Rack 1 needs to go to a Server in Rack 2, it definitely makes sense to bypass the Core, but does it make sense to maintain at least a 2 tier hierarchy for sanity and scale?

    Do you see enterprises going with a TOR that connects to more than 2 core switches, or will there be a scale-out of core switches b/c of the sheer quantity of racks in large data centers, with each TOR still connected back to 2 of the X number of Core switches?

    Any thoughts?

  • Jim

    The east-west aspects of ToR are interesting. I haven’t played with this yet, but it will be interesting to see how this works in practice. If we are not connecting edge switches together with 10Gb, I wonder if the typical path cost of a north-south jump will still be less than an east-west hop, particularly if you are using channels toward the lower layers. There could well be significant benefit in east-west, particularly to keep vmotions off the downlinks and to keep packets between tiered apps within a tier of switching.

    This stuff is getting interesting. It’s been a long time since we have had to think about the mechanics of pushing packets around the DC. With DCs running fabricpath/trill/otv/igps and lisp for external connectivity, it’s time to get our thinking caps back on.

    It’s going to be a fun few years.

  • Ryan Malayter

    I think the “core” is going to quickly go away, too, as soon as those assets depreciate. There is simply no appetite for buying dozens (hundreds?) of $500K “core” switches in the new cloud datacenter when everything else is standardized and low-cost (sorry Cisco shareholders). Architectures which utilize only “commodity” switches (48 port, 1U) to scale out with high bisection bandwidth will be enabled by L2 multipath standards. These architectures will quickly move out of the Google/Amazon/Microsoft/Yahoo DCs and into the corporate world. As with server virtualization, the economics of ditching the “networking mainframe” will be too compelling for enterprise architects to ignore.

    • http://etherealmind.com Etherealmind

      In my view, the price of core switches will remain high, but we will need fewer ports, e.g. a typical blade server can host 200 guest servers on 4 x 10GbE ports.

      The question is whether people will buy commodity switches instead of high feature/high function products from Cisco / Juniper. Who knows how this will shake out?