Explaining L2 Multipath in terms of North/South and East/West Bandwidth

In a number of Packet Pushers episodes, I’ve referred to data centre designs shifting from “North-South” designs to “East-West-North-South”. Let’s dig into this terminology a bit and see what it means.

Spanning Tree is always North / South

I’m reasonably confident that most people who read this will understand how a switched network uses spanning tree to create a tree topology.

[Figure: north-south-east-west-1]

It will look something like this: the core switches are configured to act as the ‘root’ of the spanning tree, and traffic flows from the core to the edge. More correctly, traffic always flows from edge to core to edge, always in a fixed direction. Because we tend to draw the core at the top of the diagram, with connections to the distribution and access layers running down the hierarchy, we see a ‘top to bottom’ or North-South distribution of data traffic flows.
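
To make the tree-building concrete, here is a minimal Python sketch of the idea (the switch names and links are invented for illustration): it runs a breadth-first search from the root switch and marks every link that is not on the resulting tree as blocked, which is exactly what spanning tree does to redundant paths.

```python
from collections import deque

# Hypothetical switched network: every link could carry traffic if active.
links = {
    ("core1", "core2"), ("core1", "dist1"), ("core1", "dist2"),
    ("core2", "dist1"), ("core2", "dist2"),
    ("dist1", "access1"), ("dist1", "access2"),
    ("dist2", "access3"), ("dist2", "access4"),
}

def spanning_tree(root, links):
    """BFS from the root; the tree links are the only forwarding links."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    visited, tree, queue = {root}, set(), deque([root])
    while queue:
        node = queue.popleft()
        for peer in neighbours[node]:
            if peer not in visited:
                visited.add(peer)
                tree.add((node, peer))
                queue.append(peer)
    return tree

tree = spanning_tree("core1", links)
blocked = {l for l in links if l not in tree and tuple(reversed(l)) not in tree}
print("forwarding:", sorted(tree))
print("blocked:   ", sorted(blocked))
# Every access switch now has exactly one path to every other switch, and
# that path always climbs towards the root -- i.e. North/South.
```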

Where this model fails is that traffic between servers on two different branches must cross the core of the network, as shown in this network diagram.

[Figure: north-south-east-west-2]

The Weakness is the Core Switch Interconnect

The challenge is that the connection between the core switches can become heavily overloaded, especially in networks where the server fanout is large, as commonly occurs in heavily virtualised networks. To some extent, this is a new problem. Previously, the core switches would be interconnected with an EtherChannel providing multi-gigabit connectivity, and more recently the introduction of 10GbE ports allowed further increases in core capacity.
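
A back-of-the-envelope sketch shows why the inter-core link is the weak point. All of the port counts and speeds below are assumptions for illustration, not measurements:

```python
# Hypothetical legacy design: two cores joined by a 4 x 1GbE EtherChannel.
core_interconnect_gbps = 4 * 1          # EtherChannel between the cores
servers_per_branch = 96                 # server fanout under each core
server_nic_gbps = 1

# Worst case: every server on branch A talks to a server on branch B,
# so all of that traffic must cross the core interconnect.
offered_load_gbps = servers_per_branch * server_nic_gbps
oversubscription = offered_load_gbps / core_interconnect_gbps
print(f"{offered_load_gbps} Gbps offered across a {core_interconnect_gbps} Gbps "
      f"link = {oversubscription:.0f}:1 oversubscribed")
# Moving the servers to 10GbE NICs multiplies the offered load by ten,
# while a 10GbE core interconnect only multiplies the capacity by 2.5.
```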

Now that servers connect at 10GbE, and storage data has moved onto the same network, sustained traffic flows have increased, and not just by twenty or fifty percent. Storage traffic (whether iSCSI, NFS or even FCoE) means that these designs won’t last much longer.

Currently, it’s conventional to locate the storage arrays close to the core network switches so as to reduce the load on the branches of the tree, which isn’t a bad strategy. But it doesn’t account for the East-West migration of virtual machines.

Layer 2 Multipath Switch Networking

Layer 2 Multipath (L2MP) refers to recent developments in data centre networks where the core switch can no longer handle the entire load. That is, if you have three hundred physical servers and each physical server hosts twenty virtual machines, then the gross data load, including storage traffic, will easily exceed the core interconnect. Hence we talk about the development of data centre models that support East-West traffic flows.
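
Putting rough numbers on that claim: the server and VM counts below come from the example above, while the per-VM traffic rates are assumptions for illustration:

```python
servers = 300
vms_per_server = 20
vm_lan_mbps = 20        # assumed average sustained LAN traffic per VM
vm_storage_mbps = 50    # assumed average storage (iSCSI/NFS/FCoE) traffic per VM

gross_load_gbps = servers * vms_per_server * (vm_lan_mbps + vm_storage_mbps) / 1000
print(f"gross sustained load: {gross_load_gbps:.0f} Gbps")  # 420 Gbps

# Even if only a quarter of that load crosses between branches, it dwarfs
# a 2 x 10GbE core interconnect -- hence the need for East/West capacity.
print(f"cross-branch estimate: {gross_load_gbps / 4:.0f} Gbps vs 20 Gbps interconnect")
```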

[Figure: north-south-east-west-3]

In this type of design, an L2MP core, regardless of the type (Big Brother or Borg style), means that bandwidth does not choke at any single point in the network. The network not only supports the traditional North/South bandwidth alignment we have today, which imposes artificial limits on where we can locate and distribute servers inside existing data centre networks; it can now also provide East/West bandwidth to support loads that are dynamically moved around the data centre, with far less concern for the choke points that exist in legacy designs.

This especially applies to converged networks, where storage data creates new loads that increase the sustained utilisation of the Ethernet network.
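
The mechanism that spreads this load is equal-cost multipathing: each flow is hashed onto one of the parallel core paths, so every packet of a flow follows the same path (avoiding reordering) while aggregate bandwidth scales with the number of paths. A minimal sketch, with invented flow tuples and a generic hash standing in for whatever a real switch ASIC uses:

```python
import hashlib
from collections import Counter

paths = ["spine1", "spine2", "spine3", "spine4"]  # four equal-cost core paths

def pick_path(src, dst, src_port, dst_port):
    """Hash the flow identifier so every packet of a flow takes the same
    path (no reordering), while different flows spread across all paths."""
    key = f"{src}|{dst}|{src_port}|{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return paths[digest % len(paths)]

# Simulate 10,000 flows between two racks.
spread = Counter(
    pick_path(f"10.0.1.{i % 250}", f"10.0.2.{(i * 7) % 250}", 32768 + i, 80)
    for i in range(10_000)
)
print(spread)  # roughly 2,500 flows per path: bandwidth scales with path count
```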

Scale

Also, because hot spots can develop in the network core as traffic loads migrate around the network edge, L2MP allows additional connections to be added as needed. Note that adding a link does not carry the potential service impact and risk profile that changing a spanning tree topology presents. The network therefore becomes more flexible (or less “crystalline”, to use my term).

[Figure: north-south-east-west-4]
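
To see why adding links is low-risk in an L2MP fabric, compare how usable capacity responds in each model; the figures below are illustrative only:

```python
def usable_core_capacity(parallel_links, gbps_per_link, multipath):
    """Spanning tree blocks all but one parallel path (unless the links are
    bundled into a single LAG); L2MP load-balances across all of them."""
    active = parallel_links if multipath else 1
    return active * gbps_per_link

for n in (1, 2, 4, 8):
    stp = usable_core_capacity(n, 10, multipath=False)
    l2mp = usable_core_capacity(n, 10, multipath=True)
    print(f"{n} x 10GbE links -> STP: {stp} Gbps, L2MP: {l2mp} Gbps")
# With STP, extra links add redundancy only; with L2MP, each new link adds
# bandwidth, and adding it does not force a spanning tree recalculation.
```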

Note that the terms Borg and Big Brother are fully described in Ivan Pepelnjak’s blog post: http://blog.ioshints.info/2011/03/data-center-fabric-architectures.html

The EtherealMind View

It’s worth noting that these changes are key to successfully addressing the networking requirements of virtualisation. Hopefully this helps to explain some of the reasons that the new fabric switching architectures from Juniper and Cisco are important.

Bisectional Bandwidth

It’s worth noting that this problem is also related to the topic of bisectional bandwidth: the measurement of server-to-server bandwidth as a function of the architecture. I wrote about this in this blog post: http://etherealmind.com/bisectional-bandwidth-l2mp-trill-bridges-design-value/
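
As a rough worked example (all topology sizes assumed): bisectional bandwidth is the capacity crossing a worst-case cut that splits the servers into two halves, and a leaf-spine fabric multiplies it by the number of spines:

```python
def tree_bisection_gbps(core_interconnect_gbps):
    # Classic tree: cutting between the two cores splits the network in
    # half, and only the core interconnect crosses the cut.
    return core_interconnect_gbps

def leaf_spine_bisection_gbps(leaves, spines, uplink_gbps):
    # Leaf-spine fabric: split the leaves in half; every leaf in one half
    # reaches the other half through one uplink per spine.
    return (leaves // 2) * spines * uplink_gbps

print("tree:      ", tree_bisection_gbps(20), "Gbps")              # 2 x 10GbE core link
print("leaf-spine:", leaf_spine_bisection_gbps(8, 4, 10), "Gbps")  # 160 Gbps
```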

  • Markku Leinio

    Every time I see an L2MP cloud I ask myself: don’t they have any firewalls at all, just a plain flat L2 network with hundreds of servers talking to each other? In the systems I deal with there are always firewalls somewhere, and those devices and the physical network near them become important in the traffic flow designs. Thus the idea of having the L2 core might not be so bad.

    Markku

    • Greg Ferro (http://etherealmind.com)

      Good question. The answer is to use virtual firewalls that are part of the hypervisor platform. For VMware there is vShield, and Cisco has, or will soon have, a virtual version of the ASA.

      These firewalls are attached to the instance and move with the VM as it moves around the data center.

      • Markku Leinio

        Virtual per-VM firewalls kind of make sense if all the systems are virtualized. Since you presumably still need protection for non-VM traffic as well, is the recommended practice then to deploy a VM just as a firewall for the other traffic (i.e. using the same tool to maintain the policies), or to use some other hardware firewall platform?

        Could be a policy editing nightmare if you need to keep two separate firewall policies.

        Are the VM firewall systems already deployed somewhere, or are they still just promiseware?

        I’m not really a firewall guy; my usual worry is feeding the firewall switches or interfaces with meaningful IP data = DC LAN ;-) These VM firewall things still sound interesting.

        Markku

        • Simon Crosby

          Virtual firewalls will migrate into the NIC as programmable per-flow ACLs (including stateful processing) in the next year or so. See openvswitch.org for details…

          Simon

          • bradd

            Wow, is this post really from Citrix’s Simon Crosby????

    • Davide La Valle

      I agree with you. The networks I deal with have lots of VLANs and firewalls between them, so L2MP is not an issue, and most of the traffic flows N-S instead of E-W. My 2 cents.
      Davide

  • Brook Reams (http://brocade.com)

    Greg,

    Good post on the impact of application traffic patterns (applications being what networks support) on the network architecture.

    You will find that in addition to Cisco (FabricPath) and Juniper (QFabric), Brocade also provides a solution (VCS Technology with Ethernet Fabric). Each vendor has a different design for the L2MP capability, but all seem focused on improving the network’s ability to support north-south-east-west traffic.

    BTW, in the interest of full disclosure, I work for Brocade.

    All the best.

  • Omar Baceski

    As I work for a service provider, my opinion may be biased, but: haven’t we abandoned all these gargantuan L2 designs back in the ’90s? I mean, with the price of L3 silicon these days, doesn’t it make more sense to use layer 3 switches and build routed networks? They DO optimal routing by default ;)

    just my 2 cents

    • Ebulating

      I think this is the million dollar question. This level of complexity should really be at layer 3.

  • Guest

    Nice article… however: “It will look something like this where the sore switches are configured to act as the ‘root’ of the spanning tree,”
    …you mean core switched switched rather than sore switches.

    • Guest

      LOL, I made a typo while copying and pasting stuff around, but you get the point… Love the podcast btw

  • disqus_Ve6KthG2my

    I have a question, and it may sound stupid: how does the multipath L2 switch avoid a broadcast storm? Isn’t the basic idea behind the tree to be able to do that?

    • Greg Ferro (http://etherealmind.com)

      The purpose of L2 ECMP is to increase bandwidth. Other technologies like LAG & MLAG do the same thing, but only between a couple of chassis at most. TRILL allows multi-chassis multipathing.

      It doesn’t solve the BUM problem, nor does it attempt to. So far, rate controls on BUM traffic or reducing the size of the broadcast domain are the only answers.
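
      For anyone curious what a rate control on BUM traffic looks like mechanically, here is a minimal token-bucket sketch in Python (the rates are invented; real switches implement this in hardware, per port):

      ```python
      import time

      class BumRateLimiter:
          """Token bucket: broadcast/unknown-unicast/multicast frames are
          only forwarded while tokens remain; excess BUM traffic is dropped,
          so a storm cannot consume the whole fabric."""
          def __init__(self, rate_mbps, burst_mbit):
              self.rate = rate_mbps          # refill rate, Mbit per second
              self.tokens = burst_mbit       # bucket depth, Mbit
              self.burst = burst_mbit
              self.last = time.monotonic()

          def allow(self, frame_bits):
              now = time.monotonic()
              self.tokens = min(self.burst,
                                self.tokens + (now - self.last) * self.rate)
              self.last = now
              frame_mbit = frame_bits / 1e6
              if self.tokens >= frame_mbit:
                  self.tokens -= frame_mbit
                  return True    # forward the BUM frame
              return False       # drop: storm suppressed

      limiter = BumRateLimiter(rate_mbps=100, burst_mbit=10)
      dropped = sum(not limiter.allow(1500 * 8) for _ in range(100_000))
      print(f"dropped {dropped} of 100000 storm frames")
      ```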