In a number of Packet Pushers episodes, I’ve been referring to the nature of data centre designs shifting from “North-South” type designs to “East-West-North-South”. Let’s dig into this terminology a bit.
Spanning Tree is always North / South
I’m reasonably confident that most people who read this will comprehend how a switching network will use spanning tree to create a TREE.
It will look something like this, where the core switches are configured to act as the ‘root’ of the spanning tree and traffic flows from the core to the edge. More correctly, traffic always flows from edge to core to edge, and always in a fixed direction. Because we tend to draw the core at the top of the diagram, and show connections to the distribution and access layers as connecting down the hierarchy, we tend to see a ‘top to bottom’ or north-south distribution of data traffic flows.
Where this model fails is that traffic between servers on two different branches must cross the core of the network, as shown in this network diagram.
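To make the edge-to-core-to-edge path concrete, here is a minimal Python sketch. The topology and all device names (core1, dist1, access1, server-a and so on) are hypothetical and only for illustration. It models the spanning tree as a child-to-parent map with core1 as the root, then walks the only path the tree allows between two servers on different branches.

```python
# Hypothetical spanning tree: each device maps to its parent (towards the root).
# core1 is assumed to be the root bridge; core1-core2 is the core interconnect.
PARENT = {
    "core2": "core1",
    "dist1": "core1",
    "dist2": "core2",
    "access1": "dist1",
    "access2": "dist2",
    "server-a": "access1",
    "server-b": "access2",
}


def path_to_root(node):
    """Return the list of devices from node up to the root of the tree."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_path(src, dst):
    """Return the single path a spanning tree allows between src and dst."""
    up = path_to_root(src)
    down = path_to_root(dst)
    down_set = set(down)
    for i, node in enumerate(up):
        if node in down_set:
            # node is the first common ancestor: go up to it, then back down.
            j = down.index(node)
            return up[: i + 1] + down[:j][::-1]
    raise ValueError("src and dst are not in the same tree")


if __name__ == "__main__":
    print(" -> ".join(tree_path("server-a", "server-b")))
```

In this sketch, traffic between the two servers climbs to core1, crosses to core2 and descends again, so every flow between the two branches lands on the core interconnect.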
The Weakness is the Core Switch Interconnect
The challenge with this is that the connection between the core switches can become heavily overloaded, especially in networks where the server fanout is large, which commonly occurs in heavily virtualised networks. To some extent, this is a new problem. Previously, the core switches would be interconnected with an EtherChannel that provided multi-gigabit connectivity, and more recently the introduction of 10GbE ports allowed for further increases in core capacity.
Now that servers are connected at 10GbE and storage data has been added to the network, sustained traffic flows have increased, and not just by twenty or fifty percent. Carrying storage data (whether iSCSI, NFS or even FCoE) means that these designs won’t last much longer.
Currently, it’s convention to locate the storage arrays close to the core network switches so as to reduce the workload in the branches of the tree, which isn’t a bad strategy. But this doesn’t account for the East-West migration of virtual machines.
Layer 2 Multipath Switch Networking
Layer 2 Multipath (L2MP) refers to the recent developments in data centre networks that address the case where the core switch can no longer handle all the load. For example, if you have three hundred physical servers and each physical server hosts twenty virtual machines, then the gross data load, including storage traffic, will easily exceed the capacity of the core interconnect. This is why we talk about developing data centre models that support east-west traffic flows.
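A rough back-of-the-envelope calculation shows how quickly this adds up. The server and virtual machine counts come from the example above. The per-VM traffic rate, the fraction of traffic that crosses the core interconnect, and the interconnect size are all assumptions picked purely for illustration, not measurements.

```python
def interconnect_load_gbps(servers, vms_per_server, mbps_per_vm, cross_core_fraction):
    """Estimate the sustained load offered to the core interconnect, in Gbps."""
    total_mbps = servers * vms_per_server * mbps_per_vm
    return total_mbps * cross_core_fraction / 1000


# From the text: 300 physical servers, 20 virtual machines each.
SERVERS = 300
VMS_PER_SERVER = 20

# Assumptions for illustration only.
MBPS_PER_VM = 50             # average sustained traffic per VM, storage included
CROSS_CORE_FRACTION = 0.25   # share of traffic that crosses the core interconnect
INTERCONNECT_GBPS = 40       # e.g. a 4 x 10GbE EtherChannel between core switches

if __name__ == "__main__":
    load = interconnect_load_gbps(
        SERVERS, VMS_PER_SERVER, MBPS_PER_VM, CROSS_CORE_FRACTION
    )
    print(f"Virtual machines: {SERVERS * VMS_PER_SERVER}")
    print(f"Load on interconnect: {load} Gbps of {INTERCONNECT_GBPS} Gbps")
    print(f"Oversubscribed: {load > INTERCONNECT_GBPS}")
```

With these assumed numbers, 6,000 virtual machines offer 75 Gbps to a 40 Gbps interconnect. Different assumptions will move the figures, but the load scales with the number of virtual machines while the interconnect stays fixed.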
In this type of design, we can see that an L2MP core, regardless of type (Big Brother or Borg style), means that bandwidth does not choke around any specific point in the network. The network still supports the traditional North/South bandwidth alignment that we have today, which creates artificial limits on how we can locate and distribute servers inside existing data centre networks. In addition, we are now able to provide East/West bandwidth to support loads that are dynamically moved around the data centre, with less concern for the key choke points that exist in legacy designs.
This especially applies to converged networks, where the storage data creates new loads that increase the sustained usage of the Ethernet network.
Also, because hot spots can appear in the network core as traffic loads migrate around the network edge points, L2MP allows for additional connections to be added as needed. Note that adding connections does not carry the potential service impact and risk profile that making changes to spanning tree presents. Therefore, the network becomes more flexible (or less “crystalline”, which is the term that I use).
Note that the terms Borg and Big Brother are fully described in this blog post from Ivan Pepelnjak: http://blog.ioshints.info/2011/03/data-center-fabric-architectures.html
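The difference between a single tree path and multipath forwarding can be sketched by counting equal-cost shortest paths between two edge switches. This is a simplified model, not how any particular L2MP implementation works. The topology, the device names and the helper functions `count_shortest_paths` and `fabric_links` are hypothetical.

```python
from collections import deque


def count_shortest_paths(links, src, dst):
    """Count the equal-cost shortest paths between src and dst using BFS."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {src: 0}
    ways = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                # First time we reach this device: record its hop count.
                distance[neighbour] = distance[node] + 1
                ways[neighbour] = ways[node]
                queue.append(neighbour)
            elif distance[neighbour] == distance[node] + 1:
                # Another path of the same length: add its contribution.
                ways[neighbour] += ways[node]
    return ways.get(dst, 0)


def fabric_links(edges, cores):
    """Connect every edge switch to every core switch (hypothetical fabric)."""
    return [(edge, core) for edge in edges for core in cores]


if __name__ == "__main__":
    edges = ["edge1", "edge2"]
    tree = fabric_links(edges, ["core1"])
    fabric = fabric_links(edges, ["core1", "core2", "core3", "core4"])
    grown = fabric + fabric_links(edges, ["core5"])
    print(count_shortest_paths(tree, "edge1", "edge2"))
    print(count_shortest_paths(fabric, "edge1", "edge2"))
    print(count_shortest_paths(grown, "edge1", "edge2"))
```

In this model, the tree offers one usable path between the two edge switches, the four-core fabric offers four, and connecting a fifth core switch simply adds a fifth path without disturbing the existing ones.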
The EtherealMind View
It’s worth noting that these changes are key to successfully addressing the networking requirements for virtualisation. Hopefully this helps to explain some of the reasons why the new switch architectures from Juniper and Cisco that relate to Fabric networking are important.
This problem is also related to the topic of Bisectional Bandwidth and the measurement of server-to-server bandwidth as a function of the architecture. I wrote about this in this blog post: http://etherealmind.com/bisectional-bandwidth-l2mp-trill-bridges-design-value/