Integrating Overlay Networking and the Physical Network

The next topic to consider in a technology discussion around Overlay Networking is whether the overlay network needs to be integrated, in some way, with the physical network. To recap, an overlay network uses modern tunnelling protocols to connect software Network Agents in Hypervisors or Operating Systems. Today, these Network Agents are little more than “robot patch panels” (you probably call them vSwitches), but in the near future these agents will be complete networking devices performing switching, routing & filtering inside your server.

Multiple Overlays in the Tunnel Fabric


In previous articles, I discussed how Overlay Networking results in more networking that delivers better results by abstracting away the problems of the physical network – Overlay Networking is More & Better while Ditching the Toxic Sludge. In my second article, Introduction to How Overlay Networking & Tunnel Fabrics Work, I attempted to provide insight into how Overlay Networking technology works & why it is valuable to business: it supports safer & more rapid configuration changes with very low change risk.

What Matters Next

Overlay Networking is still early in the technology adoption cycle but already deployed by Cloud Companies such as Microsoft Azure, Rackspace & Joyent. It’s not just an idea, it’s a foundational technology that means networks can really support multi-tenancy for hosting providers. In the near future, Overlay Networks will completely change the way Enterprise Security operates by providing a way to create & manage unlimited network zones from a physical network without the hassle of MPLS or the expense of firewalls. This is a huge change in Infrastructure Security and will obsolete many firewall products.

Overlay Networking is a “real” technology that appears to have plenty of momentum in the market, including new players like VMware, which suggests that it is past the “Works in PowerPoint” phase. The real value of Overlay Networking is that networking moves away from “dumb connectivity” & into “network services”. Today, networking is a “cost centre” that is a must-have to provide business services, in the same way that power to a data centre is a must-have.

Trust is in Short Supply

What matters for customers is that the Tunnel Fabric created by Overlay Networking must be reliable & trustworthy. For the last thirty years the networking industry has been overcoming the physical challenges of forwarding performance, density, & pricing. I take the view that these problems have largely been solved & point out that networking vendors are looking for revenue growth by adding non-core networking functions to existing devices.

The result is that network engineers are well practised in treating hardware as the key criterion when evaluating networking solutions. In Overlay Networking, the physical network becomes less important from a service delivery perspective because business value is derived from the overlay through service creation & management. This conflict in perspective is creating cognitive dissonance among many networking professionals who are not ready to adapt to the idea that networking can be a software-centric technology. More importantly, hardware networking requires a lot of attention to maintain skills & competency. This type of “chin down” work doesn’t leave much energy for “chin up” activities such as looking for other, better technologies.

The Case for Integration & Dependency

What happens to the Overlay Network if the physical network is experiencing packet loss, jitter or some form of “negative service” [1] condition? The “Case for Integration” states that the Overlay Network (of whatever form) must have some awareness of the physical network condition. In normal operation, when a link in an ECMP path fails, the routing flaps in an L3 core & recovers the path. Or in a Layer 2 MLAG network, there might be congestion on an uplink due to failure of an Ethernet connection in the bundle, or congestion in a single link in the path.

An overlay network tunnel has no state in the physical network. The physical network has no awareness or visibility of the Overlay Flow that passes over the network. VXLAN is just another UDP/IP packet heading on down the network & any packet loss requires a retransmission.
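To see why the physical network treats the overlay as opaque, it helps to look at what a VXLAN packet actually is on the wire: an 8-byte header inside an ordinary UDP datagram (destination port 4789). A minimal sketch of building that header per RFC 7348, using only the standard library:

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348).

    The flags byte 0x08 sets the I bit (VNI is valid); the 24-bit
    VNI sits above a trailing reserved byte in the second word.
    """
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    flags_word = 0x08 << 24      # I flag set, all other bits reserved (zero)
    return struct.pack("!II", flags_word, vni << 8)

hdr = vxlan_header(5000)
print(hdr.hex())  # 0800000000138800
```

Everything after that header is invisible to the underlay: the fabric forwards a UDP/IP packet and nothing more, which is exactly why it can neither see nor protect the tenant flow inside.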

If this vision of the network fills you with apocalyptic fear, then some form of integration between the physical network & the overlay network is needed to make you happy. Consider the following diagram that represents two VMs communicating via the Overlay Network & the potential path through the Underlay Network.

Overlay Tunnels in the Underlay Network

Overlay Tunnels in the Underlay Network

It’s reasonable to assume that another Overlay Network would have a different path through the underlay as shown here:


Second path for an alternate network overlay

However, a network is not simple or necessarily easily defined. A network containing only hypervisors (VMware vCloud, OpenStack etc.) could be fully managed by a controller. But in more common networks, there may be hundreds of traffic sources in the Underlay Network that generate data outside the overlay & create a more complex system.

The various technologies used for Ethernet Fabrics, such as Clos-based Leaf/Spine designs for L2 or L3, MLAG or even STP in the Data Centre LAN, mean that data flows are load balanced across available paths using different mechanisms. Predicting traffic paths is not practical (although technologies like Juniper QFabric are an attempt in this direction).
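The unpredictability comes from how these fabrics pick paths. A rough illustration of the ECMP idea: hash the flow’s 5-tuple & use the result to choose one of the equal-cost links. Real switches use vendor-specific hardware hash functions, so this is an illustrative sketch only, not any actual vendor’s algorithm:

```python
import hashlib

def ecmp_pick(src_ip: str, dst_ip: str, proto: int,
              src_port: int, dst_port: int, num_paths: int) -> int:
    """Deterministically map one flow's 5-tuple to one of num_paths links."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_paths

# Two flows between the same pair of hosts can land on different links,
# which is why an operator cannot easily predict where a tunnel will run:
print(ecmp_pick("10.0.0.1", "10.0.0.2", 6, 49152, 4789, 4))
print(ecmp_pick("10.0.0.1", "10.0.0.2", 6, 49153, 4789, 4))
```

The path is stable for any one flow (the hash is deterministic) but effectively opaque from the outside, since changing a single source port can move the flow to a different link.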

It’s fairly easy to develop concerns about these problems even when those concerns are completely without basis in fact. Don’t let vendors or colleagues attempt to undermine the future with Fear, Uncertainty and Doubt. These are exactly the same technical events that happen in your network today.

The IP protocol is designed to drop packets as needed, & applications are expected to cope with networks that drop data. The point of careful design is to keep losses to a minimal percentage at an acceptable rate over time.

There are exceptional cases where dropping packets has a large negative impact, such as big data clusters or high-performance computing, but those usually mean a dedicated network is deployed to handle the requirement. Building an entire network to solve a niche problem is always a waste of time and money. Consider the lessons learned from the FibreChannel over Ethernet protocol: building the entire data centre to support a niche use case has largely failed.

How Could Network/Overlay Integration Work?

In a controller-based Overlay Network, it would be possible to implement an application that “listens” to the network itself. An OSPF software agent on the controller, configured as a neighbour in “listen” mode, can gather a comprehensive view of IP paths and network changes from the OSPF state database. Similarly, a Spanning Tree listener could receive Topology Change Notifications & gain insight into Ethernet stability (though not much about the network graph).
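An illustrative sketch of what such a controller-side “listen” application might look like: it consumes router LSAs from a passive OSPF neighbour and keeps a live adjacency view it can query for reachability. The event feed and field names (`on_router_lsa`, `neighbor`, `cost`) are hypothetical placeholders, not a real OSPF implementation:

```python
from collections import defaultdict

class OspfTopologyListener:
    """Maintain a live view of IP topology from passively received LSAs."""

    def __init__(self):
        # router-id -> set of (neighbour-id, cost) adjacencies
        self.adjacencies = defaultdict(set)

    def on_router_lsa(self, router_id, links):
        """Replace this router's adjacency view each time a fresh LSA arrives."""
        self.adjacencies[router_id] = {(l["neighbor"], l["cost"]) for l in links}

    def reachable_from(self, router_id):
        """Walk the adjacency graph to list routers currently reachable."""
        seen, stack = set(), [router_id]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(n for n, _ in self.adjacencies.get(node, ()))
        return seen

listener = OspfTopologyListener()
listener.on_router_lsa("1.1.1.1", [{"neighbor": "2.2.2.2", "cost": 10}])
listener.on_router_lsa("2.2.2.2", [{"neighbor": "3.3.3.3", "cost": 10}])
print(sorted(listener.reachable_from("1.1.1.1")))
```

The overlay controller could consult this view before placing tunnels, which is the essence of the “feedback loop” discussed below: path state flows up from the routing protocol into overlay decisions.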

I think that a more comprehensive approach would be to create software applications on a controller that would gather information from each device in the physical network. A starting point would be to use SNMP for basic information. A vendor might develop a proprietary API [2] that would expose a wide range of other information. Collate that information into a network graph, invest a large sum into software development, & then you will have a Network Controller system that is tightly integrated into the Network Fabric. Then have the Network Controller also interface to the Network Agents. Like this:
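A hedged sketch of that collector idea: poll each fabric device for basic interface state and fold the results into a simple health graph the overlay controller can consult. The OIDs are the real IF-MIB identifiers, but `snmp_get()` here is a stand-in stub; a real system would use an SNMP library or a vendor API such as onePK:

```python
import time

# Standard IF-MIB object identifiers (the polling itself is mocked below)
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"   # ifOperStatus: 1 = up
IF_IN_ERRORS   = "1.3.6.1.2.1.2.2.1.14"  # ifInErrors counter

def snmp_get(device, oid):
    """Placeholder: fetch one OID from a device. Stubbed for illustration."""
    return {IF_OPER_STATUS: 1, IF_IN_ERRORS: 0}[oid]

def poll_fabric(devices):
    """Build a per-device health snapshot for the Network Controller."""
    graph = {}
    for device in devices:
        graph[device] = {
            "up": snmp_get(device, IF_OPER_STATUS) == 1,
            "in_errors": snmp_get(device, IF_IN_ERRORS),
            "polled_at": time.time(),
        }
    return graph

snapshot = poll_fabric(["spine-1", "leaf-1", "leaf-2"])
print(all(state["up"] for state in snapshot.values()))  # True
```

The hard part is not this polling loop but the collation into an accurate, timely network graph; that is where the large software investment mentioned above goes.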

Network Controller Integration


To integrate an overlay would imply that there is a “feedback loop” between the physical network & overlay network in some form or other.

Another option is to build an Ethernet Fabric that is a single network: something like what Juniper QFabric has done to create a comprehensive end-to-end Ethernet system that acts like a communist dictator for a socialist network.

Adding the QF/Director

The Case For Abstraction & Isolation

The IP protocol has proven itself to be robust, reliable & workable for a wide range of solutions. Twenty years ago, all telephones used dedicated circuits to guarantee the quality of the call. Today, I use Skype & rarely have problems with a call. An Overlay Network built on tunnels will work without any interaction with the physical network, provided there is enough bandwidth in the underlying network.

And if there is some packet loss, the Network Agents will resend the data.

Feedback Loops in Networking

Feedback loop – the complete causal path that leads from the initial detection of the gap to the subsequent modification of the gap. – Wikipedia

Feedback loops are fundamental to all engineering processes. In engineering, a feedback loop is a process whereby a system in operation is able to signal a change in state when it deviates from a normal condition. In the case of a steam engine, the feedback loop was created by a physical governor that could detect the speed of the flywheel through the centrifugal force of a fly-ball governor. If the engine overran a known speed, the fly-ball would regulate the speed of the engine. That’s the capability of a feedback loop.

In networking, there are no feedback systems to monitor the steady state of the physical network & provide information to the overlay network. In routing, protocols like OSPF & BGP act to provide feedback on available paths in the network but provide no information about the quality of the path. Equally, DiffServ QOS marking provides no feedback to the source application or the network that the quality of the path is compliant or non-compliant. In fact, the network itself has no way of measuring the concept of “quality” or packet loss. Implementing such features would require a network that starts to look like a FibreChannel storage fabric with specific protocols that signal every which way.

In the last 30 years of networking history, there have been many networking technologies that implemented tight feedback loops into their designs. The oldest example was the analogue telephone call: when current flow in the local loop was lost, the end-to-end circuit was dropped by the exchange. Early digital networks used technologies like ISDN, which included a signalling channel to allow communication through the network. If any part of the connected circuit failed, the entire call state was torn down.

This was followed by more ambitious examples such as SMDS, ATM & Frame Relay. All of these included state signalling about the end-to-end circuit. All of them have failed miserably, since they don’t scale well and require exceptional levels of software quality (which didn’t happen).

The EtherealMind View

It’s my experience that Bandwidth Always Wins. For the last 20 years, it has consistently been cheaper & easier to over-provision bandwidth by large factors than to attempt to solve the “quality problem”. Metro Ethernet provided more bandwidth at a cheaper price. Fibre Channel couldn’t scale past a hundred or so switches. ATM circuits needed too much memory to make them work. Frame Relay was not efficient compared to Ethernet.

There have been a large number of technologies in the last 20 years that attempted to solve the “network quality” issue and, in the end, more bandwidth was the cheapest and most reliable solution. Data Centre fabrics are running 10GbE today and 40GbE/100GbE are approaching practical pricing.

The idea of building an Ethernet Fabric that does something special doesn’t appear practical when considered in the light of history. I’m always hopeful that smart people can find new ways to solve challenges like this, but Controller-based Overlay Networks look to be the future. I can’t see how integration of the Overlay Network into the Ethernet Fabric would be successful.

Either way, it’s cool to be in networking right now. Lots of new stuff to consider.


  1. Yes, really. Organisations use words like this. It’s no joke.  ↩
  2. For example, Cisco’s onePK API exposes hundreds of functions on many of its operating systems that would provide comprehensive information.  ↩

Other Posts in A Series On The Same Topic

  1. Blessay: Overlay Networking, BFD And Integration with Physical Network (25th April 2014)
  2. ◎ Blessay: Overlay Networking Simplicity is Abstraction, Coupling and Integration (10th December 2013)
  3. Integrating Overlay Networking and the Physical Network (21st June 2013)
  4. ◎ Introduction to How Overlay Networking and Tunnel Fabrics Work (10th June 2013)
  5. ◎ Overlay Networking is More and Better while Ditching the Toxic Sludge. (7th June 2013)