In this blog post I’ll attempt to summarise Overlay Networking in a couple of paragraphs, to act as a reference for upcoming blog posts that discuss the nature of Tunnel Fabrics in Physical Network environments.
This article assumes that you have some exposure to networking, including topics like VXLAN, Ethernet Fabrics, Leaf/Spine (CLOS) and a bunch of technologies that are still early in market adoption but well accepted as the long-term future of networking. Your mileage may, of course, vary.
Hypervisors and Network Connections
Consider a number of hypervisors connected to the network[1] as shown in the diagram here:
In current network best practice, the physical network sets the trust boundary for packets in the switch at the Top of Rack. Networks do not trust servers as a canonical source because the server is outside the administrative control of the network team.
Technologies such as QoS marking, traffic shaping, MPLS, traffic monitoring and NetFlow/sFlow are commonly deployed at the edge of the network. To some extent, this reduces the complexity of the network by distributing the load and configuration into smaller chunks, thus reducing the service impact of poor code or overload failure[2].
The virtual switch in the hypervisor today is not a switch or a network device of any sort. It’s more like a software controlled patch panel that connects the Network Adapters in the Virtual Server (vNIC) to the Physical NIC for network connectivity. These types of virtual switches are passive devices at best.
Upgrade the vSwitch to Network Device
Let’s change the function of the “virtual switch/patch panel” to look more like a network device. Instead of simply connecting two software end points inside the hypervisor memory, let’s take the vSwitch and make it a complete network device with routing, switching, QoS, flow management and a full configuration interface. I’ll call this upgraded vSwitch a Network Agent.
At this point, a Network Agent is not very useful. There are only a couple of physical connection points to the network via the physical network adapters, so routing or switching has limited value. We need something more to make use of this concept, and the solution is quite straightforward.
Consider the business impact here. The server is now part of the network. Network teams will have permission to enter the server and take ownership of the network connectivity and gain better end to end control. And Security will support this move to ensure the integrity of the network edge. You can expect the virtualization and server teams to resist this change in practice.
Sending packets & frames into the physical network for forwarding is current best practice. Anyone who works on a Data Centre network can easily identify the serious problems & limitations of current technology.
Spanning Tree remains a risky technology that is subject to unexpected bridge loops, which can cause the loss of entire Data Centres.
Traffic isolation and multi-tenant security are possible only in limited ways. Software virtualisation in core switches can create a few isolated “device instances” but for a network with hundreds of tenants there are no answers. MPLS is expensive to implement in hardware and complex to administer. It remains a niche technology for certain markets like Service Providers with extensive human infrastructure resources, but seems to have little relevance & very low adoption in the data centre.
IP Routing protocols are eventually consistent and indeterminate. They require enormous investments in engineering resources to accurately predict the behaviour of protocols like OSPF & BGP. Control and configuration of these protocols is often limited to physical control of network paths[3].
Most network engineers don’t regard these networking technologies as poor technology or even failures, but I certainly do. Network pathing is an unreliable process whose protocols are designed around eventual consistency, automatic discovery, no end-to-end validation, limited loop prevention and the calculation of just a single best path through a given network. In a data centre network, none of those assumptions are useful or relevant. The data centre is a tightly bound problem space where all conditions are tightly controlled and restrictions can be enforced within the walls of the physical location. Compare this with the design assumptions of a Wide Area Network, where there are few options for control in a highly distributed network.
The Data Centre network is, by its very nature, a very different problem space compared to a WAN or Campus network.
Network Agent with Tunnels is an Overlay
The Network Agent in the hypervisor is now able to act as a full network device, but it remains connected to a physical network that is change resistant and, through its use of distributed networking protocols, a single shared failure domain. We could therefore connect the Network Agents with tunnels using protocols like VXLAN or NVGRE, or whatever emerges from the IETF NVO3 working group.
These LAN Tunnel protocols are specifically designed to function well in a data centre network, unlike IPinIP or GRE. For example, VXLAN carries enough entropy in its outer UDP header (the source port is derived from a hash of the inner frame) to load balance effectively over a LACP bundle between two switches. Other protocols have their respective features.
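To make the entropy point concrete, here is a minimal sketch of how a Network Agent might build the 8-byte VXLAN header and choose the outer UDP source port. The header layout and destination port 4789 follow RFC 7348, which recommends deriving the source port from a hash of the inner packet. The function names, the use of CRC32 as the hash and the exact port range are my own assumptions for illustration, not any vendor’s implementation.

```python
import struct
import zlib

VXLAN_UDP_DST_PORT = 4789      # IANA-assigned VXLAN destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field carries a valid value


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def outer_udp_source_port(inner_frame: bytes) -> int:
    """Pick the outer UDP source port from a hash of the inner Ethernet header.

    Hypothetical helper: hashes the first 14 bytes of the inner frame
    (destination MAC, source MAC, EtherType) with CRC32 and maps the result
    into the dynamic port range 49152-65535. Frames of the same flow always
    get the same port; different flows tend to get different ports.
    """
    flow_hash = zlib.crc32(inner_frame[:14])
    return 49152 + (flow_hash % 16384)
```

Because the physical switches hash on the outer headers, varying only the outer source port is enough to spread different inner flows across the links of a bundle while keeping each flow on a single link.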
The starting vision of the overlay network looks like the following:
For example, we could emulate a VLAN by forwarding traffic through tunnels that are associated with Virtual Machines (VMs), as in this diagram:
And the switching network path between two VMs looks something like this:
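As a rough sketch of that switching behaviour, the Network Agent can be modelled as a table that maps a segment and destination MAC address to the remote agent whose tunnel should carry the frame. The class and method names here are hypothetical, and flooding unknown destinations to every peer in the segment is one possible design I am assuming, not the only one.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class NetworkAgent:
    """Toy model of a Network Agent emulating VLAN-style switching over tunnels."""

    def __init__(self, local_ip: str) -> None:
        self.local_ip = local_ip
        # (segment id, VM MAC address) -> IP of the remote agent hosting that VM
        self.mac_table: Dict[Tuple[int, str], str] = {}
        # segment id -> IPs of the other agents with VMs in that segment
        self.segment_peers: Dict[int, Set[str]] = defaultdict(set)

    def join_segment(self, segment_id: int, peer_ip: str) -> None:
        """Record that a remote agent participates in a segment."""
        if peer_ip != self.local_ip:
            self.segment_peers[segment_id].add(peer_ip)

    def learn(self, segment_id: int, src_mac: str, via_ip: str) -> None:
        """Remember which tunnel a source MAC address was seen on."""
        self.mac_table[(segment_id, src_mac.lower())] = via_ip

    def next_hops(self, segment_id: int, dst_mac: str) -> List[str]:
        """Return the tunnel endpoints a frame should be sent to.

        A known destination uses exactly one tunnel. An unknown destination
        is flooded to every peer in the same segment and to no other segment.
        """
        known = self.mac_table.get((segment_id, dst_mac.lower()))
        if known is not None:
            return [known]
        return sorted(self.segment_peers.get(segment_id, set()))
```

The segment id plays the role of the VLAN number: two VMs can exchange frames only if their agents share a segment, regardless of how the physical network between them is laid out.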
A routed network path is equally simple. The Network Agent forwards each packet into the tunnel that offers the best path to the destination. Yes, the Network Agent is performing routing by selecting the tunnel that is the best path to the destination, just like a physical router.
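Tunnel selection can be sketched as an ordinary longest-prefix-match lookup where the “interfaces” are tunnels. This is a minimal illustration under my own assumptions: the class name, the tunnel names and the prefixes are all hypothetical, and a real agent would use a far more efficient data structure than a list scan.

```python
import ipaddress
from typing import List, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class TunnelRouteTable:
    """Toy routing table that maps destination prefixes to tunnels."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Network, str]] = []

    def add_route(self, prefix: str, tunnel: str) -> None:
        """Associate a destination prefix with the tunnel that reaches it."""
        self._routes.append((ipaddress.ip_network(prefix), tunnel))

    def select_tunnel(self, destination: str) -> Optional[str]:
        """Return the tunnel for the most specific matching prefix, if any."""
        addr = ipaddress.ip_address(destination)
        matches = [(net.prefixlen, tunnel)
                   for net, tunnel in self._routes if addr in net]
        if not matches:
            return None
        # Longest prefix wins, exactly as in a physical router.
        return max(matches, key=lambda match: match[0])[1]


table = TunnelRouteTable()
table.add_route("0.0.0.0/0", "tunnel-to-gateway")
table.add_route("10.1.0.0/16", "tunnel-to-agent-b")
table.add_route("10.1.2.0/24", "tunnel-to-agent-c")
```

With these example routes, a packet for 10.1.2.9 goes into the tunnel towards agent C, a packet for 10.1.9.9 goes towards agent B, and everything else follows the default route to the gateway.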
Abstraction from the Physical Network Allows Change
The collection of tunnel circuits between the Network Agents is sometimes called a “Tunnel Fabric”. The tunnel protocols may or may not have an awareness of the physical network, depending on the progression of technology. At the time of writing, it’s not clear whether the Tunnel Fabric should be integrated with the physical network devices so that the Tunnel Protocols are aware of the status of the underlay network.
But the most interesting feature is that the Overlay Network we have built is fully abstracted from the physical network. That is, the Network Agents can modify the configuration of the tunnels without any impact on the physical network, and without any interaction with it.
The Value of Software
One key factor is that the Network Agent is software that has no dependencies on hardware. In physical network devices, the software in the control & management plane is often limited by the silicon. By limited, I mean two things: the device operating system is limited to a product deployment of perhaps a million or so devices, and the software features are determined by the silicon in the switch.
Note: This especially applies to hardware switching but some routers are more flexible because their architectures rely on software for many features. For example, a Cisco Nexus 5500 is rigidly limited by its hardware design because the silicon was never designed to support routing, only switching.
By comparison, a Cisco ASR1000 has a completely different silicon architecture that is designed to be somewhat flexible. This isn’t a criticism. For these devices to perform packet processing at tens of gigabits per second with multiple packet streams, HQoS, etc. requires custom silicon.
A Network Agent does not need to handle the volume because the processing is distributed into every server. Instead of one pair of HA devices, we can scale horizontally since each additional server adds more forwarding capacity to the network.
Using software on the x86 platform is significantly more flexible and reliable by comparison. The x86 architecture is well understood, and programming languages like C or Java have excellent tool chains, unit testing and large pools of programmer expertise.
Performance Causes Problems
The metaphor I often use is that today’s physical network devices are like Formula 1 racing cars: vehicles that go fast & furious but require an expensive and specialist team of resources to keep them running and recover from the repeated crashes.
Putting networking into a hypervisor on an x86 is the equivalent of a family sedan for transport, which is cheap, simple, easy to service and available everywhere. With enough family sedans you can get a lot more done than with an F1 car, and at an acceptable price. Very, very few people actually need a car that can perform above the speed limit. Let’s face it, networking vendors like to sell “F1 performance” with reassuringly expensive pricing but the vast majority of customers only need family sedans.
I say this because many people believe that network devices must be custom silicon hardware. I think that this is no longer true. Intel has demonstrated that current generations of x86 hardware & software are capable of delivering forwarding performance of at least 20 Gbps and next generation will be more than 40 Gbps – effectively line rate performance for a server at load. You do not need a hardware network device for everything (only some things).
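As a back-of-envelope sketch of how this adds up when every server forwards its own traffic: the 20 Gbps default is the per-server figure quoted above, while the function name and the 40-server example are hypothetical numbers chosen only to show the arithmetic.

```python
def aggregate_forwarding_gbps(servers: int, per_server_gbps: float = 20.0) -> float:
    """Total software forwarding capacity when every server runs a Network Agent."""
    if servers < 0:
        raise ValueError("server count cannot be negative")
    return servers * per_server_gbps


# Hypothetical example: 40 servers, each forwarding 20 Gbps in software.
total_gbps = aggregate_forwarding_gbps(40)
```

Forty such servers give 800 Gbps of aggregate forwarding capacity at the edge, and the total grows linearly with each server added. This says nothing about the capacity of the physical network underneath, which still has to carry the tunnelled traffic.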
Supporting Safe and Rapid Change
The final point of software based network devices is the speed of configuration change. The use of standard & common software on standard & common hardware creates the opportunity for rapid development of new features. Provided that each Network Agent runs as a standalone element, it’s possible to rapidly change the software. Today, each network device is part of a single coherent system by virtue of shared routing and switching protocols.
Autonomous protocols also create shared failure domains with single cause effects.
Consider that the OSPF routing protocol is a single failure domain, since every IP router takes part in the same shared protocol. An OSPF failure can (and does) cause a system wide outage.
When Network Agents are combined with Controller Based Networking, we have a significant change in the underlying nature of networks. The use of Controllers would appear to be the key to the success of overlay networking. But perhaps more on this in later posts.
1. Doesn’t matter which hypervisors: VMware, KVM, Xen ... whatever.
2. Some examples of overload failures of switches occur when the TCAM or BCAM runs out of space, or CPU/Memory is exhausted, or the internal bus cannot handle traffic patterns.
3. Mandating that devices are connected according to a strictly defined plan is a workaround, not a solution. Deviation from the plan is likely to result in network failure or sub-optimal outcomes.