Soft Switching Fails at Scale

There is a significant camp of software developers building software switching solutions for hypervisors. Which is nice, I guess. The use of software switching in the hypervisor has some good points but, in my view, they are heavily outweighed by the bad.

Update 20140715

Although the logic of this post was sound at the time it was written, several factors have since changed my view.

1. x86 servers can now forward more than 40 gigabits per second using just a single CPU. This makes the pricing of the 802.1BR/FEX standards unworkable in practice.

2. In 2011, most of networking was about conserving resources and minimising the use of scarce system capacity – bandwidth, TCAM or configuration time. In 2014, it is clear that it is cheaper and better to use server software, buy new switches and complement their functions with SDN platforms. The cost, in time and money, of addressing the problems in this article is self-evident.

I’ve written several articles in a series on overlay networking that discuss why I can’t see how or why integration between the overlay and underlay matters. Although there are vendors who make the point that some form of integration is needed, I still don’t see the need. If you want to talk about it, please get in touch and tell me why.

Series on Overlay Networking

One final point – hardware still matters in the long run. But for the next five years at least, hardware performance does not need conservation for most people. It’s cheaper to buy more resources (i.e. hardware plus software orchestration) than to implement features that conserve them, such as 802.1BR/FEX.

Software Switching

The idea behind software switching is that you can develop software in the hypervisor platform that performs all the frame forwarding. The folks over at Network Heresy have pumped out a number of self-serving and bombastic articles proclaiming the value of software switching and arguing that all inter-VM traffic should be handled in software. It all sounds very reasonable, but they haven’t discussed the overall picture that includes the larger ecosystem.

I currently have a different view – let me use a diagram to show my concerns. If you look at the VM elements in a single hardware server and the corresponding network connections, it should summarise down to this:

Software switching 1

For frames that flow between VM A-B and C-D, the idea of using a software switch makes sense. After all, why send that traffic out from the hypervisor into the network just to switch it back to where it came from? It seems rather obvious, but what happens in a much larger network with a number of servers hosting VMs:

Software switching 2

In this diagram, the frame forwarding between VM A-B and C-D will always cross the network. So let’s ask the question.

What is the value of software switching in this scenario ?

The software switching people claim that forwarding 10GbE of Ethernet frames will consume only one CPU core of a modern Intel server, and that this isn’t much of a price to pay for the functionality. Yet, if ALL the frames are forwarded away from the hypervisor, then the software forwarding adds no value.

OK, that’s not quite true. Software switching does add value where the hypervisor platform can signal the connection’s configuration data between hosts – i.e. the switch port profile can be fully managed by the hypervisor and moves with the VM as it moves around the hypervisor system.
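
Here is a minimal sketch of that idea – purely illustrative Python, not any vendor’s actual implementation – showing a soft switch whose forwarding decision only does useful work on the local-delivery branch, with a port profile stored against the vNIC so that it can travel with the VM:

    # Illustrative sketch only - not any vendor's implementation.
    class SoftSwitch:
        def __init__(self):
            self.local_ports = {}   # vNIC MAC -> (vm_name, port_profile)

        def attach(self, mac, vm_name, port_profile):
            # The port profile (VLAN, QoS, ACLs) is stored with the vNIC so it
            # can travel with the VM when the hypervisor migrates it elsewhere.
            self.local_ports[mac] = (vm_name, port_profile)

        def detach(self, mac):
            # On migration, the profile leaves with the VM and is re-attached
            # by the soft switch on the destination hypervisor.
            return self.local_ports.pop(mac)

        def forward(self, dst_mac):
            if dst_mac in self.local_ports:
                return "deliver locally to " + self.local_ports[dst_mac][0]
            return "send to physical uplink"   # the common case once VMs are spread out

    sw = SoftSwitch()
    sw.attach("00:50:56:aa:00:01", "apache-vm", {"vlan": 10, "acl": "web-in"})
    print(sw.forward("00:50:56:aa:00:01"))   # local delivery: the soft switch adds value
    print(sw.forward("00:50:56:aa:00:02"))   # remote VM: crosses the network regardless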

But, But, But

There is a key point that people often forget. You MUST assume that your VMs are NOT on the same motherboard/hypervisor. You might think that placing an Apache and a MySQL server on the same hypervisor is a good idea for low network latency, and then forget that next week you are highly likely to migrate the MySQL server to a new server to provide more RAM and CPU for a performance boost. Or, if you are using a dynamic resource scheduler which moves guest OSes around the network on demand, you will never know what the network connection is.

I would strongly make the point that, for most larger sites (where software switching offers the best value), VMs almost never communicate with each other on the same hypervisor. Therefore you MIGHT get lower latency within the chassis, but you get much higher latency when forwarding off chassis. In modern “blah blah cloud” networks, latency is an absolute that must be reduced so that intersystem communication is as fast as possible.
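
A rough back-of-envelope calculation makes the point. Every figure below is an assumption chosen for illustration, not a measurement:

    # Illustrative arithmetic only - every figure here is an assumption.
    t_soft_us   = 20.0   # assumed cost of one pass through a software switch
    t_fabric_us = 3.0    # assumed port-to-port latency of a hardware ToR switch
    p_local     = 0.05   # assumed share of VM-to-VM traffic staying on one hypervisor

    t_on_chassis  = t_soft_us                     # one software hop
    t_off_chassis = 2 * t_soft_us + t_fabric_us   # soft switch at both ends plus the fabric

    expected = p_local * t_on_chassis + (1 - p_local) * t_off_chassis
    print("expected one-way latency: %.1f us" % expected)
    # With p_local this small the average is dominated by the off-chassis path,
    # and the software switch cost is paid on that path as well.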

Complexity

The greatest enemy of reliable operation is complexity. I take the view that adding a complex software overlay to a system that already has the necessary features is going to create serious problems:

Software switching 3

I don’t see a benefit/impact trade-off that would make me accept this added complexity as viable.

Load and Latency

In their second article, the Network Heresy folks attempt to defend software switching and excoriate SR-IOV in particular.

This quote seems misguided to me:

Another benefit is simple resource efficiency, you already bought the damn server, so if you have excess compute capacity why buy specialized hardware for something you can do on the end host? Or put another way, after you provision some amount of hardware resources to handle the switching work, any of those resources that are left over are always available to do real work running an app instead of being wasted(which is usually a lot since you have to provision for peaks).

This is a schoolboy network design failure – at the very time that peak compute load occurs, there is a very high probability that network load is also at its peak. Claiming that “unused resources” can be used for switching is wrong and, worse, it will catch you out at the worst possible point in the computing cycle, peak demand. This leads to a deadlock design failure.
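
A toy model, again with assumed numbers, shows why betting on “leftover” capacity fails exactly when it matters:

    # Toy model - all figures are assumptions for illustration.
    cores_total    = 16
    app_peak_cores = 15.5   # assumed application demand at peak
    cores_per_gbps = 0.1    # assumed soft-switching cost per Gbps forwarded

    for traffic_gbps in (2, 5, 10):            # network load peaks with the application
        switching = traffic_gbps * cores_per_gbps
        shortfall = max(app_peak_cores + switching - cores_total, 0)
        print("%2d Gbps: switching needs %.1f cores, shortfall %.1f cores"
              % (traffic_gbps, switching, shortfall))
    # The headroom the quote relies on is exactly what disappears at peak,
    # so the application and the soft switch degrade together.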

Of course, there could be some sort of software configuration to overcome this weakness with resource sharing, and everyone will forget why it shouldn’t have happened in the first place and rush out to buy even bigger and more power-hungry servers – which suits some of the companies supporting soft switching, e.g. Intel, just fine, thank you very much.

Security

Software switching isn’t, and can never be, secure. It fails all security profiling checks for multi-tenant separation, administrative control and data plane isolation. For monolithic deployments, i.e. companies that run just one application (no matter how large) such as Facebook, this is fine – but that is a very small part of the market. This means that a significant portion of the market cannot use this technology – and that’s bad for business and bad for the technology overall.

The EtherealMind View

I can see the point of software switching, however wittily or pretentiously the argument is made. However, I do not see how it will be successful given the limitations I’ve explained here.

I suspect that software switching will arrive and fail. After all, a few months with a handful of software developers and VMware or Xen will have a technology that is part of the vCenter management “lock in”, and they will then attempt to convince everyone it is the way forward.

But then people will learn, after a while, that it doesn’t scale. Server Administrators, who make the buying decisions here, aren’t used to thinking in terms of ten or twenty servers at once, so it will take them some time to learn hard lessons about multiple interdependent systems.

Cisco has already released VM-FEX which, if you agree with my arguments here, is a better overall technology for VM guest networking. I’m sure the other vendors are also rushing products to market, but this is the only one I know about.

My advice ? Don’t be caught out by the short term win of easy and cheap. Consider the bigger picture and understand the implications of your design choices.

Oh wait, I think I’ve said that somewhere before……

 

 

  • Juan Lage

    Great post Greg. I agree with you 100%. There is a space for softswitching, but in the larger picture I think that more is needed. 802.1Qbh is in my opinion a great solution, I am glad you pointed it out (VM-FEX is a pre-standard implementation). One other angle you hinted at but did not mention explicitly is the operational component: “Managing” a network is more than provisioning. Often times, softswitching solutions tend to highlight the provisioning simplicity as they can integrate with hypervisor management easily, but what about troubleshooting, visibility into the network, performance management, fault management, etc.

    All in all, however, I expect we will see both solutions evolve and both gain acceptance (hw based as in VM-FEX, and software based as well). Best regards, @juanlage:twitter

    • http://etherealmind.com Etherealmind

      Yep. Software is cheap and stupid. And in IT – cheap and stupid always wins.

      Plus, the hypervisor vendors can market the “integration” with their products as a feature – while carefully not pointing out that it has a number of design defects overall.

      • Massimo

        “Software is cheap and stupid”
        o_O ? 

        .. and I thought the world was moving off from hw and moving into sw simply because of the opposite of that… 

        Stupid (me). 

        Massimo. 

        • http://etherealmind.com Etherealmind

          Well, yes. The most recent example of this is the use of ARM chips compared to x86. The x86 chipset is simple and cheap to use, but the long-term outcomes have not been good. ARM, with its RISC-style instruction set, shows that hardware and software have a complex interplay.

          The current trend to deploy in software is only temporary until the industry stabilises; over time most functions will return to hardware – as GPUs for graphics rendering have already demonstrated.

          Don’t let a burst of enthusiasm for the latest fad stop you from looking at the long-term outcomes.

          • Massimo

            I believe the curve you are describing has a longer tail than what you predict. I (personally) believe we have just entered into the “software” curve and I am not sure we can predict when(/if) we will exit from this. 
            For the good and the bad, IBM is an interesting benchmark of this trend (only because they have been around for long enough to see a pattern). They used to be a HW company (with basic software/microcode) and one could arguably say they are more of a software (and services) company at the moment, given hw is becoming cheaper and more stupid. Coincidentally, back in 2001 I raised many of the arguments you are using here to describe the issues we were introducing by virtualizing x86 servers in software. Well, we know where we ended up… and I haven’t seen organizations going back to physical x86 because it’s more “secure”, more “scalable” and could deliver more “performance”. It could be just my feeling, but I see a lot of commonalities between the 10-year-old server virtualization trend and the new network virtualization trend (of which software switching is a small part). Will this be a failure? Hard to predict… the good news is that, most likely, I’ll be retired by a long shot when(/if) that happens… and I am 39. My 2 cents. Massimo.

          • Massimo

            Sorry I forgot to mention that I work for VMware so you have a right to consider my thought biased. 

            Massimo. 

          • http://etherealmind.com Etherealmind

            Thanks for the disclosure.

          • http://etherealmind.com Etherealmind

            I remember that VMware really only became successful when the Intel CPU added virtualization extensions.

            The same applies to networking: software is a quick and dirty fix that will give way quickly, because a hardware/software interface is the only real solution when you get down to this type of functionality. The IEEE, via the VEPA standard, understands this. The problem is that they are too slow to get something out the door, so software will fill the gap.

            The software is really only a gap filler and isn’t fixing the underlying problem.

          • Massimo

            ” remember that VMware really only became successful when the Intel CPU added virtualization extensions” 
            I disagree with this. I have been working with VMware technology since 2001 (although I have joined the company last year) and I can testify that Intel/AMD adding virtualization extensions didn’t change anything in terms of the success of it. Also that is a very small piece that is not at all representative of the big gains/losses of implementing something in HW or SW. 
            The advantage of virtualization is really in terms of the flexibility it provides (workload mobility, high availability, disaster recovery, speed-of-deployment, better management etc etc etc). Something that you just “cannot” implement in hw. 

            This has nothing to do with the optimization of where an x86 instruction is being executed (in hw or in sw). Similarly, the key question is not where a network packet gets switched (in a virtual switch or a physical switch) but rather the advantage of decoupling hw from sw and everything you are allowed to do when you get to that state. Yes, a lot of customers will trade off (some) performance to get to that state. This isn’t very different from the performance penalty they have accepted over the last 10 years virtualizing their servers (with or without the CPU virtualization extensions). 

            My 2 cents. 

            Massimo. 

          • http://etherealmind.com Etherealmind

            I’d disagree with that view. The CPU extensions were key inflection points in enterprise adoption because they addressed security and performance limitations inherent in the software-only solution. You can’t dismiss this so blithely and pretend that software is the only answer – for infrastructure, software is always a short-term fix, addressed later with hardware support, where scalability and reliability can actually be achieved.

            For example, VMware is now developing their fourth generation of virtual switching – and yet very few people are happy with the products released so far. It does not seem that software can solve the problem in this case.

          • http://twitter.com/mreferre Massimo Re Ferre’

            >The CPU extensions were key inflection points in the adoption by enterprise because it addressed >security and performance limitations inherent in the software only solution. 
            We do have indeed a very different POV about what happened in the last 10 years. :)

            >For example, VMware is not developing their fourth generation of virtual switching – and yet very few >people are happy with the products released so far. It does not seem that software can solve the >problem in this case.
            Thanks for the roadmap session but I suspect your source isn’t very reliable…  ;) 

            Massimo. 

          • http://etherealmind.com Etherealmind

            Seriously. Networking in VMware is simply awful. There is no possible defence for how bad it is. It uses split horizon as its most advanced feature.

          • http://twitter.com/mreferre Massimo Re Ferre’

            This is your blog. You deserve the last word.

  • ThetDude

     Isn’t vApp supposed to prevent your exact scenario from happening?  i.e. your Apache and MySQL VM would be in a ‘vApp’ container, maintaining hypervisor affinity.  If they need to migrate, they both migrate together. 

    • http://etherealmind.com Etherealmind

      This is exactly the sort of half arsed workaround I was referring to. While you COULD do this, it doesn’t make strategic sense to do this. What happens when you need to separate them for performance reasons ? Or because you want to even out load in a cluster ?

      What happens when peak load means the only solution is to split them into separate VMs so they don’t damage each other’s performance ?

      While you CAN do this, you SHOULD NOT do it, because it’s a deadlock design that dooms you to failure in the wider picture.

      • Ryan B

        The one comment I would make is that most virtualization implementations I’ve seen are memory bound with idle CPU resources, even at peak load.

    • Loren

      No, vApps do not maintain hypervisor affinity. You may use DRS rules to implement that, if desired.

  • sh0x12

    Little bit off topic.. how do you make these great soft shadows in all of your diagrams. :)

    • http://etherealmind.com Etherealmind

      I do my diagrams in OmniGraffle on Mac OS X.

  • Mike

    To say software fails at scale is misleading.  Software will always scale better than hardware at all things except performance and possibly cost.  

    There is always a cost to using hardware switching to get that performance.  Hardware is much less flexible, very difficult to add new features without replacement, fixed scalability, and complicated deployments.  I say complicated deployments because nobody upgrades all hardware at once, so over time the hardware variants, feature inconsistency, and scale differences become a huge complication.  In physical switching say between 48 ports of 1Gb, the performance difference between a hardware and software implementation is so great that it easily makes sense to go with hardware.  In the case of an SR-IOV nic, the performance difference is fairly incremental.  

    You could have easily written an article that said hardware VM switching fails at scale since most SR-IOV nics only support a fixed number of vnics, upstream switches only support a fixed number of vnics, the number of TCAM entries for ACL, QOS, etc. are fixed.  In the end, both models need a software overlay control plane to address VM mobility and automated placement across racks (move the policy and state along with the VMs).  I can see how it can be done with software switching.  It will be interesting to see if hardware switching will be able to handle this when the hardware versions, number of vnics, etc. change over time and need to be considered all within the same network.

    The security argument is misleading as well; nothing is absolutely secure. The question is: is it secure enough? The message implies that SR-IOV is more secure than software switching, which is false. Software can be equally as secure as SR-IOV.

    In the end, the argument of whether to use software switching or hardware switching really boils down to whether you want flexibility (in terms of feature sets, scaling, and feature consistency) or performance.

    • http://etherealmind.com Etherealmind

      “Software will always scale better than hardware at all things except performance and possibly cost. ”

      Except when it’s RISC CPUs (which need more software) or MS Windows (which uses more software)… the argument for simplicity is well proven, and complex software fails often (e.g. SAP).

      “There is always a cost to using hardware switching to get that performance. Hardware is much less flexible, very difficult to add new features without replacement, fixed scalability, and complicated deployments.”

      Except that the hardware exists and is proven. Hardware doesn’t need to be flexible for frame forwarding because the data doesn’t change often i.e. IPv4 has been stable for twenty years, Ethernet for at least as long. Software flexibility doesn’t apply to static data sets.

      “The message implies that SR-IOV is more secure than software switching which is false. Software can be equally as secure as SR-IOV.” Well, it COULD be – but time has proven hardware approaches to be more consistently secure than software solutions. The statistical risk of software security failure is orders of magnitude beyond hardware solutions, because hardware demands rigidity and simplicity – both of which create effective security outcomes.

      “In the end, the argument of whether to use software switching or hardware switching really boils down to whether you want flexibility (in terms of feature sets, scaling, and feature consistency) or performance.”

      My point is that flexibility for forwarding data frames is a silly idea. How many variations of forwarding a frame can be created ? How many will the marketplace accept ? Does the Ethernet frame change often enough to need monthly software updates ?

      Silly. Software switching adds complexity and failure modes for insignificant gain while failing to address security, and ignoring that the extra load on the server side causes compound failure modes in the hypervisor OS.

      But, because software is cheap and easy, I suspect it will arrive anyway and we will spend the next twenty years working around the flaws. That’s how we ended up with Ethernet instead of a halfway decent L2 network protocol.

      • Mike

        “Silly. Software switching adds complexity and failure modes for insignificant gain while failing to address security, and ignoring that the extra load on the server side causes compound failure modes in the hypervisor OS.”

        I guess we’ll have to agree to disagree on this topic.  The datacenter is the most exciting and innovative area in networking right now and virtualization / cloud is one of the main drivers of all this change.  The needs for switching in this environment will be different and continue to evolve over the next few years.  This market is moving way too fast for the current solutions (both hardware and software) to be the perfect solutions.  Will the frame format change in the future ?  Most likely, yes, just as it has for the last 20 years.  Different headers, tunnels, encapsulations, etc.  Look at all of the proposals of IPinIP, MACinMAC, MACinIP, etc. coming out of academia to address datacenter.  Until this area slows down as it matures, software switching allows the greatest flexibility to adapt without introducing the complexity of having different hardware versions throughout the network.

  • Trevor

    So, how does tying a VM to a piece of hardware scale?

    • http://etherealmind.com Etherealmind

      Are you asking how VMware scales an Intel CPU ?

      Guess not. That’s how a network scales on hardware and not in software.

      ( Bit of a simplification – but I’m sure you get the drift of my argument )

      • Trevor

        No, I mean, if you tie a VM to a particular NIC (virtualised) inside a piece of hardware, and then you want to move hardware, i.e. from Cisco to HP, this breaks the tenets of virtualisation.

        Even with SR-IOV and the like, the hardware vendor is responsible for providing a driver for your guest. I can’t see a hypervisor vendor providing a universal NIC driver for every guest to make every guest talk to every virtual instance of a hardware NIC.

  • Caner

    Can OpenFlow provide the software flexibility and hardware performance and scalability at the same time?

    • http://etherealmind.com Etherealmind

      In my view, yes. OpenFlow is a management and control plane technology that works by modifying the forwarding tables in the data plane. Although there are possible limitations in acting as a control plane manager, such as the timing of reads/writes to devices, processing time and response to network conditions, these can probably be handled in software.
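
      As a trivial sketch of the idea – a toy match/action table in Python, illustrative only, not the OpenFlow wire protocol – the controller writes match-to-action entries into the switch’s flow table and the data plane simply performs lookups:

          # Toy illustration only: a priority-ordered match/action table.
          flow_table = []

          def install_flow(match, actions, priority=100):
              # "Control plane": the controller pushes an entry into the table.
              flow_table.append({"match": match, "actions": actions, "priority": priority})
              flow_table.sort(key=lambda f: -f["priority"])

          def handle_packet(pkt):
              # "Data plane": find the highest-priority matching entry and apply it.
              for flow in flow_table:
                  if all(pkt.get(k) == v for k, v in flow["match"].items()):
                      return flow["actions"]
              return ["send_to_controller"]   # table miss

          install_flow({"dst_mac": "00:50:56:aa:00:02"}, ["output:port3"])
          print(handle_packet({"dst_mac": "00:50:56:aa:00:02"}))   # ['output:port3']
          print(handle_packet({"dst_mac": "ff:ff:ff:ff:ff:ff"}))   # ['send_to_controller']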

      Performance and scalability of the data plane (switching fabric and Ethernet interfaces) is determined by the silicon and hardware deployed and thus depends on your manufacturer. 

      It depends™
