Responding: On optimizing traffic for network virtualization

In a blog post, On optimizing traffic for network virtualization, Brad Hedlund, now working for Dell/Force10 (having recently handed in his Cisco fanboy card :) ), attempts to make the point that virtual overlays and tunnels are not problems and that we should just get used to being shafted by dumb ideas.

Pop over and have a read. Yes, I’ll wait right here. Then pop back and read my response.

Point 1 – Lossless Networking

Brad seems to have, perhaps conveniently, forgotten that a Data Centre network should be lossless. And when using tunnels and overlays, it’s highly unlikely that you will be able to build a lossless network. Traffic inside tunnels is not easily parsed and classified for QoS. Ethernet allows for only five useful QoS levels. For example, you might put a VXLAN tunnel at ToS 4 but FCoE at ToS 5 – this means that VXLAN will have frames discarded, probably regularly. Even just a few lost transmissions will cause temporary performance drops of around 50%.

Unless, that is, you plan to throttle network performance at the server edge so that blocking cannot occur, or you plan to disallow over-subscription altogether. Both of those ideas are ludicrous in a modern network – oversubscription is a way of life (although, for exactly this reason, it wasn’t in the past).

The point isn’t that tunnels are inherently bad, it’s that they blind the network to content detection. The network loses visibility of the data. And tunnels can shift traffic flows without integrating with the network to adapt to that change.

Now, vendors might want to sell two or four times more hardware to solve that problem, but customers DO NOT want to operate or power that much hardware. And maybe that’s Brad’s viewpoint: there is a solution – build lots of isolated networks, one for each traffic type.
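To make the classification problem concrete, here is a minimal sketch of the marking promotion an encapsulator has to perform: the fabric can only act on the outer header, so whatever the inner frame carried must be collapsed into a single outer DSCP value. The mapping table and class names are illustrative assumptions, not any vendor’s implementation.

```python
# Sketch: promoting an inner-frame 802.1p priority (PCP, 0-7) to the
# DSCP field of the outer IP header when encapsulating into a VXLAN
# tunnel. The mapping below is hypothetical; deployments define their own.

PCP_TO_DSCP = {
    0: 0,    # best effort       -> DF
    3: 24,   # bulk class        -> CS3
    4: 32,   # tunnel class      -> CS4
    5: 40,   # critical class    -> CS5 (lossless queue)
}

def outer_dscp(inner_pcp: int) -> int:
    """Return the DSCP to stamp on the outer VXLAN/UDP/IP header."""
    # Unlisted priorities fall back to best effort, which is exactly
    # the visibility problem: the fabric sees only the outer marking.
    return PCP_TO_DSCP.get(inner_pcp, 0)

print(outer_dscp(5))  # 40
```

Everything inside the tunnel that shares the same outer marking lands in the same queue, no matter how different the inner traffic classes were.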

Point 2 – Troubleshooting

Coming back to the loss of visibility. Today, we can detect or match traffic flows in the network layer and redirect them to load balancers, IDS/IPS, or sniffers for troubleshooting. The use of VXLAN tunnels means that a second set of networking tools will be needed. Capturing packets in a VXLAN will require a device that becomes a member of the VXLAN group as a VTEP and derives a copy of every packet or, at least, of packets that match the criteria.

Most likely, this means a virtual appliance on either the source or destination, probably operating in VMDirectPath so as not to damage VM performance too badly. At this point, there are no software tools that can do this, and it’s not core business for VMware, which is busy developing Java and email tools and doesn’t much care about infrastructure right now.

Loss of visibility is a serious concern for troubleshooting complicated problems and there are very few answers to getting network visibility today. And I’m not seeing any in the future.
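To illustrate the extra work, here is a minimal sketch of what any VXLAN-aware capture tool must do with every received datagram before its familiar Ethernet and IP filters can apply. The header layout follows RFC 7348; the example VNI and payload are assumptions for illustration.

```python
def decap_vxlan(udp_payload: bytes):
    """Parse an RFC 7348 VXLAN header; return (vni, inner Ethernet frame).

    A sniffer that joins the overlay as a VTEP must do this for every
    UDP/4789 datagram before any conventional capture filter applies.
    """
    if len(udp_payload) < 8:
        raise ValueError("truncated VXLAN header")
    if not udp_payload[0] & 0x08:          # 'I' bit: VNI field is valid
        raise ValueError("VNI not present")
    vni = int.from_bytes(udp_payload[4:7], "big")
    return vni, udp_payload[8:]

# Hypothetical datagram: header carrying VNI 5000, then the inner frame.
hdr = bytes([0x08, 0, 0, 0]) + (5000).to_bytes(3, "big") + bytes([0])
vni, inner = decap_vxlan(hdr + b"inner-frame-bytes")
print(vni)  # 5000
```

Every existing tool that matches on MAC, VLAN, or IP fields sees only the outer UDP flow; recovering the inner frame like this is the new, second layer of tooling.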

Point 3 – Indeterminate Performance

My current research on software switching suggests that forwarding performance is quite poor. A current-generation Intel server can forward about 4 gigabits/second of data across its internal bus to the network adapter, provided that the CPU is not otherwise used. If the CPU is heavily loaded with other tasks, then forwarding performance can be, and will be, seriously impacted.
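The arithmetic behind that dependency is easy to sketch. The figures below (clock rate, cycles per forwarded packet) are illustrative assumptions, but the shape of the result is the point: software forwarding capacity is a straight function of whatever CPU cycles happen to be left over.

```python
# Back-of-envelope software-switch throughput, assuming (hypothetically)
# a 2.5 GHz core and ~5000 cycles of lookup/copy/encap work per packet.
cpu_hz = 2.5e9            # cycles per second available for forwarding
cycles_per_packet = 5000  # assumed soft-switch datapath cost
frame_bits = 1500 * 8     # standard Ethernet MTU frames

pps = cpu_hz / cycles_per_packet   # packets per second
gbps = pps * frame_bits / 1e9      # achievable throughput in Gbit/s
print(gbps)  # 6.0
```

Halve the cycles left over for forwarding and the achievable rate halves with it; nothing in the platform guarantees a floor.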

That lack of guarantee is a serious problem in large networks that cloud providers use.

Therefore, software switching isn’t really working yet. I can accept that it might in the future if CPU and bus performance continues to improve, but that is not guaranteed. Making the point that software switching could work one way conveniently forgets all the other ways in which it’s a problem.

The Risky Trombone

Let’s clear up a misconception: the Traffic Trombone is not a major problem inside a Data Centre where the Ethernet fabric is coherent.


The Traffic Trombone.

The Traffic Trombone is a problem between data centres. There is no way to build an Ethernet fabric that spans data centres. Yes, that’s a blanket statement. If you can build a fabric, then the data centres are not far enough apart to matter. Fifty kilometres is not redundancy; it’s just a nominal decrease in risk.

That said, there are data centres where too many trombones will cause problems. You can easily overrun the available bandwidth in a data centre if too many application servers are tromboning. And troubleshooting that condition is a real problem, one that has service impacts and big dollar signs attached to it when the network collapses.
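A quick, hypothetical sizing exercise shows how easily that happens: every tromboned flow crosses the aggregation layer twice, so the offered load effectively doubles. All figures below are assumptions for illustration, not measurements.

```python
# Illustrative trombone sizing: each tromboned flow traverses the
# aggregation links twice (out to the service, then back again).
uplink_gbps = 40        # assumed aggregation bandwidth
flows = 400             # assumed concurrent tromboning application flows
per_flow_mbps = 60      # assumed average per-flow rate

offered_gbps = flows * per_flow_mbps / 1000   # load the apps generate
effective_gbps = offered_gbps * 2             # each flow crosses twice
print(effective_gbps, effective_gbps > uplink_gbps)  # 48.0 True
```

Modest per-flow rates still sail past the aggregation budget once the doubling is counted, which is the collapse scenario described above.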

In an era of uptime and risk-free computing, this isn’t a good answer. I’m pretty confident that Amazon and Google don’t trombone traffic, for this reason. Neither should you. But if you must, go right ahead and take that risk. It’s your network.

The EtherealMind View

At this point in time, I believe the path to the networking future isn’t going to be defined by hardware, fabrics, or debates over tunnels and overlays. Network professionals need software tools that give network visibility, reporting, and operational confidence.

I don’t really care about VXLAN, or Open vSwitch, or whatever the latest “blah blah cloud” technology is this week. These are all good-enough solutions to certain problems. Frankly, VXLAN looks like DLSw for SNA to me, and I tend to think we will regret tunnels and overlays just as we regretted DLSw in 2001, because they create state where it’s not needed and where it’s proven to work badly.

What I want is for vendors to deliver me management tools that provide visibility and operations. And that doesn’t include OpenView, or Tivoli, or BMC Patrol, or any of the tools that we have today. They are all well-proven failures. Give me that discussion in 2012. Don’t patronise me with “buy more stuff”.

  • @netmanchris

    Hi Greg,

    great post! One small thing. I would think you meant CoS in point one, not ToS. Other than that, Merry Christmas and I hope Santa brings you everything you asked for!

  • Mike

    Greg, Thanks for the pointer to Brad’s blog.  He makes some good analogies. A few comments on your points.

    Point 1: QoS classification and marking is done at the edge.  When packets are encapsulated into tunnels, the markings are promoted to the outer header.  Thus, I don’t believe that this will be an issue (or at least will not make the current problem any worse).  

    Point 2:  I agree that the tools are not here yet for VXLAN, but that likely rings true for all of the new DC forwarding technologies (FabricPath/TRILL/SPB).  I can picture it making some things easier in the future such as having the ability to hook into any “VLAN” from anywhere in the network.  I’m sure there is some analogy to ERSPAN here….

    Point 3:  This is a bit of a religious debate.  If the CPU is really busy due to software switching, does forwarding performance matter ?  It’s not an intermediate hop; it’s the endpoint, and the CPU/application must process the packet.  Regardless, I don’t see how this is relevant to your point, and once VXLAN is in hardware with SR-IOV NICs, it becomes an orthogonal issue.

    In general, I agree with you on the issue of interDC tromboning.  It is an issue, but it’s an issue that can exist with any L2 DC interconnect (i.e. OTV, etc.)

    • Mike

      Oops, my comment on Point 3 should read:
      If the CPU is really busy, does forwarding performance of software switching really matter ?  Presumably, the CPU is really busy due to application load.

      • Etherealmind

        Yes, forwarding performance matters since you won’t be able to maximise data processing output. If data I/O slows down, then the CPU could starve for processing requests, thus driving lower overall utilisation.

        And since 100% utilisation is your infrastructure target you can’t plan for second best.

        • Mike

          I agree that if data I/O slows, CPU could starve.  My comment was more regarding your statement “If the CPU is heavily loaded with other tasks, then forwarding performance might, and will, be seriously impacted.” My point is that if the CPU is heavily loaded due to application tasks, more forwarding performance will simply overload it even more.

          • Etherealmind

            That’s not how application service providers (oops, sorry, Cloud Companies) think. Nor do the service level agreements.

            Certainty and guarantees are required. Guessing is no longer acceptable.

    • Etherealmind

      Why use VXLAN in hardware on HBA type NICs when you could use 802.1BR ?

      Why tunnel and perform tag copying when you could use standard Ethernet and generate stateless tunnels with 802.1Qbh/802.1BR ?

      Why not use MPLS, or PBB as control plane mechanisms to direct the forwarding of frames ?

      Why reinvent a wheel ? Why not use Ethernet 802.2 frame format to exchange data directly at Layer 2 ?

      VXLAN is stupid because it’s a VMware play to own the network and put it under the administrative control of vCenter (Embrace, Extend, Extinguish).

      Instead of adopting MPLS, or SPBB, or some other proven control plane, VMware has opted to be cheap and short-sighted. But that’s just me. :)

      • Mike

        The simple answer is that none of these technologies are in the DC today and they require a rip and replace of the entire network.  

        Also, short of running MPLS or SPBB from the hypervisor softswitch, they all also require more than 1 technology to achieve the same function as VXLAN (802.1Qbh/br only addresses the 1st hop).

  • Brad Hedlund

    Thanks for the thorough response.  Your first paragraph gave me a good laugh 😀
    Allow me to briefly respond to some of the points you make here.

    Point #1 – I think you’ve incorrectly assumed here that FCoE would be encapsulated in VXLAN.  That would not be the case.  You can still provide lossless forwarding for FCoE, because nothing will have changed from the way it’s done today.

    Point #2 – Pretty much agree with you here.  There’s definitely going to be a market demand for these capabilities. Let’s see who steps up to the plate.

    Point #3 – I would be curious to see your research here, if you’re willing to share it.  Those in favor of soft switching seem to have some data that contradicts your findings.  I think I recall seeing this on Martin Casado’s Network Heresy blog. No?

    The Risky Trombone – Agree with you here too.  The focus of my blog was on internal data center flows, though I probably could have been clearer about that.

    Thanks again Greg.  Have a great new year!


    • Etherealmind


      FCoE would never be encapsulated in anything. That wouldn’t make any sense at all. The impact on storage service would be severe. Not that it matters; FCoE is already dead and the corpse is silently rotting in the data aisles. Smells bad too.

      No, lossless forwarding is for critical applications. The ones that make real money.

      I’ll get you some links on the performance limits. Martin Casado believes that by the time software switching arrives, Intel CPU/memory/bus performance will have advanced into hyperspace and therefore won’t be a problem. Which is … convenient for a company building a control plane over Open vSwitch.

      The reality, of course, is that Intel CPU architecture has reached some sort of plateau, and its I/O architecture is particularly at risk over the next two years. 64-bit I/O helped, but 128-bit won’t. Where to? It’s not clear.

      And you? Looking forward to meeting at the next Network Field Day in March?

      • Brad Hedlund

        Yes, I’m hoping to be at Network Field Day in March.  Looking forward to finally meeting you, Ivan, and the others.

      • ioshints

        That performance figure is plain wrong (unless you were trying to push traffic through userland, like VM-based routing or load balancing).

        Juniper managed to push 30+ Gbps through a vSwitch with firewalling using a pretty reasonable server.

  • dj spry

    Great post as usual from both you and Brad! 

    All of this info has just brought out more questions. 

    Point #1 – The FCoE traffic will be in its own separate VLAN.  I presume this traffic will never ask VMware for any VXLAN support and will traverse out normally, and all DCB/FIP/CoS concerns should be treated as they would be regardless of VXLAN.  Is this the correct assumption?  I believe this is the point Brad was making; I just want to verify.

    Point #2 – Agreed.  This means operators/customers will have to manage two different networks: the physical and the overlay.  Seems to me a bit like the same server vs. network admin argument all over again.

    Point #4 – Is there any reason VXLAN will not work with MPLS-type solutions?  I am not all that thrilled with having to deploy OTV vs. an open standard such as VPLS or the upcoming E-VPN.  I lean more to the benefits of MPLS, such as TE and FRR, for DC interconnects.

    New question – How does this interop with VEPA?  From what I understand, the softswitch has to support VXLAN, in this case the 1000v.  Yet VEPA forces all traffic to a physical switch.  VEPA is something my customers are very interested in from a security perspective, and VXLAN seems to have muddied the water.

    Thanks again everyone for the excellent information! 

  • Peter Phaal

    Hi Greg,

    Great article!

    Regarding point #2 – troubleshooting/visibility, I wanted to bring your attention to the widespread support for the sFlow standard in data center switches. With sFlow, switch hardware randomly captures packet headers and immediately sends them to a central collector which decodes and analyzes the traffic. Network-wide visibility into tunnels, VLANs, priorities, TRILL, SPB, FCoE, VXLAN etc. is simply a matter of choosing the right collector software – open source and commercial solutions are available today. 
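The sampling mechanism Peter describes can be sketched in a few lines. The sampling rate and exported header length below are assumptions for illustration, not defaults from the sFlow standard.

```python
import random

SAMPLE_RATE = 1000     # assumed: export roughly 1 in 1000 packets
HEADER_BYTES = 128     # assumed: leading bytes sent to the collector

def maybe_sample(packet: bytes, rng=random.random):
    """Return the truncated header if this packet is sampled, else None.

    The collector, not the switch, decodes VXLAN/TRILL/FCoE from the
    exported header bytes, which is what keeps the switch side cheap.
    """
    if rng() < 1.0 / SAMPLE_RATE:
        return packet[:HEADER_BYTES]
    return None
```

Because sampling is random and stateless, the per-packet cost on the switch stays constant no matter how many tunnels or encapsulations are in use.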

    As you point out, “virtualizing” networking can lead to circuitous traffic paths – visibility is essential in order to manage bandwidth and avoid congestion.


  • Andy Collier

    Are you the world expert on rusty tromboning?