Response: Revenge of the TOE and TCP Offload problems

Here is a very interesting story from the front lines, where a lot of effort finally revealed that the TOE was causing a major problem:

This guy probably spent hundreds of hours testing and researching this problem. He finally admitted to a rather drastic solution, removing the TOE chip from the NICs in multiple servers. From my own research, the firmware on the card can be “problematic” and when the kernel driver is enabled (in Linux or VMware), odd behavior can sometimes be observed, including dropped packets, resets or suboptimal performance. But there’s lots of controversy surrounding this issue.

via Revenge of the TOE - Packet Pushers.

I’m hearing a lot of reports of problems with TOE drivers and hardware. A recent podcast with Jim Gettys about Bufferbloat also flagged NIC offload as a problem:

NIC Offload engines generate bursts of line rate packet streams at multi-gigabit rates. These features are now “on” by default even in cheap consumer hardware including home routers, and certainly in data centers. Whether this is advisable (it is not…) is orthogonal to the reality of deployed hardware and current device drivers and default settings.

 The Internet is Broken, and How to Fix It

I’m beginning to think that TOE might be something to avoid. It’s also worth noting that the latest generation of Intel processors with DPDK makes TOE unnecessary, and the same goes for CNAs for Fibre Channel.
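For anyone who wants to test whether offload is behind odd behaviour on a Linux host, here is a minimal sketch of building the command to switch the common offload features off. It assumes the standard `ethtool -K` feature names (`tso`, `gso`, `gro`, `lro`) and a hypothetical interface called `eth0`. Note that these are the stateless offloads that ethtool exposes, which is not the same thing as a full TOE. Disabling a full TOE depends on the vendor driver and is not covered here.

```python
import shlex

# Offload feature names as accepted by `ethtool -K` (assumed present on the NIC).
OFFLOAD_FEATURES = ("tso", "gso", "gro", "lro")


def disable_offload_command(interface, features=OFFLOAD_FEATURES):
    """Return the ethtool argument list that turns each feature off."""
    cmd = ["ethtool", "-K", interface]
    for feature in features:
        cmd += [feature, "off"]
    return cmd


# "eth0" is a placeholder interface name. The command is only printed,
# not executed, so it can be reviewed before it goes into a deployment script.
print(shlex.join(disable_offload_command("eth0")))
# prints: ethtool -K eth0 tso off gso off gro off lro off
```

The command is built as a list rather than a string so it can be handed straight to `subprocess.run` (as root) once you are happy with it. The setting does not survive a reboot unless the script runs at boot.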

The times are changing.

About Greg Ferro

Greg Ferro is a Network Engineer/Architect, mostly focussed on Data Centre, Security Infrastructure, and recently Virtualization. He has over 20 years in IT, working as a freelance consultant for a wide range of employers including Finance, Service Providers and Online Companies. He is CCIE#6920 and has a few ideas about the world, but not enough to really count.

He is a host on the Packet Pushers Podcast, blogger at EtherealMind.com and on Twitter @etherealmind and Google Plus.

You can contact Greg via the site contact page.

  • Ryan Malayter

    The best argument against TOE I’ve heard came from the Linux kernel mailing list. It went something like this:

    “Why would you take the world’s most hardened, reliable, and interoperable TCP/IP stack and replace it with a dumb closed-source version baked into silicon that can’t be easily changed, even if it has security vulnerabilities?”
    In our shop we disable all hardware TCP acceleration features via Windows group policy or Linux deployment scripts. VMware used to default it to off at the hypervisor layer; not sure about 5.1.
    The CPU overhead of TCP/IP is basically zero on modern hardware, and TOE is just a premature optimization with a lot of buggy implementations.

    • Will

      Also it makes your wireshark capture look ‘ugly’ with all the checksum errors.

      • Bill Karn

        Umm, that’s the least of your Wireshark problems when these features are enabled. When the packet lengths are over 1500 bytes (or 9k for jumbo frames) that should be a dead giveaway that you are not capturing the packets that are actually hitting the wire.

  • Michael Gonnason

    It reduces heat output of the CPU. The NIC is much more efficient at TCP than the host CPU. Most NICs have firmware that can be upgraded for bug fixes and patches.

    • Ryan Malayter

      TOE does almost nothing to reduce heat, since modern CPUs spend <<5% on TCP/IP-specific functions with the overwhelming majority of server workloads. You could argue that a CPU-based firewall, router, or IDS might need the extra help of a TOE with all the traffic, but in reality you need to do lots of intensive inspection of the TCP/IP protocol data on such devices, which has to be done in software anyway. So TOE is of little use.

      Firmware updates are operationally painful, and usually require extended downtime (often with a physical server touch). But firmware updates do little to address vulnerabilities, mis-features or bugs baked into the silicon. The drivers and firmware supplied by manufacturers have sucked uniformly since the introduction of TOEs, and have caused nothing but problems. Even on Intel NICs.

      • Michael Gonnason

        Hm I posted a link to a study done, but I think it got eaten…

        Network traffic is actually rather spendy CPU wise.

        General rule of thumb is that 1 bit per second requires 1 Hz to process, so 20 Gb/s of throughput requires approx. 20 GHz of processing power, or eight 2.5 GHz cores.
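The rule of thumb in the last comment can be worked through as a quick calculation. This is only a sketch that takes the commenter's "1 Hz per bit per second" figure at face value. The thread itself disputes how much CPU TCP/IP really costs, so treat `hz_per_bit` as an adjustable assumption, not a measured value. The function name and parameters are made up for illustration.

```python
import math


def cores_needed(throughput_gbps, core_ghz, hz_per_bit=1.0):
    """Whole cores needed if each bit/s of traffic costs `hz_per_bit` Hz of CPU."""
    required_ghz = throughput_gbps * hz_per_bit
    # Round up: a fraction of a core still means allocating another core.
    return math.ceil(required_ghz / core_ghz)


# The example from the comment: 20 Gb/s on 2.5 GHz cores.
print(cores_needed(20, 2.5))  # prints: 8
```

Lowering `hz_per_bit` shows how quickly the estimate shrinks if the real per-bit cost is well below the rule of thumb.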
