Ethernet Jumbo Frames, Full Duplex and Why Jumbo Frames are 9000 bytes

I’ve been doing some research into Ethernet and the use of Jumbo frames for some content I’ve been writing, and I have come across something interesting. A document claims that Jumbo frames can only be used on Full Duplex Ethernet connections.

Jumbo frames were proposed in 1998 by Alteon, which produced the first devices to support them and was later acquired by Nortel Networks.

So I’ve done some scratching on that itch and not found many satisfactory answers.

Q. Why do we want jumbo frames ?

A. To improve server performance. How ?

  • Larger frames translate into less packet overhead on the server.
  • Fewer packets means fewer interrupts on the server and a smaller CPU load.
  • Fewer interrupts means less delay on the server bus.
  • Larger frames provide better buffer utilisation and forwarding performance in switches.
  • Fewer packets means less routing CPU/performance used.
  • Fewer packets means less network overhead in terms of headers and frame formats, leading to better utilisation.
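To put rough numbers on the list above, here is a minimal sketch of the per-frame arithmetic. It assumes untagged Ethernet with 38 bytes of fixed cost per frame on the wire (14-byte MAC header, 4-byte CRC, 8-byte preamble and start delimiter, 12-byte inter-frame gap). The function names are my own, and it ignores IP/TCP headers and anything the server does above the NIC.

```python
# Fixed per-frame cost on the wire, in bytes (untagged Ethernet, assumed values).
HEADER_AND_FCS = 18     # 14-byte MAC header + 4-byte CRC
PREAMBLE_AND_SFD = 8    # preamble plus start-of-frame delimiter
INTER_FRAME_GAP = 12    # mandatory idle time between frames, expressed in bytes

OVERHEAD = HEADER_AND_FCS + PREAMBLE_AND_SFD + INTER_FRAME_GAP


def wire_efficiency(payload: int) -> float:
    """Fraction of the wire time that carries payload for a given payload size."""
    return payload / (payload + OVERHEAD)


def frames_per_second(payload: int, link_bps: float) -> float:
    """Frames needed per second to keep a link of link_bps bits/s full."""
    return link_bps / 8 / (payload + OVERHEAD)


if __name__ == "__main__":
    for payload in (1500, 9000):
        print(
            f"{payload}-byte payload: "
            f"{wire_efficiency(payload):.2%} efficient, "
            f"{frames_per_second(payload, 1e9):,.0f} frames/s at 1 Gb/s"
        )
```

Under these assumptions the efficiency gain on the wire is small (roughly 97.5% versus 99.6%), while the number of frames the server has to handle per second drops by about a factor of six (about 81,000 versus 14,000 at 1 Gb/s). That is consistent with the benefits above being mostly about per-packet work on the server rather than header savings.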

Let me make some extrapolations based on the available data. Jumbo frames can only be used on full-duplex Ethernet connections because:

  1. the IEEE 802.3-1998 standard has never adopted or ratified Jumbo frames (because they are not backward compatible with existing Ethernet equipment), therefore the use of oversized frames on half duplex connections would be problematic.
  2. Ethernet in half duplex mode assumes that there is an active shared bus with more than two workstations and therefore the CSMA/CD MAC layer is active and is detecting collisions. Once frames are longer than 1518 bytes, IEEE 802.3 CSMA/CD would mark these as giant frames and regard them as errors.
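As an illustration of the second point, this is a sketch of the kind of length check a standards-only receiver would apply. The 1518-byte maximum is the figure quoted above and 64 bytes is the standard minimum frame size. The function name and the `max_frame` parameter are hypothetical, and real NICs and switches implement and count this in hardware with their own terminology.

```python
MIN_FRAME = 64     # minimum Ethernet frame size in bytes
MAX_FRAME = 1518   # maximum untagged frame size in bytes, as quoted above


def classify_frame(length: int, max_frame: int = MAX_FRAME) -> str:
    """Classify a frame by its length in bytes (MAC header through CRC)."""
    if length < MIN_FRAME:
        return "runt"
    if length > max_frame:
        return "giant"
    return "ok"


if __name__ == "__main__":
    jumbo = 9000 + 18  # 9000-byte payload plus MAC header and CRC
    print(classify_frame(1518))                   # largest standard frame
    print(classify_frame(jumbo))                  # treated as an error by default
    print(classify_frame(jumbo, max_frame=9018))  # accepted once the limit is raised
```

The point of the `max_frame` parameter is that jumbo support is a local, vendor-specific setting: every device on the path has to raise its limit, or the frame is counted as a giant and discarded.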


Proof ?

Unfortunately, I cannot prove this. I don’t have access to a lab where I could configure this with any level of confidence by setting two switches to half duplex and enabling jumbo frames. For example, to configure this in a home lab would imply that the server software and hardware network interface were capable of jumbo frames and that the switches were of sufficient quality to be confident that the jumbo frames were working correctly.

Earlier this year, Jason Boche wrote a post, Jumbo Frames Comparison Testing with IP Storage and vMotion, which suggested that Jumbo frames with IP Storage did not offer a performance benefit in his testing. This is counterintuitive, but Jason knows his stuff and is highly trustworthy; the test is rigorous and his results clearly show limited benefit.

Knowing that half duplex links might cause Jumbo frames to fail would be interesting research, since it’s possible that either the server or the storage array might be in half duplex mode.

If anyone can test and advise me, I’d be pleased to post the results here.

Postscript

I found this interesting data in the research process.

Q. Why are Jumbo frames 9000 bytes long ?

A. Because the CRC field isn’t long enough to guarantee detection of errors for frames larger than 9000 bytes.1
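A small sketch of what is behind this answer, using Python’s standard `zlib.crc32`, which uses the same CRC-32 generator polynomial as the Ethernet frame check sequence. The helper names are my own. It does not prove the 9000-byte figure; it only shows that the check stays 32 bits wide no matter how large the frame gets, so the same 2^32 possible check values have to cover six times as much data in a jumbo frame.

```python
import zlib

CRC_BITS = 32  # width of the Ethernet frame check sequence


def fcs(frame: bytes) -> int:
    """CRC-32 over the frame contents, masked to an unsigned 32-bit value."""
    return zlib.crc32(frame) & 0xFFFFFFFF


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Return a copy of the frame with a single bit inverted."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    for name, size in (("standard", 1500), ("jumbo", 9000)):
        payload = bytes(size)  # all-zero payload, purely for illustration
        damaged = flip_bit(payload, 4321)
        detected = fcs(payload) != fcs(damaged)
        print(f"{name}: {size} bytes, CRC {fcs(payload):#010x}, "
              f"single-bit error detected: {detected}")

    # A random-looking corruption slips through about once in this many frames.
    print(f"possible CRC values: {2 ** CRC_BITS:,}")
```

Single-bit errors are always caught at either frame size. The weakening discussed in the footnote concerns more complex error patterns, which become more likely as the frame carries more bits.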

 

Response

Denton Gentry has written Requiem for Jumbo Frames, where he talks about working as a hardware developer for NICs and the fact that the performance benefits of Jumbo Frames were driven by the performance of the CPU/memory bus in the server. This was offset by the use of LSO, transferring large chunks of data for the NIC driver to handle, which are then segmented into 1500-byte payloads with CPU interrupts.
Hence Jumbo frames are not the win we might have thought.

  1. http://staff.psc.edu/mathis/MTU/arguments.html – The undetected packet error rate was calculated using the following assumptions: 1- The “raw strength” of the CRC is 1 part in 2^32. I.e. a single arbitrary burst error will yield a “random” CRC, which will be a false pass once per 4 × 10^9 packets. Actual CRCs are stronger than this because all error patterns in some of the more common cases (e.g. single bit errors) can be proven to never cause a false pass. 2- The ability of the CRC to detect a given burst error is not affected by the amount of correct data in the same packet. 3- The probability of there being 2 burst errors in the same packet is low. If this is not the case, you introduce second order terms on both sides of the calculation. 4 – I am in complete agreement that CRC-32 is not strong enough for large data sets. It probably does need to be improved. The CRC issue does not provide an argument for limiting MTU, only that current Ethernet may not be suitable for large data sets. (Author’s emphasis)
  • http://twitter.com/MrFogg97 David Bulanda

    Greg,
    Interesting. But can you set half-duplex on an interface faster than 100 Mbit? So I would think that Jumbo frames on a slower speed link would be less than useful.

    And if you have a half-duplex link in a network you could be stuck with a duplex mismatch, which I would think would be just as destructive to the network.

    Just some thoughts.

    • http://etherealmind.com Etherealmind

      1GbE has a half duplex mode to support hub type operations – it’s rarely used.

      1GbE is certainly able to auto-neg half duplex when poor cabling is used or certain other conditions occur.

      • http://packetspersecond.wordpress.com/ @BriMcS

        I don’t know of any switch that supports 1GbE HD. Generally, smart-autosensing on a 1000/100/10 Mbps interface will force re-negotiation of a link to a lower speed when errors are detected.

  • Anonymous

    With regard to Jason’s testing, in a lot of cases a server’s TCP offload engine makes up for what jumbo frames would have achieved. If your goal is jumbo frames that’s one thing, but if the goal is reduced server load you get it without having to redesign your infrastructure.

    • http://etherealmind.com Etherealmind

      TOE only improves throughput by reducing latency for frame preparation and transmission. Jumbo frames work by removing multiple round trips for data payload and further improve throughput.

      I suspect that the problem is frame stuffing, where applications or drivers are not stuffing frames to optimal capacity when using jumbo byte counts. I’m particularly suspicious of storage arrays and their ability to correctly drive Ethernet in an optimal fashion. But I don’t have access to a lab and the skills to test the hypothesis, so I can only make noises about it.

  • http://twitter.com/#!/Daniel_Bowers Daniel Bowers

    Rather than “To improve server performance”, I’d say the larger value of jumbo frames is simply to “increase bandwidth a server can provide”, or maybe “increase utilization of a network connection”. With jumbo frames, a server that was streaming, say, 4Gb of video traffic on a 10Gb connection might be able to get up to 5.5Gb. That aligns with Jason’s statement that “jumbo frames provided a 7-15% reduction in elapsed vMotion time”. Reducing CPU resources is a plus, but mainly because it gets the CPU out of the way of the flowing bits.

    Offloading, jumbo frames, large NIC buffers, RSS, RDMA… They are all band-aids for the latency and bottlenecks caused by the problem that x86 servers make poor switches.

    • http://etherealmind.com Etherealmind

      I wouldn’t use the term “bandwidth” because you are actually referring to throughput. Because you can send a single 9000 byte frame instead of approx six 1500 byte frames you get more throughput because of less round trip times in frame transmission. vMotion would have been improved because of better throughput on the same bandwidth.

      But yes, x86 architectures make poor switches – in fact, high latency switches which is kind of worse.

      • http://twitter.com/#!/Daniel_Bowers Daniel Bowers

        Yes, you’re right, ‘throughput’ is the better term.

      • Priscilla

        Why do you say “round trip?” Bits go out; they don’t come back, or get acknowledged in Ethernet. You get better throughput because of fewer inter-frame gaps, and fewer headers (if we’re talking throughput of upper-layer data, the header comment is relevant). 

        • http://etherealmind.com Etherealmind

          I guess my perspective shifted to the application in the context of Daniel’s comment. That is, he is referring to application performance, which is impacted by all those factors.

          Of course the article is focussed on Ethernet and you make a good point, Ethernet doesn’t care about lost frames – (although we are trying to strap jet packs onto that camel in DCBX).

  • Jason Boche

    Thank you for the pingback Greg. I’ve been hoping someone could put together empirical data showing substantial improvements with Jumbo Frames + the vSphere use case.  As you know, I don’t see them on my lab gear.  Chad Sakac has chimed in with similar results on newer gear in the vSpecialist labs basically saying my results are consistent & from their tests, the results don’t improve much on modern switching.  The 3COM Super Stack I tested on was configured for 1000FD on the ESX ports as well as the EMC Celerra CGE ports.  Since writing that post, I’ve sort of gone against my own recommendation by enabling Jumbo Frames in the lab because I want to test again with Storage + ESXi 5 as well as with the new link aggregation for vMotion in vSphere 5.  I want to be clear that I’m not a jumbo frames bigot at all. Rather, I’d love to see it provide an infrastructure performance or scalability boost with real numbers, but I haven’t seen any numbers with substance – only marketing.  That is why my broad recommendation, everything else being equal, would be to forgo jumbo frames as a design decision.  Not enough bang for the buck.  There may also be increased risk to an environment with jumbo frames enabled due to the added complexity & the impacts to a consolidated environment when an errant change is made which affects jumbo frames.

    • Michael Webster

      I haven’t tested with 1Gb/s links much, but with 10Gb/s links Jumbo makes a huge difference. I can vMotion at over 1250MB/s on my 10G network in my lab with a single 10G interface used. This is ESXi 5. With 4.1 it was just over 1100MB/s with Jumbo and around 800 – 900MB/s without Jumbo. I would definitely enable Jumbo in 10G environments, and just to get CPU reduction and higher throughput in 1Gb/s environments where the config won’t cause any problems.

  • Pere Camps

    I think the reason why you can’t use jumbo frames on half duplex links is repeaters.

    Repeaters are allowed on a shared medium as per Ethernet/802.3 specs. However these same specs stipulate that the maximum frame size is only 1518 bytes — and repeaters are built to these specs.

    So if you put a repeater between two half duplex links and start using jumbo frames then the repeater might/might not start dropping/corrupting the frames and the network connectivity would be broken.

    If you stipulate that you have to use a full duplex link then you can’t use repeaters and your problem is gone.

    I believe this is why IEEE never approved jumbo frames on the 802.3 spec: they would rather maintain backward compatibility with 10/100 shared medium networks than allow jumbo frames.

    Anyway, the following draft explains the situation much better than I do:
    http://tools.ietf.org/id/draft-ietf-isis-ext-eth-01.txt
    — p.

  • http://cdplayer.livejournal.com/ Dmitri Kalintsev

    The only major difference that I can see between jumbo and non-jumbo frames is the number of Ethernet headers+checksum (18 or 22 bytes) per given transmitted data block, plus potential payload padding due to mis-alignment of the transmitted data’s block size and the available payload space.

    While headers are pretty predictable, the worst case for jumbo frames is a transmission of a whole 9K of data on the wire with only one byte useful vs. 1.5K of data on the wire in case of regular frame size.

  • Russell Heilling

    Martin Levy of Hurricane Electric is currently trying to push adoption of Jumbo frames on Internet Exchanges.  Some useful background information in his draft:   http://www.ietf.org/id/draft-mlevy-ixp-jumboframes-00.txt

  • http://twitter.com/ssl_boy Glen Kemp

    This is really interesting, I’ve been mulling enabling Jumbo frames on a closed DMZ infrastructure to see if it would help or harm performance.  I’ve got some lab time in the next couple of weeks with a new Dell iSCSI array and new VMware host, connected via Juniper EX4200 switches.  Anyone care to design an experiment to provide some empirical evidence as to which is faster on 1G networks?  Will share/blog the results if I get any response.

  • Thomas Kessler

    You should also do a posting on TSO and LRO (TCP Segmentation Offload and Large Receive Offload). These techniques, whether done in network adapter hardware or in a device driver’s receive routine, when coupled with other modern infrastructure such as RSS (Receive Side Scaling) and interrupt mitigation (dynamically adjustable interrupt thresholds), can meet or exceed jumbo frame performance in all the situations that I’ve seen while using standard frames. They provide the same sorts of benefits, especially to the single threaded parts of operating systems, that jumbo frames do.

    As for jumbo frames, certainly the longer the frame gets the weaker the checksum will be.
    Part of why we ended up with 9Kbytes was the trade off of CRC strength versus the most efficient I/O size of the day (mid 1990s).  Certainly at the time file systems wanted to do I/O in 4K to 8Kbyte chunks for best efficiency.

    T

  • http://www.pc-freak.net/ Georgi Georgiev

    I’ve read quite a lot on the topic of Jumbo Frames and how they can be enabled. So far I’ve not seen anything as creative as your kind of exposure. Most of the info you’re talking about I’ve already learned from other sources. What is new for me is “Jumbo frames can only be used on Full Duplex Ethernet connections.” Are you sure about this? I ask because I’ve recently written an article on how Jumbo Frames can be enabled on FreeBSD – http://www.pc-freak.net/blog/freebsd-jumbo-frames-network-configuration-short/ and there I point out that it is better to enable jumbo frames in half-duplex mode because this can prevent problems if the other side of the network is not configured to communicate in full-duplex. However, if you or someone else can confirm your claim, I’ll have to fix that in my article.

    Thanks for the info.

    Best

    G

    • http://etherealmind.com Etherealmind

      There is no standard for Jumbo Frames from the IEEE. Therefore the transmission of non-standard sized frames is not part of the CD back off algorithm and therefore Jumbo frames at half duplex will cause problems.

      Furthermore, I would like to say that you should ALWAYS enable auto-negotiation; you should NEVER force half duplex or full duplex, because that is an incorrect configuration. Doing so WILL cause problems. I’ve explained Ethernet Auto-Negotiation in this blog post and have supporting references there. http://etherealmind.com/ethernet-autonegotiation-works-why-how-standard-should-be-set/

      Please do not propagate the falsehood about overriding the default configuration for Ethernet auto-negotiation.