Wednesday, February 24, 2010

Jumbo Frames Demystified

The Promise and Peril of Jumbo Frames

We sit at the intersection of two trends:

  1. Most home networking gear, including routers, has safely transitioned to gigabit ethernet.
  2. The generation, storage, and transmission of large high definition video files is becoming commonplace.
If that sounds like you, or someone you know, there's one tweak you should know about that can potentally improve your local network throughput quite a bit -- enabling Jumbo Frames.
The typical UDP packet looks something like this:
udp packet diagram
But the default size of that data payload was established years ago. In the context of gigabit ethernet and the amount of data we transfer today, it does seem a bit.. anemic.

The original 1,518-byte MTU for Ethernet was chosen because of the high error rates and low speed of communications. If a corrupted packet is sent, only 1,518 bytes must be re-sent to correct the error. However, each frame requires that the network hardware and software process it. If the frame size is increased, the same amount of data can be transferred with less effort. This reduces CPU utilization (mostly due to interrupt reduction) and increases throughput by allowing the system to concentrate on the data in the frames, instead of the frames around the data.
I use my beloved energy efficient home theater PC as an always-on media server, and I'm constantly transferring gigabytes of video, music, and photos to it. Let's try enabling jumbo frames for my little network.
The first thing you'll need to do is update your network hardware drivers to the latest versions. I learned this the hard way, but if you want to play with advanced networking features like Jumbo Frames, you need the latest and greatest network hardware drivers. What was included with the OS is unlikely to cut it. Check on the network chipset manufacturer's website.
Once you've got those drivers up to date, look for the Jumbo Frames setting in the advanced properties of the network card. Here's what it looks like on two different ethernet chipsets:
gigabit jumbo marvell yukon advanced settings   gigabit jumbo realtek advanced settings
That's my computer, and the HTPC, respectively. I was a little disturbed to notice that neither driver recognizes exactly the same data payload size. It's named "Jumbo Frame" with 2KB - 9KB settings in 1KB increments on the Realtek, and "Jumbo Packet" with 4088 or 9014 settings on the Marvell. I know that technically, for jumbo frames to work, all the networking devices on the subnet have to agree on the data payload size. I couldn't tell quite what to do, so I set them as you see above.
(I didn't change anything on my router / switch, which at the moment is the D-Link DGL-4500; note that most gigabit switches support jumbo frames, but you should always verify with the manufacturer's website to be sure.)
I then ran a few tests to see if there was any difference. I started with a simple file copy.
Default network settings
gigabit jumbo frames disabled file copy results
Jumbo Frames enabled
gigabit jumbo frames enabled file copy results
My file copy went from 47.6 MB/sec to 60.0 MB/sec. Not too shabby! But this is a very ad hoc sort of testing. Let's see what the PassMark Network Benchmark has to say.
Default network settings
gigabit jumbo frames disabled, throughput graph
Jumbo Frames enabled
gigabit jumbo frames enabled, throughput graph
This confirms what I saw with the file copy. With jumbo frames enabled, we go from 390,638 kilobits/sec to 477,927 kilobits/sec average. A solid 20% improvement.
Now, jumbo frames aren't a silver bullet. There's a reason jumbo frames are never enabled by default: some networking equipment can't deal with the non-standard frame sizes. Like all deviations from default settings, it is absolutely possible to make your networking worse by enabling jumbo frames, so proceed with caution. This SmallNetBuilder article outlines some of the pitfalls:

1) For a large frame to be transmitted intact from end to end, every component on the path must support that frame size. The switch(es), router(s), and NIC(s) from one end to the other must all support the same size of jumbo frame transmission for a successful jumbo frame communication session.
2) Switches that don't support jumbo frames will drop jumbo frames.
In the event that both ends agree to jumbo frame transmission, there still needs to be end-to-end support for jumbo frames, meaning all the switches and routers must be jumbo frame enabled. At Layer 2, not all gigabit switches support jumbo frames. Those that do will forward the jumbo frames. Those that don't will drop the frames.
3) For a jumbo packet to pass through a router, both the ingress and egress interfaces must support the larger packet size. Otherwise, the packets will be dropped or fragmented.
If the size of the data payload can't be negotiated (this is known as PMTUD, packet MTU discovery) due to firewalls, the data will be dropped with no warning, or "blackholed". And if the MTU isn't supported, the data will have to be fragmented to a supported size and retransmitted, reducing throughput.
In addition to these issues, large packets can also hurt latency for gaming and voice-over-IP applications. Bigger isn't always better.
Still, if you regularly transfer large files, jumbo frames are definitely worth looking into. My tests showed a solid 20% gain in throughput, and for the type of activity on my little network, I can't think of any downside.


 Comments Worth Noting:

i have been building networks for broadcasters for over a decade - who always wanted bigger / faster / more type networks.
jumbo frames are great in theory, but the pain level can be very high.
A core network switch can be brought to its knees when 9 Kbyte frames have to be fragmented to run out a lower MTU interface.
Many devices dont implement PMTU correctly, or just ignore responses - video codecs seem particularly prone to this.
and wasnt there a discussion a few newsletters ago about dont try to optimise things too much? If you need 20% more network performance, but you are only operating at maybe 40% load, then you need a faster machine or a better NIC card.
And there have been something like 5 definitions of jumbo just in the cisco product line. Also telecomms manufacturers idea of jumbo often have frames with 4 Kbytes, not 9 Kbytes.....
And just to set the record straight - the reason for the 1514 bytes frame limit in GigE and 10G ethernet is backward compatibility.
Just about every network has some 10/100 (or 10 only) equipment still, and the 1514 limit has been built into other standards such as 802.11 wireless LAN.
the old saying is that God would have struggled to make the world in 7 days if he started with an installed base...

------------------------------------------------------------------------

Just a couple things to point out.
File transfer is typically done using TCP, not UDP. TCP has more overhead than UDP.
I'm curious why we see a sawtooth pattern in the un-jumbo framed graph. Is that TCP Vegas doing its thing?
I'm glad you've gone ahead and tried this out. Jumbo frames wouldn't exist if they didn't have a purpose, but with all the different kinds of traffic I think 1500 MTU is a good choice.
One with jumbo frames that you touched on, but didn't adequately explain, is that most consumer switches use the store-and-forward method of switching packets. This means that your switch must receive the whole packet before it can send it along, it can't be doing anything else because packets can't be multiplexed. This can cause unacceptable latency (you have 2 computers, not a big deal, but between several machines all trying to send data, you can end up with some seriously delayed packets).
I just would have liked to see more reasons not to do this that it's not a supported standard and doesn't work with a lot of hardware. There are other reasons this has not become the default.

----------------------------------------------------------------------

@Bob from what I have seen IPv6 is potentially a bigger problem than IPv4, because where an IPv4 router may see that the packet is too large and fragment it, IPv6 leaves it to the end devices.

---------------------------------------------------------------------------------


Jumbo frames are great. I work on VMware ESX networking, and I will point out what may not be obvious to everyone. In a virtualized environment (hosted or hypervisor) jumbo frames make an even bigger difference, since you are doing more work per packet to begin with. That's why we added jumbo frame support since ESX 3.5 shipped.
My experience is that any recent machine can easily push full 1Gbit line rate (on native, and for that matter ESX VMs). Setting Jumbo Frames will save you on CPU though, which will allow you to run more VMs or otherwise use that power. And while Jumbo Frames are nice- they get you from 1.5k packets to 9, TCP Segmentation Offloading (TSO) is much better, since you push down entire 64k (or sometimes up to 256k) packets, and an engine on the NIC itself automatically handles dicing them into 1.5K packets. Most good nics support this- Intel, Broadcom, etc. On the other side, the reverse is LRO, or RSS, but this is more complicated and less common. Plus with TSO, you don't have to worry about MTU.
The other thing I would mention is- for the love of god, don't run networking benchmarks by doing a file copy. With 1GBit networks, you are limited by your disk speed! Run a proper tool such as iperf (brain dead simple) or netperf, which just blasts data. Even if your hard drive could reach 1Gbit speeds, you would be wasting cycles, so your networking performance would be worse. You always want to look at these things in isolation.

--------------------------------------------------------------------------------------------

The reason that all these people are seeing performance improvements using Jumbo Frame on Windows is because Windows networking stack sucks. Windows is really stupid and often will not let a single tcp stream reach the full capacity of the NIC. I.e. you run 1 TCP stream and measure 400Mbits, but if you ran 3 in parallel you would hit 940Mbits (~Line rate). This is even more annoying with 10G, since you need like 18 streams to reach the peak performance. Linux doesn't have these problems, and will give you its best possible performance on a single stream. I can only imagine Window's behavior is the result of some misguided attempt at ensuring fairness between connections by making sure that even if there is only one connection, it never uses the full capacity.

--------------------------------------------------------------------------------------------

If you simply enable jumbo frames on your NIC, every connection to any Internet destination (which don't support jumbos) will need to undergo PMTU discovery, PMTU blackhole detection, router fragmentation, or other time-consuming / performance-sapping hacks. This might explain why people complain about latency issues with gaming. These people are also seeing slightly slower peformance with all Internet activity.
*nix, as/400/, mainframes, and other operating systems let you set the frame size on a per route basis. E.g.,
route add -net 0.0.0.0 gw 192.168.0.1 mss 1460
This tells the OS to use jumbo frames only on the local LAN, and to assume a normal packet size everywhere else.
Alas, Windows has no such ability. One solution on Windows is to use two NICs attached to the same network. Have one NIC configured with normal frames and the default route. Have the second NIC configured for jumbos with no default route.

---------------------------------------------------------------------------------------

I participated in the IEEE 802.3 committee for a while. IEEE never standardized a larger frame size for two reasons that I know of:
1. The end stations can negotiate the frame size, but there was no backwards-compatible way to ensure that all L2 bridges between them can handle it. Even if you send a jumbo frame successfully, you can still run into a problem later if the network topology changes and your packets begin taking a different path through the network.
2. The CRC32 at the end of the packet becomes weaker after around 4 KBytes of data. It can no longer guarantee that single bit errors will be caught, and the multibit error detection becomes weaker as well.
One is free to enable it, and it does improve the performance, but the situation is unlikely to ever get better in terms of standard interoperability. It will always be an option to be enabled manually.
Also a number of years ago,. jumbo frames provided a much bigger boost. Going from 1.5K to 9K regularly doubled performance or more. What has happened since is smarter ethernet NICs: they routinely coalesce interrupts, steer packets from the same flow to the same CPU, and sometimes even reassemble the payload of the 1.5K frames back into larger units. The resistance to standardizing jumbo frames resulted in increased innovation elsewhere to compensate.

-----------------------------------------------------------------------------

@Timothy Layer 2 ethernet switches will just drop packets they cannot handle. It is not just if they don't handle jumbo frames: they can drop a normal size packet if their internal queues are full, or if rate limiting has been configured, or if the switch hit some other internal condition which the ASIC designer didn't bother resolving. They just drop the packet and expect the sender to retransmit. There is no mechanism for an L2 device to send back a notification that it has dropped the packet. A managed L2 switch will have some counters so you can post-mortem analyze what is wrong with your network.
Layer 3 routers will drop packets for more reasons, in addition to queue congestion. For example when the packet is too big and the don't fragment bit is set, an ICMP message is sent back (this is how path MTU discovery works). Similarly routers send back ICMP messages if they have no route to the destination.
Even the ICMP is just a best effort notification. Routers routinely limit the rate of ICMP messages they will send, to avoid having a flood of ICMP messages make a bad network problem worse. ICMP messages can also be dropped on their way back to the originator. So the best the sender can expect is it _might_ get notified of an L3 problem, sometime.

--------------------------------------------------------------------------


Disclamier: Credit to original Poster
http://www.codinghorror.com/blog/2009/03/the-promise-and-peril-of-jumbo-frames.html

No comments: