Veeam Backup part 2- Using jumbo frames to target storage
In my quest to get the most out of my home lab setup when it comes to backup speeds to my IX2-200 (see Veeam Backup part 1- Optimizing IX2-200 backup speeds), today I will configure jumbo frames in my environment, and I will show how each of the possible connection options to the IX2-200 can be configured for jumbo frames.
A brief history of network frames, and especially the jumbo ones
There are many stories going around about jumbo frames. Some say they are not worthwhile, others say they make the difference between night and day. But what are jumbo frames in the first place?
As with all things, the idea is relatively simple. At some point in history, people figured that 1500 bytes was a pretty neat size for a network frame (you often see this as MTU=1500). A 14k4 modem would take about a second to send or receive a frame like this. Since you need some sort of header and tail around the data inside a frame, you end up with 1400-and-some bytes of “payload” per single frame (1460 bytes to be exact).
So what to do if you want to send more data than these 1460 bytes? Simple: You carve the data up into several chunks, you fit each chunk inside a frame, and you send all these frames on their way to the target.
As with all things, there are pros and cons to this. One big pro is that if a frame is lost, you only need to resend 1500 bytes of data; TCP/IP will take care of that for you automagically. But these small frames also have a downside: the network cards and possibly the CPUs at both ends have to do a lot of work to handle all these tiny frames, stitch the data back together, and so on. Also, each frame carries an overhead of 40 bytes in every 1500 bytes sent.
So why not simply send larger frames? A generally accepted value for these larger frames is an MTU of 9000, which gives a payload of 8960 bytes per frame. These are called jumbo frames. Less overhead in CPU, less overhead in bytes across the wire. A downside of larger frames is that if a frame is lost and needs to be resent, you have to resend far more data than without jumbo frames (impacting latency).
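To put a rough number on that overhead (using the same 40 bytes of TCP/IP headers per frame): with an MTU of 1500 the headers eat 40 / 1500 ≈ 2.7% of everything that goes over the wire, while with an MTU of 9000 that drops to 40 / 9000 ≈ 0.45%. On top of that, both ends only have to process one sixth of the number of frames for the same amount of data.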
You cannot just start sending these larger frames over a network; all networking devices are used to the 1500-byte MTU frame size, and would choke on larger frames. So all equipment, including sender, receiver and all networking components in between, has to support jumbo frames. If one device somewhere along the line has no support for them, jumbo frames should not be enabled. Yes, even unmanaged switches can “crash” when you force-feed them jumbo frames!
Especially when jumbo frames are enabled, it might be a good idea to enable flow control if your switches support it. When enabled, a PAUSE frame can be sent from the receiver to the transmitter to stop flooding the receiver with more frames. This will hopefully keep frames from being dumped to /dev/null at the receiver's end because there is no room left to store them. In a full Gbit network (without any 100Mbit connections involved) this should not be an issue, as long as the receiving end (in our case the NAS) can absorb all packets in an orderly fashion.
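A quick way to check whether a path really supports jumbo frames end to end is a ping with the don't-fragment bit set and a payload just below the jumbo MTU: 8972 bytes, leaving room for the 28 bytes of ICMP and IP headers. The IP address below is just a placeholder for the NAS; if any device along the path cannot handle jumbo frames, these pings simply fail:

ping -f -l 8972 192.168.1.50      (from a Windows box)
vmkping -d -s 8972 192.168.1.50   (from the ESX console, testing the VMkernel path)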
Setting up jumbo frames
My environment currently does not use jumbo frames at all, so I wanted to make sure everything would keep working once I enabled them. Luckily, all my devices can handle jumbo frames, so I should be able to get them going.
I did the following to get jumbo frames up and running:
- Enabled jumbo frames on all switches (I use Linksys SLM2008) in the path between vSphere and the IX2-200 NAS;
- Enabled flow control on the Linksys switches on all impacted ports between ESX and the IX2-200;
- Enabled jumbo frames on the IX2-200 (take care – this requires a reboot of the IX2-200);
- Created a second vSwitch in vSphere, using one of my three Gbit connections as its uplink (I can live with a single uplink since I only use it for backing up);
- Enabled jumbo frames on the new vSwitch (using the esxcfg-vswitch -m 9000 vSwitch1 command);
- Moved my VMkernel port for NFS/iSCSI from my old vSwitch to the new jumbo frame-enabled vSwitch;
- Enabled jumbo frames on the VMkernel portgroup (using the esxcfg-vmknic -m 9000 -p VMkernelPG command);
- Added a Virtual Machine portgroup to this vSwitch to allow VMs with jumbo frame-enabled vNICs to send/receive jumbo frames;
- Created a second virtual NIC for jumbo-framed data in the Veeam VM and enabled jumbo frames on its vNIC.
That should do it, and nothing even went down because of it ;)!
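For reference, the ESX console side of the steps above boils down to something like the following. The vSwitch and portgroup names match the ones I used; the uplink name and IP address are just examples, so adjust them to your own setup:

esxcfg-vswitch -a vSwitch1                  # create the second vSwitch
esxcfg-vswitch -L vmnic2 vSwitch1           # give it one of the Gbit uplinks
esxcfg-vswitch -m 9000 vSwitch1             # enable jumbo frames on the vSwitch
esxcfg-vswitch -A VMkernelPG vSwitch1       # add the VMkernel portgroup
esxcfg-vmknic -a -i 192.168.1.41 -n 255.255.255.0 -m 9000 VMkernelPG   # (re)create the vmknic with MTU 9000
esxcfg-vswitch -l                           # verify the MTU on the vSwitch and portgroups
esxcfg-vmknic -l                            # verify the MTU on the VMkernel NIC

On some ESX versions you may have to delete and re-add an existing vmknic to change its MTU, which is why I show the -a form here.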
With jumbo frames installed and enabled, I wanted to make another backup using the direct CIFS connection method, and I wanted to make sure that Veeam would use the jumbo frame-enabled NIC for backing up. Strangely, I could not find a way to configure this within Veeam (a preferred NIC option would be nice). So I decided to do things the simple way: I moved the base NIC of the Veeam VM to another VLAN (I l-o-v-e SLM2008 switches!) and left the “jumbo frame” NIC in the same VLAN as the IX2-200. Now Windows should decide to use the NIC in the same VLAN and not the management NIC.
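On the Windows side of the Veeam VM, the jumbo frame setting itself lives in the advanced properties of the vNIC driver (usually called something like “Jumbo Packet” or “MTU”, depending on the adapter type). On Windows Server 2008 and later you can check and pin the effective IP MTU per interface with netsh; the interface name here is just an example:

netsh interface ipv4 show subinterfaces                                   (lists the MTU per interface)
netsh interface ipv4 set subinterface "Backup NIC" mtu=9000 store=persistent

Note that netsh only caps the MTU the IP stack will use; the driver property is what actually enables the larger frames on the wire.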
Testing with a jumbo-frames enabled network
I started another backup, backing up directly to CIFS. As you can read in part 1 of the Veeam posts, without jumbo frames I managed to get an 8MB/s backup rate. But after enabling jumbo frames, I got a transfer rate of 10MB/s, the fastest yet!
Repeating the test with jumbo frames enabled on the iSCSI-through-vSphere link was a simple step, since I had reconfigured that same link to the IX2-200 for jumbo frames, including the VMkernel side of things. The final backup speed reached using iSCSI through vSphere was 9.5MB/s, almost as fast. The same backup using NFS through vSphere even came in at 11MB/s!
Finally I tested the direct iSCSI connection with jumbo frames enabled. This backup ran at only 7MB/s, which struck me as odd. I think for some reason I did not get jumbo frames enabled on this one. I must admit I did not do further testing on this.
All of this may not seem to be a big deal. But when you compare the 8MB/s maximum I got in part 1 with the 11MB/s I got with jumbo frames enabled, it is quite a difference: a gain of 37.5% in performance! Take a look at the comparison graph:

Comparing backup speeds – Jumbo frames win most of the time, apart from the direct-iSCSI test, where I think jumbo frames failed to enable.
The final setup of the Veeam backup target
Now that I had selected my favorite type of target connection, it was time to run a backup of all my VMs, including a (by my standards) massive file server holding over 900GB of data. So I removed all previous drives and shares, and rebuilt a clean setup. Since NFS-through-vSphere had proven to be the fastest in my setup, I chose that one. I created an NFS share on my IX2-200, and added the NFS share to both of my vSphere nodes. From vSphere, I created a 1.8TB virtual disk (vmdk), provisioned as a thin disk (as always when using NFS). This 1.8TB vmdk was mounted to the Veeam VM. Within the Veeam VM, I created a 1MB-aligned partition on the drive (using diskpart), and quick-formatted it (to keep the thin disk from ballooning to its full 1.8TB size). I chose a 64KB cluster size for the format, since I figured that would be the best performer for writing larger blocks of data.
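For those who want to repeat the alignment trick: diskpart takes the alignment in KB, so align=1024 gives the 1MB offset, and the standalone format command takes the cluster size through the /A switch. The disk number, drive letter and label are just examples from my setup:

diskpart> select disk 1
diskpart> create partition primary align=1024
diskpart> assign letter=V
diskpart> exit

format V: /FS:NTFS /A:64K /Q /V:VeeamBackup     (quick format with a 64KB cluster size)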
Creating an “all VM” backup job in Veeam
After the Veeam backup target configuration was done, I created one big job within Veeam for backing up everything, rather than a job for each VM individually. I do have a somewhat different setup than most, so a little explanation is in order.
I have been backing up to a dedup appliance for some time now, and I now give all my VMs a separate, independent disk for (Windows) swap. I do not include these swap disks in any backup. This easily takes a GByte of changed blocks per VM out of the equation.
Within the Veeam backup job I had no trouble excluding these disks. I do not want to back up swap space in the first place, and a lot of writes take place there (impacting CBT and dedup effectiveness the most).
When I started the backup, I noticed this setup did not go as fast as I had previously measured. What was different? I had formatted the NTFS target with a 64KB cluster size! This proved to sink the performance down to only 6MB/s. So I cancelled the Veeam job, reformatted the target back to the default NTFS cluster size (using a quick format!), and restarted the backup job. Decent backup speeds were now seen again (between 10 and 15MB/s per VM), depending on the amount of white space in the disks. On average, the full backup of the system disks ran at 13MB/s.
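Going back to the default cluster size was just another quick format, this time without the allocation-unit switch; fsutil shows what a volume actually ended up with (drive letter again just my example):

format V: /FS:NTFS /Q /V:VeeamBackup
fsutil fsinfo ntfsinfo V:     (the "Bytes Per Cluster" line shows the cluster size, 4096 for a default-formatted volume of this size)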
Troubles backing up the file server
All went well, backups kept going at an average of 13MB/s. But then Veeam hit the 900GB file server. It started out OK, then froze at a processed size of 8GB (which is exactly the size of the system drive).
Some hours later (!!) it was still at 8GB. I looked at the Veeam VM, and noticed that Veeam had actually mounted the system disk of the file server to drive H:. But the data disk of the file server was not visible there. Looking closer, that was no surprise: I had used up all available drive letters! My network drives start at J:, and the I: drive was still manually connected to the IX2-200 CIFS share. As soon as I removed the (no longer needed) I: drive, it was replaced with the data drive of the file server!
Still, backup was frozen at 8GB. Not pretty. Cancelling the job had no effect, it just kept hanging at 8GB progress. Not stopping, not failing. So I decided to do the thing I love best:
Break stuff!
Because Veeam was going nowhere, I shut down the Veeam VM. Then, to clean up the mess, I removed the file server virtual disks from the Veeam VM's config and removed the snapshot from the file server. Finally I restarted the Veeam VM and, since this was my first backup anyway, got rid of all data previously backed up.
Then I started the job again. This time all went smoothly, and the file server now was backed up beyond the 8GB point.
Some graphs during the backups of system-drive only VMs
Graphs are always nice to show. I took some screenshots of the VI client’s performance graphs during backup:
CPU usage of the Veeam VM during backup of multiple VMs.
Note that each full backup (8-10GB; no CBT) takes about 10 minutes
Source disk read speed performed by the Veeam VM during backup of multiple VMs
Latency of the target IX2-200 NAS when writing VM backups
Some graphs during the backup of the file server data disk
The file server data disk is different. It is a very large disk (over 900GB), with no empty blocks (at least not in the first 800GB of the disk). Here the performance sank back to 6MB/s. I am still trying to figure out why the backup speed is lower here; I think the absence of empty blocks is to blame. Anyway, it really shows how a sustained full backup performs. Some graphs to show what is going on:
Continuous CPU usage during the backup of the file server data disk

Readytimes of both vCPU cores during backup (remember to divide these values by 20!)

Waittimes of both vCPU cores during backup (remember to divide these values by 20!)
Disk I/O during the backup of the file server datadisk
Conclusion
Looking back at the backup performance, I am convinced my poor CPU capacity is limiting my throughput. Right now I do not have access to a dual quad-core Nehalem server or the like, but I'm sure backups would fly on one. I am glad to see, though, that I did not configure jumbo frames for nothing: I got a 37.5% gain out of it.
Unfortunately it is hard to say what any random environment will gain from using jumbo frames. It might be that jumbo frames save just a little CPU overhead, which happens to boost performance significantly in this setup. On bigger servers (read: more and faster cores) the difference might not be this significant. Still, I would recommend testing it before taking your backup solution into production. As an added bonus you get the hang of your backup software before you start using it seriously: better to tune in advance than to have to troubleshoot later!
For the jumbo frames setup I think that you need to activate it specifically on vmkernel ports as well (i.e. setting the right MTU size) using something like vicfg-vmknic … -m 9000 …
Hi Calin,
You are right. I forgot to include this command in the blogpost. I will add this extra step.
Your setup is not really clear to me, but assuming that the Veeam VM gets its storage through the vmk# (i.e. NFS/iSCSI IP storage configured at the ESX level, presented to Veeam as a vdisk/vmdk), the tweak for the MTU size would, I think, be necessary.
I tried different ways of connecting up the storage. You can do it either through Windows or through the VMkernel. In either situation you have to tweak the MTU size to be above 1500 (9000 is a commonly accepted, but not standardized, maximum).