Scaling VMware hot-backups (using esXpress) – by Erik Zandboer
There are a lot of ways of making backups- When using VMware Infrastructure there are even more. In this blog, I will focus on so called “hot backups”- Backups made by snapshotting the VM in question (on an ESX level), and then copying the (now temporary read-only) virtual disk files off to the backup location. And especially, how to scale these backups into larger environments.
Say CHEESE!
Hot backups are created by first taking a snapshot. A snapshot is quite a nasty thing. First of all, each virtual disk that makes up a single VM have to be snapped at exactly the same time. Secondly, if at all possible, the VM should flush all pending writes to disk just before making this snapshot. Quiescing is supported in the VMware Tools (which should be inside your VM). Quiescing will flush all write buffers to disk. Effective to some extent, however not enough for database applications like SQL, Exchange or Active Directory. In those cases VSS was thought up by Microsoft. VSS will tell VSS enabled applications to flush every buffer to disk and hold any new writes for a while. Then the snapshot is made.
There is a lot of discussions about making these snapshots, and quiescing or using VSS. I will not get into that discussion, it is too much application related for my taste. For the sake of this blog, you just need to know it is there 🙂
Snapshot made – now what?
After a snapshot is made, the virtual disk files of the VM have become read-only accessible. It is time to make a copy. And this is where different backup vendors start to really do things different. As far as I have seen, there are several different ways of taking out these files:
Using VCB
VCB is VMware’s enabler for primarily making backups through a fibre-based SAN. VCB enables a “view” to the internals of a virtual disk (for NTFS), or give a view to an entire virtual disk file. From that point, any backup software can make a backup of these files. It is fast for a single backup stream, but requires a lot of local storage on the backup proxy, and does not easily scale up to a larger environment. The variation using the network as carrier is clearly a suboptimal solution compared to other network-based solutions.
Using the service console
This option installs a backup agent inside the service console, which takes care of the backup. Do not forget, an FTP server is also an agent in this situation. It is not a very fast option, especially since the service console network is “crippled” in performance. This scenario does not scale very well to larger environments.
Using VBAs
And here things get interesting – Please welcome esXpress. I like to call esXpress the “Software version” of VCB. Basically, what VCB does – make a snapshot and create a view to a backup proxy – is what esXpress does as well. The backup proxy is no hardware server though, or a single VM, but numerous tiny appliances, all running together on each and every ESX host in your cluster! You guessed it – see it and love it. I did anyways.
esXpress – What the h*ll?
The first time you see esXpress in action, you might think it is a pretty strange thing – First you install it inside the service console (you guessed right – There is no ESXi support yet). Second it creates and (re)configures tiny Virtual Appliances all by itself.
When you look closer and get used to these facts – It is an awesome solution. It scales very well, each ESX server you add to your environment starts acting as a multiheaded dragon, opening 2-8 (even up to 16) parallel backup streams out of your environment.
esXpress is also the only solution I have seen which does not have a single SPOF (Single Point of Failure). ESX host failure is countered by VMware HA, and the restarted VMs on other ESX servers are backup up from the remaining hosts. Failing backup targets are automatically failed over to by esXpress to other targets.
Setting up esXpress can seem a little complex – there are numerous option you can configure. You can get it to do virtually anything. Delta backups (changed blocks only), skipping of virtual disks, different scheduling of backups per VM, compression and encryption of the backups, and SO many more. Excellent!
Finally, esXpress has the ability to perform what is called “mass restores” or “simple replication”. This function will automatically restore any new backups found on the backup target(s) to other locations. YES! You can actually create a low-cost Disaster Recovery solution with esXpress – RPO (Recovery Point Objective) is not too small (about 4-24 hours), but the RTO (Recovery Time Objective) can be small, 5-30 minutes is easily accomplished.
The real stuff: Scaling esXpress backups
Being able to create backups is a nice feature for a backup product. But what about scaling to a larger environment? esXpress, unlike most other solutions, scales VERY well. Although esXpress is able to backup to VMFS, I will focus in this blog on backing up to the network, in this case FTP servers. Why? Because it scales easily! following makes the scaling so great:
- For every ESX host you add, you add more Virtual Backup Appliances (VBAs), so increases total backup bandwidth out of your ESX environment;
- Backup uses CPU from the ESX hosts. Especially because CPU is hardly an issue nowadays, it scales with usually no extra costs for source/proxy hardware;
- Backups are made through regular VM networks (not the service console network), so you can easily add more bandwidth out of the ESX hosts and bandwidth is not crippled by ESX;
- Because each ESX server runs multiple VBAs at the same time, you can balance the network load very well across physical uplinks, even when you use PORT-ID load balancing;
- More backup target bandwidth can be realized by adding network interfaces to the FTP server(s) when they (and your switches) support load-balancing through mulitple NICs (etherchannel/port aggregation);
- More backup target bandwidth can also be realized by adding more FTP targets (esXpress can load-balance VM backups across these FTP targets).
Even better stuff: Scaling backup targets using VMware ESXi server(s)
Although ESXi is not supported with esXpress, it can very well be leveraged as a “multiple FTP server” target. If you do not want to fiddle with load-balancing NICs on physical FTP targets, why not use the free ESXi to install serveral FTP servers as VMs inside one or more ESXi hosts! By adding NICs to the ESXi server it is very easy to establish load-balancing. Especially since each ESX host delivers 2-16 data streams, IP-hash load-balancing works very well in this scenario, and is readily available in ESXi.
Conclusion
If you want to make high performance full-image backups of your ESX environment, you should definitely consider the use of esXpress. In my opinion, the best way to go would be to:
- Use esXpress Pro or better (more VBAs, more bandwidth, delta backups, customizing per VM);
- Reserve some extra CPU power inside your ESX hosts (only applicable if you burn a lot of CPU cycles during your backup window);
- Reserve bandwidth in physical uplinks (use a backup-specific network);
- Backup to FTP targets for optium speed (faster than SMB/NFS/SSH);
- Place multiple FTP targets as VMs on (free) ESXi hosts;
- Use multiple uplinks from these ESXi hosts using loadbalancing mechanisms inside ESXi;
- Configure each VM to use a specific FTP target as its primary target. This may seem complex, but it guarantees that backups of a single VM always land on the same FTP target (better than selecting a different primary FTP target per ESX host);
- And finally… Use non-blocking switches in your backup LAN, which preferably support etherchannel/port aggregation.
If you design your backup environment like this, you are sure to get a very nice throughput! Any comments or inquiries for support on this item are most welcome.
We have used esXpress for years and I am in complete agreement on your comments. It is a very good DR product and the company is wonderful to work with.
Robert Reynolds
Storage and Virtualization
Indiana University
[…] The limitations mentioned above are not a limit of esXpress though, but more a limitation of dedup in itself. PHD choose to use online dedup (basically you dedup while you write), which will use CPU power during backup and restores. CPU power might even be the limiting factor in your backup speed. Luckily CPU power is usually available in abundance nowadays. I will dive deeper into performance and scaling of deduped installations in the next blogpost, which will hopefully prove that dedup really performs (like the setup using multiple FTP targets simultaneously described in my blogpost Scaling VMware hot-backups using esXpress). […]
I like the concept of using ESXi to create multiple FTP targets. Do you have any thoughts about the best ways to attach and scale the storage for the individual FTP target VM’s running in ESXi? Feel free to email me directly because I am involved in such a project.
Hi Randy,
When you scale storage for ESXi servers, I often reuse “old” hardware. One customer had a large HP server freed by P2V, we then installed ESXi on it. They also freed some DAS trays full of 146 and 300GB SCSI disks. We simply connected all DAS trays to that one server, created three FTP target VMs, each had its own DAS connected to it. That is one way.
Also, customers often have for example a Sun 2540 or a HP MSA-1000 or 1500 that just isn’t speedy enough anymore. They make nice storage for backup targets too. In that case you’d have to look closely to the type of IOPS involded; you could create one big RAID10, and simply carve LUNs out of there to give to each FTP VM. You could also decide to create several smaller RAID5 groups (4+1), and give each FTP target its own RAID set (RAID5 is very effective as long as the SAN can perform full-stripe writes); see this blogpost for more on that: http://www.vmdamentals.com/?p=296
Hope this helped you out, each environment is different so do not hesitate to discuss! VMTN (VMwares forum) is also a great place to put your ideas to the test!
Thanks for the quick response. So would this be a FC connection from the ESXi box to the MSA and using VMFS as the filesystem for the backups?
[…] Scaling VMware hot-backups (using esXpress) […]