Posts Tagged ‘esXpress’
PHD Virtual Backup software with XenServer support: One down, ESXi to go
PHD Virtual, creators of the backup product for VMware environments formerly known as esXpress, have introduced a new version of their PHD Virtual Backup software with support for Citrix XenServer.
The PHD Virtual Backup product is the first “virtual only” backup product with full support for Xen based virtualization I know of. But then again, I am no XenServer expert. For me, this is not too important at the moment, because I’m still very focussed on VMware only. But it is a major step for PHD Virtual; a lot of vendors are roadmapping support for all leading virtualization platforms, but today PHD Virtual actually delivers!
Keep going guys! The backup solution they have on the market right now for VMware is simply brilliant, especially in its robustness and its almost famous “install and forget” ability. I actually installed the product at several customer sites, and it started backing up. Fully automated, no headaches. No one ever bothered again. New VMs are automatically added to the list to backup if configured to do so. Simply brilliant. In many ways VMware has been looking closely as to how esXpress has been doing backups. Prove of this is VMware’s Data Recovery, which basically is a poor copy of esXpress in its way of working.
Some other vendors have been shouting about this great “hotadd feature” they now support. People tend to forget that esXpress has used similar technology for several YEARS now! Because hotadd did not exist then, they were forced to use “coldadd” , meaning their Virtual Backup Appliances (VBAs) needed to be powered down between backups (to clarify: NOT the VMs to be backed up).
Whether you use hot- or cold-add, backup speeds are great in any case. But cold-add has a drawback: The VM performing backups has to be powered up and down, reconfigured etc. That takes time. Especially now that Changed Block Tracking (CBT) is used by most vendors, a backup can take as little as 20 seconds if not too many blocks have changed within the virtual disk to backup. And this is where “cold-add” start to hurt: Reconfigure, power up and down of the VBAs for every virtual disk to backup easily takes a 3-5 minutes to complete.
PHD Virtual has been working hard on a version which is compatible with ESXi. Especially now that VMware is pushing even more towards ESXi, this is more important than ever. I hope to see a beta version soon of the ESXi compatible product; I cannot wait to test
. This will also solve the “cold-add” overhead, because I’ve been told this version will use a single VBA per ESX node which hotadds multiple disks in parallel and then performs the backup also in parallel. Very exciting: Hopefully we’ll see both backup efficiency and robustness like never before. Add replication in the mix (which is a part of the product by default) and you have a superb solution at a relatively low cost.
PHD Virtual Backup with XenServer support link:
http://www.phdvirtual.com/solutions/server_virtualization_citrix_xenserver.php
PHDVirtual releases Virtual Backup 4.0-4 with vSphere 4.1 support
PHDVirtual has released an updated version of their famous Virtual Backup solution (formerly esXpress). This version fully supports VMware vSphere 4.1, and is one of the first (if not THE first) of the 3rd party “high tech virtual backup only” to support vSphere 4.1
I was very quick into upgrading my test environment to vSphere 4.1 (right after the general release), breaking the PHDvirtual backup in the process. For days the environment failed to backup, because vSphere 4.1 introduced a snapshot issue with esXpress. PHDvirtual worked hard to get vSphere 4.1 supported, and on 9/17/2010 they released version 4.0-4 which did just that.
So I upgraded my test environment to PHDvirtual 4.0-4. Right after the upgrade I forced a reinstall on the ESX nodes to 4.0-4 from the 4.0-4 GUI appliance, and I kicked of an initial backup by renaming a VM from the VI client to include [xPHDD] in the VM name. PHDvirtual Backup picked it up, renamed the VM back and commenced performing the backup. It just worked straight away. Even CBT was still functional, and my first Windows VM backed up again with only 2.2[GB] in changed blocks. Awesome!
From the initial tests it shows that both speed and stability are just fine, not very different from the previous release. Still fast and definitely rock solid. Highly recommended!
esXpress uses vStorage API for detecting changed blocks
Today at VMworld 2009 is joined a breakout session presented by PHD Virtual about their latest version of esXpress (3.6). Great stuff once again! Apart from the fact that esXpress is now fully functional on vSphere (still no ESXi support though), they also managed to use the vStorage API for “changed block reporting”. Basically what this means, is that when you are using vSphere and doing delta or deduped backups, you no longer need to read all the blocks of a VM and then decide is that block was changed or not. PHD managed to get esXpress so far that it reads only the changed blocks directly by using this “cheat sheet” that VMware was so nice to make available though the vStorage API.
What this means is, that backup speeds will be way higher when you do delta or deduped backups.
When you also use their dedup targt, with the dedup action going on on the SOURCE, you get tremendous backup speeds and as an added bonus you can use smaller WAN links when you are sending these backups offsite. Wonderfull guys, you did it again!
The new esXpress 3.5
For a long time now I have been a fan of PHDs esXpress. It is still the only VMware backup solution I know that scales, has no single points of failure and works reliable with VMware snapshots. The solution has always been “other than others”: At first it appears to be a really weird piece of software, that creates its own appliances to perform its backups. Once you get to know it, esXpress’s way of working is great. So great in fact, that VMware themselves are now adopting this very way of working with their Disaster Recovery feature in vSphere 4, maybe even stepping away from their beloved VCB (VMware Consolidated Backup).
VCB in my opinion has never been that great, apart for some special uses in special environments. esXpress fits all, from single ESX hosts to large clusters. In contrast to VMware’s Disaster Recovery which is still buggy at the time of this blog, esXpress has been on this train for years now, and definitely knows the drill. EsXpress 3.1 is not the holy grail though. Some features were just not easy to use, there was no global GUI to manage all nodes easily and there was no Data deduplication available (not that I am that big a fan data dedup for backup, but ey, everbody does it!).
Enter esXpress 3.5
To make up for most of the shortcomings, esXpress version 3.5 has been introduced. The engine itself still is pretty much the same. And exactly there lies the power of esXpress: It still WORKS. It just works, it always works. Extra features have been added in such a smart and incredible simple way, that the product remains rock stable. No “waiting for the point 1 release” needed here!
I was over at a client who suffered a SAN failure (when upgrading firmware). They were in progress of failing over to their recovery site, when the administrator got an email from one of the production ESX hosts: esXpress had successfully completed its backups. What? All LUNs appeared unavailable at the production site. This host did not have its storage devices rescanned; it still kept on ticking. I think things like this are major plusses for both VMware ESX and esXpress showing their enterprise readiness.
Finally: A working global GUI
From the initial ESX 3.5 release, PHD also released a GUI to manage all esXpress instances from one central portal. In the old 3.1 (and before) days, you ended up copying configfiles between hosts; working, but not very user friendly. You might think that adding a central GUI took a lot of deep digging in the code of esXpress. But, they surprised once again: The GUI just holds the config files and, could it be more simple, the GUI appliance introduces a small NFS store. The NFS store is automagically mounted to the ESX servers, and presto! That is where the config files can be found. EsXpress itself just has to check the share for a new config, something already (partly) in existance in the previous version.
Even better: the GUI does a great job. I had some trouble with the first versions, some manual labor was needed to get it going (like manually needing to change the time zone and not being able to add a second DNS server). All these issues are fixed now, but even those early versions were already very effective. And things have become only better since then!
Because “everybody has it”: Deduplication
What should we do without deduplication nowadays? It is a major hype around storage and backup. If you don’t have it, you’re out of business it seems. But who ever thinks about the risks and limitations involved (see: The Dedup Dillema).
The idea of deduplication is brilliant, but the implementation has to be right. I must admit, I am not a big fan of deduplication. It is still your vital data you are talking about! Admit nr.2: EsXpress 3.5 managed to change my opinion to dedup a little.
The deduplication implementation of esXpress is in style with PHDs way of working: both effective and simple. A separate appliance is installed (which is in fact the same one as the GUI appliance. At first boot of the appliance you choose what the appliance will become. Smart!). The dedup appliance (called PHDD for PHd Data Dedup) can mount a datastore or an NFS store for storing its deduped data. It performs quite well, saving diskspace as you backup more of the same (or alike) data. It is now much “cheaper” to keep more backups of your VMs.
Only few changes appear to have been made to esXpress itself to allow PHDD as a backup target, so once again, stability guaranteed.
So now all your data lives inside the PHDD appliance. Now how do I get out this data the way I want it? PHD did something clever: They added a CIFS/SAMBA interface to the appliance, allowing you to browse, copy and backup your VMs as if they weren’t deduped at all! This last feature makes the mix of backup and dedup more acceptable, even effectively useable
When will the fun EVER stop? File level restore!
The best feature of the PHDD dedup target in my opinion, next to dedup itself, is the ability to perform file level restores. At last you can get out that one single file of a full VM without having to restore the whole thing. This option is so cool, you simply browse to the appliance, select your files, and save the collection you marked as a single zip file! Couldn’t be easier, another bulls eye for PHD, even in their first release of this piece of software.
Scaling esXpress 3.5 with dedup
Not all is bright and shiny with dedup. I found it hard to scale the solution: If there is only one PHDD target, scaling ends somewhere, and a SPOF (single point of failure) is introduced. Not good (although PHD is working on a way to link the dedup appliance to a secondary one). Still, one may consider to use two or more PHDD appliances in parallel. This will work, but the dedup effectiveness will drop sharply, especially when you use DRS and all VM backups end up on all PHDD targets in time (this happens when you design the often used strategy where one assings for a backup target to each ESX server individually with failovers to others). You can make it somewhat more effective by specifying a backup target for each VM (in the local config), a best practice that also stands when using multiple FTP targets btw. This will ensure that a backup of a particular VM will always end up on the same backup target, making things clearer and making dedup more effective (although far from ideal – Every PHDD target has its own library of data, meaning that identical blocks still get stored on EACH PHDD target instead of just one).
The limitations mentioned above are not a limit of esXpress though, but more a limitation of dedup in itself. PHD choose to use online dedup (basically you dedup while you write), which will use CPU power during backup and restores. CPU power might even be the limiting factor in your backup speed. Luckily CPU power is usually available in abundance nowadays. I will dive deeper into performance and scaling of deduped installations in the next blogpost, which will hopefully prove that dedup really performs (like the setup using multiple FTP targets simultaneously described in my blogpost Scaling VMware hot-backups using esXpress).
Conclusion
The new version of esXpress 3.5 is in terms of speed and reliability on par with its predecessor version 3.1. It is still the only backup solution I know that has no Single Point of Failure, scales (REALLY scales) up to whatever size you want without any issues, and best of all: Once it works it KEEPS working with hardly any problems around VM snapshotting like some other backup solutions do have.
On top of all the good things that already were, a global GUI is added which manages all esXpress installs at the same time, and there is a Data Deduplication appliance which features a very well working single file restore option. I would like to have seen a file restore option in a non-dedup target as well. From what I’ve seen, online deduping costs a lot of CPU power, and the backup speeds go down because of this. Once the database is built though, things do get better (less data to backup because more and more blocks are already backup up in th dedup appliance). Still, calculations have to be done.
In a smaller environment, the dedup appliance is no match for a set of non-dedupping FTP targets. This is a drawback from which any dedup system suffers… It is just the way the “thingy” works. Still I see a solid future for esXpresses PHDD dedup targets where speed is not of the utmost importance.
Make no mistake on backup speeds: IF esXpress and its backup targets are designed and configured properly, it is by far the fastest full-VM backup solution I’ve seen. It does not mess with taking backups through the service console network, it creates Virtual Appliances runtime that perform the backups – and many in parallel. If you want to see real backup speed from esXpress, do not test it on a single VM like some people tend to do when comparing. If you do, speeds are about on par with other 3rd party vendors. But when scaled up to make 8 or more backups in parallel to several backup targets with matched bandwidth, esXpress will start to shine and leave the competition far behind.
esXpress as low-cost yet effective DR – by Erik Zandboer
You want to have some form of fast and easy Disaster Recovery, but you do not want to spend a lot of money in order to get it. What can you do? You might consider buying two SANs, and leaving out SRM. That will work, it will make your recovery and testing more complex, but it will work. But even then, you still have to buy two SANs, the expensive WAN etc. What if you want to do these things – on a budget
DR – What does that actually mean?
More and more people start to implement some form of what they call Disaster Recovery. I too am guilty of misusing that name (who isn’t), Disaster Recovery. My point is, tape backups made for ages now are also part of Disaster Recovery. Your datacenter explodes, you buy new servers, you restore the backups. There you go: Disaster Recovery in action. What comes in reach now, for the larger part because of virtualization, is what is called Disaster Restart. This is when no complex actions are required, you “press a button” and basically – you’re done. I conveniently kept the title to “DR”, which kind of favors both
Products like VMware SRM make the restart after a disaster quite easy, and more important, for the larger part you can actually test the failover without interrupting your production environment. This is a very impressive way of doing Disaster Restarting, but still quite a lot of money is involved. You need extra servers, you need an extra (SRM supported!) SAN in order to get this into action.
Enter esXpress
Recovering or Restarting from a disaster is all about RPO and RTO – The point in time to recover to, and the time required to get your server up and running (from that point in time). The smaller the numbers, the more expensive the solution. Now lets put things in reverse. Why not build a DR solution with esXpress, and see how far we get!
DR setup using esXpress
The setup is quite simple. EsXpress is primarily a backup product, and that is just what we are going to setup first. Lets assume we have two sites. One is production with four ESX nodes, and the other site with two nodes is the recovery site (oops restarting site). For the sake of evading these terms, we’ll use Site-A and Site-B
At Site-A, we have four nodes running esXpress. At site-B, we have one or more FTP servers running (why not as a VM !) which receive the backups over the WAN. Now, Disaster Recovery is in place, since all backups go off-site. Now all we have to do, is try and get as near to Disaster Restart as we can get.
For the WAN link, we basically need the bandwidth to perform the backups (and perhaps to use for regular networking in case of failover). The WAN could be upgraded as needed, and you can balance between backup frequency versus available bandwidth. EsXpress can even limit its bandwidth if required…
Performing mass-restores
All backups now reside on the FTP server(s) on Site-B. If we were to install esXpress on the ESX nodes at Site-B as well, all we need to do is use esXpress to restore the backups there. And it just so happens that esXpress has a feature for this: Mass Restores.
When you configure mass-restores, the ESX nodes at Site-B are “constantly” checking for new backups on the FTP servers. As soon as a backup finishes, esXpress at Site-B will discover this backup, and start a restore automatically. Where does it restore to? Simple! It restores to a powered-off VM at Site-B.
What this accomplishes is, that at Site-B you have your backups of your VMs (with their history captured in FULL and DELTA backups), and the ability to put that to tape if you like. You also have each VM (or just the most important if you choose) in the state of the last successful backup standing there, just waiting for a power-on. As a bonus on this bonus, you also have just found a way to test your backups on the most regular basis you can think of – every single backup is tested by actually performing a restore!
What does this DR setup cost?
There is no such thing as a free lunch. You have to consider these costs:
- Extra ESX servers (standby at the recover/restart site) plus licenses; ESXi is not supported by esXpress (yet);
- esXpress licenses for each ESX server (on both sites);
- A speedy WAN link (fast enough to offload backups);
- Double or even triple the amount of storage on the recover/restart site (space for backups+standby VMs. This is only a rough rule-of-thumb).
Still, way below the costs of any list that holds two SANs and SRM licenses…
So what do you get in the end?
Final question of course, is what do you get from a setup such as this? In short:
- Full-image Backups of your VMs (FULLs and DELTAs), which are instantaneously offloaded to the recover/restart site;
- The ability to make backups more than one time per 24 hours, tunable on a “per VM” basis;
- Have standby VMs that match the latest successful backup of the originating VMs;
- Failover to the DR site is as simple as a click… shiftclick… “power on VMs” !;
- Ability to put all VM backups to tape with ease;
- All backups created are tested by performing automated full restores;
- Ability to test your Disaster Restart (only manual reconnection to a “dummy” network is needed in order not to disturb production);
- RTO is short. Very short. Keep in mind, that the RTO for one or two VMs can be longer if a restore is running at the DR site: The VM being restored has to finish the restore before it can be started again;
- Finally (and this one is important!), if the primary site “breaks” during a replication action (backup action in this case), the destination VM is still functional (in the state of the latest successful backup made).
Using a setup like this is dirt-cheap when compared to SRM-like setups, you can even get away with using local storage only! The RPO is quite long (in the range of several hours to 24 hours), but RTO is short- In a smaller environment (like 30-50 VMs) RTO can easily be shorter than 30 minutes.
If this fits your needs, then there is no need to spend more – I would advise you to look at a solution like this using esXpress! You can actually build a fully automated DR environment without complex scripting or having to sell your organs
. You even get backup as a bonus (never confuse backup with DR!)