Breaking VMware View's sound barrier with Sun Open Storage (part 1)
A hugely underestimated requirement in larger VDI environments is disk IOPS. A lot of the larger VDI implementations have failed on SATA spindles; when you use 15K SAS or FC disks you get away with it most of the time (as long as you do not scale up too much). I have been looking at ways to get more done using less (especially in the current times, who isn't!). Dataman, the Dutch company I work for (www.dataman.nl), teamed up with Sun Netherlands and their testing facility in Linlithgow, Scotland. I got the honours of performing the tests, and I almost literally broke the sound barrier using Sun's newest line of Unified Storage: the 7000 series. Why can you break the sound barrier with this type of storage? Watch the story unroll! For now, part one… the intro.
What VMware View offers… And needs
Before a performance test even came to mind, I started to figure out what VMware View offers, and what it needs. It is obvious: View gives you linked-cloning technology. This means that only a few full clones (called replicas) are read by a lot of Virtual Desktops (or vDesktops as I will call them from now on) in parallel. So what would really help push the limits of your storage? Exactly: a very large cache or solid-state disks.
At the same time, you need fewer replicas if you link more clones to them. A common best practice up till now is to use no more than 64 linked clones per LUN (read: per replica). This means that for every 64 vDesktops, you need one full clone (replica). So an environment for 640 vDesktops would require 10 replicas. If we assume that a replica is about 12 GBytes in size on average, that is roughly 120 GBytes of replica data, so you would need something like 128 GB of cache for only 640 desktops if you want to fit all replicas into read cache. True, not the entire 12 GBytes will be all that active on reads, so you could probably get away with having only 3-4 GBytes of each replica in cache. It still would help a lot if the 64 linked-clones barrier could be broken… So some more thoughts on that are required 🙂
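To make that sizing reasoning a bit more tangible, here is a minimal back-of-the-envelope sketch in Python. It only reuses the assumptions from this post (64 linked clones per replica, about 12 GBytes per replica, of which roughly 3-4 GBytes is read-hot); the function name and numbers are mine, not anything View or Sun provides.

```python
import math

# Back-of-the-envelope replica cache sizing for a View linked-clone pool.
# All numbers are the assumptions used in this post, not measured data.
def replica_cache_gb(desktops, clones_per_replica=64,
                     replica_size_gb=12, active_set_gb=4):
    """Return (replicas, GB to cache replicas fully, GB for the hot read set)."""
    replicas = math.ceil(desktops / clones_per_replica)
    return (replicas,
            replicas * replica_size_gb,   # every replica fully in cache
            replicas * active_set_gb)     # only the active read set in cache

for desktops in (640, 1280):
    replicas, full_gb, hot_gb = replica_cache_gb(desktops)
    print(f"{desktops} vDesktops: {replicas} replicas, "
          f"~{full_gb} GB to cache them fully, ~{hot_gb} GB for the hot set")
```

The interesting knob is clones_per_replica: raise it and the number of replicas (and therefore the cache you need) drops accordingly, which is exactly why breaking the 64-clone barrier matters.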
The Linked Clone best-practice limit of 64
So why is there a best practice stating that no more than 64 linked clones should be used per replica? As far as I can tell, it is a limit of LUN locking, LUN queue depth, VMFS itself, or some combination of them.
My guess is that the growing nature of linked clones themselves is enough to saturate a LUN once you have far more than 64 instances. If that is not the limit, then LUN queueing will probably pose a problem anyway. The way to evade all of these problems (queue depth, LUN locking, etc.) is… you might guess it… NFS! NFS has no problem with 64 open and growing files. Even better, NFS is not impressed by a thousand or more. Now this could be something…
The ultimate combination for VMware View linked clones: NFS on ZFS?
After I had decided on NFS, it all became clear: NFS should allow a lot more linked clones per replica. So you need fewer replicas for a given number of vDesktops, and having fewer replicas means they fit more easily into the cache of an NFS box, boosting readOPS (or ROPS as I like to call them).
On Sun's 7000 series Unified Storage (also called "Amberroad" by Sun internally), the underlying disks are accessed through ZFS. This is a very smart filesystem that handles RAID-like mirroring, striping and parity together with caching on different tiers. Basically, all Amberroad devices are based on SATA storage (this is where all data lives). As the unit performs IOPS, read data is cached in memory or on solid-state disks (the second tier of caching; in the Sun 7000 storage these solid-state disks are called Readzillas). Also, in order to speed up writing to SATA, Logzillas can be added (solid-state disks that are optimized for, and used solely as, a write cache). These solid-state drives basically absorb all writes, which are in turn flushed to SATA.
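To get a feel for what this tiering does to read latency, here is an illustrative little model: a weighted average over where reads get served from. The hit rates and latency figures are purely my own assumptions for the sake of the example, not Sun specifications.

```python
# Illustrative model of tiered reads on an Amberroad-style box: a weighted
# average over where reads land (DRAM cache, Readzilla SSD, SATA spindles).
# Hit fractions and latencies below are assumptions, not measured values.
def effective_read_latency_ms(dram_hit, ssd_hit,
                              dram_ms=0.05, ssd_ms=0.3, sata_ms=8.0):
    sata_hit = 1.0 - dram_hit - ssd_hit
    assert sata_hit >= 0, "hit fractions cannot add up to more than 1.0"
    return dram_hit * dram_ms + ssd_hit * ssd_ms + sata_hit * sata_ms

# Example: 90% of reads served from DRAM, 8% from Readzilla, 2% from SATA.
print(f"~{effective_read_latency_ms(0.90, 0.08):.2f} ms average read latency")
```

The point is not the exact numbers, but how quickly the average collapses towards DRAM speed once the replicas fit in cache.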
As the largest Amberroad (the 7410) can contain up to 256 GB of memory, you can imagine that this storage device just might be able to carry an immense number of vDesktops given its size (and price), even without any Readzilla solid-state disks.
The need for a performance test
Using conventional storage, you can calculate almost exactly how fast it will be. It is not rocket science; it is all about the number of spindles, per-spindle IOPS, seek times, etc. But using Sun's Amberroad, there is hardly a way of telling where a read is going to come from: memory, solid-state disk or SATA. That is why a performance test is needed in order to have any clue about what you need in terms of (cache) memory, number of SATA disks and maybe solid-state disks (Readzillas and/or Logzillas) for a given number of vDesktops.
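For contrast, this is the kind of spindle arithmetic that works fine for conventional arrays but tells you very little about a heavily cached box like the 7410. The per-spindle IOPS figures and RAID write penalties below are common rules of thumb, not vendor numbers.

```python
# Classic spindle-count sizing for a conventional (non-cached) array:
# raw IOPS = spindles * per-spindle IOPS, corrected for the RAID write penalty.
# Per-spindle figures and penalties are rule-of-thumb assumptions.
SPINDLE_IOPS = {"7.2k SATA": 80, "10k SAS/FC": 130, "15k SAS/FC": 180}
RAID_WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def usable_iops(spindles, disk_type, raid_level, read_fraction=0.5):
    raw = spindles * SPINDLE_IOPS[disk_type]
    penalty = RAID_WRITE_PENALTY[raid_level]
    # Every host write costs 'penalty' back-end IOs; reads cost one each.
    return raw / (read_fraction + (1 - read_fraction) * penalty)

# Example: 24x 15K disks in RAID5 with a 50/50 read/write mix.
print(f"~{usable_iops(24, '15k SAS/FC', 'RAID5'):.0f} host-visible IOPS")
```

With ZFS and its cache tiers sitting in between, this calculation breaks down, which is exactly why we had to measure instead of calculate.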
Sun Netherlands kindly offered the use of a bunch of their X4450 servers together with a Sun 7410 Open Storage array to perform the tests in their testing facility in Linlithgow, Scotland. In the next part of this blog, I will come back to the results of the performance tests. I can assure you: from what I have seen, performance is enormous for both the size and the cost of this type of storage.
Spoiler Alert: Some first results!
How does this sound:
- Running over 450 vDesktops from 64 GB of (cache) memory in the storage box, with less than one (yes: less than ONE) ReadOPS and only about 22 (yes, only 22x 1 MByte) WriteOPS (or WOPS) going to the SATA disks, without using a Readzilla and with only a single Logzilla SSD (and we might even be able to do without that one)…
- In the end we managed to run 1319 vDesktops in total from a single storage box, at which point the ESX environment itself decided it had had enough and gave up on us due to a lack of ESX server memory (8x 64 GBytes).
STAY TUNED FOR MORE JUICY RESULTS!
Thank you. Very educational entry.
Very nice writeup! When will the next part with all the juicy bits come?
It's in the making… Got the coolest latency graphics already 🙂
I figure another week or two.