Different Routes to the Same Storage Challenge

Ever since shared storage came about, it has been designed so that you no longer have to worry about failing disks; shared storage is built to cope with this. Shared storage can also deliver more performance: by leveraging multiple hard disks, storage arrays manage to deliver a lot of storage performance. Until SSDs came around, hard disks were the main and only way of storing data. Hard disks have their own set of “issues”, and it is really interesting to see how different vendors chose different roads to solve the same problem.

Every day there is a battle going on for the latest and greatest features and integrations; but maybe we need a more long-term vision: Where is each vendor going, what route did they take to get here, and what route are they likely to take in the future? Does their route even go anywhere, or is it simply a dead end? In this post I try to show some of the different routes vendors took and what their ideas are all about.

The main issue of hard disks: latency and seek time

Throughout history, we have seen a tremendous increase in CPU compute performance. Alongside CPUs, memory has also grown bigger and faster all the time. But not hard disks. They have grown in capacity, but in speed not all that much has been accomplished. It is simply the limit of the whole design of a hard disk: its platters rotate, a head has to seek to the correct track, and the disk has to wait for the correct data to pass under the head.
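To make that limit concrete, here is a back-of-the-envelope sketch of the random-I/O ceiling of a single hard disk. The 7200 RPM spindle speed and the 8.5 ms average seek time are illustrative assumptions, not figures from any particular drive:

```python
# Why a hard disk is slow: every random I/O pays a seek plus, on
# average, half a platter revolution of rotational latency.
# Numbers below are illustrative, not from a datasheet.

rpm = 7200
avg_seek_ms = 8.5                        # assumed average seek time
avg_rotational_ms = 0.5 * 60_000 / rpm   # half a revolution, in ms

service_time_ms = avg_seek_ms + avg_rotational_ms
max_random_iops = 1000 / service_time_ms

print(f"avg rotational latency: {avg_rotational_ms:.2f} ms")  # 4.17 ms
print(f"random IOPS ceiling:    {max_random_iops:.0f}")       # ~79 IOPS
```

Roughly 80 random IOPS per spindle, no matter how fast the attached CPU is — that gap is what every route below tries to work around.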

So as CPUs and memory get enormous performance boosts, what do we do about the relatively slow hard disks? Different vendors took different roads. Apart from what vendors deliver today, what road did they take to get here? And… might the road they have chosen turn out to be a dead end, especially now that we are seeing more and more Flash-based devices?

Route One: spread the load over multiple disks and use caching

The first route is a straightforward one: you create RAID protection that spreads the load over multiple spindles, and you add read and write caching to cope with things like read/write bursts and localized intensive workloads.

The advantage is clear: you can make a design that is guaranteed to perform at a certain level. Since each disk of a certain type is guaranteed to deliver a minimum number of IOPS, and you know your exact RAID layout, you can accurately size your system. As you add cache to this setup, performance gets a little harder to predict in some cases, but overall it is simple, reliable and predictable.
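The sizing math behind that predictability can be sketched in a few lines. The per-disk IOPS figure and the read/write mix below are illustrative assumptions; the write penalties (4 backend I/Os per host write for RAID-5, 2 for RAID-10) are the classic rules of thumb:

```python
# Sketch of Route-one sizing: backend IOPS are known per disk, and
# RAID write penalties translate them into host-visible IOPS.

def effective_host_iops(disks, iops_per_disk, read_pct, write_penalty):
    """Host-visible IOPS a RAID group sustains for a given read/write mix."""
    backend_iops = disks * iops_per_disk
    write_pct = 1.0 - read_pct
    # each host read costs 1 backend I/O; each host write costs `write_penalty`
    return backend_iops / (read_pct + write_pct * write_penalty)

# 8 x 15k RPM disks (~180 IOPS each, assumed), 70/30 read/write mix
print(round(effective_host_iops(8, 180, 0.70, 4)))  # RAID-5  -> 758
print(round(effective_host_iops(8, 180, 0.70, 2)))  # RAID-10 -> 1108
```

Because every term in this formula is a guaranteed minimum, the result is a floor you can design against — which is exactly the certainty Route one sells.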

Route Two: build an ecosystem around the idea that CPU is fast and disk is slow

Route two is an interesting one. As CPUs got faster and faster, people started to wonder if they could use this ever-growing CPU power to make life easier on the hard disks. And they succeeded: products like Sun’s ZFS file system are prime examples. The underlying technique is extremely CPU-intensive compared to Route one described above: by building a tree in software, where each leaf contains data, you can place your data anywhere on a spindle without the need to follow the track-sector approach. The result is that you can write many small random writes sequentially to disk, and leave it to the CPUs and their software-based tree to find the data when needed.

The upside is obvious: many small random writes can be sequentialized in memory and then written out in one big sequential data stream. To flush all these random writes, your disks need to seek only once, and then commit the many very small writes to disk in one swift stroke. It is not uncommon to see 250 4K random writes going out in one single 1MByte write operation!
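The mechanism can be illustrated with a toy model: buffer incoming 4K writes in memory, and once a full segment's worth has accumulated, flush them all as one sequential write while a software map (the "tree" — here just a dict) remembers where each logical block landed. Everything below is a simplified sketch, not how ZFS actually implements it:

```python
# Toy illustration of sequentializing writes (Route two).

class CoalescingWriter:
    def __init__(self, segment_bytes=1024 * 1024):
        self.segment_bytes = segment_bytes
        self.buffer = []           # pending (logical_block, data) writes
        self.block_map = {}        # logical block -> (segment, offset): the "tree"
        self.segments_written = 0  # each flush = one big sequential disk write

    def write(self, logical_block, data_4k):
        self.buffer.append((logical_block, data_4k))
        if len(self.buffer) * 4096 >= self.segment_bytes:
            self.flush()

    def flush(self):
        # one seek, one large sequential write for the whole segment
        for offset, (block, _) in enumerate(self.buffer):
            self.block_map[block] = (self.segments_written, offset)
        self.segments_written += 1
        self.buffer = []

w = CoalescingWriter()
for i in range(256):                       # 256 scattered 4K writes
    w.write(i * 17 % 9973, b"\0" * 4096)
print(w.segments_written)                  # 1 -> a single 1MB sequential write
```

256 logically random writes end up on disk as a single sequential segment; the cost is the CPU and memory spent maintaining `block_map` for every block ever written.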

The downside? You burn a lot of CPU cycles, and the effect diminishes if you perform large writes to disk. Another downside is that you need to be ABLE to write these big chunks of data; as your device fills up, there will be fewer and fewer big “gaps” on the disks. This forces the storage device to start writing out smaller blocks, shrinking the upside while keeping the huge CPU overhead. It is not unheard of for storage devices like this to come to a crawling “near-halt” as they fill up.

But another big problem lurks in the dark: as you sequentialize your writes, you effectively randomize any sequential reads. You could say that each and every read is now random, even if it used to be sequential from a “track-sector” perspective. How to solve this? Simple: you make sure you hardly ever need to read from the hard disks, a.k.a. you add read cache. Loads of read cache 🙂

Route Three: Don’t use hard drives at all!

A lot of very recent startup companies build solutions that solve the “disk issues” in the simplest way imaginable: they just don’t use hard drives! With the prices of SSDs coming down, it is getting more and more realistic to build an “SSD-only SAN”, as long as you manage to smear all writes across many SSDs (to save them from breaking down due to heavy writes to a single set of cells).

The technology to smear out writes is already used inside SSD devices today. This is a necessity because each Flash cell can only be written a limited number of times. To make sure a “hot spot” will not destroy a single SSD in setups like these, you need the same algorithm to smear the writes over many SSDs.
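A minimal wear-leveling sketch, scaled up from cells to whole drives: always send the next write to the least-worn SSD, so even a pathologically hot workload spreads evenly. The drive count and write volume are made up for illustration:

```python
import heapq

# Toy wear leveling across SSDs: a min-heap keyed on writes-so-far
# always hands out the least-worn drive for the next write.

class WearLeveler:
    def __init__(self, num_ssds):
        self.heap = [(0, ssd) for ssd in range(num_ssds)]  # (writes, ssd_id)
        heapq.heapify(self.heap)

    def place_write(self):
        writes, ssd = heapq.heappop(self.heap)   # least-worn drive
        heapq.heappush(self.heap, (writes + 1, ssd))
        return ssd

wl = WearLeveler(num_ssds=8)
for _ in range(8000):                 # 8000 writes, all to the same "hot" data
    wl.place_write()
wear = sorted(w for w, _ in wl.heap)
print(wear)                           # [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
```

No single drive absorbs the hot spot — exactly 1000 writes land on each of the eight SSDs. The flip side, as noted below, is that your data layout now only makes sense to the vendor's algorithm.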

So where are all these routes going now that SSDs are getting cheaper?

Here is the real trick: you generally buy storage based on “what vendors offer today”. But maybe, just maybe, you should consider the route they are taking and look at its long-term impact. Where will they eventually get you? Will they be able (and willing) to help you get where you want to be in the future? It almost never is the deciding factor, but it should be the number one concern when choosing shared storage that has to last you for years.

Looking at Route one, you see that vendors are effectively able to incorporate SSD technology into these types of arrays: either by creating separate RAID groups for the SSD drives, or even by building pools of different storage types (a mix of SSD, SAS/FC and SATA/NL-SAS) and moving blocks between tiers, promoting hot data to a faster tier and migrating cold data down to a cheaper tier. Their route is definitely ready for the future, and SSD is their enabler into a new and exciting era of shared storage.
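The core of such block tiering can be sketched as a simple ranking policy: periodically sort blocks by observed access counts, promote the hottest into the limited SSD tier and demote the rest. The tier sizes, block names and counts below are illustrative, not any vendor's actual algorithm:

```python
# Sketch of a block-tiering decision: hottest blocks go to SSD,
# everything else stays on (or moves down to) the cheap spinning tier.

def retier(access_counts, ssd_capacity):
    """Return (ssd_blocks, hdd_blocks) based on last interval's heat."""
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    return set(ranked[:ssd_capacity]), set(ranked[ssd_capacity:])

# access counts observed over the last interval (made-up numbers)
counts = {"blk0": 900, "blk1": 3, "blk2": 450, "blk3": 7, "blk4": 120}
ssd, hdd = retier(counts, ssd_capacity=2)
print(sorted(ssd))   # ['blk0', 'blk2']          -> promoted (hot)
print(sorted(hdd))   # ['blk1', 'blk3', 'blk4']  -> demoted  (cold)
```

Run at a regular interval, this lets a small amount of SSD absorb most of the hot I/O while the bulk of the capacity stays on cheap disks.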

For those who chose Route two, things do not look all that rosy. Because of their initial strategy of building an enormous ecosystem around “slow” disks, they are now faced with a new problem: very fast solid state disks. These devices simply do not “fit” the model. You can leverage SSDs or Flash as a caching tier, as for example NetApp does for reads, or as in the Sun OpenStorage platform (7000 series), where SSDs are used for reads and can even serve as logging disks (a kind of write cache, but not really; it enables the device to roll back to a consistent state after a failure). But truly using SSD as a tier of storage… no. We are starting to see vendors that chose this path being “hurt” by it at the moment: as competing vendors have great success with tiering in SSD drives, they are slowly but surely losing ground.

Route three is also an interesting one; these are mostly vendors that have been around for a very short time (often less than four years). Typically they deliver immense amounts of IOPS to and from their devices, but very often lack any integration with VMware, which is becoming more and more important (think of technologies like VAAI and VASA!). Other difficulties for this group are how to build a multi-controller platform (multiple SPs that can fail over transparently) and how to build reliable remote replication.

There is also the question of trust here: Do you trust their technology? Do you trust your data to them, knowing they use algorithms to spread the writes over all SSD drives (meaning you can never retrieve your data unless that one vendor helps you out)? The route they’ll take in the future is blurry; it is very likely they will simply be “overrun” by bigger players as SSD drives become more integrated and affordable.

But even a total lack of the features described above does not scare some (even big) customers: I have come across storage vendors (not naming any of them) that do NOT have dual controllers, do NOT do snapshots, do NOT do replication, and do NOT do any logging for troubleshooting. They build storage especially for running VMware workloads, yet they are NOT on the VMware HCL. They target a very specific VDI workload, the only workload where you might “get away” with not having most of these features. Still, it never ceases to amaze me that even large accounts seriously consider and even buy products like this… “Trust them you should not” is what Yoda would say, and what I would advise.

So what SHOULD one buy?

First, a disclaimer: I work for EMC, so I may be biased towards certain solutions (and repel others 😛 ). Still, I try to keep an open mind whenever writing this blog. Anyway, be warned 🙂

If I were buying shared storage today for a VMware environment, I would definitely go with Route one. Route two used to be a very effective choice, but now that these vendors fail to effectively incorporate anything other than the old hard drives they designed around, they will have a very hard time keeping up over time.

Route three (SSD-only devices) could work, even with little or no VMware integration, for example for a VMware View (VDI) deployment. But even then, I’d consider just buying “regular” (Route one) storage and filling it up with SSDs. That gives you the best of both worlds: loads of IOPS, fully integrated storage, and a reliable vendor.

One final important note: there are vendors who “do” storage only because they acquired the technology. Very often you’ll see that within the new company most or even all R&D on the storage solution is frozen, and the product is simply milked for short-term profit until it ages out. I’d always go with a company that really invests in R&D; a solution bought from them will stay cutting edge, and you can expect fast responses to problems and early adoption of the latest and greatest technologies and features.
