In the post Throughput part 2: RAID types and segment sizes I wrote that a RAID5 setup can potentially outperform RAID10 in a heavy-write environment, if tuned right. While this may be true in theory, I have some important addenda to that statement that vote against RAID5, which I’d like to share.
The original idea
The original idea was hidden inside the “write penalty” of a RAID array. A RAID10 is relatively simple: the write penalty is always 2. This is because of the way a RAID10 is constructed: it is basically a striped set of mirrors (a RAID0 of several RAID1’s if you like). So for every segment you write, TWO disks need to seek and write (both disks of a single RAID1 mirror). This makes a RAID10 a simple and high-performing setup, especially for heavy writes.
Now we look at RAID5. RAID5 is basically a RAID0 striped set of disks. But since RAID0 will lose ALL of its data if a single member fails, RAID5 adds a parity segment to every stripe. This impacts writing to a RAID5 array though: if a single segment has to be written (like in the RAID10 example above), the array actually needs to first read that segment, read the parity segment, recalculate the parity, and then write both the new segment and the new parity back to the two disks. This is commonly called a “read-modify-write”, and it carries a write penalty of 4, which is obviously worse than the RAID10 variant.
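To put some rough numbers on these write penalties, here is a quick back-of-the-envelope sketch in Python (my own illustration, not from the original post): the common rule of thumb is that the back-end disk IOPS equal the host reads plus the host writes multiplied by the write penalty.

```python
# Rule-of-thumb sketch (hypothetical numbers): back-end IOPS = reads + writes * write penalty.
def backend_iops(host_iops, write_ratio, write_penalty):
    """Back-end disk IOPS needed to service a given host workload."""
    reads = host_iops * (1 - write_ratio)
    writes = host_iops * write_ratio
    return reads + writes * write_penalty

# Example: 1000 host IOPS with 70% writes.
print(backend_iops(1000, 0.7, 2))  # RAID10 (penalty 2)                 -> 1700.0
print(backend_iops(1000, 0.7, 4))  # RAID5 read-modify-write (penalty 4) -> 3100.0
```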
But potentially a RAID5 set can perform very well when looking at the write penalty: if you have an entire stripe to write, you simply calculate the parity and then proceed to write the ENTIRE stripe in one swift stroke. SANs like EMC’s CLARiiON will actually try to hold RAID5 writes in their write cache, hoping to gather all segments of a particular stripe so that a full-stripe write can be performed. Very smart indeed.
Knowing all of the above, I figured you could optimize a RAID5 setup with heavy writes by tuning the segment size: if you were writing mostly 64KB blocks on a (4+1) RAID5 set, you could optimize these writes by selecting a segment size of 16KB. This way every 64KB write is split up over four data segments and a parity segment, effectively performing a full-stripe write on the array for each 64KB write.
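As a small illustration of the arithmetic behind this idea (a hypothetical helper, not an official tool): the segment size is simply the typical write size divided by the number of data disks in the set.

```python
# Hypothetical helper: segment size that turns one typical write into a full-stripe write.
def segment_size_kb(write_size_kb, data_disks):
    """Segment size (in KB) so that one write of write_size_kb spans all data disks."""
    return write_size_kb / data_disks

print(segment_size_kb(64, 4))  # (4+1) RAID5 with 64KB writes -> 16.0 KB segments
```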
Where it all falls down
This tuning sounds very nice. But I figured out that this tuning actually almost never helps! The explanation is simple: in a random-write environment (which virtually every VMware environment is), any full-stripe write requires ALL disks in the RAID5 set to seek to the right segment. The write after that seek is very efficient though.
Now look at the alternative: let’s say any random write fits in a single segment (for example, use a 64KB segment size; VMware hardly ever writes blocks larger than this). Now the array performs a “read-modify-write”, which causes it to first read one segment and its parity (only two spindles seek). After recalculating the parity, two writes occur (the same two spindles seek to the same cylinder; with a little luck the heads are still around!).
This alternative gets more and more effective as you add members to the RAID5 set, actually BOOSTING write performance over the “tuned” version: the more members in the RAID5 set, the more disks need to seek for each write in the tuned version.
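A tiny sketch to make this concrete (my own illustration, assuming one perfectly random write at a time): a full-stripe write has to touch every member of the set, while a read-modify-write only touches the disk holding the data segment and the disk holding the parity, no matter how wide the set is.

```python
# Hypothetical comparison: how many spindles must seek for ONE random write?
def spindles_per_write(members, full_stripe):
    """Full-stripe write touches every member; read-modify-write touches only 2 disks."""
    return members if full_stripe else 2

for members in (5, 9, 13):  # (4+1), (8+1) and (12+1) RAID5 sets
    print(members,
          spindles_per_write(members, full_stripe=True),   # "tuned" full-stripe write
          spindles_per_write(members, full_stripe=False))  # single-segment read-modify-write
```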
Is this RAID5 segment-size tuning never a good idea?
In some isolated cases, it might still be effective to tune the segment size of a RAID5 array. Basically, writes get a real boost if the workload is sequential and the array for some reason is not able to combine those writes into Full Stripe Writes by itself (like EMC’s CLARiiON does, see above). Tuning the segment size could “force” the array to perform Full Stripe Writes in that case, boosting performance by great lengths.
Considering most modern arrays, the chances of RAID5 segment-size tuning being effective are slim at best. Most arrays will not even allow you to adjust the segment size, and even if they do, they are most of the time able to combine their writes into Full Stripe Writes internally anyway.
While discussing RAID5 anyway: do not forget that apart from the write performance, the performance impact of rebuilding a RAID5 array is usually very high. Some arrays (yes, the EMC CLARiiON for example) can detect disks which are about to fail and copy their contents off to a hot spare disk before the actual failure occurs. The impact of this action is much smaller, since only one disk in the array performs a full-capacity read and all other disks can still handle IOPS. Rebuild impact is then very much like that of a RAID10 set. But these “smart” rebuilds are not always possible; disks fail, sometimes without notice.
So if you scale a RAID5 set to deliver no more IOPS than the workload requires, a rebuild of that RAID5 set should be noted as downtime: during a rebuild, all disks in the set perform a full read of their ENTIRE capacity, and the disk being rebuilt performs a write of its ENTIRE capacity, heavily impacting performance for the regular workload. And a full rebuild can go on for hours or even days!
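To get a feeling for how long such a rebuild can run, here is a rough, hypothetical estimate (the 50 MB/s sustained rebuild rate is purely an assumption; real arrays vary widely): rebuild time is roughly one member’s capacity divided by the sustained rebuild rate.

```python
# Hypothetical estimate: hours needed to rewrite one member's full capacity.
def rebuild_hours(disk_capacity_gb, rebuild_rate_mb_s):
    """Rough rebuild duration at a sustained rebuild rate (assumed, not measured)."""
    return disk_capacity_gb * 1024 / rebuild_rate_mb_s / 3600

print(round(rebuild_hours(2000, 50), 1))  # 2TB member at an assumed 50 MB/s -> ~11.4 hours
```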
So when using RAID5, you should consider two important things: make sure no heavy writing is going on (or, in specific situations, tune the segment size), and think about whether or not you need to size for RAID5 rebuild impact.