Under the covers with Miss Alignment: Full-stripe writes


In a previous blog post I covered the general issue of misalignment at the disk segment level. This is the most common and most obvious form of misalignment: several spindles in a RAID set perform random I/O, and misalignment forces more spindles to seek for a single I/O than would be required if everything were properly aligned.

Next in the series is another misalignment issue, one that is rare but can have a much bigger impact on tuned storage: full-stripe misalignment.






Environments prone to full-stripe misalignment

The issue occurs whenever you have a RAID3, RAID5 or RAID6 set of disks tuned specifically for a heavy write workload. Let’s assume you are able to tune the behavior of a very write-intensive workload. You may think that RAID10 is optimal for heavy-write workloads, but in this case RAID5 can potentially do a better job.

Take EMC storage underneath as an example (for instance CLARiiON or VNX): there the segment size is 64KB. On a RAID3 or RAID5 set constructed out of a (4+1) diskset, every stripe then holds 4*64KB = 256KB worth of data (plus a single parity segment).

If you can optimize your workload to always write aligned 256KB blocks, each and every write to this RAID5 set is exactly a full-stripe write. In this scenario there is very little write overhead, even less than with RAID10, even though RAID10 is generally considered the most effective option for heavy writes.
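To make the numbers concrete, here is a minimal sketch (Python, purely illustrative and not from any array software) that checks whether a given write is a full-stripe write, assuming the 64KB segment size and (4+1) RAID5 layout from the example above; the helper name and offsets are made up.

```python
# Minimal sketch (illustrative only): is a given write a full-stripe write?
# Assumes 64KB segments and a (4+1) RAID5 set as in the example above.
SEGMENT_SIZE = 64 * 1024                   # 64KB per segment
DATA_DISKS = 4                             # (4+1): four data segments per stripe
STRIPE_SIZE = SEGMENT_SIZE * DATA_DISKS    # 256KB of data per stripe

def is_full_stripe_write(offset_bytes: int, length_bytes: int) -> bool:
    """True if the write starts on a stripe boundary and covers whole stripes."""
    return offset_bytes % STRIPE_SIZE == 0 and length_bytes % STRIPE_SIZE == 0

print(is_full_stripe_write(0, 256 * 1024))          # True: perfectly aligned
print(is_full_stripe_write(64 * 1024, 256 * 1024))  # False: shifted by one segment
```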

For these tuned full-stripe writes, a RAID5 set still needs to seek all “n” spindles, but (n-1) of those spindles carry data. This can be more effective than a RAID10 set (where only 50% of the spindles carry data). In this rare case RAID5 can outperform RAID10 for writes (at roughly the same number of disks), as the sketch below illustrates. One needs to remember, though, that this is a very specific workload (video streaming could be a good example here).
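As a rough back-of-the-envelope sketch of that comparison (my own illustration, under the assumption of pure, perfectly aligned full-stripe writes and no read-modify-write anywhere):

```python
# Rough sketch: fraction of seeking spindles that carry payload data,
# assuming pure, perfectly aligned full-stripe writes.
def raid5_data_fraction(total_disks: int) -> float:
    """In an (n-1)+1 RAID5 set, n-1 of the n seeking spindles carry data."""
    return (total_disks - 1) / total_disks

def raid10_data_fraction() -> float:
    """Every block is mirrored, so only half the spindles carry unique data."""
    return 0.5

print(f"RAID5 (4+1): {raid5_data_fraction(5):.0%} of spindles carry data")  # 80%
print(f"RAID10     : {raid10_data_fraction():.0%} of spindles carry data")  # 50%
```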



Misaligned full-stripe writes: What goes wrong?

So what goes wrong if I misalign this tuned workload? If you read my post on RAID types and throughput, you’ll probably see where I am going: as soon as the tuned writes are misaligned, every full-stripe write suddenly turns into a full-stripe write followed by a single segment write (because the data was not properly aligned on the stripe). The impact of this is generally devastating on RAID3, 5 or 6. In an aligned environment all disks in the RAID set seek to a particular track, after which they write 256KB of data plus parity in a single stroke. No parity reading of any kind is required. After this write, the system is ready for another 256KB write (which can be on the next stripe for sequential writes, or on any other stripe for random writes).
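To illustrate why no parity read is needed in the aligned case, here is a small sketch with made-up segment contents (my own example, not array internals): with all four data segments of a stripe in hand, the parity segment is simply the XOR of those segments, so the array can issue its five writes without reading anything first.

```python
# Sketch with made-up data: parity for a full-stripe write is a plain XOR
# of the four data segments, so nothing has to be read from disk first.
from functools import reduce

def parity(segments: list[bytes]) -> bytes:
    """XOR all data segments together to produce the parity segment."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*segments))

stripe = [bytes([i] * 8) for i in range(1, 5)]  # four tiny stand-in "segments"
print(parity(stripe).hex())  # five writes go out (4 data + 1 parity); zero reads
```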

But when the data is misaligned relative to the stripe, there is a “leftover” after the full-stripe write: part of the data now overflows into the next stripe! This data has to be written to the next segment on the next stripe, which means the RAID5 set must execute another write, this time a read-modify-write, in order to complete the operation.
For this to be executed, the array has to read the data currently present on that next segment and read the parity of that stripe, then recalculate the parity information and finally write out both the new block and the recalculated parity. That means two reads followed by two writes on two members of the RAID5 set.
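Following the simplified model above, a hypothetical count of back-end disk operations per tuned 256KB write might look like this (my own sketch, not a trace from a real array):

```python
# Hypothetical sketch: back-end disk operations per tuned 256KB write,
# following the simplified (4+1) RAID5 model described above.
def backend_ops(aligned: bool) -> dict:
    ops = {"reads": 0, "writes": 4 + 1}  # full stripe: 4 data + 1 parity write
    if not aligned:
        # The leftover segment overflows into the next stripe and triggers a
        # read-modify-write: read old data + old parity, write new data + new parity.
        ops["reads"] += 2
        ops["writes"] += 2
    return ops

print("aligned   :", backend_ops(True))   # {'reads': 0, 'writes': 5}
print("misaligned:", backend_ops(False))  # {'reads': 2, 'writes': 7}
```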

As you can probably imagine, this will heavily impact the write performance of that following write.



Conclusion

The impact of a misaligned write on a RAID stripe can be very high. Don’t get me wrong, the issue described here is rare. Especially with VMware running, most I/O will be variable in size and random in nature. But if you have that one application that is tuned to writing full stripes to a RAID5 (or RAID3 or RAID6 for that matter), misalignment of those full-stripe writes is an absolute performance killer.

Responses to “Under the covers with Miss Alignment: Full-stripe writes”

  • PiroNet says:

    This used to be an issue a few years ago, when IT admins were deploying a SAN per application. They were fine-tuning the array to the application’s I/O pattern, taking into account the issue you describe here.

    Nowadays the trend is to throw any type of I/O pattern at the storage and let the array deal with it. At least that’s the message some storage vendors try to communicate.

    I may be wrong here but that’s my feeling when reading the storage vendors’ catalog of features 😉

    • True, as workloads become more and more random due to many machines using the same resources (virtualization is a great driver here), you see more and more vendors head towards ways of solving these issues without the need for tuning.

      However, most of these solutions are about averages and handling bursts. If you have, for example, a sustained heavy write going on for days in a row, the write cache will at some point stop helping, even if you manage to have a lot of it.

      This is why I stated that this type of misalignment is very rare. However, I think it gives a nice insight into the logic behind storage, and that is really why I posted this entry. The few that do this kind of optimization are aware of these issues anyway 🙂

