Throughput part 1: The Basics

January 14th, 2010

As I tackle more and more disk performance related issues, I thought it was time to create a series of blog posts about spindles, seek times, latency and all that stuff. For now part 1, which covers the basics. Things like RAID type, rotational speed and seek time basically make up “how fast you will go”. On to the dirty details!

Introduction to physical disks and their behaviour

So what is really important when looking at physical disks and their performance? First and most important, we must look at the storage system parameters in order to reduce disk latencies. To be able to do this properly, we have to take into account the characteristics of the I/O that is being performed. Secondly, we have to look at segment sizes within the chosen RAID types (which in turn follow from the system parameters). Finally, we’ll take a deep dive into alignment (which still appears to be misunderstood by a lot of people).

Delays and latency

If we start looking at a physical spinning disk, what kind of latencies and delays do we have to take into account? To simply sum them up:

Head seek time;
Rotational latency;
Transfer time;
Bonus: Optimizing by command queueing.

The first one appears to be a familiar one; whenever you see “how fast” a disk is, seek time is always the parameter to look at. But what is this seek time exactly? Basically, it is the time required for the hard disk to find the required data on its disk. Technically, though, the first item I describe is the HEAD seek time, which is the time required by the physical head to move to the destination track on the hard disk’s surface (called the platter) and lock onto that track’s data. The further the head has to move, the bigger this head seek time becomes. To be able to express this as a single number, the average seek time is used: roughly the time required for the head to move over half of the platter.

I like to split up the total seek time into different subcomponents. I want to introduce something called the rotational latency and transfer time so that:

total seektime = head seektime + rotational latency + transfer time

Rotational latency starts occurring as soon as the hard disk’s head is locked onto the track where the needed sector resides. Rotational latency is the delay introduced by the question “where does a circle start?”. Sometimes the head will arrive at the track just when the needed sector passes underneath (rotational latency is near zero in that case). On the other hand, the sector might just have passed underneath the head, and now the system has to wait almost an entire rotation of the disk for the sector to pass by again. Rotational latency in this case is at its maximum.

Considering this, you may be able to deduce that the average rotational latency is half of a full disk rotation. From there on, it is easily understood that the speed at which the disk rotates directly impacts the rotational latency and thus the total seektime.
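To put a number on “half of a full disk rotation”, here is a minimal Python sketch. The function name is made up for this example, and apart from the 7200rpm drive used later in this post, the spindle speeds are just commonly seen values picked for illustration.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one full rotation."""
    ms_per_rotation = 60_000.0 / rpm  # 60,000 ms in a minute, divided by rotations per minute
    return ms_per_rotation / 2.0


# Illustrative spindle speeds (assumed, not measured on any particular drive)
for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

The faster the spindle, the shorter the wait: doubling the rotational speed halves the average rotational latency.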

Third item, the transfer time. This is the time required to actually read the data from the disk platter once the head has arrived at the right spot. The transfer time in itself is influenced by three factors: the size of the chunk of data that has to be read (i.e. how many sectors the disk has to read) and the rotational speed. Did I say three items? Indeed. The third is the position of the data on the platter. Nowadays, data is stored all over the platter at an equal density. This means that a single full rotation near the centre of the platter delivers less data than a full rotation at the edge of the platter. That is why it is always stated that data on the edge of the platter is “faster” than data stored near the centre of the platter.

Add all of these influences together, and you get a clear view of all actions that introduce delays on the way to the data you want.

IOP sizes

Before we look at stringing disks together in part 2, we’ll briefly look at the balance between IOPs and throughput. From what I’ve seen, not too many people look at these two things together. Some people “think” only in throughput (MBs per second), others tend to think only in IOPs (how many I/O operations a disk can perform in one second). But these two numbers are actually linked together by something we call the IOP size. For example, take a standard 7200rpm SATA drive. This drive can do about 40MB/sec, but also about 70 IOPs. If I were to choose an IOP size of 1 MByte, you can imagine the drive would saturate long before 70 IOPs, because then we would be trying to put 1 [MByte] * 70 [IOPs] = 70 [MBytes] of data through it every second. The other way around, if I choose an IOP size of 4 KBytes, I would not get more done than 70 [IOPs] * 4 [KBytes] = 280 [KBytes/sec] or 0.28 [MBytes/sec].
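The arithmetic above can be sketched in a few lines of Python. This is a deliberately simplified model, not a benchmark: it assumes the drive is capped by two hard ceilings (the roughly 70 IOPs and roughly 40MB/sec quoted above) and that 1 MByte equals 1000 KBytes, as in the calculation above. The function name and parameters are made up for this example.

```python
def achievable(io_size_kb: float, max_iops: float = 70, max_mb_per_s: float = 40):
    """Return (IOPs, MB/sec) for a drive limited by an IOPs ceiling and a throughput ceiling.

    Simplified model: whichever ceiling is hit first wins.
    """
    io_size_mb = io_size_kb / 1000                       # decimal units, as in the text
    throughput = min(max_iops * io_size_mb, max_mb_per_s)  # cap at the throughput ceiling
    iops = throughput / io_size_mb                       # IOPs that this throughput corresponds to
    return iops, throughput


for size_kb in (4, 64, 1000):
    iops, mb_per_s = achievable(size_kb)
    print(f"IOP size {size_kb:>5} KB: about {iops:.0f} IOPs, {mb_per_s:.2f} MB/sec")
```

With small IOP sizes the IOPs ceiling is the limit and throughput stays tiny; with large IOP sizes the throughput ceiling is the limit and the drive never gets near 70 IOPs.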

This balance between IOPs and throughput can be reasoned out logically. When using very large IOP sizes, it is almost like sequential reading or writing to a disk: many sectors are read, all on the same track or maybe just a single track away (when we go beyond a full circle). Transfer time is the one that has the biggest impact in this scenario. Head seeking much less so, because once we have done one seek all we have to do is transfer data, and maybe skip to the next track. The time for this last step is called the track-to-track (head) seek time. This value is much smaller than the average seek time (being a seek halfway across the platter).

On the other hand, when we are using very small IOP sizes, the transfer time is far less relevant. What starts to count now is the head seek time and the rotational latency. With an IOP size of 4 KBytes, we read very few sectors (transfer time is always short), but for every 4 KBytes of data we have to perform a head seek and we have to take into account the rotational latency after each head seek. So in this scenario, the disk is seeking and seeking without transferring a lot of data.

The example above (hopefully) clarifies the difference between seek times and transfer times. This is in fact also the main difference between sequential and random I/O operations: sequential operations which transfer a lot of data basically seek to a track, then read all the sectors on that track. After that, only a track-to-track seek occurs (which takes only a fraction of the average seek time), after which another full track is read. Transfer time is the main delay here, while in random I/O patterns it’s the head seek time and the rotational latency that limit the capabilities of a drive.
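To see how the balance shifts with IOP size, here is a rough Python sketch that splits a single random I/O into positioning time (head seek plus rotational latency) and transfer time. The 7200rpm and 40MB/sec figures come from the SATA example above; the 8.5 ms average seek time is an assumed, illustrative value and not taken from any datasheet. The function name is made up for this example.

```python
def service_time_ms(io_kb: float, seek_ms: float, rpm: float = 7200,
                    media_mb_per_s: float = 40):
    """Return (positioning_ms, transfer_ms) for one I/O of io_kb KBytes."""
    rotational_ms = 60_000 / rpm / 2        # on average half a rotation
    transfer_ms = io_kb / media_mb_per_s    # KB / (MB/sec) gives ms when 1 MB = 1000 KB
    return seek_ms + rotational_ms, transfer_ms


AVG_SEEK_MS = 8.5  # ASSUMPTION: illustrative average seek time for a 7200rpm drive

for io_kb in (4, 64, 1000):
    positioning, transfer = service_time_ms(io_kb, AVG_SEEK_MS)
    total = positioning + transfer
    print(f"{io_kb:>5} KB: positioning {positioning:.2f} ms, transfer {transfer:.2f} ms, "
          f"transfer share {transfer / total:.0%}, about {1000 / total:.0f} random IOPs")
```

Under these assumptions a 4 KByte random I/O spends nearly all of its time positioning the head, while at 1 MByte the transfer itself takes longer than the positioning.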

“It’s the latest thing” – Solid State Drives (SSD)

In the past 10 years (even more), disks have gotten bigger and bigger. They have hardly gotten any faster though. With the new high performance servers, more and more spindles were required in order to “keep up” with these fast servers. A really new step in disks are Solid State Drives, or SSDs for short. I will not get into too much detail, such as why enterprise level SSDs are so much faster than consumer SSDs, and how SSDs perform tricks to lengthen their lifetime. All that matters for the purpose of this blog is that the transfer rate is not even that much higher than in conventional hard disks, but the seek times are dramatically shorter. There is no physical arm to move to a track, and there is no rotational latency involved. Why? Simple: there are no moving parts at all. It’s all basically a bunch of memory chips. Seek time still exists in SSDs, but it is the time required to fetch a block of data from the memory chips. To put things into perspective, a SATA hard disk can do about 70 IOPS, while an SSD can easily deliver 20,000 IOPS or more.

So if we add Solid State Drives to the mix, you can imagine that although SSDs might not have a very impressive throughput compared to hard disks, the random seek of very small data blocks can make them a very big winner in some cases (but note, not all!).

Getting to the bottom of it

So how do we get to know the IOP size we are firing away at our disks or storage boxes? One simple and effective way in a Windows environment is to use perfmon. When you unleash perfmon on a Windows machine, for example, you often see a write size of 4 KBytes (which is the default block size in NTFS). Reads that are observed are often a mix of both 4KB and 64KB block sizes. This could help us a little when we try to optimize disk IOPs for a specific application. Want to get the most out of your SQL server (or any other disk-intensive server for that matter)? Then know the leading block size, and also… know the kind of I/O pattern (sequential or random).

Take great care too: a sequential I/O pattern is easily disrupted and turned into a random I/O pattern. Take virtualisation, for example. You can have a VM performing sequential I/O. But if you put another disk of another VM on the same VMFS, or even on the same set of disks (!!), the I/O pattern is already “downsized” to a random access pattern. So if you are sure I/O patterns are sequential, and you want to make the most of that, then simply reserve a set of disks for that sequential data ONLY.

When you know IOP sizes, access pattern, IOP and throughput requirements, it is on to choosing the right disks, and the right RAID level. I’ll continue this in part two of this blogpost.

Stay tuned!

Posted in Storage

Tags: disks, IOPS, latency, RAID, rotational latency, seek time, spindles, Storage, throughput, VMware

Throughput part 2: RAID types and segment sizes says:

January 18, 2010 at 13:50

[…] HomeAbout « Throughput part 1: The Basics […]

Performance impact when using VMware snapshots says:

January 22, 2010 at 22:16

[…] ***) The initial snapshot (which is 1 GByte in size) is now cleaned from disk. All of a sudden, the WOPS increase even further somewhat. To my knowledge, this has to do with the snapshot size. Previously the snapshot size was 1GByte, causing a lot of random reads over a larger part of the physical disks. This accounts for the hard disk head having to seek over a larger portion of the platter, delivering less IOPS. After the big snapshot file is gone, only a very small snapshot file remains. Now we get closer to track-to-track seeks (almost sequential IO patterns) which have a smaller head seek time, thus increasing the number of IOPS the disks deliver (for more info on this read Throughput part 1: The Basics). […]

Tons of Formulas For My Virtual Infrastructure « DeinosCloud says:

October 30, 2010 at 15:42

[…] for HP StorageWorks Enterprise Virtual Array (EVA) family and VMware vSphere 4 White Paper, Throughput part 1: The Basics, Understanding disk […]

The Elusive Miss Alignment says:

November 30, 2010 at 12:08

[…] (which in general takes a very long time on physical disks compared to all other actions; see Throughput Part 1: The Basics). The double seek required in this case should degrade performance. Sounds easy enough, so now to […]

vscsiStats in 3D part 2: VMs fighting over IOPS says:

November 30, 2010 at 21:58

[…] well, increasing the overall seek distance, increasing latency and sinking throughput (see “Throughput part 1: The Basics” for more details on latency and seek […]

“If only we could still get 36GB disks for speed” says:

February 18, 2011 at 10:40

[…] where I’m going? If not, here is a hint: Throughput part 1: The Basics. In random access patterns, the biggest latency in physical disks comes from the average seek time […]

Gilgamesh says:

June 1, 2011 at 14:16

So what counters do I need to look at in perfmon exactly to find the read and write sizes? Are the sizes in Windows always the same? What if I format the disk using larger sector sizes?

Erik Zandboer says:

June 30, 2011 at 22:42

Hi,

I do not think you can actually see the read and write sizes from the Windows operating system using perfmon. Some “real” RAID controllers can give you such statistics, but not the cheaper onboard ones. I look at the read and write block sizes using vscsiStats (a tool within ESX). This tool can output which block sizes are used and at which frequency. Check out this to get an idea of vscsiStats: http://www.vmdamentals.com/?p=722 .

I usually simulate reads and writes in Windows using iometer. I then use vscsistats to see if vSphere “does” anything with the blocks and blocksizes.

Generally speaking, most Windows reads are usually 4KB (which is the default NTFS “clustersize” used for Windows bootdisks). Writes are often either 4KB, 32KB or (mostly) 64KB. These are larger blocks written, mostly to the swapfile. I believe Windows 7 is able to do even larger blocks up to 1 MB.

VMdamentals.com