vscsiStats in 3D part 2: VMs fighting over IOPS

vscsiStats is definitely a cool tool. Now that the 2D barrier was broken in vscsiStats into the third dimension: Surface charts!, it is time to move on to the next level: multiple VMs fighting for IOPS!



Update: Build your own 3D graphs! Check out vscsiStats 3D surface graph part 3: Build your own!



I figured vscsiStats would be most interesting in a use case where two VMs are battling for IOPS on the same RAID set. First a single VM would force I/O onto the RAID set on its own. Wouldn’t it be cool to then start a second VM on the same RAID set and see what happens in the 3D world? In this blog post I’m going to do just that!



TO THE LAB!

The setup is simple: take a LUN on a RAID5 array of (4+1) SATA 7.2K spindles, and take two (Windows 2003 Server) VMs which each have a data disk on this LUN. Now install iometer on both VMs. These two instances of iometer will be used to make both VMs fight for IOPS.

The iometer load is varied between measurements, but globally it emulates a server load (random 4K reads, random 4K writes, some sequential 64K reads).
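
While iometer generates the load inside the VMs, vscsiStats on the ESX console delivers the per-interval histograms behind the graphs below. As a reference, here is a minimal sketch of such a collection loop in Python, assuming the commonly documented vscsiStats flags (-l, -s, -w, -p, -c, -r, -x) and a purely hypothetical world group ID; adjust it to your own environment and to the exact flag behaviour of your ESX version.

    # Minimal sketch: sample the ioLength histogram of one VM every 60 seconds.
    # The world group ID below is hypothetical; look up the real one with "vscsiStats -l".
    import subprocess, time

    WORLD_GROUP_ID = "1234"   # hypothetical example value
    SAMPLES = 20              # 20 x 60 s = 1200 s, enough to cover the whole test run

    subprocess.call(["vscsiStats", "-s", "-w", WORLD_GROUP_ID])          # start collection
    try:
        for sample in range(SAMPLES):
            time.sleep(60)                                               # one histogram "ribbon" per minute
            csv = subprocess.check_output(
                ["vscsiStats", "-p", "ioLength", "-c", "-w", WORLD_GROUP_ID])
            with open("histo_%03d.csv" % sample, "wb") as f:             # one CSV file per interval
                f.write(csv)
            subprocess.call(["vscsiStats", "-r", "-w", WORLD_GROUP_ID])  # reset counters for the next interval
    finally:
        subprocess.call(["vscsiStats", "-x", "-w", WORLD_GROUP_ID])      # stop collection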

First, only a single VM runs the iometer load. At 1/3rd of the sample run, the second VM is started to produce the same I/O pattern. At 2/3rd, the first VM stops its I/O load. This results in the following graph:

VMs fighting for IOPS - blocksize view
Cool right? Take your time to look at this graph: along the X-axis you can see time going by, and on the Y-axis you can see the blocksizes of BOTH VMs. Yes, both VMs are shown behind each other; I found no other decent way to compare two 3D graphs (I really need a fourth dimension!). You can clearly see the used blocksizes, the number of IOPS, and the changes at the 1/3rd and 2/3rd marks. From zero to 300 seconds the first VM alone puts IOPS to the RAID set. At 300 seconds the second VM kicks in, which immediately shows in the first VM! Both VMs are clearly getting in each other’s way here. Around 900 seconds the first VM lets go, and this immediately shows in the performance of the second VM; it now starts to perform like the first one did in the beginning.
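
For those who cannot wait for part 3: the sketch below shows one possible way to turn a stack of exported histogram CSVs into such a surface chart with Python and matplotlib. The file names match the hypothetical collection script above, and the CSV parsing is an assumption about the -c output format, so treat it as a starting point rather than the way these graphs were made.

    # Minimal sketch: stack per-minute ioLength histograms into a 3D surface.
    # File names follow the hypothetical collection script above; the parser
    # assumes one "bucket,count" pair per line and may need tweaking.
    import glob
    import numpy as np
    import matplotlib.pyplot as plt

    def load_counts(path):
        counts = []
        for line in open(path):
            parts = line.strip().split(",")
            if len(parts) == 2 and parts[1].isdigit():
                counts.append(int(parts[1]))
        return counts

    counts = np.array([load_counts(f) for f in sorted(glob.glob("histo_*.csv"))])
    bucket_idx, sample_idx = np.meshgrid(np.arange(counts.shape[1]),
                                         np.arange(counts.shape[0]))

    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(sample_idx * 60, bucket_idx, counts, cmap="viridis")
    ax.set_xlabel("time (s)")           # X: time, one ribbon per 60 s sample
    ax.set_ylabel("blocksize bucket")   # Y: histogram bucket (blocksize)
    ax.set_zlabel("number of I/Os")     # Z: I/O count in that bucket
    plt.show()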

The point here is that there is a very clear impact between both VMs. So how about latencies? Well, here you have it:

VMs fighting for IOPS - Latency view
If all else fails, I can always go into 3D gaming :). Some explanation is required for this graph as well. As the first VM starts out, it performs at a certain level with a certain latency. As the second VM kicks in, the graph gets lower, but most apparent of all, it shifts to the rear. This means that latency increases due to the added VM workload. Both VMs perform the same iometer workload, and that shows: between 300 and 900 seconds latency is high and throughput sinks. After 900 seconds the first VM clearly lets go, and the second VM starts to show a graph comparable to that of the first VM during the first 300 seconds.

Another thing to note (which is easier to spot when you look at the performance numbers) is that the sum of the IOPS of both VMs is not equal to the number of IOPS a single VM achieves; the single VM produces better results than both VMs combined. Thinking it through, it all makes sense. Each VM’s disk is located somewhere on the platter. At that platter location, a 1GB file is used to perform all the test IOPS. As the second VM kicks in, a second 1GB file, located somewhere else on the platter, starts to receive IOPS as well, increasing the overall seek distance, increasing latency and sinking throughput (see “Throughput part 1: The Basics” for more details on latency and seek times).
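
To make that reasoning a bit more tangible, here is a very rough back-of-the-envelope model of a single spindle. The millisecond figures are illustrative values for a 7.2K SATA disk, not measurements from this test, and a RAID5 set obviously behaves more subtly than one disk.

    # Rough model: random I/O service time = seek + rotational latency + transfer.
    # The seek and transfer values are illustrative assumptions, not measured data.
    ROTATIONAL_MS = 60000.0 / 7200 / 2    # ~4.2 ms average rotational latency at 7200 rpm
    TRANSFER_MS = 0.1                     # a 4K transfer is close to negligible

    def iops(seek_ms):
        service_ms = seek_ms + ROTATIONAL_MS + TRANSFER_MS
        return 1000.0 / service_ms

    # One VM hammering a single 1GB file: short seeks within that small area.
    single_vm = iops(seek_ms=1.0)
    # Two VMs with their 1GB files far apart: the heads bounce between the two
    # areas, so the average seek per I/O gets much longer.
    combined = iops(seek_ms=6.0)

    print("single VM alone  : ~%3.0f IOPS" % single_vm)   # ~190 IOPS
    print("two VMs combined : ~%3.0f IOPS" % combined)    # ~97 IOPS, less than one VM alone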

So, on to the “distance between I/Os” graph:

VMs fighting for IOPS - Seek distance view
Unfortunately, we cannot see the seek distance between the two VMs here, so the graph is not all that exciting (apart from the optical perspective šŸ˜‰).

ANOTHER TEST: One VM performing random I/O, the other sequential I/O

I expected the impact on a VM performing sequential I/O would be devastating when a random (or even a second sequential) workload is introduced to the RAID set. This is because when performing sequential I/O on a RAID volume, all seeks on the disk have a very small (or even zero) cylinder seek distance. As soon as another workload is added, the seek distance grows and the sequential I/O performance should plummet. And that is exactly what happened:

VMs fighting for IOPS one sequential one random - blocksize view

As you can see, we start off with the VM performing sequential I/O (the high-rising graph at the far left). Performance is pretty good:

  • The graph for VM1 at the left is around 36,000 units high;
  • I/Os are all 64KB in size (check the position of the pattern on the axis at the left);
  • Sample time is 60 seconds per 2D histogram “ribbon”.



From these values you can derive the total throughput. In this case 36,000 I/Os were performed every 60 seconds, so 36000/60 = 600 IOPS. Each I/O is 64KB in size, so the throughput in this case was 600*64 = 38,400 KB/sec, or 38.4 MB/sec.
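
The same back-of-the-envelope math in code form (using 1 MB = 1000 KB, matching the 38.4 MB/sec above):

    # Throughput from one histogram "ribbon": I/Os per sample -> IOPS -> MB/s.
    def throughput_mb_per_s(ios_per_sample, sample_seconds, io_size_kb):
        iops = ios_per_sample / float(sample_seconds)   # 36000 / 60 = 600 IOPS
        return iops * io_size_kb / 1000.0               # 600 * 64 KB = 38400 KB/s

    print(throughput_mb_per_s(36000, 60, 64))           # -> 38.4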

But then, at 360 seconds, the VM performing random IOPS cuts in (graph on the right). Immediately there is a devastating effect on the first VM: VM1’s sequential I/O count drops from 36,000 to around 7,000 per sample. That is about 1/5th of its original performance!
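
Plugging the degraded number into the same back-of-the-envelope formula (and assuming the I/O size stays at 64KB): 7,000 I/Os per 60-second sample is roughly 117 IOPS, or about 7.5 MB/sec; indeed about a fifth of the original 38.4 MB/sec.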

Finally, VM1 (the sequential load) is stopped. Funny to see that the performance of VM2 (the VM with the random I/O load) now grows, but only marginally.

A very important point can be derived from this: a purely sequential load will be heavily impacted when another load is introduced on the same set of spindles.

On the other hand, a load that is already random will not suffer too much from other loads being introduced on the same set of spindles. Purely sequential workloads are rare, but isolating those loads on their own spindles makes all the difference!

Now let’s look at the latency graph for this measurement:

VMs fighting for IOPS one sequential one random - latency view
Unsurprisingly, the sequential workload of VM1 drops sharply in this graph as well. At the same time, its latency increases. The random I/O load of VM2 is much less affected.

Finally, since we mix sequential and random loads, it is interesting to look at the I/O distance graph for both VMs:

VMs fighting for IOPS one sequential one random - distance view
VM1 (left graph) clearly performs sequential I/O (the graph shows all I/Os around the ‘0’ distance). VM2 performs random I/O (all of its I/Os land at the edges of the graph). The impact on performance is very clearly visible here as well.
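
As a side note, that “sequential near zero, random at the edges” signature is easy to reproduce from any list of block addresses. A minimal sketch with a synthetic trace (made-up LBAs, not data from this test):

    # Minimal sketch: compare seek distances of a synthetic sequential trace and
    # a synthetic random trace (the LBAs are made up, not taken from this test).
    import random

    def seek_distances(lbas):
        return [b - a for a, b in zip(lbas, lbas[1:])]

    sequential = list(range(0, 128 * 1000, 128))                  # 64KB steps of 128 sectors
    random_io = [random.randrange(0, 2000000) for _ in range(1000)]

    for name, trace in (("sequential", sequential), ("random", random_io)):
        dists = seek_distances(trace)
        near_zero = sum(1 for d in dists if abs(d) <= 128)
        print("%-10s: %5.1f%% of seeks within one I/O of the previous block"
              % (name, 100.0 * near_zero / len(dists)))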

If I find another excuse to use surface graphs on vscsiStats statistics I’ll surely write a part 3! Too bad I cannot measure VMs that get snapshotted…
