Backwards VDI math: Putting numbers to the 1000 user RA

EMC and VMware have published a joint Reference Architecture in which an EMC VNX5300 with a minimum configuration of disks squeezes out the required IOPS for a thousand VDI users. That is awesome stuff, but how do you go about using and remodeling this RA for your own needs? In this blog post I’ll try to put some numbers to it, both validating the RA and enabling you to resize it for your needs.

A very cool use case: VMware View and 1000 vDesktops running off an EMC VNX5300

This is a very VERY cool one. You can find the Reference Architecture and Proven Solutions Guide through the following links. As EMC and VMware joined hands putting this out, the Reference Architecture is available from both EMC and VMware:

EMC infrastructure for VMware View 5.0 Reference Architecture (EMC)
EMC infrastructure for VMware View 5.0 Proven Solutions Guide (EMC)
EMC infrastructure for VMware View 5.0 Reference Architecture (VMware)

I’ll just drop all of the coolness on you right away: in this RA, EMC runs 1000 View desktops (consuming 9.8 IOPS each) on 15x 15K SAS drives with only TWO 100GB EFDs configured as FAST-cache in front of them.

Yes, that is ONE THOUSAND vDesktops
out of 15 spinning disks and two small EFDs!

This was sized on 15K SAS drives. Looking closer, you could even get away with the same config running from 15x 10K SAS drives with two EFDs in front as the spinning disks were doing only 40 IOPS a piece. Awesome stuff!

Doing some nasty calcs

So how is it possible to run a workload like that from such a small number of disks? I’ll just throw in some math. Not an exact science here anymore, so I will be rounding off numbers to make it easier to grab the idea.

This calculation is going to be backwards from what most people are used to. I will not convert to backend IOPS straight away, because the IOPS will be coming from different sources with different RAID overheads. It is a new way of playing with numbers around IOPS, in this case tuned to a VDI environment.

I am sorry, there is no clean or easy way to do this… It is numbers, numbers and more numbers. Hopefully I won’t bore you to death 🙂

I will emphasize the partial results in RED(ish) in order to at least keep some kind of logical progression through the story.

Calculating VDI with FAST-cache part 1: Performance

As the desktops were light users, I assume each linked clone will have around 0.1GB of “HOT” data. This means that 0.1GB of its total footprint (both from replica and linked clone) will be involved in around 95% of its related IOPS. This is exactly where FAST-cache draws its power from in the VDI use case.

So the total footprint carrying hot data will be 0.1GB * 1000 = 100GB. This hot data will partly be delivered directly from DRAM cache, the rest should be delivered from FAST-cache. The used array (VNX5300) has 2*8GB of cache. As we focus on writes (and writes need to be mirrored in cache), I’ll assume 10GB as total cache size here given the 20/80 read/write ratio.

So the DRAM cache will absorb (10/100) = 10% of the hottest of the hottest IOPS. The key here is in the “hottest of hottest”: this is not a linear thing, and the DRAM will be delivering relatively more than 10% of the IOPS (as this data is “hotter” than the average hot data). I am guesstimating we can just about quadruple this number (!!), so we’ll estimate that 40% of the hot IOPS should be coming from DRAM cache.

Knowing this, we can start building out numbers into the IOPS realm. The desktops deliver 1000* 9.8 = 9800 IOPS in total. We aim for 95% to be delivered from any of the caching tiers, so that makes 9800*0.95 = 9300 IOPS. If 40% gets delivered from DRAM cache, the remaining 60% should be delivered from FAST-cache: 9300*0.60 = 5600 frontend IOPS.
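The split above can be replayed in a few lines. This is a minimal sketch and not part of the RA: the function name `split_frontend_iops` and its parameter names are made up for illustration, and the 95% cached share and 40% DRAM share are the guesstimates from this post, not measured values. It works with unrounded numbers, so it lands slightly off the rounded figures in the text.

```python
def split_frontend_iops(desktops=1000, iops_per_desktop=9.8,
                        cached_share=0.95, dram_share=0.40):
    """Split total frontend IOPS over DRAM cache, FAST-cache and SAS spindles.

    All defaults are the assumptions used in this post.
    """
    total = desktops * iops_per_desktop   # all frontend IOPS
    cached = total * cached_share         # served by any caching tier
    dram = cached * dram_share            # hottest part, served from DRAM
    fast_cache = cached - dram            # rest of the hot IOPS
    spindles = total - cached             # cold remainder for the SAS drives
    return {"total": total, "dram": dram,
            "fast_cache": fast_cache, "spindles": spindles}


if __name__ == "__main__":
    for tier, iops in split_frontend_iops().items():
        print(f"{tier:>10}: {iops:7.0f} frontend IOPS")
```

With the defaults this gives about 5586 frontend IOPS for FAST-cache (rounded to 5600 in the text) and 490 for the spindles. Changing `dram_share` is a quick way to see how sensitive the result is to the "quadruple" guesstimate.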

As FAST-cache runs in RAID10, we need to accommodate a write penalty of two for the WOPS going to FAST-cache. As 80% were writes, we get:

5600 * 0.8 * 2 = 9000 WOPS
5600 * 0.2 = 1100 ROPS

In total, the FAST-cache SSDs need to deliver 9000+1100 ≈ 10,000 backend IOPS.
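The frontend-to-backend conversion used here is the same for any RAID type, only the write penalty changes. A small sketch, assuming the 80% write share and the penalty of 2 from this post (the function name `backend_iops` is hypothetical):

```python
def backend_iops(frontend_iops, write_share, write_penalty):
    """Convert frontend IOPS to backend IOPS for one RAID type.

    Every frontend write costs `write_penalty` backend operations;
    reads pass through one-to-one.
    """
    wops = frontend_iops * write_share * write_penalty
    rops = frontend_iops * (1 - write_share)
    return wops, rops, wops + rops


if __name__ == "__main__":
    # 5600 frontend IOPS, 80% writes, RAID10 penalty of 2 (post's assumptions)
    wops, rops, total = backend_iops(5600, 0.8, 2)
    print(f"WOPS {wops:.0f}, ROPS {rops:.0f}, total {total:.0f}")
```

Unrounded, this comes to about 8960 WOPS and 1120 ROPS, so roughly 10,080 backend IOPS on the EFDs.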

The remaining 5% of the IOPS, which is 9800*0.05 = 490 frontend IOPS, needs to be delivered by the SAS drives. At 80% writes and a write penalty of 4 (for RAID5) that is:

490 * 0.8 * 4 = 1600 WOPS
490 * 0.2 = 100 ROPS

So the SAS drives need to deliver 1700 backend IOPS. With each 15K drive delivering 170 IOPS, you need 10x 15K SAS drives to accommodate the IOPS.
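The spindle count follows from the backend IOPS by dividing and rounding up. A sketch under the post's assumptions (RAID5 write penalty of 4, 170 IOPS per 15K drive); `sas_drives_for_iops` is an illustrative name, not something from the RA:

```python
import math


def sas_drives_for_iops(frontend_iops=490, write_share=0.8,
                        write_penalty=4, iops_per_drive=170):
    """Return (backend IOPS, drive count) for the SAS tier.

    Defaults are the post's assumptions: RAID5 write penalty of 4
    and 170 IOPS per 15K drive.
    """
    backend = (frontend_iops * write_share * write_penalty
               + frontend_iops * (1 - write_share))
    # A fraction of a drive does not exist, so always round up.
    return backend, math.ceil(backend / iops_per_drive)


if __name__ == "__main__":
    backend, drives = sas_drives_for_iops()
    print(f"{backend:.0f} backend IOPS -> {drives} drives")
```

Without the intermediate rounding the backend load is about 1666 IOPS instead of 1700, which still rounds up to 10 drives.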

Calculating VDI with FAST-cache part 2: Capacity

Required capacity for the environment as a whole is another ballgame. Here we need to look at capacity consumed by replicas, linked clones, swap files etc. I will use the same “loose math” here and make some wild (yet realistic) assumptions:

  • For 1000 desktops we need at least two replicas, which I assume to be 20GB a piece, for a total of 40GB;
  • For 1000 desktops we need linked clones, which grow to 2GB a piece, for a total of 2000GB;
  • For 1000 desktops we need vSphere-level swap files of 2GB a piece, for a total of 2000GB;
  • For 1000 desktops we need config and logging files of 250MB a piece, for a total of 250GB.

In total we will need around 40+2000+2000+250 = 4290GB, or about 4.3TB of storage capacity. As both DRAM cache and FAST-cache are cache, they do not participate in the capacity calculations.
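The capacity sum is easy to rerun for a different desktop count or clone size. A minimal sketch, using the assumed sizes from the list above; the function and key names are made up for illustration, and 1GB is taken as 1000MB to match the loose math:

```python
def vdi_capacity_gb(desktops=1000, replicas=2, replica_gb=20,
                    linked_clone_gb=2, swap_gb=2, config_mb=250):
    """Sum the capacity components in GB (1 GB = 1000 MB here)."""
    parts = {
        "replicas": replicas * replica_gb,
        "linked_clones": desktops * linked_clone_gb,
        "swap": desktops * swap_gb,
        "config_and_logs": desktops * config_mb / 1000,
    }
    # The total is computed before the key is added, so it is not counted twice.
    parts["total"] = sum(parts.values())
    return parts


if __name__ == "__main__":
    for name, gb in vdi_capacity_gb().items():
        print(f"{name:>16}: {gb:6.0f} GB")
```

Note that the replica count is a fixed assumption here; in a real design it depends on the number of pools and datastores.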

To obtain 4.3TB of storage, we could use either 300GB or 600GB SAS drives. When we use RAID5 (4+1) sets, the net capacity per drive works out as follows:

Net capacity per 300GB drive in RAID5 (4+1): (4/5)*280GB = 220GB.
Net capacity per 600GB drive in RAID5 (4+1): (4/5)*560GB = 450GB.

In groups of 5 drives, we could get to 4.3TB like this:

220GB*20 = 4.4TB     or     450GB*10 = 4.5TB.

So capacity-wise we could get away with 10x 600GB drives.
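Because drives are added in whole RAID5 (4+1) groups, the drive count is a round-up on groups rather than on single drives. A sketch of that step, assuming the formatted sizes of 280GB and 560GB used above (`raid5_drives_for_capacity` is a hypothetical helper):

```python
import math


def raid5_drives_for_capacity(required_gb, formatted_gb_per_drive,
                              data_drives=4, parity_drives=1):
    """Return (drive count, usable GB) using whole RAID5 groups."""
    usable_per_group = data_drives * formatted_gb_per_drive
    groups = math.ceil(required_gb / usable_per_group)
    drives = groups * (data_drives + parity_drives)
    return drives, groups * usable_per_group


if __name__ == "__main__":
    # 280GB and 560GB are the formatted sizes assumed in the post
    for label, formatted in (("300GB", 280), ("600GB", 560)):
        drives, usable = raid5_drives_for_capacity(4290, formatted)
        print(f"{label}: {drives} drives, {usable}GB usable")
```

This gives 20 drives for the 300GB option and 10 drives for the 600GB option, in line with the numbers above. Without the per-drive rounding both options end up at 4480GB usable.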

Calculating VDI with FAST-cache part 3: Finale

In the end, if we combine the performance and the capacity calculations and really push things, we need at least these drives for the solution:

2x 100GB EFD drives (in FAST-cache; delivering 5000 IOPS a piece);
10x 15K 600GB SAS drives (delivering 4.5TB of total capacity and 170 IOPS a piece).

So if we were really pushing things, we might get away with 2x 100GB EFD and 10x 15K 600GB SAS drives. The reference architecture actually uses 2x 100GB EFD drives and 15x 15K 600GB SAS drives to deliver the VDI workload. So the math was not that far off: the EFD calculations appear to be spot on, but the 15 SAS drives were actually doing only 40 IOPS a piece in the RA. I need to look into that 🙂

In real-life scenarios, I’d be tempted to stick to the 15x 15K drives, but grow to 4x FAST-cache drives of 100GB each. There are three reasons for this: 5000 IOPS per EFD is pushing it for sustained performance, the tested workload was a synthetic one, and real-life workloads tend to use a larger footprint for hot IOPS. Naturally, as your workload will vary, you will need to adjust these numbers and rerun the theory.
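To see how the EFD count moves with the per-drive IOPS ceiling you are comfortable with, here is a small sketch. It uses the rounded 10,000 backend IOPS from this post and assumes that FAST-cache drives are added in mirrored pairs, so the count is rounded up to an even number; `efds_needed` is an illustrative name.

```python
import math


def efds_needed(backend_iops, max_iops_per_efd):
    """Smallest even number of EFDs that keeps each drive under its ceiling.

    Assumption: FAST-cache drives are added in mirrored pairs, so the
    count is rounded up to an even number.
    """
    count = math.ceil(backend_iops / max_iops_per_efd)
    return count + (count % 2)


if __name__ == "__main__":
    backend = 10_000  # rounded FAST-cache backend IOPS from this post
    for ceiling in (5000, 2500):
        n = efds_needed(backend, ceiling)
        print(f"{ceiling} IOPS/EFD ceiling -> {n} EFDs, {backend / n:.0f} each")
```

At a 5000 IOPS ceiling two EFDs are enough; at a more conservative 2500 IOPS per drive the same load needs four.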

I want to end this blog post with a very important statement:

Theoretical sizing will give you a good starting point, but never an exact sizing. “Your mileage may vary” is a very true statement here!

4 Responses to “Backwards VDI math: Putting numbers to the 1000 user RA”

  • Mr.R says:

    Hi Erik,

    The IOPS taken for EFD are unrealistic. I would never recommend anyone to go beyond the 2500 IOPS per EFD drive as a maximum. You’re also forgetting the warm-boot feature of the FAST-cache algorithm in your example. Your figures will need to be doubled (for EFD and SAS) to be on the safe side 🙂

    Regards, R.

    • Hi Mr.R . I am not saying that it would be wise to size at 5000 IOPS for the EFDs, nor that the math I do is an exact one… Normally we use 2500 or 3000 at max (and that is considered pushing it) for an EFD. Yet the RA clearly shows EFDs doing over 4000 IOPS – Not saying that is OK as a design baseline… I am merely trying to put some loose math to the numbers the RA gives us. Like I also stated in the post: My first reaction to the RA was that I’d use at least 4 EFDs in any real life scenario, or even more if your desktops push more IOPS. The loose math should give some information on how to go about sizing a STARTING POINT with this RA as a baseline.

