Posts Tagged ‘snapshot’

Snapshot Consolidation needed – Which with my luck… fails

As I am testing several third-party backup tools, this morning I stumbled upon a failed backup. There was no snapshot present on the VM that could not be backed up, but there was a yellow warning in the VI client: “Configuration Issues – Virtual Machine disks consolidation is needed”. And as luck would have it, selecting “consolidate” ended in that one brilliant error: Unable to access file since it is locked. Great. Here’s what was wrong!


Under the Covers with Miss Alignment Part 2: Linked Clones

This post continues “Under the Covers with Miss Alignment”. I keep hearing a rumor more and more often: supposedly both snapshots and linked clones on vSphere 4.x and 5.0 are misaligned. I have not had the time to actually put this to the test, but I thought it would at least be informative to give you some more down-and-dirty information on the subject.


Ghostly snapshots: Failed to remove snapshot

Recently I had another one of those great little problems: a VM refused to have its snapshot removed. Not because the snapshot was too big; the removal just failed.


Quick dive: ESX and maximum snapshot sizes

Even today I still run into discussions about snapshots and their maximum size. The test is a bit too simple for my taste, but I’m posting it anyway so that hopefully I don’t have to repeat this “yes/no” discussion every time 🙂



The steps to take are easy:

  1. Take any running VM;
  2. Add an additional disk (not in independent mode, because independent disks are excluded from snapshots);
  3. Take a snapshot of the VM;
  4. Fill this disk with data;
  5. Check the snapshot size;
  6. Delete all data from the disk;
  7. Fill the disk once again with different data, just to be sure;
  8. Check the snapshot size again.

So here we go:



I create an additional disk of 1GB, and we see this:

-rw------- 1 root root 65K Oct 18 09:58 Testdisk-ctk.vmdk
-rw------- 1 root root 1.0G Oct 18 09:28 Testdisk-flat.vmdk
-rw------- 1 root root 527 Oct 18 09:56 Testdisk.vmdk

As you can see, I created a 1GB Testdisk: Testdisk.vmdk is the small descriptor file and Testdisk-flat.vmdk holds the actual data. The Testdisk-ctk.vmdk file comes from Changed Block Tracking, something I have enabled in my test lab for my PHD Virtual Backup (formerly esXpress) testing.



Now we take a snapshot:

-rw------- 1 root root 65K Oct 18 09:59 Testdisk-000001-ctk.vmdk
-rw------- 1 root root 4.0K Oct 18 09:59 Testdisk-000001-delta.vmdk
-rw------- 1 root root 330 Oct 18 09:59 Testdisk-000001.vmdk
-rw------- 1 root root 65K Oct 18 09:59 Testdisk-ctk.vmdk
-rw------- 1 root root 1.0G Oct 18 09:28 Testdisk-flat.vmdk
-rw------- 1 root root 527 Oct 18 09:56 Testdisk.vmdk

Above you can see that the Testdisk now has additional files, most importantly Testdisk-000001-delta.vmdk. This is the actual snapshot file, where VMware keeps all changes (writes) to the snapped virtual disk; Testdisk-000001.vmdk is its descriptor and Testdisk-000001-ctk.vmdk its Changed Block Tracking file. From this point on the base disk (Testdisk-flat.vmdk) is no longer modified and all changes go into the snapshot (you can see this in the next sections, where the modification time of the base disk stays at 09:28).
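Checking the snapshot size, as in the steps above, comes down to comparing each -delta.vmdk file with its -flat.vmdk base disk. Below is a minimal Python sketch of that check, assuming the `<disk>-NNNNNN-delta.vmdk` naming seen in these listings and a Python 3 interpreter that can read the VM's folder; `snapshot_report` is a hypothetical helper, not a VMware tool. On the ESX host itself, `ls -lh` as used here does the same job.

```python
import re
from pathlib import Path

# Snapshot deltas as seen in the listings, e.g. "Testdisk-000001-delta.vmdk".
# Assumption: the "<disk>-NNNNNN-delta.vmdk" naming with a six-digit sequence.
DELTA_RE = re.compile(r"^(?P<disk>.+)-(?P<seq>\d{6})-delta\.vmdk$")


def snapshot_report(vm_dir: str) -> list[tuple[str, int, int]]:
    """Return (delta file, delta bytes, base flat bytes) for each delta in vm_dir.

    The base size is 0 when no matching "<disk>-flat.vmdk" exists.
    """
    folder = Path(vm_dir)
    report = []
    for entry in sorted(folder.iterdir()):
        match = DELTA_RE.match(entry.name)
        if match is None or not entry.is_file():
            continue  # skip descriptors, -ctk files, base disks, etc.
        base = folder / f"{match.group('disk')}-flat.vmdk"
        base_size = base.stat().st_size if base.is_file() else 0
        report.append((entry.name, entry.stat().st_size, base_size))
    return report


if __name__ == "__main__":
    # Run from inside the VM's folder.
    for name, delta, base in snapshot_report("."):
        share = f"{delta / base:.0%} of base disk" if base else "no base disk found"
        print(f"{name}: {delta} bytes ({share})")
```

Only the -delta.vmdk files are reported, so the -ctk.vmdk files from Changed Block Tracking do not get counted as snapshot data.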



Now I log into the VM the disk was added to and perform a quick format on the disk:

-rw------- 1 root root 65K Oct 18 09:59 Testdisk-000001-ctk.vmdk
-rw------- 1 root root 33M Oct 18 09:59 Testdisk-000001-delta.vmdk
-rw------- 1 root root 385 Oct 18 09:59 Testdisk-000001.vmdk
-rw------- 1 root root 65K Oct 18 09:59 Testdisk-ctk.vmdk
-rw------- 1 root root 1.0G Oct 18 09:28 Testdisk-flat.vmdk
-rw------- 1 root root 527 Oct 18 09:56 Testdisk.vmdk

Interestingly, the snapshot file has grown a bit, to 33MB, but that is nowhere near the 1GB size of the disk. That makes sense, though: a quick format does not touch the data blocks, only the few it needs to get the volume up and running. Because snapshot files grow in steps of 16MB, I guess the quick format changed somewhere between 16MB and 32MB worth of blocks.
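The 16MB reasoning is simple rounding. A small Python sketch, taking the 16MB growth step from the text and treating the roughly 1MB beyond 32MB in the listing as file overhead (my assumption): `allocated_mb` rounds changed data up to whole steps, and `changed_range_mb` works backwards from a payload to the range of changed data that could have produced it. Both names are hypothetical.

```python
STEP_MB = 16  # snapshot growth increment as stated in the text


def allocated_mb(changed_mb: float, step_mb: int = STEP_MB) -> int:
    """Delta payload after writing changed_mb of new blocks, rounded up to whole steps."""
    steps = -(-changed_mb // step_mb)  # ceiling division
    return int(steps) * step_mb


def changed_range_mb(payload_mb: int, step_mb: int = STEP_MB) -> tuple[int, int]:
    """Range (exclusive low, inclusive high) of changed data that yields payload_mb."""
    steps = payload_mb // step_mb
    return ((steps - 1) * step_mb, steps * step_mb)


if __name__ == "__main__":
    # 33MB delta minus ~1MB assumed overhead leaves a 32MB payload.
    print(changed_range_mb(32))
```

`changed_range_mb(32)` returns `(16, 32)`, which matches the guess of 16MB to 32MB of changed blocks.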



Next I perform a full format on the disk from within the VM (just because I can):

-rw------- 1 root root 65K Oct 18 09:59 Testdisk-000001-ctk.vmdk
-rw------- 1 root root 1.1G Oct 18 10:19 Testdisk-000001-delta.vmdk
-rw------- 1 root root 385 Oct 18 09:59 Testdisk-000001.vmdk
-rw------- 1 root root 65K Oct 18 09:59 Testdisk-ctk.vmdk
-rw------- 1 root root 1.0G Oct 18 09:28 Testdisk-flat.vmdk
-rw------- 1 root root 527 Oct 18 09:56 Testdisk.vmdk

Not surprisingly, the full format touched every block within the virtual disk, growing the snapshot to the size of the base disk (plus roughly 0.1GB of overhead).



Let’s try to rewrite the same blocks by copying an 800MB file onto the disk:

-rw------- 1 root root 65K Oct 18 09:59 Testdisk-000001-ctk.vmdk
-rw------- 1 root root 1.1G Oct 18 10:19 Testdisk-000001-delta.vmdk
-rw------- 1 root root 385 Oct 18 09:59 Testdisk-000001.vmdk
-rw------- 1 root root 65K Oct 18 09:59 Testdisk-ctk.vmdk
-rw------- 1 root root 1.0G Oct 18 09:28 Testdisk-flat.vmdk
-rw------- 1 root root 527 Oct 18 09:56 Testdisk.vmdk

Things get really boring from here on: the snapshot file remains at the size of the base disk.



While I’m at it, I delete the 800MB file and copy another file onto the disk, this time 912MB:

-rw------- 1 root root 65K Oct 18 09:59 Testdisk-000001-ctk.vmdk
-rw------- 1 root root 1.1G Oct 18 10:21 Testdisk-000001-delta.vmdk
-rw------- 1 root root 385 Oct 18 09:59 Testdisk-000001.vmdk
-rw------- 1 root root 65K Oct 18 09:59 Testdisk-ctk.vmdk
-rw------- 1 root root 1.0G Oct 18 09:28 Testdisk-flat.vmdk
-rw------- 1 root root 527 Oct 18 09:56 Testdisk.vmdk

Still boring. No matter what I try, I cannot get the snapshot file to grow beyond the size of its base disk.


CONCLUSION

No matter what data I throw onto a snapped virtual disk, the snapshot never grows beyond the size of the base disk (apart from a little overhead), even though I have written the same blocks inside the virtual disk several times. That must mean that snapshotting nowadays (vSphere 4.1) works like this:


For every block that is written to a snapshotted base disk, the block is added to its snapshot file, except when that logical block has already been written to the snapshot before. In that case the block already existing in the snapshot is OVERWRITTEN, not added.
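This rule can be illustrated with a toy model. The Python sketch below is my own simplification, not VMware's on-disk delta format: it keeps one slot per logical block, so rewriting a block replaces its slot and the delta can never hold more blocks than the base disk has. The class name and block counts are made up for illustration.

```python
class DeltaDisk:
    """Toy snapshot delta: at most one slot per logical block of the base disk."""

    def __init__(self, base_blocks: int) -> None:
        self.base_blocks = base_blocks
        self.slots: dict[int, bytes] = {}  # logical block number -> latest data

    def write(self, block: int, data: bytes) -> None:
        if not 0 <= block < self.base_blocks:
            raise IndexError("write beyond end of virtual disk")
        # A block written before is overwritten in its slot, not appended.
        self.slots[block] = data

    def read(self, block: int, base: list[bytes]) -> bytes:
        # Changed blocks come from the delta, unchanged ones from the base disk.
        return self.slots.get(block, base[block])

    def size_blocks(self) -> int:
        return len(self.slots)


if __name__ == "__main__":
    delta = DeltaDisk(base_blocks=64)
    for blk in range(64):        # "full format": touch every block once
        delta.write(blk, b"\0")
    for pass_no in range(3):     # rewrite most blocks again, several times
        for blk in range(50):
            delta.write(blk, bytes([pass_no]))
    print(delta.size_blocks())   # stays at 64, the size of the base disk
```

This mirrors the test: the "full format" fills all 64 slots, and rewriting 50 of those blocks three more times leaves the count at 64.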




So where did the misconception come from that snapshot files can grow beyond the size of their base disk? Without testing every ESX flavour out there, I know that back in the ESX 2.5 days a snapshot landed in a REDO log (and not a snapshot file). These redo logs were simply a growing list of written blocks, so in those days snapshots (redo files) could grow and grow forever (until your VMFS filled up; those happy days 😉). I have not verified it, but I believe this changed to the behavior we see today in ESX 3.0.
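For contrast, here is the same kind of toy model for an append-only REDO log as described here, where every write adds a record even when the block was written before. Again a hypothetical Python sketch of the idea, not of the actual ESX 2.5 file format.

```python
class RedoLog:
    """Toy append-only REDO log: every write adds a record, even for the same block."""

    def __init__(self) -> None:
        self.records: list[tuple[int, bytes]] = []

    def write(self, block: int, data: bytes) -> None:
        self.records.append((block, data))  # never reuses an earlier record

    def read(self, block: int, base: list[bytes]) -> bytes:
        # The newest record for a block wins; otherwise fall back to the base disk.
        for logged_block, data in reversed(self.records):
            if logged_block == block:
                return data
        return base[block]

    def size_blocks(self) -> int:
        return len(self.records)


if __name__ == "__main__":
    log = RedoLog()
    for blk in range(64):        # "full format" of a 64-block disk
        log.write(blk, b"\0")
    for pass_no in range(3):     # rewrite 50 blocks three more times
        for blk in range(50):
            log.write(blk, bytes([pass_no]))
    print(log.size_blocks())     # 64 + 3 * 50 = 214 records
```

The same workload that keeps the overwrite model at 64 blocks leaves this log at 214 records, more than three times the size of the base disk, which is exactly the "grow forever" behavior.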

Performance impact when using VMware snapshots

It is certainly not unheard of: “When I delete a snapshot from a VM, the thing totally freezes!” The strange thing is that some customers have these issues and others don’t (or are not aware of them). So what really DOES happen when you clean out a snapshot? Time to investigate!

Test Setup

So how do we test the performance impact on storage while ruling out external factors? The setup I chose used a VM with the following specs:


Ye Olde Snapshot – by Erik Zandboer

A lot of people have had more or less unpleasant experiences with forgotten snapshots. You log in in the morning, and a VM is down. “Strange,” you think. After some investigation, you find out that the VMFS volume on which the VM was running is full. Completely full. And to your horror you find out why: a forgotten snapshot is in place, and it has grown until it filled the entire VMFS volume.

 

What exactly does a snapshot do?

The first thing to understand is how a snapshot actually works. When you take a snapshot, the original virtual disk is no longer written to. Each block that would be written to that disk is redirected to a snapshot file instead, so the snapshot file holds all changes made to the virtual disk after the snapshot was taken. The more blocks you change that were not changed before, the larger the snapshot file grows (in steps of 16MB). Each changed block is stored inside the snapshot file only once. This means that a snapshot file can reach a sometimes staggering size, equal or nearly equal to the size of the original virtual disk (defragmenting a disk inside a VM is my personal favorite way to get there 😉).

 

Monstrous snapshot – now what?

If you “forget” about a snapshot, chances are you will not notice it until it is too late. Especially if you snapshotted a very large virtual disk and have plenty of room left on the VMFS, snapshots can grow to immense sizes, and cleaning them up can be very time-consuming indeed.

If you have found a very old snapshot file that has grown very large (e.g. 10-40GB), you can actually delete the snapshot without problems, thereby committing all changes recorded in the snapshot file back to the original disk. You end up with the virtual disk exactly as it looked with the snapshot in place, only without the snapshot. But beware: if you delete the snapshot from vCenter (I’ve got to get used to that name instead of VirtualCenter), you might very well get a timeout. This has given some people really sweaty fingers. Don’t panic: log in to the ESX host itself, and you’ll probably see that the snapshot is still being removed. It might take an hour, it might take four hours, but in time the snapshot should be removed.
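Why can a large delete take hours? Deleting a snapshot means copying every changed block from the delta back into the base disk, so the work scales with the size of the snapshot, not with the size of the disk. Below is a toy Python sketch of that commit step plus a back-of-the-envelope time estimate. Both functions are hypothetical, the real process does more than this, and `throughput_mb_s` is an assumed effective copy rate you would have to measure on your own storage.

```python
def commit_delta(base: list[bytes], delta: dict[int, bytes]) -> int:
    """Write every changed block from the delta back into the base disk.

    Returns the number of blocks copied: the work grows with the size of
    the delta, not with the size of the base disk.
    """
    for block, data in sorted(delta.items()):
        base[block] = data
    copied = len(delta)
    delta.clear()  # once its changes live in the base disk, the snapshot is gone
    return copied


def estimated_hours(delta_gb: float, throughput_mb_s: float) -> float:
    """Rough commit time for a delta of delta_gb at an assumed copy rate."""
    return delta_gb * 1024 / throughput_mb_s / 3600
```

For example, a 40GB delta at an assumed 10MB/s works out to a little over an hour, and slower or busier storage stretches that quickly.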

 

VMFS full – How to get the VM running again

If a forgotten snapshot fills up the entire VMFS, chances are that your snapshotted VM stops. This is because the VM is trying to write to its disk, and the snapshot needs to grow but can’t. There are two ways to resolve this: 1) make room on the VMFS, or 2) delete the snapshot while the VM remains powered off. In a production environment, option 2) might not work for you, because deleting a large snapshot can take hours. So we’re back to making room on the VMFS. Maybe you can move another VM off the VMFS. Maybe you have some ISOs lying about on the VMFS that you can delete. Then you can start your troubled VM again and remove the snapshot while the VM is running. A last resort might even be to give the VM less memory (which shrinks its swap file on the VMFS), or to put its swap file in another location (possible in ESX 3.5u3). Then start deleting the snapshot right away, before it manages to fill up the VMFS again.

I have even heard of people who put a 2GB dummy file on each VMFS volume, so that when they run into these issues they just delete the file and gain 2GB of space. If forgetting snapshots is a habit of yours, you might consider this a “best practice” for your environment…
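If you want to try the dummy-file trick, the file must really occupy space; a sparse file would reserve nothing. A minimal Python sketch, assuming Python 3 can write to the datastore; `create_reserve_file` and the file name are hypothetical. It writes real zeros in 4MB chunks; on storage that compresses or deduplicates zeros this would not reserve any space.

```python
import os

CHUNK = 4 * 1024 * 1024  # write in 4MB chunks to keep memory use low


def create_reserve_file(path: str, size_bytes: int) -> None:
    """Write a dummy file of real zeros so its space is actually allocated.

    Writing the bytes (instead of just seeking to the end) avoids ending up
    with a sparse file that reserves nothing.
    """
    zeros = bytes(CHUNK)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data has really hit the disk
```

For a 2GB reserve you would call `create_reserve_file("/vmfs/volumes/<datastore>/SPACE-RESERVE.dummy", 2 * 1024**3)`, with `<datastore>` standing in for your own volume name.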

 

50GB+ snapshot – Delete or…?

What if you have a really big snapshot (and I mean 50+GB), or even multiple huge snapshots? Or snapshots whose chain appears to be garbled (horrors like “cannot delete snapshot because the base disk was modified after the snapshot was taken”)? You might not want to risk deleting these snapshots. There is another way to recover safely, especially if you run Windows 2003 or later, and it deserves to be much better known: VMware Converter! It really is a magical tool, not only for P2V but also in cases exactly like this. Keep your VM running, point Converter at it and tell Converter it is a physical machine. Converter will install its agent inside the VM and start duplicating your VM to another LUN. After the conversion, the target VM will be free of any snapshots!

This option also works great if you have issues with your SAN. I have seen environments with LUNs you could not even browse any more (neither from the datastore browser nor via ssh), but VMs placed there were still running OK. It certainly shows the stability and enterprise-readiness of ESX, but how do you recover? Even restarting the VM or rescanning LUNs is risky here. The simple answer was: use Converter. Simply use Converter! To make a short story even shorter: Converter saved the day 🙂

So I guess as a final word I should say: for VM recovery from even the weirdest disk-related issues, consider using VMware Converter!
