
Wednesday, 22 August 2012

Understanding virtual machine snapshots in VMware ESXi and ESX

Symptoms

This article may be helpful when you encounter these issues:
  • Virtual machines are not responding or cannot start due to broken parent and child virtual disk dependencies.
  • Virtual machines are not responding or do not start due to redo logs residing on datastores that do not have free space.
  • Snapshot creation takes too long when specifying the memory snapshot option.
  • Snapshot delete or remove operations result in the vSphere or VMware Infrastructure (VI) Client timing out.
  • Backups fail while quiescing during a snapshot operation.

Purpose

This article provides information about virtual machine snapshots.

Resolution

  

What is a snapshot?

A snapshot preserves the state and data of a virtual machine at a specific point in time.
  • The state includes the virtual machine’s power state (for example, powered-on, powered-off, suspended).
  • The data includes all of the files that make up the virtual machine. This includes disks, memory, and other devices, such as virtual network interface cards.
A virtual machine provides several operations for creating and managing snapshots and snapshot chains. These operations let you create snapshots, revert to any snapshot in the chain, and remove snapshots. You can create extensive snapshot trees.

In VMware Infrastructure 3 and vSphere 4.x, the virtual machine snapshot delete operation combines the consolidation of the data and the deletion of the file. This causes issues when the snapshot files are removed from the Snapshot Manager but the consolidation fails. The virtual machine is left running on snapshots, and the user may not notice until the datastore is full.

In vSphere 4.x, an alarm can be created to indicate when a virtual machine is running in snapshot mode. For more information, see Configuring VMware vCenter Server to send alarms when virtual machines are running from snapshots (1018029).

In vSphere 5.0, snapshot removal has been enhanced. You are informed via the UI if the consolidation part of a RemoveSnapshot or RemoveAllSnapshots operation has failed. A new option, Consolidate, is available via the Snapshot menu to restart the consolidation.

Creating a snapshot

When creating a snapshot, there are several options you can specify:
  • Name: This is used to identify the snapshot.
  • Description: This is used to describe the snapshot.
  • Memory: If the <memory> flag is 1 or true, a dump of the internal state of the virtual machine is included in the snapshot. Memory snapshots take longer to create.
  • Quiesce: If the <quiesce> flag is 1 or true, and the virtual machine is powered on when the snapshot is taken, VMware Tools is used to quiesce the file system in the virtual machine. Quiescing a file system is a process of bringing the on-disk data of a physical or virtual computer into a state suitable for backups. This process might include such operations as flushing dirty buffers from the operating system's in-memory cache to disk, or other higher-level application-specific tasks. 

    Note: Quiescing indicates pausing or altering the state of running processes on a computer, particularly those that might modify information stored on disk during a backup, to guarantee a consistent and usable backup.
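
As a minimal sketch of how the four options fit together, the following Python models a snapshot request and works out which options would take effect. The names SnapshotRequest and effective_options are hypothetical and are not part of any VMware API. Treating the memory option as ignored for a powered-off virtual machine is an assumption of the sketch; the text above states the power-on condition only for quiescing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotRequest:
    """Hypothetical container for the four options described above."""
    name: str
    description: str = ""
    memory: bool = False
    quiesce: bool = False


def effective_options(req: SnapshotRequest, powered_on: bool) -> dict:
    """Return which optional behaviours would apply to this request.

    Quiescing needs VMware Tools in a running guest, so it only applies
    when the VM is powered on. Applying the same rule to the memory dump
    is an assumption of this sketch.
    """
    return {
        "memory": req.memory and powered_on,
        "quiesce": req.quiesce and powered_on,
    }
```

For example, a request with both flags set yields both options for a powered-on virtual machine and neither for a powered-off one.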
When a snapshot is created, it consists of these files:
  • <vm>-<number>.vmdk and <vm>-<number>-delta.vmdk
    A pair of .vmdk and -delta.vmdk files is created for each virtual disk that is connected to the virtual machine at the time of the snapshot. These files can be referred to as child disks, redo logs, or delta links. These child disks can later be considered parent disks for future child disks. Each child disk is a redo log that points back to its parent, so the chain leads from the present state of the virtual disk, one step at a time, to the original parent disk.

    Note: The <number> value may not be consistent across all child disks from the same snapshot. The file names are chosen based on filename availability.
  • <vm>.vmsd
    The .vmsd file is a database of the virtual machine's snapshot information and the primary source of information for the snapshot manager. The file contains line entries which define the relationships between snapshots as well as the child disks for each snapshot.
  • <vm>Snapshot<number>.vmsn
    These files are the memory state at the time of the snapshot.
Note: The above files are placed in the working directory by default in ESX/ESXi 3.x and 4.x. This behavior can be changed if desired. For more information on creating snapshots in another directory, see Creating snapshots in a different location than default virtual machine directory (1002929). In ESXi 5.x and later, snapshot descriptor and delta VMDK files are stored in the same location as the virtual disks (which can be in a different directory from the working directory). To change this behavior, see Changing the location of snapshot delta files for virtual machines in ESXi 5.0 (2007563).
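
The naming patterns above can be sketched as a small filename classifier, which may help when listing a virtual machine directory. This is a heuristic illustration only. The six-digit <number> in disk names and the optional hyphen before "Snapshot" are assumptions based on commonly seen file names, and the function name classify is hypothetical.

```python
import re

# Patterns follow the file names listed above. Assumptions of this sketch:
# <number> in disk names is six digits (for example 000001), and the .vmsn
# name may or may not have a hyphen before "Snapshot".
_PATTERNS = [
    ("delta extent", re.compile(r"^.+-\d{6}-delta\.vmdk$")),
    ("child descriptor", re.compile(r"^.+-\d{6}\.vmdk$")),
    ("snapshot database", re.compile(r"^.+\.vmsd$")),
    ("memory state", re.compile(r"^.+?-?Snapshot\d+\.vmsn$")),
]


def classify(filename: str) -> str:
    """Return a label for a snapshot-related file, or "other"."""
    for kind, pattern in _PATTERNS:
        if pattern.match(filename):
            return kind
    return "other"
```

A virtual machine whose own name ends in a hyphen and six digits would be misclassified, so the result should be treated as a hint rather than proof that a snapshot exists.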

What products use the snapshot feature?

Besides being created manually through the snapshot manager, snapshots are used by many VMware and third-party products and features. Some VMware products that use snapshots extensively are:
  • VMware Data Recovery
  • VMware Lab Manager
  • VMware vCenter and the VMware Infrastructure Client (Snapshot Manager, Storage vMotion)
Note: This is not an exhaustive list.

How do snapshots work?

Our VMware API allows VMware and third-party products to perform operations with virtual machines and their snapshots. This is a list of common operations that can be performed on virtual machines and snapshots using our API:
  • CreateSnapshot: Creates a new snapshot of a virtual machine. As a side effect, this updates the current snapshot.
  • RemoveSnapshot: Removes a snapshot and deletes any associated storage.
  • RemoveAllSnapshots: Removes all snapshots associated with a virtual machine. If a virtual machine does not have any snapshots, then this operation simply returns successfully.
  • RevertToSnapshot: Changes the execution state of a virtual machine to the state of this snapshot.
  • (vSphere 5.0 only) Consolidate: Merges the hierarchy of redo logs.
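
The semantics of these operations can be illustrated with a toy in-memory model of a snapshot tree. This is not the VMware API; the class SnapshotTree and its methods are hypothetical, and the rule that removing a snapshot re-attaches its children to its parent is an assumption of the sketch.

```python
class SnapshotTree:
    """Toy model of a VM's snapshot tree (illustration only)."""

    def __init__(self):
        self.parent = {}     # snapshot name -> parent snapshot name, or None
        self.current = None  # snapshot the running state descends from

    def create(self, name):
        if name in self.parent:
            raise ValueError(f"snapshot {name!r} already exists")
        self.parent[name] = self.current
        self.current = name  # side effect: the current snapshot is updated

    def revert(self, name):
        if name not in self.parent:
            raise KeyError(name)
        self.current = name

    def remove(self, name):
        grandparent = self.parent.pop(name)
        # Assumption: children of the removed snapshot move up one level.
        for child, par in list(self.parent.items()):
            if par == name:
                self.parent[child] = grandparent
        if self.current == name:
            self.current = grandparent

    def remove_all(self):
        # Succeeds even when there are no snapshots, as described above.
        self.parent.clear()
        self.current = None
```

Creating "a" and "b", reverting to "a", and then creating "c" produces a tree with two branches under "a", which is how extensive snapshot trees arise.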
This is a high-level overview of how requests to create, remove, or revert snapshots are processed within the VMware environment:
  1. A request to create, remove, or revert a snapshot for a virtual machine is sent from the client to the server using the VMware API. 
  2. The request is forwarded to the VMware ESX host that is currently hosting the virtual machine in question.

    Note: This only occurs if the original request was sent to a different server, such as vCenter, which is managing the ESX host.
  3. If the snapshot includes the memory option, the ESX host writes the memory of the virtual machine to disk.

    Note: The virtual machine is stunned for as long as the memory is being written. The length of the stun cannot be pre-calculated, and depends on the performance of the disk in question and the amount of memory being written. ESX/ESXi 4.x and later have shorter stun times when memory is being written. For more information, see Taking a snapshot with virtual machine memory stuns the virtual machine while the memory is written to disk (1013163).
  4. If the snapshot includes the quiesce option, the ESX host requests the guest operating system to quiesce the disks via VMware Tools.

    Note: Depending on the guest operating system, the quiescing operation can be done by the sync driver, the vmsync module, or Microsoft's Volume Shadow Copy (VSS) service. For more information on quiescing, see Troubleshooting Volume Shadow Copy (VSS) quiesce related issues (1007696) for VSS or A virtual machine can freeze under load when you take quiesced snapshots or use custom quiescing scripts (5962168) for the SYNC driver.
  5. The ESX host makes the appropriate changes to the virtual machine's snapshot database (.vmsd file) and the changes are reflected in the snapshot manager of the virtual machine.

    Note: When removing a snapshot, the snapshot entity in the snapshot manager is removed before the changes are made to the child disks. The snapshot manager does not contain any snapshot entries while the virtual machine continues to run from the child disk. For more information, see Committing snapshots when there are no snapshot entries in the snapshot manager (1002310).
  6. The ESX host calls a function similar to the Virtual Disk API functions to make changes to the child disks (-delta.vmdk and .vmdk files) and the disk chain.

    Note: During a snapshot removal, if the child disks are large in size, the operation may take a long time. This can result in a timeout error message from either VirtualCenter or the VMware Infrastructure Client. For more information about timeout error messages, see vCenter operation times out with the error: Operation failed since another task is in progress (1004790).
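
The ordering of the steps above, for a create request, can be sketched as a function that lists which steps apply for a given set of options. The function name create_snapshot_steps and the step wording are hypothetical; the sketch only mirrors the numbered list and does not call any VMware interface.

```python
from typing import List


def create_snapshot_steps(memory: bool, quiesce: bool, via_vcenter: bool) -> List[str]:
    """Return the processing steps for a create request, in order."""
    steps = ["client sends the request through the API"]
    if via_vcenter:
        # Only needed when the request went to a managing server first.
        steps.append("managing server forwards the request to the ESX host")
    if memory:
        steps.append("host writes VM memory to disk (VM is stunned)")
    if quiesce:
        steps.append("host asks the guest to quiesce disks via VMware Tools")
    steps.append("host updates the .vmsd snapshot database")
    steps.append("host updates the child disks and the disk chain")
    return steps
```

A request sent directly to the host with neither option set goes through three steps; one sent through vCenter with both options set goes through all six.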

The child disk

The child disk, which is created with a snapshot, is a sparse disk. Sparse disks employ the copy-on-write (COW) mechanism, in which the virtual disk contains no data in a given location until a write copies data there. This optimization saves storage space. The grain is the unit in which the sparse disk applies the copy-on-write mechanism. Each grain is a block of sectors containing virtual disk data. The default size is 128 sectors, or 64 KB.
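
The copy-on-write behaviour can be illustrated with a toy child disk layered over a read-only parent. This is a simplified sketch, not the on-disk VMDK format. The class SparseChild is hypothetical, and the 512-byte sector size is an assumption that makes 128 sectors equal 64 KB.

```python
SECTOR_BYTES = 512                          # assumption: 512-byte sectors
GRAIN_SECTORS = 128                         # default grain size from the text
GRAIN_BYTES = SECTOR_BYTES * GRAIN_SECTORS  # 65536 bytes, i.e. 64 KB


class SparseChild:
    """Toy copy-on-write child disk on top of a read-only parent."""

    def __init__(self, parent: bytes):
        self.parent = parent
        self.grains = {}  # grain index -> bytearray with that grain's data

    def write(self, offset: int, data: bytes) -> None:
        for i, byte in enumerate(data):
            index, within = divmod(offset + i, GRAIN_BYTES)
            if index not in self.grains:
                # First write to this grain: copy it from the parent.
                start = index * GRAIN_BYTES
                self.grains[index] = bytearray(self.parent[start:start + GRAIN_BYTES])
            self.grains[index][within] = byte

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        for pos in range(offset, offset + length):
            index, within = divmod(pos, GRAIN_BYTES)
            grain = self.grains.get(index)
            # Grains never written fall through to the parent.
            out.append(grain[within] if grain is not None else self.parent[pos])
        return bytes(out)

    def allocated_bytes(self) -> int:
        return len(self.grains) * GRAIN_BYTES
```

In this model, a two-byte write that straddles a grain boundary allocates two full grains in the child, while the parent stays unchanged.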

Child disks and disk usage

It is important to note these points regarding the space utilization of child disks:
  • If a virtual machine is running off of a snapshot, it is making changes to a child or sparse disk. The more write operations made to this disk, the larger it grows.
  • The space requirements of the child disk are in addition to the parent disk on which it depends. If a virtual machine has a 10 GB disk with a child disk, the space used will be 10 GB + the child disk size.
  • Child disks have been known to grow large enough to fill an entire datastore.
  • The speed at which child disks grow is directly dependent on the amount of I/O being done to the disk.
  • The size of the child disk has a direct impact on the length of time it takes to delete the snapshot associated with the child disk.
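
The space arithmetic in the points above can be written out as a short sketch. Both function names are hypothetical, and the constant growth rate used in hours_until_full is an assumption; real growth depends on the write I/O pattern.

```python
def total_space_gb(parent_gb, child_gbs):
    """Space used by a parent disk plus all of its child disks."""
    return parent_gb + sum(child_gbs)


def hours_until_full(free_gb, growth_gb_per_hour):
    """Rough time until a datastore fills, assuming constant growth."""
    if growth_gb_per_hour <= 0:
        return float("inf")
    return free_gb / growth_gb_per_hour
```

For the example in the text, a 10 GB parent with a 2.5 GB child disk uses 12.5 GB in total.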
These Knowledge Base articles touch on the topic of child disks and disk usage:

The disk chain

Generally, when you create a snapshot for the first time, the first child disk is created from the parent disk. Successive snapshots generate new child disks from the last child disk on the chain. The relationship can change if you have multiple branches in the snapshot chain.
[Diagram: example of a snapshot chain. Each square represents a block of data, or grain, as described above.]

Caution: Manually manipulating the individual child disks or any of the snapshot configuration files may compromise the disk chain. VMware does not recommend manually modifying the disk chain as it may result in data loss. For more information, see Consolidating snapshots (1007849).
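
The parent and child relationships in a disk chain can be sketched as a read-only walk from a child disk back to the original parent. The mapping of disk names to parents is built by hand here for illustration, and the function chain_to_base is hypothetical. It does not read or modify any real disk files.

```python
def chain_to_base(parent_of: dict, leaf: str) -> list:
    """Walk from a child disk back to the original parent disk.

    parent_of maps each disk name to its parent's name, or to None for
    the original parent disk. A name that is referenced but missing from
    the mapping stands for a broken parent/child dependency.
    """
    chain = []
    disk = leaf
    while disk is not None:
        if disk not in parent_of:
            raise LookupError(f"broken chain: {disk!r} is missing")
        if disk in chain:
            raise ValueError(f"cycle in chain at {disk!r}")
        chain.append(disk)
        disk = parent_of[disk]
    return chain
```

With two child disks branching from the same first child, each leaf resolves through that shared child to the base disk, which matches the multiple-branch case described above.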

Additional Information

  • To determine if a virtual machine is running on snapshots, see Determining if a virtual machine is using snapshots (1004343). 
  • There are specific considerations when hosting a Microsoft Active Directory controller in a virtual environment. For a full list of considerations, see Microsoft Knowledge Base article 888794.

    Note: The preceding link was valid as of August 1, 2012. If you find the link to be broken, provide feedback on the article and a VMware employee will update the article as necessary.
  • Time-sensitive applications may be impacted by reverting to a previous snapshot. Reverting the snapshot will revert the virtual machine to the point in time when the snapshot was created. This includes any operations conducted by the time-sensitive service or application in the guest operating system. 
  • Reverting a virtual machine to a snapshot causes all settings configured in the guest operating system since that snapshot to be reverted. The configuration that is reverted includes, but is not limited to, previous IP addresses, DNS names, UUIDs, and guest OS patch versions.

Source:-
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
