Monday, 28 January 2013

Determining if a High Availability Virtual Machine Monitoring event caused a virtual machine to reboot


Symptoms

  • A virtual machine running on an ESX/ESXi host appears to have been rebooted unexpectedly
  • The ESX/ESXi host where the virtual machine is registered is part of a VMware High Availability (HA) cluster in vCenter Server, and Virtual Machine Monitoring is enabled on the HA cluster.


Purpose

It can be difficult to determine the cause of an unexpected virtual machine reboot. This article describes the logs and events to check to determine whether the reboot was caused by HA Virtual Machine Monitoring. It also provides a workaround: disabling Virtual Machine Monitoring.

Resolution

To look for events related to the virtual machine in vCenter Server:
  1. Select the virtual machine in vCenter Server.
  2. Click the Tasks & Events tab.
  3. Click the Events button.
  4. Enter the word reset in the search field.
  5. If HA Virtual Machine Monitoring was responsible for resetting the virtual machine, an Event similar to this displays:

    This virtual machine reset by HA. Reason: VMware Tools heartbeat failure. A screenshot is saved at /vmfs/volumes/4c4850e4-0dcce710-28d9-00215a5d36b8/wdcsmdc2/wdcsmdc2-screenshot-0.png


    Note: The reason in the Description field may differ slightly, but it always states that the virtual machine was reset by HA if Virtual Machine Monitoring is involved.
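If you copy the event descriptions out of the Tasks & Events tab as plain text, a small helper can pick out the HA resets and their reasons. This is a minimal sketch that assumes nothing beyond the wording quoted above: it keys only on the phrase "reset by HA" (per the note, the rest of the text can vary), and the function name ha_reset_reason is hypothetical.

```python
import re
from typing import Optional

# Per the note above, the wording varies, but HA resets always say "reset by HA".
HA_RESET_MARKER = "reset by ha"
# Capture the text after "Reason:" up to the next period, when a reason is given.
REASON_PATTERN = re.compile(r"Reason:\s*([^.]*)", re.IGNORECASE)


def ha_reset_reason(description: str) -> Optional[str]:
    """Return the reset reason, "" if no reason is given, or None if this is not an HA reset."""
    if HA_RESET_MARKER not in description.lower():
        return None
    match = REASON_PATTERN.search(description)
    return match.group(1).strip() if match else ""


if __name__ == "__main__":
    events = [
        "Virtual machine powered on",
        "This virtual machine reset by HA. Reason: VMware Tools heartbeat failure.",
    ]
    for text in events:
        print(repr(ha_reset_reason(text)))
```

Returning None rather than an empty string for unrelated events keeps "not an HA reset" distinct from "HA reset with no stated reason".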
On the ESX/ESXi host, two logs can help identify Virtual Machine Monitoring as the source of the virtual machine reboot. If Virtual Machine Monitoring is the cause:
  • The /var/log/vmware/hostd.log file contains entries similar to:

    [2010-09-14 16:13:09.094 F5760B90 verbose 'vm:/vmfs/volumes/4c4850e4-0dcce710-28d9-00215a5d36b8/wdcsmdc2/wdcsmdc2.vmx'] Updating current heartbeatStatus: red
    [2010-09-14 16:14:46.969 F5B73B90 verbose 'vm:/vmfs/volumes/4c4850e4-0dcce710-28d9-00215a5d36b8/wdcsmdc2/wdcsmdc2.vmx' opID=task-internal-977-4429b01] Reset request received
  • The vmware.log file of the affected virtual machine (/vmfs/volumes/<datastore>/<VM directory>/vmware.log) contains entries similar to:

    Sep 14 16:14:47.004: vmx| Vix: [104333 vmxCommands.c:392]: VMAutomation_Reset
    Sep 14 16:14:47.016: vmx| Vix: [104333 vmxCommands.c:457]: VMAutomation_Reset. Trying hard reset
    Sep 14 16:14:47.017: vmx|
    Sep 14 16:14:47.017: vmx|
    Sep 14 16:14:47.017: vmx| VMXRequestReset
    Sep 14 16:14:47.018: vmx| Stopping VCPU threads...
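On a host with long logs, a short script can pull out just these entries. The sketch below assumes the line layouts match the samples above exactly (other ESX/ESXi builds may format lines differently), and the function names scan_hostd and scan_vmware_log are hypothetical. It reports the heartbeat-red and reset-request messages from hostd.log per .vmx path, plus the reset lines from a vmware.log.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the sample hostd.log entries:
# [YYYY-MM-DD HH:MM:SS.mmm <thread> <level> 'vm:<.vmx path>'<optional opID>] <message>
HOSTD_VM_LINE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ \S+ "
    r"'vm:(?P<vmx>[^']+)'[^\]]*\] (?P<msg>.*)$"
)
# hostd.log messages that point to VM Monitoring activity.
HOSTD_MARKERS = ("Updating current heartbeatStatus: red", "Reset request received")
# vmware.log messages that point to a reset of the virtual machine.
VMWARE_LOG_MARKERS = ("VMAutomation_Reset", "VMXRequestReset")


def scan_hostd(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, .vmx path, message) for hostd.log lines containing a marker."""
    hits = []
    for line in lines:
        match = HOSTD_VM_LINE.match(line.strip())
        if match and any(marker in match.group("msg") for marker in HOSTD_MARKERS):
            hits.append((match.group("ts"), match.group("vmx"), match.group("msg")))
    return hits


def scan_vmware_log(lines: Iterable[str]) -> List[str]:
    """Return the vmware.log lines that mention one of the reset markers."""
    return [line.rstrip() for line in lines
            if any(marker in line for marker in VMWARE_LOG_MARKERS)]
```

Both functions accept any iterable of lines, so an open file works directly, for example `with open("/var/log/vmware/hostd.log") as f: print(scan_hostd(f))`.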
Note: If you know the approximate time of the unexpected reboot, search the log files for time stamps around that time.
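Narrowing the logs to a window around the reboot can be scripted as well. This is a sketch under stated assumptions: the timestamp formats are inferred from the samples above, vmware.log lines carry no year (so you pass one in), the five-minute default window is an arbitrary choice, and all function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List

HOSTD_FORMAT = "%Y-%m-%d %H:%M:%S.%f"       # e.g. 2010-09-14 16:13:09.094
VMWARE_LOG_FORMAT = "%Y %b %d %H:%M:%S.%f"  # year prepended, e.g. 2010 Sep 14 16:14:47.004


def hostd_time(line: str) -> datetime:
    """Parse the 23-character timestamp after the leading '[' of a hostd.log line."""
    return datetime.strptime(line[1:24], HOSTD_FORMAT)


def vmware_log_time(line: str, year: int) -> datetime:
    """Parse 'Sep 14 16:14:47.004: vmx| ...'; vmware.log omits the year, so it is supplied."""
    stamp = line.split(": ", 1)[0]
    return datetime.strptime(f"{year} {stamp}", VMWARE_LOG_FORMAT)


def lines_near(lines: Iterable[str], parse: Callable[[str], datetime],
               reboot_time: datetime, minutes: int = 5) -> List[str]:
    """Keep lines within +/- `minutes` of reboot_time; skip lines without a parseable time."""
    window = timedelta(minutes=minutes)
    kept = []
    for line in lines:
        try:
            stamp = parse(line)
        except ValueError:
            continue  # continuation or malformed line without a leading timestamp
        if abs(stamp - reboot_time) <= window:
            kept.append(line)
    return kept
```

For vmware.log, bind the year with a lambda, for example `lines_near(f, lambda l: vmware_log_time(l, 2010), datetime(2010, 9, 14, 16, 14))`.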
If any of these log entries or the vCenter Server event are present, HA Virtual Machine Monitoring reset the virtual machine because it stopped receiving virtual machine heartbeats (sent via VMware Tools) and detected no I/O activity from the virtual machine.
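The condition in the previous paragraph has two parts, and both must hold. The sketch below is a deliberately simplified model of that reasoning, not HA's actual implementation: the parameter names and example interval values are hypothetical, and it ignores any other cluster settings that may limit resets.

```python
def vm_monitoring_would_reset(secs_since_heartbeat: float, secs_since_io: float,
                              failure_interval: float, io_stats_interval: float) -> bool:
    """Simplified model: reset only when heartbeats AND virtual machine I/O have both stopped."""
    heartbeats_lost = secs_since_heartbeat >= failure_interval
    no_recent_io = secs_since_io >= io_stats_interval
    return heartbeats_lost and no_recent_io


# A guest whose VMware Tools heartbeat stopped but which is still doing I/O is left alone.
print(vm_monitoring_would_reset(90, 10, failure_interval=30, io_stats_interval=120))   # False
# A guest with no heartbeat and no I/O for longer than both intervals is reset.
print(vm_monitoring_would_reset(90, 300, failure_interval=30, io_stats_interval=120))  # True
```

The I/O check is what keeps a guest that is busy but has a stalled VMware Tools service from being reset on heartbeat loss alone.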
Examine the logs within the guest operating system to help determine what caused the heartbeat failure. If multiple virtual machines are affected, review the vmkernel, messages, and hostd logs on the ESX/ESXi host where the virtual machines are registered to look for a system-wide problem.
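When several virtual machines were reset, it helps to check whether the resets cluster in time, which points toward a host-wide problem rather than one guest. This sketch assumes you have already collected (VM name, reset time) pairs from the logs described above. The ten-minute default window is an arbitrary choice, and the function name clustered_resets is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Reset = Tuple[str, datetime]


def clustered_resets(resets: List[Reset],
                     window: timedelta = timedelta(minutes=10)) -> List[List[Reset]]:
    """Group resets that follow each other within `window`; keep groups spanning >1 VM."""
    groups: List[List[Reset]] = []
    for vm, when in sorted(resets, key=lambda r: r[1]):
        # Extend the current group if this reset is close to the previous one.
        if groups and when - groups[-1][-1][1] <= window:
            groups[-1].append((vm, when))
        else:
            groups.append([(vm, when)])
    # A group with one distinct VM is a single guest's problem, not a host-wide one.
    return [g for g in groups if len({vm for vm, _ in g}) > 1]
```

Each returned group is a burst of resets affecting more than one virtual machine, which is a good starting point for the time range to examine in the vmkernel and messages logs.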
To work around this issue, disable Virtual Machine Monitoring:
  • Log in to vCenter Server using the vSphere Client.
  • Right-click the VMware HA cluster and choose Edit Settings.
  • In the Cluster Settings dialog box, select VM Monitoring.
  • Under VM Monitoring Status, select Disabled from the VM Monitoring drop-down box.
  • Click OK.
Note: This feature can also be disabled on the VMware HA page of the New Cluster Wizard by deselecting Enable Host Monitoring.
Source:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1027734