Virtualization The Future: November 2013

Tuesday, 26 November 2013

The Management Network vSwitch is deleted on the ESXi host (1010992)

Purpose

This article provides steps for troubleshooting a situation where the Management Network vSwitch is deleted on the ESXi host.

Resolution

You must restore the network settings to the default settings.

Note: This procedure requires you to re-register virtual machines and to recreate VMkernel ports and vSwitches.

To restore your network settings:

Use the Direct Console User Interface (DCUI) to connect to the ESXi host.
Click Reset System Configuration.
Reboot the ESXi host.
Enter the networking information. For more information, see Configuring the ESXi Management Network from the direct console (1006710).
Do a test ping using the DCUI. If successful, you can access the ESXi host using the VI Client.
Re-register virtual machines and recreate your vSwitches. For more information, see the ESXi documentation.

Source:-

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1010992

Friday, 22 November 2013

Best practices for joining vCenter Servers in Linked Mode (2005481)

Purpose

This article provides best practices when working with vCenter Server Linked Mode, as well as steps to troubleshoot vCenter Server Linked Mode issues.

Resolution

Best practices

When working with vCenter Server Linked Mode, follow these best practices:

If the vCenter Server is joined to a domain, ensure that it can communicate with the Domain Controller. If Domain Controller communication problems exist, remove the vCenter Server from the Windows domain and add it again.
Ensure that all vCenter Server system times are synchronized with a time difference of no greater than 5 minutes.
Ensure that all vCenter Servers are the same version and build. For more information, see Cannot access instances of vCenter Server in Linked Mode configuration after upgrading to vCenter Server 4.1 (1026346).
Ensure that the VirtualCenter Server Service uses an account with rights to log on as a service/batch job.
The vCenter Server Linked Mode Configuration tool must be run by a domain user that is also a local administrator on both machines where vCenter Server is installed.
Different Windows domains for vCenter Servers are permitted only if there is a two-way trust between the two domains. Ensure this is true from both Windows domains.
If User Account Control (UAC) is enabled, be sure to use Run as administrator when starting the vCenter Server Linked Mode Configuration tool.
Ensure that the vCenter Server Windows machine name matches the Domain/DNS name.

Note: Instancename, VimWebServicesUrl, and VimApiUrl keys must match. For more information, see ESX and vCenter Server Installation Guide.
Ensure that the Windows firewall service is running but the firewall is turned off.
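
The 5-minute time synchronization rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming you have already collected a clock reading from each vCenter Server at the same instant; the server names and timestamps are hypothetical, and how the readings are gathered is outside the scope of the sketch.

```python
from datetime import datetime, timedelta

# Maximum allowed clock difference between vCenter Servers (from the best practice above).
MAX_SKEW = timedelta(minutes=5)


def max_clock_skew(clock_readings):
    """Return the largest difference between any two clock readings."""
    times = list(clock_readings.values())
    return max(times) - min(times)


def clocks_in_sync(clock_readings, limit=MAX_SKEW):
    """True when no two servers differ by more than the limit."""
    return max_clock_skew(clock_readings) <= limit


# Hypothetical readings, assumed to be taken at the same moment on each server.
readings = {
    "vc-a.example.com": datetime(2013, 11, 22, 10, 0, 0),
    "vc-b.example.com": datetime(2013, 11, 22, 10, 3, 30),
    "vc-c.example.com": datetime(2013, 11, 22, 10, 6, 0),
}

print(clocks_in_sync(readings))  # False: vc-a and vc-c differ by 6 minutes
```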

Verifying the initial replication

The Jointool/vCenter Server installer does a large set of checks to validate initial replication between instances. Issues with joining two instances are usually due to errors in initial replication. However, after a successful join (especially with more than two total instances in the vCenter Server linked mode group), some instances may not see all instances in the group.

To see if ADAM replication is the issue, perform these steps on all concerned vCenter Server machines:

Click Start > Administrative Tools > ADSI Edit.
Right-click ADSI Edit in the left pane and click Connect to.
Under Connection Point in the Distinguished Name box, enter dc=virtualcenter,dc=vmware,dc=int.
Under Computer in the domain or server box, enter localhost:389, then click OK. This opens up a new connection to our application partition in ADAM.
Expand Default naming context and drill down clicking the OU=Instances container on the left pane. You see entries (GUIDs) under OU=Instances for the vCenter Servers in your setup.

This list should be identical on every replica (and the primary). It does not guarantee that replication will continue to succeed, but it does indicate that initial replication during installation was successful.
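
If you note down the GUIDs shown under OU=Instances on each machine, comparing the lists is a simple set operation. The sketch below assumes the GUIDs were copied by hand from ADSI Edit; the server names and GUID values are placeholders, not real entries.

```python
def find_missing_instances(instances_by_server):
    """Map each server to the instance GUIDs it lacks, compared with the union of all servers."""
    all_guids = set()
    for guids in instances_by_server.values():
        all_guids.update(guids)

    missing = {}
    for server, guids in instances_by_server.items():
        gap = all_guids - set(guids)
        if gap:
            missing[server] = sorted(gap)
    return missing


# Placeholder data: the GUIDs each vCenter Server lists under OU=Instances.
observed = {
    "vc-a": ["GUID-1", "GUID-2", "GUID-3"],
    "vc-b": ["GUID-1", "GUID-2", "GUID-3"],
    "vc-c": ["GUID-1", "GUID-2"],
}

print(find_missing_instances(observed))  # {'vc-c': ['GUID-3']}
```

An empty result means every machine lists the same instances, which is the condition described above.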

Verifying the Health service status

To verify the Health service status for the LDAP Replication Monitor, install the service-monitoring vSphere Client plugin as part of all vCenter Server installs:

In vSphere Client, click Home.
Click vCenter Server Service Status in the Administration section.

Note: If you do not see vCenter Service Status, you have to enable the plugin by clicking Plug-ins > Manage plug-ins.

Troubleshooting replication issues

To troubleshoot replication issues:

Click Start > Administrative Tools > Event Viewer.
- Review the Event Viewer Log entries for related ADAM instance (VMwareVCMSDS or something similar) events. Record any warning or error messages you find.
- Example warning messages involving replication are often explicit. For example:
  
  8453 Replication access was denied.
  1772 The list of RPC servers available for the binding of auto handles has been exhausted.
  Note: This error is often a symptom of firewalls blocking ports (RPC mapper runs on port 135, and needs ports > 1024 to be open on the machine).
Run the Knowledge Consistency Checker (KCC) from the command line to confirm that replication is the problem. Run KCC on the replica machine:
- C:\Windows\ADAM\repadmin.exe /kcc localhost:389 (to confirm local consistency)
- C:\Windows\ADAM\repadmin.exe /kcc remoteVCFQDN:remotePort (to confirm remote primary consistency)
  
  If either of these commands returns an error, report it to VMware when you open a Support Request.
Forcing replication can help diagnose issues. To force replication between ADAM instances:

C:\WINDOWS\ADAM>repadmin /replicate remote-vc:remote-vc-adam-port local-vc-fqdn:local-adam-port dc=virtualcenter,dc=vmware,dc=int

This is an example of successful replication:

C:\WINDOWS\ADAM>repadmin /replicate vm08.PDPVC.com:389 vm04.PDPVC.com:389 dc=virtualcenter,dc=vmware,dc=int
Positive response:
Sync from vm04.PDPVC.com:389 to vm08.PDPVC.com:389 completed successfully.

This is an example of failed replication:

DsBindWithCred to vm04.pdpvc.com failed with status 1753 (0x6d9):There are no more endpoints available from the endpoint mapper

To verify inbound and outbound replication from one machine, run the command:

repadmin /syncall localhost:vc-ldap-port
Run directory service tests with dcdiag. This runs a comprehensive list of tests to help diagnose what may have failed with the replication (such as name resolution or referrals):

(c:\windows\adam or c:\windows\system32) dcdiag /s:localhost:vc-ldap-port
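
When repadmin output is captured to a file, the success and failure examples shown earlier can be told apart with two patterns. This is a rough sketch: the patterns are derived only from the two sample messages in this article, so other repadmin output may fall into the "unknown" case, and the function name is made up for illustration.

```python
import re

# Patterns based solely on the two sample messages quoted in this article.
SUCCESS_PATTERN = re.compile(r"completed successfully", re.IGNORECASE)
FAILURE_PATTERN = re.compile(r"failed with status (\d+) \((0x[0-9a-fA-F]+)\)")


def classify_repadmin_output(output):
    """Return ("ok", None), ("failed", status_code) or ("unknown", None)."""
    failure = FAILURE_PATTERN.search(output)
    if failure:
        return ("failed", int(failure.group(1)))
    if SUCCESS_PATTERN.search(output):
        return ("ok", None)
    return ("unknown", None)


ok_text = "Sync from vm04.PDPVC.com:389 to vm08.PDPVC.com:389 completed successfully."
bad_text = (
    "DsBindWithCred to vm04.pdpvc.com failed with status 1753 (0x6d9):"
    "There are no more endpoints available from the endpoint mapper"
)

print(classify_repadmin_output(ok_text))   # ('ok', None)
print(classify_repadmin_output(bad_text))  # ('failed', 1753)
```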

Source:-

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2005481

Tuesday, 19 November 2013

vMA User Account Privileges

Account Privileges for vCLI Usage lists the privileges that the different user accounts have for vCLI usage against different targets.

Account Privileges for vCLI Usage
Target	Authentication Policy	vi-admin	vi-user	domain user
ESXi	fpauth	Y	Y	N
ESXi	adauth	Y	N	Y
vCenter Server	fpauth	Y	N	N
vCenter Server	adauth	Y	N	Y
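
The table translates directly into a lookup structure, which can be handy in a wrapper script that decides which account to use. The sketch below only restates the table; the function name is an illustrative choice and is not part of vMA or vCLI.

```python
# Each (target, authentication policy) pair maps to the accounts that may run vCLI commands.
VCLI_PRIVILEGES = {
    ("ESXi", "fpauth"): {"vi-admin": True, "vi-user": True, "domain user": False},
    ("ESXi", "adauth"): {"vi-admin": True, "vi-user": False, "domain user": True},
    ("vCenter Server", "fpauth"): {"vi-admin": True, "vi-user": False, "domain user": False},
    ("vCenter Server", "adauth"): {"vi-admin": True, "vi-user": False, "domain user": True},
}


def can_use_vcli(target, policy, account):
    """Look up the Y/N cell of the table for a given target, policy and account."""
    return VCLI_PRIVILEGES[(target, policy)][account]


print(can_use_vcli("ESXi", "fpauth", "vi-user"))            # True
print(can_use_vcli("vCenter Server", "adauth", "vi-user"))  # False
```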

Thanks to VMware Documentation

Enable the vi-user Account

As part of configuration, vMA creates a vi-user account with no password. However, you cannot use the vi-user account until you have specified a vi‑user password.


Important: The vi-user account has limited privileges on the target ESXi hosts and cannot run any commands that require sudo execution. You cannot use vi-user to run commands for Active Directory targets (ESXi or vCenter Server). To run commands for the Active Directory targets, use the vi-admin user or log in as an Active Directory user to vMA.

To enable the vi-user account

1	Log in to vMA as vi‑admin.

2	Run the Linux passwd command for vi-user as follows:

sudo passwd vi-user

If this is the first time you use sudo on vMA, a message about root user privileges appears, and you are prompted for the vi-admin password.

3	Specify the vi-admin password.

4	When prompted, type and confirm the password for vi-user.

After the vi-user account is enabled on vMA, it has normal privileges on vMA but is not in the sudoers list.

When you add ESXi target servers, vMA creates two users on each target:

■	vi-admin has administrative privileges on the target system.

■	vi-user has read-only privileges on the target system. vMA creates vi-user on each target that you add, even if vi-user is not currently enabled on vMA.

When a user is logged in to vMA as vi-user, vMA uses that account on target ESXi hosts, and the user can run only commands on target ESXi hosts that do not require administrative privileges.

Thanks to VMware Documentation

Sunday, 17 November 2013

Metrics and Thresholds

Display	Metric	Threshold	Explanation
CPU	%RDY	10	Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. See Jason’s explanation for vSMP VMs
CPU	%CSTP	3	Excessive usage of vSMP. Decrease amount of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU	%SYS	20	The percentage of time spent by system services on behalf of the world. Most likely caused by a high IO VM. Check other metrics and the VM for a possible root cause.
CPU	%MLMTD	0	The percentage of time the vCPU was ready to run but deliberately wasn’t scheduled because that would violate the “CPU limit” settings. If larger than 0 the world is being throttled due to the limit on CPU.
CPU	%SWPWT	5	VM waiting on swapped pages to be read from disk. Possible cause: Memory overcommitment.
MEM	MCTLSZ	1	If larger than 0 the host is forcing VMs to inflate the balloon driver to reclaim memory as the host is overcommitted.
MEM	SWCUR	1	If larger than 0 the host has swapped memory pages in the past. Possible cause: Overcommitment.
MEM	SWR/s	1	If larger than 0 the host is actively reading from swap (vswp). Possible cause: Excessive memory overcommitment.
MEM	SWW/s	1	If larger than 0 the host is actively writing to swap (vswp). Possible cause: Excessive memory overcommitment.
MEM	CACHEUSD	0	If larger than 0 the host has compressed memory. Possible cause: Memory overcommitment.
MEM	ZIP/s	0	If larger than 0 the host is actively compressing memory. Possible cause: Memory overcommitment.
MEM	UNZIP/s	0	If larger than 0 the host is accessing compressed memory. Possible cause: Previously the host was overcommitted on memory.
MEM	N%L	80	If less than 80 the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and “remotely” uses memory via “interconnect”. Check “GST_ND(X)” to find out which NUMA nodes are used.
NETWORK	%DRPTX	1	Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization
NETWORK	%DRPRX	1	Dropped packets received, hardware overworked. Possible cause: very high network utilization
DISK	GAVG	25	Look at “DAVG” and “KAVG” as the sum of both is GAVG.
DISK	DAVG	25	Disk latency most likely to be caused by the array.
DISK	KAVG	2	Disk latency caused by the VMkernel; high KAVG usually means queuing. Check “QUED”.
DISK	QUED	1	Queue maxed out. Possibly queue depth set too low. Check with array vendor for optimal queue depth value.
DISK	ABRTS/s	1	Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths failed or the array is not accepting any IO for whatever reason.
DISK	RESETS/s	1	The number of commands reset per second.
DISK	CONS/s	20	SCSI Reservation Conflicts per second. If many SCSI Reservation Conflicts occur, performance could be degraded due to the lock on the VMFS.
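
The thresholds above lend themselves to a small checker for values read from the performance display. This is a sketch under two assumptions that the table does not state explicitly: a threshold is treated as inclusive (a value at or above it is flagged, as long as the value is greater than zero), and N%L is the only metric where a lower value is the problem. The sample values are invented for illustration.

```python
# Thresholds copied from the table above, keyed by metric name.
THRESHOLDS = {
    "%RDY": 10, "%CSTP": 3, "%SYS": 20, "%MLMTD": 0, "%SWPWT": 5,
    "MCTLSZ": 1, "SWCUR": 1, "SWR/s": 1, "SWW/s": 1,
    "CACHEUSD": 0, "ZIP/s": 0, "UNZIP/s": 0, "N%L": 80,
    "%DRPTX": 1, "%DRPRX": 1,
    "GAVG": 25, "DAVG": 25, "KAVG": 2, "QUED": 1,
    "ABRTS/s": 1, "RESETS/s": 1, "CONS/s": 20,
}

# N%L is flagged when it drops below its threshold; everything else when it rises.
LOWER_IS_WORSE = {"N%L"}


def flag_metrics(sample):
    """Return the metric names in the sample that breach their threshold."""
    flagged = []
    for name, value in sample.items():
        if name not in THRESHOLDS:
            continue  # metric not covered by the table
        limit = THRESHOLDS[name]
        if name in LOWER_IS_WORSE:
            breached = value < limit
        else:
            # Assumption: thresholds are inclusive, but a value of 0 never counts.
            breached = value >= limit and value > 0
        if breached:
            flagged.append(name)
    return flagged


# Invented sample values for one virtual machine.
sample = {"%RDY": 12.5, "%CSTP": 0.4, "N%L": 72, "KAVG": 0.1, "%MLMTD": 0.0}
print(flag_metrics(sample))  # ['%RDY', 'N%L']
```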

Check the Default Server Name with whom you are connected in PowerCLI

The command to check this in PowerCLI is:
$global:DefaultVIServers | %{$_.Name}

Saturday, 16 November 2013

Advanced Memory Attributes

Advanced Memory Attributes
Attribute	Description	Default
Mem.SamplePeriod	Specifies the periodic time interval, measured in seconds of the virtual machine’s execution time, over which memory activity is monitored to estimate working set sizes.	60
Mem.BalancePeriod	Specifies the periodic time interval, in seconds, for automatic memory reallocations. Significant changes in the amount of free memory also trigger reallocations.	15
Mem.IdleTax	Specifies the idle memory tax rate, as a percentage. This tax effectively charges virtual machines more for idle memory than for memory they are actively using. A tax rate of 0 percent defines an allocation policy that ignores working sets and allocates memory strictly based on shares. A high tax rate results in an allocation policy that allows idle memory to be reallocated away from virtual machines that are unproductively hoarding it.	75
Mem.ShareScanGHz	Specifies the maximum amount of memory pages to scan (per second) for page sharing opportunities for each GHz of available host CPU resource. For example, defaults to 4 MB/sec per 1 GHz.	4
Mem.ShareScanTime	Specifies the time, in minutes, within which an entire virtual machine is scanned for page sharing opportunities. Defaults to 60 minutes.	60
Mem.CtlMaxPercent	Limits the maximum amount of memory reclaimed from any virtual machine using the memory balloon driver (vmmemctl), based on a percentage of its configured memory size. Specify 0 to disable reclamation for all virtual machines.	65
Mem.AllocGuestLargePage	Enables backing of guest large pages with host large pages. Reduces TLB misses and improves performance in server workloads that use guest large pages. 0=disable.	1
Mem.AllocUsePSharePool and Mem.AllocUseGuestPool	Reduces memory fragmentation by improving the probability of backing guest large pages with host large pages. If host memory is fragmented, the availability of host large pages is reduced. 0 = disable.	15
Mem.MemZipEnable	Enables memory compression for the host. 0 = disable.	1
Mem.MemZipMaxPct	Specifies the maximum size of the compression cache in terms of the maximum percentage of each virtual machine's memory that can be stored as compressed memory.	10
LPage.LPageDefragEnable	Enables large page defragmentation. 0 = disable.	1
LPage.LPageDefragRateVM	Maximum number of large page defragmentation attempts per second per virtual machine. Accepted values range from 1 to 1024.	32
LPage.LPageDefragRateTotal	Maximum number of large page defragmentation attempts per second. Accepted values range from 1 to 10240.	256
LPage.LPageAlwaysTryForNPT	Try to allocate large pages for nested page tables (called 'RVI' by AMD or 'EPT' by Intel). If you enable this option, all guest memory is backed with large pages in machines that use nested page tables (for example, AMD Barcelona). If NPT is not available, only some portion of guest memory is backed with large pages. 0= disable.	1
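
Several of these defaults are percentages or rates, so a quick calculation shows what they mean for a concrete virtual machine or host. The sketch below is plain arithmetic on the default values from the table; the 4096 MB virtual machine and the 20 GHz host are hypothetical, and the function names are illustrative only.

```python
# Default values taken from the table above.
MEM_CTL_MAX_PERCENT = 65    # Mem.CtlMaxPercent
MEM_ZIP_MAX_PCT = 10        # Mem.MemZipMaxPct
SHARE_SCAN_MB_PER_GHZ = 4   # Mem.ShareScanGHz


def max_balloon_mb(configured_mb, ctl_max_percent=MEM_CTL_MAX_PERCENT):
    """Upper bound on memory the balloon driver may reclaim from one VM."""
    return configured_mb * ctl_max_percent / 100


def max_compression_cache_mb(configured_mb, zip_max_pct=MEM_ZIP_MAX_PCT):
    """Upper bound on a VM's memory that can be held as compressed memory."""
    return configured_mb * zip_max_pct / 100


def max_share_scan_mb_per_sec(host_cpu_ghz, rate=SHARE_SCAN_MB_PER_GHZ):
    """Upper bound on the page-sharing scan rate for a host."""
    return host_cpu_ghz * rate


# Hypothetical VM with 4096 MB configured and a host with 20 GHz of CPU resource.
print(max_balloon_mb(4096))
print(max_compression_cache_mb(4096))
print(max_share_scan_mb_per_sec(20))
```

With the defaults, roughly two thirds of the virtual machine's configured memory can be reclaimed by ballooning, and one tenth can be stored compressed.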

vNUMA is disabled if VCPU hotplug is enabled (2040375)

Details

If virtual NUMA is configured with VCPU hotplug settings, the virtual machine will be started without virtual NUMA and instead it will use Uniform Memory Access with interleaved memory access. The virtual machine log displays the message:

vmware.log> vmx| W110: NUMA and VCPU hot add are incompatible. Forcing UMA
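
To find out whether a virtual machine was started without virtual NUMA, you can search its vmware.log for the message shown above. This is a minimal sketch that works on lines already read from the log; the marker text comes from this article, while the second sample line is only a placeholder standing in for unrelated log output.

```python
# Message text quoted in this article.
VNUMA_DISABLED_MARKER = "NUMA and VCPU hot add are incompatible. Forcing UMA"


def vnuma_forced_off(log_lines):
    """True if any log line reports that vNUMA was dropped because of VCPU hot add."""
    return any(VNUMA_DISABLED_MARKER in line for line in log_lines)


sample_log = [
    "vmx| W110: NUMA and VCPU hot add are incompatible. Forcing UMA",
    "(placeholder for an unrelated log line)",
]

print(vnuma_forced_off(sample_log))  # True
```

For a real log, pass the open file object instead of the list, since iterating over a file yields its lines.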

Solution

None. If you do not plan to use VCPU hotplug, do not enable it. Add the maximum VCPUs that might be needed by the workload.

Source:-

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2040375

NPIV Capabilities and Limitations

■	NPIV supports vMotion. When you use vMotion to migrate a virtual machine it retains the assigned WWN. If you migrate an NPIV-enabled virtual machine to a host that does not support NPIV, VMkernel reverts to using a physical HBA to route the I/O.
■	If your FC SAN environment supports concurrent I/O on the disks from an active-active array, the concurrent I/O to two different NPIV ports is also supported.

When you use ESXi with NPIV, the following limitations apply:

■	Because the NPIV technology is an extension to the FC protocol, it requires an FC switch and does not work on the direct attached FC disks.
■	When you clone a virtual machine or template with a WWN assigned to it, the clones do not retain the WWN.
■	NPIV does not support Storage vMotion.
■	Disabling and then re-enabling the NPIV capability on an FC switch while virtual machines are running can cause an FC link to fail and I/O to stop.
