Sunday, 20 September 2015

Advanced configuration options for VMware High Availability in vSphere 5.x and 6.0 (2033250)

Purpose

In the majority of environments, VMware High Availability (HA) default settings do not need to be changed. However, depending on your specific environment, you may need to modify some HA options.

This article describes the different configuration options available and how to apply them.
 
Source:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2033250&src=vmw_so_vex_ragga_1012

Resolution

Note: Not all configuration variables work in all versions of vCenter Server. As new variables are introduced in newer releases, they remain throughout later versions.

Applying a VMware HA customization

Using the vSphere Web Client
  1. Log in to VMware vSphere Web Client.
  2. Click Home > vCenter > Clusters.
  3. Under Objects, click the cluster you want to modify.
  4. Click Manage.
  5. Click vSphere HA.
  6. Click Edit.
  7. Click Advanced Options.
  8. Click Add and fill in the Option and Value fields as appropriate (see below).
  9. Deselect Turn ON vSphere HA.
  10. Click OK.
  11. Wait for HA to unconfigure, then click Edit and select Turn ON vSphere HA.
  12. Click OK and wait for the cluster to reconfigure.
Using the vSphere Client
  1. Log in to vCenter Server with vSphere Client as an administrator.
  2. Right-click the Cluster in the Inventory and click Edit Settings.
  3. Click VMware HA.
  4. Click the Advanced Options button.
  5. Enter Option and Value fields as appropriate (see below).
  6. Click OK.
  7. Click OK again.
  8. Wait for the Reconfigure Cluster task to complete and right-click the Cluster again from the Inventory.
  9. Click Properties.
  10. Disable VMware HA and wait for the Reconfigure Cluster task(s) to complete.
  11. Right-click the cluster and Enable VMware HA to have the settings take effect.

    Note: See the Reconfiguration column in the tables below to find out whether the hosts must be reconfigured.
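Before you type Option and Value pairs into the Advanced Options dialog, it can help to sanity-check them against the ranges given in the option tables of this article. The following Python sketch is hypothetical: `check_ha_option` and its rule set are illustrative, not a VMware tool. It covers only a few options: the indexed names das.allowNetworkX and das.isolationAddressX (X from 0 to 9), das.heartbeatDsPerHost (2 to 5), and several true/false options. It also assumes that option names can be compared case-insensitively.

```python
import re

# Hypothetical pre-flight check for a few HA advanced options.
# Ranges follow the option tables in this article; other options pass through unchecked.
INDEXED = re.compile(r"^das\.(allowNetwork|isolationAddress)(\d+)$", re.IGNORECASE)
BOOLEAN_OPTIONS = {
    "das.ignoreredundantnetwarning",
    "das.ignoreinsufficienthbdatastore",
    "das.includeftcompliancechecks",
    "das.usedefaultisolationaddress",
    "das.respectvmvmantiaffinityrules",
    "das.maskcleanshutdownenabled",
}


def check_ha_option(name: str, value: str) -> list[str]:
    """Return the problems found with one option/value pair (empty list if none)."""
    problems = []
    key = name.strip().lower()  # assumption: names are case-insensitive
    v = value.strip()
    match = INDEXED.match(name.strip())
    if match and not 0 <= int(match.group(2)) <= 9:
        problems.append(f"{name}: index must be between 0 and 9")
    if key == "das.heartbeatdsperhost":
        if not v.isdigit() or not 2 <= int(v) <= 5:
            problems.append(f"{name}: value must be an integer from 2 to 5")
    if key in BOOLEAN_OPTIONS and v.lower() not in ("true", "false"):
        problems.append(f"{name}: value must be true or false")
    return problems
```

For example, `check_ha_option("das.heartbeatDsPerHost", "6")` reports the out-of-range value, while `check_ha_option("das.isolationAddress0", "10.0.0.1")` returns an empty list. The check does not confirm that an option exists in your vCenter Server version.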
There are three types of HA advanced options and each is set in a different way.
  • vCenter Server options (VC) -- these options are configured at the vCenter Server level and apply to all HA clusters unless overridden by cluster-specific options in cases where such options exist. If the vCenter Server options are configured using the vCenter Server options manager, a vCenter Server restart may not be required -- see the specific options for details. But if these options are configured by adding the option string to the vpxd.cfg file (as a child of the config/vpxd/das tag), a restart is required.
  • Cluster options (cluster) -- these options are configured for an individual cluster and if they impact the behavior of the HA Agent (FDM), they apply to all instances of FDM in that cluster. These options are configured by using the HA cluster-level advanced options mechanism, either via the UI or the API. Options with names starting with "das.config." can also be applied using the "fdm options" mechanism below, but this is not recommended because the options should be equally applied to all FDM instances.
  • fdm options (fdm) -- these options are configured for an individual FDM instance on a host. They are configured by adding the option to the /etc/opt/vmware/fdm/fdm.cfg file of the host as a child of the config/fdm tag. Options set in this way are lost when FDM is uninstalled (for example, if the host is removed from vCenter Server and then re-added) or if the host is managed by Auto Deploy and is rebooted.
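The two file-based mechanisms above (vpxd.cfg for VC options, fdm.cfg for fdm options) can be scripted. The Python sketch below uses only the standard library's xml.etree.ElementTree. It rests on assumptions that this article does not spell out: the file has a root <config> element, and each dot-separated part of the option name becomes one nested element. Under that assumption, fdm.nodeGoodness is written as config/fdm/nodeGoodness and vpxd.das.electionWaitTimeSec as config/vpxd/das/electionWaitTimeSec. The function name `set_file_option` is hypothetical. The default parser drops XML comments, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_file_option(xml_text: str, name: str, value: str) -> str:
    """Add or update one option in the XML text of vpxd.cfg or fdm.cfg.

    Assumption: every dot-separated part of the option name is one nested
    element under the root <config> element.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "config":
        raise ValueError("expected a root <config> element")
    node = root
    for tag in name.split("."):
        child = node.find(tag)
        if child is None:
            child = ET.SubElement(node, tag)  # create any missing level
        node = child
    node.text = value  # overwrite an existing value or set a new one
    return ET.tostring(root, encoding="unicode")
```

A typical use would be to read fdm.cfg, pass its text through `set_file_option` with the option name and value, and write the result back. A vpxd.cfg change made this way still requires a vCenter Server restart.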

Common Options

Version
Name
Description
Reconfiguration
Type of Option
Cluster Configuration
5.0, 5.1, 5.5
das.allowNetworkX
Allows you to specify the management networks used by HA, where X is a number between 0 and 9. For example, if you set a value to "Management Network", only the networks associated with port groups having this name are used. Ensure that all hosts are configured with the named port group and that the networks are compatible. In 5.5, this option is ignored if vSAN is enabled for the cluster.
Yes. Reconfigure HA on all hosts to have the specification take effect.
Cluster
5.0, 5.1, 5.5
das.ignoreRedundantNetWarning
HA will report a config issue on a host if the host is not configured with redundant networks for the networks used by HA. Prior to 5.5, HA only uses management networks, while in 5.5, if vSAN is enabled, HA will use the networks configured for vSAN. Valid values are true/false. Set to true to suppress the config issue. False is assumed if the option is not set. 
Yes. Reconfigure HA on a host to have the config issue for that host cleared.
Cluster
5.0, 5.1, 5.5, 6.0
das.heartbeatDsPerHost
By default, HA chooses 2 heartbeat datastores for each host in an HA cluster. This option can be used to increase the number to a value from 2 to 5 inclusive.
Yes. Reconfigure HA on all hosts in the cluster.
Cluster
5.0, 5.1, 5.5, 6.0
das.ignoreInsufficientHbDatastore
HA will report a host config issue if it was not able to select the required number of datastores for a host given by das.heartbeatDsPerHost. Set this option to true to suppress this warning, and false to enable it. A value of false is assumed if the option is not set.
Yes. Reconfigure HA on all hosts in the cluster.
Cluster
5.0, 5.1, 5.5
das.includeFTcomplianceChecks
Whether to check the cluster for compliance with Fault Tolerance as part of the cluster profile compliance check. Set this option to false if you don't plan to use FT in the cluster. A value of true enables the checks. If unset, a value of true is assumed.
No
Cluster
Admission Control
5.0, 5.1, 5.5
das.vmMemoryMinMB
Value in MB to use for the memory reservation of a virtual machine if no non-zero memory reservation is set by a user. 0 is assumed if the option is not set.
No
Cluster
5.0, 5.1, 5.5
das.vmCpuMinMHz
Value in MHz to use for the CPU reservation of a virtual machine if no non-zero CPU reservation is set by a user. 32 is assumed if the option is not set.
No
Cluster
5.0, 5.1, 5.5, 6.0
das.slotCpuInMHz
Maximum value in MHz to use for CPU component of the slot size. No limit is imposed if the option is not set. In 5.1, the CPU component of the slot size can be exactly specified in the UI and the API (see the vim.cluster.slotPolicy object). Note that this option and the UI/API behave differently -- this option sets a max while the UI/API sets the exact value. If a slot policy is defined and this option is specified, the value specified by this option is ignored.
No
Cluster
5.0, 5.1, 5.5, 6.0
das.slotMemInMB
Maximum value in MB to use for memory component of the slot size. No limit is imposed if the option is not set. In 5.1, the memory component of the slot size can be exactly specified in the UI and the API (see the vim.cluster.slotPolicy object). Note that this option and the UI/API behave differently -- this option sets a max while the UI/API sets the exact value. If a slot policy is defined and this option is specified, the value specified by this option is ignored.
No
Cluster
6.0
das.config.fdm.memreservationmb
By default, vSphere HA agents run with a configured memory limit of 250 MB. A host might not allow this reservation if it runs out of reservable capacity. You can use this advanced option to lower the memory limit to avoid this issue. Only integers of at least 100 (the minimum value) can be specified. Conversely, to prevent problems during master agent elections in a large cluster (containing 6,000 to 8,000 VMs), you should raise this limit to 325 MB.
Note: Once this limit is changed, you must run the Reconfigure HA task on all hosts in the cluster. Also, when a new host is added to the cluster or an existing host is rebooted, run this task on those hosts to update the memory setting.
No
Cluster
Restarting virtual machines
5.0, 5.1, 5.5
das.maxvmrestartcount
The maximum number of times an FDM master will try to restart a virtual machine before giving up. Five attempts are made if this option is unset. This limit only applies if the time since the first restart attempt was made is less than das.maxvmrestartperiod. Note that FT secondary virtual machine restarts are governed by the separate parameter, das.maxftvmrestartcount.
Warning: Setting this value to a very high number creates a large amount of extra logging which can have an impact on your system log directories.
No
Cluster
5.0, 5.1, 5.5
das.maxvmrestartperiod
The maximum amount of time (in seconds) during which an FDM master will attempt to restart a virtual machine after the first restart attempt failed. The time is measured from when the FDM master first tried to restart the virtual machine. This time limit takes precedence over das.maxvmrestartcount. No time limit is imposed if this option is unset.
No
Cluster
5.0, 5.1, 5.5
das.maxftvmrestartcount
The maximum number of times an FDM master will try to start a secondary virtual machine for an FT virtual machine pair before giving up. Five attempts are made if this option is unset.
Warning: Setting this value to a very high number creates a large amount of extra logging which can have an impact on your system log directories.
No
Cluster
5.0U1, 5.1, 5.5
das.maskCleanShutdownEnabled
When a virtual machine powers off and its home datastore is not accessible, HA cannot determine whether the virtual machine should be restarted, so it must make an assumption. If this option is set to false, the responding FDM master assumes the virtual machine should not be restarted; if it is set to true, the responding FDM master assumes the virtual machine should be restarted. If the option is unset in 5.0U1, a value of false is assumed, whereas in ESXi 5.1 and later, a value of true is assumed.
No
Cluster
5.5, 6.0
das.respectVmVmAntiAffinityRules
Respect VM-VM anti-affinity rules when restarting virtual machines after a failure. The valid values are "false" (default) and "true".
No
Cluster
6.0
das.maxresets
The maximum number of reset attempts made by VMCP. If a reset operation on a virtual machine affected by an APD situation fails, VMCP retries the reset this many times before giving up.
No
Cluster
6.0
das.maxterminates
The maximum number of retries made by VMCP for virtual machine termination.
No
Cluster
6.0
das.terminateretryintervalsec
If VMCP fails to terminate a virtual machine, this is the number of seconds the system waits before it retries a terminate attempt.
No
Cluster
Isolation Response
5.0, 5.1, 5.5, 6.0
das.isolationAddressX
IP addresses an FDM agent uses to check for isolation when no agent network traffic is observed on the network(*) used by HA, where X = 0-9. By default, HA uses the default gateway of the management network as an isolation address, and checks the addresses specified by this option in addition. We recommend adding an isolation address for each management network used by HA. (*) Prior to 5.5, HA uses only the management network, but in 5.5, when vSAN is also enabled on the cluster, HA uses the vSAN network for inter-agent communication.
No
Cluster
5.0, 5.1, 5.5, 6.0
das.useDefaultIsolationAddress
Whether the default isolation address (the gateway of the management network) should be used when determining if a host is network isolated. Valid values are true/false. By default, the management network default gateway is used. If the default gateway is not pingable, set das.isolationAddressX to a pingable address and disable use of the default gateway by setting this option to "false".
No
Cluster
5.1, 5.5
das.config.fdm.isolationPolicyDelaySec
The number of seconds an FDM agent waits before executing the isolation policy once it has determined that the host is isolated. The minimum value is 30. If set to a value less than 30, the delay is 30 seconds.
No
Cluster
5.0, 5.1, 5.5, 6.0
das.isolationShutdownTimeout
The number of seconds an FDM waits for a virtual machine to power off after initiating a guest shutdown before the FDM issues a power off. If the option is unset, 300s is used.
No
Cluster
6.0
fdm.isolationpolicydelaysec
The number of seconds the system waits before executing the isolation policy once it is determined that a host is isolated. The minimum value is 30. If set to a value less than 30, the delay is 30 seconds.
No
Cluster
6.0
das.config.fdm.reportfailoverfailevent
When set to 1, enables generation of a detailed per-VM event when an attempt by vSphere HA to restart a virtual machine is unsuccessful. Default value is 0. In versions earlier than vSphere 6.0, this event is generated by default.
No
Cluster
Virtual machine/App Monitoring
5.0, 5.1, 5.5, 6.0
das.iostatsInterval
If an FDM detects that enough VMware Tools heartbeats are missing to trigger a virtual machine's configured VM/App Monitoring policy, the FDM checks whether any I/O has been issued in the last das.iostatsInterval seconds, and only resets the virtual machine if no I/O occurred in this interval. Values of 0 or greater are valid. 120s is assumed if the option is unset.
No
Cluster
Fault Tolerance
5.0, 5.1, 5.5
das.maxFtVmsPerHost
Specifies the number of Fault Tolerance virtual machines that can run on a host at one time. If unset, a value of 4 is used. A value of -1 or 0 disables the limit. The limit is enforced by vCenter Server when executing user-initiated power-ons and vMotion migrations, and by DRS when doing initial placement and load balancing. HA does not enforce this limit, in order to maximize uptime. DRS does not correct any violations of this limit.
No
Cluster
Logging
5.0, 5.1, 5.5
das.config.log.maxFileNum
Controls the number of FDM log-file rotations retained by the FDM file-based logger. The file-based logger is used by default only by the FDM when running on ESX versions earlier than ESX 5.0. If you wish to change the number of log-file rotations maintained for a pre-ESX 5.0 host, set this option to the desired number of log files. For ESX 5.0 and later hosts, the FDM logs to syslog by default, so you need to use the syslog configuration mechanism to change the amount of retained logging history. However, it is possible to enable the file-based logger for ESXi 5.0 and later hosts as well. To do so, set this option to a valid value. If you are using vSphere 5.0 Update 1 or later, you must also set the option das.config.log.outputToFiles to true. For all ESX versions, setting the option das.config.log.maxFileNum to 1 disables the log-file rotations. The location of log files can be changed using the option das.config.log.directory.
Yes
Cluster
5.0, 5.1, 5.5
das.config.log.maxFileSize
Controls the size of each log file written out by the FDM file-based logger. Files are 1 MB in size unless this option is specified. This option is used in conjunction with das.config.log.maxFileNum to control the log history.
Yes
Cluster
5.0, 5.1, 5.5
das.config.log.level
Controls the amount of information recorded in the logs based on severity levels None, Warning, Info, Verbose, and Trivia.
Yes
Cluster
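The Admission Control options in the table above adjust the slot-size calculation used by the slot policy. The following Python sketch shows where das.vmCpuMinMHz, das.vmMemoryMinMB, das.slotCpuInMHz and das.slotMemInMB come in. It assumes the usual slot-policy rules. Each slot component is the largest reservation among powered-on virtual machines. A host's slot count is the smaller of its CPU and memory capacity divided by the slot size, rounded down. The sketch ignores virtual machine memory overhead, which HA adds to the memory component. It also does not model the 5.1 slot policy, which overrides both caps. Class and function names are hypothetical, and host capacities are plain inputs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmReservation:
    """Reservations set by the user on one powered-on VM (0 means none set)."""
    cpu_mhz: int
    mem_mb: int  # memory overhead is deliberately ignored in this sketch


def slot_size(vms: list[VmReservation],
              vm_cpu_min_mhz: int = 32,                # das.vmCpuMinMHz
              vm_mem_min_mb: int = 0,                  # das.vmMemoryMinMB
              slot_cpu_max_mhz: Optional[int] = None,  # das.slotCpuInMHz
              slot_mem_max_mb: Optional[int] = None,   # das.slotMemInMB
              ) -> tuple[int, int]:
    """Return (CPU MHz, memory MB): the largest reservation per component, then capped."""
    # A VM with no non-zero reservation counts as the das.vm*Min* value instead.
    cpu = max((vm.cpu_mhz or vm_cpu_min_mhz for vm in vms), default=vm_cpu_min_mhz)
    mem = max((vm.mem_mb or vm_mem_min_mb for vm in vms), default=vm_mem_min_mb)
    # das.slotCpuInMHz and das.slotMemInMB set a maximum, not an exact value.
    if slot_cpu_max_mhz is not None:
        cpu = min(cpu, slot_cpu_max_mhz)
    if slot_mem_max_mb is not None:
        mem = min(mem, slot_mem_max_mb)
    return cpu, mem


def slots_per_host(host_cpu_mhz: int, host_mem_mb: int, slot: tuple[int, int]) -> int:
    """Smaller of the CPU-based and memory-based slot counts, each rounded down."""
    cpu, mem = slot
    limits = [capacity // size
              for capacity, size in ((host_cpu_mhz, cpu), (host_mem_mb, mem))
              if size > 0]  # a zero-sized component does not limit the count
    if not limits:
        raise ValueError("slot size is zero for both CPU and memory")
    return min(limits)
```

With one virtual machine reserving 2000 MHz, the slot's CPU component is 2000 MHz. Setting das.slotCpuInMHz to 1000 halves it and roughly doubles the number of CPU slots per host. This is why the cap can make admission control more permissive.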

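The descriptions of das.maxvmrestartcount and das.maxvmrestartperiod interact, and the isolation policy delay has a floor. The following Python sketch encodes one plausible reading of those descriptions. When a restart period is set, it is checked first, and within that period the attempt count is the limit. It is not how FDM is implemented, and the function names are hypothetical.

```python
from typing import Optional

DEFAULT_MAX_RESTART_COUNT = 5   # das.maxvmrestartcount when unset
MIN_ISOLATION_DELAY_SEC = 30    # floor for das.config.fdm.isolationPolicyDelaySec


def should_retry_restart(attempts_made: int,
                         seconds_since_first_attempt: float,
                         max_count: int = DEFAULT_MAX_RESTART_COUNT,
                         max_period: Optional[float] = None) -> bool:
    """Decide whether another restart attempt is allowed (one reading of the docs)."""
    # das.maxvmrestartperiod, when set, takes precedence: past it, stop retrying.
    if max_period is not None and seconds_since_first_attempt >= max_period:
        return False
    # Inside the period, or with no period set, das.maxvmrestartcount is the limit.
    return attempts_made < max_count


def effective_isolation_delay(configured_sec: int) -> int:
    """Values below 30 seconds behave as 30 seconds."""
    return max(configured_sec, MIN_ISOLATION_DELAY_SEC)
```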
Less Common Options


Caution: These options have a range of subtle effects and should not be used in production environments unless directed by VMware Support.
Version
Name
Description
Reconfiguration
Type of Option

Cluster Configuration

5.0, 5.1, 5.5
vpxd.das.aamMemoryLimit
Memory limit in MB for the resource pool used by HA (the aam resource pool). If unspecified, 100 MB is used. Value applies to all clusters in the vCenter Server inventory.
Yes. HA must be reconfigured on all hosts for which the change is required.
VC
5.0, 5.1, 5.5
vpxd.das.electionWaitTimeSec
How long, in seconds, vCenter Server waits after sending the host list to a new host to learn the outcome of the election. A timeout exception is thrown if the host is not a master or connected slave by the timeout. If not specified, a value of 120 seconds is used. The value must not exceed 2000, as larger values cause HA failures.
No. Applied the next time an FDM is configured.
VC
5.0, 5.1, 5.5
fdm.nodeGoodness
When a master election is held, the FDMs exchange a goodness value, and the FDM with the largest goodness value is elected master. Ties are broken using the host IDs assigned by vCenter Server. This parameter can be used to override the computed goodness value for a given FDM. To force a specific host to be elected master each time an election is held and the host is active, set this option to a large positive value. This option should not be specified at the cluster level.
No. The new goodness value is used in the next election.
fdm
5.0, 5.1, 5.5
vpxd.das.sendProtectListIntervalSec
Minimum time (in seconds) between consecutive calls by vCenter Server to the HA master agent it is in contact with, requesting that the master protect a new virtual machine. If not specified, 60s is used. This option also controls how frequently vCenter Server sends the master updates to the virtual-machine-to-host compatibility information for virtual machines that are powered on when their compatibility with hosts changes.
Yes. vCenter Server needs to be restarted after setting this option.
VC
5.5
fdm.cluster.vsanDatastoreLockDelay
The delay (in seconds) before the vSAN datastore object is "acquired". Failover of virtual machines on a datastore does not take place until the vSAN datastore has been acquired by the master. The delay gives the isolated or partitioned slave time to communicate its powered-on virtual machines, to avoid duplicate power-ons. The default is to wait 30 seconds, and only if there are heartbeat datastores defined.
No. The value is read when the master is elected.
fdm

Admission Control

5.0, 5.1, 5.5
vpxd.das.slotMemMinMB
vCenter Server-wide default value in MB to use for memory reservation if no memory reservation is specified for a virtual machine. Setting the cluster option das.vmMemoryMinMB for a cluster will override this value for that cluster. If this option is not set, a value of zero is assumed unless overridden by das.vmMemoryMinMB.
No. The value is taken into account the next time admission control is done.
VC
5.0, 5.1, 5.5
vpxd.das.slotCpuMinMHz
vCenter Server-wide default value in MHz to use for the CPU reservation if no CPU reservation is specified for a virtual machine. Setting the cluster option das.vmCpuMinMHz for a cluster will override this value for that cluster. If this option is not set, a value of 32 is assumed unless overridden by das.vmCpuMinMHz.
No. The value is taken into account the next time admission control is done.
VC
6.0
vpxd.das.completemetadataupdateintervalsec
The period of time (seconds) after a VM-Host affinity rule is set during which vSphere HA can restart a VM in a DRS-disabled cluster, overriding the rule. Default value is 300 seconds.
No
VC

Detecting Failures

5.0, 5.1, 5.5
das.config.fdm.hostTimeout
Controls the time in seconds a master FDM waits for a slave FDM to respond to a heartbeat before declaring that the slave host is not connected and initiating the workflow to determine whether the host is dead, isolated, or partitioned. If not specified, 10s is used.
Yes. Reconfigure HA on all hosts.
Cluster
5.0, 5.1, 5.5
fdm.deadIcmpPingInterval
ICMP pings are used to determine whether a slave host is network accessible when the FDM on that host is not connected to the master. This option controls the interval (in seconds) between pings. If not specified, 10s is used.
In ESXi 5.0, after making a change, HA must be reconfigured on all hosts in the cluster. In 5.1 and later, no.
Cluster
5.0, 5.1, 5.5
das.config.fdm.icmpPingTimeout
Defines the time an FDM waits in seconds for an ICMP ping reply before assuming the host being pinged is not network accessible. If not specified, 5s is used.
In ESXi 5.0, after making a change, HA must be reconfigured on all hosts in the cluster. In 5.1 and later, no.
Cluster
5.0, 5.1, 5.5
vpxd.das.heartbeatPanicMaxTimeout
This option impacts how long it takes for a host impacted by a PSOD to release file locks and hence allow HA to restart virtual machines that were running on it. If not specified, 60s is used. HA sets the host Misc.HeartbeatPanicTimeout advanced option to the value of this HA option. The HA option is in seconds.
Yes. After setting the option, HA needs to be reconfigured on all hosts in all HA clusters.
VC

Restarting virtual machines

5.0, 5.1, 5.5
das.config.fdm.policy.unknownStateMonitorPeriod
Defines the number of seconds the HA master agent waits after it detects that a virtual machine has failed before it attempts to restart the virtual machine. If not specified, 10s is used.
No
Cluster
5.0, 5.1, 5.5
das.perHostConcurrentFailoversLimit
The number of concurrent failovers a given FDM will have in progress at one time. Setting a larger value will allow more virtual machines to be restarted concurrently but will also increase the average latency to power each on, since a greater number adds more stress on the hosts and storage. The default value is 32. This value was determined empirically to provide the minimum overall latency.
No
Cluster

Virtual machine operation coordination

5.0, 5.1, 5.5
das.config.fdm.ft.cleanupTimeout
When a vSphere Fault Tolerance virtual machine is powered on by vCenter Server, vCenter Server informs the HA master agent that it is doing so. This option controls how many seconds the HA master agent waits for the power on of the secondary virtual machine to succeed. If the power on takes longer than this time (most likely because vCenter Server has lost contact with the host or has failed), the master agent will attempt to power on the secondary virtual machine. If the option is not specified, 900s is used.
No
Cluster
5.0, 5.1, 5.5
das.config.fdm.storageVmotionCleanupTimeout
When a Storage vMotion is done in an HA-enabled cluster using pre-5.0 hosts and the home datastore of the virtual machine is being moved, HA may interpret the completion of the Storage vMotion as a failure and may attempt to restart the source virtual machine. To avoid this issue, the HA master agent waits the specified number of seconds for a Storage vMotion to complete or fail. When the Storage vMotion completes or the timer expires, the master assesses whether a failure occurred. If the option is not specified, 900s is used for the timeout.
No
Cluster

Reporting

5.0, 5.1, 5.5
das.config.log.outputToFiles
Enables the FDM file-based logger for ESXi 5.0 and later hosts. 5.0 hosts log to the ESX syslog, so file-based logging is not enabled by default. This option has no effect on pre-5.0 hosts. To enable the file-based logger, set das.config.log.outputToFiles to true and das.config.log.maxFileNum to a number greater than 2. To disable file-based logging, set this option to false.
Yes
Cluster
5.0, 5.1, 5.5
das.config.log.directory
Sets the directory used by the FDM file-based logger. If not specified, files are written into /var/log/vmware/fdm. See the option das.config.log.maxFileNum for more information.
Yes
Cluster
5.0, 5.1, 5.5
das.config.fdm.stateLogInterval
Frequency in seconds at which an FDM logs a summary of the cluster state. If not specified, 600s (10 min) is used.
In ESXi 5.0, yes: HA must be reconfigured on all hosts. In ESXi 5.1 and later, no.
Cluster
5.0, 5.1, 5.5
das.config.fdm.event.maxMasterEvents
Defines the maximum number of events cached by the master. If not specified, 1000 are cached.
In ESXi 5.0, yes: HA must be reconfigured on all hosts. In ESXi 5.1 and later, no.
Cluster
5.0, 5.1, 5.5
das.config.fdm.event.maxSlaveEvents
Defines the maximum number of events cached by a slave. If not specified, 600 are cached.
In ESXi 5.0, yes: HA must be reconfigured on all hosts. In ESXi 5.1 and later, no.
Cluster
5.0, 5.1, 5.5
vpxd.das.reportNoMasterSec
A vCenter Server parameter that determines how long to wait, in seconds, before issuing a cluster config issue to report that vCenter Server was unable to locate the HA master agent for the corresponding cluster. If not specified, 120s is used.
Yes. vCenter Server needs to be restarted.
VC
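The fdm.nodeGoodness description amounts to a simple selection rule. The Python sketch below takes each host's computed goodness as an input, because this article does not describe how FDM computes it. It applies any fdm.nodeGoodness overrides before picking the largest value. The direction of the tie-break is an assumption: here the lexically larger host ID wins. The article only says that ties are broken using the host IDs assigned by vCenter Server. `elect_master` is a hypothetical name.

```python
from typing import Optional


def elect_master(goodness: dict[str, int],
                 node_goodness: Optional[dict[str, int]] = None) -> str:
    """Return the host ID whose FDM wins the election.

    goodness: computed goodness per vCenter Server host ID (e.g. "host-12").
    node_goodness: per-host overrides, standing in for fdm.nodeGoodness in fdm.cfg.
    Assumption: ties go to the lexically larger host ID.
    """
    if not goodness:
        raise ValueError("no hosts take part in the election")
    overrides = node_goodness or {}
    effective = {host: overrides.get(host, value) for host, value in goodness.items()}
    # Highest goodness wins; the host ID string is the tie-breaker.
    return max(effective, key=lambda host: (effective[host], host))
```

Setting a very large override on one host, as the table suggests, makes that host win every election in which it takes part.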

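das.perHostConcurrentFailoversLimit caps how many failovers a given FDM has in progress at one time. A counting semaphore is a natural way to illustrate that kind of cap. The Python sketch below uses asyncio.Semaphore. `power_on` is a hypothetical coroutine standing in for the real restart of one virtual machine. The sketch says nothing about how FDM orders, prioritizes or retries restarts.

```python
import asyncio
from typing import Awaitable, Callable


async def restart_with_limit(vm_names: list[str],
                             power_on: Callable[[str], Awaitable[None]],
                             limit: int = 32) -> list[str]:
    """Call power_on(vm) for each VM with at most `limit` calls in progress at once.

    32 mirrors the documented default of das.perHostConcurrentFailoversLimit.
    """
    gate = asyncio.Semaphore(limit)

    async def restart_one(vm: str) -> str:
        async with gate:  # waits while `limit` restarts are already in flight
            await power_on(vm)
        return vm

    # gather() returns results in the same order as vm_names.
    return list(await asyncio.gather(*(restart_one(vm) for vm in vm_names)))
```

Raising `limit` lets more restarts overlap, which mirrors the trade-off described in the table. More virtual machines are in flight at once, but each one competes for host and storage resources.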