The Ether: VMware ESX High Availability not working as expected

Issue

When configuring HA on an ESX cluster, it is best practice to configure a Service Console and a backup Service Console on separate NICs connected to separate switches, so that heartbeats continue to be received in the event of a NIC or switch failure. If heartbeats are not received for 15000 milliseconds (15 seconds), the ESX server goes into the isolation response, which by default is to shut down the running virtual machines so they can be restarted on another host.

Isolation response is started when both of the following are true:
- The host has stopped receiving heartbeats from other cluster nodes.
- The isolation address cannot be pinged.

The default isolation address is the ESX Service Console gateway, and the default isolation response time is 15000 milliseconds (15 seconds).
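The conditions above can be sketched in a few lines of Python. This is only an illustration of the decision, not VMware's implementation; the HostState class and the function name are hypothetical, and the 15000 millisecond default is the value quoted in this article.

```python
from dataclasses import dataclass

# Default failure detection time quoted in this article (milliseconds).
DEFAULT_FAILURE_DETECTION_MS = 15000


@dataclass
class HostState:
    """Hypothetical snapshot of what one ESX host currently observes."""
    ms_since_last_heartbeat: int      # time since a heartbeat was last received
    isolation_address_pingable: bool  # result of pinging the isolation address


def should_start_isolation_response(state, failure_detection_ms=DEFAULT_FAILURE_DETECTION_MS):
    """Return True only when heartbeats are lost AND the isolation address is unreachable."""
    heartbeats_lost = state.ms_since_last_heartbeat >= failure_detection_ms
    return heartbeats_lost and not state.isolation_address_pingable


# Heartbeats lost, but the gateway still answers: the host is not isolated.
print(should_start_isolation_response(HostState(20000, True)))   # False
# Heartbeats lost and the gateway is unreachable: isolation response starts.
print(should_start_isolation_response(HostState(20000, False)))  # True
```

The second case is the one that matters for this issue: a switch failure that silences both heartbeats and the gateway looks identical to true isolation.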

With the configuration described above, and with the Service Console and Backup Service Console IP addresses on the same subnet, both IP addresses resolve to the same MAC address and therefore use the same switch port/NIC.

e.g.

ping service-console.domain.com
ping backup-service-console.domain.com
arp -a

Returns:

172.31.1.100          00-50-56-43-32-2e
172.31.1.101          00-50-56-43-32-2e
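The same check can be scripted. The sketch below groups IP addresses by MAC address and reports any MAC that answers for more than one IP. It assumes the input has already been trimmed to two-column "IP MAC" lines like those shown above; real arp -a output differs between operating systems and includes headers that would need to be filtered out first. The function names are hypothetical.

```python
from collections import defaultdict


def ips_by_mac(arp_lines):
    """Group IP addresses by the MAC address they resolve to."""
    groups = defaultdict(list)
    for line in arp_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ip, mac = fields[0], fields[1].lower()
        groups[mac].append(ip)
    return groups


def shared_macs(arp_lines):
    """Return only the MAC addresses that answer for more than one IP."""
    return {mac: ips for mac, ips in ips_by_mac(arp_lines).items() if len(ips) > 1}


# The two entries from the example above.
sample = [
    "172.31.1.100          00-50-56-43-32-2e",
    "172.31.1.101          00-50-56-43-32-2e",
]
print(shared_macs(sample))
# {'00-50-56-43-32-2e': ['172.31.1.100', '172.31.1.101']}
```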

This will cause all virtual machines on all ESX hosts with this configuration to shut down when a switch failure occurs, particularly if the link state remains up.

By default, Red Hat Enterprise Linux answers ARP requests for any local IP address on any interface of the machine, regardless of which interface the address is configured on.

Similar behaviour may be seen when a service console is disconnected, due to routing issues and the failure to detect dead interfaces.

ESX host may be uncontactable in VirtualCenter when backup Service Console is disconnected
http://theether.net/kb/100084

Resolution

No resolution is available for ESX 3/3.5. The arp_ignore sysctl was not introduced until kernel 2.4.26, and ESX 3/3.5 is based on kernel 2.4.21.

Update: ESX 4.0 is based on kernel 2.6.18-128 and has this functionality.

Once the arp_ignore sysctl is included in the Service Console kernel, the following command should resolve the issue:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore

To make the setting persist through a reboot, add it to sysctl.conf:

/etc/sysctl.conf

net.ipv4.conf.all.arp_ignore = 1

A possible workaround to avoid the isolation response is to configure a virtual switch with two NICs connected to two switches, set Network Failure Detection to Beacon Probing, set Rolling Failover to Yes, and set das.isolationaddress on the cluster to the IP address of the first switch.

From VirtualCenter Client:
- Select the ESX server
- Select the Configuration tab
- Select Networking
- Select Properties of the virtual switch
- Select the Service Console (or the vSwitch) and click Edit
- Select the NIC Teaming tab
- Set Network Failure Detection to Beacon Probing
- Set Rolling Failover to Yes
- Click OK
- Click Close

Note: Beacon Probing may have issues with the Service Console in ESX 3.0.1 (see Further Information below)

- Right click on the HA cluster and select Edit Settings from the context menu
- Select VMware HA from the left pane
- Click the Advanced Options button
- Enter das.isolationaddress for the option
- Enter the IP address of the switch for the value
- Click OK
- Click OK
- Right click on each ESX host in the cluster and select Reconfigure for HA

From VirtualCenter 2.0.2 it is also possible to set the following parameters on the cluster:
- das.isolationaddress2: An additional isolation response address can be specified. Use the IP address of the switch used by the second NIC of the virtual switch to assist in avoiding the isolation response.
- das.failuredetectiontime: Adjust the default timeout used for failure and isolation detection. Default is 15000 milliseconds (15 seconds). Set this to 120000 milliseconds (120 seconds) to allow for switch failover/spanning tree/portfast.

In VirtualCenter Server 2.0.2 and above, you can specify more than one isolation response address for VMware High Availability (HA). Multiple isolation response addresses can be specified using the das.isolationaddress1 through das.isolationaddress10 options.
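As a planning aid, the option names and values to enter in the Advanced Options dialog can be generated and sanity-checked before they are typed in. The sketch below is a hypothetical helper, not a VMware API: it only builds name/value pairs, numbering the addresses das.isolationaddress1 onward as described above, and the switch addresses used are placeholders.

```python
import ipaddress

# das.isolationaddress1 through das.isolationaddress10, per this article.
MAX_ISOLATION_ADDRESSES = 10


def build_ha_advanced_options(isolation_addresses, failure_detection_ms=15000):
    """Return a dict of option name -> value for the HA Advanced Options dialog."""
    if not 1 <= len(isolation_addresses) <= MAX_ISOLATION_ADDRESSES:
        raise ValueError("expected between 1 and 10 isolation addresses")
    options = {}
    for index, address in enumerate(isolation_addresses, start=1):
        ipaddress.ip_address(address)  # raises ValueError on a malformed address
        options[f"das.isolationaddress{index}"] = address
    options["das.failuredetectiontime"] = str(failure_detection_ms)
    return options


# Placeholder switch addresses, one per physical switch (assumed values).
for name, value in build_ha_advanced_options(["172.31.1.2", "172.31.1.3"], 120000).items():
    print(f"{name} = {value}")
```

Each printed pair corresponds to one option/value row in the dialog. After entering them, Reconfigure for HA still has to be run on each host as described in the steps above.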

Setting Multiple Isolation Response Addresses for VMware High Availability
http://kb.vmware.com/kb/1002117

Advanced Configuration options for VMware High Availability
http://kb.vmware.com/kb/1006421

References

http://kbase.redhat.com/faq/FAQ_45_11033.shtm

Resource Management Guide
http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_resource_mgmt.pdf

das.isolationaddress
Sets the address to ping to determine if a host is isolated from the network. If this option is not specified, the default gateway of the console network is used. This default gateway has to be some reliable address that is available, so that the host can determine if it is isolated from the network. Multiple isolation addresses (up to 10) can be specified for the cluster: das.isolationaddressX, where X = 1-10.

das.usedefaultisolationaddress
By default, HA uses the default gateway of the console network as an isolation address. This attribute specifies whether that should be used (true|false).

das.failuredetectiontime
Changes the default failure detection time (with a default of 15000 milliseconds). This is the time period when a host has received no heartbeats from another host, that it waits before declaring the other host dead.

das.failuredetectioninterval
Changes the heartbeat interval among HA hosts. By default, this occurs every second (1000 milliseconds).

VMware HA Requirements and Best Practices
http://theether.net/download/VMware/vmware_ha_wp.pdf
"VMware HA monitors heartbeat between hosts on the console network for failure detection. So, to have reliable failure detection for HA clusters, the console network should have redundant network paths. That way, if a host's first network connection fails, the second connection can broadcast heartbeats to other hosts. To set up redundancy, you need two physical network adapters on each host."

Setting Failure and Isolation Detection Timeout and Multiple Isolation Response Addresses
http://kb.vmware.com/kb/1002080
http://kb.vmware.com/Platform/Publishing/attachments/1002080_fHA_Tech_Best_Practices.pdf

Virtual Switch Policies
http://pubs.vmware.com/vi301/server_config/sc_adv_netwk.6.5.html
"Beacon Probing – Sends out and listens for beacon probes on all NICs in the team and uses this information, in addition to link status, to determine link failure. This detects many of the failures mentioned above that are not detected by link status alone.

Rolling Failover— Select Yes or No to disable or enable rolling.
This option determines how a physical adapter is returned to active duty after recovering from a failure. If rolling is set to No, the adapter is returned to active duty immediately upon recovery, displacing the standby adapter that took over its slot, if any. If rolling is set to Yes, a failed adapter is left inactive even after recovery until another currently active adapter fails, requiring its replacement."

Setting Advanced Virtual Machine Attributes
http://pubs.vmware.com/vi301/resmgmt/vc_advanced_mgmt.11.32.html

Virtual Infrastructure 3: Beta to Production
http://download3.vmware.com/vmworld/2006/mdc5173.pdf
"Do not use with Beacon Probing network failure detection (BUG!)"

arp_ignore - INTEGER
Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses:
0 - (default): reply for any local target IP address, configured on any interface
1 - reply only if the target IP address is local address configured on the incoming interface
2 - reply only if the target IP address is local address configured on the incoming interface and both with the sender's IP address are part from same subnet on this interface
3 - do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied
4-7 - reserved
8 - do not reply for all local addresses

The max value from conf/{all,interface}/arp_ignore is used
when ARP request is received on the {interface}
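The "max value" rule means that setting conf/all/arp_ignore to 1 is enough even if the per-interface value is left at 0. The sketch below reads both files and returns the larger value. It assumes the standard /proc/sys/net/ipv4/conf layout; the function names are hypothetical, and vswif0 is used as an assumed Service Console interface name. The demonstration runs against a temporary directory so it does not depend on the local kernel.

```python
import tempfile
from pathlib import Path

# Standard location of the per-interface IPv4 settings on Linux.
CONF_ROOT = Path("/proc/sys/net/ipv4/conf")


def read_arp_ignore(interface, conf_root=CONF_ROOT):
    """Read conf/<interface>/arp_ignore as an integer."""
    return int((Path(conf_root) / interface / "arp_ignore").read_text().strip())


def effective_arp_ignore(interface, conf_root=CONF_ROOT):
    """Return the larger of the 'all' value and the per-interface value."""
    return max(read_arp_ignore("all", conf_root), read_arp_ignore(interface, conf_root))


# Demonstration against a fake conf tree: all=1, vswif0=0 (assumed interface name).
with tempfile.TemporaryDirectory() as root:
    for name, value in (("all", "1"), ("vswif0", "0")):
        (Path(root) / name).mkdir()
        (Path(root) / name / "arp_ignore").write_text(value + "\n")
    print(effective_arp_ignore("vswif0", conf_root=root))  # 1
```

On a live system, calling effective_arp_ignore with the real interface name and the default conf_root reads the running kernel's values.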

Products

VMware ESX 4.0
VMware ESX 3.5
VMware ESX 3

Created: 15th August 2007
Updated: 14th October 2009