The NetScaler product range, both the physical appliances and the VPX virtual appliance, has a dizzying array of features that can be difficult to get your head round to start with.
In this post I’m going to attempt to explain how HA works on NetScaler devices (namely the VPXs, but the same largely applies to the physical devices).
HA (high availability) in a NetScaler infrastructure is achieved by combining two NetScaler nodes into an HA pair within the same site: one active (primary) node doing the work and one passive (secondary) node waiting to take over should the primary fail. If you want a NetScaler infrastructure to fail over from one site to another, you will need to look into GSLB.
The nodes in an HA pair share the same configuration; the only setting unique to each node is its NSIP (NetScaler IP address). Configuration changes made on the primary (active) node are replicated automatically to the secondary node, including the MIPs, VIPs and all other configuration.
The nodes send each other a heartbeat message every 200 ms (this is configurable, but changing it is not advisable). A failover is triggered once no heartbeats have been received from the peer for the dead interval, which defaults to 3 seconds.
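The heartbeat check can be sketched in a few lines of Python. This is only an illustrative model, not NetScaler's actual implementation: the class and method names are my own, and the timeout is passed in as a parameter rather than hard-coded, so the 3000 ms used in the example is an assumed value.

```python
from dataclasses import dataclass

HELLO_INTERVAL_MS = 200  # heartbeat interval quoted in the text


@dataclass
class PeerMonitor:
    """Hypothetical model of one node watching its HA peer."""
    dead_interval_ms: int        # assumed timeout before the peer is declared down
    last_heartbeat_ms: int = 0   # time the most recent heartbeat arrived

    def record_heartbeat(self, now_ms: int) -> None:
        # Called whenever a heartbeat from the peer is received.
        self.last_heartbeat_ms = now_ms

    def missed_heartbeats(self, now_ms: int) -> int:
        # Whole heartbeat intervals that have passed without a message.
        return (now_ms - self.last_heartbeat_ms) // HELLO_INTERVAL_MS

    def peer_is_down(self, now_ms: int) -> bool:
        # The peer is treated as down once the silence reaches the timeout.
        return (now_ms - self.last_heartbeat_ms) >= self.dead_interval_ms


monitor = PeerMonitor(dead_interval_ms=3000)   # assumed timeout for the example
monitor.record_heartbeat(now_ms=10_000)
print(monitor.peer_is_down(now_ms=10_400))     # False: only two intervals missed
print(monitor.peer_is_down(now_ms=13_000))     # True: silent for the full timeout
```

Working in whole milliseconds keeps the arithmetic exact, which avoids the rounding surprises you get when dividing floating-point seconds by 0.2.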
A simple demonstration of this would be an IIS server behind an HA pair of NetScalers. In this configuration I have a single IIS server set up and connected to a load balancing VIP, with an HTTP monitor checking the health of IIS.
When a client computer asks for the DNS/IP address of the IIS VIP, the primary node responds and services the request. This creates an interesting problem which we will discuss later (hint: think MAC addresses). The primary NetScaler services requests until either a failure occurs or an administrator forces a failover. As briefly mentioned above, the nodes in the HA pair probe each other’s health by sending heartbeat messages to each other every 200 ms on UDP port 3003, and as long as those heartbeats keep arriving in time the devices are deemed to be healthy. HA is also monitored by means of interface monitors, so the logical representations of the physical network ports are monitored for failures.
If a failure is detected, the primary NetScaler will attempt to notify the secondary that a failover needs to occur because something is wrong with it. Naturally, if it cannot do this, the secondary node will see the loss of heartbeat messages anyway, fail over and begin servicing requests.
The HA interface monitor also has an interesting gotcha that you need to be aware of before merrily enabling HA. By default, on physical NetScaler devices HA monitoring is turned on for all present interfaces, including those that are unused. This causes the NetScaler to believe it is in an unhealthy state, because it detects that an interface is down. It is therefore best practice to disable any unused physical interfaces and also turn off HA monitoring on them, so the NetScaler does not falsely identify an unhealthy state. On the VPX NetScalers the HA monitors are not enabled by default and cannot actually be turned on; this is because the VPX’s network connections are always connected to vSwitches, even if the physical NIC is disconnected or in a faulted state.
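To make the interface monitor gotcha concrete, here is a minimal Python sketch of the health rule as described above. It is a simplified model with assumed names (Interface, node_is_healthy) and made-up interface labels, not how the appliance is really implemented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interface:
    """Hypothetical view of one network interface on a node."""
    name: str
    link_up: bool
    ha_monitor: bool = True   # on by default on physical appliances, per the text
    enabled: bool = True


def failed_monitored_interfaces(interfaces: List[Interface]) -> List[str]:
    # Only enabled interfaces with HA monitoring switched on count against health.
    return [i.name for i in interfaces
            if i.enabled and i.ha_monitor and not i.link_up]


def node_is_healthy(interfaces: List[Interface]) -> bool:
    return not failed_monitored_interfaces(interfaces)


# One cabled interface and one unused, uncabled interface (labels are illustrative).
ports = [Interface("1/1", link_up=True), Interface("1/2", link_up=False)]
print(node_is_healthy(ports))   # False: the unused port drags the node down

# Best practice from the text: disable the unused port and turn off its HA monitor.
ports[1].enabled = False
ports[1].ha_monitor = False
print(node_is_healthy(ports))   # True
```

The point of the sketch is that a port nobody uses is indistinguishable from a failed port unless you tell the node to stop watching it.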
During a failover event the secondary node becomes the primary and then sends out a GARP (gratuitous ARP) message to all devices, advertising that the floating IP addresses (VIPs, MIPs etc.) have moved to the new MAC address. Clients then connect to the new primary node and their connections are re-established. Because some devices do not accept GARP messages, this may cause some downtime, as the old MAC address will still be held in those devices’ internal tables, which isn’t really what you want if you are looking for HA.
VMACs are a method of reducing the issue with GARP and the change of MAC address for the floating IPs. VMAC, as you might imagine, stands for Virtual MAC Address and is essentially an extension of the floating IP concept. VMACs are floating MAC addresses that are bound to a particular interface on the NetScaler device, and the VMAC and its interface binding are synchronised to the secondary node. In a failover, because the VMACs are bound to an interface and the floating IPs are associated with the VMAC, the new primary node does not have to rely on a GARP message: the MAC address in other devices’ tables does not need updating, which results in an almost unnoticeable service interruption.
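The difference between a GARP-based failover and a VMAC-based one can be illustrated with a toy ARP cache in Python. Everything here is an assumption made for illustration: the Neighbour class, the fail_over function, and all the IP and MAC addresses are made up, and the model follows the description above rather than the real appliance behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Neighbour:
    """Hypothetical upstream device (router, firewall, client) with an ARP cache."""
    accepts_garp: bool
    arp_cache: Dict[str, str] = field(default_factory=dict)

    def receive_garp(self, ip: str, mac: str) -> None:
        # Devices that ignore gratuitous ARP keep their stale entry.
        if self.accepts_garp:
            self.arp_cache[ip] = mac


def fail_over(neighbour: Neighbour, vip: str, new_primary_mac: str,
              vmac: Optional[str] = None) -> bool:
    """Return True if the neighbour can still reach the VIP after failover."""
    if vmac is None:
        # No VMAC: the VIP moves to the new primary's own MAC, announced by GARP.
        neighbour.receive_garp(vip, new_primary_mac)
        owner_mac = new_primary_mac
    else:
        # VMAC: the floating MAC moves with the VIP, so nothing needs updating.
        owner_mac = vmac
    return neighbour.arp_cache.get(vip) == owner_mac


VIP = "192.0.2.10"                  # documentation-range address, made up
NODE_A = "00:00:5e:00:53:0a"        # made-up MAC of the old primary
NODE_B = "00:00:5e:00:53:0b"        # made-up MAC of the new primary
VMAC = "00:00:5e:00:53:64"          # made-up virtual MAC shared by the pair

stubborn = Neighbour(accepts_garp=False, arp_cache={VIP: NODE_A})
print(fail_over(stubborn, VIP, NODE_B))              # False: stale MAC, downtime

with_vmac = Neighbour(accepts_garp=False, arp_cache={VIP: VMAC})
print(fail_over(with_vmac, VIP, NODE_B, vmac=VMAC))  # True: cache is still valid
```

A device that ignores GARP is stuck until its ARP entry times out, whereas with a VMAC the same device never notices that the node behind the address has changed.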
HA Failover Failures
HA can fail to fail over for a number of reasons, so once you have created an HA pair of NetScalers it is important to set the systems up correctly so that a failover can occur seamlessly.
When you create an HA pair it is best practice to change the state of the intended primary node (the one with all the correct configuration on it already) to STAY PRIMARY. Otherwise there is a risk that the new, blank node will be designated as the primary and all the configuration on the existing node will be overwritten with blank data (effectively wiping out any configuration you have made on the first NetScaler, which may have been installed and working by itself for a good few years).
If both nodes are configured for HA and the primary is then not set back to HA enabled mode, a failover cannot occur, because the STAY PRIMARY setting does exactly what it says and will not allow control of the environment to pass to the secondary. In this state the secondary will stay in listening mode and not service any requests, even though the primary has failed.
A failover will also fail by default when the secondary node is determined not to be in a healthy state. So if the primary fails and the secondary has HA monitoring enabled on an interface that is not being used, it considers itself unhealthy and will not service any requests.
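The two failure cases above boil down to a small decision rule, sketched here in Python. This is my own simplified model of the behaviour described in this post; the enum and the function name are hypothetical and do not correspond to any NetScaler API.

```python
from enum import Enum
from typing import Tuple


class HaStatus(Enum):
    """HA modes mentioned in the text (simplified, hypothetical model)."""
    ENABLED = "ENABLED"
    STAY_PRIMARY = "STAY PRIMARY"


def secondary_takes_over(primary_failed: bool,
                         primary_ha_status: HaStatus,
                         secondary_healthy: bool) -> Tuple[bool, str]:
    """Return (will_fail_over, reason) according to the rules described above."""
    if not primary_failed:
        return False, "primary is healthy, nothing to do"
    if primary_ha_status is HaStatus.STAY_PRIMARY:
        return False, "primary is still set to STAY PRIMARY"
    if not secondary_healthy:
        return False, "secondary considers itself unhealthy"
    return True, "secondary becomes the new primary"


# Primary left in STAY PRIMARY after the pair was built: no failover.
print(secondary_takes_over(True, HaStatus.STAY_PRIMARY, True))
# Secondary has HA monitoring on an unused interface: no failover.
print(secondary_takes_over(True, HaStatus.ENABLED, False))
# Both nodes set up correctly: failover happens.
print(secondary_takes_over(True, HaStatus.ENABLED, True))
```

Checking these conditions on both nodes before you need them is much cheaper than discovering them during an outage.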
Author: Dale Scriven