I came across an interesting issue the other day with an HA pair of NetScalers deployed on Microsoft Hyper-V 2012 R2 with SCVMM.
The NetScalers were misbehaving at random, showing service groups as partially up and failing to send and receive heartbeats correctly. This had the fun side effect of both NetScalers very briefly becoming primary and causing IP address conflicts. Essentially, services were flapping and causing lots of errors, not only on the NetScalers but also on the downstream Citrix StoreFront and XenDesktop servers.
The problem would arise at least once a minute for the affected service groups and last only around 10 seconds, while the monitors failed 3 probes and then passed the default 3 probes. The HTTP monitors would also report that the TCP connection was successful but the application had timed out.
Looking at the problem initially, it smelt very much like a networking problem, because I discovered that the servers causing the partially up state on the service groups were always on a different Hyper-V host from the primary NetScaler VPX.
I took a trace file using the NetScaler's diagnostics tools, loaded it into Wireshark and quickly saw that when the monitors failed, replies to packets sent to the target MAC address were coming back from a different MAC address. The MAC address sending the response to the monitor belonged to a physical network card in the Hyper-V host, not to the XenDesktop or StoreFront virtual machine. The NetScalers obviously don’t like surprises and drop the unexpected packets, causing the monitors to fail.
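If you want to check a larger trace for the same symptom without eyeballing it in Wireshark, the comparison can be sketched in a few lines of Python. This is only a rough illustration: the `Frame` shape, the function names and the sample IP and MAC addresses are all made up for the example, and it assumes you have already exported the source IP and source MAC of each reply from the capture.

```python
from typing import Dict, List, NamedTuple


class Frame(NamedTuple):
    """One reply frame exported from a capture (hypothetical, simplified shape)."""
    src_ip: str
    src_mac: str


def normalise_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so comparisons are consistent."""
    return mac.strip().lower().replace("-", ":")


def find_mac_mismatches(frames: List[Frame], expected: Dict[str, str]) -> List[Frame]:
    """Return frames whose source MAC differs from the MAC expected for that IP."""
    wanted = {ip: normalise_mac(mac) for ip, mac in expected.items()}
    return [
        frame
        for frame in frames
        if frame.src_ip in wanted and normalise_mac(frame.src_mac) != wanted[frame.src_ip]
    ]


# Made-up sample data: the VM's own MAC, then a reply carrying a host NIC's MAC.
expected_macs = {"10.0.0.21": "00:15:5D:01:02:03"}
sample_frames = [
    Frame("10.0.0.21", "00-15-5d-01-02-03"),
    Frame("10.0.0.21", "3C:4A:92:AA:BB:CC"),
]
suspect_frames = find_mac_mismatches(sample_frames, expected_macs)
```

With the sample data, `suspect_frames` holds only the second frame, the one whose source MAC is not the virtual machine's.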
Now, as it turns out, this is a “feature” of using NIC Teaming in the Hyper-V host with the Dynamic load balancing mode.
This needs to be changed to Hyper-V Port load balancing to ensure that all the traffic for each virtual machine that communicates with the NetScalers flows through a single adapter and that its packets do not have their MAC address changed.
Once you’ve made the change from Dynamic to Hyper-V Port load balancing (which is non-disruptive, so it can be done during business hours), the NetScalers will be much happier.
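For anyone who prefers to script the change, here is a minimal sketch in Python that wraps the built-in `Set-NetLbfoTeam` PowerShell cmdlet. It assumes the team was created with the native Windows Server NIC Teaming (LBFO) feature and that it is run on the Hyper-V host with administrative rights. The team name `VMTeam` and both function names are hypothetical. If the team is managed through SCVMM, the setting may need to be changed there instead.

```python
import subprocess
from typing import List

# Values accepted by Set-NetLbfoTeam -LoadBalancingAlgorithm.
ALLOWED_ALGORITHMS = {"TransportPorts", "IPAddresses", "MacAddresses", "HyperVPort", "Dynamic"}


def build_team_command(team_name: str, algorithm: str = "HyperVPort") -> List[str]:
    """Build the PowerShell call that sets a team's load balancing algorithm."""
    if algorithm not in ALLOWED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if not team_name or "'" in team_name:
        raise ValueError("team name must be non-empty and must not contain quotes")
    script = f"Set-NetLbfoTeam -Name '{team_name}' -LoadBalancingAlgorithm {algorithm}"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def apply_team_change(team_name: str) -> None:
    """Run the command on the local Hyper-V host; raises if PowerShell reports failure."""
    subprocess.run(build_team_command(team_name), check=True)


# 'VMTeam' is a placeholder; use Get-NetLbfoTeam on the host to find the real name.
command = build_team_command("VMTeam")
```

Only `build_team_command` runs in the sample above. `apply_team_change` is left for you to call deliberately, once you have confirmed the team name.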
A colleague of mine, Manuel Kolloff, has also recently come across this problem but found another solution to the issue; check out his post HERE.
Author: Dale Scriven