When running a Stretched Exchange Database Availability Group (DAG) over a WAN, you may be seeing a bunch of failovers at odd times when nothing is actually wrong.
The Failover Cluster Service (FCS) defaults use fairly aggressive timeouts. Even over a "fast" WAN connection, brief packet loss or a latency spike can drop enough heartbeats to trigger a failover.
By default, heartbeats are sent every 1000 ms on both local (same-subnet) and remote (cross-subnet) links. Once a node misses 5 heartbeats, the cluster considers it down and your DAG initiates a failover.
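Put another way, the time a node can go quiet before the cluster reacts is roughly the heartbeat delay multiplied by the miss threshold. With the default values, that works out to about five seconds:

$$
t_{\text{detect}} \approx \text{Delay} \times \text{Threshold} = 1000\ \text{ms} \times 5 = 5000\ \text{ms} = 5\ \text{s}
$$

Five seconds of dropped or delayed packets is not much slack on a busy WAN link, which is why the defaults can trip so easily.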
Here's how to make the cluster less sensitive by raising these values so it tolerates more latency.
Fire up an elevated Command Prompt on one DAG node, then issue each of these commands. The settings are cluster-wide, so running them on a single node is enough:
```cmd
cluster /prop SameSubnetDelay=2000:DWORD
cluster /prop CrossSubnetDelay=4000:DWORD
cluster /prop CrossSubnetThreshold=10:DWORD
cluster /prop SameSubnetThreshold=10:DWORD
```
**Note:** These are the maximum values allowed, so you may want to start lower and experiment a bit before settling on the highest timeouts.
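Using the same rough rule (detection time ≈ delay × threshold), the values set above give the cluster considerably more slack than the 5-second default:

$$
\begin{aligned}
\text{Same subnet:}\quad & 2000\ \text{ms} \times 10 = 20\,000\ \text{ms} = 20\ \text{s} \\
\text{Cross subnet:}\quad & 4000\ \text{ms} \times 10 = 40\,000\ \text{ms} = 40\ \text{s}
\end{aligned}
$$

The trade-off is that a node that has genuinely failed also takes about this long to be declared down, so failover after a real outage is correspondingly slower.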
You can verify that the changes took effect by running:
```cmd
cluster /prop
```
Now your Stretched DAG shouldn't keep you up at night with its "Phantom Phailovers," as I call them :)