
Wednesday, December 2, 2015

Exchange 2013 DAG Heartbeat Threshold on VMware

Over the past couple of weeks we'd been noticing quite a few DAG failovers and cluster service stoppages in our Exchange 2013 environment. Our multi-role servers are hosted on VMware and we noticed some vMotion events happening. Technically vMotion shouldn't cause any disruption in Exchange, but ours were migrating a couple of times within a few minutes...yeah, our infrastructure team needs to fix that.
VMware's guidance is to bump up the DAG heartbeat interval so the cluster can tolerate host migrations.

To change the interval values on the cluster service, you'll need to use Windows PowerShell.

**Note** It's totally safe to make the change during production hours, as it's transparent and does not require service restarts.

To see the current values (the defaults, if nobody has changed them), fire up Windows PowerShell and run:

Get-Cluster | fl *subnet*

It will return a list like so:

CrossSubnetDelay          : 1000
CrossSubnetThreshold      : 5
PlumbAllCrossSubnetRoutes : 0
SameSubnetDelay           : 1000
SameSubnetThreshold       : 5

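The "1000" value is the delay between heartbeats in milliseconds (1 second), and the "5" is the number of heartbeats that can be missed before the cluster considers the node down. With these values, a node that goes quiet for about 5 seconds is treated as failed.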
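Multiplying the delay by the threshold gives the window a node can be unresponsive before the cluster acts. Here's a minimal sketch of that arithmetic, assuming the FailoverClusters module is available and you're running it on a DAG member; the variable names are just my own:

```powershell
# Load the failover clustering cmdlets if they aren't already loaded.
Import-Module FailoverClusters

$cluster = Get-Cluster

# Delay is in milliseconds; threshold is a count of missed heartbeats.
$sameSubnetSeconds  = ($cluster.SameSubnetDelay  * $cluster.SameSubnetThreshold)  / 1000
$crossSubnetSeconds = ($cluster.CrossSubnetDelay * $cluster.CrossSubnetThreshold) / 1000

"Same-subnet tolerance:  $sameSubnetSeconds seconds"
"Cross-subnet tolerance: $crossSubnetSeconds seconds"
```

With a delay of 1000 and a threshold of 5, both work out to 5 seconds.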

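To be safe, change only the SameSubnetDelay interval, raising it to 2 seconds (2000 milliseconds), which according to VMware is plenty of time for the cluster to adjust for any vMotion. With the threshold left at 5, a node now has to miss about 10 seconds of heartbeats before it's considered down.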

To change the interval, run the following:

(Get-Cluster).SameSubnetDelay = 2000
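If you'd rather not overwrite a value someone has already tuned, a guarded version is one option. This is only a sketch under the same assumptions (FailoverClusters module, run on a DAG member); `$targetDelayMs` is a hypothetical variable name, set to the 2-second value used in this post:

```powershell
# Load the failover clustering cmdlets if they aren't already loaded.
Import-Module FailoverClusters

# Hypothetical target, matching the 2000 ms value used above.
$targetDelayMs = 2000

$cluster = Get-Cluster

if ($cluster.SameSubnetDelay -lt $targetDelayMs) {
    # Assigning the property writes the new value to the cluster configuration.
    $cluster.SameSubnetDelay = $targetDelayMs
    "SameSubnetDelay raised to $targetDelayMs ms"
} else {
    "SameSubnetDelay is already $($cluster.SameSubnetDelay) ms; left unchanged"
}
```

The check only ever raises the delay, so running it twice or against a cluster that's already been adjusted does nothing.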

Now, run the original cmdlet again and you'll see your changes:

Get-Cluster | fl *subnet*

CrossSubnetDelay          : 1000
CrossSubnetThreshold      : 5
PlumbAllCrossSubnetRoutes : 0
SameSubnetDelay           : 2000
SameSubnetThreshold       : 5

**Note** You only need to do this on one of the DAG nodes, as the changes will replicate across the cluster.

Now you'll have fewer failovers during vMotion migrations.
