Saturday, September 9, 2017

Exchange - Database Failovers and MSMQ Event ID 2250

My databases had been automatically switching over from one of my three Exchange 2016 CU4 DAG nodes, several times a day. I would move them back and they would fail back over almost instantly.

I started checking the Event Logs, and there were tons of Event ID 2250 and Event ID 2252 in the Application Log that coincided with the failover timestamps in the Exchange/High Availability/Operational Logs. This only happened on the one Mailbox Server though...the other two servers were running like champs.

The full Event IDs are:

MSMQ Event ID: 2250

Message Queuing will not be able to accept messages temporarily because system paged pool is low. During this period, machine quota will be set to 0. No manual intervention is required at this stage. Once memory utilization has normalized, Message Queuing will automatically resume accepting messages.

MSMQ Event ID: 2252

Message Queuing is resuming accepting messages because the system memory usage has normalized.

**Note** The 2252 event would happen directly after the databases moved off of this server.

Googling those events turned up nothing that really helped. As you can see the 2250 states that the page pool was exhausted, which was not the case, and my pagefile is set to preferred architecture (RAM + 10MB in my case 32,778MB).

Since it was only this one server, it had to be some sort of communication error.

So I did the normal Test-ReplicationHealth, Get-MailboxDatabaseCopyStatus and I noticed that PowerShell was throwing tons of errors like "MBX1 does not exist on DC1" and WinRM errors every time I would run a cmdlet.

That led me check the Domain Controllers and sure enough DCDIAG from Exchange-to-DC and DC-to-DC came back nasty with errors for the DC holding the FSMO roles.

The Cause:

Another domain admin was monkeying around with the firewall and GPOs on the DC's causing all kinds of network issues.

DO NOT turn off the Windows Firewall on production servers, and check your GPO scopes before enabling them!!!!!

The Fix:

1. Take away admin rights from everyone else :)

2. Reboot the DC and check DCDIAG to ensure it was clean...it is now.

3. Reboot the crippled Mailbox server
    - The errors were gone before the reboot, but just to make sure, I bounced it anyway.

**Note** Rebooting the FMSO DC will result in mailbox database dismounts. This is because the FMSO DC holds the AD configuration container for the databases.They should remount cleanly after the DC is back online, but you should do this during off hours if you can, since it will affect client connections to Exchange.

Now your databases should stay put according to their activation preference and those MSMQ events will be gone.

No comments:

Post a Comment