The official SQL Server AlwaysOn team blog.
The AlwaysOn Availability Groups Troubleshooting and Monitoring Guide provides information on troubleshooting common AlwaysOn Availability Groups issues and monitoring availability group health. We will keep updating this guide as new troubleshooting scenarios are documented, so that you can find all the information in one place.
Please send us feedback on it!
We recently ran 17 million updates against a table, and our asynchronous AG node fell more than 24 hours behind. This secondary node has identical hardware and drives. We rarely see latency over one second, but we also rarely run updates this large at one time. Are there any settings or hardware improvements we can make so our secondary doesn't fall so far behind? I believe this is an I/O issue, not a network bottleneck. We are considering adding SSD caching, but I'm not sure it will help with this issue.
Hi. I looked over the troubleshooting guide, and it's very helpful, but it doesn't cover my situation.
I'm running a primary and a secondary on SQL Server 2012 Enterprise Edition on Windows Server 2012, and it runs fine except when a network outage occurs.
Then the handshaking keeps failing, and the only way to get back in sync is to reboot both the primary and the secondary.
I keep getting 3520's, etc.
I was wondering whether upgrading to Windows Server 2012 R2 would help, or whether anyone has suggestions on commands to try in the future, to eliminate all these production reboots.
I increased the query connection timeouts to 60, but saw no change.