Short blip in service availability today
We experienced a short blip in the availability of SQL Azure today, and I wanted to post a quick note to let you know what happened.
Inside SQL Azure we have automated monitoring tools called watchdogs. These watchdogs monitor every aspect of the system, and if one of them notices that something is not functioning correctly, it raises an alert to our operations staff and, depending on the issue type, starts corrective maintenance.
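To make the watchdog idea concrete, here is a minimal sketch in Python. It is not how SQL Azure implements its watchdogs; the `Watchdog` class, the `run_watchdogs` function, and the check names are all hypothetical and exist only to illustrate the pattern of "check, alert, and optionally start a corrective action".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Watchdog:
    """Hypothetical watchdog: one health check plus an optional automatic fix."""
    name: str
    check: Callable[[], bool]                       # returns True when healthy
    remediate: Optional[Callable[[], None]] = None  # corrective action, if any


def run_watchdogs(watchdogs: List[Watchdog],
                  alert: Callable[[str], None]) -> List[str]:
    """Run every check once; alert on each failure and remediate if defined."""
    failed = []
    for dog in watchdogs:
        if dog.check():
            continue  # healthy, nothing to do
        failed.append(dog.name)
        alert(f"{dog.name}: check failed")  # always notify operations staff
        if dog.remediate is not None:
            dog.remediate()  # only some issue types have an automatic fix
    return failed
```

In this sketch, the finer-grained the individual checks are, the more precisely a failed check name points at the broken component.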
Shortly before noon (PDT) an alert was raised that the SQL Azure gateways were unable to communicate with the backend data nodes. Once the alert was raised, the operations team and certain members of the product team were automatically notified. Thanks to the granularity of the watchdogs, we were able to identify the issue quickly. It stemmed from a configuration change that was made as we were getting the cluster ready for the upcoming refresh of our CTP deployment. We corrected the configuration error, which restored communication between the gateways and the backend data nodes and, with it, access to the service. The elapsed time from when the service became unavailable to when it was restored was just under an hour. We have started implementing safeguards to keep such configuration issues from occurring in the future.
Our goal is, and always has been, to be as transparent with the user community as possible. With that in mind, as soon as we noticed that there was a service disruption, we sent a notification to the MSDN forums. Once we go live, customers will also receive an email notification. We posted an additional update a few minutes later, once we had identified what the issue was. If we had not identified the issue so quickly, our incident response plan, or "playbook" as we call it, would have required us to notify our users every hour until the issue was resolved. Our goal is to ensure that if an incident should arise, our customers are never left wondering what is going on and are always kept in the loop. We believe that by combining our best-of-breed data platform service with clear, frequent communications, we will only strengthen the rock-solid relationship we have with our customers.
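As a rough illustration of the hourly rule in the playbook, the sketch below computes when updates would be due during an incident. The function name `update_times` and the sample timestamps are made up for the example; the only detail taken from the playbook is the one-hour interval between updates while an issue remains unresolved.

```python
from datetime import datetime, timedelta
from typing import List


def update_times(detected: datetime, resolved: datetime,
                 interval: timedelta = timedelta(hours=1)) -> List[datetime]:
    """Return when in-progress updates are due.

    One update goes out at detection, then one per interval for as long
    as the incident is still unresolved.
    """
    times = [detected]
    next_due = detected + interval
    while next_due < resolved:
        times.append(next_due)
        next_due += interval
    return times


# Illustrative timestamps only, not the actual incident timeline.
start = datetime(2009, 1, 1, 11, 50)
for t in update_times(start, start + timedelta(hours=2, minutes=30)):
    print(t.strftime("%H:%M"))
```

With these sample values the sketch schedules three updates: one at detection and one at each of the next two hourly marks. An incident that is resolved in under an hour gets only the initial notification from this function, plus whatever follow-up posts the team chooses to make.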
You will notice above I mention a refresh to our CTP bits. Yes, a refresh is coming. I'll be sending out more information on that soon.
Thanks,
Dave