When OpsMgr discovery hangs it’s because the SQL Broker service has not been enabled in the operationsmanager database. Just a quick BING search will show you tons of hits about this issue – and I’ve never seen one case where this wasn’t the case – until now. I just finished working an issue where OpsMgr discovery would hang but the SQL Broker service was enabled. I didn’t know much about the Broker service and definately had never had to troubleshoot it before.
What is the SQL Broker service? MSDN offers the following statement to describe what Broker does. A more complete discussion can be found at http://msdn.microsoft.com/en-us/library/ms166043(SQL.90).aspx This new technology, a part of the Database Engine, provides a message-based communication platform that enables independent application components to perform as a functioning whole. Service Broker includes infrastructure for asynchronous programming that can be used for applications within a single database or a single instance as well as for distributed applications.
So we need to troubleshoot the Broker service – cool, where can we find out information about it? The first stop is on the Broker enabled database itself. Under the Service Broker folder we can access various information about the service. But I didn’t find anything here to really help me understand what might be broken. Searching a bit I came across two very useful SQL queries – one to help me troubleshoot and one to show me data processing through the Broker.
select * from sys.transmission_queue
select * from sys.conversation_endpoints
From my limited troubleshooting it seems that when everything is running normally the first query should return no results where the second query will show data in process/queued by the system. Looking at the results of the first query in the transmission_status column I found this error ‘An exception occurred while enqueueing a message in the target queue. Error: 15404, State: 19. Could not obtain information about Windows NT group/user 'Domain/User', error code 0x52e.’ Obviously I’ve replace the real user information with Domain\User for confidentiality reasons. Regardless, with this information I now have a clue what is going on. We have an account in play somewhere that we can use to authenticate. Error 52e confirms that and translates to ERROR_LOGON_FAILURE. Digging a bit we found the culprit. The owner of the operationsmanager database was the account that was incorrect. It’s easy to see who the database owner is, just select the database, click properties and on the general page, the owner information is displayed. Here is a screenshot from my lab. Once we set the database owner to the correct value, discovery worked like a champ!