We have observed a number of issue now where in clustered instance of sql server 2005 fails to come online after performing changing on shared disk on a cluster.
As mentioned in my previous posts if the sql server fails to come online we need to start troubleshooting by looking at the sql server error log. So we checked the sql server error log and observed the following error
2009-01-11 21:10:06.96 Server Error: 26054, Severity: 16, State: 1. 2009-01-11 21:10:06.96 Server Could not find any IP address that this SQL Server instance depends upon. Make sure that the cluster service is running, that the dependency relationship between SQL Server and Network Name resources is correct, and that the IP addresses on which this SQL Server instance depends are available. Error code: 0x103. 2009-01-11 21:10:06.96 Server Error: 17182, Severity: 16, State: 1. 2009-01-11 21:10:06.96 Server TDSSNIClient initialization failed with error 0x103, status code 0xa. 2009-01-11 21:10:06.96 Server Error: 17182, Severity: 16, State: 1. 2009-01-11 21:10:06.96 Server TDSSNIClient initialization failed with error 0x103, status code 0x1. 2009-01-11 21:10:06.96 Server Error: 17826, Severity: 18, State: 3. 2009-01-11 21:10:06.96 Server Could not start the network library because of an internal error in the network library. To determine the cause, review the errors immediately preceding this one in the error log. 2009-01-11 21:10:06.96 Server Error: 17120, Severity: 16, State: 1. 2009-01-11 21:10:06.96 Server SQL Server could not spawn FRunCM thread. Check the SQL Server error log and the Windows event logs for information about possible related problems.
As seen from the above Error 26054 message which says that cluster service is unable to find the dependency between the sql server instance and the sql network resource name.
We checked the Cluster Administrator for the sql server resource and found that the dependency exists between the sql server resource and sql network name but still we receive the error.
We checked the Registry Location HKLM\Cluster\Resources\<SQL Server GUID>\Parameters
The values InstanceName and VirtualServerName were missing from the keys below:
HKLM\Cluster\<SQL Server GUID>\Parameters HKLM\Cluster\<SQL Agent GUID>\Parameters
IMPORTANT : This resolution contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:
256986 Description of the Microsoft Windows Registry
We added the following registry values in the Parameters registry key:
For a named instance of SQL Server:
• InstanceName Value Name: InstanceName Value Type: REG_SZ Value Data: <instance name of the named instance>
• VirtualServerName Value Name: VirtualServerName Value Type: REG_SZ Value Data: <Virtual server name of the instance>
Once we added the registry keys we were able to bring the sql server online on Node1.
We failed the instance over to Node2 we checked the above registry keys and found they were missing, so we added missing keys and rebooted both the servers of the clusters.
After rebooting the server the sql server resources came online fine without any errors
This post should not be treated as the Microsoft’s Recommendation or Resolution to the problem, it is only a workaround which worked in our environment and so we would like to share it.
Parikshit Savjani, SE, Microsoft SQL Server
Good Post. I changed HDD’s in virtual world in ESX and the 2 registry key entries kept me from rebuilding the entire damn thing. Work around worked for me.
Excellent Post. Saved me multiple hour SQL Cluster reinstall.
Excellent Post. Thankyou, this saved me having to re-install the SQL 2005 cluster.
Helped me out too. I removed the SQL Server resource from the cluster to prevent it from being brought up as we "retired" this instance. As usual, later some of the old data was needed. Thanks.
This was a life saver! Thought I was going to have to rebuild the cluster!
I see that sql server resource and sql server agent resource are missing in cluster administratorfor one of my instance. Can any of you provide me with the info to recover these resource.
Excellent solution. Saved me a lot of hours & effort.
Hi! Just a quick question, is there any reason why these values were originally missing?
I am not able locate. I am installing SQL Server 2005 On Windows 2008 R2. Having the same issue as mentioned above.
How to find the SQL Server GUID as mentioned above. ?
Ideally you shouldn't face this issue while installing SQL Server. Since we have observed this issue specifically when you already have SQL installed as a clustered resource which is up and running and You had to bring down the sql instance to perform some SAN activity and thereafter the SQL resource doesn't come online.
However still to answer your question. In regedit, once you go to the location HKLM\Cluster you can search (Ctrl+F) for InstanceName (case sensitive) or VirtualServerName (case sensitive) and locate your SQL instance. if you are seeing these DWORDs with missing values then you might be facing this issue. If the value are present and they are correct then it might be another issue and you might want to raise a ticket with CSS to help you troubleshoot the issue.