Unable to connect to a SQL Server named instance on a cluster

Unable to connect to a SQL Server named instance on a cluster

Rate This
  • Comments 65

We have been seeing several occurrences where users failed to connect to SQL Server named instance on cluster. The error messages are usually as follow:

 

For SNAC:

C:\>osql -E -S cluster\inst

[SQL Native Client]SQL Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].

[SQL Native Client]Login timeout expired

For MDAC:

C:\>osql -E -S cluster\inst

[DBNETLIB]Specified SQL server not found.

[DBNETLIB]ConnectionOpen (Connect()).

 

In all occurrences, customers already tried some basic steps for solving similar issues, such as, enabling TCP protocol, enabling remote connection, put SQL Server named instance into firewall exception list, etc. After we tracked down to the issues, we found that it's a combination of the specifics of Windows Cluster and the way we discover SQL Server named instance. When connecting to SQL Server named instances, our client components rely on SQL Browser to discover the server and its parameters. The discovery process is:

 

The client sends a UDP packet to SQL Browser on the target machine. When the named instance is on a windows cluster, the packet is sent to the cluster IP (or more specifically, the IP address corresponding to the virtual SQL Server). However, SQL Browser is not cluster-aware and listens on IP ANY. When SQL Browser receives the UDP request packet, it sends a response UDP packet back the client. The destination IP address is the client's IP address, however, the source IP address is changed. It's now the IP address for the NIC card on the physical machine, rather than the virtual SQL Server IP address. The source IP address of the response UDP packet is determined by Windows OS, based on the routing table. Because both virtual SQL Server IP address and the IP address attached to physical NIC are usually on the same subnet (thus belong to same route), physical IP address is selected preferably. Depends on the security settings on the client and server machines, this response UDP packet may be dropped because the peer IP address is changed. We have been seeing that the response packet is dropped by Firewall and/or IPSec.

 

Windows Firewall does not drop the packet. However, a third-party firewall may drop the packet. In addition, IPSec may also drop the packet if IPSec policy is enabled on the client and it can not establish a trust connection between the client and server. (Important update: If your client is a Vista machine, you will see this issue. A workaround is to specify tcp port or pipe name in your connection string directly.)

 

We decided not to fix this minor issue because it is determined by the nature of UDP protocol. A UDP socket can response to multiple senders and the socket layer never knows which one it is actually replying to. We may consider letting SQL Browser listen on individual IPs but the cost will be high. A workaround is to specify TCP port number in the connection string in which case we bypass the discovery process.

 

Please refer to the following links for additional information. The articles talk about issue for SQL Server 2000, but it also applies to SQL Server 2005 as the fundamentals did not change.

http://support.microsoft.com/?kbid=888228

http://support.microsoft.com/default.aspx?scid=kb;[LN];318432

 

*********Important update 2: regarding SQL Server 2008. ***************

We had a fix for this issue in SQL Server 2008. Unfortunately, the fix is still partial. We identify another issue which invalidate the fix on X64 machine. Other than that, the issue is fixed if the server is SQL Server 2008 on Vista/Windows Server 2008 on X86/IA64. We don't have to do anything on the client side for these scenario. Note: the issue still applies to all version of SQL Server 2005.

 

Update3: (Mar/2009)

If you upgrade your OS to Vista SP2 or Windows Server 2008 SP2, and your SQL Server 2008 is SP1, the partial issue on X64 is fixed. Meanwhile, we identified another related issue which affect the ability to enumrate SQL instances on Vista/Windows 2008 on network . The fix is also in SQL Server 2008 SP1.

 

 

 

Xinwei Hong, SQL Server Protocols
Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights

Leave a Comment
  • Please add 1 and 5 and type the answer here:
  • Post
  • You said that this issue won't fix; so what do we do now (after search knowledge base for enabling TCP/IP port and some more)
  • As I mentioned in the blog: "A workaround is to specify TCP port number in the connection string in which case we bypass the discovery process. "

    Also, you can change your Firewall/IPSec settings so that this response UDP packet won't be filtered. E.g, put your client application under exception list of your firewall.

    Thanks.
    Xinwei
  • Would installing SQL2005 as a default instance (instead of a named instance) help avoid the problem?  Since you can only have one named instance per virtual server (in a clustered environment), what is the advantage of a named instance over a default instance anyway?
  • If it's default instance, client can connect the server using default port number (1433). So, it's not an issue unless you change the server TCP port.

    For the other question, Yes. One named instance per virtual server. But, you can install multiple virtual servers on one physical machine, in which case, you would need named instances so that we can distinguish those instances on the physical machine.
  • We do have the same phenomenon.

    When running osql on the cluster node, randomly the connection fails or succeeds.

    When doing this workaround to specify the port, than it works.

    Is this bug in SQL Browser still not fixed in the upcoming SQL2005 SP2 ?

  • PingBack from http://chaespot.com/mssql/2007/02/24/sql-protocols-unable-to-connect-to-a-sql-server-named-instance-on-a/

  • would this scenario also apply to multiple instances on a host, each bound to two VLANs on two NICS?

  • Your scenario is different. Say you have IP1 on VLAN1 and IP2 on VLAN2. SQL Instance 1 listens on IP1, SQL Instance 2 listens on IP2. SQL Browser listens on IP ALL. You can only make a SQL connection to INST1 from VLAN1 and to INST2 from VLAN2. In this case, UDP packets input/output from the same IP address on the server, so no such issue. If you try to connect to INST1 from VLAN2 or INST2 from VLAN1, you will fail. But that's because of routing, not because of Firewall issue I described.

    Please let me know if your scenario is different than what I just assumed.

  • thanks for the feedback - our setup is slightly more complicated : SQL 2005 - two instances listening on both vlans, and one sql 2000 instance.

    We were getting intermittent prelogin failures to the SQL 2005 instances, but usage is currently light so its hard to get accurate pictures.

    I changed the bindings on the SQL 2005 instances so that they listen on only one VLAN and we still get the issue. I'm planning to simplify further as the second VLAN was only required for internet access for service pack 1.. so I'll take that out of the equation, and hopefully that will cure the problem - otherwise I'll have to start looking at removing the sql2000 instance.

  • The issue I mentioned in the blog only happens when you connect one IP but the return packet changed the server IP to another one. With two VLANs, networks are seperated by lower layer(link layer). IPs usually won't be messed up. Unless your client also attachs to both VLANs, you should not see the issue I mentioned. Also, if you do have this issue, you won't be able to connect your server at all (without the workaround). Since you see intermittent pre-login error, I believe your issue is different.

    What's the exact error message?

  • removing the multiple vlans has cured the problem, thanks.

    exact error message was: The SQLBrowser service was unable to process a client request.

  • Users often see this error message when connection to a SQL Server and don't know where to start to solve

  • So is this a bug that will be corrected in a Service pack at some point? I am hoping it is not as I have the same issue in an active active cluster each with a named instance.

    Thanks,

    Mike

  • You can consider this as a bug for us, but its origin is the nature of UDP. We hope to fix this in coming releases. For now, you can either use the workaround I mentioned, or define an exception between your server and client on your client Vista machine. Thanks.

  • I'm trying to register a database engine (named instance of a cluster) in SQL Server Management Studio and I can't connect.  I also can't connect if I drop down to osql.  I've added exceptions to my firewall for ports and programs but still no luck.  If I try to register or connect to the default instance of the cluster I can connect fine.  It's only the named instance which I'm having problems with.  It also only happens on one PC.  I hate to refresh the PC if there's a workaround.  Any ideas?

Page 1 of 5 (65 items) 12345