Unexpected Behavior Setting Up Firewall Rules with Clustered SQL Server Instances

If you have ever try to setup firewall rules in your network to manage access to SQL Server you have probably found a somehow weird behavior when connecting to a clustered instance. In this particular case my customer was finding that the rules she had configured on her Windows firewall were working fine when connecting to a SQL Server 2008 cluster, but were failing when connecting to a SQL Server 2005 cluster instead.

This behavior is indeed clearly documented on KB944390 article but in our case it took some time to match what we were finding looking at the network traces to this specific Knowledge Base article. I am posting this information here so you can see what you can expect when running your prefer network sniffer while tracing down this problem.

If you want to go ahead and try this to yourself, grab you favorite network sniffing utility, a PC with SQL Server Management Studio (SSMS) and locate a SQL Server 2005 and SQL Server 2008 clusters on your network. The test only includes opening SSMS on your client and capture some network traces so there is no risk for the clusters on a the production system. In my case I used Microsoft Network Monitor  because I find easy to interpret the SQL Server packet information using the available SQL Server parser; you can find and download this parser here.

Open Microsoft Network Monitor and make sure the correct SQL Server parser is selected, the new Network Monitor version makes this easier than ever:

image

Click on New Capture to open a new capture window and then click on the Start button to initiate the capture process. Then, open SSMS and connect to the clustered SQL Server 2008 instance using the default setting;, as soon as the connection completes, go back to Network Monitor and stop the capture process.

As you may know, when connecting to SQL Server named instances the SQL Browser service plays an important role giving to the clients the required information. SQL Browser runs as a local service on each standalone or cluster server where SQL Server is installed (you won’t find SQL Browser service on SQL Server 2000 because it was implemented as a stand-alone service starting with SQL Server 2005). Here is in summary what SQL Server browser does from the MSDN documentation:

“On startup, SQL Server Browser starts and claims UDP port 1434. SQL Server Browser reads the registry, identifies all SQL Server instances on the computer, and notes the ports and named pipes that they use. When a server has two or more network cards, SQL Server Browser will return all ports enabled for SQL Server. SQL Server 2005 and SQL Server Browser support ipv6 and ipv4. […]

When SQL Server 2000 and SQL Server 2005 clients request SQL Server resources, the client network library sends a UDP message to the server using port 1434. SQL Server Browser responds with the TCP/IP port or named pipe of the requested instance.”

The component in SQL Browser that takes care of all this is called the SQL Server Resolution Protocol or SSRP for short. When connecting to my SQL Server 2008 clustered and named instance I was able to see the expected pattern of network packets going in and out. First we can see my client “JORGEPC-SQL” trying to access the remote clustered instance “SQL2008\KATMAI64” , the first Browser request is identified by and UDP Ssrp packet to the destination port 1434:

image

Then we can see the response from the SQL Server Browser service using the same SSRP protocol; the response contains all the information required by the client to complete the connection successfully:

image

As you can see, the answer in the SSRP traffic comes from the SQL Server Network Name resource in the cluster (sql2008.contoso.lab), not from the cluster node where the SQL Browser service is running. This is expected because I am initiating the communication with the virtual server name in SQL Server, not with the cluster node.

See now what happens when I perform the very same action using connecting to a clustered SQL Server 2005 named instance called “SQL2005\YUKON” . Here you have the screen captures of the SSRP request from my client (first) and the SSRP response from the server (second):

image

 

image

As you can see from the second image the Browser service is providing the correct information to the client via SSRP, including the instance name, version, and TCP and NP information (you can also see from this frame there are two instances in the cluster, one for SQL Server 2000, represented by Version = 8 and another one for 2005, represented by Version = 9). However, the SSRP response in this case is not coming from the SQL Server Network Name “SQL2005” but from the cluster node where the instance is currently active, “JORGEPC-VM-CN1” .

If you are not using a firewall or if your firewall rules do not include to block all unknown incoming traffic this behavior will be unnoticed. However, if you block all unknown incoming traffic on your client and using of a firewall that does not support loose source mapping, the connection attempt will fail. You firewall will log the following information on this case:

2010-10-28 15:23:21 DROP UDP 192.168.0.50 192.168.0.40 1434 56039 122 - - - - - - - RECEIVE
2010-10-01 15:23:22 DROP UDP 192.168.0.50 192.168.0.40 1434 56040 122 - - - - - - - RECEIVE
2010-10-01 15:23:23 DROP UDP 192.168.0.50 192.168.0.40 1434 56041 122 - - - - - - - RECEIVE
2010-10-01 15:23:25 DROP UDP 192.168.0.50 192.168.0.40 1434 56042 122 - - - - - - - RECEIVE

The KB944390 mentioned before explain this situation in more detail:

“When you connect to the named instance, the client network library sends a User Datagram Protocol (UDP) request packet to the IP address of the named instance. Then, SQL Server Browser returns a UDP response packet that contains the information about the connection endpoints.
However, in the UDP response packet, the source IP address may not be the IP address to which the UDP request packet was sent. If the named instance is a failover cluster instance, the source IP address is the IP address of the physical computer instead of the virtual IP address of the remote server. If the remote server has multiple IP addresses, the source IP address may be any of the IP addresses that are assigned to the remote server”

You can learn more about loose source mapping in the this MSDN link.

These results depend not only on SQL Server but Windows versions. The network traces will shown the expected behavior (incoming RRSP traffic source is the same as the destination) in SQL Server 2008 SP1 and above when installed on Windows Server 2008 or above; other SQL Server and Windows combination below these two versions will suffer from the same problem, for example:

  • SQL Server 2008 (RTM) on Windows Server 2008
  • SQL Server 2008 SP1/SP2 on Windows Server 2003 cluster
  • SQL Server 2005 on Windows Server 2008

For the SQL Server and Windows Server versions affected by this problem, 3 different workarounds are provided on the abovementioned KB944390 article.