Topics from the Microsoft SQL Server Protocols team - Netlibs, TDS, SQL Browser, etc.
Sporadic “Connection forcibly closed by remote host” errors with SQL Server connections can be very difficult to troubleshoot and resolve. This blog post is targeted at diagnosing TOE/Chimney issues that may lead to this client error message. Chimney is a feature introduced in the Windows Server 2003 Scalable Networking Pack, which was included in Windows Server 2003 SP2. Chimney increases network performance when used with a network card that implements TOE (TCP/IP Offload Engine), a hardware implementation of the TCP/IP stack.
The following are the symptoms to look for:
· The client connection is sporadically failing with the message: “TCP Provider: Connection forcibly closed by remote host.” The client connection may, in addition, sometimes fail with the message: “General network error”.
· There are no corresponding network-related error messages in the SQL Server instance’s ERRORLOGs. Normally, the “Connection forcibly closed by remote host” message on the client indicates that an error occurred on the server which is deemed severe enough to close the connection; in that case, the server would log an error message explaining why the connection was closed. An example error message for this would be Error 17828: “The prelogin packet used to open the connection is structurally invalid; the connection has been closed. Please contact the vendor of the client library.” However, if the issue is in the networking hardware, such as a TOE-related issue, there will be no message in the SQL Server instance’s ERRORLOGs for this connection closure, since the server is not intentionally closing the connection. Therefore, check the SQL Server ERRORLOG for an absence of any corresponding network-related error messages.
· There is no other client killing the first client’s connection. In addition to potential network hardware causes, the “Connection forcibly closed” message can also appear with no corresponding server ERRORLOG message if the client’s connection is being killed by a different client. Examine the SQL Server ERRORLOG for KILL statements; if there are none, then no other client is killing SQL Server connections.
If all three of these symptoms are appearing, your problem is likely due to a faulty piece of network hardware, possibly due to TOE/Chimney.
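The ERRORLOG checks above can be sketched as a small script. This is only a rough illustration, not a supported tool: the two regular expressions, the UTF-16 default encoding, and the function names are assumptions made for the example, and you should adjust the patterns to the messages your SQL Server version actually writes.

```python
import re
from pathlib import Path

# Assumed patterns (not an exhaustive list): a server-side connection-closure
# error such as Error 17828, and the text logged when one session KILLs another.
NETWORK_ERROR = re.compile(r"Error: 17828\b|connection has been closed", re.IGNORECASE)
KILL_EVENT = re.compile(r"was killed by", re.IGNORECASE)


def scan_errorlog_lines(lines):
    """Return (network_hits, kill_hits): the lines matching each pattern."""
    network_hits, kill_hits = [], []
    for line in lines:
        text = line.rstrip("\r\n")
        if KILL_EVENT.search(text):
            kill_hits.append(text)
        elif NETWORK_ERROR.search(text):
            network_hits.append(text)
    return network_hits, kill_hits


def hardware_suspected(lines):
    """True when the log shows neither a server-side closure nor a KILL."""
    network_hits, kill_hits = scan_errorlog_lines(lines)
    return not network_hits and not kill_hits


def read_errorlog(path, encoding="utf-16"):
    """Read an ERRORLOG file; the UTF-16 default is an assumption."""
    return Path(path).read_text(encoding=encoding, errors="replace").splitlines()
```

If `hardware_suspected(read_errorlog(path))` returns True for the time window in which the client failed, the second and third symptoms hold, and network hardware becomes the more likely suspect.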
To test if TOE/Chimney is the source of your problem, you can disable it and see if the problem goes away. You should do this for BOTH the client and server, since TOE/Chimney on either machine, or both, could be the cause of the issue. To disable Chimney, run this command (if on Windows Vista or Windows Server 2008, run it at an elevated command prompt):
netsh int ip set chimney DISABLED
This command does NOT require a reboot. If you have these symptoms and running this command doesn’t fix the problem, then the cause is likely elsewhere in your network hardware, and you should investigate that next. This KB article should give you some leads on how to begin network troubleshooting: http://support.microsoft.com/kb/325487
Dan Benediktson
SQL Server Protocols
Disclaimer: This posting is provided "AS IS" with no warranties, and confers no rights
You nicely explain which symptoms to look out for, but I'm still looking for the reason why this happens. What exactly causes the connections to drop (forced closure)? Is the TOE implementation that buggy, or are there certain scenarios that force a reset of the TCP connections?
I actually need to update this blog post, so thanks for reminding me! The root cause of this exact problem is a buggy implementation of keepalive in the NIC driver. Fortunately, in the latest driver version that is available for the affected NICs, that implementation has been fixed, so now rather than turning off this feature entirely, you can update your NIC drivers and get the benefits of TOE.
Hope this helps,
Thanks for the very fast response. In our situation we see that behavior between two servers that are both running on VMware ESX. Would the cause you mention also apply to a virtualization environment like this? Is there a KB article that gives an overview of which NICs are affected?
I believe the affected NICs are covered in this KB article: http://support.microsoft.com/kb/942861
I am not positive that this is possible in a VMware environment, although I believe it should be possible if one or both of the host machines are using one of these NICs.
Note that this is not by any stretch the only possible cause of "Connection forcibly closed", though - malfunctioning network hardware could give exactly the same symptoms, and used to be the main source of these sorts of error messages before this problem came up. TCP Chimney is definitely the first place to look, though, both because it is a common problem and because it is easier to troubleshoot than other kinds of misbehaving hardware (switch, NIC, etc.).
An existing connection was forcibly closed by the remote host
A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) (Microsoft SQL Server, Error: 10054)
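When collecting client-side failures like the one quoted above, it can help to classify them automatically. Here is a minimal sketch, assuming the message text has the same shape as that example. The function name and the returned dictionary keys are made up for illustration; 10054 is the Winsock code for a connection reset by the peer.

```python
import re

WSAECONNRESET = 10054  # Winsock: connection reset by peer


def classify_client_error(message):
    """Extract error codes and a rough phase from a client error string."""
    lowered = message.lower()
    # Collect every "error: N" number in the text, whatever its capitalization.
    codes = [int(n) for n in re.findall(r"error:\s*(\d+)", lowered)]
    reset_by_peer = (
        WSAECONNRESET in codes
        or "forcibly closed by the remote host" in lowered
    )
    # Only the pre-login phase is recognized here; anything else is "unknown".
    phase = "pre-login" if "pre-login handshake" in lowered else "unknown"
    return {"codes": codes, "reset_by_peer": reset_by_peer, "phase": phase}
```

Counting how many failures are resets, and in which phase they occur, makes it easier to compare results before and after a change such as disabling Chimney or updating a NIC driver.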
We had chronic issues with this type of error; it was intermittent and extremely frustrating. It turned out that a remote SNMP query was running against the firewall that our database queries had to pass through. We ran a traceroute tool and found 100% packet loss for a split second every 4 or so minutes, which gave us our first real clue. Once the SNMP query was found and disabled, all was well. It was a Cisco router, and I do not know if a permanent solution was found.
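A regular interval like the one described above is a strong hint that something scheduled (a poller, a keepalive timer) is involved. The following is a rough sketch for checking whether recorded failure times are evenly spaced. The function names, the 15% tolerance, and the minimum of four events are arbitrary choices for the example, and timestamps are assumed to be plain seconds.

```python
from statistics import median


def failure_intervals(timestamps):
    """Gaps, in seconds, between consecutive failure timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, tolerance=0.15, min_events=4):
    """True when every gap lies within `tolerance` of the median gap."""
    if len(timestamps) < min_events:
        return False  # too few failures to judge
    gaps = failure_intervals(timestamps)
    typical = median(gaps)
    if typical <= 0:
        return False  # duplicate timestamps; no meaningful period
    return all(abs(gap - typical) <= tolerance * typical for gap in gaps)
```

For example, failures logged at roughly 0, 240, 482, 718 and 960 seconds would be flagged as periodic with a gap of about four minutes, which is the kind of pattern worth correlating with scheduled network activity.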
Can this error be related to a hack?