CSS SQL Server Engineers

This is the official team Web Log for Microsoft Customer Service and Support (CSS) SQL Support. Posts are provided by the CSS SQL Escalation Services

Error 18056 can be unwanted noise in certain scenarios

Error 18056 can be unwanted noise in certain scenarios

Rate This
  • Comments 8

I saw a lot of hits on the web when I searched for the Error message 18056 with State 29. I even saw two Microsoft Connect items for this issue filed for SQL Server 2008 instances:

http://connect.microsoft.com/SQL/feedback/ViewFeedback.aspx?FeedbackID=468478

http://connect.microsoft.com/SQLServer/feedback/details/540092/sql-server-2008-sp1-cu6-periodically-does-not-accept-connections

So, I thought it was high time that we pen a blog post on when this message can be safely ignored and when it is supposed to raise alarm bells. Before I get into the nitty-gritty details, let me explain under what condition is 18056 raised with the state = 29.

Most applications today make use of connection pooling to reduce the number of times a new connection need to be opened to the backend database server. When the client application reuses the connection pool to send a new request to the server, SQL Server performs certain operations to facilitate the connection reuse. During this process (we shall call it Redo Login for this discussion) if any exception occurs, we report an 18056 error. The state numbers like the famous 18456: Login Failed error message give us more insight into why the Redo Login task fails. State 29 occurs when there is an Attention received from the client while the Redo Login code is being executed. This is when you would see the message below which has plagued many a mind till date on SQL Server 2008 instances:

2009-02-19 04:40:03.41 spid58 Error: 18056, Severity: 20, State: 29.

2009-02-19 04:40:03.41 spid58 The client was unable to reuse a session with SPID 58, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Is this a harmful message?

The answer that always brings a smile to my face: It depends! The dependency of this error message being just plain noise to something that should send all the admins in the environment running helter-skelter can be summarized in one line.

If the above error message (note that the state number should reflect 29) is the only message in the SQL Server Errorlog along with no other errors noticed in the environment (connectivity failures to the SQL instance in question, degraded performance, high CPU usage, Out of Memory errors), then this message can be treated as benign and safely ignored.

Why is this message there?

Well our intentions here were noble and we didn’t put the error message out there to create confusion. This error message is just reporting that a client is reusing a pooled connection and when the connection was reset, the server received an attention (in this case, a client disconnect) during the connection reset processing on the server side. This could be due to either a performance bottleneck on the server/environment or a plain application disconnect. The error message is aimed at helping in troubleshooting the first category of problems. If you do see some other issues at the same time though, these errors may be an indicator of what is going on at the engine side.

What should you do when you see your Errorlog bloating with these error messages?

a.       The foremost task would be to scan the SQL Errorlog and determine if this error message is accompanied before/after by some other error message or warning like Non-yielding messages, Out of Memory (OOM) error message (Error 701, Failed Allocate Pages etc.).

b.      The next action item would be to determine if there is high CPU usage on the server or any other resource bottleneck on the Windows Server. Windows Performance Monitor (Perfmon) would be your best friend here.

c.       Lastly, check if the Network between the Client and Server is facing any latency issues or if network packets drops are occurring frequently. A Netmon trace should help you here.

 

Tejas Shah

Escalation Engineer - Microsoft

Leave a Comment
  • Please add 8 and 6 and type the answer here:
  • Post
  • So has a fix been found for this? We are experiencing this problem, this error message is followed by extended periods of high CPU which we are only able to rectify by restarting the server. The only fix I've managed to find is to set the maxdop option to 1 for the server.

  • How about state 23? This is my case...

  • State 23 typically means true login failure (access verification failure). Although we would need to look at exact error to comment on that.

    Note: as it is mentioned in the blog entry, this is an indicator or symptom of another potential problem. So if this is accompanied by some other issues (such as high CPU), it would be good to find out the reason for high CPU and troubleshoot that. When that is taken care of, the problem symptom (as such error 18056) may go away along with it.

    Tejas

  • Thanks Tejas

    Brilliant explanation, well written and easy to understand. Thankyou

  • Hi,

    We've been receiving a lot of these messages lately as well except the issues are that clients can't connect to the server due to the timeouts that we have in place.   We know that CPU definitely isn't a bottleneck nor is memory.  (Our box has 64 cores and 512GB of RAM)   Also, the gig link between the SQL server and the web servers is far from saturated.

    Any help would be greatly appreciated.  

    Thanks,

    Glenn

  • Hello,

    We have been seeing the same thing Glenn reported as well.  Everything else seems to be running fine (resources, network, etc.) Our clients just have a problem connecting to SQL when we see these errors in the error log.

    Any help would be greatly appreciated.

    Thanks,

    Jason

  • Hello,

    We have been seeing the same thing Glenn reported as well.  Everything else seems to be running fine (resources, network, etc.) Our clients just have a problem connecting to SQL when we see these errors in the error log.

    Any help would be greatly appreciated.

    Thanks,

    Jason

  • Glenn & Jason,

    SQL Server can easily consume all of its buffer upon a 512GB RAM box, even if the sum of the database sizes is smaller. Other potential consumers of the buffer pool are stored procdure cache, permissions cache, and temporary tables (just to name a few). Once or if SQL Server's buffer pool is consumed (and if the buffer cannot be flushed), disk IO will increase. Or the lack of an index (or inaccurate statistics) can cause a large table to be scanned into the buffer. There are many different causes of performance problems. I suggest becoming familiar with SQLDiag (see Books Online), readtrace (support.microsoft.com/.../944837, with its useful help file) and SQL Nexus (http://sqlnexus.codeplex.com). Beware that a perfmon log should be collected concurrently with the server-side trace(s) (for readtrace or SQL Nexus to consume, later), and that two or more polls of various dm_os_% DMVs should also be collected concurrently.

Page 1 of 1 (8 items)