SQL in Stockholm

SQL Server thoughts from the support team in Stockholm

SQL Server troubleshooting thoughts from Graham Kent, (ex) Senior Support Escalation Engineer at Microsoft Sweden.

  • SQL in Stockholm

    Non-yielding IOCP Listener - Stack Dump analysis

    • 2 Comments

     

    Here's something I was looking at yesterday which is worth a quick look. It's quite a common scenario and one that's relatively easy to address. Your SQL Server 2005 instance suddenly throws the following to the error log (apparently without warning):

    2008-09-09 08:30:44.810 Server       Using 'dbghelp.dll' version '4.0.5'
    2008-09-09 08:30:44.960 Server       **Dump thread - spid = 0, PSS = 0x0000000000000000, EC = 0x0000000000000000
    2008-09-09 08:30:45.000 Server       ***Stack Dump being sent to F:\MSSQL.1\MSSQL\LOG\SQLDump0001.txt                 
    2008-09-09 08:30:45.000 Server       * *******************************************************************************
    2008-09-09 08:30:45.010 Server       *                                                                                
    2008-09-09 08:30:45.010 Server       * BEGIN STACK DUMP:                                                              
    2008-09-09 08:30:45.010 Server       *   09/09/08 08:30:45 spid 0                                                     
    2008-09-09 08:30:45.010 Server       *                                                                                
    2008-09-09 08:30:45.010 Server       * Non-yielding IOCP Listener                                                     
    2008-09-09 08:30:45.010 Server       *                                                                                
    2008-09-09 08:30:45.010 Server       * *******************************************************************************
    2008-09-09 08:30:45.010 Server       * -------------------------------------------------------------------------------
    2008-09-09 08:30:45.010 Server       * Short Stack Dump                                                               
    2008-09-09 08:30:45.320 Server       Stack Signature for the dump is 0x000000000000020B                               
    2008-09-09 04:47:06.180 spid3s       LazyWriter: warning, no free buffers found.
                 
                            

    Following this several hundred lines of output similar to the following format are shown

    2008-09-09 08:31:08.750 spid3s      
    Memory Manager
        VM Reserved = 16963064 KB
        VM Committed = 16942872 KB
        AWE Allocated = 0 KB
        Reserved Memory = 1024 KB
        Reserved Memory In Use = 0 KB 
    2008-09-09 08:31:08.750 spid3s      
    Memory node Id = 0
        VM Reserved = 16957304 KB
        VM Committed = 16937200 KB
        AWE Allocated = 0 KB
        SinglePage Allocator = 2519016 KB
        MultiPage Allocator = 39632 KB 
    2008-09-09 08:31:08.750 spid3s      
    MEMORYCLERK_SQLGENERAL (Total)
        VM Reserved = 0 KB
        VM Committed = 0 KB
        AWE Allocated = 0 KB
        SM Reserved = 0 KB
        SM Committed = 0 KB
        SinglePage Allocator = 48576 KB
        MultiPage Allocator = 5136 KB


    You may well recognize that this latter output is basically the same as you would get if you run:
    DBCC MEMORYSTATUS

    You'll also have noted that a small dump file has been produced in the default error log directory of this SQL instance (the path of which is noted in the error log output above).

    In normal circumstances you can submit this dump file to us in support, and we can take a look at it and, all being well tell you what the problem is. The thing you have to remember here is that the production of the dump file does not necessarily mean that you have encountered a bug in SQL Server (as some people often believe). In the more recent versions of SQL there are innumerable scenarios where we control and catch certain scenarios well known to us and produce a dump file automatically.

    This particular scenario was caused by inappropriate configuration of SQL Server memory settings, and should be easily rectified, so I thought it a good example to show why.

    Although there is little that can normally be gleaned from debugging a SQL Server dump in the public domain, this one does contain a few hints. We don't publish private symbols for SQL Server, meaning that the amount you can do yourself with a dump file is quite limited (since you only have access to retail symbols), however you can look at the thread stacks and see if anything of note can be seen. Anyone who has read Ken Henderson's fantastic book about SQL Server internals will have used these techniques before. Sometimes it's just interesting to look a little deeper after all.

    I'll assume that you're already familiar with installing and configuring the debugger, but if you're not here's a good place to start. Debugging Tools and Symbols: Getting Started. This example just uses the retail symbol server address from this page:

    SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

    If you open this dump, you'll find yourself starting with the context set to the thread that generated the dump, in this case:

    0:004> kn
    # Child-SP          RetAddr           Call Site
    00 00000000`0474de58 00000000`77d704bf ntdll!NtWaitForSingleObject+0xa
    01 00000000`0474de60 00000000`016be0f9 kernel32!WaitForSingleObjectEx+0x130
    02 00000000`0474df00 00000000`016bdaeb sqlservr!CDmpDump::DumpInternal+0x4a3
    03 00000000`0474dff0 00000000`01c7c7b3 sqlservr!CDmpDump::Dump+0x3b
    04 00000000`0474e040 00000000`01d3af67 sqlservr!CImageHelper::DoMiniDump+0x413
    05 00000000`0474e1a0 00000000`025f8cf5 sqlservr!stackTrace+0x6e7
    06 00000000`0474f6d0 00000000`01a3b025 sqlservr!SQL_SOSNonYieldIOCPCallback+0x1ef
    07 00000000`0476f7e0 00000000`018662b6 sqlservr!SOS_OS::ExecuteNonYieldIOCPCallbacks+0x44
    08 00000000`0476f830 00000000`010b9628 sqlservr!alloca_probe+0x220815
    09 00000000`0476f990 00000000`029002c9 sqlservr!SchedulerMonitor::Run+0x164
    0a 00000000`0476fac0 00000000`010079fe sqlservr!SchedulerMonitor::EntryPoint+0x9
    0b 00000000`0476faf0 00000000`01008362 sqlservr!SOS_Task::Param::Execute+0xee
    0c 00000000`0476fc00 00000000`01007da1 sqlservr!SOS_Scheduler::RunTask+0xc9
    0d 00000000`0476fc90 00000000`014787a7 sqlservr!SOS_Scheduler::ProcessTasks+0xb4
    0e 00000000`0476fd00 00000000`014789ca sqlservr!SchedulerManager::WorkerEntryPoint+0xe7
    0f 00000000`0476fda0 00000000`01478911 sqlservr!SystemThread::RunWorker+0x59
    10 00000000`0476fde0 00000000`014795d9 sqlservr!SystemThreadDispatcher::ProcessWorker+0x130
    11 00000000`0476fe80 00000000`781337a7 sqlservr!SchedulerManager::ThreadEntryPoint+0x128
    12 00000000`0476ff20 00000000`78133864 msvcr80!_callthreadstartex+0x17
    13 00000000`0476ff50 00000000`77d6b6da msvcr80!_threadstartex+0x84

    This is the thread that is actually doing the monitoring and generating the dump. You can see in frame 02 the name of the function doing this. If you had the private symbols at this point, you can look through some of the earlier frames and determine which thread caused this condition to occur, but I can't show that here :-)

    The thread in question here was actually thread 8, we can switch to this and take a look

    0:008> kn 50
    # Child-SP          RetAddr           Call Site
    00 00000000`04f6f008 00000000`01750bd0 sqlservr!NumaNode::RampUp
    01 00000000`04f6f010 00000000`012f1523 sqlservr!alloca_probe+0xbf420
    02 00000000`04f6f110 00000000`0103f136 sqlservr!BPool::Steal+0x3ec
    03 00000000`04f6f1c0 00000000`0103f32e sqlservr!SQLSinglePageAllocator::AllocatePages+0x25
    04 00000000`04f6f200 00000000`0103f1ce sqlservr!MemoryNode::AllocatePagesInternal+0xce
    05 00000000`04f6f4a0 00000000`0103f7ba sqlservr!MemoryClerkInternal::AllocatePages+0x6e
    06 00000000`04f6f500 00000000`010d4bb4 sqlservr!CMemThread::PbGetNewPages+0x6a
    07 00000000`04f6f540 00000000`01068bba sqlservr!CFixedPageMgr::PbAllocate+0xcd
    08 00000000`04f6f5b0 00000000`01004d37 sqlservr!CMemThread::Alloc+0x1d7
    09 00000000`04f6f640 00000000`0147dc3a sqlservr!operator new+0x26
    0a 00000000`04f6f680 00000000`01045589 sqlservr!SOS_Node::EnqueueTaskDirectInternal+0x587
    0b 00000000`04f6f730 00000000`01045b03 sqlservr!SOS_Node::EnqueueTask+0x49
    0c 00000000`04f6f780 00000000`01046581 sqlservr!CNetConnection::EnqueuePacket+0x8f
    0d 00000000`04f6f890 00000000`01045cab sqlservr!TDSSNIClient::ReadHandler+0x171
    0e 00000000`04f6f900 00000000`010459bc sqlservr!SNIReadDone+0xb7
    0f 00000000`04f6f960 00000000`010079fe sqlservr!SOS_Node::ListenOnIOCompletionPort+0x251
    10 00000000`04f6faf0 00000000`01008362 sqlservr!SOS_Task::Param::Execute+0xee
    11 00000000`04f6fc00 00000000`01007da1 sqlservr!SOS_Scheduler::RunTask+0xc9
    12 00000000`04f6fc90 00000000`014787a7 sqlservr!SOS_Scheduler::ProcessTasks+0xb4
    13 00000000`04f6fd00 00000000`014789ca sqlservr!SchedulerManager::WorkerEntryPoint+0xe7
    14 00000000`04f6fda0 00000000`01478911 sqlservr!SystemThread::RunWorker+0x59
    15 00000000`04f6fde0 00000000`014795d9 sqlservr!SystemThreadDispatcher::ProcessWorker+0x130
    16 00000000`04f6fe80 00000000`781337a7 sqlservr!SchedulerManager::ThreadEntryPoint+0x128
    17 00000000`04f6ff20 00000000`78133864 msvcr80!_callthreadstartex+0x17
    18 00000000`04f6ff50 00000000`77d6b6da msvcr80!_threadstartex+0x84
    19 00000000`04f6ff80 00000000`00000000 kernel32!BaseThreadStart+0x3a

    So again, just by looking at the function names, we can make a guess that some internal memory routines related to the buffer pool are going on here, particularly from frames 02 and 03. If you ever attach a debugger into a running SQL Server instance (obviously never in production!) you'd see functions like this occurring regularly as SQL Server manages the buffer pool pages.

    There's not much more that can be seen from this stack, so we'll take a look through the rest of the threads, in this case by running

    ~* kn

    When browsing through these threads, we notice lots that also appear to be involved in memory related events, just one example of which is

      53  Id: 554.a6c Suspend: 1 Teb: 000007ff`fffd9000 Unfrozen
    # Child-SP          RetAddr           Call Site
    00 00000000`1521e140 00000000`012f1523 sqlservr!BPool::ReplenishFreeList+0x10d
    01 00000000`1521e240 00000000`0103f136 sqlservr!BPool::Steal+0x3ec
    02 00000000`1521e2f0 00000000`0103f32e sqlservr!SQLSinglePageAllocator::AllocatePages+0x25
    03 00000000`1521e330 00000000`0103f1ce sqlservr!MemoryNode::AllocatePagesInternal+0xce
    04 00000000`1521e5d0 00000000`0103f676 sqlservr!MemoryClerkInternal::AllocatePages+0x6e
    05 00000000`1521e630 00000000`0103f737 sqlservr!CVarPageMgr::PviNewVarPage+0x56
    06 00000000`1521e6a0 00000000`0100ca58 sqlservr!CVarPageMgr::PbAllocate+0x3ce
    07 00000000`1521e710 00000000`01004d37 sqlservr!CMemObj::Alloc+0xcc
    08 00000000`1521e780 00000000`0109b203 sqlservr!operator new+0x26
    09 00000000`1521e7c0 00000000`014607a1 sqlservr!operator new[]+0x23
    0a 00000000`1521e800 00000000`0145d4c8 sqlservr!CErrorReportingManager::CwchFormatToBuffer+0x180
    0b 00000000`1521e970 00000000`0128d202 sqlservr!CErrorReportingManager::CwchCallFormatToBuffer+0x18
    0c 00000000`1521e9b0 00000000`0128d0f8 sqlservr!CLoginClientInfo::CLoginClientInfo+0x71
    0d 00000000`1521ea30 00000000`01285265 sqlservr!IssueLoginSuccessReport+0x67
    0e 00000000`1521eb40 00000000`0128494f sqlservr!login+0x584
    0f 00000000`1521f480 00000000`01285610 sqlservr!process_login+0xee
    10 00000000`1521f840 00000000`010079fe sqlservr!process_commands+0x40d
    11 00000000`1521faf0 00000000`01008362 sqlservr!SOS_Task::Param::Execute+0xee
    12 00000000`1521fc00 00000000`01007da1 sqlservr!SOS_Scheduler::RunTask+0xc9
    13 00000000`1521fc90 00000000`014787a7 sqlservr!SOS_Scheduler::ProcessTasks+0xb4

    We note that sqlservr!BPool::ReplenishFreeList is common to a lot of the threads, and other BPool functions crop up frequently as well.

    At this point there is not much else to be gleaned from the dump without private symbols, however it's clear that there appears to be a significant number of threads waiting on what appear to be internal memory routines. In this case, we also have the fact that SQL Server itself has detected that it believes that memory allocations were the cause of the problem, which is why it has dumped out the DBCC MEMORYSTATUS to help us. The error log specifically states that it has "no free buffers".

    If you were to look through the whole of the DBCC MEMORYSTATUS and decode some of the latter items using this article as a guide

    http://support.microsoft.com/kb/907877

    then you would note that the buffer pool has been recently grown, and it is issuing a notification that it wished to grow further.

    So what next, is the question; you've noted that SQL Server appears to be having memory management problems, so maybe you think that you should add some more memory to alleviate this? Well in a word - "NO".

    The answer to this problem is that we have to look at the configuration of the hardware, and the configuration of SQL Server first (which in any troubleshooting scenario is always a good idea).

    What we saw here at the time of the dump was:

    MemoryLoad = 95%                    
    Total Physical = 16382 MB           
    Available Physical = 792 MB    

    And in fact over the preceding minutes before the dump we saw even lower memory availability than this, with physical utilization up to 98%. (this is something that we can extract from the dump with private symbols).

    So we do appear short of physical memory, but when we run sp_configure on the server, we see this:

    name                                minimum     maximum     config_value run_value
    ----------------------------------- ----------- ----------- ------------ -----------
    max server memory (MB)              16          2147483647  2147483647   2147483647

    Therefore the max server memory setting has not been set. Although SQL will dynamically attempt to manage its memory, there can be circumstances in large memory installations where if it runs unlimited you can get into situations where it ends up fighting with other processes or the OS for memory, and situations like this can occur. This is why we recommend running a "max server memory" setting in this type of scenario, to limit the size of the buffer pool to an amount lower than the total memory on the server itself. The most referenced article on this is probably here

    http://blogs.msdn.com/slavao/archive/2006/11/13/q-a-does-sql-server-always-respond-to-memory-pressure.aspx

    We see numerous cases revolving around this scenario and derivatives of it, and many can simply be avoided by limiting the memory of the SQL Server buffer pool with this setting, so that you don't ever get into this memory bound scenario. So the bottom line here is change the max server memory setting to something around the 13000 mark, and then monitor your memory usage to see if you're still encountering this type of problem, making further adjustments if necessary.

    If you have any questions surrounding this topic, feel free to post them.

  • SQL in Stockholm

    Kayaking and tempdb file placement

    • 0 Comments

    So, no posts for 10 weeks, but that's because sometimes there are other things going on in Stockholm as opposed to SQL Server support :-) Here's what I've been doing over the summer....

     Kayaking on Brunnsviken

    I've been fortunate enough to have several weeks paternity leave off to spend with the family and enjoy the Swedish summer. But I'm back to the technical stuff this week and of course to the release of SQL 2008.

    This week I've been working on some proactive advisory cases with Swedish customers, reviewing their installations for best practices and providing some tips and tricks. We do this quite frequently for Premier customers and it's good for all involved. Customers get some of their servers checked out in advance of any potential technical problems, and I get to stop things happening in advance, as opposed to my normal "fix it urgently" role :-)

    I looked back on my past few reports of this nature today, and I thought I'd just mention the one point that has come up in every review that I've done this year. This may just be co-incidence but maybe word isn't spread as well as it could be.

    In SQL 2005 there were significant numbers of enhancement to the implementation of tempdb, both from an implementation and a supportability standpoint. We now recommend that by default you create one tempdb data file per visible scheduler within the SQLOS, or in other words (for most people) one per logical processor core that you expose or affinitize to SQL Server. Then test and benchmark the results and do some fine tuning if you need be.

    Remember, you can look at the schedulers that have been created by looking in the following DMV

    select *

    from sys.dm_os_schedulers

    Here are the 2 main articles that I always point customers to on this tempdb issue:

    http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx

    http://msdn.microsoft.com/en-us/library/ms175527(SQL.90).aspx

    I'll be back in coming weeks with "proper troubleshooting".

Page 1 of 1 (2 items)