• Ntdebugging Blog

    Detecting and automatically dumping hung GUI-based Windows applications

    Written by Jeff Dailey 

    My name is Jeff, and I’m an Escalation Engineer on the CPR Platforms team.  Following Tate’s blog on scoping hangs, I’d like to discuss a common category of hangs and some creative ways to track them down.  I will be providing a couple of labs to go with this post that you can run and debug on your machine, and I will also show you how to write a hang detection tool that dumps processes that become unresponsive.  In addition, I will be writing several more blog entries about the various hang scenarios contained in the badwindows.zip that is included with this blog.

    GUI Hangs

    Sometimes a GUI (Graphical User Interface) based Windows application, that is to say one that uses windows, buttons, scroll bars, etc., may stop responding (it shows a Not Responding status in Task Manager).  When this happens, in most cases the rest of the operating system continues to function normally.  However, the application does not repaint or respond to mouse clicks or keystrokes.  Sometimes these types of problems may be transient.   Your app may hang once or twice a day for 10-30 seconds.  In other cases it may hang for long periods of time or never recover.

    To get a better understanding of this scenario, it’s important to understand that all GUI-based Windows applications work by passing messages to one another via message queues.   Each Windows application typically has a single main thread that is responsible for processing these messages.   Though the application may be multithreaded, there will typically be one thread that processes messages.  This functionality is normally implemented in WinMain.    This thread does different tasks based on the messages it receives.   It could open a dialog, create another thread, take actions based on a mouse click, or even send a message to another Windows application or applications.

    When your application stops responding, it’s generally because this thread has made a blocking / synchronous call that takes too long.  If the thread is unable to pull incoming messages from the OS, the application will appear to be hung.   Most of the time, once you have a dump of the process, you can switch to thread 0 by doing a ~0s in cdb or windbg.  Then do a kb and see what the thread is blocking on, or possibly looping in, that is preventing it from processing messages.  If thread 0 is not the thread processing your messages, you may be able to find it by dumping all the thread stacks with ~*kb.
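    To make the effect of a blocking call concrete, here is a minimal sketch in portable C.  It does not use any Windows API; SimMessage, worst_queue_delay and looks_hung are hypothetical names invented for this illustration.  It models a single thread draining its queue in order, using virtual time in milliseconds, and reports the longest time any message sat waiting before it was picked up.

```c
#include <stddef.h>

/* Hypothetical model of one queued message (not a Windows structure):
   when it was posted and how long its handler runs, in milliseconds. */
typedef struct {
    unsigned posted_ms;   /* time the message entered the queue */
    unsigned handler_ms;  /* time the UI thread spends handling it */
} SimMessage;

/* Simulate one UI thread draining the queue in order.
   Returns the longest time any message waited before being picked up. */
unsigned worst_queue_delay(const SimMessage *msgs, size_t count)
{
    unsigned now = 0;     /* virtual clock of the UI thread */
    unsigned worst = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        unsigned waited;
        if (now < msgs[i].posted_ms) {
            now = msgs[i].posted_ms;   /* thread was idle until it arrived */
        }
        waited = now - msgs[i].posted_ms;
        if (waited > worst) {
            worst = waited;
        }
        now += msgs[i].handler_ms;     /* nothing else runs during the handler */
    }
    return worst;
}

/* A window "looks hung" to a probe when a message would wait longer
   than the probe's timeout. */
int looks_hung(const SimMessage *msgs, size_t count, unsigned timeout_ms)
{
    return worst_queue_delay(msgs, count) > timeout_ms;
}
```

    In this model a single handler that blocks for eight seconds delays every message queued behind it, which is why a probe with a five second timeout would report the window as hung even though the thread is still alive.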

    The problem is you may not be able to fire up cdb or windbg to get a dump in time.  Or you may have a non-technical user community that does not know about debuggers or creating dumps.  In this case you can do what I sometimes do.

    Create your own tool.

    That’s right.  Sometimes I will see a scenario that warrants a slightly more elegant solution, and there is nothing more powerful than a determined engineer and a C compiler.

    What is required?  Visual Studio (The Express edition is free), Windows SDK (free), the debugger SDK (free with Debugging Tools Install), and a little knowledge of how Windows works.

    Let’s take a moment and think about what our ideal debug application will and won’t do.  

    1.     It will be easy to configure and use.

    2.     It will not break or negatively impact our operating system.  That is to say, it will not use much CPU or resources.

    3.     It will wait quietly for our desired condition (in this case a hung window) to manifest.

    4.     It will spring into action and gather the critical information about the state of our misbehaving application by creating a dump file without raising a fuss.

    5.     It will be multi user aware and not place dumps in insecure locations, this means the dumps will go in the user’s temp directory.

    6.     We will only collect a limited number of dump files so we do not fill up the hard drive.

    7.     It will notify the admin of a hang and dump event by putting a message in the event log. 

    8.     It will execute an optional binary when a hang is detected.

     

    Here are the details of how it will work.

    To keep things simple we will just create a console application.  The application will be called dumphungwindow.exe.  We will run in a loop until we collect the desired number of dump files.  We will wake up every so many seconds, get the topmost window, and loop through each top-level window, sending it a message with the SendMessageTimeout API.  If any window takes more than the timeout we specify to respond, we will create a dump of its process and log an event in the event log.

    The sample, dumphungwindow.zip (with badwindow.zip embedded within it), is available for download here.  It has the EXEs and the Visual Studio 2005 project with all of the source.  The tool project is called dumphungwindow, and the test application is in a project called badwindow.  The badwindow project contains a lab with three different hang scenarios that cause a window to stop responding.

    The command line options are as follows.

    C:\source\dumphungwindow\debug>dumphungwindow.exe /?
     This sample application shows you how to use the debugger
     help api to dump an application if it stop responding.

     This tool depends on dbghelp.dll, this comes with the Microsoft debugger tools on www.microsoft.com

     Please make sure you have the debugger tools installed before running this tool.
     This tool is based on sample source code and is provided as is without warranty.

     feel free to contact jeffda@microsoft.com to provide feedback on this sample application

     /m[Number] Default is 5 dumps

     The max number of dumps to take of hung windows before exiting.

     /t[Seconds]  Default is 5 seconds

     The number of seconds a window must hang before dumping it.

     /p[Seconds] Default is 0 seconds

     The number of seconds to pause when dumping before continuing scan.

     /s[Seconds] Default is 5 seconds.

     The scan interval in seconds to wait before rescanning all windows.

     /d[DUMP_FILE_PATH] The default is the user's temp folder

     The path or location to place the dump files.

     /e[EXECUTABLE NAME] This allows you to start another program if an application hangs
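    As a rough illustration of how the numeric switches above map onto settings, here is a small portable sketch.  ScanOptions and apply_option are hypothetical names used only for this example, not part of the tool's source.  The sketch assumes that each switch is a single letter directly followed by its value, and that /t, /p and /s are given in seconds and stored in milliseconds.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical settings holder; the field names are illustrative. */
typedef struct {
    int max_dumps;   /* /m: number of dumps before exiting        */
    int hang_ms;     /* /t: how long a window may be unresponsive */
    int pause_ms;    /* /p: pause after writing a dump            */
    int scan_ms;     /* /s: interval between scans                */
} ScanOptions;

/* Apply one argument such as "/t10" to opts.
   Returns 1 if the argument was a recognised numeric switch, else 0. */
int apply_option(ScanOptions *opts, const char *arg)
{
    int value;

    /* Only arguments that start with '/' or '-' are switches. */
    if (arg == NULL || (arg[0] != '/' && arg[0] != '-') || arg[1] == '\0') {
        return 0;
    }
    value = atoi(arg + 2);   /* the number follows the switch letter */

    switch (arg[1]) {
    case 'm': case 'M': opts->max_dumps = value;        return 1;
    case 't': case 'T': opts->hang_ms   = value * 1000; return 1;
    case 'p': case 'P': opts->pause_ms  = value * 1000; return 1;
    case 's': case 'S': opts->scan_ms   = value * 1000; return 1;
    default:            return 0;
    }
}
```

    With this sketch, passing /t10 sets the hang threshold to 10000 milliseconds, while an argument that does not start with a switch character, such as a program path, is ignored.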

    To run the tool, simply start dumphungwindow.exe.  The output should look something like this.

    C:\source\dumphungwindow\debug>dumphungwindow.exe
    Dumps will be saved in C:\Users\jeff\AppData\Local\Temp\
    scanning for hung windows

    ****

    To start our bad application, extract the badwindowapp.zip file contained in the dumphungwindows.zip.

     

    Then run badwindow.exe and from the menu select hang \ hang type 2.

     

    After a few seconds dumphungwindow should detect the unresponsive badwindow.exe and generate a dump.

     

    Hung Window found dumping process (7064) badwindow.exe

    Dumping unresponsive process

    C:\Users\jeffda\AppData\Local\Temp\HWNDDump_Day5_29_2007_Time10_36_38_Pid7064_badwindow.exe.dmp
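    The file name in the output above encodes the date, time, process ID and executable name.  The following portable sketch shows one way to assemble such a name with a bounded write.  build_dump_name is a hypothetical helper written for this illustration, and it uses snprintf rather than the unbounded sprintf.

```c
#include <stdio.h>

/* Hypothetical helper: writes a name of the form
   HWNDDump_Day<month>_<day>_<year>_Time<h>_<m>_<s>_Pid<pid>_<module>.dmp
   into out. Returns 1 on success, 0 if the name did not fit. */
int build_dump_name(char *out, size_t out_size,
                    int month, int day, int year,
                    int hour, int minute, int second,
                    unsigned long pid, const char *module)
{
    int n = snprintf(out, out_size,
                     "HWNDDump_Day%d_%d_%d_Time%d_%d_%d_Pid%lu_%s.dmp",
                     month, day, year, hour, minute, second, pid, module);

    /* snprintf returns the length it wanted to write; anything at or
       beyond out_size means the result was truncated. */
    return n >= 0 && (size_t)n < out_size;
}
```

    Checking the return value lets the caller skip the dump rather than write to a truncated path when the module name is unexpectedly long.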

     

     

    Please take a moment and review the source.  I’ve added comments that explain how we go about finding the hung window, and how we go about dumping it to a dump file you can open in windbg.

     

    Feel free to download and try out dumphungwindow against the badwindow.exe application.  Try looking at “hang type 1” first, as that will be the subject of my next blog.  Over the coming weeks I’ll be writing about hang types 1, 2 and 3 in the badwindow.exe application.   Once you have the dump file you can open it inside of windbg via File \ Open Crash Dump.  See Debugging tools for the install location.

     

    I hope you find this tool and sample helpful.

     

    Thank you Jeff-

     

     

    /********************************************************************************************************************

    Warranty Disclaimer

    --------------------------

    This sample code, utilities, and documentation are provided as is, without warranty of any kind. Microsoft further disclaims all

    implied warranties including without limitation any implied warranties of merchantability or of fitness for a particular  purpose.

    The entire risk arising out of the use or performance of the product and documentation remains with you.

     

    In no event shall Microsoft be liable for any damages whatsoever  (including, without limitation, damages for loss of business

    profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to

    use the sample code, utilities, or documentation, even if  Microsoft has been advised of the possibility of such damages.

    Because some states do not allow the exclusion or limitation of liability for consequential or incidental damages, the above

    limitation may not apply to you.

     

    ********************************************************************************************************************/

     

    #include <stdio.h>

    #include <windows.h>

    #include <dbghelp.h>

    #include <psapi.h>

     

    // don't warn about old school strcpy etc.

    #pragma warning( disable : 4996 )

     

    int iMaxDump=5;

    int iDumpsTaken=0;

    int iHangTime=5000;

    int iDumpPause=1;

    int iScanRate=5000;

    HANDLE hEventLog;

    char * szDumpLocation;

    int FindHungWindows(void);

    char * szDumpFileName = 0;

    char * szEventInfo = 0;

    char * szDumpFinalTarget = 0;

    char * szModName = 0;

    char * szAppname = 0;

    DWORD dwExecOnHang = 0;

     

    #define MAXDUMPFILENAME 1000

    #define MAXEVENTINFO 5000

    #define MAXDUMPFINALTARGET 2000

    #define MAXDUMPLOCATION 1000

    #define MAXAPPPATH 1000

    #define MAXMODFILENAME 500

    #define HMODSIZE 255

     

    int main(int argc, char * argv[])

    {

          int i;

          int z;

          size_t j;

          char scan;

     

          // check to make sure we have dbghelp.dll on the machine.

          if(!LoadLibrary("dbghelp.dll"))

          {

                printf("dbghelp.dll not found please install the debugger tools and place this tool in \r\nthe debugging tools directory or a copy of dbghelp.dll in this tools directory\r\n");

                return 0;

          }

     

          // Allocate a buffer for our dump location

          szDumpLocation = (char *)malloc(MAXDUMPLOCATION);

          {

                if(!szDumpLocation)

                {

                printf("Failed to alloc buffer for szdumplocation %d",GetLastError());

                return 0;

                }

          }

     

          szAppname = (char *)malloc(MAXAPPPATH);

          {

                if(!szAppname)

                {

                printf("Failed to alloc buffer for szAppname  %d",GetLastError());

                return 0;

                }

          }

     

          // We use temp path because if we are running under terminal server sessions we want the dump to go to each

          // user's secure location, i.e. their private temp dir.

          GetTempPath(MAXDUMPLOCATION, szDumpLocation );

         

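          for (z=1;z<argc;z++)   // start at 1, argv[0] is the program name

          {

                // only treat arguments that begin with '/' or '-' as options

                if (argv[z][0]!='/' && argv[z][0]!='-')

                {

                      continue;

                }

                switch(argv[z][1])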

                {

                case '?':

                      {

                      printf("\n This sample application shows you how to use the debugger \r\n help api to dump an application if it stop responding.\r\n\r\n");

                      printf("\n This tool depends on dbghelp.dll, this comes with the Microsoft debugger tools on www.microsoft.com");

                      printf("\n Please make sure you have the debugger tools installed before running this tool.");

                      printf("\n This tool is based on sample source code and is provided as is without warranty.");

                      printf("\n feel free to contact jeffda@microsoft.com to provide feedback on this sample application\r\n\r\n");

                      printf(" /m[Number] Default is 5 dumps\r\n The max number of dumps to take of hung windows before exiting.\r\n\r\n");

                      printf(" /t[Seconds]  Default is 5 seconds\r\n The number of seconds a window must hang before dumping it. \r\n\r\n");

                      printf(" /p[Seconds] Default is 0 seconds\r\n The number of seconds to pause when dumping before continuing scan. \r\n\r\n");

                      printf(" /s[Seconds] Default is 5 seconds.\r\n The scan interval in seconds to wait before rescanning all windows.\r\n\r\n");

                      printf(" /d[DUMP_FILE_PATH] The default is the user's temp folder\r\n The path or location to place the dump files.  \r\n\r\n");

                      printf(" /e[EXECUTABLE NAME] This allows you to start another program if an application hangs\r\n\r\n");

     

                      return 0;

                      }

                case 'm':

                case 'M':

                      {

                            iMaxDump = atoi(&argv[z][2]);

                            break;

                      }

                case 't':

                case 'T':

                      {

                            iHangTime= atoi(&argv[z][2]);

                            iHangTime*=1000;

                            break;

                      }

                case 'p':

                case 'P':

                      {

                            iDumpPause= atoi(&argv[z][2]);

                            iDumpPause*=1000;

                            break;           

                      }

                case 's':

                case 'S':

                      {

                            iScanRate = atoi(&argv[z][2]);

                            iScanRate*=1000;             

                            break;

                      }

                case 'd':

                case 'D':

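                      { // Dump file directory path

                            // bounded copy that leaves room for a trailing backslash and the terminator

                            strncpy(szDumpLocation,&argv[z][2],MAXDUMPLOCATION-2);

                            szDumpLocation[MAXDUMPLOCATION-2]='\0';

                            j = strlen(szDumpLocation);

     

                            if (j>0 && szDumpLocation[j-1]!='\\')

                            {

                                  szDumpLocation[j]='\\';

                                  szDumpLocation[j+1]='\0';

                            }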

                            break;

                      }

                case 'e':

                case 'E':

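                      { // application path to exec if hang happens

                            // bounded copy so a long argument cannot overrun the buffer

                            strncpy(szAppname,&argv[z][2],MAXAPPPATH-1);

                            szAppname[MAXAPPPATH-1]='\0';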

                            dwExecOnHang = 1;

                            break;

                      }

                }

          }

     

     

          printf("Dumps will be saved in %s\r\n",szDumpLocation);

          puts("scanning for hung windows\n");

     

          // Register as an event source so ReportEvent can write to the application log.

          hEventLog = RegisterEventSource(NULL, "HungWindowDump");

     

          i=0;

          scan='*';

          while(1)

          {

                if(i>20)

                {

                      if ('*'==scan)

                      {

                      scan='.';

                }

                      else

                      {

                      scan='*';

                }

                      printf("\r");

                i=0;

                }

                i++;

                putchar(scan);

                if(!FindHungWindows())

                {

                      return 0;

                }

                if (iMaxDump == iDumpsTaken)

                {

                      printf("\r\n%d Dumps taken, exiting\r\n",iDumpsTaken);

                      return 0;

                }

                Sleep(iScanRate);

          }

     

          free(szDumpLocation);

          return 0;

    }

     

    int FindHungWindows(void)

    {

    DWORD_PTR dwResult = 0;  // SendMessageTimeout writes a pointer-sized result

    DWORD ProcessId = 0;

    DWORD tid = 0;

    DWORD dwEventInfoSize = 0;

     

    // Handles

    HWND hwnd = 0;

    HANDLE hDumpFile = 0;

    HANDLE hProcess = 0;

    HRESULT hdDump = 0;

     

    SYSTEMTIME SystemTime;

    MINIDUMP_TYPE dumptype = (MINIDUMP_TYPE) (MiniDumpWithFullMemory | MiniDumpWithHandleData | MiniDumpWithUnloadedModules | MiniDumpWithProcessThreadData);

     

    // These buffers are persistent.

     

    // security stuff to report the SID of the dumper to the event log.

    PTOKEN_USER pInstTokenUser;

    HANDLE ProcessToken;

    TOKEN_INFORMATION_CLASS TokenInformationClass = TokenUser;

    DWORD ReturnLength =0;

     

    // This allows us to get the first window in the chain of top windows.

    hwnd = GetTopWindow(NULL);

    if(!hwnd)

    {

          printf("Could not GetTopWindow\r\n");

          return 0;

    }

     

    // We will iterate through all windows until we get to the end of the list.

    while(hwnd)

    {

          // Get the process ID for the current window   

          tid = GetWindowThreadProcessId(hwnd, &ProcessId);

     

          // Send a message to this window with our timeout. 

          // If it times out we consider the window hung

          if (!SendMessageTimeout(hwnd, WM_NULL, 0, 0, SMTO_BLOCK, iHangTime, &dwResult))

          {

                // SendMessageTimeout can fail for other reasons, 

                // if it's not a timeout we exit and try again later

                if(ERROR_TIMEOUT != GetLastError())

                {

                      printf("SendMessageTimeout has failed with error %d\r\n",GetLastError());

                      return 1;

                }

                      // Init our static buffer pointers.

                      // On our first trip through if we have not

                      // malloced memory for our buffers do so now.

                      if(!szModName)

                      {

                            szModName = (char *)malloc(MAXMODFILENAME);

                            {

                                  if(!szModName)

                                  {

                                  printf("Failed to alloc buffer for szModName %d",GetLastError());

                                  return 0;

                                  }

                            }

                      }

                      if(!szDumpFileName)// first time through malloc a buffer.

                      {

                            szDumpFileName = (char *)malloc(MAXDUMPFINALTARGET);

                            {

                                  if(!szDumpFileName)

                                  {

                                        printf("Failed to alloc buffer for dumpfilename %d",GetLastError());

                                        return 0;

                                  }

                            }

                      }

                      if(!szDumpFinalTarget)// first time through malloc a buffer.

                      {

                            szDumpFinalTarget= (char *)malloc(MAXDUMPFINALTARGET);

                            {

                                  if(!szDumpFinalTarget)

                                  {

                                  printf("Failed to alloc buffer for dumpfiledirectory %d",GetLastError());

                                  return 0;

                                  }

                            }

                      }

                      if(!szEventInfo)

                      {

                            szEventInfo= (char *)malloc(MAXEVENTINFO);

                            {

                                  if(!szEventInfo)

                                  {

                                  printf("Failed to alloc buffer for szEventInfo %d",GetLastError());

                                  return 0;

                                  }

                            }

                      }

                      // End of initial buffer allocations.

     

                GetLocalTime (&SystemTime);

               

                // Using the process id we open the process for various tasks.

                hProcess = OpenProcess(PROCESS_ALL_ACCESS,FALSE,ProcessId);

                if(!hProcess )

                {

                      printf("Open process of hung window failed with error %d\r\n",GetLastError());

                      return 1;

                }

                // What is the name of the executable?

                GetModuleBaseName( hProcess, NULL, szModName,MAXMODFILENAME);

     

                printf("\r\n\r\nHung Window found dumping process (%d) %s\n",ProcessId,szModName);

     

                // Here we build the dump file name time, date, pid and binary name

                sprintf(szDumpFileName,"HWNDDump_Day%d_%d_%d_Time%d_%d_%d_Pid%d_%s.dmp",SystemTime.wMonth,SystemTime.wDay,SystemTime.wYear,SystemTime.wHour,SystemTime.wMinute,SystemTime.wSecond,ProcessId,szModName);

                strcpy(szDumpFinalTarget,szDumpLocation);

                strcat(szDumpFinalTarget,szDumpFileName);

     

                // We have to create the file and then pass its handle to the dump api

                hDumpFile = CreateFile(szDumpFinalTarget,FILE_ALL_ACCESS,0,NULL,CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL);

                // CreateFile returns INVALID_HANDLE_VALUE, not NULL, on failure

                if(hDumpFile == INVALID_HANDLE_VALUE)

                {

                      printf("CreateFile failed to open dump file at location %s, with error %d\r\n",szDumpLocation,GetLastError());

                      return 0;

                }

     

                printf("Dumping unresponsive process\r\n%s",szDumpFinalTarget);

               

                // This dump api will halt the target process while it writes its

                // image to disk in the form of a dump file.

                // this can be opened later by windbg or cdb for debugging.

                if(!MiniDumpWriteDump(hProcess,ProcessId,hDumpFile,dumptype ,NULL,NULL,NULL))

                {

                      // We do this on failure

                      hdDump = HRESULT_FROM_WIN32(GetLastError());

                      printf("MiniDumpWriteDump failed with a hresult of %d last error %d\r\n",hdDump,GetLastError());

                      CloseHandle (hDumpFile);

                      return 0;

                }

                else

                {

                      // If we are here the dump worked.  Now we need to notify the machine admin by putting a event in

                      // the application event log so someone knows a dump was taken and where it is stored.

                      sprintf(szEventInfo,"An application hang was caught by dumphungwindow.exe, the process was dumped to %s",szDumpFinalTarget);

     

                      // We need to get the process token so we can get the user SID so ReportEvent will have the

                      // User name / account in the event log.

                      pInstTokenUser = NULL;

                      if (OpenProcessToken(hProcess,      TOKEN_QUERY,&ProcessToken ) )

                      {

                            // Make the first call to find out how big the buffer for the SID needs to be.

                            GetTokenInformation(ProcessToken,TokenInformationClass, NULL,0,&ReturnLength);

                            pInstTokenUser = (PTOKEN_USER) malloc(ReturnLength);

                            if(!pInstTokenUser)

                            {

                                  printf("Failed to malloc buffer for InstTokenUser exiting error %d\r\n",GetLastError());

                                  return 0;

                            }

                            if(!GetTokenInformation(ProcessToken,TokenInformationClass, (VOID *)pInstTokenUser,ReturnLength,&ReturnLength))

                            {

                                  printf("GetTokenInformation failed with error %d\r\n",GetLastError());

                                  return 0;

                            }

                            // Done with the token handle.

                            CloseHandle (ProcessToken);

                      }

                      // write the application event log message. 

                      // This will show up as source HungWindowDump

                      dwEventInfoSize=(DWORD)strlen(szEventInfo);

                      // If the token could not be opened we log the event without a user SID.

                      ReportEvent(hEventLog,EVENTLOG_WARNING_TYPE,1,1,pInstTokenUser ? pInstTokenUser->User.Sid : NULL,0,dwEventInfoSize,NULL,szEventInfo);

                      // Free the token buffer, we don't want to leak anything.

                      free(pInstTokenUser);

                      // In addition to leaking a handle if you don't close the handle

                      // you may not get the dump to flush to the hard drive.

                      CloseHandle (hDumpFile);

                      // We are finished with the target process as well.

                      CloseHandle (hProcess);

                      printf("\r\nDump complete");

                     

                      // This allows you to execute something if you get a hang like crash.exe

                      if (dwExecOnHang)

                      {

                            system(szAppname);

                      }

                     

                      //  The Sleep is here so in the event you want to wait N seconds

                      //  before collecting another dump

                      //  you can pause.  This is helpful if you want to see if any

                      //  forward progress is happening over time

                     

                      Sleep(iDumpPause);

                }

                // Once we are at our threshold for max dumps

                // we exit so we do not fill up the hard drive.

                iDumpsTaken++;

                if (iMaxDump == iDumpsTaken)

                {

                      return 0;

                }

            }

            // This is where we traverse to the next window.

                hwnd = GetNextWindow(hwnd, GW_HWNDNEXT);

          }

          return 1;

    }

     

  • Ntdebugging Blog

    Scoping and Troubleshooting Hangs of Various Causes

    Hi again!  Today I want to bring to your attention an upcoming series of posts on troubleshooting hangs, with this post serving as a primer for understanding hangs and how we scope these scenarios.

     

    Scoping is a practice we use in troubleshooting that helps us to quickly narrow down the domain or scope of a problem from the entire operating system or enterprise to a specific computer and component.  This allows the elimination of millions of other possible problems or interactions.

     

    Hangs are a common and sometimes lengthy type of support request because of the nature of the problem; just describing it can be difficult.  “Okay, what do you mean, it’s hung?”  By nature I mean that some internal architecture knowledge is necessary to discover which component of the application or OS is not doing what it should, leaving us with an unresponsive user interface, an unresponsive service, or both.  So how do we isolate what is going on here?

     

    We will cover the main buckets or symptoms and I will list these in increasing depth or dependency into the OS below, in other words, moving from the Application Layer down into the OS.  But let’s scope first…the most important step!

     

    Scoping the Hang

    We can determine which bucket or symptom we are running into by testing increasing layers of the operating system (OSI stack), that is, by checking which layers of the system are working and which are not.  The heart of this is to determine “What IS working properly and what IS NOT?”

    The following table outlines the layers and tools we usually use to determine their responsiveness.

     

    Functional Layer to Test                                                      | Tools To Test
    Basic hardware + Network driver + Bottom of the network stack                 | Does Ping work? Num Lock light on keyboard?
    SMB over TCP/IP + Kernel (as the Server Service runs in the system process)   | Does Net view work?
    RPC over TCP/IP                                                               | RPC? (Event Viewer, Remote Management, Event Log, or rpcping.exe)
    RPC + Application                                                             | COM/DCOM? (Dcomtest.exe)

     

    For example, if a machine is reported “hung” and we can ping it, but net view does not work (when it normally would), we should conclude that the failure is most likely on the server side of that request, in the Server Service or one of its subcomponents.   This being the case, it would not make sense to troubleshoot why myapplication.exe is hung on the same server if lower-level things like the Server Service itself, which may be a direct dependency, do not work!
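    The bottom-up reasoning in this example can be sketched in a few lines of C.  This is only an illustration of the scoping logic: the Layer enum, first_failing_layer and layer_name are hypothetical names, and the test results are assumed to have been gathered by hand with the tools listed in the table above.

```c
/* Layers from the table above, ordered from lowest to highest. */
typedef enum {
    LAYER_NETWORK,   /* hardware, network driver, bottom of the stack (ping) */
    LAYER_SMB,       /* SMB over TCP/IP, Server Service (net view)           */
    LAYER_RPC,       /* RPC over TCP/IP (rpcping.exe, Event Viewer)          */
    LAYER_COM,       /* RPC + application (COM/DCOM)                         */
    LAYER_COUNT      /* sentinel: every layer passed                         */
} Layer;

/* results[i] is nonzero when the manual test for layer i succeeded.
   Returns the lowest layer that failed, or LAYER_COUNT if none did. */
Layer first_failing_layer(const int results[LAYER_COUNT])
{
    int i;
    for (i = 0; i < LAYER_COUNT; i++) {
        if (!results[i]) {
            return (Layer)i;   /* start troubleshooting here */
        }
    }
    return LAYER_COUNT;
}

/* Human-readable name for reporting. */
const char *layer_name(Layer layer)
{
    switch (layer) {
    case LAYER_NETWORK: return "network stack";
    case LAYER_SMB:     return "SMB / Server Service";
    case LAYER_RPC:     return "RPC";
    case LAYER_COM:     return "COM/DCOM application";
    default:            return "no failing layer";
    }
}
```

    Feeding in the results from the example (ping works, net view fails) points at the SMB / Server Service layer, so the higher layers are not worth debugging until that one is fixed.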

     

    Tip:  This is a scoping method we use in isolating all problems.  Look at the interaction of applications, services, the OS, drivers, etc. in light of their dependencies.  “Okay, A is failing not because of A but because B failed, because C failed, and aha here is root cause in D’s failure”.  Testing dependencies can yield considerable time savings vs. debugging “through” the application.  Another example, if an RPC dependent application stops working, testing RPC by using another RPC app might be the first thing to do vs. debugging the first app which could be very time consuming and require specific knowledge about that app.

    Here are some common scoping questions to help think about the context of the issue which could also isolate the problem quickly.

     

    • What is the smallest action we must take to recover from the hang?
    • How often does it occur?
    • Does it occur on a cycle? If so, what cycle?
    • What time did the last occurrence take place?
    • Was it under load at that time? 
    • What was that load?
    • When does it occur, at a particular time of day?
    • How long does the hang last?
    • Can we make it hang?
    • What else happened just before the hang?
    • What changed?
    • When did it start?
    • Relevant or timely errors in the Application or System logs?
    • Is the observation from the console or a remote (RDP or ICA) session?
    • Does the machine still hang if we disconnect from network?
    • Does Task Manager show a particular process taking up CPU?
    • Does the hang occur in Safe Mode or Safe Mode with Networking?

    Answering these simple questions may have obvious yet extremely helpful results.

     

    For example, if the machine is reported as hung and the observation was just made through a Remote Desktop (RDP) session, is it responsive at the console?  Let’s say it is responsive at the console, we must then conclude that only the Terminal Server Service layer or one of its unique dependencies (lower in the stack) is the problem vs. the entire server. Jump to Terminal Services specific troubleshooting, etc.

     

    Common Hang Buckets or Symptoms

    Using the above scoping usually leads to these main classes of hangs which we will cover in future posts:

     

    1.)    Specific Application Menu/Button/Function Hang

    The application looks “OK” in that it will repaint if we drag another window over it; however, if we click on a menu item or send a keystroke, whatever functionality is associated with it does not…function.

    2.)    Application Window Hang  “I’m not dead yet…just Not Responding”

    The application stops responding entirely at the UI layer, meaning it no longer refreshes, and dragging another window over the top does not cause a repaint, so artifacts of other windows are left behind.

    3.)    The Start Menu, Desktop, or the “Shell” is hung

    So here we know that the Microsoft process responsible for these windows, explorer.exe, is hung.

    4.)    All Windows are Hung…but Task Manager comes up eventually

    Here the mouse still moves, and if we hit Ctrl+Shift+Esc we can invoke task manager, or via Ctrl+Alt+Del.  This may not be a true hang, but slow or unresponsive enough to qualify or be reported as a hang!

    5.)    There are No Windows!

    I can move the mouse and press keys on the keyboard, but they don’t “do” anything and there’s just a blank desktop, no windows; it’s hung.

    In this case it may be that the server appears hung interactively while specialized services like file sharing, mail server, etc. still actually function…impending doom?

    6.)    No Windows + No Mouse/Keyboard + but the machine is still running, well, sort of…

    Obviously the most drastic of the symptoms leaving little recourse but a debug of the machine…which might be easier than it sounds!

    The server may or may not be responsive remotely via services, etc.

     

    In each of the upcoming posts expect to see for each symptom:

    Common Example(s)

    Scoping Steps (what works vs. what doesn’t in each scenario)

    Troubleshooting Steps

    Specific Debug Steps

     

    Please look forward to these installments in the New Year!


    -Tate


    Desktop Heap Overview


     

    Desktop heap is probably not something that you spend a lot of time thinking about, which is a good thing.  However, from time to time you may run into an issue that is caused by desktop heap exhaustion, and then it helps to know about this resource.  Let me state up front that things have changed significantly in Vista around kernel address space, and much of what I’m talking about today does not apply to Vista.

     

    Laying the groundwork: Session Space

    To understand desktop heap, you first need to understand session space.  Windows 2000, Windows XP, and Windows Server 2003 have a limited, but configurable, area of memory in kernel mode known as session space.  A session represents a single user’s logon environment.  Every process belongs to a session.  On a Windows 2000 machine without Terminal Services installed, there is only a single session, and session space does not exist.  On Windows XP and Windows Server 2003, session space always exists.  The range of addresses known as session space is a virtual address range.  This address range is mapped to the pages assigned to the current session.  In this manner, all processes within a given session map session space to the same pages, but processes in another session map session space to a different set of pages. 

    Session space is divided into four areas: session image space, session structure, session view space, and session paged pool.  Session image space loads a session-private copy of Win32k.sys modified data, a single global copy of win32k.sys code and unmodified data, and maps various other session drivers like video drivers, TS remote protocol driver, etc.  The session structure holds various memory management (MM) control structures including the session working set list (WSL) information for the session.  Session paged pool allows session specific paged pool allocations.  Windows XP uses regular paged pool, since the number of remote desktop connections is limited.  On the other hand, Windows Server 2003 makes allocations from session paged pool instead of regular paged pool if Terminal Services (application server mode) is installed.  Session view space contains mapped views for the session, including desktop heap. 

    Session Space layout:

    Session Image Space: win32k.sys, session drivers

    Session Structure: MM structures and session WSL

    Session View Space: session mapped views, including desktop heap

    Session Paged Pool

     

    Sessions, Window Stations, and Desktops

    You’ve probably already guessed that desktop heap has something to do with desktops.  Let’s take a minute to discuss desktops and how they relate to sessions and window stations.  All Win32 processes require a desktop object under which to run.  A desktop has a logical display surface and contains windows, menus, and hooks.  Every desktop belongs to a window station.  A window station is an object that contains a clipboard, a set of global atoms and a group of desktop objects.  Only one window station per session is permitted to interact with the user. This window station is named "Winsta0."  Every window station belongs to a session.  Session 0 is the session where services run and typically represents the console (pre-Vista).  Any other sessions (Session 1, Session 2, etc) are typically remote desktops / terminal server sessions, or sessions attached to the console via Fast User Switching.  So to summarize, sessions contain one or more window stations, and window stations contain one or more desktops.

    You can picture the relationship described above as a tree.  Below is an example of this desktop tree on a typical system:

    - Session 0
    |   |
    |   ---- WinSta0           (interactive window station)
    |   |      |
    |   |      ---- Default    (desktop)
    |   |      |
    |   |      ---- Disconnect (desktop)
    |   |      |
    |   |      ---- Winlogon   (desktop)
    |   |
    |   ---- Service-0x0-3e7$  (non-interactive window station)
    |   |      |
    |   |      ---- Default    (desktop)
    |   |
    |   ---- Service-0x0-3e4$  (non-interactive window station)
    |   |      |
    |   |      ---- Default    (desktop)
    |   |
    |   ---- SAWinSta          (non-interactive window station)
    |   |      |
    |   |      ---- SADesktop  (desktop)
    |   |
    - Session 1
    |   |
    |   ---- WinSta0           (interactive window station)
    |   |      |
    |   |      ---- Default    (desktop)
    |   |      |
    |   |      ---- Disconnect (desktop)
    |   |      |
    |   |      ---- Winlogon   (desktop)
    |   |
    - Session 2
        |
        ---- WinSta0           (interactive window station)
               |
               ---- Default    (desktop)
               |
               ---- Disconnect (desktop)
               |
               ---- Winlogon   (desktop)

     

    In the above tree, the full path to the SADesktop (as an example) can be represented as “Session 0\SAWinSta\SADesktop”.
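
    The containment relationship can also be written down as a small data structure.  The sketch below is only an illustration in Python: the dictionary mirrors the example tree above (it is not read from a live system), and the helper name desktop_paths is made up for this example.

```python
# Illustrative model of the session -> window station -> desktop hierarchy.
# The layout mirrors the example tree in the text; it is not queried from Windows.
TREE = {
    "Session 0": {
        "WinSta0": ["Default", "Disconnect", "Winlogon"],
        "Service-0x0-3e7$": ["Default"],
        "Service-0x0-3e4$": ["Default"],
        "SAWinSta": ["SADesktop"],
    },
    "Session 1": {"WinSta0": ["Default", "Disconnect", "Winlogon"]},
    "Session 2": {"WinSta0": ["Default", "Disconnect", "Winlogon"]},
}


def desktop_paths(tree):
    """Yield the full Session\\WindowStation\\Desktop path of every desktop."""
    for session, stations in tree.items():
        for station, desktops in stations.items():
            for desktop in desktops:
                yield "\\".join((session, station, desktop))


paths = list(desktop_paths(TREE))
for path in paths:
    print(path)
```

    Each printed path corresponds to one desktop object, and therefore to one desktop heap.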

     

    Desktop Heap – what is it, what is it used for?

    Every desktop object has a single desktop heap associated with it.  The desktop heap stores certain user interface objects, such as windows, menus, and hooks.  When an application requires a user interface object, functions within user32.dll are called to allocate those objects.  If an application does not depend on user32.dll, it does not consume desktop heap.  Let’s walk through a simple example of how an application can use desktop heap. 

    1.     An application needs to create a window, so it calls CreateWindowEx in user32.dll.

    2.     User32.dll makes a system call into kernel mode and ends up in win32k.sys.

    3.     Win32k.sys allocates the window object from desktop heap

    4.     A handle to the window (an HWND) is returned to caller

    5.     The application and other processes in the same session can refer to the window object by its HWND value

     

    Where things go wrong

    Normally this “just works”, and neither the user nor the application developer need to worry about desktop heap usage.  However, there are two primary scenarios in which failures related to desktop heap can occur:

    1. Session view space for a given session can become fully utilized, so it is impossible for a new desktop heap to be created.
    2. An existing desktop heap allocation can become fully utilized, so it is impossible for threads that use that desktop to use more desktop heap.

     

    So how do you know if you are running into these problems?  Processes failing to start with a STATUS_DLL_INIT_FAILED (0xC0000142) error in user32.dll is a common symptom.  Since user32.dll needs desktop heap to function, failure to initialize user32.dll upon process startup can be an indication of desktop heap exhaustion.  Another symptom you may observe is a failure to create new windows.  Depending on the application, any such failure may be handled in different ways.  Note that if you are experiencing problem number one above, the symptoms would usually only exist in one session.  If you are seeing problem two, then the symptoms would be limited to processes that use the particular desktop heap that is exhausted.
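
    When checking process exit codes from a script, keep in mind that some tools report the 32-bit status as a negative signed number.  The following Python sketch (the function name is hypothetical) masks the code to 32 bits before comparing it against the STATUS_DLL_INIT_FAILED value quoted above.  A match is only a hint of desktop heap exhaustion, not proof.

```python
STATUS_DLL_INIT_FAILED = 0xC0000142  # value quoted in the text


def is_dll_init_failed(exit_code):
    """Return True if a process exit code equals STATUS_DLL_INIT_FAILED.

    The code is masked to its unsigned 32-bit form first, because the same
    status may be reported as a negative signed 32-bit integer.
    """
    return (exit_code & 0xFFFFFFFF) == STATUS_DLL_INIT_FAILED


print(is_dll_init_failed(0xC0000142))   # unsigned form
print(is_dll_init_failed(-1073741502))  # same bits as a signed 32-bit integer
print(is_dll_init_failed(1))            # an ordinary failure code
```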

     

    Diagnosing the problem

    So how can you know for sure that desktop heap exhaustion is your problem?  This can be approached in a variety of ways, but I’m going to discuss the simplest method for now.  Dheapmon is a command line tool that will dump out the desktop heap usage for all the desktops in a given session.  See our first blog post for a list of tool download locations.  Once you have dheapmon installed, be sure to run it from the session where you think you are running out of desktop heap.  For instance, if you have problems with services failing to start, then you’ll need to run dheapmon from session 0, not a terminal server session.

    Dheapmon output looks something like this:

    Desktop Heap Information Monitor Tool (Version 7.0.2727.0)
    Copyright (c) 2003-2004 Microsoft Corp.
    -------------------------------------------------------------
      Session ID:    0 Total Desktop: (  5824 KB -    8 desktops)

      WinStation\Desktop            Heap Size(KB)    Used Rate(%)
    -------------------------------------------------------------
      WinSta0\Default                    3072              5.7
      WinSta0\Disconnect                   64              4.0
      WinSta0\Winlogon                    128              8.7
      Service-0x0-3e7$\Default            512             15.1
      Service-0x0-3e4$\Default            512              5.1
      Service-0x0-3e5$\Default            512              1.1
      SAWinSta\SADesktop                  512              0.4
      __X78B95_89_IW\__A8D9S1_42_ID       512              0.4
    -------------------------------------------------------------

     

    As you can see in the example above, each desktop heap size is specified, as is the percentage of usage.  If any one of the desktop heaps becomes too full, allocations within that desktop will fail.  If the cumulative heap size of all the desktops approaches the total size of session view space, then new desktops cannot be created within that session.  Both of the failure scenarios described above depend on two factors: the total size of session view space, and the size of each desktop heap allocation.  Both of these sizes are configurable. 
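
    If you collect dheapmon output regularly, a short script can do the reading for you.  The Python sketch below parses the sample output shown above, sums the heap sizes, and lists desktops whose usage is at or above a threshold.  The row format is inferred from that one sample, and the 80% default threshold is an arbitrary assumption rather than a documented limit.

```python
import re

# Rows copied from the sample dheapmon output in the text.
SAMPLE = r"""
  Session ID:    0 Total Desktop: (  5824 KB -    8 desktops)

  WinStation\Desktop            Heap Size(KB)    Used Rate(%)
-------------------------------------------------------------
  WinSta0\Default                    3072              5.7
  WinSta0\Disconnect                   64              4.0
  WinSta0\Winlogon                    128              8.7
  Service-0x0-3e7$\Default            512             15.1
  Service-0x0-3e4$\Default            512              5.1
  Service-0x0-3e5$\Default            512              1.1
  SAWinSta\SADesktop                  512              0.4
  __X78B95_89_IW\__A8D9S1_42_ID       512              0.4
-------------------------------------------------------------
"""

# WinStation\Desktop, heap size in KB, used rate in percent.
ROW = re.compile(r"^\s*(\S+\\\S+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s*$")


def parse_dheapmon(text):
    """Return a list of (desktop, size_kb, used_pct) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((match.group(1), int(match.group(2)), float(match.group(3))))
    return rows


def nearly_full(rows, threshold_pct=80.0):
    """Return the desktops whose used rate is at or above threshold_pct."""
    return [name for name, _size, used in rows if used >= threshold_pct]


rows = parse_dheapmon(SAMPLE)
total_kb = sum(size for _name, size, _used in rows)
print("total desktop heap (KB):", total_kb)
print("nearly full:", nearly_full(rows))
```

    For the sample above, the summed heap sizes agree with the "Total Desktop" figure that dheapmon prints, and no desktop is close to full.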

     

    Configuring the size of Session View Space

    Session view space size is configurable using the SessionViewSize registry value.  This is a REG_DWORD and the size is specified in megabytes.  Note that the values listed below are specific to 32-bit x86 systems not booted with /3GB.  A reboot is required for this change to take effect.  The value should be specified under:

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management

    OS                     Size if no registry value configured    Default registry value
    Windows 2000 *         20 MB                                   none
    Windows XP             20 MB                                   48 MB
    Windows Server 2003    20 MB                                   48 MB

    * Settings for Windows 2000 are with Terminal Services enabled and hotfix 318942 installed.  Without Terminal Services installed, session space does not exist, and desktop heap allocations are made from a fixed 48 MB region for system mapped views.  Without hotfix 318942 installed, the size of session view space is fixed at 20 MB.

    The sum of the sizes of session view space and session paged pool has a theoretical maximum of slightly under 500 MB for 32-bit operating systems.  The maximum varies based on RAM and various other registry values.  In practice the maximum value is around 450 MB for most configurations.  When the above values are increased, it will result in the virtual address space reduction of any combination of nonpaged pool, system PTEs, system cache, or paged pool.

     

    Configuring the size of individual desktop heaps

    Configuring the size of the individual desktop heaps is a bit more complex.  Speaking in terms of desktop heap size, there are three possibilities:

    ·         The desktop belongs to an interactive window station and is a “Disconnect” or “Winlogon” desktop, so its heap size is fixed at 64 KB or 128 KB, respectively (for 32-bit x86)

    ·         The desktop heap belongs to an interactive window station, and is not one of the above desktops.  This desktop’s heap size is configurable.

    ·         The desktop heap belongs to a non-interactive window station.  This desktop’s heap size is also configurable.

     

    The size of each desktop heap allocation is controlled by the following registry value:

                HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SubSystems\Windows

     

     The default data for this registry value will look something like the following (all on one line):

                   %SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows
                   SharedSection=1024,3072,512 Windows=On SubSystemType=Windows
                   ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3
                   ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off
                   MaxRequestThreads=16

                                                               

     

    The numeric values following "SharedSection=" control how desktop heap is allocated. These SharedSection values are specified in kilobytes.

    The first SharedSection value (1024) is the shared heap size common to all desktops. This memory is not a desktop heap allocation, and the value should not be modified to address desktop heap problems.

    The second SharedSection value (3072) is the size of the desktop heap for each desktop that is associated with an interactive window station, with the exception of the “Disconnect” and “Winlogon” desktops.

    The third SharedSection value (512) is the size of the desktop heap for each desktop that is associated with a "non-interactive" window station. If this value is not present, the size of the desktop heap for non-interactive window stations will be same as the size specified for interactive window stations (the second SharedSection value). 
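
    To make the three values concrete, here is a small Python sketch that pulls them out of the registry data.  It works on the default string quoted above rather than reading the registry, and the function and key names are invented for this example.  It also applies the fallback rule described above: when the third value is absent, the non-interactive size equals the interactive size.

```python
import re

# Default data quoted in the text, joined onto one line.
WINDOWS_VALUE = (
    r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
    "SharedSection=1024,3072,512 Windows=On SubSystemType=Windows "
    "ServerDll=basesrv,1 ServerDll=winsrv:UserServerDllInitialization,3 "
    "ServerDll=winsrv:ConServerDllInitialization,2 ProfileControl=Off "
    "MaxRequestThreads=16"
)


def parse_shared_section(value):
    """Return the SharedSection sizes (in KB) as a dictionary."""
    match = re.search(r"SharedSection=(\d+),(\d+)(?:,(\d+))?", value)
    if match is None:
        raise ValueError("no SharedSection setting found")
    interactive_kb = int(match.group(2))
    # A missing third value means non-interactive desktops use the second value.
    noninteractive_kb = int(match.group(3)) if match.group(3) else interactive_kb
    return {
        "shared_heap_kb": int(match.group(1)),
        "interactive_kb": interactive_kb,
        "noninteractive_kb": noninteractive_kb,
    }


sizes = parse_shared_section(WINDOWS_VALUE)
print(sizes)
```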

    Consider the two desktop heap exhaustion scenarios described above.  If the first scenario is encountered (session view space is exhausted), and most of the desktop heaps are non-interactive, then the third SharedSection can be decreased in an effort to allow more (smaller) non-interactive desktop heaps to be created.  Of course, this may not be an option if the processes using the non-interactive heaps require a full 512 KB.  If the second scenario is encountered (a single desktop heap allocation is full), then the second or third SharedSection value can be increased to allow each desktop heap to be larger than 3072 or 512 KB.  A potential problem with this is that fewer total desktop heaps can be created.
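
    The trade-off can be estimated with simple arithmetic.  The Python sketch below is a rough upper bound only: it assumes a 48 MB session view space, a single WinSta0 holding the Default, Disconnect and Winlogon desktops, and that nothing other than desktop heaps consumes session view space.  That last assumption is optimistic, since session view space also holds other mapped views, so real systems will fit fewer desktops than this.

```python
def max_noninteractive_desktops(session_view_mb, interactive_kb, noninteractive_kb,
                                disconnect_kb=64, winlogon_kb=128):
    """Upper bound on non-interactive desktop heaps that fit in one session.

    Assumes one WinSta0 (Default + Disconnect + Winlogon) and that desktop
    heaps are the only consumers of session view space.
    """
    total_kb = session_view_mb * 1024
    winsta0_kb = interactive_kb + disconnect_kb + winlogon_kb
    return (total_kb - winsta0_kb) // noninteractive_kb


# Compare a smaller, the default, and a larger third SharedSection value.
estimates = {size: max_noninteractive_desktops(48, 3072, size)
             for size in (256, 512, 1024)}
for size, count in estimates.items():
    print("{:5d} KB heaps -> at most {} non-interactive desktops".format(size, count))
```

    Halving the third SharedSection value roughly doubles the number of non-interactive desktops that can be created, and doubling it roughly halves that number.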

     

    What are all these window stations and desktops in Session 0 anyway?

    Now that we know how to tweak the sizes of session view space and the various desktops, it is worth talking about why you have so many window stations and desktops, particularly in session 0.  First off, you’ll find that every WinSta0 (interactive window station) has at least 3 desktops, and each of these desktops uses various amounts of desktop heap.  I’ve alluded to this previously, but to recap, the three desktops for each interactive window station are:

    ·         Default desktop - desktop heap size is configurable as described above

    ·         Disconnect desktop - desktop heap size is 64k on 32-bit systems

    ·         Winlogon desktop - desktop heap size is 128k on 32-bit systems

     

    Note that there can potentially be more desktops in WinSta0 as well, since any process can call CreateDesktop and create new desktops.

    Let’s move on to the desktops associated with non-interactive window stations: these are usually related to a service.  The system creates a window station in which service processes that run under the LocalSystem account are started. This window station is named service-0x0-3e7$. It is named for the LUID for the LocalSystem account, and contains a single desktop that is named Default. However, service processes that run as LocalSystem interactive start in Winsta0 so that they can interact with the user in Session 0 (but still run in the LocalSystem context).

    Any service process that starts under an explicit user or service account has a window station and desktop created for it by service control manager, unless a window station for its LUID already exists. These window stations are non-interactive window stations.  The window station name is based on the LUID, which is unique for every logon.  If an entity (other than System) logs on multiple times, a new window station is created for each logon.  An example window station name is “service-0x0-22e1$”.
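
    The naming convention is easy to reproduce.  The Python sketch below builds a service window station name from the two halves of a LUID and parses one back.  The helper names are hypothetical.  The text only states that 3e7 belongs to LocalSystem; the 3e4 and 3e5 entries are added here from the well-known LUID constants in the Windows SDK headers.

```python
import re

# 0x3e7 is given in the text; the other two come from the Windows SDK headers.
WELL_KNOWN_LUIDS = {
    0x3E7: "LocalSystem",
    0x3E5: "LocalService",
    0x3E4: "NetworkService",
}


def service_winsta_name(luid_high, luid_low):
    """Format a non-interactive window station name from a logon LUID."""
    return "Service-0x{:x}-{:x}$".format(luid_high, luid_low)


def parse_service_winsta(name):
    """Return (luid_high, luid_low) for a service window station name."""
    match = re.fullmatch(r"service-0x([0-9a-f]+)-([0-9a-f]+)\$", name, re.IGNORECASE)
    if match is None:
        raise ValueError("not a service window station name: " + name)
    return int(match.group(1), 16), int(match.group(2), 16)


print(service_winsta_name(0, 0x3E7))
high, low = parse_service_winsta("service-0x0-22e1$")
print(high, hex(low), WELL_KNOWN_LUIDS.get(low, "explicit user or service account"))
```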

    A common desktop heap issue occurs on systems that have a very large number of services.  This can be a large number of unique services, or one (poorly designed, IMHO) service that installs itself multiple times.  If the services all run under the LocalSystem account, then the desktop heap for Session 0\Service-0x0-3e7$\Default may become exhausted.  If the services all run under another user account which logs on multiple times, each time acquiring a new LUID, there will be a new desktop heap created for every instance of the service, and session view space will eventually become exhausted.

    Given what you now know about how service processes use window stations and desktops, you can use this knowledge to avoid desktop heap issues.  For instance, if you are running out of desktop heap for the Session 0\Service-0x0-3e7$\Default desktop, you may be able to move some of the services to a new window station and desktop by changing the user account that the service runs under.

     

    Wrapping up

    I hope you found this post interesting and useful for solving those desktop heap issues!  If you have questions or comments, please let us know.

     

    - Matthew Justice

     

    [Update: 7/5/2007 - Desktop Heap, part 2 has been posted]

    [Update: 9/13/2007 - Talkback video: Desktop Heap has been posted]

    [Update: 3/20/2008 - The default interactive desktop heap size has been increased on 32-bit Vista SP1]

     


    Understanding Pool Consumption and Event ID: 2020 or 2019


     

    Hi!  My name is Tate.  I’m an Escalation Engineer on the Microsoft Critical Problem Resolution Platforms Team.  I wanted to share one of the most common errors we troubleshoot here on the CPR team, its root cause being pool consumption, and the methods by which we can remedy it quickly!

     

    This issue is commonly misdiagnosed; however, 90% of the time it is quite possible to determine the resolution quickly without any serious effort at all!

     

     

    First, what do these events really mean?

     

    Event ID 2020
    Event Type: Error
    Event Source: Srv
    Event Category: None
    Event ID: 2020
    Description:
    The server was unable to allocate from the system paged pool because the pool was empty.

     

    Event ID 2019
    Event Type: Error
    Event Source: Srv
    Event Category: None
    Event ID: 2019
    Description:
    The server was unable to allocate from the system NonPaged pool because the pool was empty.

     

     

    This is our friend the Server Service reporting that when it was trying to satisfy a request, it was not able to find enough free memory of the respective type of pool.  2020 indicates Paged Pool and 2019, NonPaged Pool.  This doesn’t mean that the Server Service (srv.sys) is broken or the root cause of the problem, more often rather it is the first component to see the resource problem and report it to the Event Log.  Thus, there could be (and usually are) a few more symptoms of pool exhaustion on the system such as hangs, or out of resource errors reported by drivers or applications, or all of the above!

     

     

    What is Pool?

     

    First, Pool is not the amount of RAM on the system; it is a segment of the virtual memory or address space that Windows reserves on boot.  These pools are finite because address space itself is finite.  Because 32bit(x86) machines can address 2^32==4Gigs, Windows uses (by default) 2GB for applications and 2GB for kernel.  The kernel’s 2GB must also hold other things such as Page Table Entries (PTEs), so the maximum amount of Paged Pool for 32bit(x86) is ~460MB, which puts our realistic limits per processor architecture in perspective.  As this implies, 64bit(x64&ia64) machines have less of a problem here due to their larger address space, but there are still limits and thus no free lunch.

     

    *For more about determining current pool limits see the common question post “Why am I out of Paged Pool at ~200MB…” at the end of this post.

     

    *For more info about pools:  About Memory Management > Memory Pools

    *This has changed a bit for Vista, see Dynamic Kernel Address space

     

     

    What are these pools used for?

     

    These pools are used by either the kernel directly, indirectly by its support of various structures due to application requests on the system (CreateFile for example), or drivers installed on the system for their memory allocations made via the kernel pool allocation functions.

     

    Literally, NonPaged means that this memory when allocated will not be paged to disk and thus resident at all times, which is an important feature for drivers.  Paged conversely, can be, well… paged out to disk.  In the end though, all this memory is allocated through a common set of functions, most common is ExAllocatePoolWithTag.

     

     

    Ok, so what is using it/abusing it? (our goal right!?)

     

    Now that we know that the culprit is Windows or a component shipping with Windows, a driver, or an application requesting lots of things that the kernel has to create on its behalf, how can we find out which?

     

    There are really four basic methods that are typically used (listed in order of increasing difficulty):

     

    1.)    Find By Handle Count

     

    Handle Count?  Yes, considering that we know that an application can request something of the OS that it must then in turn create and provide a reference to…this is typically represented by a handle, and thus charged to the process’ total handle count!

     

    The quickest way by far if the machine is not completely hung is to check this via Task Manager.  Ctrl+Shift+Esc…Processes Tab…View…Select Columns…Handle Count.  Sort on Handles column now and check to see if there is a significantly large one there (this information is also obtainable via Perfmon.exe, Process Explorer, Handle.exe, etc.).

     

    What’s large?  Well, typically we should raise an eyebrow at anything over 5,000 or so.  Now that’s not to say that over this amount is inherently bad, just know that there is no free lunch and that a handle to something usually means that on the other end there is a corresponding object stored in NonPaged or Paged Pool which takes up memory.

     

    So for example let’s say we have a process that has 100,000 handles, mybadapp.exe.  What do we do next?

     

    Well, if it’s a service we could stop it (which releases the handles), or if it’s an application running interactively, try to shut it down, and then look to see how much total Kernel Memory (Paged or NonPaged depending on which one we are short of) we get back.  If we were at 400MB of Paged Pool (Look at Performance Tab…Kernel Memory…Paged) and after stopping mybadapp.exe with its 100,000 handles we are now at a reasonable 100MB, well, there’s our bad guy.  The next step would be following up with the owner or further investigating (with Process Explorer from sysinternals or the Windows debugger, for example) what type of handles are being consumed.

     

    Tip: 

    For essential yet legacy applications for which there is no hope of replacement or support, we may consider setting up a performance monitor alert on the handle count when it hits a couple thousand or so (Performance Object: Process, Counter: Handle Count) and taking action to restart the bad service.  This is a less than elegant solution for sure, but it could keep the one rotten apple from spoiling the bunch by hanging/crashing the machine!
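
    The eyebrow-raising check is simple enough to script.  The Python sketch below filters a snapshot of process names and handle counts against the 5,000 rule of thumb mentioned earlier.  The snapshot data is made up for illustration; on a real machine the numbers would come from Task Manager, Perfmon or a similar tool.

```python
def handle_suspects(processes, threshold=5000):
    """Return (name, handle_count) pairs above threshold, largest first."""
    over = [(name, handles) for name, handles in processes if handles > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical snapshot of (process name, handle count) pairs.
snapshot = [
    ("explorer.exe", 812),
    ("mybadapp.exe", 100000),
    ("svchost.exe", 2300),
]

suspects = handle_suspects(snapshot)
for name, handles in suspects:
    print("{} holds {} handles".format(name, handles))
```

    A high count is a lead to follow up on, not a verdict; some healthy server processes legitimately hold many handles.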

     

    2.)    By Pooltag (as read by poolmon.exe)

     

    Okay, so no handle count gone wild? No problem.

     

    For Windows 2003 and later machines, a feature is enabled by default that allows tracking of the pool consumer via something called a pooltag.  For previous OS’s we will need to use a utility such as gflags.exe to Enable Pool Tagging (which requires a reboot unfortunately).  This is usually just a 3-4 character string or more technically “a character literal of up to four characters delimited by single quotation marks” that the caller of the kernel api to allocate the pool will provide as its 3rd parameter.  (see ExAllocatePoolWithTag)
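
    One detail that trips people up when searching driver source for a tag: on little-endian x86 the four tag characters are stored as a single 32-bit value, so driver code usually writes the character literal reversed (for example 'erhT' for a tag that displays as Thre).  The Python sketch below shows the packing.  The helper names are hypothetical, and padding short tags with spaces is an assumption made for this illustration.

```python
def tag_to_ulong(tag):
    """Pack a pool tag of one to four ASCII characters into a 32-bit value.

    The first displayed character ends up in the lowest byte (little-endian).
    Short tags are padded with spaces, which is an assumption of this sketch.
    """
    raw = tag.encode("ascii")
    if not 1 <= len(raw) <= 4:
        raise ValueError("a pool tag has one to four characters")
    return int.from_bytes(raw.ljust(4, b" "), "little")


def ulong_to_tag(value):
    """Unpack a 32-bit value into the tag text a tool would display."""
    return value.to_bytes(4, "little").decode("ascii")


packed = tag_to_ulong("Thre")
print(hex(packed))
print(ulong_to_tag(packed))
```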

     

    The tool that we use to get the information about what pooltag is using the most is poolmon.exe.  Launch this from a cmd prompt, hit B to sort by bytes descending and P to sort the list by the type (Paged, NonPaged, or Both) and we have a live view into what’s going on in the system.  Look specifically at the Tag Name and its respective Byte Total column for the guilty party!  Get Poolmon.exe Here  or More info about poolmon.exe usage. 

     

    The cool thing is that we have most of the OS utilized pooltags already documented so we have an idea if there is a match for one of the Windows components in pooltag.txt.  So if we see MmSt as the top tag for instance consuming far and away the largest amount, we can look at pooltag.txt and know that it’s the memory manager and also using that tag in a search engine query we might get the more popular KB304101 which may resolve the issue!

     

    We will find pooltag.txt in the ...\Debugging Tools for Windows\triage folder when the debugging tools are installed.

     

    Oh no, what if it’s not in the list? No problem…

     

    We might be able to find its owner by using one of the following techniques:

     

    • For 32-bit versions of Windows, use poolmon /c to create a local tag file that lists each tag value assigned by drivers on the local machine (%SystemRoot%\System32\Drivers\*.sys). The default name of this file is Localtag.txt.

     

    • For Windows 2000 and Windows NT 4.0 (and really all versions), use Search to find files that contain a specific pool tag, as described in KB298102, How to Find Pool Tags That Are Used By Third-Party Drivers.

    From:  http://www.microsoft.com/whdc/driver/tips/PoolMem.mspx
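
    The search technique amounts to looking for the tag's bytes inside each driver binary.  Here is a minimal Python sketch of that idea.  The function name is hypothetical, and the demonstration runs against throwaway files with made-up driver names so that it works anywhere; on a real system the directory would be %SystemRoot%\System32\Drivers.  A hit only suggests that a driver may use the tag, since the same four bytes can appear in a file for unrelated reasons.

```python
import os
import tempfile


def files_containing_tag(directory, tag, suffix=".sys"):
    """Return names of files in directory whose contents include the tag bytes."""
    needle = tag.encode("ascii")
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.lower().endswith(suffix) or not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            if needle in handle.read():
                hits.append(name)
    return hits


# Demonstration with temporary stand-in files (hypothetical driver names).
with tempfile.TemporaryDirectory() as demo_dir:
    samples = (("gooddrv.sys", b"\x00\x01Abcd\x02"), ("otherdrv.sys", b"\x00Wxyz"))
    for name, data in samples:
        with open(os.path.join(demo_dir, name), "wb") as handle:
            handle.write(data)
    demo_hits = files_containing_tag(demo_dir, "Abcd")

print(demo_hits)
```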

     

     

     

    3.)    Using Driver Verifier

     

    Using driver verifier is a more advanced approach to this problem.  Driver Verifier provides a whole suite of options targeted mainly at the driver developer to run what amounts to quality control checks before shipping their driver.

     

    However, should pooltag identification be a problem, there is a facility here in Pool Tracking that does the heavy lifting in that it will do the matching of Pool consumer directly to driver!

     

    Be careful, however: the only option we will likely want to check is Pool Tracking.  The other settings are potentially costly enough that, if our installed driver set is not perfect on the machine, we could get into an un-bootable situation with constant bluescreens notifying us that xyz driver is doing abc bad thing, along with some follow up suggestions.

     

    In summary, Driver Verifier is a powerful tool at our disposal but use with care only after the easier methods do not resolve our pool problems.

     

    4.)    Via Debug (live and postmortem)

     

    As mentioned earlier, the api being used here to allocate this pool memory is usually ExAllocatePoolWithTag.  If we have a kernel debugger set up we can set a break point here to brute force debug who our caller is….but that’s not usually how we do it, can you say, “extended downtime?”  There are other creative live debug methods which are a bit more advanced that we may post later…

     

    Usually, debugging this problem involves a post mortem memory.dmp taken from a machine that has experienced Event ID 2020 or Event ID 2019, is no longer responsive to client requests, is hung, or often all of these.  We can gather this dump via the Ctrl+Scroll Lock method (see KB244139), even while the machine is “hung” and seemingly unresponsive to the keyboard or Ctrl+Alt+Del!

     

    When loading the memory.dmp via windbg.exe or kd.exe we can quickly get a feel for the state of the machine with the following commands.

     

    Debugger output Example 1.1  (the !vm command)

     

    2: kd> !vm 
    *** Virtual Memory Usage ***
      Physical Memory:   262012   ( 1048048 Kb)
      Page File: \??\C:\pagefile.sys
         Current:   1054720Kb Free Space:    706752Kb
         Minimum:   1054720Kb Maximum:      1054720Kb
      Page File: \??\E:\pagefile.sys
         Current:   2490368Kb Free Space:   2137172Kb
         Minimum:   2490368Kb Maximum:      2560000Kb
      Available Pages:    63440   (  253760 Kb)
      ResAvail Pages:    194301   (  777204 Kb)
      Modified Pages:       761   (    3044 Kb)
      NonPaged Pool Usage: 52461   (  209844 Kb)<<NOTE!  Value is near NonPaged Max
      NonPaged Pool Max:   54278   (  217112 Kb)
      ********** Excessive NonPaged Pool Usage *****

     

    Note how the NonPaged Pool Usage value is near the NonPaged Pool Max value.  This tells us that we are basically out of NonPaged Pool.
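
    To put a number on "near", we can compute the ratio directly from the two lines.  The Python sketch below extracts the page counts from the !vm excerpt shown above (with the inline annotation removed) and reports usage as a fraction of the maximum.  The line format is taken from this one sample and may differ between debugger versions.

```python
import re

# The two relevant lines from the !vm output in the text.
VM_OUTPUT = """
  NonPaged Pool Usage: 52461   (  209844 Kb)
  NonPaged Pool Max:   54278   (  217112 Kb)
"""


def nonpaged_pool_ratio(text):
    """Return NonPaged Pool usage divided by its maximum, using page counts."""
    usage = re.search(r"NonPaged Pool Usage:\s+(\d+)", text)
    maximum = re.search(r"NonPaged Pool Max:\s+(\d+)", text)
    if usage is None or maximum is None:
        raise ValueError("NonPaged Pool lines not found")
    return int(usage.group(1)) / int(maximum.group(1))


ratio = nonpaged_pool_ratio(VM_OUTPUT)
print("NonPaged Pool is {:.1%} of its maximum".format(ratio))
```

    In this dump the pool is more than 96% consumed, which matches the "Excessive NonPaged Pool Usage" banner.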

     

    Here we can use the !poolused command to give the same information that poolmon.exe would have but in the dump….

     

    Debugger output Example 1.2  (!poolused 2)

     

    Note the 2 value passed to !poolused orders pool consumers by NonPaged

     

    2: kd> !poolused 2
       Sorting by NonPaged Pool Consumed
      Pool Used:
                NonPaged            Paged
    Tag    Allocs     Used    Allocs     Used
    Thre   120145 76892800         0        0
    File   187113 29946176         0        0
    AfdE    89683 25828704         0        0
    TCPT    41888 18765824         0        0
    AfdC    90964 17465088         0        0 

     

    We now see the “Thre” tag at the top of the list, the largest consumer of NonPaged Pool, let’s go look it up in pooltag.txt….
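
    The same table is easy to post-process once it is saved to a text file.  The Python sketch below parses the rows shown above and computes the average NonPaged bytes per allocation for each tag.  The column layout is assumed from this sample, and the whitespace split would not cope with tags that contain spaces.

```python
# Rows copied from the !poolused 2 output in the text.
POOLUSED = """
Tag    Allocs     Used    Allocs     Used
Thre   120145 76892800         0        0
File   187113 29946176         0        0
AfdE    89683 25828704         0        0
TCPT    41888 18765824         0        0
AfdC    90964 17465088         0        0
"""


def parse_poolused(text):
    """Return one dict per tag row, sorted by NonPaged bytes, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have a tag followed by four numeric columns.
        if len(parts) == 5 and all(part.isdigit() for part in parts[1:]):
            rows.append({
                "tag": parts[0],
                "nonpaged_allocs": int(parts[1]),
                "nonpaged_used": int(parts[2]),
                "paged_allocs": int(parts[3]),
                "paged_used": int(parts[4]),
            })
    return sorted(rows, key=lambda row: row["nonpaged_used"], reverse=True)


pool_rows = parse_poolused(POOLUSED)
for row in pool_rows:
    average = row["nonpaged_used"] / row["nonpaged_allocs"]
    print("{tag}: {nonpaged_used} bytes".format(**row), "avg {:.0f} per alloc".format(average))
```

    A uniform average size, as with the Thre rows here, is consistent with many allocations of one fixed-size object type.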

     

    Thre - nt!ps        - Thread objects

     

    Note, the nt before the ! means that this is NT or the kernel’s tag for Thread objects.
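
    Looking tags up by hand gets tedious, so here is a Python sketch that splits a pooltag.txt entry into its fields.  The "tag - binary - description" layout is assumed from the single line quoted above; a real pooltag.txt may contain other kinds of lines that this function does not handle.

```python
def parse_pooltag_line(line):
    """Split a 'Tag - binary - description' entry into a 3-tuple."""
    parts = [part.strip() for part in line.split(" - ", 2)]
    if len(parts) != 3:
        raise ValueError("unrecognized pooltag entry: " + line)
    return tuple(parts)


entry = parse_pooltag_line("Thre - nt!ps        - Thread objects")
print(entry)

# Build a lookup table keyed by tag from a list of entries.
owners = {tag: (binary, description)
          for tag, binary, description in [entry]}
print(owners["Thre"])
```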

    So from our earlier discussion, if we have a bunch of thread objects, we probably have an application on the system with a ton of handles and/or a ton of threads, so it should be easy to find!

     

    Via the debugger we can find this out easily via the !process 0 0 command which will show the TableSize (Handle Count) of over 90,000!

     

    Debugger output Example 1.3  (the !process 0 0 command)

     

    Note the two zeros after !process separated by a space gives a list of all running processes on the system.

     

     

    PROCESS 884e6520  SessionId: 0  Cid: 01a0    Peb: 7ffdf000  ParentCid: 0124
    DirBase: 110f6000  ObjectTable: 88584448  TableSize: 90472
    Image: mybadapp.exe

     

    We can dig further here into looking at the threads…

     

    Debugger output Example 1.4  (the !process command continued)

     

    0: kd> !PROCESS 884e6520 4
    PROCESS 884e6520  SessionId: 0  Cid: 01a0    Peb: 7ffdf000  ParentCid: 0124
    DirBase: 110f6000  ObjectTable: 88584448  TableSize: 90472.
    Image: mybadapp.exe
            THREAD 884d8560  Cid 1a0.19c  Teb: 7ffde000  Win32Thread: a208f648 WAIT
            THREAD 88447560  Cid 1a0.1b0  Teb: 7ffdd000  Win32Thread: 00000000 WAIT
            THREAD 88396560  Cid 1a0.1b4  Teb: 7ffdc000  Win32Thread: 00000000 WAIT
            THREAD 88361560  Cid 1a0.1bc  Teb: 7ffda000  Win32Thread: 00000000 WAIT
            THREAD 88335560  Cid 1a0.1c0  Teb: 7ffd9000  Win32Thread: 00000000 WAIT
            THREAD 88340560  Cid 1a0.1c4  Teb: 7ffd8000  Win32Thread: 00000000 WAIT
     And the list goes on…

     

    From here we can examine an individual thread via !thread 88340560, and so on…

     

    So in this rudimentary example the offender is clear: mybadapp.exe, with its abundance of threads.  One could dig further to determine what type of threads these are or what functions they are executing, and then follow up with the owner of this executable for more detail, or take a look at the code if the application is yours!
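
    When the !process listing is long, the same check can be scripted against saved debugger output.  The sketch below is a minimal example in Python, assuming the text layout shown in Example 1.4; the function name parse_processes and the threshold value are hypothetical choices for illustration.

```python
import re

# Text copied from Example 1.4 (!process 884e6520 4), truncated as in the post.
PROCESS_OUTPUT = """\
PROCESS 884e6520  SessionId: 0  Cid: 01a0    Peb: 7ffdf000  ParentCid: 0124
DirBase: 110f6000  ObjectTable: 88584448  TableSize: 90472.
Image: mybadapp.exe
        THREAD 884d8560  Cid 1a0.19c  Teb: 7ffde000  Win32Thread: a208f648 WAIT
        THREAD 88447560  Cid 1a0.1b0  Teb: 7ffdd000  Win32Thread: 00000000 WAIT
        THREAD 88396560  Cid 1a0.1b4  Teb: 7ffdc000  Win32Thread: 00000000 WAIT
        THREAD 88361560  Cid 1a0.1bc  Teb: 7ffda000  Win32Thread: 00000000 WAIT
        THREAD 88335560  Cid 1a0.1c0  Teb: 7ffd9000  Win32Thread: 00000000 WAIT
        THREAD 88340560  Cid 1a0.1c4  Teb: 7ffd8000  Win32Thread: 00000000 WAIT
"""

TABLE_SIZE = re.compile(r"TableSize:\s*(\d+)")

# Hypothetical cut-off for "suspiciously many handles"; tune for your system.
HANDLE_THRESHOLD = 10000


def parse_processes(text):
    """Collect address, image, TableSize and listed thread count per process."""
    processes = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("PROCESS "):
            current = {"address": line.split()[1], "image": None,
                       "table_size": None, "threads": 0}
            processes.append(current)
        elif current is None:
            continue  # ignore anything before the first PROCESS record
        elif line.startswith("THREAD "):
            current["threads"] += 1
        elif line.startswith("Image:"):
            current["image"] = line.split(":", 1)[1].strip()
        else:
            match = TABLE_SIZE.search(line)
            if match:
                current["table_size"] = int(match.group(1))
    return processes


processes = parse_processes(PROCESS_OUTPUT)
suspects = [p for p in processes if (p["table_size"] or 0) > HANDLE_THRESHOLD]
for proc in sorted(suspects, key=lambda p: p["table_size"], reverse=True):
    print(proc["image"], proc["address"], proc["table_size"], proc["threads"])
```

    The thread count only reflects the THREAD lines present in the pasted text, so with a truncated listing like the one above it understates the real number.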

     

     

     

    Common Question:

     

    Why am I out of Paged Pool at ~200MB when we say that the limit is around 460MB?

     

    This is because the memory manager decided at boot that, given the amount of RAM on the system and other memory manager settings such as /3GB, our max is X amount rather than the theoretical maximum.  There are two ways to see the maximums on a system.

     

    1.)   Process Explorer using its Task Management.  View…System Information…Kernel Memory section.

     

    Note that we have to specify a valid path to dbghelp.dll and Symbols path via Options…Configure Symbols.

     

    For example:

     

          Dbghelp.dll path:

    c:\<path to debugging tools for windows>\dbghelp.dll

     

    Symbols path:

    SRV*C:\websymbols*http://msdl.microsoft.com/download/symbols

     

    2.)   The debugger (live or via a memory.dmp) by doing a !vm
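
    The symbol path given above for Process Explorer follows the SRV*<local cache>*<symbol server> form, and several such elements can be joined with semicolons.  As a small illustration, the Python sketch below splits a path into its parts so you can check that the cache folder and server are what you intended; parse_symbol_path is a hypothetical helper name and it handles only this simple form.

```python
# Symbol path copied from the Process Explorer example above.
SYMBOL_PATH = r"SRV*C:\websymbols*http://msdl.microsoft.com/download/symbols"


def parse_symbol_path(path):
    """Split a symbol path into elements and label symbol-server entries."""
    entries = []
    for element in path.split(";"):
        element = element.strip()
        if not element:
            continue
        parts = element.split("*")
        if parts[0].lower() == "srv" and len(parts) == 3:
            # SRV*<local cache>*<symbol server>
            entries.append({"kind": "symbol server",
                            "cache": parts[1],
                            "server": parts[2]})
        else:
            # Anything else is treated here as a plain directory of symbols.
            entries.append({"kind": "directory", "path": element})
    return entries


entries = parse_symbol_path(SYMBOL_PATH)
for entry in entries:
    print(entry)
```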

     

    *NonPaged pool size is not configurable other than via the /3GB boot.ini switch, which lowers NonPaged Pool’s maximum:

    128MB with the /3GB switch, 256MB without

     

    Conversely, Paged Pool size can often be raised manually to around its maximum via the PagedPoolSize registry setting, which is described, for example, in KB304101.

     

     

    So what is this Pool Paged Bytes counter I see in Perfmon for the Process Object?

     

    This counter reflects allocations that are charged to a process via ExAllocatePoolWithQuotaTag.  Typically we will see ExAllocatePoolWithTag used instead, and thus this counter is less effective…but hey…don’t pass up free information in Perfmon, so be on the lookout for this easy win.

     

     

    Additional Resources:

     

     “Who's Using the Pool?” from Driver Fundamentals > Tips: What Every Driver Writer Needs to Know

    http://www.microsoft.com/whdc/driver/tips/PoolMem.mspx

     

    Poolmon Remarks:  http://technet2.microsoft.com/WindowsServer/en/library/85b0ba3b-936e-49f0-b1f2-8c8cb4637b0f1033.mspx

     

     

     

     

     I hope you have enjoyed this post and hopefully it will get you going in the right direction next time you see one of these events or hit a pool consumption issue!

     

    -Tate

     

    Getting Ready for Windows Debugging

     

    Welcome to the Microsoft NTDebugging blog!  I’m Matthew Justice, an Escalation Engineer on Microsoft’s Platforms Critical Problem Resolution (CPR) team.  Our team will be blogging about troubleshooting Windows problems at a low level, often by using the Debugging Tools for Windows.  For more information about us and this blog, check out the about page.

     

    To get things started I want to provide you with a list of tools that we’ll be referencing in our upcoming blog posts, as well as links to some technical documents to help you get things configured.

     

    The big list of tools:

     

    The following tools are part of the “Debugging Tools for Windows” – you’ll definitely need these

    http://www.microsoft.com/whdc/devtools/debugging/

    ·         windbg

    ·         cdb

    ·         ntsd

    ·         tlist

    ·         gflags

    ·         adplus

    ·         UMDH

    ·         symcheck

     

    Sysinternals provides some great tools that we’ll be discussing

    http://www.sysinternals.com

    ·         Process Explorer

    ·         Process Monitor

    ·         Regmon

    ·         Filemon

    ·         DbgView

    ·         Handle.exe

    ·         Tcpview

    ·         LiveKD

    ·         AutoRuns

    ·         WinObj

     

    There are many tools contained in “MPS Reports” (MPSRPT_SETUPPerf.EXE), but I’m listing it here specifically for Checksym

    http://www.microsoft.com/downloads/details.aspx?FamilyID=CEBF3C7C-7CA5-408F-88B7-F9C79B7306C0&displaylang=en

    ·         Checksym

     

    “Windows Server 2003 Resource Kit Tools” is another great set of tools.  In particular Kernrate is a part of that package

    http://www.microsoft.com/downloads/details.aspx?displaylang=en&familyid=9D467A69-57FF-4AE7-96EE-B18C4790CFFD

    ·         Kernrate

     

    Windows XP SP2 Support Tools

    http://www.microsoft.com/downloads/details.aspx?FamilyID=49AE8576-9BB9-4126-9761-BA8011FABF38&displaylang=en

    ·         netcap

    ·         poolmon

    ·         memsnap

    ·         tracefmt  (64-bit versions available in the DDK)

    ·         tracelog

    ·         tracepdb

    ·         depends

    ·         pstat

     

    “Visual Studio” – in addition to the compilers and IDE, the following tools come in handy:

    ·         SPY++

    ·         dumpbin

     

    Perfwiz (Performance Monitor Wizard)

    http://www.microsoft.com/downloads/details.aspx?FamilyID=31fccd98-c3a1-4644-9622-faa046d69214&DisplayLang=en

     

    DebugDiag

    http://www.iis.net/handlers/895/ItemPermaLink.ashx

     

    Userdump (User Mode Process Dumper)

    http://www.microsoft.com/downloads/details.aspx?FamilyID=E089CA41-6A87-40C8-BF69-28AC08570B7E&displaylang=en

     

    Dheapmon (Desktop Heap Monitor)

    http://www.microsoft.com/downloads/details.aspx?familyid=5CFC9B74-97AA-4510-B4B9-B2DC98C8ED8B&displaylang=en

     

    Netmon 3.0

    §  Go to http://connect.microsoft.com/

    §  Sign in with your Passport account

    §  Choose "Available Connections" on the left

    §  Choose "Apply for Network Monitor 3.0" (once you've finished with the application, the selection appears in your "My Participation" page)

    §  Go to the Downloads page (on the left side), and select the appropriate 32-bit or 64-bit build.

     

     

     

    Some articles you may find useful:

     

    Debugging Tools and Symbols: Getting Started

    http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx

     

    Boot Parameters to Enable Debugging

    http://msdn2.microsoft.com/en-us/library/ms791527.aspx

     

    How to Generate a Memory Dump File When a Server Stops Responding (Hangs)

    http://support.microsoft.com/kb/303021/

     

    After installing the “Debugging Tools for Windows”, you’ll find two documents at the root of the install folder that are helpful:

     

    ·         kernel_debugging_tutorial.doc - A guide to help you get started using the kernel debugger.

     

    ·         debugger.chm - The help file for the debuggers.  It details the commands you can use in the debugger.  Think of this as a reference manual, rather than a tutorial.




