• Ntdebugging Blog

    The Memory Shell Game


    Hello, this is Somak.  Today I’d like to drop some Memory Manager info on the blog that I’ve used to communicate in brief (believe it or not) how the system deals with memory.  If you are ever faced with checking how much Available Memory you have(or don’t have), poor system performance, questions about page faults, or having a Working Set Trimming performance issue, or just want a primer into how Windows manages memory on the system, be not afraid and read on!

    How this memory stuff works

    The fastest and most expensive memory is built into the CPU.  The next fastest and less expensive is physical RAM.  Next we have the hard drive, followed by remote storage or backup.  Each step down the ladder adds significantly to access time.  For example, physical RAM can be almost one million times faster than a hard disk.  Since it is volatile and cost considerably more than hard disk space, we are limited at how much we can put in the system.  This limit is far less than the much slower and non-volatile hard disk.  Due to these design constraints we have a tiered memory structure in the computer.  To achieve the fastest response times and best overall performance, an operating system must efficiently use this tiered memory structure.  It must do its best to reduce the need to retrieve data from a slower storage media like the hard disk.  It has to do this while juggling the memory and I/O demand from all running processes.  The following paragraphs are an overview of how the Memory Manager achieves this in Windows.  A more detailed description can be found in Chapter 7 of Microsoft Windows Internals, Fourth Edition (ISBN: 0-7356-1917-4).  This book is the great source for how stuff works in Windows.

    First let’s lay out some definitions.  They will be useful later on when I talk about the interactions.  These definitions are high level to maintain brevity.

    Virtual Memory – This is a memory that an operating system can address.  Regardless of the amount of physical RAM or hard drive space, this number is limited by your processor architecture.  On a 32 bit processor you are limited to 4 GB of addressable virtual memory (2^32).  With a default installation on a 32 bit box (not using /3GB) the kernel reserves 2GB for itself.  Applications are left with 2GB of addressable virtual memory.  When applications execute they are only presented with this 2GB of addressable memory.  Each application gets its own 2GB virtual memory to play with.  If you have 50 processes running, you’ll have 50 independent 2GB Virtual Memory address spaces and one 2GB Virtual Address space for kernel.  This is possible because Virtual Memory always exists, but doesn't really exist (hence the term virtual).  We basically lie to the application and say, here is 2GB of memory address space for you to use.  This memory isn’t allocated until the application explicitly uses it.  Once the application uses the page, it becomes committed.  A virtual memory page is then translated to a physical memory page.  From this translation, the virtual page can reside in physical RAM or on the hard disk.

    Physical Memory –  This is the physical storage media.  It can be physical RAM, the hard disk, optical disks, tape backups, etc.  This is anything that can store data.  Most times when people talk about Physical Memory, they refer to physical RAM (the memory sticks on your motherboard), but with virtual page translation, physical memory can also be on the hard drive (in your paging file).  Physical RAM is limited by your processor architecture.

    Committed Memory –  When an application touches a virtual memory page (reads/write/programmatically commits) the page becomes a committed page.  It is now backed by a physical memory page.  This will usually be a physical RAM page, but could eventually be a page in the page file on the hard disk, or it could be a page in a memory mapped file on the hard disk.  The memory manager handles the translations from the virtual memory page to the physical page.   A virtual page could be in located in physical RAM, while the page next to it could be on the hard drive in the page file.

    Commit Limit –  This is the maximum amount of memory that all your applications and the OS can commit.  If you had 50 applications fully allocate their 2 GB of virtual address space, you would need 100GB of commit limit (ignore kernel memory usage to keep the numbers simple).  So 100GB of committed pages can be backed by physical RAM or the hard drive.  If you have 100 GB of RAM, you could handle this memory load.   In most cases, 100 GB of RAM isn't economically feasible so the Commit Limit is comprised of physical RAM and the page file.  If you have 2 GB of physical RAM and 98 GB of page file, then your commit limit would be 100 GB.

    Page file –   This is the storage area for virtual memory that has been committed.  It is located on the hard drive.  Since hard drive space is cheaper than physical RAM, it is an inexpensive way to increase the commit limit.

    Working Set – This is a set of virtual memory pages (that are committed) for a process and are located in physical RAM.  These pages fully belong to the process.  A working set is like a "currently/recently working on these pages" list.

    Modified pages - Once a virtual memory page leaves the process's working set, it is moved to another list.  If the page has been modified, it is placed on the modified page list.  This page is in physical RAM.  A thread will then write the page to the page file and move it to the standby list.

    Standby pages - This is a page that has left the process' working set.  This page is in physical RAM.  A standby page is like a cache for virtual memory pages.  It is still associated with the process, but not in its working set.  If the process touches the page, it is quickly faulted back into the working set.  That page also has one foot out the door.  If another process or cache needs more memory, the process association is broken and it is moved to the free page list.  Most of the pages in available memory are actually standby pages.  This makes sense when you realize that these pages can be quickly given to another process (hence available), but you should also understand that they are page caches for working sets and can be quickly given back if the process touches the page again.  The vast majority of available memory is not wasted or empty memory.

    Free pages - When a page is taken off of the standby page list, it is moved to the Free page list.  This page is in physical RAM.  These pages are not associated with any process.   When a process exits, all of its pages are then dumped onto this list. Typically, there is a very small to no amount of free pages hanging around physical RAM. 

    Zeroed pages - When a free page is zeroed out, it is placed on the Zero page list.  This page is in physical RAM.  These are the pages that are given to processes that are making memory allocations.  Due to C2 security requirements, all pages must be scrubbed before handed to a new process.  When the system is idle, a thread will scrub free pages and put them on this list.  Only a small amount of zero pages are required to handle the typical small memory allocations of processes.  Once this list is depleted, and if there is demand for more pages, we pull pages off of the Free page list and scrub them on the fly.  If the Free page list is depleted, then we pull pages off of the standby list, scrub them on the fly and hand them to the new process.

    What is Task Manager telling me?

    Prior to Windows Vista, Task Manager reports memory usage using accounting methods that you probably are not expecting.  It is because of these accounting practices, that we rarely use Task Manager to gauge system performance and memory usage.  We typically use it for a quick overview or to kill processes.  I highly recommend using Performance Monitor (perfmon.msc) for investigating performance issues.  Here's the breakdown of the numbers on the Performance tab:

    Physical Memory

    Total - The is the total physical RAM installed in the system.

    Available - This is the total of the Standby, Free and Zeroed list.  Free and Zeroed makes sense, but Standby seems odd at first.  Standby pages were added to this number because they are available for a quick scrub and given to a new process.  So they are technically available (with minimal effort).

    System Cache- This is the total of the Standby list and the size of the system working set (which includes the file cache).  Standby pages are added to this list because they are cached pages for working sets.

    PF Usage - This is the total number of committed pages on the system.  It does not tell you how many are actually written to the page file.  It only tells you how much of the page file would be used if all committed pages had to be written out to the page file at the same time.

    Commit Charge

    Total - This is the total virtual memory that has been committed.  This includes all committed memory for all processes and the kernel.

    Limit - This is the maximum amount of committed memory this system can handle.  This is a combination of physical RAM and the page file.

    Peak – This is the highest amount of memory committed thus far on this system, since boot.

    How does this work?

    The memory manager optimizes physical RAM usage across the entire system.  Since physical RAM is a finite resource, it has to balance sharing this critical resource amongst all process, the kernel and file I/O.   It tries to keep disk I/O to a minimum, which results in a more responsive system.  It does this by moving pages around to meet the demand of the system.

    Typically, large sections of physical RAM are used for file cache.  This is cache is necessary to improve disk performance.  Without it, disk I/O would make the system crawl along at an nearly unusable pace.  The file system cache is just like a working set for a process.  Pages removed from the file cache are moved to the standby or modified page list.  Many of the standby pages in RAM are probably file cache pages that were removed from its working set.  For example, on a file server, if you see 8 GB of available memory, most of these pages are probably standby pages for the file cache.  The file cache's working set could be 500 MB, but the 8 GB of standby pages should also be considered part of the file cache.

    Now let's take a look at how the memory manager handles processes.  While an application is working with its virtual memory pages, the memory manager keeps the pages in the process' working set.  Since the vast majority of application do not use all of its memory all the time, some pages will age.  Old pages are removed from the working set.  If they are modified, they are moved to the modified list.  The page is saved to the page file and moved to the standby list.  If the page hasn't been modified, it is moved directly to the standby list.  These pages will remain on the standby page list until there is a demand for it.

    If the application touches the page again, it is soft faulted back into the process' working set.  If the process doesn't use the page for a very long time, or if the demand for the page is greater elsewhere, the page is moved off of the standby list.  It is disassociated with the process and moved to a the free page list.  From the free page list, the page is scrub on demand or lazily and placed on the zero page list.  It is from the zero page list that other processes or the kernel or the file cache will get a new page.

    If after a very long time the application once again needs a page that is not in its working set, the memory manager will handle the memory fault.  If the page is on the standby list, it is quickly put back into the process' working set.  If the page is no longer in the standby list, a hard fault occurs.  The memory manager issues I/O to the hard disk to read the page(s) from the page file.  Once the I/O complete, the page is placed back into the process' work set.

    All of this is done to keep physical RAM highly utilized and disk I/O to a minimum.  We don't want to allow process to horde physical RAM for pages that are rarely used.  The physical RAM must be shared with other processes, the kernel and the file cache.  If you see lots of available memory on your system, rest assured that it is not going to waste.  The vast majority is on standby lists for processes and the file cache.

    Also note that page file usage isn't that bad.  The page file allows the Memory Manager to save modified pages before placing the page on the standby list.  The page is still in physical RAM and can be quickly faulted back into the process.  This method gives the process a chance to reclaim an old page and it allows the page to be quickly used if there is demand elsewhere.

    The best way to see the totals of these lists is to use a kernel debugger (live or postmortem).  Use the !memusage command and you'll get an output like this:

    0: kd> !memusage

    loading PFN database

    loading (100% complete)

    Compiling memory usage data (99% Complete).

    Zeroed:    414 (  1656 kb)

                   Free:      2 (     8 kb)

    Standby: 864091 (3456364 kb)

    Modified:    560 (  2240 kb)

        ModifiedNoWrite:     30 (   120 kb)

    Active/Valid: 182954 (731816 kb)

             Transition:      2 (     8 kb)

                    Bad:      0 (     0 kb)

                Unknown:      0 (     0 kb)

    TOTAL: 1048053 (4192212 kb)

    Of the 4GB of physical RAM, only 1.6 MB are on Zeroed or free pages.  731 MB is in process, system and file cache working sets.  2 MB are on the modified page list.  The vast majority, 3.4 GB, is on the standby list.  On this server, most people will see 3.4 GB of wasted physical RAM, but you will know better.

    What should I be worried about?

    Typically you shouldn't be worried about these things, until you have a performance problem.  If your system is sluggish or slow to respond or you are getting errors about out of memory, then you need to rely on this information.  You will need to collect a performance monitor log of the problem time.  If the counter list is daunting, then use the Performance Monitor Wizard to configure the performance monitor log.

    Once the log is collected, you'll need to analyze several counters.  I'm not going into detail about how to review performance monitor logs this time.  I'll save that lengthy topic for another time.  For now I'll focus on the counters relevant to the Memory Manager.

    One of the biggest reasons for slow performance and sluggish system responsiveness is disk bottleneck.  Look at Physical Disk\% Idle Time, Avg. Disk sec/Read and Avg. Disk sec/Write counters.  If your system drive is under 50% idle or your disk response times are way above your drive specifications, then you need to investigate further.  Look at the Memory\Available Mbytes.  You should have a couple hundred Mbytes of Available Memory.  This is one of the most important performance monitor counters.  If this number drops too low, your standby lists, process working sets and cache will be greatly reduced.  You'll need to find out if a process is consuming physical RAM.  Check for large process working sets or for large file cache.

    You will also need to see if paging is really affecting system performance.  Take a look at Memory\Pages Input/sec and correlate that to Physical Disk\Avg. Disk sec/Read.  Pages Input/sec is the number of pages being read in from the page file.  These are the hard faults (when the page wasn't on the standby list).  If your Avg. Disk sec/Read is close to your drive's specification and the drive's idle time is high, than paging really isn't a problem.  Small amounts of hard faults are expected as applications will every once in a while re-touch an old page.  As long as this I/O is not consistent or the disk can't keep up, you probably will not notice this impact.

    You can also look at Memory\Pages Output/sec and Physical Disk\Avg. Disk sec/Write.  These are the page commits to the page file when a modified page is removed from a process' working set.  As long as the disk can keep up with the writes, this shouldn't be a problem.  Remember that once the page is saved to the page file, it is placed on the standby list.  If there isn't a great demand for new pages, it can remain on the standby list for a very long time.  When the process touches the old virtual page again, it can be soft faulted back into the working set.  If there is great demand for memory, you'll see process working sets aggressively being trimmed.  Unless there is memory pressure, this is all done with lazy I/O, so you should not see much of an impact from this activity.

    The Memory Manager works to meet current demand and prepares for future demand when possible.  You need to look at a performance monitor log to see if there is memory pressure on the system.  You'll see this in low Available Mbytes and reductions in process working sets.  You'll be able to correlate this to increase disk I/O to the page file.  If you have established that there is memory pressure on the box, you need to figure where that demand is coming from.  Check for working set increases from processes, file cache or bottlenecked disk I/O to data drives.

  • Ntdebugging Blog

    Detecting and automatically dumping hung GUI based windows applications..


    Written by Jeff Dailey 

    My name is Jeff,  I’m an Escalation Engineer on CPR Platforms team.  Following Tate’s blog on scoping hangs I’d like discus a common category of hangs and some creative ways to track them down.  I will be providing a couple of labs to go with this post that you can run and debug on your machine and I will also be showing you how to write a hang detection tool that will dump processes that go unresponsive.  In addition to this I will be writing several more blog entries about the various hang scenarios contained in the badwindows.zip that is included with this blog.

    GUI Hangs

    Sometimes a Windows application that is GUI (Graphical User Interface) based, that is to say uses windows, buttons, scroll bars etc. may stop responding (Not Responding status in Task Manager).  When this happens in most cases the rest of the operating system seems to continue functioning ok.  However the application does not repaint or respond to mouse clicks or key strokes.  Sometimes these types of problem may be transient.   Your app may hang once or twice a day for 10-30 seconds.  In other cases it may hang for long periods of time or never recover.

    To get a better understand of this scenario it’s important to understand that all GUI based Windows applications work by passing messages to one another via a message queues.   Each Windows application typically has a single main thread that is responsible for processes these messages.   Though the application may be multi threaded there will typically be one thread processes messages.  This functionally is normally implement in WinMain.    This thread does different tasks based on the messages it receives.   It could open a dialog, create anther thread, or take actions based on a mouse click of even send a message to another Windows application or applications.

    When your application stops responding it’s generally due to this thread making a blocking / synchronous call that takes too long.  If the thread is unable to pull incoming messages from the OS it will appear to be hung.   Most of the time once you have the dump of the process you can look at thread 0 by doing a ~0s in cdb or windbg.  Then do a KB and see what the thread is blocking on or possibly looping in that is preventing it from processing messages.  If thread 0 is not the thread processing your messages you may be able to find it by dumping all the thread stacks, ~*kb

    The problem is you may not be able to fire up cdb or windbg to get a dump in time.  Or you may have a non technical user community that does not know about debuggers or creating dumps.  In this case you can do what I sometimes do. 

    Create your own tool.

    That’s right.  Sometimes I will see a scenario that warrants a slightly more elegant solution and there is nothing more powerful than a determined engineer and a C complier.   

    What is required?  Visual Studio (The Express edition is free), Windows SDK (free), the debugger SDK (free with Debugging Tools Install), and a little knowledge of how Windows works.

    Let’s take a moment and think about what our ideal debug application will and won’t do.  

    1.     It will be easy to use and configure and use.

    2.     It will not break or negatively impact our operating system.  That is to say, it will not use much CPU or resources.

    3.     It will wait quietly for our desired condition (in this case a hung window) to manifest.

    4.     It will spring into action and gather the critical information about the state of our misbehaving application by creating a dump file without raising a fuss.

    5.     It will be multi user aware and not place dumps in insecure locations, this means the dumps will go in the user’s temp directory.

    6.     We will only collect a limited number of dump files so we do not fill up the hard drive.

    7.     It will notify the admin of a hang and dump event by putting a message in the event log. 

    8.     It will execute an optional binary when a hang is detected.


    Here are the details of how it will work.

    To keep things simple we will just create a console application.  The application will be called dumphungwindow.exe.  We will run in a loop until we collect the desired number of dump files.  We will wake up every so many seconds,  get the top most window, loop through each window sending it a message with the SendMessageTimeout  API and if any window takes more then what we signify as our timeout we will create a dump of that process and log an event in the event log.   

    I have the sample dumphungwindow.zip and badwindow.zip embedded within it available for download here, it has the EXEs and the visual studio 2005 project with all of the source.  The tool project is called dumphungwindow, and the test application is in a project called badwindow.  This project contains a lab with three different hang scenarios that cause a window to stop responding. 

    The command line options are as follows.

    C:\source\dumphungwindow\debug>dumphungwindow.exe /?
     This sample application shows you how to use the debugger
     help api to dump an application if it stop responding.

     This tool depends on dbghelp.dll, this comes with the Microsoft debugger tools on www.microsoft.com

     Please make sure you have the debugger tools installed before running this tool.
     This tool is based on sample source code and is provided as is without warranty.

     feel free to contact jeffda@microsoft.com to provide feedback on this sample application

     /m[Number] Default is 5 dumps

     The max number of dumps to take of hung windows before exiting.

     /t[Seconds]  Default is 5 seconds

     The number of seconds a window must hang before dumping it.

     /p[Seconds] Default is 0 seconds

     The number of seconds to pause when dumping before continuing scan.

     /s[Seconds] Default is 5 seconds.

     The scan interval in seconds to wait before rescanning all windows.

     /d[DUMP_FILE_PATH] The default is the SystemRoot folder

     The path or location to place the dump files.

     /e[EXECUTABLE NAME] This allows you to start another program if an application hangs

    To run the tool simply start dumphungwindow.exe  The output should look something like this.

    Dumps will be saved in C:\Users\jeff\AppData\Local\Temp\
    scanning for hung windows


    To start our bad application extract the badwindowapp.zip file contained in the dumphungwindows.zip


    Then run badwindow.exe and from the menu select hang \ hang type 2.


    After a few seconds findhungwindow should detect the unresponsive badwindow.exe and generate a dump.


    Hung Window found dumping process (7064) badwindow.exe

    Dumping unresponsive process




    Please take a moment and review the source.  I’ve added comments that explain how we go about finding the hung window, and how we go about dumping it to a dump file you can open in windbg..


    Feel free to download and try out dumphungwindow against the badwindow.exe application.  Try looking at “hang type 1” first as that will be my next blog.  Over the coming weeks I’ll be writing about hang types 1,2 and 3 in the badwindow.exe application.   Once you have the dump file you can open it by inside of windbg via file \ open crash dump.  See Debugging tools for the install location.


    I hope you find this tool and sample helpful.


    Thank you Jeff-




    Warranty Disclaimer


    This sample code, utilities, and documentation are provided as is, without warranty of any kind. Microsoft further disclaims all

    implied warranties including without limitation any implied warranties of merchantability or of fitness for a particular  purpose.

    The entire risk arising out of the use or performance of the product and documentation remains with you.


    In no event shall Microsoft be liable for any damages whatsoever  (including, without limitation, damages for loss of business

    profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to

    use the sample code, utilities, or documentation, even if  Microsoft has been advised of the possibility of such damages.

    Because some states do not allow the exclusion or limitation of liability for consequential or incidental damages, the above

    limitation may not apply to you.




    #include <stdio.h>

    #include <windows.h>

    #include <dbghelp.h>

    #include <psapi.h>


    // don't warn about old school strcpy etc.

    #pragma warning( disable : 4996 )


    int iMaxDump=5;

    int iDumpsTaken=0;

    int iHangTime=5000;

    int iDumpPause=1;

    int iScanRate=5000;

    HANDLE hEventLog;

    char * szDumpLocation;

    int FindHungWindows(void);

    char * szDumpFileName = 0;

    char * szEventInfo = 0;

    char * szDumpFinalTarget = 0;

    char * szModName = 0;

    char * szAppname = 0;

    DWORD dwExecOnHang = 0;


    #define MAXDUMPFILENAME 1000

    #define MAXEVENTINFO 5000


    #define MAXDUMPLOCATION 1000

    #define MAXAPPPATH 1000

    #define MAXMODFILENAME 500

    #define HMODSIZE 255


    int main(int argc, char * argv[])


          int i;

          int z;

          size_t j;

          char scan;


          // check to make sure we have dbghelp.dll on the machine.



                printf("dbghelp.dll not found please install the debugger tools and place this tool in \r\nthe debugging tools directory or a copy of dbghelp.dll in this tools directory\r\n");

                return 0;



          // Allocate a buffer for our dump location

          szDumpLocation = (char *)malloc(MAXDUMPLOCATION);




                printf("Failed to alloc buffer for szdumplocation %d",GetLastError());

                return 0;




          szAppname = (char *)malloc(MAXAPPPATH);




                printf("Failed to alloc buffer for szAppname  %d",GetLastError());

                return 0;




          // We use temp path because if we are running under terminal server sessions we want the dump to go to each

          // users secure location, ie. there private temp dir. 

          GetTempPath(MAXDUMPLOCATION, szDumpLocation );


          for (z=0;z<argc;z++)




                case '?':


                      printf("\n This sample application shows you how to use the debugger \r\n help api to dump an application if it stop responding.\r\n\r\n");

                      printf("\n This tool depends on dbghelp.dll, this comes with the Microsoft debugger tools on www.microsoft.com");

                      printf("\n Please make sure you have the debugger tools installed before running this tool.");

                      printf("\n This tool is based on sample source code and is provided as is without warranty.");

                      printf("\n feel free to contact jeffda@microsoft.com to provide feedback on this sample application\r\n\r\n");

                      printf(" /m[Number] Default is 5 dumps\r\n The max number of dumps to take of hung windows before exiting.\r\n\r\n");

                      printf(" /t[Seconds]  Default is 5 seconds\r\n The number of seconds a window must hang before dumping it. \r\n\r\n");

                      printf(" /p[Seconds] Default is 0 seconds\r\n The number of seconds to pause when dumping before continuing scan. \r\n\r\n");

                      printf(" /s[Seconds] Default is 5 seconds.\r\n The scan interval in seconds to wait before rescanning all windows.\r\n\r\n");

                      printf(" /d[DUMP_FILE_PATH] The default is the SystemRoot folder\r\n The path or location to place the dump files.  \r\n\r\n");

                      printf(" /e[EXECUTABLE NAME] This allows you to start another program if an application hangs\r\n\r\n");


                      return 0;


                case 'm':

                case 'M':


                            iMaxDump = atoi(&argv[z][2]);



                case 't':

                case 'T':


                            iHangTime= atoi(&argv[z][2]);




                case 'p':

                case 'P':


                            iDumpPause= atoi(&argv[z][2]);




                case 's':

                case 'S':


                            iScanRate = atoi(&argv[z][2]);




                case 'd':

                case 'D':

                      { // Dump file directory path


                            j = strlen(szDumpLocation);


                            if (szDumpLocation[j-1]!='\\')







                case 'e':

                case 'E':

                      { // applicaiton path to exec if hang happens


                            dwExecOnHang = 1;







          printf("Dumps will be saved in %s\r\n",szDumpLocation);

          puts("scanning for hung windows\n");


          hEventLog = OpenEventLog(NULL, "HungWindowDump");








                      if ('*'==scan)















                      return 0;


                if (iMaxDump == iDumpsTaken)


                      printf("\r\n%d Dumps taken, exiting\r\n",iDumpsTaken);

                      return 0;






          return 0;



    int FindHungWindows(void)


    DWORD dwResult = 0;

    DWORD ProcessId = 0;

    DWORD tid = 0;

    DWORD dwEventInfoSize = 0;


    // Handles

    HWND hwnd = 0;

    HANDLE hDumpFile = 0;

    HANDLE hProcess = 0;

    HRESULT hdDump = 0;


    SYSTEMTIME SystemTime;

    MINIDUMP_TYPE dumptype = (MINIDUMP_TYPE) (MiniDumpWithFullMemory | MiniDumpWithHandleData | MiniDumpWithUnloadedModules | MiniDumpWithProcessThreadData);


    // These buffers are presistant.


    // security stuff to report the SID of the dumper to the event log.

    PTOKEN_USER pInstTokenUser;

    HANDLE ProcessToken;

    TOKEN_INFORMATION_CLASS TokenInformationClass = TokenUser;

    DWORD ReturnLength =0;


    // This allows us to get the first window in the chain of top windows.

    hwnd = GetTopWindow(NULL);



          printf("Could not GetTopWindow\r\n");

          return 0;



    // We will iterate through all windows until we get to the end of the list.



          // Get the process ID for the current window   

          tid = GetWindowThreadProcessId(hwnd, &ProcessId);


          // Sent a message to this window with our timeout. 

          // If it times out we consider the window hung

          if (!SendMessageTimeout(hwnd, WM_NULL, 0, 0, SMTO_BLOCK, iHangTime, &dwResult))


                // SentMessageTimeout can fail for other reasons, 

                // if it's not a timeout we exit try again later

                if(ERROR_TIMEOUT != GetLastError())


                      printf("SendMessageTimeout has failed with error %d\r\n",GetLastError());

                      return 1;


                      // Iint our static buffers points.

                      // On our first trip through if we have not

                      // malloced memory for our buffers do so now.



                            szModName = (char *)malloc(MAXMODFILENAME);




                                  printf("Failed to alloc buffer for szModName %d",GetLastError());

                                  return 0;




                      if(!szDumpFileName)// first time through malloc a buffer.


                            szDumpFileName = (char *)malloc(MAXDUMPFINALTARGET);




                                        printf("Failed to alloc buffer for dumpfilename %d",GetLastError());

                                        return 0;




                      if(!szDumpFinalTarget)// first time through malloc a buffer.


                            szDumpFinalTarget= (char *)malloc(MAXDUMPFINALTARGET);




                                  printf("Failed to alloc buffer for dumpfiledirectory %d",GetLastError());

                                  return 0;






                            szEventInfo= (char *)malloc(MAXEVENTINFO);




                                  printf("Failed to alloc buffer for szEventInfo %d",GetLastError());

                                  return 0;




                      // End of initial buffer allocations.


                GetLocalTime (&SystemTime);


                // Using the process id we open the process for various tasks.

                hProcess = OpenProcess(PROCESS_ALL_ACCESS,NULL,ProcessId);

                if(!hProcess )


                      printf("Open process of hung window failed with error %d\r\n",GetLastError());

                      return 1;


                // What is the name of the executable?

                GetModuleBaseName( hProcess, NULL, szModName,MAXMODFILENAME);


                printf("\r\n\r\nHung Window found dumping process (%d) %s\n",ProcessId,szModName);


                // Here we build the dump file name time, date, pid and binary name





                // We have to create the file and then pass it's handle to the dump api

                hDumpFile = CreateFile(szDumpFinalTarget,FILE_ALL_ACCESS,0,NULL,CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL,NULL);



                      printf("CreateFile failed to open dump file at location %s, with error %d\r\n",szDumpLocation,GetLastError());

                      return 0;



                printf("Dumping unresponsive process\r\n%s",szDumpFinalTarget);


                // This dump api will halt the target process while it writes it's

                // image to disk in the form a dump file.

                // this can be opened later by windbg or cdb for debugging.

                if(!MiniDumpWriteDump(hProcess,ProcessId,hDumpFile,dumptype ,NULL,NULL,NULL))


                      // We do this on failure

                      hdDump = HRESULT_FROM_WIN32(GetLastError());

                      printf("MiniDumpWriteDump failed with a hresult of %d last error %d\r\n",hdDump,GetLastError());

                      CloseHandle (hDumpFile);

                      return 0;




                      // If we are here the dump worked.  Now we need to notify the machine admin by putting a event in

                      // the application event log so someone knows a dump was taken and where it is stored.

                      sprintf(szEventInfo,"An application hang was caught by findhungwind.exe, the process was dumped to %s",szDumpFinalTarget);


                      // We need to get the process token so we can get the user sit so ReportEvent will have the

                      // User name / account in the event log.

                      if (OpenProcessToken(hProcess,      TOKEN_QUERY,&ProcessToken ) )


                            // Make the firt call to findout how big the sid needs to be.    

                            GetTokenInformation(ProcessToken,TokenInformationClass, NULL,NULL,&ReturnLength);

                            pInstTokenUser = (PTOKEN_USER) malloc(ReturnLength);



                                  printf("Failed to malloc buffer for InstTokenUser exiting error %d\r\n",GetLastError());

                                  return 0;


                            if(!GetTokenInformation(ProcessToken,TokenInformationClass, (VOID *)pInstTokenUser,ReturnLength,&ReturnLength))


                                  printf("GetTokenInformation failed with error %d\r\n",GetLastError());

                                  return 0;



                      // write the application event log message. 

                      // This will show up as source DumpHungWindow





                      // Free to token buffer, we don't want to leak anything.



                      // In additon to leaking a handle if you don't close the handle

                      // you may not get the dump to flush to the hard drive.

                      CloseHandle (hDumpFile);

                      printf("\r\nDump complete");


                      // This allows you to execute something if you get a hang like crash.exe

                      if (dwExecOnHang)





                      //  The Sleep is here so in the event you want to wait N seconds

                      //  before collecting another dump

                      //  you can pause.  This is helpful if you want to see if any

                      //  forward progress is happening over time




                // Once we are at our threadshold for max dumps

                // we exit so we do not fill up the hard drive.


                if (iMaxDump == iDumpsTaken)


                      return 0;



            // This is where we traverse to the next window.

                hwnd = GetNextWindow(hwnd, GW_HWNDNEXT);


          return 1;



  • Ntdebugging Blog

    How Windows Starts Up (part 1 of 4)


    Hi folks, my name is David and I’m an Escalation Engineer for Microsoft.  Since Bryan wrote about How Windows Shuts Down, I thought it would be a good idea to cover How Windows Starts Up.


    This information applies specifically to Windows 2000, Windows XP, and Windows Server 2003.  I will blog separately on the boot changes in Windows Vista.


    Additional information about this topic may be found in Chapter 5 of Microsoft Windows Internals by Russinovich and Solomon and also on TechNet:  http://technet.microsoft.com/en-us/library/bb457123.aspx



    The key to understanding and troubleshooting system startup issues is to accurately ascertain the point of failure.  To facilitate this determination, I have divided the boot process into the following four phases.


                    1. Initial

                    2. Boot Loader

                    3. Kernel

                    4. Logon


    Over the next few weeks, I’ll be describing each of these phases in detail and providing appropriate guidance for troubleshooting relevant issues for each phase. 


    Initial Phase

    The Initial Phase of boot is divided into the Power-On Self Test (POST) and Initial Disk Access. 


    Power-On Self Test

    POST activities are fully implemented by the computer’s BIOS and vary by manufacturer.  Please refer to the technical documentation provided with your hardware for details.  However, regardless of manufacturer, certain generic actions are performed to ensure stable voltage, check RAM, enable interrupts for system usage, initialize the video adapter, scan for peripheral cards and perform a memory test if necessary.  Depending on manufacturer and configuration, a single beep usually indicates a successful POST. 


    Troubleshooting the POST

    ü  Make sure you have the latest BIOS and firmware updates for the hardware installed in the system.

    ü  Replace the CMOS battery if it has failed.

    ü  Investigate recently added hardware (RAM, Video cards, SCSI adapters, etc.)

    ü  Remove recently added RAM modules.

    ü  Remove all adapter cards, then replace individually, ensuring they are properly seated.

    ü  Move adapter cards to other slots on the motherboard.

    ü  If the computer still will not complete POST, contact your manufacturer.


    Initial Disk Access

    Depending on the boot device order specified in your computer’s BIOS, your computer may attempt to boot from a CD-ROM, Network card, Floppy disk, USB device or a hard disk.  For the purposes of this document, we’ll assume that we’re booting to a hard disk since that is the most common scenario.


    Before we discuss the sequence of events that occur during this phase of startup, we need to understand a little bit about the layout of the boot disk.  The structure of the hard disk can be visualized this way:  (Obviously, these data areas are not to scale)




    Hard disks are divided into Cylinders, Heads and Sectors.  A sector is the smallest physical storage unit on a disk and is almost always 512 bytes in size.  For more information about the physical structure of a hard disk, please refer to the following Resource Kit chapter:  http://www.microsoft.com/resources/documentation/windowsnt/4/server/reskit/en-us/resguide/diskover.mspx


    There are two disk sectors critical to starting the computer that we’ll be discussing in detail:  


    ·         Master Boot Record (MBR)

    ·         Boot Sector


    The MBR is always located at Sector 1 of Cylinder 0, Head 0 of each physical disk.   The Boot Sector resides at Sector 1 of each partition.  These sectors contain both executable code and the data required to run the code.


    Please note that there is some ambiguity involved in sector numbering.  Cylinder/Head/Sector (CHS) notation begins numbering at C0/H0/S1.  However, Absolute Sector numbering begins numbering at zero.  Absolute Sector numbering is often used in disk editing utilities such as DskProbe.  These differences are discussed in the following knowledge base article:

    Q97819  Ambiguous References to Sector One and Sector Zero



    Now that we have that straight, when does this information get written to disk?  The MBR is created when the disk is partitioned.  The Boot Sector is created when you format a volume.  The MBR contains a small amount of executable code called the Master Boot Code, the Disk Signature and the partition table for the disk.  At the end of the MBR is a 2-byte structure called a Signature Word or End of Sector marker, which should always be set to 0x55AA.  A Signature Word also marks the end of an Extended Boot Record (EBR) and the Boot Sector.


    The Disk Signature, a unique number at offset 0x1B8, identifies the disk to the operating system.  Windows 2000 and higher operating systems use the disk signature as an index to store and retrieve information about the disk in the registry subkey HKLM\System\MountedDevices.


    The Partition Table is a 64-byte data structure within the MBR used to identify the type and location of partitions on a hard disk.  Each partition table entry is 16 bytes long (four entries max).  Each entry starts at a predetermined offset from the beginning of the sector as follows:


                                    Partition 1                           0x1BE    (446)

                                    Partition 2                           0x1CE    (462)

                                    Partition 3                           0x1DE    (478)

                                    Partition 4                           0x1EE    (494)


    The following is a partial example of a sample MBR showing three partition table entries in-use and one empty:


    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00 00 00

    000001D0: 81 0A 07 FE FF FF 8A F5 – 7F 00 3D 26 9C 00 00 00

    000001E0: C1 FF 05 FE FF FF C7 1B – 1C 01 D6 96 92 00 00 00

    000001F0: 00 00 00 00 00 00 00 00 – 00 00 00 00 00 00


    Let’s take a look at each of the fields of a partition table entry individually.  For each of these explanations, I’ll use the first partition table entry above and highlight the relevant section.  Keep in mind that these values are little-endian.


    Byte Offset

    Field Length

    Sample Value

    Field Description


    8 Bits


    Boot Indicator

                    00=Do Not Use for Booting

                    80=Active partition (Use for Booting)

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00     



    8 Bits


    Starting Head

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00




    2 Byte



    Starting Sector

    Only the first 6 bits are used.  The upper 2 bits of this byte are used by the Starting Cylinder field.


    Starting Cylinder

    Uses 1 byte + 2 bits from the Starting Sector field to make up the cylinder value.  The Starting Cylinder is a 10-bit number with a maximum value of 1023.

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00





    8 Bits


    System ID

    Defines the volume type.  0x07=NTFS

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00


    Other Possible System ID Values:  

    Partition Type

    ID Value


    FAT12 primary partition or logical drive (fewer than 32,680 sectors in the volume)


    FAT16 partition or logical drive (32,680–65,535 sectors or 16 MB–33 MB)


    Extended partition


    BIGDOS FAT16 partition or logical drive (33 MB–4 GB)


    Installable File System (NTFS partition or logical drive)


    FAT32 partition or logical drive


    FAT32 partition or logical drive using BIOS INT 13h extensions


    BIGDOS FAT16 partition or logical drive using BIOS INT 13h extensions


    Extended partition using BIOS INT 13h extensions


    EISA partition


    Dynamic disk volume


    Legacy FT FAT16 disk *


    Legacy FT NTFS disk *


    Legacy FT volume formatted with FAT32 *


    Legacy FT volume using BIOS INT 13h extensions formatted with FAT32 *


    Partition types denoted with an asterisk (*) indicate that they are also used to designate non-FT configurations such as striped and spanned volumes.



    8 Bits


    Ending Head      (0xFE=254 decimal)

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00




    2 Byte



    Ending Sector

    As with the Starting Sector, it only uses the first 6 bits of the byte.  The upper 2 bits are used by the Ending Cylinder field.


    Ending Cylinder

    Uses 1 byte in addition to the upper 2 bits from the Ending Sector field to make up the cylinder value.  The Ending Cylinder is a 10-bit number with a maximum value of 1023.

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00



    32 Bits


    Relative Sectors

    The offset from the beginning of the disk to the beginning of the volume, counting by sectors.

    0x0000003F = 63

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00




    32 Bits


    Total Sectors

    The total number of sectors in the volume.

    0x007FF54B = 8,385,867 Sectors = 4GB

    000001B0:                                             80 01

    000001C0: 01 00 07 FE BF 09 3F 00 - 00 00 4B F5 7F 00


    Are you with me so far?  Good!  Now, Cylinder/Sector encoding can be a bit tricky, so let’s take a closer look. 


    Cylinder - Sector Encoding

















    Cylinder bits 1-8

    Cyl bits

    9 & 10

    Sector value

    Bits 1-6


    As you can see, the Sector value occupies the lower 6 bits of the word and the Cylinder occupies the upper 10 bits of the word.  In our example, the starting values for Cylinder and Sector are 01 00.  Values in the MBR use reverse-byte ordering, which is also referred to as ‘little-endian’ notation.  Therefore, we swap the bytes and find that our starting values are Cyl 0, Sec 1.


    Our ending values are more interesting:  BF 09.  First, we swap the bytes and obtain a hex value of 0x09BF.  This value in binary notation is 100110111111.  The following table illustrates how we derive the correct partition table values from these bytes:


    Example:  BF 09

































    10 Cylinder value bits 1-8

    Cyl bits

    9 & 10

    Sector value

    bits 1-6


    The 6 low bits are all set to 1, therefore our Sector value is 111111 or 63.   You can see above how the bits are arranged for the Cylinder value.  The value above is 1000001001 (521).  Since both Cylinder and Head values begin numbering at zero, we have a total of 522 Cylinders and 255 Heads represented here.  This gives us an ending CHS value of:  522 x 255 x 63 = 8,385,930 sectors. 


    Subtracting the starting CHS address (Cylinder 0, Head 1, Sector 1) (63) gives us the total size of this partition:  8,385,867 sectors or 4GB.  We can verify this number by comparing it to the Total Sectors represented in the partition table:  4B F5 7F 00.  Applying reverse-byte ordering gives us 00 7F F5 4B which equals 8,385,867 sectors.


    So, now that we have an understanding of what is contained within the structures on the disk, let’s look at the sequence of events that occur.  Remember, this is just after POST has successfully completed.


    1.       The motherboard ROM BIOS attempts to access the first boot device specified in the BIOS.  (This is typically user configurable and can be edited using the BIOS configuration utility.)


    2.       The ROM BIOS reads Cylinder 0, Head 0, and Sector 1 of the first boot device.


    3.       The ROM BIOS loads that sector into memory and tests it.

    a.       For a floppy drive, the first sector is a FAT Boot Sector.

    b.      For a hard drive, the first sector is the Master Boot Record (MBR).


    4.       When booting to the hard drive, the ROM BIOS looks at the last two signature bytes of Sector 1 and verifies they are equal to 55AA.

    a.       If the signature bytes do not equal 55AA the system assumes that the MBR is corrupt or the hard drive has never been partitioned.  This invokes BIOS Interrupt 18, which displays an error that is BIOS-vendor specific such as “Operating System not found”.

    b.      If the BIOS finds that the last two bytes are 55AA, the MBR program executes.


    5.       The MBR searches the partition table for a boot indicator byte 0x80 indicating an active partition.

    a.       If no active partition is found, BIOS Interrupt 18 is invoked and a BIOS error is displayed such as “Operating System not found”.

    b.      If any boot indicator in the MBR has a value other than 0x80 or 0x00, or if more than one boot indicator indicates an active partition (0x80), the system stops and displays “Invalid partition table”.

    c.       If an active partition is found, that partition’s Boot Sector is loaded and tested.


    6.       The MBR loads the active partition’s Boot Sector and tests it for 55AA.

    a.       If the Boot Sector cannot be read after five retries, the system halts and displays “Error loading operating system”.

    b.      If the Boot Sector can be read but is missing its 55AA marker, “Missing operating system” is displayed and the system halts.

    c.       The bootstrap area is made up of a total of 16 sectors (0-15).  If sector 0 for the bootstrap code is valid, but there is corruption in sectors 1-15, you may get a black screen with no error.  In this case, it may be possible to transplant sectors 1-15 (not Sector 0) from another NTFS partition using DskProbe.

    d.      If the Boot Sector is loaded and it passes the 55AA test, the MBR passes control over to the active partition’s Boot Sector.


    7.       The active partition’s Boot Sector begins executing and looks for NTLDR.  The contents of the Boot Sector are entirely dependent on the format of the partition.   For example, if the boot partition is a FAT partition, Windows writes code to the Boot Sector that understands the FAT file system.  However, if the partition is NTFS, Windows writes NTFS-capable code.  The role of the Boot Sector code is to give Windows just enough information about the structure and format of a logical drive to enable it to read the NTLDR file from the root directory.   The errors at this point of the boot process are file system specific.  The following are possible errors with FAT and NTFS Boot Sectors.

    a.       FAT:  In all cases it displays “press any key to restart” after either of the following errors.

    (1)    If NTLDR is not found “NTLDR is missing” is displayed

    (2)    If NTLDR is on a bad sector, “Disk Error” displays.

    b.      NTFS:  In all cases it displays “Press CTRL+ALT+DEL to restart” after any of the following errors.

    (1)    If NTLDR is on a bad sector, “A disk read error occurred” is displayed.  This message may also be displayed on Windows 2000 or higher systems if Extended Int13 calls are required, but have been disabled in the CMOS or SCSI BIOS.  This behavior may also be seen if an FRS (File Record Segment) cannot be loaded or if any NTFS Metadata information is corrupt.

    (2)    If NTLDR is not found, “NTLDR is missing” displays.

    (3)    If NTLDR is compressed, “NTLDR is compressed” displays.


    8.       Once NTLDR is found, it is loaded into memory and executed.  At this point the Boot Loader phase begins.


    Troubleshooting the Initial Phase (Initial Disk Access)

    If you are unable to boot Windows and the failure is occurring very early in the boot process, begin by creating a Windows boot floppy and using it to attempt to boot the system.


                    Q119467 How to Create a Bootable Disk for an NTFS or FAT Partition



    If you are unable to boot to Windows NT with a floppy, skip to the section titled “Unable to boot using floppy”.


    If you can successfully start the computer from the boot disk, the problem is limited to the Master Boot Record, the Boot Sector, or one or more of the following files:  NTLDR, NTDetect.com, or Boot.ini.  After Windows is running, you should immediately back up all data before you attempt to fix the Boot Sector or Master Boot Record.


    1.       Run a current virus scanning program to verify that no virus is present.

    2.       Select the method of troubleshooting based upon the error generated.

    ·         Operating System not found

    Typical BIOS-specific error indicating no 55AA signature on the MBR.  This must be repaired manually using a disk editor such as DskProbe.

    Warning:  Running FixMBR or FDISK /MBR in this case will clear the partition table.

    ·         Invalid Partition Table

    May be repaired using DiskProbe – check for multiple 0x80 boot indicators.

    ·         Error Loading Operating System

    This error indicates an invalid Boot Sector.  Use the “Repairing the Boot Sector” section below.

    ·         Missing Operating System

    This error indicates that the 55AA signature is missing from the Boot Sector.  This may be replaced using DskProbe or by using the “Repairing the Boot Sector” section below.

    ·         NTLDR is missing

    Use the “Repairing the NTLDR file” section below.

    ·         Disk error (FAT only)

    Indicates that NTLDR is located on a bad sector.

    ·         A disk read error occurred (NTFS only)

    Indicates that NTLDR is on a bad sector.  Use the “Repairing the NTLDR file” section below.

    ·         NTLDR is compressed

    Indicates that NTLDR is compressed.  Uncompress or replace NTLDR.  Note:  This error is rare.

    ·         Computer boots to a black screen with blinking cursor

    Could be an issue with the MBR or the active partition’s boot code (Q228734).  Use the appropriate section below.


    Repairing the MBR

    Use the FIXMBR command from the Windows Recovery Console to re-write the boot code located in the first 446 bytes of the MBR, but to leave the partition table intact.


    326215  How To Use the Recovery Console on a Windows Server 2003-Based Computer That Does Not Start



    WARNING:  DO NOT use this method if you are receiving “Operating System not found” or other BIOS-specific error.  FIXMBR will zero out the partition table if the 55AA signature is missing from the MBR.  The data will be unrecoverable.


    Repairing the Boot Sector

    Use the FIXBOOT command from the Windows Recovery Console to write a new Boot Sector to the system partition.


    326215  How To Use the Recovery Console on a Windows Server 2003-Based Computer That Does Not Start



    If you are not able to repair the Boot Sector by using this method, you may repair it manually using the following knowledge base article:


    Q153973 Recovering NTFS Boot Sector on NTFS Partitions



    Repairing the NTLDR file

    Use the Recovery Console to copy NTLDR from \i386 directory of the Windows CD-ROM to the root of the hard drive.


    Unable to boot using floppy

    If you are unable to start the computer with a boot floppy and you are not receiving an error message, then the problem is most likely with your partition tables.  If invalid partitions are present and you are unable to start your computer with a boot disk, formatting the disk, reinstalling Windows and restoring from backup may be your only option.  It may be possible to recreate the partition table using DskProbe if a backup of the partition information is available.  It also may be possible to manually reconstruct the partition table however this is outside the scope of this blog post.  We may cover this later.


    If you do not have a current backup of the data on the computer, as a last resort a data recovery service may be able to recover some of your information. 


    Helpful knowledge base articles:

                    Q272395 Error Message: Boot Record Signature AA55 Not Found



                    Q155892 Windows NT Boot Problem: Kernel File Is Missing From the Disk



                    Q228004 Changing Active Partition Can Make Your System Unbootable



                    Q155053 Black Screen on Boot



                    Q314503 Computer Hangs with a Black Screen When You Start Windows



                    Q153973 Recovering NTFS Boot Sector on NTFS Partitions



    Okay boys & girls, that’s it for now.  Next time, I’ll continue with the boot sequence and cover the Boot Loader phase.  This is where things really get interesting… J


  • Ntdebugging Blog

    Easily Resolving an Event Viewer Error using a Process Memory Dump


    My name is Ryan Mangipano (ryanman) and I am a Sr. Support Escalation Engineer at Microsoft.  Today I will be blogging about how I used the SOS  .Net Framework debugging extension (and !analyze -v) to easily troubleshoot a .Net Framework exception. This exception was preventing Event Viewer from displaying properly. Event Viewer was returning an error that provided very little information about what was actually causing the issue. I will demonstrate how, in this case, it was very easy to use windbg to obtain information about what went wrong. I did not have to perform a live debug on the issue.  Instead, I used a process dump to obtain very exact information, which was returned by the debugger, relating to the root cause. I was then able to use Process Monitor to identify the file that needed to be examined. These actions led me to the source of the problem, which was easily corrected. Also, the issue discussed in this blog was easily reproduced on Windows Server 2008. This means that you should be able to practice this debug exercise, on your non-production Windows 2008 SP1 Server, for learning purposes if you are interested in doing so.


    Issue Reported:  The following error was encountered when opening eventvwr.msc (Event Viewer) on a Windows 2008 Server system:


                           "MMC could not create the snap-in."

    MMC could not create the snap-in.

    The snap-in might not have been installed correctly

    Name: Event Viewer

                   CLSID: FX:{b05566ad-fe9c-4363-be05-7a4cbb7cb510}


    First Step- Research & Data Gathering: After ensuring I first understood the problem reported, I searched for known issues. I found out that we have seen this error before.  It may occur when the following registry key gets deleted or corrupted:


                    HKLM\software\Microsoft\MMC\SnapIns\FX:{ b05566ad-fe9c-4363-be05-7a4cbb7cb510}


    I had the customer export this key and found that it was not corrupted in any way. I verified that all data was as expected


    Next,  a memory dump of the mmc.exe process was collected.  The mmc.exe process is used to host the eventvwr.msc snap-in. This was easily obtained using the built in Windows 2008 Server "Windows Task Manager" feature: "Create Dump File" . If you have several mmc console instances executing on your system, you can use the Task Manager context menu shortcuts "Switch To" and "Go To Process" to help you to identify the correct instance.




    Note: We also collected a process monitor logfile during the startup of eventvwr.msc. This log file later proved very helpful in resolving the issue (as I will show below). Process monitor can be obtained at the following URL:  http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx



     Now let's have a look at the debug.


    1.          First, I navigated Windows Explorer to the location of the dump file and double-clicked it to open it in windbg.exe.


    It opened in windbg because I had previously run the command windbg -IA, which associates .dmp files with windbg. You can read more about the command line options in windbg in the help file that is included with the debugging tools.



    2.            I noticed the following output from the debugger after it loaded the dump file:


    This dump file has an exception of interest stored in it.

    The stored exception information can be accessed via .ecxr.

    (ff8.a2c): CLR exception - code e0434f4d (first/second chance not available)



    3.            Next, I wanted to ensure my symbol path was set correctly. I could have set it using the .sympath command:


    0:011> .sympath SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

    Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

    Expanded Symbol search path is: srv*c:\websymbols*http://msdl.microsoft.com/download/symbols

    0:011> .sympath

    Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols

    Expanded Symbol search path is: srv*c:\websymbols*http://msdl.microsoft.com/download/symbols


    However, when your goal is to simply point to the default symbol server, .symfix is a very nice shortcut. It prevents one from having to try to remember the URL.  Here’s the syntax: 


    0:011> .symfix c:\websymbols

    0:011> .sympath

    Symbol search path is: SRV*c:\websymbols*http://msdl.microsoft.com/download/symbols


    4.            To ensure that I didn't waste time reviewing the wrong data, I performed a quick check to ensure that we collected a dump of the requested snap-in.


    0:005> !peb

    PEB at 000007fffffdb000


       CommandLine:  '"C:\Windows\system32\mmc.exe" "C:\Windows\system32\eventvwr.msc" '


    You could alternatively dump the  CommandLine from the nt!_PEB using the dt command


    0:005> dt nt!_PEB ProcessParameters->CommandLine 000007fffffdb000


       +0x020 ProcessParameters              :

          +0x070 CommandLine                    : _UNICODE_STRING ""C:\Windows\system32\mmc.exe" "C:\Windows\system32\eventvwr.msc" "


    5.            Next, I then dumped out all of the threads in this process and found the following thread contained a stack that was raising a .Net Framework exception


    0:011> ~* kL

    ... (ommitted the non-relevent threads)


    # 11  Id: ff8.a2c Suspend: 1 Teb: 7ffd3000 Unfrozen

    ChildEBP RetAddr 

    0691f03c 7343a91c kernel32!RaiseException+0x58

    0691f09c 7343d81a mscorwks!RaiseTheExceptionInternalOnly+0x2a8

    *** WARNING: Unable to verify checksum for MMCEx.ni.dll

    0691f140 6bfe0b5a mscorwks!JIT_Rethrow+0xbf

    *** WARNING: Unable to verify checksum for mscorlib.ni.dll

    0691f1e8 69926cf6 MMCEx_ni+0xd0b5a

    0691f1f4 6993019f mscorlib_ni+0x216cf6

    0691f208 69926c74 mscorlib_ni+0x22019f

    0691f220 733d1b4c mscorlib_ni+0x216c74

    0691f230 733e21b1 mscorwks!CallDescrWorker+0x33

    0691f2b0 733f6501 mscorwks!CallDescrWorkerWithHandler+0xa3

    0691f3e8 733f6534 mscorwks!MethodDesc::CallDescr+0x19c

    0691f404 733f6552 mscorwks!MethodDesc::CallTargetWorker+0x1f

    0691f41c 7349d803 mscorwks!MethodDescCallSite::CallWithValueTypes+0x1a

    0691f604 733f845f mscorwks!ThreadNative::KickOffThread_Worker+0x192

    0691f618 733f83fb mscorwks!Thread::DoADCallBack+0x32a

    0691f6ac 733f8321 mscorwks!Thread::ShouldChangeAbortToUnload+0xe3

    0691f6e8 733f84ad mscorwks!Thread::ShouldChangeAbortToUnload+0x30a

    0691f710 7349d5d4 mscorwks!Thread::ShouldChangeAbortToUnload+0x33e


    6.            Out of curiosity, I also ran the Get Last Error command


    0:011> !gle

    LastErrorValue: (Win32) 0 (0) - The operation completed successfully.

    LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0


    7.            After this, I ran analyze -v to see what helpful information the debugger would provide. The debugger did output exception information but informed me that I needed to use the x86 debugger instead.


    0:011> !analyze -v


                            EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)

                            ExceptionAddress: 771a42eb (kernel32!RaiseException+0x00000058)

                              ExceptionCode: e0434f4d (CLR exception)

                              ExceptionFlags: 00000001

                            NumberParameters: 1

                            Parameter[0]: 80131604


                                                    Managed code needs matching platform of sos.dll for proper analysis. Use 'x86' debugger.

                            FAULTING_THREAD:  00000a2c

                                                    PRIMARY_PROBLEM_CLASS:  CLR_EXCEPTION


    8.            I fired up the x86 debugger and loaded the appropriate version of the SOS .Net Framework debugger extension.  This extension ships in the Operating System along with the .Net Framework.  On most occasions, I would have initiated the loading of the extension through the use of the following syntax:


     0:011> .load C:\Windows\Microsoft.NET\Framework\v2.0.50727\sos.dll




    0:011> .load  c:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos.dll


    However, once you realize that managed debugging will be necessary and that you need the services of the SOS extension, it’s best to use the .loadby command rather than .load.  This is due to the fact that the version of SOS must match the version of the CLR loaded into that process.  Here’s the recommended syntax: 


                            0:011>.loadby sos mscorwks


    I always verify that my extensions are loaded properly by using the  .chain command.


    0:011> .chain

    ...   Extension DLL chain:

                            C:\Windows\Microsoft.NET\Framework\v2.0.50727\sos.dll: image 2.0.50727.1434, API 1.0.0, built Wed Dec 05 22:42:38 2007


    9.            Running !help printed out the following helpful information about the SOS extension since sos.dll was at the top of the .chain output:


    0:011> !help


    SOS is a debugger extension DLL designed to aid in the debugging of managed

    programs. Functions are listed by category, then roughly in order of

    importance. Shortcut names for popular functions are listed in parenthesis.

    Type "!help <functionname>" for detailed info on that function.


    Object Inspection                  Examining code and stacks

    -----------------------------      -----------------------------

    DumpObj (do)                       Threads

    DumpArray (da)                     CLRStack

    DumpStackObjects (dso)             IP2MD

    DumpHeap                           U

    DumpVC                             DumpStack

    GCRoot                             EEStack

    ObjSize                            GCInfo

    FinalizeQueue                      EHInfo

    PrintException (pe)                COMState

    TraverseHeap                       BPMD


    10.            Using the exception address, displayed by the debugger when opening the dump, and the !pe command listed above, I obtained more information about the exception:


    0:011> !pe 771a42eb

    Invalid object

    There are nested exceptions on this thread. Run with -nested for details


    0:011> !pe -nested 771a42eb

    Invalid object


    Nested exception -------------------------------------------------------------

    Exception object: 040a676c

    Exception type: System.Reflection.TargetInvocationException

    Message: Exception has been thrown by the target of an invocation.

    InnerException: System.Reflection.TargetInvocationException, use !PrintException 040a6a20 to see more

    StackTrace (generated):

        SP       IP       Function


    StackTraceString: <none>

    HResult: 80131604


    0:011> !PrintException 040a6a20

    Exception object: 040a6a20

    Exception type: System.Reflection.TargetInvocationException

    Message: Exception has been thrown by the target of an invocation.

    InnerException: System.Configuration.ConfigurationErrorsException, use !PrintException 040a6cf8 to see more

    StackTrace (generated):


    StackTraceString: <none>

    HResult: 80131604

    There are nested exceptions on this thread. Run with -nested for details


    0:011> !PrintException 040a6cf8

    Exception object: 040a6cf8

    Exception type: System.Configuration.ConfigurationErrorsException

    Message: Configuration system failed to initialize

    InnerException: System.Configuration.ConfigurationErrorsException, use !PrintException 040a7174 to see more

    StackTrace (generated):


    StackTraceString: <none>

    HResult: 80131902

    There are nested exceptions on this thread. Run with -nested for details


    0:011> !PrintException 040a7174

    Exception object: 040a7174

    Exception type: System.Configuration.ConfigurationErrorsException

    Message: Unrecognized configuration section system.web/myInvalidData  

    InnerException: <none>

    StackTrace (generated):


    StackTraceString: <none>

    HResult: 80131902

    There are nested exceptions on this thread. Run with -nested for details


    11.            Based on the exception information listed above, it appeared that a .Net Framework configuration section, system.web, contained an invalid configuration section named myInvalidData  inside of it. I then re-ran !analyze -v against the dump again (now that I had loaded the x86 debugger) and found that !analyze -v will load the sos.dll extension and even run the !pe extension automatically. It then automatically displayed the exception record information for me as well. Also, notice that the thread listed by !analyze -v matches the thread I examined earlier.


                0:011> !analyze -v


                            EXCEPTION_MESSAGE:  Unrecognized configuration section system.web/myInvalidData.

                                                    MANAGED_OBJECT_NAME:  System.Configuration.ConfigurationErrorsException

                                                    FAULTING_THREAD:  00000a2c


    0:011> ~

       0  Id: ff8.c84 Suspend: 1 Teb: 7ffdf000 Unfrozen

       1  Id: ff8.96c Suspend: 1 Teb: 7ffde000 Unfrozen

       2  Id: ff8.d10 Suspend: 1 Teb: 7ffdd000 Unfrozen

       3  Id: ff8.d94 Suspend: 1 Teb: 7ffdc000 Unfrozen

       4  Id: ff8.a14 Suspend: 1 Teb: 7ffda000 Unfrozen

       5  Id: ff8.fbc Suspend: 1 Teb: 7ffd9000 Unfrozen

       6  Id: ff8.f88 Suspend: 1 Teb: 7ffd8000 Unfrozen

       7  Id: ff8.a64 Suspend: 1 Teb: 7ffd6000 Unfrozen

       8  Id: ff8.bf8 Suspend: 1 Teb: 7ffd5000 Unfrozen

       9  Id: ff8.d24 Suspend: 1 Teb: 7ffd4000 Unfrozen

      10  Id: ff8.ff0 Suspend: 1 Teb: 7ffd7000 Unfrozen

    . 11  Id: ff8.a2c Suspend: 1 Teb: 7ffd3000 Unfrozen


    12. At this point I was interested in identifying the source of this unrecognized configuration. Instead of engaging our .Net support team, I started with a quick search using www.live.com for     


                "unrecognized configuration section" system.web site:microsoft.com


    This returned the following results http://search.live.com/results.aspx?q=%22unrecognized+configuration+section%22+system.web+site%3Amicrosoft.com&form=QBRE


    By quickly reviewing some of the hits returned, I found that others had encountered this exception in their own applications.  This is due to invalid entries in the various .config  files used in .Net. Looking through the posts, different configuration file names and paths were observed.


    So, I opened up the process monitor logfile to see which configuration files we were reading data from. I added filter criterion to match entries from the mmc.exe process, the TID from the FAULTING_THREAD listed in the exception data, path data containing .config,  and a successful status result. It's best to be as specific as possible.



    I found that we were reading in a large amount of settings over and over again from the .net Framework global configuration file:             


                    (on x64 this would be     C:\Windows\Microsoft.NET\Framework64\v2.0.50727\CONFIG\machine.config)


    Final Step- Putting it all together, Reproducing the issue, & confirming resolution : Using notepad, a quick search of the suspect xml file (C:\Windows\Microsoft.NET\Framework64\v2.0.50727\CONFIG\machine.config) on my system revealed a <system.web> section. At this point, I suspected that this section contained an invalid section which may have been related to the problem. To verify this, and since I like to break things, I added an invalid configuration setting <myInvalidData/> to my global configuration file. Doing so, I successfully reproduced the issue on my system. I then contacted the customer and asked if they had by any chance added any settings under the <system.web> in the configuration file: c:\Windows\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config.


    The customer informed me that, per the request of their ASP.net developer, they had in fact added settings to that section of the file.  By researching http://msdn.microsoft.com/en-us/library/system.web.aspx and the schema documentation at http://msdn.microsoft.com/en-us/library/dayb112d.aspx, we were able to determine that the settings that were present in this file should not have been present inside of <system.web> . The settings were moved to the proper location per the developer and the issue was resolved.


    Here are the steps I used to reproduce the issue in case you are attempting to replicate this at home-


    A. Using notepad, open the following configuration file on a non-production Windows Server 2008 SP1 system:

         (please make a backup copy first in case you make a mistake)  


         OR   (Open the version that matches the architecture of your platform )




    B. Find the section <system.web> in this file (you can use the find function in notepad):


    C.  Add the following line directly after <system.web> as shown in the example below:



    D. Save the file and then open eventvwr.msc and verify that the following error is displayed:



    Hopefully this blog has demonstrated an example of how you can use the "create dump file" feature of Windows 2008, windbg, and other related tools in an attempt to gain more specific data when your error message is not revealing the source of the problem. Feel free to post any questions, comments, or concerns.


  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000005 (Better late than never)


    Hello NTDebuggers, from time to time we see the following problem.  It’s another access violation, and the debug notes below are from a minidump.   


    Here is what we need to know…


    ·         Generally speaking what happened to cause this AV?

    ·         What method you would use to isolate root cause of the failure?


    There are a lot of ways to do this.  We look forward to hearing your approach.


    We will post our methods and answer at the end of the week.   If you need anything please let us know.




    Microsoft (R) Windows Debugger  Version 6.8.0001.0

    Copyright (c) Microsoft Corporation. All rights reserved.



    Loading Dump File [D:\test123.dmp]

    User Mini Dump File: Only registers, stack and portions of memory are available


    0:000> k 123

    ChildEBP RetAddr 

    0017f93c 75e4edb5 ntdll!ZwWaitForMultipleObjects+0x15

    0017f9d8 75e430c3 kernel32!WaitForMultipleObjectsEx+0x11d

    0017f9f4 75ef2084 kernel32!WaitForMultipleObjects+0x18

    0017fa60 75ef22b1 kernel32!WerpReportFaultInternal+0x16c

    0017fa74 75ebbe60 kernel32!WerpReportFault+0x70

    0017fb00 7732d15a kernel32!UnhandledExceptionFilter+0x1c1

    0017fb08 773000c4 ntdll!_RtlUserThreadStart+0x6f

    0017fb1c 77361d05 ntdll!_EH4_CallFilterFunc+0x12

    0017fb44 772eb6d1 ntdll!_except_handler4+0x8e

    0017fb68 772eb6a3 ntdll!ExecuteHandler2+0x26

    0017fc10 772cee57 ntdll!ExecuteHandler+0x24

    0017fc10 10011127 ntdll!KiUserExceptionDispatcher+0xf

    *** ERROR: Module load completed but symbols could not be loaded for crash3.exe

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    0017ff40 0040104a 0x10011127

    0017ffa0 75eb19f1 crash3+0x104a

    0017ffac 7732d109 kernel32!BaseThreadInitThunk+0xe

    0017ffec 00000000 ntdll!_RtlUserThreadStart+0x23


    0:000> lm

    start    end        module name

    00400000 0040d000   crash3     (no symbols)          

    6c250000 6c288000   odbcint    (deferred)            

    6c290000 6c2f5000   odbc32     (deferred)            

    72a00000 72a86000   comctl32   (deferred)            

    74820000 749b4000   comctl32_74820000   (deferred)            

    75240000 75251000   samlib     (deferred)            

    75260000 75281000   ntmarta    (deferred)            

    754b0000 75510000   secur32    (deferred)            

    75510000 75570000   imm32      (deferred)            

    75700000 75790000   gdi32      (deferred)            

    757a0000 75870000   user32     (deferred)            

    758a0000 758a6000   nsi        (deferred)            

    758b0000 759f4000   ole32      (deferred)            

    75a00000 75aaa000   msvcrt     (deferred)            

    75ab0000 75ba0000   rpcrt4     (deferred)            

    75ba0000 75c1d000   usp10      (deferred)            

    75c20000 75c75000   shlwapi    (deferred)            

    75d60000 75e27000   msctf      (deferred)            

    75e30000 75f40000   kernel32   (pdb symbols)  

    76140000 76189000   Wldap32    (deferred)            

    76190000 7624f000   advapi32   (deferred)            

    76250000 76d1e000   shell32    (deferred)             

    76d20000 76d94000   comdlg32   (deferred)            

    76da0000 76dcd000   ws2_32     (deferred)            

    77280000 77287000   psapi      (deferred)            

    77290000 77299000   lpk        (deferred)            

    772b0000 77400000   ntdll      (pdb symbols)     


    Good luck and happy debugging.





    [Update: our answer. Posted 5/13/2008]


    We enjoyed seeing different people’s approaches on this week’s puzzler.   This was a simple module unload.  We loaded a lib, did a GetProcAddress, freed the lib, and called the function.  The dump was a mini dump created via .dump /m C:\dump file.   There are various ways this type of scenario may arise.   Obviously someone could unload a lib, but why?    In most cases I’ve seen, it was due to a ref count problem in a com object.  Poor accounting leading to one too many decrements, and the dll will get unloaded causing a simple crash footprint.


    There are quite a few ways to track this down.  First of all, if you had the debugger attached and got a full dump or /ma dump you would have seen the loaded module list.  This would have been a dead giveaway and part of why we did the .dump /m.  There are other options you can enable that make tracking of module loads easy under the debugger.  I personally like “loader snaps” if I’m trying to track down module load shenanigans.   To enable this, just go into the image section of the gflags tool and enable loader snaps for the exe in question.  Now attach a debugger and watch the mode load and GetProcAddress details scroll by.


    Yet another popular approach is to use process monitor.  This tool is not only easy to set up, but it also gives you great logs with call stacks and other details such as registry accesses.  


    This puzzler provided the bare minumum data required.  We did not give you much to go on because sometimes in real debugging scenarios you have to work with a lack of data.   I really liked how many people questioned the source of the dump file.  It really shows how familiar you all are with the various dump types.


    Great work!


  • Ntdebugging Blog

    Understanding ARM Assembly Part 1


    My name is Marion Cole, and I am a Sr. EE in Microsoft Platforms Serviceability group.  You may be wondering why Microsoft support would need to know ARM assembly.  Doesn’t Windows only run on x86 and x64 machines?  No.  Windows has ran on a variety of processors in the past.  Those include i860, Alpha, MIPS, Fairchild Clipper, PowerPC, Itanium, SPARC, 286, 386, IA-32, x86, x64, and the newest one is ARM.  Most of these processors are antiquated now.  The common ones now are IA-32, x86, x64.  However Windows has started supporting ARM processors in order to jump into the portable devices arena.  You will find them in the Microsoft Surface RT, Windows Phones, and other things in the future I am sure.  So you may be saying that these devices are locked, and cannot be debugged.  That is true from a live debug perspective, but you can get memory dumps and application dumps from them and those can be debugged.




    There are limitations on ARM processors that Windows supports.  There are 3 System on Chip (SOC) vendors that are supported.  nVidia, Texas-Instruments, and Qualcomm. Windows only supports the ARMv7 (Cortex, Scorpion) architecture in ARMv7-A in (Application Profile) mode.  This implements a traditional ARM architecture with multiple modes and supporting a Virtual Memory System Architecture (VMSA) based on an MMU.  It supports the ARM and Thumb-2 instruction sets which allows for a mixture of 16 (Thumb) and 32 (ARM) bit opcodes.  So it will look strange in the assembly.  Luckily the debuggers know this and handle it for you.  This also helps to shrink the size of the assembly code in memory.  The processor also has to have the Optional ISA extensions of VFP (Hardware Floating Point) and NEON (128-bit SIMD Architecture).


    In order to understand the assembly that you will see you need to understand the processor internals.


    ARM is a Reduced Instruction Set Computer (RISC) much like some of the previous processors that Windows ran on.  It is a 32 bit load/store style processor.  It has a “Weakly-ordered” memory model: similar to Alpha and IA64, and it requires specific memory barriers to enforce ordering.  In ARM devices these as ISB, DSB, and DMB instructions.




    The processor has 16 available registers r0 – r15.

    0: kd> r

    r0=00000001  r1=00000000  r2=00000000  r3=00000000  r4=e1820044  r5=e17d0580

    r6=00000001  r7=e17f89b9  r8=00000002  r9=00000000 r10=1afc38ec r11=e1263b78

    r12=e127813c  sp=e1263b20  lr=e16c12c3  pc=e178b6d0 psr=00000173 ----- Thumb


    r0, r1, r2, r3, and r12 are volatile registers.  Volatile registers are scratch registers presumed by the caller to be destroyed across a call.  Nonvolatile registers are required to retain their values across a function call and must be saved by the callee if used. 


    On Windows four of these registers have a designated purpose.  Those are:

    • PC (r15) – Program Counter (EIP on x86)
    • LR (r14) – Link Register.  Used as a return address to the caller.
    • SP (r13) – Stack Pointer (ESP on x86).
    • R11 – Frame Pointer (EBP on x86).
    • CPSR – Current Program Status Register (Flags on x86).


    In Windbg all but r11 will be labeled appropriately for you.  So you may be asking why r11 is not labeled “fp” in the debugger.  That is because r11 is only used as a frame pointer when you are calling a non-leaf subroutine.  The way it works is this: when a call to a non-leaf subroutine is made, the called subroutine pushes the value of the previous frame pointer (in r11) to the stack (right after the lr) and then r11 is set to point to this location in the stack, so eventually we end up with a linked list of frame pointers in the stack that easily enables the construction of the call stack. The frame pointer is not pushed to the stack in leaf functions.  Will discuss leaf functions later.


    CPSR (Current Program Status Register)


    Now we need to understand some about the CPSR register.  Here is the bit breakdown:


















































    • Bits [31:28] – Condition Code Flags
      • N – bit 31 – If this bit is set, the result was negative.  If bit is cleared the result was positive or zero.
      • Z – bit 30 – If set this bit indicates the result was zero or values compared were equal.  If it is cleared, the value is non-zero or the compared values are not equal.
      • C – bit 29 – If this bit is set the instruction resulted in a carry condition.  E.g. Adding two unsigned values resulted in a value too large to be strored.
      • V – bit 28 – If this bit is set then the instruction resulted in an overflow condition.  E.g. An overflow of adding two signed values.
    • Instructions variants ending with ‘s’ set the condition codes (mov/movs)
    • E – bit 9 – Endianness (big = 1/Little = 0)
    • T – bit 5 – Set if executing Thumb instructions
    • M – bits [4:0] – CPU Mode (User 10000/Supervisor 10011)


    So why do I need to know about the CPSR (Current Program Status Register)?  You will need to know where some of these bits are due to how some of the assembly instruction affect these flags.  Example of this is:


    ADD will add two registers together, or add an immediate value to a register.  However it will not affect the flags.


    ADDS will do the same as ADD, but it does affect the flags.


    MOV will allow you to move a value into a register, and a value between registers.  This is not like the x86/x64.  MOV will not let you read or write to memory.  This does not affect the flags.


    MOVS does the same thing as MOV, but it does affect the flags.


    I hope you are seeing a trend here.  There are instructions that will look the same.  However if they end in “S” then you need to know that this will affect the flags.  I am not going to list all of those assembly instructions here.  Those are already listed in the ARM Architecture Reference Manual ARMv7-A and ARMv7-R edition at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b/index.html.


    So now we have an idea of what can set the flags.  Now we need to understand what the flags are used for.  They are mainly used for branching instructions.  Here is an example:

    003a11d2 429a     cmp         r2,r3

    003a11d4 d104     bne         |MyApp!FirstFunc+0x28 (003a11e0)|


    The first instruction in this code (cmp) compares the value stored in register r2 to the value stored in register r3. This comparison instruction sets or resets the Z flag in the CPSR register. The second instruction is a branch instruction (b) with the condition code ne which means that if the result of the previous comparison was that the values are not equal (the CPSR flag Z is zero) then branch to the address MyApp!FirstFunc+0x28 (003a11e0). Otherwise the execution continues.


    There are a few compare instructions.  “cmp” subtracts two register values, sets the flags, and discards the result.  “cmn” adds two register values, sets the flags, and discards the results.  “tst” does a bit wise AND of two register values, sets the flags, and discards the results.  There is even an If Then (it) instruction.  I am not going to discuss that one here as I have never seen it in any of the Windows code.


    So is “bne” the only branch instruction?  No.  There is a lot of them.  Here is a table of things that can be seen beside “b”, and what they check the CPSR register:



    Meaning (Integer)

    Condition Flags (in CPSR)





    Not Equal



    Negative (Minus)



    Positive or Zero (Plus)



    Unsigned higher

    C==1 and Z==0


    Unsigned lower or same

    C==0 or Z==1


    Signed greater than or equal



    Signed less than



    Signed greater than

    Z==0 and N==V


    Signed less than or equal

    Z==1 or N!=V





    No overflow



    Carry set



    Carry clear


    None (AL)

    Execute always



    Floating Point Registers


    As mentioned earlier the processor also has to have the ISA extensions of VFP (Hardware Floating Point) and NEON (128-bit SIMD Architecture).  Here is what they are.

    Floating Point


    As you can see this is 16 – 64bit regiters (d0-d15) that is overlaid with 32 – 32bit registers (s0-s31).  There are varieties of the ARM processor that has 32 – 64bit registers and 64 – 32bit registers.  Windows 8 will support both 16 and 32 register variants.  You have to be careful when using these, because if you access unaligned floats you may cause an exception.




    As you can see here the SIMD (NEON) extension adds 16 – 128 bit registers (q0-q15) onto the floating point registers.  So if you reference Q0 it is the same as referencing D0-D1 or S0-S1-S2-S3.


    In part 2 we will discuss how Windows utilizes this processor.

  • Ntdebugging Blog

    Storport ETW Logging to Measure Requests Made to a Disk Unit


    Greetings NTDEBUGGING community, Bob here again, and today I would like to let everyone know about a new feature implemented in the STORPORT.SYS binaries released in the following KB articles:

    ·         KB 979764 (Windows Server 2008)

    ·         KB 978000 (Windows Server 2008 R2)


    Beginning with these new versions, it is now possible to measure timing statistics for requests made to a system’s disk unit.  These measurements are taken at the lowest possible level of OS interaction with the storage adapter hardware, making it much easier to diagnose storage performance issues. The measurements are taken from the port driver, STORPORT.SYS, via ETW (Event Tracing for Windows).  Windows drivers are implemented in a layered architecture, so in our case the STORPORT driver interfaces directly with the adapter or miniport driver.  The STORPORT driver is responsible for managing requests and queues, and providing other upper level functions.  Each miniport driver is specific to a particular storage adapter card, and knows how to deliver requests to the disk unit over the transport medium, like fibre channel for instance.

    Configuring and Starting STORPORT ETW Logging

    To measure storage performance, the user enters a threshold value via the GUI interface. (This can be scripted as well; see KB 978000 for details.)  When an I/O time is equal to or greater than the entered threshold, an event 13 is written to the ETW log.  

    Here is a walkthrough to enable this feature via the GUI:

    Click Start, type “perfmon”, and press ENTER to start Performance Monitor.



    Expand “Data Collector Sets” and select “Event Trace Sessions”.



    Right-Click on “Event Trace Sessions” and select “New -> Data Collector Set”. 



    The following dialog will appear:



    Give the new data collector set a name in the dialog box.  In this example I called it “Storport Performance Logging”. Choose the “Create manually (Advanced)” option then click Next to see the following dialog.



    After clicking “Add...” on the dialog above the following list of providers will appear. Select “Microsoft-Windows-StorPort” and click OK.



    When the dialog below opens, select “Keywords (Any)” then click Edit.



    Check the box for IOPerfNotification, and then click OK. Note:  For Windows Server 2008 SP1 and SP2, some properties may not be present (e. g. Reset, PortNotification and PauseResume). 




    Next, select “Filter” and click Edit…



    This is where we enter our time threshold.  Remember that timing values greater than or equal to this filter value will be logged to the trace file.


    Select “Binary” and check the “Filter enabled” box


    In the “Filter data” field, enter the threshold value in the number of milliseconds. This must be entered in little endian format. Please refer to this table for the most common entries.



    Binary(little endian)









    01 00 00 00 00 00 00 00

    05 00 00 00 00 00 00 00

    0A 00 00 00 00 00 00 00

    0F 00 00 00 00 00 00 00


    Note: The “Filter type” value will always remain 0 as in the example above showing an entry for 1 ms. 


    WARNING - The whole data line needs to be filled in when entering the threshold. For demonstration purposes here is the WRONG WAY to enter 256 ms. This will not work because of the missing 00 entries on the line.


    Below is the correct way to enter the value (256ms) Note: The Filter value may have to be re-entered for next run because it could get reset.


    After clicking OK, you will see a new Event Trace session (screenshot below).  Right-Click the new Event Trace session and select “Properties”



    Next we need to set a location for the trace file.  Ensure you have sufficient disk space in the target folder because logs can be many Megabytes in size depending on the number of hits. 



    The target folder may be changed in the “Directory” tab



    I recommend choosing a directory that is more easily located, like C:\PerfLogs, in which case you can click “Browse”



    Note:  It is not recommended to store the trace file on the same storage you are troubleshooting.


    The name for the file is selected on the “File” tab.


    When you are done setting the file and directory information click OK.

    Now right-click the log and select “Start” to begin tracing.


    The “Status” should indicate that it is running.


    How long to run the log depends on you issue.  You should let the log run until the issue or slow down occur then check the log for delays.

    Viewing the Log File

    First right-click and select “Stop” to flush any cached data and stop the logging.


    Then use Event Viewer to examine the log.



    Next choose "Open the saved log…"






    After clicking OK a popup dialog will be displayed to ask if you want to convert the log. Select “No”. Here is some sample event data that you will likely see in the log.



    Example: Event ID 13



    Let's examine the “Details” tab which shows information about the I/O that exceeded or equaled the threshold we had set.  If you do not set a threshold, then all I/O events are logged, but for performance reasons we recommend setting a "reasonable" threshold like 15ms.


    Breaking down the Event Data we have …


    Miniport Extension: (Kernel address of the miniport extension, not of use to performance logging, only debugging)

    PortNumber:  This is the adapter port number (i.e. RaidPort1, etc.)

    PathID:  Path number if multi-pathing is in place

    TargetID:  Target LUN exposed to the Operating System

    LUN:  The Logical Unit Number of the physical storage where the particular I/O is directed

    *RequestDuration:  How long the request took in milliseconds (the number of note here for storage latency diagnostics)

    Command:  the SCSI command sent to the unit. (Usually read or write)  This is listed in decimal.  Convert this value to hexadecimal with Calc.exe for referencing SCSI commands http://en.wikipedia.org/wiki/SCSI_command

    ScsiStatus:  the status that came back from the drive

    SRBStatus:  the status as returned from adapter  (see srb.h and scsi.h in the Microsoft WDK or http://en.wikipedia.org/wiki/SCSI_Status_Code)


     Example: Command 42 equals 2A which is a Write of 10 bytes.



    Here’s the respective SCSI command:


    When viewing the log, the commands will be mostly reads and writes.  If I/O is determined to be slow at this level, then the issue is most likely disk or SAN related.

    Hope you enjoyed the walkthrough!









  • Ntdebugging Blog

    Would you be interested in a live chat session on debugging techniques?


    We’ve been thinking about hosting a live group chat session to talk with the debugging community.  If we had such a chat, the discussion would focus on debugging techniques and any questions you may have about anything we’ve previously blogged about.  If you’d be interested in participating in a chat session, please answer the survey question below.  Also, feel free to leave a comment on this page if there’s a particular debugging topic you would like for us to cover in the chat.


    Thanks everyone for your feedback! This survey is now closed.

  • Ntdebugging Blog

    Tracking Down a Multi-Process Deadlock



    Hello, my name is Louis Shanks and my story starts off with a tricky little deadlock issue that I worked on recently. Let me set the stage by explaining the surface level symptom, and then we will get to the good stuff. Randomly when users running Vista would attempt to launch Internet Explorer nothing would happen. So they would again double-click the icon, and still nothing happened. So the users would reboot and go back to their web activities.


    After some basic trouble shooting it was discovered that Iexplore.exe was running multiple times and every attempt to open the browser would result in another Iexplore.exe instance.  Since no UI was ever presented I went ahead and collected a few Iexplore.exe process dumps along with a complete system dump.


     The process dumps didn’t prove to be terribly useful so I decided to move over to the complete memory dump.


    Since I knew we had multiple IExplore.exe processes running I wanted to investigate them first.


    0: kd>!process 0 0



    PROCESS 87289d90  SessionId: 5  Cid: 0fdc    Peb: 7ffdc000  ParentCid: 0968

        DirBase: 7d66c520  ObjectTable: a203a178  HandleCount: 205.

        Image: iexplore.exe


    PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

        DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

        Image: ieuser.exe


    PROCESS 86c175b0  SessionId: 5  Cid: 1250    Peb: 7ffd9000  ParentCid: 0968

        DirBase: 7d66c940  ObjectTable: a0853f10  HandleCount:  91.

        Image: iexplore.exe

    PROCESS 861ac7a8  SessionId: 5  Cid: 1024    Peb: 7ffde000  ParentCid: 0968

        DirBase: 7d66c620  ObjectTable: a83f7898  HandleCount:  91.

    Image: iexplore.exe

    I decided to just start with the first one in the list (87289d90)


    0: kd> !process 87289d90
        Image: iexplore.exe

        ElapsedTime                       00:24:06.666


    It appears this process has been around a while, so I wanted to look at the first thread in this process as it is responsible for the UI in Internet Explorer.  I am looking at this thread because the surface level problem illustrates that when we launch Iexplore.exe the UI is never displayed. Here I used the !thread command to display the specific thread.


            THREAD 870b6398  Cid 0fdc.16a4  Teb: 7ffdf000 Win32Thread: fd850b78 WAIT: (UserRequest) UserMode Non-Alertable

                85e86cf0  SynchronizationEvent

                85cb35e0  SynchronizationEvent

            Not impersonating

            Attached Process          87289d90       Image:         iexplore.exe

            Wait Start TickCount      10817197       Ticks: 69571 (0:00:18:05.314)

            UserTime                  00:00:00.015

            KernelTime                00:00:00.093  


    Based on the tick count this thread has been waiting for 18 minutes.


    0: kd> kvn

    # ChildEBP RetAddr  Args to Child             

    00 931208d8 81a940a1 870b6398 803d2120 870b6420 nt!KiSwapContext+0x26

    01 9312091c 81a8dacc 870b6398 00000000 00000002 nt!KiSwapThread+0x44f

    02 93120970 81bedb4f 00000002 93120aa8 00000001 nt!KeWaitForMultipleObjects+0x53d

    03 93120bfc 81bee2e3 00000002 00000001 00000000 nt!ObpWaitForMultipleObjects+0x256

    04 93120d48 81a30b3a 00000002 0027e9c4 00000001 nt!NtWaitForMultipleObjects+0xcc

    05 93120d48 77568254 00000002 0027e9c4 00000001 nt!KiFastCallEntry+0x12a

    06 0027e970 775679fc 7708c973 00000002 0027e9c4 ntdll!KiFastSystemCallRet

    07 0027e974 7708c973 00000002 0027e9c4 00000001 ntdll!NtWaitForMultipleObjects+0xc

    08 0027ea10 7608ff95 0027e9c4 0027ea38 00000000 kernel32!WaitForMultipleObjectsEx+0x11d

    09 0027ea64 773d3a46 00000028 0027eaac ffffffff USER32!RealMsgWaitForMultipleObjectsEx+0x13c

    0a 0027ea8c 773d3b7f 0027eaac ffffffff 0027eabc ole32!CCliModalLoop::BlockFn+0x97

    0b 0027eab4 774e75be ffffffff 003ddf90 0027ebc0 ole32!ModalLoop+0x5b

    0c 0027ead0 774e9294 00000000 0027ebd4 00000000 ole32!ThreadSendReceive+0x12c

    0d 0027eaf8 774e9100 0027ebc0 003d3538 0027ec1c ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x194

    0e 0027ebd8 773d3c25 003d3538 0027ecfc 0027ece4 ole32!CRpcChannelBuffer::SendReceive2+0xef

    0f 0027ebf4 773d3bd7 0027ecfc 0027ece4 003d3538 ole32!CCliModalLoop::SendReceive+0x1e

    10 0027ec6c 77403efb 003d3538 0027ecfc 0027ece4 ole32!CAptRpcChnl::SendReceive+0x73

    11 0027ecc0 76052ff4 003d3538 0027ecfc 0027ece4 ole32!CCtxComChnl::SendReceive+0x1c5

    12 0027ecd8 76053055 003e50a4 0027ede4 76052e0d RPCRT4!NdrProxySendReceive+0x43

    13 0027ece4 76052e0d 09ea0d59 0027f1fc 070001f3 RPCRT4!NdrpProxySendReceive+0xc


    Based on the call stack it appears this thread made an OLE call and is now waiting for some objects to signal.


    What is IExplore doing in this thread?  It is waiting on another process to service this OLE request.   We can find the PID of the other process by digging into the ole32 SendReceive call.


    I looked at the first argument:


    0: kd> dd 003d3538

    003d3538  77403a50 773f57b4 00000003 00000002

    003d3548  00000000 00000000 003d2ef0 003d8840

    003d3558  0038b1c0 003e05f8 77403980 00070005


    This gives us a pointer to another data structure that contains the information we are attempting to locate.

    0: kd> dd 003d2ef0

    003d2ef0  774ee600 003d2e70 00000d34 00000000

    003d2f00  862941fa c94cde09 55a88424 801a2601

    003d2f10  55a88424 801a2601 0000c400 00000d34


    Ah there it is! 0d34 is the process id servicing our OLE request.


    I went back and reviewed the !process 0 0 output to locate the process containing PID 0D34. I’m dumping it here.


    PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

        DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

        Image: ieuser.exe


    Ok, so the process handing this request is IEUSER.exe, which is the broker process used by Internet Explorer in Vista when running in protect mode.  This is new in Vista.


    More on IEUser and Protected mode can be seen at the following location:


    Now I needed to dig into the IEUser.exe process to find out what it was working on causing Iexplore to wait.


    0: kd> !process 861a9d90 

    PROCESS 861a9d90  SessionId: 5  Cid: 0d34    Peb: 7ffd9000  ParentCid: 0fdc

        DirBase: 7d66c7a0  ObjectTable: a5651a98  HandleCount: 170.

        Image: ieuser.exe

    I discovered the thread in the ieuser.exe process working on our call from Iexplore.exe. However this thread is also waiting on access to something before it can complete.

    0: kd> kvn+200

     # ChildEBP RetAddr  Args to Child             

    00 a41b0c50 81a940a1 85ecbac0 85ecbb48 81af9920 nt!KiSwapContext+0x26

    01 a41b0c94 81a2ddc8 85ecbac0 00000000 85e18510 nt!KiSwapThread+0x44f

    02 a41b0ce8 81bf0d9f 85e18510 00000006 00000001 nt!KeWaitForSingleObject+0x492

    03 a41b0d50 81a30b3a 00000224 00000000 00000000 nt!NtWaitForSingleObject+0xbe

    04 a41b0d50 77568254 00000224 00000000 00000000 nt!KiFastCallEntry+0x12a

    05 0163e8a4 77567a0c 7708c7c4 00000224 00000000 ntdll!KiFastSystemCallRet

    06 0163e8a8 7708c7c4 00000224 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

    07 0163e918 7708c732 00000224 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xbe

    08 0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12

    09 0163e948 747e5fe0 0163ec48 00497b38 00000000 RASAPI32!ReadPhonebookFileEx+0x1a4

    0a 0163e968 747e4e9d 0163ec48 00000000 00000000 RASAPI32!ReadPhonebookFile+0x1b

    0b 0163e9a8 747e52d3 0163ec48 0048f660 0163e9e0 RASAPI32!DwEnumEntriesFromPhonebook+0x35

    0c 0163ee58 747e4fd4 0163ee9c 00000001 0048f660 RASAPI32!DwEnumEntriesInDir+0x1cc

    0d 0163f0ac 747e677e 00000001 0048f660 0163f0f4 RASAPI32!DwEnumEntriesForPbkMode+0xab

    0e 0163f0e0 76e534b9 00000000 00004180 0048f660 RASAPI32!RasEnumEntriesW+0xb0

    0f 0163f10c 76e5342a 76ecc2dc 00000000 00000000 WININET!RasEnumHelp::RasEnumHelp+0x55

    10 0163f120 76e533a3 00000001 00000000 00000001 WININET!DoConnectoidsExist+0x2b

    11 0163f14c 76e534ff 76ecf02c 76ecc2dc 00000000 WININET!GetRasConnections+0x34

    12 0163f168 76e4196e 00000000 0163f180 6ee3228c WININET!IsDialUpConnection+0xa9

    13 0163f184 76e70564 00000001 00000000 00000000 WININET!FixProxySettingsForCurrentConnection+0x31

    14 0163f1a8 6ec2db3b 00000000 0163f1bc 0163f3d4 WININET!InternetInitializeAutoProxyDll+0x6c

    15 0163f1b8 75fe9142 004884a0 00000202 0047e900 ieframe!CShdocvwBroker::StartAutoProxyDetection+0x4d

    16 0163f1d0 76054294 6ec2daf8 0163f3d8 00000001 RPCRT4!Invoke+0x2a

    17 0163f5fc 76054a49 004858d0 0046e4e0 0047e900 RPCRT4!NdrStubCall2+0x27b

    18 0163f64c 774e86d7 004858d0 0047e900 0046e4e0 RPCRT4!CStdStubBuffer_Invoke+0xa0

    19 0163f694 774e877d 0047e900 0047f968 00480b38 ole32!SyncStubInvoke+0x3c

    1a 0163f6e0 7740249e 0047e900 0047f860 004858d0 ole32!StubInvoke+0xb9

    1b 0163f7bc 774023af 0046e4e0 00000000 004858d0 ole32!CCtxComChnl::ContextInvoke+0xfa

    1c 0163f7d8 774e85df 0047e900 00000001 004858d0 ole32!MTAInvoke+0x1a

    1d 0163f808 774e8a18 d0908070 0046e4e0 004858d0 ole32!AppInvoke+0xaa

    1e 0163f8e4 774e85ef 0047e8a8 00474118 0047ebc0 ole32!ComInvokeWithLockAndIPID+0x32c

    1f 0163f930 75fe94b0 0047ebc0 08ac70b7 0047ebc0 ole32!ThreadInvoke+0x2fd


    I’m dumping the first parameter passed to KeWaitForSingleObject to determine the object type.


    0: kd> !object 85e18510

    Object: 85e18510  Type: (85484d40) Mutant

        ObjectHeader: 85e184f8 (old version)

        HandleCount: 3  PointerCount: 5

        Directory Object: 881f3030  Name: RasPbFile

    Ok, we are waiting on a Mutex named RasPbFile. The big question in your mind now is, “who owns this lock”, right?

    To determine this I’ll dump the structure for the mutex object:


    0: kd> dt _KMUTANT 85e18510


       +0x000 Header           : _DISPATCHER_HEADER

       +0x010 MutantListEntry  : _LIST_ENTRY [ 0x870b6564 - 0x870b6564 ]

       +0x018 OwnerThread      : 0x870b6398 _KTHREAD

       +0x01c Abandoned        : 0 ''

       +0x01d ApcDisable       : 0 ''


    The owner of the mutex is 870b6398. The bad news is this is the same thread I started with in the Iexplore.exe process so we’re looking at a classic deadlock scenario. Thread 870b6398 running under Iexplore.exe has made an OLE call over to the IEuser.exe broker process. The IEuser process needs the mutex owned by the thread that called it in the first place.

    This turned out to be a bug in the RasApi code, not the browser or IEuser process.  The RasAPI code was written in such a way that it would wait for an infinite period of time to gain the Mutex.


    0163e92c 747e60aa 00000224 ffffffff 00000000 kernel32!WaitForSingleObject+0x12


     The good news is that we have created hotfix 951738 to address this issue.  The official KB article is still pending.

  • Ntdebugging Blog

    This button doesn’t do anything!



    Hello - Matthew here again.  Today I'll be discussing in detail hang scenario #1 that Tate first mentioned a few blogs posts ago.  From a debugging perspective, in an ideal world an application would always provide some kind of feedback when a failure occurs.  The reality is that sometimes an application just doesn't do anything in response to a user-initiated action (such as a mouse-click) that results in a failure of some kind.  When this happens, how can you determine what is going wrong?


    In general, Process Monitor from Sysinternals is a great tool for gathering clues about this kind of failure.  If the failure is due to a file system or registry problem, Process Explorer will catch the problematic event.  Or if you suspect that the problem is due to something failing over the network, a network trace could help.


    Let's assume that the techniques described above do not move us any closer to understanding the failure.  What next?  If you don't know what the app does in the background, and you don't have source code for the app, how do you proceed?  Let's answer those questions by debugging a sample application...


    The Nothing button


    Download the sample application here.



    Here's what we know:

    1. Clicking the “Button 1” button is supposed to display a useful dialog box

    2. For certain users, clicking “Button 1” results in no observable response

    3. You don't have access to the source code or debug symbols for the app

    4. No one seems to know what the “Button 1” button does behind the scenes

    5. The developer of the application hasn't been heard from in 3 years


    We need to understand what exactly happens when the user clicks the “Button 1” button.  Every window has a WindowProc function that receives messages upon user input.  Since a button is considered a "control", the WindowProc for the main application window will get a WM_COMMAND message indicating that the button was clicked.  To differentiate which specific control sent the message, each WM_COMMAND message has a control identifier associated with it.


    So we need to:


    a)     Find the control identifier for “Button 1”

    b)    Find the WindowProc for the main application window

    c)     Determine where WM_COMMAND messages for the “Button 1” button are handled

    d)    Figure out what is failing in that code


    Let’s get started...




    Find the control identifier for “Button 1”      


    Spy++ (available as part of Visual Studio) is the tool for this job.  Click Spy -> Log Messages.  Use the Finder Tool to select the main window of ntdbghang1.exe.  On the messages tab, click “Clear All” then select “WM_COMMAND”.  Click “OK” and SPY++ will start logging.  Click “Button 1” in ntdbghang1.exe and you should see one line of text appear in the output window.  For good measure, click “Button 2” also, and you should see a second line of text.  At this point, your SPY++ window should look something like this....


    Spy Windows Msg


    We can tell from this output, that the control identifier for “Button 1” is 257 (0x101), and the control identifier for “Button 2” is 258 (0x102).  We’ll need this information later.




    Find the WindowProc for the main application window


    We can find the WindowProc address by using SPY++ again.  Click Spy -> Find Window and use the Finder Tool to select the main window of ntdbghang1.exe.  Make sure “Show properties” is selected, and then click “OK”.  You will get a dialog that looks like this....


    Spy Window Properties


    Note the value of Window Proc.  This is the address of the window procedure for the main window (the exact value may differ on your system).   We know that this function is a WindowProc, which means it is of the form...


    LRESULT CALLBACK WindowProc(         

        HWND hwnd,

        UINT uMsg,

        WPARAM wParam,

        LPARAM lParam



    The meaning of wParam and lParam differ depending on what uMsg is being processed.  When a WM_COMMAND message is passed, the low-word of wParam is the control identifier, which we determined to be 0x101.  The high-word of wParam is the control notification code, which in the case of a button being clicked is BN_CLICKED (literally 0).


    So, we specifically are interested in the case where:


                uMsg = WM_COMMAND (literally 0x111)

                wParam = 0x101


    While looking at assembly language for the WindowProc, the stack frame will look like this...


    ebp    =  “old ebp”

    ebp+4  =  “return address”

    ebp+8  =  hwnd

    ebp+c  =  uMsg

    ebp+10 =  wParam

    ebp+14 =  lParam




    Determine where WM_COMMAND messages for the “Button 1” button are handled


    Once you have the address of the WindowProc, we can examine the assembly code for that function using Windbg.  Launch windbg.exe and select “File -> Attach to a Process”.  Select “ntdbghang1.exe” from the list and click “OK”.  When the debugger breaks in, you can unassemble the beginning of the WindowProc by typing “u <address>”.  Based on the address on my system, that command will be “u 01002830”.  To unassemble more just type “u” again.  I’m now going to unassemble the relevant code and explain what it means along the way...



    0:001> u 01002830



    The first 3 instructions are a function prologue, just getting everything set up.

    01002830 8bff            mov     edi,edi

    01002832 55              push    ebp

    01002833 8bec            mov     ebp,esp


    Here is the code that moves the value of uMsg into ecx

    01002835 8b4d0c          mov     ecx,dword ptr [ebp+0Ch]


    Now the app enters the assembly equivalent of a switch statement.

    We are interested in the check for uMsg = WM_COMMAND


    if uMsg = WM_CREATE (1) goto 01002893

    01002838 49              dec     ecx

    01002839 7458            je      ntdbghang1+0x2893 (01002893)

    if uMsg = WM_ DESTROY (2) goto 01002889

    0100283b 49              dec     ecx

    0100283c 744b            je      ntdbghang1+0x2889 (01002889)

    if uMsg = WM_CLOSE (0x10) goto 01002889

    0100283e 83e90e          sub     ecx,0Eh

    01002841 743b            je      ntdbghang1+0x287e (0100287e)

    if uMsg = WM_COMMAND (0x111) goto 01002853

    01002843 b801010000      mov     eax,101h

    01002848 2bc8            sub     ecx,eax

    0100284a 7407            je      ntdbghang1+0x2853 (01002853)


    From the above switch statement we see that when uMsg = WM_COMMAND, execution moves to 01002853.  So let’s go there...


    0:001> u 01002853



    Move the value of wParam into edx

    01002853 8b5510          mov     edx,dword ptr [ebp+10h]

    If LOWORD(wParam) == 0x101 (control ID for button 1) goto 0100286f

    01002856 0fb7ca          movzx   ecx,dx

    01002859 2bc8            sub     ecx,eax

    0100285b 7412            je      ntdbghang1+0x286f (0100286f)


    From the above assembly we can see that right away we are checking for the control ID for button 1.  This is the code path we want to follow....


    0:001> u 0100286f



    If HIWORD(wparam) != BN_CLICKED (0) then goto 0100289b

    Otherwise, call the function at 010027f6

    0100286f c1ea10          shr     edx,10h

    01002872 6685d2          test    dx,dx

    01002875 7524            jne     ntdbghang1+0x289b (0100289b)

    01002877 e87affffff      call    ntdbghang1+0x27f6 (010027f6)


    Now it is clear that when the BN_CLICKED occurs for control id 0x101, we are going to execute the function at address 010027f6. 




    Figure out what is failing


    So now we know what code is going to run every time the button is clicked (the function at 010027f6).  Let’s examine that code and figure out what may be failing.  We’ll use “uf” to unassemble the entire function, as it is rather short.  During this first pass we’ll make up names for our local variables and functions that are called, and then we’ll revisit and figure out what they are.



    0:001> uf 010027f6



    Function Prologue

    010027f6 8bff            mov     edi,edi

    010027f8 55              push    ebp

    010027f9 8bec            mov     ebp,esp


    010027fb 51              push    ecx

    010027fc 56              push    esi


    call func1(0x20) [func1 address at 01002dca]

    set localvar1=0x10

    010027fd 6a20            push    20h

    010027ff c745fc10000000  mov     dword ptr [ebp-4],10h

    01002806 e8bf050000      call    ntdbghang1+0x2dca (01002dca)


    Store the result of func1 in esi (call it localvar2)

    0100280b 8bf0            mov     esi,eax


    Call func2(localvar2, &localvar1) [func2 address at 0100b484]

    0100280d 59              pop     ecx

    0100280e 8d45fc          lea     eax,[ebp-4]

    01002811 50              push    eax

    01002812 56              push    esi

    01002813 e86c8c0000      call    ntdbghang1+0xb484 (0100b484)


    If func2 returns 0, then goto 01002821

    01002818 85c0            test    eax,eax

    0100281a 7405            je      ntdbghang1+0x2821 (01002821)


    Call func3() [func3 address at 0100278d]


    0100281c e86cffffff      call    ntdbghang1+0x278d (0100278d)


    Call func4(localvar2) [func4 address at 01002ce3]


    01002821 56              push    esi

    01002822 e8bc040000      call    ntdbghang1+0x2ce3 (01002ce3)


    Clean up and exit the function

    01002827 59              pop     ecx

    01002828 5e              pop     esi

    01002829 c9              leave

    0100282a c3              ret



    So let’s translate this to some pseudo-code...


    localvar1 = 0x10;

    localvar2 = func1(0x20);


    if(func2(localvar2, &localvar1)






    So the behavior of the function will differ, depending on the results of func2.  Let’s see if we can figure out what func2 does...


    0:001> u 0100b484


    0100b484 ff2568110001    jmp     dword ptr [ntdbghang1+0x1168 (01001168)]

    0100b48a cc              int     3

    0100b48b cc              int     3

    0100b48c cc              int     3


    0:001> dps 01001168 L1

    01001168  70b88cb1 WINSPOOL!GetDefaultPrinterW



    GetDefaultPrinterW is a public API function.  Here’s the function prototype from MSDN...


    BOOL GetDefaultPrinter(

      LPTSTR pszBuffer,   // printer name buffer

      LPDWORD pcchBuffer  // size of name buffer



    This fits our pseudo-code, since the function takes two parameters and returns a BOOL.  Let’s update our pseudo code to match what we know, replacing func2 with GetDefaultPrinter, etc...


    DWORD cchBuffer = 0x10;

    LPWSTR pszBuffer = func1(0x20);


    if(GetDefaultPrinterW(pszBuffer,  &cchBuffer))







    Based on the above, we can make a couple of assumptions.  Func1 is likely an allocator function of some kind (such as malloc) and func4 is likely a memory free function (such as free). Since the app is using the Unicode version of GetDefaultPrinter (it ends with a W) it make sense that func1 is an allocator asking for 0x20 bytes, and the size of the buffer, in characters, passed to GetDefaultPrinterW is 0x10.  So that accounts for all of the functions, except for func3.  We could unassemble func3 to see what it does, but that may not be needed.  From the pseudo-code we’ve already created, we can see that a failure in GetDefaultPrinterW will prevent the button click function from doing anything except allocating and freeing some memory.  So GetDefaultPrinterW is a likely potential point of failure.


    Let’s test this theory out.  We’ll set a breakpoint on GetDefaultPrinterW and see if it is failing upon the button click.


    0:001> bp WINSPOOL!GetDefaultPrinterW


    0:001> g


    <Now click Button 1>


    Breakpoint 0 hit

    eax=0006fb84 ebx=00000000 ecx=00000020 edx=00dc0e98 esi=00dc0e70 edi=0006fc0c

    eip=70b88cb1 esp=0006fb74 ebp=0006fb88 iopl=0         nv up ei pl nz na po nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202


    70b88cb1 8bff            mov     edi,edi


    // Dump out the call stack...

    0:000> kb

    ChildEBP RetAddr  Args to Child             

    0006fb70 01002818 00dc0e70 0006fb84 00000111 WINSPOOL!GetDefaultPrinterW

    WARNING: Stack unwind information not available. Following frames may be wrong.

    0006fb88 0100287c 0006fbbc 75d41a10 00320f78 ntdbghang1+0x2818

    0006fb90 75d41a10 00320f78 00000111 00000101 ntdbghang1+0x287c

    0006fbbc 75d41ae8 01002830 00320f78 00000111 USER32!InternalCallWinProc+0x23

    0006fc34 75d4286a 00000000 01002830 00320f78 USER32!UserCallWinProcCheckWow+0x14b

    0006fc74 75d42bba 00a90b80 0095ee08 00000101 USER32!SendMessageWorker+0x4b7


    // Go to the return address.  Note that eax=0, meaning GetDefaultPrinterW returned FALSE.

    0:000> gu

    eax=00000000 ebx=00000000 ecx=76f22033 edx=00e10178 esi=00dc0e70 edi=0006fc0c

    eip=01002818 esp=0006fb80 ebp=0006fb88 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246


    01002818 85c0            test    eax,eax


    // Check the last error...

    0:000> !gle

    LastErrorValue: (Win32) 0x7a (122) - The data area passed to a system call is too small.


           // Resume execution

    0:000> g


    The error returned was 0x7a = ERROR_INSUFFICIENT_BUFFER.  It looks like the buffer of 0x20 bytes being passed to GetDefaultPrinterW is too small to hold the name of the default printer for that user.  This explains why the app works for some users and fails for others.  A quick change of the default printer name to something less than 0x10 characters (16 decimal) will work around this issue.


    To wrap things up, here is the C source for the Button1_OnClick function (at address 010027f6 in the assembly) so you can see how this was actually written...


    VOID Button1_OnClick()


          DWORD cch = 16;

          LPTSTR pPrinterName;


          pPrinterName = (LPTSTR) malloc(16 * sizeof(TCHAR));


          if(GetDefaultPrinter(pPrinterName, &cch))










    Look for an upcoming blog post that shows another way this problem could have been approached – using the “wt” command in the debugger.  I hope you found this useful and can apply to real-world debugging scenarios.  Please post any questions / comments!

Page 3 of 24 (240 items) 12345»