Okay, I've previously written some random posts about how to set up windbg and how to troubleshoot OutOfMemoryExceptions. I thought I'd take a few steps back and review some of the basics in order to help you get started on using this fantastic tool.

Basic Configuration

  1. Copy sos.dll from the framework directory to the folder where you installed windbg. Make sure you copy it from the same Framework version as the one you wish to investigate. If you'll be working with both 1.1 and 2.0 you can rename the SOS.dlls to SOS11.dll and SOS20.dll or put them in separate folders.
  2. Create a folder where you want to cache all the symbol files. For example: "C:\Symbols".
  3. Start windbg and open the dialogue to configure the symbol path by clicking File -> Symbol File Path.
  4. Enter the path, as well as the address from which you'll want to download missing symbols using the following syntax:
    srv*[cache path]*[symbols path]
    I'd recommend the following path:
    srv*c:\symbols\public*http://msdl.microsoft.com/download/symbols

You should now be set to go. You're now ready to open up a saved dump, or attach to a process

Useful commands

I'll be using a dump from an IIS6-server to demonstrate some useful commands.

The first thing you'll want to do is load SOS. You'll do this using the .load command. The syntax is simple. .load [extension filename]. So if you want to load sos and haven't renamed the .dll you'd simply write:

.load sos

You'll now have all the cool commands from the SOS-extension at your disposal as well as the default windbg ones. Commands from extensions are always preceded by a "!", so if you want to run the help-command for sos you'd write

!help

If you should happen to have two extensions with an identically named command you can always separate them by typing ![extension name].[command] Example:

!sos.help

Okay, now that we know how to run the commands from the extension, try running !help. It should give you the following result.

0:000> !help
-------------------------------------------------------------------------------
SOS is a debugger extension DLL designed to aid in the debugging of managed
programs. Functions are listed by category, then roughly in order of
importance. Shortcut names for popular functions are listed in parenthesis.
Type "!help " for detailed info on that function.

Object Inspection                  Examining code and stacks
-----------------------------      -----------------------------
DumpObj (do)                       Threads
DumpArray (da)                     CLRStack
DumpStackObjects (dso)             IP2MD
DumpHeap                           U
DumpVC                             DumpStack
GCRoot                             EEStack
ObjSize                            GCInfo
FinalizeQueue                      EHInfo
PrintException (pe)                COMState
TraverseHeap                       BPMD

Examining CLR data structures      Diagnostic Utilities
-----------------------------      -----------------------------
DumpDomain                         VerifyHeap
EEHeap                             DumpLog
Name2EE                            FindAppDomain
SyncBlk                            SaveModule
DumpMT                             GCHandles
DumpClass                          GCHandleLeaks
DumpMD                             VMMap
Token2EE                           VMStat
EEVersion                          ProcInfo
DumpModule                         StopOnException (soe)
ThreadPool                         MinidumpMode
DumpAssembly                      
DumpMethodSig                      Other
DumpRuntimeTypes                   -----------------------------
DumpSig                            FAQ
RCWCleanupList
DumpIL

For more documentation on a specific command, type !help [name of command]

.time

This is not an SOS-command, which is evident by the command not beginning with a "!". Running the .Time command will show you relevant info about the time, as well as system uptime, process uptime and the amount of time spent in kernel & user mode.

0:000> .time
Debug session time: Tue Oct 23 08:38:35.000 2007 (GMT+1)
System Uptime: 4 days 17:48:01.906
Process Uptime: 0 days 0:24:37.000
  Kernel time: 0 days 0:04:23.000
  User time: 0 days 0:03:28.000

As you can see the system has been up for over 4 days. The process has been running for 24½ minutes and has an accumulated CPU-time of about 8 minutes total. This would give us an average CPU-usage for the process of around 32,5%

!threadpool

We can then use the !Threadpool-command to find out exactly what the CPU-usage was at the time the dump was taken. We'll also get some useful information like the number of work requests in the queue, completion port threads and timers.

0:000> !threadpool
CPU utilization 100%
Worker Thread: Total: 5 Running: 4 Idle: 1 MaxLimit: 200 MinLimit: 2
Work Request in Queue: 16
Unknown Function: 6a2d945d  Context: 023ede30
Unknown Function: 6a2d945d  Context: 023ee1e8
AsyncTimerCallbackCompletion TimerInfo@11b53760
Unknown Function: 6a2d945d  Context: 023ee3a8
Unknown Function: 6a2d945d  Context: 023e3040
Unknown Function: 6a2d945d  Context: 023ee178
Unknown Function: 6a2d945d  Context: 023edfb0
AsyncTimerCallbackCompletion TimerInfo@11b36428
AsyncTimerCallbackCompletion TimerInfo@11b53868
Unknown Function: 6a2d945d  Context: 023ee060
Unknown Function: 6a2d945d  Context: 023ee290
Unknown Function: 6a2d945d  Context: 023eded0
Unknown Function: 6a2d945d  Context: 023edd88
Unknown Function: 6a2d945d  Context: 023ede98
Unknown Function: 6a2d945d  Context: 023ee258
Unknown Function: 6a2d945d  Context: 023edfe8
--------------------------------------
Number of Timers: 9
--------------------------------------
Completion Port Thread:Total: 3 Free: 3 MaxFree: 4 CurrentLimit: 2 MaxLimit: 200 MinLimit: 2

So we can see that currently we're using 100% of the CPU, which leads us to the next command.

!runaway

This is a nice command that will list all running threads and their CPU-usage. It's your best friend when troubleshooting a high CPU hang issue.

0:000> !runaway
User Mode Time
  Thread       Time
  25:1a94      0 days 0:00:39.937
  16:1bc0      0 days 0:00:38.390
  50:1e8c      0 days 0:00:08.859
  52:1e40      0 days 0:00:08.687
  20:1c2c      0 days 0:00:08.234
  51:1340      0 days 0:00:08.171
  21:1bcc      0 days 0:00:06.953
  26:13ec      0 days 0:00:06.671
  44:131c      0 days 0:00:03.906
  22:d8c       0 days 0:00:03.375
  33:78c       0 days 0:00:02.656
  34:1a8c      0 days 0:00:00.906
  29:1f5c      0 days 0:00:00.828
   6:e28       0 days 0:00:00.625
   5:1c78      0 days 0:00:00.546
  23:14a4      0 days 0:00:00.484
   4:5ac       0 days 0:00:00.437
  45:5dc       0 days 0:00:00.421
   3:13b4      0 days 0:00:00.421
  47:19c8      0 days 0:00:00.375
  28:1b6c      0 days 0:00:00.250
  46:1dac      0 days 0:00:00.156
   7:1dd8      0 days 0:00:00.109
  48:cdc       0 days 0:00:00.093
  49:1eac      0 days 0:00:00.062
  15:1a64      0 days 0:00:00.062
   0:1804      0 days 0:00:00.046
  36:4a4       0 days 0:00:00.031
  11:1eb4      0 days 0:00:00.031
   1:10b4      0 days 0:00:00.031
  31:16ac      0 days 0:00:00.015
  14:4ac       0 days 0:00:00.015
   2:186c      0 days 0:00:00.015
  59:590       0 days 0:00:00.000
  58:294       0 days 0:00:00.000
  57:16d0      0 days 0:00:00.000
  56:1578      0 days 0:00:00.000
  55:1428      0 days 0:00:00.000
  54:16d8      0 days 0:00:00.000
  53:fd8       0 days 0:00:00.000
  43:1b8c      0 days 0:00:00.000
  42:1c24      0 days 0:00:00.000
  41:1e2c      0 days 0:00:00.000
  40:11b0      0 days 0:00:00.000
  39:edc       0 days 0:00:00.000
  38:1a08      0 days 0:00:00.000
  37:171c      0 days 0:00:00.000
  35:1254      0 days 0:00:00.000
  32:1f9c      0 days 0:00:00.000
  30:1ae8      0 days 0:00:00.000
  27:190c      0 days 0:00:00.000
  24:1d2c      0 days 0:00:00.000
  19:1e38      0 days 0:00:00.000
  18:ee4       0 days 0:00:00.000
  17:fb8       0 days 0:00:00.000
  13:1b54      0 days 0:00:00.000
  12:1a48      0 days 0:00:00.000
  10:f64       0 days 0:00:00.000
   9:1024      0 days 0:00:00.000
   8:1b78      0 days 0:00:00.000

As you can see the total amount of time does not match the total CPU utilization time that we got from the .time command. That's simply because threads get reused and recycled. This means that the total amount of CPU time used by a thread may have been split up over several page requests.

!threads

To get more information about the running threads we can run the !Threads-command. This will list all managed threads in the application, what application domain the thread is currently executing under, etc. The output will look like this:

0:000> !threads
ThreadCount: 48
UnstartedThread: 0
BackgroundThread: 29
PendingThread: 0
DeadThread: 19
Hosted Runtime: no
                                      PreEmptive   GC Alloc           Lock
       ID OSID ThreadOBJ    State     GC       Context       Domain   Count APT Exception
  16    1 1bc0 001fccd0   1808220 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  22    2  d8c 002016f0      b220 Enabled  00000000:00000000 0019daf0     0 MTA (Finalizer)
  14    4  4ac 00242e58   880a220 Enabled  00000000:00000000 0019daf0     0 MTA (Threadpool Completion Port)
  23    5 14a4 11b39f18    80a220 Enabled  00000000:00000000 0019daf0     0 MTA (Threadpool Completion Port)
  24    6 1d2c 11b41ad8      1220 Enabled  00000000:00000000 0019daf0     0 Ukn
  25    7 1a94 11b46c70   180b220 Enabled  27240c98:27241fd8 11b42540     1 MTA (Threadpool Worker)
  26    9 13ec 12ce2888   200b220 Enabled  2a9f1434:2a9f33c0 11b42540     0 MTA
  27    a 190c 12d85eb8   200b220 Enabled  00000000:00000000 11b42540     0 MTA
  29    b 1f5c 13df6a50   200b220 Enabled  2ab1da6c:2ab1f1c0 11b42540     0 MTA
  30    c 1ae8 12d44a58      b220 Enabled  00000000:00000000 11b42540     0 MTA
  31    d 16ac 12e2e008   200b220 Enabled  2a81348c:2a8153c0 11b42540     1 MTA
   5    e 1c78 12da2160       220 Enabled  00000000:00000000 0019daf0     0 Ukn
  33    8  78c 11b674c8   200b220 Enabled  2707b818:2707c1d8 11b42540     0 MTA
  34   12 1a8c 13f163c8       220 Enabled  00000000:00000000 0019daf0     0 Ukn
  36   13  4a4 13eef718   200b220 Enabled  2a7db4a4:2a7dd3c0 11b42540     0 MTA
   4   14  5ac 13ef2008       220 Enabled  00000000:00000000 0019daf0     0 Ukn
  42   10 1c24 13f0e950   880b220 Enabled  00000000:00000000 0019daf0     0 MTA (Threadpool Completion Port)
   6   11  e28 13f16008       220 Enabled  00000000:00000000 0019daf0     0 Ukn
   3    f 13b4 13eba008       220 Enabled  00000000:00000000 0019daf0     0 Ukn
  43   15 1b8c 140db008   880b220 Enabled  00000000:00000000 0019daf0     0 MTA (Threadpool Completion Port)
  44   17 131c 140ceb28   200b220 Enabled  272288c8:27229fd8 11b42540     0 MTA
  45   1d  5dc 140cd0a0       220 Enabled  00000000:00000000 0019daf0     0 Ukn
  47   20 19c8 1651a008       220 Enabled  00000000:00000000 0019daf0     0 Ukn
XXXX   24    0 16468880   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  46   1f 1dac 1650ab48       220 Enabled  00000000:00000000 0019daf0     0 Ukn
XXXX   1a    0 140d5008   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   16    0 140c5008   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  50    3 1e8c 14064420   180b220 Enabled  27246f54:27247fd8 11b42540     1 MTA (Threadpool Worker)
XXXX   35    0 1406e800   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  51   36 1340 140df008   180b220 Enabled  2adec9cc:2aded1c0 11b42540     1 MTA (Threadpool Worker)
XXXX   37    0 16566868   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  48   38  cdc 16578840       220 Enabled  00000000:00000000 0019daf0     0 Ukn
XXXX   39    0 16566c28   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   3b    0 1646b8b0   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   3c    0 16674008   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   3d    0 16676418   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   3e    0 16676fb8   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   3f    0 16674d48   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   40    0 1667de10   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   41    0 16680050   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   42    0 166812e8   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   43    0 16683e60   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  52   44 1e40 165259e8   180b220 Enabled  2adf126c:2adf31c0 11b42540     1 MTA (Threadpool Worker)
XXXX   45    0 165b7c08   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   46    0 165aa3d8   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   47    0 165242c8   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
XXXX   48    0 165e9500   1801820 Enabled  00000000:00000000 0019daf0     0 Ukn (Threadpool Worker)
  49   3a 1eac 165676f0       220 Enabled  00000000:00000000 0019daf0     0 Ukn 

The threads with an ID of XXXX have ended and are waiting to be recycled. We can also see that the finalizer thread has an ID of 22. So if we'd seen an unhealthy amount of activity on thread 22 when we ran the !runaway-command we would now have known that we had a finalizer-issue on our hands.

Switching to a specific thread

To go to a specific thread we use the ~-command. The syntax is as follows: ~[number of thread]s. So to switch to thread 50 we would type the following:

0:000> ~50s

We have then switched to thread 50, and can use a lot of other useful commands.

!clrstack

This great command will list the callstack for the current thread. If you want additional information you can add the "-p" switch which will show you parameters and local variables as well.

Below is a sample listing of the clrstack for thread 50.

0:050> !clrstack
OS Thread Id: 0x1e8c (50)
ESP       EIP    
17a9e750 7d61c828 [NDirectMethodFrameSlim: 17a9e750] System.DirectoryServices.Protocols.Wldap32.ldap_bind_s(IntPtr, System.String, System.DirectoryServices.Protocols.SEC_WINNT_AUTH_IDENTITY_EX, System.DirectoryServices.Protocols.BindMethod)
17a9e768 14df70f9 System.DirectoryServices.Protocols.LdapConnection.BindHelper(System.Net.NetworkCredential, Boolean)
17a9e794 14df6de0 System.DirectoryServices.Protocols.LdapConnection.Bind()
17a9e79c 14df59e9 System.DirectoryServices.Protocols.LdapConnection.SendRequestHelper(System.DirectoryServices.Protocols.DirectoryRequest, Int32 ByRef)
17a9e8b8 14df56e8 System.DirectoryServices.Protocols.LdapConnection.SendRequest(System.DirectoryServices.Protocols.DirectoryRequest, System.TimeSpan)
17a9e8bc 14df5657 [InlinedCallFrame: 17a9e8bc]

So, reading from the bottom up we can see that an LdapConnection called the SendRequest method, which in turn called the SendRequestHelper method, which called the Bind method, and so on.

If we run !clrstack -p we get the following:

0:050> !clrstack -p
OS Thread Id: 0x1e8c (50)
ESP       EIP    
17a9e750 7d61c828 [NDirectMethodFrameSlim: 17a9e750] System.DirectoryServices.Protocols.Wldap32.ldap_bind_s(IntPtr, System.String, System.DirectoryServices.Protocols.SEC_WINNT_AUTH_IDENTITY_EX, System.DirectoryServices.Protocols.BindMethod)
17a9e768 14df70f9 System.DirectoryServices.Protocols.LdapConnection.BindHelper(System.Net.NetworkCredential, Boolean)
    PARAMETERS:
        this = 0x271fdfe0
        newCredential =
        needSetCredential =

17a9e794 14df6de0 System.DirectoryServices.Protocols.LdapConnection.Bind()
    PARAMETERS:
        this =

17a9e79c 14df59e9 System.DirectoryServices.Protocols.LdapConnection.SendRequestHelper(System.DirectoryServices.Protocols.DirectoryRequest, Int32 ByRef)
    PARAMETERS:
        this = 0x271fdfe0
        request = 0x27246e38
        messageID = 0x17a9e8ec

17a9e8b8 14df56e8 System.DirectoryServices.Protocols.LdapConnection.SendRequest(System.DirectoryServices.Protocols.DirectoryRequest, System.TimeSpan)
    PARAMETERS:
        this = 0x271fdfe0
        request = 0x27246e38
        requestTimeout =

17a9e8bc 14df5657 [InlinedCallFrame: 17a9e8bc]

We can now look at the parameters, like the DirectoryRequest that was sent to the SendRequest and SendRequestHelper methods. To do this we simply copy the address of the request, (0x27246e38) and use it as an argument for our next command.

!dumpobject (!do)

This is another crucial command. Dumpobject will dump the object at the specified address, so if we send the address of the request as a parameter we will get the request dumped to screen.:

0:050> !do 0x27246e38
Name: System.DirectoryServices.Protocols.SearchRequest
MethodTable: 14b394c4
EEClass: 14d97ce0
Size: 52(0x34) bytes
GC Generation: 0
(C:\WINDOWS\assembly\GAC_MSIL\System.DirectoryServices.Protocols\2.0.0.0__b03f5f7f11d50a3a\System.DirectoryServices.Protocols.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
02c39310  4000102        4        System.String  0 instance 00000000 directoryRequestID
14b398bc  4000103        8 ...ControlCollection  0 instance 27246e90 directoryControlCollection
02c39310  4000111        c        System.String  0 instance 27246d00 dn
12579f5c  4000112       10 ....StringCollection  0 instance 27246eb4 directoryAttributes
02c36ca0  4000113       14        System.Object  0 instance 27246ddc directoryFilter
14b39344  4000114       18         System.Int32  1 instance        1 directoryScope
14b393fc  4000115       1c         System.Int32  1 instance        0 directoryRefAlias
0fd3da00  4000116       20         System.Int32  1 instance        0 directorySizeLimit
1202af88  4000117       28      System.TimeSpan  1 instance 27246e60 directoryTimeLimit
120261c8  4000118       24       System.Boolean  1 instance        0 directoryTypesOnly

Okay, so what's this? Well, it's a System.DirectoryServices.Protocols.SearchRequest object. This means that it has various properties defined by the System.DirectoryServices.Protocols.SearchRequest class. If you want to know more about these properties I suggest you look up the SearchRequest class in msdn. We have the RequestId, the Scope, the DistinguishedName, etc.

So, let's say we want to know what the distinguished name is for this particular request. We boldly assume that the dn-property we see in the listing above is what is called DistinguishedName by MSDN. Simply copy the address of the dn-property (27246d00) and use !dumpobject again. We can see that the property is a System.String, so the output should be pretty clear.

0:050> !do 27246d00
Name: System.String
MethodTable: 02c39310
EEClass: 0fb610ac
Size: 112(0x70) bytes
GC Generation: 0
(C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: CN=Dummy,CN=Accounts,CN=useradm,DC=dummy,DC=net
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
0fd3da00  4000096        4         System.Int32  1 instance       48 m_arrayLength
0fd3da00  4000097        8         System.Int32  1 instance       47 m_stringLength
0fb80010  4000098        c          System.Char  1 instance       43 m_firstChar
02c39310  4000099       10        System.String  0   shared   static Empty
    >> Domain:Value  0019daf0:03380310 11b42540:03380310 <<
0fb86d44  400009a       14        System.Char[]  0   shared   static WhitespaceChars
    >> Domain:Value  0019daf0:03380324 11b42540:033855bc <<

Apparently, the distinguished name used was "CN=Dummy,CN=Accounts,CN=useradm,DC=dummy,DC=net". If we want to find out more we simply continue using !dumpobject to examine the objects and their respective values.

In my next post I thought I'd continue using the !dumpobject-command to probe through the w3wp-process, and introduce a couple of other great commands.

 

To be continued...

/ Johan