So you have a dump from a hung server and you’re the first person on the scene. Your IT manager is jumping up and down, the phone is ringing off the hook, and people are hovering outside your cube. It’s game time and the pressure is on!!! Now what do you do?
Well, take a deep breath, get a cup of coffee, and relax, because I’m here to help you out! Let me share what we typically do on our first pass through a hung server kernel debug. This works for both live debugs and dumps. These are steps you can take, and they do find problems!
Here’s something else to consider: if the server is mission critical, you will probably want to capture a dump rather than do a live debug, so that you can get the server back up and running. This takes the pressure off, because you can then do the debug offline and, if need be, send the dump to other people for review.
Before we get started, let me state that the following data is completely fabricated, and many of the process names and addresses in this output have been made up. Do not question odd offsets or alignments.
I’m also assuming that you know how to:
1. Collect a kernel dump: http://support.microsoft.com/kb/244139
2. Set up the debugger: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
3. Use the symbol server: http://support.microsoft.com/kb/311503
0) Before I start this type of debug, I like to open a log file so that all of the output is saved (.logclose closes it when you are done).
1: kd> .logopen H:\repro\hungserver.log
Opened log file 'H:\repro\hungserver.log'
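Because everything ends up in one log file, it can help to split the log back into per-command chunks when you review it later or hand it to someone else. Here is a minimal Python sketch of that, assuming each command appears in the log on a line that starts with the kd prompt (for example 1: kd> !vm), the way the output in this post is shown. The function name split_log is hypothetical and not part of the debugger.

```python
import re

# Kernel debugger prompt, e.g. "1: kd> !vm" (multiprocessor) or "kd> !vm".
PROMPT = re.compile(r"^(?:\d+: )?kd> (.*)$")


def split_log(text):
    """Split a debugger log into (command, output_lines) pairs.

    Lines before the first prompt (such as "Opened log file ...") are dropped.
    """
    sections = []
    for line in text.splitlines():
        match = PROMPT.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        elif sections:
            sections[-1][1].append(line)
    return sections


log = """Opened log file 'H:\\repro\\hungserver.log'
1: kd> !ready
Processor 0: No threads in READY state
Processor 1: No threads in READY state
1: kd> !defwrites
Write throttles not engaged"""

for command, output in split_log(log):
    print(command, len(output))  # "!ready 2", then "!defwrites 1"
```

Feed it the contents of hungserver.log and you get one (command, output) pair per command, in the order you ran them.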
1) !vm - Check memory usage. Generally speaking, you want to compare the current pool and memory usage values to the maximums available.
1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 982890 ( 3931560 Kb)
Page File: \??\P:\pagefile.sys
Current: 3931560 Kb Free Space: 3742548 Kb
Minimum: 3931560 Kb Maximum: 4193280 Kb
Available Pages: 631300 ( 2525200 Kb)
ResAvail Pages: 888171 ( 3552684 Kb)
Locked IO Pages: 195 ( 780 Kb)
Free System PTEs: 202830 ( 811324 Kb) < THIS IS OK
Free NP PTEs: 32765 ( 131060 Kb) < THIS IS OK
Free Special NP: 0 ( 0 Kb)
Modified Pages: 241 ( 964 Kb)
Modified PF Pages: 241 ( 964 Kb)
NonPagedPool Usage: 11377 ( 45508 Kb) < THIS IS OK
NonPagedPool Max: 65536 ( 262144 Kb)
PagedPool 0 Usage: 6398 ( 25592 Kb)
PagedPool 1 Usage: 2201 ( 8804 Kb)
PagedPool 2 Usage: 2216 ( 8864 Kb)
PagedPool 3 Usage: 2179 ( 8716 Kb)
PagedPool 4 Usage: 2199 ( 8796 Kb)
PagedPool Usage: 15193 ( 60772 Kb) < THIS IS OK
PagedPool Maximum: 67584 ( 270336 Kb)
Shared Commit: 24569 ( 98276 Kb)
Special Pool: 0 ( 0 Kb)
Shared Process: 12519 ( 50076 Kb)
PagedPool Commit: 15252 ( 61008 Kb)
Driver Commit: 2083 ( 8332 Kb)
Committed pages: 313611 ( 1254444 Kb) < THIS IS OK
Commit limit: 1925815 ( 7703260 Kb)
Check whether any applications are using an unusually large amount of memory. In this case I don’t see a problem.
Total Private: 239673 ( 958692 Kb)
36b0 EXCEL.EXE 10775 ( 43100 Kb) < THIS IS OK, etc
2ee8 myapploc.exe 10288 ( 41152 Kb)
097c MySSrv.exe 7497 ( 29988 Kb)
0418 MyFun32.exe 6277 ( 25108 Kb)
0474 svchost.exe 6164 ( 24656 Kb)
1be8 ABCDEFGH.EXE 4984 ( 19936 Kb)
0480 IEXPLORE.EXE 4924 ( 19696 Kb)
09c4 ANOTHER.exe 4768 ( 19072 Kb)
19a4 HMMINTER.exe 4207 ( 16828 Kb)
1b30 ohboya.EXE 4146 ( 16584 Kb)
4558 aprocess.EXE 4138 ( 16552 Kb)
30e8 another.exe 3691 ( 14764 Kb)
0924 aservicec.exe 3508 ( 14032 Kb)
0854 RRXXc.exe 3400 ( 13600 Kb)
3458 MYWIN.EXE 3389 ( 13556 Kb)
0d90 FunService.exe 3298 ( 13192 Kb)
1180 CustomAp.exe 3221 ( 12884 Kb)
06ac XYZvrver.exe 2769 ( 11076 Kb)
2cdc ABCDEFGH.exe 2591 ( 10364 Kb)
02f4 lsass.exe 2567 ( 10268 Kb)
21b4 IEXPLORE.EXE 2516 ( 10064 Kb)
3420 Process.exe 2450 ( 9800 Kb)
4cd4 XYZXY.EXE 2305 ( 9220 Kb)
4a30 lookup.EXE 2244 ( 8976 Kb)
4360 Process.exe 2201 ( 8804 Kb)
0564 spoolsv.exe 2166 ( 8664 Kb)
2e5c XYZXYZEXE 2076 ( 8304 Kb)
02bc winlogon.exe 1964 ( 7856 Kb)
4e48 winlogon.exe 1958 ( 7832 Kb)
42bc ABCDEFGH.exe 1943 ( 7772 Kb)
0eb8 svchost.exe 1922 ( 7688 Kb)
3b98 Process.exe 1919 ( 7676 Kb)
4c1c IEXPLORE.EXE 1864 ( 7456 Kb)
17b8 winlogon.exe 1852 ( 7408 Kb)
3124 winlogon.exe 1849 ( 7396 Kb)
14b8 winlogon.exe 1847 ( 7388 Kb)
32cc winlogon.exe 1843 ( 7372 Kb)
1f84 winlogon.exe 1843 ( 7372 Kb)
2ebc winlogon.exe 1842 ( 7368 Kb)
1548 winlogon.exe 1840 ( 7360 Kb)
21c4 PROCESS213.EXE 1833 ( 7332 Kb)
3b58 MYWIN.EXE 1817 ( 7268 Kb)
4b3c winlogon.exe 1816 ( 7264 Kb)
NOTE: If you see high pool values, run !poolused 2 (sorted by nonpaged pool usage) and !poolused 4 (sorted by paged pool usage) to see which pool tags are consuming the pool. (We will write a dedicated blog post on this topic later.)
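To make the comparison against the maximums concrete, here is a minimal Python sketch that pulls the "Label: pages ( Kb)" lines out of !vm output and reports each usage value as a percentage of its limit. The label pairs are taken from the output above; other Windows versions may label these lines differently, so treat the PAIRS table as an assumption. How close to the limit is too close is still your call.

```python
import re

# Lines such as "NonPagedPool Max: 65536 ( 262144 Kb)": label, pages, Kb.
VM_LINE = re.compile(r"^\s*([A-Za-z][\w ]*?):\s+(\d+)\s+\(\s*(\d+) Kb\)")

# (usage label, limit label) pairs as they are spelled in the !vm output above.
PAIRS = [
    ("NonPagedPool Usage", "NonPagedPool Max"),
    ("PagedPool Usage", "PagedPool Maximum"),
    ("Committed pages", "Commit limit"),
]


def parse_vm(text):
    """Return {label: pages} for every 'Label: pages ( Kb)' line in !vm output."""
    values = {}
    for line in text.splitlines():
        match = VM_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def usage_report(values):
    """Return {usage label: percent of its limit}, rounded to one decimal place."""
    report = {}
    for used, limit in PAIRS:
        if used in values and values.get(limit):
            report[used] = round(100.0 * values[used] / values[limit], 1)
    return report


vm_output = """NonPagedPool Usage: 11377 ( 45508 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)
PagedPool Usage: 15193 ( 60772 Kb)
PagedPool Maximum: 67584 ( 270336 Kb)
Committed pages: 313611 ( 1254444 Kb)
Commit limit: 1925815 ( 7703260 Kb)"""

print(usage_report(parse_vm(vm_output)))
```

For the output above this gives roughly 17.4% for nonpaged pool, 22.5% for paged pool and 16.3% for commit, which matches the "THIS IS OK" notes.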
2) !sysptes - Check whether any of the free lists is low (fewer than 10 free entries).
1: kd> !sysptes
All of these are ok
System PTE Information
Total System Ptes 224223
SysPtes list of size 1 has 225 free
SysPtes list of size 2 has 57 free
SysPtes list of size 4 has 136 free
SysPtes list of size 8 has 59 free
SysPtes list of size 16 has 95 free
starting PTE: c022b000
ending PTE: c03dff78
free blocks: 652 total free: 202831 largest free block: 191973
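A minimal Python sketch of the same check, applying this step's fewer-than-10 rule of thumb to each "SysPtes list of size N has M free" line. The function name is hypothetical, and the threshold is only the rule of thumb from this post.

```python
import re

LIST_LINE = re.compile(r"SysPtes list of size (\d+) has (\d+) free")


def low_pte_lists(text, threshold=10):
    """Return {list size: free count} for system PTE lists below the threshold."""
    low = {}
    for size, free in LIST_LINE.findall(text):
        if int(free) < threshold:
            low[int(size)] = int(free)
    return low


sysptes_output = """SysPtes list of size 1 has 225 free
SysPtes list of size 2 has 57 free
SysPtes list of size 4 has 136 free
SysPtes list of size 8 has 59 free
SysPtes list of size 16 has 95 free"""

print(low_pte_lists(sysptes_output))  # prints {} because every list has at least 10 free
```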
3) !defwrites - Check whether cache write throttling is engaged. If it is, the server is doing little other than writing dirty data to the disk.
1: kd> !defwrites
*** Cache Write Throttle Analysis ***
CcTotalDirtyPages: 187 ( 748 Kb)
CcDirtyPageThreshold: 130560 ( 522240 Kb)
MmAvailablePages: 631300 ( 2525200 Kb)
MmThrottleTop: 450 ( 1800 Kb)
MmThrottleBottom: 80 ( 320 Kb)
MmModifiedPageListHead.Total: 241 ( 964 Kb)
Write throttles not engaged < THIS IS OK. Good = NOT engaged.
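If you want a feel for how much headroom there is, a rough Python sketch can compare CcTotalDirtyPages with CcDirtyPageThreshold and MmAvailablePages with MmThrottleTop. This is an assumption about how to read the numbers, not a description of the exact kernel logic; the debugger's own "engaged / not engaged" line is the verdict to trust, so the sketch simply reports it alongside the ratios. The function name is hypothetical.

```python
import re

# "Name: pages ( Kb)" lines; names may contain dots (MmModifiedPageListHead.Total).
FIELD = re.compile(r"^\s*([\w.]+):\s+(\d+)")


def defwrites_summary(text):
    """Report the debugger's throttle verdict next to two rough ratios."""
    values = {}
    verdict = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
        elif "engaged" in line:
            verdict = line.strip()
    summary = {"verdict": verdict}
    if "CcTotalDirtyPages" in values and values.get("CcDirtyPageThreshold"):
        summary["dirty_pct"] = round(
            100.0 * values["CcTotalDirtyPages"] / values["CcDirtyPageThreshold"], 2)
    if "MmAvailablePages" in values and "MmThrottleTop" in values:
        summary["available_above_throttle_top"] = (
            values["MmAvailablePages"] > values["MmThrottleTop"])
    return summary


defwrites_output = """*** Cache Write Throttle Analysis ***
CcTotalDirtyPages: 187 ( 748 Kb)
CcDirtyPageThreshold: 130560 ( 522240 Kb)
MmAvailablePages: 631300 ( 2525200 Kb)
MmThrottleTop: 450 ( 1800 Kb)
MmThrottleBottom: 80 ( 320 Kb)
MmModifiedPageListHead.Total: 241 ( 964 Kb)
Write throttles not engaged"""

print(defwrites_summary(defwrites_output))
```

For the output above, dirty pages are about 0.14% of the threshold and available pages are far above MmThrottleTop, which agrees with "Write throttles not engaged".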
4) !ready - Check whether any threads are ready to run but waiting for a processor, which would mean something is holding things up.
1: kd> !ready
Processor 0: No threads in READY state < THIS IS OK
Processor 1: No threads in READY state < THIS IS OK
If there were threads in the ready state, you would want to investigate which threads they are and what is currently running on that processor.
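A small Python sketch that lists the processors reporting ready threads. It only knows the "No threads in READY state" wording shown above and treats any other "Processor N:" line as having ready threads, which is an assumption about the format you will see when threads are queued. The function name is hypothetical.

```python
import re

PROCESSOR_LINE = re.compile(r"^\s*Processor (\d+):\s*(.*)$")


def processors_with_ready_threads(text):
    """Return processor numbers whose !ready line is not 'No threads in READY state'."""
    busy = []
    for line in text.splitlines():
        match = PROCESSOR_LINE.match(line)
        if match and "No threads in READY state" not in match.group(2):
            busy.append(int(match.group(1)))
    return busy


ready_output = """Processor 0: No threads in READY state
Processor 1: No threads in READY state"""

print(processors_with_ready_threads(ready_output))  # prints [] for this dump
```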
5) !pcr N and kv on each processor - If a processor isn't idle, it could be busy running DPCs. (Switch to each processor with ~0s, ~1s, and so on before running kv.)
1: kd> !pcr 0 < Dump the processor control region (KPCR) for CPU 0
KPCR for Processor 0 at ffdff000:
Major 1 Minor 1
NtTib.ExceptionList: ffffffff
NtTib.StackBase: 00000000
NtTib.StackLimit: 00000000
NtTib.SubSystemTib: 80042000
NtTib.Version: 012e7ace
NtTib.UserPointer: 00000001
NtTib.SelfTib: 00000000
SelfPcr: ffdff000
Prcb: ffdff120
Irql: 00000000
IRR: 00000000
IDR: ffffffff
InterruptMode: 00000000
IDT: 8003f400
GDT: 8003f000
TSS: 80042000
CurrentThread: 8056cd00
NextThread: 00000000
IdleThread: 8056cd00
DpcQueue: < NO DPCs: Not much to look at then
1: kd> !pcr 1 < Dump the processor control region (KPCR) for CPU 1
KPCR for Processor 1 at f773f000:
NtTib.ExceptionList: f5ba1d30
NtTib.SubSystemTib: f773fef0
NtTib.Version: 0121925d
NtTib.UserPointer: 00000002
NtTib.SelfTib: 7ffda000
SelfPcr: f773f000
Prcb: f773f120
IDT: f77456e0
GDT: f77452e0
TSS: f773fef0
CurrentThread: 8963cb90
IdleThread: f7741fa0
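In the output above, processor 0's CurrentThread equals its IdleThread (8056cd00), so it is idle. On processor 1, CurrentThread (8963cb90) differs from IdleThread (f7741fa0), so that processor was busy, and it is the one to switch to (~1s) and run kv on. A minimal Python sketch of that comparison, run against one processor's !pcr output at a time (pcr_is_idle is a hypothetical helper):

```python
import re

THREAD_FIELD = re.compile(r"^\s*(CurrentThread|IdleThread):\s+([0-9a-fA-F]+)", re.MULTILINE)


def pcr_is_idle(pcr_output):
    """True if CurrentThread == IdleThread, False if not, None if a field is missing."""
    fields = dict(THREAD_FIELD.findall(pcr_output))
    if "CurrentThread" not in fields or "IdleThread" not in fields:
        return None
    return fields["CurrentThread"].lower() == fields["IdleThread"].lower()


cpu0 = "CurrentThread: 8056cd00\nNextThread: 00000000\nIdleThread: 8056cd00"
cpu1 = "CurrentThread: 8963cb90\nIdleThread: f7741fa0"
print(pcr_is_idle(cpu0), pcr_is_idle(cpu1))  # prints True False
```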
6) !locks - Look for deadlocks and contention
The following output is of interest.
A thread ID marked with <*> is an owner of the resource. When a resource is exclusively owned, every other thread that wants it has to wait until the owner finishes its work and releases it. Typically you would run !thread on the owner thread (e.g., !thread 87bddda0) to see what that thread is doing. If two threads each own a different resource, and each thread appears in the other resource’s exclusive waiters list, you have a deadlock. The following is an example of what a deadlock might look like: 87bddda0 owns resource 0x81234567 and is waiting on 0x83245678, while 87abc123 owns 0x83245678 and is waiting on 0x81234567. In this case you would run !thread on each owner and evaluate the logic of the code in each stack that allowed the threads to get into this state.
1: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks......
Resource @ 0x8a50ee98 Shared 4 owning threads
Threads: 896856d0-01<*> 89686778-01<*> 896862d0-01<*> 89685da0-01<*>
KD: Scanning for held locks............................................................
Resource @ 0x896da1bc Exclusively owned
Threads: 896e3b20-01<*>
KD: Scanning for held locks..
Resource @ 0x81234567 Shared 1 owning threads
Contention Count = 15292
NumberOfSharedWaiters = 1
NumberOfExclusiveWaiters = 39
Threads: 87bddda0-01<*> 806d2020-01
Threads Waiting On Exclusive Access:
80ced020 80c036f8 80cdc7a0 80c438b0
80e6cda0 80f96987 8007fd60 8004dc10
80d7b020 80a2dd70 80b89620 80b58020
8036eda0 87abc123 80606da0 8056e890
802b3630 80cc7590 80d64020 80f7dda0
80129580 80b73da0 806d2578 80b505d8
KD: Scanning for held locks................
Resource @ 0x83245678 Exclusively owned
Contention Count = 4827
NumberOfExclusiveWaiters = 35
Threads: 87abc123-01<*>
803e6aa0 80876020 80240020 80f56588
808174f0 80bd6b28 80c3c448 8046d6c8
801e8da0 80356518 80b4c978 8069e020
80cb9020 87bddda0 80c65020 86daaac0
80379020 80fe4020
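Eyeballing waiter lists for a cycle gets tedious when there are dozens of waiters. Here is a minimal Python sketch, assuming the !locks text layout shown above: it records the owners (marked <*>) and waiters of each resource, builds a "waits for" graph from each waiter to the owners of the resource it is waiting on, and reports any set of threads that wait on each other in a cycle. Shared waiters are treated like exclusive ones, which is a simplification. The function names and the trimmed SAMPLE text are illustrative.

```python
import re

HEX = re.compile(r"^[0-9a-fA-F]{8}$")


def parse_locks(text):
    """Return {resource: {"owners": set, "waiters": set}} from !locks output."""
    resources, current, in_threads = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        tokens = line.split()
        if line.startswith("Resource @"):
            current = tokens[2]
            resources[current] = {"owners": set(), "waiters": set()}
            in_threads = False
        elif current is None:
            continue
        elif line.startswith("Threads:"):
            for tok in tokens[1:]:
                kind = "owners" if tok.endswith("<*>") else "waiters"
                resources[current][kind].add(tok.split("-")[0])
            in_threads = True
        elif line.startswith("Threads Waiting On Exclusive Access"):
            continue  # header line; waiter addresses follow it
        elif in_threads and tokens and all(HEX.match(t) for t in tokens):
            resources[current]["waiters"].update(tokens)
        else:
            in_threads = False
    return resources


def find_deadlocks(resources):
    """Return sets of threads that wait on each other in a cycle."""
    waits_for = {}
    for info in resources.values():
        for waiter in info["waiters"]:
            waits_for.setdefault(waiter, set()).update(info["owners"] - {waiter})
    found = set()

    def walk(start, node, path):
        for nxt in waits_for.get(node, ()):
            if nxt == start:
                found.add(frozenset(path))
            elif nxt not in path:
                walk(start, nxt, path + [nxt])

    for start in waits_for:
        walk(start, start, [start])
    return found


SAMPLE = """Resource @ 0x81234567 Shared 1 owning threads
Contention Count = 15292
Threads: 87bddda0-01<*> 806d2020-01
Threads Waiting On Exclusive Access:
80ced020 80c036f8 87abc123 80606da0
KD: Scanning for held locks................
Resource @ 0x83245678 Exclusively owned
Contention Count = 4827
Threads: 87abc123-01<*>
803e6aa0 80876020 87bddda0 80c65020
"""

print(find_deadlocks(parse_locks(SAMPLE)))
```

On this excerpt the only cycle is 87bddda0 and 87abc123, the pair described above; those are the two threads to run !thread on.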
7) !process 0 0 - Search the output for drwtsn32 (Dr. Watson). Its presence indicates that a process has crashed and is in the middle of being dumped, which can cause a server hang. Look at the PEB for drwtsn32 and read its command line to see which process is being dumped. To do this, take the drwtsn32 process address (the value after PROCESS in the !process 0 0 output) and run .process ADDRESS; .reload; !peb
The following example extracts the command line of winlogon.exe, but the same steps work for Dr. Watson.
1: kd> .process 89f31020
Implicit process is now 89f31020
1: kd> .reload
Loading Kernel Symbols
...........................................................................................................................................
Loading User Symbols
...............................
Loading unloaded module list
...............
1: kd> !peb
PEB at 7ffdf000
InheritedAddressSpace: No
ReadImageFileExecOptions: Yes
BeingDebugged: No
ImageBaseAddress: 01000000
Ldr 77fc23a0
Ldr.Initialized: Yes
Ldr.InInitializationOrderModuleList: 00171ef8 . 00176c90
Ldr.InLoadOrderModuleList: 00171e90 . 00176c80
Ldr.InMemoryOrderModuleList: 00171e98 . 00176c88
Base TimeStamp Module
1000000 3e80245d Mar 24 05:41:49 2003 \??\P:\WINDOWS\system32\winlogon.exe
77f40000 3e802494 Mar 25 05:42:44 2003 P:\WINDOWS\system32\ntdll.dll
77e40000 44c60ec8 Jul 25 08:30:00 2006 P:\WINDOWS\system32\kernel32.dll
77ba0000 3e802496 Mar 25 05:42:46 2003 P:\WINDOWS\system32\msvcrt.dll
77da0000 3e802495 Mar 25 05:42:45 2003 P:\WINDOWS\system32\ADVAPI32.dll
77c50000 40566fc9 Mar 15 23:08:57 2004 P:\WINDOWS\system32\RPCRT4.dll
77d00000 45e7bafc Mar 02 00:49:48 2007 P:\WINDOWS\system32\USER32.dll
77c00000 45e7bafc Mar 02 00:49:48 2007 P:\WINDOWS\system32\GDI32.dll
75970000 3e8024a2 Mar 25 05:42:58 2003 P:\WINDOWS\system32\USERENV.dll
75810000 3e8024a3 Mar 25 05:42:59 2003 P:\WINDOWS\system32\NDdeApi.dll
761b0000 3e8024a0 Mar 25 05:42:56 2003 P:\WINDOWS\system32\CRYPT32.dll
SubSystemData: 00000000
ProcessHeap: 00070000
ProcessParameters: 00020000
WindowTitle: '< Name not readable >'
ImageFile: '\??\P:\WINDOWS\system32\winlogon.exe'
CommandLine: 'winlogon.exe' < HERE IS THE COMMAND LINE. No args in this case
( output is truncated ... )
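A minimal Python sketch of this lookup, working on saved !process 0 0 and !peb text: it finds the process address for a given image name, builds the command sequence from this step, and pulls the CommandLine value out of the !peb output. The helper names are hypothetical. If you do find Dr. Watson, note that with the default AeDebug registration it is typically started as drwtsn32 -p <pid> -e <event> -g, so the -p value in its command line should be the process ID of the process being dumped; check the AeDebug key on the server in case it has been changed.

```python
import re

PROCESS_LINE = re.compile(r"^PROCESS\s+([0-9a-fA-F]+)\b")
IMAGE_LINE = re.compile(r"^Image:\s+(\S+)")
COMMAND_LINE = re.compile(r"^CommandLine:\s+'(.*)'")


def eprocess_for_image(process_dump, image_name):
    """Return process addresses in '!process 0 0' text whose Image matches image_name."""
    matches = []
    current = None
    for raw in process_dump.splitlines():
        line = raw.strip()
        process = PROCESS_LINE.match(line)
        if process:
            current = process.group(1)
            continue
        image = IMAGE_LINE.match(line)
        if image and current and image.group(1).lower() == image_name.lower():
            matches.append(current)
    return matches


def peb_commands(eprocess):
    """Debugger commands from this step for reading a process's PEB."""
    return f".process {eprocess}; .reload; !peb"


def command_line(peb_dump):
    """Return the quoted CommandLine value from !peb output, or None."""
    for raw in peb_dump.splitlines():
        match = COMMAND_LINE.match(raw.strip())
        if match:
            return match.group(1)
    return None


process_dump = """PROCESS 8a104343 SessionId: 0 Cid: 02bc Peb: 7ffdf000 ParentCid: 0274
    DirBase: ed539000 ObjectTable: e18c67b0 HandleCount: 498.
    Image: winlogon.exe"""

for address in eprocess_for_image(process_dump, "drwtsn32.exe"):
    print(peb_commands(address))  # nothing printed: no Dr. Watson in this sample
print(peb_commands(eprocess_for_image(process_dump, "winlogon.exe")[0]))
print(command_line("CommandLine: 'winlogon.exe'"))
```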
8) Look at the HandleCount for each process in the !process 0 0 output. If it’s over 10,000, you may have trouble. If you do have a handle leak, refer to the TalkBackVideo "Understanding handle leaks and How to use !htrace to find them".
1: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 8a613270 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 0acc0000 ObjectTable: e1001d10 HandleCount: 2510.
Image: System
PROCESS 8a294328 SessionId: none Cid: 0274 Peb: 7ffdf000 ParentCid: 0004
DirBase: ef1ac000 ObjectTable: e14ac1d0 HandleCount: 124.
Image: smss.exe
PROCESS 8a103424 SessionId: 0 Cid: 02a4 Peb: 7ffdf000 ParentCid: 0274
DirBase: ed804000 ObjectTable: e18caa68 HandleCount: 1171.
Image: csrss.exe
PROCESS 8a104343 SessionId: 0 Cid: 02bc Peb: 7ffdf000 ParentCid: 0274
DirBase: ed539000 ObjectTable: e18c67b0 HandleCount: 498.
Image: winlogon.exe
PROCESS 8a0f6634 SessionId: 0 Cid: 02e8 Peb: 7ffdf000 ParentCid: 02bc
DirBase: ece72000 ObjectTable: e1668e40 HandleCount: 568.
Image: services.exe
PROCESS 8a123423 SessionId: 0 Cid: 02f4 Peb: 7ffdf000 ParentCid: 02bc
DirBase: ecd7a000 ObjectTable: e16684a0 HandleCount: 30000. < This is bad
Image: lsass.exe
PROCESS 89f96453 SessionId: 0 Cid: 03e0 Peb: 7ffdf000 ParentCid: 02e8
DirBase: eb99c000 ObjectTable: e16bb570 HandleCount: 500.
Image: svchost.exe
PROCESS 8a0c6532 SessionId: 0 Cid: 042c Peb: 7ffdf000 ParentCid: 02e8
DirBase: eb6d7000 ObjectTable: e1731170 HandleCount: 156.
PROCESS 8a0a8d88 SessionId: 0 Cid: 0460 Peb: 7ffdf000 ParentCid: 02e8
DirBase: eb58f000 ObjectTable: e17372e8 HandleCount: 124.
PROCESS 89f77678 SessionId: 0 Cid: 0474 Peb: 7ffdf000 ParentCid: 02e8
DirBase: eb484000 ObjectTable: e17305b8 HandleCount: 1457.
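The same output can be scanned mechanically. A minimal Python sketch that returns every process whose HandleCount is over this step's 10,000 guideline, with its process address and image name (None when the Image line is missing, as in the truncated entries above). The function name is hypothetical.

```python
import re

PROCESS_LINE = re.compile(r"^PROCESS\s+([0-9a-fA-F]+)")
HANDLES = re.compile(r"HandleCount:\s+(\d+)")
IMAGE_LINE = re.compile(r"^Image:\s+(\S+)")


def high_handle_counts(process_dump, threshold=10000):
    """Return processes from '!process 0 0' text with more than threshold handles."""
    processes = []
    for raw in process_dump.splitlines():
        line = raw.strip()
        process = PROCESS_LINE.match(line)
        if process:
            processes.append({"eprocess": process.group(1), "image": None, "handles": None})
            continue
        if not processes:
            continue  # header lines such as "**** NT ACTIVE PROCESS DUMP ****"
        handles = HANDLES.search(line)
        if handles:
            processes[-1]["handles"] = int(handles.group(1))
        image = IMAGE_LINE.match(line)
        if image:
            processes[-1]["image"] = image.group(1)
    return [p for p in processes if p["handles"] is not None and p["handles"] > threshold]


process_dump = """**** NT ACTIVE PROCESS DUMP ****
PROCESS 8a613270 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
    DirBase: 0acc0000 ObjectTable: e1001d10 HandleCount: 2510.
    Image: System
PROCESS 8a123423 SessionId: 0 Cid: 02f4 Peb: 7ffdf000 ParentCid: 02bc
    DirBase: ecd7a000 ObjectTable: e16684a0 HandleCount: 30000.
    Image: lsass.exe"""

print(high_handle_counts(process_dump))
```

For the sample this reports only lsass.exe (8a123423) with 30000 handles, the process flagged above.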
9) !process 0 17 System - Check the worker threads in the System process (search for srv! to find the Server service worker threads). What are these threads doing? Are they blocked on I/O or waiting for a resource?
10) !process 0 17 csrss.exe - Look for the 16 LPC server threads. What are they doing? Are they blocked?
11) !stacks 2 - This dumps every call stack on the server. You may need to go through and evaluate every stack; look for threads waiting on critical sections, resources, and similar locks.
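When !stacks 2 produces hundreds of stacks, it helps to group threads with identical stacks, so that you read each distinct stack once and notice when dozens of threads are stuck in the same place. A minimal Python sketch, assuming you have already extracted each thread's frames into a list (parsing the !stacks text itself is not shown). In the example data, nt!KiSwapContext, nt!KeWaitForSingleObject and nt!KeDelayExecutionThread are real kernel routines, while mydriver!WaitForWork is hypothetical.

```python
from collections import defaultdict


def bucket_stacks(stacks, depth=None):
    """Group threads by identical call stacks.

    stacks maps a thread address to its frames, innermost first. If depth is
    given, only the top `depth` frames are compared. Returns (frames, threads)
    pairs, largest group first.
    """
    buckets = defaultdict(list)
    for thread, frames in stacks.items():
        key = tuple(frames if depth is None else frames[:depth])
        buckets[key].append(thread)
    return sorted(buckets.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical data: mydriver!WaitForWork is made up for illustration.
stacks = {
    "87bddda0": ["nt!KiSwapContext", "nt!KeWaitForSingleObject", "mydriver!WaitForWork"],
    "87abc123": ["nt!KiSwapContext", "nt!KeWaitForSingleObject", "mydriver!WaitForWork"],
    "896e3b20": ["nt!KiSwapContext", "nt!KeDelayExecutionThread"],
}

for frames, threads in bucket_stacks(stacks):
    # "<-" reads as "called from", innermost frame first.
    print(len(threads), "thread(s):", " <- ".join(frames))
```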
12) !qlocks - Check the state of all the queued spin locks on the machine. For further information on spin locks, refer to the Windows Internals book.
1: kd> !qlocks
Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt
Processor Number
Lock Name 0 1 << Nothing to worry about here.
KE - Dispatcher
MM - Expansion
MM - PFN
MM - System Space
CC - Vacb
CC - Master
EX - NonPagedPool
IO - Cancel
EX - WorkQueue
IO - Vpb
IO - Database
IO - Completion
NTFS - Struct
AFD - WorkQueue
CC - Bcb
MM - NonPagedPool
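A minimal Python sketch that flags any !qlocks row showing an owner (O), a wait position (a number) or corruption (C), per the key at the top of the output. It assumes the markers appear as whitespace-separated tokens after the lock name. Because the real output is column-aligned, this loses which processor holds or waits on the lock, so go back to the raw output for that. The function name is hypothetical.

```python
import re

MARKER = re.compile(r"^(O|C|\d+)$")


def active_qlocks(text):
    """Return {lock name: markers} for rows showing an owner, waiter, or corruption."""
    active = {}
    for line in text.splitlines():
        if " - " not in line:
            continue  # skip the key, header and blank lines
        tokens = line.split()
        markers = []
        while tokens and MARKER.match(tokens[-1]):
            markers.insert(0, tokens.pop())
        if markers:
            active[" ".join(tokens)] = markers
    return active


qlocks_output = """Lock Name 0 1
KE - Dispatcher
MM - PFN
CC - Vacb
MM - NonPagedPool"""

print(active_qlocks(qlocks_output))  # prints {} because no lock is owned or waited on
```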
13) !process 0 17 winlogon.exe - Look for hung LPC calls. If you find an LPC call going out of winlogon, you can follow it with the !lpc debugger command to see what the receiving thread is doing in the other process.
If you have further questions on any of these commands, please refer to the debugger.chm file in the Windows debugger tools install.
Good luck and happy debugging.
“This debugger is mine, there are many like it but this one is mine!” Jeff Dailey