Holy cow, I wrote a book!
As we saw some time ago,
process shutdown is a multi-phase affair.
After you call ExitProcess,
all the threads are forcibly terminated.
After that's done, each DLL is sent a DLL_PROCESS_DETACH
You may be debugging
a problem with DLL_PROCESS_DETACH handling
that suggests that some of those threads were not cleaned up properly.
For example, you might assert that a reference count is zero,
and you find during process shutdown that this assertion sometimes fires.
Maybe you terminated a thread before it got a chance to release
How can you test this theory if the thread is already gone?
It so happens that when all the threads are terminated during the
early phase of process shutdown,
the kernel is a bit lazy and doesn't free their stacks.
It figures, hey, the entire process is going away soon,
so the stack memory is going to be cleaned up as part of process
(It's sort of the kernel equivalent of
not bothering to sweep the floor
of a building that's about to be demolished.)
You can use this to your advantage by grovelling the stacks
that were left behind.
Hey, this is why you get called in to debug the hard stuff, right?
Before continuing, I need to emphasize that this information is
for debugging purposes only.
The structures and offsets are all implementation details
which can change from release to release.
The first step is to identify where all the stacks are.
The direct approach is difficult because the stacks can be all
different sizes, so it's not easy to pick them out of a line-up.
But one thing does come in a consistent size: The
From the kernel debugger, use the !process command
to dump the process you are interested in,
and from the header information, extract the VadRoot.
1: kd> !process -1
PROCESS 8731bd40 SessionId: 1 Cid: 0748 Peb: 7ffda000 ParentCid: 0620
DirBase: 4247b000 ObjectTable: 96f66de0 HandleCount: 104.
VadRoot 893de570 Vads 124 Clone 0 Private 518. Modified 643. Locked 0.
Dump this VAD root with the !vad command,
and pay attention only to the entries which say
1 Private READWRITE.
1 Private READWRITE
1: kd> !vad 893de570
VAD level start end commit
... ignore everything except "1 Private READWRITE" ...
8730a5f0 ( 6) 50 50 1 Private READWRITE
9ab0cb40 ( 5) 60 7f 1 Private READWRITE
893978b0 ( 6) 80 9f 1 Private READWRITE
87302d30 ( 5) 110 110 1 Private READWRITE
889693f8 ( 6) 120 121 1 Private READWRITE
872f3fb8 ( 6) 170 170 1 Private READWRITE
87089a80 ( 6) 1a0 1a0 1 Private READWRITE
8cbf1cb0 ( 5) 1c0 1df 1 Private READWRITE
88c079d0 ( 6) 1e0 1e0 1 Private READWRITE
9abc33e0 ( 6) 410 48f 1 Private READWRITE
873173b0 ( 7) 970 970 1 Private READWRITE
8ca1c158 ( 7) 7ffd5 7ffd5 1 Private READWRITE
88c02a78 ( 6) 7ffd6 7ffd6 1 Private READWRITE
872f9298 ( 5) 7ffd7 7ffd7 1 Private READWRITE
8750d210 ( 7) 7ffd8 7ffd8 1 Private READWRITE
87075ce8 ( 6) 7ffda 7ffda 1 Private READWRITE
87215da0 ( 4) 7ffdc 7ffdc 1 Private READWRITE
872f2200 ( 6) 7ffdd 7ffdd 1 Private READWRITE
8730a670 ( 5) 7ffdf 7ffdf 1 Private READWRITE
(If you are debugging from user mode, then you can use
!vadump but the output format is different.)
Each of these is a candidate TEB.
In practice, TEBs tend to be allocated at the high end of memory,
so the ones with a low start value are probably
Therefore, you should investigate these candidates in reverse order.
For each candidate, take the start address and append
(Each page on x86 is 4KB, which conveniently maps to 1000 in hex.)
Dump the first seven
pointers of the TEB with the dp xxxxx000 L7
dp xxxxx000 L7
1: kd> dp 7ffdf000 L7
7ffdf000 0016fbb0 00170000 0016b000 00000000
7ffdf010 00001e00 00000000 7ffdf000 ← hit
If the TEB is valid, then the seventh pointer points back
to the start of the TEB.
In a valid TEB,
the second and third values are the
stack limits; in this case, the candidate stack lives between
0016b000 and 00170000.
(As a double-check, you can verify that the upper limit of the
stack, 00170000 in this case, matches up with
the end of a VAD allocation in the !vad output above.)
Now that you know where the stack is, you can dps it
look for EBP frames.
(I usually start about two to four pages below the upper limit of the stack.)
Test out each candidate EBP frame with the k= command
until you find one that seems to be solid.
Record this candidate stack trace in a text file for further study.
Repeat for each candidate TEB, and you will eventually reconstruct
what each thread in the process was doing at the moment it was
If you're really lucky, you might even see the code that incremented
the reference count
but was terminated before it could release it.
The above discussion also applies to debugging 64-bit processes.
However, instead of looking for
1 Private READWRITE pages, you want to look for
2 Private READWRITE pages.
As an additional wrinkle, if you are debugging ia64, then converting
a page frame to a linear address is sadly not as simple as appending
Pages on ia64 are 8KB, not 4KB, so you need to shift the value left
by 25 bits: Add three zeroes and then multiply by two.
2 Private READWRITE
And finally, if you are debugging a 32-bit process on x64,
then you want to look for 3 Private READWRITE pages,
but add 2 before appending the three zeroes.
That's because the TEB for a 32-bit process on x64 is really two
TEBs glued together: A 64-bit TEB followed by a 32-bit TEB.
3 Private READWRITE
I did not come up with this debugging technique on my own.
I learned it from an even greater debugging genius.
Next time, we'll look at debugging this issue from a user-mode
The informal term for these terminated-but-not-yet-completely-destroyed
threads is ghost threads.
The term was coined by the Exchange support team,
because they often have to study server failures
that require them to do this type of investigation,
and they needed a cute name for it.