As we saw some time ago, process shutdown is a multi-phase affair. After you call ExitProcess, all the threads are forcibly terminated. After that's done, each DLL is sent a DLL_PROCESS_DETACH notification. You may be debugging a problem with DLL_PROCESS_DETACH handling that suggests that some of those threads were not cleaned up properly. For example, you might assert that a reference count is zero, and you find during process shutdown that this assertion sometimes fires. Maybe you terminated a thread before it got a chance to release its reference? How can you test this theory if the thread is already gone?

It so happens that when all the threads are terminated during the early phase of process shutdown, the kernel is a bit lazy and doesn't free their stacks. It figures, hey, the entire process is going away soon, so the stack memory is going to be cleaned up as part of process termination. (It's sort of the kernel equivalent of not bothering to sweep the floor of a building that's about to be demolished.) You can use this to your advantage by grovelling the stacks that were left behind.

Hey, this is why you get called in to debug the hard stuff, right?

Before continuing, I need to emphasize that this information is for debugging purposes only. The structures and offsets are all implementation details which can change from release to release.

The first step is to identify where all the stacks are. The direct approach is difficult because the stacks can be all different sizes, so it's not easy to pick them out of a line-up. But one thing does come in a consistent size: The TEB.

From the kernel debugger, use the !process command to dump the process you are interested in, and from the header information, extract the VadRoot.

1: kd> !process -1
PROCESS 8731bd40  SessionId: 1  Cid: 0748    Peb: 7ffda000  ParentCid: 0620
    DirBase: 4247b000  ObjectTable: 96f66de0  HandleCount: 104.
    Image: oopsie.exe
    VadRoot 893de570 Vads 124 Clone 0 Private 518. Modified 643. Locked 0.
    DeviceMap 995628c0

Dump this VAD root with the !vad command, and pay attention only to the entries which say 1 Private READWRITE.

1: kd> !vad 893de570
VAD     level      start      end    commit
... ignore everything except "1 Private READWRITE" ...
8730a5f0 ( 6)         50       50         1 Private      READWRITE
9ab0cb40 ( 5)         60       7f         1 Private      READWRITE
893978b0 ( 6)         80       9f         1 Private      READWRITE
87302d30 ( 5)        110      110         1 Private      READWRITE
889693f8 ( 6)        120      121         1 Private      READWRITE
872f3fb8 ( 6)        170      170         1 Private      READWRITE
87089a80 ( 6)        1a0      1a0         1 Private      READWRITE
8cbf1cb0 ( 5)        1c0      1df         1 Private      READWRITE
88c079d0 ( 6)        1e0      1e0         1 Private      READWRITE
9abc33e0 ( 6)        410      48f         1 Private      READWRITE
873173b0 ( 7)        970      970         1 Private      READWRITE
8ca1c158 ( 7)      7ffd5    7ffd5         1 Private      READWRITE
88c02a78 ( 6)      7ffd6    7ffd6         1 Private      READWRITE
872f9298 ( 5)      7ffd7    7ffd7         1 Private      READWRITE
8750d210 ( 7)      7ffd8    7ffd8         1 Private      READWRITE
87075ce8 ( 6)      7ffda    7ffda         1 Private      READWRITE
87215da0 ( 4)      7ffdc    7ffdc         1 Private      READWRITE
872f2200 ( 6)      7ffdd    7ffdd         1 Private      READWRITE
8730a670 ( 5)      7ffdf    7ffdf         1 Private      READWRITE

(If you are debugging from user mode, then you can use !vadump but the output format is different.)

Each of these is a candidate TEB. In practice, TEBs tend to be allocated at the high end of memory, so the ones with a low start value are probably red herrings. Therefore, you should investigate these candidates in reverse order.

For each candidate, take the start address and append three zeroes. (Each page on x86 is 4KB, which conveniently maps to 1000 in hex.) Dump the first seven pointers of the TEB with the dp xxxxx000 L7 command.

1: kd> dp 7ffdf000 L7
7ffdf000  0016fbb0 00170000 0016b000 00000000
7ffdf010  00001e00 00000000 7ffdf000 ← hit

If the TEB is valid, then the seventh pointer points back to the start of the TEB. In a valid TEB, the second and third values are the stack limits; in this case, the candidate stack lives between 0016b000 and 00170000. (As a double-check, you can verify that the upper limit of the stack, 00170000 in this case, matches up with the end of a VAD allocation in the !vad output above.)

Now that you know where the stack is, you can dps it and look for EBP frames. (I usually start about two to four pages below the upper limit of the stack.) Test out each candidate EBP frame with the k= command until you find one that seems to be solid. Record this candidate stack trace in a text file for further study.

Repeat for each candidate TEB, and you will eventually reconstruct what each thread in the process was doing at the moment it was terminated. If you're really lucky, you might even see the code that incremented the reference count but was terminated before it could release it.

The above discussion also applies to debugging 64-bit processes. However, instead of looking for 1 Private READWRITE pages, you want to look for 2 Private READWRITE pages. As an additional wrinkle, if you are debugging ia64, then converting a page frame to a linear address is sadly not as simple as appending three zeroes. Pages on ia64 are 8KB, not 4KB, so you need to shift the value left by 25 bits: Add three zeroes and then multiply by two.

And finally, if you are debugging a 32-bit process on x64, then you want to look for 3 Private READWRITE pages, but add 2 before appending the three zeroes. That's because the TEB for a 32-bit process on x64 is really two TEBs glued together: A 64-bit TEB followed by a 32-bit TEB.

Note: I did not come up with this debugging technique on my own. I learned it from an even greater debugging genius.

Next time, we'll look at debugging this issue from a user-mode debugger.

Trivia: The informal term for these terminated-but-not-yet-completely-destroyed threads is ghost threads. The term was coined by the Exchange support team, because they often have to study server failures that require them to do this type of investigation, and they needed a cute name for it.