My name is Trey Nash and I am an Escalation Engineer on the Core OS team. My experience is as a software developer, and therefore, my blog posts tend to be slanted in the direction of helping developers during the feature development, testing, and the support phases.
Windbg is definitely a feature-rich debugger. Sometimes, reading the debugger help during idle time can provide some great insight into the capabilities of the debugger. However, the debugger help comes up short when you ask questions such as, “command huh-huh sure is cool, but when would I ever want to do that?!?” Besides, if you’re caught reading the windbg help in your spare time, you may be on the receiving end of some ridicule form those in your social circle.
In this post, I would like to speak a bit about the !search command, among others, and when you would want to use it. Additionally, I’ll be demonstrating some related techniques germane to when you would use !search in the first place.
Not long ago, I was working with a dump from a machine that was hung and it was my job to find out why. After applying many of the techniques in our hang dump blog post, I discovered that there was a thread in particular that was stuck, which I show below:
0: kd> !thread fe016330THREAD fe016330 Cid 0004.02e0 Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable fcf1a698 SynchronizationEvent fe0163a8 NotificationTimerIRP List: fcb47650: (0006,01d8) Flags: 00000404 Mdl: 00000000Not impersonatingDeviceMap e18008e8Owning Process fe790648 Image: SystemAttached Process N/A Image: N/AWait Start TickCount 75337682 Ticks: 145 (0:00:00:02.265)Context Switch Count 31848752 UserTime 00:00:00.000KernelTime 00:25:38.000Start Address Treyresearch (0xf45629e0)Stack Init f50e1000 Current f50e05e8 Base f50e1000 Limit f50de000 Call 0Priority 15 BasePriority 15 PriorityDecrement 0ChildEBP RetAddr Args to Child f50e0600 e103d5b1 fe016330 fe0163d8 00000000 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])f50e062c e103df9e fe016330 fd0321f8 00000000 nt!KiSwapThread+0x2e5 (FPO: [0,7,0])f50e0674 e101e05b fcf1a698 0000001b 00000000 nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])f50e06b0 e102e00a e3faf6e0 f50e0900 00000000 nt!ExpWaitForResource+0xd5 (FPO: [0,5,4])f50e06d0 f5a988cb fd0321f8 00000001 f50e08e4 nt!ExAcquireResourceExclusiveLite+0x8d (FPO: [2,3,0])f50e06e0 f5ad81c4 f50e0900 e3faf6e0 00000001 Ntfs!NtfsAcquirePagingResourceExclusive+0x20 (FPO: [3,0,0])f50e08e4 f5ad8909 f50e0900 fcb47650 fdcc3020 Ntfs!NtfsCommonCleanup+0x193 (FPO: [SEH])f50e0a54 e1040153 fe00d718 fcb47650 fcb47650 Ntfs!NtfsFsdCleanup+0xcf (FPO: [SEH])f50e0a68 f5b4fd28 fddbc818 fe6d1a28 00000000 nt!IofCallDriver+0x45 (FPO: [0,0,4])f50e0a94 e1040153 fdcc3020 fcb47650 fcb47650 fltmgr!FltpDispatch+0x152 (FPO: [2,6,0])f50e0aa8 f5b4fb25 fdde0cb0 fcb47650 fdd8dc18 nt!IofCallDriver+0x45 (FPO: [0,0,4])f50e0acc f5b4fcf5 f50e0aec fdde0cb0 00000000 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x20b (FPO: [3,4,4])f50e0b04 e1040153 fdde0cb0 fcb47650 fcb47650 fltmgr!FltpDispatch+0x11f (FPO: [2,6,0])f50e0b18 e112ec0a fdfd9bd8 fe774730 fdfd9bf0 nt!IofCallDriver+0x45 (FPO: [0,0,4])f50e0b48 e112b6af fe790648 fdde0cb0 00010003 nt!IopCloseFile+0x2ae (FPO: [5,7,0])f50e0b78 e112b852 fe790648 00000001 fe774730 nt!ObpDecrementHandleCount+0xcc (FPO: [4,2,4])f50e0ba0 e112b776 e1802e48 fdfd9bf0 00006e54 nt!ObpCloseHandleTableEntry+0x131 (FPO: [5,1,0])f50e0be4 e112b7c1 00006e54 00000000 f50e0c00 nt!ObpCloseHandle+0x82 (FPO: [2,7,4])f50e0bf4 e1033bdf 00006e54 f50e0cfc e103b00c nt!NtClose+0x1b (FPO: [1,0,0])f50e0bf4 e103b00c 00006e54 f50e0cfc e103b00c nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ f50e0c00)f50e0c70 f4562119 00006e54 00030000 00000068 nt!ZwClose+0x11 (FPO: [1,0,0])WARNING: Stack unwind information not available. Following frames may be wrong.f50e0cfc f456229f f50e0d34 f50e0d2c f4577f50 Treyresearch+0x11119f50e0d38 f45626f9 fe016330 fc825368 00000000 Treyresearch+0x1129ff50e0d70 f45629ae f1ed8000 00002000 00000000 Treyresearch+0x116f9f50e0d90 f4562ba3 fc825318 fde59b38 00000003 Treyresearch+0x119aef50e0dac e1120833 f4577e20 00000000 00000000 Treyresearch+0x11ba3f50e0ddc e103fe9f f45629e0 f4577e20 00000000 nt!PspSystemThreadStartup+0x2e (FPO: [SEH])00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x16
Note: Eagle-eye readers may have noticed that the debugger states every frame in the above thread uses frame pointer optimization (FPO). This is a bug in version 6.11.0001.402 of the debugger.
I have highlighted the interesting bits above. It seems that this thread is some sort of worker thread, probably created by the Treyresearch driver. It is doing some work that includes closing a particular file. In the process of closing the file, NTFS wants to acquire the paging resource for this particular file, and that is where this thread gets stuck.
What is the paging resource? Many file systems have a per-file lock that one acquires when performing paging I/O such that other destabilizing activity cannot occur at the same time as a paging operation. The paging resource for the file is this lock.
To further illustrate the paging resource, let’s check out the file in question. One handy things that you can do is follow the stack down to where you see the most recent call to nt!IofCallDriver. You can see in the MSDN documentation that IoCallDriver accepts two parameters, a DEVICE_OBJECT* and an IRP*. However, nt!IofCallDriver is a fastcall function, so you cannot find its parameters on the stack. But since you know that nt!IofCallDriver is calling a driver dispatch routine, and since driver dispatch routines have the same prototype as IoCallDriver, you can easily find the IRP in question which I have highlighted in the Ntfs!NtfsFsdCleanup frame of the thread’s stack above and dumped out below:
0: kd> !irp fcb47650Irp is active with 10 stacks 10 is current (= 0xfcb47804) No Mdl: No System Buffer: Thread fe016330: Irp stack trace. cmd flg cl Device File Completion-Context [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000>[ 12, 0] 0 0 fe00d718 fdfd9bf0 00000000-00000000 \FileSystem\Ntfs Args: 00000000 00000000 00000000 00000000
And from the IRP above, we can find the real file that the thread above is trying to acquire the paging lock for:
0: kd> !fileobj fdfd9bf0\Program Files\Treyresearch\Treyresearch.dataDevice Object: 0xfe6da738 \Driver\FtdiskVpb: 0xfe791818Access: Read Write Delete SharedRead SharedWrite SharedDelete Flags: 0x43062 Synchronous IO Sequential Only Cache Supported Modified Size Changed Handle CreatedFile Object is currently busy and has 0 waiters.FsContext: 0xe3faf7a8 FsContext2: 0xe3faf8f0Private Cache Map: 0xfccf1fa0CurrentByteOffset: 6400164Cache Data: Section Object Pointers: fc956f3c Shared Cache Map: fccf1ec8 File Offset: 6400164 Vacb: fe77bd80 Your data is at: cbe80164
The file object contains two fields named FsContext and FsContext2 shown above. These fields are for the file system to store file system specific information. Most file systems would store the paging resource in these context fields somewhere. For example, NTFS uses FsContext to hold the stream control block (SCB) and you can surmise that somewhere down in the SCB is where NTFS stores the paging resource. (It’s actually more complicated than that, but that’s good enough for sake of this discussion)
Now, let’s take a look at the paging resource itself. You can see from the documentation of ExAcquireResourceExclusiveLite, the first parameter is an ERESOURCE and I have highlighted it in our thread stack above. Given that, we can use the !locks command to get a better idea of what’s going on:
0: kd> !locks -v fd0321f8Resource @ 0xfd0321f8 Shared 1 owning threads Contention Count = 2 NumberOfSharedWaiters = 1 NumberOfExclusiveWaiters = 1 Threads: fe77f1e0-07<*> ß This thread is the owner THREAD fe77f1e0 Cid 0004.0064 Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable fc825320 NotificationEvent Not impersonating DeviceMap e18008e8 Owning Process fe790648 Image: System Attached Process fced2d88 Image: store.exe Wait Start TickCount 75281842 Ticks: 55985 (0:00:14:34.765) Context Switch Count 4440231 UserTime 00:00:00.000 KernelTime 00:01:51.171 Start Address nt!MiMappedPageWriter (0xe101962c) Stack Init f6137000 Current f61366ac Base f6137000 Limit f6134000 Call 0 Priority 17 BasePriority 8 PriorityDecrement 0 ChildEBP RetAddr f61366c4 e103d5b1 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4]) f61366f0 e103df9e nt!KiSwapThread+0x2e5 (FPO: [0,7,0]) f6136738 f4562e28 nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])WARNING: Stack unwind information not available. Following frames may be wrong. f6136760 f4563229 Treyresearch+0x11e28 f6136788 f4559d8f Treyresearch+0x12229 f61367e4 f4570b95 Treyresearch+0x8d8f f613684c f4570e39 Treyresearch+0x1fb95 f6136898 f4570f4b Treyresearch+0x1fe39 f61368c4 f5b4cb73 Treyresearch+0x1ff4b f613692c f5b4efc2 fltmgr!FltpPerformPostCallbacks+0x1c5 (FPO: [1,17,4]) f6136940 f5b4f4f1 fltmgr!FltpProcessIoCompletion+0x10 (FPO: [1,0,0]) f6136950 f5b4fb83 fltmgr!FltpPassThroughCompletion+0x89 (FPO: [3,0,4]) f6136980 f5b4fcf5 fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x269 (FPO: [3,4,4]) f61369b8 e1040153 fltmgr!FltpDispatch+0x11f (FPO: [2,6,0]) f61369cc f452b2f8 nt!IofCallDriver+0x45 (FPO: [0,0,4]) f6136a18 f452b6d3 exifs!NtSystemWrite+0x1ff (FPO: [Non-Fpo]) f6136ab0 f452ae1d exifs!IfsInternalWrite+0x1a0 (FPO: [Non-Fpo]) f6136b24 f4549e02 exifs!MRxIfsWrite+0x333 (FPO: [Non-Fpo]) f6136b44 f4541a8e exifs!RxLowIoSubmit+0x180 (FPO: [Uses EBP] [2,2,4]) f6136b54 f45427ed exifs!RxLowIoWriteShell+0x2e (FPO: [1,0,1]) f6136c64 f452fbe3 exifs!RxCommonWrite+0xcc1 (FPO: [Non-Fpo]) f6136cf8 f453dffd exifs!RxFsdCommonDispatch+0x2c4 (FPO: [Non-Fpo]) f6136d24 f452439a exifs!RxFsdDispatch+0x93 (FPO: [Non-Fpo]) f6136d40 e1040153 exifs!MRxIfsFsdDispatch+0x6c (FPO: [Non-Fpo]) f6136d54 e101c5b4 nt!IofCallDriver+0x45 (FPO: [0,0,4]) f6136d68 e101971d nt!IoAsynchronousPageWrite+0xd0 (FPO: [8,0,4]) f6136dac e1120833 nt!MiMappedPageWriter+0x12e (FPO: [1,4,0]) f6136ddc e103fe9f nt!PspSystemThreadStartup+0x2e (FPO: [SEH]) 00000000 00000000 nt!KiThreadStartup+0x16fe78eb40-01 THREAD fe78eb40 Cid 0004.001c Teb: 00000000 Win32Thread: 00000000 WAIT: (Unknown) KernelMode Non-Alertable fcea3890 Semaphore Limit 0x7fffffff fe78ebb8 NotificationTimer Not impersonating DeviceMap e18008e8 Owning Process fe790648 Image: System Attached Process N/A Image: N/A Wait Start TickCount 75337718 Ticks: 109 (0:00:00:01.703) Context Switch Count 3480314 UserTime 00:00:00.000 KernelTime 00:01:35.875 Start Address nt!ExpWorkerThread (0xe102da4b) Stack Init f60ef000 Current f60eebf0 Base f60ef000 Limit f60ec000 Call 0 Priority 14 BasePriority 13 PriorityDecrement 1 ChildEBP RetAddr f60eec08 e103d5b1 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4]) f60eec34 e103df9e nt!KiSwapThread+0x2e5 (FPO: [0,7,0]) f60eec7c e101e05b nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4]) f60eecb8 e1024ba8 nt!ExpWaitForResource+0xd5 (FPO: [0,5,4]) f60eecd8 f5a98915 nt!ExAcquireResourceSharedLite+0xf5 (FPO: [2,3,4]) f60eece8 f5ae198a Ntfs!NtfsAcquirePagingResourceShared+0x20 (FPO: [3,0,0]) f60eed04 e1044997 Ntfs!NtfsAcquireScbForLazyWrite+0x7a (FPO: [2,0,0]) f60eed40 e104328e nt!CcWriteBehind+0x27 (FPO: [0,8,4]) f60eed80 e102db08 nt!CcWorkerThread+0x15a (FPO: [SEH]) f60eedac e1120833 nt!ExpWorkerThread+0xeb (FPO: [1,5,0]) f60eeddc e103fe9f nt!PspSystemThreadStartup+0x2e (FPO: [SEH]) 00000000 00000000 nt!KiThreadStartup+0x16
Threads Waiting On Exclusive Access:
fe016330
Now this is some juicy output. I used the –v option to also expand some of the threads related to this lock. The owner thread is the one with the asterisk (*) next to it and you can see from the expanded thread listing and based on the fact that the function at the bottom of the stack is nt!MiMappedPageWriter, that the thread in question is the mapped page writer. This thread is a system thread that periodically sweeps through a list of dirty pages flushing them out to disk. Interestingly, the mapped page writer has acquired the ERESOURCE seven times. That is shown by the -07 next to the owner thread in the above output. The second thread is waiting for shared access and it is a system file cache thread. And finally, the third thread is our initial hung thread that is waiting on exclusive access.
As a sanity check, let’s make sure that the ERESOURCE and the file in question are related. Previously, I stated that in the NTFS file system the FsContext field of the file object contains an SCB. Let’s pass that pointer to !pool and get some more information about it:
0: kd> !pool 0xe3faf7a8 2Pool page e3faf7a8 region is Paged pool*e3faf6d8 size: 330 previous size: 20 (Allocated) *Ntff Pooltag Ntff : FCB_DATA, Binary : ntfs.sys
Now, we can use the search command (s), to search the pool memory above and see if our ERESOURCE is in there. If so, that would satisfy our sanity check:
0: kd> s -d e3faf6d8 L 330/4 fd0321f8e3faf72c fd0321f8 0c9013aa 01c9969b 42002f46 .!..........F/.Be3faf7b4 fd0321f8 06410000 00000000 06400164 .!....A.....d.@.
Now that we are satisfied that we have matched up the ERESOURCE with the file that owns it, let’s move on. At first glance of the mapped writer thread, it looks like the offending entity is exifs. After all, it’s the most interesting component on the mapped page writer stack. But don’t be fooled. What you see in the mapped page writer stack is a snapshot of what it was doing when the dump was taken, and that’s not necessarily the work that caused things to go bad in the first place. Even though exifs is a file system, it is not NTFS. And we know an NTFS file’s paging resource is locked. Keep in mind that the mapped page writer is processing a list. So the items on the list that have caused the contention may have long been taken off the list and processed.
So what do you do? Unfortunately, the badness happened some time ago. We don’t have a stack to look at to show who did this and when. But what we can do is perform a search of memory to see if there are any references to the ERESOURCE elsewhere in memory. If we find some hits, maybe they will shed some more light on what is going on. So, let’s go ahead and do that:
0: kd> !search fd0321f8Searching PFNs in range 0000000B - 000DFFF9 for [FFFFFFFFFD0321F8 - FFFFFFFFFD0321F8]Pfn Offset Hit Va Pte - - - - - - - - - - - - - - - - - - - - - - - - - - -00007AB7 000004E8 FD0321F8 FCAB74E8 C03F2ADC fcab74e0+0x8 : Ntfr -- ERESOURCE00007CDF 00000950 FD0321F8 FCCDF950 C03F337C fccdf938+0x18 : NpFc -- CCB, client control block - Process: fdd702480000803A 0000022C FD03A1F8 FD03A22C C03F40E8 fd03a220+0xc : Vad -- Mm virtual address descriptors0000977F 0000003C FD0321F8 FE77F03C C03F9DFC fe77f000+0x3c : MmWe -- Work entries for writing out modified filesystem pages.0000977F 00000514 FD0321F8 FE77F514 C03F9DFC fe77f4d8+0x3c : MmWe -- Work entries for writing out modified filesystem pages.0000977F 000005BC FD0321F8 FE77F5BC C03F9DFC fe77f580+0x3c : MmWe -- Work entries for writing out modified filesystem pages.0000977F 00000904 FD0321F8 FE77F904 C03F9DFC fe77f8c8+0x3c : MmWe -- Work entries for writing out modified filesystem pages.0000977F 00000BA4 FD0321F8 FE77FBA4 C03F9DFC fe77fb68+0x3c : MmWe -- Work entries for writing out modified filesystem pages.0000978A 00000324 FD0321F8 FE78A324 C03F9E28 fe78a2e8+0x3c : MmWe -- Work entries for writing out modified filesystem pages.0000978A 0000051C FD0321F8 FE78A51C C03F9E28 fe78a4e0+0x3c : MmWe -- Work entries for writing out modified filesystem pages.00060B7B 0000072C FD0321F8 E3FAF72C C038FEBC e3faf6d8+0x54 : Ntff -- FCB_DATA00060B7B 000007B4 FD0321F8 E3FAF7B4 C038FEBC e3faf6d8+0xdc : Ntff -- FCB_DATA000D9653 00000638 FD0321F8 F50E0638 C03D4380 000D9653 00000694 FD0321F8 F50E0694 C03D4380 000D9653 000006D8 FD0321F8 F50E06D8 C03D4380 000DE415 0000054C F50321F8 F503254C C03D40C8 000DFBF4 00000C40 FD0321F8 F60EEC40 C03D83B8 000DFBF4 00000C9C FD0321F8 F60EEC9C C03D83B8 000DFBF4 00000CE0 FD0321F8 F60EECE0 C03D83B8 Search done.
One thing to note is that there are some hits that are not exactly what we were looking for. That’s because !search also looks for values that are one bit off from what you requested. Check out the help for how you can adjust this behavior. Also, !search performs some extra work along the way. If it notices that the virtual address found is in the pool, it displays information about that pool entry.
Do you spot the curiosity? Remember that the ERESOURCE had been acquired seven times. Correspondingly, there are seven hits in the !search list with the MmWe tag! And not surprisingly, the description of that tag pulled from pooltag.txt file in the triage directory where the debugger is installed reveals that these are paging work entries. Now we’re on to something.
Note: Incidentally, if you want to determine where a virtual address in the list above resides, you can always pass it to !address. The addresses at the end of the !seach output are often addresses on some thread’s stack. If you pass those addresses to !thread, it is smart enough to find the thread that is associated with that stack and display it for you.
“OK, but how do I find the real culprit?”, you then say. Well, you have to continue to dig with what you have. Unfortunately, the contents of the pool entries with MmWe tags are not documented, although, we definitely know what they are. Let’s take a look at one of them using !pool:
0: kd> !pool FE77F03CPool page fe77f03c region is Nonpaged pool*fe77f000 size: a8 previous size: 0 (Allocated) *MmWe Pooltag MmWe : Work entries for writing out modified filesystem pages., Binary : nt!mm
Now, we see where the pool entry starts and how big the entry is. So, let’s take a look at the contents of the memory:
0: kd> dps fe77f000 L a8/@$ptrsizefe77f000 0a150000fe77f004 65576d4dfe77f008 fe78a4e8fe77f00c e10b3af0 nt!MmMappedPageWriterListfe77f010 06140000fe77f014 00000000fe77f018 06150000fe77f01c 00000000fe77f020 fcc20788fe77f024 00000000fe77f028 e10b3f20 nt!MmMappedFileHeaderfe77f02c 00000000fe77f030 00000000fe77f034 fdfd9bf0fe77f038 fdc6f008fe77f03c fd0321f8fe77f040 00000000fe77f044 00000000fe77f048 00000000fe77f04c 0002005cfe77f050 00000000fe77f054 f1d20000fe77f058 00000000fe77f05c 00010000fe77f060 00000000fe77f064 0007036cfe77f068 000cb59dfe77f06c 000a97fefe77f070 0000d7effe77f074 00021c90fe77f078 0005d6b1fe77f07c 000a5642fe77f080 0004d5c3fe77f084 000ae354fe77f088 00038249fe77f08c 000509eafe77f090 0009c915fe77f094 00018dd6fe77f098 000b94d7fe77f09c 0006ca58fe77f0a0 00091e29fe77f0a4 00000000
I used the dps command so that it would check to see if any of the values matched to any known symbols. As you can see, there are a couple of symbols in there. Notice that I divided the size of the block by the pseudo register $ptrsize as well. The symbols that dps found validate that this block of memory is associated with the mapped page writer.
You may also ask yourself, “are any of these values pointers to other pool blocks?” If you wanted to know that, you could iterate over each one of them passing them to !pool or !address. That sounds tedious to do manually. But thankfully, the debugger has some nice command tokens such as .foreach that make this a breeze. If you want to pass each of the above values to !pool, you can perform the following in the debugger:
.foreach /pS 1 /ps 1 ( value { dp /c 1 fe77f000 L a8/@$ptrsize } ) { .if( value != 0 ) {.printf "**** %p ****\n", ${value}; !pool ${value} 0x2} }
The address highlighted above is the address of the pool block revealed by the previous !pool command. a8 is the size of the block in bytes and since dp lists the memory in units of pointer size, I divide a8 by $ptrsize.
I won’t show the full output here, because it’s rather verbose. But I have duplicated the output from the dps command above with added pool tags next to items that are pool entries:
0: kd> dps fe77f000 L a8/@$ptrsizefe77f000 0a150000fe77f004 65576d4dfe77f008 fe78a4e8 Pooltag MmWe : Work entries for writing out modified filesystem pages., Binary : nt!mmfe77f00c e10b3af0 nt!MmMappedPageWriterListfe77f010 06140000fe77f014 00000000fe77f018 06150000fe77f01c 00000000fe77f020 fcc20788 Pooltag Irp : Io, IRP packetsfe77f024 00000000fe77f028 e10b3f20 nt!MmMappedFileHeaderfe77f02c 00000000fe77f030 00000000fe77f034 fdfd9bf0 Pooltag File : File objectsfe77f038 fdc6f008 Pooltag MmCa : Mm control areas for mapped files, Binary : nt!mmfe77f03c fd0321f8 Pooltag Ntfr : ERESOURCE, Binary : ntfs.sysfe77f040 00000000fe77f044 00000000fe77f048 00000000fe77f04c 0002005c<snip>
Sure enough, the file object above is identical to the file we identified earlier on as the one that the worker thread was attempting to close a handle to. Also, you can see that our ERESOURCE is in there as well. And even more, now we have an IRP that may reveal even more information. Let’s see:
0: kd> !irp fcc20788 1Irp is active with 10 stacks 12 is current (= 00000000) Mdl=fe77f048: No System Buffer: Thread fd089e6c: Irp is completed. Pending has been returnedFlags = 00000003ThreadListEntry.Flink = fcc20798ThreadListEntry.Blink = fcc20798IoStatus.Status = 00000000IoStatus.Information = 00010000RequestorMode = 00000000Cancel = 00CancelIrql = 0ApcEnvironment = 00UserIosb = fe77f018UserEvent = 00000000Overlay.AsynchronousParameters.UserApcRoutine = e101c95bOverlay.AsynchronousParameters.UserApcContext = fe77f008Overlay.AllocationSize = 00000000 - 00000000CancelRoutine = 00000000 UserBuffer = 00000000&Tail.Overlay.DeviceQueueEntry = fcc207c8Tail.Overlay.Thread = fd089e6cTail.Overlay.AuxiliaryBuffer = e101cb86Tail.Overlay.ListEntry.Flink = 00000000Tail.Overlay.ListEntry.Blink = 00000000Tail.Overlay.CurrentStackLocation = 00000000Tail.Overlay.OriginalFileObject = 00000000Tail.Apc = 00300012Tail.CompletionKey = 00300012 cmd flg cl Device File Completion-Context [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 0, 0] 0 0 00000000 00000000 00000000-00000000 Args: 00000000 00000000 00000000 00000000 [ 4, 0] 0 0 fe755ab8 00000000 f5ce857e-fe709df8 \Driver\Disk PartMgr!PmIoCompletion Args: 00000000 00000000 00000000 0000000c [ 4, 0] 0 0 fe709df8 00000000 f5c16558-fe6da7f0 \Driver\PartMgr ftdisk!FtpRefCountCompletionRoutine Args: 00000000 00000000 00000000 0000000c [ 4, 0] 0 0 fe6da738 00000000 f5bc0638-fe6d6a80 \Driver\Ftdisk volsnap!VspRefCountCompletionRoutine Args: 00000000 00000000 00000000 0000000c [ 4, 0] 0 0 fe6d69c8 00000000 f5d0af28-00000000 \Driver\VolSnap vsp Args: 00000000 00000000 00000000 0000000c [ 4, 0] 0 0 fe6d66c0 00000000 f5a9351c-fde38cd0 \Driver\VSP Ntfs!NtfsSingleAsyncCompletionRoutine Args: 00000000 00000000 00000000 0000000c [ 4, 0] 0 0 fe00d718 00000000 00000000-00000000 \FileSystem\Ntfs Args: 00000000 00000000 00000000 00000000
What is apparent from looking at this IRP is that it is flagged as completed. Moreover, the information in the IoStatus fields looks to be relevant as well. But after studying the situation a little deeper, it appeared that the completion routine had never been fired. We can find out more about the completion routine by dumping out the Tail.Apc portion of the IRP as shown below:
0: kd> dt nt!_IRP fcc20788 Tail.Apc. +0x040 Tail : +0x000 Apc : +0x000 Type : 0x12 '' +0x001 SpareByte0 : 0 '' +0x002 Size : 0x30 '0' +0x003 SpareByte1 : 0 '' +0x004 SpareLong0 : 0 +0x008 Thread : 0xfe77f1e0 _KTHREAD +0x00c ApcListEntry : _LIST_ENTRY [ 0xfcde8ce4 - 0xfd089e6c ] +0x014 KernelRoutine : 0xe101cb86 void nt!IopCompletePageWrite+0 +0x018 RundownRoutine : (null) +0x01c NormalRoutine : (null) +0x020 NormalContext : (null) +0x024 SystemArgument1 : (null) +0x028 SystemArgument2 : (null) +0x02c ApcStateIndex : 0 '' +0x02d ApcMode : 0 '' +0x02e Inserted : 0x1 ''
Recall from the rules of IRP processing on Windows that IRPS like these have their completion routines called within the thread context that initiated the I/O. As you can see above, that thread is the same thread that is running nt!MiMappedPageWriter. Moreover, the Inserted flag is set above, which means that the APC has been placed in the APC queue for the thread. Deductive reasoning would imply that if the completion routine has not run, then the APC has not been delivered. And the APC will not be delivered if normal kernel mode APCs are disabled at the moment. So, let’s check on that by looking in the nt!KTHREAD structure:
0: kd> dt nt!_KTHREAD 0xfe77f1e0 KernelApcDisable +0x070 KernelApcDisable : -2
Sure enough, kernel APCs are disabled for this thread at the moment. How can that be? Well, there are several ways to disable normal kernel APC delivery and it often involves either directly or indirectly entering a critical or guarded region. Critical regions are entered directly via KeEnterCriticalRegion and guarded regions are entered via KeEnterGuardedRegion. However, there are several means of indirectly entering critical regions including FsRtlEnterFileSystem. Additionally, holding a mutex object automatically places the holder in a critical region.
Therefore, the root cause in this case was that the file system drivers appear to have put the thread into a state where it cannot receive APCs and, therefore, I/O initiated on that thread could not be completed. The APCs build up in the queue so that they can be delivered when kernel APC delivery is re-enabled. Incidentally, the documentation for FsRtlEnterFileSystem states that file system filter drivers should never disable normal kernel APCs across calls to IoCallDriver.
Many times, when it looks like you are hitting up against a brick wall in determining what went wrong in a dump, you can get a lot further than you initially expect with a little bit of intuition and the right tools. Of course, this intuition will grow as you become more and more familiar with the Windows operating system internals, or whatever platform you work on. Using !search to search physical memory in the dump file can help find references (a.k.a pointers) to objects in hard to find places. Additionally, pool tag information along with the helpful text in the pooltag.txt file displayed by !pool goes a long way when it comes to figuring out what a particular pool block is used for. Armed with all of these tools, you can always get farther than one may initially expect.
Happy debugging everyone!
"The example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious. No association with any real company, organization, product, domain name, email address, logo, person, places, or events is intended or should be inferred."
Recently I was working with a customer who had a file server experiencing high CPU in the WMIprvse.exe process. We received multiple user dumps and noted that someone or something was running the same query again and again. We needed to figure out what was running the query in a tight loop, causing the high CPU.
Figure 1 - Task Manager on FileServer
Before we get into the exact troubleshooting steps, let me provide some background on WMI. Winmgmt is the WMI service within a SVCHOST process running under the LocalSystem account. In all cases, the WMI service automatically starts when the first management application or script requests connection to a WMI namespace. For more information, see Starting and Stopping the WMI Service. To avoid stopping all the services when a provider fails, each provider is loaded into a separate host process named "Wmiprvse.exe". This also allows each instance of Wmiprvse to run under a different account with varying security. For more details you can look at the MSDN documentation on WMI.
I dumped out all the services in the various svchost.exe processes. You can do this from a command prompt by running the tasklist /svc command. In my instance, I found that the WinMgmt service was running in svchost, PID 452 (PID number will vary). Someone had to be making RPC calls to this svchost.exe process to run the WMI queries. It could be some local process on the machine; it could even be a process on a remote machine.
At this point I requested user dumps of PID 452 from the customer. This would allow me to determine who was making the RPC calls to svchost.exe to run the WMI queries. While the customer was uploading the dumps, we decided to get a Network Monitor trace to see if the RPC calls were coming over the network.
Immediately, I could see a lot of RPC traffic to the svchost.exe process(PID=452).
Figure 2 - Network Monitor Output from the FileServer. Notice the Source and destination ports and IP addresses. IP addresses are hidden by the aliases
Looking at the RPC payload, I could see the text of the WMI query. You can see this in the Hex Details Pane. The query that was running in a loop was “Select * from Win32_Process”. Looks like I found the source of the WMI queries.
At this point, we got the source IP for the RPC packets. We logged into the machine, and brought up the Task Manager.
Figure 3 - Task Manager on Remote Machine(Machine1)
Immediately we saw that there was some script running inside a Wscript.exe process. At this point I was pretty sure that this script was the culprit. The customer was not sure what this was, and was not comfortable terminating the process. To prove my suspicion, I had him open a command prompt and run the following command, netstat –ano.
Figure 4 - Netstat output from Remote Machine
From the output in Fig. 4, I could see a TCP connection created by PID 3532 (wscript.exe). Looking at the local and foreign addresses from the above output, they matched up exactly to what we were seeing in the Network Monitor trace.
In the above case, we already had our suspicions on the wscript.exe process; however, sometimes it might not be that easy. In that case, we could have used the netstat output to look at all connections to the file server (157.59.123.121). If there were multiple connections, then we can also narrow it down by the port number. Based on that, we could have found the PID responsible.
The customer called me later in the day, and told me that they had recently updated their scripts. One of their scripts had a bug which was running WMI scripts in a tight loop. Fixing the script caused the problem to go away.
Had the query being coming from a local process, I would have had to debug the svchost.exe process, and figure out who was making the WMI calls. However, since we could see the traffic on netmon, we didn’t need to use the debugger. Interesting way to get to the root of the problem without using a debugger!
Hi, my name is Anurag Sarin, and I'm an Escalation Engineer from the Platforms Global Escalation Services 24x7 Team, located in Bangalore, India.
I recently worked a case dealing with a component that reported an exception. But it was not critical enough to crash the system and it was simply logged in a file. Because the root cause of the problem was not yet known, we did not have a deterministic way to reproduce the exception. The exception would just come and go. So catching the exception became a challenge. Do we take a dump? How do we break into the debugger to catch the exception when it occurs? Manual dumps wouldn't help because we didn't know when the exception would occur, and even if we did, it happened too fast to use Ctrl + Scroll Lock + Scroll Lock.
A similar challenge is when you want to start debugging just after a module loads without knowing when the module load will happen. You can run into this while reverse engineering code or while live debugging. You may want to save all debugging efforts until a particular module gets loaded. Sometimes you just want to know if a module ever gets loaded in a particular code path.
In these situations, the ‘sx*’ command comes to our rescue. Below you will find a couple of nuggets to assist with these types of scenarios.
Let's say Windows Media Player reports “RPC Server is unavailable” when a user clicks "File\Open" and attempts to browse to a remote computer. In this situation the ‘sxe’ command is your friend. This command ‘sets an exception’ from the debugger. When this is set the target breaks into the debugger before any other error handlers are activated
By the way, RPC_S_SERVER_UNAVAILABLE is win32 error 1722 (0x06ba) with the description “RPC Server is unavailable” and is a common networking issue.
This is how WinError.h in Windows SDK documents it // // MessageId: RPC_S_SERVER_UNAVAILABLE // // MessageText: // // The RPC server is unavailable. // #define RPC_S_SERVER_UNAVAILABLE 1722L
This is how WinError.h in Windows SDK documents it
//
// MessageId: RPC_S_SERVER_UNAVAILABLE
// MessageText:
// The RPC server is unavailable.
#define RPC_S_SERVER_UNAVAILABLE 1722L
Here are the steps to catch the “RPC Server is unavailable” exception.
1. Attach the debugger to the process with the RPC_S_SERVER_UNAVAILABLE error. In our example we’ll attach to wmplayer.exe.
2. Break into debugger to set or verify the symbol path.
3. Reload the symbols.
.reload
4. Clear any previously set exceptions.
sdx *
5. Set the exception on 1722 (0x06ba) i.e. RPC_S_SERVER_UNAVAILABLE “RPC Server is unavailable”.
sxe 06ba
6. Let the process run until we hit the exception. You may hit a different exception before the one you’re looking for in which case you hit 'g' to continue execution.
g
7. This is the output from the debugger on a Vista RTM machine.
(b2c.f28): Unknown exception - code 000006ba (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=0601e47c ebx=00000000 ecx=00000000 edx=049cd488 esi=000006ba edi=0601e5a4
eip=76dbb09e esp=0601e47c ebp=0601e4cc iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
kernel32!RaiseException+0x58:
76dbb09e c9 leave
0:036> kv100
ChildEBP RetAddr Args to Child
0601e4cc 76ea0911 000006ba 00000001 00000000 kernel32!RaiseException+0x58 (FPO: [4,20,0])
0601e4e4 76ea08e9 000006ba 0601e9dc 76f2e07d RPCRT4!RpcpRaiseException+0x1e (FPO: [0,0,0])
0601e4f0 76f2e07d 000006ba f61cdc85 05545938 RPCRT4!RpcRaiseException+0x16
0601e9dc 760a9e12 760a8908 760a95ea 0601e9fc RPCRT4!NdrClientCall2+0xd51
0601e9f4 760a9e5b 0553fee0 00000064 0601ea54 NETAPI32!NetrWkstaGetInfo+0x19 (FPO: [3,1,0])
0601ea40 75106f02 0553fee0 00000064 0601ea54 NETAPI32!NetWkstaGetInfo+0x3b (FPO: [SEH]) ßNetWkstaGetInfo called for request on information on workstation.
0601ea5c 75106f86 0553fee0 0601ea84 0601eb5c ntlanman!ServerExists+0x19 (FPO: [2,2,0])
0601ea98 75e24ede 0601ec00 05545938 0601ec44 ntlanman!NPDfsLmGetResourceInformation+0x177 (FPO: [4,8,4])
0601eab0 75e23bb9 055939f0 caa0f4a6 00000000 MPR!CGetResourceInformation::TestProvider+0x1a (FPO: [1,0,0])
0601eb08 75e23ac7 caa0f4ea 0601ec44 0601eb5c MPR!CRoutedOperation::GetResult+0xf6 (FPO: [SEH])
0601eb44 75e23a79 0601ec54 0601eb90 75e24f0d MPR!CMprOperation::Perform+0x65 (FPO: [SEH])
0601eb50 75e24f0d 00000001 75e24f5c 000000a4 MPR!CRoutedOperation::Perform+0x22 (FPO: [1,0,4])
0601eb90 76f5978a 0601ec00 05545938 0601ec44 MPR!WNetGetResourceInformationW+0x26 (FPO: [4,13,0])
0601ebb4 76f597f5 0601ec00 00000000 00000000 SHELL32!SHWNetGetResourceInformationAlloc+0x40 (FPO: [4,0,4])
0601ebd8 76f63d75 0601ec00 0601ec54 0601ec44 SHELL32!CNetFolder::_GetResourceInformationAlloc+0xf4 (FPO: [5,0,4])
0601f278 76f63bc3 001d019a 05560288 0553db08 SHELL32!CNetFolder::_ParseUNCName+0x1a7 (FPO: [7,414,4])
0601f4c4 76fe2c3e 0553db50 001d019a 05560288 SHELL32!CNetFolder::ParseDisplayName+0x123 (FPO: [7,135,4])
0601f52c 6de11f8c 05547c94 001d019a 05560288 SHELL32!CRegFolder::ParseDisplayName+0x90 (FPO: [7,14,4])
0601f5b0 6de11e0d 001d019a 05560288 0553db08 NetworkExplorer!CNetworkExplorerFolder::_IsValidServer+0x9f (FPO: [5,21,4])
0601f5e0 76fe58ab 0553da98 001d019a 05560288 NetworkExplorer!CNetworkExplorerFolder::ParseDisplayName+0x68 (FPO: [7,2,4])
0601f608 76fe2d13 0553da98 049d0008 001d019a SHELL32!CDesktopFolder::_ChildParseDisplayName+0x22 (FPO: [8,0,4])
0601f654 76fe2c3e 0553da98 001d019a 049d0008 SHELL32!CDesktopFolder::ParseDisplayName+0x8c (FPO: [7,6,4])
0601f6bc 76fe2b91 0559544c 001d019a 05560288 SHELL32!CRegFolder::ParseDisplayName+0x90 (FPO: [7,14,4])
0601f6f4 76c8c0f3 00000000 30c10000 0601f788 SHELL32!SHParseDisplayName+0x98 (FPO: [5,2,4])
0601f734 76c8c03a 0601f994 0601f788 0601f780 COMDLG32!CAsyncParser::_ParseFromDesktop+0x73 (FPO: [3,7,0])
0601fba0 76c808d8 0556b850 00000000 00000000 COMDLG32!CAsyncParser::_ParseOneItem+0x2e1 (FPO: [2,275,4])
0601fbbc 76c80bc6 0601fc40 77c1b132 0555b088 COMDLG32!CAsyncParser::_Parse+0x4a (FPO: [0,1,0])
0601fbc4 77c1b132 0555b088 00000000 00000000 COMDLG32!CAsyncParser::s_ThreadProc+0xd (FPO: [1,0,0])
0601fc40 76de3833 0023c72c 0601fc8c 77b1a9bd SHLWAPI!WrapperThreadProc+0x10f (FPO: [1,25,4])
0601fc4c 77b1a9bd 0023c72c 0601b824 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
0601fc8c 00000000 77c1b09a 0023c72c 00000000 ntdll!_RtlUserThreadStart+0x23 (FPO: [SEH])
8. After you break into the stack for the 6ba exception, you might want to get a dump using the following command.
0:000> .dump /ma c:\dump1.dmp
Creating c:\dump2.dmp - mini user dump used
Dump successfully written
9. Now that we have a dump in the exception code path, it might be a good idea to tell the debugger to stop breaking for this exception.
sxd 06ba
Now the debugger does not break for this exception.
Note: To detach the debugger from the process do a ‘.detach’. If you just close the debugger it will exit the process too.
Looking at the top of the stack, we see that wmplayer sent a request for a share access and failed due to (what looks like a network problem). I say this as we see that MRP invokes ntlanman (Windows Redirector for Network Shares) which further request information on workstation.name ( NetWkstaGetInfo ) on the network (a Remote RPC call RPCRT4!NdrClientCall2). The next step should be to take a network trace and investigate if there are any networking problems on the network.
And it would make root cause determination much easier if you knew the user was attempting to connect to a nonexistent machine name (\\<Bad Machine Name>) while doing the File\Open in Windows Media Player. J
Now let’s talk about a scenario where we want to start debugging when a particular DLL gets loaded. Let’s take the example of mpr.dll loading using the scenario above.
1. Follow steps 1 through 4 mentioned above, to attach to wmplayer.exe
2. In this case we set an exception on ‘module load of MRP.DLL‘.
sxe ld mpr.dll
3. Now let the process run. It will break back into the debugger when the DLL is loaded.
This is the debugger output on a Windows 2008 machine when MPR.DLL gets loaded into the wmplayer process.
0:001> g
ModLoad: 753d0000 753e4000 C:\Windows\system32\MPR.dll
eax=75a3b111 ebx=00000000 ecx=00000000 edx=75a7275c esi=7ffad000 edi=20000000
eip=771a9a94 esp=0291df74 ebp=0291dfb8 iopl=0 nv up ei pl zr na pe nc
ntdll!KiFastSystemCallRet:
771a9a94 c3 ret
0:014> kvn ffff
# Memory ChildEBP RetAddr Args to Child
00 0291df70 771a8764 7717df04 000003d4 ffffffff ntdll!KiFastSystemCallRet (FPO: [0,0,0])
01 4 0291df74 7717df04 000003d4 ffffffff 0291e0a0 ntdll!ZwMapViewOfSection+0xc (FPO: [10,0,0])
02 44 0291dfb8 7717c855 000003d4 0291e0a0 0291e088 ntdll!LdrpMapViewOfDllSection+0x67 (FPO: [5,2,4])
03 108 0291e0c0 7717fab9 01e2f640 0291e100 01e66a58 ntdll!LdrpMapDll+0x417 (FPO: [SEH])
04 274 0291e334 7717610e 01e2f640 72410321 0291e378 ntdll!LdrpLoadImportModule+0x206 (FPO: [4,145,4])
05 4c 0291e380 77176233 01e2f640 01e4f020 0291e3ac ntdll!LdrpHandleOneNewFormatImportDescriptor+0x84 (FPO: [4,10,4])
06 1c 0291e39c 77176248 01e2f640 01e4f020 724102c0 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d (FPO: [3,0,4])
07 7c 0291e418 7717c5ba 01e2f640 01e4f020 75a35de0 ntdll!LdrpWalkImportDescriptor+0x1f2 (FPO: [SEH])
08 288 0291e6a0 77177a52 00000000 01e2f640 0291e994 ntdll!LdrpLoadDll+0x2f1 (FPO: [SEH])
09 284 0291e924 76d031ba 01e2f640 0291e994 0291e954 ntdll!LdrLoadDll+0x22a (FPO: [SEH])
0a 64 0291e988 75a4b34f 0291ea04 00000000 00000008 kernel32!LoadLibraryExW+0x252 (FPO: [SEH])
0b 24 0291e9ac 75a4b239 0291ea04 0291e9d0 0291e9d4 ole32!CClassCache::CDllPathEntry::LoadDll+0xae (FPO: [4,1,4])
0c 30 0291e9dc 75a469fe 0291ea04 0291ecec 0291e9fc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])
0d 24c 0291ec28 75a46913 00000001 0291ecec 0291ec58 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [3,139,4])
0e 48 0291ec70 75a467b4 00000001 00505350 0291ec9c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [3,10,4])
0f 38 0291eca8 75a474c0 0291ecec 00000000 0291f2f4 ole32!CClassCache::GetClassObject+0x30 (FPO: [1,3,4])
10 7c 0291ed24 75a4705b 75b4e428 00000000 0291f2f4 ole32!CServerContextActivator::CreateInstance+0x110 (FPO: [4,25,4])
11 40 0291ed64 75a4772d 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])
12 54 0291edb8 75a4764f 75b4e4ac 00000000 0291f2f4 ole32!CApartmentActivator::CreateInstance+0x112 (FPO: [4,13,4])
13 20 0291edd8 75a47609 75b4e48c 00000001 00000000 ole32!CProcessActivator::CCICallback+0x6d (FPO: [6,1,4])
14 20 0291edf8 75a475ba 75b4e48c 0291f150 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])
15 3c 0291ee34 75a476af 75b4e48c 0291f150 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,4])
16 28 0291ee5c 75a4705b 75b4e48c 00000000 0291f2f4 ole32!CProcessActivator::CreateInstance+0x49 (FPO: [4,1,4])
17 40 0291ee9c 75a47121 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])
18 260 0291f0fc 75a4705b 75b4e4b4 00000000 0291f2f4 ole32!CClientContextActivator::CreateInstance+0xb0 (FPO: [4,6,4])
19 40 0291f13c 75a472c5 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])
1a 7d4 0291f910 75a6e2a2 0291f9f4 00000000 00000017 ole32!ICoCreateInstanceEx+0x403 (FPO: [8,50,4])
1b 60 0291f970 75a6e203 0291f9f4 00000000 00000017 ole32!CComActivator::DoCreateInstance+0xd9 (FPO: [7,11,4])
1c 24 0291f994 75a6e1bc 0291f9f4 00000000 00000017 ole32!CoCreateInstanceEx+0x38 (FPO: [6,0,0])
1d 30 0291f9c4 72751489 0291f9f4 00000000 00000017 ole32!CoCreateInstance+0x37 (FPO: [5,3,4])
1e 44 0291fa08 727550b9 00000000 022cd14c 022cd140 FunDisc!CQueryWorker::CreateProvider+0x1a6 (FPO: [1,7,4])
1f 34 0291fa3c 72755028 0291fa80 00000000 00000000 FunDisc!CQueryWorker::Execute+0x77 (FPO: [1,7,4])
20 18 0291fa54 74892c26 00000000 0291fa80 0057f198 FunDisc!CFunctionInstanceCollectionQuery::Execute+0x66 (FPO: [2,0,4])
21 1c 0291fa70 74892d21 0057f198 00000001 00000000 NetworkItemFactory!CNetworkItemFactory::_StartFDQuery+0x84 (FPO: [4,0,4])
22 30 0291faa0 74892d8b 0057f198 0291fac0 7489478d NetworkItemFactory!CNetworkItemFactory::s_StartFDInMTA+0xa8 (FPO: [1,3,4])
23 c 0291faac 7489478d 0057f198 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x1b (FPO: [1,0,0])
24 14 0291fac0 76d24911 01e4abd0 0291fb0c 7718e4b6 NetworkItemFactory!ThreadHandler+0xf (FPO: [1,0,0])
25 c 0291facc 7718e4b6 01e4abd0 75a3404c 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
26 40 0291fb0c 7718e489 7489477e 01e4abd0 ffffffff ntdll!__RtlUserThreadStart+0x23 (FPO: [SEH])
27 18 0291fb24 00000000 7489477e 01e4abd0 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])
From the stack above, you can clearly see that seven frames below the top is what ‘was’ happening just ‘before’ the dll got loaded. How do I get to the point just ‘after’ the module gets loaded? Well one way is to use ‘gu’ twelve times. Note: The purpose of this is to show you how to use the various exception related commands.
0:014> gu
eax=00000000 ebx=00000000 ecx=0291df74 edx=771a9a94 esi=7ffad000 edi=20000000
eip=771a8764 esp=0291df78 ebp=0291dfb8 iopl=0 nv up ei pl zr na pe nc
ntdll!ZwMapViewOfSection+0xc:
771a8764 c22800 ret 28h
<SNIP>
This is the point just ‘after’ MPR.DLL got loaded.
00 0291e9dc 75a469fe 0291ea04 0291ecec 0291e9fc ole32!CClassCache::CDllPathEntry::Create_rl+0x37 (FPO: [3,3,4])
01 24c 0291ec28 75a46913 00000001 0291ecec 0291ec58 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4 (FPO: [3,139,4])
02 48 0291ec70 75a467b4 00000001 00505350 0291ec9c ole32!CClassCache::GetClassObjectActivator+0x224 (FPO: [3,10,4])
03 38 0291eca8 75a474c0 0291ecec 00000000 0291f2f4 ole32!CClassCache::GetClassObject+0x30 (FPO: [1,3,4])
04 7c 0291ed24 75a4705b 75b4e428 00000000 0291f2f4 ole32!CServerContextActivator::CreateInstance+0x110 (FPO: [4,25,4])
05 40 0291ed64 75a4772d 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])
06 54 0291edb8 75a4764f 75b4e4ac 00000000 0291f2f4 ole32!CApartmentActivator::CreateInstance+0x112 (FPO: [4,13,4])
07 20 0291edd8 75a47609 75b4e48c 00000001 00000000 ole32!CProcessActivator::CCICallback+0x6d (FPO: [6,1,4])
08 20 0291edf8 75a475ba 75b4e48c 0291f150 00000000 ole32!CProcessActivator::AttemptActivation+0x2c (FPO: [7,0,0])
09 3c 0291ee34 75a476af 75b4e48c 0291f150 00000000 ole32!CProcessActivator::ActivateByContext+0x4f (FPO: [6,3,4])
0a 28 0291ee5c 75a4705b 75b4e48c 00000000 0291f2f4 ole32!CProcessActivator::CreateInstance+0x49 (FPO: [4,1,4])
0b 40 0291ee9c 75a47121 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])
0c 260 0291f0fc 75a4705b 75b4e4b4 00000000 0291f2f4 ole32!CClientContextActivator::CreateInstance+0xb0 (FPO: [4,6,4])
0d 40 0291f13c 75a472c5 0291f2f4 00000000 0291f85c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108 (FPO: [3,7,4])
0e 7d4 0291f910 75a6e2a2 0291f9f4 00000000 00000017 ole32!ICoCreateInstanceEx+0x403 (FPO: [8,50,4])
0f 60 0291f970 75a6e203 0291f9f4 00000000 00000017 ole32!CComActivator::DoCreateInstance+0xd9 (FPO: [7,11,4])
10 24 0291f994 75a6e1bc 0291f9f4 00000000 00000017 ole32!CoCreateInstanceEx+0x38 (FPO: [6,0,0])
11 30 0291f9c4 72751489 0291f9f4 00000000 00000017 ole32!CoCreateInstance+0x37 (FPO: [5,3,4])
12 44 0291fa08 727550b9 00000000 022cd14c 022cd140 FunDisc!CQueryWorker::CreateProvider+0x1a6 (FPO: [1,7,4])
13 34 0291fa3c 72755028 0291fa80 00000000 00000000 FunDisc!CQueryWorker::Execute+0x77 (FPO: [1,7,4])
14 18 0291fa54 74892c26 00000000 0291fa80 0057f198 FunDisc!CFunctionInstanceCollectionQuery::Execute+0x66 (FPO: [2,0,4])
15 1c 0291fa70 74892d21 0057f198 00000001 00000000 NetworkItemFactory!CNetworkItemFactory::_StartFDQuery+0x84 (FPO: [4,0,4])
16 30 0291faa0 74892d8b 0057f198 0291fac0 7489478d NetworkItemFactory!CNetworkItemFactory::s_StartFDInMTA+0xa8 (FPO: [1,3,4])
17 c 0291faac 7489478d 0057f198 00000000 00000000 NetworkItemFactory!FDBackgroundThreadHandler+0x1b (FPO: [1,0,0])
18 14 0291fac0 76d24911 01e4abd0 0291fb0c 7718e4b6 NetworkItemFactory!ThreadHandler+0xf (FPO: [1,0,0])
19 c 0291facc 7718e4b6 01e4abd0 75a3404c 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
1a 40 0291fb0c 7718e489 7489477e 01e4abd0 ffffffff ntdll!__RtlUserThreadStart+0x23 (FPO: [SEH])
1b 18 0291fb24 00000000 7489477e 01e4abd0 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])
Disable First Chance Exceptions
Another case where sx* commands are helpful is when you need to disable first change exceptions. Click here for more on the topic.
If you do not want to see a first chance exception in the debugger, you should disable first chance exception handling for the specific exception code. Otherwise, when the first chance exception occurs, you may need to instruct the debugger to pass on the exception to the program to be handled as usual. How do we do this?
This is an example of disabling first chance Access Violation Exception.
sxd av
Let me show you a quick example using the MMC.exe process. On my Windows XP machine, I attached a debugger to the mmc.exe process and added “Computer Management” from File -> ‘Add/Remove Snap in’. The mmc.exe broke into the debugger on a first chance exception every time I clicked “Services and Applications”. I did not want this to happen because this was not the focus of my investigation.
This is what the first chance exception looks like in the debugger.
(d84.cc): Break instruction exception - code 80000003 (first chance)
eax=7ffd7000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c901230 esp=00aeffcc ebp=00aefff4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
ntdll!DbgBreakPoint:
7c901230 cc int 3
BTW, if you type ‘sx’, you get the current exception settings.
0:002> sx
<Snip>
av - Access violation - break - not handled
So I disabled first chance exceptions by using this command.
After running the command, first chance exceptions don’t cause the debugger to break. We only see the notifications.
(d84.d40): Access violation - code c0000005 (first chance)
After disabling first chance exceptions, this is our sx settings.
av - Access violation - second-chance break - not handled
A common challenge we face in ‘day to day’ debugging is how to break on a ‘user mode’ DLL - ‘module load’, while doing a kernel mode debug. Here's an example of setting a break point on the mpr!WNetEnumResource API from kernel mode.
1. You must run !gflag +ksl to enable Kernel Stack Loading in the debugger. Now the kernel will notify the debugger when a user mode module loads occur (normally the kernel debugger is only notified of kernel module load, not user). One caveat is the debugger is only notified on the first load of a particular module, rather than every load of the module. However, this will let you set any breakpoints you're interested in the user mode module with ‘bu’.
kd> !gflag +ksl
New NtGlobalFlag contents: 0x00040000
ksl - Enable loading of kernel debugger symbols
2. Set the process context to your user mode process using ‘.process -r -p <EPROCESS>’. I used mmc.exe in my case .
PROCESS 899d0c78 SessionId: 1 Cid: 0850 Peb: 7ffdf000 ParentCid: 0138
DirBase: 7f7bf160 ObjectTable: 94fedcc8 HandleCount: 345.
Image: mmc.exe
kd> .process -r -p 899d0c78
Implicit process is now 899d0c78
.cache forcedecodeuser done
Loading User Symbols
................................................................
.................
3. Set an exception on module load ‘sxe ld usermode.dll’
kd> sxe ld mpr.dll
kd> sx
ld - Load module - break
(only break for mpr.dll)
This is what the debugger displays when mpr.dll gets loaded into the mmc process.
nt!DbgLoadImageSymbols+0x47:
816497f3 cc int 3
kd> k ffff
Memory ChildEBP RetAddr
8e089b20 81780dcb nt!DbgLoadImageSymbols+0x47
54 8e089b74 8186d735 nt!MiLoadUserSymbols+0x1ed
a4 8e089c18 8185f655 nt!MiMapViewOfImageSection+0x885
70 8e089c88 8185f745 nt!MiMapViewOfSection+0x22a
30 8e089cb8 8185f952 nt!MmMapViewOfSection+0x2a
7c 8e089d34 81692a1a nt!NtMapViewOfSection+0x203
0 8e089d34 77a59a94 nt!KiFastCallEntry+0x12a
0771e050 77a58764 ntdll!KiFastSystemCallRet
4 0771e054 77a2df04 ntdll!ZwMapViewOfSection+0xc
44 0771e098 77a2c855 ntdll!LdrpMapViewOfDllSection+0x67
108 0771e1a0 77a2fab9 ntdll!LdrpMapDll+0x417
274 0771e414 77a2610e ntdll!LdrpLoadImportModule+0x206
4c 0771e460 77a26233 ntdll!LdrpHandleOneNewFormatImportDescriptor+0x84
1c 0771e47c 77a26248 ntdll!LdrpHandleNewFormatImportDescriptors+0x1d
7c 0771e4f8 77a2c5ba ntdll!LdrpWalkImportDescriptor+0x1f2
288 0771e780 77a27a52 ntdll!LdrpLoadDll+0x2f1
284 0771ea04 777131ba ntdll!LdrLoadDll+0x22a
64 0771ea68 7689b34f kernel32!LoadLibraryExW+0x252
24 0771ea8c 7689b239 ole32!CClassCache::CDllPathEntry::LoadDll+0xae
30 0771eabc 768969fe ole32!CClassCache::CDllPathEntry::Create_rl+0x37
24c 0771ed08 76896913 ole32!CClassCache::CClassEntry::CreateDllClassEntry_rl+0xd4
48 0771ed50 768967b4 ole32!CClassCache::GetClassObjectActivator+0x224
38 0771ed88 768974c0 ole32!CClassCache::GetClassObject+0x30
7c 0771ee04 7689705b ole32!CServerContextActivator::CreateInstance+0x110
40 0771ee44 7689772d ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108
54 0771ee98 7689764f ole32!CApartmentActivator::CreateInstance+0x112
20 0771eeb8 76897609 ole32!CProcessActivator::CCICallback+0x6d
20 0771eed8 768975ba ole32!CProcessActivator::AttemptActivation+0x2c
3c 0771ef14 768976af ole32!CProcessActivator::ActivateByContext+0x4f
28 0771ef3c 7689705b ole32!CProcessActivator::CreateInstance+0x49
40 0771ef7c 76897121 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108
260 0771f1dc 7689705b ole32!CClientContextActivator::CreateInstance+0xb0
40 0771f21c 768972c5 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108
7dc 0771f9f8 768be2a2 ole32!ICoCreateInstanceEx+0x403
60 0771fa58 768be203 ole32!CComActivator::DoCreateInstance+0xd9
24 0771fa7c 768be1bc ole32!CoCreateInstanceEx+0x38
30 0771faac 733d1489 ole32!CoCreateInstance+0x37
44 0771faf0 733d50b9 fundisc!CQueryWorker::CreateProvider+0x1a6
34 0771fb24 733d5028 fundisc!CQueryWorker::Execute+0x77
18 0771fb3c 751d2c26 fundisc!CFunctionInstanceCollectionQuery::Execute+0x66
1c 0771fb58 751d2d21 networkitemfactory!CNetworkItemFactory::_StartFDQuery+0x84
30 0771fb88 751d2d8b networkitemfactory!CNetworkItemFactory::s_StartFDInMTA+0xa8
c 0771fb94 751d478d networkitemfactory!FDBackgroundThreadHandler+0x1b
14 0771fba8 77734911 networkitemfactory!ThreadHandler+0xf
c 0771fbb4 77a3e4b6 kernel32!BaseThreadInitThunk+0xe
40 0771fbf4 77a3e489 ntdll!__RtlUserThreadStart+0x23
18 0771fc0c 00000000 ntdll!_RtlUserThreadStart+0x1b
So now we can set an unresolved break point on mpr!WNetEnumResourceW
kd> bu mpr!WNetEnumResourceW
And this is how we hit the break point for mpr!WNetEnumResourceW
Breakpoint 0 hit
mpr!WNetEnumResourceW:
001b:75c83522 6a14 push 14h
kd> !thread
THREAD 8a0bfaf0 Cid 0850.0540 Teb: 7ff49000 Win32Thread: 00000000 RUNNING on processor 0
Not impersonating
DeviceMap 92b291e8
Owning Process 0 Image: <Unknown>
Attached Process 899d0c78 Image: mmc.exe
Wait Start TickCount 678430 Ticks: 0
Context Switch Count 8
UserTime 00:00:00.015
KernelTime 00:00:00.031
Win32 Start Address ntdll!TppWorkerThread (0x77a3dbb0)
Stack Init 8e10e000 Current 8e10dc98 Base 8e10e000 Limit 8e10b000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 2 IoPriority 2 PagePriority 5
070bfd34 75193c14 06db4658 070bfd5c 06ec9c28 mpr!WNetEnumResourceW
070bfd74 75193ee3 00000000 00000000 00000000 fdWNet!CWNetProvider::EnumerateFunc+0x1c3
070bfd84 75193f23 7ff49000 06e4f258 75193ee9 fdWNet!CWNetProviderWorkItem::Enumerate+0x24
070bfd9c 77a18a5c 00000000 7007c2c5 06259e38 fdWNet!CWNetProvider::ThreadProc+0x3a
070bfe00 77a3de3f 01cfc920 06e4f258 7007c3f5 ntdll!RtlpTpWorkCallback+0xbf
070bff30 77734911 06259e30 070bff7c 77a3e4b6 ntdll!TppWorkerThread+0x545
070bff3c 77a3e4b6 06259e30 7007c3b9 00000000 kernel32!BaseThreadInitThunk+0xe
070bff7c 77a3e489 77a3dbb0 06259e30 ffffffff ntdll!__RtlUserThreadStart+0x23
070bff94 00000000 77a3dbb0 06259e30 00000000 ntdll!_RtlUserThreadStart+0x1b
Hope this article helps you navigate around using the sx commands to collect data or debug. Take a look at the Windbg help files for more info on the sx* commands.
We appreciate the comments and feedback sent about the blog. You guys are great! We’ve had comments requesting to make the XPERF videos downloadable. I’m working to move them to a hosting site and will post the link here soon.
Are there any content topics you would like to see discussed on NTDebugging, please drop us a note and we’ll try to address it in a future blog.
Thanks for reading the blog!
Cheers,
Ron Stock