Diagnosing device driver problems leading to system freezes
Recently I started having some device-driver-related problems on my laptop. Sometimes when I left the laptop in standby and came back, the system would behave strangely after resuming. IE would hang trying to load a page, and no matter what I did the IE process could not be killed. Opening more IE instances did the same thing. The system would not respond to a restart, and the only way to recover was a power reset (removing the batteries). I knew that getting Windows XP into such a bad state had to involve a device driver issue. At first I thought it was probably some obscure power management bug and ignored it, as I had important work to do. But then it happened again. Twice isn't a coincidence, so I decided to investigate the next time it happened.
So I installed the kernel debuggers by installing the Debugging Tools for Windows package. As I was heading to the excellent Sysinternals site to download LiveKD, their live kernel debugging utility, to use with the kernel debuggers, I came across (what a coincidence) a new blog post by Mark Russinovich about the very same issue of unkillable hung apps. It turns out this happens when driver bugs cause hung IRPs. Aha, so this is exactly what's going on in my case, and my device driver suspicion was correct. The interesting question was which device was causing the problem. I thought it was likely the Wireless Adapter driver, since that is certainly one driver IE uses to download pages. This was disturbing, as the Wireless Adapter on my laptop is made by a large company with a good reputation. It wasn't long before the hang happened again, and I fired up windbg using LiveKD:
kd> !process 91c
Searching for Process with Cid == 91c
PROCESS f93d27a0 SessionId: 0 Cid: 091c Peb: 7ffde000 ParentCid: 008c
DirBase: 038d2000 ObjectTable: e2425ad8 HandleCount: 356.
Image: IEXPLORE.EXE
VadRoot fb283f70 Vads 210 Clone 0 Private 2686. Modified 113. Locked 0.
DeviceMap e243d2b0
Token e12e0980
ElapsedTime 00:04:12.463
UserTime 00:00:00.360
KernelTime 00:00:00.280
QuotaPoolUsage[PagedPool] 73840
QuotaPoolUsage[NonPagedPool] 10160
Working Set Sizes (now,min,max) (5922, 50, 345) (23688KB, 200KB, 1380KB)
PeakWorkingSetSize 5936
VirtualSize 105 Mb
PeakVirtualSize 106 Mb
PageFaultCount 6718
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 2761
THREAD f8e69400 Cid 091c.06a4 Teb: 7ffdd000 Win32Thread: e2ac6838 WAIT: (Executive) KernelMode Non-Alertable
ba902140 Mutant - owning thread ffa3da70
IRP List:
f8de6008: (0006,01b4) Flags: 00000070 Mdl: 00000000
Not impersonating
DeviceMap e243d2b0
Owning Process f93d27a0 Image: IEXPLORE.EXE
Wait Start TickCount 22193499 Ticks: 23399 (0:00:03:54.326)
Context Switch Count 1964 LargeStack
UserTime 00:00:00.0280
KernelTime 00:00:00.0290
Start Address kernel32!BaseProcessStartThunk (0x7c810867)
Win32 Start Address 0x00402451
Stack Init ba97c000 Current ba97bbb0 Base ba97c000 Limit ba975000 Call 0
Priority 10 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr
ba97bbc8 804dc0f7 nt!KiSwapContext+0x2e (FPO: [Uses EBP] [0,0,4])
ba97bbd4 804dc143 nt!KiSwapThread+0x46 (FPO: [0,0,0])
ba97bbfc ba8ff4c4 nt!KeWaitForSingleObject+0x1c2 (FPO: [Non-Fpo])
ba97bc14 ba90234c wdmaud!WdmaGrabMutex+0x17 (FPO: [1,0,0])
ba97bc34 804e37f7 wdmaud!SoundDispatch+0x66 (FPO: [Non-Fpo])
ba97bc44 8056a101 nt!IopfCallDriver+0x31 (FPO: [0,0,0])
ba97bc58 80579a8a nt!IopSynchronousServiceTail+0x60 (FPO: [Non-Fpo])
ba97bd00 8057bfa5 nt!IopXxxControlFile+0x611 (FPO: [Non-Fpo])
ba97bd34 804de7ec nt!NtDeviceIoControlFile+0x2a (FPO: [Non-Fpo])
ba97bd34 7c90eb94 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ ba97bd64)
0013b6e8 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
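A quick way to read a stack like this is to scan down from the top (the most recent call) and find the first frame that does not belong to the kernel itself. That frame is usually the driver that asked for the wait. The Python sketch below does this for the first few frames copied from the output above. The function name, the regular expression, and the choice of "nt" and "hal" as the kernel modules are my own assumptions for illustration. They are not part of any debugger API.

```python
import re

# Top frames copied from the kd output above: ChildEBP, RetAddr, module!symbol+offset.
STACK = """\
ba97bbc8 804dc0f7 nt!KiSwapContext+0x2e (FPO: [Uses EBP] [0,0,4])
ba97bbd4 804dc143 nt!KiSwapThread+0x46 (FPO: [0,0,0])
ba97bbfc ba8ff4c4 nt!KeWaitForSingleObject+0x1c2 (FPO: [Non-Fpo])
ba97bc14 ba90234c wdmaud!WdmaGrabMutex+0x17 (FPO: [1,0,0])
ba97bc34 804e37f7 wdmaud!SoundDispatch+0x66 (FPO: [Non-Fpo])
ba97bc44 8056a101 nt!IopfCallDriver+0x31 (FPO: [0,0,0])
"""

# Hypothetical pattern for one x86 kd stack frame line.
FRAME_RE = re.compile(r"^[0-9a-f]{8} [0-9a-f]{8} (?P<module>\w+)!(?P<symbol>\w+)")


def first_non_kernel_frame(stack_text, kernel_modules=("nt", "hal")):
    """Return (module, symbol) of the topmost frame outside the kernel, or None."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("module") not in kernel_modules:
            return match.group("module"), match.group("symbol")
    return None


print(first_non_kernel_frame(STACK))
```

For this stack the result points at wdmaud!WdmaGrabMutex, which matches the "Mutant - owning thread" line in the thread's wait information.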
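The stack shows the thread blocked in wdmaud!WdmaGrabMutex, waiting on a mutant owned by thread ffa3da70. Grabbing the IRP address from the thread's IRP list and looking it up shows: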
kd> !irp f8de6008
Irp is active with 2 stacks 2 is current (= 0xf8de609c)
No Mdl System buffer = ff901d08 Thread f8e69400: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
>[ e, 0] 5 0 822d5b00 82207028 00000000-00000000
\Driver\wdmaud
Args: 00000064 00000086 001d8000 00000000
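The current stack location has major function code 0xe, which is IRP_MJ_DEVICE_CONTROL, consistent with nt!NtDeviceIoControlFile on the thread's stack. Assuming the four Args values are shown in the order of the device control parameters (output buffer length, input buffer length, I/O control code, type-3 input buffer), the third value, 001d8000, is the IOCTL code. The Python sketch below splits it into its fields using the bit layout of the Windows CTL_CODE macro. The helper name and the small lookup table are mine and cover only the value needed here.

```python
# Bit layout of a Windows I/O control code (CTL_CODE macro):
#   DeviceType: bits 16-31, Access: bits 14-15, Function: bits 2-13, Method: bits 0-1
DEVICE_TYPES = {0x1D: "FILE_DEVICE_SOUND"}  # only the entry relevant to this IRP
ACCESS = ["FILE_ANY_ACCESS", "FILE_READ_ACCESS", "FILE_WRITE_ACCESS",
          "FILE_READ_ACCESS | FILE_WRITE_ACCESS"]
METHODS = ["METHOD_BUFFERED", "METHOD_IN_DIRECT", "METHOD_OUT_DIRECT",
           "METHOD_NEITHER"]


def decode_ioctl(code):
    """Split a 32-bit IOCTL code into its CTL_CODE fields (hypothetical helper)."""
    device_type = (code >> 16) & 0xFFFF
    return {
        "device_type": DEVICE_TYPES.get(device_type, hex(device_type)),
        "access": ACCESS[(code >> 14) & 0x3],
        "function": (code >> 2) & 0xFFF,
        "method": METHODS[code & 0x3],
    }


# Third Args value from the !irp output above, assumed to be the IoControlCode.
print(decode_ioctl(0x001D8000))
```

Under that assumption the request is a buffered sound-device IOCTL, which fits a request handled by wdmaud.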
Now, the wdmaud.sys driver is part of the audio driver infrastructure. So it turns out the problem is with the audio device drivers, not the Wireless Adapter. I'm no device driver expert, but I assumed that the audio drivers are also split in a miniport-like model and that the real culprit was probably the device-specific driver rather than wdmaud.sys. So I went to my laptop manufacturer's site, looked for updated audio drivers, and installed them. So far the problem hasn't come back. No wonder Windows Server 2003 disables audio hardware acceleration by default and doesn't include drivers for many audio devices. Operating system stability depends heavily on good device drivers, and it's unfortunate that things like this happen. In Windows Vista, drivers are being made more reliable by moving more functionality into common driver infrastructure and having the device manufacturer write very little device-specific code. In fact, some things like printer drivers are being moved into user mode.