The most weird cause of failure for a SQL Server clustered instance I’ve ever seen

The most weird cause of failure for a SQL Server clustered instance I’ve ever seen

Rate This
  • Comments 2

I have a customer who has a two node (W2K3 R2 SP2+ x64) failover cluster (MSCS) which are mainly dedicated to host 11 instances of SQL Server 2005 and 7 instances of Analysis Services 2005.

When the total percentage of processor time reached 100% and was pegged there for a few seconds, the Network Interface/Output Queue Length counter for the MS TCP Loopback interface started increasing and any packet routed to that pseudo-interface got hung for as much time as it took the system load to come down a bit.

The problem with that is, as you may know already, that the cluster library for SQL Server resource type implements the IsAlive callback so that it tries to connect to the instance of SQL Server and executes the “SELECT @@SERVERNAME” query. Since the IP of the clustered instance is local, it’s routed to the this loopback interface, and since it isn’t responding because of the situation explained above the cluster service ends up assuming the resource is not operational and restarts it (or eventually fail it over to the other node, depending on how many failures had occurred for the resource in a given period, and its configuration).

While this was occurring over the loopback interface, if we tried to connect to any of the instances of SQL Server from outside the node, so that all traffic went over the public interface, connections were successfully made and the query returned complete resultsets in a timely manner. Therefore, we could say that what the cluster resource library concluded (i.e. resource wasn’t operational) wasn’t true, since clients were still able to connect to the resource and it responded to all queries sooner than later.

It was McAffee’s mini-firewall driver which was causing the Delayed Work Queue threads starvation. Uninstalling this component from the system solved the issue.

Following I’m attaching a screenshot of real time activity captured with Performance Monitor when the problem was seen. In it one can see the relationship between CPU consumption and queueing of network packets targeting any locally bound IP.

image

This is a screenshot showing McAffee’s system drivers that were causing the problem.

image

And, finally, here’s an analysis of the Kernel Dump taken just when the problem occurred.

===== Dump analysis =====

A. Identification of the dump:

2: kd> x srv!srvcomputername

fffffadc`188b2b10 srv!SrvComputerName = struct _UNICODE_STRING "UNDISCLOSED"

2: kd> .time

Debug session time: Sat Jul 31 01:44:30.018 2010 (UTC + 2:00)

System Uptime: 0 days 1:16:34.910

2: kd> lm vmnt

start             end                 module name

fffff800`01000000 fffff800`01497000   nt         (private pdb symbols)  c:\symcache\private\ntkrnlmp.pdb\2AA800A0CDE341E19301E2A6DE72222E2\ntkrnlmp.pdb

    Loaded symbol image file: ntkrnlmp.exe

    Image path: ntkrnlmp.exe

    Image name: ntkrnlmp.exe

    Timestamp:        Tue Feb 16 16:43:02 2010 (4B7ABD06)

    CheckSum:         0046913C

    ImageSize:        00497000

    File version:     5.2.3790.4666

B. From previous analysis it was determined that the loopback adapter posts some of its work (namely moving data from a socket end to the other) as workitem in the delayed workqueue. So first step was to understand how the exqueue looks like. For this purpose !exqueue 26 command can be used (26=delayed work queue with call stack of thread).

This is a sample thread in the delayed queue, all of the others show a similar call stack:

5: kd> .THREAD fffffadc4461bbf0

Implicit thread is now fffffadc`4461bbf0

5: kd> kL

  *** Stack trace for last set context - .thread/.cxr resets it

Child-SP          RetAddr           Call Site

fffffadc`1f1ac990 fffff800`01027682 nt!KiSwapContext+0x85

fffffadc`1f1acb10 fffff800`0102828e nt!KiSwapThread+0x3c9

fffffadc`1f1acb70 fffff800`01047688 nt!KeWaitForSingleObject+0x5a6

fffffadc`1f1acbf0 fffff800`01047709 nt!ExpWaitForResource+0x48

fffffadc`1f1acc60 fffffadc`167a720b nt!ExAcquireResourceExclusiveLite+0x1ab

fffffadc`1f1acc90 fffffadc`167a6c87 mfehidk+0x1220b

fffffadc`1f1accd0 fffff800`010375ca mfehidk+0x11c87

fffffadc`1f1acd00 fffff800`0124a972 nt!ExpWorkerThread+0x13b

fffffadc`1f1acd70 fffff800`01020226 nt!PspSystemThreadStartup+0x3e

fffffadc`1f1acdd0 00000000`00000000 nt!KxStartSystemThread+0x16

!exqueue then shows a gigantic queue of pending workitems (more than 17,000 items). They can't run because of the mentioned threads hung and waiting for a resource. The code that leads to this attempt to acquire a resource is mfehidk.sys.

C. Since a resource is involved in this starvation !locks command can be used to gather more information. !locks command shows a resource exclusively owned by thread fffffadffb9cebf0, which causes other 81 other threads to wait (3 shared waiters+81 exclusive waiters). The resources seems to be a global resource in the mfehidk address range (i.e. owned and defined by mfehidk). Contention count is always very high equal to 122245.

Investigation of the blocking thread shows it is ready and scheduled to run on CPU 5, at priority 11:

2: kd> !ready

Processor 0: No threads in READY state

. . .

Processor 5: Ready Threads at priority 11

    THREAD fffffadffb9cebf0  Cid 295c.2644  Teb: 000007ffffefc000 Win32Thread: 0000000000000000 READY

It's a SQL Server thread.

Looking at its call stack it's clear that it was readied because the object on which it was waiting was signalled (i.e. the resource became free and it was acquired by this thread) but it didn't have a chance to run yet and still into waitforsingleobject call:

2: kd> kL

  *** Stack trace for last set context - .thread/.cxr resets it

Child-SP          RetAddr           Call Site

fffffadc`10eaaf80 fffff800`01027682 nt!KiSwapContext+0x85

fffffadc`10eab100 fffff800`0102828e nt!KiSwapThread+0x3c9

fffffadc`10eab160 fffff800`01047688 nt!KeWaitForSingleObject+0x5a6

fffffadc`10eab1e0 fffff800`01010d64 nt!ExpWaitForResource+0x48

fffffadc`10eab250 fffffadc`167a76f4 nt!ExAcquireResourceSharedLite+0x32e

fffffadc`10eab2a0 fffffadc`16799275 mfehidk+0x126f4

fffffadc`10eab330 fffffadc`1679944b mfehidk+0x4275

fffffadc`10eab370 fffffadc`1d5a4058 mfehidk+0x444b

fffffadc`10eab460 fffffadc`1d5a40ef mfetdik+0x1058

fffffadc`10eab4c0 fffffadc`1d5a48a3 mfetdik+0x10ef

fffffadc`10eab530 fffffadc`1d5a6b7a mfetdik+0x18a3

fffffadc`10eab570 fffffadc`1d5a6c13 mfetdik+0x3b7a

fffffadc`10eab650 fffffadc`1d5a7d55 mfetdik+0x3c13

fffffadc`10eab690 fffffadc`19b9b757 mfetdik!DEVICEDISPATCH::DispatchPassThrough+0x5d

fffffadc`10eab6f0 fffffadc`19bb8b1c afd!AfdFastConnectionSend+0x2d7

fffffadc`10eab790 fffff800`0127ef56 afd!AfdFastIoDeviceControl+0xb1a

fffffadc`10eaba70 fffff800`0127ec36 nt!IopXxxControlFile+0x5a4

fffffadc`10eabb90 fffff800`0102e33d nt!NtDeviceIoControlFile+0x56

fffffadc`10eabc00 00000000`77ef0a5a nt!KiSystemServiceCopyEnd+0x3

00000000`1ad9f018 00000000`00000000 0x77ef0a5a

Processor 5 is executing an high priority thread, 15, belonging to smlogsvc.exe.

2: kd> ~5

5: kd> k

Child-SP          RetAddr           Call Site

00000000`01e9f850 00000000`00000000 0x7ff`5b179414 <<< user mode execution

5: kd> !process -1 0

PROCESS fffffadffb996040

    SessionId: 0  Cid: 3f64    Peb: 7fffffdf000  ParentCid: 0228

    DirBase: 94f4ad000  ObjectTable: fffffa80094c75b0  HandleCount: 327.

    Image: smlogsvc.exe

Summary

=======

- There are a large number of threads running at medium/high priority and they keep the CPUs busy. This is legitimate.

- Apparently from this dump, mfehidk posts a lot of workitems and all of them are synchronized on a single "eresource" object

- This eresource becomes a bottleneck and independently on the number of threads in the delayed work queue they all are serialized when attempting to acquire this resource.

- Additionally much more threads (i.e. not only the ones in the exqueue) are required to synchronize their execution on this "eresource", which causes an enormous contention and exqueue threads waiting on "eresource" object to be released by non exqueue thread.

- This tight bottleneck is not scalable and is the root of the problem

Note: an "eresource" object is a complex sinchronization object that implements the multiple readers/single writer model, which is equivalent to implement a shared or exclusive acquisition.

Recommendation

==============

- Report this analysis to McAfee to check whether they are aware of this component unable to scale well.

- Possibly uninstall McAfee as a test to check whether the problem is still experienced

- Possible work on exclusion, a number of threads involved in the crowded queues belonged to SQL server so if possible to exclude traffic related to this server the impact could be significantly lowered.

Detailed command execution

--------------------------

B.

5: kd> !exqueue 26

Dumping ExWorkerQueue: FFFFF800011CD9C0

**** Delayed WorkQueue( current = 0 maximum = 24 )

THREAD fffffadc4461bbf0  Cid 0004.0038  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc4461bca8  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294068         Ticks: 6 (0:00:00:00.093)

Context Switch Count      194006            

UserTime                  00:00:00.000

KernelTime                00:00:03.796

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1ace00 Current fffffadc1f1ac950

Base fffffadc1f1ad000 Limit fffffadc1f1a7000 Call 0

Priority 12 BasePriority 12 PriorityDecrement 0

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1ac990 fffff800`01027682 : 00000000`000007ff fffffadc`1ed7b180 00000000`0000000c fffffadc`1ee4b180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1acb10 fffff800`0102828e : 0000000a`b30746bc fffff800`011b4dc0 fffffadc`4461bc88 fffffadc`4461bbf0 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1acb70 fffff800`01047688 : 00000000`000000d4 00000000`0000001b fffffadc`1ed7b100 00000000`00000400 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1acbf0 fffff800`01047709 : fffffadc`1ed7b180 00000000`00000000 fffffadf`fbc449c0 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1acc60 fffffadc`167a720b : fffffadc`3d41fc20 fffffadc`167a6c70 fffffadc`4461bbf0 fffffadc`3d3deef8 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1acc90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc449c0 fffff800`011cda18 fffffadc`4461bbf0 : mfehidk+0x1220b

fffffadc`1f1accd0 fffff800`010375ca : 00000000`00000000 fffffadc`1d6f7001 00000000`00000000 fffffadf`fbc449c0 : mfehidk+0x11c87

fffffadc`1f1acd00 fffff800`0124a972 : fffffadc`4461bbf0 00000000`00000080 fffffadc`4461bbf0 fffffadc`1ed93680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1acd70 fffff800`01020226 : fffffadc`1ed8b180 fffffadc`4461bbf0 fffffadc`1ed93680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1acdd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

THREAD fffffadc4461b7a0  Cid 0004.003c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc4461b858  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294071         Ticks: 3 (0:00:00:00.046)

Context Switch Count      222172            

UserTime                  00:00:00.000

KernelTime                00:00:04.265

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1b3e00 Current fffffadc1f1b3950

Base fffffadc1f1b4000 Limit fffffadc1f1ae000 Call 0

Priority 12 BasePriority 12 PriorityDecrement 0

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1b3990 fffff800`01027682 : 00000000`00000000 00000000`00000001 fffff780`00000008 fffffadc`1ee4b180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1b3b10 fffff800`0102828e : 0000000a`b30cf052 00000000`00000000 fffffadc`4461b838 fffffadc`4461b7a0 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1b3b70 fffff800`01047688 : 00000000`000000d7 fffff800`0000001b 00000000`00040000 fffffadc`167b4800 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1b3bf0 fffff800`01047709 : fffffadc`1ee0b180 00000000`00000000 fffffadf`fbc44940 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1b3c60 fffffadc`167a720b : fffffadc`3d7a6930 fffffadc`167a6c70 fffffadc`4461b7a0 fffffadc`3da4d268 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1b3c90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44940 fffff800`011cda18 fffffadc`4461b7a0 : mfehidk+0x1220b

fffffadc`1f1b3cd0 fffff800`010375ca : 00000000`00000000 fffffadc`1d6f7001 00000000`00000000 fffffadf`fbc44940 : mfehidk+0x11c87

fffffadc`1f1b3d00 fffff800`0124a972 : fffffadc`4461b7a0 00000000`00000080 fffffadc`4461b7a0 fffffadc`1ed33680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1b3d70 fffff800`01020226 : fffffadc`1ed2b180 fffffadc`4461b7a0 fffffadc`1ed33680 fffff800`011b4dc0 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1b3dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

THREAD fffffadc44605040  Cid 0004.0040  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc446050f8  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294073         Ticks: 1 (0:00:00:00.015)

Context Switch Count      183503            

UserTime                  00:00:00.000

KernelTime                00:00:03.750

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1bae00 Current fffffadc1f1ba950

Base fffffadc1f1bb000 Limit fffffadc1f1b5000 Call 0

Priority 13 BasePriority 12 PriorityDecrement 1

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1ba990 fffff800`01027682 : 00000009`acb01000 fffffadc`1edfb180 00000000`00000016 fffffadc`1edeb180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1bab10 fffff800`0102828e : 0000000a`b312e633 fffff800`011b4dc0 fffffadc`446050d8 fffffadc`44605040 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1bab70 fffff800`01047688 : 00000000`000000d9 00000000`0000001b 00000000`00040000 fffffadc`167b4800 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1babf0 fffff800`01047709 : 00000000`00000000 fffffadc`167a6c70 fffffadf`fbc44880 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1bac60 fffffadc`167a720b : fffffadc`3d1a6700 fffffadc`167a6c70 fffffadc`44605040 fffffadc`3cb45828 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1bac90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44880 fffff800`011cda18 fffffadc`44605040 : mfehidk+0x1220b

fffffadc`1f1bacd0 fffff800`010375ca : 00000000`00000000 fffff800`011cc701 00000000`00000000 fffffadf`fbc44880 : mfehidk+0x11c87

fffffadc`1f1bad00 fffff800`0124a972 : fffffadc`44605040 00000000`00000080 fffffadc`44605040 fffffadc`1ee53680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1bad70 fffff800`01020226 : fffffadc`1ee4b180 fffffadc`44605040 fffffadc`1ee53680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1badd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

THREAD fffffadc44605bf0  Cid 0004.0044  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc44605ca8  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294068         Ticks: 6 (0:00:00:00.093)

Context Switch Count      200681            

UserTime                  00:00:00.000

KernelTime                00:00:03.500

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1c1e00 Current fffffadc1f1c1950

Base fffffadc1f1c2000 Limit fffffadc1f1bc000 Call 0

Priority 13 BasePriority 12 PriorityDecrement 1

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1c1990 fffff800`01027682 : fffffadc`1edcb910 fffffadc`1edeb180 00000000`0000000b fffffadc`1ed2b180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1c1b10 fffff800`0102828e : 0000000a`b306fa71 fffff800`011b4dc0 fffffadc`44605c88 fffffadc`44605bf0 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1c1b70 fffff800`01047688 : 00000000`000000d4 fffffadc`0000001b fffffadc`1edeb100 fffffadc`1edeb100 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1c1bf0 fffff800`01047709 : 00000000`00000000 fffffadc`167a6c70 fffffadf`fbc44a00 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1c1c60 fffffadc`167a720b : fffffadc`3d41fc20 fffffadc`167a6c70 fffffadc`44605bf0 fffffadc`3d3deef8 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1c1c90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44a00 fffff800`011cda18 fffffadc`44605bf0 : mfehidk+0x1220b

fffffadc`1f1c1cd0 fffff800`010375ca : 00000000`00000000 fffffadc`1d6f7001 00000000`00000000 fffffadf`fbc44a00 : mfehidk+0x11c87

fffffadc`1f1c1d00 fffff800`0124a972 : fffffadc`44605bf0 00000000`00000080 fffffadc`44605bf0 fffffadc`1edf3680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1c1d70 fffff800`01020226 : fffffadc`1edeb180 fffffadc`44605bf0 fffffadc`1edf3680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1c1dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

THREAD fffffadc446057a0  Cid 0004.0048  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc44605858  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294071         Ticks: 3 (0:00:00:00.046)

Context Switch Count      233919            

UserTime                  00:00:00.000

KernelTime                00:00:04.171

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1c8e00 Current fffffadc1f1c8950

Base fffffadc1f1c9000 Limit fffffadc1f1c3000 Call 0

Priority 12 BasePriority 12 PriorityDecrement 0

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1c8990 fffff800`01027682 : 00000000`00000000 fffffadc`1ed7b180 00000000`0000000c fffffadc`1ee4b180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1c8b10 fffff800`0102828e : 0000000a`b30d1677 fffff800`011b4dc0 fffffadc`44605838 fffffadc`446057a0 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1c8b70 fffff800`01047688 : 00000000`000000d7 fffffadc`0000001b 00000000`00020000 00000000`00000400 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1c8bf0 fffff800`01047709 : fffffadc`1ed7b180 00000000`00000000 fffffadf`fbc44900 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1c8c60 fffffadc`167a720b : fffffadc`3d7a6930 fffffadc`167a6c70 fffffadc`446057a0 fffffadc`3da4d268 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1c8c90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44900 fffff800`011cda18 fffffadc`446057a0 : mfehidk+0x1220b

fffffadc`1f1c8cd0 fffff800`010375ca : 00000000`00000000 fffffadc`1e4ff501 00000000`00000000 fffffadf`fbc44900 : mfehidk+0x11c87

fffffadc`1f1c8d00 fffff800`0124a972 : fffffadc`446057a0 00000000`00000080 fffffadc`446057a0 fffffadc`1ed93680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1c8d70 fffff800`01020226 : fffffadc`1ed8b180 fffffadc`446057a0 fffffadc`1ed93680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1c8dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

THREAD fffffadc44604040  Cid 0004.004c  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc446040f8  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294071         Ticks: 3 (0:00:00:00.046)

Context Switch Count      181596            

UserTime                  00:00:00.000

KernelTime                00:00:23.031

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1cfe00 Current fffffadc1f1cf950

Base fffffadc1f1d0000 Limit fffffadc1f1ca000 Call 0

Priority 12 BasePriority 12 PriorityDecrement 0

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1cf990 fffff800`01027682 : 00000000`00000000 fffffadc`1edcb180 00000000`00000013 fffffadc`1ed8b180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1cfb10 fffff800`0102828e : 0000000a`b30d1677 fffff800`011b4dc0 fffffadc`446040d8 fffffadc`44604040 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1cfb70 fffff800`01047688 : 00000000`000000d7 00000000`0000001b fffffadc`1edcb100 fffffadc`167b4800 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1cfbf0 fffff800`01047709 : fffffadc`1edcb180 00000000`00000000 fffffadf`fbc448c0 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1cfc60 fffffadc`167a720b : fffffadc`3d7a6930 fffffadc`167a6c70 fffffadc`44604040 fffffadc`3da4d268 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1cfc90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc448c0 fffff800`011cda18 fffffadc`44604040 : mfehidk+0x1220b

fffffadc`1f1cfcd0 fffff800`010375ca : 00000000`00000000 fffffadc`1d6f7001 00000000`00000000 fffffadf`fbc448c0 : mfehidk+0x11c87

fffffadc`1f1cfd00 fffff800`0124a972 : fffffadc`44604040 00000000`00000080 fffffadc`44604040 fffffadc`1ed33680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1cfd70 fffff800`01020226 : fffffadc`1ed2b180 fffffadc`44604040 fffffadc`1ed33680 fffff800`011b4dc0 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1cfdd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

THREAD fffffadc44604bf0  Cid 0004.0050  Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable

    fffffadc3e9dc0a0  SynchronizationEvent

    fffffadc44604ca8  NotificationTimer

Not impersonating

DeviceMap                 fffffa80000036d0

Owning Process            fffffadc44622040       Image:         System

Attached Process          N/A            Image:         N/A

Wait Start TickCount      294070         Ticks: 4 (0:00:00:00.062)

Context Switch Count      188540            

UserTime                  00:00:00.000

KernelTime                00:00:13.250

Start Address nt!ExpWorkerThread (0xfffff800010039f0)

Stack Init fffffadc1f1d6e00 Current fffffadc1f1d6950

Base fffffadc1f1d7000 Limit fffffadc1f1d1000 Call 0

Priority 12 BasePriority 12 PriorityDecrement 0

Child-SP          RetAddr           : Args to Child                                                           : Call Site

fffffadc`1f1d6990 fffff800`01027682 : 00000000`00000000 fffffadc`1edeb180 00000000`0000000b fffffadc`1ed2b180 : nt!KiSwapContext+0x85 [d:\nt\base\ntos\ke\amd64\ctxswap.asm @ 80]

fffffadc`1f1d6b10 fffff800`0102828e : 0000000a`b30b268e fffff800`011b4dc0 fffffadc`44604c88 fffffadc`44604bf0 : nt!KiSwapThread+0x3c9 [d:\nt\base\ntos\ke\thredsup.c @ 2018]

fffffadc`1f1d6b70 fffff800`01047688 : 00000000`000000d6 00000000`0000001b fffffadc`1f1d6c00 fffffadc`167b4800 : nt!KeWaitForSingleObject+0x5a6 [d:\nt\base\ntos\ke\wait.c @ 1256]

fffffadc`1f1d6bf0 fffff800`01047709 : fffffadc`1ed5b180 00000000`00000000 fffffadf`fbc44980 fffff800`01024d4a : nt!ExpWaitForResource+0x48 [d:\nt\base\ntos\ex\resource.c @ 2904]

fffffadc`1f1d6c60 fffffadc`167a720b : fffffadc`3cadc1c0 fffffadc`167a6c70 fffffadc`44604bf0 fffffadc`3cf49b68 : nt!ExAcquireResourceExclusiveLite+0x1ab [d:\nt\base\ntos\ex\resource.c @ 887]

fffffadc`1f1d6c90 fffffadc`167a6c87 : 00000000`00000001 fffffadf`fbc44980 fffff800`011cda18 fffffadc`44604bf0 : mfehidk+0x1220b

fffffadc`1f1d6cd0 fffff800`010375ca : 00000000`00000000 fffff800`011dde01 00000000`00000000 fffffadf`fbc44980 : mfehidk+0x11c87

fffffadc`1f1d6d00 fffff800`0124a972 : fffffadc`44604bf0 00000000`00000080 fffffadc`44604bf0 fffffadc`1ee53680 : nt!ExpWorkerThread+0x13b [d:\nt\base\ntos\ex\worker.c @ 1186]

fffffadc`1f1d6d70 fffff800`01020226 : fffffadc`1ee4b180 fffffadc`44604bf0 fffffadc`1ee53680 00000000`00000000 : nt!PspSystemThreadStartup+0x3e [d:\nt\base\ntos\ps\create.c @ 2272]

fffffadc`1f1d6dd0 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16 [d:\nt\base\ntos\ke\amd64\threadbg.asm @ 72]

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadc3d2b3440

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadc3d2b3400

. . . tens, hundreds or thousands . . .

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadc3d3b93d0

PENDING: WorkerRoutine TDI!CTEpEventHandler (fffffadc1d6f7010) Parameter fffffadffbdbb7c0

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffdc5df90

. . . tens, hundreds or thousands . . .

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffba51b80

PENDING: WorkerRoutine TDI!CTEpEventHandler (fffffadc1d6f7010) Parameter fffffadc19cfaec0

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffd092940

. . . tens, hundreds or thousands . . .

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb905400

PENDING: WorkerRoutine TDI!CTEpEventHandler (fffffadc1d6f7010) Parameter fffffadfff130420

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb9fbb80

. . . tens, hundreds or thousands . . .

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb667ed0

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb667d90

PENDING: WorkerRoutine nt!CmpDelayDerefKCBWorker (fffff8000125b280) Parameter 0000000000000000

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb667d50

. . . tens, hundreds or thousands . . .

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffbb03650

PENDING: WorkerRoutine nt!PopUserPresentSetWorker (fffff80001252540) Parameter 0000000000000001

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb803a60

. . . tens, hundreds or thousands . . .

PENDING: WorkerRoutine mfehidk+0xfffffadc167a6c70 (0000000000011c70) Parameter fffffadffb702a40

C.

2: kd> !locks

**** DUMP OF ALL RESOURCE OBJECTS ****

KD: Scanning for held locks..........................................................................................................

*** ERROR: Symbol file could not be found.  Defaulted to export symbols for mfehidk.sys -

Resource @ mfehidk!DEVICEDISPATCH::LowerDevicePassThrough (0xfffffadc167c9390)    Shared 1 owning threads

    Contention Count = 122245

    NumberOfSharedWaiters = 3

    NumberOfExclusiveWaiters = 78

     Threads: fffffadffff86bf0-01    fffffadc3cb407a0-01    fffffadc3d2c27a0-01    fffffadffb9cebf0-01<*>

     Threads Waiting On Exclusive Access:

              fffffadc44605bf0<-WKTH fffffadc3d50c720       fffffadc3d53d040       fffffadc4461bbf0<-WKTH      

              fffffadc3d7627a0       fffffadffb8a8bf0       fffffadffc3e6bf0       fffffadfff1e3bf0      

              fffffadc3d5d7bf0       fffffadc3cd64bf0       fffffadc3d1a2880       fffffadffb79a040      

              fffffadffd78cbf0       fffffadc3d56d4c0       fffffadc3d5827a0       fffffadfffdad940      

              fffffadc3d2f0040       fffffadfff122bf0       fffffadc3cbc37a0       fffffadc44604bf0<-WKTH      

              fffffadc3dae2ae0       fffffadc3d766040       fffffadffcb4b780       fffffadffb771040      

              fffffadffb886bf0       fffffadc3d4ffbf0       fffffadc3d12fbf0       fffffadc3cdfb040      

              fffffadc3d56d990       fffffadc3d758bf0       fffffadc3d56fbf0       fffffadc4461b7a0<-WKTH       

              fffffadc446057a0<-WKTH fffffadc44604040<-WKTH fffffadc3dcbbbf0       fffffadc3d5417a0      

              fffffadc3d58d040       fffffadc3d5c3bf0       fffffadffbb0cbf0       fffffadc3d3d3bf0      

              fffffadffb736040       fffffadffb8a7720       fffffadffef7abf0       fffffadffff96190      

              fffffadffddd6040       fffffadc3d537bf0       fffffadffca7e720       fffffadffb883bf0      

              fffffadc3ca87190       fffffadc3cc765e0       fffffadffb6aebf0       fffffadc3cc39040      

              fffffadffb9e9040       fffffadc3cbf5bf0       fffffadffb789940       fffffadc41d66040      

              fffffadffb7cc7a0       fffffadffbc00bf0       fffffadc3d27d6e0       fffffadffb98fbf0      

              fffffadc3d557040       fffffadffba317a0       fffffadc3d762040       fffffadc3d4fcbf0      

              fffffadc3d53b040       fffffadffb850bf0       fffffadc3cd25470       fffffadc3d38ebf0      

              fffffadc3d592b70       fffffadc3d508040       fffffadc3ccc97a0       fffffadc3d037040      

              fffffadc3d3177a0       fffffadc3ca57040       fffffadc3cb787a0       fffffadc3d603040      

              fffffadc44605040<-WKTH fffffadfff063bf0      

KD: Scanning for held locks.................................................................................................................................................................................................................................................................................................................................................................................................................................................................

17714 total locks, 1 locks currently held

Note: <-WKTH marker above comes from identification of threads as belonging to delayed worker threads queue (focus of this analysis).

Leave a Comment
  • Please add 8 and 5 and type the answer here:
  • Post
  • A hidden moral of the above story is:

    Stopping an Anti-Virus Service does not stop that Anti-Virus software's filter drivers. We may think we know what a software vendor did back when such an issue is caught, but what about now ... could it not be that the software vendor has since changed their code? Thus while selectively unloading/disabling filter drivers might be tempting, one also might step on a land mine. In contrast, unnstalling anti-virus software is a far safer and more certain practice (especially when one incorrectly thought stopping a service accomplished the same thing as uninstalling the service's software, including its filter drivers).

  • Excellent post, Nacho!

    I've seen something similar in a different context, on an IIS server where PING responses where 'timing out'. Stop the IIS worker process, and the PING would suddenly come back to life. In that case, the diagnosis was similar, and based on the increasing Output Queue Length counter for the network card, we figured something was wrong with a 3rd party filter driver for a firewall software component of an antivirus - the manufacturer will not be named here. Uninstallation was what we had to do, though it must be said that in that specific case, the network card drivers were very old as well.

Page 1 of 1 (2 items)