Chapter three of a beginner’s guide to debugging with NDISKD

In Part 1 of the series, we set up a kernel debugger. In the second installment, we took a closer look at ndiskd’s output for miniports. Today, we will use what we know to debug an actual network issue.

The symptoms are thus:

  1. The network seems to work ok – you can download files fine.
  2. When you open Task Manager, most tabs work ok.
  3. As soon as you switch to the Network tab in Task Manager, the whole Task Manager window locks up.
  4. Task Manager can’t be closed, even with taskkill /f.

The fourth symptom suggests the problem is in kernel mode. The third symptom tells us to start the investigation with the network stack. The first symptom hints that the datapath is fine, and PNP state is probably also fine as well.

While taskmgr.exe is hung, we look for its UI thread in the debugger. The !process and .thread commands can find the stuck thread.

kd> !process 0 2 taskmgr.exe
PROCESS fffffa800b3a2360
    SessionId: 1  Cid: 087c    Peb: 7fffffdf000  ParentCid: 0354
    DirBase: 14e05000  ObjectTable: fffff8a0015869f0  HandleCount: 127.
    Image: taskmgr.exe
        THREAD fffffa800bafaa60  Cid 087c.0884  Teb: 000007fffffdd000 Win32Thread: fffff900c061ac30 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 2
            fffffa800bafad38  Semaphore Limit 0x2
kd> .thread fffffa800bafaa60
Implicit thread is now fffffa80`0bafaa60
kd> kn
  *** Stack trace for last set context - .thread/.cxr resets it
# Child-SP          RetAddr           Call Site
00 fffff880`02bdbbe0 fffff800`0307f052 nt!KiSwapContext+0x7a
01 fffff880`02bdbd20 fffff800`030811af nt!KiCommitThreadWait+0x1d2
02 fffff880`02bdbdb0 fffff800`03056920 nt!KeWaitForSingleObject+0x19f
03 fffff880`02bdbe50 fffff800`03054969 nt!KiSuspendThread+0x54
04 fffff880`02bdbe90 fffff800`0307f25d nt!KiDeliverApc+0x211
05 fffff880`02bdbf10 fffff800`030811af nt!KiCommitThreadWait+0x3dd
06 fffff880`02bdbfa0 fffff800`03514711 nt!KeWaitForSingleObject+0x19f
07 fffff880`02bdc040 fffff880`014231e2 nt!VerifierKeWaitForSingleObject+0x151
08 fffff880`02bdc0c0 fffff880`01424a53 ndis!ndisQuerySetMiniportEx+0x162
09 fffff880`02bdc130 fffff880`01420769 ndis!ndisIfGetMiniportStatistics+0x163
0a fffff880`02bdc360 fffff880`014219e3 ndis!ndisIfQueryObject+0x389
0b fffff880`02bdc4f0 fffff880`0142261d ndis!ndisNsiGetInterfaceRodInformation+0x1c3
0c fffff880`02bdc590 fffff880`0153453d ndis!ndisNsiGetAllInterfaceInformation+0x42e
0d fffff880`02bdc650 fffff880`038029d6 NETIO!NsiGetAllParametersEx+0x44d
0e fffff880`02bdc750 fffff880`03804902 nsiproxy!NsippGetAllParameters+0x2b2
0f fffff880`02bdc940 fffff880`038049db nsiproxy!NsippDispatchDeviceControl+0x8a
10 fffff880`02bdc980 fffff800`0351dc16 nsiproxy!NsippDispatch+0x4b
11 fffff880`02bdc9b0 fffff800`033903a7 nt!IovCallDriver+0x566

Notice that the thread is waiting on something. The wait came from ndisQuerySetMiniportEx in frame 08. Hmm, it looks like NDIS is waiting for an OID request to come back. Let’s check out the miniport. Using what we learned last time, we’ll run !ndiskd.miniport once to get a list of all miniports.

kd> !ndiskd.miniport
    MiniDriver         Miniport            Name                                _
    fffffa800bd34560   fffffa800a4ac1a0    RAS Async Adapter
    fffffa800acf72d0   fffffa800acd31a0    WAN Miniport (SSTP)
    fffffa800acf5ce0   fffffa800ad0a1a0    WAN Miniport (PPTP)
    fffffa800acf1ce0   fffffa800ad0c1a0    WAN Miniport (PPPOE)
    fffffa800aceb120   fffffa800ad081a0    WAN Miniport (IPv6)
    fffffa800aceb120   fffffa800ad061a0    WAN Miniport (IP)
    fffffa800aceb120   fffffa800ad041a0    WAN Miniport (Network Monitor)
    fffffa800acdc6e0   fffffa800acce1a0    WAN Miniport (L2TP)
    fffffa800acba6e0   fffffa800ad001a0    MAC Bridge Miniport
    fffffa800ac99ab0   fffffa800ac9c1a0    WAN Miniport (IKEv2)
    fffffa800ac75020   fffffa800ac771a0    Microsoft Virtual Machine バス ネットワーク アダプター
    fffffa800ac4a500   fffffa800ac5e1a0    Teredo Tunneling Pseudo-Interface
    fffffa800ac4a500   fffffa800ac5c1a0    Microsoft ISATAP Adapter #6
    fffffa800ac4a500   fffffa800ac5a1a0    Microsoft ISATAP Adapter #5
    fffffa800ac4a500   fffffa800ac581a0    Microsoft ISATAP Adapter #2
    fffffa800ac4a500   fffffa800ac531a0    Microsoft ISATAP Adapter
    fffffa800ac4a500   fffffa800ac511a0    Microsoft 6to4 Adapter #2
    fffffa800ac4a500   fffffa800ac4e1a0    Microsoft 6to4 Adapter

That’s a lot of miniports – where do we start? Well, usually a machine only has one or two miniports that are actually used for real network traffic. In my case, that’d be the VM NIC. But we are lucky – we don’t need to guess. Dump the hung thread with kv to see function parameters.

kd> kv
  *** Stack trace for last set context - .thread/.cxr resets it
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`02bdc0c0 fffff880`01424a53 : fffffa80`0ac771a0 fffffa80`0b2f3980 fffffa80`0bd3a0d8 00000000`00000000 : ndis!ndisQuerySetMiniportEx+0x162

(It’s a little hard to read the verbose stack dump; it helps if you have a wide screen). Notice that the first “Args to Child” is fffffa80`0ac771a0 – this value is familiar since it’s the same as the NDIS miniport handle for the VM NIC. Now we are confident the problem is that the VM NIC isn’t completing the OID. Let’s look at it.

kd> !ndiskd.miniport fffffa800ac771a0
<snip>
STATE
    Miniport           Running
    Device PnP         Started
    Datapath           Normal
    Interface          Up
    Media              Connected
    Power              D0
    References         0n26
    User Handles       2
    Total Resets       0
    Pending OID        None
    Flags              2c452000
        ↑ NOT_BUS_MASTER, DEFAULT_PORT_ACTIVATED, SUPPORTS_MEDIA_SENSE,
        DOES_NOT_DO_LOOPBACK, MEDIA_CONNECTED
    PnPFlags           00210000          ← RECEIVED_START, HARDWARE_DEVICE

Wait a minute – the status output shows that the Pending OID is None. (It’s even printed in green lettering, and green means “good”!) So our theory was wrong – the miniport did not lose an OID request.

Maybe the OID request is stuck somewhere else? Let’s start walking up the binding stack.

BINDINGS
    Filter List        Filter              Filter Driver      Context          _
    WFP LightWeight Filter-0000
                       fffffa800b2f3980    fffffa800ab13840   fffffa800b2f44f0
    Buggy Filter Driver 1-0000
                       fffffa800b2e1c80    fffffa800ac1fc60   fffffa800b2e3d70

The WFP filter is a standard filter that comes with Windows. For now, let’s assume it is not the guilty party. We can get more details of the other filter instance by clicking on its NDIS Filter Instance handle. (In this example, the value of the handle is fffffa800b2e1c80.)

kd> !ndiskd.filter fffffa800b2e1c80
FILTER
    Microsoft Virtual Machine バス ネットワーク アダプター-Buggy Filter Driver 1-0000
    Ndis Handle        fffffa800b2e1c80
    Filter Driver      fffffa800ac1fc60 - Buggy Filter Driver 1
    Driver Context     fffffa800b2e3d70
    Miniport           fffffa800ac771a0 - Microsoft Virtual Machine バス ネットワーク アダプター
    State              Running
    Datapath           Normal
    References         1
    Flags              00000810          ← RUNNING, PROCESSING_REQUEST
    Pending OID        OID_IP4_OFFLOAD_STATS
    Higher Filter      fffffa800b2f3980 - Microsoft Virtual Machine バス ネットワーク アダプター-WFP LightWeight Filter-0000
    Lower Filter       [None]
     → Driver handlers

Ah-ha, this filter instance has a pending OID. Now our new theory is that this filter is not completing the OID in a timely manner, which is causing other OIDs to languish in the OID queue. (Recall that a filter instance or miniport adapter instance can only have at most one outstanding OID request – all others are automatically queued by NDIS until the pending request is completed). We can confirm our theory by looking at the topmost filter to see its OID queue.

kd> !ndiskd.filter fffffa800b2f3980
FILTER
    Microsoft Virtual Machine バス ネットワーク アダプター-WFP LightWeight Filter-0000
    Ndis Handle        fffffa800b2f3980
    Filter Driver      fffffa800ab13840 - WFP LightWeight Filter
    Driver Context     fffffa800b2f44f0
    Miniport           fffffa800ac771a0 - Microsoft Virtual Machine バス ネットワーク アダプター
    State              Running
    Datapath           Normal
    References         1
    Flags              00004810
        ↑ RUNNING, PROCESSING_REQUEST, IS_MANDATORY
    Pending OID        OID_IP4_OFFLOAD_STATS
    More OIDs are queued  → Show queued OIDs
    Higher Filter      [None]
    Lower Filter       fffffa800b2e1c80 - Microsoft Virtual Machine バス ネットワーク アダプター-Buggy Filter Driver 1-0000
     → Driver handlers

Indeed, the topmost filter (the WFP filter) is also processing an OID_IP4_OFFLOAD_STATS request, which is the same as the OID stuck in the other filter. Furthermore, we see that “More OIDs are queued”. Let’s click the “Show queued OIDs” link to see the full queue.

kd> !ndiskd.oid
ALL PENDING OIDs
    Filter             fffffa800b2e1c80 - Microsoft Virtual Machine バス ネットワーク アダプター-Buggy Filter Driver 1-0000
        Current OID        OID_IP4_OFFLOAD_STATS
    Filter             fffffa800b2f3980 - Microsoft Virtual Machine バス ネットワーク アダプター-WFP LightWeight Filter-0000
        Current OID        OID_IP4_OFFLOAD_STATS
        Queued OIDs        OID_GEN_STATISTICS
                           OID_GEN_STATISTICS
                           OID_GEN_STATISTICS
                           OID_802_3_MULTICAST_LIST
                           OID_GEN_STATISTICS
                           OID_GEN_NETWORK_LAYER_ADDRESSES
                           OID_GEN_STATISTICS
                           OID_GEN_STATISTICS
                           OID_GEN_STATISTICS
                           OID_GEN_XMIT_OK
                           OID_802_3_CURRENT_ADDRESS

Yikes, it’s no wonder that usermode is all clogged up. There are about a dozen OID requests queued behind the OID_IP4_OFFLOAD_STATS request, which in turn is stuck in a buggy filter driver.

We’ve found the culprit filter driver, and the next step is to code review its OID request handler, especially for OID_IP4_OFFLOAD_STATS, to see if there are any bugs that result in the driver forgetting to complete the OID. Meanwhile, we have one last ndiskd tip for you today.

We actually could have saved some debugging steps with the !ndiskd.oid command. This command, when run without parameters, scans the entire system for any pending or queued OID requests on any miniport or filter instance. Since it’s easy to scan the system, it’s not usually necessary to manually check each miniport or filter one-by-one. As always, you can find several other useful commands, like !ndiskd.oid, on the !ndiskd.help menu.