• Ntdebugging Blog

    NDIS - Part 1


    Hi, my name is Anurag Sarin, and I am an escalation engineer on the Platforms Global Escalation Team. I would like to give some insight into NDIS.

     

    NDIS Introduction

    The Network Driver Interface Specification (NDIS) library abstracts the network hardware from network drivers. NDIS also specifies a standard interface between layered network drivers, thereby abstracting lower-level drivers that manage hardware from upper-level drivers, such as network transports. NDIS also maintains state information and parameters for network drivers, including pointers to functions, handles, and parameter blocks for linkage, and other system values.

    Types of network drivers

    • Miniport drivers
    • Intermediate drivers
    • Filter drivers
    • Protocol drivers

    ndiskd is a good extension for debugging NDIS drivers. The document Debugging NDIS Drivers has more information about ndiskd.

     

    To get a list of the NDIS protocol drivers in the system, use the protocols option in ndiskd.

    This also lists all NDIS open blocks on each protocol driver. The protocol open block is described in a later section.

     

    A protocol driver is represented by an NDIS Protocol Block, which is shown as Protocol below:

     

    1: kd> !ndiskd.protocols

     Protocol 8ef57590: NDISUIO

        Open 8ef61860 - Miniport: 902bbab0 X Network Team #1

     

     Protocol 8f3aea50: TCPIP_WANARP

        Open 8f3ae508 - Miniport: 90287ae8 WAN Miniport (IP)

     

     Protocol 8f220008: TCPIP

        Open 8f21d210 - Miniport: 902bbab0 X Network Team #1

     

     Protocol 9029bbf8: NDPROXY

        Open 901618e0 - Miniport: 9019f130 Direct Parallel

        Open 90161c80 - Miniport: 9019f130 Direct Parallel

        Open 9018bcd0 - Miniport: 901ba130 WAN Miniport (L2TP)

        Open 90171490 - Miniport: 901ba130 WAN Miniport (L2TP)

     

     Protocol 901e3008: RASPPPOE

     

     Protocol 901bb008: NDISWAN

        Open 90216b30 - Miniport: 9019f130 Direct Parallel

        Open 9047e518 - Miniport: 901fdab0 WAN Miniport (PPTP)

        Open 90198c20 - Miniport: 90276ab0 WAN Miniport (PPPOE)

        Open 901989e0 - Miniport: 901ba130 WAN Miniport (L2TP)

     

     Protocol 902de6e0: Y_TEAM

        Open 9028ef10 - Miniport: 9029b5e8 P Gigabit Server Adapter #2

        Open 90198b10 - Miniport: 9029eab0 Q Multifunction Gigabit Server Adapter #2

     

    NDIS filter drivers are represented by Filter Driver Blocks, shown below.

     

    0: kd> !ndiskd.filters

    NDIS Driver verifier level: 0

    NDIS Failed allocations   : 0

     

    Filter Driver Block: 97412e58

      Filter: 97414c10 Z Network Connection-Native WiFi Filter Driver-0000

        Miniport 85d160e8   Z Network Connection

     

    Filter Driver Block: 8797bdb0

      Filter: 97446c10 Z Network Connection - H Miniport-QoS Packet Scheduler-0000

        Miniport 8610b0e8   Z Network Connection - H Miniport

      Filter: 87b1b730 Y Network Adapter - H Miniport-QoS Packet Scheduler-0000

        Miniport 8611c0e8   Y Network Adapter - H Miniport

      Filter: 879c3008 G Network Connection - H Miniport-QoS Packet Scheduler-0000

        Miniport 861150e8   G Network Connection - H Miniport

      Filter: 879c0a50 WAN Miniport (IP) - H Miniport-QoS Packet Scheduler-0000

        Miniport 861240e8   WAN Miniport (IP) - H Miniport

      Filter: 879bb3f8 WAN Miniport (IPv6) - H Miniport-QoS Packet Scheduler-0000

        Miniport 861250e8   WAN Miniport (IPv6) - H Miniport

      Filter: 87981870 WAN Miniport (Network Monitor) - H Miniport-QoS Packet Scheduler-0000

        Miniport 861260e8   WAN Miniport (Network Monitor) - H Miniport

      Filter: 8797f518 Nortel IPSECSHM Adapter - H Miniport-QoS Packet Scheduler-0000

        Miniport 861170e8   Nortel IPSECSHM Adapter - H Miniport

     

    The miniports option lists all NDIS miniport drivers, each represented by a Miniport Driver Block.

     

    kd> !ndiskd.miniports

    NDIS Driver verifier level: 0

    NDIS Failed allocations   : 0

    Miniport Driver Block: 885915b8, Version 0.0

      Miniport: 8863a0e8, NetLuidIndex: 1, IfIndex: 7, RAS Async Adapter

    Miniport Driver Block: 88018010, Version 0.0

      Miniport: 8828d488, NetLuidIndex: 1, IfIndex: 3, WAN Miniport (PPTP)

    Miniport Driver Block: 87e535a8, Version 0.0

      Miniport: 88150200, NetLuidIndex: 0, IfIndex: 4, WAN Miniport (PPPOE)

    Miniport Driver Block: 87f63510, Version 0.0

      Miniport: 880ac4b8, NetLuidIndex: 0, IfIndex: 5, WAN Miniport (IPv6)

      Miniport: 880844c0, NetLuidIndex: 3, IfIndex: 6, WAN Miniport (IP)

    Miniport Driver Block: 87f2ccb8, Version 0.0

      Miniport: 88091488, NetLuidIndex: 0, IfIndex: 2, WAN Miniport (L2TP)

    Miniport Driver Block: 8809de60, Version 10.1

      Miniport: 883bf0e8, NetLuidIndex: 10, IfIndex: 15, MY PCI Fast Ethernet Adapter (Emulated) #4

      Miniport: 883be0e8, NetLuidIndex: 5, IfIndex: 11, MY PCI Fast Ethernet Adapter (Emulated) #3

      Miniport: 883bd0e8, NetLuidIndex: 6, IfIndex: 9, MY PCI Fast Ethernet Adapter (Emulated) #2

      Miniport: 883bc0e8, NetLuidIndex: 4, IfIndex: 8, MY PCI Fast Ethernet Adapter (Emulated)

    Miniport Driver Block: 87f77df0, Version 1.0

      Miniport: 87f9c488, NetLuidIndex: 6, IfIndex: 16, isatap.{584AF5A9-63C2-44C5-970D-DB85057F2931}

      Miniport: 87f75488, NetLuidIndex: 4, IfIndex: 14, isatap.fareast.corp.microsoft.com

      Miniport: 87fc6488, NetLuidIndex: 3, IfIndex: 10, isatap.{CCDB4297-B958-4C2A-95A5-29150BD0A371}

     

    The interfaces option lists all network interfaces.

     

    kd> !ndiskd.interfaces

    Interface block 87fb12a8 

     

    <Snip>

      IfIndex: 20, IfType: 6

    Inerface Guid: 5d0bd81a-47c7-11dc-a9d2-0003ff2b6bfa

    Interface block 88101ab0 

     

      IfIndex: 21, IfType: 6

    Inerface Guid: ed1c50d5-ff1f-11db-9b85-0003ff7133d2

    Interface block 87f82ab0 

     

      IfIndex: 15, IfType: 6

    Inerface Guid: f442c036-8bf5-43a6-91fd-6792ef752100

    Interface block 881cd5d8 

     

      IfIndex: 22, IfType: 6

    Inerface Guid: ed1c50d4-ff1f-11db-9b85-9459bf825974

    Interface block 87f313f8 

     

    <Snip>

     

    Each interface GUID corresponds to a network interface. Some interfaces have their information in the registry. For example, on my machine Interface Guid f442c036-8bf5-43a6-91fd-6792ef752100 corresponds to this registry key:

     

    HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{F442C036-8BF5-43A6-91FD-6792EF752100}
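The mapping from the GUID printed by ndiskd to the registry key can be sketched in a few lines of Python. This is only an illustration: `interface_registry_path` is a hypothetical helper name, and, as noted above, not every interface necessarily has a key under this path.

```python
# Registry location named in the post; Tcpip keeps per-interface settings here.
TCPIP_INTERFACES_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
    r"\Tcpip\Parameters\Interfaces"
)


def interface_registry_path(interface_guid: str) -> str:
    """Build the registry path for an interface GUID as printed by ndiskd.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the key exists.
    """
    guid = interface_guid.strip().strip("{}").upper()
    return f"{TCPIP_INTERFACES_KEY}\\{{{guid}}}"


print(interface_registry_path("f442c036-8bf5-43a6-91fd-6792ef752100"))
```

The only transformation is upper-casing the GUID and wrapping it in braces, which is how the key name appears in the registry path above.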

     

    NDIS Driver Stack

     

    Basic Stack Configuration

     

    The MSDN web page NDIS Driver Stack has basic documentation on the NDIS stack.

     

    For Miniport details, the miniport option can be used.

     

    kd> !ndiskd.miniport 883bc0e8

     

     Miniport 883bc0e8 : MY PCI Fast Ethernet Adapter (Emulated), v5.0

     

        AdapterContext : 8809e000

        Flags          : 2c412008

                         BUS_MASTER, IGNORE_TOKEN_RING_ERRORS, RESOURCES_AVAILABLE

                         SUPPORTS_MEDIA_SENSE, DOES_NOT_DO_LOOPBACK, MEDIA_CONNECTED

        PnPFlags       : 80210000

                         RECEIVED_START, HARDWARE_DEVICE,

        MiniportState        : STATE_RUNNING

        IfIndex                  : 8

        Ndis5MiniportInNdis6Mode : 1

        InternalResetCount    : 0000

        MiniportResetCount    : 0000

        References            : 5

        UserModeOpenReferences: 0

        PnPDeviceState        : PNP_DEVICE_STARTED

        CurrentDevicePowerState : PowerDeviceD0

        Bus PM capabilities

           DeviceD1:            0

           DeviceD2:            0

           WakeFromD0:          0

           WakeFromD1:          0

           WakeFromD2:          0

           WakeFromD3:          0

     

           SystemState          DeviceState

           PowerSystemUnspecified     PowerDeviceUnspecified

           S0                   D0

           S1                   PowerDeviceUnspecified

           S2                   PowerDeviceUnspecified

           S3                   PowerDeviceUnspecified

           S4                   D3

           S5                   D3

           SystemWake: PowerSystemUnspecified

           DeviceWake: PowerDeviceUnspecified

        Current PnP and PM Settings:          : 00000030

                         DISABLE_WAKE_UP, DISABLE_WAKE_ON_RECONNECT,

        Translated Allocated Resources:

            IO Port: 0000e480, Length: 80

            Memory: febfc000, Length: 1000

            Interrupt Level: 10, Vector: 3b

        MediaType      : 802.3

        DeviceObject   : 883bc030, PhysDO : 87ca1030  Next DO: 87ca1030

        MapRegisters   : 00000000

        FirstPendingPkt: 00000000

        DriverVerifyFlags  : 00000000

        Miniport Interrupt : 8809e008

        Miniport version 5.0

        Miniport Filter List:

     Filter  88301c28: FilterDriver  88129300, FilterModuleContext 880d7230  MY PCI Fast Ethernet Adapter (Emulated)-QoS Packet Scheduler-0000

        Miniport Open Block Queue:

          88263278: Protocol 88300830 = RSPNDR, ProtocolBindingContext 87fb7820, v6.0

          882446d0: Protocol 8835a300 = LLTDIO, ProtocolBindingContext 88323310, v6.0

          880d2c58: Protocol 8831e2e0 = TCPIP, ProtocolBindingContext 88329008, v6.0

     

    The ‘Current PnP and PM Settings’ section shows the current PnP capabilities and power management settings as flags. One or more of the following values can be set:

     

    NOT_STOPPABLE : The device cannot be stopped (for example, an ISA device).

    NOT_REMOVEABLE : The device cannot be safely removed.

    NOT_SUSPENDABLE : The device cannot be safely suspended.

    DISABLE_PM : Disable all power management features.

    DISABLE_WAKE_UP : Disable the device waking up the system. This is set when the user disables the Wake-On-LAN (WOL) feature on the miniport adapter.

    DISABLE_WAKE_ON_RECONNECT : Disable the device waking up the system due to a cable reconnect.
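As a rough illustration of how the flag value maps to these names, here is a small Python decoder. The bit values are an assumption taken from the NDIS_DEVICE_* definitions in ndis.h and should be checked against the header in your WDK; they are at least consistent with the value 00000030 shown in the miniport output above.

```python
from typing import List

# Bit values assumed from the NDIS_DEVICE_* definitions in ndis.h;
# verify them against the header shipped with your WDK.
PNP_PM_FLAGS = {
    0x01: "NOT_STOPPABLE",
    0x02: "NOT_REMOVEABLE",
    0x04: "NOT_SUSPENDABLE",
    0x08: "DISABLE_PM",
    0x10: "DISABLE_WAKE_UP",
    0x20: "DISABLE_WAKE_ON_RECONNECT",
}


def decode_pnp_pm_flags(value: int) -> List[str]:
    """Return the names of all flags set in value, lowest bit first."""
    return [name for bit, name in sorted(PNP_PM_FLAGS.items()) if value & bit]


# Value taken from "Current PnP and PM Settings" in the miniport output.
print(decode_pnp_pm_flags(0x00000030))
```

With these assumed values, 0x30 decodes to DISABLE_WAKE_UP and DISABLE_WAKE_ON_RECONNECT, the same two names ndiskd printed.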

     

    Above, the miniport block 883bc0e8 represents MY PCI Fast Ethernet Adapter (Emulated), i.e. the NIC driver on my machine. The NIC driver is bound to the filter driver MY PCI Fast Ethernet Adapter (Emulated)-QoS Packet Scheduler and to the protocol drivers RSPNDR, LLTDIO and TCPIP.

     

                       The stack with the debug output above would look somewhat like this.

    To see which miniport drivers a protocol driver is bound to, the protocol option can be used.

     

    kd> !protocol 8831e2e0

     Protocol 8831e2e0 : TCPIP

     RootDeviceName is \DEVICE\{B4982B71-0255-4D04-A585-4C339162A25D}

     v6.0       RefCount 5

     

        Open 880d2c58 - Miniport: 883bc0e8 Intel 21140-Based PCI Fast Ethernet Adapter (Emulated)

        Open 881bbc58 - Miniport: 883bd0e8 Intel 21140-Based PCI Fast Ethernet Adapter (Emulated) #2

        Open 881bcc58 - Miniport: 883be0e8 Intel 21140-Based PCI Fast Ethernet Adapter (Emulated) #3

        Open 881c0c58 - Miniport: 883bf0e8 Intel 21140-Based PCI Fast Ethernet Adapter (Emulated) #4

     

     BindAdapterHandlerEx               8e93da28, UnbindAdapterHandlerEx       8e9d7a87

     PnPEventHandler                    8e933b83, UnloadHandler                00000000

     OpenAdapterCompleteEx              8e9d74ff, CloseAdapterCompleteEx       8e9d7722

     SendNetBufferListsCompleteHandler  8e997067, ReceiveNetBufferListsHandler  8e98ff5f

     StatusComplete                     00000000, StatusHandler                8e942cad

     AssociatedMiniDriver 00000000

     

          Flags          : 00000000

     

    This also shows the various handler routines of the TCPIP protocol driver.

    Unassembling the routines verifies them further.

     

    kd> u 8e93da28

    tcpip!FlBindAdapter:

    8e93da28 8bff            mov     edi,edi

    8e93da2a 55              push    ebp

    8e93da2b 8bec            mov     ebp,esp

    8e93da2d 83ec1c          sub     esp,1Ch

    8e93da30 53              push    ebx

    8e93da31 56              push    esi

    8e93da32 57              push    edi

    8e93da33 6a06            push    6

     

    kd> u 8e933b83

    tcpip!Fl48PnpEvent:

    8e933b83 8bff            mov     edi,edi

    8e933b85 55              push    ebp

    8e933b86 8bec            mov     ebp,esp

    8e933b88 837d0800        cmp     dword ptr [ebp+8],0

    8e933b8c 7406            je      tcpip!Fl48PnpEvent+0x11 (8e933b94)

    8e933b8e 5d              pop     ebp

    8e933b8f e96c030000      jmp     tcpip!FlPnpEvent (8e933f00)

    8e933b94 8b450c          mov     eax,dword ptr [ebp+0Ch]

     

    An NDIS Open Block represents the binding between a miniport driver and a protocol driver, so there is one NDIS Open Block per binding between a protocol and a miniport. The opens option lists them.

    kd> !ndiskd.opens

      Open 885283c0

        Miniport: 8863a0e8 - RAS Async Adapter

        Protocol: 8803be48 -

     

      Open 88263278

        Miniport: 883bc0e8 - MY PCI Fast Ethernet Adapter (Emulated)

        Protocol: 88300830 - RSPNDR

     

      Open 88263628

        Miniport: 883bd0e8 - MY PCI Fast Ethernet Adapter (Emulated) #2

        Protocol: 88300830 - RSPNDR

     

      Open 88124008

        Miniport: 883be0e8 - MY PCI Fast Ethernet Adapter (Emulated) #3

        Protocol: 88300830 - RSPNDR

     

      Open 883276a0

        Miniport: 883bf0e8 - MY PCI Fast Ethernet Adapter (Emulated) #4

        Protocol: 88300830 - RSPNDR

     

      Open 882446d0

        Miniport: 883bc0e8 - MY PCI Fast Ethernet Adapter (Emulated)

        Protocol: 8835a300 - LLTDIO

     

      Open 882fa398

        Miniport: 883bd0e8 - MY PCI Fast Ethernet Adapter (Emulated) #2

        Protocol: 8835a300 - LLTDIO

     

      Open 882d5550

        Miniport: 883be0e8 - MY PCI Fast Ethernet Adapter (Emulated) #3

        Protocol: 8835a300 - LLTDIO

     

      Open 882cf470

        Miniport: 883bf0e8 - MY PCI Fast Ethernet Adapter (Emulated) #4

        Protocol: 8835a300 - LLTDIO

     

      Open 88219968

        Miniport: 880ac4b8 - WAN Miniport (IPv6)

        Protocol: 880f04a0 - WANARPV6

     

      Open 8816e008

        Miniport: 880844c0 - WAN Miniport (IP)

        Protocol: 882e5008 - WANARP

     

      Open 881eec58

        Miniport: 883bd0e8 - MY PCI Fast Ethernet Adapter (Emulated) #2

        Protocol: 88383138 - TCPIP6

     

      Open 881b6c58

        Miniport: 883be0e8 - MY PCI Fast Ethernet Adapter (Emulated) #3

        Protocol: 88383138 - TCPIP6

     

      Open 880d2c58

        Miniport: 883bc0e8 - MY PCI Fast Ethernet Adapter (Emulated)

        Protocol: 8831e2e0 - TCPIP

     

      Open 881bbc58

        Miniport: 883bd0e8 - MY PCI Fast Ethernet Adapter (Emulated) #2

        Protocol: 8831e2e0 - TCPIP

     

      Open 881c2c58

        Miniport: 883bf0e8 - MY PCI Fast Ethernet Adapter (Emulated) #4

        Protocol: 88383138 - TCPIP6

     

      Open 881bcc58

        Miniport: 883be0e8 - MY PCI Fast Ethernet Adapter (Emulated) #3

        Protocol: 8831e2e0 - TCPIP

     

      Open 881c0c58

        Miniport: 883bf0e8 - MY PCI Fast Ethernet Adapter (Emulated) #4

        Protocol: 8831e2e0 - TCPIP

     

      Open 8811a6a0

        Miniport: 87fbb0e8 - isatap.{B4982B71-0255-4D04-A585-4C339162A25D}

        Protocol: 882d8298 - TCPIP6TUNNEL

     

      Open 8811a008

        Miniport: 87fc6488 - isatap.{CCDB4297-B958-4C2A-95A5-29150BD0A371}

        Protocol: 882d8298 - TCPIP6TUNNEL

     

      Open 881f8870

        Miniport: 87f75488 - isatap.fareast.corp.microsoft.com

        Protocol: 882d8298 - TCPIP6TUNNEL

     

      Open 882a4850

        Miniport: 87f9c488 - isatap.{584AF5A9-63C2-44C5-970D-DB85057F2931}

        Protocol: 882d8298 - TCPIP6TUNNEL

     

       Open 881ad960

        Miniport: 8828d488 - WAN Miniport (PPTP)

        Protocol: 8803be48 -

     

       Open 88018c58

        Miniport: 88150200 - WAN Miniport (PPPOE)

        Protocol: 8803be48 -

     

       Open 8831f818

        Miniport: 88091488 - WAN Miniport (L2TP)

        Protocol: 87f1e100 -

     

       Open 8831fc10

        Miniport: 88091488 - WAN Miniport (L2TP)

        Protocol: 87f1e100 -

     

       Open 881425a8

        Miniport: 88091488 - WAN Miniport (L2TP)

        Protocol: 8803be48 -
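Since each open block pairs one miniport with one protocol, the listing can be regrouped to show the bindings per miniport. The following Python sketch parses a few entries copied from the output above. The regular expression and the helper name `bindings_by_miniport` are assumptions made for this example, not part of ndiskd.

```python
import re
from collections import defaultdict

# A few entries copied from the !ndiskd.opens output above.
SAMPLE = """
  Open 88263278
    Miniport: 883bc0e8 - MY PCI Fast Ethernet Adapter (Emulated)
    Protocol: 88300830 - RSPNDR

  Open 882446d0
    Miniport: 883bc0e8 - MY PCI Fast Ethernet Adapter (Emulated)
    Protocol: 8835a300 - LLTDIO

  Open 880d2c58
    Miniport: 883bc0e8 - MY PCI Fast Ethernet Adapter (Emulated)
    Protocol: 8831e2e0 - TCPIP

  Open 881bbc58
    Miniport: 883bd0e8 - MY PCI Fast Ethernet Adapter (Emulated) #2
    Protocol: 8831e2e0 - TCPIP
"""

# One match per open block: open address, miniport address/name,
# protocol address/name. Blank lines between the rows are tolerated.
OPEN_RE = re.compile(
    r"Open\s+(?P<open>[0-9a-f]+)\s+"
    r"Miniport:\s+(?P<miniport>[0-9a-f]+)\s+-[ \t]*(?P<mp_name>[^\n]*)\n\s*"
    r"Protocol:\s+(?P<protocol>[0-9a-f]+)\s+-[ \t]*(?P<proto_name>[^\n]*)"
)


def bindings_by_miniport(text):
    """Group protocol names by miniport block address, one per open block."""
    result = defaultdict(list)
    for m in OPEN_RE.finditer(text):
        result[m["miniport"]].append(m["proto_name"].strip() or "<unnamed>")
    return dict(result)


print(bindings_by_miniport(SAMPLE))
```

For miniport 883bc0e8 this yields RSPNDR, LLTDIO and TCPIP, the same three protocols that appeared in its Miniport Open Block Queue earlier.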

     

    Use the mopen option to see the details of an NDIS Open Block.

     

    kd> !mopen 881bbc58

     Miniport Open Block 881bbc58

        Protocol 8831e2e0 = TCPIP, ProtocolContext 88348008, v6.0

        Miniport 883bd0e8 = MY PCI Fast Ethernet Adapter (Emulated) #2, v5.0

     

        MiniportAdapterContext: 880a1000

        Flags                 : 01000000

                         OPEN_USE_MULTICAST_LIST,

        References            : 1

     

    The ‘References’ field above shows the number of outstanding I/O requests. This can be useful for investigating how many requests that a protocol driver has passed to the next lower driver are still outstanding.

    Network Data

    Network data consists of packets of data that are sent or received over the network. NDIS provides data structures to describe and organize such data. The primary NDIS 6.0 network data structures include the following:

    • NET_BUFFER structures
    • NET_BUFFER_LIST structures
    • NET_BUFFER_LIST_CONTEXT structures

    In NDIS 5.x, NDIS_PACKET structures are used in place of NET_BUFFER structures.

    NDIS_PACKET Structure

    NDIS packets (represented by an NDIS_PACKET structure) are allocated by a protocol driver, filled with data, and passed to the next lower NDIS driver so that the data can be sent on the network. Some lowest-level NIC drivers allocate packets to hold received data and pass the packets up to interested higher-layer drivers. Sometimes, a protocol driver allocates a packet and passes it to a NIC driver with a request that the NIC driver copy received data into the provided packet. NDIS provides functions for allocating and manipulating the substructures that make up a packet. The following figure illustrates the structure of a packet.

    Each NDIS Packet is basically a Packet Descriptor. Each Packet Descriptor has a series of Buffer Descriptors.

    A packet is composed of the following:

    • A packet descriptor that contains private areas for the miniport driver and the protocol driver, a set of flags associated with the packet whose meaning is defined by the cooperating miniport and protocol drivers, the number of physical pages that contain the packet, the total length of the packet, and a pointer to the first buffer descriptor, which maps the first buffer in the packet.
    • A set of buffer descriptors. A buffer descriptor describes the starting virtual address of each buffer, the buffer's byte offset into the page pointed to by the virtual address, the total number of bytes in the buffer, and a pointer to the next buffer descriptor, if any.
    • The virtual range, possibly spanning more than one page, that makes up the buffer described by the buffer descriptor. These virtual pages map to physical memory.
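The relationship between a packet descriptor and its chain of buffer descriptors can be modeled in a few lines of Python. This is a simplified sketch, not the real NDIS_PACKET or MDL layout: the class and field names are invented for illustration, and the sample values are borrowed from the !pkt output for packet 88727538 later in this post.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class BufferDescriptor:
    """Simplified stand-in for one buffer descriptor (an MDL on Windows)."""
    start_va: int      # page-aligned starting virtual address
    byte_offset: int   # offset of the data within that page
    byte_count: int    # number of data bytes in this buffer
    next: Optional["BufferDescriptor"] = None

    @property
    def data_address(self) -> int:
        return self.start_va + self.byte_offset


@dataclass
class PacketDescriptor:
    """Simplified stand-in for the private area of a packet descriptor."""
    head: Optional[BufferDescriptor] = None

    def buffers(self) -> Iterator[BufferDescriptor]:
        # Walk the singly linked chain starting at Head.
        buf = self.head
        while buf is not None:
            yield buf
            buf = buf.next

    @property
    def count(self) -> int:
        return sum(1 for _ in self.buffers())

    @property
    def total_length(self) -> int:
        return sum(b.byte_count for b in self.buffers())


# Two buffers: protocol headers, then the payload.
payload = BufferDescriptor(0x886F4000, 0x40, 0x2FB)
headers = BufferDescriptor(0x886F3000, 0xD06, 0x36, next=payload)
packet = PacketDescriptor(head=headers)

print(packet.count, hex(packet.total_length))
print([hex(b.data_address) for b in packet.buffers()])
```

In this model the total length is simply the sum of the byte counts along the chain, and each buffer's data starts at its starting virtual address plus its byte offset.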

    The pkt option in ndiskd shows the contents of an NDIS packet. It takes a verbosity level:

    Usage: pkt <pointer to packet> <verbosity>

    <verbosity> can be from 1 to 5:

    1 - Packet Private

    2 - Packet Extension

    3 - Ndis Reference

    4 - Buffer List

    5 - Data in Packet List

     

    1: kd> !ndiskd.pkt 0x8f3aabf8

    NDIS_PACKET at 8f3aabf8

     

    Packet.Private

      PhysicalCount       00000001  Total Length        00000036

      Head                8f3aa630  Tail                8a667d30

      Pool                90331d20  Count               00000001

      Flags               00000002  ValidCounts         01

      NdisPacketFlags     00000000  NdisPacketOobOffset 006c

     

          Private.Flags          : 00000002

          Private.NdisPacketFlags: 0

     

    The output above shows a typical packet descriptor.

    The fields in the output are described below.

     

    PhysicalCount : Number of physical pages in the packet.

    TotalLength : Total amount of data in the packet, in bytes.

    Head : First buffer in the chain. If Head is NULL, the chain is empty.

    Tail : Last buffer in the chain.

    Count : Number of buffers in the chain.

    ValidCounts : Boolean that indicates whether the counts are valid.

    Pool : NDIS packet pool address, so we know where to free the packet back to.

    To demonstrate what an NDIS packet looks like, a breakpoint was placed on the routine NdisMSendComplete. The definition of NdisMSendComplete states that ‘PNDIS_PACKET Packet’ is the second parameter, so the address of the NDIS packet can be found at EBP+0xC on the stack.

     kd> kvn

     # ChildEBP RetAddr  Args to Child             

    00 818f1c44 8b8848c3 883be0e8 88727538 00000000 ndis!NdisMSendComplete+0x10 (FPO: [Non-Fpo])

    01 818f1c7c 8b884b2f 8817d280 81827c97 00000000 dc21x4vm!ProcessTransmitDescRing+0x363 (FPO: [Non-Fpo])

    02 818f1c9c 81718cb1 00010005 88166008 883be0e8 dc21x4vm!DC21X4HandleInterrupt+0xfb (FPO: [Non-Fpo])

    03 818f1cc8 81682d1f 8816601c 88166008 00000000 ndis!ndisMDpc+0x16b (FPO: [Non-Fpo])

    04 818f1ce8 818a93ae 8816601c 88166008 00000000 ndis!ndis5InterruptDpc+0x9c (FPO: [Non-Fpo])

    05 818f1d50 818912ae 00000000 0000000e 00000000 nt!KiRetireDpcList+0x147

    06 818f1d54 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x46 (FPO: [0,0,0])
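The EBP+0xC offset follows from the standard 32-bit x86 frame layout, which the short Python sketch below spells out. It assumes a conventional EBP-based frame whose prologue (push ebp; mov ebp, esp) has already run. `stack_argument_address` is a hypothetical helper name.

```python
POINTER_SIZE = 4  # 32-bit x86 target, as in the debug session above


def stack_argument_address(ebp: int, index: int) -> int:
    """Address of the index-th argument (1-based) in a standard EBP frame.

    [ebp] holds the saved EBP and [ebp+4] the return address, so the
    first argument sits at ebp+8 and each later one 4 bytes higher.
    """
    if index < 1:
        raise ValueError("argument index is 1-based")
    return ebp + 2 * POINTER_SIZE + (index - 1) * POINTER_SIZE


ebp = 0x818F1C44  # ChildEBP of frame 00 in the kvn output
print(hex(stack_argument_address(ebp, 2) - ebp))
```

For the second parameter the offset works out to 0xC, which is why poi(@ebp+c) yields the packet pointer 88727538 seen in the Args to Child column of frame 00.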

     

    Verbosity 4 shows the buffer descriptors, including the pointer to the next buffer descriptor, if any.

    ndis.h, available with the Windows Driver Kit (WDK) or the Driver Development Kit (DDK), has the definitions for NdisPacketFlags.

    Verbosity 5 is the most interesting: it displays the data contents of the NDIS packet, as shown below.

     

     

    kd> !pkt poi(@ebp+c) 5

    NDIS_PACKET at 88727538

           MDL = 886f3ca0

                  StartVa ffffffff886f3000, ByteCount 0x36, ByteOffset 0xd06, NB MdlOffset 0x0

           886f3d06:  00 13 5f 0b ef ca 00 15 5d 50 f3 34 08 00 45 00

           886f3d16:  03 23 04 44 40 00 80 06 ce 2e 41 34 50 de 41 34

           886f3d26:  52 1c c0 51 00 50 6d f9 e6 c2 9a 14 d4 99 50 18

           886f3d36:  40 29 0e e5 00 00

                  MDL = 87f3aad0

                  StartVa ffffffff886f4000, ByteCount 0x2fb, ByteOffset 0x40, NB MdlOffset 0x0

           886f4040:  47 45 54 20 68 74 74 70 3a 2f 2f 77 77 77 2e 6d

           886f4050:  73 6e 2e 63 6f 6d 2f 61 6a 61 78 2f 48 2e 61 73

           886f4060:  70 78 20 48 54 54 50 2f 31 2e 31 0d 0a 41 63 63

           886f4070:  65 70 74 3a 20 2a 2f 2a 0d 0a 41 63 63 65 70 74

           886f4080:  2d 4c 61 6e 67 75 61 67 65 3a 20 65 6e 2d 75 73

           886f4090:  0d 0a 52 65 66 65 72 65 72 3a 20 68 74 74 70 3a

           886f40a0:  2f 2f 77 77 77 2e 6d 73 6e 2e 63 6f 6d 2f 0d 0a

           886f40b0:  55 41 2d 43 50 55 3a 20 78 38 36 0d 0a 41 63 63

           886f40c0:  65 70 74 2d 45 6e 63 6f 64 69 6e 67 3a 20 67 7a

           886f40d0:  69 70 2c 20 64 65 66 6c 61 74 65 0d 0a 55 73 65

           886f40e0:  72 2d 41 67 65 6e 74 3a 20 4d 6f 7a 69 6c 6c 61

           886f40f0:  2f 34 2e 30 20 28 63 6f 6d 70 61 74 69 62 6c 65

           886f4100:  3b 20 4d 53 49 45 20 37 2e 30 3b 20 57 69 6e 64

           886f4110:  6f 77 73 20 4e 54 20 36 2e 30 3b 20 53 4c 43 43

           886f4120:  31 3b 20 2e 4e 45 54 20 43 4c 52 20 32 2e 30 2e

           886f4130:  35 30 37 32 37 3b 20 2e 4e 45 54 20 43 4c 52 20

           886f4140:  33 2e 30 2e 30 34 35 30 36 29 0d 0a 48 6f 73 74

           886f4150:  3a 20 77 77 77 2e 6d 73 6e 2e 63 6f 6d 0d 0a 50

           886f4160:  72 6f 78 79 2d 43 6f 6e 6e 65 63 74 69 6f 6e 3a

           886f4170:  20 4b 65 65 70 2d 41 6c 69 76 65 0d 0a 43 6f 6f

           886f4180:  6b 69 65 3a 20 4d 43 31 3d 56 3d 33 26 47 55 49

           886f4190:  44 3d 31 61 31 61 65 64 38 31 38 36 34 30 34 32

           886f41a0:  39 62 61 63 32 37 38 61 65 37 65 39 63 38 35 39

           886f41b0:  64 39 3b 20 6d 68 3d 4d 53 46 54 3b 20 43 55 4c

           886f41c0:  54 55 52 45 3d 45 4e 2d 55 53 3b 20 4d 55 49 44

           886f41d0:  3d 32 43 38 33 35 42 36 32 34 35 42 46 34 46 30

           886f41e0:  36 38 36 36 43 34 37 38 38 42 46 39 30 43 35 38

           886f41f0:  32 3b 20 7a 69 70 3d 7a 3a 45 43 31 7c 6c 61 3a

           886f4200:  35 31 2e 35 31 32 32 32 31 38 38 7c 6c 6f 3a 30

           886f4210:  7c 63 3a 47 42 7c 68 72 3a 31 3b 20 46 6c 69 67

           886f4220:  68 74 47 72 6f 75 70 49 64 3d 34 37 3b 20 46 6c

           886f4230:  69 67 68 74 49 64 3d 42 61 73 65 50 61 67 65 3b

           886f4240:  20 75 73 68 70 73 76 72 3d 4d 3a 35 7c 46 3a 35

           886f4250:  7c 54 3a 35 7c 45 3a 35 7c 44 3a 62 6c 75 7c 57

           886f4260:  3a 46 7c 50 3a 4e 7c 56 3a 30 3b 20 75 73 68 70

           886f4270:  63 6c 69 3d 30 7c 48 2e 30 2e 31 7c 47 2e 30 2e

           886f4280:  31 7c 5a 2e 30 2e 31 7c 52 2e 30 2e 31 2e 63 61

           886f4290:  70 7c 43 2e 30 2e 31 2e 6c 67 3a 6e 65 77 79 6f

           886f42a0:  72 6b 6e 79 7c 4c 2e 30 2e 31 2e 4c 4e 3a 57 4e

           886f42b0:  42 43 3b 20 75 73 68 70 77 65 61 3d 77 63 3a 55

           886f42c0:  53 4e 59 30 39 39 36 3b 20 75 73 68 70 70 72 3d

           886f42d0:  43 3a 31 3a 30 38 30 37 32 31 7c 53 3a 31 3a 30

           886f42e0:  38 30 38 30 36 3b 20 68 70 63 6c 69 3d 57 2e 48

           886f42f0:  7c 4c 2e 7c 53 2e 7c 52 2e 7c 55 2e 4c 7c 43 2e

           886f4300:  3b 20 68 70 73 76 72 3d 4d 3a 35 7c 46 3a 35 7c

           886f4310:  54 3a 35 7c 45 3a 35 7c 44 3a 62 6c 75 7c 57 3a

           886f4320:  46 3b 20 68 70 6f 6c 79 3d 4f 3a 31 7c 48 3a 31

           886f4330:  3b 20 77 70 76 3d 30 0d 0a 0d 0a

    The output above shows the starting virtual address of each buffer, the buffer's byte offset into the page pointed to by the virtual address, and the total number of bytes in the buffer.
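The first MDL is 0x36 (54) bytes long, which is the size of an Ethernet header (14 bytes) plus an IPv4 header and a TCP header without options (20 bytes each). As a sketch, and assuming that is indeed what the buffer holds, the bytes can be decoded with Python's struct module. `parse_headers` is a hypothetical helper written for this example.

```python
import socket
import struct

# The 0x36 bytes of the first MDL, copied from the !pkt output above.
HEADER_BYTES = bytes.fromhex(
    "00 13 5f 0b ef ca 00 15 5d 50 f3 34 08 00 45 00 "
    "03 23 04 44 40 00 80 06 ce 2e 41 34 50 de 41 34 "
    "52 1c c0 51 00 50 6d f9 e6 c2 9a 14 d4 99 50 18 "
    "40 29 0e e5 00 00"
)


def parse_headers(frame: bytes) -> dict:
    """Decode Ethernet II + IPv4 + TCP headers (assumes that exact layout)."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    (ver_ihl, _tos, total_len, _ident, _frag, ttl, proto, _csum,
     src, dst) = struct.unpack("!BBHHHBBH4s4s", frame[14:34])
    ip_hdr_len = (ver_ihl & 0x0F) * 4
    tcp_start = 14 + ip_hdr_len
    (sport, dport, _seq, _ack, off_flags, _win, _tcsum,
     _urg) = struct.unpack("!HHIIHHHH", frame[tcp_start:tcp_start + 20])
    tcp_hdr_len = (off_flags >> 12) * 4
    return {
        "ethertype": ethertype,
        "ttl": ttl,
        "protocol": proto,
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "src_port": sport,
        "dst_port": dport,
        # Bytes of TCP payload that follow the headers.
        "payload_len": total_len - ip_hdr_len - tcp_hdr_len,
    }


print(parse_headers(HEADER_BYTES))
```

Decoded this way, the destination port is 80 and the TCP payload length is 0x2fb bytes, which matches the ByteCount of the second MDL.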

     

    Looking at the contents of the memory buffer more closely:

     

    kd> dc 886f4040 886f4330+0xc

    886f4040  20544547 70747468 772f2f3a 6d2e7777  GET http://www.m

    886f4050  632e6e73 612f6d6f 2f78616a 73612e48  sn.com/ajax/H.as

    886f4060  48207870 2f505454 0d312e31 6363410a  px HTTP/1.1..Acc

    886f4070  3a747065 2a2f2a20 63410a0d 74706563  ept: */*..Accept

    886f4080  6e614c2d 67617567 65203a65 73752d6e  -Language: en-us

    886f4090  65520a0d 65726566 68203a72 3a707474  ..Referer: http:

    886f40a0  77772f2f 736d2e77 6f632e6e 0a0d2f6d  //www.msn.com/..

    886f40b0  432d4155 203a5550 0d363878 6363410a  UA-CPU: x86..Acc

    886f40c0  2d747065 6f636e45 676e6964 7a67203a  ept-Encoding: gz

    886f40d0  202c7069 6c666564 0d657461 6573550a  ip, deflate..Use

    886f40e0  67412d72 3a746e65 7a6f4d20 616c6c69  r-Agent: Mozilla

    886f40f0  302e342f 6f632820 7461706d 656c6269  /4.0 (compatible

    886f4100  534d203b 37204549 203b302e 646e6957  ; MSIE 7.0; Wind

    886f4110  2073776f 3620544e 203b302e 43434c53  ows NT 6.0; SLCC

    886f4120  2e203b31 2054454e 20524c43 2e302e32  1; .NET CLR 2.0.

    886f4130  32373035 2e203b37 2054454e 20524c43  50727; .NET CLR

    886f4140  2e302e33 30353430 0a0d2936 74736f48  3.0.04506)..Host

    886f4150  7777203a 736d2e77 6f632e6e 500a0d6d  : www.msn.com..P

    886f4160  79786f72 6e6f432d 7463656e 3a6e6f69  roxy-Connection:

    886f4170  65654b20 6c412d70 0d657669 6f6f430a   Keep-Alive..Coo

    886f4180  3a65696b 31434d20 333d563d 49554726  kie: MC1=V=3&GUI

    886f4190  61313d44 64656131 36383138 32343034  D=1a1aed81864042

    886f41a0  63616239 61383732 39653765 39353863  9bac278ae7e9c859

    886f41b0  203b3964 4d3d686d 3b544653 4c554320  d9; mh=MSFT; CUL

    886f41c0  45525554 2d4e453d 203b5355 4449554d  TURE=EN-US; MUID

    886f41d0  3843323d 36423533 42353432 30463446  =2C835B6245BF4F0

    886f41e0  36363836 38373443 39464238 38354330  6866C4788BF90C58

    886f41f0  7a203b32 7a3d7069 3143453a 3a616c7c  2; zip=z:EC1|la:

    886f4200  352e3135 32323231 7c383831 303a6f6c  51.51222188|lo:0

    886f4210  473a637c 72687c42 203b313a 67696c46  |c:GB|hr:1; Flig

    886f4220  72477468 4970756f 37343d64 6c46203b  htGroupId=47; Fl

    886f4230  74686769 423d6449 50657361 3b656761  ightId=BasePage;

    886f4240  68737520 72767370 353a4d3d 353a467c   ushpsvr=M:5|F:5

    886f4250  353a547c 353a457c 623a447c 577c756c  |T:5|E:5|D:blu|W

    886f4260  507c463a 567c4e3a 203b303a 70687375  :F|P:N|V:0; ushp

    886f4270  3d696c63 2e487c30 7c312e30 2e302e47  cli=0|H.0.1|G.0.

    886f4280  2e5a7c31 7c312e30 2e302e52 61632e31  1|Z.0.1|R.0.1.ca

    886f4290  2e437c70 2e312e30 6e3a676c 6f797765  p|C.0.1.lg:newyo

    886f42a0  796e6b72 302e4c7c 4c2e312e 4e573a4e  rkny|L.0.1.LN:WN

    886f42b0  203b4342 70687375 3d616577 553a6377  BC; ushpwea=wc:U

    886f42c0  30594e53 3b363939 68737520 3d727070  SNY0996; ushppr=

    886f42d0  3a313a43 37303830 537c3132 303a313a  C:1:080721|S:1:0

    886f42e0  30383038 68203b36 696c6370 482e573d  80806; hpcli=W.H

    886f42f0  7c2e4c7c 527c2e53 2e557c2e 2e437c4c  |L.|S.|R.|U.L|C.

    886f4300  7068203b 3d727673 7c353a4d 7c353a46  ; hpsvr=M:5|F:5|

    886f4310  7c353a54 7c353a45 6c623a44 3a577c75  T:5|E:5|D:blu|W:

    886f4320  68203b46 796c6f70 313a4f3d 313a487c  F; hpoly=O:1|H:1

    886f4330  7077203b 0d303d76 2e0a0d0a 2e6e736d  ; wpv=0.....msn.

     

    So this packet contains HTTP traffic for MSN! (Very true: I had the MSN site open while I was debugging this machine :) )
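Note that dc prints memory as 32-bit values, so on this little-endian x86 machine the bytes inside each DWORD appear reversed relative to the ASCII column. A minimal Python sketch of that conversion, using a hypothetical helper `dc_dwords_to_text`:

```python
import struct


def dc_dwords_to_text(dwords: str) -> str:
    """Turn the hex DWORD columns of a dc line back into ASCII text."""
    values = [int(word, 16) for word in dwords.split()]
    # "<" packs each DWORD little-endian, restoring the in-memory byte order.
    raw = struct.pack(f"<{len(values)}I", *values)
    # Like the debugger, show non-printable bytes as dots.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# DWORDs copied from the first line of the dc output above.
print(dc_dwords_to_text("20544547 70747468 772f2f3a 6d2e7777"))
```

For the first line of the dump this reproduces the text in the ASCII column, "GET http://www.m".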

     

    A list of all NDIS packet pools can be displayed with the pktpools option. Each pool holds a set of NDIS packets.

     

    kd> !pktpools

    Pool      Allocator  BlocksAllocated  BlockSize  PktsPerBlock  PacketLength

    87faa268  8b88d1a1   0x6         0x1000  0x13         0xd0   dc21x4vm!AllocateAdapterMemory+16b

    87f6f4e0  8b88d1a1   0x6         0x1000  0x13         0xd0   dc21x4vm!AllocateAdapterMemory+16b

    87fbf2f0  8b88d1a1   0x6         0x1000  0x13         0xd0   dc21x4vm!AllocateAdapterMemory+16b

    87f4eb40  8b88d1a1   0x6         0x1000  0x13         0xd0   dc21x4vm!AllocateAdapterMemory+16b

    87e19620  8174d66c   0x1         0x1000  0x12         0xd8   ndis!DriverEntry+43d

    87e19670  8174d65a   0x1         0x1000  0x13         0xd0   ndis!DriverEntry+42b
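The columns can be sanity-checked with a little arithmetic: the number of packets per block times the packet length has to fit in the block size, and the number of blocks times the packets per block gives the number of packet slots in the pool. The Python sketch below does this for three of the rows. What the leftover bytes in each block are used for is not shown in this output, so the "spare" figure is only the remainder of that calculation.

```python
# (pool, blocks_allocated, block_size, pkts_per_block, packet_length)
# copied from the !pktpools output above.
POOLS = [
    (0x87FAA268, 0x6, 0x1000, 0x13, 0xD0),
    (0x87E19620, 0x1, 0x1000, 0x12, 0xD8),
    (0x87E19670, 0x1, 0x1000, 0x13, 0xD0),
]


def pool_summary(blocks, block_size, pkts_per_block, packet_length):
    """Return (total packet slots, bytes per block not used by packets)."""
    slots = blocks * pkts_per_block
    spare = block_size - pkts_per_block * packet_length
    return slots, spare


for pool, *fields in POOLS:
    slots, spare = pool_summary(*fields)
    print(f"{pool:08x}: {slots} packets, {spare} spare bytes per block")
```

In each row the packets fit within one 0x1000-byte block with a small remainder, and one more packet of the listed length would not fit.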

     

     A list of the NDIS packets in a packet pool can be displayed with the findpacket option.  The pool address 87e19670 was obtained from the pktpools output above.

     

    kd> !findpacket p 87e19670

     

    Searching Free block <0x88727000>

    Packet at 0x88727538

     

    0x88727538 is our HTTP packet shown above.

     

    Another variant of findpacket finds an NDIS packet from a virtual address inside the packet buffer. The address 886f4110 was picked at random from the packet buffer with the HTTP contents above.

     

    kd> !findpacket v 886f4110

     

    Searching Free block <0x881a3000>

     

    Searching Used block <0x88185000>

     

    Searching Used block <0x88187000>

     

    Searching Used block <0x8818b000>

     

    Searching Used block <0x8818f000>

     

    Searching Used block <0x88193000>

     

    Searching Free block <0x88176000>

     

    <Snip>

     

    Searching Free block <0x88727000>

     

    Packet found

    Packet at 0x88727538

     

    Packet.Private

      PhysicalCount       00000000  Total Length        00000000

      Head                00000000  Tail                00000000

      Pool                00000000  Count               00000000

      Flags               00000000  ValidCounts         00

      NdisPacketFlags     00000000  NdisPacketOobOffset 0000

     

          Private.Flags          : 00000082

                         DONT_LOOPBACK,

          Private.NdisPacketFlags: 90

                         fPACKET_PENDING, fPACKET_CLEAR_ITEMS, fPACKET_ALLOCATED_BY_NDIS

     

    I hope this gives the reader a better understanding of NDIS stacks.

     

     

  • Ntdebugging Blog

    Red alert! My Server is hung - what do I do?


    So you have a dump from a hung server and you’re the first person on the scene. Your IT Manager is jumping up and down, the phone is ringing off the hook and people are hovering outside your cube.  It’s game time and the pressure is on!!!  Now what do you do? 

     

    Well, take a deep breath, get a cup of coffee, and relax because I’m here to help you out!  Let me share what we typically do on our first pass through a hung server kernel debug.  This works for both live debugs and dumps. These are steps you can take and they will find problems!

     

    Here’s something else to consider.  If the server is mission critical you will probably want to get a dump vs. a live debug so you can get the server back up and running.  This will take the pressure off because you can then do the debug offline, and if need be, send the dump to other people for review.

     

    Before we get started, let me state that the following data is completely fabricated and many of the process names and addresses in this output have been made up.  Do not question odd offsets or alignments.

     

    I’m also assuming that you know how to

     

    1.       Collect a kernel dump: http://support.microsoft.com/kb/244139

     

    2.       Set up the debugger: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx

     

    3.       Use the symbol server: http://support.microsoft.com/kb/311503

     

     

    0)      Before I start these types of debugs I like to open a log file.

     

    1: kd> .logopen H:\repro\hungserver.log

    Opened log file 'H:\repro\hungserver.log'

     

     

    1)      !vm - Look for memory usage.  Generally speaking you want to look at what the current pool or memory usage values are and compare them to the max available.

     

     

    1: kd> !vm

     

     

    *** Virtual Memory Usage ***

          Physical Memory:      982890 (   3931560 Kb)

          Page File: \??\P:\pagefile.sys

            Current:   3931560 Kb  Free Space:   3742548 Kb

            Minimum:   3931560 Kb  Maximum:      4193280 Kb

          Available Pages:      631300 (   2525200 Kb)

          ResAvail Pages:       888171 (   3552684 Kb)

          Locked IO Pages:         195 (       780 Kb)

          Free System PTEs:     202830 (    811324 Kb) < THIS IS OK

          Free NP PTEs:          32765 (    131060 Kb) < THIS IS OK

          Free Special NP:           0 (         0 Kb)

          Modified Pages:          241 (       964 Kb)

          Modified PF Pages:       241 (       964 Kb)

          NonPagedPool Usage:    11377 (     45508 Kb) < THIS IS OK

          NonPagedPool Max:      65536 (    262144 Kb) 

          PagedPool 0 Usage:      6398 (     25592 Kb)

          PagedPool 1 Usage:      2201 (      8804 Kb)

          PagedPool 2 Usage:      2216 (      8864 Kb)

          PagedPool 3 Usage:      2179 (      8716 Kb)

          PagedPool 4 Usage:      2199 (      8796 Kb)

          PagedPool Usage:       15193 (     60772 Kb) < THIS IS OK

          PagedPool Maximum:     67584 (    270336 Kb)

          Shared Commit:         24569 (     98276 Kb)

          Special Pool:              0 (         0 Kb)

          Shared Process:        12519 (     50076 Kb)

          PagedPool Commit:      15252 (     61008 Kb)

          Driver Commit:          2083 (      8332 Kb)

          Committed pages:      313611 (   1254444 Kb) < THIS IS OK

          Commit limit:        1925815 (   7703260 Kb)

     

    Check to see if any apps are using tons of memory.  In this case I don’t see a problem.

     

          Total Private:        239673 (    958692 Kb)

             36b0 EXCEL.EXE        10775 (     43100 Kb) < THIS IS OK, etc

             2ee8 myapploc.exe     10288 (     41152 Kb)

             097c MySSrv.exe        7497 (     29988 Kb)

             0418 MyFun32.exe       6277 (     25108 Kb)

             0474 svchost.exe       6164 (     24656 Kb)

             1be8 ABCDEFGH.EXE      4984 (     19936 Kb)

             0480 IEXPLORE.EXE      4924 (     19696 Kb)

             09c4 ANOTHER.exe       4768 (     19072 Kb)

             19a4 HMMINTER.exe      4207 (     16828 Kb)

             1b30 ohboya.EXE        4146 (     16584 Kb)

             4558 aprocess.EXE      4138 (     16552 Kb)

             30e8 another.exe       3691 (     14764 Kb)

             0924 aservicec.exe     3508 (     14032 Kb)

             0854 RRXXc.exe         3400 (     13600 Kb)

             3458 MYWIN.EXE         3389 (     13556 Kb)

             0d90 FunService.exe    3298 (     13192 Kb)

             1180 CustomAp.exe      3221 (     12884 Kb)

             06ac XYZvrver.exe      2769 (     11076 Kb)

             2cdc ABCDEFGH.exe      2591 (     10364 Kb)

             02f4 lsass.exe         2567 (     10268 Kb)

             21b4 IEXPLORE.EXE      2516 (     10064 Kb)

             3420 Process.exe       2450 (      9800 Kb)

             4cd4 XYZXY.EXE         2305 (      9220 Kb)

             4a30 lookup.EXE        2244 (      8976 Kb)

             4360 Process.exe       2201 (      8804 Kb)

             0564 spoolsv.exe       2166 (      8664 Kb)

             2e5c XYZXYZEXE         2076 (      8304 Kb)

             02bc winlogon.exe      1964 (      7856 Kb)

             4e48 winlogon.exe      1958 (      7832 Kb)

             42bc ABCDEFGH.exe      1943 (      7772 Kb)

             0eb8 svchost.exe       1922 (      7688 Kb)

             3b98 Process.exe       1919 (      7676 Kb)

             4c1c IEXPLORE.EXE      1864 (      7456 Kb)

             17b8 winlogon.exe      1852 (      7408 Kb)

             3124 winlogon.exe      1849 (      7396 Kb)

             14b8 winlogon.exe      1847 (      7388 Kb)

             32cc winlogon.exe      1843 (      7372 Kb)

             1f84 winlogon.exe      1843 (      7372 Kb)

             2ebc winlogon.exe      1842 (      7368 Kb)

             1548 winlogon.exe      1840 (      7360 Kb)

             21c4 PROCESS213.EXE    1833 (      7332 Kb)

             3b58 MYWIN.EXE         1817 (      7268 Kb)

             4b3c winlogon.exe      1816 (      7264 Kb)

     

    NOTE: If you see high pool values, issue a !poolused 2 and a !poolused 4 to dump out the pool usage so you can see which pool tags are consuming pool.  (We will write a dedicated blog on this topic later.)
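Comparing usage against the maximum is easier with percentages. Here is a minimal Python sketch that pulls the page counts out of saved !vm text and computes usage as a share of each limit. The helper names parse_vm and usage_percent are hypothetical, and the sketch deliberately applies no alert threshold because none is given above.

```python
import re

# Lines copied from the !vm output above.
VM_OUTPUT = """
      NonPagedPool Usage:    11377 (     45508 Kb)
      NonPagedPool Max:      65536 (    262144 Kb)
      PagedPool Usage:       15193 (     60772 Kb)
      PagedPool Maximum:     67584 (    270336 Kb)
      Committed pages:      313611 (   1254444 Kb)
      Commit limit:        1925815 (   7703260 Kb)
"""


def parse_vm(text):
    """Map each '<name>: <pages> (<n> Kb)' line to its page count."""
    values = {}
    for m in re.finditer(r"^\s*([^:\n]+):\s+(\d+)\s+\(", text, re.MULTILINE):
        values[m.group(1).strip()] = int(m.group(2))
    return values


def usage_percent(values, used_key, max_key):
    """Return usage as a percentage of the corresponding limit."""
    return 100.0 * values[used_key] / values[max_key]


vm = parse_vm(VM_OUTPUT)
for used, limit in [
    ("NonPagedPool Usage", "NonPagedPool Max"),
    ("PagedPool Usage", "PagedPool Maximum"),
    ("Committed pages", "Commit limit"),
]:
    print(f"{used}: {usage_percent(vm, used, limit):.1f}% of {limit}")
```

With the fabricated numbers above, all three ratios come out well under a quarter of their limits, which matches the "THIS IS OK" annotations.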

     

     

    2) !sysptes - See if one of the lists is low (less than 10)

     

     

    1: kd> !sysptes

     

    All of these are ok

     

    System PTE Information

      Total System Ptes 224223

         SysPtes list of size 1 has 225 free

         SysPtes list of size 2 has 57 free

         SysPtes list of size 4 has 136 free

         SysPtes list of size 8 has 59 free

         SysPtes list of size 16 has 95 free

     

        starting PTE: c022b000

        ending PTE:   c03dff78

     

      free blocks: 652   total free: 202831    largest free block: 191973
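The "less than 10" check is easy to automate when you have many dumps to triage. The following Python sketch scans saved !sysptes text and reports any list whose free count is below the threshold. The function name low_pte_lists is hypothetical; the threshold of 10 is the rule of thumb from this step.

```python
import re

# Lines copied from the !sysptes output above.
SYSPTES_OUTPUT = """
     SysPtes list of size 1 has 225 free
     SysPtes list of size 2 has 57 free
     SysPtes list of size 4 has 136 free
     SysPtes list of size 8 has 59 free
     SysPtes list of size 16 has 95 free
"""

LOW_WATER_MARK = 10  # "less than 10", per the rule of thumb in this step


def low_pte_lists(text, threshold=LOW_WATER_MARK):
    """Return [(list_size, free_count)] for every list below the threshold."""
    pattern = re.compile(r"SysPtes list of size (\d+) has (\d+) free")
    return [
        (int(size), int(free))
        for size, free in pattern.findall(text)
        if int(free) < threshold
    ]


low = low_pte_lists(SYSPTES_OUTPUT)
print("Low PTE lists:", low if low else "none")
```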

     

     

    3) !defwrites - If throttling, the server is doing nothing other than writing to the disk.

     

     

    1: kd> !defwrites

    *** Cache Write Throttle Analysis ***

     

          CcTotalDirtyPages:                   187 (     748 Kb)

          CcDirtyPageThreshold:             130560 (  522240 Kb)

          MmAvailablePages:                 631300 ( 2525200 Kb)

          MmThrottleTop:                       450 (    1800 Kb)

          MmThrottleBottom:                     80 (     320 Kb)

          MmModifiedPageListHead.Total:        241 (     964 Kb)

     

    Write throttles not engaged  < THIS IS OK. Good = NOT engaged.

     

     

    4) !ready to see if we're holding stuff up

     

     

    1: kd> !ready

    Processor 0: No threads in READY state  < THIS IS OK

    Processor 1: No threads in READY state  < THIS IS OK

     

    If we had threads in a ready state you would want to investigate what those threads were and what is running on the processor.

     

     

    5) !pcr x; kv on each processor - If they aren't idle then we could be doing DPCs

     

     

    1: kd> !pcr 0  < Dump the processor control registers for CPU 0

    KPCR for Processor 0 at ffdff000:

        Major 1 Minor 1

          NtTib.ExceptionList: ffffffff

              NtTib.StackBase: 00000000

             NtTib.StackLimit: 00000000

           NtTib.SubSystemTib: 80042000

                NtTib.Version: 012e7ace

            NtTib.UserPointer: 00000001

                NtTib.SelfTib: 00000000

     

                      SelfPcr: ffdff000

                         Prcb: ffdff120

                         Irql: 00000000

                          IRR: 00000000

                          IDR: ffffffff

                InterruptMode: 00000000

                          IDT: 8003f400

                          GDT: 8003f000

                          TSS: 80042000

     

                CurrentThread: 8056cd00

                   NextThread: 00000000

                   IdleThread: 8056cd00

     

                    DpcQueue: < NO DPCs: Not much to look at then 

        

    1: kd> !pcr 1  < Dump the processor control registers for CPU 1

    KPCR for Processor 1 at f773f000:

        Major 1 Minor 1

          NtTib.ExceptionList: f5ba1d30

              NtTib.StackBase: 00000000

             NtTib.StackLimit: 00000000

           NtTib.SubSystemTib: f773fef0

                NtTib.Version: 0121925d

            NtTib.UserPointer: 00000002

                NtTib.SelfTib: 7ffda000

     

                      SelfPcr: f773f000

                         Prcb: f773f120

                         Irql: 00000000

                          IRR: 00000000

                          IDR: ffffffff

                InterruptMode: 00000000

                          IDT: f77456e0

                          GDT: f77452e0

                          TSS: f773fef0

     

                CurrentThread: 8963cb90

                   NextThread: 00000000

                   IdleThread: f7741fa0

     

                    DpcQueue: < NO DPCs: Not much to look at then

     

    6) !locks - Look for deadlocks and contention

     

     

    The following output is of interest.

    The thread ID with the <*> next to it has exclusive access to the resource, and all the other threads are waiting on that thread to finish its work. Typically you would !thread that owner thread (e.g., !thread 87bddda0) to see what it is doing. If you have two threads that have exclusive access to two different resources, and these threads are in each other’s exclusive waiters list, you have a deadlock.  The following is an example of what a deadlock might look like.  In this case you would want to !thread each owner and evaluate the logic of the code in each stack that allowed the threads to get into this state.

     

    1: kd> !locks

    **** DUMP OF ALL RESOURCE OBJECTS ****

    KD: Scanning for held locks......

     

    Resource @ 0x8a50ee98    Shared 4 owning threads

         Threads: 896856d0-01<*> 89686778-01<*> 896862d0-01<*> 89685da0-01<*>

    KD: Scanning for held locks............................................................

     

    Resource @ 0x896da1bc    Exclusively owned

         Threads: 896e3b20-01<*>

    KD: Scanning for held locks..

     

     

    Resource @ 0x81234567    Shared 1 owning threads

        Contention Count = 15292

        NumberOfSharedWaiters = 1

        NumberOfExclusiveWaiters = 39

         Threads: 87bddda0-01<*> 806d2020-01 

     

     

         Threads Waiting On Exclusive Access:

                  80ced020       80c036f8       80cdc7a0       80c438b0      

                  80e6cda0       80f96987       8007fd60       8004dc10      

                  80d7b020       80a2dd70       80b89620       80b58020      

                  8036eda0       87abc123       80606da0       8056e890      

                  802b3630       80cc7590       80d64020       80f7dda0      

                  80129580       80b73da0       806d2578       80b505d8      

          

     

    KD: Scanning for held locks................

     

    Resource @ 0x83245678    Exclusively owned

        Contention Count = 4827

        NumberOfExclusiveWaiters = 35

         Threads: 87abc123-01<*>

         Threads Waiting On Exclusive Access:

                  803e6aa0       80876020       80240020       80f56588      

                  808174f0       80bd6b28       80c3c448       8046d6c8      

                  801e8da0       80356518       80b4c978       8069e020      

                  80cb9020       87bddda0       80c65020       86daaac0      

                  80379020       80fe4020      
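Spotting two owners in each other's waiter lists by eye gets tedious when the lists are long. The Python sketch below parses a condensed copy of the !locks output above and reports pairs of threads that each wait on a resource the other owns. The function names are hypothetical, only threads marked <*> are treated as owners, and the parser assumes the line layout shown in this post.

```python
import re

# Condensed from the !locks output above (waiter lists shortened).
LOCKS_OUTPUT = """
Resource @ 0x81234567    Shared 1 owning threads
     Threads: 87bddda0-01<*> 806d2020-01
     Threads Waiting On Exclusive Access:
              80ced020       87abc123       80606da0

Resource @ 0x83245678    Exclusively owned
     Threads: 87abc123-01<*>
     Threads Waiting On Exclusive Access:
              80cb9020       87bddda0       80c65020
"""


def parse_locks(text):
    """Return {resource_address: {"owners": set, "waiters": set}}."""
    resources, current, in_waiters = {}, None, False
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Resource @ (0x[0-9a-f]+)", line, re.IGNORECASE)
        if m:
            current = resources.setdefault(
                m.group(1), {"owners": set(), "waiters": set()}
            )
            in_waiters = False
        elif current is None:
            continue
        elif line.startswith("Threads Waiting On"):
            in_waiters = True
        elif line.startswith("Threads:"):
            # Assumption: only threads flagged <*> count as owners.
            current["owners"].update(
                re.findall(r"([0-9a-f]{8})-\d+<\*>", line, re.IGNORECASE)
            )
        elif in_waiters and re.fullmatch(r"(?:[0-9a-f]{8}\s*)+", line, re.IGNORECASE):
            current["waiters"].update(line.split())
    return resources


def find_deadlock_pairs(resources):
    """Return sorted pairs of threads that block each other."""
    waits_on = {}  # thread -> set of owner threads it is waiting for
    for info in resources.values():
        for waiter in info["waiters"]:
            waits_on.setdefault(waiter, set()).update(info["owners"])
    pairs = set()
    for thread, blockers in waits_on.items():
        for blocker in blockers:
            if blocker != thread and thread in waits_on.get(blocker, set()):
                pairs.add(tuple(sorted((thread, blocker))))
    return sorted(pairs)


for a, b in find_deadlock_pairs(parse_locks(LOCKS_OUTPUT)):
    print(f"Possible deadlock: run !thread {a} and !thread {b}")
```

This only finds two-thread cycles. Longer chains (A waits on B, B on C, C on A) would need a general cycle search over the same waits_on map.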

     

     

     

    8) !process 0 0 - Search for drwtsn32.  This would indicate that a process has crashed and is being dumped, which could cause a server hang.  Look at the PEB for drwtsn32 and get its command line to see what process is being dumped.  You should be able to do this by taking its process address from the !process 0 0 output and doing a .process <process address>;.reload;!peb

     

    The following shows how to extract the command line for any process; the same steps work for Dr. Watson (drwtsn32).

     

    1: kd> .process 89f31020 

    Implicit process is now 89f31020

    1: kd> .reload

    Loading Kernel Symbols

    ...........................................................................................................................................

    Loading User Symbols

    ...............................

    Loading unloaded module list

    ...............

    1: kd> !peb

    PEB at 7ffdf000

        InheritedAddressSpace:    No

        ReadImageFileExecOptions: Yes

        BeingDebugged:            No

        ImageBaseAddress:         01000000

        Ldr                       77fc23a0

        Ldr.Initialized:          Yes

        Ldr.InInitializationOrderModuleList: 00171ef8 . 00176c90

        Ldr.InLoadOrderModuleList:           00171e90 . 00176c80

        Ldr.InMemoryOrderModuleList:         00171e98 . 00176c88

                Base TimeStamp                     Module

             1000000 3e80245d Mar 24 05:41:49 2003 \??\P:\WINDOWS\system32\winlogon.exe

            77f40000 3e802494 Mar 25 05:42:44 2003 P:\WINDOWS\system32\ntdll.dll

            77e40000 44c60ec8 Jul 25 08:30:00 2006 P:\WINDOWS\system32\kernel32.dll

            77ba0000 3e802496 Mar 25 05:42:46 2003 P:\WINDOWS\system32\msvcrt.dll

            77da0000 3e802495 Mar 25 05:42:45 2003 P:\WINDOWS\system32\ADVAPI32.dll

            77c50000 40566fc9 Mar 15 23:08:57 2004 P:\WINDOWS\system32\RPCRT4.dll

            77d00000 45e7bafc Mar 02 00:49:48 2007 P:\WINDOWS\system32\USER32.dll

            77c00000 45e7bafc Mar 02 00:49:48 2007 P:\WINDOWS\system32\GDI32.dll

            75970000 3e8024a2 Mar 25 05:42:58 2003 P:\WINDOWS\system32\USERENV.dll

            75810000 3e8024a3 Mar 25 05:42:59 2003 P:\WINDOWS\system32\NDdeApi.dll

            761b0000 3e8024a0 Mar 25 05:42:56 2003 P:\WINDOWS\system32\CRYPT32.dll

           

        SubSystemData:     00000000

        ProcessHeap:       00070000

        ProcessParameters: 00020000

        WindowTitle:  '< Name not readable >'

        ImageFile:    '\??\P:\WINDOWS\system32\winlogon.exe'

        CommandLine:  'winlogon.exe' < HERE IS THE COMMAND LINE.. No args in this case

     

     

    ( output is truncated ... )

     

    9) Look at the handle table size.  If it’s over 10000 you may have trouble.  If you do have a handle leak, refer to the TalkBack video “Understanding handle leaks and how to use !htrace to find them”.

     

     

    1: kd> !process 0 0

     

    **** NT ACTIVE PROCESS DUMP ****

    PROCESS 8a613270  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000

        DirBase: 0acc0000  ObjectTable: e1001d10  HandleCount: 2510.

        Image: System

     

    PROCESS 8a294328  SessionId: none  Cid: 0274    Peb: 7ffdf000  ParentCid: 0004

        DirBase: ef1ac000  ObjectTable: e14ac1d0  HandleCount: 124.

        Image: smss.exe

     

    PROCESS 8a103424  SessionId: 0  Cid: 02a4    Peb: 7ffdf000  ParentCid: 0274

        DirBase: ed804000  ObjectTable: e18caa68  HandleCount: 1171.

        Image: csrss.exe

     

    PROCESS 8a104343  SessionId: 0  Cid: 02bc    Peb: 7ffdf000  ParentCid: 0274

        DirBase: ed539000  ObjectTable: e18c67b0  HandleCount: 498.

        Image: winlogon.exe

     

    PROCESS 8a0f6634  SessionId: 0  Cid: 02e8    Peb: 7ffdf000  ParentCid: 02bc

        DirBase: ece72000  ObjectTable: e1668e40  HandleCount: 568.

        Image: services.exe

     

    PROCESS 8a123423  SessionId: 0  Cid: 02f4    Peb: 7ffdf000  ParentCid: 02bc

        DirBase: ecd7a000  ObjectTable: e16684a0  HandleCount: 30000. < This is bad

        Image: lsass.exe

     

    PROCESS 89f96453  SessionId: 0  Cid: 03e0    Peb: 7ffdf000  ParentCid: 02e8

        DirBase: eb99c000  ObjectTable: e16bb570  HandleCount: 500.

        Image: svchost.exe

     

    PROCESS 8a0c6532  SessionId: 0  Cid: 042c    Peb: 7ffdf000  ParentCid: 02e8

        DirBase: eb6d7000  ObjectTable: e1731170  HandleCount: 156.

        Image: svchost.exe

     

    PROCESS 8a0a8d88  SessionId: 0  Cid: 0460    Peb: 7ffdf000  ParentCid: 02e8

        DirBase: eb58f000  ObjectTable: e17372e8  HandleCount: 124.

        Image: svchost.exe

     

    PROCESS 89f77678  SessionId: 0  Cid: 0474    Peb: 7ffdf000  ParentCid: 02e8

        DirBase: eb484000  ObjectTable: e17305b8  HandleCount: 1457.

        Image: svchost.exe
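On a server with hundreds of processes it is quicker to filter the list than to read it. Here is a small Python sketch that scans saved !process 0 0 text and returns the processes whose HandleCount is above the 10000 rule of thumb. The function name high_handle_counts is hypothetical, and the regular expression assumes the three-line layout shown above.

```python
import re

# Two entries copied from the !process 0 0 output above.
PROCESS_OUTPUT = """
PROCESS 8a0f6634  SessionId: 0  Cid: 02e8    Peb: 7ffdf000  ParentCid: 02bc
    DirBase: ece72000  ObjectTable: e1668e40  HandleCount: 568.
    Image: services.exe

PROCESS 8a123423  SessionId: 0  Cid: 02f4    Peb: 7ffdf000  ParentCid: 02bc
    DirBase: ecd7a000  ObjectTable: e16684a0  HandleCount: 30000.
    Image: lsass.exe
"""

# Captures the process address, the handle count, and the image name.
ENTRY = re.compile(
    r"PROCESS\s+([0-9a-f]{8}).*?HandleCount:\s*(\d+)\..*?Image:\s*(\S+)",
    re.IGNORECASE | re.DOTALL,
)


def high_handle_counts(text, threshold=10000):
    """Return [(image, process_address, handle_count)] above the threshold."""
    return [
        (image, address, int(count))
        for address, count, image in ENTRY.findall(text)
        if int(count) > threshold
    ]


for image, address, count in high_handle_counts(PROCESS_OUTPUT):
    print(f"{image} (PROCESS {address}) has {count} handles")
```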

     

    9) !process 0 0 system - Check the worker threads in the system process (search for srv! to find server worker threads).  What are these threads doing?  These are the server service threads.  Are they blocked on I/O or waiting for a resource?

     

    10) !process 0 17 csrss.exe - Look for 16 LPC server threads.  What are they doing? Are they blocked?

     

    11) !stacks 2 - This will dump every call stack on the server.  You may need to go through and evaluate every stack on the server.  Look for critical sections, etc.

     

    15) !qlocks - This will allow you to check the state of all the queued spin locks on the machine.  For further information on spinlocks refer to the Windows Internals book.

     

    1: kd> !qlocks

    Key: O = Owner, 1-n = Wait order, blank = not owned/waiting, C = Corrupt

     

                           Processor Number

        Lock Name         0  1    << Nothing to worry about here.

     

    KE   - Dispatcher        

    MM   - Expansion         

    MM   - PFN               

    MM   - System Space      

    CC   - Vacb              

    CC   - Master            

    EX   - NonPagedPool      

    IO   - Cancel            

    EX   - WorkQueue         

    IO   - Vpb                

    IO   - Database          

    IO   - Completion        

    NTFS - Struct            

    AFD  - WorkQueue         

    CC   - Bcb               

    MM   - NonPagedPool     

     

    16) !process 0 17 winlogon.exe - Look for hung LPC calls.  If you find an LPC call calling out of winlogon you can follow the call with the !lpc debugger command. This will allow you to see what the thread is doing in the other process.

     

     

    If you have further questions on any of these commands, please refer to the debugger.chm file in the Windows debugger tools install.

     

    Good luck and happy debugging.

     

    “This debugger is mine, there are many like it but this one is mine!” Jeff Dailey

  • Ntdebugging Blog

    Basics of Debugging Windows


    Hello, this is East again. This blog post is about a topic that we always skip over when discussing debugging: what the debugging tools are and where to find them. I will touch on the different types of debuggers, loading symbols, and the basics of getting started with loading up a dump under your preferred debugger.

    Microsoft currently offers 4 types of debugging tools. With these tools you can remotely debug another machine over FireWire or a serial cable (USB also works, but may not work consistently), as well as debug user-mode processes and dump files.

    Command line debuggers:

    1 ) kd.exe: kernel debugger – Used to review crash dumps created by a blue screen crash event or a stop error. (kd -z <location of dump> -y <location of symbols>)

    2 ) cdb.exe: User mode debugger for reviewing applications, processes, and process dumps (cdb -z <location of dump> -y <location of symbols>)

    3 ) ntsd.exe: CDB and NTSD are virtually identical, except that NTSD spawns a new text window when it is started, whereas CDB inherits the Command Prompt window from which it was invoked.  When I refer to "CDB", it applies to both CDB and NTSD.

    Graphical User Interface Debugger:

    4) Windbg.exe is a GUI based debugger. It can debug the same things as KD & CDB using the same commands. Windbg gives you the ability to have multiple windows open simultaneously  to review source code or other selectable items under the view menu.

    I like using windbg for all of my user and kernel debugging, while  others I work with prefer kd for kernel debugging and cdb for user mode debugging.

     

    There are 32bit and 64bit debuggers available.

    NOTE: Some people use Visual Studio as well, but this blog post will not cover using Visual Studio as a debugger.

     

    You can review applications that already have started on your machine using CDB or Windbg. You can have the problematic application launch under the debugger as well:

    Cdb or Windbg

    -p <pid> specifies the decimal process ID to attach to (use tlist or Task Manager to obtain the PID)

    -psn <name> specifies the process to attach to by service name

    <application to launch> -y <symbol path>

    NOTE: windbg allows you to use menu options as well: select “Attach to a Process” on the File menu to debug a user-mode application that is currently running.

     

    What are dumps?

    Memory dumps are a record of what was in memory and the registers at the time of a crash. There are 3 types of memory dumps:

    ·         Mini dump – A subset of the memory that was in use by the application creating the dump. A mini dump file is written to the %SystemRoot%\Minidump directory by default and is usually less than 1 MB in size.

    ·         Kernel only – This is used to review the machine’s kernel memory at the time of the crash.

    ·         Full/Complete – This is the largest kernel mode dump file. It contains all information from kernel and user mode address spaces that was in physical memory at the time of the dump (about the same size as the physical memory on the box).

    Kernel and Complete Memory Dumps are written to %SystemRoot%\Memory.dmp by default.

    NOTE: The type of dump that will be written upon bugcheck can be configured by right-clicking My Computer -> Properties -> Advanced tab -> Startup and Recovery Settings. In the Write debugging information section, use the first drop-down box to select the type of memory dump you want. (See KB307973)

    Note: You can configure the server to crash using certain keystrokes. This is useful when troubleshooting a hung server or a timing issue. KB244139 explains how to configure your server for a manual crash.

    You can also create dump files from an application or process; these are known as user-mode dumps.  Additional information on these types of dumps can be found in the Debugging Tools for Windows help file.

     

    How do I read a dump file?

    In order to make fast progress with a memory dump file, it is best to load symbol files. Symbol files contain data that the debugger uses to interpret the application or driver code. They may contain:

    -          Global variable names

    -          Function names

    Private Symbols would contain the above information and:

    -          Local variable names

    -          Source-line numbers

    -          Type information for variables, structures, etc.

     Microsoft currently has two ways you can access symbols for the Operating System:

    Service pack download site – You will need to create:

    -          Separate directories for Windows 2000 RTM, Windows 2000 SP1, Windows 2000 SP2, Windows XP RTM, etc.

    -          Separate directories for all of the above for free vs. checked build

    -          Separate directories for hotfix symbols

     

    Public symbol server – uses a symbol store, which is a collection of symbol files. The symbol server uses the time stamp & file size to match up symbols to the active binary.

    After getting your symbol files together, you will need a way to tell the debugger where they are located and set up some other options.

    To set the symbol path do one of the following:

    -          _NT_SYMBOL_PATH environment variable

    -          -y command line option

    -          .sympath (Set Symbol Path) debugger command

    -          WinDbg: File | Symbol File Path dialog, or CTRL+S

    To set the executable image path (needed for minidumps only), do one of the following:

    -          -i command line option

    -          .exepath debugger command

    -          WinDbg: File | Image File Path dialog, or CTRL+I

    To set the source path, do one of the following:

    -          .srcpath debugger command

    -          WinDbg: File | Source File Path dialog, or CTRL+P

    If symbol errors appear when you begin, you can try the commands below to help narrow down the problem:

    !sym noisy — gives verbose symbol information

    AND

    .reload —  to reload all symbols

     

    Also, using srv* in your symbol path tells the debugger to download the symbols it uses and save them to a specific directory (the downstream store):

    srv*DownstreamStore*<symbol locations>

     

    NOTE: You must always use .reload after you change the symbol path or fix a symbol error — the debugger doesn’t automatically reload your symbols!

     

    Now that we are done with the overview, let’s configure our machine as a host computer to open a memory dump.  I will be using the Microsoft public symbol servers and I want to store current symbols locally on my host machine.

    Using windbg I will set my current workspace symbols to: srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols

    Click the menu option File ->Symbol File Path or Ctrl + S. This will bring up an empty box that will allow you to enter or browse to your symbol path.

    If using kd, you want to set an environment variable (_NT_SYMBOL_PATH) under “My Computer properties -> Advanced tab” so that kd always starts with your symbols set to “srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols”, or use this same path on your command line:

    kd -z <path to dump.file> -y srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols
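If you script your dump analysis, the symbol path and the kd command line can be assembled programmatically. The Python sketch below builds the srv*DownstreamStore*server string and the kd argument list without launching anything. The helper names are hypothetical, and the dump path C:\dumps\MEMORY.DMP is a made-up placeholder, not a path from this post.

```python
import os


def symbol_server_path(downstream_store, server_url):
    """Build a srv*<downstream store>*<symbol server> path element."""
    return f"srv*{downstream_store}*{server_url}"


def kd_command(dump_file, symbol_path):
    """Return an argument list for kd (suitable for subprocess.run)."""
    return ["kd", "-z", dump_file, "-y", symbol_path]


SYMBOLS = symbol_server_path(
    r"c:\pubsymbols", "http://msdl.microsoft.com/download/symbols"
)

# Option 1: pass the symbol path on the command line with -y.
# The dump path here is a hypothetical placeholder.
print(kd_command(r"C:\dumps\MEMORY.DMP", SYMBOLS))

# Option 2: prepare an environment with _NT_SYMBOL_PATH set, to hand to
# a child process instead of using -y.
child_env = dict(os.environ, _NT_SYMBOL_PATH=SYMBOLS)
print(child_env["_NT_SYMBOL_PATH"])
```

Passing the arguments as a list avoids quoting problems when the dump path contains spaces.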

     

    NOTE: Windbg will append any workspace symbol path with the one set by the _NT_SYMBOL_PATH environment variable during loading of a memory dump.

    Ok, now we know what debugger we want to use and we know our symbol locations. Let’s open our first kernel memory dump, located on <drive letter> <path to dump file>.

    Using windbg, I will load a dump file using the menu option File -> Open Crash Dump (Ctrl + D) or by dragging the dump file into the debugger; you can even start windbg at the command prompt.  My command would look like this:

    windbg -z "C:\training\case 7f\MEMORY051308.22.DMP"

    I did not use -y for the symbol path, as it is already set in my default workspace or in my environment variable.

    When the debugger first loads a dump file it displays several lines of information before giving you a prompt to get started with your commands (by default):

    Microsoft (R) Windows Debugger Version 6.9.0003.113 X86  <- debugger version

    Copyright (c) Microsoft Corporation. All rights reserved. <- Copyright of the debugger creator

    Loading Dump File [C:\training\case 7f\MEMORY051308.22.DMP] <- location of the dump file loading

    Kernel Summary Dump File: Only kernel address space is available <- type of memory dump (mini, kernel, or full)

    Symbol search path is: srv*c:\pubsymbols*http://msdl.microsoft.com/download/symbols <- Symbol path for this debug session

    Executable search path is:  <- points to the directory where the executable files are located. For most situations this is not needed. For other situations please check the debugger help file.

     

    The next 4 lines show the OS version, the service pack, and how many processors are on the box:

    1 -Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (8 procs) Free x86 compatible

    2 - Product: Server, suite: Enterprise TerminalServer SingleUserTS

    3 - Built by: 3790.srv03_sp2_gdr.070304-2240

    4 - Kernel base = 0x80800000 PsLoadedModuleList = 0x808a6ea8

     

    Next we would see when the machine crashed and how long it was up prior to this crash:

    Debug session time: Wed May 14 01:27:36.768 2008 (GMT-4)

    System Uptime: 0 days 16:32:51.921
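Subtracting the uptime from the debug session time gives the approximate boot time, which is handy when correlating the crash with event logs. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, the result is in the same local time zone as the session time (GMT-4 here), and parsing the day and month names assumes an English locale.

```python
import re
from datetime import datetime, timedelta

# The two lines copied from the debugger banner above.
SESSION_LINE = "Debug session time: Wed May 14 01:27:36.768 2008 (GMT-4)"
UPTIME_LINE = "System Uptime: 0 days 16:32:51.921"


def parse_session_time(line):
    """Return the session time as a naive datetime (zone offset ignored)."""
    m = re.search(
        r"Debug session time:\s+(\w{3} \w{3}\s+\d+ [\d:.]+ \d{4})", line
    )
    return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S.%f %Y")


def parse_uptime(line):
    """Return the uptime as a timedelta."""
    m = re.search(r"System Uptime:\s+(\d+) days (\d+):(\d+):([\d.]+)", line)
    days, hours, minutes, seconds = m.groups()
    return timedelta(
        days=int(days),
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )


boot_time = parse_session_time(SESSION_LINE) - parse_uptime(UPTIME_LINE)
print("Approximate boot time (local to the crashed machine):", boot_time)
```

For this dump the machine had been up a little over sixteen and a half hours, so the boot falls on the morning of May 13.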

     

    After completing the above process, the debugger starts loading the dump file and parsing through the loaded symbols. Here you may notice some warnings for some user space processes which are not included in the kernel dump. This is ok.

    WARNING: Process directory table base BFF0A080 doesn't match CR3 007AF000

    WARNING: Process directory table base BFF0A080 doesn't match CR3 007AF000

    Loading Kernel Symbols

    ...........................................................................................................................................

    Loading User Symbols

    PEB is paged out (Peb.Ldr = 7ffdf00c).  Type ".hh dbgerr001" for details

    Loading unloaded module list

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    1- Use !analyze -v to get detailed debugging information.

    2 - BugCheck 7F, {8, f773ffe0, 0, 0}

    3 - *** ERROR: Module load completed but symbols could not be loaded for ql2300.sy

    The three things I want to point out from above are:

    1 - !analyze -v: This is the debugger command used to help analyze a dump file by reviewing information passed to KeBugCheck including specific parameters of that crash. It will analyze this information and provide a definition of the bugcheck, a stack showing all current function calls, and, when possible, the name of an offending driver or process that the debugger thinks is at fault.  Please review the debugger help file for additional information in this area.

    2 – The type of bugcheck that occurred on the machine.

    3 – An error telling you about symbols missing or not available to help diagnose a particular driver or application. This can lead to a misdiagnosis if you’re not careful.
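The bugcheck line packs the stop code and its four parameters into one string. The Python sketch below splits it apart and looks the values up in two tiny tables. The tables are deliberately incomplete and cover only this dump: stop code 0x7F is UNEXPECTED_KERNEL_MODE_TRAP, and a first parameter of 0x8 indicates a double fault. The function name parse_bugcheck is hypothetical; !analyze -v remains the authoritative decoder.

```python
import re

# The bugcheck line copied from the debugger output above.
BUGCHECK_LINE = "BugCheck 7F, {8, f773ffe0, 0, 0}"

# Illustrative lookup tables; real analysis should rely on !analyze -v.
BUGCHECK_NAMES = {0x7F: "UNEXPECTED_KERNEL_MODE_TRAP"}
TRAP_NAMES = {0x8: "double fault"}  # meaning of parameter 1 for 0x7F


def parse_bugcheck(line):
    """Return (stop_code, [param1, param2, param3, param4]) as integers."""
    m = re.search(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    code = int(m.group(1), 16)
    params = [int(p, 16) for p in m.group(2).split(",")]
    return code, params


code, params = parse_bugcheck(BUGCHECK_LINE)
name = BUGCHECK_NAMES.get(code, "unknown")
print(f"Stop code 0x{code:X} ({name})")
if code == 0x7F:
    print("Trap:", TRAP_NAMES.get(params[0], "not in this table"))
```

The double fault lines up with the nt!KiTrap08 frame that appears at the top of the stack for this dump.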

    Once loading is completed you should be at a kd> prompt. This prompt shows you the current processor you are using (if the machine has more than one).

    For this dump we are at processor 3 on an 8 proc machine:

    3: kd>

     

    To view the call stack at the time of the crash you can use the "k" command. There are multiple forms of this command, each one showing the basic stack plus additional information. As functions execute and call other functions, a call stack is built up in stack memory. Here are two common forms of the command:

     

    3: kd> k

    ChildEBP RetAddr

    00000000 baebf0ce nt!KiTrap08+0x75

    b3a4bffc baebf737 storport!RaCallMiniportInterrupt+0x2

    b3a4c008 8088d889 storport!RaidpAdapterInterruptRoutine+0x1d

    b3a4c008 80a59d8e nt!KiInterruptDispatch+0x49

    b3a4c09c 80a5c2fc hal!HalpGenerateInterrupt+0x1d2

    b3a4c0c0 80a5c44d hal!HalpLowerIrqlHardwareInterrupts+0x108

    b3a4c0d0 808256ed hal!KfLowerIrql+0x59

    <snippet>

     

    3: kd> kb

    ChildEBP RetAddr  Args to Child

    00000000 baebf0ce 00000000 00000000 00000000 nt!KiTrap08+0x75

    b3a4bffc baebf737 97bedb88 b3a4c02c 8088d889 storport!RaCallMiniportInterrupt+0x2

    b3a4c008 8088d889 977b9e18 97bedad0 03010006 storport!RaidpAdapterInterruptRoutine+0x1d

    b3a4c008 80a59d8e 977b9e18 97bedad0 03010006 nt!KiInterruptDispatch+0x49

    b3a4c09c 80a5c2fc 97797004 97bedad0 00000102 hal!HalpGenerateInterrupt+0x1d2

    b3a4c0c0 80a5c44d 00000101 977b9e02 b3a4c0d8 hal!HalpLowerIrqlHardwareInterrupts+0x108

    b3a4c0d0 808256ed b3a4c0e8 baebf1c6 977b9bb0 hal!KfLowerIrql+0x59

    <snippet>

     

    Either one can be used depending on how much information you want to see and can use.
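
To make the columns of the kb output concrete, here is a small Python sketch that splits one frame line into its parts. It assumes the 32-bit layout shown above (ChildEBP, RetAddr, three "Args to Child" values, then the symbol); the function name and dictionary keys are invented for this example, and the argument values are only as trustworthy as the debugger's stack unwinding.

```python
def parse_kb_line(line):
    """Split one `kb` frame line into its columns (32-bit layout assumed)."""
    fields = line.split(None, 5)
    if len(fields) != 6:
        raise ValueError("unexpected kb line: %r" % line)
    child_ebp, ret_addr, arg1, arg2, arg3, symbol = fields
    # The symbol has the form module!function+offset.
    module, _, rest = symbol.partition("!")
    function, _, offset = rest.partition("+")
    return {
        "child_ebp": int(child_ebp, 16),
        "ret_addr": int(ret_addr, 16),
        "args": [int(a, 16) for a in (arg1, arg2, arg3)],
        "module": module,
        "function": function,
        "offset": int(offset, 16) if offset else 0,
    }


frame = parse_kb_line(
    "b3a4bffc baebf737 97bedb88 b3a4c02c 8088d889 "
    "storport!RaCallMiniportInterrupt+0x2"
)
print(frame["module"], frame["function"], hex(frame["offset"]))
print([hex(a) for a in frame["args"]])
```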

    This completes the Basics of Debugging Windows, Part I. I will create a Part II using specific questions gathered from our readers.

     

    Miscellaneous information:

    To go further with this topic I would suggest starting with the debugger help file included with the Microsoft Debugging Tools. 

    ADPlus – An automated way to use the cdb.exe to capture/create a usermode dump when a process hangs or crashes. (more info - http://msdn.microsoft.com/en-us/library/cc265629.aspx or kb286350)

    Public Symbols for Microsoft Operating Systems:

    Microsoft Public Symbol server : srv * DownstreamStore * http://msdl.microsoft.com/download/symbols

    example: srv*c:\mysyms*http://msdl.microsoft.com/download/symbols
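
The symbol path syntax is just three fields joined by asterisks. As a small illustration, the Python helper below (a hypothetical name, not a debugger API) builds the string from a local cache folder and the public symbol server URL given above.

```python
PUBLIC_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def symbol_server_path(downstream_store, server_url=PUBLIC_SYMBOL_SERVER):
    """Build a symbol path of the form srv*DownstreamStore*URL."""
    return "srv*{}*{}".format(downstream_store, server_url)


path = symbol_server_path(r"c:\mysyms")
print(path)
```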

     Microsoft Symbol packages http://www.microsoft.com/whdc/devtools/debugging/symbolpkg.mspx#d

    Use !analyze -v to gather additional information about the bugcheck and a bucket-id for your dump file. The bucket-id can be submitted to Microsoft for review for similar crashes and resolutions. Try using the Microsoft Online Crash Analysis to submit your crash dump bucket-id for possible follow up from Microsoft or for Microsoft to look for trends: http://oca.microsoft.com/en/Welcome.aspx

    For concepts, tools and information about the system architecture:

    http://msdn.microsoft.com/en-us/default.aspx

    Windows Internals, 4th edition (by Mark E. Russinovich & David A. Solomon): the whole book, or Chapter 14 - Crash Dump Analysis

    Advanced Windows Debugging (by Mario Hewardt & Daniel Pravat )

    http://technet.microsoft.com/en-us/default.aspx

  • Ntdebugging Blog

    Transcript of Windows NT Debugging Blog Live Chat


    For those of you that could not make the live chat on 8/13, here is the transcript of the chat session....

     

    Chat Topic: PGES-Windows NT Debugging Blog Live Chat
    Date: Wednesday, August 13, 2008

    Daniel (Moderator):
    Hello everyone-- thanks for coming to our chat on Platforms Global Escalation Services. The chat will officially get started at 1pm Eastern time. Only questions related to this topic will be addressed during this chat. Thanks!

    Daniel (Moderator):
    Hello everyone-- thanks for coming to our chat on Platforms Global Escalation Services. We'll get started in about 10 minutes.  You can start posting your questions now if you'd like and when the chat starts our Experts will begin answering them. Be sure to check the "Ask the Experts" box before you send your questions and please keep all questions on topic-- Thanks!

    Daniel (Moderator):

    Let's get started with our chat. Before we begin, though, I'd like to have our Experts introduce themselves and then they'll get started answering  your questions.

    Smoke [Windows Core] (Expert):
    Hi everyone, I'm an Escalation Engineer with the Windows Core team.  I fix bugs for a living.

    Matthew [MSFT EE] (Expert):
    Hello, I am an Escalation Engineer with the Platforms Global Escalation Services (Windows Core) team.

    East - MSFT EE (Expert):
    I am East, an Escalation Engineer with the Microsoft Platforms Global Escalation Services. (Windows Core)



    Todd Webb - Msft (Expert):
    I am an Escalation Engineer with the Microsoft Platforms Global Escalation Services OEM hardware team...

    David (Expert):
    Hi, I'm an Escalation Engineer with Windows Core - reading code & debugging is my day-to-day.

    stheller (Expert):
    Hi, I'm a new Escalation Engineer with Platforms GES.

    Mr Ninja [MSFT EE] (Expert):
    Hi, I am an Escalation Engineer with Microsoft PGES.  I debug Windows for a living.

    Tate [MSFT EE] (Expert):
    Hi, I’m one of the EE’s on the Windows team.



    Jeff Dailey MSFT EE (Expert):
    Hi, my name is Jeff Dailey, I’m a Senior Escalation Engineer on the Microsoft Platforms Global Escalation Services team.



    Smoke [Windows Core] (Expert):
    Q: How can I track memory allocations through MmAllocateContiguousMemory?
    A: You could try poolhittag on MmCm or a breakpoint on MmAllocateContiguousMemory.  If you go with the breakpoint, you can use a conditional breakpoint that dumps the stack and anything else you need, then 'go' the system.  There will be a perf hit each time you break in.

    Tate [MSFT EE] (Expert):
    Q: For MmAllocateContiguousMemory, will !poolused show the total amount used?
    A: !poolused 2 will show MmCm

    Matthew [MSFT EE] (Expert):
    Q: What's the best way to go about troubleshooting pool corruption dumps?
    A: Special Pool can be used to track down pool corruption problems.  http://msdn.microsoft.com/en-us/library/cc265889.aspx

    a-hstein (Expert):
    Greetings and sorry for the late message.  I am an intern in the GES group.

    Mr Ninja [MSFT EE] (Expert):
    Q: Could you explain the reasons why a memory dump analysis shows an "illegal instruction" exception raised from a valid instruction?
    A: There are many reasons this could happen.  The instruction that was executed may not be what you see due to hardware problems such as a bit flip in the instruction when it was executed.  It is also possible for a hardware problem to cause an exception to be raised on a valid instruction.  Sometimes software, or hardware, may trigger a jump to the middle of an instruction so that the instruction being executed is not what you think it is.  I described a problem where we executed from the middle of an instruction in the blog http://blogs.msdn.com/ntdebugging/archive/2008/04/28/ntdebugging-puzzler-0x00000004-this-didn-t-puzzle-the-debug-ninja-how-about-you.aspx.

    Smoke [Windows Core] (Expert):
    Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
    A: This sounds like a bad idea.  I would expect different ways that this could break (just like you have observed).

    David (Expert):
    Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
    A: Part of the problem is that if ExitThread is called, any pending APCs on that thread's queue are lost.

    Matthew [MSFT EE] (Expert):
    Q: This question is in reference to special pool mentioned already. Is this article essentially the same as the MSDN reference?  http://support.microsoft.com/kb/188831/en-us
    A: The KB article documents enabling special pool via the registry, rather than verifier.  These are two different ways to accomplish the same thing.  Enabling it via the registry is sometimes preferred, since verifier enables additional checks beyond special pool.

    East - MSFT EE (Expert):
    Q: We use APCs to perform certain operations; one of them is to have a thread clean up and exit. Is calling thread exit from an APC recommended? This used to work fine, but with newer service packs we have threads exiting while holding the heap lock!
    A: Would KB254956 help? If not, we would need to follow up with you for more information.

    East - MSFT EE (Expert):
    Is there anything additional you want on the blog that we have not done?






    East - MSFT EE (Expert):
    Q: Is there anything additional you want on the blog that we have not done?



    Jeff Dailey MSFT EE (Expert):
    Q: When is the next Windows Internals exam scheduled? I would like to plan ahead.
    A: The final version of the Windows Internals Exam should be available before December 2008.  I’d like to thank all the community members that participated in the Beta.  Your feedback was very valuable.



    Matthew [MSFT EE] (Expert):
    Q: Will we get more puzzlers on the blog?
    A: We’d like to do more puzzlers, but unfortunately they tend to take a lot of time, so I cannot say for sure when/if we’ll have more.



    Matthew [MSFT EE] (Expert):
    Q: How many of you in the audience are interested in more puzzlers on the ntdebugging blog?

    Smoke [Windows Core] (Expert):
    Q: Are you planning to write a book?
    A: Windows Internals is a great reference book that we all rely upon.  Additionally, you can check out: <http://www.amazon.com/Advanced-Debugging-Addison-Wesley-Microsoft-Technology/dp/0321374460>



    Tate [MSFT EE] (Expert):
    Q: As far as the blog is concerned, I'm more a fan of the case-study type posts where you go through how you troubleshot issues that you have encountered.
    A: So are we!!!

    Smoke [Windows Core] (Expert):
    Q: I'm very interested in puzzlers...
    A: Thanks for the feedback.  We will try to create some more in the future.

    Smoke [Windows Core] (Expert):
    Q: Debugging MPI apps - sometimes a crash happens on remote and the local smpd daemon will terminate the process being debugged. Using the debugger, is there a way to guard from TerminateProcess from the child? I guess that would break some security models.
    A: I'm not sure what MPI is, but this scenario sounds just like a service.  The service control manager will kill the service if it doesn't respond in a timely fashion.  With a service, there is a registry key to extend the timeout.  If such a mechanism isn't available for you, you should consider instrumentation/logging.

    East - MSFT EE (Expert):
    Q: I just skimmed over KB254956; we found APC to work. The issue here is that there are alertable waits in library modules like LSA/NDR/I_RPC calls where our APC fires, which raises a user exception that gets handled and exits; the thread exits holding the heap lock.
    A: We would need to discuss this further offline, how can I contact you?

    Matthew [MSFT EE] (Expert):
    Q: An award of puzzler like next edition of Windows Internals would definitely have my full attention. :)
    A: We'll consider it... thanks for the feedback!

    Jeff Dailey MSFT EE (Expert):
    Q: Have you ever found yourselves with an "unsolvable" case? :P
    A: No case is unsolvable; nothing is truly random.  Some cases may take a very long time to resolve through multiple debugging passes, detailed code review, reverse engineering and multiple iterations of instrumentation.  In the end we find the problem.  



    Daniel (Moderator):

    Just a heads-up --we have about 15 minutes left in today's chat. Be sure to post your questions asap and our Experts will try to answer as many as possible before the chat ends. Thanks.

    Mr Ninja [MSFT EE] (Expert):
    Q: Tri-boot machine - XP, Server 2003 and Server 2000 with 2000 being the last one installed. After awhile, I got an error: "Windows 2000 could not start because the following file is missing or corrupt: \WINDOWS\SYSTEM32\CONFIG\SYSTEMd startup options for"..
    A: That is usually a known issue in Windows 2000 caused by the system hive becoming too large.  We have several KB articles that describe this issue: KB269075, KB306038, KB323148, and KB277222 contain various resolutions you can try.  I have found that most often the steps in KB277222, using scrubber in a shutdown script, resolve this problem.  Starting with Windows 2003 we changed the boot architecture to prevent this problem; KB302594 describes this improvement.

    Tate [MSFT EE] (Expert):
    Q: Do you guys use USB debugging in Vista/2008? Why is that there is still one vendor that sells the debug dongle?
    A: Serial debugging works well enough most times.  Usually we try these alternatives only if we have hardware that doesn't have a serial connection for some reason and only has USB or FireWire...

    East - MSFT EE (Expert):
    Q: I just skimmed over KB254956; we found APC to work. The issue here is that there are alertable waits in library modules like LSA/NDR/I_RPC calls where our APC fires, which raises a user exception that gets handled and exits; the thread exits holding the heap lock.
    A: On a better note it would be best to open a case with Microsoft Support - > <http://support.microsoft.com/> -> Need more help? -> Select a Product to start



    Jeff Dailey MSFT EE (Expert):
    Q: What companies are in attendance today?



    Graham (Expert):
    Q: There are lots of post-mortem debuggers available: Dr Watson, NTSD, windbg, userdump, WER. Which ones do you usually recommend your customers use if you need to be sure to capture a dump from a crash?
    A: Userdump.exe is quite reliable for obtaining post-mortem dumps, and is easy to use.  It and ADPlus (which uses CDB) are good because they attach to the process and monitor exceptions, and can create dumps for times when a JIT debugger would not be able to create a thread in the process to obtain the dump.  Normally, I will set up drwtsn32 first, and if it cannot generate the dump, then I will go to userdump. 

    Smoke [Windows Core] (Expert):
    Q: How can I debug cases in which I just have the minidump for a CPU hog? I tried !runaway and it does not work.
    A: The minidump alone may not be enough information.  You could try to look at the stacks and guess at what is using the CPU, but that requires familiarity with the application.  You should capture a circular perfmon log with thread data.  Then get 3-5 dumps of the app.  From the perfmon log, you'll see what threads are active (and their activity profile).  From the dumps, you'll have a few snapshots of the process in motion.  Alternatively you could try a profiler like xperf.

    David (Expert):
    Q: Are there any free code coverage tools on Windows?
    A: This article describes how to obtain code coverage data:

    David (Expert):
    A: http://msdn.microsoft.com/en-us/library/ms182496.aspx

    stheller (Expert):
    http://www.microsoft.com/whdc/devtools/tools/prefast.mspx discusses the PREfast static source code analysis tool

    East - MSFT EE (Expert):
    Q: Are there any free code coverage tools on Windows?
    A: Please keep watching our blog site for the next chat - <http://blogs.msdn.com/ntdebugging>  or you can submit the question to our blog site

    Daniel (Moderator):
    Well we're out of time for today's chat. Thank you very much to all of our guests who joined us today as well as to our Experts for answering so  many great questions. Have a great day!

     

  • Ntdebugging Blog

    New Facebook group: “Escalation Engineers”


    Are you the final tier of escalation at a company or group that supports software?
    Are you fluent in assembly, C, C++, etc?

    Are you the voice of reason in critical situations?

    Do you spend more time debugging other people’s code than writing your own?

     

     

    If you answered the above questions with “yes”, then this new Facebook group is for you. "Escalation Engineers" http://www.facebook.com/group.php?gid=23477747996

    Jeff-

  • Ntdebugging Blog

    How to Access the User Mode Debugger from the Kernel Debugger


    In certain cases you may want to use a user mode debugger to debug a process from within the kernel debugger.    It could be that you have an application that loads a kernel mode driver, and you want to be able to debug the user mode side of the application and then break into the kernel to follow the calls made into the kernel.

    Here is how you do it!

    ·         Attach the kernel debugger via a serial cable (Null modem cable), USB cable or FireWire cable, and have your machine configured to be kernel debugged. The article located at  http://support.microsoft.com/kb/151981  is a good reference for pre-Vista systems.  To enable the debug options on Vista or Windows 2008 you must use bcdedit.exe because those OSes no longer use a boot.ini file. Here’s an example:

     

    bcdedit /debug {<guid>} <ON | OFF>
    bcdedit /dbgsettings SERIAL DEBUGPORT:1 BAUDRATE:115200

     

    ·         Add a new debugger key to the “Image File Execution Options” for your process.  In this case we will use notepad.exe as the target process. The new key will look like this:

     

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\notepad.exe

     

    ·         Add a string value under this key called “debugger” that contains the value “ntsd -d”.

     

    ·         The -d option redirects the output of NTSD to the kernel debugger allowing remote control via the kernel debugger.

     

    ·         With the existence of this new key, the user mode debugger will automatically start and attach to your process when Notepad.exe starts.  Note: It’s important to remove the registry entry when you’re finished debugging.

     

    ·         You can now issue any standard NTSD Command via the kernel debugger.

     

    ·         When you are ready to break into the kernel and run under the kernel debugger simply type .breakin
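
The registry change described in the steps above can be captured in a .reg file so that it is easy to apply and, just as importantly, easy to remove afterwards. The Python sketch below only generates the text of such a file; it does not touch the registry. The function name is hypothetical, and the sketch assumes the standard .reg conventions (a "Windows Registry Editor Version 5.00" header, quoted value names, and "=-" to delete a value).

```python
IFEO_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Image File Execution Options"
)


def ifeo_reg_text(image_name, debugger_command=None):
    """Return .reg text that sets the Debugger value for image_name.

    Passing debugger_command=None produces text that deletes the value,
    which is what you want once debugging is finished.
    """
    if debugger_command is None:
        value = "-"
    else:
        # Backslashes and double quotes must be escaped inside .reg strings.
        escaped = debugger_command.replace("\\", "\\\\").replace('"', '\\"')
        value = '"{}"'.format(escaped)
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[{}\\{}]".format(IFEO_KEY, image_name),
        '"Debugger"={}'.format(value),
        "",
    ]
    return "\n".join(lines)


enable_text = ifeo_reg_text("notepad.exe", "ntsd -d")
remove_text = ifeo_reg_text("notepad.exe")
print(enable_text)
print(remove_text)
```

Saving the two outputs as separate files gives one file to enable the debugger for Notepad.exe and one to undo the change.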

     

     

    Jeff- 

  • Ntdebugging Blog

    Windows NT Debugging Blog Live Chat


    Microsoft Platform Global Escalation Services is hosting our first live group debug chat session for the debugging community on August 13, 2008 at 10 AM PT.  We will be focusing on debugging techniques and any questions you may have about anything we’ve previously blogged about.  Also, we will try to cover some topics that were requested here.

     

    Details about the “PGES-Windows NT Debugging Blog Live Chat” can be found here: http://www.microsoft.com/communities/chats/default.mspx

  • Ntdebugging Blog

    How can I find out why the Cluster Resource Monitor dumped – Access Violation


    Hello, my name is John Marlin, and I am a Support Escalation Engineer on the Microsoft Platform Cluster Services Support team.  I wanted to talk about the Windows 2003 Cluster Resource Monitor along with what happens when it crashes as well as how to debug it to find out why it crashed.

     

    We first need to understand what the Cluster Resource Monitor is and does.  The following description of the Cluster Resource Monitor is taken from MSDN.

     

    A Resource Monitor provides a communication, monitoring, and processing layer between the Cluster service and one or more resources. Resource Monitors have the following characteristics:

    ·         A Resource Monitor always runs in a process separate from the Cluster service. If a resource fails, the Resource Monitor isolates the Cluster service from the effects. If the Cluster service fails, the Resource Monitor allows its resources to shut down gracefully.

    ·         To work with a resource, a Resource Monitor loads the resource DLL responsible for that resource type into its process.

    ·         When the Cluster service requests an operation on a resource, the Resource Monitor routes the request to the appropriate entry point function of the resource DLL responsible for the resource. The Resource Monitor performs default processing for some resource operations.

    ·         A Resource Monitor stores synchronized state data, allowing the Cluster service and resource DLLs to operate asynchronously, checking and updating resource status as needed.

    ·         A Resource Monitor periodically checks the operational status of all of its resources. For more information on this process, see Resource Failure.

     

    By default, the Cluster service creates one Resource Monitor per node.

     

    As the article states, by default everything currently running on the node is in the one Resource Monitor.  If the Resource Monitor crashes, the system will dump the Resource Monitor process to a file called RESRCMON.DMP and create a new instance of the process.  Because it must create a new one, all resources in the monitor are gone and need to be restarted.  When this occurs, you will see the following entry in the Windows System Event Log. 

     

    Event ID:  1146

    Source:  ClusSvc

    Description:  The cluster resource monitor died unexpectedly, an attempt will be made to restart it

     

    After this, you could also see other resource failures (Event ID: 1069) as well as disk related events such as Lost Delayed Writes, etc.  You would see the disk related events because the disk(s) would be considered down and since there is data in the cache of the HBA, it has nowhere to write it.  Hence, lost delayed writes exist until the disk is brought back online.  For our examples here, we will ignore these disk related events as we will focus on why the Resource Monitor crashed.

     

    There are a couple of reasons why a Resource Monitor would crash, such as an Access Violation (0xC0000005) or a Deadlock (0xC0000194).  For this blog, we will talk about the Access Violation (0xC0000005).  An Access Violation occurs because a resource tried to do something it wasn’t supposed to, or because it is having an issue starting up.

     

    Along with the above System Event (Event ID: 1146) where the Resource Monitor died, you will see this in the Cluster Log file. 

     

    NOTE:

    The Cluster Log will convert times to Greenwich Mean Time (GMT), so you must ensure you do the proper GMT conversion of time to get to the location in the Cluster Log.

     

    00001d6c.00001b60::2008/03/04-05:28:46.114 ERR  [RM] Exception. Code = 0xc0000005, Address = 0x781449D1

    00001d6c.00001b60::2008/03/04-05:28:46.114 ERR  [RM] Exception parameters: 0, 0, 1003f, 0

    00001d6c.00001b60::2008/03/04-05:28:46.114 INFO [RM] GenerateMemoryDump: Start memory dump to file C:\WINDOWS\Cluster\resrcmon.dmp
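
When searching a large Cluster Log it can help to parse the entries and do the GMT conversion in a script. The Python sketch below is based only on the three sample lines above: it assumes the two hexadecimal fields before the "::" are the process and thread IDs (1d6c matches the process ID shown later by ~*kb), followed by a GMT timestamp, a severity, a component in brackets, and the message. The function names and the UTC-5 offset in the example are assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<stamp>\S+)\s+(?P<level>\w+)\s+\[(?P<component>\w+)\]\s+(?P<message>.*)$"
)


def parse_cluster_log_line(line):
    """Parse one Cluster Log entry into a dictionary (format inferred from samples)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized cluster log line: %r" % line)
    stamp = datetime.strptime(match.group("stamp"), "%Y/%m/%d-%H:%M:%S.%f")
    return {
        "pid": int(match.group("pid"), 16),
        "tid": int(match.group("tid"), 16),
        "time_gmt": stamp.replace(tzinfo=timezone.utc),
        "level": match.group("level"),
        "component": match.group("component"),
        "message": match.group("message"),
    }


def local_to_gmt(local_time, utc_offset_hours):
    """Convert a naive local time (e.g. from the event log) to GMT."""
    gmt = local_time - timedelta(hours=utc_offset_hours)
    return gmt.replace(tzinfo=timezone.utc)


entry = parse_cluster_log_line(
    "00001d6c.00001b60::2008/03/04-05:28:46.114 ERR  [RM] Exception. "
    "Code = 0xc0000005, Address = 0x781449D1"
)
# Hypothetical example: the event log shows 00:28:46 on a node running at UTC-5.
event_time_gmt = local_to_gmt(datetime(2008, 3, 4, 0, 28, 46), -5)
print(hex(entry["pid"]), hex(entry["tid"]), entry["level"], entry["message"])
print(event_time_gmt.isoformat())
```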

     

    Now that we see this entry in the log, we should take a look at the Resource Monitor dump to see what caused the failure.  The first thing to examine is the register states, specifically the ESP (stack pointer) value.

     

    0:023> r

    eax=01bf0000 ebx=000f7b88 ecx=00000007 edx=7c8285ec esi=000f7b60 edi=000f7bb8

    eip=7c8285ec esp=01aed598 ebp=01aed5a8 iopl=0         nv up ei pl zr na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246

    ntdll!KiFastSystemCallRet:

    7c8285ec c3              ret

     

    Starting at the stack pointer address 01aed598, we use the dds command to dump the raw stack.  We are looking for the value on the stack just below the routine resrcmon!GenerateMemoryDump.  It will take several iterations of the dds command to finally get to the value because the call was made much earlier in the stack.

     

    0:023> dds 01aed598

    01aed598  00740061 xpsp2res.dll

     xpsp2res+0x100061

    01aed59c  00720075 xpsp2res+0xe0075

    01aed5a0  00730065 xpsp2res+0xf0065

    01aed5a4  00610000

    *** pages removed ***

    01aedddc  0026afd8

    01aedde0  0026af28

    01aedde4  01aee034                                        <<-- pointer to Exception address stack

    01aedde8  0100e638 resrcmon!GenerateMemoryDump+0x180

    01aeddec  ffffffff

    01aeddf0  00001d6c

    01aeddf4  00000018

     

    Now that we have our value, we will use the kv command with the value 01aee034 to dump out the stack contents.

     

    0:023> kv=01aee034

    ChildEBP RetAddr  Args to Child             

    01aed628 7c826d2b 77e63eb3 000004e0 00080178 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    01aed62c 77e63eb3 000004e0 00080178 6d5b5af6 ntdll!ZwClose+0xc (FPO: [1,0,0])

    01aee034 0100e989 01aee300 01003024 00000000 kernel32!CloseHandle+0x59 (FPO: [Non-Fpo])

    01aee04c 01008b2c 01aee300 01003024 01aee300 resrcmon!GenerateExceptionReport+0x7e (FPO: [Non-Fpo])

    01aee060 76348d17 01aee300 01aee300 01aee080 resrcmon!RmpExceptionFilter+0x14 (FPO: [Non-Fpo])

    01aee070 7786d6d2 01aee300 77ecb7c0 01aee2d8 netshell!__CxxUnhandledExceptionFilter+0x4a (FPO: [Non-Fpo])

    01aee080 77e761b7 01aee300 00000000 00000000 netman!__CxxUnhandledExceptionFilter+0x4a (FPO: [Non-Fpo])

    01aee2d8 77e792a3 01aee300 77e61ac1 01aee308 kernel32!UnhandledExceptionFilter+0x12a (FPO: [Non-Fpo])

    01aee2e0 77e61ac1 01aee308 00000000 01aee308 kernel32!BaseThreadStart+0x4a (FPO: [SEH])

    01aee308 7c828752 01aee3ec 01aeffdc 01aee408 kernel32!_except_handler3+0x61 (FPO: [Uses EBP] [3,0,7])

    01aee32c 7c828723 01aee3ec 01aeffdc 01aee408 ntdll!ExecuteHandler2+0x26

    01aee3d4 7c82855e 01ace000 01aee408 01aee3ec ntdll!ExecuteHandler+0x24

    01aee3d4 781449d1 01ace000 01aee408 01aee3ec ntdll!KiUserExceptionDispatcher+0xe (FPO: [2,0,0]) (CONTEXT @ 01aee408)

    01aee6d0 10006d11 00000000 00f16914 01aeff58 msvcr80!wcslen+0x4 (FPO: [Non-Fpo])

    WARNING: Stack unwind information not available. Following frames may be wrong.

    01aee6f8 10001364 100096e4 00000000 744eecf8 JohnApp!Startup+0x5851

    01aeff78 781329bb 0009a6a8 75b03b60 00000000 JohnApp+0x1364

    01aeffb0 78132a47 00000000 77e64829 00f15d30 msvcr80!_endthreadex+0x3b (FPO: [Non-Fpo])

    01aeffb8 77e64829 00f15d30 00000000 00000000 msvcr80!_endthreadex+0xc7 (FPO: [Non-Fpo])

    01aeffec 00000000 781329e1 00f15d30 00000000 kernel32!BaseThreadStart+0x34 (FPO: [Non-Fpo])

     

    Based on the stack above, the exception filter routines were passed the address 0x01aee300 as their first argument.  This is the pointer to the exception information, which we will use to set the failing context.

     

    0:023> dc 0x01aee300

    01aee300  01aee3ec 01aee408 01aee32c 7c828752  ........,...R..|     <<-- Exception and Context Records

    01aee310  01aee3ec 01aeffdc 01aee408 01aee3c8  ................

    01aee320  01aeff6c 7c828766 01aeffdc 01aee3d4  l...f..|........

    01aee330  7c828723 01aee3ec 01aeffdc 01aee408  #..|............

    01aee340  01aee3c8 77e61a60 00000000 01aee3ec  ....`..w........

    01aee350  01aeffdc 7c8315c2 01aee3ec 01aeffdc  .......|........

    01aee360  01aee408 01aee3c8 77e61a60 00000000  ........`..w....

    01aee370  01aee3ec 00000000 00000000 00000000  ................

     

    The first DWORD is the Exception Record (0x01aee3ec) and the second DWORD is the Context Record (0x01aee408) that holds our true stack where the problem occurred.  From the Exception Record, we can see it is an Access Violation.
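
This layout matches the Win32 EXCEPTION_POINTERS structure, which holds two pointers: ExceptionRecord followed by ContextRecord. As a small illustration, the Python sketch below reads the first line of the dc output and prints the two debugger commands to run next. It assumes a 32-bit dump, and the helper name is invented for this example.

```python
def parse_dc_line(line):
    """Return (address, [four DWORDs]) from one line of `dc` output."""
    fields = line.split()
    address = int(fields[0], 16)
    dwords = [int(f, 16) for f in fields[1:5]]
    return address, dwords


address, dwords = parse_dc_line(
    "01aee300  01aee3ec 01aee408 01aee32c 7c828752  ........,...R..|"
)
# EXCEPTION_POINTERS: first pointer is the exception record, second the context.
exception_record, context_record = dwords[0], dwords[1]
exr_command = ".exr 0x%08x" % exception_record
cxr_command = ".cxr 0x%08x" % context_record
print(exr_command)
print(cxr_command)
```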

     

    0:023> .exr 0x01aee3ec

    ExceptionAddress: 781449d1 (msvcr80!wcslen+0x00000004)

       ExceptionCode: c0000005 (Access violation)

      ExceptionFlags: 00000000

    NumberParameters: 2

       Parameter[0]: 00000000

       Parameter[1]: 00000000
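
For an access violation, the two parameters in the exception record have a documented meaning: the first is the type of access (0 for a read, 1 for a write, 8 for an execute blocked by DEP) and the second is the address that was accessed. The short Python sketch below decodes the values shown above; the function name is made up for this illustration.

```python
STATUS_ACCESS_VIOLATION = 0xC0000005
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute (DEP)"}


def describe_access_violation(code, parameters):
    """Describe an access violation from its exception code and parameters."""
    if code != STATUS_ACCESS_VIOLATION:
        raise ValueError("not an access violation: 0x%08x" % code)
    access = ACCESS_TYPES.get(parameters[0], "unknown access")
    return "%s of address 0x%08x" % (access, parameters[1])


description = describe_access_violation(0xC0000005, [0x00000000, 0x00000000])
print(description)  # read of address 0x00000000
```

In other words, wcslen tried to read from address 0, which points to a NULL string pointer being passed in by the caller.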

     

    So we need to jump into the saved context to get the thread that caused the Resource Monitor to crash.

     

    0:023> .cxr 0x01aee408

    eax=00000000 ebx=00f15d30 ecx=00000000 edx=00000000 esi=00000000 edi=00000000

    eip=781449d1 esp=01aee6d4 ebp=01aee6f8 iopl=0         nv up ei pl nz na po nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202

    msvcr80!wcslen+0x4:

    781449d1 668b08          mov     cx,word ptr [eax]        ds:0023:00000000=????

     

    0:023> kv

      *** Stack trace for last set context - .thread/.cxr resets it

    ChildEBP RetAddr  Args to Child             

    01aee6d0 10006d11 00000000 00f16914 01aeff58 msvcr80!wcslen+0x4 (FPO: [Non-Fpo])

    WARNING: Stack unwind information not available. Following frames may be wrong.

    01aee6f8 10001364 100096e4 00000000 744eecf8 JohnApp!Startup+0x5851

    01aeff78 781329bb 0009a6a8 75b03b60 00000000 JohnApp+0x1364

    01aeffb0 78132a47 00000000 77e64829 00f15d30 msvcr80!_endthreadex+0x3b (FPO: [Non-Fpo])

    01aeffb8 77e64829 00f15d30 00000000 00000000 msvcr80!_endthreadex+0xc7 (FPO: [Non-Fpo])

    01aeffec 00000000 781329e1 00f15d30 00000000 kernel32!BaseThreadStart+0x34 (FPO: [Non-Fpo])

     

    This stack reveals that a thread running code in JohnApp’s DLL called msvcr80!wcslen and triggered the access violation.  Now we can find out which specific resource (in case there are multiple) caused the problem.  In the case of an Access Violation dump, it is going to be a resource that failed or is in the process of coming online.  You can search through the threads using ~*kb to find a resource that was trying to start up during a resrcmon termination.

     

    0:023> ~*kb

     

      10  Id: 1d6c.1394 Suspend: 0 Teb: 7fff4000 Unfrozen

    ChildEBP RetAddr  Args to Child             

    0179f78c 7c827d0b 77e61d1e 00000464 00000000 ntdll!KiFastSystemCallRet

    0179f790 77e61d1e 00000464 00000000 00000000 ntdll!NtWaitForSingleObject+0xc

    0179f800 77e61c8d 00000464 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac

    0179f814 10002728 00000464 ffffffff 77e61d48 kernel32!WaitForSingleObject+0x12

    WARNING: Stack unwind information not available. Following frames may be wrong.

    0179f838 0100a352 0009a6a8 0179f888 0100864d JohnApp!Startup+0x1268

    0179f844 0100864d 000a88a0 0179f8a0 0179fa94 resrcmon!Resmon_Terminate+0x14          <<-- Resource we want

    0179f888 01009c3b 000a88a0 00000000 0179f8a4 resrcmon!RmpOfflineResource+0x2f1       <<-- Resource we want

    0179f89c 77c80193 00000003 02460246 00000001 resrcmon!s_RmTerminateResource+0x13

    0179f8b4 77ce33e1 01009c28 0179fa98 00000001 rpcrt4!Invoke+0x30

    0179fcb4 77ce35c4 00000000 00000000 000ef734 rpcrt4!NdrStubCall2+0x299

    0179fcd0 77c7ff7a 000ef734 0008f980 000ef734 rpcrt4!NdrServerCall2+0x19

    0179fd04 77c8042d 0100c24c 000ef734 0179fdec rpcrt4!DispatchToStubInCNoAvrf+0x38

    0179fd58 77c80353 00000005 00000000 01011458 rpcrt4!RPC_INTERFACE::DispatchToStubWorker+0x11f

    0179fd7c 77c7e0d4 000ef734 00000000 01011458 rpcrt4!RPC_INTERFACE::DispatchToStub+0xa3

    0179fdbc 77c7e080 000ef734 000ef6ec 00000000 rpcrt4!RPC_INTERFACE::DispatchToStubWithObject+0xc0

    0179fdfc 77c812f0 0008fb38 00083498 000efcf8 rpcrt4!LRPC_SCALL::DealWithRequestMessage+0x41e

    0179fe20 77c88678 000834d0 0179fe38 0008fb38 rpcrt4!LRPC_ADDRESS::DealWithLRPCRequest+0x127

    0179ff84 77c88792 0179ffac 77c8872d 00083498 rpcrt4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430

    0179ff8c 77c8872d 00083498 00000000 00000000 rpcrt4!RecvLotsaCallsWrapper+0xd

    0179ffac 77c7b110 00086890 0179ffec 77e64829 rpcrt4!BaseCachedThreadRoutine+0x9d

     

    With the above information, and focusing on the resrcmon!RmpOfflineResource frame, the parameter 0x000a88a0 is our resource.  Using the DC command, you can confirm that this is the resource.

     

    0:023> dc 000a88a0

    000a88a0  63727352 00000001 000a7d90 000a9290  Rsrc.....}......

    000a88b0  000b37f0 000ad130 000a8318 000ad0f0  .7..0........... 

    000a88c0  00001388 0000ea60 10000000 0009a6a8  ....`...........

    000a88d0  00000000 00000000 00000000 00000000  ................

    000a88e0  00000000 00000001 10001540 10001910  ........@.......

    000a88f0  10001a00 10002280 100026b0 10002810  ....."...&...(..

    000a8900  100027c0 00000000 00000000 100028a0  .'...........(..

    000a8910  100029e0 00000003 00000000 0000000c  .)..............

     

    The DWORDs at offsets 0x10, 0x14, 0x18, and 0x1C are pointers to strings that identify the resource: the .DLL being used, the resource type, its GUID in the registry, and the name displayed in Cluster Administrator.

     

    0:005> du 0x000ad0f0                            <<-- Resource Displayed in Cluster Administrator

    000ad0f0  "Johns Resource"

     

    0:005> du 0x000a8318                            <<-- GUID in registry (HKLM\Cluster\Resources)

    000a8318  "35a73cba-6096-485e-a227-d4a8d06f"

    000a8358  "680a"

     

    0:005> du 0x000ad130                            <<-- Resource Type (HKLM\Cluster\ResourceTypes)

    000ad130  "Johns Customer Resource Type"

     

    0:005> du 0x000b37f0                            <<-- Specific DLL being Used

    000b37f0  "johnapp.dll"
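
    The same lookup can be sketched offline.  The snippet below is a minimal illustration, assuming the first 32 bytes of the block are transcribed by hand from the dc output above; the field labels come from the annotations in this post and are not the real structure member names.

```python
import struct

# First 32 bytes of the resource block at 0x000a88a0, transcribed from the
# dc output above as eight little-endian DWORDs.
DWORDS = [0x63727352, 0x00000001, 0x000a7d90, 0x000a9290,
          0x000b37f0, 0x000ad130, 0x000a8318, 0x000ad0f0]
raw = struct.pack("<8I", *DWORDS)

# Descriptive labels only (hypothetical names, taken from the annotations
# in the post), keyed by byte offset into the block.
FIELDS = {
    0x10: "DLL name",
    0x14: "resource type",
    0x18: "resource GUID",
    0x1C: "display name",
}

def signature(block: bytes) -> str:
    """Return the 4-byte ASCII tag at the start of the block."""
    return block[:4].decode("ascii")

def string_pointers(block: bytes) -> dict:
    """Map each label to the pointer stored at its offset."""
    return {label: struct.unpack_from("<I", block, offset)[0]
            for offset, label in FIELDS.items()}

if __name__ == "__main__":
    print(signature(raw))
    for label, pointer in string_pointers(raw).items():
        # Each pointer is an address you would pass to the du command.
        print(f"{label}: du 0x{pointer:08x}")
```

    Each printed address is one you would hand to du, as in the commands above.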

     

    Now that you know which resource is generating the access violation, you should consult the vendor of this resource to find out what happened.  They may be able to look at the threads with their symbols or get other information from the event logs, any dump it may create, etc.  The resource may have a known problem that an update fixes.

     

    The steps above will get you where you want to go, which is why I showed them first.  In most cases, but not all, you can get the exception and its stack more directly by opening the dump and entering the .ecxr command:

     

    0:023> .ecxr

    eax=00000000 ebx=00f15d30 ecx=00000000 edx=00000000 esi=00000000 edi=00000000

    eip=781449d1 esp=01aee6d4 ebp=01aee6f8 iopl=0         nv up ei pl nz na po nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010202

    msvcr80!wcslen+0x4:

    781449d1 668b08          mov     cx,word ptr [eax]        ds:0023:00000000=????

     

    0:023> kb

      *** Stack trace for last set context - .thread/.cxr resets it

    ChildEBP RetAddr  Args to Child             

    01aee6d0 10006d11 00000000 00f16914 01aeff58 msvcr80!wcslen+0x4

    WARNING: Stack unwind information not available. Following frames may be wrong.

    01aee6f8 10001364 100096e4 00000000 744eecf8 JohnApp!Startup+0x5851

    01aeff78 781329bb 0009a6a8 75b03b60 00000000 JohnApp+0x1364

    01aeffb0 78132a47 00000000 77e64829 00f15d30 msvcr80!_callthreadstartex+0x1b

    01aeffb8 77e64829 00f15d30 00000000 00000000 msvcr80!_threadstartex+0x66

    01aeffec 00000000 781329e1 00f15d30 00000000 kernel32!BaseThreadStart+0x34

     

    You could also get to the same information using the original steps above (dds 01aed598), but stopping at resrcmon!RmpExceptionFilter.  Resource Monitor handles the exception in this function, which has the exception as its first parameter.

     

    0:023> dds 01aed598

    01aed598  00740061 xpsp2res.dll

     xpsp2res+0x100061

    01aed59c  00720075 xpsp2res+0xe0075

    01aed5a0  00730065 xpsp2res+0xf0065

    01aed5a4  00610000

    *** pages removed ***

    01aedddc  0026afd8

    01aedde0  0026af28

    01aedde4  01aee034                                        <<-- pointer to Exception address stack

    01aedde8  0100e638 resrcmon!GenerateMemoryDump+0x180

    01aeddec  ffffffff

    01aeddf0  00001d6c

    01aeddf4  00000018

    *** pages removed ***

    01aee000  01005528 resrcmon!`string'+0xc

    01aee004  ffffffff

    01aee008  0100d27b resrcmon!ClRtlLogPrint+0x499

    01aee00c  0100e96c resrcmon!GenerateExceptionReport+0x61

    01aee010  00000001

    01aee014  01005bf4 resrcmon!`string'

    *** pages removed ***

    01aee048  7786d687 netman!__CxxUnhandledExceptionFilter

    01aee04c  01aee060                                        <<-- Frame 3 in kv=01aee034 above

    01aee050  01008b2c resrcmon!RmpExceptionFilter+0x14       <<-- Frame 4 in kv=01aee034 above

    01aee054  01aee300

    01aee058  01003024 resrcmon!`string'

    01aee05c  01aee300

     

  • Ntdebugging Blog

    What Are the Odds?

    • 2 Comments

     

    Hi NTDebuggers, something rarely talked about is the odds of a problem being in one piece of code vs. another.  From time to time we see some very strange debugs or symptoms reported by customers.  The problems can be associated with anything from an internally written application to a Microsoft product running on Windows or an application written by a 3rd party vendor.  In fact, we are often engaged to assist one of our customers or vendors with troubleshooting or debugging their applications.

     

    One of the first things we do is assess the situation.  We ask questions like:

    ·         Where is the program crashing?

    ·         What binaries comprise the program?

    ·         How often are those various binaries used worldwide?

     

    Let’s use the following pseudo call stack and binaries as an example.

     

    NTDLL!VeryCommonFunction  << Crash happens in this function.

    ADVAPI32!RatherCommonFunction

    MyCustomApp!RarelyUsedCode

    MyCustomApp!Main

     

    If I see a crash in NTDLL!VeryCommonFunction I’m going to make some assumptions as I assess the domain of the problem.  This holds true for any operating system, product, or software in general.  The code that runs more than any other code is, by its nature, effectively tested more because it runs more.  Therefore it is less likely to be the root cause of the fault, and in some cases it is simply the victim.   This applies to all operating systems: UNIX, Mac OS, Windows... core code tends to be less buggy.

     

    Let’s look at a real world example of some very common code in Windows: NTDLL!RtlAllocateHeap and NTDLL!RtlFreeHeap.  For those of you not familiar with NTDLL, it’s loaded in just about every process on every machine running a modern copy of Windows, worldwide.  The average machine has ~40-200+ processes (applications and miscellaneous services) running, and there are hundreds of millions of PCs worldwide running Windows, so that gives us billions of processes running NTDLL, give or take a few billion.  Collectively, those processes are going to call RtlFreeHeap or RtlAllocateHeap millions of times in the next second.
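
    As a rough back-of-the-envelope check, the estimate works out as follows.  This is only a sketch: the per-machine range comes from the paragraph above, while the figure of 300 million machines is an assumed stand-in for "hundreds of millions".

```python
# Process-count range per machine, taken from the paragraph above.
PROCESSES_PER_MACHINE = (40, 200)

# ASSUMPTION: "hundreds of millions" of PCs is read here as 300 million.
ASSUMED_MACHINES = 300_000_000

def estimate_ntdll_processes(machines=ASSUMED_MACHINES,
                             per_machine=PROCESSES_PER_MACHINE):
    """Return (low, high) estimates of processes with NTDLL loaded."""
    low, high = per_machine
    return machines * low, machines * high

if __name__ == "__main__":
    low, high = estimate_ntdll_processes()
    print(f"{low:,} to {high:,} processes")
```

    Even the low end of that range is in the billions, which is the point of the comparison that follows.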

     

    So what are the odds?  Is it likely that this core API used by billions of processes is crashing because of a bug in the core API?  Or is it more likely that a smaller vertical market or custom application running on ~500 machines worldwide did something to destabilize one of the process heaps?   

     

    Typically when an application is crashing in a heap function inside of NTDLL, support engineers become suspicious of activity in the process space, and in this case it’s more likely to be a problem with heap corruption.  It is likely that code running in the host process that has NTDLL loaded has corrupted one of the heaps by overwriting a buffer, doing a double free, or some other problem.  Then when a call is made into the Microsoft heap API, NTDLL has to traverse the heap structures that are corrupted by the host application, so the process crashes.  And yes, the crash is in NTDLL.   In this case, I typically ask the customer to enable full page heap via gflags (this puts an additional page marked with the PAGE_NOACCESS attribute at the end of each allocation).  We then wait for the next crash and analyze it.  Enabling full page heap helps you catch the corruptor with “their hand in the cookie jar”.

     

    The same scenario holds true for other core functionality such as kernel pool allocations, invalid handles, leaks etc.  Again, core code tends to be rock solid because of sheer volume of use and exposure to a variety of environments. This being the case, it also tends to change less over time.  Of course there is code in the OS or other components that is not used as much, which is more likely to have problems.   We always take that into consideration when scoping an issue. 

     

    The good news is we are always happy to dig in and help our customers isolate these types of problems.

     

    Please feel free to chime in and share your stories.


    Good Luck and happy debugging.

     

    Jeff-

  • Ntdebugging Blog

    Data Execution Protection in Action

    • 5 Comments

    Hello, my name is Graham, and I’m an escalation engineer on the Platforms Global Escalation Team.  I recently worked a case where a group of Windows XP machines were hitting a bugcheck on boot, error 0xC000021A.  This error occurs when a critical usermode process such as winlogon or csrss crashes.  I had access to a failing machine, so I attached the kernel debugger to find out why winlogon was crashing.  I found the cause, and learned a little more about Data Execution Prevention (DEP) in the process.

     

    The initial debugger spew gave me this information:

     

    *** An Access Violation occurred in winlogon.exe:

     

    The instruction at 10030F90 tried to write to an invalid address, 10030F90

     

     *** enter .exr 0006F4AC for the exception record

     *** enter .cxr 0006F4C8 for the context

     *** then kb to get the faulting stack

     

     

    So I followed its cue and got the exception record and context record:

     

    1: kd> .exr 0006F4AC

    ExceptionAddress: 10030f90

       ExceptionCode: c0000005 (Access violation)

      ExceptionFlags: 00000000

    NumberParameters: 2

       Parameter[0]: 00000008

       Parameter[1]: 10030f90

    Attempt to execute non-executable address 10030f90

     

    Ahh, OK, so we know this is a DEP crash now.
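
    The debugger makes that call from the exception parameters.  For an access violation (c0000005), Parameter[0] is the type of access (0 for a read, 1 for a write, 8 for an execute blocked by DEP) and Parameter[1] is the address involved.  A minimal sketch of the same decoding, with a hypothetical helper name:

```python
ACCESS_VIOLATION = 0xC0000005

# Meaning of Parameter[0] in an access violation exception record.
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute (DEP)"}

def describe_access_violation(code, params):
    """Summarize an access violation from its code and parameter list."""
    if code != ACCESS_VIOLATION or len(params) < 2:
        raise ValueError("not an access violation record")
    kind = ACCESS_TYPES.get(params[0], "unknown")
    return f"{kind} violation at 0x{params[1]:08x}"

if __name__ == "__main__":
    # Values from the .exr output above.
    print(describe_access_violation(0xC0000005, [0x00000008, 0x10030F90]))
```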

     

    1: kd> .cxr 0006F4C8

    eax=00000400 ebx=00000000 ecx=00000000 edx=00010000 esi=00000000 edi=00084370

    eip=10030f90 esp=0006f794 ebp=0006f81c iopl=0         nv up ei pl nz na pe nc

    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206

    001b:10030f90 33c0            xor     eax,eax

     

     

    Let's check out the crashing stack to see what's going on:

     

    1: kd> kb

      *** Stack trace for last set context - .thread/.cxr resets it

    ChildEBP RetAddr  Args to Child             

    0006f81c 010297c1 00084370 01010ab4 00000000 3rdparty!nosymbols

    0006fcfc 010312a6 00072364 7c80b6a1 00000000 winlogon!ExecSystemProcesses+0x14d

    0006ff50 0103d4d0 01000000 00000000 00072364 winlogon!WinMain+0x2b6

    0006fff4 00000000 7ffd7000 000000c8 000001ec winlogon!WinMainCRTStartup+0x174

     

     

    The first thing I decided to look for was how we got to this address.  To begin, I unassembled the code right before the return address in winlogon!ExecSystemProcesses.

     

    kd> ub 010297c1

    winlogon!ExecSystemProcesses+0x12e

    010297a2 6a02            push    2

    010297a4 ffb594fbffff    push    dword ptr [ebp-46Ch]

    010297aa 6880000000      push    80h

    010297af 56              push    esi

    010297b0 56              push    esi

    010297b1 68b40a0101      push    offset winlogon!`string' (01010ab4)

    010297b6 ffb5a0fbffff    push    dword ptr [ebp-460h]

    010297bc e891fcffff      call    winlogon!StartSystemProcess (01029452)

     

     

    According to the disassembly, winlogon!ExecSystemProcesses called winlogon!StartSystemProcess, not the function currently running.  So, I suspected some hooking was going on.  Using !chkimg, I verified this was the case.  Note that !chkimg requires a valid copy of the binary in the symbol path.

     

    1: kd> !chkimg -db kernel32

    10 errors : kernel32 (7c802332-7c80236b)

    7c802330  90  90 *e9 *59 *ec *82 *93  6a  00  ff  75  2c  ff  75  28  ff ...Y...j..u,.u(.

    ...

    7c802360  28  00  90  90  90  90  90 *e9 *d4 *eb *82 *93  6a  00  ff  75 (...........j..u

    1: kd> u 7c802330 

    kernel32!WriteProcessMemory+0x10d:

    7c802330 90              nop

    7c802331 90              nop

    kernel32!CreateProcessW

    7c802332 e959ec8293      jmp     3rdparty!nosymbols (10030f90)
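
    The target of that jump can be checked by hand from the patched bytes.  An e9 opcode is a near jump with a signed 32-bit displacement measured from the end of the 5-byte instruction.  A minimal sketch (the helper name is made up for illustration):

```python
import struct

def jmp_rel32_target(address, code):
    """Return the 32-bit target of an `e9 rel32` jump located at address."""
    if len(code) != 5 or code[0] != 0xE9:
        raise ValueError("expected a 5-byte e9 rel32 jump")
    # The displacement is a signed little-endian DWORD after the opcode.
    (displacement,) = struct.unpack("<i", code[1:5])
    # It is relative to the address of the next instruction.
    return (address + 5 + displacement) & 0xFFFFFFFF

if __name__ == "__main__":
    # Address and bytes from the u output above.
    target = jmp_rel32_target(0x7C802332, bytes.fromhex("e959ec8293"))
    print(f"{target:08x}")
```

    The result matches the 10030f90 that the debugger shows as the jump target.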

     

     

    Aha! Something has hooked CreateProcessW to jump to our current instruction.  Now that we know how we got there, let's understand why we crashed.  Since DEP fired, that means this address is non-executable.  I verified this by dumping out the PTE for the address.

     

    1: kd> !pte 10030F90

                   VA 10030f90

    PDE at 00000000C0600400    PTE at 00000000C0080180

    contains 000000004E102867  contains 800000004E021867

    pfn 4e102 ---DA--UWEV    pfn 4e021 ---DA--UW-V
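
    The flags in that output can be decoded by hand from the raw values.  The sketch below assumes a 32-bit PAE system with the page tables mapped at 0xC0000000, which is consistent with the addresses !pte printed; bit 63 is the no-execute bit, and the low bits are the standard x86 valid, write, user, accessed, and dirty flags.

```python
NX_BIT = 1 << 63  # no-execute bit in a 64-bit PAE entry

# Standard x86 page-table flag bits, named as in the !pte output.
FLAG_BITS = {"V": 1 << 0, "W": 1 << 1, "U": 1 << 2, "A": 1 << 5, "D": 1 << 6}

def pae_pte_address(va):
    """Virtual address of the PTE for va (ASSUMES the 0xC0000000 self-map)."""
    return 0xC0000000 + (va >> 12) * 8

def decode_entry(entry):
    """Return a dict of flag name -> bool; 'E' means executable."""
    flags = {name: bool(entry & bit) for name, bit in FLAG_BITS.items()}
    flags["E"] = not (entry & NX_BIT)
    return flags

if __name__ == "__main__":
    print(f"PTE at {pae_pte_address(0x10030F90):08X}")
    print("PDE:", decode_entry(0x000000004E102867))
    print("PTE:", decode_entry(0x800000004E021867))
```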

     

    Notice that in the protection flags for the PTE, the 'E' bit isn't set, meaning this page isn't executable.  So, where is this address we were trying to execute?  Many times with DEP crashes this will be in stack or heap memory.  But not this time.  In this case, the address is actually in a module's memory mapped address space, as shown by the 'lm' command:

     

    1: kd> lm m 3rdparty

    10000000 1003c000   3rdparty C (export symbols)       3rdparty.dll

     

    Hmm...  So the address falls in this module.  Why isn't it executable?  Usually when I think of image files, I think of running code.  But, remembering how PE images are laid out, a module is broken into sections, each holding a different type of data with its own protection level.  There's a place in the image for code, and a place for data such as global variables and static data.  So, let's dump the image header and find which section offset 0x30F90 (address 0x10030F90 minus the module base 0x10000000) is in.

     

    1: kd> !dh 3rdparty

     

    <snip>

    SECTION HEADER #3

       .data name

       1EE3C virtual size

       1A000 virtual address   //  (1A000+1EE3C=0x38e3c so mem range for section is 1A000 to 0x38e3c)

        3000 size of raw data

       1A000 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    C0000040 flags

             Initialized Data

             (no align specified)

             Read Write  // no Execute !

     

     

    This is our section, since it starts at virtual address 0x1A000 and is 0x1EE3C in size, putting the end of the section at 0x38E3C.  Our offset of 0x30F90 falls within that range.

    Sure enough, this section is labeled as "Initialized Data", and the protection flags show Read and Write, but no Execute!  So, this address is not in a code section of the module, and DEP will not allow it to run. 
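
    The same two checks can be written down as a small sketch.  The flag constants are the standard PE section characteristics; the function names are made up for illustration, and the numbers are the ones from the lm and !dh output above.

```python
# Standard PE section characteristic flags.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def section_contains(rva, virtual_address, virtual_size):
    """True if rva lies inside [virtual_address, virtual_address + size)."""
    return virtual_address <= rva < virtual_address + virtual_size

def section_protection(flags):
    """Return the memory protections a section's flags grant."""
    names = []
    if flags & IMAGE_SCN_MEM_EXECUTE:
        names.append("Execute")
    if flags & IMAGE_SCN_MEM_READ:
        names.append("Read")
    if flags & IMAGE_SCN_MEM_WRITE:
        names.append("Write")
    return names

if __name__ == "__main__":
    rva = 0x10030F90 - 0x10000000  # faulting address minus module base
    print(hex(rva), section_contains(rva, 0x1A000, 0x1EE3C))
    print(section_protection(0xC0000040))
```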

     

    Knowing this, I was able to find an update on the 3rd party manufacturer's site that modified their DLL to prevent this from occurring.  Mystery solved!
