• Ntdebugging Blog

    Windows Performance Toolkit - Xperf

    • 16 Comments

    I recently came across a very interesting profiling tool that is available in Vista SP1 and Server 08 called the Windows Performance Analyzer. You can use this tool to profile and diagnose different kinds of symptoms that the machine is experiencing. This tool is built on top of the Event Tracing for Windows (ETW) infrastructure. It uses the ETW providers to record kernel events and then displays them in a graphical format.

    Performance Analyzer provides many different graphical views of trace data including:

    1. CPU Sampling
    2. CPU and Disk utilization by process and thread
    3. Interrupt service routine and deferred procedure call
    4. Hard faults
    5. Disk I/O Detail

    What you will need

    Download the latest version of the Windows Performance Tools Kit, and install it on your machine. (http://www.microsoft.com/whdc/system/sysperf/perftools.mspx : Windows Performance Tools Kit, v.4.1.1 (QFE)) You will need the toolkit that corresponds to your processor architecture; currently there are three versions available: x86, IA64, and x64.

    After installation you should be able to see 2 new tools. The first one is Xperf, which is a command line tool that is used to capture the trace. The second is called XperfView, which graphically interprets the trace that has been collected by Xperf.

    You will need to run the Xperf and XperfView from an elevated command prompt for all functionality.

    Starting a Trace

    For many tasks all you need for effective analysis is a kernel trace.  For this example, we'll use the -on DiagEasy parameter to enable several kernel events including: image loading; disk I/O; process and thread events; hard faults; deferred procedure calls; interrupts; context switches; and performance counters.  From an elevated command prompt, launch xperf -on DiagEasy.

    This starts the kernel logger in sequential mode, logging to the default file "\kernel.etl"; it uses a default buffer size of 64 KB, with a minimum of 64 and a maximum of 320 buffers.

    Stopping a Trace

    To stop a trace, type xperf -d <filename>.etl at the command line. This will stop the trace and output the file.
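
    Putting the start and stop steps together, a minimal capture session looks like this (the trace file name is arbitrary; pick whatever name you like):

    rem Start the kernel logger with the DiagEasy group
    xperf -on DiagEasy

    rem Reproduce the scenario you want to profile, then stop the logger and write the trace
    xperf -d diageasy_trace.etl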

    Viewing a Trace

    There are two ways to view the trace. From an elevated command prompt, launch xperf <filename>.etl, or launch the XperfView tool and open the file manually. When you open the trace file, you should see something similar to this.

    NOTE - While you need to run xperf from an elevated command prompt in order to record a trace, you do not need an elevated command prompt in order to *analyze* a trace.

    Using the Chart Selector tab, you can select all the graphs that you want to look at. To drill down into each chart, you can select the Summary table. For instance, in the CPU Sampling chart, the summary table gives you a summary of the processes that were running, with information such as the amount of CPU time, CPU %, and stacks (if stacks were collected in the trace; see below). When looking at the Summary table for the Disk I/O chart, you can see which processes were writing files (the filename too!) to disk, as well as how much time it took.

    You also have the ability to zoom in on a selected area. Another really cool feature is the ability to overlay multiple graphs on one frame. This way you can correlate different pieces of data together very easily.

    Also, you can select which counter instances you want to see in each specific chart. In the top right corner of each chart is a drop down box from which you can select the counter instances. For instance, on the Disk I/O chart you can select Disk 0, Disk 1, or a combination of them.

    You can also view detailed information about the system that the trace was taken on. Click on the Trace menu item, and select System Configuration.

    Xperf Kernel Flags and Groups

    In the first sample Xperf command we ran, xperf -on DiagEasy, I am sure many of you were wondering what DiagEasy means. DiagEasy is a group of kernel events that are predefined by the Windows Performance Toolkit. This group includes Process, Thread, Kernel and User Image Load/Unload, Disk I/O, DPC, and Context Switch events.

    When we used the xperf -on DiagEasy command, we did not specify an individual provider, so we enabled the kernel events for all the ETW providers on the system. If you want to enable events for a specific provider, you can use the following format: xperf -on (GUID|KnownProviderName)[:Flags[:Level]]. For more information about ETW providers, Kernel Flags and Groups, you can run the xperf -help providers command.
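
    For example, kernel flags can be combined with a + sign to build a custom set instead of using a predefined group. The flag names below are typical ones, not a definitive list; check the xperf -help providers output on your own machine to see exactly which flags and groups are available to you:

    rem List the kernel groups, kernel flags, and known providers
    xperf -help providers

    rem Enable a custom combination of kernel flags
    xperf -on PROC_THREAD+LOADER+DISK_IO+HARD_FAULTS

    rem Stop the logger and write the trace as usual
    xperf -d custom_flags.etl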

    Stack Tracing

    One of the most powerful features in Performance Analyzer is the ability to visualize stacks. It's important to note that this requires no special instrumentation in the code – only that you have symbols for the binary components you are interested in analyzing.

    When the trace is set up to collect the stacks, Performance Analyzer will display call stack summary information for the events that had stack walking enabled. Here is an example that takes a trace (with stack tracing enabled) of the entire system while running a "find string" utility. We can use the Stack Tracing feature of Xperf to record a stack when certain events happen, or take samples at regular intervals over time. See the xperf -help stackwalk output for more info.

    Below, we will use the Stack Tracing feature of Xperf to take stack samples at regular intervals. With this output, we will be able to determine where the CPU is spending most of its time within a process.

    xperf -on latency -stackwalk Profile

    Latency is a kernel group that enables certain events, including the profile event, which records the CPUs' activity every millisecond. The "-stackwalk Profile" flag tells Xperf to record stack walks on every profile event, which makes the profile information much more useful. In other words, in order to get profile information with stack walks you need to turn on the profile event and turn on stack walking for that event.

    Note that decoding of stacks requires that symbol decoding be configured. However, stacks can be recorded without symbols, and can even be viewed without symbols, although they are much less useful without symbols. I only mention this in case you're trying to record a trace of a problematic machine with little time to mess around with _NT_SYMBOL_PATH.
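
    If you do want symbols, setting the symbol path on the analysis machine before loading the trace is usually all that is needed. A typical value looks like this (the local cache directory is just an example):

    set _NT_SYMBOL_PATH=srv*C:\symbols*http://msdl.microsoft.com/download/symbols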

    To get a trace with the stack information, do the following:

    1. Run the command xperf -on latency -stackwalk Profile.
    2. Run the application that you wish to analyze.
    3. End the trace by using the regular xperf -d <filename>.etl command.
    4. Load the trace in the viewer and then select Load Symbols from the Trace drop down menu.
    5. Once the symbols are loaded, select an interesting region on the CPU sampling graph.
    6. Right click and select Summary Table.

    Click on the selector tab to bring up the column chooser list. Then select "Process name", "Process", "Stack", "Weight" and "%Weight". These are the most useful columns when looking at stacks from the sample profile event. You should get a view similar to this.

     

    At this point I need to mention a few restrictions on stack walking, along with when and how it works.

    ·         Xperf stack walking is not available on XP.

    ·         On Vista, stack walking is available for x86, and is available for x64 as of Vista SP1.

    ·         On Windows 7, stack walking is available.

    ·         Stack walking on x64 is complicated. You have to set DisablePagingExecutive in the registry, as documented here:

    http://social.msdn.microsoft.com/Forums/en-US/wptk_v4/thread/282e5beb-0afd-411b-9a81-a0bbbf569dfe/

    REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f

    Sample Case

    I recently came across a case where the customer was complaining that DPC processing was taking up too much CPU time. We ran Xperf on the machine and drilled down into the DPC activity on the machine.

    From the Xperf graph, I was able to confirm that the customer was actually seeing high DPC usage. I selected the Summary for this chart, and got the list of drivers that were actually taking up CPU time.

    Right off the bat, I could identify the driver that had a lot of DPC activity. I also noted that the average duration for each DPC from that driver was taking 788 microseconds. This is way too high. Each DPC should be taking a maximum of 100 microseconds.

     

    - Omer

      

    Reference

    Performance.Analyzer.QuickStart.xps – This is shipped with the performance toolkit.

    From an elevated command prompt, launch xperf -help

    http://www.microsoft.com/whdc/system/sysperf/perftools.mspx

    http://blogs.gotdotnet.com/pigscanfly/archive/2008/02/16/using-xperf-to-take-a-trace.aspx

    http://www.microsoft.com/whdc/Driver/tips/DPC_ISR.mspx

  • Ntdebugging Blog

    Interpreting Event 153 Errors

    • 16 Comments

    Hello my name is Bob Golding and I would like to share with you a new event that you may see in the system event log.  Event ID 153 is an error associated with the storage subsystem. This event was new in Windows 8 and Windows Server 2012 and was added to Windows 7 and Windows Server 2008 R2 starting with hot fix KB2819485.

     

    An event 153 is similar to an event 129.  An event 129 is logged when the storport driver times out a request to the disk; I described event 129 messages in a previous article.  The difference between a 153 and a 129 is that a 129 is logged when storport times out a request, while a 153 is logged when the storport miniport driver times out a request.  The miniport driver may also be referred to as an adapter driver or HBA driver; this driver is typically written by the hardware vendor.

     

    Because the miniport driver has better knowledge of the request execution environment, some miniport drivers time requests themselves instead of letting storport handle request timing.  This is because the miniport driver can abort the individual request and return an error rather than having storport reset the drive after a timeout.  Resetting the drive is disruptive to the I/O subsystem and may not be necessary if only one request has timed out.  The error returned from the miniport driver is bubbled up to the class driver, which can log an event 153 and retry the request.

     

    Below is an example event 153:

     

    Event 153 Example

     

    This error means that a request failed and was retried by the class driver.  In the past no message would be logged in this situation because storport did not time out the request.  The lack of messages resulted in confusion when troubleshooting disk errors because timeouts would occur but there would be no evidence of the error.

     

    The details section of the event log record will show what error caused the retry and whether the request was a read or write. Below is the details output:

     

    Event 153 Details

     

    In the example above at byte offset 29 is the SCSI status, at offset 30 is the SRB status that caused the retry, and at offset 31 is the SCSI command that is being retried.  In this case the SCSI status was 00 (SCSISTAT_GOOD), the SRB status was 09 (SRB_STATUS_TIMEOUT), and the command was 28 (SCSIOP_READ). 

     

    The most common SCSI commands are:

    SCSIOP_READ - 0x28

    SCSIOP_WRITE - 0x2A

     

    The most common SRB statuses are below:

    SRB_STATUS_TIMEOUT - 0x09

    SRB_STATUS_BUS_RESET - 0x0E

    SRB_STATUS_COMMAND_TIMEOUT - 0x0B

     

    A complete list of SCSI operations and statuses can be found in scsi.h in the WDK.  A list of SRB statuses can be found in srb.h.
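
    If you are scripting against these events, a small helper that maps the raw bytes from the details section to the names above can be handy.  The sketch below is only an illustration and covers just the common values listed in this post; the full lists live in scsi.h and srb.h in the WDK.

    #include <stdio.h>

    /* Map the SRB status byte (offset 30 in the event details) to a name.
       Only the common values called out in this post are handled. */
    static const char *SrbStatusName(unsigned char srbStatus)
    {
        switch (srbStatus)
        {
        case 0x09: return "SRB_STATUS_TIMEOUT";
        case 0x0B: return "SRB_STATUS_COMMAND_TIMEOUT";
        case 0x0E: return "SRB_STATUS_BUS_RESET";
        default:   return "Other (see srb.h)";
        }
    }

    /* Map the SCSI command byte (offset 31) to a name. */
    static const char *ScsiOpName(unsigned char scsiOp)
    {
        switch (scsiOp)
        {
        case 0x28: return "SCSIOP_READ";
        case 0x2A: return "SCSIOP_WRITE";
        default:   return "Other (see scsi.h)";
        }
    }

    int main(void)
    {
        /* Values from the example above: SCSI status 00, SRB status 09, command 28. */
        unsigned char scsiStatus = 0x00, srbStatus = 0x09, scsiOp = 0x28;

        printf("SCSI status: 0x%02X\n", scsiStatus);
        printf("SRB status : 0x%02X (%s)\n", srbStatus, SrbStatusName(srbStatus));
        printf("Command    : 0x%02X (%s)\n", scsiOp, ScsiOpName(scsiOp));
        return 0;
    }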

     

    The timeout errors (SRB_STATUS_TIMEOUT and SRB_STATUS_COMMAND_TIMEOUT) indicate a request timed out in the adapter. In other words a request was sent to the drive and there was no response within the timeout period.  The bus reset error (SRB_STATUS_BUS_RESET) indicates that the device was reset and that the request is being retried due to the reset since all outstanding requests are aborted when a drive receives a reset.

     

    A system administrator who encounters event 153 errors should investigate the health of the computer’s disk subsystem.  Although an occasional timeout may be part of the normal operation of a system, the frequent need to retry requests indicates a performance issue with the storage that should be corrected.

  • Ntdebugging Blog

    Debugging a bluescreen at home

    • 15 Comments

     

    Hi, my name is Chad. I work as an escalation engineer for Microsoft’s OEM support team.

     

    A while back, I encountered an interesting crash on one of my computers at home, and I thought I’d post about how I debugged it.

     

    This particular machine had been humming along quite happily for some time, but one day while I was scanning some photos, it bluescreened. Naturally, I hoped it was just a fluke, but after it happened a few more times while doing the same thing, I decided to debug it.

     

    Ordinarily, if a machine crashes when performing a specific activity, like scanning photos, my first inclination would be to suspect a bug in one of the drivers involved in that activity, like the scanner driver or the USB driver. But in this case, I had been using this scanner for a long time, with the same drivers, and never had this problem, so this sudden crashing was kind of mysterious.

     

    Let's see what we can tell from the dump!

     

    The first order of business when looking at a crash dump is the !analyze -v command. I've trimmed some of it here for brevity, but it goes something like this:

     

    kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    KERNEL_MODE_EXCEPTION_NOT_HANDLED (8e)

    This is a very common bugcheck.  Usually the exception address pinpoints

    the driver/function that caused the problem.  Always note this address

    as well as the link date of the driver/image that contains this address.

    Some common problems are exception code 0x80000003.  This means a hard

    coded breakpoint or assertion was hit, but this system was booted

    /NODEBUG.  This is not supposed to happen as developers should never have

    hardcoded breakpoints in retail code, but ...

    If this happens, make sure a debugger gets connected, and the

    system is booted /DEBUG.  This will let us see why this breakpoint is

    happening.

    Arguments:

    Arg1: c0000005, The exception code that was not handled

    Arg2: 8738e300, The address that the exception occurred at

    Arg3: b9b3dc7c, Trap Frame

    Arg4: 00000000

     

    Debugging Details:

    ------------------

     

     

    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx" referenced memory at "0x%08lx". The memory could not be "%s".

     

    FAULTING_IP:

    +ffffffff8738e300

    8738e300 0000             add     [eax],al

     

    TRAP_FRAME:  b9b3dc7c -- (.trap ffffffffb9b3dc7c)

    ErrCode = 00000002

    eax=00000001 ebx=bc514c68 ecx=0001065e edx=bc510000 esi=00000955 edi=b9b3dd64

    eip=8738e300 esp=b9b3dcf0 ebp=b9b3dd08 iopl=0         nv up ei pl zr na po nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246

    8738e300 0000             add     [eax],al                ds:0023:00000001=??

    Resetting default scope

     

    DEFAULT_BUCKET_ID:  DRIVER_FAULT

     

    BUGCHECK_STR:  0x8E

     

    LAST_CONTROL_TRANSFER:  from 8051d6a7 to 8053331e

     

    STACK_TEXT: 

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    b9b3dcec bf801619 b9b3dd64 0012eee4 bf81551f 0x8738e300

    b9b3dd08 bf81553c b9b3dd64 0012eee4 bf81551f win32k!ValidateHwnd+0x5c

    b9b3dd50 804de7ec 0001065e 012eb244 00000000 win32k!NtUserInvalidateRect+0x1d

    b9b3dd50 7c90eb94 0001065e 012eb244 00000000 nt!KiFastCallEntry+0xf8

    0012eecc 77d4b601 0105a943 0001065e 012eb244 ntdll!KiFastSystemCallRet

    0012eed0 0105a943 0001065e 012eb244 00000000 USER32!NtUserInvalidateRect+0xc

    77d4b601 90909090 001134b8 0300ba00 12ff7ffe NikonScan4!GetSource+0x21e93

    ...

     

    From the stack trace, we can tell that NikonScan4.dll made a call into User32, which ultimately ended up calling into win32k.sys down in kernel mode. Win32k was in a function called ValidateHwnd() and then tried to call some function at address 0x8738e300, at which point we tried to dereference an invalid pointer and crashed.

     

    What’s at 0x8738e300?

     

    kd> dc 8738e300

    8738e300  00000000 00000001 00000000 87360350  ............P.6.

    8738e310  00000000 00000001 f71af9fe f71b0030  ............0...

    8738e320  f71afb0e f71afbb4 f71b0098 f71b0214  ................

    8738e330  f71afef6 f71aff8e 07fef800 00000000  ................

    8738e340  f71afffc 00000000 0a0e000a 644c6d4d  ............MmLd

    8738e350  8732ea58 870303e0 ffffffff ffffffff  X.2.............

    8738e360  00000012 00000000 f797f000 f7989905  ................

    8738e370  0000c000 00500050 e1971458 00160016  ....P.P.X.......

     

    kd> !pool 8738e300 2

    Pool page 8738e300 region is Nonpaged pool

    *8738e2f8 size:   50 previous size:    8  (Allocated) *NV 

                  Owning component : Unknown (update pooltag.txt)

     

    Well, that’s bad. 0x8738e300 isn’t actually a valid address of a function. That location contains some user data (specifically, some nonpaged pool).

     

    So, that's why we blew up: the ValidateHwnd() function in win32k made a call to this bad address which contains data instead of code! Let's see if we can figure out why it did this. We can find the return address in win32k!ValidateHwnd on the stack, and unassemble the instructions leading up to the point where we ran off into the weeds.

     

    kd> kv L8

    ChildEBP RetAddr  Args to Child             

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    b9b3dcec bf801619 b9b3dd64 0012eee4 bf81551f 0x8738e300

    b9b3dd08 bf81553c b9b3dd64 0012eee4 bf81551f win32k!ValidateHwnd+0x5c (FPO: [Non-Fpo])

    b9b3dd50 804de7ec 0001065e 012eb244 00000000 win32k!NtUserInvalidateRect+0x1d (FPO: [Non-Fpo])

    b9b3dd50 7c90eb94 0001065e 012eb244 00000000 nt!KiFastCallEntry+0xf8 (FPO: [0,0] TrapFrame @ b9b3dd64)

    0012eecc 77d4b601 0105a943 0001065e 012eb244 ntdll!KiFastSystemCallRet (FPO: [0,0,0])

    0012eed0 0105a943 0001065e 012eb244 00000000 USER32!NtUserInvalidateRect+0xc

    77d4b601 90909090 001134b8 0300ba00 12ff7ffe NikonScan4!GetSource+0x21e93

    77d4b605 001134b8 0300ba00 12ff7ffe 900008c2 0x90909090

     

    The return address from where we made the bad call is bf801619. Let’s unassemble a few instructions backwards from that address (using the handy but often-overlooked “ub” command) and see what the code was doing:

     

    kd> ub bf801619

    win32k!ValidateHwnd+0x2f:

    bf8015ff 8d1c82           lea     ebx,[edx+eax*4]

    bf801602 8bc1             mov     eax,ecx

    bf801604 c1e810           shr     eax,0x10

    bf801607 663b430a         cmp     ax,[ebx+0xa]

    bf80160b 75ad             jnz     win32k!ValidateHwnd+0x3d (bf8015ba)

    bf80160d 807b0801         cmp     byte ptr [ebx+0x8],0x1

    bf801611 7573             jnz     win32k!ValidateHwnd+0xff (bf801686)

    bf801613 ff15e0b298bf call dword ptr [win32k!_imp__PsGetCurrentThread (bf98b2e0)]

     

    OK, so it's pretty simple. Win32k decided which function address to call by reading it from a pointer stored at a hardcoded location (in other words, in a global variable) within Win32k itself. That pointer is located at bf98b2e0. The debugger helpfully tells us that this pointer is intended to contain the address of a function called PsGetCurrentThread, but let’s double-check this and make sure it actually does. At this point, the working assumption would be that this pointer had gotten corrupted somehow.

     

    kd> dd bf98b2e0 L1

    bf98b2e0  804e4a15

     

    Interesting. bf98b2e0 contains the value 804e4a15. This is not even close to the bad address the processor actually called! Remember, from the stack trace, we jumped to 8738e300 instead. What does this pointer actually point to?

     

    kd> ln 804e4a15

    (804e4a15)   nt!PsGetCurrentThread   |  (804e4a51)   nt!KeSetEventBoostPriority

    Exact matches:

        nt!PsGetCurrentThread = <no type information>

     

    Sure enough, 804e4a15 is the address of nt!PsGetCurrentThread.

     

    So, basically, win32k was trying to call PsGetCurrentThread(), and even had the correct pointer in memory to get there, but the processor instead jumped to a bogus address located in the middle of some user data.

     

    At this point it's pretty safe to say that this is a hardware problem, and furthermore, since the relevant memory looks fine, it looks like a problem within the CPU itself. (I’m not an expert on CPU architecture, but if I were to take a wild guess I’d say that maybe the processor had some problem when reading from its L2 cache.)

     

    Upon discovering this, I decided to crack open my case and take a look. It didn’t take long to spot the problem:

    image

    Yes, that's my CPU fan. Notice that the heatsink is completely clogged up with dust!

     

    The resolution to this story: I took a can of compressed air and blew all the dust out of the heatsink. This took care of the problem, and the computer happily scanned hundreds of photos and hasn’t crashed again since. Why was it only crashing when I was scanning photos? Most likely because this was causing the CPU to run at 100% utilization for extended periods of time, and it was simply overheating!

     

  • Ntdebugging Blog

    Microsoft Windows Dynamic Cache Service

    • 15 Comments

    Excessive cached read I/O is a growing problem.  For over one year we have been working on this problem with several companies.  You can read more about it in the original blog post:

    http://blogs.msdn.com/ntdebugging/archive/2007/11/27/too-much-cache.aspx

    On 32 bit systems, the kernel can address at most 2GB of virtual memory.  This address range is shared and divided up for the many resources that the system needs, one of which is the System File Cache's working set.  On 32 bit systems the theoretical limit is almost 1GB for the cache's working set; however, when a page is removed from the working set it will end up on the standby page list, so the system can cache more than the 1GB limit if there is available memory.  The working set, however, is limited to what can be allocated within the kernel's 2GB virtual address range.  Since most modern systems have more than 1GB of physical RAM, the System File Cache's working set size on a 32 bit system typically isn't a problem.

    With 64 bit systems, the kernel virtual address space is very large and is typically larger than physical RAM on most systems.  On these systems the System File Cache's working set can be very large and is typically about equal to the size of physical RAM.  If applications or file sharing performs a lot of sustained cached read I/O, the System File Cache's working set can grow to take over all of physical RAM.  If this happens, then process working sets are paged out and everyone starts fighting for physical pages and performance suffers.

    The only way to mitigate this problem is to use the provided APIs of GetSystemFileCacheSize() and SetSystemFileCacheSize().  The previous blog post "Too Much Cache" contains sample code and a compiled utility that can be used to manually set the System File Cache's working set size.
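
    For reference, here is a minimal sketch of how these APIs are called.  The 64 MB / 1 GB values and the FILE_CACHE_MAX_HARD_ENABLE flag are only illustrations, not recommendations; the right numbers depend entirely on your workload, and the caller must hold SeIncreaseQuotaPrivilege for the set call to succeed.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T minSize = 0, maxSize = 0;
        DWORD flags = 0;

        /* Read the current System File Cache working set limits. */
        if (!GetSystemFileCacheSize(&minSize, &maxSize, &flags))
        {
            printf("GetSystemFileCacheSize failed, error %lu\n", GetLastError());
            return 1;
        }
        printf("Current limits: min=%llu max=%llu flags=0x%lx\n",
               (unsigned long long)minSize, (unsigned long long)maxSize, flags);

        /* Example only: cap the cache's working set at 1 GB with a hard maximum.
           Without SeIncreaseQuotaPrivilege this fails with ERROR_PRIVILEGE_NOT_HELD. */
        if (!SetSystemFileCacheSize((SIZE_T)64 * 1024 * 1024,    /* minimum: 64 MB */
                                    (SIZE_T)1024 * 1024 * 1024,  /* maximum: 1 GB  */
                                    FILE_CACHE_MAX_HARD_ENABLE))
        {
            printf("SetSystemFileCacheSize failed, error %lu\n", GetLastError());
            return 1;
        }
        printf("New limit applied.\n");
        return 0;
    }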

    The provided APIs, while offering one mitigation strategy, have a couple of limitations:

    1) There is no conflict resolution between multiple applications.  If you have two applications trying to set the System File Cache's working set size, the last one to call SetSystemFileCacheSize() will win.  There is no centralized control of the System File Cache's working set size.

    2) There is no guidance on what to set the System File Cache's working set size to.  There is no one-size-fits-all solution.  A high cache working set size is good for file servers, but bad for large memory applications, and a low working set size could hurt everyone's I/O performance.  It is essentially up to 3rd party developers or IT administrators to determine what is best for their server, and often the limits are determined by a best guesstimate backed by some testing.

    We fully understand that while we provide one way to mitigate this problem, the solution is not ideal. We spent a considerable amount of time reviewing and testing other options.  The problem is that there are so many varied scenarios on how users and applications rely on the System File Cache.  Some strategies worked well for the majority of usage scenarios, but ended up negatively impacting others.  We could not release any code change that would knowingly hurt several applications.

    We also investigated changing some memory manager architecture and algorithms to address these issues with a more elegant solution; however, the necessary code changes are too extensive.  We are experimenting with these changes in Windows 7, and there is no way that we could back-port them to the current operating systems.  If we did, we would be changing the underlying infrastructure that everyone has become accustomed to.  Such a change would require stress testing of all applications that run on Windows.  The test matrix and the chance of regression are far too large.

    So that brings us back to the only provided solution - use the provided APIs.  While this isn't an ideal solution, it does work, but with the limitations mentioned above.  In order to help address these limitations, I've updated the SetCache utility to the Microsoft Windows Dynamic Cache Service.  While this service does not completely address the limitations above, it does provide some additional relief.

    The Microsoft Windows Dynamic Cache Service uses the provided APIs and centralizes the management of the System File Cache's working set size.  With this service, you can define a list of processes that you want to prioritize over the System File Cache; the service monitors the working set sizes of those processes and backs off the System File Cache's working set size accordingly.  It is always running in the background, monitoring and dynamically adjusting the System File Cache's working set size.  The service provides you with many options, such as adding additional slack space to each process' working set or backing off during a low memory event.

    Please note that this service is experimental and includes sample source code and a compiled binary.  Anyone is free to re-use this code in their own solution.  Please note that you may experience some performance side effects while using this service, as it cannot possibly address all usage scenarios; there may be some edge usage scenarios that are negatively impacted.  The service only attempts to improve the situation given the current limitations.  Please report any bugs or observations here on this blog post.  While we may not be able to fix every usage problem, we will try to offer best-effort support.

    Side Effects may include:

    Cache page churn - If the System File Cache's working set is too low and there is sustained cached read I/O, the memory manager may not be able to properly age pages.  When forced to remove some pages in order to make room for new cache pages, the memory manager may inadvertently remove the wrong pages.  This could result in cached page churn and decreased disk performance for all applications.

    Version 1.0.0 - Initial Release

    http://www.microsoft.com/downloads/details.aspx?FamilyID=e24ade0a-5efe-43c8-b9c3-5d0ecb2f39af

    NOTE: The memory management algorithms in Windows 7 and Windows Server 2008 R2 operating systems were updated to address many file caching problems found in previous versions of Windows. There are only certain unique situations when you need to implement the Dynamic Cache service on computers that are running Windows 7 or Windows Server 2008 R2.  For more information on how to determine if you are experiencing this issue and how to resolve it, please see the More Information section of Microsoft Knowledge Base article 976618 - You experience performance issues in applications and services when the system file cache consumes most of the physical RAM.

  • Ntdebugging Blog

    Event ID 157 "Disk # has been surprise removed"

    • 14 Comments

    Hello my name is Bob Golding and I would like to share information on a new error you may see in the system event log. It is Event ID 157 "Disk <n> has been surprise removed" with Source: disk.  This error indicates that the CLASSPNP driver has received a “surprise removal” request from the plug and play manager (PNP) for a non-removable disk.

     

    What does this error mean?

    The PNP manager does what is called enumerations.  An enumeration is a request sent to a driver that controls a bus, such as PCI, to take an inventory of devices on the bus and report back a list of the devices.  The SCSI bus is enumerated in a similar manner, as are devices on the IDE bus.

     

    These enumerations can happen for a number of reasons.  For example, hardware can request an enumeration when it detects a change in configuration.  Also a user can initiate an enumeration by selecting “scan for new devices” in device manager.  

     

    When an enumeration request is received, the bus driver will rescan the bus for all devices.  It will issue commands to the existing devices as though it was looking for new ones.  If these commands fail on an existing unit, the driver will mark the device as “missing”.  When the device is marked “missing”, it will not be reported back to PNP in the inventory.  When PNP determines that the device is not in the inventory it will send a surprise removal request to the bus driver so the bus driver can remove the device object.

     

    Since the CLASSPNP driver sits in the device stack and receives requests that are destined for disks, it sees the surprise removal request and logs an event if the disk is supposed to be non-removable.  An example of a non-removable disk is a hard drive on a SCSI or IDE bus.  An example of a removable disk is a USB thumb drive.

     

    Previously nothing was logged when a non-removable disk was removed; as a result, disks would disappear from the system with no indication.  The event ID 157 error was implemented in Windows 8.1 and Windows Server 2012 R2 to log a record of a disk disappearing.

     

    Why does this error happen?

    These errors are most often caused when something disrupts the system's communication with a disk, such as a SAN fabric error or a SCSI bus problem.  The errors can also be caused by a disk that fails, or by a user unplugging a disk while the system is running.  An administrator who sees these errors needs to verify the health of the disk subsystem.

     

    Event ID 157 Example:

     

  • Ntdebugging Blog

    Windows Hotfixes and Updates - How do they work?

    • 13 Comments

    Today I would like to talk about some of the work the Windows Serviceability (WinSE) team does regarding servicing Windows and releasing updates.

    The operating system is divided into multiple components. Each component can consist of one or more files, registry keys, configuration settings, etc.  WinSE releases updates based on components rather than the entire operating system. This avoids a lot of the overhead of installing updates for components that have not changed. Depending on the severity and applicability of the problem, there are different kinds of release mechanisms. Keep in mind, though, that the actual fix remains the same regardless of the release mechanism.

    1.       Updates and Security Updates

    These Updates are typically available on Windows Update. They frequently contain security fixes, and from time to time also contain reliability rollup packages. These updates are thoroughly tested and Microsoft highly recommends that you update your computer with these releases. In fact, most are automatically downloaded to your machine if you have Windows Update turned on. In most cases, Update releases are also available as standalone downloads from the download center.

     

    2.       Hotfixes

    When an individual customer reports a bug to Microsoft for a specific scenario, the WinSE team releases Hotfixes to address these problems. Hotfixes are not meant to be widely distributed and go through a limited amount of testing due to the customer's need for an urgent fix.  Hotfixes are developed in a separate environment from the regular Updates.  This allows Microsoft to release Updates that do not include the Hotfix files, thereby minimizing risk for the customer.

    Once the Hotfix is ready and packaged by WinSE, a KB article is written describing the problem, with instructions on how to obtain the Hotfix.  Microsoft recommends that only customers experiencing the particular problem install the Hotfix for that problem.

    Note: Hotfixes are also sometimes referred to as LDRs or QFEs (Quick Fix Engineering). The term QFE is an old term that is mostly no longer used in reference to current versions of Windows.

     

    3.       SP  - Service Pack

    The service pack is a major update in the life of an OS. It contains a wide variety of fixes as well as all the GDR and LDR fixes that were released since the previous service pack was shipped. This is a thoroughly tested release and highly recommended by Microsoft for installation. This is usually available as a standalone release, and is then released through Windows Update as well.

     

     

    GDR vs. LDR branches

    Now that we have described the different kinds of updates, let's take a deeper look into how these fixes are built. When a new OS or service pack is released, two branches are created from the release code tree: a GDR (general distribution release) branch and an LDR (limited distribution release) branch. Hotfixes are built solely from the LDR branch, while Updates for broad release are built from the GDR branch.

    Service Packs are built from a third branch that contains all Updates, Hotfixes and additional fixes.  This way the new service pack is shipped with all the fixes from both branches.

    Note – Once the new service pack is shipped, the code branches from the previous release are still active and serviced as necessary.

    Installing a Hotfix

    By default, all components on Windows systems start on the GDR branch following each major release. When you install updates from Windows Update for a GDR component, it gets upgraded with the GDR version.

    When you install a specific Hotfix, the files and components in the Hotfix package are migrated to the LDR branch. At this point, that particular component is marked as a LDR component. If you install a newer Update over this component, the Windows servicing technology will automatically install the appropriate latest version from the LDR branch for you. This is possible because each Update package ships with both the GDR and LDR versions of the component.

    Once a component is marked as a LDR component, the only way to move back to the GDR branch is to uninstall all Hotfixes for that component, or move to the next available service pack.

     

    What would happen if a user installed a Hotfix, and then sometime later installed the next service pack? Well, in that case it depends on the Hotfix and when it was built.

    1.       If the Hotfix was built before the service pack, then the component will be moved to the GDR version contained in the service pack.

    2.       If the Hotfix was built after the service pack, the component will be migrated to the post-service pack version of the component, and will stay on the same branch that it was originally on.

     

    In order to make this work, these packages contain the RTM GDR version, the RTM Hotfix version, and the SP1 GDR and Hotfix versions of each binary.

     

    All fixes built for Windows are cumulative in nature by branch, i.e. a new update will contain the new fix as well as all the previous fixes for that branch. Referencing the chart above, installing fix #4 gets you fixes #2 and #4 on the GDR branch. If the component is on the LDR branch, then the user would get fixes #1-4.

     

    Finally, the servicing technology has to handle the case where you need the functionality of an older Hotfix (e.g. "Fix #1" in the diagram above) but you may already have installed "Fix #4", which might be a critical security update.  What happens is that when the GDR branch of a fix is installed, it also places a copy of the Hotfix version of the same fix on the system.  When you run the installer for Hotfix #1, it detects that a newer version of the file is already installed, but it also detects that it needs to migrate the component to the Hotfix version of the binary that was previously stored on the system. The result is that you end up with the Hotfix binary for Fix #4, which has both the Hotfix you need and the cumulative set of security fixes.

     

    Stay tuned for more, in the next blog entry, I will talk about the staging mechanism that Windows uses to install Updates and Hotfixes as well as the uninstall process. Also, I will talk about how to determine the branch a file is built from.

     

    - Omer 

     

    More Information

    Description of the standard terminology that is used to describe Microsoft software updates

    Description of the contents of Windows XP Service Pack 2 and Windows Server 2003 software update packages

     

  • Ntdebugging Blog

    NTFS Misreports Free Space?

    • 13 Comments

    I have recently seen a number of issues where customers called in to report a significant difference between the “Size on disk” for the root of a volume, and the reported amount of “Used space” in the volume properties.  While considering this, I noticed that my own C: drive had a mysterious 20GB difference.

     

    Below is the story of how I found the answer.

     

    Before we begin, there are two methods used for calculating disk usage…

     

    Method 1 – Volume Bitmap Analysis

    The % used and %free indication shown below with the pie chart is based on volume bitmap analysis.  The hidden $Bitmap:$Data:”” stream is read from the volume via the FSCTL_GET_VOLUME_BITMAP IO control code, and the ones and zeroes are counted.  A binary “1” represents an allocated cluster, while a “0” indicates a free cluster.  The total number of set bits indicates the total number of allocated clusters on the volume.  This is the fastest and most reliable method of calculating allocated and free space.
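
    For the curious, here is a rough sketch of what such a bitmap scan looks like in code.  It reads the volume bitmap in chunks through DeviceIoControl and counts the set bits; the volume name is just an example, and the program must be run elevated.

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        /* \\.\C: is only an example volume; opening it requires administrative rights. */
        HANDLE hVol = CreateFileW(L"\\\\.\\C:", GENERIC_READ,
                                  FILE_SHARE_READ | FILE_SHARE_WRITE,
                                  NULL, OPEN_EXISTING, 0, NULL);
        if (hVol == INVALID_HANDLE_VALUE)
        {
            printf("CreateFile failed, error %lu\n", GetLastError());
            return 1;
        }

        STARTING_LCN_INPUT_BUFFER in = { 0 };
        union { VOLUME_BITMAP_BUFFER vbb; BYTE raw[64 * 1024]; } out;  /* chunk buffer */
        ULONGLONG allocated = 0, total = 0;
        BOOL done = FALSE;

        while (!done)
        {
            DWORD bytes = 0;
            BOOL ok = DeviceIoControl(hVol, FSCTL_GET_VOLUME_BITMAP, &in, sizeof(in),
                                      &out, sizeof(out), &bytes, NULL);
            if (!ok && GetLastError() != ERROR_MORE_DATA)
            {
                printf("FSCTL_GET_VOLUME_BITMAP failed, error %lu\n", GetLastError());
                CloseHandle(hVol);
                return 1;
            }
            done = ok;  /* success (rather than ERROR_MORE_DATA) means this is the last chunk */

            /* BitmapSize is the number of clusters from StartingLcn to the end of the
               volume; only as many bits as fit in the buffer were actually returned. */
            ULONGLONG bitsLeft  = (ULONGLONG)out.vbb.BitmapSize.QuadPart;
            ULONGLONG bitsInBuf = ((ULONGLONG)bytes -
                                   FIELD_OFFSET(VOLUME_BITMAP_BUFFER, Buffer)) * 8;
            ULONGLONG bitsNow   = (bitsLeft < bitsInBuf) ? bitsLeft : bitsInBuf;

            /* Each set bit is one allocated cluster. */
            for (ULONGLONG bit = 0; bit < bitsNow; bit++)
            {
                if (out.vbb.Buffer[bit / 8] & (1 << (bit % 8)))
                    allocated++;
            }

            if (total == 0)  /* first chunk starts at LCN 0, so this is the cluster count */
                total = (ULONGLONG)out.vbb.StartingLcn.QuadPart + bitsLeft;

            in.StartingLcn.QuadPart = out.vbb.StartingLcn.QuadPart + bitsNow;
        }

        printf("Allocated clusters: %llu of %llu\n", allocated, total);
        CloseHandle(hVol);
        return 0;
    }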

     

    Method 2 – File Scanning with FindFirstFileEx  / FindNextFile

    Navigating to the “C:\” folder, we select all files/folders, then right click, and then select properties.  A dialog will appear as the shell enumerates all files that it can find in the root and subfolders.  The “Size” and “Size on disk” fields will eventually display the total size of all primary data streams for all files that are found by Explorer.  In my case, this method only finds 80.25 GB of 102 GB (leaving a 21.75 GB difference).

     

    Why such a big difference?

     

    The short answer is - “If you can’t see it with Explorer, it doesn’t get added to the total size.”

     

    How to find the hidden disk usage…

     

    Step 1:

     

    First, check to see if your files are all actually visible with the current user account and Explorer settings.  Uncheck the “Hide protected operating system files” and select “Show hidden files and folders”.  When this is completed, look at the size on disk again. 

    Does it match the pie chart within a few percent?  If not, continue to Step 2.

     

    Step 2:

    Run CHKDSK and save the output.  The summary information at the end is what we want to see.

    Based on CHKDSK output, we can calculate the total metadata usage by adding up the following…

     

    KB            GB        Description
    57036         .05 GB    Space used by 22677 indexes.
    0             0 GB      Space used by $Badclus file.
    1352680       1.29 GB   Space used by $MFT.
    65536         .06 GB    Space used by $Logfile.
    1475252       1.4 GB    Metadata Total

     

    In this example, metadata usage accounts for only 1.4 GB.  If the metadata number were a high percentage of the usage, then we would need to take a closer look at how the volume is being used instead of simply looking for hidden files.  High metadata usage will be the subject of part 2 of this blog.

     

    We can see from the CHKDSK output that the total space occupied by our user files is 106,095,080 KB (101.18 GB).  This is a large percentage of the total usage, so we should be looking at the user files to see why we can’t see these files with Explorer.

     

    Step 3:

    Do you have permissions to see all files/folders on the volume?

     

    a.     If this is a system drive, look through the "\Documents and Settings" or "\users" folder and see if you can browse all of the profile folders.  If not, you may need to contact the owner(s) to check their folder size and see what they have saved in their user folder.  If they are like me, they will have 5 years' worth of memory dumps, MP3's, VHD's, PDF's, etc.

    b.    Try “CACLS.EXE x:\ /T”, or just browse through the folders on the drive looking for an “Access Denied” error.  Once this happens, give yourself permissions to that part of the subtree (if permitted by your administrative privileges) or have someone who has access to that folder enumerate the contents for you.  Check the number of files and the size on disk for all files in that folder and add it to the total.

    c.     Continue this process until you have a total of all files in all folders.

    d.    Does the total make sense?  If not, then continue to the next step.

     

    In the case of my mysterious 20 GB difference, I found an old user profile from a previous installation.  Sure enough, I received an access denied error when browsing to it.  To access the folder, I acquired ownership.  This user was in the habit of collecting memory dumps and virtual machines in a subfolder on his desktop.  I recovered quite a bit of free space by using the delete key.  I rescanned the volume, but to my disappointment, there was still a significant difference.

     

    Step 4.

    Are there alternate streams?  Try using STREAMS.EXE from (http://technet.microsoft.com/en-us/sysinternals/default.aspx).  Streams will recurse subdirectories and report space usage by alternate named streams.  Pipe this to a text file and analyze it with a text editor or spreadsheet program.

     

    Step 5.

    Hard links can be problematic when calculating the total usage via the file scan method.  Hard links are not extensively used, but this is worth mentioning.  A hard link is an extra index entry that points to an existing file.  These are created via the CreateHardLink function.  Although the file actually exists in one location, each hard link has its own size field.  Because of this, a single file can be added twice to the running total.  Another thing to know is that hard links are not synchronized, so it is possible that only one link will show the actual file size (see the example below).

    Unfortunately, there are few options available for detecting hard link paradoxes, but it is something to consider when the file scan shows more usage than the bitmap.  Since we have the opposite situation here, hard links are not a significant factor.
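
    For illustration, this is all it takes to create a hard link programmatically (the paths here are hypothetical).  After the call, both names report the file's full size to a file scan even though only one copy of the data is allocated on disk:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* Both paths are hypothetical; the existing file must already be present. */
        if (CreateHardLinkW(L"C:\\Data\\report-link.txt",   /* new link name  */
                            L"C:\\Data\\report.txt",        /* existing file  */
                            NULL))                          /* reserved, NULL */
        {
            printf("Hard link created; a file scan now counts the size twice.\n");
        }
        else
        {
            printf("CreateHardLink failed, error %lu\n", GetLastError());
        }
        return 0;
    }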

     

    Step 6.

    Is Volume Shadow Copy Service maintaining diff area files for snapshots?  Use VSSADMIN LIST SHADOWSTORAGE to find out.  Add shadow storage to the running total.

     

    VSSVC pre-allocates space for making volume snapshots.  In order to support this feature, diff area files are kept in the “\System Volume Information” folder.  This pre-allocated space is used to maintain point-in-time information for the “Previous Versions” feature and for the “System Restore” application.  If you are the type of user who prefers to minimize the impact of such features, then you can resize your shadow storage with VSSADMIN so it has less impact on disk usage.  I prefer to leave these features at their default settings (and just make a note of how much disk space it is using).
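
    As a quick sketch, the query and resize commands look like this.  The 10GB cap is only an example, and the exact parameter forms can vary slightly between Windows versions:

    vssadmin list shadowstorage

    rem Resize the diff area for the C: volume (example value)
    vssadmin resize shadowstorage /for=C: /on=C: /maxsize=10GB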

    Step 7.

     

    If your numbers still don’t make sense, then it’s time to check for actively hidden files / folders.  There are many different rootkit scanners that can help you identify the presence of actively hidden files & folders.   Try using a scanner like Rootkit Revealer.  If you suspect that your machine has been compromised by a rootkit, refer to http://www.microsoft.com/uk/business/security/security-malware.mspx.

     

    Seeing that I did not have a significant amount of mysterious usage, I was satisfied that there was no malicious logic hiding my files.  After taking into account all of the factors, there was only a 1.57GB difference remaining (which was accounted for by the contents of a user’s recycle bin).

     

    Whenever you see a difference between the “Used space”, and “Size on disk”, always ask yourself “What am I not seeing?”

     

     

    Best regards,

     

    Dennis Middleton “The NTFS Doctor”

     

  • Ntdebugging Blog

    Windows Internals Beta Exam 71-660

    • 13 Comments

    Correction, Windows Internals Beta Exam 71-660

    This was previously listed incorrectly as 70-660; the actual exam ID is 71-660.

     

    Update 7-18-2008: The link to the exam preparation information has been updated below.

     

    Update 7-21-2008: When registering, there is no need to use a credit card.  Instead, be sure to use the promotional code listed below when registering for the exam: WINT.  If you have already been charged for this exam, please contact Prometric to request a refund, and reference ticket number 3711938.

     

    Hello Windows Debugging Community,  We are super excited to announce a new certification.  With the help of industry partners, including some of our very talented community members, we have developed a new Windows Internals Certification.  This certification targets developers and IT professionals that require knowledge of Windows Internals as part of their profession.  Chances are, if you are reading this blog post, you are part of the target audience.  I encourage you to register and take the Beta exam, and please help us improve the exam by providing feedback during the beta.

     

     

    Register for Beta Exam 71-660: TS: Windows® Internals

     

    You are invited to take beta exam 71-660: TS: Windows® Internals. If you pass the beta exam, the exam credit will be added to your transcript and you will not need to take the exam in its released form.

    By participating in beta exams, you have the opportunity to provide the Microsoft Certification program with feedback about exam content, which is integral to development of exams in their released version. We depend on the contributions of experienced IT professionals and developers as we continually improve exam content and maintain the value of Microsoft certifications.

    Please remember that participation in the beta process is completely voluntary and Microsoft makes no promises or guarantees regarding the beta exam process. You can expect to receive your score on the beta exam within 12 weeks of taking the exam, although in some instances, beta exams may take longer to score and your results may be delayed.

    70-660: TS: Windows® Internals counts as credit towards the following certification.

    ·          Microsoft Certified Technology Specialist (MCTS): Windows® Internals  


    Availability

    Registration begins: July 17, 2008

    Beta exam period runs: July 21, 2008 through August 5, 2008

     

    Receiving this invitation does not guarantee you a seat in the beta; we recommend that you register immediately. Beta exams have limited availability and are operated on a first-come, first-served basis. Once all beta slots are filled, no additional seats will be offered.


    Testing is held at Prometric testing centers worldwide, although this exam may not be available in all countries (see Regional Restrictions).  All testing centers will have the capability to offer this exam in its live version.

     

    Regional Restrictions: India, Pakistan, China


    Registration Information

    You must register at least 24 hours prior to taking the exam.
    Please use the following promotional code when registering for the exam: WINT
    Receiving this invitation does not guarantee you a seat in the beta; we recommend that you register immediately.


    To register in North America, please call:

    Prometric: (800) 755-EXAM (800-755-3926)

     

     

    Outside the U.S./Canada, please contact:

    Prometric: http://www.register.prometric.com/ClientInformation.asp

     

     


    Test Information and Support

    You are invited to take this beta exam at no charge.
    You will be given four hours to complete the beta exam. Please plan accordingly.
    Find exam preparation information: http://blogs.msdn.com/ntdebugging/pages/exam-preparation-information-for-exam-71-660.aspx

     


    Frequently Asked Questions

    For Microsoft Certified Professional (MCP) help and information, you may log in to the MCP Web site at http://www.microsoft.com/learning/mcp/ or contact your Regional Service Center: http://www.microsoft.com/learning/support/worldsites.asp.

    What is a beta exam?

    Where can I learn more about the registration process?

    Where can I learn more about the beta exam invitation process?

    Where can I learn more about the new structure of Microsoft Certification?

    Who do I contact for help with this beta exam or other MCP questions?

     

    Thank you.


    Jeff Dailey, Microsoft Global Escalation Services.

  • Ntdebugging Blog

    NTDebugging Puzzler 0x00000004: This didn’t puzzle the Debug Ninja, how about you?

    • 12 Comments

    Hello NTDebuggers, we have been very impressed with the responses we’ve gotten to our previous puzzlers so far.  We invited the Debug Ninja to come up with a real challenge for this week.

    This server blue screened with a Bug Check 0xD1: DRIVER_IRQL_NOT_LESS_OR_EQUAL.  The challenge this week is to tell us, as specifically as possible, why it crashed.  We'll give you a hint: the complete answer is not that ebx+5441F815h points to an invalid address.

    1: kd> .bugcheck

    Bugcheck code 000000D1

    Arguments e074281d 00000002 00000001 ba502493

     

    1: kd> kv

    ChildEBP RetAddr  Args to Child

    f78b6544 ba502493 badb0d00 00000001 00000000 nt!_KiTrap0E+0x2a7 (FPO: [0,0] TrapFrame @ f78b6544)

    f78b65e0 ba50d9d8 00000020 8c32cab8 00000022 tcpip!GetAddrType+0x19f (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b6694 ba50dc56 8c32cab8 8ca71c2c 000005c8 tcpip!IPRcvPacket+0x66c (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b66d4 ba50dd58 00000000 8ca63440 8ca71c0a tcpip!ARPRcvIndicationNew+0x149 (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b6710 bada5550 8c4b53b8 00000000 f78b678c tcpip!ARPRcvPacket+0x68 (FPO: [Non-Fpo]) (CONV: stdcall)

    f78b6764 ba9c614b 8cac2ad0 f78b6784 00000002 NDIS!ethFilterDprIndicateReceivePacket+0x1d2 (FPO: [Non-Fpo]) (CONV: stdcall)

     

    1: kd> .trap f78b6544

    ErrCode = 00000002

    eax=8c32cab8 ebx=8c323008 ecx=00000001 edx=00000001 esi=8ca71c18 edi=f78b6618

    eip=ba502493 esp=f78b65b8 ebp=f78b65e0 iopl=0         nv up ei pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00250246

    tcpip!GetAddrType+0x19f:

    ba502493 008b15f84154    add     byte ptr [ebx+5441F815h],cl ds:0023:e074281d=??

     

    The remainder of this puzzler is quite long; click here to see the complete debug notes required to solve this puzzler.  Post your responses to this page (and not the complete debug page) so we can keep all the comments together.  Also, we won't post any of your responses until Friday.


    [Update, posted 4/30/2008]

    Today is Wednesday and so far we haven’t received any complete responses; the partial answers we have received are close but no one has determined the root cause of the problem.  When looking at problems like this it is often useful to ask “why?” when you think you have the answer and think about what other information may be available.  In this situation it is possible to identify a specific problem and even determine appropriate action to fix the problem.

    Perhaps we were a bit overzealous in posting 14 pages of assembly as a puzzler.  Below is an abbreviated debug with less assembly; the link to the complete debug is still available for those of you who want to try the full challenge.  Also, we are going to break our promise not to post responses until Friday: we have posted some of the responses we have received so far.  Neither of these responses completely identifies the problem, but they are getting close.

     

    1: kd> dd @ebx+5441F815

    e074281d  ???????? ???????? ???????? ????????

    e074282d  ???????? ???????? ???????? ????????

    e074283d  ???????? ???????? ???????? ????????

    e074284d  ???????? ???????? ???????? ????????

    e074285d  ???????? ???????? ???????? ????????

    e074286d  ???????? ???????? ???????? ????????

    e074287d  ???????? ???????? ???????? ????????

    e074288d  ???????? ???????? ???????? ????????

    1: kd> !pte e074281d

                   VA e074281d

    PDE at 00000000C0603818    PTE at 00000000C0703A10

    contains 000000021B980963  contains E154FC1000000400

    pfn 21b980 -G-DA--KWEV                           not valid

                           Proto: 00000000E154FC10

     

    1: kd> ub ba50d9d8

    tcpip!IPRcvPacket+0x658:

    ba50d9c4 51              push    ecx

    ba50d9c5 50              push    eax

    ba50d9c6 53              push    ebx

    ba50d9c7 ff75f4          push    dword ptr [ebp-0Ch]

    ba50d9ca ff75f8          push    dword ptr [ebp-8]

    ba50d9cd ff75ec          push    dword ptr [ebp-14h]

    ba50d9d0 ff7508          push    dword ptr [ebp+8]

    ba50d9d3 e84a040000      call    tcpip!DeliverToUser (ba50de22)

     

    1: kd> uf tcpip!DeliverToUser

    tcpip!DeliverToUser:

    tcpip!DeliverToUser+0xc1:

    ba50ded9 8b45f0          mov     eax,dword ptr [ebp-10h]

    ba50dedc 29451c          sub     dword ptr [ebp+1Ch],eax

    ba50dedf 8b471c          mov     eax,dword ptr [edi+1Ch]

    ba50dee2 ff7508          push    dword ptr [ebp+8]

    ba50dee5 2b4510          sub     eax,dword ptr [ebp+10h]

    ba50dee8 57              push    edi

    ba50dee9 8945fc          mov     dword ptr [ebp-4],eax

    ba50deec e817ffffff      call    tcpip!UpdateIPSecRcvBuf (ba50de08)

     

    tcpip!DeliverToUser+0xd9:

    ba50def1 807e0906        cmp     byte ptr [esi+9],6

    ba50def5 753e            jne     tcpip!DeliverToUser+0x12f (ba50df35)

     

    tcpip!DeliverToUser+0xdf:

    ba50def7 8b430c          mov     eax,dword ptr [ebx+0Ch]

    ba50defa 83b88c00000007  cmp     dword ptr [eax+8Ch],7

    ba50df01 0f857a510000    jne     tcpip!DeliverToUser+0x124 (ba513081)

     

    tcpip!DeliverToUser+0xeb:

    ba50df07 837f1000        cmp     dword ptr [edi+10h],0

    ba50df0b 741a            je      tcpip!DeliverToUser+0x114 (ba50df27)

     

    tcpip!DeliverToUser+0xf1:

    ba50df0d 640fb61551000000 movzx   edx,byte ptr fs:[51h]

    ba50df15 8b8880010000    mov     ecx,dword ptr [eax+180h]

    ba50df1b 3bca            cmp     ecx,edx

    ba50df1d 0f857145ffff    jne     tcpip!DeliverToUser+0x103 (ba502494)

     

    tcpip!DeliverToUser+0x103:

    ba502494 8b15f84154ba    mov     edx,dword ptr [tcpip!_imp__KeNumberProcessors (ba5441f8)]

    ba50249a 0fbe12          movsx   edx,byte ptr [edx]

    ba50249d 3bca            cmp     ecx,edx

    ba50249f 0f8582ba0000    jne     tcpip!DeliverToUser+0x114 (ba50df27)

     

    tcpip!DeliverToUser+0x110:

    ba50df23 c645f402        mov     byte ptr [ebp-0Ch],2

     

    tcpip!DeliverToUser+0x114:

    ba50df27 640fb60d51000000 movzx   ecx,byte ptr fs:[51h]

    ba50df2f 898880010000    mov     dword ptr [eax+180h],ecx

     

    tcpip!DeliverToUser+0x114:

    ba5024a5 e979ba0000      jmp     tcpip!DeliverToUser+0x110 (ba50df23)

     


     

    [Update: our answer, posted 5/2/2008]

    In response:  I would like to congratulate the people who figured this out and say thank you to all of the people who gave this a try and sent in a response.  I’m not sure if we made a challenge that was too easy, too difficult, or too long, but we did not get as many responses as we had expected.  If you looked at this and chose not to post a comment, or you have looked at our previous puzzlers but chose to skip this one, let us know why so we can post better puzzlers in the future.

    Here is the answer…

    Experienced assembly readers will recognize that the instruction shown in the trap frame below is valid, but it does something no assembly writer would ever intend.  An offset of 5441F815 is too big to be realistic; no structure is 5441F815 bytes long.

    1: kd> .trap f78b6544

    ErrCode = 00000002

    eax=8c32cab8 ebx=8c323008 ecx=00000001 edx=00000001 esi=8ca71c18 edi=f78b6618

    eip=ba502493 esp=f78b65b8 ebp=f78b65e0 iopl=0         nv up ei pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00250246

    tcpip!GetAddrType+0x19f:

    ba502493 008b15f84154    add     byte ptr [ebx+5441F815h],cl ds:0023:e074281d=??

     

    Why is this instruction telling us to go to a crazy offset?  One possible cause is that the instruction is corrupt; a good debug method for this problem would be to unassemble the code around this instruction to see if the adjacent instructions look valid, and then to compare these instructions to ones in a known good version of tcpip.sys.  Because this instruction was not corrupt, neither of these steps helped, so I started looking at how we got to this instruction.

    Looking at the prior function on the stack, IPRcvPacket, I see that we actually called DeliverToUser and not GetAddrType.  Why is our current instruction in GetAddrType instead of DeliverToUser?  It is possible that the call went to the wrong place, that DeliverToUser called something else which returned to the wrong place, or that somehow the flow of execution did not go as intended.

    1: kd> ub ba50d9d8

    tcpip!IPRcvPacket+0x658:

    ba50d9c4 51              push    ecx

    ba50d9c5 50              push    eax

    ba50d9c6 53              push    ebx

    ba50d9c7 ff75f4          push    dword ptr [ebp-0Ch]

    ba50d9ca ff75f8          push    dword ptr [ebp-8]

    ba50d9cd ff75ec          push    dword ptr [ebp-14h]

    ba50d9d0 ff7508          push    dword ptr [ebp+8]

    ba50d9d3 e84a040000      call    tcpip!DeliverToUser (ba50de22)

     

    The next step I took was to analyze the DeliverToUser assembly to see how we might have ended up in GetAddrType instead of DeliverToUser.  What I found, and what several of our readers found, was that DeliverToUser contained a jump whose target was 1 byte past the address where we encountered the bogus instruction.  Landing one byte early matters: the byte at ba502493 is the last byte of the preceding jmp, so decoding from there folds that stray 00 together with the first bytes of the mov at ba502494 and produces the nonsensical add byte ptr [ebx+5441F815h],cl.

    ba50df1d 0f857145ffff    jne     tcpip!DeliverToUser+0x103 (ba502494)
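
    To see exactly how landing one byte early turns good code into the bogus add, the sketch below decodes the same bytes from both starting addresses.  This is a minimal illustration, not part of the original debug: it assumes the third-party capstone disassembler package is installed (pip install capstone), and the byte string is copied from the listings in this post (the jmp at ba50248f followed by the mov at ba502494).

    from capstone import Cs, CS_ARCH_X86, CS_MODE_32

    # Bytes at ba50248f: jmp GetAddrType+0x19b, then mov edx,[tcpip!_imp__KeNumberProcessors]
    code = bytes.fromhex("e993770000" + "8b15f84154ba")
    md = Cs(CS_ARCH_X86, CS_MODE_32)

    print("decoding from ba502494 (intended jump target):")
    for insn in md.disasm(code[5:], 0xBA502494):
        print("  %08x %s %s" % (insn.address, insn.mnemonic, insn.op_str))

    print("decoding from ba502493 (one byte early):")
    for insn in md.disasm(code[4:], 0xBA502493):
        print("  %08x %s %s" % (insn.address, insn.mnemonic, insn.op_str))

    The second loop prints the same add byte ptr [ebx + 0x5441f815], cl that appears in the trap frame, because the stray 00 left over from the jmp is consumed as the add opcode.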

     

    Why would eip point to ba502493 instead of ba502494?  That is the challenging question in this exercise, because the answer is not obvious.  Some engineers will conclude that the hardware did not do what we asked of it and declare the hardware at fault.  However, at this point we don’t really have any proof that the hardware has actually done something wrong.  Hardware problems usually manifest themselves as a bit flip, where a single bit is changed from the correct value.  Here we have gone from a 4 (0100) to a 3 (0011); a 3-bit change is hard to blame on hardware.

    To answer the question “Why would eip point to ba502493 instead of ba502494?” we need to carefully analyze the instruction at ba50df1d.  I broke out my trusty instruction set reference and looked up the inner workings of the jne instruction.  On page 3-542 of the Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2A: Instruction Set Reference, A-M I found the layout for a jne to a 32-bit offset.  The first two bytes are 0F 85, and the next 4 bytes are the 32-bit offset we are going to jump to, stored little-endian.  This is our opcode:

    0f 85 71 45 ff ff (offset = ffff4571)

    Remember that by the time we make this jump the instruction pointer will have been moved to the next instruction, which will be at ba50df23 (ba50df1d plus the 6 bytes of this instruction).  If we take this offset (ffff4571) and add it to that address, we end up at ba502494 as expected.

    1: kd> ?ba50df23+ffff4571

    Evaluate expression: -1169152876 = ba502494
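
    To double-check this arithmetic outside the debugger, here is a minimal Python sketch; the helper name jump_target is just for illustration, and the addresses and displacement are taken from the disassembly above.  The 32-bit mask mimics the processor’s modulo-2^32 address arithmetic.

    MASK32 = 0xFFFFFFFF

    def jump_target(next_ip, disp32):
        # (next_ip + disp32) mod 2**32 -- equivalent to adding the
        # sign-extended displacement on a 32-bit processor.
        return (next_ip + disp32) & MASK32

    next_ip = 0xBA50DF23   # ba50df1d + 6 bytes of jne encoding
    disp    = 0xFFFF4571   # little-endian bytes 71 45 ff ff

    print(hex(jump_target(next_ip, disp)))   # 0xba502494 -> DeliverToUser+0x103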

     

    If we change just one bit in this opcode, turning 71 (0111 0001) into 70 (0111 0000), we end up at ba502493 instead.  Now we can conclude that the hardware made a mistake, and because we know exactly which bit was flipped we can confidently say that we have identified the mistake that was made.

    1: kd> ?ba50df23+ffff4570

    Evaluate expression: -1169152877 = ba502493
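
    A quick follow-up check (same illustrative Python as above) shows why the flipped bit is so much easier to blame on hardware than the final addresses suggest: the two encoded displacements differ by exactly one bit, while the two resulting jump targets differ by three.

    # one bit of difference between the good and bad displacements
    good_disp, bad_disp = 0xFFFF4571, 0xFFFF4570
    print(bin(good_disp ^ bad_disp).count("1"))       # 1

    # three bits of difference between the resulting jump targets
    good_target, bad_target = 0xBA502494, 0xBA502493
    print(bin(good_target ^ bad_target).count("1"))   # 3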

     

    We still have some unanswered questions.  The most common question I get when I’ve found a bit flip is: which piece of hardware is broken?  Unfortunately I can’t answer this question with the data I have.  In this case the jne instruction was good when we read it to put it in the dump, but obviously it was bad when it was executed.  Thinking about the parts needed to get the instruction from RAM and execute it, I could speculate that a processor, riser board, or motherboard could be at fault, but I don’t have any data that proves where the problem happened.  It is unlikely that hardware diagnostics will find this type of failure because it is intermittent; unless the problem manifests itself while the diagnostics are running, they will likely report that there is no problem.  In the case of this server I wasn’t able to find out specifically what failed; uptime was more important than troubleshooting, so a whole new server was put in place of this one and the problem has not been seen since.

    Bit flips can be caused by any number of problems.  Problems with a processor, a processor riser board, a motherboard, or RAM are common causes.  As we have documented before in this blog, improper voltage or overheating can also cause these types of problems.  However, the potential list of bit-flip causes is nearly endless; power supplies, power cables, incoming power fluctuations, radiation from nearby electronics, even sun spots can all cause bit flips.

    I should also explain why the stack showed that we were in GetAddrType instead of DeliverToUser.  This is because tcpip.sys has been optimized with Basic Block Tools (BBT) optimization: blocks of code from the same function are placed on different pages so that the most frequently executed blocks are packed together, reducing cache misses in common code paths.  This means that blocks of the GetAddrType code are located next to the DeliverToUser code.  As demonstrated below, there is a block of GetAddrType code that ends with an unconditional jmp at ba50248f, and there is a block of DeliverToUser code that starts at ba502494.

    1: kd> ub ba502494

    tcpip!GetAddrType+0x9e:

    ba502474 f6401401        test    byte ptr [eax+14h],1

    ba502478 0f84bd760000    je      tcpip!GetAddrType+0xb4 (ba509b3b)

    ba50247e 85db            test    ebx,ebx

    ba502480 0f84bd6b0200    je      tcpip!GetAddrType+0xa8 (ba529043)

    ba502486 32c0            xor     al,al

    ba502488 e967770000      jmp     tcpip!GetAddrType+0x15a (ba509bf4)

    ba50248d 33c9            xor     ecx,ecx

    ba50248f e993770000      jmp     tcpip!GetAddrType+0x19b (ba509c27)

    1: kd> u

    tcpip!DeliverToUser+0x103:

    ba502494 8b15f84154ba    mov     edx,dword ptr [tcpip!_imp__KeNumberProcessors (ba5441f8)]

    ba50249a 0fbe12          movsx   edx,byte ptr [edx]

    ba50249d 3bca            cmp     ecx,edx

    ba50249f 0f8582ba0000    jne     tcpip!DeliverToUser+0x114 (ba50df27)

    ba5024a5 e979ba0000      jmp     tcpip!DeliverToUser+0x110 (ba50df23)

    ba5024aa 90              nop

    ba5024ab 90              nop

    ba5024ac 90              nop

     

    Congratulations to those who have read this far.  Hopefully you now understand the cause of a problem that has stumped many engineers, and you have some new tools in your toolbox when you’re investigating a tricky problem.  Remember to keep asking “why?” and not to assume anything; those are the approaches I used to find the source of this problem.





  • Ntdebugging Blog

    NTDebuggers Debug Puzzler 0x00000001 "Where did my process go?"

    • 12 Comments

    Hello NTDebuggers, in the spirit of Click and Clack (the Tappet Brothers), a favorite troubleshooting show of mine, we thought it would be fun to offer up some debug puzzlers for our readers.

     

    That said, this week’s Debug Puzzler is in regard to Dr. Watson.  I’m sure most of you have seen Dr. Watson errors; they typically mean your application has crashed due to an unhandled exception.  Sometimes, however, the process just seems to disappear, and the Just-in-Time (JIT) debugging options configured via the AeDebug key do not catch the crash…  Does anyone know why this may happen?

     

    We will post readers’ comments as they respond during the week, and next Monday we will post our answer and recognize some of the best answers we received from our readers.

     

    Good luck and happy debugging!

     

    -  Jeff Dailey


    [Update: our answer, posted 4/11/2008]

    Hello NTDebuggers.  Let me start off by saying that we were very impressed by our readers’ answers.  Our two favorite answers were submitted by Skywing and molotov.

    When a thread starts, the ntdll Run Time Library (RTL) for the process inserts an exception handler before it calls the BaseThreadInit code to hand control over to the executable or DLL running in the process (notepad in the example below).  If anything goes wrong with the chain of exception handlers, the process can’t make it back to the RTL exception handler and it will simply terminate.  See http://www.microsoft.com/msj/0197/Exception/Exception.aspx for details.

    ChildEBP RetAddr 

    000ef7ac 75fbf837 ntdll!KiFastSystemCallRet

    000ef7b0 75fbf86a USER32!NtUserGetMessage+0xc

    000ef7cc 00b21418 USER32!GetMessageW+0x33

    000ef80c 00b2195d notepad!WinMain+0xec

    000ef89c 76e24911 notepad!_initterm_e+0x1a1

    000ef8a8 7704e4b6 kernel32!BaseThreadInitThunk+0xe

    000ef8e8 7704e489 ntdll!__RtlUserThreadStart+0x23  << Exception Handler is inserted here.

    000ef900 00000000 ntdll!_RtlUserThreadStart+0x1b

     

    Secondly, the process that crashes is actually responsible for starting the debugger via the RTL exception handler.  The debugger is registered under the AeDebug registry key.  Even if you are able to unwind to the RTL exception handler you may still run into trouble: if the computer is low on system resources, such as desktop heap, you may not be able to create a new process and thus will not be able to launch the debugger.  As Skywing stated, it’s a relatively heavyweight operation.  Applications may also call TerminateProcess from within their own code based on an error condition.  If we have a customer that sees this symptom on a regular basis, we typically recommend attaching a debugger to monitor the process; simply run ADPLUS -crash -p (PROCESSID).
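
    If you want to quickly confirm which JIT debugger the AeDebug key currently points to on a machine, here is a minimal Python sketch using the built-in winreg module.  The key path and the Debugger/Auto value names are the standard ones, but treat this as an illustrative check rather than part of the original post.

    import winreg

    AEDEBUG = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"

    # Read the registered JIT debugger command line and the Auto flag.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, AEDEBUG) as key:
        debugger, _ = winreg.QueryValueEx(key, "Debugger")   # command line launched on a crash
        try:
            auto, _ = winreg.QueryValueEx(key, "Auto")       # "1" = launch without prompting
        except FileNotFoundError:
            auto = "<not set>"

    print("JIT debugger:", debugger)
    print("Auto        :", auto)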

    Good work folks!  We’ll have another puzzler ready next Monday.

    Good Luck and happy debugging!

    - Jeff
