Debugging a Bugcheck 0x109


My name is Nader Khonsari. I am an escalation engineer in Platforms Global Escalation Services. I want to share with you a recent experience where 64-bit Windows Server 2008 servers at a customer location were encountering bugcheck 0x109 blue screen crashes.

 

64-bit versions of Windows include PatchGuard, which protects the kernel from modification by malicious or badly written drivers or software. If any driver or application attempts to modify the kernel, PatchGuard generates the bugcheck shown below (0x109, CRITICAL_STRUCTURE_CORRUPTION).

 

To investigate this bugcheck further, compare the affected kernel function with a known good copy. For instance, if the machine was running Windows Server 2008 Service Pack 2 with a post-SP2 hotfix kernel, compare the affected function with the same function in the original Service Pack 2 kernel. Usually you do not need to download and extract the post-SP2 hotfix, because the vast majority of the kernel code has not been modified since the service pack.

 

If you already have Service Pack 2 for Windows Server 2008 handy, expand the package using the instructions in KB928636:

 

Windows6.0-KB948465-X64.exe /x

expand.exe -f:* C:\WS08\SP2\windows6.0-kb948465-X64.cab C:\WS08\SP2\Expanded

 

Locate the kernel binary among the expanded files, then open it in the debugger just as you would open a crash memory dump.

 

windbg -z C:\WS08\SP2\Expanded\amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.0.6002.18005_none_ca3a763069a24eea\ntoskrnl.exe
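
Finding the kernel binary in the expanded tree can be scripted. The following is a minimal sketch rather than part of the original procedure; the function name find_kernel_binaries is hypothetical, and it simply walks the expanded directory looking for ntoskrnl.exe. It may return more than one path if the package carries several copies.

```python
import os


def find_kernel_binaries(expanded_root, name="ntoskrnl.exe"):
    """Return the sorted paths of every file called `name` under expanded_root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(expanded_root):
        for filename in filenames:
            # Windows file names are case-insensitive, so compare that way.
            if filename.lower() == name.lower():
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Example (path taken from the expand.exe command above):
#   for path in find_kernel_binaries(r"C:\WS08\SP2\Expanded"):
#       print(path)
```

Each path it prints can be passed to windbg -z as shown above.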

 

This is the bugcheck data from the dump:

 

CRITICAL_STRUCTURE_CORRUPTION (109)
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
 or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
 debugger that was not attached when the system was booted. Normal breakpoints,
 "bp", can only be set if the debugger is attached at boot time. Hardware
 breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arguments:
Arg1: a3a039d89b456543, Reserved
Arg2: b3b7465eedc23277, Reserved
Arg3: fffff80001778470, Failure type dependent information
Arg4: 0000000000000001, Type of corrupted region, can be
        0 : A generic data region
        1 : Modification of a function or .pdata
        2 : A processor IDT
        3 : A processor GDT
        4 : Type 1 process list corruption
        5 : Type 2 process list corruption
        6 : Debug routine modification
        7 : Critical MSR modification
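
Reading Arg4 against the list above can be expressed as a small lookup. This is only an illustrative sketch: the table contains just the values printed in the debugger output above, the names CORRUPTED_REGION_TYPES and describe_region are hypothetical, and any value outside the list is reported as unlisted rather than guessed at.

```python
# Region types copied from the bugcheck 0x109 description above.
CORRUPTED_REGION_TYPES = {
    0x0: "A generic data region",
    0x1: "Modification of a function or .pdata",
    0x2: "A processor IDT",
    0x3: "A processor GDT",
    0x4: "Type 1 process list corruption",
    0x5: "Type 2 process list corruption",
    0x6: "Debug routine modification",
    0x7: "Critical MSR modification",
}


def describe_region(arg4):
    """Map the Arg4 value of a bugcheck 0x109 to its description, if listed."""
    return CORRUPTED_REGION_TYPES.get(
        arg4, "not listed in the output above (0x%x)" % arg4
    )


# Arg4 from the dump in this article, as printed by the debugger.
arg4 = int("0000000000000001", 16)
print(describe_region(arg4))
```

For this dump the lookup selects the "Modification of a function or .pdata" entry, which is why the next step looks at the function containing the Arg3 address.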

 

Next, check the address at Arg3. This will give you the function that was modified, but not the offset of the modified instruction.

 

3: kd> ln fffff80001778470
(fffff800`01778470)   nt!KeSetSystemTime   |  (fffff800`01778790)   nt!BiLoadSystemStore
Exact matches:
    nt!KeSetSystemTime = <no type information>

 

Unassemble this function in the kernel binary you expanded from the SP2 package, do the same in the crash dump, and compare the two listings. The opcode bytes that differ from the unmodified kernel identify the modified instruction.

 

Below is the comparison of the nt!KeSetSystemTime code from the crashed kernel and from the Service Pack 2 kernel, respectively. They match except for the second opcode byte of the prefetchw instruction (0x0d), which has been overwritten with 0x1f. This changed the instruction to a nop, which is done to prevent the prefetch operation from occurring on processors that don't support prefetch.

 

nt!KeSetSystemTime+0x156:
fffff800`017785c6 0f1f0f          nop     dword ptr [rdi]
fffff800`017785c9 488b07          mov     rax,qword ptr [rdi]
fffff800`017785cc 493bc7          cmp     rax,r15
fffff800`017785cf 7516            jne     nt!KeSetSystemTime+0x177

 

ntoskrnl!KeSetSystemTime+0x156:
00000001`4012e5b6 0f0d0f          prefetchw [rdi]
00000001`4012e5b9 488b07          mov     rax,qword ptr [rdi]
00000001`4012e5bc 493bc7          cmp     rax,r15
00000001`4012e5bf 7516            jne
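
The byte-by-byte comparison of the two listings can be sketched in a few lines. This is an illustration under stated assumptions, not a tool used in the original investigation: the opcode bytes and addresses are copied by hand from the listings and the ln output above, and the helper name diff_bytes is hypothetical.

```python
def diff_bytes(crashed, reference, start_offset=0):
    """Return (offset, crashed_byte, reference_byte) for every mismatching byte."""
    if len(crashed) != len(reference):
        raise ValueError("compare byte ranges of equal length")
    return [
        (start_offset + i, c, r)
        for i, (c, r) in enumerate(zip(crashed, reference))
        if c != r
    ]


# Opcode bytes copied from the two listings above.
crashed = bytes.fromhex("0f1f0f" "488b07" "493bc7" "7516")    # crash dump
reference = bytes.fromhex("0f0d0f" "488b07" "493bc7" "7516")  # SP2 binary

FUNCTION_START = 0xFFFFF80001778470  # nt!KeSetSystemTime, from ln on Arg3
LISTING_START = 0xFFFFF800017785C6   # first instruction in the crash listing

# Offset of the listing from the start of the function (0x156).
start_offset = LISTING_START - FUNCTION_START

for offset, c, r in diff_bytes(crashed, reference, start_offset):
    print("KeSetSystemTime+0x%x: crashed=%02x reference=%02x" % (offset, c, r))
```

Comparing by offset from the start of the function, rather than by absolute address, is what allows a hotfix kernel in memory to be checked against the SP2 binary on disk, since the two are loaded at different addresses.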

 

After further investigation, this turned out to be a known issue in VMware environments when a VM is moved from a non-prefetch architecture to a prefetch architecture, and even then only in the live-migration case. The issue is documented on VMware's site (http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1008749&sliceId=1&docTypeID=DT_KB_1_1&dialogID=74787167&stateId=0).

  • What's the benefit of this method? Why don't you use command !chkimg nt?

    [The !chkimg command requires access to an identical binary as the one on the system.  The method shown in this article will usually allow you to identify the changed code without having to find, download, and extract the hotfix binary.]

  • Interesting technique used here, will add it to my arsenal, thanks! ;)

  • I am seeing a similar bugcheck code on Server 2012. It seems MS has not yet published symbols for Server 2012 on their server. In my case the bugcheck code is the same, 0x109, but the arguments are different: Arg3 shows "Failure type dependent information" and Arg4 has the value 2, which means the corrupted region is a processor IDT.

    The issue occurs when I run my application, which has driver components. I tried running Driver Verifier but did not get enough information.

    Any help will be greatly appreciated.

    [A fourth parameter of 2 indicates that some driver has attempted to modify the interrupt descriptor table (IDT).]

  • Would it be possible to shed any light on this error? I am receiving a critical_structure_corruption about every 30 mins and it goes to the blue screen and forces a reboot. I am running Windows 8.1 on an HP laptop.

    Log Name:      System
    Source:        Microsoft-Windows-WER-SystemErrorReporting
    Date:          2/13/2014 8:28:29 PM
    Event ID:      1001
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      CRL
    Description:
    The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000109 (0xa3a01f58a0b665ab, 0xb3b72bdef3359b5e, 0xffffe0000110d0d0, 0x000000000000001c). A dump was saved in: C:\WINDOWS\MEMORY.DMP. Report Id: 021314-12921-01.
    Event Xml:
    <Event xmlns="schemas.microsoft.com/.../event">
     <System>
       <Provider Name="Microsoft-Windows-WER-SystemErrorReporting" Guid="{ABCE23E7-DE45-4366-8631-84FA6C525952}" EventSourceName="BugCheck" />
       <EventID Qualifiers="16384">1001</EventID>
       <Version>0</Version>
       <Level>2</Level>
       <Task>0</Task>
       <Opcode>0</Opcode>
       <Keywords>0x80000000000000</Keywords>
       <TimeCreated SystemTime="2014-02-14T00:28:29.000000000Z" />
       <EventRecordID>24048</EventRecordID>
       <Correlation />
       <Execution ProcessID="0" ThreadID="0" />
       <Channel>System</Channel>
       <Computer>CRL</Computer>
       <Security />
     </System>
     <EventData>
       <Data Name="param1">0x00000109 (0xa3a01f58a0b665ab, 0xb3b72bdef3359b5e, 0xffffe0000110d0d0, 0x000000000000001c)</Data>
       <Data Name="param2">C:\WINDOWS\MEMORY.DMP</Data>
       <Data Name="param3">021314-12921-01</Data>
     </EventData>
    </Event>

    [Hi Bob.  Unfortunately this message is not sufficient to identify the cause of the crash.  We are not able to provide detailed one on one troubleshooting through this blog.  If you need one on one troubleshooting please open an incident with Microsoft Support at http://support.microsoft.com/gp/microsoft-support-options .]
