• Ntdebugging Blog

    Interpreting a WHEA error for a MCA fault


    Howdy fellow debuggers! This is Graham McIntyre; I am an Escalation Engineer in Platforms Global Escalation Services.  From time to time, customers who experience a WHEA bug check 0x124, or a WHEA system event, ask us for help interpreting the error record. The information here applies to Windows Server 2008 / Vista SP1 and Windows Server 2008 R2 / Windows 7.


    I thought I would go through an example error record, point out some commonly asked questions, and show you how to find specific information on the error.  In many cases, the information is specific to a particular processor or hardware vendor, and the customer will need to follow up with them. But we can help to some extent by parsing the data.


    For an initial primer on WHEA and hardware error reporting, I suggest reading this whitepaper:



    I’ll provide some further links to some specific WHEA information along the way.


    Getting Started:

    A WHEA bug check 0x124, WHEA_UNCORRECTABLE_ERROR, indicates that a fatal hardware error has occurred.  The bug check parameters give you further information on the WHEA error record generated.


    In this example case, the first parameter was 0 so this indicates that this is a Machine Check Exception (MCE).  An MCE is generated by certain classes of processors, such as Intel and AMD 64-bit processors.


    Checking the help included with the Debugging Tools For Windows for Bug Check 0x124 shows this meaning for the parameters:

    Parameter 1:    0x0
    Parameter 2:    Address of WHEA_ERROR_RECORD structure
    Parameter 3:    High 32 bits of MCi_STATUS MSR for the MCA bank that has the error
    Parameter 4:    Low 32 bits of MCi_STATUS MSR for the MCA bank that has the error
    Cause of Error: A machine check exception occurred.

    These parameter descriptions apply if the processor is based on the x64 architecture, or the x86 architecture that has the MCA feature available (for example, Intel Pentium Pro, Pentium IV, or Xeon).
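    To illustrate how Parameters 3 and 4 fit together, here is a minimal Python sketch that recombines the two 32-bit halves into the 64-bit MCi_STATUS value. The two input values are assumptions: they were obtained by splitting the Status value (0xfa00000000400405) that !errrec reports further down in this post, not read from the bug check itself. The function name is made up for illustration.

```python
def combine_mci_status(high32: int, low32: int) -> int:
    """Rebuild the 64-bit MCi_STATUS value from bug check parameters 3 and 4."""
    if not (0 <= high32 <= 0xFFFFFFFF and 0 <= low32 <= 0xFFFFFFFF):
        raise ValueError("each half must fit in 32 bits")
    return (high32 << 32) | low32


# Assumed example values: the two halves of the Status shown by !errrec.
param3_high = 0xFA000000
param4_low = 0x00400405

status = combine_mci_status(param3_high, param4_low)
print(f"MCi_STATUS = 0x{status:016x}")  # prints MCi_STATUS = 0xfa00000000400405
```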


    There are 2 useful debugger commands for debugging a WHEA error:

    !whea – displays top level WHEA information

    !errrec – dumps a specific WHEA error record


    Since we already have an address of the error record in Parameter 2, we can dump it out directly with !errrec. 

    31: kd> !errrec fffffa8064341028
    Common Platform Error Record @ fffffa8064341028
    Record Id     : 01cb65718c829130
    Severity      : Fatal (1)
    Length        : 928
    Creator       : Microsoft
    Notify Type   : Machine Check Exception
    Timestamp     : 10/11/2010 7:11:22
    Flags         : 0x00000000

    Section 0     : Processor Generic
    Descriptor    @ fffffa80643410a8
    Section       @ fffffa8064341180
    Offset        : 344
    Length        : 192
    Flags         : 0x00000001 Primary
    Severity      : Fatal

    Proc. Type    : x86/x64
    Instr. Set    : x64
    Error Type    : Micro-Architectural Error
    Flags         : 0x00
    CPU Version   : 0x00000000000206e6
    Processor ID  : 0x0000000000000037

    Section 1     : x86/x64 Processor Specific
    Descriptor    @ fffffa80643410f0
    Section       @ fffffa8064341240
    Offset        : 536
    Length        : 128
    Flags         : 0x00000000
    Severity      : Fatal

    Local APIC Id : 0x0000000000000037

    CPU Id        : e6 06 02 00 00 08 20 37 - bd e3 bc 00 ff fb eb bf
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
                00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

    Proc. Info 0  @ fffffa8064341240

    Section 2     : x86/x64 MCA
    Descriptor    @ fffffa8064341138
    Section       @ fffffa80643412c0
    Offset        : 664
    Length        : 264
    Flags         : 0x00000000
    Severity      : Fatal

    Error         : Internal unclassified (Proc 31 Bank 5)

    Status      : 0xfa00000000400405


    As you can see from the output, a WHEA error record is made up of several sections.  Each section is actually a sub-section of the one above it. The sections go from most generic to most specific, based on the exact type of error which occurred.

    CPER / WHEA record – this is defined in Appendix N of the UEFI spec version 2.2 (these can be obtained from www.uefi.org)

    The format of most of the sections is defined in the UEFI Spec version 2.2 as part of the Common Platform Error Record (CPER) definition.  The last section describes a Machine Check Architecture (MCA) error, whose format is defined by the processor manufacturer.  In this case, it is an Intel processor.

    MCA information - The format of the last part of the record (MCA) is defined in the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3A
    Section 15 describes the MCA format and structure. Appendix E in Volume 3B has additional details on interpreting Machine-Check error codes

    Let’s take a look at what each of the sections represents:

    An error record is described by a WHEA_ERROR_RECORD structure, the error record header is described by a WHEA_ERROR_RECORD_HEADER structure, and the error record section descriptors are each described by a WHEA_ERROR_RECORD_SECTION_DESCRIPTOR structure.

    The CPER record header is a WHEA_ERROR_PACKET_V2, and describes the severity and type of error.  In this case it is a fatal Machine Check Exception (MCE).

    Section 0 is a Generic Processor error. This error record section contains processor error data that is not specific to a particular processor architecture. The data that is contained in this section is described by the WHEA_PROCESSOR_GENERIC_ERROR_SECTION structure.

    Section 1 is an x86/x64 Processor Error. This error record section contains processor error data that is specific to the x86 or x64 processor architecture. The data that is contained in this section is described by the WHEA_XPF_PROCESSOR_ERROR_SECTION structure.

    Section 2 is of type WHEA_XPF_MCA_SECTION and contains the machine check and other machine-specific register information. The actual structure which holds the MCA data is a Microsoft specific extension of the CPER specification.  We build this record by reading the Machine Specific Registers (MSRs) which are processor specific, and filling in the fields.  These (and many of the above) are defined in the header file cper.h in the SDK.
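    The offsets and lengths printed by !errrec can be cross-checked against each other. The Python sketch below does that for the record in this post. The header size (128 bytes) and descriptor size (72 bytes) are assumptions inferred from the addresses in the output (the first descriptor sits 0x80 bytes past the record, and descriptors are 0x48 bytes apart); the function and constant names are illustrative only.

```python
RECORD_BASE = 0xFFFFFA8064341028
RECORD_LENGTH = 928

HEADER_SIZE = 128      # assumed size of the record header
DESCRIPTOR_SIZE = 72   # assumed size of one section descriptor

# (name, descriptor address, section address, offset, length) from !errrec
SECTIONS = [
    ("Processor Generic", 0xFFFFFA80643410A8, 0xFFFFFA8064341180, 344, 192),
    ("x86/x64 Processor Specific", 0xFFFFFA80643410F0, 0xFFFFFA8064341240, 536, 128),
    ("x86/x64 MCA", 0xFFFFFA8064341138, 0xFFFFFA80643412C0, 664, 264),
]


def check_layout() -> list:
    """Return a list of inconsistencies; an empty list means the layout adds up."""
    problems = []
    # The first section should start right after the header and all descriptors.
    expected_offset = HEADER_SIZE + DESCRIPTOR_SIZE * len(SECTIONS)
    for index, (name, desc_addr, sect_addr, offset, length) in enumerate(SECTIONS):
        if desc_addr != RECORD_BASE + HEADER_SIZE + index * DESCRIPTOR_SIZE:
            problems.append(f"{name}: unexpected descriptor address")
        if sect_addr != RECORD_BASE + offset:
            problems.append(f"{name}: section address does not match offset")
        if offset != expected_offset:
            problems.append(f"{name}: section does not follow the previous one")
        expected_offset = offset + length
    if expected_offset != RECORD_LENGTH:
        problems.append("sections do not add up to the record length")
    return problems


print(check_layout() or "layout is consistent")
```

    For this record the three sections are laid out back to back and end exactly at the reported record length of 928 bytes.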

    Some of the questions which I was asked about this record, and their answers:

    1.  Why is the processor number (31) listed in the MCA record (Section 2) different than the processor id / APIC ID (37) in sections 0 and 1?

    The answer to this one is that the numbers have different meanings and different sources.  The one in sections 0 and 1 is the initial APIC ID of the CPU which reported the machine check.  The APIC ID for a logical CPU is set by the hardware on boot.  The processor number in Section 2 is the logical processor number (the value returned from KeGetCurrentProcessorNumberEx) of the processor which is creating the WHEA error record. This may or may not be the same processor which reported the machine check error, depending on the IRQL at which the processor generating the error was running.  If the IRQL was < DISPATCH_LEVEL, then creation of the record is scheduled to run on the reporting processor.  Otherwise, it may be logged on a different processor.

    How do you map APIC IDs to logical IDs?
    One way is using the !smt debugger extension.  This shows the APIC IDs and logical CPU number for all CPUs.

    No PRCB             SMT Set                                                                             APIC Id
    0 fffff8000da3ee80 **-------------------------------------------------------------- (0000000000000003) 0x00000080
    1 fffff8800260e180 **-------------------------------------------------------------- (0000000000000003) 0x00000081

    2.  How do you interpret the MCA error “Internal unclassified (Proc 31 Bank 5)”?

    In order to make sense of these, you need to determine a few pieces of information, then refer to the specific processor guide.

    As I mentioned previously, this particular system is an Intel system, so these are the resources you need to use:

    Section 15 in the Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3A
    Appendix E in Volume 3B, which has additional details on interpreting Machine-Check error codes


    a. CPU ID – What Family, Model, and Stepping is the CPU?

    !cpuid can show you this.  Or, you can parse it from the CPU ID in section 1.  In this case it is:
    CPU Id        : e6 06 02 00 00 08 20 37 - bd e3 bc 00 ff fb eb bf  // Family 6, Model 2e, stepping 6

    Table B-1 in Appendix B of the Intel guide says that this Family and Model is an “Intel Xeon Processor 7500 Series”
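    If you want to parse the CPU ID by hand, the first four bytes of the CPU Id field are the little-endian CPUID signature. The Python sketch below decodes it, assuming the standard Intel encoding in which the extended model bits are combined with the model when the family is 6 or 15. The function name is made up for illustration.

```python
def decode_cpuid_signature(cpu_id_bytes: bytes) -> dict:
    """Decode family, model and stepping from the first 4 bytes of the CPU Id."""
    signature = int.from_bytes(cpu_id_bytes[:4], "little")
    stepping = signature & 0xF
    model = (signature >> 4) & 0xF
    family = (signature >> 8) & 0xF
    ext_model = (signature >> 16) & 0xF
    ext_family = (signature >> 20) & 0xFF

    # The extended fields only contribute for certain base family values.
    display_family = family + ext_family if family == 0xF else family
    display_model = (ext_model << 4) | model if family in (0x6, 0xF) else model
    return {
        "signature": signature,
        "family": display_family,
        "model": display_model,
        "stepping": stepping,
    }


# First eight bytes of the CPU Id from section 1 of the error record.
cpu_id = bytes.fromhex("e6 06 02 00 00 08 20 37")
info = decode_cpuid_signature(cpu_id)
print(f"signature 0x{info['signature']:08x}: family {info['family']:x}, "
      f"model {info['model']:x}, stepping {info['stepping']:x}")
```

    For the record above this gives family 6, model 2e, stepping 6, which matches the CPU Version value (0x206e6) reported in section 0.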


    b. What is the MCA Error code?

    In order to find this out, we need to parse the MCi_STATUS structure.  The ‘i’ is used in the Intel guides as a placeholder for the bank number.  An error bank is a processor-specific set of MSRs.  For some banks, the type of error they represent is publicly documented, and for some it is not.  If the bank is not documented, then you will need to check with the processor manufacturer.


    Now that we know the processor family and model, we can look up the meaning of specific bank of registers.  These are listed in this form: MSR_MCi_STATUS.  So since we know the bank number is 5, we can find the meaning of MSR_MC5_STATUS.  Here’s what the Intel guide shows:




    Table B-5

    Register (hex)  Register (dec)  Register Name    Scope  Bit Description
    414H            1044            MSR_MC5_CTL      Core   See Section, “IA32_MCi_CTL MSRs.”
    415H            1045            MSR_MC5_STATUS   Core   See Section, “IA32_MCi_STATUS MSRS.”
    416H            1046            MSR_MC5_ADDR     Core   See Section, “IA32_MCi_ADDR MSRs.”
    417H            1047            MSR_MC5_MISC     Core   See Section, “IA32_MCi_MISC MSRs.”


    Now, referring to that section, we can decode the value:

    +0x000 McaErrorCode     : 0x405  // binary:  0000 0100 0000 0101
    +0x002 ModelErrorCode   : 0x40  // binary: 0000 0000 0100 0000 // bit 22
    +0x004 OtherInformation : 0y00000000000000000000000 (0)
    +0x004 ActionRequired   : 0y0
    +0x004 Signalling       : 0y0
    +0x004 ContextCorrupt   : 0y1
    +0x004 AddressValid     : 0y0
    +0x004 MiscValid        : 0y1
    +0x004 ErrorEnabled     : 0y1
    +0x004 UncorrectedError : 0y1
    +0x004 StatusOverFlow   : 0y1
    +0x004 Valid            : 0y1
    +0x000 QuadPart         : 0xfa000000`00400405
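    The same decode can be reproduced outside the debugger. The Python sketch below splits the 64-bit status value into the fields shown above. The bit positions are an assumption based on the layout implied by that output (error codes in the low 32 bits, flags in bits 55 through 63); check them against the Intel manual for your processor before relying on them. The names are illustrative only.

```python
# Assumed bit position of each single-bit flag in MCi_STATUS.
FLAG_BITS = {
    "ActionRequired": 55,
    "Signalling": 56,
    "ContextCorrupt": 57,
    "AddressValid": 58,
    "MiscValid": 59,
    "ErrorEnabled": 60,
    "UncorrectedError": 61,
    "StatusOverFlow": 62,
    "Valid": 63,
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its named fields."""
    fields = {
        "McaErrorCode": status & 0xFFFF,              # bits 15:0
        "ModelErrorCode": (status >> 16) & 0xFFFF,    # bits 31:16
        "OtherInformation": (status >> 32) & 0x7FFFFF,  # bits 54:32
    }
    for name, bit in FLAG_BITS.items():
        fields[name] = (status >> bit) & 1
    return fields


decoded = decode_mci_status(0xFA00000000400405)
for name, value in decoded.items():
    print(f"{name:<17}: {value:#x}")
```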


    Section 15.9 discusses how to interpret these error codes.  From Table 8, “IA32_MCi_Status [15:0] Simple Error Code Encoding”, the meaning is given as:

    Internal Unclassified 0000 01xx xxxx xxxx Internal unclassified errors.


    This is why the error shows as “Internal Unclassified”.  Since this is not a publicly documented code, the next step would be to contact Intel for further information.  But, at least now you have verified the information and will have good data to send to the hardware manufacturer.  In other cases, the bank and MCA code may be more clearly documented and further action could be taken.
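    The table lookup is simply a bit-pattern match in which each “x” is a don't-care bit. As a rough sketch (the helper name is made up, and only the one pattern quoted above is used), it could be written in Python like this:

```python
def matches_pattern(value: int, pattern: str) -> bool:
    """Return True if value matches a bit pattern such as '0000 01xx xxxx xxxx'."""
    bits = pattern.replace(" ", "")
    # Bits marked 'x' are ignored; every other bit must match exactly.
    mask = int("".join("0" if bit == "x" else "1" for bit in bits), 2)
    expected = int(bits.replace("x", "0"), 2)
    return (value & mask) == expected


INTERNAL_UNCLASSIFIED = "0000 01xx xxxx xxxx"

print(matches_pattern(0x405, INTERNAL_UNCLASSIFIED))  # prints True
```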


    Further Reading:

    There is more information regarding WHEA on MSDN and in several WinHEC conference presentations on the Microsoft site:

    WHEA Platform Implementation

    WHEA System Design and Implementation


    I hope this information was useful to understand how to interpret WHEA and MCA error codes. Until next time!


    Hunting for Bugs, but Found a Worm


    Hi All, my name is Ron Riddle and I’m an Escalation Engineer on the core Windows team.  I worked an issue recently wherein a svchost.exe was crashing due to heap corruption; so, after enabling Page Heap and breaking out the services as needed, I received a user-mode dump that would show me the culprit.  I was expecting to find a legitimate bug either in our code or a third-party module; but, much to my surprise, I found that malware had caused a buffer overrun and the subsequent crash.  With that, I would like to share the simple approach I took in identifying the malware within the dump file.


    1. I start by dumping out the offending call stack.  Notice that the debugger wasn’t able to map the code addresses to a loaded or unloaded module.

    0:003> kbn

     # ChildEBP RetAddr  Args to Child             

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    00 02bcfdcc 7c81a35f 02b7ae40 7c81a3ab 00000004 0x2b685b0

    01 02bcfde4 02b68bfe 02b7ae40 00000000 77e424ee ntdll!LdrpCallInitRoutine+0x21

    02 02bcfde8 02b7ae40 00000000 77e424ee 02b7ae10 0x2b68bfe

    03 02bcfdec 00000000 77e424ee 02b7ae10 00000000 0x2b7ae40


    2. Next, I try to learn more about the mystery address, such as what larger allocation it was a part of.

    0:003> !address 0x2b685b0

    Usage:                  <unclassified>

    Allocation Base:        02b60000

    Base Address:           02b61000

    End Address:            02b81000

    Region Size:            00020000

    Type:                   00020000    MEM_PRIVATE

    State:                  00001000    MEM_COMMIT

    Protect:                00000040    PAGE_EXECUTE_READWRITE
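    What makes this region stand out is the combination of private, committed memory that is both writable and executable, holding code with no module behind it. Here is a small Python sketch of that check using the values from the !address output. Treating this combination as suspicious is a heuristic assumed here, not proof of malware (JIT compilers, for example, allocate similar regions), and the function name is made up.

```python
MEM_COMMIT = 0x00001000
MEM_PRIVATE = 0x00020000
PAGE_EXECUTE_READWRITE = 0x00000040


def describe_region(base, end, size, mem_type, state, protect, address):
    """Sanity-check a region from !address and flag the suspicious combination."""
    return {
        "size_consistent": base + size == end,
        "contains_address": base <= address < end,
        # Heuristic: committed private memory that is writable and executable.
        "suspicious": (state == MEM_COMMIT
                       and mem_type == MEM_PRIVATE
                       and protect == PAGE_EXECUTE_READWRITE),
    }


region = describe_region(
    base=0x02B61000, end=0x02B81000, size=0x00020000,
    mem_type=0x00020000, state=0x00001000, protect=0x00000040,
    address=0x02B685B0,  # the unresolved frame from the call stack
)
print(region)
```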


    3. By now, I am suspicious of a rogue module, so I proceed in searching the aforementioned address range for a DOS Signature (i.e. 0x5A4D or “MZ”) that I know any Portable Executable file must contain.  I start with the Base Address from the above output and use the Region Size to specify my range.

    0:003> s -a 02b61000 l20000/4 "MZ"

    02b615d8  4d 5a 90 00 03 00 00 00-04 00 00 00 ff ff 00 00  MZ..............

    02b61bd0  4d 5a 75 f4 5f 83 c4 08-c2 04 00 55 8d 44 24 0c  MZu._......U.D$.

    02b67cd0  4d 5a 0f 85 69 01 00 00-8b 4d 7c 8b 46 3c 81 c1  MZ..i....M|.F<..

    02b681bf  4d 5a 74 07 33 c0 e9 c9-01 00 00 8b 45 0c 56 8b  MZt.3.......E.V.
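    Not every “MZ” hit is a real image header; some may simply be code or data that happens to contain those two bytes. If you have the raw memory in a buffer, a quick way to weed out false hits is to follow the e_lfanew field at offset 0x3C of the DOS header and look for the “PE\0\0” signature. The Python sketch below shows the idea on a synthetic buffer; the buffer contents and function names are made up for illustration and are not taken from the dump.

```python
import struct


def looks_like_pe(buffer: bytes, mz_offset: int) -> bool:
    """Check whether the 'MZ' at mz_offset is followed by a valid PE signature."""
    if mz_offset + 0x40 > len(buffer):
        return False  # not enough room for a full DOS header
    (e_lfanew,) = struct.unpack_from("<I", buffer, mz_offset + 0x3C)
    pe_offset = mz_offset + e_lfanew
    return buffer[pe_offset:pe_offset + 4] == b"PE\0\0"


def find_pe_candidates(buffer: bytes) -> list:
    """Return (offset, is_valid_pe) for every 'MZ' hit in the buffer."""
    results = []
    position = buffer.find(b"MZ")
    while position != -1:
        results.append((position, looks_like_pe(buffer, position)))
        position = buffer.find(b"MZ", position + 1)
    return results


def make_sample() -> bytes:
    """Build a synthetic buffer with one real header and one false hit."""
    dos_header = bytearray(0x40)
    dos_header[0:2] = b"MZ"
    struct.pack_into("<I", dos_header, 0x3C, 0x80)  # e_lfanew -> 0x80
    image = bytes(dos_header) + bytes(0x40) + b"PE\0\0" + bytes(0x20)
    false_hit = bytes.fromhex("4d5a75f45f83c408c20400558d44240c") + bytes(0x40)
    return bytes(0x10) + image + false_hit


for offset, valid in find_pe_candidates(make_sample()):
    print(f"MZ at +0x{offset:x}: {'PE header' if valid else 'false hit'}")
```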


    4. Now that I have some hits, I’ll start with the first one and verify whether it’s a valid module.  Bingo!

    0:003> !dh -a 02b615d8


    File Type: DLL


         14C machine (i386)

           5 number of sections

    37304740 time date stamp Wed May 05 08:27:28 1999


           0 file pointer to symbol table

           0 number of symbols

          E0 size of optional header

        2102 characteristics


                32 bit word machine




         10B magic #

        7.00 linker version

         600 size of code

         600 size of initialized data

           0 size of uninitialized data

        10B0 address of entry point

        1000 base of code

             ----- new -----

    10000000 image base

        1000 section alignment

         200 file alignment

           1 subsystem (Native)

        4.00 operating system version

        0.00 image version

        4.00 subsystem version

        6000 size of image

         400 size of headers

        41AE checksum

    00100000 size of stack reserve

    00001000 size of stack commit

    00100000 size of heap reserve

    00001000 size of heap commit

           0  DLL characteristics

           0 [       0] address [size] of Export Directory

        4000 [      28] address [size] of Import Directory

           0 [       0] address [size] of Resource Directory

           0 [       0] address [size] of Exception Directory

           0 [       0] address [size] of Security Directory

        5000 [      4C] address [size] of Base Relocation Directory

           0 [       0] address [size] of Debug Directory

           0 [       0] address [size] of Description Directory

           0 [       0] address [size] of Special Directory

           0 [       0] address [size] of Thread Storage Directory

           0 [       0] address [size] of Load Configuration Directory

           0 [       0] address [size] of Bound Import Directory

        2000 [      44] address [size] of Import Address Table Directory

           0 [       0] address [size] of Delay Import Directory

           0 [       0] address [size] of COR20 Header Directory

           0 [       0] address [size] of Reserved Directory




       .text name

         3CC virtual size

        1000 virtual address

         400 size of raw data

         400 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    68000020 flags


             Not Paged

             (no align specified)

             Execute Read



      .rdata name

          68 virtual size

        2000 virtual address

         200 size of raw data

         800 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    48000040 flags

             Initialized Data

             Not Paged

             (no align specified)

             Read Only



       .data name

          56 virtual size

        3000 virtual address

         200 size of raw data

         A00 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    C8000040 flags

             Initialized Data

             Not Paged

             (no align specified)

             Read Write



        INIT name

         1D4 virtual size

        4000 virtual address

         200 size of raw data

         C00 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    E2000020 flags



             (no align specified)

             Execute Read Write



      .reloc name

          82 virtual size

        5000 virtual address

         200 size of raw data

         E00 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    42000040 flags

             Initialized Data


             (no align specified)

             Read Only
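    The flags value printed for each section is a bit mask, and the descriptive lines under it are just the decoded bits. As a rough sketch, the Python below decodes the five values from the listing above using the section characteristic bits defined by the PE format. Only the bits that occur in this listing are included, and the function name is made up.

```python
# Subset of PE section characteristic bits that occur in the listing above.
SECTION_FLAGS = {
    0x00000020: "Code",
    0x00000040: "Initialized Data",
    0x02000000: "Discardable",
    0x08000000: "Not Paged",
    0x20000000: "Execute",
    0x40000000: "Read",
    0x80000000: "Write",
}


def decode_section_flags(flags: int) -> list:
    """Return the names of all known bits set in a section flags value."""
    return [name for bit, name in SECTION_FLAGS.items() if flags & bit]


listing = {
    ".text": 0x68000020,
    ".rdata": 0x48000040,
    ".data": 0xC8000040,
    "INIT": 0xE2000020,
    ".reloc": 0x42000040,
}
for section, flags in listing.items():
    print(f"{section:<7} {flags:08X}  {', '.join(decode_section_flags(flags))}")
```

    Note that INIT decodes as code that is both executable and writable.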


    5. Because I’m not sure which sections might contain identifying characteristics, I decide to go spelunking through all the sections (except for the relocation section) looking for said characteristics that might help me to identify the rogue module.  I start with the relative virtual address of the .text section @ 0x1000 and continue through the INIT section @ 0x4000.

    0:003> dc 02b615d8+0x1000 l4000/4

    02b63c58  00000065 646c6977 73737265 72756365  e...wilderssecur

    02b63c68  00797469 65726874 78657461 74726570  ity.threatexpert

    02b63c78  00000000 74736163 6f63656c 00007370  ....castlecops..

    02b63c88  6d617073 73756168 00000000 65737063  spamhaus....cpse

    02b63c98  65727563 00000000 61637261 00746962  cure....arcabit.

    02b63ca8  69736d65 74666f73 00000000 626e7573  emsisoft....sunb

    02b63cb8  00746c65 75636573 6f636572 7475706d  elt.securecomput

    02b63cc8  00676e69 69736972 0000676e 76657270  ing.rising..prev

    02b63cd8  00000078 6f746370 00736c6f 6d726f6e  x...pctools.norm

    02b63ce8  00006e61 6f63376b 7475706d 00676e69  an..k7computing.

    02b63cf8  72616b69 00007375 72756168 00000069  ikarus..hauri...

    02b63d08  6b636168 74666f73 00000000 74616467  hacksoft....gdat

    02b63d18  00000061 74726f66 74656e69 00000000  a...fortinet....

    02b63d28  64697765 0000006f 6d616c63 00007661  ewido...clamav..

    02b63d38  6f6d6f63 00006f64 63697571 6165686b  comodo..quickhea

    02b63d48  0000006c 72697661 00000061 73617661  l...avira...avas

    02b63d58  00000074 66617365 00000065 6c6e6861  t...esafe...ahnl

    02b63d68  00006261 746e6563 636c6172 616d6d6f  ab..centralcomma

    02b63d78  0000646e 65777264 00000062 73697267  nd..drweb...gris

    02b63d88  0074666f 74657365 00000000 33646f6e  oft.eset....nod3

    02b63d98  00000032 72702d66 0000746f 74746f6a  2...f-prot..jott

    02b63da8  00000069 7073616b 6b737265 00000079  i...kaspersky...

    02b63db8  65732d66 65727563 00000000 706d6f63  f-secure....comp

    02b63dc8  72657475 6f737361 74616963 00007365  uterassociates..

    02b63dd8  7774656e 616b726f 636f7373 65746169  networkassociate

    02b63de8  00000073 75727465 00007473 646e6170  s...etrust..pand

    02b63df8  00000061 68706f73 0000736f 6e657274  a...sophos..tren

    02b63e08  63696d64 00006f72 6661636d 00006565  dmicro..mcafee..

    02b63e18  74726f6e 00006e6f 616d7973 6365746e  norton..symantec

    02b63e28  00000000 7263696d 666f736f 00000074  ....microsoft...

    02b63e38  65666564 7265646e 00000000 746f6f72  defender....root

    02b63e48  0074696b 776c616d 00657261 77797073  kit.malware.spyw

    02b63e58  00657261 75726976 00000073 304ce942  are.virus...B.L0

    02b64348  54464f53 45524157 63694d5c 6f736f72  SOFTWARE\Microso

    02b64358  575c7466 6f646e69 435c7377 65727275  ft\Windows\Curre

    02b64368  6556746e 6f697372 78655c6e 726f6c70  ntVersion\explor

    02b64378  415c7265 6e617664 5c646563 646c6f46  er\Advanced\Fold

    02b64388  485c7265 65646469 48535c6e 4c41574f  er\Hidden\SHOWAL

    02b64398  0000004c 63656843 5664656b 65756c61  L...CheckedValue

    02b63ee8  ffffffff 02b6a44f 02b6a453 70747468  ....O...S...http

    02b63ef8  772f2f3a 672e7777 796d7465 6f2e7069  ://www.getmyip.o

    02b63f08  00006772 70747468 772f2f3a 772e7777  rg..http://www.w

    02b63f18  73746168 7069796d 72646461 2e737365  hatsmyipaddress.

    02b63f28  006d6f63 70747468 772f2f3a 772e7777  com.http://www.w

    02b63f38  69746168 69796d73 726f2e70 00000067  hatismyip.org...

    02b63f48  70747468 632f2f3a 6b636568 642e7069  http://checkip.d

    02b63f58  6e646e79 726f2e73 00000067 61207069  yndns.org...ip a

    02b63f68  65726464 00007373 ffffffff 02b6a55e  ddress......^...

    02b64858  00000020 74666f53 65726177 63694d5c   ...Software\Mic

    02b64868  6f736f72 575c7466 6f646e69 435c7377  rosoft\Windows\C

    02b64878  65727275 6556746e 6f697372 75525c6e  urrentVersion\Ru

    02b64888  0000006e 646e7572 32336c6c 6578652e  n...rundll32.exe

    02b64898  73252220 73252c22 00000000 0065006e   "%s",%s....n.e.

    02b648a8  00730074 00630076 00000073 00000020  t.s.v.c.s... ...
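    When reading dc output, keep in mind that each DWORD is displayed as a little-endian value, so its bytes appear reversed relative to the ASCII column on the right. The Python sketch below converts the DWORD columns back into text and pulls out the null-terminated strings. The helper names are made up, and the input is the first three lines of the dump above.

```python
def dwords_to_bytes(dword_line: str) -> bytes:
    """Convert space-separated hex DWORDs (as shown by dc) to raw bytes."""
    return b"".join(int(word, 16).to_bytes(4, "little") for word in dword_line.split())


def dwords_to_ascii(dword_line: str) -> str:
    """Render the bytes the way the debugger's ASCII column does."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in dwords_to_bytes(dword_line))


def extract_strings(dword_lines: list, min_length: int = 4) -> list:
    """Return printable null-terminated strings of at least min_length bytes."""
    raw = b"".join(dwords_to_bytes(line) for line in dword_lines)
    return [
        chunk.decode("ascii")
        for chunk in raw.split(b"\0")
        if len(chunk) >= min_length and all(0x20 <= b < 0x7F for b in chunk)
    ]


lines = [
    "00000065 646c6977 73737265 72756365",
    "00797469 65726874 78657461 74726570",
    "00000000 74736163 6f63656c 00007370",
]
print(dwords_to_ascii(lines[0]))  # prints e...wilderssecur
print(extract_strings(lines))
```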


    6. The list of anti-malware software vendors was a dead give-away that I was dealing with malware.  Finally, I conducted a Bing search using various artifacts from the preceding spew.  In the end, I was able to confirm that the rogue module was, in fact, the Conficker worm by simply running a full scan of the system using a signature-based scanner.



    I hope this walk-through provided you with techniques that you can leverage to identify rogue modules within your dump files, should that become necessary.  Until next time, happy bug-hunting and watch out for the worms!
