My name is Nader Khonsari. I am an escalation engineer in Platforms Global Escalation Services. I want to share with you a recent experience where 64-bit Windows Server 2008 servers at a customer location were encountering bugcheck 0x109 blue screen crashes.
In 64-bit versions of Windows, the kernel includes PatchGuard. If any driver or application attempts to modify the kernel, PatchGuard generates the bugcheck (CRITICAL_STRUCTURE_CORRUPTION) shown below. PatchGuard protects the kernel from modification by malicious or badly written drivers or software.
To investigate this bugcheck further, you need to compare the impacted kernel function with a known reliable copy. For instance, if the machine was running Windows Server 2008 Service Pack 2 with a post-SP2 hotfix kernel, compare the impacted kernel function with the same function in the Service Pack 2 kernel. Usually you do not need to download and extract the post-SP2 hotfix, because the vast majority of the kernel code has not been modified since the service pack.
If you already have service pack 2 for Windows Server 2008 handy, expand the package using instructions included in KB928636:
expand.exe -f:* C:\WS08\SP2\windows6.0-kb948465-X64.cab C:\WS08\SP2\Expanded
Locate the kernel binary from the expanded binaries and then open it up with your debugger just like you open a crash memory dump.
windbg -z C:\WS08\SP2\Expanded\amd64_microsoft-windows-os-kernel_31bf3856ad364e35_6.0.6002.18005_none_ca3a763069a24eea\ntoskrnl.exe
This is the bugcheck data from the dump:
This bugcheck is generated when the kernel detects that critical kernel code or
data have been corrupted. There are generally three causes for a corruption:
1) A driver has inadvertently or deliberately modified critical kernel code
or data. See http://www.microsoft.com/whdc/driver/kernel/64bitPatching.mspx
2) A developer attempted to set a normal kernel breakpoint using a kernel
debugger that was not attached when the system was booted. Normal breakpoints,
"bp", can only be set if the debugger is attached at boot time. Hardware
breakpoints, "ba", can be set at any time.
3) A hardware corruption occurred, e.g. failing RAM holding kernel code or data.
Arg1: a3a039d89b456543, Reserved
Arg2: b3b7465eedc23277, Reserved
Arg3: fffff80001778470, Failure type dependent information
Arg4: 0000000000000001, Type of corrupted region, can be
0 : A generic data region
1 : Modification of a function or .pdata
2 : A processor IDT
3 : A processor GDT
4 : Type 1 process list corruption
5 : Type 2 process list corruption
6 : Debug routine modification
7 : Critical MSR modification
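When triaging several dumps, it can help to decode Arg4 mechanically rather than reading the legend each time. The following is a minimal Python sketch, not part of any debugger tooling; the function name describe_region is hypothetical, and the table is copied directly from the legend in the bugcheck output above.

```python
# Region types copied from the Arg4 legend in the bugcheck output above.
CORRUPTED_REGION_TYPES = {
    0: "A generic data region",
    1: "Modification of a function or .pdata",
    2: "A processor IDT",
    3: "A processor GDT",
    4: "Type 1 process list corruption",
    5: "Type 2 process list corruption",
    6: "Debug routine modification",
    7: "Critical MSR modification",
}


def describe_region(arg4_hex):
    """Translate the Arg4 value (a hex string from the dump) into its legend text."""
    code = int(arg4_hex, 16)
    return CORRUPTED_REGION_TYPES.get(code, "Unknown region type 0x%X" % code)


# Arg4 from the dump discussed in this post.
print(describe_region("0000000000000001"))
```

For this dump the value is 1, which is why the next step is to find out which function was modified.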
Next, check the address at Arg3. This will give you the function that was modified, but not the offset of the modified instruction.
3: kd> ln fffff80001778470
(fffff800`01778470) nt!KeSetSystemTime | (fffff800`01778790) nt!BiLoadSystemStore
nt!KeSetSystemTime = <no type information>
Unassemble the same function in the SP2 kernel binary you expanded from the SP2 package. Do the same with the function of the crashed kernel and compare the two. You will find the modified opcode compared to that of the unmodified kernel.
Below is the comparison of the nt!KeSetSystemTime code of the crashed kernel and that of the Service Pack 2 kernel, respectively. They match except for the second byte of the prefetchw instruction (0f 0d 0f), which has been overwritten with 0x1f in the crashed kernel (0f 1f 0f). This changed the instruction to a nop, which is done to prevent the prefetch operation from occurring on processors that don't support prefetch.
fffff800`017785c6 0f1f0f nop dword ptr [rdi]
fffff800`017785c9 488b07 mov rax,qword ptr [rdi]
fffff800`017785cc 493bc7 cmp rax,r15
fffff800`017785cf 7516 jne nt!KeSetSystemTime+0x177
00000001`4012e5b6 0f0d0f prefetchw [rdi]
00000001`4012e5b9 488b07 mov rax,qword ptr [rdi]
00000001`4012e5bc 493bc7 cmp rax,r15
00000001`4012e5bf 7516 jne
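Comparing two disassemblies by eye works for a short function, but a byte-level comparison finds the modified offset directly. The following is a minimal Python sketch of that idea, assuming you have copied the opcode bytes from both listings; diff_bytes is a hypothetical helper, and the addresses and bytes are the ones shown above.

```python
def diff_bytes(crashed, reference):
    """Return (index, reference_byte, crashed_byte) for every byte that differs."""
    if len(crashed) != len(reference):
        raise ValueError("byte sequences must be the same length")
    return [
        (i, r, c)
        for i, (c, r) in enumerate(zip(crashed, reference))
        if c != r
    ]


# Addresses taken from the ln output (Arg3) and the first line of the crashed listing.
FUNCTION_START = 0xFFFFF80001778470
LISTING_START = 0xFFFFF800017785C6

# Opcode bytes copied from the two listings above.
crashed = bytes.fromhex("0f1f0f 488b07 493bc7 7516")
reference = bytes.fromhex("0f0d0f 488b07 493bc7 7516")

differences = diff_bytes(crashed, reference)
modified_offsets = [LISTING_START - FUNCTION_START + i for i, _, _ in differences]

for (i, ref, bad), off in zip(differences, modified_offsets):
    print("nt!KeSetSystemTime+0x%X: expected 0x%02x, found 0x%02x" % (off, ref, bad))
```

With the bytes above, the only difference is the 0x0d that became 0x1f, one byte into the listing.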
After further investigation, this turned out to be a known issue in VMware environments when the VM is moved from a non-prefetch to a prefetch architecture, and even then only in a live-migration case. The issue is documented on VMware's site at http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1008749&sliceId=1&docTypeID=DT_KB_1_1&dialogID=74787167&stateId=0 .
A walk-through of creating a Netmon parser in the context of a real case
As is obvious to frequent readers of our blog, our team logs a lot of time in our debugger of choice (for some windbg, for others kd). However, a debugger is not always the best tool for the job, and sometimes even the best tool has limitations. I found this to be especially true when working a few Internet Printing Protocol (IPP) cases recently.
Probably the biggest challenge of many IPP cases is the mixed environments you usually find IPP running in. The benefit customers see in IPP over other print providers is it works natively or with minimal configuration on Windows, Mac, and Linux clients. This makes it popular in places like college computer labs, where there isn’t one standard client system. Unfortunately, this also means that we can’t really debug both sides of the communication when something goes wrong.
In a recent case, a customer was having problems printing from their Windows Vista clients to a Linux Common Unix Printing System (CUPS) IPP server. If the CUPS server was set to allow anonymous connections, everything worked perfectly. When the administrator enabled authentication, he found that most print jobs failed to print. After a bit more testing, he found that small jobs (perhaps a page of printed text) worked fine, but larger, multi-page documents failed.
For situations like this, I prefer to use a network trace to get a feeling for where the problem is occurring. The problem was – IPP wasn’t one of the protocols built in to Netmon (and I find Wireshark’s IPP parser to not always work well – especially with Windows clients/servers). I decided that the amount of time it would take to decode the traffic by hand could be better spent creating a Netmon IPP parser that I could use every time I ran into one of these issues.
One of the great things about Netmon is you can view the source of every included parser. This was a big help, as I hadn’t written a parser before. [Note: all steps noted are written using Netmon 3.4. There might be slight differences in Netmon 3.3.] To do this, open Netmon and click the Parsers tab. Under Object View, expand parser files and double click any of the .npl files. The source will appear on the right.
The language for Netmon parsers is similar to C++, with a limited set of statements. These are all documented in the Netmon help file, but the ones I found useful are described below. To begin, I started by defining a number of tables. The basic idea of a table is to provide a way to convert a value to a string. For example, one field in an IPP packet is the status of a printer, which is represented by an integer. In order to allow Netmon to show printer states in a readable form, I created a table to convert the values as seen in Figure 1 below.
Table IPPPrinterState //2911-4.4.11
{
    switch(value)
    {
        case 3 : "idle";
        case 4 : "processing";
        case 5 : "stopped";
        default : "Unknown Code";
    }
}
Figure 1: Netmon Table
Each table is defined with the Table keyword, followed by a name for the table. It may optionally be followed by a list of parameters, which I’ll use later. In this case, I added a comment that specified which RFC and section this information comes from. A table consists of a switch statement with a case for each value, and a default for all other cases, much like other programming languages. I created tables like IPPPrinterState for each field that could be represented in an IPP packet from information I found in each of IPP’s RFCs.
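For readers more familiar with general-purpose languages, the table idea maps onto a dictionary lookup with a default. The following Python sketch is only an analogy, not Netmon code; the function names are hypothetical, and the values are the three printer states from Figure 1. The second function mirrors the "%s (0x%X)" display pattern used later in the parser.

```python
# Values from Figure 1 (RFC 2911 section 4.4.11, as noted in the table comment).
PRINTER_STATES = {3: "idle", 4: "processing", 5: "stopped"}


def ipp_printer_state(value):
    """Python analogue of the IPPPrinterState table: value in, string out."""
    return PRINTER_STATES.get(value, "Unknown Code")


def format_printer_state(value):
    """Analogue of FormatString("%s (0x%X)", IPPPrinterState(this), this)."""
    return "%s (0x%X)" % (ipp_printer_state(value), value)


print(format_printer_state(4))
print(format_printer_state(9))
```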
Once the tables were complete, I moved on to creating the Protocol portion of the parser. This section of the code provides the logic that iterates through a packet and calls the tables for the appropriate data. This section starts with either the RegisterBefore or RegisterAfter keyword, which determines how your parser is called. Essentially, Netmon takes all of the parsers it has and compiles them into one continuous binary (.npb), and the registration tells Netmon where your parser fits. For my case, I used the following registration code.
[ RegisterAfter (HTTPPayloadData.OCSP, Ipp, Property.HTTPContentType.Contains("application/ipp")) ]
This tells Netmon that, when compiling the parser, it should insert my code right after the code for the OCSP protocol in its HTTPPayloadData parser, my protocol should be called IPP, and it should enter my code path if the HTTP Payload is of content type “application/ipp”. This allows my parser to work a bit differently than the Wireshark IPP parser – Wireshark uses a port number (631) to identify IPP traffic, whereas my code looks at HTTP content types. The advantage of this, for me, is that Windows servers use port 80 for IPP by default, not 631, so in cases with a Windows IPP server, this parser should correctly identify the packets. You may be wondering how or why I chose to register after OCSP. Basically, I knew I needed my code to be registered in the section of code where HTTP does its payload content type processing. So I opened up HTTP’s parser, and searched for the content type analysis. OCSP was the first protocol I found in HTTP’s content type logic, so I used that as the place to insert my protocol.
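The difference between the two detection strategies can be illustrated outside of Netmon. The following is a rough Python sketch, not how either parser is actually implemented; both function names are hypothetical, and the case-insensitive header match is an assumption on my part.

```python
from typing import Mapping

IPP_CONTENT_TYPE = "application/ipp"
IPP_DEFAULT_PORT = 631


def is_ipp_by_content_type(headers: Mapping[str, str]) -> bool:
    """Content-type strategy: look for application/ipp in the HTTP Content-Type header."""
    for name, value in headers.items():
        if name.lower() == "content-type":  # assumption: match header name case-insensitively
            return IPP_CONTENT_TYPE in value.lower()
    return False


def is_ipp_by_port(port: int) -> bool:
    """Port strategy: treat traffic on port 631 as IPP."""
    return port == IPP_DEFAULT_PORT


# A Windows IPP server listening on port 80, as described above.
headers = {"Content-Type": "application/ipp"}
print(is_ipp_by_content_type(headers))  # True
print(is_ipp_by_port(80))               # False
```

The port check misses the Windows server on port 80, while the content-type check identifies the payload regardless of port.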
After the registration comes the Protocol statement. I chose the following.
Protocol IPP = FormatString("Status/OpCode = %s", IPPOperationsSupported(OpID))
This names my protocol IPP and specifies that I want the description of the protocol to display the IPP status code. This way, a user doesn’t need to drill down to find out if this is a print job or a printer status request. You’ll notice FormatString is a function in Netmon that is similar to printf. In this case, I am passing a variable (OpID, which is defined lower in my code) to my IPPOperationsSupported table to determine what this OpCode means. Before I had a parser, I would need to look up the operations supported values in the IPP RFC for each packet I examined.
Next is the body of the protocol. Basically, this consists of a series of fields (like variables) that define how a packet is laid out. Creating a field is similar to declaring a variable in C++. You start by choosing a data type that matches the size of the data in the packet and provide a name for that field. For example, Figure 2 shown below contains the first seven lines of my Protocol.
struct Version = FormatString("%i.%i", Major, Minor)
INT16 OpId = FormatString("%s (0x%X)", IPPOperationsSupported(this), this);
Figure 2: Code in the protocol block
The IPP specification states that all packets begin with two 8-bit values: the first specifies the major protocol version in use, and the second specifies the minor version. In this case, I wrapped both in a struct so Netmon will display them as “Version: 1.0”, instead of separately as “Major: 1” “Minor: 0” on two lines. After the version is a 16-bit field that specifies the operation requested (for example, print-job or get-printer-state). I chose to display this value by looking it up in the IPPOperationsSupported table, then printing it as the string, followed by the hex value (e.g. “Get-Printer-Attributes (0xB)”). The ‘this’ keyword simply uses the value of the current field, which in this case is the OpId. Even though Netmon parses through the packet sequentially, this kind of use of a field before its value is retrieved is allowed. Finally, I set the RequestId field, which is a 32-bit int value. Since this field is just a transaction ID for this conversation, I don’t need to do any formatting to it.
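To make the layout concrete, here is a minimal Python sketch that decodes the same eight header bytes by hand. It is an illustration only, not a translation of the parser: parse_ipp_header and the sample bytes are hypothetical, the operations table contains only the one entry mentioned above, and I read the fields as unsigned big-endian values (IPP uses network byte order).

```python
import struct

# Only the operation named in this post; a real table would list every operation.
IPP_OPERATIONS = {0x000B: "Get-Printer-Attributes"}


def parse_ipp_header(payload):
    """Decode version (two 8-bit values), operation ID (16-bit), request ID (32-bit)."""
    major, minor, op_id, request_id = struct.unpack_from(">BBHI", payload, 0)
    return {
        "version": "%i.%i" % (major, minor),
        "operation": "%s (0x%X)" % (IPP_OPERATIONS.get(op_id, "Unknown Code"), op_id),
        "request_id": request_id,
    }


# Hypothetical request: version 1.0, operation 0xB, request ID 1.
sample_header = bytes([0x01, 0x00, 0x00, 0x0B, 0x00, 0x00, 0x00, 0x01])
header = parse_ipp_header(sample_header)
print(header["version"], header["operation"], header["request_id"])
```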
After that, things got a little more complicated. IPP’s structure allows for a variable number of attribute groups, each of which can contain a variable number of attributes. For example, in response to the request “Get-Printer-Attributes” from the client, the server responds with the Printer Attributes group, which contains a number of attributes like printer-state, queued-job-count, and so on. First, I needed to deal with the attribute groups in a loop until I’d read each one. IPP specifies that the end of the attribute groups is specified with the value of 0x03, so I wrote a while loop to create attribute groups until FrameData[FrameOffset] is equal to 3 (See Figure 3 below). FrameData and FrameOffset are special values provided by Netmon. FrameData is an array of the entire contents of the frame, and FrameOffset is Netmon’s current location in the packet. I use this instead of declaring a field here because referencing FrameData[FrameOffset] does not advance the parser frame location. This is important because I want to consume that value further down.
Inside that loop, I declared another struct that contains an attribute group. Much like the Protocol IPP line above, we reference a field here that will be declared lower down. This line does not advance the FrameOffset, since we don’t declare a field here. The first line of this struct is the field declaration line that finally consumes the attribute group tag. Below that is another While loop to process all attributes in the attribute group. IPP differentiates between attributes and attribute groups by making all attribute group identifiers smaller than 0x10, and all attribute identifiers 0x10 or higher. I use this as the condition for my loop. Finally, I declare an Attribute struct inside this loop. This struct is displayed after looking up how to properly print based on the Attribute Name and Value in the AttribDisplayTable.
IPP declares attributes as an 8-bit type identifier (int, bool, string, etc.), a 16-bit int specifying the attribute name’s length, the name (a string), a 16-bit int specifying the value’s length, and the value. Since I want to look up the value in various tables, depending on the Attribute Name, I store the Attribute Name as “AttName” in a property. This way, I can continue to reference it while processing continues. Properties are declared in brackets just above the field they will store. In my case, I prepend the ‘Post.’ evaluation keyword to the property name. This instructs Netmon to use the end result of the next line as its value, but before advancing the FrameOffset. I do this again for the actual value, which I call Val. If I did not use the Post evaluation keyword, Val would contain the unsigned int32 value of printer state, instead of the formatted string result I get by looking up printer state in its table.
While [FrameData[FrameOffset] != 0x03]
struct AttributeGroup = FormatString("%s", IPPTags(TagGroup))
INT8 TagGroup = FormatString("%s (0x%X)", IPPTags(this), this);
While [FrameData[FrameOffset] >= 0x10]
struct Attribute = AttribDisplayTable(AttName, Val)
INT8 Type = FormatString("%s (0x%X)", IPPTags(this), this);
case "printer-state" :
UINT32 PrinterState = FormatString("%s (0x%X)", IPPPrinterState(this),this);
Figure 3: Loops in protocol block
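The nested loop structure may be easier to follow in a general-purpose language. The following Python sketch walks the same layout under the rules described above: groups continue until the 0x03 tag, tags below 0x10 start a group, and each attribute is a type tag, a 16-bit name length, the name, a 16-bit value length, and the value. The function name and the sample bytes are hypothetical; the tags 0x04 (printer attributes group) and 0x23 (enum) come from the IPP encoding RFC, and there is no handling for truncated packets.

```python
import struct

END_OF_ATTRIBUTES_TAG = 0x03  # marks the end of all attribute groups
FIRST_ATTRIBUTE_TAG = 0x10    # tags below this value delimit attribute groups


def parse_attribute_groups(data, offset=0):
    """Return ([(group_tag, [(type_tag, name, value_bytes), ...]), ...], next_offset)."""
    groups = []
    while data[offset] != END_OF_ATTRIBUTES_TAG:      # peek without consuming
        group_tag = data[offset]
        offset += 1                                   # now consume the group tag
        attributes = []
        while data[offset] >= FIRST_ATTRIBUTE_TAG:
            type_tag = data[offset]
            (name_len,) = struct.unpack_from(">H", data, offset + 1)
            name_start = offset + 3
            name = data[name_start:name_start + name_len].decode("ascii")
            value_len_at = name_start + name_len
            (value_len,) = struct.unpack_from(">H", data, value_len_at)
            value_start = value_len_at + 2
            value = data[value_start:value_start + value_len]
            attributes.append((type_tag, name, value))
            offset = value_start + value_len
        groups.append((group_tag, attributes))
    return groups, offset + 1                         # skip the 0x03 end tag


# Hypothetical response body: one printer attributes group with printer-state = 4.
name = b"printer-state"
sample = (bytes([0x04, 0x23]) + struct.pack(">H", len(name)) + name
          + struct.pack(">H", 4) + struct.pack(">I", 4)
          + bytes([0x03]) + b"job-data")

groups, end = parse_attribute_groups(sample)
remaining = sample[end:]  # whatever follows the end tag, e.g. document data
print(groups, remaining)
```

Note that the loop conditions only peek at the current byte, just as FrameData[FrameOffset] does in the parser, and the tag is consumed afterwards.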
My case statements continue like printer-state for all possible attributes of IPP. At the very end of the protocol block, after I’ve closed my switch, structs, and whiles, I have one more line, which consumes any data remaining in the packet. This would contain document data if the packet was a print job, and is required so all the packet data is consumed before Netmon moves on to the next packet. That line is:
BLOB (FrameLength - FrameOffset) Data;
As you can see, it is a binary blob data type, set to the size of the frame, less our current location.
Finally, after my Protocol block, I needed to define my own data type. IPP defines its own data format to specify printer uptime, so I created a data type for it as shown below in Figure 4.
//Uptime format spec
Size = 4;
DisplayFormat = (this != 0) ? FormatString("%i days %02i:%02i:%02i (%i seconds)",
this) : "0"
Figure 4: Custom data type
The first line of Figure 4 specifies this will be a data type composed of numeric data named UPTIME. Size specifies how many bytes the type uses. DisplayFormat is what Netmon displays for this type. In this case, I use the x ? y : z syntax. Netmon doesn’t have if/then/else keywords, but instead uses this ternary operator. I use a special case for 0 since it seems to be a common return value in the traces I’ve looked at, and having ‘Uptime: 0 days 00:00:00 (0 seconds)’ seemed excessive.
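As an illustration of the display logic, here is a small Python sketch that produces the same style of output. It assumes the 4-byte value is a count of seconds that is split into days, hours, minutes, and seconds; the arithmetic is my assumption, since the arguments to FormatString are not fully shown in Figure 4, and format_uptime is a hypothetical name.

```python
def format_uptime(seconds):
    """Render an uptime in seconds as 'D days HH:MM:SS (N seconds)', or '0' when zero."""
    if seconds == 0:
        return "0"  # special case, mirroring the ternary in DisplayFormat
    days, rest = divmod(seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return "%i days %02i:%02i:%02i (%i seconds)" % (days, hours, minutes, secs, seconds)


print(format_uptime(0))
print(format_uptime(93784))
```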
Figures 5 and 6 below show what the result looks like in Netmon.
Figure 5: Frame Summary
Figure 6: Frame Detail
So what did the trace show? Windows attempts to send IPP requests with no authentication first, then if it receives an access denied, retries with authentication. This is by design, as the IPP server replies with the authentication types it supports in the access denied message. In the case of print jobs that are too large to fit in a single packet, IPP’s spec allows servers to either issue the access denied message as soon as it receives the first packet, or after it has received the entire job. It turns out that the IPP Print Provider on Windows was designed to send the entire job before listening for a response, so it missed the access denied message that CUPS sent after it received the first packet. See http://support.microsoft.com/kb/976988/ for related information. Want a copy of the IPP parser? It will be included in a future release of the Netmon Parser Update.
I hope this post has given you a better idea of how Netmon works and how IPP works, and that it helps if you ever need to write a parser for your protocol.