• Ntdebugging Blog

    Storport ETW Logging to Measure Requests Made to a Disk Unit

    • 10 Comments

    Greetings NTDEBUGGING community, Bob here again, and today I would like to let everyone know about a new feature implemented in the STORPORT.SYS binaries released in the following KB articles:

    ·         KB 979764 (Windows Server 2008)

    ·         KB 978000 (Windows Server 2008 R2)

     

    Beginning with these new versions, it is now possible to measure timing statistics for requests made to a system’s disk unit.  These measurements are taken at the lowest possible level of OS interaction with the storage adapter hardware, making it much easier to diagnose storage performance issues. The measurements are taken from the port driver, STORPORT.SYS, via ETW (Event Tracing for Windows).  Windows drivers are implemented in a layered architecture, so in our case the STORPORT driver interfaces directly with the adapter or miniport driver.  The STORPORT driver is responsible for managing requests and queues, and providing other upper level functions.  Each miniport driver is specific to a particular storage adapter card, and knows how to deliver requests to the disk unit over the transport medium, like fibre channel for instance.

    Configuring and Starting STORPORT ETW Logging

    To measure storage performance, the user enters a threshold value via the GUI interface. (This can be scripted as well; see KB 978000 for details.)  When an I/O time is equal to or greater than the entered threshold, an event 13 is written to the ETW log.  

    Here is a walkthrough to enable this feature via the GUI:

    Click Start, type “perfmon”, and press ENTER to start Performance Monitor.

    image

     

    Expand “Data Collector Sets” and select “Event Trace Sessions”.

    image

     

    Right-Click on “Event Trace Sessions” and select “New -> Data Collector Set”. 

    image

     

    The following dialog will appear:

    image

     

    Give the new data collector set a name in the dialog box.  In this example I called it “Storport Performance Logging”. Choose the “Create manually (Advanced)” option then click Next to see the following dialog.

    image

     

    After clicking “Add...” on the dialog above the following list of providers will appear. Select “Microsoft-Windows-StorPort” and click OK.

    image

     

    When the dialog below opens, select “Keywords (Any)” then click Edit.

    image

     

    Check the box for IOPerfNotification, and then click OK. Note:  For Windows Server 2008 SP1 and SP2, some properties may not be present (e. g. Reset, PortNotification and PauseResume). 

    image

     

     

    Next, select “Filter” and click Edit…

    image

     

    This is where we enter our time threshold.  Remember that timing values greater than or equal to this filter value will be logged to the trace file.

     

    Select “Binary” and check the “Filter enabled” box

    image

    In the “Filter data” field, enter the threshold value in the number of milliseconds. This must be entered in little endian format. Please refer to this table for the most common entries.

    Decimal

    Hexadecimal

    Binary(little endian)

    1ms

    5ms

    10ms

    15ms

    1

    5

    A

    F

    01 00 00 00 00 00 00 00

    05 00 00 00 00 00 00 00

    0A 00 00 00 00 00 00 00

    0F 00 00 00 00 00 00 00

     

    Note: The “Filter type” value will always remain 0 as in the example above showing an entry for 1 ms. 

     

    WARNING - The whole data line needs to be filled in when entering the threshold. For demonstration purposes here is the WRONG WAY to enter 256 ms. This will not work because of the missing 00 entries on the line.

    image 

    Below is the correct way to enter the value (256ms) Note: The Filter value may have to be re-entered for next run because it could get reset.

    image

    After clicking OK, you will see a new Event Trace session (screenshot below).  Right-Click the new Event Trace session and select “Properties”

    image

     

    Next we need to set a location for the trace file.  Ensure you have sufficient disk space in the target folder because logs can be many Megabytes in size depending on the number of hits. 

    image

     

    The target folder may be changed in the “Directory” tab

    image

     

    I recommend choosing a directory that is more easily located, like C:\PerfLogs, in which case you can click “Browse”

    image

     

    Note:  It is not recommended to store the trace file on the same storage you are troubleshooting.

    image

    The name for the file is selected on the “File” tab.

    image

    When you are done setting the file and directory information click OK.

    Now right-click the log and select “Start” to begin tracing.

    image

    The “Status” should indicate that it is running.

    image

    How long to run the log depends on you issue.  You should let the log run until the issue or slow down occur then check the log for delays.

    Viewing the Log File

    First right-click and select “Stop” to flush any cached data and stop the logging.

    image

    Then use Event Viewer to examine the log.

    image

     

    Next choose "Open the saved log…"

    image

     

    image

    image

     

    After clicking OK a popup dialog will be displayed to ask if you want to convert the log. Select “No”. Here is some sample event data that you will likely see in the log.

    image

     

    Example: Event ID 13

    image

     

    Let's examine the “Details” tab which shows information about the I/O that exceeded or equaled the threshold we had set.  If you do not set a threshold, then all I/O events are logged, but for performance reasons we recommend setting a "reasonable" threshold like 15ms.

    image

    Breaking down the Event Data we have …

     

    Miniport Extension: (Kernel address of the miniport extension, not of use to performance logging, only debugging)

    PortNumber:  This is the adapter port number (i.e. RaidPort1, etc.)

    PathID:  Path number if multi-pathing is in place

    TargetID:  Target LUN exposed to the Operating System

    LUN:  The Logical Unit Number of the physical storage where the particular I/O is directed

    *RequestDuration:  How long the request took in milliseconds (the number of note here for storage latency diagnostics)

    Command:  the SCSI command sent to the unit. (Usually read or write)  This is listed in decimal.  Convert this value to hexadecimal with Calc.exe for referencing SCSI commands http://en.wikipedia.org/wiki/SCSI_command

    ScsiStatus:  the status that came back from the drive

    SRBStatus:  the status as returned from adapter  (see srb.h and scsi.h in the Microsoft WDK or http://en.wikipedia.org/wiki/SCSI_Status_Code)

     

     Example: Command 42 equals 2A which is a Write of 10 bytes.

    image

    image

    Here’s the respective SCSI command:

    image

    When viewing the log, the commands will be mostly reads and writes.  If I/O is determined to be slow at this level, then the issue is most likely disk or SAN related.

    Hope you enjoyed the walkthrough!

    -Bob

     

     

     

     

     

     

     

  • Ntdebugging Blog

    How to use the DedicatedDumpFile registry value to overcome space limitations on the system drive when capturing a system memory dump

    • 4 Comments

    Prior to Windows Vista and Windows Server 2008 we had to keep a large page file on the system drive (typically drive C:) in order to properly capture system memory dumps for troubleshooting.  This presented problems as systems with very large amounts of RAM became more common, resulting in requirements for very large amounts of free space on the C: drive, or requiring that system visible memory be artificially limited for troubleshooting purposes.  This is no longer a requirement thanks to the Dedicated Dump File feature, which is available for use in Windows Vista and later operating systems.

     

    What is it?

    The dedicated dump file is basically a page file that is reserved for use only by the system crash dump routines.  It is not used for paging virtual memory.  Like a page file, the system process keeps an open handle to the dedicated dump file, which prevents it from being deleted.  When you manually initiate a memory dump, or the system crashes on its own, the data is written into the dedicated dump file instead of the page file on the system drive.  This feature is enabled by setting the following registry value:

    Location:   HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl

    Name: DedicatedDumpFile

    Type:  REG_SZ

    Value: A dedicated dump file together with a full path, such as D:\dedicateddumpfile.sys

    Where can it be stored?

    The dedicated dump file can be stored on any local volume that can support a page file.  This volume must be considered a fixed disk device and must not be hosting Windows clustered storage.  For example, this means you can store this file on a second partition of your server’s primary hard disk, or on a SAN volume that is mounted early in the boot process. 

    Fun fact: Unlike a page file, you can actually direct the dedicated dump file to a sub-folder, even one that is actually a volume mount point.  However, the sub-folder must pre-exist during boot, and there is generally little reason to actually do this.

     

    How much space will it use?

    You can automatically allow the system to determine the size of the dedicated dump file, or you can manually set the size by using the DumpFileSize registry value. 

     

    System determined size:

    If you choose to let the system determine the size, then regardless of the dump file type set via the CrashDumpEnabled value, the dedicated dump file will be roughly the size of system visible RAM plus 256 MB.  The extra 256 MB allows room for secondary dump data that may be written by driver bug check callback routines.  In order to write this data, the drivers would have to first call KeRegisterBugCheckReasonCallback() to register the routines that would execute when the system bug checks.   This is currently not very common due to the limitations imposed by the high IRQL at which bug check callbacks run.  Allowing the system to determine the size is a good idea, especially when disk space is not of huge concern.  Also, the dedicated dump file size will be adjusted automatically at boot if additional RAM is installed or removed later.  This includes artificial memory reduction implemented via the BCDEdit option, truncatememory.  This eliminates the need to manually resize your page file to support proper collection of a memory dump.

     

    Manual size specified via the registry:

    The following registry value will allow you to manually set the dedicated dump file size in megabytes:

    Location:   HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl

    Name: DumpFileSize

    Type:  REG_DWORD

    Value: The dump file size in megabytes

     

     

    One good reason one might have to set this value manually is in the case of a very large memory system from which only a kernel memory dump is desired.  In this case, a dedicated dump file the size of RAM+256 MB is not required.  Generally, even on 64-bit systems, kernel memory dumps are no larger than a few gigabytes, although they have the potential to be much larger.  In nearly all cases reserving one third the size of RAM will be more than sufficient to capture kernel memory.  If you have plenty of room on the drive and want to play it safe, you can simply allow the system to determine the size for you by setting the DumpFileSize value to zero, or by not creating the value at all.

    Although you may set the DumpFileSize to a value larger than needed, or the system chooses a size larger than the type of dump selected, the resultant file size after a bug check will only be as large as the data that is stored in it.  This is most commonly the size of physical RAM + room for the dump header information.  If you choose to set the DumpFileSize value manually, then you should also set the IgnorePagefileSize value under the CrashControl key to 1.  This will help ensure a dump file is created.

     

    What about the final dump file?

    After a bug check, the data will be written to the dedicated dump file on disk.  However, the final memory dump location will be based on the following registry value:

    Location:   HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl

    Name: DumpFile

    Type:  REG_EXPAND_SZ

    Value: The target dump file name together with the full path, such as D:\memory.dmp

     

     

    The system will extract the dump data from the dedicated dump file and save the resultant file to the location and file name specified for the DumpFile value.  The default for this value is %SYSTEMROOT%\MEMORY.DMP.  Since we are using the dedicated dump file to overcome space limitations of the system drive, we will likely want to also set the DumpFile value to point to an alternate location than the default. 

    If you set this value to point to the same volume as the dedicated dump file, it will save WERFAULT.EXE the step of performing an additional copy of the temporary dump file after reboot.  Instead the file will simply be renamed on the target volume.   If there is not enough room for both the dedicated dump file and the memory dump after the bug check, then the dedicated dump file will not be recreated, and only the memory dump will remain.  If this happens and the system bug checks again, only the original dump will remain because there will be no dedicated dump to write the second bug check to until space is freed up on the drive and the system is rebooted.

     

    DedicatedDumpFile Usage Example:

    The following example will set the dedicated dump file and resultant dump to be stored on the root if drive D:

     

    Windows Registry Editor Version 5.00

     

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl]

    "DumpFile"=hex(2):44,00,3a,00,5c,00,4d,00,45,00,4d,00,4f,00,52,00,59,00,2e,00,\

      44,00,4d,00,50,00,00,00

    "DedicatedDumpFile"="D:\\DedicatedDump.sys"

     

     

    (Note: The lines above can be saved as a .REG file and imported into the registry)

     

    The screenshot below demonstrates a system configured to capture a complete 4 GB memory dump to a manually sized dedicated dump file, and will store the final MEMORY.DMP on the D: drive.

     image

    Important information:

    949052 - Kernel memory dump files may not be generated on Windows Server 2008-based and Windows Vista SP1 or later based computers when physical memory is larger than the size of the page file

     

    957517 - A dedicated complete memory dump file may not be successfully generated if the volume that stores the dedicated dump file has insufficient free space

    950858 - Dedicated dump files are unexpectedly truncated to 4 GB on a computer that is running Windows Server 2008 or Windows Vista and that has more than 4 GB of physical memory

     

  • Ntdebugging Blog

    Upcoming Blog ARTICLES

    • 3 Comments

    Working on the Global Escalation Services Team at Microsoft is really a cool gig. We’re privileged to work on several different Windows components at a very deep level so life is never boring. Here’s a list of the articles coming from the group in the next few weeks. Hope you enjoy!

    Ron Stock

    STORPORT Logging - Bob Golding discusses Storport logging. Beginning with the new versions, it is now possible to measure timing statistics for requests made to a system’s disk unit. These measurements are taken at the lowest possible level of OS interaction with the storage adapter hardware, making it much easier to diagnose storage performance issues.

    Walking through a crash dump analysis – Chad Beeder walks through a debug.

    Understanding !PTE Part 2 and Part 3 – Continuation of Ryan’s 3 part series

    How DPM implements VSS – Dennis Middleton gives a high level view of how DPM implements the VSS API’s to manage replicas. 

    How Hyper-V does backups of running VM’s - Another article from Dennis

    Pushlocks – Mark Lloyd discusses why it's important to use only published APIs.

    Debugger Extension Blog – Ryan’s work with debugger extensions.

  • Ntdebugging Blog

    A Case of Mysterious Memory Corruption

    • 2 Comments

    Hi, this is Chad from the OEM team. You may remember me from such posts as “Debugging a bluescreen at home.”

    Some time ago I debugged a bluescreen on a friend’s home computer, and I thought the results were interesting enough to share. My friend had an older Windows XP system that had been experiencing random crashes for a while. I had looked at a memory dump or two, and while there wasn’t enough information to pinpoint a specific cause, I noticed what appeared to be pool corruption, so I recommended he enable Driver Verifier against all third-party drivers on the system in an effort to track down the offending driver.

    (You can learn more about Driver Verifier in the following Knowledge Base article: “Using Driver Verifier to identify issues with Windows drivers for advanced users”).

    With Verifier enabled the machine crashed with a bugcheck, and I debugged the memory dump. As always, I start with the “!analyze -v” command:

    1: kd> !analyze -v

    *******************************************************************************

    *                                                                             *

    *                        Bugcheck Analysis                                    *

    *                                                                             *

    *******************************************************************************

     

    DRIVER_PAGE_FAULT_IN_FREED_SPECIAL_POOL (d5)

    Memory was referenced after it was freed.

    This cannot be protected by try-except.

    When possible, the guilty driver's name (Unicode string) is printed on

    the bugcheck screen and saved in KiBugCheckDriver.

    Arguments:

    Arg1: 88328eac, memory referenced

    Arg2: 00000000, value 0 = read operation, 1 = write operation

    Arg3: 86c6929b, if non-zero, the address which referenced memory.

    Arg4: 00000000, (reserved)

     

    Debugging Details:

    ------------------

     

    READ_ADDRESS:  88328eac Special pool

     

    FAULTING_IP:

    +5283c2a1

    86c6929b 8b423c          mov     eax,dword ptr [edx+3Ch]

     

    MM_INTERNAL_CODE:  0

     

    DEFAULT_BUCKET_ID:  DRIVER_FAULT_DELL

     

    BUGCHECK_STR:  0xD5

     

    PROCESS_NAME:  services.exe

     

    TRAP_FRAME:  f7516bf0 -- (.trap 0xfffffffff7516bf0)

    ErrCode = 00000000

    eax=00000000 ebx=88328e70 ecx=00000003 edx=88328e70 esi=806e6410 edi=86c31af8

    eip=86c6929b esp=f7516c64 ebp=f7516ca4 iopl=0         nv up ei pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246

    86c6929b 8b423c          mov     eax,dword ptr [edx+3Ch] ds:0023:88328eac=????????

    Resetting default scope

     

    LAST_CONTROL_TRANSFER:  from 8052037a to 804f9f43

     

    STACK_TEXT: 

    f7516b70 8052037a 00000050 88328eac 00000000 nt!KeBugCheckEx+0x1b

    f7516bd8 80544588 00000000 88328eac 00000000 nt!MmAccessFault+0x9a8

    f7516bd8 86c6929b 00000000 88328eac 00000000 nt!KiTrap0E+0xd0

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    f7516ca4 86c695ec 86cc71bd 86d3c318 88328e70 0x86c6929b

    f7516ccc ba631459 f7516cf8 86c69605 86d3cbc8 0x86c695ec

    f7516cd4 86c69605 86d3cbc8 88328e70 88328e70 sr!SrPassThrough+0x31

    f7516cf8 8057f982 f7516d64 0007f964 80579e64 0x86c69605

    f7516d0c 80579ec1 86d3cbc8 88328e70 86c31af8 nt!IopSynchronousServiceTail+0x70

    f7516d30 8054162c 00000578 00000000 00000000 nt!NtQueryDirectoryFile+0x5d

    f7516d30 7c90e514 00000578 00000000 00000000 nt!KiFastCallEntry+0xfc

    0007f9ac 00000000 00000000 00000000 00000000 0x7c90e514

     

     

    STACK_COMMAND:  kb

     

    FOLLOWUP_IP:

    nt!IopSynchronousServiceTail+70

    8057f982 807d1c00        cmp     byte ptr [ebp+1Ch],0

    A good way to get more information about a particular bugcheck code is to search for it in the Windows Debugger help file. Under the entry “Bug Check 0xD5: DRIVER_PAGE_FAULT_IN_FREED_SPECIAL_POOL” we learn that this particular bugcheck occurs when “The Driver Verifier Special Pool option has caught the driver accessing memory which was earlier freed.”

    So, as a result of having previously enabled Driver Verifier, we have some memory allocations coming out of Special Pool. (Incidentally, you can use the “!verifier” command in the debugger to get a list of which drivers are being verified and various information about them.) Accesses to Special Pool memory undergo additional verification checking, and in this case, the verifier has thrown a bugcheck because the memory in question is free.

    Using !pte against the address in question (88338eac) shows that, in fact, it’s not a valid virtual address at all:

    1: kd> !pte 88328eac

                        VA 88328eac

    PDE at C0602208            PTE at C0441940

    contains 000000000676C963  contains FFFFFFFF00000000

    pfn 676c      -G-DA--KWEV   not valid

                                PageFile:  0

                                Offset: ffffffff

                                Protect: 0

    So, some code tried to read from a memory location that was completely invalid. This isn’t altogether uncommon, but there is something more unusual about this crash: If we look at the call stack leading up to the crash, the debugger isn’t even displaying a module name for the function that did the bad memory access! Let’s use the .trap command (helpfully supplied in the !analyze output above) to look at the instruction that actually failed, and dump the stack again.

    1: kd> .trap 0xfffffffff7516bf0

    ErrCode = 00000000

    eax=00000000 ebx=88328e70 ecx=00000003 edx=88328e70 esi=806e6410 edi=86c31af8

    eip=86c6929b esp=f7516c64 ebp=f7516ca4 iopl=0         nv up ei pl zr na pe nc

    cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010246

    86c6929b 8b423c          mov     eax,dword ptr [edx+3Ch] ds:0023:88328eac=????????

     

    1: kd> kb

      *** Stack trace for last set context - .thread/.cxr resets it

    ChildEBP RetAddr  Args to Child              

    WARNING: Frame IP not in any known module. Following frames may be wrong.

    f7516ca4 86c695ec 86cc71bd 86d3c318 88328e70 0x86c6929b

    f7516ccc ba631459 f7516cf8 86c69605 86d3cbc8 0x86c695ec

    f7516cd4 86c69605 86d3cbc8 88328e70 88328e70 sr!SrPassThrough+0x31

    f7516cf8 8057f982 f7516d64 0007f964 80579e64 0x86c69605

    f7516d0c 80579ec1 86d3cbc8 88328e70 86c31af8 nt!IopSynchronousServiceTail+0x70

    f7516d30 8054162c 00000578 00000000 00000000 nt!NtQueryDirectoryFile+0x5d

    f7516d30 7c90e514 00000578 00000000 00000000 nt!KiFastCallEntry+0xfc

    0007f9ac 00000000 00000000 00000000 00000000 0x7c90e514

    So, we were doing some file I/O (NtQueryDirectoryFile) and somehow ended up running some code which is loaded in memory around 0x86c6929b. But the debugger isn’t even able to match this code up with a module name. Why not? Well, because there’s nothing in the kernel’s loaded module list that matches up with this address. (You can dump the loaded module list with the “lm” command.)

    This most likely means that this module was not loaded into memory using the standard Win32 APIs, since these APIs would always add the module to the loaded list. Alternately, the loaded module list may have been corrupted or tampered with in some way.

    Let’s do a “!address” on the location of this function to see if we can tell anything more:

    1: kd> !address 0x86c6929b

      86c68000 - 00037000                          

              Usage       KernelSpaceUsageNonPagedPool

    Now things are looking really strange indeed – this code resides in nonpaged pool memory. We do not ordinarily execute code out of pool memory. Worse, it’s not even allocated pool:

    1: kd> !pool 0x86c6929b

    Pool page 86c6929b region is Nonpaged pool

    86c69000 is not a valid small pool allocation, checking large pool...

    unable to get pool big page table - either wrong symbols or pool tagging is disabled

    86c69000 is freed (or corrupt) pool

    Bad previous allocation size @86c69000, last size was 0

    This all looked pretty strange, so I started dumping memory to get a look at the code in question. Turns out there is a module header at the start of the previous page, at 0x86c68000. You can always identify these by the “MZ” string at the beginning of the header. (Fun trivia fact: These are the initials of Mark Zbikowski, the Microsoft developer who originally designed the .exe file format, way back in the early days of MS-DOS.)

    1: kd> dc 0x86c68000

    86c68000  00905a4d 00000003 00000004 0000ffff  MZ..............

    86c68010  000000b8 00000000 00000040 00000000  ........@.......

    86c68020  00000000 00000000 00000000 00000000  ................

    86c68030  00000000 00000000 00000000 00000248  ............H...

    86c68040  0eba1f0e cd09b400 4c01b821 685421cd  ........!..L.!Th

    86c68050  70207369 72676f72 63206d61 6f6e6e61  is program canno

    86c68060  65622074 6e757220 206e6920 20534f44  t be run in DOS

    86c68070  65646f6d 0a0d0d2e 00000024 00000000  mode....$.......

    If you have an executable file in memory, you can dump the header using the “!dh” command. The resulting output is pretty long, so I’ve trimmed some of the output for purposes of this post. But there was one thing that really stood out, highlighted in red  below:

    1: kd> !dh 0x86c68000

     

    File Type: EXECUTABLE IMAGE

    FILE HEADER VALUES

         14C machine (i386)

           5 number of sections

    4A53A574 time date stamp Tue Jul 07 12:43:48 2009

     

           0 file pointer to symbol table

           0 number of symbols

          E0 size of optional header

         102 characteristics

                Executable

                32 bit word machine

     

     

    SECTION HEADER #2

      .rdata name

         106 virtual size

        1980 virtual address

         180 size of raw data

        1980 file pointer to raw data

           0 file pointer to relocation table

           0 file pointer to line numbers

           0 number of relocations

           0 number of line numbers

    48000040 flags

             Initialized Data

             Not Paged

             (no align specified)

             Read Only

     

     

    Debug Directories(1)

           Type       Size     Address  Pointer

           cv           5a        1a2c     1a2c     Format: RSDS, guid, 1, c:\programs\revolution6\innerdrv\objfre_w2k_x86\i386\InnerDrv.pdb

     

     

    Aha! When you build a module, the compiler puts information in the header to help debuggers find the appropriate symbol file. In this case, we can see that this program seems to be called “InnerDrv.” Now we have something to go on!

    A quick Bing search for “InnerDrv.pdb” shows that this particular code is part of a rootkit known as “Pushdo”. My friend’s system had been infected by this malware. In the end, my friend opted to play it safe and simply reformat and reinstall this machine.

     

  • Ntdebugging Blog

    Understanding VSS in DPM

    • 1 Comments

    Data Protection Manager 2007 utilizes Windows’ Volume Shadow Copy Service to protect Data Sources on Production Servers, and to manage Recovery Points on the DPM server. This presentation provides step-by-step illustrations and explanations of how DPM uses VSS to backup a Protected Server’s file share and System State data sources, and then manage the resulting recovery points in DPM.

     

    Get Microsoft Silverlight

  • Ntdebugging Blog

    Understanding !PTE, Part2: Flags and Large Pages

    • 0 Comments

    Hello, it's Ryan Mangipano with part two of my PTE series. Today I'll discuss PDE/PTE flags, the TLB, and show you a manual conversion of x86 PAE Large Page Virtual Addresses to Physical. If you haven’t read the first part of this series please find it here. It's a good primer before proceeding.

    PDE and PTE flags 

    I'll start with a discussion about the PDE/PTE flags. If you recall from part one not all of the bits of the Page Directory Entry (PDE) are related to the index (used to form the pointer to the base of the next level).  This is true of the table entries in all the levels. For example, on a PAE x86 systems only 9 bits of the PTE (page table entry) are used for the index. During our previous conversion,  we only used some of the bits for the index into the next table. The rest of the data, we simply dropped off and replaced with zeros as needed. So what are the other bits used for? They are used for a series of flags. You will observe the state of these flags output by !PTE in the following manner: (-G-DA—KWEV).

    These flags are documented in the Intel Manuals. Intel and AMD reserved some of the flags for use by the Operating System. All of these are also documented in chapter 9 (Memory Management) of “Windows Internals, 5th edition”. Let’s dump the PDE from the virtual address we dissected last time. This will allow you to see some of the flags that are present in the other bits
       

    Obtaining the Virtual Address of the PDE

    1: kd> !pte 0xf9a10054

    VA f9a10054

    PDE at 00000000C0603E68    PTE at 00000000C07CD080

    contains 000000000102D963  contains 0000000002010121

    pfn 102d       -G-DA--KWEV    pfn 2010       -G--A—KREV

     

    Here is the data-type of our PDE

    1: kd> dt nt!_MMPTE u.Hard

       +0x000 u      :

          +0x000 Hard   : _MMPTE_HARDWARE

     

    Dumping the PDE and flags

    1: kd> dt _MMPTE_HARDWARE 00000000C0603E68

    nt!_MMPTE_HARDWARE

       +0x000 Valid            : 0y1

       +0x000 Writable         : 0y1

       +0x000 Owner            : 0y0

       +0x000 WriteThrough     : 0y0

       +0x000 CacheDisable     : 0y0

       +0x000 Accessed         : 0y1

       +0x000 Dirty            : 0y1

       +0x000 LargePage        : 0y0

       +0x000 Global           : 0y1

       +0x000 CopyOnWrite      : 0y0

       +0x000 Prototype        : 0y0

       +0x000 Write            : 0y1

       +0x000 PageFrameNumber  : 0y00000000000001000000101101 (0x102d)

       +0x000 reserved1        : 0y00000000000000000000000000 (0)

     

    Take note of the Letters in the PDE and PTE section of the !pte output, such as -G-DA--KWEV    . These letters represent various flags. The presence or absence of the letter in the !PTE output tells you the state of the flag. These flags can also be seen in the hardware pte output above. 

    Valid                   (V)         - Indicates that the data is located in physical memory. If this flag is not set, then the software can use ALL of the rest of the bits for whatever it wants(like storing the pagefile number and offset where the page is stored.
    Write                 (W/R)     - Indicates if the data is writeable or read-only. Multiprocessor or Vista or later. Hardware bit is documented in the processor manuals. Reserved Bit 11’s use is documented in Windows Internals, Chap. 9.
    Owner               (K/U)      - Indicates if the page is owned kernel mode or usermode. Kernel if cleared. User if set.
    WriteThrough  (T)           - When set indicates Writethrough caching policy. When not set indicates write-back caching policy
    CacheDisable   (N)           - If set, the page translation table or physical page it points to cannot be cached.   
    Accessed           (A)           - Set when the page itself, or the table referencing it has been read from or written to
    Dirty                   (D)          - Indicates if any data on this page has been updated
    LargePage         (L)            - This field is only used on PDEs, not PTEs. It indicates whether or not the PDE is the last table level (meaning that this entry references an actual page in memory) or if it is instead referencing a Page Table. If this bit is set in the PDE, this PDE points directly to a 2-MB page when PAE is in use. If PAE is not being used, the large page size that we are referencing is 4-MB. So basically, this is the page size bit. If this bit is cleared, the final destination page is 4k and can be found in the page table that this PDE points to. If this bit is set, then the final destination page is equal to the size of a large page on your system (2MB when PAE is in use) and can be located using the index value of this particular PDE since it becomes the last level. Keep in mind that a larger offset will be needed to reference all the positions in this large page since it is larger. To use this feature, the PSE bit (bit 4 which is the 5th bit over) must be set in CR4. The setting in CR4 is a global setting, enable the use of large pages on the system. The flag in the PDE only applies to the individual PDE.
    Global                 (G)        - If not set Translation Caching flushes affect this bit. If set, other processes use this translation also, so don’t flush it from the Translation Lookaside Buffer cache upon process context switches.
    CopyOnWrite    (C)       - Intel states this is a software field. Windows uses this for processes to share the same copy of a page. The system will give the process a private copy of this page if there is any attempt to write to the page by the process (by copying it). Any attempt to execute code in this page occurs on a No execute system will cause an access violation.
    Prototype           (P)        - Intel states this is a software field. Windows uses this to indicate that this is a prototype PTE.
    Reserved0                       - These Bits are Reserved
    E                            (E)       - Executable page. E is always displayed on platformst that Do not support hardware No-Execute.

    Inspecting the state of the flags is important when attempting to manually convert addresses from Virtual Addresses to Physical. For example, since the valid bit is not set in the following invalid PTE, all of the fields are available for Windows to use. This means the information in the processor manuals doesn’t apply. Instead it is an nt!_MMPTE_SOFTWARE which references data located in the page file.

    3: kd> !pte b8ae900c

                   VA b8ae900c

    PDE at 00000000C0602E28    PTE at 00000000C05C5748

    contains 000000000B880863  contains 000B8AF500000000

    pfn b880       ---DA--KWEV    not valid

                           PageFile:  0

                           Offset: b8af5

                           Protect: 0

     

    For more information on the different types of invalid PTEs, refer to page 775 of “Windows Internals, 5th edition”.

    Manually Converting x86 PAE Large Page Virtual Address to Physical


    In part one of this blog, we manually translated a PAE  4-KByte Page Virtual Address (VA). Now we are going to manually translate a VA that represents a Large Page from our PAE system. As discussed in the previous section on PTE flags, a large page allocation means that the page size is larger and the PDE points directly to the page itself. The PDE will not point to the base of a page table.  This means that there will be one less level of tables used in the translation.  This also means that more bits will be needed to represent the offsets in the large page.  I found the following address on my system that references a Large Page, 8054099e. Once again, all the required information was obtained from the processor manuals, debugger help file, and Windows Internals Book.

    1: kd> !pte 8054099e

                   VA 8054099e

    PDE at 00000000C0602010    PTE at 00000000C0402A00

    contains 00000000004009E3  contains 0000000000000000

    pfn 400        -GLDA--KWEV    LARGE PAGE pfn 540  

     

    Below is the Virtual Address in binary.

    1: kd> .formats 8054099e

      Binary:  10000000 01010100 00001001 10011110

     

    I have split this VA into it's three parts.

    10                     Page Directory Pointer Table Offset

    000000 010      Page Directory Table Offset

    10100 00001001 10011110     This is the Offset into the large page

     

    Let’s get the base of the Page Directory Pointer Table and indentify which of the four entries we will need to follow.

    1: kd> !dq (@cr3 & 0xffffffe0) + ( 0y10 * 8) L1

    # 23406f0 00000000`06c46801

     

    Now take our address from above, add our zeros and we have the base of Page Directory Table. Then add the offset from our Virtual Address and we'll dump out the PDE.

    1: kd> !dq (6c46801 & 0xFFFFFF000) + ( 0y000000010 * 8) L1

    # 6c46010 00000000`004009e3

     

    Let’s convert the PDE to binary format to analyze the lower 12 bits. This will allow us to analyze the flags. The last Twelve bits (0-11) are used for the PFN. They are used for the flags that we discussed earlier.

    1: kd> .formats (00000000`004009e3 & 0x0000000FFF)

      Binary:  00000000 00000000 00001001 11100011

     

    Let’s analyze the flags from this VA using the information we learned earlier....

    ·         Bit Zero is set indicating that the page is Valid, located in physical memory, and all other bits

    ·         Bit One is set indicating that this page is Writeable (Hardware Field)

    ·         Bit Two is cleared indicating that this is a Kernel Mode Page

    ·         Bit Three is cleared indicating a Write-Back Caching policy (caching of writes to the page is enabled)

    ·         Bit Four is cleared indicating that caching is not disabled for the page.

    ·         Bit Five is set indicating this page has been Accessed

    ·         Bit Six is set indicating that this page is Dirty

    ·         Bit Seven is set indicating that this is a Large Page. This PDE points directly to a page, not a Page Table.

    ·         Bit Eight is set indicating other process share this Global PDE. No Delete upon TLB Cache Flush for process context switches.

    ·         Bit Nine is cleared indicating this page is not Copy-On-Write

    ·         Bit Ten is cleared indicating this is NOT a Prototype PTE

    ·         Bit Eleven is set also indicating this page is Writeable (Reserved Field, See Windows Internals, Chap. 9.)

    ...and compare our findings to the Flags output from !PTE, -GLDA—KWEV. My system doesn’t support No-Execute, so the E is also displayed.  For more information, .hh !PTE in windbg.

    We know this is a Large Page and is Valid, so we can obtain the directory of our 2-MB Large Page (on this PAE system) from this PDE. The Intel Manual states that in our PDE the last 21 bits aren’t part of the address base.

    1: kd> .formats (004009e3 & 0y11111111111000000000000000000000)

      Binary:  00000000 01000000 00000000 00000000

     

     

    So let’s combine the data from the PDE (Highlighted)  with the offset from the VA (Virtual Address).

    00000000 010  10100 00001001 10011110  

    Now I'll remove the spaces, precede this binary value with 0y, and send it to .formats.

    1: kd> .formats 0y00000000010101000000100110011110

      Hex:     0054099e

    We could have obtained the same data in this manner

    1: kd> ? (004009e3 & 0y11111111111000000000000000000000) + (8054099e & 0y00000000000111111111111111111111)

    Evaluate expression: 5507486 = 0054099e

    Now let’s dump the data in memory at this physical address

    1: kd> !db 0054099e

    #  54099e 33 db 8b 75 18 8b 7d 1c-0f 23 fb 0f 23 c6 8b 5d 3..u..}..#..#..]

    #  5409ae 20 0f 23 cf 0f 23 d3 8b-75 24 8b 7d 28 8b 5d 2c  .#..#..u$.}(.],

    #  5409be 0f 23 de 0f 23 f7 0f 23-fb e9 43 ff ff ff 8b 44 .#..#..#..C....D

    Now let’s dump the same data using the virtual address

    1: kd> db 8054099e

    8054099e  33 db 8b 75 18 8b 7d 1c-0f 23 fb 0f 23 c6 8b 5d  3..u..}..#..#..]

    805409ae  20 0f 23 cf 0f 23 d3 8b-75 24 8b 7d 28 8b 5d 2c   .#..#..u$.}(.],

    805409be  0f 23 de 0f 23 f7 0f 23-fb e9 43 ff ff ff 8b 44  .#..#..#..C....D

    So now you can see how I used the debugger to translate virtual addresses to physical adrresess. This concludes part two of this blog and in part three we will cover translation of x86 Non-PAE Virtual Address Translation, x64 Address Translation, and the TLB.

  • Ntdebugging Blog

    Archive of the Debug Ninja’s Twitter debug tips

    • 0 Comments

    Every Wednesday I post a debug tip to our twitter page at www.twitter.com/ntdebugging. This blog is an archive of these tips to allow our readers to find this information easily. We will update this blog every few weeks with the new tips; follow us on twitter if you want to see the tips as I post them.

    The goal of these tips is to share debug commands, and forms of commands (parameters, flags, etc) that my colleagues and I find useful. I hope you can add these commands to your toolkit and they will help you debug more efficiently.

    Tips

    !thread/!process [address] e - on x64 will not show you the meaningless Args to Child information.

    .frame /c [FrameNumber] - sets context to specified stack frame. Provides more reliable information than .trap on x64.

    kn - Dumps call stack with frame numbers, easier than counting stacks for .frame.

    .frame /r [FrameNumber] - same as .frame /c, but shows registers without changing context.

    Note: With .frame /c or /r you can only trust the nonvolatile registers. See http://msdn.microsoft.com/en-us/library/9z1stfyw(VS.80).aspx for vol/nonvol regs.

    k=rbp rip FrameCount - Dumps call stack starting at rbp/rip on x64. Useful when the stack is corrupt. #debug ^DN

    .process/.thread /p /r [address] - sets new process context, sets .cache forcedecodeuser, and reloads user symbols. #debug ^DebugNinja

    !process [address] 17 - Sets the context for this command, avoids the need for .process to see user stacks. Try !process 0 17 #debug ^DN

    ~~[ThreadID]s - Changes threads in user mode. Use Thread ID number from output such as !locks. Ex: ~~[1bd4]s #debug ^DN

    runas /netonly /u:<account> windbg.exe - Launch windbg with domain account. Use when dbg computer isn't in domain and symbol server is. ^DN

    !heap -p -a <address> - Shows information about the heap block containing <address>, even if you aren't using pageheap. #debug ^DN

    ub - Unassembles starting at a location prior to your address. Accepts l<number> to specify how many instructions to go back. ub . l20 ^DN

    !stacks 2 [FilterString] - Finds kernel mode call stacks that contain the FilterString in a symbol. #debug ^DN

    !thread [address] 17 (or 1e on x64) - Sets context for this command, avoids the need for .thread/.process for user stacks. #debug ^DN

    .hh [Text] - Opens the debugger help. [Text] is the topic to lookup in the index. Example: .hh !pte   #debug ^DN

    ?? can dump structs using C++ style expressions. Ex: ??((nt!_KTHREAD*)(0xfffffa800ea43bb0))->ApcState #debug ^DN

    bp /t EThread - Sets a kernel mode breakpoint that only triggers when hit in the context of this thread. #debug ^DN

    bp /p EProcess - Sets a kernel mode breakpoint that only triggers when hit in the context of this process. #debug ^DN

    gc - If you run 'p' and hit a breakpoint, gc takes you where p would have gone if you had not hit the bp.  #debug ^DN

    gu - Go until the current function returns.  Effectively this unwinds one stack frame.  #debug #windbg ^DN

    pc - Steps through until the next 'call' instruction. Combine with other commands to find who returned your error> pc;p;r eax #debug ^DN

    pt - Steps through until the next 'ret' instruction. Similar to gu, but pt stops on the ret and gu stops after the ret. #debug ^DN

    .ignore_missing_pages 1 - supresses the error: "Page 2a49 not present in the dump file. Type ".hh dbgerr004" for details" #debug ^DN

    .exr -1 shows the most recent exception.  Useful in user dumps of crashes, especially for no execute crashes (NX/DEP). #debug ^DN

    wt - Trace calls until they return to the current address. More useful with -or to get return values. Use -l for depth. ^DN #debug

    .thread /w - Changes to the WOW64 32-bit context from 64-bit kernel mode. Wow64exts doesn't work in kernel mode. #debug ^DN

    ??sizeof(structure) - Gets the size of a structure, it's easier than counting. #debug ^DN

    sxe ld:module.dll - Enables an exception which will break into the debugger when module.dll is loaded. #debug ^DN

    vertarget - Shows OS version of the debug target. Also shows machine name, uptime, and session time (when the dump was taken). #debug ^DN

    !vm 1 - In a kernel debugger, shows basic information about memory usage. Available, committed, pagefile, pool, sysptes, etc. #debug ^DN

    .time - Shows session time (when dump was taken) and system uptime. In user mode shows process uptime, kernel/user time. #debug ^DN

    ba w size [address] - Break on write access only. Replace size with the num bytes you want to watch. Ex: ba w 4 005d5f10 #debug ^DN

    .bugcheck - Displays the bugcheck code of a blue screen crash. The format is more concise than !analyze.  #debug ^DN

Page 1 of 1 (7 items)