Rubato and Chord

Reiley's technical blog

  • Rubato and Chord

    Using Function Evaluation in WinDBG

    People who develop debuggers know that, in theory, you cannot have a perfect disassembler (especially for x86) or a perfect stepper (especially for Step Over). People who develop commercial debuggers know that Function Evaluation (a.k.a. funceval) is a big challenge when implementing an Expression Evaluator. And people who develop the Visual Studio Debugger face further difficulties, such as Interop Debugging and Edit & Continue.

    In this article I'm not going to explain the bloody details of funceval. I will just demonstrate how to use funceval in WinDBG and how powerful it is.

    Previously we mentioned the .call command in Microsoft Binary Technologies. At that time we were not able to invoke the function because we didn't have private symbols: funceval requires private symbols since the debugger needs to understand the calling convention, which is stripped out of public symbols.

    Since I cannot use private symbols when writing articles for this blog (private symbols are Microsoft confidential, and debugging without private symbols is much more fun anyway), the approach I'll take is to create a proxy DLL:


    #include <Windows.h>
    
    VOID WINAPI SetLastError(DWORD dwErrCode){}
    

    Now compile the code into a DLL, with PDB file generated:

    cl.exe funceval.cpp /D UNICODE /Fd /GS- /LD /Od /Zi /link /NOENTRY /NODEFAULTLIB /RELEASE /SUBSYSTEM:CONSOLE

    In order to use the proxy DLL, we will use the following approach:

    1. Launch a debug session.
    2. Allocate memory in the debuggee process.
    3. Inject the proxy DLL into the allocated memory (note that we skipped PE relocation).
    4. Load private symbol of the proxy DLL.
    5. Use WinDBG .call command to kick off funceval from proxy DLL.
    6. Change the IP register to the real address we want to execute.
    7. Start evaluating.

    Here is the automation script, enjoy!

    $$ cdb.exe -xe cpr -c "$$>a< .\funceval.txt" notepad.exe
    
    .echo [Launch Script]
    
    $$ change the following value to the size of funceval.dll
    r $t1 = 0n1536
    
    bp @$exentry; g
    
    .echo [Allocate Memory]
    .foreach ( token { .dvalloc @$t1 } ) {
    	aS alias token
    	.block {
    		.if ($spat("${alias}", "[0-9a-f]+")) {
    			r $t2 = 0x${alias}
    		}
    	}
    	ad /q alias
    }
    
    .printf "[Load Helper DLL(base address = %p, size = %p)]\n", @$t2, @$t1
    .readmem funceval.dll @$t2 (@$t1+@$t2-1)
    
    .block {
    	.sympath .
    }
    
    $$.symopt+ 0x40
    
    .reload /s /f funceval.dll=$t2
    
    $$.symopt- 0x40
    
    .echo [Function Evaluation]
    .call /s funceval!SetLastError kernel32!SetLastError(7777)
    
    g
    
    !gle
    
    .dvfree @$t2 0
    

    And here is the output from my machine:

    [Allocate Memory]
    [Load Helper DLL(base address = 00020000, size = 00000600)]
    Reading 600 bytes.
    [Function Evaluation]
    Thread is set up for call, 'g' will execute.
    WARNING: This can have serious side-effects,
    including deadlocks and corruption of the debuggee.
    LastErrorValue: (Win32) 0x1e61 (7777) - <Unable to get error code text>
    LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0
    Freed 0 bytes starting at 00020000
  • Rubato and Chord

    A Debugging Approach to Application Verifier

    Application Verifier, also known as AppVerifier, is a dynamic instrumentation tool for user mode applications. It is freely available as part of the SDK/PSDK, and comes with a set of GUI applications and DLL extensions, plus good documentation.

    Let's begin by adding the most famous application - notepad.exe - from the appverif.exe GUI, and launch notepad.exe from WinDBG:

    windbg.exe notepad.exe

    ModLoad: 00620000 00650000   notepad.exe
    ModLoad: 77c00000 77d80000   ntdll.dll
    Page heap: pid 0xE10: page heap enabled with flags 0x3.
    AVRF: notepad.exe: pid 0xE10: flags 0x80643027: application verifier enabled
    ModLoad: 10350000 103b0000   C:\Windows\syswow64\verifier.dll
    Page heap: pid 0xE10: page heap enabled with flags 0x3.
    AVRF: notepad.exe: pid 0xE10: flags 0x80643027: application verifier enabled
    ModLoad: 5cca0000 5cccb000   C:\Windows\SysWOW64\vrfcore.dll
    ModLoad: 0f820000 0f878000   C:\Windows\SysWOW64\vfbasics.dll
    ModLoad: 75330000 75440000   C:\Windows\syswow64\kernel32.dll
    ModLoad: 75c40000 75c86000   C:\Windows\syswow64\KERNELBASE.dll
    ModLoad: 76ee0000 76f80000   C:\Windows\syswow64\ADVAPI32.dll
    ModLoad: 75fd0000 7607c000   C:\Windows\syswow64\msvcrt.dll

    As we mentioned in A Debugging Approach to IFEO, the loader code in NTDLL knows how to initialize application verifier.

    windbg.exe -xe cpr notepad.exe

    0:000> sxeld verifier
    0:000> g
    Page heap: pid 0x1DBC: page heap enabled with flags 0x3.
    AVRF: notepad.exe: pid 0x1DBC: flags 0x80643027: application verifier enabled
    ModLoad: 105f0000 10650000   C:\Windows\syswow64\verifier.dll
    eax=00000000 ebx=77d07e00 ecx=00000000 edx=00000000 esi=7efdd000 edi=00000000
    eip=77c1fc42 esp=0018f3a8 ebp=0018f790 iopl=0         nv up ei pl zr na pe nc
    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
    ntdll!ZwMapViewOfSection+0x12:
    77c1fc42 83c404          add     esp,4
    0:000> k
    ChildEBP RetAddr
    0018f3a8 77ca6fa3 ntdll!ZwMapViewOfSection+0x12
    0018f790 77ca7c29 ntdll!AvrfMiniLoadDll+0x3d1
    0018f7c4 77ca1075 ntdll!AVrfInitializeVerifier+0x252
    0018f7fc 77c80759 ntdll!LdrpInitializeApplicationVerifierPackage+0xab
    0018f878 77c45383 ntdll!LdrpInitializeExecutionOptions+0x222
    0018fa08 77c452d6 ntdll!LdrpInitializeProcess+0x261
    0018fa58 77c39e79 ntdll!_LdrpInitialize+0x78
    0018fa68 00000000 ntdll!LdrInitializeThunk+0x10

    By reading the disassembled code, it's obvious that ntdll!RtlOpenImageFileOptionsKey is used to retrieve the IFEO related information. NTDLL would read from IFEO to see if the application is registered, and whether application verifier is enabled in GlobalFlag (GFLAG). If GFLAG & 0x100 is non-zero, NTDLL would load verifier.dll from %windir%\system32 or %windir%\syswow64, depending on the target bitness.

    0:000> sxeld
    0:000> g
    Page heap: pid 0x1DBC: page heap enabled with flags 0x3.
    AVRF: notepad.exe: pid 0x1DBC: flags 0x80643027: application verifier enabled
    ModLoad: 0f6f0000 0f71b000   C:\Windows\SysWOW64\vrfcore.dll
    eax=00000000 ebx=00000000 ecx=0018f600 edx=0018f601 esi=7efdd000 edi=0018f628
    eip=77c1fc42 esp=0018f4fc ebp=0018f550 iopl=0         nv up ei pl zr na pe nc
    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
    ntdll!ZwMapViewOfSection+0x12:
    77c1fc42 83c404          add     esp,4
    0:000> k
    ChildEBP RetAddr
    0018f4fc 77c3beec ntdll!ZwMapViewOfSection+0x12
    0018f550 77c3c578 ntdll!LdrpMapViewOfSection+0xc7
    0018f644 77c3c3a9 ntdll!LdrpFindOrMapDll+0x333
    0018f7c4 77c3c4d5 ntdll!LdrpLoadDll+0x2b2
    0018f7fc 77ca746f ntdll!LdrLoadDll+0xaa
    0018f850 77ca7aaa ntdll!AVrfpLoadAndInitializeProvider+0x6f
    0018f874 77c8117c ntdll!AVrfInitializeVerifier+0xd3
    0018fa08 77c452d6 ntdll!LdrpInitializeProcess+0xfba
    0018fa58 77c39e79 ntdll!_LdrpInitialize+0x78
    0018fa68 00000000 ntdll!LdrInitializeThunk+0x10

    NTDLL would further check if IFEO has a REG_SZ value named VerifierDlls. If VerifierDlls is found, its value is split into DLL names, and these DLLs are loaded into the target process one by one.
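The two checks described above can be sketched in portable C++. This is only an illustration of the logic, not the loader's code: the function names are hypothetical, and the separator set (spaces, tabs, commas) is an assumption rather than something confirmed by the disassembly.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// The GlobalFlag bit mentioned above (GFLAG & 0x100).
constexpr std::uint32_t kFlgApplicationVerifier = 0x100;

// True when the GlobalFlag value read from IFEO asks for application verifier.
bool verifierRequested(std::uint32_t globalFlag) {
  return (globalFlag & kFlgApplicationVerifier) != 0;
}

// Split a VerifierDlls value into DLL names.
// Assumption: names are separated by spaces, tabs or commas.
std::vector<std::wstring> splitVerifierDlls(const std::wstring& value) {
  const std::wstring separators = L" \t,";
  std::vector<std::wstring> names;
  std::size_t pos = 0;
  while ((pos = value.find_first_not_of(separators, pos)) != std::wstring::npos) {
    const std::size_t end = value.find_first_of(separators, pos);
    names.push_back(value.substr(
        pos, end == std::wstring::npos ? std::wstring::npos : end - pos));
    pos = end;  // npos here ends the loop on the next search
  }
  return names;
}
```

For a value such as L"vrfcore.dll vfbasics.dll" the helper returns two names, which a loader would then load in order.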

    0:000> dt ntdll!IMAGE_DOS_HEADER 0f6f0000
       +0x000 e_magic          : 0x5a4d
       +0x002 e_cblp           : 0x90
       +0x004 e_cp             : 3
       +0x006 e_crlc           : 0
       +0x008 e_cparhdr        : 4
       +0x00a e_minalloc       : 0
       +0x00c e_maxalloc       : 0xffff
       +0x00e e_ss             : 0
       +0x010 e_sp             : 0xb8
       +0x012 e_csum           : 0
       +0x014 e_ip             : 0
       +0x016 e_cs             : 0
       +0x018 e_lfarlc         : 0x40
       +0x01a e_ovno           : 0
       +0x01c e_res            : [4] 0
       +0x024 e_oemid          : 0
       +0x026 e_oeminfo        : 0
       +0x028 e_res2           : [10] 0
       +0x03c e_lfanew         : 0n240

    0:000> dt ntdll!_IMAGE_NT_HEADERS OptionalHeader.AddressOfEntryPoint 0f6f0000+0n240
       +0x018 OptionalHeader                     :
          +0x010 AddressOfEntryPoint                : 0x2c86

    0:000> ln 0f6f0000+0x2c86
    (0f6f2c86)   vrfcore!DllMain   |  (0f642ca7)   vrfcore!VerifierOpenLayerProperties
    Exact matches:
        vrfcore!DllMain = <no type information>

    The above steps can be automated using a script:

    $$ cdb.exe -xe cpr -c "$$>a< .\appverif.txt" notepad.exe
    
    .echo [Launch Script]
    
    sxeld vrfcore; g; sxdld
    
    $$ get vrfcore.dll base address
    r $t1 = vrfcore
    
    .if @$ptrsize == 8 {
    	aS IMAGE_NT_HEADERS _IMAGE_NT_HEADERS64
    } .else {
    	aS IMAGE_NT_HEADERS _IMAGE_NT_HEADERS
    }
    
    $$ get OEP offset
    .block {
    	r $t2 = @@c++(((ntdll!${IMAGE_NT_HEADERS}*)(@$t1 + ((ntdll!_IMAGE_DOS_HEADER*)@$t1)->e_lfanew))->OptionalHeader.AddressOfEntryPoint)
    }
    
    $$ break at OEP
    bp @$t1 + @$t2; g; bc 0
    
    .echo [Hit OEP]
    k
    
    .echo [Arguments]
    dd esp L4
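The same header walk can be done outside the debugger on a raw image buffer. The sketch below is a minimal illustration that uses only the offsets shown in the dt output above (e_lfanew at +0x3c, OptionalHeader at +0x18, AddressOfEntryPoint at +0x10); the function names are hypothetical and no validation of the MZ/PE signatures is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian 32-bit value from the image at the given offset.
static std::uint32_t readU32(const std::vector<std::uint8_t>& image,
                             std::size_t offset) {
  assert(offset + 4 <= image.size());
  return static_cast<std::uint32_t>(image[offset]) |
         (static_cast<std::uint32_t>(image[offset + 1]) << 8) |
         (static_cast<std::uint32_t>(image[offset + 2]) << 16) |
         (static_cast<std::uint32_t>(image[offset + 3]) << 24);
}

// RVA of the entry point: follow e_lfanew to the NT headers, then read
// OptionalHeader.AddressOfEntryPoint (+0x18 + 0x10 from the NT headers).
std::uint32_t entryPointRva(const std::vector<std::uint8_t>& image) {
  const std::uint32_t e_lfanew = readU32(image, 0x3c);
  return readU32(image, e_lfanew + 0x18 + 0x10);
}
```

Adding the returned RVA to the module base gives the address the script sets its breakpoint on.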
    

    Now we've successfully located the OEP (Original Entry Point, which we mentioned in Data Breakpoints) for vrfcore.dll and set a breakpoint on it.

    When we hit the breakpoint on vrfcore!DllMain, the top frame shows that the second argument passed in is 4:

    DllMain(HINSTANCE hinstDLL = 0f6f0000, DWORD fdwReason= 4, LPVOID lpvReserved)

    It looks like fdwReason = 4 is undocumented; MSDN only lists the following values:

    • DLL_PROCESS_DETACH = 0
    • DLL_PROCESS_ATTACH = 1
    • DLL_THREAD_ATTACH = 2
    • DLL_THREAD_DETACH = 3

    Looking at the disassembled code, the following instructions look suspicious:

    vrfcore!_DllMain:
        mov     edi, edi
        push    ebp
        mov     ebp, esp
        mov     eax, dword ptr [ebp+0Ch]
        push    ebx
        push    esi
        xor     esi, esi
        push    edi
        xor     edi, edi
        inc     esi
        sub     eax, edi
        je      vrfcore!_DllMain+0x426

    vrfcore!_DllMain+0x18:
        dec     eax
        je      vrfcore!_DllMain+0x2aa

    vrfcore!_DllMain+0x1f:
        dec     eax
        je      vrfcore!_DllMain+0x29b

    vrfcore!_DllMain+0x26:
        dec     eax
        je      vrfcore!_DllMain+0x28c

    vrfcore!_DllMain+0x2d:
        dec     eax
        jne     vrfcore!_DllMain+0x283

    vrfcore!_DllMain+0x34:
        mov     ebx, dword ptr [ebp+10h]
        cmp     ebx, edi
        jne     vrfcore!_DllMain+0x79

    vrfcore!_DllMain+0x131:
        mov     edi, offset vrfcore!VfCoreProvider
        mov     dword ptr [ebx], edi
        call    vrfcore!VfCoreProviderInitialize

    From the code above we can see that when fdwReason is 4 (eax reaches zero only after the fourth decrement), the following assignment happens:

    *lpvReserved = (LPVOID)(&vrfcore!VfCoreProvider)

    By looking into the Win2003R2 DDK headers, we can find the layout definition of the verifier provider descriptor. So it's time to write a small provider now.


    #define WIN32_LEAN_AND_MEAN
    #include <Windows.h>
    
    // Borrowed from Win2003R2 DDK
    
    #define DLL_PROCESS_VERIFIER 4
    
    typedef VOID (NTAPI * RTL_VERIFIER_DLL_LOAD_CALLBACK) (PWSTR DllName, PVOID DllBase, SIZE_T DllSize, PVOID Reserved);
    typedef VOID (NTAPI * RTL_VERIFIER_DLL_UNLOAD_CALLBACK) (PWSTR DllName, PVOID DllBase, SIZE_T DllSize, PVOID Reserved);
    typedef VOID (NTAPI * RTL_VERIFIER_NTDLLHEAPFREE_CALLBACK) (PVOID AllocationBase, SIZE_T AllocationSize);
    
    typedef struct _RTL_VERIFIER_THUNK_DESCRIPTOR {
      PCHAR ThunkName;
      PVOID ThunkOldAddress;
      PVOID ThunkNewAddress;
    } RTL_VERIFIER_THUNK_DESCRIPTOR, *PRTL_VERIFIER_THUNK_DESCRIPTOR;
    
    typedef struct _RTL_VERIFIER_DLL_DESCRIPTOR {
      PWCHAR DllName;
      DWORD DllFlags;
      PVOID DllAddress;
      PRTL_VERIFIER_THUNK_DESCRIPTOR DllThunks;
    } RTL_VERIFIER_DLL_DESCRIPTOR, *PRTL_VERIFIER_DLL_DESCRIPTOR;
    
    typedef struct _RTL_VERIFIER_PROVIDER_DESCRIPTOR {
      DWORD Length;
      PRTL_VERIFIER_DLL_DESCRIPTOR ProviderDlls;
      RTL_VERIFIER_DLL_LOAD_CALLBACK ProviderDllLoadCallback;
      RTL_VERIFIER_DLL_UNLOAD_CALLBACK ProviderDllUnloadCallback;
      PWSTR VerifierImage;
      DWORD VerifierFlags;
      DWORD VerifierDebug;
      PVOID RtlpGetStackTraceAddress;
      PVOID RtlpDebugPageHeapCreate;
      PVOID RtlpDebugPageHeapDestroy;
      RTL_VERIFIER_NTDLLHEAPFREE_CALLBACK ProviderNtdllHeapFreeCallback;
    } RTL_VERIFIER_PROVIDER_DESCRIPTOR, *PRTL_VERIFIER_PROVIDER_DESCRIPTOR;
    
    // ntdll!DbgPrint
    typedef ULONG (__cdecl* PFN_DbgPrint)(PCH, ...);
    PFN_DbgPrint DbgPrint;
    
    // Here we go
    typedef BOOL (WINAPI* PFN_CloseHandle)(HANDLE);
    BOOL WINAPI ThunkCloseHandle(HANDLE hObject);
    
    static RTL_VERIFIER_THUNK_DESCRIPTOR aThunks[] = {{"CloseHandle", NULL, ThunkCloseHandle}, {}};
    static RTL_VERIFIER_DLL_DESCRIPTOR aDlls[] = {{L"kernel32.dll", 0, NULL, aThunks}, {}};
    static RTL_VERIFIER_PROVIDER_DESCRIPTOR vpd = {sizeof(RTL_VERIFIER_PROVIDER_DESCRIPTOR), aDlls};
    
    BOOL WINAPI ThunkCloseHandle(HANDLE hObject)
    {
      BOOL fRetVal = ((PFN_CloseHandle)(aThunks[0].ThunkOldAddress))(hObject);
      DbgPrint("CloseHandle(%p) = %s\n", hObject, fRetVal ? "TRUE" : "FALSE");
      return fRetVal;
    }
    
    BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, PRTL_VERIFIER_PROVIDER_DESCRIPTOR* pVPD)
    {
      switch(fdwReason)
      {
      case DLL_PROCESS_ATTACH:
        ::DisableThreadLibraryCalls(hinstDLL);
        break;
      case DLL_PROCESS_DETACH:
        break;
      case DLL_PROCESS_VERIFIER:
        DbgPrint = (PFN_DbgPrint)::GetProcAddress(::GetModuleHandle(TEXT("NTDLL")), "DbgPrint");
        DbgPrint("CommandLine: %s\n", ::GetCommandLineA());
        *pVPD = &vpd;
        break;
      default:
        ::DebugBreak(); // loader lock, be careful!!!
      }
      return TRUE;
    }

    To compile the source code, use the following command line:

    cl.exe avhook.cpp /D UNICODE /GS- /LD /Od /link /ENTRY:DllMain /NODEFAULTLIB /RELEASE /SUBSYSTEM:CONSOLE kernel32.lib

    Copy the generated avhook.dll DLL to %windir%\system32 or %windir%\syswow64 folder, depending on the bitness, and import the IFEO entry into registry:


    Windows Registry Editor Version 5.00
    
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\notepad.exe]
    "VerifierDlls"="avhook.dll"
    "GlobalFlag"="0x100"
    "VerifierFlags"=dword:80000000

    Launch notepad.exe from debugger:

    windbg.exe notepad.exe

    And here is what we got:

    Executable search path is:
    ModLoad: 00160000 00190000   notepad.exe
    ModLoad: 77070000 771f0000   ntdll.dll
    Page heap: pid 0x158C: page heap enabled with flags 0x2.
    AVRF: notepad.exe: pid 0x158C: flags 0x48004: application verifier enabled
    ModLoad: 714a0000 71500000   C:\Windows\syswow64\verifier.dll
    Page heap: pid 0x158C: page heap enabled with flags 0x2.
    AVRF: notepad.exe: pid 0x158C: flags 0x48004: application verifier enabled
    ModLoad: 715c0000 715c5000   C:\Windows\SysWOW64\hook.dll 
    ...
    
    CloseHandle(00000320) = TRUE
    CloseHandle(0000032C) = TRUE
    CloseHandle(00000330) = TRUE
    CloseHandle(0000034C) = TRUE
    CloseHandle(00000328) = TRUE
    CloseHandle(000001C8) = TRUE

    As you can see, our provider DLL is working as expected :)

  • Rubato and Chord

    Collection of WinDBG resources

    A list of resources related to WinDBG, debugging on Windows NT, or how to write a debugger.

    Websites

    Blogs

    Tools

  • Rubato and Chord

    A Note for Binary Hooking and Instrumentation

    An intern on my team has been working on a utility that makes use of binary instrumentation, so I think it's time for a recap.

    Understand the Fundamentals

    As we mentioned in Microsoft Binary Technologies and Debugging, there are many binary technologies. Most of these technologies can be used either statically (patch and write back to the disk) or dynamically (hotpatching in memory at runtime). Which one to choose really depends on the requirement.

    In most cases, API hooking would be sufficient since it captures the skeleton of execution flow, as well as the inputs and outputs.

    API hooking normally happens on the callee side, using a trampoline technique such as Detours. Knowledge of the PE format (esp. delay loading and address fix-ups) and of calling conventions is a must, and being able to read a bit of assembly language is a bonus.

    Sometimes you would like to hook an API from the caller side; normally this means hijacking the PE import directory. This filters out internal invocations, since they are not routed via the IAT. By contrast, if you add a trampoline to a function which invokes itself recursively, each recursion goes through the trampoline thunk.

    When it goes deeper, such as implementing a profiler or a code coverage analysis tool, Extended Basic Block (EBB) analysis is unavoidable. This requires solid knowledge of disassemblers, compiler backends and linkers (code generation, optimization, symbol files, etc.).
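The difference between caller-side and callee-side hooking can be modelled without touching a real PE file. In the sketch below a plain function pointer stands in for the caller's IAT slot; all names are hypothetical and nothing here patches an actual import directory. The recursive calls inside factorial are direct calls, so a hook installed in the slot only sees the one call made through it.

```cpp
#include <cassert>

using Fn = int (*)(int);

int factorial(int n);                // the "exported" function
static Fn g_importSlot = factorial;  // stands in for the caller's IAT entry
static Fn g_original = nullptr;      // original target saved by the hook
static int g_hookHits = 0;           // how many calls the hook observed

// The recursion is an internal call: it does not go through g_importSlot.
int factorial(int n) { return n <= 1 ? 1 : n * factorial(n - 1); }

// Replacement target: count the call, then forward to the original.
int hookedFactorial(int n) {
  ++g_hookHits;
  return g_original(n);
}

// Caller-side hook: overwrite the import slot, keep the old target.
void installImportHook() {
  g_original = g_importSlot;
  g_importSlot = hookedFactorial;
}

// What the hooked module does when it calls its import.
int callImport(int n) { return g_importSlot(n); }

int hookHits() { return g_hookHits; }
```

After installImportHook(), callImport(5) still returns 120 but the hook is hit once rather than five times. A trampoline written into the entry of factorial itself would be hit on every level of the recursion.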

    If the instrumentation happens in kernel mode, special things need to be considered, such as page-in and page-out, IRQL and spinlock. For SSDT level hooking these normally wouldn't become a problem.

    Understand the Runtime and Environment

    The art of instrumentation is to live well within the target process.

    The instrumentation code would consume additional resources and might introduce side effects:

    • Time consumption
    • Memory consumed by the instrumentation code
    • Private working set increased due to modifications on copy-on-write pages
    • Stack used by trampoline
    • Stack pollution
    • Module dependency
    • Synchronization
    • Global states
    • SEH and VEH

     

  • Rubato and Chord

    x86 Linear Address Space Paging Revisited

    Last time we revisited x86 segment addressing, which translates a logical address into a linear address. As we mentioned earlier, two stages of address translation are used to arrive at a physical address: logical-address translation and linear address space paging.

    Paging in x86 is optional and is controlled by CR0.PG. If paging is disabled (CR0.PG = 0), the linear address is mapped directly into the physical address space of the processor. When protection is enabled (CR0.PE = 1), paging can be turned on by setting CR0.PG = 1.

    Since paging is optional, why do we need it? I think there are several reasons:

    1. Paging can be used to implement virtual memory. In the good old days virtual memory was a crucial component of operating systems, because physical memory was small and expensive.
    2. Paging allows the physical memory behind a contiguous linear range to be noncontiguous, so the system has fewer physical memory fragmentation problems to deal with.
    3. Paging can be used to implement some tricky algorithms, such as a high performance ring buffer with faked contiguous space.
    4. Paging makes it possible to implement features such as copy-on-write, on-demand commit, and fine-grained access control.

    When paging was first introduced to the x86 family with the 80386 processor, there was only one paging mode, and the page size was always 4KB. A lot of features were added as time moved on, such as PAE (Physical Address Extension), PSE (Page Size Extension) and 64-bit support. Based on whether certain features are enabled or not, the CPU determines which paging mode and page size to use.

    No matter which paging mode is used, the concept is the same: hierarchical paging structures are used. Paging always starts from the CR3 register, which holds the physical address of the first paging structure, and each paging structure is 4KB in size, with one exception in PAE paging. During each step, a portion of the linear address is used to select an entry from a paging structure. This repeats until the entry maps a page instead of referencing another paging structure.
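As a small illustration of the two control bits, the check below takes a CR0 value (for example one copied from a kernel debugger) and reports whether paging is in effect. PE is bit 0 and PG is bit 31 of CR0; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCr0Pe = 1u << 0;   // CR0.PE: protection enable
constexpr std::uint32_t kCr0Pg = 1u << 31;  // CR0.PG: paging

// Paging is only in effect when protection and paging are both enabled.
bool pagingEnabled(std::uint32_t cr0) {
  return (cr0 & kCr0Pe) != 0 && (cr0 & kCr0Pg) != 0;
}
```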

    According to the "Intel 64 and IA-32 Architectures Software Developer's Manual", there are three paging modes:

    • 32-Bit Paging
    • PAE Paging
    • IA-32e Paging

    32-Bit Paging

    Each paging entry is 4 bytes in size, so there are 1024 entries in each paging structure.

    The translation process uses 10 bits at a time from a 32-bit linear address:

    1. Bits 31:22 select an entry in the first paging structure (the page directory); the entry is known as a PDE.
    2. Bits 21:12 select an entry in the second paging structure (the page table); the entry is known as a PTE.
    3. Bits 11:0 are the page offset within the 4-KByte page frame.

    If PSE is enabled and the PDE maps a large page, the page is 4 MBytes in size, which removes one level of indirection (and in turn reduces the TLB pressure).
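The bit slicing for 32-bit paging can be written down directly. This is a sketch of the index arithmetic only (no page tables are read), with hypothetical type and function names.

```cpp
#include <cassert>
#include <cstdint>

struct Split32 {
  std::uint32_t pdeIndex;  // bits 31:22, index into the page directory
  std::uint32_t pteIndex;  // bits 21:12, index into the page table
  std::uint32_t offset;    // bits 11:0, offset within the 4-KByte page
};

// Split a linear address for a 4-KByte page.
Split32 split32(std::uint32_t la) {
  return {la >> 22, (la >> 12) & 0x3ff, la & 0xfff};
}

// For a 4-MByte page the PDE maps the page directly: bits 21:0 are the offset.
std::uint32_t offset4M(std::uint32_t la) { return la & 0x3fffff; }
```

For the linear address ffdff000 this gives PDE index 0x3ff, PTE index 0x1ff and offset 0.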

    PAE Paging

    Each paging entry is 8 bytes in size, so there are 512 entries in each paging structure.

    The first paging structure is an exception: it is 32 bytes in size and contains four 64-bit entries.

    The translation process uses 9 bits at a time from a 32-bit linear address, except for the first paging structure:

    1. Bits 31:30 select one of the four entries (PDPTEs) in the first paging structure.
    2. Bits 29:21 select an entry (PDE) in the second paging structure.
    3. Bits 20:12 select an entry (PTE) in the third paging structure, which maps the page frame.
    4. Bits 11:0 are the page offset within the 4-KByte page frame.

    If the page size (PS) flag is set in the PDE, the page is 2 MBytes in size.
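The PAE split differs only in the field widths: 2 bits for the first level and 9 bits for the next two. Again this is a sketch of the index arithmetic with hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct SplitPae {
  std::uint32_t pdpteIndex;  // bits 31:30, one of 4 entries
  std::uint32_t pdeIndex;    // bits 29:21
  std::uint32_t pteIndex;    // bits 20:12
  std::uint32_t offset;      // bits 11:0
};

// Split a 32-bit linear address for PAE paging with 4-KByte pages.
SplitPae splitPae(std::uint32_t la) {
  return {la >> 30, (la >> 21) & 0x1ff, (la >> 12) & 0x1ff, la & 0xfff};
}

// For a 2-MByte page the PDE maps the page: bits 20:0 are the offset.
std::uint32_t offset2MPae(std::uint32_t la) { return la & 0x1fffff; }
```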

    IA-32e Paging

    Each paging entry is 8 bytes in size, so there are 512 entries in each paging structure.

    The translation process uses 9 bits at a time from a 48-bit linear address:

    1. Bits 47:39 select an entry (PML4E) in the first paging structure.
    2. Bits 38:30 select an entry (PDPTE) in the second paging structure.
    3. Bits 29:21 select an entry (PDE) in the third paging structure.
    4. Bits 20:12 select an entry (PTE) in the fourth paging structure, which maps the page frame.
    5. Bits 11:0 are the page offset within the 4-KByte page frame.

    If the page size (PS) flag is set in the PDE, the page is 2 MBytes in size.

    If the PS flag is set in the PDPTE, the page is 1 GByte in size.
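For IA-32e paging all four levels use 9 bits. The sketch below splits a 48-bit linear address and also shows how the offset widens when the walk stops early at a 2-MByte or 1-GByte page. Names are hypothetical, and the upper 16 bits of the address are simply ignored.

```cpp
#include <cassert>
#include <cstdint>

struct Split64 {
  std::uint64_t pml4Index;  // bits 47:39
  std::uint64_t pdptIndex;  // bits 38:30
  std::uint64_t pdIndex;    // bits 29:21
  std::uint64_t ptIndex;    // bits 20:12
  std::uint64_t offset;     // bits 11:0
};

// Split a linear address for IA-32e paging with 4-KByte pages.
Split64 split64(std::uint64_t la) {
  return {(la >> 39) & 0x1ff, (la >> 30) & 0x1ff, (la >> 21) & 0x1ff,
          (la >> 12) & 0x1ff, la & 0xfff};
}

// Walk stops at the PDE: bits 20:0 are the offset into a 2-MByte page.
std::uint64_t offset2M(std::uint64_t la) { return la & 0x1fffff; }

// Walk stops at the PDPTE: bits 29:0 are the offset into a 1-GByte page.
std::uint64_t offset1G(std::uint64_t la) { return la & 0x3fffffff; }
```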

    The main reason for having PSE and large pages is to reduce the load on the Translation Lookaside Buffer (TLB). However, this requires contiguous physical memory, which is a problem when physical memory gets fragmented (in Windows NT the memory manager defragments physical memory in kernel mode when contiguous physical memory is required, which is very time consuming).

    Now that we have finished the introduction, I would recommend some exercises:

    1. Let’s get physical
    2. Flags and Large Pages
    3. Non-PAE and X64

     

  • Rubato and Chord

    Process and Job Objects

    As we mentioned in The Main Thread Problem, some questions do not have a direct answer simply because they are invalid by definition.

    Today, the invalid question would be:

    How do I kill a process tree in Windows?

    Unfortunately, the question is invalid, since Windows by design doesn't keep a tree of process creation relationships. Each process does have a parent process ID (except for the Windows Session Manager SMSS.exe), but this information does not change when the parent process is terminated.

    To verify this, simply run tlist.exe -t and look at the rootless processes. At least the following processes don't have a parent on my Win7 machine:

    csrss.exe (536)
      conhost.exe (6036) CicMarshalWnd
      conhost.exe (8432) CicMarshalWnd
    winlogon.exe (672)
    explorer.exe (5640) Program Manager

    Now back to the question: the intention was probably to kill all processes spawned by a certain process. If this is the case, a Job Object might help. But please be cautious:

    1. It is possible to use CreateProcess with CREATE_BREAKAWAY_FROM_JOB.
    2. One process can be assigned to only one job.
    3. Job objects cannot be nested.

    2 and 3 are subject to change in the near future.

    Another option I could think of is to hook process creations and maintain our own data structure.

    This could be done in either user mode or kernel mode. One possible kernel mode approach is:

    #include <ddk/ntddk.h>
    
    extern "C" DDKAPI NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath);
    static DDKAPI VOID CreateProcessNotifyRoutine(HANDLE hPPID, HANDLE hPID, BOOLEAN bCreate);
    static DDKAPI VOID DriverUnload(PDRIVER_OBJECT pDriverObject);
    
    DDKAPI NTSTATUS DriverEntry(PDRIVER_OBJECT pDriverObject, PUNICODE_STRING pRegistryPath)
    {
      NTSTATUS retval;
      pDriverObject->DriverUnload = DriverUnload;
      retval = PsSetCreateProcessNotifyRoutine(CreateProcessNotifyRoutine, FALSE);
      return retval;
    }
    
    DDKAPI VOID DriverUnload(PDRIVER_OBJECT pDriverObject)
    {
      PsSetCreateProcessNotifyRoutine(CreateProcessNotifyRoutine, TRUE);
    }
    
    DDKAPI VOID CreateProcessNotifyRoutine(HANDLE hPPID, HANDLE hPID, BOOLEAN bCreate)
    {
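      // HANDLE is pointer-sized, so cast explicitly before printing with %u
      DbgPrint("%s(PPID=%u, PID=%u)\n", bCreate ? "CreateProcess" : "TerminateProcess", (ULONG)(ULONG_PTR)hPPID, (ULONG)(ULONG_PTR)hPID);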
    }
    

    I wouldn't recommend the driver way for several reasons:

    1. Modern Windows doesn't allow you to install an unsigned driver module.
    2. Antivirus software may complain, since it makes use of similar approaches.
    3. It's easy to crash the whole system if you are not careful (e.g. the sample code itself has a race condition during unload; I'll leave this as a pop quiz for our readers).
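Whichever mechanism delivers the creation and termination events, the "own data structure" part can be sketched independently of it. The class below is a hypothetical user mode bookkeeping helper: it keeps parent-to-child edges even after a parent exits, so descendants stay reachable. It ignores PID reuse, which a real tool would have to handle (for example by also recording creation times).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

class ProcessTracker {
 public:
  // Record that parentPid created pid.
  void onCreate(std::uint32_t parentPid, std::uint32_t pid) {
    children_[parentPid].insert(pid);
  }

  // Mark pid as exited but keep its edges, so its children stay reachable.
  void onExit(std::uint32_t pid) { exited_.insert(pid); }

  // All still-running descendants of rootPid, found by a depth-first walk.
  std::vector<std::uint32_t> liveDescendants(std::uint32_t rootPid) const {
    std::vector<std::uint32_t> result;
    std::vector<std::uint32_t> pending{rootPid};
    std::set<std::uint32_t> visited{rootPid};
    while (!pending.empty()) {
      const std::uint32_t pid = pending.back();
      pending.pop_back();
      const auto it = children_.find(pid);
      if (it == children_.end()) continue;
      for (const std::uint32_t child : it->second) {
        if (!visited.insert(child).second) continue;  // already seen
        if (exited_.count(child) == 0) result.push_back(child);
        pending.push_back(child);
      }
    }
    return result;
  }

 private:
  std::map<std::uint32_t, std::set<std::uint32_t>> children_;
  std::set<std::uint32_t> exited_;
};
```

If process 100 creates 200, 200 creates 300, and 200 then exits, liveDescendants(100) still reports 300, which is exactly the relationship Windows itself does not maintain.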

     

  • Rubato and Chord

    CRT Startup

    In my previous blog post Early Debugging, we demonstrated how early you can get using a user mode debugger.

    Normally we don't want to start that early; there are some other places we would want to start with:

    • OEP (Original Entry Point) of the EXE module. WinDBG has a predefined Pseudo-Register called $exentry which makes it a lot easier, as we already mentioned previously in Data Breakpoint.
    • The startup or initialization of runtime. I've covered the managed runtime startup in Yet Another Hello World.

    Now let's talk a bit about the native C/C++ Runtime. When you start writing applications using C/C++ on Windows, normally you would be using CRT already, unless you explicitly tell the linker not to use it, like what I did in A Debugging Approach to IFEO.

    The CRT (C Runtime Library) comes with Windows and the Visual C++ Redistributable (let's not talk about the special version which serves the CLR). You can also link a static version into your EXE/DLL.

    CRT provides the fundamental C++ runtime support. Some obvious features are:

    • setting up the C++ exception model
    • making sure the constructors of global variables get called before entering the main function
    • parsing command line arguments and calling the main function
    • initializing the heap
    • setting up the atexit chain

    Let's get to the code:

    /* crtexport.cpp */
    
    #define WIN32_LEAN_AND_MEAN
    
    #include <Windows.h>
    
    class CFoobar
    {
    public:
      CFoobar()
      {
        OutputDebugString(TEXT("CFoobar::CFoobar()\n"));
      }
      ~CFoobar()
      {
        OutputDebugString(TEXT("CFoobar::~CFoobar()\n"));
      }
    };
    
    CFoobar g_foobar;
    
    __declspec(dllexport)
    BOOL WINAPI Foobar()
    {
      return TRUE;
    }
    
    BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvContext)
    {
      switch(fdwReason)
      {
      case DLL_PROCESS_ATTACH:
        OutputDebugString(TEXT("DLL_PROCESS_ATTACH\n"));
        break;
      case DLL_PROCESS_DETACH:
        OutputDebugString(TEXT("DLL_PROCESS_DETACH\n"));
        break;
      case DLL_THREAD_ATTACH:
        OutputDebugString(TEXT("DLL_THREAD_ATTACH\n"));
        break;
      case DLL_THREAD_DETACH:
        OutputDebugString(TEXT("DLL_THREAD_DETACH\n"));
        break;
      default:
        DebugBreak();
      }
      return TRUE;
    }
    

    Note: don't put DebugBreak inside the DLL entry point as I do, unless you understand that the loader lock would make the JIT debugger unhappy.

    /* crtimport.cpp */
    
    #define WIN32_LEAN_AND_MEAN
    
    #include <Windows.h>
    
    BOOL WINAPI Foobar();
    
    int main()
    {
      Foobar();
      return 0;
    }
    
    

    cl.exe /LD /Zi crtexport.cpp

    cl.exe /Zi crtimport.cpp crtexport.lib

    Set two breakpoints, one at DllMain and one at the main function, then launch the application in the Visual Studio Debugger.

    Since our DLL is statically imported, the entry point of DLL is executed before the entry point of EXE.

    As you might have noticed, the actual OEP is _DllMainCRTStartup. You can double click on the crtexport.dll!_DllMainCRTStartup frame and bring up the CRT startup code to start reading - on my machine the startup code is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\dllcrt0.c.

    Also, by taking a look at the Output window, we can see that CFoobar::CFoobar() has already been called, which means the global object was initialized before entering our DllMain. This is of course done by the CRT initialization code in __DllMainCRTStartup, which understands the contract between compiler and runtime.
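The same ordering can be observed without a debugger in portable C++. In the sketch below the global object's constructor appends to a log instead of calling OutputDebugString (the log and its accessor are assumptions made so the example is not tied to Windows). By the time any code reached from the entry point runs, the log already contains the constructor's entry.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Function-local static, so the log exists before any global constructor runs.
std::vector<std::string>& eventLog() {
  static std::vector<std::string> log;
  return log;
}

class CFoobar {
 public:
  CFoobar() { eventLog().push_back("CFoobar::CFoobar()"); }
};

// Constructed by the runtime's startup code, before the entry point is called.
CFoobar g_foobar;

// Call this from main (or DllMain): returns true if the constructor already ran.
bool globalConstructedBeforeEntry() {
  return eventLog().size() == 1 && eventLog()[0] == "CFoobar::CFoobar()";
}
```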

    Now you understand how the constructor of global variables gets called, think about the destructor semantic:

    1. Is it possible for a global variable to be destructed in a different thread?
    2. What if there is an exception thrown from the global variable constructor/destructor invocation?

    The actual OEP for the EXE is __tmainCRTStartup. You can double click on the crtimport.exe!__tmainCRTStartup frame and take a look at the code - on my machine the startup code is located at C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\crt\src\crt0.c.

    As we mentioned in The Main Thread Problem, __tmainCRTStartup runs in the "main thread", and kills all the other threads before it destroys the global variables. One thing to mention is that the CRT makes use of _endthreadex instead of calling ExitThread directly, since _endthreadex frees the related per-thread CRT data, while ExitThread knows nothing about the _tiddata block.

    A few more questions:

    1. What if different versions of CRT are loaded into a single process?
      1. mixing debug and release version of CRT
      2. mixing static and dynamic version of CRT
      3. mixing different major version of CRT
    2. What would happen if there is an exception thrown across module boundary (e.g. from a DLL function to the caller which belongs to EXE)?
    3. Can I use CRT functions without initializing CRT?

     

  • Rubato and Chord

    x86 Segment Addressing Revisited

    Memory segmentation was first introduced to the x86 family with the 8086, to make it possible to access 1MB of physical memory under 16-bit addressing mode.

    Real Mode

    A logical address points directly to a physical memory location.

    A logical address consists of two parts: segment and offset.

    The physical address is calculated as Segment * 16 + Offset, and if the A20 line is not enabled, the physical address wraps around within the 1MiB range.
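The real mode formula, including the A20 wrap-around, fits in a few lines. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Physical address in real mode: Segment * 16 + Offset.
// With the A20 line disabled, bit 20 is forced to zero, so the result
// wraps around within the first 1MiB.
std::uint32_t realModePhysical(std::uint16_t segment, std::uint16_t offset,
                               bool a20Enabled) {
  const std::uint32_t address =
      (static_cast<std::uint32_t>(segment) << 4) + offset;
  return a20Enabled ? address : (address & 0xfffff);
}
```

For example FFFF:0010 is 0x100000 with A20 enabled and wraps to 0 with A20 disabled.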

    Protected Mode

    Two stages of address translation would be used to arrive at a physical address: logical-address translation and linear address space paging.

    Logical address consists of two parts: segment selector and offset.

    The segment selector is used to determine the base address as well as the access rights. It selects a segment descriptor, and segment descriptors are maintained in a table known as the GDT (Global Descriptor Table) or LDT (Local Descriptor Table), which is referenced by the GDTR or LDTR register.

    When a thread is running in kernel mode, fs:[0] points to the PCR (Processor Control Region). Let's find it by manually translating the logical address:

    kd> r fs
    fs=00000030
    
    kd> .formats @fs
    Evaluate expression:
      Hex:     00000030
      Decimal: 48
      Octal:   00000000060
      Binary:  00000000 00000000 00000000 00110000
      Chars:   ...0
      Time:    Thu Jan 01 08:00:48 1970
      Float:   low 6.72623e-044 high 0
      Double:  2.37152e-322
    

    According to the x86 architecture specification, we have Index = 6 (0000000000110), Table Indicator = GDT (0) and RPL (Requested Privilege Level) = 0.
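
    The same split can be written down as a small C sketch. The SelectorFields type and the decode_selector function are hypothetical names used only here; the bit positions (index in bits 15..3, table indicator in bit 2, RPL in bits 1..0) are the ones used in the manual decoding above.

```c
#include <stdint.h>

/* The three fields of a segment selector (hypothetical type). */
typedef struct {
    uint16_t index; /* bits 15..3: descriptor index in the table */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: Requested Privilege Level */
} SelectorFields;

/* Split a 16-bit selector value into its fields. */
SelectorFields decode_selector(uint16_t selector)
{
    SelectorFields fields;
    fields.index = (uint16_t)(selector >> 3);
    fields.ti    = (uint8_t)((selector >> 2) & 0x1);
    fields.rpl   = (uint8_t)(selector & 0x3);
    return fields;
}
```

    Calling decode_selector(0x30) gives index 6, table indicator 0 (GDT) and RPL 0, matching the values read off the binary dump.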

    kd> dd @gdtr
    8003f000  00000000 00000000 0000ffff 00cf9b00
    8003f010  0000ffff 00cf9300 0000ffff 00cffb00
    8003f020  0000ffff 00cff300 200020ab 80008b04
    8003f030  f0000001 ffc093df 00000fff 0040f300
    8003f040  0400ffff 0000f200 00000000 00000000
    8003f050  a0000068 80008954 a0680068 80008954
    8003f060  2f40ffff 00009302 80003fff 0000920b
    8003f070  700003ff ff0092ff 0000ffff 80009a40
    

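    Index 6 selects the 8-byte descriptor at 8003f000 + 6 * 8 = 8003f030, which holds f0000001 ffc093df. The layout of a segment descriptor is given in the x86 architecture specification.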
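
    As a rough sketch of that layout, the C function below unpacks the two DWORDs of a descriptor as they appear in the dd output (low DWORD first). DescriptorFields and decode_descriptor are hypothetical names for this illustration, and the sketch only covers the base, limit, access byte and flag nibble of an ordinary code or data segment descriptor.

```c
#include <stdint.h>

/* Fields of a code/data segment descriptor (hypothetical type). */
typedef struct {
    uint32_t base;   /* segment base address */
    uint32_t limit;  /* limit in bytes, after applying granularity */
    uint8_t  access; /* P, DPL, S and type bits */
    uint8_t  flags;  /* G, D/B, L, AVL bits */
} DescriptorFields;

/* Decode an 8-byte descriptor given as its low and high DWORDs. */
DescriptorFields decode_descriptor(uint32_t low, uint32_t high)
{
    DescriptorFields d;

    /* Limit 15..0 is in the low DWORD, limit 19..16 in the high DWORD. */
    uint32_t raw_limit = (low & 0x0000FFFFu) | (high & 0x000F0000u);

    /* Base 15..0, base 23..16 and base 31..24 are scattered. */
    d.base = (low >> 16) | ((high & 0x000000FFu) << 16) | (high & 0xFF000000u);

    d.access = (uint8_t)((high >> 8) & 0xFFu);
    d.flags  = (uint8_t)((high >> 20) & 0x0Fu);

    /* With G (bit 3 of the flags) set, the limit counts 4 KiB pages. */
    d.limit = (d.flags & 0x8u) ? ((raw_limit << 12) | 0xFFFu) : raw_limit;

    return d;
}
```

    Feeding it f0000001 ffc093df gives base ffdff000, limit 00001fff, access byte 93 and flags c, so the logical address fs:[0] translates to linear address ffdff000 + 0.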

    Now we can verify the address translation by looking at the memory contents:

    kd> dg @fs
                                      P Si Gr Pr Lo
    Sel    Base     Limit     Type    l ze an es ng Flags
    ---- -------- -------- ---------- - -- -- -- -- --------
    0030 ffdff000 00001fff Data RW Ac 0 Bg Pg P  Nl 00000c93
    kd> dd fs:[0]
    0030:00000000  805495d0 80549df0 80547000 00000000
    0030:00000010  00000000 00000000 00000000 ffdff000
    0030:00000020  ffdff120 0000001c 00000000 00000000
    0030:00000030  ffff2050 80545bb8 8003f400 8003f000
    0030:00000040  80042000 00010001 00000001 00000064
    0030:00000050  00000000 00000000 00000000 00000000
    0030:00000060  00000000 00000000 00000000 00000000
    0030:00000070  00000000 00000000 00000000 00000000
    
    kd> dd ffdff000
    ffdff000  805495d0 80549df0 80547000 00000000
    ffdff010  00000000 00000000 00000000 ffdff000
    ffdff020  ffdff120 0000001c 00000000 00000000
    ffdff030  ffff2050 80545bb8 8003f400 8003f000
    ffdff040  80042000 00010001 00000001 00000064
    ffdff050  00000000 00000000 00000000 00000000
    ffdff060  00000000 00000000 00000000 00000000
    ffdff070  00000000 00000000 00000000 00000000

    It's time to dump the KPCR structure:

    kd> dt nt!_KPCR ffdff000
       +0x000 NtTib            : _NT_TIB
       +0x01c SelfPcr          : 0xffdff000 _KPCR
       +0x020 Prcb             : 0xffdff120 _KPRCB
       +0x024 Irql             : 0x1c ''
       +0x028 IRR              : 0
       +0x02c IrrActive        : 0
       +0x030 IDR              : 0xffff2050
       +0x034 KdVersionBlock   : 0x80545bb8 Void
       +0x038 IDT              : 0x8003f400 _KIDTENTRY
       +0x03c GDT              : 0x8003f000 _KGDTENTRY
       +0x040 TSS              : 0x80042000 _KTSS
       +0x044 MajorVersion     : 1
       +0x046 MinorVersion     : 1
       +0x048 SetMember        : 1
       +0x04c StallScaleFactor : 0x64
       +0x050 DebugActive      : 0 ''
       +0x051 Number           : 0 ''
       +0x052 Spare0           : 0 ''
       +0x053 SecondLevelCacheAssociativity : 0 ''
       +0x054 VdmAlert         : 0
       +0x058 KernelReserved   : [14] 0
       +0x090 SecondLevelCacheSize : 0
       +0x094 HalReserved      : [16] 0
       +0x0d4 InterruptMode    : 0
       +0x0d8 Spare1           : 0 ''
       +0x0dc KernelReserved2  : [17] 0
       +0x120 PrcbData         : _KPRCB
    
    kd> !pcr
    KPCR for Processor 0 at ffdff000:
        Major 1 Minor 1
    	NtTib.ExceptionList: 805495d0
    	    NtTib.StackBase: 80549df0
    	   NtTib.StackLimit: 80547000
    	 NtTib.SubSystemTib: 00000000
    	      NtTib.Version: 00000000
    	  NtTib.UserPointer: 00000000
    	      NtTib.SelfTib: 00000000
    
    	            SelfPcr: ffdff000
    	               Prcb: ffdff120
    	               Irql: 0000001c
    	                IRR: 00000000
    	                IDR: ffff2050
    	      InterruptMode: 00000000
    	                IDT: 8003f400
    	                GDT: 8003f000
    	                TSS: 80042000
    
    	      CurrentThread: 80552840
    	         NextThread: 00000000
    	         IdleThread: 80552840
    
    	          DpcQueue: 
    

     


    The Main Thread Problem


    Every few months I hear people asking the same question:

    Given a process ID (or handle), how can I get its main thread ID (or handle)?

    Normally that would raise another question:

    What is the definition of a main thread?

    However, the Windows operating system doesn't have a concept called main thread, and threads do not have a parent-child relationship at all.

    Let's reuse the sample code from Pop Quiz - Debug Event Loop and Timeslice Quota:

    1. The WinMain function would return right after it creates two worker threads.
    2. The two worker threads run into an endless loop.

    If we compile the code using cl.exe test.cpp, the generated test.exe returns immediately after we run it. A quick look in the debugger shows a call stack like this:

    test!ILT+10(WinMain)
    test!__tmainCRTStartup+0x154
    kernel32!BaseThreadInitThunk+0xd
    ntdll!RtlUserThreadStart+0x1d

    That's because the compiler has decided that we need CRT initialization, although we are not actually using the CRT either explicitly or implicitly. It is the CRT exit code that calls ntdll!RtlExitUserProcess and terminates our worker threads, and this is by design.

    Now let's switch to the following command:

    cl test.cpp /link /NODEFAULTLIB /ENTRY:WinMain /SUBSYSTEM:CONSOLE kernel32.lib

    This time, test.exe enters an endless loop.

    Now let's try to give some possible definitions of main thread:

    1. The thread in which the CRT startup and exit code runs. (What if we are not using the CRT at all? What if we start without the CRT, then load a DLL that triggers CRT initialization?)
    2. The thread which pumps window messages for the main window. (What if we are not a GUI application? What in turn is the definition of a main window, and can we have two main windows?)
    3. The thread which runs through the OEP (Original Entry Point, we mentioned that in Data Breakpoints). (What if the OEP has been covered several times?)
    4. The thread in which the DllMain function gets called with DLL_PROCESS_ATTACH and lpReserved is not NULL.
    5. The oldest thread.

    It looks like options 4 and 5 have the cleanest definitions. If we use option 4, we have to accept the fact that a process may not have a main thread, and that's why we would normally end up with option 5.
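
    Option 5 could be sketched in C roughly as follows. This is only one possible reading of "oldest": the thread with the earliest creation time among the threads that still exist. ThreadRecord, oldest_thread_id, find_oldest_thread and MAX_THREADS are hypothetical names for this illustration, 0 is used as the "not found" value, and the Windows-only part assumes the Tool Help snapshot API together with GetThreadTimes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a thread ID plus its creation time as a 64-bit value. */
typedef struct {
    uint32_t thread_id;
    uint64_t creation_time;
} ThreadRecord;

/* Return the ID of the earliest-created thread, or 0 if the list is empty. */
uint32_t oldest_thread_id(const ThreadRecord *threads, size_t count)
{
    size_t i, best = 0;
    if (threads == NULL || count == 0)
        return 0;
    for (i = 1; i < count; ++i) {
        if (threads[i].creation_time < threads[best].creation_time)
            best = i;
    }
    return threads[best].thread_id;
}

#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>

#define MAX_THREADS 512 /* arbitrary cap for this sketch */

/* Snapshot all threads, keep those owned by process_id, pick the oldest. */
uint32_t find_oldest_thread(DWORD process_id)
{
    ThreadRecord records[MAX_THREADS];
    size_t count = 0;
    THREADENTRY32 entry;
    HANDLE snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (snapshot == INVALID_HANDLE_VALUE)
        return 0;
    entry.dwSize = sizeof(entry);
    if (Thread32First(snapshot, &entry)) {
        do {
            FILETIME created, exited, kernel, user;
            HANDLE thread;
            if (entry.th32OwnerProcessID != process_id || count == MAX_THREADS)
                continue;
            thread = OpenThread(THREAD_QUERY_INFORMATION, FALSE, entry.th32ThreadID);
            if (thread == NULL)
                continue; /* no access to this thread: skip it */
            if (GetThreadTimes(thread, &created, &exited, &kernel, &user)) {
                records[count].thread_id = entry.th32ThreadID;
                records[count].creation_time =
                    ((uint64_t)created.dwHighDateTime << 32) | created.dwLowDateTime;
                ++count;
            }
            CloseHandle(thread);
        } while (Thread32Next(snapshot, &entry));
    }
    CloseHandle(snapshot);
    return oldest_thread_id(records, count);
}
#endif
```

    Note that the thread which originally ran the entry point may already have exited, in which case this sketch reports the oldest surviving thread instead.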

    Which one do you prefer and what is your own definition? Which option do you think the Visual Studio Debugger would use?

     


    What is Autos Window?


    The developers at Microsoft have done a great job of bringing us a great number of nice features. However, some of these features are poorly documented or not documented at all.

    The Autos Window in the Visual Studio Debugger is one of the best examples of the gap between implementation and documentation. I'm sure you have seen this window before, as it's shown by default while you are debugging. If not, you can always find it from the Debug menu.

    The MSDN document, which can be found at http://msdn.microsoft.com/en-us/library/aa290702.aspx, hasn't been updated since Visual Studio .NET 2003:

    The Autos window displays variables used in the current statement and the previous statement. (For Visual Basic, it displays variables in the current statement and three statements on either side of the current statement.) 

    The document does leave us with some questions:

    1. What's the definition of variable? Is this the same concept as the one used in compiler frontends?
    2. What's the definition of used? What happens if I have a parameter that is never referenced inside the function?
    3. What's the definition of statement? Is this the same concept as the one used for parsing in compiler frontends?
    4. What's the meaning of previous statement? Is it based on the execution order (note that some statements do not execute at all, such as a plain variable definition) or is it just a lexical thing?
    5. How should I interpret "three statements on either side of the current statement"?

    Generally speaking, debuggers don't care about source code; they know nothing about C++ preprocessing or syntax (the only exception is the expression evaluator, which would be another topic). The magic of source debugging is brought by symbol files. The private PDB contains the path and checksum of the source files, as well as the line number information (PDB actually supports line and column numbers, and this is already used by C#). The debugger just reads this information from the symbols.

    Let's open Visual Studio 2010 and take a look at the following snippet in C:

    int x;
    int y;
    int z;
    int a;
    int b;
    int c;
    int main(int argc)
    {
      return b + c;
    }
    

    Set a breakpoint on the line of the return statement and bring up the Autos Window.

    First, it looks like the ordering in the Autos Window is based on variable name, which raises another question: "what would happen if there is a duplicated variable name?".

    Now let's switch to another snippet:

    int foo(int* x)
    {
      return *x;
    }
    
    int main(int argc)
    {
      return foo(&argc);
    }
    

    When we hit the breakpoint, the Autos Window would be:

    Step into the function foo and step out, now the Autos Window would look like:

    As you can see, the return value is displayed as "foo returned" in the Autos Window. If you right-click it and select Add Watch, you will be greeted with the error message CXX0013: Error: missing operator.

    Okay, I've given enough questions and hints. Now it is time to try it out yourself. Hopefully you will understand a bit more about what the Autos Window is and the way it works.
