Rubato and Chord

Reiley's technical blog

August, 2011

  • Rubato and Chord

    Microsoft Binary Technologies and Debugging


    Midway upon the journey of our life I found myself within a forest dark, For the straightforward pathway had been lost.


    In the world of debugging, it is easy to get lost without sufficient knowledge of the underlying mechanisms. Well-known examples include DLLs (Dynamic-Link Libraries), FPO (Frame-Pointer Omission), LTCG (Link-Time Code Generation), PE/COFF and SEH (Structured Exception Handling), but Microsoft uses many other technologies as well:

    • BBT (Basic Block Tools) is a suite of optimization tools designed to help reduce the working-set requirements for a Win32 application by applying advanced static analysis and code layout heuristics, and integrating profile data gathered from monitoring the program execution flow. In addition, BBT rearranges static data and resource sections for additional paging reduction.
    • Detours is a library for instrumenting arbitrary Win32 functions on x86, x64, and IA64 machines. Detours intercepts Win32 functions by re-writing the in-memory code for target functions. The Detours package also contains utilities to attach arbitrary DLLs and data segments (called payloads) to any Win32 binary.
    • Vulcan is a single infrastructure for building a wide range of custom tools for program analysis, optimization, and testing. Through the Vulcan API, developers and testers can build custom tools with very few lines of code for basic block counting, memory tracing, memory allocation, coverage, failure insertion, optimization, compiler auditing etc. Vulcan scales to large commercial applications and has been used to improve the performance and reliability of products across Microsoft.


    The following disassembly is directly related to Detours. MOV EDI, EDI is a two-byte placeholder that can be overwritten with a short JMP, and the five NOP instructions in front of the function provide room for a five-byte near JMP (JMP rel32 on x86). In short, many Windows system DLLs are built with this kind of patching in mind. The Visual C++ compiler option /hotpatch (Create Hotpatchable Image) makes sure the first instruction of each function is at least two bytes long, and the linker option /FUNCTIONPADMIN reserves the padding in front of each function.

    7541b4c1 0400            add     al,0
    7541b4c3 90              nop
    7541b4c4 90              nop
    7541b4c5 90              nop
    7541b4c6 90              nop
    7541b4c7 90              nop
    7541b4c8 8bff            mov     edi,edi
    7541b4ca 55              push    ebp
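    A minimal sketch of what a hot patcher would write into that layout, assuming a 32-bit x86 target and using a hypothetical helper name (build_hotpatch). The five padding bytes in front of the entry point (7541b4c3 to 7541b4c7 above) receive a JMP rel32 to the replacement function, and the two-byte MOV EDI, EDI at the entry point (7541b4c8) is overwritten with a short JMP back into that padding. The code only computes the bytes; it does not touch any process memory.

```c
#include <stdint.h>

/* Hypothetical helper: fills patch[0..6] with the bytes a hot patcher would
   write, where patch[0] lands at entry - 5 (the NOP padding) and patch[5]
   lands at entry (the MOV EDI, EDI). Assumes 32-bit x86 addresses. */
void build_hotpatch(uint32_t entry, uint32_t target, unsigned char patch[7])
{
    /* JMP rel32 (opcode E9): the displacement is relative to the end of the
       5-byte instruction, i.e. (entry - 5) + 5 == entry. Unsigned arithmetic
       wraps modulo 2^32, like the CPU's own address calculation. */
    uint32_t rel = target - entry;
    patch[0] = 0xE9;
    patch[1] = (unsigned char)(rel & 0xFF);          /* little-endian */
    patch[2] = (unsigned char)((rel >> 8) & 0xFF);
    patch[3] = (unsigned char)((rel >> 16) & 0xFF);
    patch[4] = (unsigned char)((rel >> 24) & 0xFF);

    /* JMP rel8 (opcode EB) with displacement -7 (0xF9): measured from
       entry + 2, it lands on entry - 5, the start of the padding. */
    patch[5] = 0xEB;
    patch[6] = 0xF9;
}
```

    The usual rationale for this split is that the entry point only needs a single two-byte write, so a thread running through the function sees either the old MOV EDI, EDI or the complete short jump, while the long jump lives in padding that normally never executes.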

    NTDLL does not use the hot patch approach. Here the NOP instructions are just padding that keeps each system call stub aligned: every stub below occupies exactly 16 bytes.

    77236278 b80d010000      mov     eax,10Dh
    7723627d ba0003fe7f      mov     edx,offset SharedUserData!SystemCallStub (7ffe0300)
    77236282 ff12            call    dword ptr [edx]
    77236284 c21400          ret     14h
    77236287 90              nop
    77236288 b80e010000      mov     eax,10Eh
    7723628d ba0003fe7f      mov     edx,offset SharedUserData!SystemCallStub (7ffe0300)
    77236292 ff12            call    dword ptr [edx]
    77236294 c21800          ret     18h
    77236297 90              nop

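    With the introduction of KERNELBASE, many functions exported by kernel32 became stubs that forward to KERNELBASE. In the session below, .call rejects kernel32!SetErrorMode as not a function; disassembling it shows a single jump through the import table, and following that pointer lands in KERNELBASE: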

    0:000> .call kernel32!SetErrorMode(1)
                                     ^ Symbol not a function in '.call kernel32!SetErrorMode(1)'
    0:000> u kernel32!SetErrorMode L1
    75ac016d ff25b41da775    jmp     dword ptr [kernel32!_imp__SetErrorMode (75a71db4)]
    0:001> u poi(75a71db4)
    75417991 8bff            mov     edi,edi
    75417993 55              push    ebp
    75417994 8bec            mov     ebp,esp
    75417996 51              push    ecx
    75417997 56              push    esi
    75417998 e836000000      call    KERNELBASE!GetErrorMode (754179d3)
    7541799d 8bf0            mov     esi,eax
    7541799f 8b4508          mov     eax,dword ptr [ebp+8]

    Basic Block Tools

    BBT merges duplicated blocks, rearranges basic blocks and does a lot of crazy things to the symbol files (PDB). Your call stack may look weird because functions can get merged and overlapped, especially if C++ templates are used heavily. You can tell whether optimization was performed at the basic block level by examining the function body.

    Frame-Pointer Omission

    FPO was introduced with Windows NT 3.51, made possible by the 80386 allowing ESP to be used for indexing, which frees EBP for use as a general purpose register. However, FPO makes stack unwinding unreliable, which in turn makes debugging painful. You can tell whether FPO was used by examining the function prologue and epilogue.

    FPO disabled:

    BOOL WINAPI Foobar()
    {
    55              push ebp
    8B EC           mov  ebp, esp
      return TRUE;
    B8 01 00 00 00  mov  eax, 1
    }
    5D              pop  ebp
    C3              ret

    FPO enabled:

    BOOL WINAPI Foobar()
    {
      return TRUE;
    B8 01 00 00 00  mov  eax, 1
    }
    C3              ret

    FPO information is available from both public and private PDB files, and the WinDbg command kv displays it:

    0:000> kv
    ChildEBP RetAddr  Args to Child              
    002bfdac 75d9339a 7efde000 002bfdf8 76f39ed2 notepad!WinMainCRTStartup (FPO: [0,0,0])
    002bfdb8 76f39ed2 7efde000 7b449f70 00000000 kernel32!BaseThreadInitThunk+0xe (FPO: [Non-Fpo])
    002bfdf8 76f39ea5 005b3689 7efde000 00000000 ntdll!__RtlUserThreadStart+0x70 (FPO: [Non-Fpo])
    002bfe10 00000000 005b3689 7efde000 00000000 ntdll!_RtlUserThreadStart+0x1b (FPO: [Non-Fpo])
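    To see why FPO breaks naive unwinding, here is a simulated model. It is not how WinDbg unwinds (WinDbg consults FPO records like the ones above precisely to avoid this problem). The sketch assumes a 32-bit stack stored in an array indexed by address / 4, where every EBP-based frame keeps the caller's EBP at [EBP] and the return address at [EBP+4]; the function name walk_ebp_chain is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical EBP-chain walker over a simulated stack (index = address / 4).
   Collects up to max return addresses and returns how many were found. */
size_t walk_ebp_chain(const uint32_t *stack, size_t words, uint32_t ebp,
                      uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;
    while (ebp != 0 && n < max) {
        size_t i = ebp / 4;
        if (ebp % 4 != 0 || i + 1 >= words)
            break;                      /* EBP is not a plausible frame pointer */
        ret_addrs[n++] = stack[i + 1];  /* return address at [EBP+4] */
        uint32_t next = stack[i];       /* caller's EBP saved at [EBP] */
        if (next != 0 && next <= ebp)
            break;                      /* caller frames must sit at higher addresses */
        ebp = next;
    }
    return n;
}
```

    If a function compiled with FPO has reused EBP as a general purpose register, the walk starts from a garbage value and either stops early or follows a bogus chain, which is the unreliability described above.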

    Link-time Code Generation

    LTCG was introduced with the first version of Visual Studio .NET. It can be used with or without PGO (Profile Guided Optimization). If you have debugged optimized C++ applications, you already know that local variables and inlined functions can look very different from the source code. With LTCG, even cross-module inlining is possible, and calling conventions and parameters can be optimized as well. As with BBT, functions might get merged.

    Profile Guided Optimization

    PGO (a.k.a. POGO) performs many optimizations, such as inlining, virtual call speculation and conditional branch optimization. What's more, POGO can perform optimizations at the extended basic block level.

    Incremental Linking

    The Microsoft incremental linker option /INCREMENTAL (not to be confused with incremental compilation, which makes use of precompiled headers) affects debugging. In fact, native EnC (Edit and Continue) is built on top of incremental linking. Sometimes we get symbols like module!ILT+0(_main): the ILT (Incremental Link Table) serves the incremental linker by adding a layer of indirection, which provides the flexibility needed for binary patching. The bad news is that the incremental linker has to generate correct symbols and patch them into the PDB as well, and the patching process doesn't reliably discard unused symbols. This is challenging for debugger authors, since the MSPDB layer does not guarantee the integrity of symbols.
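    The indirection idea can be sketched in portable C with a table of function pointers. This is only an analogy: real ILT entries are JMP thunks emitted by the linker, while the table, the function names and relink_add_one below are all hypothetical.

```c
/* Hypothetical model of an incremental link table (ILT). */
typedef int (*ilt_fn)(int);

static int add_one_v1(int x) { return x + 2; }  /* first build: buggy */
static int add_one_v2(int x) { return x + 1; }  /* after an edit: fixed */

enum { ILT_ADD_ONE, ILT_COUNT };

/* One slot per function; call sites only ever reference the slot. */
static ilt_fn g_ilt[ILT_COUNT] = { add_one_v1 };

int call_add_one(int x)
{
    return g_ilt[ILT_ADD_ONE](x);  /* like calling through module!ILT+0(...) */
}

/* "Relink" after an edit: only the slot changes, every caller stays untouched. */
void relink_add_one(void)
{
    g_ilt[ILT_ADD_ONE] = add_one_v2;
}
```

    The extra hop is what shows up as ILT+n entries when you step into a call in an incrementally linked image.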

    Function Inlining

    Function inlining means there is no actual call instruction, so the stepper and symbol binding components of the debugger can get confused.

    Intrinsic Function

    Intrinsic functions are a special kind of function generated by the compiler toolchain (instead of coming from libraries or your code).

  • Rubato and Chord

    MACRO Revisited


    Macros are powerful, but few people understand how they work. In theory, syntax highlighting for C/C++ is impossible because of the presence of preprocessing directives (FDIS N3290 16 [cpp]). Sometimes I feel that C++ is a mixture of three languages rather than a single one; when I code, I have to keep in mind that there are several phases of translation (FDIS N3290 2.2 [lex.phases]).

    NULL

    It turns out that most people who have been using the Win32 API and the C Runtime Library for years don't know that NULL is more complicated than it looks. It is defined by both the Windows headers and the C Runtime headers, and each definition is guarded by #ifndef NULL. The reason behind this is to keep most people happy: the C++ standard requires NULL to be an integral constant expression such as 0, while Standard C also allows ((void *)0).

    /* WinDef.h */
    #ifndef NULL
    #ifdef __cplusplus
    #define NULL  0
    #else
    #define NULL  ((void *)0)
    #endif
    #endif

    WIN32_LEAN_AND_MEAN

    You have probably noticed that I've used WIN32_LEAN_AND_MEAN extensively in my blog posts. Here is the official description:

    WIN32_LEAN_AND_MEAN excludes APIs such as Cryptography, DDE, RPC, Shell, and Windows Sockets.

    So if you don't need these APIs, WIN32_LEAN_AND_MEAN makes the compiler's life easier, and precompiled headers, IntelliSense and other code analysis tools also benefit from it.

    UNICODE _UNICODE

    UNICODE is used by Windows header files to support generic Conventions for Function Prototypes and Generic Data Types.

    _UNICODE is used by the C Runtime (CRT) header files to support Generic-Text Mappings.

    The following interesting snippet is distilled from ATL headers:

    /* atldef.h */
    #ifdef  _UNICODE
    #ifndef UNICODE
    #define UNICODE         // UNICODE is used by Windows headers
    #endif
    #endif

    #ifdef  UNICODE
    #ifndef _UNICODE
    #define _UNICODE        // _UNICODE is used by C-runtime/MFC headers
    #endif
    #endif

    TEXT __TEXT and _T _TEXT __T

    The following snippet is distilled from WinNT.h, which can be found in the DDK/WDK and the SDK/PSDK:

    /* WinNT.h */
    #ifdef  UNICODE
    #define __TEXT(quote) L##quote
    #else
    #define __TEXT(quote) quote
    #endif
    #define TEXT(quote) __TEXT(quote)

    So the following code is correct:

    _tprintf(TEXT("%s") TEXT("\n"), TEXT(__FILE__));

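    But this is wrong:

    _tprintf(TEXT("%s" "\n"), __TEXT(__FILE__));

    __TEXT pastes L directly onto its argument without expanding it first, so __TEXT(__FILE__) produces the identifier L__FILE__ rather than a wide string literal. TEXT adds a level of indirection, which lets __FILE__ expand before the paste. Likewise, TEXT("%s" "\n") only widens the first literal, leaving a wide literal next to a narrow one, which is undefined behavior in C89 and C++03.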
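    The two-level expansion that makes TEXT(__FILE__) work can be reproduced in portable C. The macro names MY__TEXT and MY_TEXT are hypothetical stand-ins for the WinNT.h pair, always in "UNICODE" mode for the sake of the example.

```c
#include <wchar.h>

/* Hypothetical names mirroring the WinNT.h pair: the inner macro pastes,
   the outer macro lets its argument be macro-expanded first. */
#define MY__TEXT(quote) L##quote
#define MY_TEXT(quote)  MY__TEXT(quote)

/* __FILE__ expands to a string literal before MY__TEXT pastes L onto it,
   so this yields a wide literal such as L"file.c". */
static const wchar_t *const g_wide_file = MY_TEXT(__FILE__);

/* By contrast, MY__TEXT(__FILE__) would paste L onto the unexpanded token,
   producing the undeclared identifier L__FILE__ (a compile error). */

const wchar_t *wide_file_name(void)
{
    return g_wide_file;
}
```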

    And if UNICODE is defined, it turns out that you can (evilly) write:

    class LOST {};
    TEXT(OST) lost;  // L##OST pastes into the identifier LOST, i.e. "LOST lost;"

    The following snippet was distilled from tchar.h, which is part of the CRT:

    /* tchar.h */
    #ifdef  _UNICODE
    #define __T(x)      L ## x
    #else
    #define __T(x)      x
    #endif
    #define _T(x)       __T(x)
    #define _TEXT(x)    __T(x)


    1. Use TEXT if you are using none of ATL, the CRT and MFC.
    2. Use _T if you are using ATL, the CRT or MFC.
    3. Use _TEXT instead of _T if you are not as lazy as I am.
    4. Don't use __T and __TEXT unless you have a special reason.

    NDEBUG _DEBUG DEBUG

    NDEBUG is part of the C language standard and controls the behavior of assert:

    /* assert.h */
    #ifdef NDEBUG
    #define assert(_Expression) ((void)0)
    #else
    /* ... the checking version of assert ... */
    #endif
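    Because assert expands to nothing under NDEBUG, any side effect placed inside it disappears too. A minimal sketch in standard C (do_step and run_release_build are hypothetical names) simulates a release build for one function and then restores the checking assert; standard C redefines assert according to the current state of NDEBUG each time <assert.h> is included.

```c
/* Simulate a release build for the code below: assert() becomes ((void)0). */
#define NDEBUG
#include <assert.h>

static int g_calls = 0;

int do_step(void) { return ++g_calls; }  /* has a side effect */

/* Anti-pattern: the call lives inside assert, so it vanishes under NDEBUG. */
int run_release_build(void)
{
    assert(do_step() > 0);
    return g_calls;
}

/* Re-enable the checking version of assert for any code that follows. */
#undef NDEBUG
#include <assert.h>
```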

    _DEBUG is defined by the Microsoft C++ compiler when you compile with /LDd, /MDd or /MTd. Runtime libraries such as ATL, the CRT and MFC make use of this macro.

    DEBUG is defined in ATL:

    /* atldef.h */
    #ifdef _DEBUG
    #ifndef DEBUG
    #define DEBUG
    #endif
    #endif

    WINVER _WIN32_WINDOWS _WIN32_WINNT NTDDI_VERSION _WIN32_IE VER_PRODUCTVERSION_W

    WINVER has existed since 16-bit Windows and is still in use. Note that Windows NT 4.0 and Windows 95 both have WINVER defined as 0x0400.

    _WIN32_WINDOWS is used by Windows 95/98/Me.

    _WIN32_WINNT is used by the whole NT family.

    NTDDI_VERSION was introduced with Windows 2000, as Win9x and NT evolved into a single operating system. In addition, NTDDI_VERSION carries more information and is able to distinguish service packs. The latest sdkddkver.h has all the information you would want to know.

    _WIN32_IE was introduced because Internet Explorer shares many components with the shell (a.k.a. Windows Explorer): installing a new version of Internet Explorer would replace a number of system components and could even change the APIs.

    VER_PRODUCTVERSION_W can be found in ntverp.h, which is used by the NT team to maintain the product build.


    1. Use NTDDI_VERSION whenever possible.
    2. Don't use WINVER unless you have a special reason.
    3. Forget about _WIN32_WINDOWS unless you are still targeting Win9x or Win32s.
    4. Don't use VER_PRODUCTVERSION_W unless you are writing low level code such as drivers and debugger extensions.
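    To show how NTDDI_VERSION can distinguish service packs where _WIN32_WINNT cannot, here is a sketch with hypothetical MY_ names. The two constants have the same values as NTDDI_WINXPSP2 and NTDDI_WIN7 in sdkddkver.h, laid out as 0xMMmmSSss (OS version in the high word, then service pack, then sub-version); the decoding helpers are illustrative, not the SDK's own macros.

```c
#include <stdint.h>

/* Hypothetical constants, same values as NTDDI_WINXPSP2 and NTDDI_WIN7. */
#define MY_NTDDI_WINXPSP2  0x05010200
#define MY_NTDDI_WIN7      0x06010000

/* What a project would normally set on the compiler command line. */
#define MY_NTDDI_VERSION   MY_NTDDI_WIN7

/* A service-pack-level check that _WIN32_WINNT (0x0501) could not express. */
#if MY_NTDDI_VERSION >= MY_NTDDI_WINXPSP2
#define MY_HAVE_XPSP2_APIS 1
#else
#define MY_HAVE_XPSP2_APIS 0
#endif

unsigned os_version_of(uint32_t ntddi)   { return (unsigned)(ntddi >> 16); }          /* e.g. 0x0601 */
unsigned service_pack_of(uint32_t ntddi) { return (unsigned)((ntddi >> 8) & 0xFF); }
unsigned sub_version_of(uint32_t ntddi)  { return (unsigned)(ntddi & 0xFF); }
```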

    _X86_ _AMD64_ _IA64_ and _M_AMD64 _M_IX86 _M_IA64 _M_X64

    _M_AMD64, _M_IX86, _M_IA64 and _M_X64 are defined by the Microsoft C++ Compiler according to the target processor architecture. _M_AMD64 and _M_X64 are equivalent.

    _X86_, _AMD64_ and _IA64_ are defined by Windows.h (there is no _X64_ at all, because AMD invented x86-64).

    /* Windows.h */
    #if !defined(_X86_) && !defined(_IA64_) && !defined(_AMD64_) && defined(_M_IX86)
    #define _X86_
    #endif
    #if !defined(_X86_) && !defined(_IA64_) && !defined(_AMD64_) && defined(_M_AMD64)
    #define _AMD64_
    #endif

    _WIN32 _WIN64 WIN32 _WINDOWS

    If bitness matters but architecture doesn't, we can use _WIN32 and _WIN64, which are provided by the Microsoft C++ compiler. This is useful when defining data types and function prototypes. Note that _WIN32 and _WIN64 are not mutually exclusive: _WIN32 is always defined (unless you are using the DDK to write 16-bit code), so 64-bit builds define both.
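    A sketch of a pointer-sized integer type in the spirit of INT_PTR from basetsd.h, with a hypothetical name. The order of the tests matters: _WIN64 must be checked first, because 64-bit builds define _WIN32 as well. The non-Windows branch is an addition (an assumption for portability, not part of the Windows headers) so the sketch also builds elsewhere.

```c
#if defined(_WIN64)
/* Checked first: on 64-bit Windows _WIN32 is defined too. */
typedef long long my_int_ptr;
#elif defined(_WIN32)
typedef int my_int_ptr;
#else
/* Non-Windows fallback, not part of the Windows headers. */
#include <stdint.h>
typedef intptr_t my_int_ptr;
#endif
```

    If the first two branches were swapped, a 64-bit Windows build would pick int and silently truncate pointers.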

    WIN32 is defined by the Windows header file WinDef.h and is not widely used in Windows header files (TAPI being a notable exception).

    _WINDOWS is a leftover from the 16-bit era; you should hardly see it in the 21st century.

    /* WinDef.h */
    // Win32 defines _WIN32 automatically,
    // but Macintosh doesn't, so if we are using
    // Win32 Functions, we must do it here
    #ifdef _MAC
    #ifndef _WIN32
    #define _WIN32
    #endif
    #endif //_MAC

    #ifndef WIN32
    #define WIN32
    #endif

    UNREFERENCED_PARAMETER DBG_UNREFERENCED_PARAMETER DBG_UNREFERENCED_LOCAL_VARIABLE

    /* WinNT.h */
    //
    // Macros used to eliminate compiler warning generated when formal
    // parameters or local variables are not declared.
    //
    // Use DBG_UNREFERENCED_PARAMETER() when a parameter is not yet
    // referenced but will be once the module is completely developed.
    //
    // Use DBG_UNREFERENCED_LOCAL_VARIABLE() when a local variable is not yet
    // referenced but will be once the module is completely developed.
    //
    // Use UNREFERENCED_PARAMETER() if a parameter will never be referenced.
    //
    // DBG_UNREFERENCED_PARAMETER and DBG_UNREFERENCED_LOCAL_VARIABLE will
    // eventually be made into a null macro to help determine whether there
    // is unfinished work.
    //
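    The WinNT.h comment describes these macros, but the excerpt stops before their definitions. As a portable illustration of the same idea, the sketch below uses a hypothetical macro name and the common (void) cast idiom, which is not necessarily how WinNT.h defines UNREFERENCED_PARAMETER.

```c
/* Hypothetical macro: evaluating the parameter in a void cast counts as a
   "use", which silences unused-parameter warnings without generating code. */
#define MY_UNREFERENCED_PARAMETER(P) ((void)(P))

/* A callback whose signature is fixed by someone else. */
int on_event(int code, void *context)
{
    MY_UNREFERENCED_PARAMETER(context);  /* never needed by this handler */
    return code * 2;
}
```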

    (to be continued...)

  • Rubato and Chord

    Side Effects of Debugger


    A target program might behave differently when it is being debugged, which can be very annoying. These behavioral differences can also be exploited by anti-debugging techniques.

    IsDebuggerPresent and CheckRemoteDebuggerPresent are well known APIs for telling whether a debugger is attached to a program.

    0:000> uf KERNELBASE!IsDebuggerPresent
    KERNELBASE!IsDebuggerPresent:
    7512f41b 64a118000000    mov     eax,dword ptr fs:[00000018h]
    7512f421 8b4030          mov     eax,dword ptr [eax+30h]
    7512f424 0fb64002        movzx   eax,byte ptr [eax+2]
    7512f428 c3              ret
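    On x86 the three instructions load the TEB self pointer from fs:[18h], then the PEB pointer stored at offset 30h of the TEB, and finally the byte at offset 2 of the PEB, the BeingDebugged flag. The last step can be modeled portably; PEB_PREFIX and being_debugged are hypothetical names, and the leading fields follow the public winternl.h declaration (two reserved bytes, then BeingDebugged).

```c
#include <stddef.h>

/* Hypothetical prefix of the PEB: winternl.h declares it as
   BYTE Reserved1[2]; BYTE BeingDebugged; BYTE Reserved2[1]; ... */
typedef struct PEB_PREFIX {
    unsigned char Reserved1[2];
    unsigned char BeingDebugged;   /* offset 2: movzx eax,byte ptr [eax+2] */
    unsigned char Reserved2[1];
} PEB_PREFIX;

/* Mirrors the final instruction of IsDebuggerPresent on a caller-supplied PEB image. */
int being_debugged(const void *peb)
{
    const unsigned char *bytes = (const unsigned char *)peb;
    return bytes[offsetof(PEB_PREFIX, BeingDebugged)];
}
```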

    CloseHandle raises an exception under a debugger when it is given a bad handle, as stated by MSDN:

    If the application is running under a debugger, the function will throw an exception if it receives either a handle value that is not valid or a pseudo-handle value.

    The Windows heap manager uses the debug heap (note: this has nothing to do with the CRT Debug Heap) if a program was launched from a debugger:

    • Low Fragmentation Heap might be disabled.
    • Heap functions might raise SEH exceptions, an article covering this can be found at
    • The debug heap can be turned off by setting the environment variable _NO_DEBUG_HEAP=1.
    • Windows debuggers have a command line option -hd which specifies that the debug heap should not be used.

    OutputDebugString: we have a dedicated topic on it.

    SetUnhandledExceptionFilter: the installed filter is not called while a debugger is attached. A simple workaround is to detour IsDebugPortPresent so that it returns FALSE. A decent article can be found at

    NtSetInformationThread (with ThreadHideFromDebugger) can be used to hide (detach) a thread from the debugger.

    In addition, the target program can check its own integrity or the integrity of the system.

    • PEB and TEB: this is exactly what IsDebuggerPresent uses.
    • DebugPort: this is used by the kernel (EPROCESS). NtQueryInformationProcess from NTDLL (with ProcessDebugPort) can be used to retrieve this information.
    • INT3 and thread context, as we've already demonstrated here.
    • Environment variables, parent process, process startup information.
    • Image File Execution Options.
    • Call stack and registers. If the debugger uses func-eval, sets conditional breakpoints with side effects, or causes some change in execution flow, it can be detected.

    A few things to mention:

    • You cannot attach a debugger to a program if another debugger is already attached to it.
    • Attaching a debugger to a program can fail in many ways, such as loader lock, timeout and failure to create the break-in thread. That is one reason why JIT debugging sometimes fails to work.
    • A 64-bit application cannot be debugged by a 32-bit debugger. If you try to create a 64-bit process from a 32-bit process with a debug creation flag, it always fails, and DebugActiveProcess fails if a 32-bit debugger tries to attach to a 64-bit target.
    • Digital media applications can take advantage of the Windows kernel (for example, protected processes) to protect themselves from being debugged.
    • Be cautious if you are debugging something that the debugger itself relies on (a GUI symbolic source level debugger relies on even more things), otherwise you may end up with a deadlock or other strange behavior.
    • Global Flags can affect the behavior of a program if it is running under a debugger (e.g. loader snaps).
    • The CLR behaves very differently under a debugger (e.g. JIT compiler, GC).

  • Rubato and Chord

    Pop Quiz - Debug Event Loop and Timeslice Quota


    You might have heard of the Popek and Goldberg virtualization requirements. In theory, a debugger faces a set of problems similar to those of virtualization, and this is especially true for func-eval (Function Evaluation). Here is a pop quiz about the side effects of a debugger's presence:

    #define WIN32_LEAN_AND_MEAN
    #include <Windows.h>
    #define LOOPCOUNT 10
    ULONG g_ulVariableA;
    ULONG g_ulVariableB;
    DWORD WINAPI ThreadProcA(LPVOID lpParameter) {
      while(true) {
        for(int i = LOOPCOUNT; i; i--)
          ++g_ulVariableA;
      } // add a breakpoint here (BP1)
      return 0;
    }
    DWORD WINAPI ThreadProcB(LPVOID lpParameter) {
      while(true) {
        for(int i = LOOPCOUNT; i; i--)
          ++g_ulVariableB;
      } // add a breakpoint here (BP2)
      return 0;
    }
    int ExeEntry(void) {
      SetProcessAffinityMask(GetCurrentProcess(), 1); // run both threads on one CPU
      CloseHandle(CreateThread(NULL, 4096, ThreadProcA, NULL, 0, NULL));
      CloseHandle(CreateThread(NULL, 4096, ThreadProcB, NULL, 0, NULL));
      return ERROR_SUCCESS;
    }

    Let's say we have two breakpoints BP1 and BP2 as illustrated:

    1. Each time I launched the application from the Visual Studio debugger on my desktop machine (Xeon quad core, Windows 7 64-bit), BP2 always got hit before BP1. On my laptop (dual core, Windows 7 32-bit), BP1 got hit before BP2.
    2. If I made BP2 a conditional breakpoint with a false condition (e.g. 0 == 1) on my desktop machine, I had to wait a few seconds before BP1 got hit.
    3. If I made BP1 a conditional breakpoint with a false condition (e.g. 0 == 1) on my laptop, I never got a chance to hit BP2, and my CPU usage always stayed at around 50%.

    Have you had a similar experience? I have already put some hints in the title of this pop quiz, happy debugging :)

  • Rubato and Chord

    Did you know...


    Have you ever seen the following window before? It was once very popular in the good old days, but has been abandoned in recent years (another good example being the pixel fonts). People just keep getting busier in the blooming new era.

    Windows 7

    1. Shift + Right Click on a file icon would give you additional context menu entries such as "Copy as path".
    2. Shift + Right Click on a folder icon would give you "Open command window here".

    Visual Studio 2010 (C++ mode)

    1. Use the Ctrl + R, Ctrl + W keyboard shortcut to toggle the display of white space (spaces and tab marks).
    2. You can use $CMDLINE and $ENV pseudovariables in the Watch Window while debugging, and if you modified the value, take a look at the Output Window to see what happened.
    3. You can use $ERR,hr in the Watch Window to view the last error, the format specifier will translate the error code into text message.
    4. You can create a new toolchain by creating a new PlatformToolset, the default installation path is C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Platforms\Win32\PlatformToolsets.


    1. To tell whether a machine is physical or virtual, the most reliable way I can think of is to measure the system timing (e.g. DPC latency).
    2. You can determine if a machine is physical or virtual from the hardware info, HKEY_LOCAL_MACHINE\HARDWARE\DESCRIPTION\System\BIOS.
    3. You can get the host machine for a VM by taking a look at the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters.

    A few things that are more difficult than they look

    1. Parse command line.
    2. Manipulate file path and name.
    3. Mimic the behavior of NT loader.
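    As an illustration of the first item, here is a simplified sketch of the argument-splitting rules documented for the Microsoft C runtime: spaces and tabs separate arguments, double quotes group text, 2n backslashes followed by a quote produce n backslashes and toggle quoting, 2n+1 backslashes followed by a quote produce n backslashes and a literal quote, and backslashes not followed by a quote are literal. It deliberately ignores the special handling of the program name and of doubled quotes inside a quoted region, which has varied between CRT versions; split_cmdline is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: splits cmd into at most max_args arguments, copying
   the unescaped text into buf (which needs strlen(cmd) + 1 bytes) and
   pointing argv[i] into it. Returns the number of arguments found. */
size_t split_cmdline(const char *cmd, char *buf, char **argv, size_t max_args)
{
    const char *p = cmd;
    char *out = buf;
    size_t argc = 0;

    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;                                  /* skip separators */
        if (*p == '\0' || argc == max_args)
            break;

        argv[argc++] = out;
        int in_quotes = 0;
        for (;;) {
            size_t slashes = 0;
            while (*p == '\\') {
                slashes++;
                p++;
            }
            if (*p == '"') {
                /* 2n or 2n+1 backslashes before a quote: emit n backslashes. */
                for (size_t i = 0; i < slashes / 2; i++)
                    *out++ = '\\';
                if (slashes % 2)
                    *out++ = '"';                 /* odd: escaped, literal quote */
                else
                    in_quotes = !in_quotes;       /* even: quote toggles grouping */
                p++;
                continue;
            }
            for (size_t i = 0; i < slashes; i++)
                *out++ = '\\';                    /* not before a quote: literal */
            if (*p == '\0' || (!in_quotes && (*p == ' ' || *p == '\t')))
                break;                            /* end of this argument */
            *out++ = *p++;
        }
        *out++ = '\0';
    }
    return argc;
}
```

    For example, the command line "c:\dir\\" x splits into c:\dir\ and x, a case that trips up many hand-written parsers.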