March, 2012

  • The Old New Thing

    How do I perform shell file operations while avoiding shell copy hooks?


    Okay, the subject line of the article gives away the answer to the puzzle, but here's the puzzle anyway: A customer reported a problem with the SHFile­Operation function:

    Consider the following program:

    #include <windows.h>
    #include <shellapi.h>

    int main()
    {
        SHFILEOPSTRUCT fileStruct = {};
        fileStruct.wFunc = FO_RENAME;
        fileStruct.pFrom = L"C:\\a\0";
        fileStruct.pTo   = L"C:\\b\0";
        fileStruct.fFlags= FOF_NO_UI;
        SHFileOperation(&fileStruct);
        return 0;
    }

    If "a" is a file, then everything works fine, but if it's a directory, then Application Verifier raises the following error:

    Heap violation detected
    Memory access operation in the context of a freed block: reuse-after-delete or double-delete

    Can you help explain what we're doing wrong? So far as we can tell, all our parameters are correct.

    This is one of those "It doesn't work on my machine" issues, because the provided sample program runs fine on a freshly-installed copy of Windows. We asked the customer to send us a crash dump file, and from that crash dump the source of the problem was obvious:

    eax=00000001 ebx=00000000 ecx=73d34c58 edx=00270001 esi=09fa2ff8 edi=00000000
    eip=10001131 esp=0026dea8 ebp=0026df24 iopl=0         nv up ei pl nz na po nc
    cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010200
    10001131 8b4604          mov     eax,dword ptr [esi+4] ds:002b:09fa2ffc=????????
    0:000> k
      *** Stack trace for last set context - .thread/.cxr resets it
    ChildEBP RetAddr
    WARNING: Stack unwind information not available. Following frames may be wrong.
    0026df24 75a3554a Contoso+0x1131
    0026df64 75a1b07c ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108
    0026dfb8 75a1aff1 ole32!CApartmentActivator::CreateInstance+0x112
    0026dfd8 75a1ae16 ole32!CProcessActivator::CCICallback+0x6d
    0026dff8 75a1adc7 ole32!CProcessActivator::AttemptActivation+0x2c
    0026e034 75a1b0df ole32!CProcessActivator::ActivateByContext+0x4f
    0026e05c 75a3554a ole32!CProcessActivator::CreateInstance+0x49
    0026e09c 75a352ce ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108
    0026e2fc 75a3554a ole32!CClientContextActivator::CreateInstance+0xb0
    0026e33c 75a35472 ole32!ActivationPropertiesIn::DelegateCreateInstance+0x108
    0026eb18 75a45916 ole32!ICoCreateInstanceEx+0x404
    0026eb78 75a45877 ole32!CComActivator::DoCreateInstance+0xd9
    0026eb9c 75a45830 ole32!CoCreateInstanceEx+0x38
    0026ebcc 75c61fe0 ole32!CoCreateInstance+0x37
    0026ee64 75c61354 shell32!_SHCoCreateInstance+0x1ac
    0026ee88 75c1b904 shell32!SHExtCoCreateInstance+0x1e
    0026eec8 75becbcf shell32!SHExtCoCreateInstanceString+0x43
    0026f10c 75beca76 shell32!CreateCopyHooks+0xe1
    0026f344 75bec9da shell32!CallCopyHooks+0x4b
    0026f370 75bec95b shell32!CallFileCopyHooks+0x29
    0026f5bc 75bec8a4 shell32!CFileOperation::CopyHooks+0x119
    0026fa3c 75c37955 shell32!CCopyWorkItem::_UpFrontConfirmations+0xb7
    0026fc6c 75c378b0 shell32!CCopyWorkItem::ProcessWorkItem+0x83
    0026fca0 75c37fda shell32!CRecursiveFolderOperation::Do+0x1d5
    0026fce4 75c39a19 shell32!CFileOperation::_EnumRootDo+0x14e
    0026fd4c 75c397b9 shell32!CFileOperation::PrepareAndDoOperations+0x27f
    0026fd74 75c396c7 shell32!SHFileOperationWithAdditionalFlags+0xe9

    The crash is in some third party component named Contoso, which is running because it is being Co­Create'd. The call came from Create­Copy­Hooks, and it doesn't require very much in the way of psychic powers to conclude that the shell is creating the Contoso object because it registered as a copy hook.

    This also explains why the problem occurs only on the customer's machine: The customer installed the Contoso shell extension and we didn't.

    Okay, so the problem is that the Contoso shell extension has a use-after-free memory corruption bug. (Some Web searching revealed that a lot of people had encountered problems with the Contoso shell extension.)

    The FOFX_NO­COPY­HOOKS flag comes in handy here. Setting this extended flag disables copy hooks for your file operation. Extended flags cannot be passed to the classic SHFileOperation function because the SHFILEOPSTRUCT structure uses a 16-bit WORD for the fFlags member, but the FOFX_NO­COPY­HOOKS flag has the numerical value 0x00800000 which doesn't fit in a 16-bit integer. (The "X" at the end of the prefix is another clue.) The way to set extended flags is to use the IFileOperation interface.

    // Just for fun, I'll use ATL templates instead of raw C++.
    HRESULT RenameAtoB()
    {
     HRESULT hr;
     CCoInitialize init;
     hr = init;
     if (FAILED(hr)) return hr;
     CComPtr<IFileOperation> spfo;
     hr = spfo.CoCreateInstance(CLSID_FileOperation);
     if (FAILED(hr)) return hr;
     hr = spfo->SetOperationFlags(FOFX_NOCOPYHOOKS);
     if (FAILED(hr)) return hr;
     CComPtr<IShellItem> spsi;
     hr = SHCreateItemFromParsingName(L"C:\\a", NULL,
                                      IID_PPV_ARGS(&spsi));
     if (FAILED(hr)) return hr;
     hr = spfo->RenameItem(spsi, L"b", NULL);
     if (FAILED(hr)) return hr;
     hr = spfo->PerformOperations();
     if (FAILED(hr)) return hr;
     return S_OK;
    }

  • The Old New Thing

    Why can't I delete a file immediately after terminating the process that has the file open?


    A customer discovered a bug where terminating a process did not close the handles that the process had open, resulting in their emergency cleanup code not working:

    TerminateProcess(processHandle, EXITCODE_TERMINATED);

    Their workaround was to insert a call to Wait­For­Single­Object(process­Handle, 500) before deleting the file. The customer wanted to know whether they discovered a bug in Terminate­Process, and they were concerned that their workaround could add up to a half second to their cleanup code, during which the end user is sitting there waiting for everything to clean up.

    As MSDN notes,

    TerminateProcess initiates termination and returns immediately. This stops execution of all threads within the process and requests cancellation of all pending I/O. The terminated process cannot exit until all pending I/O has been completed or canceled.

    (Emphasis mine.)

    Termination is begun, but the function does not wait for termination to complete. Sometimes a thread gets stuck because a device driver has gotten wedged (or the driver doesn't support cancellation).

    To know when the handles are closed, wait on the process handle, because the process handle is not signaled until process termination is complete. If you are concerned that this can take too long, you can do as the customer suggested and wait with a timeout. Of course, if the timeout expires, then you have to decide what to do next. You can't delete the file, since it's still open, but maybe you can log an error diagnostic and let the user know why things are taking so long to clean up, and maybe add the file to a list of files to clean up the next time the program starts up.

  • The Old New Thing

    Converting to Unicode usually involves, you know, some sort of conversion


    A colleague was investigating a problem with a third party application and found an unusual window class name: L"整瑳整瑳". He remarked, "This looks quite odd and could be some problem with the application."

    The string is nonsense in Chinese, but I immediately recognized what was up.

    Here's a hint: Rewrite the string as

    L"\x6574" L"\x7473" L"\x6574" L"\x7473"

    Still don't see it? How about looking at the byte sequence, remembering that Windows uses UTF-16LE?

    0x74 0x65 0x73 0x74 0x74 0x65 0x73 0x74

    Okay, maybe you don't have your ASCII table memorized.

    0x74 0x65 0x73 0x74 0x74 0x65 0x73 0x74
    t e s t t e s t

    That's right, the application took the ASCII string "testtest" and just treated it as a Unicode string without actually converting it to Unicode. When the compiler complained "Cannot convert char * to wchar_t *" they just stuck a cast to make the compiler shut up.

    // This code is wrong
    wc.lpszClassName = (LPWSTR)"testtest";

    They were lucky that the compiler happened to put two null bytes at the end of the "testtest" string.

    Bonus psychic powers: Actually, I have a theory as to how this happened that doesn't involve maliciousness. (This is generally a good mindset to maintain, since most of the time, when people cause a problem, it's not willful; it's accidental.) Consider a library with the following interface header file:

    // mylib.h
    #ifdef __cplusplus
    extern "C" {
    #endif
    BOOL RegisterWindowClass(LPCTSTR pszClassName);
    #ifdef __cplusplus
    } // extern "C"
    #endif

    Somebody uses this header file like this:

    #include <mylib.h>
    BOOL Initialize()
    {
        return RegisterWindowClass(TEXT("testtest"));
    }

    So far so good.

    Meanwhile, the library implementation goes like this:

    #define UNICODE
    #define _UNICODE
    #include <mylib.h>
    BOOL RegisterWindowClass(LPCTSTR pszClassName)
    {
        WNDCLASS wc = { 0, StandardWndProc, 0, 0, g_hInstance,
                        NULL, NULL, // hIcon, hCursor
                        (HBRUSH)(COLOR_WINDOW + 1),
                        NULL, pszClassName };
        return RegisterClass(&wc);
    }

    The two files both compile successfully, and they even link together. Unfortunately, one of them was compiled with Unicode disabled, and the other was compiled with Unicode enabled. Since the header file uses LPCTSTR, the actual declaration of RegisterWindowClass changes depending on whether the code that includes the header file is compiled as Unicode or ANSI.

    Result: If one file is compiled as ANSI and the other is compiled as Unicode, then one will pass an ANSI string, which the other will receive and treat as Unicode.

    This is why functions in Windows which are dependent on whether the caller is compiled as ANSI or Unicode are really two functions, one with the A suffix (for ANSI) and another with the W suffix (for Wnicode?), and the generic name is really a macro that forwards to one or the other. It prevents TCHARs from sneaking past the compiler and ending up being interpreted differently by the two sides.

  • The Old New Thing

    Amusing message on a whiteboard in the hallway


    It is common to see whiteboards mounted in hallways. Sometimes they have official purposes; other times they are just placed for the convenience of hallway conversations or impromptu meetings.

    One of the hallways near my office has a very large whiteboard, completely blank, save for one note clearly written in the corner.


  • The Old New Thing

    Why does a maximized window have the wrong window rectangle?


    Commenter configurator wonders why the maximum size for a form is the screen size plus (12,12). Somebody else wonders why it's the screen size plus (16,16).

    Let's start by rewinding the clock to Windows 3.0.

    When you maximize a window, the default position of the window is such that all the resizing borders hang "off the edges of the screen". The client area extends from the left edge of the screen to the right edge of the screen, and also goes all the way to the bottom. It doesn't go all the way to the top, since it needs to leave room for the caption, but the resizing border that sits above the caption area is not visible either.

    The reason for this should be obvious: Since the window is maximized, there's no point wasting screen real estate on the resizing borders. You want the client area to be as large as possible; that's why you maximized the window.

    The result of this window positioning is that the window rectangle itself is slightly larger than the screen. The parts that "hang off the edges of the screen" are not visible because, well, they're off the screen. (Of course, if your window had a maximum size smaller than the screen, then those borders stay visible.) The size of these borders might not be 12 pixels, mind you.

    This is how things stood for a long time. Even the introduction of multiple monitors in Windows 98 didn't affect the way maximized windows were positioned. Multiple monitors, however, altered one of the assumptions that lay behind the positioning of maximized windows, namely the assumption that edges beyond the screen were not visible. I mean, they weren't visible on the screen that held the maximized window, but they were visible on the adjacent monitor. As a result, when you maximized a window, its borders appeared as a sliver on the adjacent monitor.

    Why didn't Windows get rid of the sliver when multiple monitors were introduced? You probably know the reason already: Because there are applications which relied on the sliver. For example, an application might detect that it is maximized by checking whether its edges hang off the screen, rather than checking the WS_MAXIMIZE style. Why would they do it that way? Probably because they fumbled around until they found something that seemed to work, sort of like the people who detect whether the mouse buttons are swapped by calling SwapMouseButton instead of GetSystemMetrics(SM_SWAPBUTTON). (Or maybe because they wanted to treat as "logically maximized" windows which the user had manually resized to be larger than the screen.)

    The introduction of the Desktop Window Manager in Windows Vista gave the window manager team a chance to solve the problem without impairing compatibility: The Desktop Window Manager controls how windows appear on the screen, which can be different from the actual window properties. For example, the Desktop Window Manager typically animates a window into position when it becomes visible, yet if an application calls GetWindowRect, it will just see the window at its normal position with no animation.

    This decoupling of logical and physical characteristics permits all sorts of visual tricks. The visual trick relevant here is the removal of the overhang borders from a maximized window. The borders are still there: If you call GetWindowRect, you will get the same coordinates you always did. But they don't appear on the screen. The sliver is gone.

  • The Old New Thing

    Beverage Gas Division of Central Welding Supply


    The other day, I saw a van which was labeled Beverage Gas Division of Central Welding Supply.

    This odd juxtaposition was created by the acquisition of Compressed Gas Western by Central Welding Supply in 2009.

    I sure hope they don't get their tanks confused.

  • The Old New Thing

    Why is the Heap32Next function incredibly slow on Windows 7?


    A customer observed that the Heap32Next function runs much slower on Windows 7 compared to previous versions of Windows. What happened?

    Set the wayback machine to 1992. The product is Windows 3.1. One of the new components available in Windows 3.1 went by the name TOOLHELP. It let you snoop around the low-level guts of the Windows 3.1 kernel, and the feature that is relevant here is walking the heaps. Since Windows 3.1 was a cooperatively multitasking system, you could ensure that the heap was stable during your calls to Heap­First and Heap­Next by the simple process of not yielding control.

    Mind you, ToolHelp was not part of the kernel itself. It was bolted onto the side. As I recall, ToolHelp arrived late on the scene, and the kernel folks didn't want to destabilize the kernel with any ToolHelp-related changes, so all the work done by ToolHelp was done "from the outside".

    Windows 95 introduced 32-bit versions of the ToolHelp functions. I'm not sure why.

    Where was I? Oh right, Heap32­Next.

    In the 32-bit version of ToolHelp, you could walk the heap of a process by calling Heap32First and Heap32Next. As implemented in Windows 95, the Heap32First function allocated some memory to keep track of the state of the heap walk and stored it in the HEAPENTRY32.dwResvd field. The Heap32Next function used this state to find the next heap block, and it finally freed the memory when the end of the heap was reached (and Heap32Next returned FALSE). This means that if you called Heap32First and did not walk the heap to completion but rather abandoned the walk partway through, you leaked some memory. Unlike functions like FindFirstFile which have an explicit FindClose function to indicate that you are done with the enumeration (and allow the operating system to free the tracking state), there was no corresponding Heap32Close function. The only way to free the memory was to walk the heap to the end. The Windows 95 implementation also didn't handle the case that the heap changed while you were walking it.

    But since the toolhelp library was intended for diagnostic purposes anyway (I mean, it's right there in the name: tool help), these weren't considered serious problems. Your debugging plug-in might use it to walk the heap looking for memory leaks, but you wouldn't deploy it in production, right?

    The Windows NT folks didn't like that there was a memory leak built into the design. Since there was no way to ensure that the memory allocated by Heap32First was freed in the event the application wanted to abandon the heap walk, their solution was simply to free all allocated memory before returning from Heap32First and Heap32Next. If an application asks to walk the heap, the Heap32First function takes a snapshot of the heap, returns the first heap block, then frees the snapshot. When the application calls Heap32Next, it takes a snapshot of the heap, returns the second heap block, then frees the snapshot. On the second call to Heap32Next, it takes a snapshot of the heap, returns the third heap block, then frees the snapshot. You get the idea.

    As a result, walking the heap via Heap32­First/Heap32­Next is an O(n²) operation.

    So why did this become slower in Windows 7?

    Prior to Windows 7, the snapshot was taken in a fixed-size buffer that held information for around a quarter million heap entries. As a result, there was a hard limit on the worst-case cost of walking the heap via the toolhelp functions. In Windows 7, this hard limit was lifted because there were some diagnostic tools which were bumping into this limit. The kernel folks decided that it was better to have the functions be slow but correct rather than fast and incomplete. Since the limit was lifted, so too was the cap on the worst-case cost of walking the heap with Heap32­First/Heap32­Next.

    Toolhelp was designed back in the days of co-operative multitasking and hasn't aged well. At this point, he's sort of this unwanted distant relative in the kernel. Nobody actually likes him, but when he shows up at the family reunion, you have to let him in.

    By the way, the recommended way to walk the contents of the heap is to use the Heap­Walk function. The Heap­Walk function does not suffer from this problem; enumerating the entire heap via repeated calls to Heap­Walk has total running time proportional to the number of heap blocks. Note that Heap­Walk can only enumerate heap blocks from the current process. If you're doing cross-process heap walking for diagnostic purposes, then you're stuck with Heap32­First/Heap32­Next, but since you're just doing it for diagnostic purposes, correctness should be more important to you than performance.

  • The Old New Thing

    Why does my window style change when I call SetWindowText?


    A customer observed some strange behavior with window styles:

    We ran into some weird behavior: Calling the SetWindowText function causes a change in window styles. Specifically, calling SetWindowText results in the WM_STYLECHANGING and WM_STYLECHANGED messages, and sometimes the result is that the WS_TABSTOP style is removed. Is this a bug? What would cause this?

    The SetWindowText function sends the WM_SETTEXT message to the control, at which point anything that happens is the window's own responsibility. If it wants to change styles based on the text you sent, then that's what happens. The window manager doesn't do anything special to force it to happen or to prevent it.

    That's weird, because I'm not even listening for WM_SETTEXT messages. I also verified that there is no call into my code during the call to the SetWindowText function.

    I'm assuming that the window belongs to the same process as the caller. If the window belongs to another process, then the rules are different.

    I'm changing the text of a window created by the same thread.

    Okay, so let's see what we have so far. The customer is calling the Set­Window­Text function to change the text of a window created on the same thread. There is no handler for the WM_SET­TEXT message, and yet the window style is changing. At this point, you might start looking for more obscure sources for the problem, like say a global hook of some sort. While I considered the possibilities, the customer added,

    It may be worth noting that I'm using the Sys­Link.

    Okay, now things are starting to make sense, and it didn't help that the customer provided misleading information in the description of the problem. For example, when the customer wrote, "There is no handler for the WM_SET­TEXT message," the customer was not referring to the window whose window text is changing but to some other unrelated window.

    It's like responding to the statement "A confirmation letter should have been sent to the account holder" with "I never got the confirmation letter," and then the person spends another day trying to figure out why the confirmation letter was never sent before you casually mention, "Oh, I'm not the account holder."

    The WM_SET­TEXT message is sent to the window you passed to Set­Window­Text; in this case, it's the Sys­Link window. It is therefore the window procedure of the Sys­Link window that is relevant here.

    The SysLink control remembers whether it was originally created with the WS_TABSTOP style, and if the markup it is given has no tab stops, then it removes the style; if the markup has tab stops, then it re-adds the style.

    How do I add a tab stop to a string? I couldn't find any reference to it and all my guesses failed.

    The tab stops in question are the hyperlinks you added when you used the <A>...</A> notation. If the text has no hyperlinks, then the control removes the WS_TAB­STOP style because it is no longer something you can tab to.

  • The Old New Thing

    Isn't there a race condition in GetFileVersionInfoSize?


    In response to my explanation of what the lpdwHandle parameter in Get­File­Version­Info­Size is used for, Steve Nuchia wonders if there's a race condition between the time you get the size and the time you ask for the data.

    Yes, there is a race condition, but calling the function in a loop won't help because the Get­File­Version­Info function does not report that the buffer is too small to hold all the version data. It just fills the buffer as much as it can and truncates the rest.

    In practice, this is not a problem because you are usually getting the versions of files that you expect to be stable. For example, you might be obtaining the version resources of the files your application is using in order to show them in diagnostics. The files can't change because you're preventing them from changing by using them. In the case that the file changes out from under you, then yes, you will sometimes get partial data.

    While I'm on the subject of Get­File­Version­Info, I figured I'd mention that there's a good amount of code in Ver­Query­Value to handle the following scenario:

    • On Windows NT 3.1, a program calls Get­File­Version­Info to obtain a file version information block.
    • The program writes the information block to a file.
    • The file is preserved in amber for millions of years.
    • A curious scientist discovers the file version information block, loads it from the file back into memory, and calls VerQueryValue.

    The modern implementation of Ver­Query­Value still understands the file version information block created by all previous versions of Windows, and if you hand it one of those frozen-in-amber information blocks, it still knows how to extract information from it. It may not be able to do as good a job due to the lack of appropriate buffer space, but it does at least as well as the version of Windows the file version information block was originally generated from. I have no idea whether anybody actually takes advantage of this behavior, but since persisting the file version information block was never explicitly disallowed in the documentation, one could argue that doing so was legal, and the code therefore needs to be ready for it. (Heck, even if it were explicitly disallowed, there would still be a good chance that there's somebody who's doing it.)

    What Ver­Query­Value doesn't handle is people who hand it a file version information block that never came from Get­File­Version­Info in the first place.

  • The Old New Thing

    In 1993, Microsoft stole my colleague's car


    I remember walking between buildings at Microsoft back in the 1990's and seeing a moss-covered, rusted-out jalopy in one of the parking spaces. It clearly hadn't moved in ages. The person I was with said, "Oh, yeah, Microsoft owns that car. They stole it from Bob." (Bob is my generic name for a Microsoft employee.)

    The Inaugural Day Storm of 1993 left felled trees and other wind damage in its wake on the Microsoft Redmond campus. One of my colleagues was out of town when the storm hit, and he returned to stories of fallen trees, wind damage, and howling winds.

    Bob also returned to find that his car had been stolen out of the parking lot outside his building. (It was at the time a common practice to use Microsoft's parking lots as personal vehicle storage.)

    Bob filed a stolen-car report with his insurance company and received his payment. As far as Bob was concerned, that was the end of that.

    But that's not the whole story.

    Right after the storm, the Facilities department set about cleaning up all the trees and branches and leaves that were strewn across the parking lots. They waited until after work hours so the parking lot would be empty, but that didn't work in one particular lot, because Bob's car was still there. To permit the cleanup to proceed, they towed the car to the far corner of the parking lot to get it out of the way.

    Thus, when Bob returned from his trip, he found that his car was gone. It was actually not that far away, but this particular parking lot was nestled in a densely-wooded area, with the lot divided into sub-lots, each separated by a small stand of trees, so if you didn't know where to look, you could easily have missed the car.

    This left the car sitting abandoned in a Microsoft parking lot. Technically, the car was owned by Bob's insurance company, and technically Microsoft stole it.

    Pre-emptive snarky comment: "Wouldn't be the first time Microsoft stole something."
