November, 2011

  • The Old New Thing

    If you protect a write with a critical section, you may also want to protect the read


    It is common to have a critical section which protects against concurrent writes to a variable or collection of variables. But if you protect a write with a critical section, you may also want to protect the read, because the read might race against the write.

    Adam Rosenfield shared his experience with this issue in a comment from a few years back. I'll reproduce the example here in part to save you the trouble of clicking, but also to make this entry look longer and consequently make it seem like I'm actually doing some work (when in fact Adam did nearly all of the work):

    class X {
     volatile int mState;
     HANDLE mEvent;
    X::Wait() {
     while(mState != kDone) {
       WaitForSingleObject(mEvent, INFINITE);
    X::~X() {
    X::SetState(int state) {
     // do some state logic
     mState = state;
     X x;
     ... spawn off thread 2 ...
    Thread2(X* px)

    There is a race condition here:

    • Thread 1 calls X::Wait and waits.
    • Thread 2 calls X::SetState.
    • Thread 2 gets pre-empted immediately after calling Set­Event.
    • Thread 1 wakes up from the Wait­For­Single­Object call, notices that mState == kDone, and therefore returns from the X::Wait method.
    • Thread 1 then destructs the X object, which destroys the critical section.
    • Thread 2 finally runs and tries to leave a critical section that has been destroyed.

    The fix was to enclose the read of mState inside a critical section:

    X::Wait() {
     while(1) {
       int state = mState;
       if(state == kDone)
       WaitForSingleObject(mEvent, INFINITE);

    Forgetting to enclose the read inside a critical section is a common oversight. I've made it myself more than once. You say to yourself, "I don't need a critical section here. I'm just reading a value which can safely be read atomically." But you forget that the critical section isn't just for protecting the write to the variable; it's also to protect all the other actions that take place under the critical section.

    And just to make it so I actually did some work today, I leave you with this puzzle based on an actual customer problem:

    class BufferPool {
     BufferPool() { ... }
     ~BufferPool() { ... }
     Buffer *GetBuffer()
      Buffer *pBuffer = FindFreeBuffer();
      if (pBuffer) {
       pBuffer->mIsFree = false;
      return pBuffer;
     void ReturnBuffer(Buffer *pBuffer)
      pBuffer->mIsFree = true;
     Buffer *FindFreeBuffer()
      Buffer *pBuffer = NULL;
      for (int i = 0; i < 8; i++) {
       if (mBuffers[i].mIsFree) {
        pBuffer = &mBuffers[i];
      return pBuffer;
     Buffer mBuffers[8];

    The real class was significantly more complicated than this, but I've distilled the problem to its essence.

    The customer added, "I tried declaring mIs­Free as a volatile variable, but that didn't seem to help."

  • The Old New Thing

    Things I've written that have amused other people, Episode 8


    In a technical discussion, I opened a reply with

    Bob's paper which I haven't yet read points out that...

    Some people wrote to me to say that the've added this quote to their file in the hopes of being able to use it themselves someday.

    For those who are curious, I found the technical discussion in question. It had to do with whether the following code is thread-safe:

    // initial conditions
    int x = 1, y = 0;
    int *p = &x;
    // Thread 1 executes this
    y = 1;
    p = &y;
    // Thread 2 executes this
    print *p;

    Question: Can this code legitimately print 0?

    Surprisingly, the answer is yes!

  • The Old New Thing

    Why is CLIPFORMAT defined to be a WORD rather than a UINT?


    Commenter Ivo wants to know if the Register­Clipboard­Format function returns a UINT, why is the CLIP­FORMAT data type defined to be a WORD? Since a WORD is smaller than a UINT, you have to stick in a cast every time you assign the result of Register­Clipboard­Format to a CLIP­FORMAT.

    Rewind to 16-bit Windows. Back in those days, a UINT and a WORD were the same size, namely, 16 bits. As a result, people got lazy about the distinction. Six of one, a half dozen of the other. (People are lazy about this sort of distinction even today, assuming for example that UINT and DWORD are the same size, and in turn forcing UINT to remain a 32-bit integer type even on 64-bit Windows.) The Register­Clipboard­Format function came first, and when the OLE folks wanted to define a friendly name for the data type to hold a clipboard format, they said, "Well, a clipboard format is a 16-bit integer, so let me use a 16-bit integer." A WORD is a 16-bit integer, so there you go.

    This mismatch had no effect in 16-bit code, but once Win32 showed up, you had a problem since 32-bit Windows expanded the UINT type to 32 bits. Not only does keeping a CLIP­FORMAT in a WORD create the need for all this casting, it also leaves two bytes of padding in the FORMAT­ETC structure. Strike two.

    Yeah, basically, it sucks.

  • The Old New Thing

    How to insert a large number of items into a treeview efficiently


    Just a quick tip today.

    If you need to insert a large number of items into a treeview, like tens of thousands, then it's much more efficient to insert them "backwards". (I'm ignoring for now the usability question of having a treeview that large in the first place.) In other words, instead of

    for (i = 0; i < array.Count(); i++) {
     tvis.hParent = hParentNode;
     tvis.hInsertAfter = TVIF_LAST;
     tvis.item.mask = TVIF_TEXT;
     item.item.pszText = array[i].Text();
     TreeView_InsertItem(hwndTreeView, &tvis);
    do it this way:
    for (i = array.Count() - 1; i >= 0; i--) {
     tvis.hParent = hParentNode;
     tvis.hInsertAfter = TVIF_FIRST;
     tvis.item.mask = TVIF_TEXT;
     item.item.pszText = array[i].Text();
     TreeView_InsertItem(hwndTreeView, &tvis);

    Why is backwards-insertion faster?

    It has to do with the annoying hInsert­After parameter. To validate that the hInsert­After parameter is valid, the treeview needs to verify that the hInsert­After is a valid child of the hParent, and this is done by walking the parent's children looking for a match. The sooner you find the match, the faster the validation completes. (And if you pass TVI_LAST, then the treeview needs to walk to the end of the child list.)

    You'd think that you could verify the parent/child relationship by just doing a Tree­View_Get­Parent(hInsert­After), but that turns out not to be strict enough, because hInsert­After might itself not be a valid parameter. If hInsert­After is a bogus value, then you may crash when you try to read its Parent property. That's if you're lucky. If you're not lucky, then the random memory that hInsert­After points to might look just enough like a valid HTREEITEM that you end up inserting the new node after a completely bogus node, and now the treeview has become corrupted.

    Sure, you got the same problem if you passed a garbage HTREEITEM to Tree­View_Get­Parent, but in that case, it's just garbage in garbage out. Nothing gets corrupted; the application just gets a garbage result. But in the case of Tree­View_Insert­Item, the treeview is going to update its internal data structures based on the garbage you passed in, and that means that the treeview winds up corrupted.

    To ensure that the value passed in is valid, the treeview checks it against the list of valid values for hInsert­After. And therefore, you get better performance if the valid value you pass is the one that it checks first.

    (As it happens, a lot of programs pass garbage for hInsert­After, so this defensive validation step is absolutely necessary.)

    You might say that the treeview could have a one-off optimization for the special TVI_LAST value by remembering the last child so it can be located quickly. The question is whether the complexity of adding that optimization is worth the cost: Any tree rearranging function would have to update the cached value, and if you miss a spot, then the treeview ends up corrupted. That's a pretty high-risk optimization you're suggesting there.

    And think about it this way: You're worrying about a tree where a single node has tens of thousands of children, something which (I am no longer ignoring) is a horrible user interface. That's like a list box with tens of thousands of items, or a dialog box with tens of thousands of checkboxes. You'll probably want to consider a better way of presenting the information than in a tree view that goes on for hundreds of screens. The treeview isn't optimized for this case because you don't optimize for the case where somebody is mis-using your system.

  • The Old New Thing

    How can I extend the deadline for responding to the PBT_APMSUSPEND message?


    A customer observed that starting in Windows Vista, the deadline for responding to the PBT_APMSUSPEND message was reduced from twenty seconds to two seconds. Their program needs more than two seconds to prepare for a suspend and they wanted to know how they could extend the deadline. The program communicates with a device, and if they don't properly prepare the device for suspend, it has problems when the system resumes.

    No, you cannot extend the deadline for responding to the PBT_APMSUSPEND message. The two second deadline is hard-coded; it is not configurable.

    The whole point of reducing the deadline from twenty to two seconds is to ensure that when users press the Sleep button on their computers, the computer actually goes to sleep reasonably promptly. If there were a way to extend the deadline, then you're just reintroducing the bad situation in Windows XP where the user hits the Sleep button on the laptop, but the laptop just sits there taunting you. Meanwhile, the flight attendant is standing there getting angrier at you for not putting your laptop away. (Or worse: You tell the laptop to Sleep and toss it into your bag, and then when you reach your destination, you discover that your laptop is really warm, and the battery is dead.)

    Besides, imagine if ten apps did this. Your app asks for ten extra seconds, another app asks for another twenty seconds, next thing you know all the deadline extensions add up to five minutes.

    The real solution is to fix the driver for this device so it can prepare the device properly when it is told that the machine is about to go into a low power state. (The two-second limit applies to applications but not drivers. At least not yet.)

  • The Old New Thing

    It is not unreasonable to expect uninitialized garbage to change at any time, you don't need to ask for an explanation


    A customer admitted that they had a bug in their code:

    #define UNICODE
    #define _UNICODE
    #include <windows.h>
    // error checking removed for expository purposes
    // code that writes out the data
    RegSetValueEx(hkey, pszValue, 0, REG_SZ, (const BYTE *)pszData,
                  _tcslen(pszData) * sizeof(TCHAR) + 1);
    // code that reads the data
    DWORD dwType, cbData;
    RegQueryValueEx(hkey, pszValue, NULL, &dwType, NULL, &cbData);
    TCHAR *pszData = new TCHAR[cbData / sizeof(TCHAR)];
    RegQueryValueEx(hkey, pszValue, NULL, &dwType, pszData, &cbData);

    One bug in the above code is in the final parameter passed to Reg­Set­Value­Ex: It's supposed to be the count in bytes, but the calculation appends only one byte for the terminating null instead of a full TCHAR. In other words, it should be

    RegSetValueEx(hkey, pszValue, 0, REG_SZ, (const BYTE *)pszData,
                  _tcslen(pszData) * sizeof(TCHAR) + sizeof(TCHAR));

    For concreteness, let's say the original string was five TCHARs in length, not counting the terminating null. Therefore, the correct buffer size is 12 bytes, but they passed only 11 to Reg­Set­Value­Ex.

    This error is compounded in the code that reads the value back: The code happily divides cbData / sizeof(TCHAR) without checking that the division is even. In our example, the call returns a length of 11 bytes. They divide by sizeof(TCHAR) (which is 2, since the code is compiled as Unicode), leaving 5 (remainder discarded), causing them to allocate a 5-TCHAR buffer.

    That error would have been okay by itself except for another error, which is calling Reg­Query­Value­Ex a second time with an invalid buffer size: The cbData variable remains the original value of 11 even though they allocated only 10 bytes. The subsequent Reg­Query­Value­Ex call reads 11 bytes into a 10-byte buffer.

    The customer conceded that the code that writes the value is buggy, but points out that the code "worked" on Windows XP, in the sense that the string read back from the registry was correct. But Windows Vista "broke" their program, because the string read back now contained garbage at the end. Instead of returning "Hello", it returned "HelloЀ╅۞". The customer wanted to know what change to Windows Vista broke their program.

    The change to Windows Vista that broke their program is known as "luck running out." The program contained three bugs, which combined to form a heap buffer write overflow. The uninitialized garbage at the end of the heap block they allocated happened to be zero on Windows XP due to a coincidence in the way their program allocated and freed memory. Consequently, when the data was read from the registry, the "string" ended in a single null byte instead of two. The extra null byte that "happened to be there already" combined with the single null byte read from the registry to form a proper null terminator.

    When run on Windows Vista, that happy coincidence no longer took place, and the uninitialized garbage was nonzero, resulting in the subsequent attempt to use the string to read past the end of the buffer and pick up heap garbage. (Yay, bug number four: read overflow.) Why was the uninitialized garbage different?

    It's different because there was nothing forcing it to be the same. The internals of the heap manager change all the time. (Look-aside lists, low fragmentation heap, and fault-tolerant heap are just a few recent examples.) Any of these changes will result in heap memory being used and reused differently. Plus, changes in other parts of Windows may have allocated and freed memory differently between Windows XP and Windows Vista. Heck, the program itself may have allocated and freed memory differently due to the change in operating system. (For one thing, the length of the string "Windows Vista" is different from the length of the string "Windows XP".)

    Uninitialized garbage will contain unpredictable values. There's no point asking why you got a different unpredictable value this time.

  • The Old New Thing

    The Control Panel search results understand common misspellings, too


    Here's a little trick. Open your Start menu and type scrensaver into the search box. That's right, spell it with only one e.

    Hey, it still found the Control Panel options for managing your screen saver.

    If you enable Improve my search results by using online Help in Windows Help and Support, this sends your search query to a back-end server to see if there's updated online help content related to your search. And the people who develop the online help content look over those queries to see if, for example, there is a category of issues people are searching for help with and not finding anything.

    It also means that they get to see a lot of misspellings. And that information is useful, because it means that the Control Panel search provider can be given a table of the most popular misspellings and how to fix them. which in turn benefits people who unchecked the Improve my search results by using online Help (say because they don't have Internet access or because they are afraid of the black helicopters).

    So now, when you search for scrensaver, you are directed to information on screen savers, thanks to the millions of other people who spelled it wrong before you.

    Obligatory link: The so-called God mode.

  • The Old New Thing

    Why not use animated GIFs as a lightweight alternative to AVIs in the animation common control?


    Commenter Vilx- wondered why animated GIFs weren't used as the animation format for the shell animation common control. After all, "they are even more lightweight than AVIs."

    Animated GIFs are certainly more lightweight than general AVIs, since AVI is just a container format, so decoding a general AVI means decoding any encoding format invented now or in the future. On the other hand, the shell common control imposed enough limits on the type of AVIs it could handle to the point where what was left was extremely lightweight, certainly much more lightweight than an animated GIF.

    Think about it: To use an animated GIF, you need a GIF decoder. And a GIF decoder is already significantly larger (both in terms of code and memory) than the RLE-8 decoder. Also significantly more complicated, and therefore significantly more likely to have bugs. Whereas RLE-8 is so simple there isn't much that can go wrong, and the RLE-8 decoder had been around since Windows 3.0, so it was already a known quantity. All you have to do to invoke the RLE-8 decoder is call Set­DI­Bits­To­Device. One line of code is hard to beat.

    Windows 95 did not come with a GIF decoder. Remember, Internet Explorer 1.0 did not come with Windows 95; it was part of the Plus! pack. As I recall, at the time Windows 95 released to manufacturing, the Plus! pack was still under development. (And at the time the animation common control was being designed, Internet Explorer didn't exist. Heck, Mosaic didn't exist!) Plus the fact that the common controls were available in both 16-bit and 32-bit versions—in fact it was the 16-bit versions that were written first since Windows 95 didn't have good Win32 support at the start of the project. More accurately, Windows 95 didn't have any Win32 support at the start of the project.

    So I'm kind of amused by the description of GIF as a lightweight animation encoding algorithm. Compared to RLE, the GIF format weighs a ton!

  • The Old New Thing

    Why does Internet Explorer not call DLL_PROCESS_DETACH on my DLL when I call ExitProcess?


    A customer asked a question, but as is often the case, the question was much more telling than the answer.

    We have an Internet Explorer plug-in which calls Exit­Process to force Internet Explorer to exit. We found that when we do this, our plug-in does not receive a DLL_PROCESS_DETACH notification. What could be preventing our plug-in from receiving the DLL_PROCESS_DETACH notification?

    As we saw some time ago when we looked at the way processes shut down (plus an important follow-up or two), all a process has to do to thwart proper delivery of DLL_PROCESS_DETACH notifications is to do something untoward during shutdown, at which point the kernel just gives up and calls Terminate­Process.

    But like I said, the answer is much less interesting than the question. What if the user had an unsaved email message at the time you decided to exit Internet Explorer? Recall that plug-ins are a guest in the host process; don't go changing the carpet. When we asked the customer why they were exiting Internet Explorer from their plug-in, we received the explanation, "The reason I am calling Exit­Process is that I do not know another good way to exit Internet Explorer from a plug-in."

    In this case, the guest is doing far more than just changing the carpet. The guest called in a demolition company!

    "Why did you call the demolition company to destroy my house?"
    "I couldn't think of a good way to destroy your house."

    The point isn't that it's bad to use a telephone call to hire a demolition company to destroy somebody's house and that you should use some other method to contact them (like, say, a text message). The point is that it's bad to destroy somebody else's house in the first place.

    Upon further investigation, the customer was writing a test for their plug-in. They open Internet Explorer and navigate to a page that uses the plug-in. When they are satisfied that the plug-in operated correctly, they want to exit the copy of Internet Explorer in order to conclude the test.

    If you want to destroy a house, then destroy your own house. Call Co­Create­Instance(CLSID_Internet­Explorer) to build a house, navigate to your test page with IWeb­Browser2::Navigate, and when you're done, you can destroy the house with IWeb­Browser2::Quit(). There is sample code to do exactly this in the documentation for the IWeb­Browser2 interface.

    Bonus chatter: The IWeb­Browser2 interface is scriptable.

    var ie = new ActiveXObject("InternetExplorer.Application");
    ie.Visible = true;
    WScript.Sleep(5000); // five seconds, say
  • The Old New Thing

    Why can't I install this DLL via Regsvr32 /i?


    A customer asked for help installing a particular DLL. They ran the command regsvr /i SomeDll.dll but got the error "SomeDll.dll was loaded, but the DllInstall entry point was not found. This file can not be registered."

    A DLL needs to be specifically written to be used with the regsvr32 command. You can't just grab some random DLL and expect regsvr32 to work. As we saw last week, the regsvr32 program merely loads the specified DLL and calls an entry point established by convention. If the DLL was not written to be used with regsvr32 then the conventional entry point will not be found, and you get an error message.

    The /i switch to regsvr32 instructs the program to look for the entry point known as Dll­Install. By convention, the Dll­Install function performs installation and setup of a DLL, but since it's just some function exported by a DLL, it could do anything it wants (or nothing at all).

    You can't just grab a random DLL and expect regsvr32 to do anything meaningful with it. The DLL has to be designed to operate with regsvr32.

    Handing random DLLs to regsvr32 is like dialing a random telephone number, sending a tone at 1170Hz and getting upset when you don't get a 2125Hz tone in response.

    The number one hit for a search on what does regsvr32 do is an old Knowledge Base article which explains what regsvr32 does, and it even contains a sample program which emulates regsvr32 so you can use it to debug your DLL. (The sample program hasn't been updated to support the /i flag, which I leave as an exercise.)

    One day, I received a piece of email from another employee whom I had never met nor had ever heard of. It didn't even begin with an introduction; it just jumped right in as if we'd been friends for years. "I'm trying to debug a problem where regsvr32 cannot register my DLL. It gives the error 'The specified procedure could not be found.' I saw a blog entry written by you and am trying to understand what our problem is."

    This blog thing has backfired. The reasons I write these articles is to get people to stop asking me questions. (The mechanism for that being to give the answer out in public for everyone to see.) Instead, it turns into "Hi, I found an article you wrote about X, which ipso facto makes you not only the world's foremost authority on X, but also the world's leading support technician on X."

    News flash: Posting a blog entry about something on the Internet should not be taken as evidence that the author is an expert on that subject. (One might argue that it in fact is more likely to be the opposite.)

    At the time, I wasn't aware of the knowledge base article that explains what regsvr32 does and how to debug it, so I couldn't point to it. I wrote back, "All regsvr32 does is Load­Library, Get­Proc­Address, and then calls the function. You can write your own test that does the same thing. You do not require any expertise from me."

    Less than an hour later, I received a reply: "Thanks. I figured it out. There was an older version of the DLL in the path ahead of the one I was trying to register."

    And I never did figure out which blog entry I wrote that made them think I was an expert on regsvr32. Maybe the person worked in Microsoft Research and used a prototype of their machine that predicts the future, and used it to predict that I was going to write about regsvr32 two years later.

Page 1 of 3 (23 items) 123