September, 2006

  • The Old New Thing

    Eating Belgian food at Brouwer's Cafe in Fremont

    • 39 Comments

    Last year, some friends and I went for dinner at Brouwer's Café, a Belgian pub/restaurant in the Fremont neighborhood of Seattle. The menu is pub food, which means that everything comes with frites and a choice of several dipping sauces, none of which is ketchup. One of my friends spent some formative years of her life in the Netherlands, so she was familiar with frites and asked for curry ketchup. Unfortunately, they didn't have it. (But I know a great German deli that does carry curry ketchup...)

    I tried to stay somewhat healthy with a salad, but the croque monsieur pretty much cancelled out any fat-avoidance forgoing the frites may have offered. As we munched on our frites, I wondered how the Belgians managed to eat such profoundly fatty food and not blimp up like Americans. My friends revealed the secret in one word: nicotine.

  • The Old New Thing

    Quotation marks around spaces aren't necessary in the PATH environment variable

    • 28 Comments

    The purpose of quotation marks is to allow a character that would normally be interpreted as a delimiter to be included as part of a file name. Most of the time, this delimiter is the space. The CreateProcess function uses a space to separate the program name from its arguments. Most programs separate their command line arguments with a space. But the PATH environment variable doesn't use spaces to separate directories. It uses semicolons.

    This means that if you want to add a directory with spaces in its name to the path, you don't need quotation marks since spaces mean nothing to the PATH environment variable. The quotation marks don't hurt, mind you, but they don't help either.

    On the other hand, if the directory you want to add contains a semicolon in its name, then you do need the quotation marks, because it's the semicolon that you need to protect.
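
    The delimiter rule above can be sketched as a small parser. This is a simplified illustration, not how Windows itself parses PATH: the helper name `next_path_entry` is hypothetical, and the only behavior it models is that semicolons separate entries unless they sit inside quotation marks, while spaces are never special.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies the next PATH entry from *cursor into out (capacity out_size),
 * dropping quotation marks. A semicolon ends the entry only when it is
 * outside quotation marks; spaces are ordinary characters.
 * Returns 1 if an entry was produced, 0 at the end of the string.
 * Hypothetical helper, for illustration only.
 */
int next_path_entry(const char **cursor, char *out, size_t out_size)
{
    const char *p;
    size_t n = 0;
    int in_quotes = 0;

    assert(cursor != NULL && *cursor != NULL && out != NULL && out_size > 0);

    p = *cursor;
    if (*p == '\0') {
        return 0;
    }
    for (; *p != '\0'; p++) {
        if (*p == '"') {
            in_quotes = !in_quotes;        /* quotes toggle protection */
        } else if (*p == ';' && !in_quotes) {
            p++;                           /* consume the separator */
            break;
        } else if (n + 1 < out_size) {
            out[n++] = *p;                 /* spaces are copied like any char */
        }
    }
    out[n] = '\0';
    *cursor = p;
    return 1;
}
```

    Calling it repeatedly on `C:\Program Files\App;"C:\odd;dir";C:\bin` yields three entries: the first keeps its space without any quoting, and the second keeps its semicolon only because it was quoted.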

  • The Old New Thing

    When you crash, make sure you crash in the right place

    • 20 Comments

    Last time, I recommended that functions should just crash when given invalid pointers. There's a subtlety to this advice, however, and that's making sure you crash in the right place.

    If your function and your function's caller both reside on the same side of a security boundary, then go ahead and crash inside your function. If the caller is a bad guy who is trying to get your function to crash, then the caller has accomplished nothing if your function runs in the same security context as the caller. After all, if the caller wanted to make your program do something bad, it could've just done that bad thing itself! If it gave you a pointer to invalid memory and you crashed trying to access it, well, the caller could have accomplished the same thing by just accessing the invalid memory directly.

    If your function resides on the other side of a security boundary, however, then having your function crash or behave erratically gives the malicious caller a power which he did not already have. For example, your function may reside in a service or local server, where the call arrives from another process. A malicious caller can pass intentionally malformed data to you via some form of IPC, causing your service or local server to crash. Or your function might reside in the same process as the caller but under a different security context. For example, it might be impersonating, or it may be operating on untrusted data.

    Another example of a security boundary is the boundary between user mode and kernel mode. Kernel mode cannot crash on parameters passed from user mode, because kernel mode runs at a higher protection level than user mode.

    In these cases, you want to make sure you crash in the correct context. In the IPC case, there typically will be a stub on the client side that does the hard work of taking the parameters and packaging them up for IPC. If the stub is given an invalid pointer, it should crash in the stub, so that the crash occurs in the same security context as the caller. A caller who passes an invalid pointer by mistake can then debug the crash in a context that is meaningful to the caller. (Of course, a malicious caller won't use your stub but will instead package the data manually and IPC it directly to the server. Your server can't crash on malicious inbound data, since that data came from a different security context.)

    If you're feeling really ambitious (and few people do), you can have the server react to malformed data by returning a special error code, which the stub detects and converts to an exception. Again, this doesn't do anything to crash the malicious caller, because the malicious caller is bypassing your stub. But it may help the caller who thought it was passing a valid pointer.
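
    Here is a rough sketch of that division of labor, with the IPC hop replaced by a direct function call so the example stays self-contained. Everything in it is an assumption made for illustration: the one-byte-length wire format, the names `server_handle` and `client_checksum`, and the use of `abort()` as a stand-in for raising an exception in the caller's context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { STATUS_OK = 0, STATUS_MALFORMED = 1 };
enum { MAX_PAYLOAD = 64 };

/* Hypothetical wire format: one length byte, then that many payload bytes. */

/* Server side: the bytes came from another security context, so every
 * field is checked and bad input produces an error code, never a crash. */
int server_handle(const unsigned char *msg, size_t msg_len, unsigned *sum_out)
{
    size_t i, claimed;
    unsigned sum = 0;

    if (msg_len < 1) {
        return STATUS_MALFORMED;
    }
    claimed = msg[0];
    if (claimed > MAX_PAYLOAD || claimed != msg_len - 1) {
        return STATUS_MALFORMED;       /* length field lies about the data */
    }
    for (i = 0; i < claimed; i++) {
        sum += msg[1 + i];
    }
    *sum_out = sum;
    return STATUS_OK;
}

/* Client stub: runs in the caller's own context. Packaging the data reads
 * the caller's buffer, so an invalid pointer faults right here. */
unsigned client_checksum(const unsigned char *data, size_t len)
{
    unsigned char msg[1 + MAX_PAYLOAD];
    unsigned result = 0;

    assert(len <= MAX_PAYLOAD);        /* caller bug: stop in the caller */
    msg[0] = (unsigned char)len;
    memcpy(msg + 1, data, len);        /* touches the caller's memory */

    /* In real life this call would cross the IPC boundary. */
    if (server_handle(msg, 1 + len, &result) != STATUS_OK) {
        abort();                       /* stand-in for raising an exception */
    }
    return result;
}
```

    A malicious caller who skips the stub and hands the server a message whose length byte disagrees with the actual size gets `STATUS_MALFORMED` back, and the server keeps running.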

  • The Old New Thing

    Saturday is Museum Day, courtesy of Smithsonian Magazine

    • 3 Comments

    September 30, 2006 is the first Museum Day open to the general public. (Apparently, previous Museum Days were limited to subscribers of Smithsonian Magazine.)

    You will have to print out the Museum Day Admissions Coupon to get in. Some restrictions apply. Read the fine print.

    Even the Institute for Creation Research (an actual museum!) is getting into the act.

  • The Old New Thing

    IsBadXxxPtr should really be called CrashProgramRandomly

    • 81 Comments

    Often I'll see code that tries to "protect" against invalid pointer parameters. This is usually done by calling a function like IsBadWritePtr. But this is a bad idea. IsBadWritePtr should really be called CrashProgramRandomly.

    The documentation for the IsBadXxxPtr functions presents the technical reasons why, but I'm going to dig a little deeper. For one thing, if the "bad pointer" points into a guard page, then probing the memory will raise a guard page exception. The IsBadXxxPtr function will catch the exception and return "not a valid pointer". But guard page exceptions are raised only once. You just blew your one chance. When the code that is managing the guard page accesses the memory for what it thinks is the first time (but is really the second), it won't get the guard page exception but will instead get a normal access violation.

    Alternatively, it's possible that your function was called by some code that intentionally passed a pointer to a guard page (or a PAGE_NOACCESS page) and was expecting to receive that guard page exception or access violation exception so that it could dynamically generate the data that should go onto that page. (Simulation of large address spaces via pointer-swizzling is one scenario where this can happen.) Swallowing the exception in IsBadXxxPtr means that the caller's exception handler doesn't get a chance to run, which means that your code rejected a pointer that would actually have been okay, if only you had let the exception handler do its thing.

    "Yeah, but my code doesn't use guard pages or play games with PAGE_NOACCESS pages, so I don't care." Well, for one thing, just because your code doesn't use these features doesn't mean that no other code in your process uses them. One of the DLLs that you link to might use guard pages, and your use of IsBadXxxPtr to test a pointer into a guard page will break that other DLL.

    And second, your program does use guard pages; you just don't realize it. The dynamic growth of the stack is performed via guard pages: Just past the last valid page on the stack is a guard page. When the stack grows into the guard page, a guard page exception is raised, which the default exception handler handles by committing a new stack page and setting the next page to be a guard page.

    (I suspect this design was chosen in order to avoid having to commit the entire memory necessary for all thread stacks. Since the default thread stack size is a megabyte, this would have meant that a program with ten threads would commit ten megabytes of memory, even though each thread probably uses only 24KB of that commitment. When you have a small pagefile or are running without a pagefile entirely, you don't want to waste 97% of your commit limit on unused stack memory.)
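
    The arithmetic behind that parenthetical can be checked with a few lines of C. The function names are hypothetical; the inputs are the figures from the text (ten threads, a one-megabyte stack each, about 24KB actually used per thread).

```c
#include <assert.h>

/* Commit charge, in KB, if every thread's whole stack were committed
 * up front. */
unsigned long full_commit_kb(unsigned threads, unsigned long stack_kb)
{
    return (unsigned long)threads * stack_kb;
}

/* Percentage of that commitment that would sit unused. */
double unused_percent(unsigned long committed_kb, unsigned long used_kb)
{
    assert(committed_kb > 0 && used_kb <= committed_kb);
    return 100.0 * (double)(committed_kb - used_kb) / (double)committed_kb;
}
```

    With ten threads, `full_commit_kb(10, 1024)` is 10240KB, of which 240KB is in use, so `unused_percent(10240, 240)` comes out a little under 98%, in line with the 97% quoted above.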

    "But what should I do, then, if somebody passes me a bad pointer?"

    You should crash.

    No, really.

    In the Win32 programming model, exceptions are truly exceptional. As a general rule, you shouldn't try to catch them. And even if you decide you want to catch them, you need to be very careful that you catch exactly what you want and no more.

    Trying to intercept the invalid pointer and returning an error code creates nondeterministic behavior. Where do invalid pointers come from? Typically they are caused by programming errors. Using memory after freeing it, using uninitialized memory, that sort of thing. Consequently, an invalid pointer might actually point to valid memory, if for example the heap page that used to contain the memory has not been decommitted, or if the uninitialized memory contains a value that when reinterpreted as a pointer just happens to be a pointer to memory that is valid right now. On the other hand, it might point to truly invalid memory. If you use IsBadWritePtr to "validate" your pointers before writing to them, then in the case where it happens to point to memory that is valid, you end up corrupting memory (since the pointer is "valid" and you therefore decide to write to it). And in the case where it happens to point to an invalid address, you return an error code. In both cases, the program keeps on running, and then that memory corruption manifests itself as an "impossible" crash two hours later.

    In other words, IsBadWritePtr is really CorruptMemoryIfPossible. It tries to corrupt memory, but if doing so raises an exception, it merely fails the operation.

    Many teams at Microsoft have rediscovered that IsBadXxxPtr causes bugs rather than fixes them. It's not fun getting a bucketful of crash dumps and finding that they are all of the "impossible" sort. You hunt through your code in search of this impossible bug. Maybe you find somebody who was using IsBadXxxPtr or equivalently an exception handler that swallows access violation exceptions and converts them to error codes. You remove the IsBadXxxPtr in order to let the exception escape unhandled and crash the program. Then you run the scenario again. And wow, look, the program crashes in that function, and when you debug it, you find the code that was, say, using a pointer after freeing it. That bug has been there for years, and it was manifesting itself as an "impossible" bug because the function was trying to be helpful by "validating" its pointers, when in fact what it was doing was taking a straightforward problem and turning it into an "impossible" bug.

    There is a subtlety to this advice that you should just crash when given invalid input, which I'll take up next time.

  • The Old New Thing

    News flash: The heart produces urine

    • 17 Comments

    In an attempt to explain why astronaut Heidemarie Stefanyshyn-Piper fainted during a welcome ceremony, ABC News reported:

    The heart of an average person on Earth pumps blood throughout the body. But when an astronaut is in space, Levine explained, the blood remains predominantly in their chest cavity. Because of this, he said, the heart tries to get rid of excess blood through urination.

  • The Old New Thing

    Isn't DDE all asynchronous anyway?

    • 12 Comments

    "Isn't DDE all asynchronous anyway?" asks commenter KaiArnold.

    It's mostly asynchronous, but not entirely.

    You can read about how DDE works in MSDN, but since it seems people are reluctant to read the formal documentation, I'll repeat here the points relevant to the discussion.

    The DDE process begins with a search for a service provider. This is done by broadcasting the WM_DDE_INITIATE message and collecting the responses. Each server that wishes to respond to the request sends back a WM_DDE_ACK message. The DDE client then chooses which of the servers it wishes to continue the conversation with (possibly more than one). The remainder of the DDE conversation is carried out with posted messages, the details of which are not important here.

    As you can see, everything in DDE is asynchronous with the exception of the WM_DDE_INITIATE. Why is WM_DDE_INITIATE synchronous?

    Remember that DDE was developed back in the 16-bit days, when it was safe to broadcast messages. The initiate message and its WM_DDE_ACK replies are synchronous to ensure that the client doesn't have to wait indefinitely to build a list of servers. If it were asynchronous, then the client would post the WM_DDE_INITIATE and then wait "a while" to see if anybody responded. But how does it know when it should give up waiting and just go with whatever it has? What happens if a response comes in after the client already proceeded based on the assumption that that server was unavailable? What if a response comes in five minutes later, when the client had started a second DDE discovery query? Would that response have been to the first or the second discovery broadcast?

    In particular, it is important for the client to know whether there are any servers out there at all, because the way the shell interprets DDE-based file associations is first to attempt a WM_DDE_INITIATE with the application and topic specified in the registration. If no server is found, then it launches the server manually and then tries to connect to the server via DDE a second time. (The second time should work, since the responsible server was explicitly launched!)
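
    The shell's "try, launch, try again" logic described above can be sketched without any real DDE calls. In this hedged sketch the callbacks are hypothetical stand-ins: `initiate` plays the role of the synchronous WM_DDE_INITIATE broadcast and reports how many servers acknowledged, and `launch` plays the role of starting the registered application. The `FakeServer` type exists only to exercise the logic.

```c
#include <assert.h>
#include <stddef.h>

/* Callbacks standing in for the real DDE plumbing (hypothetical). */
typedef int  (*initiate_fn)(void *ctx);  /* how many servers acknowledged */
typedef void (*launch_fn)(void *ctx);    /* start the registered server */

/* Returns 1 if a conversation can start, 0 if no server ever answered. */
int connect_or_launch(initiate_fn initiate, launch_fn launch, void *ctx)
{
    assert(initiate != NULL && launch != NULL);

    /* Because the initiate step is synchronous, a count of zero means
     * "nobody is out there", not "nobody has answered yet". */
    if (initiate(ctx) > 0) {
        return 1;
    }
    launch(ctx);
    return initiate(ctx) > 0;            /* second attempt after launching */
}

/* Fake server used only to exercise the logic above. */
typedef struct {
    int running;
    int launches;
} FakeServer;

int fake_initiate(void *ctx)
{
    return ((FakeServer *)ctx)->running ? 1 : 0;
}

void fake_launch(void *ctx)
{
    FakeServer *s = (FakeServer *)ctx;
    s->running = 1;
    s->launches++;
}
```

    If the initiate step were asynchronous, the first `if` could not be written this simply: the code would have to guess how long to wait before deciding to launch.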

  • The Old New Thing

    Filming for The Battle in Seattle has begun

    • 21 Comments

    A few weeks ago, filming for the movie Battle in Seattle began.

    In Vancouver.

    Serves us right.

  • The Old New Thing

    Waiting until the dialog box is displayed before doing something

    • 30 Comments

    Last time, I left you with a few questions. Part of the answer to the first question was given in the comments, so I'll just link to that. The problem is more than just typeahead, though. The dialog box doesn't show itself until all message traffic has gone idle. If you actually ran the code presented in the original message, you'd find that it didn't actually work!

    #include <windows.h>
    
    INT_PTR CALLBACK
    DlgProc(HWND hwnd, UINT uiMsg, WPARAM wParam, LPARAM lParam)
    {
      switch (uiMsg) {
    
      case WM_INITDIALOG:
        PostMessage(hwnd, WM_APP, 0, 0);
        return TRUE;
    
      case WM_APP:
        MessageBox(hwnd,
                   IsWindowVisible(hwnd) ? TEXT("Visible")
                                         : TEXT("Not Visible"),
                   TEXT("Title"), MB_OK);
        break;
    
      case WM_CLOSE:
        EndDialog(hwnd, 0);
        break;
      }
    
      return FALSE;
    }
    
    int WINAPI WinMain(HINSTANCE hinst, HINSTANCE hinstPrev,
                       LPSTR lpCmdLine, int nShowCmd)
    {
        DialogBox(hinst, MAKEINTRESOURCE(1), NULL, DlgProc);
        return 0;
    }
    

    When you run this program, the message box says "Not Visible", and in fact when it appears, you can see that the main dialog is not yet visible. It doesn't show up until after you dismiss the message box.

    Mission: Not accomplished.

    Along the way, there was some dispute over whether the private message should be WM_USER or WM_APP. As we saw before, window messages in the WM_USER range belong to the window class, and in this case, the window class is the dialog window class, i.e., WC_DIALOG. Since you are not the implementor of the dialog window class (you didn't write the window procedure), the WM_USER messages are not yours for the taking. And in fact, if you had decided to use WM_USER you would have run into all sorts of problems, because it so happens that the dialog manager already defined that message for its own purposes:

    #define DM_GETDEFID         (WM_USER+0)
    

    When the dialog manager sends the dialog a DM_GETDEFID message to obtain the default control ID, you will think it's your WM_USER message and show your dialog box. It turns out that the dialog manager uses the default control ID rather often, and as a result, you're going to display an awful lot of dialog boxes. (Even worse, your second dialog box will probably use the dialog itself as the owner, which then leads to the problem of having a dialog box with multiple modal children, which then leads to disaster when they are dismissed by the user in the wrong order.)
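
    To make the collision concrete, here is a small sketch that reproduces the relevant message numbers so that it compiles without <windows.h>. The numeric values are the ones in the Windows SDK headers; the helper names `collides_with_dialog_manager` and `choose_private_message` are made up for this illustration.

```c
#include <assert.h>

/* Values as defined in the Windows SDK headers, repeated here so the
 * snippet builds without <windows.h>. */
#define WM_USER        0x0400
#define WM_APP         0x8000
#define DM_GETDEFID    (WM_USER+0)
#define DM_SETDEFID    (WM_USER+1)
#define DM_REPOSITION  (WM_USER+2)

/* Hypothetical helper: is this message number already claimed by the
 * dialog window class? */
int collides_with_dialog_manager(unsigned msg)
{
    return msg == DM_GETDEFID || msg == DM_SETDEFID || msg == DM_REPOSITION;
}

/* Hypothetical helper: pick a private message that the dialog class
 * does not own. */
unsigned choose_private_message(void)
{
    assert(!collides_with_dialog_manager(WM_APP));
    return WM_APP;
}
```

    `collides_with_dialog_manager(WM_USER)` is nonzero because WM_USER and DM_GETDEFID are the same number, which is exactly the mix-up described above.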

    Okay, so we're agreed that we should use WM_APP as the private message.

    Some people suggested using a timer, on the theory that timer messages are lower priority than paint messages, so the timer won't fire until all painting is done. While that's true, it also doesn't help. The relative priority of timer and paint messages comes into play only if the window manager has to choose between timers and paint messages when deciding which one to deliver first. But if there are no paint messages needed in the first place, then timers are free to go ahead.

    And when the window is not visible, it doesn't need any paint messages. In a sense, the timer approach misses the point entirely: It's trying to take advantage of paint messages being higher priority precisely in the scenario where there are no paint messages!

    Let's demonstrate this by implementing the timer approach, but I'm going to add a twist to make the race condition clearer:

    ...
    
    INT_PTR CALLBACK
    DlgProc(HWND hwnd, UINT uiMsg, WPARAM wParam, LPARAM lParam)
    {
      switch (uiMsg) {
    
      case WM_INITDIALOG:
        SetTimer(hwnd, 1, 1, NULL); // 1ms timer, no callback: delivered as WM_TIMER
        Sleep(100); // simulate paging
        return TRUE;
    
      case WM_TIMER:
        if (wParam == 1) {
          KillTimer(hwnd, 1);
          MessageBox(hwnd,
                     IsWindowVisible(hwnd) ? TEXT("Visible")
                                           : TEXT("Not Visible"),
                     TEXT("Title"), MB_OK);
        }
        break;
    
      case WM_CLOSE:
        EndDialog(hwnd, 0);
        break;
      }
    
      return FALSE;
    }
    

    If you run this program, you'll see the message "Not Visible". I inserted an artificial Sleep(100) to simulate the case where the code takes a page fault and has to wait 100ms for the code to arrive from the backing store. (Maybe it's coming from the network or a CD-ROM, or maybe the local hard drive is swamped with I/O and you have to wait that long for your paging request to become satisfied after all the other I/O requests active on the drive.)

    As a result of that Sleep(), the dialog manager doesn't get a chance to empty the message queue and show the window because the timer message is already in the queue. Result: The timer fires and the dialog is still hidden.

    Some people waited for WM_ACTIVATE, but that tells you when the window becomes active, which is not the same as being shown, so it doesn't satisfy the original requirements.

    Others suggested waiting for WM_PAINT, but a window can be visible without painting. The WM_PAINT message arrives if the window's client area is uncovered, but the caption bar might still be visible even if the client area is covered. Furthermore, while this addresses the problem if you interpret "visible" as "results in pixels on the screen", as opposed to IsWindowVisible, you need to look behind the actual request to what the person was really looking for. (This is an important skill to have because people rarely ask for what they want, but rather for what they think they want.) The goal was to create a dialog box and have it look like the user automatically clicked a button on it to call up a secondary dialog. In order to get this look, the base dialog needs to be visible before the secondary dialog can be displayed.

    One approach is to show the second dialog on receipt of the WM_SHOWWINDOW, but even that is too soon:

    // In real life, this would be an instance variable
    BOOL g_fShown = FALSE;
    
    INT_PTR CALLBACK
    DlgProc(HWND hwnd, UINT uiMsg, WPARAM wParam, LPARAM lParam)
    {
      switch (uiMsg) {
    
      case WM_INITDIALOG:
        return TRUE;
    
      case WM_SHOWWINDOW:
        if (wParam && !g_fShown) {
          g_fShown = TRUE;
          MessageBox(hwnd,
                     IsWindowVisible(hwnd) ? TEXT("Visible")
                                           : TEXT("Not Visible"),
                     TEXT("Title"), MB_OK);
        }
        break;
    
      case WM_CLOSE:
        EndDialog(hwnd, 0);
        break;
      }
    
      return FALSE;
    }
    

    (Subtlety: Why do I set g_fShown = TRUE before displaying the message box?)

    If you run this program, you will still get the message "Not Visible" because WM_SHOWWINDOW is sent as part of the entire window-showing process. At the time you receive it, your window is in the process of being shown but it's not quite there yet. The WM_SHOWWINDOW serves a similar purpose to WM_INITDIALOG: To let you prepare the window while it's still hidden so the user won't see ugly flashing which would otherwise occur if you had done your preparation after the window was visible.

    Is there a message that is sent after the window has been shown? There sure is: WM_WINDOWPOSCHANGED.

    INT_PTR CALLBACK
    DlgProc(HWND hwnd, UINT uiMsg, WPARAM wParam, LPARAM lParam)
    {
      switch (uiMsg) {
    
      case WM_INITDIALOG:
        return TRUE;
    
      case WM_WINDOWPOSCHANGED:
        if ((((WINDOWPOS*)lParam)->flags & SWP_SHOWWINDOW) &&
            !g_fShown) {
          g_fShown = TRUE;
          MessageBox(hwnd,
                     IsWindowVisible(hwnd) ? TEXT("Visible")
                                           : TEXT("Not Visible"),
                     TEXT("Title"), MB_OK);
        }
        break;
    
      case WM_CLOSE:
        EndDialog(hwnd, 0);
        break;
      }
    
      return FALSE;
    }
    

    This time, we get the message "Visible", because WM_WINDOWPOSCHANGED is sent after the window positioning negotiations are complete. (The "ED" at the end emphasizes that it is delivered after the operation has been done, as opposed to the "ING" which is delivered while the operation is in progress.)

    But wait, we're not out of the woods yet. Although it's true that the window position negotiations are complete, the message is nevertheless sent as part of the whole window positioning process, and there may be other things that need to be done as part of the whole window-showing bookkeeping. If you show the second dialog directly in your WM_WINDOWPOSCHANGED handler, then that clean-up won't happen until after the user exits the second dialog.

    For example, the window manager notifies Active Accessibility of the completed window positioning operation after all the window positions have settled down. This reduces the likelihood that the accessibility tool will be told "Okay, the window is shown" followed by "Oh no wait, it moved again, ha ha!" If you display the second dialog inside your WM_WINDOWPOSCHANGED handler, the screen reader will receive a bizarro sequence of events:

    • Second dialog shown.
    • (User interacts with second dialog and dismisses it.)
    • Second dialog destroyed.
    • (Your WM_WINDOWPOSCHANGED handler returns.)
    • Main dialog shown.

    Notice that the "Main dialog shown" notification arrives out of order because you did additional UI work before the previous operation was complete.

    As another example, the window may have been shown as part of a multiple-window window positioning operation such as one created by DeferWindowPos. All the affected windows will get their WM_WINDOWPOSCHANGED notifications one at a time, and if your window happened to go first, then those other windows won't know that they were repositioned until after the user finishes with the nested dialog. This may manifest itself in those other windows appearing to be "stuck" since your dialog is holding up the subsequent notifications with your nested dialog. For example, a window might be trying to do exactly what you're trying to do here, but since you're holding up the remainder of the notifications, that other window won't display its secondary dialog until the user dismisses yours. From the user's standpoint, that other window is "stuck" for no apparent reason.

    Therefore, we need one more tweak to our solution.

    INT_PTR CALLBACK
    DlgProc(HWND hwnd, UINT uiMsg, WPARAM wParam, LPARAM lParam)
    {
      switch (uiMsg) {
    
      case WM_INITDIALOG:
        return TRUE;
    
      case WM_WINDOWPOSCHANGED:
        if ((((WINDOWPOS*)lParam)->flags & SWP_SHOWWINDOW) &&
            !g_fShown) {
          g_fShown = TRUE;
          PostMessage(hwnd, WM_APP, 0, 0);
        }
        break;
    
    
      case WM_APP:
        MessageBox(hwnd,
                   IsWindowVisible(hwnd) ? TEXT("Visible")
                                         : TEXT("Not Visible"),
                   TEXT("Title"), MB_OK);
        break;
    
      case WM_CLOSE:
        EndDialog(hwnd, 0);
        break;
      }
    
      return FALSE;
    }
    

    When we learn that the dialog is being shown for the first time, we post a message to ourselves to display the secondary dialog and return from the WM_WINDOWPOSCHANGED handler. This allows the window positioning operation to complete. Everybody gets their notifications, they are all on board with the state of the windows, and only after everything has stabilized do we display our message box.

    This is a common thread to many types of window management. Many window messages are notifications which are delivered while the operation is still in progress. You do not want to display new UI while handling those notifications because that holds up the completion of the original UI operation that generated the notification in the first place. Posting a message to yourself to complete the user interaction after the original operation has stabilized is the standard solution.
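
    The effect on notification order can be shown with a toy simulation that involves no real windows. Everything here is a stand-in invented for illustration: `Sim` is a made-up event log, `simulate_show` plays the window manager, and the `defer` flag chooses between showing the second dialog inside the WM_WINDOWPOSCHANGED handler and posting a message to do it later.

```c
#include <assert.h>
#include <string.h>

enum { MAX_EVENTS = 8 };

/* Made-up event log standing in for what a screen reader would hear. */
typedef struct {
    const char *events[MAX_EVENTS];
    int count;
    int posted;   /* 1 while a deferred "show second dialog" is queued */
} Sim;

static void log_event(Sim *s, const char *text)
{
    assert(s->count < MAX_EVENTS);
    s->events[s->count++] = text;
}

static void run_second_dialog(Sim *s)
{
    log_event(s, "Second dialog shown");
    log_event(s, "Second dialog destroyed");
}

/* Simulates showing the main dialog. defer == 0: the handler shows the
 * second dialog directly. defer != 0: the handler posts a message. */
void simulate_show(Sim *s, int defer)
{
    s->count = 0;
    s->posted = 0;

    /* Inside the WM_WINDOWPOSCHANGED handler. */
    if (defer) {
        s->posted = 1;
    } else {
        run_second_dialog(s);
    }

    /* Handler returned: the window manager finishes its bookkeeping. */
    log_event(s, "Main dialog shown");

    /* Back in the message loop: the posted message is dispatched. */
    if (s->posted) {
        s->posted = 0;
        run_second_dialog(s);
    }
}

/* Returns 1 if the event at position i has the given text. */
int event_is(const Sim *s, int i, const char *text)
{
    return i < s->count && strcmp(s->events[i], text) == 0;
}
```

    With `defer` set to 0, "Main dialog shown" is logged last, which is the bizarro sequence listed earlier. With `defer` set to 1, it is logged first and the second dialog's events follow.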

  • The Old New Thing

    Hand gestures for numbers

    • 21 Comments

    When I was in Los Angeles for Thanksgiving, I began noticing the hand gestures that accompanied numbers. When people said "six", they often punctuated it by holding out their hand with the thumb and pinky extended, palm towards the speaker. That's because they were using Chinese number gestures. (It so happens that the gesture for "six" is the same in both the Chinese and Taiwanese systems.)

    What was particularly amusing was that when I asked them about it later, they had no recollection that they had done it and didn't even notice that I was doing it.

    "I noticed you said 'six'," as I make the 'six' gesture.

    "Right, it weighed six pounds."

    "But you did this," and I make the gesture again.

    "I did what?"

    "This. The hand."

    "Oh, the thing with the hand. Right, that means six."

    The use of the hand gesture was unconscious and automatic.

    As for me, I use the Korean system of Chisenbop which I taught myself in sixth grade from a book. To me, it's much more logical than the Chinese, Taiwanese, or ASL versions, all of which I had taught myself at one point or another but now can recall only with some effort. In particular, Chisenbop lends itself much more easily to doing computations rather than merely conveying a value. For example, adding nine is "down one on left, up one on right". You can do computations on your hand like an abacus, paying no attention to the actual value but merely manipulating your fingers mechanically, and then when you're done, you look down at your hand to see what the result is.
