September, 2006

  • The Old New Thing

    IsBadXxxPtr should really be called CrashProgramRandomly

    Often I'll see code that tries to "protect" against invalid pointer parameters. This is usually done by calling a function like IsBadWritePtr. But this is a bad idea. IsBadWritePtr should really be called CrashProgramRandomly.

    The documentation for the IsBadXxxPtr functions presents the technical reasons why, but I'm going to dig a little deeper. For one thing, if the "bad pointer" points into a guard page, then probing the memory will raise a guard page exception. The IsBadXxxPtr function will catch the exception and return "not a valid pointer". But guard page exceptions are raised only once. You just blew your one chance. When the code that is managing the guard page accesses the memory for what it thinks is the first time (but is really the second), it won't get the guard page exception but will instead get a normal access violation.
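
    To make the "one chance" problem concrete, here is a toy model of a one-shot guard page. It calls no Win32 API. The names ModelPage, Touch, ProbeSaysBad and OwnerTouch are invented for this sketch, and the real mechanism has more moving parts than this.

```cpp
#include <cassert>

// Toy model only: mimics the one-shot behavior of a guard page so the
// sequence of events is easy to follow. No real memory is involved.
enum class Access { Ok, GuardPageException, AccessViolation };

struct ModelPage {
    bool committed = false;  // is there usable memory behind the page?
    bool guard = true;       // one-shot guard flag, cleared by the first touch
};

// The first touch of a guard page raises the guard exception and clears the
// flag. Later touches of a still-uncommitted page are plain access violations.
Access Touch(ModelPage& page) {
    if (page.guard) {
        page.guard = false;
        return Access::GuardPageException;
    }
    return page.committed ? Access::Ok : Access::AccessViolation;
}

// Hypothetical stand-in for an IsBadXxxPtr-style probe: it touches the page,
// swallows whatever happens, and reports "bad" on any exception.
bool ProbeSaysBad(ModelPage& page) {
    return Touch(page) != Access::Ok;
}

// The page's real owner expects the guard exception and commits memory in
// response. Returns true if the owner's access ultimately succeeded.
bool OwnerTouch(ModelPage& page) {
    Access result = Touch(page);
    if (result == Access::GuardPageException) {
        assert(!page.guard);     // the guard flag is already spent
        page.committed = true;   // the owner's handler commits the page
        result = Touch(page);    // and the access is retried
    }
    return result == Access::Ok;
}
```

    In this model, OwnerTouch succeeds on a fresh page. If ProbeSaysBad gets there first, the probe reports a bad pointer, and the owner's later access fails with an access violation because the guard exception has already been used up.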

    Alternatively, it's possible that your function was called by some code that intentionally passed a pointer to a guard page (or a PAGE_NOACCESS page) and was expecting to receive that guard page exception or access violation exception so that it could dynamically generate the data that should go onto that page. (Simulation of large address spaces via pointer-swizzling is one scenario where this can happen.) Swallowing the exception in IsBadXxxPtr means that the caller's exception handler doesn't get a chance to run, which means that your code rejected a pointer that would actually have been okay, if only you had let the exception handler do its thing.

    "Yeah, but my code doesn't use guard pages or play games with PAGE_NOACCESS pages, so I don't care." Well, for one thing, just because your code doesn't use these features doesn't mean that no other code in your process uses them. One of the DLLs that you link to might use guard pages, and your use of IsBadXxxPtr to test a pointer into a guard page will break that other DLL.

    And second, your program does use guard pages; you just don't realize it. The dynamic growth of the stack is performed via guard pages: Just past the last valid page on the stack is a guard page. When the stack grows into the guard page, a guard page exception is raised, which the default exception handler handles by committing a new stack page and setting the next page to be a guard page.

    (I suspect this design was chosen in order to avoid having to commit the entire memory necessary for all thread stacks. Since the default thread stack size is a megabyte, this would have meant that a program with ten threads would commit ten megabytes of memory, even though each thread probably uses only 24KB of that commitment. When you have a small pagefile or are running without a pagefile entirely, you don't want to waste 97% of your commit limit on unused stack memory.)
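
    The arithmetic behind that parenthetical can be checked with a few lines. The helper names EagerCommit and WastedPercent are made up for this sketch; the numbers (ten threads, one megabyte of stack each, 24KB actually used) are the ones from the paragraph above.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kKB = 1024;
constexpr std::size_t kMB = 1024 * kKB;

// Commit charge if every thread's full stack reservation were committed
// up front instead of being grown on demand.
std::size_t EagerCommit(std::size_t threads, std::size_t stackSize) {
    return threads * stackSize;
}

// Percentage of one stack's commit that would sit unused, given how much
// of the stack the thread actually touches.
double WastedPercent(std::size_t stackSize, std::size_t usedPerThread) {
    assert(stackSize > 0 && usedPerThread <= stackSize);
    return 100.0 * static_cast<double>(stackSize - usedPerThread)
                 / static_cast<double>(stackSize);
}
```

    EagerCommit(10, kMB) comes to ten megabytes, and WastedPercent(kMB, 24 * kKB) lands a little above 97.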

    "But what should I do, then, if somebody passes me a bad pointer?"

    You should crash.

    No, really.

    In the Win32 programming model, exceptions are truly exceptional. As a general rule, you shouldn't try to catch them. And even if you decide you want to catch them, you need to be very careful that you catch exactly what you want and no more.

    Trying to intercept the invalid pointer and returning an error code creates nondeterministic behavior. Where do invalid pointers come from? Typically they are caused by programming errors. Using memory after freeing it, using uninitialized memory, that sort of thing. Consequently, an invalid pointer might actually point to valid memory, if for example the heap page that used to contain the memory has not been decommitted, or if the uninitialized memory contains a value that when reinterpreted as a pointer just happens to be a pointer to memory that is valid right now. On the other hand, it might point to truly invalid memory. If you use IsBadWritePtr to "validate" your pointers before writing to them, then in the case where it happens to point to memory that is valid, you end up corrupting memory (since the pointer is "valid" and you therefore decide to write to it). And in the case where it happens to point to an invalid address, you return an error code. In both cases, the program keeps on running, and then that memory corruption manifests itself as an "impossible" crash two hours later.

    In other words IsBadWritePtr is really CorruptMemoryIfPossible. It tries to corrupt memory, but if doing so raises an exception, it merely fails the operation.
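
    Here is a small model of that behavior. It uses a pretend address space of eight cells instead of real pointers, so nothing in it is the actual IsBadWritePtr; ModelMemory, ModelIsBadWritePtr and ValidatedWrite are names invented for the sketch.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kAddressCount = 8;
using Address = std::size_t;

// A pretend address space. "mapped" says whether an address is backed by
// memory right now. It says nothing about who owns that memory.
struct ModelMemory {
    int  cells[kAddressCount] = {};
    bool mapped[kAddressCount] = {};
};

// Model of the probe: answers only "would a write fault?"
bool ModelIsBadWritePtr(const ModelMemory& memory, Address address) {
    return address >= kAddressCount || !memory.mapped[address];
}

// The "defensive" pattern: probe first, then write or return an error code.
bool ValidatedWrite(ModelMemory& memory, Address address, int value) {
    if (ModelIsBadWritePtr(memory, address)) {
        return false;                // unmapped: quietly fail the operation
    }
    assert(address < kAddressCount);
    memory.cells[address] = value;   // mapped: write, whoever owns it now
    return true;
}
```

    If a stale address happens to be mapped because somebody else now owns that cell, ValidatedWrite reports success and overwrites their data. If it happens to be unmapped, the caller gets an error code. Either way the program carries on and the original bug stays hidden.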

    Many teams at Microsoft have rediscovered that IsBadXxxPtr causes bugs rather than fixes them. It's not fun getting a bucketful of crash dumps and finding that they are all of the "impossible" sort. You hunt through your code in search of this impossible bug. Maybe you find somebody who was using IsBadXxxPtr or equivalently an exception handler that swallows access violation exceptions and converts them to error codes. You remove the IsBadXxxPtr in order to let the exception escape unhandled and crash the program. Then you run the scenario again. And wow, look, the program crashes in that function, and when you debug it, you find the code that was, say, using a pointer after freeing it. That bug has been there for years, and it was manifesting itself as an "impossible" bug because the function was trying to be helpful by "validating" its pointers, when in fact what it was doing was taking a straightforward problem and turning it into an "impossible" bug.

    There is a subtlety to this advice that you should just crash when given invalid input, which I'll take up next time.

  • The Old New Thing

    If you ask a Yes/No question, make sure the user also knows what happens when they say No

    I was talking with someone last year who had a gripe about a music organizer program. Suppose you create some playlists and then decide, "Oh, nevermind, I don't like this playlist." You highlight the playlist and click "Delete". You then get a dialog box that asks, "Do you want to move the songs in this playlist to the Recycle Bin?"

    "Well, no, I don't want you to recycle those songs. I want to keep the songs. Just delete the playlist," you say to yourself and you click "No".

    Unfortunately, the program was asking you, "Do you want to move the songs to the Recycle Bin or delete the songs permanently from your computer?" The program had already decided that you wanted to delete the songs themselves when you deleted the playlist. It just wanted to know whether you wanted them gone immediately or just tossed into the Recycle Bin. Fortunately, my friend had backups of the songs that had mistakenly been purged from the computer, but it was still quite shocking to see all the music just plain disappear when there was no expectation that anything of the sort was going to happen.

    When programs put up Yes/No dialogs, they usually don't have a problem explaining what will happen when you click Yes. But they also have to make sure users understand what will happen when they click No.

    Windows Vista's new Task Dialog makes it easier for programs to show users what will happen as the result of pushing a button on a dialog box. Instead of being limited to just "Yes" and "No", you can put more meaningful text on the buttons such as "Save" and "Don't Save", or you could use command buttons and provide an explanatory sentence for each option. Now, programs could always have built custom dialogs with these more descriptive buttons, but doing so meant designing a dialog box from scratch, positioning the buttons precisely according to dialog box layout guidelines, and then writing a custom dialog procedure to handle this new custom dialog. Most people just take the easy way out and use MessageBox. In Windows Vista there is now a way to build slightly more complex dialogs without having to design a dialog template.

    Be careful, however, not to fall into the same trap with task dialogs. The original dialog might have been converted to a task dialog with the buttons "Recycle" and "Don't Recycle", which would not have solved the problem at all.

  • The Old New Thing

    Why doesn't the Shutdown dialog use Alt to get alternate behavior?

    When you select "Shut Down" from the Start menu, a dialog appears with three options: "Stand By", "Turn Off" and "Restart". To get the secret fourth option "Hibernate" you have to hold the Shift key. Wouldn't the Alt key be the more obvious choice for revealing alternate options?

    You might think so, but it so happens that Alt already has meaning. In this dialog, the Alt key would be a disaster, because the underlined letters indicate keyboard accelerators, which are pressed in conjunction with the Alt key. In other words, from the Shut Down dialog, you can type Alt+S to stand by, Alt+U to turn off, or Alt+R to restart. Since the Alt key was already taken, the Shift key had to be used to reveal the bonus options. Using the Shift key to reveal bonus options is not uncommon. You can hold the Shift key while right-clicking on a file to get an extended context menu, and of course there's the Shift+No option in file copy dialogs to mean "No to all".

    In fact, you don't need to press the Alt or Shift keys at all. Recall that the rules for dialog box navigation permit omitting the Alt key if the focus control does not accept character input; since the only things on the Shut Down dialog are pushbuttons, there is no character input and you can just press "S", "U" or "R" without the Alt key. What's more, you don't need to hold the Shift key if you want to shut down; you can just type "H" and the Hibernate option will be invoked, because hotkeys for hidden controls are still active.

  • The Old New Thing

    Allocating and freeing memory across module boundaries

    I'm sure it's been drilled into your head by now that you have to free memory with the same allocator that allocated it. LocalAlloc matches LocalFree, GlobalAlloc matches GlobalFree, new[] matches delete[]. But this rule goes deeper.

    If you have a function that allocates and returns some data, the caller must know how to free that memory. You have a variety of ways of accomplishing this. One is to state explicitly how the memory should be freed. For example, the FormatMessage documentation explicitly states that you should use the LocalFree function to free the buffer that is allocated if you pass the FORMAT_MESSAGE_ALLOCATE_BUFFER flag. All BSTRs must be freed with SysFreeString. And all memory returned across COM interface boundaries must be allocated and freed with the COM task allocator.

    Note, however, that if you decide that a block of memory should be freed with the C runtime, such as with free, or with the C++ runtime via delete or delete[], you have a new problem: Which runtime?

    If you choose to link with the static runtime library, then your module has its own private copy of the C/C++ runtime. When your module calls new or malloc, the memory can only be freed by your module calling delete or free. If another module calls delete or free, that will use the C/C++ runtime of that other module which is not the same as yours. Indeed, even if you choose to link with the DLL version of the C/C++ runtime library, you still have to agree which version of the C/C++ runtime to use. If your DLL uses MSVCRT20.DLL to allocate memory, then anybody who wants to free that memory must also use MSVCRT20.DLL.

    If you're paying close attention, you might spot a looming problem. Requiring all your clients to use a particular version of the C/C++ runtime might seem reasonable if you control all of the clients and are willing to recompile all of them each time the compiler changes. But in real life, people often don't want to take that risk. "If it ain't broke, don't fix it." Switching to a new compiler risks exposing a subtle bug, say, forgetting to declare a variable as volatile or inadvertently relying on temporaries having a particular lifetime.

    In practice, you may wish to convert only part of your program to a new compiler while leaving old modules alone. (For example, you may want to take advantage of new language features such as templates, which are available only in the new compiler.) But if you do that, then you lose the ability to free memory that was allocated by the old DLL, since that DLL expects you to use MSVCRT20.DLL, whereas the new compiler uses MSVCR71.DLL.

    The solution to this requires planning ahead. One option is to use a fixed external allocator such as LocalAlloc or CoTaskMemAlloc. These are allocators that are universally available and don't depend on which version of the compiler you're using.

    Another option is to wrap your preferred allocator inside exported functions that manage the allocation. This is the mechanism used by the NetApi family of functions. For example, the NetGroupEnum function allocates memory and returns it through the bufptr parameter. When the caller is finished with the memory, it frees it with the NetApiBufferFree function. In this manner, the memory allocation method is isolated from the caller. Internally, the NetApi functions might be using LocalAlloc or HeapAlloc or possibly even new and delete. It doesn't matter, as long as NetApiBufferFree frees the memory with the same allocator that NetGroupEnum used to allocate the memory in the first place.
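
    A minimal sketch of the wrapper technique might look like the following. The module and its functions (WidgetDuplicateName, WidgetBufferFree) are hypothetical, the choice of malloc is arbitrary, and the export decoration a real DLL would need is left out to keep the sketch portable.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical exported function: returns a newly allocated copy of `text`,
// or nullptr if the allocation fails. The caller never learns which
// allocator was used.
extern "C" char* WidgetDuplicateName(const char* text) {
    assert(text != nullptr);
    std::size_t size = std::strlen(text) + 1;
    char* copy = static_cast<char*>(std::malloc(size));  // this module's allocator
    if (copy != nullptr) {
        std::memcpy(copy, text, size);
    }
    return copy;
}

// Hypothetical exported function: the only supported way to release a buffer
// handed out by this module. It lives next to the allocation code, so it
// always pairs with the right allocator and the right runtime.
extern "C" void WidgetBufferFree(void* buffer) {
    std::free(buffer);  // must match the allocator used above
}
```

    A caller built with a different compiler or runtime calls WidgetDuplicateName to get the buffer and WidgetBufferFree to release it, and never calls its own free or delete on that memory.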

    Although I personally prefer using a fixed external allocator, many people find it more convenient to use the wrapper technique. That way, they can use their favorite allocator throughout their module. Either way works. The point is that when memory leaves your DLL, the code you gave the memory to must know how to free it, even if it's using a different compiler from the one that was used to build your DLL.

  • The Old New Thing

    Things you already know: How do I wait until my dialog box is displayed before doing something?

    One customer wanted to wait until the dialog box was displayed before displaying its own dialog box. (Personally, I think immediately displaying a doubly-nested dialog box counts as starting off on the wrong foot from a usability standpoint, but let's set that issue aside for now.) The customer discovered that displaying the nested dialog box in response to the WM_INITDIALOG message was premature, because as we all know, the WM_INITDIALOG message is sent before the dialog box is displayed. The question therefore is, "How do I wait until my dialog box is displayed before doing something?"

    One proposed solution was the following code fragment:

    case WM_INITDIALOG:
        PostMessage(hDlg, WM_APP, 0, 0);
        return TRUE;
    
    case WM_APP:
        ... display the second dialog ...
        break;
    

    1. Why is this wrong? Hint: You definitely know the answer to this already.
    2. What is the correct solution? You probably know this already.
  • The Old New Thing

    Just change that 15 to a 1

    It would be nice and easy to just change that 15 to a 1.

    If only it were that simple.

    In the case described in that article, it's not that a single operation was attempted fifteen times in a loop. Rather, the fifteen operations were scattered all over the program. Suppose, for example, that the network operation was "Get the attributes of this file." The program might be filling in a file listing with several columns, one for the icon, another for the file name, another for the file author, and the last one for the last-modified time.

    for each filename in directory {
     list.Add(new ListElement(filename));
    }
    

    Well, that doesn't access the same file fifteen times. Oh wait, there's more. What happens when it comes time to draw that list element?

    ListElement::DrawIcon()
    {
     if (m_whichIcon == don't know)
     {
      m_whichIcon = GetIcon(m_filename);
     }
     draw the icon for the element
    }
    
    // with this common helper function
    GetIcon(filename)
    {
     if (filename is a directory) {
      return FolderIcon;
     } else {
      return PieceOfPaper;
     }
    }
    

    Okay, getting the icon accesses the file once. You can imagine a similar exercise for getting the file's last-modified time. What else?

    ListElement::GetAuthor()
    {
     if (m_author == don't know) {
      AuthorProvider = LoadAuthorProvider();
      m_author = AuthorProvider->GetFileAuthor(m_filename);
     }
     return m_author;
    }
    
    // where the author provider is implemented in a
    // separate component
    GetFileAuthor(filename)
    {
     if (filename is offline) {
      return "";
     } else {
      ... open the file and get the author ...
     }
    }
    

    Getting the author accesses the file once to see if it is offline, then again to get the actual author (if the file is online).

    So in this simple sketch, we accessed the file a total of five times. It's not like there's a 5 in this program you can change to a 1. Rather, it's a bunch of 1's spread all over the place. (And one of the 1's is in a separate component, the hypothetical Author Provider.)
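
    Here is a rough, runnable version of the pseudocode above with a counter bolted on. FakeNetwork and the function bodies are scaffolding invented for this sketch, and treating the directory enumeration as one of the touches is an assumption of the sketch, not something the pseudocode spells out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>

// Stand-in for the network: counts accesses to one file, per call site.
struct FakeNetwork {
    std::map<std::string, int> accessesBySite;

    void Access(const std::string& site) {
        assert(!site.empty());
        ++accessesBySite[site];
    }

    // Largest number of accesses made by any single call site.
    int MaxPerSite() const {
        int most = 0;
        for (const auto& entry : accessesBySite) {
            most = std::max(most, entry.second);
        }
        return most;
    }

    // Number of distinct call sites that touched the file.
    int SiteCount() const { return static_cast<int>(accessesBySite.size()); }
};

// Each helper was written on its own and touches the file just once.
void EnumerateDirectory(FakeNetwork& net) { net.Access("enumerate directory"); }
void GetIcon(FakeNetwork& net)            { net.Access("GetIcon: is it a directory?"); }
void GetLastModified(FakeNetwork& net)    { net.Access("GetLastModified"); }

// The author provider lives in a separate component and touches it twice,
// from two different places.
void GetFileAuthor(FakeNetwork& net) {
    net.Access("AuthorProvider: is it offline?");
    net.Access("AuthorProvider: open and read author");
}

// Everything that happens in order to display one row of the file listing.
void ShowOneFile(FakeNetwork& net) {
    EnumerateDirectory(net);
    GetIcon(net);
    GetLastModified(net);
    GetFileAuthor(net);
}
```

    After ShowOneFile runs, MaxPerSite is 1 even though several call sites hit the file. There is no loop bound to turn down; each site would have to be changed, or a shared cache introduced, one place at a time.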

    It reminds me of a story I may have read in John Gall's Systemantics: There was a server product that was having problems under load. Once there were more than thirty simultaneous users, the system slowed to a crawl, but the customer needed to support fifty users. At a meeting convened to discuss this problem, an engineer joked, "Well, we just have to search the source code for the #define that says thirty and change it to fifty."

    All the people at the meeting laughed, except one, who earnestly asked, "Yeah, so why don't we do that?"

  • The Old New Thing

    Eating Belgian food at Brouwer's Cafe in Fremont

    Last year, some friends and I went for dinner at Brouwer's Café, a Belgian pub/restaurant in the Fremont neighborhood of Seattle. The menu is pub food, which means that everything comes with frites and a choice of several dipping sauces, none of which is ketchup. One of my friends spent some formative years of her life in the Netherlands, so she was familiar with frites and asked for curry ketchup. Unfortunately, they didn't have it. (But I know a great German deli that does carry curry ketchup...)

    I tried to stay somewhat healthy with a salad, but the croque monsieur pretty much cancelled out any fat-avoidance forgoing the frites may have offered. As we munched on our frites, I wondered how the Belgians managed to eat such profoundly fatty food and not blimp up like Americans. My friends revealed the secret in one word: nicotine.

  • The Old New Thing

    The danger of using boldface for Chinese characters

    Take care not to hard-code boldfacing or italics into your code. Chinese and Japanese characters (and to a somewhat lesser extent, Korean characters) do not suffer many types of boldface well. (The three scripts Chinese, Japanese and Korean are collectively called "CJK" scripts.) Many characters are very intricate, and making the strokes bolder can result in a big blob of ink. Consider the Traditional Chinese character 慶, which means "celebrate". Making it boldface produces a glyph which, depending on which operating system and fonts you have installed, might come out as an unreadable mess. Similarly, CJK scripts do not often use italics on screen, since they also make the characters harder to read.

    What should you do in your program if you need to emphasize something? You should let the localizers choose how they want the emphasis to be performed by allowing them to specify the font face, size and attributes. That way, the localizers for Western languages can specify boldface, whereas those localizing for Chinese or Japanese can specify a larger font size or different font face instead.
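
    One way to picture this is a per-language emphasis style that the program looks up instead of hard-coding "bold". Everything below is illustrative: the EmphasisStyle structure, the locale names, the values, and the fallback to the en-US entry are assumptions of the sketch, and a real program would load these settings from its localized resources.

```cpp
#include <cassert>
#include <map>
#include <string>

// How to render emphasized text. Supplied per language by the localizers.
struct EmphasisStyle {
    std::string fontFace;   // empty means "keep the normal face"
    int sizeDeltaPoints;    // added to the normal font size
    bool bold;
    bool italic;
};

// Hypothetical stand-in for localized resources.
const std::map<std::string, EmphasisStyle>& EmphasisTable() {
    static const std::map<std::string, EmphasisStyle> table = {
        {"en-US", {"", 0, true,  false}},  // Western: plain boldface
        {"zh-TW", {"", 2, false, false}},  // CJK: larger size, no bold
        {"ja-JP", {"", 2, false, false}},
    };
    return table;
}

// Looks up the emphasis style for a locale. Unknown locales fall back to the
// en-US entry, which is an arbitrary choice made for this sketch.
EmphasisStyle EmphasisFor(const std::string& locale) {
    assert(!locale.empty());
    const auto& table = EmphasisTable();
    auto found = table.find(locale);
    return found != table.end() ? found->second : table.at("en-US");
}
```

    The drawing code then asks EmphasisFor for the current language and applies whatever comes back, so switching Chinese from boldface to a larger size is a resource change and not a code change.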

    Different versions of Windows cope with boldface CJK fonts with different degrees of success. Windows 2000, as I recall, used a simplified simulated boldface, which didn't handle CJK fonts very well. Windows XP, I'm told, has an enhanced boldface simulation algorithm for CJK fonts. The result is not perfect, but it is much better than what was available in Windows 2000. Windows Vista finally bridges the gap and provides separate boldface fonts for CJK. These custom boldface fonts have been tuned specifically for this purpose and (I'm told) are quite readable.

  • The Old New Thing

    If you don't trust your administrators, you've already lost

    Occasionally, a customer will ask for a way they can restrict what the administrator can do. The short answer to this is, "Um, no, that's why they're called 'Administrator'." You can try to set up roadblocks, say, ACL files to revoke access to a file you don't want the administrator to read, but the Administrator can always take ownership of the file and read the contents that way. At the end of the day, the Administrator owns the local machine.

    Often, people ask this question because they want to grant certain employees selected subsets of the full set of capabilities available to the Administrator. The way to do this is not to make the user an administrator and then try to rope off the parts you don't want them to use. Rather, you take the things that you do want them to be able to do and delegate that permission and only that permission to them (discretionary access control).

    For more information, check out this column on trustworthy administrators (based, I am told, on a TechEd presentation) by Steve Riley (and his uncredited co-presenter Jesper Johansson).

  • The Old New Thing

    Grammar review: Verb+particle versus compound noun

    Although the inflections and compound-mania are largely absent from the English language, there are still some vestiges of its Germanic roots. One detail of English grammar that I often see neglected is the distinction between the verb+particle and the compound noun.

    Consider the verb phrase "to shut down", which is the one I see misused most often. This is a verb+particle combination and is treated as two words. When you turn it into a noun, however, it becomes "shutdown", one word. This Knowledge Base article, for example, manages to keep its head on straight for most of the article, using the verb+particle for the verb form and the compound for the noun form:

    \\computername: Use this switch to specify the remote computer to shut down.

    /a: Use this switch to quit a shutdown operation.

    But then it slips up towards the end and uses the compound as a verb:

    To schedule the local computer to shutdown and restart at 10:00 P.M. ...

    In other Germanic languages the distinction is clearer. Consider the Swedish and German verbs for "to make up" (as in, "to make up an alibi"):

    Swedish:  hitta på        påhittad
    German:   legen zurecht   zurechtlegen

    In the verb+particle form, the particle comes after the verb, whereas in the single-word form, the particle comes before the verb. It's therefore more obvious when you have one word and when you have two. English does this only rarely, typically for verbs that retain poetic or archaic appeal ("cast down" → "downcast") and therefore reach back to the language's Germanic roots for their power.

    This is one of the reasons why I'm so fascinated by the Germanic languages: The more I learn about the other languages, the more I learn about my own.
