May, 2012

  • The Old New Thing

    The extern "C" specifier disables C++ mangling, but that doesn't mean it disables mangling


    The MSDN documentation on dllexport contains the following enigmatic paragraph, or at least did at the time I wrote this article:

    dllexport of a C++ function will expose the function with C++ name mangling. If C++ name mangling is not desired, either use a .def file (EXPORTS keyword) or declare the function as extern "C".

    I've seen this sentence misinterpreted as follows:

    dllexport of a C++ function will expose the function with C++ name mangling. To disable name mangling either use a .def file (EXPORTS keyword) or declare the function as extern "C".

    This is an understandable misinterpretation, but it is still a misinterpretation.

    The root cause of the misinterpretation is that the author of this documentation was wearing C++-colored glasses. In the author's mind, there are only two interesting cases:

    1. C++ name mangling, where all the cool people are, and
    2. everything else, for all the lamers.

    Here is a precise formulation of the paragraph:

    dllexport of a C++ function will expose the function with C++ name mangling. If C++ name mangling is not desired, either use a .def file (EXPORTS keyword), which will expose the name without mangling, or declare the function as extern "C", which will expose the name with C mangling.

    Here's a version of the paragraph that tries to take away the C++-colored glasses.

    dllexport exposes the function as it is decorated by the compiler. For example, if the function is a C++ function, it will be exposed with C++ name mangling. If the function is a C function, or has been declared as extern "C", it will be exposed with C name mangling. To expose the function under its unmangled name (or to expose it via an alternate name), use a .def file (EXPORTS keyword).
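
    To make the distinction concrete, here is a minimal sketch that exports one function with C++ linkage and one with C linkage. The DEMO_EXPORT and DEMO_STDCALL macros and the function names are made up for illustration, and the decorated names in the comments are what I would expect from the 32-bit x86 Microsoft compiler, not something taken from the documentation quoted above.

```cpp
// Hypothetical macros: only the Microsoft toolchain understands these
// keywords, so they expand to nothing everywhere else.
#if defined(_MSC_VER)
#define DEMO_EXPORT __declspec(dllexport)
#define DEMO_STDCALL __stdcall
#else
#define DEMO_EXPORT
#define DEMO_STDCALL
#endif

// C++ linkage: exported under its C++-mangled name.
// Expected decoration on 32-bit x86 MSVC: ?AddCpp@@YAHHH@Z
DEMO_EXPORT int AddCpp(int a, int b) { return a + b; }

// C linkage: C++ mangling is off, but C decoration still applies.
// Expected decoration on 32-bit x86 MSVC: _AddC@8
// (leading underscore, then @ and the number of argument bytes).
extern "C" DEMO_EXPORT int DEMO_STDCALL AddC(int a, int b) { return a + b; }
```

    If you build this into a DLL, running dumpbin /exports on the result is a quick way to check what your own compiler and settings actually produce. To export the second function as plain AddC, you would still need a .def file.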

    Behind the scenes: To forestall nitpickers, I had to go back to my copy of the C++ standard to make sure I filled in the blank in "The extern "C" _________" correctly. Officially, extern "C" is a linkage specification.

  • The Old New Thing

    GUIDs are designed to be unique, not random


    A customer liaison asked, "My customer is looking for information on the GUID generation algorithm. They need to select N items randomly from a pool of M (jury selection), and their proposed algorithm is to assign each item a GUID, then sort the items by GUID and take the first N." (I've seen similar questions regarding using GUIDs for things like passwords or other situations where the programmer is looking for a way to generate a value that cannot be predicted.)

    The GUID generation algorithm was designed for uniqueness. It was not designed for randomness or for unpredictability. Indeed, if you look at an earlier discussion, you can see that so-called Algorithm 1 is non-random and totally predictable. If you use an Algorithm 1 GUID generator to assign GUIDs to candidates, you'll find that the GUIDs are assigned in numerically ascending order (because the timestamp increases). The customer's proposed algorithm would most likely end up choosing for jury duty the first N people entered into the system after a 32-bit timer rollover. Definitely not random.

    Similarly, the person who wanted to use a GUID for password generation would find that the passwords are totally predictable if you know what time the GUID was generated and which computer generated the GUID (which you can get by looking at the final six bytes from some other password-GUID). Totally-predictable passwords are probably not a good idea.

    Even the Version 4 GUID algorithm (which basically says "set the version to 4 and fill everything else with random or pseudo-random numbers") is not guaranteed to be unpredictable, because the algorithm does not specify the quality of the random number generator. The Wikipedia article for GUID contains primary research which suggests that future and previous GUIDs can be predicted based on knowledge of the random number generator state, since the generator is not cryptographically strong.

    If you want a random number generator, then use a random number generator.
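
    For the jury selection scenario, that might look something like the sketch below, which picks N distinct items out of M with a partial Fisher-Yates shuffle. The function name is hypothetical, and the quality of the selection is only as good as the generator you pass in.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns `count` distinct indices drawn uniformly
// from [0, poolSize) using a partial Fisher-Yates shuffle.
template <typename Rng>
std::vector<std::size_t> PickRandomIndices(std::size_t poolSize,
                                           std::size_t count, Rng& rng)
{
    std::vector<std::size_t> indices(poolSize);
    std::iota(indices.begin(), indices.end(), std::size_t{0});
    count = std::min(count, poolSize);
    for (std::size_t i = 0; i < count; ++i) {
        // Swap a uniformly chosen not-yet-selected item into slot i.
        std::uniform_int_distribution<std::size_t> pick(i, poolSize - 1);
        std::swap(indices[i], indices[pick(rng)]);
    }
    indices.resize(count);  // keep only the selected prefix
    return indices;
}
```

    You could call it as PickRandomIndices(500, 12, rd) with a std::random_device rd, but note that the C++ standard does not promise that std::random_device is cryptographically strong. If the selection has to withstand scrutiny, feed it from a generator that your platform documents as suitable for that purpose.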

    Bonus reading: Eric Lippert's GUID Guide, part 1, part 2, and part 3.

  • The Old New Thing

    What is the historical reason for MulDiv(1, -0x80000000, -0x80000000) returning 2?


    Commenter rs asks, "Why does Windows (historically) return 2 for MulDiv(1, -0x80000000, -0x80000000) while Wine returns zero?"

    The MulDiv function multiplies the first two parameters and divides by the third. Therefore, the mathematically correct answer for MulDiv(1, -0x80000000, -0x80000000) is 1, because a × b ÷ b = a for all nonzero b.

    So both Windows and Wine get it wrong. I don't know why Wine gets it wrong, but I dug through the archives to figure out what happened to Windows.

    First, some background. What's the point of the MulDiv function anyway?

    Back in the days of 16-bit Windows, floating point was very expensive. Most people did not have math coprocessors, so floating point was performed via software emulation. And the software emulation was slow. First, you issued a floating point operation on the assumption that you had a floating point coprocessor. If you didn't, then a coprocessor not available exception was raised. This exception handler had a lot of work to do.

    It decoded the instruction that caused the exception and then emulated the operation. For example, if the bytes at the point of the exception were d9 45 08, the exception handler would have to figure out that the instruction was fld dword ptr ds:[di][8]. It then had to simulate the operation of that instruction. In this case, it would retrieve the caller's di register, add 8 to that value, load four bytes from that address (relative to the caller's ds register), expand them from 32-bit floating point to 80-bit floating point, and push them onto a pretend floating point stack. Then it advanced the instruction pointer three bytes and resumed execution.

    This took an instruction that with a coprocessor would take around 40 cycles (already slow) and ballooned its total execution time to a few hundred, probably thousand cycles. (I didn't bother counting. Those who are offended by this horrific laziness on my part can apply for a refund.)

    It was in this sort of floating point-hostile environment that Windows was originally developed. As a result, Windows has historically avoided using floating point and preferred to use integers. And one of the things you often have to do with integers is scale them by some ratio. For example, a horizontal dialog unit is ¼ of the average character width, and a vertical dialog unit is 1/8 of the average character height. If you have a value of, say, 15 horizontal dlu, the corresponding number of pixels is 15 × average character width ÷ 4. This multiply-then-divide operation is quite common, and that's the model that the MulDiv function is designed to help out with.

    In particular, MulDiv took care of three things that a simple a × b ÷ c didn't. (And remember, we're in 16-bit Windows, so a, b and c are all 16-bit signed values.)

    • The intermediate product a × b was computed as a 32-bit value, thereby avoiding overflow.
    • The result was rounded to the nearest integer instead of truncated toward zero.
    • If c = 0 or if the result did not fit in a signed 16-bit integer, it returned INT_MAX or INT_MIN as appropriate.

    The MulDiv function was written in assembly language, as was most of GDI at the time. Oh right, the MulDiv function was exported by GDI in 16-bit Windows. Why? Probably because they were the people who needed the function first, so they ended up writing it.

    Anyway, after I studied the assembly language for the function, I found the bug. A shr instruction was accidentally coded as sar. The problem manifests itself only for the denominator −0x8000, because that's the only one whose absolute value has the high bit set.

    The purpose of the sar instruction was to divide the denominator by two, so it can get the appropriate rounding behavior when there is a remainder. Reverse-compiling back into C, the function goes like this:

    int16 MulDiv(int16 a, int16 b, int16 c)
    {
     int16 sign = a ^ b ^ c; // sign of result

     // make everything positive; we will apply sign at the end
     if (a < 0) a = -a;
     if (b < 0) b = -b;
     if (c < 0) c = -c;

     //  add half the denominator to get rounding behavior
     uint32 prod = UInt16x16To32(a, b) + c / 2;
     if (HIWORD(prod) >= c) goto overflow;
     int16 result = UInt32Div16To16(prod, c);
     if (result < 0) goto overflow;
     if (sign < 0) result = -result;
     return result;

    overflow:
     // the result does not fit (or c was zero)
     return sign < 0 ? INT_MIN : INT_MAX;
    }

    Given that I've already told you where the bug is, it should be pretty easy to spot in the code above.
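
    If you want to watch the bug happen, here is a small model of the 16-bit routine that lets you choose between the two shift instructions. This is my own sketch with explicit fixed-width types, not the original assembly; the names Abs16 and MulDiv16Model are invented for the purpose.

```cpp
#include <cstdint>

// Absolute value kept in an unsigned 16-bit "register": -0x8000 stays 0x8000.
static uint16_t Abs16(int16_t v)
{
    const uint16_t u = static_cast<uint16_t>(v);
    return v < 0 ? static_cast<uint16_t>(0u - u) : u;
}

// Hypothetical model of the 16-bit routine. Pass buggySar = true to halve the
// denominator with an arithmetic shift (sar), false for a logical one (shr).
int16_t MulDiv16Model(int16_t a, int16_t b, int16_t c, bool buggySar)
{
    const bool negative = ((a ^ b ^ c) < 0);  // sign of the result
    const uint16_t ua = Abs16(a), ub = Abs16(b), uc = Abs16(c);

    // shr shifts in a zero; sar copies the old high bit back in.
    uint16_t half = static_cast<uint16_t>(uc >> 1);
    if (buggySar) half = static_cast<uint16_t>(half | (uc & 0x8000u));

    // 16x16 -> 32 multiply, plus half the denominator for rounding.
    const uint32_t prod = static_cast<uint32_t>(ua) * ub + half;

    // Overflow if the quotient cannot fit; the first test also catches c == 0.
    const bool overflow = (prod >> 16) >= uc || prod / uc > 0x7FFFu;
    if (overflow) return negative ? INT16_MIN : INT16_MAX;

    const int16_t result = static_cast<int16_t>(prod / uc);
    return negative ? static_cast<int16_t>(-result) : result;
}
```

    With the arithmetic shift, half of 0x8000 comes out as 0xC000 instead of 0x4000, so MulDiv16Model(1, -0x8000, -0x8000, true) computes 0x14000 ÷ 0x8000 and returns 2. With the logical shift it returns 1. The same arithmetic, scaled up to 32 bits, gives the 2 that commenter rs asked about.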

    Anyway, when this assembly language function was ported to Win32, it was ported as, well, an assembly language function. And the port was so successful, it even preserved (probably by accident) the sign extension bug.

    Mind you, it's a bug with amazing seniority.

  • The Old New Thing

    What was the registry like in 16-bit Windows?


    Commenter Niels wonders when and how the registry was introduced to 16-bit Windows and how much of it carried over to Windows 95.

    The 16-bit registry was extremely simple. There were just keys, no values. The only hive was HKEY_CLASSES_ROOT. All it was used for was COM objects and file associations. The registry was stored in the REG.DAT file, and its maximum size was 64KB.

    It is my recollection that the registry was introduced in Windows 3.1, but Niels says it's not in a plain vanilla install, so I guess my memory is faulty.

    None of the 16-bit registry code was carried over to Windows 95. Windows 95 extended the registry into kernel mode, added support for values and non-string data types, increased the maximum registry size (though if some people are to be believed, not by enough), and added a bunch of other hives, like HKEY_CURRENT_USER, HKEY_LOCAL_MACHINE, and HKEY_DYN_DATA. The old 16-bit registry code was woefully inadequate for all these new requirements (especially the kernel mode part), so it was all thrown out and a brand new registry written.

    In the early days of the Windows 95 registry, the in-memory signature value to identify the data structures which represent an open registry key was four bytes which corresponded to the ASCII values for the initials of the two programmers who wrote it.

  • The Old New Thing

    Why is the Close button in the upper right corner?

    Chris wants to know how the close button ended up to the right of the minimize and maximize/restore buttons. "In OS/2, it is on the left, which left the two other buttons in place."

    I don't know why the Close button went to the upper right instead of going to the left of the other buttons, but I'm going to guess. (That's what I do around here most of the time anyway; I just don't usually call it out.)

    Two words: Fitts's Law.

    The corners of the screen are very valuable, because users can target them with very little effort. You just slam the mouse in the direction you want, and the cursor goes into the corner. And since closing a window is a much more common operation than minimizing, maximizing, and restoring it, it seems a natural choice to give the close button the preferred location.

    Besides, maximizing and restoring a window already have very large targets, namely the entire caption. You can double-click the caption to maximize, and double-click again to restore. The restore even gets you a little bit of Fitts's Law action because the top of the screen makes the height of the caption bar effectively infinite.

  • The Old New Thing

    Charles Petzold is back with another edition of Programming Windows


    Back in the day (and perhaps still true today), Charles Petzold's Programming Windows was the definitive source for learning to program Windows. The book is so old that even I used it to learn Windows programming, back when everything was 16-bit and uphill both ways. The most recent edition is Programming Windows, 5th Edition, which was published way back in 1998. What has he been doing since then? My guess would have been "sitting on a beach in Hawaiʻi," but apparently he's been writing books on C# and Windows Forms and WPF and Silverlight. Hey, I could still be right: Maybe he writes the books while sitting on a beach in Hawaiʻi.

    It appears that Windows 8 has brought Mr. Petzold back to the topic of Windows programming, and despite his earlier claims that he has no plans to write a sixth edition of Programming Windows, it turns out that he's writing a sixth edition of Programming Windows specifically for Windows 8. (Perhaps he could subtitle his book The New Old Thing.)

    Here's where it gets interesting.

    Before the book officially releases (target date November 15), there will be two pre-release versions in eBook form, one based on the Consumer Preview of Windows 8 and one based on the Release Preview.

    Now it gets really interesting: If you order the Consumer Preview eBook, it comes with free upgrades to the Release Preview eBook as well as the final eBook. (If you order the Release Preview eBook, then it comes with a free upgrade to the final eBook.)

    Can it get even more interesting than that? You bet! Because the price of getting in on the action increases the longer you wait. Act now, and you can get the Consumer Preview eBook (and all the free upgrades that come with it) for just $10. Wait a few weeks, and it'll cost you $20. Wait another few months, and it'll cost you $30; after another few weeks the price goes up to $40, and if you are a lazy bum and wait for the final eBook to be released, it'll cost you $50.

    But in order to take advantage of this offer, you have to follow the instructions on this blog entry from Microsoft Press (and read the mandatory legal mumbo-jumbo, because the lawyers always get their say).

    Bonus chatter: One publisher asked me if I wanted to write a book on programming Windows 8, but I told them that I was too busy shipping Windows 8 to have any extra time to write a book about it. And it's a good thing I turned them down, because imagine if I decided to write the book and found that Charles Petzold was coming out of retirement to write his own book. My book would have done even worse than my first book, which didn't even have any competition!

    Bonus disclaimer: Charles Petzold did not pay me to write this, nor did he offer me a cut of his royalties for shilling his book. But that doesn't mean I won't accept it! (Are you listening, Charles?)

  • The Old New Thing

    Why are the Windows 7 system notification icons colorless?


    Mike wondered why the system notification icons went colorless in Windows 7 and why they went back to regular tooltips instead of the custom tooltips.

    I don't know either, so I asked Larry Osterman, who was in charge of the Volume icon.

    And he didn't know either. He was merely given new icons by the design team.

    But that doesn't stop me from guessing. (Which is what I do most of the time here, I just don't explicitly say that I'm guessing.)

    My guess is that the design team looked at the new Windows 7 taskbar and noticed that all the system-provided pieces were subdued and unobtrusive, with two exceptions: The Start button itself and the notification icons. The Start button kept its bright colors because, well, it's the Start button. But the notification icons? They are peripheral elements; why do they stand out on an otherwise neutral-looking taskbar? Isn't that just drawing the user's attention to something that doesn't deserve attention?

    So boom, make them monochromatic to fit with the other taskbar elements. The clock is monochromatic. The Show Desktop button is monochromatic. The taskbar itself is monochromatic. Hooray, aesthetic unity is restored.

    As for the return to standard tooltips, that's easy: The custom tooltip was a violation of the user interface guidelines.

    The old Windows Vista custom tooltip did not provide any useful information beyond the standard tooltip, so you paid the cost of developing and maintaining a custom tooltip for very little benefit. In the volume tooltip's case, the developers were spending effort fixing little bugs here and there (for example, there were painting glitches under certain accessibility conditions), effort that was detracting from other work that could be done, and switching to the standard tooltip made all the problems go away.

  • The Old New Thing

    Why is there sometimes a long delay between pressing a hotkey for a shortcut and opening the shortcut?


    Via a customer liaison, we received a problem report from a customer.

    The customer is facing issues with delayed responses to opening a .lnk file by pressing its keyboard shortcut hotkey. This delay does not appear when the shortcut is double-clicked.

    For example, the customer has created a shortcut to Notepad and assigned it the shortcut Ctrl+Alt+X. Pressing the keyboard combination sometimes takes 5 or 6 seconds for Notepad to open. As noted above, double-clicking on the shortcut causes Notepad to open without delay.

    This issue is not consistently reproducible, but it appears to be independent of the shortcut file itself. Any shortcut with a hotkey exhibits this problem.

    All the shortcuts in question are on the desktop.

    The short answer is "There is a program running on your machine that occasionally stops responding to messages. If you press a shortcut hotkey during those moments, you will experience this problem. Identify the program that stops responding to messages and fix it."

    Okay, that sort of cuts to the chase, but the interesting part is the journey, not the destination.

    First, observe that if you associate a hotkey with a shortcut to, say, Notepad, and you press the hotkey twice, you do not get two copies of Notepad. The first time you press the hotkey, Notepad launches, but the second time you press the hotkey, focus is put on the existing copy of Notepad. This is one of those things that's so natural you may not even realize that it's happening.

    When you press the hotkey assigned to a shortcut, Explorer receives the hotkey and needs to decide what to do about it. Before it can launch the shortcut, it needs to see if the shortcut target already has a window open, in which case it should just switch to that window.

    Finding out whether a window has a hotkey is done by sending the window the WM_GETHOTKEY message. When you press a hotkey that is assigned to a shortcut, Explorer goes to all the windows already on the screen and asks each one, "Hey, what's your hotkey?" If any window says, "My hotkey is Ctrl+Alt+X," then Explorer says, "Oh, sorry to step on your toes. The user pressed your hotkey, so here, go ahead and take focus."

    If no window cops to having Ctrl+Alt+X as its hotkey, Explorer says, "Okay, well, then I guess I have to make one." It launches Notepad and tells it, "Oh, and your hotkey is Ctrl+Alt+X."

    If there is a window that is not responding to messages, then when Explorer asks it, "Hey, what's your hotkey?", the window just sits there and doesn't answer. After about three seconds, Explorer gives up. "Yeesh, I was just asking a question. Don't have to give me the silent treatment."

    And that petulant window is the source of the 3-second delay. It takes Explorer 3 seconds before it finally gives up and says, "Forget it. Even if that was somebody's hotkey, they're being a jerk, so I'm just going to pretend they didn't have a hotkey. Let me open a new window instead and just deal with the hotkey conflict."
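
    The conversation above can be boiled down to a toy model. To be clear, this is not Explorer's code: the ToyWindow and HotkeyOutcome types are invented, the windows are plain structs instead of real window handles, and charging a separate timeout for each unresponsive window is an assumption of the model.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins for real windows; none of this is Explorer's actual code.
struct ToyWindow {
    bool responding;  // does the window answer messages right now?
    uint16_t hotkey;  // what it would answer to WM_GETHOTKEY (0 = none)
};

struct HotkeyOutcome {
    int activatedIndex;  // window that claimed the hotkey, or -1 to launch anew
    unsigned delayMs;    // time spent waiting on unresponsive windows
};

// Ctrl+Alt+X in the WM_GETHOTKEY layout: virtual key 'X' (0x58) in the low
// byte, Ctrl (0x02) | Alt (0x04) modifier flags in the high byte.
const uint16_t kCtrlAltX = 0x0658;

// Ask each window for its hotkey, giving up on a silent one after timeoutMs.
HotkeyOutcome ResolveHotkey(const std::vector<ToyWindow>& windows,
                            uint16_t pressed, unsigned timeoutMs = 3000)
{
    HotkeyOutcome outcome{-1, 0};
    for (std::size_t i = 0; i < windows.size(); ++i) {
        if (!windows[i].responding) {
            outcome.delayMs += timeoutMs;  // the silent treatment
            continue;                      // pretend it has no hotkey
        }
        if (windows[i].hotkey == pressed) {
            outcome.activatedIndex = static_cast<int>(i);
            break;                         // hand focus to this window
        }
    }
    return outcome;
}
```

    Feed it a list in which one window is not responding and nobody else claims kCtrlAltX, and you get back an activatedIndex of -1 (launch a new Notepad) along with a delay of 3000 milliseconds.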

  • The Old New Thing

    How to view the stack of threads that were terminated as part of process teardown from the kernel debugger


    As we saw some time ago, process shutdown is a multi-phase affair. After you call ExitProcess, all the threads are forcibly terminated. After that's done, each DLL is sent a DLL_PROCESS_DETACH notification. You may be debugging a problem with DLL_PROCESS_DETACH handling that suggests that some of those threads were not cleaned up properly. For example, you might assert that a reference count is zero, and you find during process shutdown that this assertion sometimes fires. Maybe you terminated a thread before it got a chance to release its reference? How can you test this theory if the thread is already gone?

    It so happens that when all the threads are terminated during the early phase of process shutdown, the kernel is a bit lazy and doesn't free their stacks. It figures, hey, the entire process is going away soon, so the stack memory is going to be cleaned up as part of process termination. (It's sort of the kernel equivalent of not bothering to sweep the floor of a building that's about to be demolished.) You can use this to your advantage by grovelling the stacks that were left behind.

    Hey, this is why you get called in to debug the hard stuff, right?

    Before continuing, I need to emphasize that this information is for debugging purposes only. The structures and offsets are all implementation details which can change from release to release.

    The first step is to identify where all the stacks are. The direct approach is difficult because the stacks can be all different sizes, so it's not easy to pick them out of a line-up. But one thing does come in a consistent size: The TEB.

    From the kernel debugger, use the !process command to dump the process you are interested in, and from the header information, extract the VadRoot.

    1: kd> !process -1
    PROCESS 8731bd40  SessionId: 1  Cid: 0748    Peb: 7ffda000  ParentCid: 0620
        DirBase: 4247b000  ObjectTable: 96f66de0  HandleCount: 104.
        Image: oopsie.exe
        VadRoot 893de570 Vads 124 Clone 0 Private 518. Modified 643. Locked 0.
        DeviceMap 995628c0

    Dump this VAD root with the !vad command, and pay attention only to the entries which say 1 Private READWRITE.

    1: kd> !vad 893de570
    VAD     level      start      end    commit
    ... ignore everything except "1 Private READWRITE" ...
    8730a5f0 ( 6)         50       50         1 Private      READWRITE
    9ab0cb40 ( 5)         60       7f         1 Private      READWRITE
    893978b0 ( 6)         80       9f         1 Private      READWRITE
    87302d30 ( 5)        110      110         1 Private      READWRITE
    889693f8 ( 6)        120      121         1 Private      READWRITE
    872f3fb8 ( 6)        170      170         1 Private      READWRITE
    87089a80 ( 6)        1a0      1a0         1 Private      READWRITE
    8cbf1cb0 ( 5)        1c0      1df         1 Private      READWRITE
    88c079d0 ( 6)        1e0      1e0         1 Private      READWRITE
    9abc33e0 ( 6)        410      48f         1 Private      READWRITE
    873173b0 ( 7)        970      970         1 Private      READWRITE
    8ca1c158 ( 7)      7ffd5    7ffd5         1 Private      READWRITE
    88c02a78 ( 6)      7ffd6    7ffd6         1 Private      READWRITE
    872f9298 ( 5)      7ffd7    7ffd7         1 Private      READWRITE
    8750d210 ( 7)      7ffd8    7ffd8         1 Private      READWRITE
    87075ce8 ( 6)      7ffda    7ffda         1 Private      READWRITE
    87215da0 ( 4)      7ffdc    7ffdc         1 Private      READWRITE
    872f2200 ( 6)      7ffdd    7ffdd         1 Private      READWRITE
    8730a670 ( 5)      7ffdf    7ffdf         1 Private      READWRITE

    (If you are debugging from user mode, then you can use !vadump but the output format is different.)

    Each of these is a candidate TEB. In practice, TEBs tend to be allocated at the high end of memory, so the ones with a low start value are probably red herrings. Therefore, you should investigate these candidates in reverse order.

    For each candidate, take the start address and append three zeroes. (Each page on x86 is 4KB, which conveniently maps to 1000 in hex.) Dump the first seven pointers of the TEB with the dp xxxxx000 L7 command.

    1: kd> dp 7ffdf000 L7
    7ffdf000  0016fbb0 00170000 0016b000 00000000
    7ffdf010  00001e00 00000000 7ffdf000 ← hit

    If the TEB is valid, then the seventh pointer points back to the start of the TEB. In a valid TEB, the second and third values are the stack limits; in this case, the candidate stack lives between 0016b000 and 00170000. (As a double-check, you can verify that the upper limit of the stack, 00170000 in this case, matches up with the end of a VAD allocation in the !vad output above.)
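
    If you end up checking a lot of candidates, the test can be written down as a tiny predicate over the seven values that dp prints. The function name is hypothetical, and the comparison of the two stack limits is an extra sanity check of my own on top of the self-pointer test described above.

```cpp
#include <array>
#include <cstdint>

// Hypothetical check for a 32-bit TEB candidate, given the candidate's base
// address and the seven values printed by "dp <base> L7".
bool LooksLikeTeb32(uint32_t base, const std::array<uint32_t, 7>& words)
{
    const uint32_t stackUpper = words[1];  // second value: upper stack limit
    const uint32_t stackLower = words[2];  // third value: lower stack limit
    const uint32_t self = words[6];        // seventh value: points back to TEB

    // Assumption: a live stack has its lower limit below its upper limit.
    return self == base && stackLower < stackUpper;
}
```

    For the dump shown above, base is 7ffdf000 and the seventh value is also 7ffdf000, so the candidate passes, and the stack to investigate runs from 0016b000 up to 00170000.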

    Now that you know where the stack is, you can dps it and look for EBP frames. (I usually start about two to four pages below the upper limit of the stack.) Test out each candidate EBP frame with the k= command until you find one that seems to be solid. Record this candidate stack trace in a text file for further study.

    Repeat for each candidate TEB, and you will eventually reconstruct what each thread in the process was doing at the moment it was terminated. If you're really lucky, you might even see the code that incremented the reference count but was terminated before it could release it.

    The above discussion also applies to debugging 64-bit processes. However, instead of looking for 1 Private READWRITE pages, you want to look for 2 Private READWRITE pages. As an additional wrinkle, if you are debugging ia64, then converting a page frame to a linear address is sadly not as simple as appending three zeroes. Pages on ia64 are 8KB, not 4KB, so you need to shift the value left by 13 bits: Add three zeroes and then multiply by two.

    And finally, if you are debugging a 32-bit process on x64, then you want to look for 3 Private READWRITE pages, but add 2 before appending the three zeroes. That's because the TEB for a 32-bit process on x64 is really two TEBs glued together: A 64-bit TEB followed by a 32-bit TEB.
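
    The address arithmetic for the various cases can be summarized in a few lines. This is only a sketch of the conversions described above; the TebLayout enumeration and the function name are made up, and like everything else here, the details are for debugging purposes only.

```cpp
#include <cstdint>

// Hypothetical labels for the cases discussed in the text.
enum class TebLayout { X86, X64, Ia64, Wow64 };

// Turn a "start" value from the !vad output into the linear address at
// which to look for the TEB.
uint64_t TebAddressFromVadStart(uint64_t start, TebLayout layout)
{
    switch (layout) {
    case TebLayout::Ia64:
        return start << 13;        // 8KB pages: three zeroes, then times two
    case TebLayout::Wow64:
        return (start + 2) << 12;  // add 2 to skip the 64-bit TEB in front
    case TebLayout::X86:
    case TebLayout::X64:
    default:
        return start << 12;        // 4KB pages: append three zeroes
    }
}
```

    For example, the start value 7ffdf from the x86 listing becomes 7ffdf000, which is the address that was passed to dp earlier.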

    Note: I did not come up with this debugging technique on my own. I learned it from an even greater debugging genius.

    Next time, we'll look at debugging this issue from a user-mode debugger.

    Trivia: The informal term for these terminated-but-not-yet-completely-destroyed threads is ghost threads. The term was coined by the Exchange support team, because they often have to study server failures that require them to do this type of investigation, and they needed a cute name for it.

  • The Old New Thing

    How does the MultiByteToWideChar function treat invalid characters?


    The MB_ERR_INVALID_CHARS flag controls how the Multi­Byte­To­Wide­Char function treats invalid characters. Some people claim that the following sentences in the documentation are contradictory:

    • "Starting with Windows Vista, the function does not drop illegal code points if the application does not set the flag."
    • "Windows XP: If this flag is not set, the function silently drops illegal code points."
    • "The function fails if MB_ERR_INVALID_CHARS is set and an invalid character is encountered in the source string."

    Actually, the three sentences are talking about different cases. The first two talk about what happens if you omit the flag; the third talks about what happens if you include the flag.

    Since people seem to like tables, here's a description of the MB_ERR_INVALID_CHARS flag in tabular form:

    MB_ERR_INVALID_CHARS set?   Operating system   Treatment of invalid character
    Yes                         Any                Function fails
    No                          XP and earlier     Character is dropped
    No                          Vista and later    Character is not dropped

    Here's a sample program that illustrates the possibilities:

    #include <windows.h>
    #include <ole2.h>
    #include <windowsx.h>
    #include <commctrl.h>
    #include <strsafe.h>
    #include <uxtheme.h>
    #include <stdio.h>

    void MB2WCTest(DWORD flags)
    {
     WCHAR szOut[256];
     // \xC0 is not a valid UTF-8 lead byte; \x41\x42 is "AB"
     int cch = MultiByteToWideChar(CP_UTF8, flags,
                                   "\xC0\x41\x42", 3, szOut, 256);
     printf("Called with flags %d\n", flags);
     printf("Return value is %d\n", cch);
     for (int i = 0; i < cch; i++) {
      printf("Value[%d] = %d\n", i, szOut[i]);
     }
    }

    int __cdecl main(int argc, char **argv)
    {
     MB2WCTest(0);
     MB2WCTest(MB_ERR_INVALID_CHARS);
     return 0;
    }

    If you run this on Windows XP, you get

    Called with flags 0
    Return value is 2
    Value[0] = 65
    Value[1] = 66
    Called with flags 8
    Return value is 0

    This demonstrates that passing the MB_ERR_INVALID_CHARS flag causes the function to fail, and omitting it causes the invalid character \xC0 to be dropped.

    If you run this on Windows Vista, you get

    Called with flags 0
    Return value is 3
    Value[0] = 65533
    Value[1] = 65
    Value[2] = 66
    Called with flags 8
    Return value is 0

    This demonstrates again that passing the MB_ERR_INVALID_CHARS flag causes the function to fail, but this time, if you omit the flag, the invalid character \xC0 is converted to U+FFFD, which is REPLACEMENT CHARACTER. (Note that it does not appear to be documented precisely what happens to invalid characters, aside from the fact that they are not dropped. Perhaps code pages other than CP_UTF8 convert them to some other default character.)
