• The Old New Thing

    Standard handles are really meant for single-threaded programs


    When I discussed the conventions for managing standard handles, Sven2 noted that I implied that you need to call Set­Std­Handle with a new handle if you close the current handle and asked "Wouldn't it make more sense to call it the other way round? I.e., first set the new handle, then close the old one? It would ensure that any other thread that runs in parallel won't get an invalid handle."

    Yes, that would make more sense, but only by a little bit. If you have one thread changing a standard handle at the same time another thread tries to use it, you still have the race condition that Cesar noted: the thread that gets the standard handle can have the handle closed out from under it. All you did was narrow the window a little bit.

    This is basically a fundamental limitation of the standard handles. They are a shared process-wide resource, and if you're going to be mucking with them from multiple threads, it's your responsibility to apply whatever synchronization you need in order to avoid the problems associated with messing with process-wide resources. (This is similar to the problem with inherited handles and Create­Process.)

    The most common way of ensuring that one thread doesn't change a standard handle while another thread is using it is simple: Never change a standard handle. Consider standard handles a setting provided by the parent process. If the parent process says that standard output should go to a particular place, then send it to that place. Don't try to override the decision and send it somewhere else.

    If you really must change a standard handle, you'd be best off doing so right at the start before you start kicking off background threads. Another model you might try is to treat the initial thread as the "console UI thread" and make that the only thread that can communicate with the standard handles. Background threads can do work, but if they want to write to standard output or read from standard input, they need to ask the main thread to do it. This is probably a good plan anyway, because it avoids messy interleaved output as well as confusing input. (If two threads read from standard input at the same time, it's not clear to the user which thread their input will go to.)
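    The "console UI thread" model described above doesn't depend on anything Windows-specific, so here is a minimal Python sketch of the structure (an illustration of the pattern, not Win32 code): background threads hand their output to a queue, and only the main thread ever touches standard output.

```python
import queue
import threading

def worker(out_queue, worker_id):
    # Background threads do work but never touch standard output
    # directly; they hand finished lines to the main thread instead.
    out_queue.put("worker %d: done" % worker_id)

def run_console_ui_thread(worker_count):
    out_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(out_queue, i))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Only this (the main) thread ever writes to standard output,
    # so output lines are never interleaved mid-write.
    lines = []
    while not out_queue.empty():
        lines.append(out_queue.get())
    for line in lines:
        print(line)
    return lines

run_console_ui_thread(3)
```

    In a long-running program you would have the main thread block on the queue in a loop rather than drain it after a join, but the division of labor is the same: workers produce, one designated thread talks to the standard handles.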

    Personally, I would recommend combining both approaches: (1) Never change a standard handle, and (2) Restrict all usage of standard handles to your main thread to avoid problems with interleaving.

  • The Old New Thing

    Microspeak: All-up


    Here are some citations. Let's see if we can figure out what it means.

    I think a presentation of these results would be a fun boost for the team. Is this something we should handle in a bunch of teams' weekly meetings, or should we do something all up?

    In the first citation, all up appears to mean "with everybody all together."

    We're looking for an all-up view of the various compatibility mitigations we have related to this feature.

    In the second citation, all up could mean "overview" or "detailed summary". Not sure yet. Let's keep looking.

    From the all up performance effort, we've settled on the approach below.

    Okay, this seems to suggest that all up refers to an aggregation of individual items. Let's try again:

    We have a number of channels for disseminating information. I think an all up destination could play a key and proactive role in major announcements such as the one from last week.

    Here, all up appears to mean "consolidated, comprehensive". Let's keep going.

    Document title: XYZ All Up Glossary

    This document is a glossary. Presumably it is a glossary of terms you may encounter throughout the entire XYZ project. One last citation, this one from a status report:

    • This week: Created Customer All up report.
    • Next week: Update Customer all up report with more customer related information.

    Okay, this didn't actually tell me much about what an all up report is, which is kind of a bummer because I was asked to create an all up report, and I still don't know if what I created is what the person wanted.

    (I ended up creating a report that summarized the status of every team, and called out issues that were noteworthy or reasons for concern. The person who asked for the report didn't complain, so I guess that was close enough to what they wanted that they didn't bother asking for more.)

  • The Old New Thing

    Enumerating cyclical decompositions with Stirling numbers


    This whole enumeration nightmare-miniseries started off with Stirling numbers of the second kind. But what about Stirling numbers of the first kind? Those things ain't gonna enumerate themselves!

    The traditional formulation of the recursion for Stirling numbers of the first kind (unsigned version, since it's hard to enumerate negative numbers) goes like this:

    c(n + 1, k) = n · c(n, k) + c(n, k − 1).

    although it is more convenient from a programming standpoint to rewrite it as

    c(n, k) = (n − 1) · c(n − 1, k) + c(n − 1, k − 1).

    The Wikipedia article explains the combinatorial interpretation, which is what we will use to enumerate all the possibilities.

    • The first term says that we recursively generate the ways of decomposing n − 1 items into k cycles, then insert element n in one of n − 1 ways.
    • The second term says that we recursively generate the ways of decomposing n − 1 items into k − 1 cycles, then add a singleton cycle of just n.

    function Stirling1(n, k, f) {
     if (n == 0 && k == 0) { f([]); return; }
     if (n == 0 || k == 0) { return; }
     // second term
     Stirling1(n-1, k-1, function(s) { f(s.concat([[n]])); });
     // first term
     Stirling1(n-1, k, function(s) {
      for (var i = 0; i < s.length; i++) {
       for (var j = s[i].length; j > 0; j--) {
        f(s.map(function(e, index) {
         return i == index ? e.slice(0, j).concat(n, e.slice(j)) : e;
        }));
       }
      }
     });
    }

    Stirling1(5, 3, function(s) { console.log(JSON.stringify(s)); });

    The inner loop could just as easily have gone

       for (var j = 0; j < s[i].length; j++) {

    but I changed the loop control for style points. (It makes the output prettier.)
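    As a sanity check on the enumeration, the recurrence itself is easy to compute directly. A short Python sketch (a translation for verification, not part of the original post): the Stirling1(5, 3, ...) call above should invoke its callback exactly c(5, 3) = 35 times.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def c(n, k):
    # Unsigned Stirling numbers of the first kind, via the
    # recurrence from the article:
    #   c(n, k) = (n - 1) * c(n - 1, k) + c(n - 1, k - 1)
    if n == 0 and k == 0:
        return 1
    if n == 0 or k == 0:
        return 0
    return (n - 1) * c(n - 1, k) + c(n - 1, k - 1)

assert c(5, 3) == 35

# Every permutation of n items decomposes uniquely into cycles,
# so summing over all k recovers n! (here 5! = 120).
assert sum(c(5, k) for k in range(6)) == 120
```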

  • The Old New Thing

    Why is 0x00400000 the default base address for an executable?


    The default base address for a DLL is 0x10000000, but the default base address for an EXE is 0x00400000. Why that particular value for EXEs? What's so special about 4 megabytes?

    It has to do with the amount of address space mapped by a single page directory entry on an x86 and a design decision made in 1987.

    The only technical requirement for the base address of an EXE is that it be a multiple of 64KB. But some choices for base address are better than others.

    The goal in choosing a base address is to minimize the likelihood that modules will have to be relocated. This means not colliding with things already in the address space (which will force you to relocate) as well as not colliding with things that may arrive in the address space later (forcing them to relocate). For an executable, the not colliding with things that may arrive later part means avoiding the region of the address space that tends to fill with DLLs. Since the operating system itself puts DLLs at high addresses and the default base address for non-operating system DLLs is 0x10000000, this means that the base address for the executable should be somewhere below 0x10000000, and the lower you go, the more room you have before you start colliding with DLLs. But how low can you go?

    The first part means that you also want to avoid the things that are already there. Windows NT didn't have a lot of stuff at low addresses. The only thing that was already there was a PAGE_NOACCESS page mapped at zero in order to catch null pointer accesses. Therefore, on Windows NT, you could base your executable at 0x00010000, and many applications did just that.

    But on Windows 95, there was a lot of stuff already there. The Windows 95 virtual machine manager permanently maps the first 64KB of physical memory to the first 64KB of virtual memory in order to avoid a CPU erratum. (Windows 95 had to work around a lot of CPU bugs and firmware bugs.) Furthermore, the entire first megabyte of virtual address space is mapped to the logical address space of the active virtual machine. (Nitpickers: actually a little more than a megabyte.) This mapping behavior is required by the x86 processor's virtual-8086 mode.

    Windows 95, like its predecessor Windows 3.1, runs Windows in a special virtual machine (known as the System VM), and for compatibility it still routes all sorts of things through 16-bit code just to make sure the decoy quacks the right way. Therefore, even when the CPU is running a Windows application (as opposed to an MS-DOS-based application), it still keeps the virtual machine mapping active so it doesn't have to do page remapping (and the expensive TLB flush that comes with it) every time it needs to go to the MS-DOS compatibility layer.

    Okay, so the first megabyte of address space is already off the table. What about the other three megabytes?

    Now we come back to that little hint at the top of the article.

    In order to make context switching fast, the Windows 3.1 virtual machine manager "rounds up" the per-VM context to 4MB. It does this so that a memory context switch can be performed by simply updating a single 32-bit value in the page directory. (Nitpickers: You also have to mark instance data pages, but that's just flipping a dozen or so bits.) This rounding causes us to lose three megabytes of address space, but given that there was four gigabytes of address space, a loss of less than one tenth of one percent was deemed a fair trade-off for the significant performance improvement. (Especially since no applications at the time came anywhere near beginning to scratch the surface of this limit. Your entire computer had only 2MB of RAM in the first place!)

    This memory map was carried forward into Windows 95, with some tweaks to handle separate address spaces for 32-bit Windows applications. Therefore, the lowest address an executable could be loaded on Windows 95 was at 4MB, which is 0x00400000.
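    The arithmetic behind that single page directory entry checks out (a quick sketch, not from the original article):

```python
# On x86 with 4KB pages, one page directory entry points at one page
# table, which holds 1024 entries, each mapping a 4096-byte page.
PAGE_SIZE = 4096
ENTRIES_PER_PAGE_TABLE = 1024

span_per_pde = ENTRIES_PER_PAGE_TABLE * PAGE_SIZE
assert span_per_pde == 4 * 1024 * 1024   # 4MB covered per page directory entry
assert span_per_pde == 0x00400000        # the default EXE base address
```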

    Geek trivia: To prevent Win32 applications from accessing the MS-DOS compatibility area, the flat data selector was actually an expand-down selector which stopped at the 4MB boundary. (Similarly, a null pointer in a 16-bit Windows application would result in an access violation because the null selector is invalid. It would not have accessed the interrupt vector table.)

    The linker chooses a default base address for executables of 0x00400000 so that the resulting binary can load without relocation on both Windows NT and Windows 95. Nobody really cares much about targeting Windows 95 any more, so in principle, the linker folks could choose a different default base address now. But there's no real incentive to do so aside from making diagrams look prettier, especially since ASLR makes the whole issue moot anyway. And besides, if they changed it, then people would be asking, "How come some executables have a base address of 0x00400000 and some executables have a base address of 0x00010000?"

    TL;DR: To make context switching fast.

  • The Old New Thing



    The exception code EXCEPTION_INT_DIVIDE_BY_ZERO (and its doppelgänger STATUS_INTEGER_DIVIDE_BY_ZERO) is raised, naturally enough, when the denominator of an integer division is zero.

    The x86 and x64 processors also raise this exception when you divide INT_MIN by −1, or more generally, when the result of a division does not fit in the destination. The division instructions for those processors take a 2N-bit dividend and an N-bit divisor, and they produce an N-bit quotient and an N-bit remainder. Values of N can be 8, 16, and 32; the 64-bit processors also support 64. And the division can be signed or unsigned. Therefore, you can get this exception if you try to divide, say, 2³² by 1, using a 64-bit dividend and 32-bit divisor. The quotient is 2³², which does not fit in a 32-bit quotient.
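    Here is a small Python model of the unsigned case (an illustration of the instruction's contract, not how the processor is implemented), showing that 2³² ÷ 1 into a 32-bit destination faults just like division by zero does:

```python
class DivideError(Exception):
    """Models the x86 #DE fault: the same fault is raised for
    divide-by-zero and for a quotient that overflows the destination."""

def x86_unsigned_div(dividend, divisor, n=32):
    # Models the unsigned DIV instruction: 2N-bit dividend, N-bit
    # divisor, producing an N-bit quotient and N-bit remainder.
    if divisor == 0:
        raise DivideError("divide by zero")
    quotient, remainder = divmod(dividend, divisor)
    if quotient >= 2 ** n:
        raise DivideError("quotient does not fit in %d bits" % n)
    return quotient, remainder
```

    The kernel sees only the one fault; as the article notes, Windows NT then inspects the divisor to decide which of the two exception codes to report.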

    The Windows 95 kernel does not attempt to distinguish between division overflow and division by zero. It just converts the processor exception to EXCEPTION_INT_DIVIDE_BY_ZERO and calls it a day.

    The Windows NT kernel realizes that the underlying processor exception is ambiguous and tries to figure out why the division operation failed. If the divisor is zero, then the exception is reported as EXCEPTION_INT_DIVIDE_BY_ZERO. If the divisor is nonzero, then the exception is reported as EXCEPTION_INT_OVERFLOW.

    Another place that EXCEPTION_INT_OVERFLOW can arise from a processor exception is if an application issues the INTO instruction and the overflow flag is set.

    Puzzle: The DIV and IDIV instructions support a divisor in memory. What happens if the memory becomes inaccessible after the processor raises the exception but before the kernel can read the value in order to check whether it is zero? What other things could go wrong?

  • The Old New Thing

    You can name your car, and you can name your kernel objects, but there is a qualitative difference between the two


    A customer reported that the Wait­For­Single­Object function appeared to be unreliable.

    We have two threads, one that waits on an event and the other that signals the event. But we found that sometimes, signaling the event does not wake up the waiting thread. We have to signal it twice. What are the conditions under which Wait­For­Single­Object will ignore a signal?

    // cleanup and error checking elided for expository purposes
    void Thread1()
    {
      // Create an auto-reset event, initially unsignaled
      HANDLE eventHandle = CreateEvent(NULL, FALSE, FALSE, TEXT("MyEvent"));
      // Kick off the background thread and give it the handle
      CreateThread(..., Thread2, eventHandle, ...);
      // Wait for the event to be signaled
      WaitForSingleObject(eventHandle, INFINITE);
    }

    DWORD CALLBACK Thread2(void *eventHandle)
    {
     ResetEvent(eventHandle); // start with a clean slate
     // All the calls to SetEvent succeed.
     SetEvent(eventHandle); // this does not always wake up Thread1
     SetEvent(eventHandle); // need to add this line
     return 0;
    }

    Remember, you generally shouldn't start with the conspiracy theory. The problem is most likely close to home.

    People offered a variety of theories as to what may be wrong. One possibility is that some other code in the process is calling Reset­Event on the event handle. Another is that some other code in the process has a bug where it is calling Reset­Event on the wrong event handle.

    I asked about the name.

    I have a friend who names her car. Whenever she gets a new car, she agonizes over what to call it. She'll drive it for a few days to see what its personality is and eventually choose a name that suits the vehicle. And thereafter, whenever she refers to her car, she uses the name. (She also assigns the car a gender.)

    If you like naming your car, then that's great. But there's a difference between naming your car and naming your kernel objects. When you give your car a name, that name is just for your private use. On the other hand, if you give your kernel object a name, other people can use that name to access your object. And once they have access to your object, they can do funky things to it, like reset it.

    Imagine if you decided to name your car Clara, and any time somebody shouted, "Clara, where are you?" your car horn honked. I'm assuming your car has voice recognition software. Also that your car has the personality of a puppy. Work with me here.

    Even scarier: Any time somebody shouted, "Clara, open the trunk," your car trunk unlocked.

    That's what happens when you name your kernel objects. Anybody who knows the name (and has appropriate access) can open the object and start doing things to it. Presumably that's why you named your kernel object in the first place: You want this to happen. You gave your object a name specifically to allow other people to come in and access the same object.

    In the above example, I saw that the event had a very generic-sounding name, My­Event. That sounds like the name that some other similarly uncreative application developer might have chosen.

    And indeed, that was the reason. There was another application which was creating an event that coincidentally had the same name, so instead of creating a new object, the kernel returned a handle to the existing one. The other application called Wait­For­Single­Object on the event, and so when the customer's program called Set­Event, it woke the other application instead. So this bug has a double-whammy: Not only does it cause your program to miss a signal, it causes the other program to receive a signal when it wasn't expecting one. Two bugs for the price of one.

    Note that no matter how clever you are at choosing a name for your event, you will always have this problem, because even if you called it Super­Secret­Never­Gonna­Find­It­75, there's a program out there that knows the secret name: Namely your own program! If you run two copies of your program, they will both be manipulating the same Super­Secret­Never­Gonna­Find­It­75, and then you're back where you started. When the first copy of the program calls Set­Event, it may wake up the second copy.
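    A toy model of the collision makes the mechanism clear (hypothetical Python, nothing like the real kernel's namespace code): creating a named object whose name is already in use hands back the existing object instead of a new one.

```python
# A tiny model of a kernel's named-object table.
_named_objects = {}

def create_event(name=None):
    if name is None:
        return object()  # anonymous: always a fresh object
    # Named: return the existing object if the name is taken,
    # otherwise create and register a new one.
    return _named_objects.setdefault(name, object())

app1_event = create_event("MyEvent")
app2_event = create_event("MyEvent")   # another uncreative app, or a second copy of yours
assert app1_event is app2_event        # they now share one event

# Unnamed events can never collide.
assert create_event() is not create_event()
```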

    (This is the same principle behind the conclusion that a single-instance program is its own denial of service.)

    Kernel objects should not be named unless you intend them to be shared, because once you name them, you open yourself to issues like this. If you name a kernel object, it must be because you want another process to access it, not because you think giving it a name is kind of cute.

    I suspect a lot of people give their kernel objects names not because they intend them to be shared, but because they see that the Create­Event function has an lpName parameter, and they think, "Well, I guess giving it a name would be nice. Maybe I can use it for debugging purposes or something," not realizing that giving it a name actually introduces a bug. Another possibility is that they see that there is an lpName parameter and think, "Gosh, I must give this event a name."

    Kernel object names are optional. Don't give them a name unless you intend them to be shared.

  • The Old New Thing

    Why does misdetected Unicode text tend to show up as Chinese characters?


    If you take an ASCII string and cast it to Unicode,¹ the results are usually nonsense Chinese. Why does ASCII→Unicode mojibake result in Chinese? Why not Hebrew or French?

    The Latin alphabet in ASCII lives in the range 0x41 through 0x7A. If this gets misinterpreted as UTF-16LE, the resulting characters are of the form U+XXYY where XX and YY are in the range 0x41 through 0x7A. Generously speaking, this means that the results are in the range U+4141 through U+7A7A. This overlaps the following Unicode character ranges:

    • CJK Unified Ideographs Extension A (U+3400 through U+4DBF)
    • Yijing Hexagram Symbols (U+4DC0 through U+4DFF)
    • CJK Unified Ideographs (U+4E00 through U+9FFF)

    But you never see the Yijing hexagram symbols because that would require YY to be in the range 0xC0 through 0xFF, which is not valid ASCII. That leaves only CJK Unified Ideographs of one sort or another.
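    You can reproduce the effect directly; here is a short Python demonstration (Python's codec names stand in for the Windows-style cast):

```python
# "Hi" as ASCII is the bytes 48 69; read as one little-endian UTF-16
# code unit, that becomes U+6948, a CJK unified ideograph.
mojibake = "Hi".encode("ascii").decode("utf-16-le")
assert mojibake == "\u6948"

# Any pair of Latin letters lands in U+4141..U+7A7A, which falls
# within the CJK ideograph blocks listed above.
for pair in ("AB", "no", "zz"):
    ch = pair.encode("ascii").decode("utf-16-le")
    assert "\u4141" <= ch <= "\u7a7a"
```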

    That's why ASCII misinterpreted as Unicode tends to result in nonsense Chinese.

    The CJK Unified Ideographs are by far the largest single block of Unicode characters in the BMP, so just by purely probabilistic arguments, a random character in BMP is most likely to be Chinese. If you look at a graphic representation of what languages occupy what parts of the BMP, you'll see that it's a sea of pink (CJK) and red (East Asian), occasionally punctuated by other scripts.

    It just so happens that the placement of the CJK ideographs in the BMP effectively guarantees it.

    Now, ASCII text is not all just Latin letters. There are space and punctuation marks, too, so you may see an occasional character from another Unicode range. But most of the time, it's a Latin letter, which means that most of the time, your mojibake results in Chinese.

    ¹ Remember, in the context of Windows, "Unicode" is generally taken to be shorthand for UTF-16LE.

  • The Old New Thing

    Simulating media controller buttons like Play and Pause


    Today's Little Program simulates pressing the Play/Pause button on your fancy keyboard. This might be useful if you want to write a program that converts some other input (say, gesture detection) into media controller events.

    One way of doing this is to take advantage of the Def­Window­Proc function, since the default behavior for the WM_APP­COMMAND message is to pass the message up the parent chain, and if it still can't find a handler, it hands the message to the shell for global processing.

    Remember, don't fumble around. If you want to send a message to a window, then send a message to a window. Don't broadcast a message to every window in the system (resulting in mass chaos).

    Take the scratch program and make this simple addition:

    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     if (ch == ' ') {
      SendMessage(hwnd, WM_APPCOMMAND, (WPARAM)hwnd,
                  MAKELPARAM(0, APPCOMMAND_MEDIA_PLAY_PAUSE));
     }
    }

     HANDLE_MSG(hwnd, WM_CHAR, OnChar);

    When you press the space bar in the scratch application, it pretends that you instead pressed the Play/Pause button on your fancy keyboard with no shift modifiers.

    The scratch program doesn't do anything with the key, so it ends up falling through to Def­Window­Proc, which eventually hands the key to the shell and any other registered shell hooks. If you have a program like Windows Media Player which registers for shell events, it will see the notification and pause/resume playback.

    Of course, this assumes that the program you want to talk to listens globally for the keypress. If you want to make the current foreground program respond as if you had pressed the Play/Pause button, you can just inject the keypress.

    int __cdecl main(int, char**)
    {
     INPUT inputs[2] = {};
     inputs[0].type = INPUT_KEYBOARD;
     inputs[0].ki.wVk = VK_MEDIA_PLAY_PAUSE;
     inputs[0].ki.wScan = 0x22;
     inputs[0].ki.dwFlags = KEYEVENTF_EXTENDEDKEY;
     inputs[1].type = INPUT_KEYBOARD;
     inputs[1].ki.wVk = VK_MEDIA_PLAY_PAUSE;
     inputs[1].ki.wScan = 0x22;
     inputs[1].ki.dwFlags = KEYEVENTF_EXTENDEDKEY | KEYEVENTF_KEYUP; // release the key
     SendInput(2, inputs, sizeof(INPUT));
     return 0;
    }

    Note, however, that since we didn't do anything about the state of modifier keys, if the user happens to have the shift key down at the time you injected the message, the application is going to be told, "Hey, do your play/pause thing, and if you change behavior when the shift key is down, here's your chance."

    But what did you expect from a Little Program?

  • The Old New Thing

    Marshaling won't get in your way if it isn't needed


    I left an exercise at the end of last week's article: "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?"

    COM subscribes to the principle that if no marshaling is needed, then an interface pointer points directly at the object with no COM code in between.

    If the current thread is running in a single-threaded apartment, and it creates a COM object with thread affinity (also known as an "apartment-model object"; yes, the name is confusing), then the thread gets a pointer directly to the object. When you call p->Query­Interface(), you are calling directly into the Query­Interface implementation provided by the object.

    This principle has its pluses and minuses.

    People concerned with high performance pretty much insist that COM stay out of the way and get involved only when necessary. They consider it a plus that if there is no marshaling involved, then all pointers are direct pointers, and calls go straight to the target object without a single instruction of COM-provided code getting in the way.

    One downside of this is that every object is responsible for its own compatibility hacks. If there are bugs in the implementation of IUnknown::Query­Interface, then each object is on its own for working around them. There is no opportunity for the system to enforce correct behavior because there is no system code running. Each object becomes responsible for its own enforcement.

    Therefore, the answer to "Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?" is "The marshaler is involved only sometimes."

    If the object being called belongs to the same apartment as the thread that is calling into it, then there is no marshaler, and the call goes directly to the object. Since there is no marshaler, the marshaler isn't around to enforce marshaling rules. It's up to the object to enforce marshaling rules, and if the object chooses not to, then you get into the cases where a method call works when the object is unmarshaled and fails when the object is marshaled.

  • The Old New Thing

    If a process crashes while holding a mutex, why is its ownership magically transferred to another process?


    A customer was observing strange mutex ownership behavior. They had two processes that used a mutex to coordinate access to some shared resource. When the first process crashed while owning the mutex, they found that the second process somehow magically gained ownership of that mutex. Specifically, when the first process crashed, the second process could take the mutex, but when it released the mutex, the mutex was still not released. They discovered that in order to release the mutex, the second process had to call Release­Mutex twice. It's as if the claim on the mutex from the crashed process was secretly transferred to the second process.

    My psychic powers told me that that's not what was happening. I guessed that their code went something like this:

    // code in italics is wrong
    bool TryToTakeTheMutex()
    {
     return WaitForSingleObject(TheMutex, TimeOut) == WAIT_OBJECT_0;
    }

    The code failed to understand the consequences of WAIT_ABANDONED.

    In the case where the mutex was held by the first process when it crashed, the second process will attempt to claim the mutex, and it will succeed, and the return code from Wait­For­Single­Object will be WAIT_ABANDONED. Their code treated that value as a failure code rather than a modified success code.
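    The distinction is easy to model with the actual constant values (a Python sketch of the return-code logic only; Wait­For­Single­Object itself is of course a Win32 call):

```python
# The real Win32 values of these constants.
WAIT_OBJECT_0 = 0x00000000
WAIT_ABANDONED = 0x00000080
WAIT_TIMEOUT = 0x00000102

def buggy_try_to_take(wait_result):
    # The customer's test: WAIT_ABANDONED compares unequal to
    # WAIT_OBJECT_0, so a successful-but-abandoned acquisition is
    # misreported as "did not get the mutex".
    return wait_result == WAIT_OBJECT_0

def fixed_try_to_take(wait_result):
    # Both values mean "you now own the mutex"; WAIT_ABANDONED merely
    # warns that the previous owner exited without releasing it.
    return wait_result in (WAIT_OBJECT_0, WAIT_ABANDONED)

assert not buggy_try_to_take(WAIT_ABANDONED)  # acquired, but reported as failure
assert fixed_try_to_take(WAIT_ABANDONED)      # acquired, and reported as such
assert not fixed_try_to_take(WAIT_TIMEOUT)
```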

    The second program therefore claimed the mutex without realizing it. That is what led the customer to believe that ownership was being magically transferred to the second program. It wasn't magic. The second program misinterpreted the return code.

    The second program saw that Try­To­Take­The­Mutex "failed", and it went off and did something else for a while. Then the next time it called Try­To­Take­The­Mutex, the function succeeded: It was a successful recursive acquisition, but the program thought it was the initial acquisition.

    The customer didn't reply back, so we never found out whether that was the actual problem, but I suspect it was.
