January, 2011

  • The Old New Thing

    Don't just stand around saying somebody should do something: Be someone


    On one of the frivolous mailing lists in the Windows project, somebody spotted some behavior that seemed pretty bad and filed a bug on it. The project was winding down, with fewer and fewer bugs being accepted by the release management team each day, so it was not entirely surprising that this particular bug was also rejected. News of this smackdown threw the mailing list into a fit of apoplexy.

    Don't they realize how bad this bug is? Somebody should reactivate this bug.
    Yeah, this is really serious. I don't think they understood the scope of the problem. Somebody should mark the bug for reconsideration.
    Definitely. Someone should reactivate the bug and reassert the issue.

    After about half a dozen messages like this, I couldn't take it any longer.

    I can't believe I'm reading this.

    I decided to Be Someone. I reactivated the bug and included a justification.

    Don't just stand around saying somebody should do something. Be that someone.

    (And even though it's not relevant to the story, the bug was ultimately accepted by the release management team the second time around because all the discussion of the bug gave the bug representative more information with which to argue why the bug should be fixed.)

  • The Old New Thing

    Was showing the column header in all Explorer views a rogue feature?


    User :( asks whether the Explorer feature that shows the column headers in all views was a rogue feature or a planned one.

    If it was a rogue feature, it was a horribly badly hidden one.

    One of the important characteristics of the rogue feature is that you not be able to tell that the feature is there unless you go looking for it. If the feature is right there on the screen as soon as you open an Explorer window, odds are that somebody is going to notice and say something about it. (For example, the designer who is responsible for Explorer is probably going to notice that every screenshot of Explorer doesn't match the spec.)

    That's why rogue features typically take the form of a hotkey or holding a modifier key in conjunction with another operation.

    Writing up this characterization of rogue features reminds me that I myself am responsible for a rogue feature of Windows 95. If you go to an MS-DOS box, select Edit, then Mark, and then hold the Ctrl key while dragging the mouse to select a section of the window for copying, the shape of the selection changes from a box to, um, I'm not sure the shape has a name. But it includes all the text between the start point and the end point, as if the contents of the window had come from an edit control. Something like this:

    [Figure: a streamed selection in an MS-DOS box]

    Since this was a rogue feature, it was never tested, and I suspect that it didn't work on Hebrew or Arabic systems.
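    To make the difference between the two shapes concrete, here is a minimal C sketch of the two hit tests, assuming the window is a simple grid of character cells addressed by row and column. The names Cell, in_box_selection, and in_stream_selection are hypothetical, not the actual console code. Note that the sketch assumes left-to-right text, which is exactly the assumption that would break on Hebrew or Arabic systems.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cell position in a character grid such as a console window. */
typedef struct { int row; int col; } Cell;

/* Compare two cells in reading order: row first, then column. */
static int cell_cmp(Cell a, Cell b)
{
    if (a.row != b.row) return a.row < b.row ? -1 : 1;
    if (a.col != b.col) return a.col < b.col ? -1 : 1;
    return 0;
}

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Box selection: the rectangle whose opposite corners are the two points. */
bool in_box_selection(Cell p, Cell anchor, Cell end)
{
    return p.row >= min_int(anchor.row, end.row) &&
           p.row <= max_int(anchor.row, end.row) &&
           p.col >= min_int(anchor.col, end.col) &&
           p.col <= max_int(anchor.col, end.col);
}

/* Streamed selection: every cell from the earlier point to the later point
   in reading order, the way an edit control selects (left-to-right only). */
bool in_stream_selection(Cell p, Cell anchor, Cell end)
{
    bool anchor_first = cell_cmp(anchor, end) <= 0;
    Cell first = anchor_first ? anchor : end;
    Cell last = anchor_first ? end : anchor;
    return cell_cmp(first, p) <= 0 && cell_cmp(p, last) <= 0;
}
```

    With the anchor at row 1, column 5 and the end point at row 3, column 2, the cell at row 2, column 0 is in the streamed selection but not in the box, while the cell at row 3, column 3 is in the box but not in the stream.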

    I must've chickened out, or maybe my rogue feature was found out, because the code for streamed selections was ifdef'd out before Windows 95 shipped. So at least I can still honestly say that I never shipped a rogue feature.

  • The Old New Thing

    What's the difference between an asynchronous PIPE_WAIT pipe and a PIPE_NOWAIT pipe?


    When you operate on named pipes, you have a choice of opening them in PIPE_WAIT mode or PIPE_NOWAIT mode. When you read from a PIPE_WAIT pipe, the read blocks until data becomes available in the pipe. When you read from a PIPE_NOWAIT pipe, the read completes immediately even if there is no data in the pipe. But how is this different from a PIPE_WAIT pipe opened in asynchronous mode by passing FILE_FLAG_OVERLAPPED?

    The difference is in when the I/O is deemed to have completed.

    When you issue an overlapped read against a PIPE_WAIT pipe, the call to Read­File returns immediately, but the completion actions do not occur until there is data available in the pipe. (Completion actions are things like setting the event, running the completion routine, or queueing a completion to an I/O completion port.) On the other hand, when you issue a read against a PIPE_NOWAIT pipe, the call to Read­File returns immediately with completion—if the pipe is empty, the read completes with a read of zero bytes and the error ERROR_NO_DATA.

    Here's a timeline, for people who prefer tables.

    Event                  Asynchronous PIPE_WAIT                       PIPE_NOWAIT
    ---------------------  -------------------------------------------  ---------------------------------------
    pipe initially empty
    ReadFile               Returns immediately with ERROR_IO_PENDING    Returns immediately with ERROR_NO_DATA;
                                                                        I/O completes with 0 bytes
    time passes
    Data available         I/O completes with n > 0 bytes

    If you use the PIPE_NOWAIT flag, then the only way to know whether there is data is to poll for it. There is no way to be notified when data becomes available.

    As the documentation notes, PIPE_NOWAIT remains solely for compatibility with LAN Manager 2.0. Since the only way to use pipes created as PIPE_NOWAIT is to poll them, this is obviously not a recommended model for a multitasking operating system.
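    The completion rules in the table can be captured in a toy model. This is a hypothetical sketch in portable C, not the Win32 API: toy_issue_read stands in for ReadFile, and toy_data_arrives stands in for a writer putting data into the pipe. It shows that the arrival of data completes only the pending overlapped PIPE_WAIT read; a PIPE_NOWAIT reader finds out only by issuing another read.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MODE_ASYNC_PIPE_WAIT, MODE_PIPE_NOWAIT } PipeMode;
typedef enum { READ_PENDING, READ_COMPLETED } ReadStatus;

/* Hypothetical record of one read request. */
typedef struct {
    ReadStatus status;
    size_t want;     /* bytes requested */
    size_t bytes;    /* bytes transferred, valid once completed */
    int no_data;     /* nonzero models completion with ERROR_NO_DATA */
} ToyRead;

/* Move up to r->want bytes out of the pipe and mark the read completed. */
static void complete_read(ToyRead *r, size_t *avail)
{
    r->bytes = *avail < r->want ? *avail : r->want;
    *avail -= r->bytes;
    r->status = READ_COMPLETED;
}

/* Stand-in for ReadFile against a pipe currently holding *avail bytes. */
ToyRead toy_issue_read(PipeMode mode, size_t *avail, size_t want)
{
    ToyRead r = { READ_PENDING, want, 0, 0 };
    if (*avail > 0) {
        complete_read(&r, avail);      /* data already present: completes */
    } else if (mode == MODE_PIPE_NOWAIT) {
        r.status = READ_COMPLETED;     /* completes now with zero bytes */
        r.no_data = 1;
    }
    /* Otherwise the overlapped PIPE_WAIT read stays pending
       (ReadFile would report ERROR_IO_PENDING). */
    return r;
}

/* A writer adds n bytes. A pending overlapped read completes now, which is
   when its completion actions would run; nothing notifies a NOWAIT reader. */
void toy_data_arrives(ToyRead *pending, size_t *avail, size_t n)
{
    *avail += n;
    if (pending != NULL && pending->status == READ_PENDING)
        complete_read(pending, avail);
}
```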

  • The Old New Thing

    The MARGINS parameter to the DwmExtendFrameIntoClientArea function controls how far the frame extends into the client area


    A customer wrote a program that calls Dwm­Extend­Frame­Into­Client­Area to extend the frame over the entire client area, but then discovered that this made programming difficult:

    I have a window which I want to have a glassy border but an opaque body. I made my entire window transparent by calling Dwm­Extend­Frame­Into­Client­Area, and I understand that this means that I am now responsible for managing the alpha channel when drawing so that the body of my window remains opaque while the glassy border is transparent. Since most GDI functions are not alpha-aware, this management is frustrating. Is there a better way? In pictures, I only want the red portion of the diagram below to be on glass; the inside yellow part should be opaque like normal. Is there an API that can do this?

    This customer's excitement about the glass frame is like somebody who buys a pallet of tangerine juice even though he only wanted two glasses. And now he has questions about how to store the rest of the tangerine juice he didn't want.

    This customer, it appears, passed −1 as the MARGINS to Dwm­Extend­Frame­Into­Client­Area which means "Bring it on, baby! Give me all tangerine all the time everywhere!" If you only want the glass to extend into part of your client area, then say so. Set the MARGINS to the thickness of the glass border (the thickness of the red portion of the above diagram).
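    Here is a small sketch of what the MARGINS values mean geometrically for a client area of width by height pixels. Margins is a local mirror of the Win32 MARGINS structure (same field names) so that the sketch compiles without <windows.h>, and is_glass_pixel is a hypothetical helper. It treats any negative margin as the "whole client area" case; that is an assumption based on the -1 convention described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the Win32 MARGINS structure, for illustration only. */
typedef struct {
    int cxLeftWidth;
    int cxRightWidth;
    int cyTopHeight;
    int cyBottomHeight;
} Margins;

/* True if client-area pixel (x, y) would be covered by the extended frame. */
bool is_glass_pixel(Margins m, int width, int height, int x, int y)
{
    /* Assumption: a negative margin means glass everywhere. */
    if (m.cxLeftWidth < 0 || m.cxRightWidth < 0 ||
        m.cyTopHeight < 0 || m.cyBottomHeight < 0)
        return true;
    /* Otherwise only a border of the given thickness is glass. */
    return x < m.cxLeftWidth || x >= width - m.cxRightWidth ||
           y < m.cyTopHeight || y >= height - m.cyBottomHeight;
}
```

    In a real program, the border thickness would go into a MARGINS passed as DwmExtendFrameIntoClientArea(hwnd, &margins), and only the pixels for which this test is true would need alpha-aware drawing.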

  • The Old New Thing

    My, what strange NOPs you have!


    While cleaning up my office, I ran across some old documents which reminded me that there are a lot of weird NOP instructions in Windows 95.

    Certain early versions of the 80386 processor (manufactured prior to 1987) are known as B1 stepping chips. These early versions of the 80386 had some obscure bugs that affected Windows. For example, if the instruction following a string operation (such as movs) uses opposite-sized addresses from that in the string instruction (for example, if you performed a movs es:[edi], ds:[esi] followed by a mov ax, [bx]) or if the following instruction accessed an opposite-sized stack (for example, if you performed a movs es:[edi], ds:[esi] on a 16-bit stack, and the next instruction was a push), then the movs instruction would not operate correctly. There were quite a few of these tiny little "if all the stars line up exactly right" chip bugs.

    Most of the chip bugs only affected mixed 32-bit and 16-bit operations, so if you were running pure 16-bit code or pure 32-bit code, you were unlikely to encounter any of them. And since Windows 3.1 did very little mixed-bitness programming (user-mode code was all-16-bit and kernel-mode code was all-32-bit), these defects didn't really affect Windows 3.1.

    Windows 95, on the other hand, contained a lot of mixed-bitness code since it was the transitional operating system that brought people using Windows out of the 16-bit world into the 32-bit world. As a result, code sequences that tripped over these little chip bugs turned up not infrequently.

    An executive decision had to be made whether to continue supporting these old chips or whether to abandon them. A preliminary market analysis of potential customers showed that there were enough computers running old 80386 chips to be worth making the extra effort to support them.

    Everybody who wrote assembly language code was alerted to the various code sequences that would cause problems on a B1 stepping, so that they wouldn't generate those code sequences themselves, and so they could be on the lookout for existing code that might have problems. To supplement the manual scan, I wrote a program that studied all the Windows 95 binaries trying to find these troublesome code sequences. When it brought one to my attention, I studied the offending code, and if I agreed with the program's assessment, I notified the developer who was responsible for the component in question.

    In nearly all cases, the troublesome code sequences could be fixed by judicious insertion of NOP statements. If the problem was caused by "instruction of type X followed by instruction of type Y", then you can just insert a NOP between the two instructions to "break up the party" and sidestep the problem. Sometimes, the standard NOP would end up classified as an instruction of type Y, so you had to insert a special kind of NOP, one that was not of type Y.

    For example, here's one code sequence from a function which does color format conversion:

            push    si          ; borrow si temporarily
            ; build second 4 pixels
            movzx   si, bl
            mov     ax, redTable[si]
            movzx   si, cl
            or      ax, blueTable[si]
            movzx   si, dl
            or      ax, greenTable[si]
            shl     eax, 16     ; move pixels to high word
            ; build first 4 pixels
            movzx   si, bh
            mov     ax, redTable[si]
            movzx   si, ch
            or      ax, blueTable[si]
            movzx   si, dh
            or      ax, greenTable[si]
            pop     si
            stosd   es:[edi]    ; store 8 pixels
            db      67h, 90h    ; 32-bit NOP fixes stos (B1 stepping)
            dec     wXE

    Note that we couldn't use just any old NOP; we had to use a NOP with a 32-bit address override prefix. That's right, this isn't just a regular NOP; this is a 32-bit NOP.
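    The "break up the party" fix, and the reason the NOP above carries a 67h prefix, can be modeled with a toy hazard check. This is a hypothetical sketch, not the scanner described earlier. It works on pre-decoded instruction records, since a real tool would need an x86 decoder, and it checks only one simplified rule: a string instruction followed immediately by an instruction with the opposite address size. In the model, a plain NOP in 16-bit code counts as 16-bit, which is presumably how it ended up "of type Y" after a 32-bit stos.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pre-decoded instruction; a real scanner would have to
   decode raw bytes to obtain these attributes. */
typedef struct {
    int is_string_op;   /* movs, stos, lods, ... */
    int addr_size;      /* effective address size: 16 or 32 */
} Insn;

/* Count places where a string instruction is immediately followed by an
   instruction using the opposite address size. */
size_t count_string_addr_hazards(const Insn *code, size_t n)
{
    size_t hazards = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        if (code[i].is_string_op && code[i + 1].addr_size != code[i].addr_size)
            hazards++;
    }
    return hazards;
}
```

    Modeling stosd es:[edi] followed by dec wXE reports one hazard. A plain NOP between them still reports one, while the 32-bit NOP (db 67h, 90h) reports none.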

    From a B1 stepping-readiness standpoint, the folks who wrote in C had a little of the good news/bad news thing going. The good news is that the compiler did the code generation and you didn't need to worry about it. The bad news is that you also were dependent on the compiler writers to have taught their code generator how to avoid these B1 stepping pitfalls, and some of them were quite subtle. (For example, there was one bug that manifested itself in incorrect instruction decoding if a conditional branch instruction had just the right sequence of taken/not-taken history, and the branch instruction was followed immediately by a selector load, and one of the first two instructions at the destination of the branch was itself a jump, call, or return. The easy workaround: Insert a NOP between the branch and the selector load.)

    On the other hand, some quirks of the B1 stepping were easy to sidestep. For example, the B1 stepping did not support virtual memory in the first 64KB of memory. Fine, don't use virtual memory there. If virtual memory was enabled, if a certain race condition was encountered inside the hardware prefetch, and if you executed a floating point coprocessor instruction that accessed memory at an address in the range 0x800000F8 through 0x800000FF, then the CPU would end up reading from addresses 0x000000F8 through 0x000000FF instead. This one was easy to work around: Never allocate valid memory at 0x80000xxx. This is another reason for the no man's land in the address space near the 2GB boundary.
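    The address erratum and its workaround can be written down as a pair of small functions. This is a hypothetical model of the behavior described above, not real hardware or memory-manager code. b1_fpu_read_address shows where the affected read actually lands, and region_avoids_b1_page is the kind of check an allocator could apply to keep the 4KB page at 0x80000000 (the "0x80000xxx" range) unused.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the erratum: under the rare conditions described, a coprocessor
   access to 0x800000F8..0x800000FF reads 0x000000F8..0x000000FF instead. */
uint32_t b1_fpu_read_address(uint32_t addr)
{
    if (addr >= 0x800000F8u && addr <= 0x800000FFu)
        return addr - 0x80000000u;   /* the high bit is lost */
    return addr;
}

/* Workaround: a region [base, base + size) is acceptable only if it does
   not overlap the page 0x80000000..0x80000FFF. */
int region_avoids_b1_page(uint32_t base, uint32_t size)
{
    uint64_t start = base;
    uint64_t end = (uint64_t)base + size;   /* exclusive, no overflow */
    const uint64_t page_start = 0x80000000u;
    const uint64_t page_end = 0x80001000u;
    return end <= page_start || start >= page_end;
}
```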

    I happened to have an old computer with a B1 stepping in my office. It ran slowly, but it did run. I think the test team "re-appropriated" the computer for their labs so they could verify that Windows 95 still ran correctly on a computer with a B1 stepping CPU.

    Late in the product cycle (after Final Beta), upper management reversed their earlier decision and decided not to support the B1 chip after all. Maybe the testers were finding too many bugs where other subtle B1 stepping bugs were being triggered. Maybe the cost of having to keep an eye on all the source code (and training/retraining all the developers to be aware of B1 issues) exceeded the benefit of supporting a shrinking customer base. For whatever reason, B1 stepping support was pulled, and customers with one of these older chips got an error message when they tried to install Windows 95. And just to make it easier for the product support people to recognize this failure, the error code for the error message was Error B1.

  • The Old New Thing

    The message text limit for the Marquee screen saver is 255, even if you bypass the dialog box that prevents you from entering more than 255 characters


    If you find an old Windows XP machine and fire up the configuration dialog for the Marquee screen saver, you'll see that the text field for entering the message won't let you type more than 255 characters. That's because the Marquee screen saver uses a 255-character buffer to hold the message, and the dialog box figures there's no point in letting you type in a message longer than the screen saver can display.

    A customer decided to bypass the configuration dialog and change the text in the screen saver by editing the settings directly, and then complained that the Marquee screen saver truncated the message at 255 characters.

    Well, yeah, because the limit is 255 characters. That's what the dialog box was trying to tell you. If you bypass the dialog box and whack the setting directly, that doesn't change the Marquee screen saver. Its limit is still 255 characters.

    It's like attaching a gizmo to the gas pump fuel nozzle to disable the auto-stop feature because you want to put 20 gallons of gas into your 15-gallon tank. Disabling the auto-stop will not make the gas tank any bigger. All it does is make it easier to accidentally overflow your gas tank's buffer.
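    A minimal sketch of why bypassing the dialog changes nothing: whatever is stored in the setting, a reader with a fixed-size buffer keeps at most 255 characters. MARQUEE_MAX_CHARS and load_marquee_text are hypothetical names, and the sketch uses plain char rather than whatever character type the real screen saver uses.

```c
#include <assert.h>
#include <string.h>

#define MARQUEE_MAX_CHARS 255   /* the limit described above */

/* Copy the stored setting into a fixed buffer of MARQUEE_MAX_CHARS + 1
   chars; anything beyond the limit is silently dropped. */
size_t load_marquee_text(char dest[MARQUEE_MAX_CHARS + 1], const char *setting)
{
    size_t len = strlen(setting);
    if (len > MARQUEE_MAX_CHARS)
        len = MARQUEE_MAX_CHARS;   /* truncated, not enlarged */
    memcpy(dest, setting, len);
    dest[len] = '\0';
    return len;
}
```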

  • The Old New Thing

    Why does pasting a string containing an illegal filename character into a rename edit box delete the characters from the clipboard, too?


    Ane asks why, if you have a string with an illegal filename character on the clipboard, and you paste that string into a rename edit box, do the illegal characters get deleted not just from the edit box but also the clipboard?

    Basically, it's a bug, the result of a poor choice of default in an internal helper class.

    There is an internal helper class for "monitoring an edit control" with options to do things like remove illegal characters. This helper class was written back in 1998, presumably with the intention of being used somewhere, but it never did get hooked up. Maybe the feature it was originally written for got cancelled, I can't quite tell. At any rate, this helper class had many options, one of which was "When pasting text containing illegal characters, should I filter the illegal characters from the clipboard, too?", and for some reason it defaulted to Yes. (I can see why the default was Yes from a coding standpoint. It was actually less work to filter the characters from the clipboard than it was to preserve them, but it's a bad default from an API design standpoint.)

    Anyway, this helper class sat unused for a few years, but in 2000, Explorer decided to use this helper class so it could filter illegal characters out of file names when you used the Rename command. The code that uses this helper class chose which options it wanted, and probably due to oversight, the "preserve clipboard contents when pasting" flag was not specified.

    So yeah, it's just a bug. But then again, it's a bug that's been around for over a decade, so who knows if there's somebody out there that relies on it.
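    The bad default is easy to see in a sketch. The helper class's real interface isn't described here, so paste_filtered and its preserve_clipboard flag are hypothetical stand-ins, and the "clipboard" is just a string buffer. The filtered characters are the ones Windows does not allow in file names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Characters Windows does not allow in file names. */
static bool is_illegal_filename_char(char c)
{
    return c != '\0' && strchr("\\/:*?\"<>|", c) != NULL;
}

/* Copy src to dst, dropping illegal characters; dst must be at least
   strlen(src) + 1 chars and must not overlap src. */
static void filter_illegal(char *dst, const char *src)
{
    for (; *src; src++)
        if (!is_illegal_filename_char(*src))
            *dst++ = *src;
    *dst = '\0';
}

/* Paste the clipboard text into the edit buffer, filtered. With
   preserve_clipboard false (the helper's bad default), the filtered text
   is also written back over the clipboard. */
void paste_filtered(char *edit, char *clipboard, bool preserve_clipboard)
{
    filter_illegal(edit, clipboard);
    if (!preserve_clipboard)
        strcpy(clipboard, edit);
}
```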

  • The Old New Thing

    When does a process ID become available for reuse?


    A customer wanted some information about process IDs:

    I'm writing some code that depends on process IDs and I'd like to understand better problem of process ID reuse.

    When can PIDs be reused? Does it happen when the process handle becomes signaled (but before the zombie object is removed from the system) or does it happen only after last handle to process is released (and the process object is removed from the system)?

    If its the former, will OpenProcess() succeed for a zombie process? (i.e. the one that has been terminated, but not yet removed from the system)?

    The process ID is a value associated with the process object, and as long as the process object is still around, so too will its process ID. The process object remains as long as the process is still running (the process implicitly retains a reference to itself) or as long as somebody still has a handle to the process object.

    If you think about it, this makes sense, because as long as there is still a handle to the process, somebody can call WaitForSingleObject to wait for the process to exit, or they can call GetExitCodeProcess to retrieve the exit code, and that exit code has to be stored somewhere for later retrieval.

    When all handles are closed, then the kernel knows that nobody is going to ask whether the process is still running or what its exit code is (because you need a handle to ask those questions). At which point the process object can be destroyed, which in turn destroys the process ID.

    What happens if somebody calls OpenProcess on a zombie process? The same thing that happens if they call it on a running process: They get a handle to the process. Why would you want to get a handle to a zombie process? Well, you might not know that it's a zombie yet; you're getting the handle so you can call WaitForSingleObject to see if it has exited yet. Or you might get the handle, knowing that it's a zombie, because you want to call GetExitCodeProcess to see what the exit code was.
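    The lifetime rule can be summarized as reference counting. The following is a toy model with hypothetical names, not the kernel's actual data structures: the running process holds one reference to itself, each open handle holds another, and the process ID becomes reusable only when the count reaches zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy process object: the PID stays reserved while any reference remains. */
typedef struct {
    unsigned pid;
    int refs;        /* self-reference while running + one per open handle */
    bool running;
    int exit_code;   /* kept so an exit-code query can still answer */
} ToyProcess;

void toy_start(ToyProcess *p, unsigned pid)
{
    p->pid = pid;
    p->refs = 1;           /* the running process references itself */
    p->running = true;
    p->exit_code = 0;
}

void toy_open_handle(ToyProcess *p) { p->refs++; }   /* like OpenProcess */
void toy_close_handle(ToyProcess *p) { p->refs--; }  /* like CloseHandle */

void toy_exit(ToyProcess *p, int code)
{
    p->running = false;
    p->exit_code = code;
    p->refs--;             /* the self-reference goes away */
}

/* The PID can be handed out again only once the object is destroyed. */
bool toy_pid_reusable(const ToyProcess *p) { return p->refs == 0; }
```

    Opening a handle to a zombie simply adds another reference, so the exit code stays retrievable and the PID stays reserved until that handle is closed as well.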

  • The Old New Thing

    Processes, commit, RAM, threads, and how high can you go?


    Back in 2008, Igor Levicki made a boatload of incorrect assumptions in an attempt to calculate the highest a process ID can go on Windows NT. Let's look at them one at a time.

    So if you can't create more than 2,028 threads in one process (because of 2GB per process limit) and each process needs at least one thread, that means you are capped by the amount of physical RAM available for stack.

    One assumption is that each process needs at least one thread. Really? What about a process that has exited? (Some people call these zombie processes.) There are no threads remaining in this process, but the process object hangs around until all handles are closed.

    Next, the claim is that you are capped by the amount of physical RAM available for stack. This assumes that stacks are non-pageable, which is an awfully strange assumption. User-mode stacks are most certainly pageable. In fact, everything in user-mode is pageable unless you take special steps to make it not pageable.

    Given that the smallest stack allocation is 4KB and assuming 32-bit address space:

    4,294,967,296 / 4,096 = 1,048,576 PIDs

    This assumes that all the stacks live in the same address space, but user mode stacks from different processes most certainly do not; that's the whole point of separate address spaces! (Okay, kernel stacks live in the same address space, but the discussion about "initial stack commit" later makes it clear he's talking about user-mode stacks.)

    Since they have to be a multiple of 4:

    1,048,576 / 4 = 262,144 PIDs

    It's not clear why we are dividing by four here. Yes, process IDs are a multiple of four (implementation detail, not contractual, do not rely on it), but that doesn't mean that three quarters of the stacks are no longer any good. It just means that we can't use more than 4,294,967,296/4 of them since we'll run out of names after 1,073,741,824 of them. In other words, this is not a division but rather a min operation. And the stack calculation (which, as noted, only makes sense for kernel stacks) already brought us down to 1,048,576, far below 1 billion, so this min step has no effect.

    It's like saying, "This street is 80 meters long. The minimum building line is 4 meters, which means that you can have at most 20 houses on this side of the street. But house numbers on this side of the street must be even, so the maximum number of houses is half that, or 10." No, the requirement that house numbers be even doesn't cut the number of houses in half; it just means you have to be more careful how you assign the numbers.
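    The difference between the two steps is easy to check numerically. In this minimal sketch, bound_by_division is the step in the quoted calculation and bound_by_min is the corrected one; both start from the 4,294,967,296 / 4,096 = 1,048,576 stack bound quoted above, and the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* The quoted step: divide the stack-based bound by 4. */
uint64_t bound_by_division(uint64_t stack_bound)
{
    return stack_bound / 4;
}

/* The corrected step: multiples of 4 give 2^32 / 4 distinct PID values,
   which is a separate cap, so take the smaller of the two bounds. */
uint64_t bound_by_min(uint64_t stack_bound)
{
    const uint64_t pid_names = 4294967296ull / 4;   /* 1,073,741,824 */
    return stack_bound < pid_names ? stack_bound : pid_names;
}
```

    Starting from 1,048,576, the division gives 262,144, while the min leaves the bound at 1,048,576.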

    Having 262,144 processes would consume 1GB of RAM just for the initial stack commit assuming that all processes are single-threaded. If they commited 1MB of stack each you would need 256 GB of memory.

    Commit does not consume RAM. Commit is merely a promise from the memory manager that the RAM will be there when you need it, but the memory manager doesn't have to produce it immediately (and certainly doesn't have to keep the RAM reserved for you until you free it). Indeed, that's the whole point of virtual memory, to decouple commit from RAM! (If commit consumed RAM, then what's the page file for?)

    This calculation also assumes that process IDs are allocated "smallest available first", but it's clear that it's not as simple as that: Fire up Task Manager and look at the highest process ID. (I've got one as high as 4040.) If process IDs are allocated smallest-available-first, then a process ID of 4040 implies that at some point there were 1010 processes in the system simultaneously! Unlikely.

    Here's a much simpler demonstration that process IDs are not allocated smallest-available-first: Fire up Task Manager, tell it to Show processes from all users, go to the Processes tab, and enable the PID column if you haven't already. Now launch Calc. Look for Calc in the process list and observe that it was not assigned the lowest available PID. If your system is like mine, you have PID zero assigned to the System Idle Process (not really a process but it gets a number anyway), and PID 4 assigned to the System process (again, not really a process but it gets a number anyway), and then you have a pretty big gap before the next process ID (for me, it's 372). And yet Calc was given a process ID in the 2000's. Proof by counterexample that the system does not assign PIDs smallest-available-first.

    So if they aren't assigned smallest-available-first, what's to prevent one from having a process ID of 4000000000?

    (Advanced readers may note that kernel stacks do all share a single address space, but even in that case, a thread that doesn't exist doesn't have a stack. And it's clear that Igor was referring to user-mode stacks since he talked about 1MB stack commits, a value which applies to user mode and not kernel mode.)

    Just for fun, I tried to see how high I could get my process ID.

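    #include <windows.h>
    #include <stdio.h>
    #include <tchar.h>

    int __cdecl _tmain(int argc, TCHAR **argv)
    {
     DWORD dwPid = 0;
     TCHAR szSelf[MAX_PATH];
     GetModuleFileName(NULL, szSelf, MAX_PATH);
     int i;
     for (i = 0; i < 10000; i++) {
      STARTUPINFO si = { sizeof(si) };
      PROCESS_INFORMATION pi;
      TCHAR szCmdLine[] = TEXT("Bogus"); // CreateProcessW may write to this buffer
      // start suspended so the child never runs (and never spawns children of its own)
      if (!CreateProcess(szSelf, szCmdLine, NULL, NULL, FALSE,
            CREATE_SUSPENDED, NULL, NULL, &si, &pi)) break;
      TerminateProcess(pi.hProcess, 0);
      CloseHandle(pi.hThread);
      // intentionally leak the process handle so the
      // process object is not destroyed
      // CloseHandle(pi.hProcess); // leak
      if (dwPid < pi.dwProcessId) dwPid = pi.dwProcessId;
     }
     _tprintf(_TEXT("\nCreated %d processes, ")
              _TEXT("highest pid seen was %lu\n"), i, dwPid);
     _fgetts(szSelf, MAX_PATH, stdin);
     return 0;
    }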

    In order to get the program to complete before I got bored, I ran it on a Windows 2000 virtual machine with 128MB of memory. It finally conked out at 5245 processes with a PID high-water mark of 21776. Along the way, it managed to consume 2328KB of non-paged pool, 36KB of paged pool, and 36,092KB of commit. If you divide this by the number of processes, you'll see that a terminated process consumes about 450 bytes of non-paged pool, a negligible amount of paged pool, and nearly 7KB of commit. (The commit is probably leftover page tables and other detritus.) I suspect commit is the limiting factor in the number of processes.
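    For the per-process figures, here is the division spelled out: a trivial sketch (the function name is illustrative) that converts each measured total from KB to bytes and divides by the 5245 processes, rounding down.

```c
#include <assert.h>

/* Average cost per process in bytes, given a total in KB (1KB = 1024 bytes). */
unsigned long bytes_per_process(unsigned long total_kb, unsigned long processes)
{
    return total_kb * 1024UL / processes;
}
```

    Non-paged pool works out to about 454 bytes per process, paged pool to about 7 bytes, and commit to about 7046 bytes, a little under 7KB.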

    I ran the same program on a Windows 7 machine with 1GB of RAM, and it managed to create all 10,000 processes with a high process ID of 44264. I cranked the loop limit up to 65535, and it still comfortably created 65535 processes with a high process ID of 266,232, easily exceeding the limit of 262,144 that Igor calculated.

    I later learned that the Windows NT folks do try to keep the numerical values of process IDs from getting too big. Earlier this century, the kernel team experimented with letting the numbers get really huge, in order to reduce the rate at which process IDs get reused, but they had to go back to small numbers, not for any technical reasons, but because people complained that the large process IDs looked ugly in Task Manager. (One customer even asked if something was wrong with his computer.)

    That's not saying that the kernel folks won't go back and try the experiment again someday. After all, they managed to get rid of the dispatcher lock. Who knows what other crazy things will change next? (And once they get process IDs to go above 65535—like they were in Windows 95, by the way—or if they decided to make process IDs no longer multiples of 4 in order to keep process IDs low, this guy's program will stop working, and it'll be Microsoft's fault.)

  • The Old New Thing

    Why does SHGetSpecialFolderPath take such a long time before returning a network error?


    A customer reported that their program was failing to start up because the call to SHGet­Special­Folder­Path(CSIDL_PERSONAL) was taking a long time and then eventually returning with ERROR_BAD_NETPATH. The account that was experiencing this problem had a redirected network profile, "but even if he's redirecting, why would we get the bad net path error? Does calling SHGet­Folder­Path actually touch the folder/network? If so, we should probably stop calling this function on the UI thread since network problems could cause our program to hang."

    The SHGet­Folder­Path function will access the network if you pass the CSIDL_FLAG_CREATE flag, which says "Check if the folder is there, and if not, create it."

    The customer had been passing the flag. "We'll remove it. As if our program is going to dictate the creation of the user profile directory."

    The CSIDL_FLAG_CREATE flag has been implicated in some other unwanted behavior. For example, if you pass the CSIDL_FLAG_CREATE flag when asking for CSIDL_MYPICTURES, this will create a My Pictures directory if there wasn't one before. Generally speaking, you shouldn't be creating these directories as side-effects of other actions. Corporate administrators may suppress creation of folders like Pictures and Videos, but that doesn't do much good if your program casually creates them as part of its startup.

    Note that SHGet­Special­Folder­Path and CSIDL values have been superseded by SHGet­Known­Folder­Path and KNOWN­FOLDER­ID. The flag corresponding to CSIDL_FLAG_CREATE is KF_FLAG_CREATE. If you want to make things even faster, consider passing the KF_FLAG_DONT_VERIFY flag (formerly known as CSIDL_FLAG_DONT_VERIFY).
