January, 2011

  • The Old New Thing

    My, what strange NOPs you have!


    While cleaning up my office, I ran across some old documents which reminded me that there are a lot of weird NOP instructions in Windows 95.

    Certain early versions of the 80386 processor (manufactured prior to 1987) are known as B1 stepping chips. These early versions of the 80386 had some obscure bugs that affected Windows. For example, if the instruction following a string operation (such as movs) uses opposite-sized addresses from that in the string instruction (for example, if you performed a movs es:[edi], ds:[esi] followed by a mov ax, [bx]) or if the following instruction accessed an opposite-sized stack (for example, if you performed a movs es:[edi], ds:[esi] on a 16-bit stack, and the next instruction was a push), then the movs instruction would not operate correctly. There were quite a few of these tiny little "if all the stars line up exactly right" chip bugs.

    Most of the chip bugs only affected mixed 32-bit and 16-bit operations, so if you were running pure 16-bit code or pure 32-bit code, you were unlikely to encounter any of them. And since Windows 3.1 did very little mixed-bitness programming (user-mode code was all-16-bit and kernel-mode code was all-32-bit), these defects didn't really affect Windows 3.1.

    Windows 95, on the other hand, contained a lot of mixed-bitness code since it was the transitional operating system that brought people using Windows out of the 16-bit world into the 32-bit world. As a result, code sequences that tripped over these little chip bugs turned up not infrequently.

    An executive decision had to be made whether to continue supporting these old chips or whether to abandon them. A preliminary market analysis of potential customers showed that there were enough computers running old 80386 chips to be worth making the extra effort to support them.

    Everybody who wrote assembly language code was alerted to the various code sequences that would cause problems on a B1 stepping, so that they wouldn't generate those code sequences themselves, and so they could be on the lookout for existing code that might have problems. To supplement the manual scan, I wrote a program that studied all the Windows 95 binaries trying to find these troublesome code sequences. When it brought one to my attention, I studied the offending code, and if I agreed with the program's assessment, I notified the developer who was responsible for the component in question.

    In nearly all cases, the troublesome code sequences could be fixed by judicious insertion of NOP statements. If the problem was caused by "instruction of type X followed by instruction of type Y", then you can just insert a NOP between the two instructions to "break up the party" and sidestep the problem. Sometimes, the standard NOP would end up classified as an instruction of type Y, so you had to insert a special kind of NOP, one that was not of type Y.

    For example, here's one code sequence from a function which does color format conversion:

            push    si          ; borrow si temporarily
            ; build second 4 pixels
            movzx   si, bl
            mov     ax, redTable[si]
            movzx   si, cl
            or      ax, blueTable[si]
            movzx   si, dl
            or      ax, greenTable[si]
            shl     eax, 16     ; move pixels to high word
            ; build first 4 pixels
            movzx   si, bh
            mov     ax, redTable[si]
            movzx   si, ch
            or      ax, blueTable[si]
            movzx   si, dh
            or      ax, greenTable[si]
            pop     si
            stosd   es:[edi]    ; store 8 pixels
            db      67h, 90h    ; 32-bit NOP fixes stos (B1 stepping)
            dec     wXE

    Note that we couldn't use just any old NOP; we had to use a NOP with a 32-bit address override prefix. That's right, this isn't just a regular NOP; this is a 32-bit NOP.

    From a B1 stepping-readiness standpoint, the folks who wrote in C had a little of the good news/bad news thing going. The good news is that the compiler did the code generation and you didn't need to worry about it. The bad news is that you also were dependent on the compiler writers to have taught their code generator how to avoid these B1 stepping pitfalls, and some of them were quite subtle. (For example, there was one bug that manifested itself in incorrect instruction decoding if a conditional branch instruction had just the right sequence of taken/not-taken history, and the branch instruction was followed immediately by a selector load, and one of the first two instructions at the destination of the branch was itself a jump, call, or return. The easy workaround: Insert a NOP between the branch and the selector load.)

    On the other hand, some quirks of the B1 stepping were easy to sidestep. For example, the B1 stepping did not support virtual memory in the first 64KB of memory. Fine, don't use virtual memory there. If virtual memory was enabled, if a certain race condition was encountered inside the hardware prefetch, and if you executed a floating point coprocessor instruction that accessed memory at an address in the range 0x800000F8 through 0x800000FF, then the CPU would end up reading from addresses 0x000000F8 through 0x000000FF instead. This one was easy to work around: Never allocate valid memory at 0x80000xxx. Another reason for the no man's land in the address space near the 2GB boundary.

    I happened to have an old computer with a B1 stepping in my office. It ran slowly, but it did run. I think the test team "re-appropriated" the computer for their labs so they could verify that Windows 95 still ran correctly on a computer with a B1 stepping CPU.

    Late in the product cycle (after Final Beta), upper management reversed their earlier decision and decided not to support the B1 chip after all. Maybe the testers were finding too many bugs where other subtle B1 stepping bugs were being triggered. Maybe the cost of having to keep an eye on all the source code (and training/retraining all the developers to be aware of B1 issues) exceeded the benefit of supporting a shrinking customer base. For whatever reason, B1 stepping support was pulled, and customers with one of these older chips got an error message when they tried to install Windows 95. And just to make it easier for the product support people to recognize this failure, the error code for the error message was Error B1.

  • The Old New Thing

    Why does the name of my TEMP directory keep changing?


    A customer liaison contacted the shell team with the following request:

    Subject: Support case: 069314718055994

    On two of my customer's machines, he's finding that if he opens %TEMP% from the Start menu, it opens C:\Users\username\AppData\Local\Temp\1, C:\Users\username\AppData\Local\Temp\2, and so on. Each time the user logs off and back on, the number increments. The number resets after each reboot. Why are we seeing these folders being created under Temp? This does not appear to be the default behavior. What would cause the operating system to create these folders?

    The customer rebuilt one of the affected machines, and the behavior went away. However, the customer claims that both machines were working fine before, and then this problem suddenly started. Therefore, the customer is afraid that the problem will come back in the future. Any pointers in solving this mystery would be very much appreciated.

    It's not clear why this question was directed at the shell team, since Explorer doesn't set your TEMP directory. (In general, a lot of random questions come to the shell team not because they are shell questions but because people don't know where else to turn. Since the shell presents the desktop, and the desktop is on the screen when the problem occurs, maybe it's a shell issue!)

    The question was redirected to the Remote Desktop team, since it is Remote Desktop that creates these subdirectories off the TEMP directory. And from there, psychic powers predicted that the problem lay in the Administrative Templates\Windows Components\Terminal Services\Temporary folders group policy. If you don't select Do not use temporary folders per session, then these TEMP subdirectories are created. (Yet another of those crazy negative checkboxes.) There is also a knowledge base article describing the registry keys behind these group policies.

    The customer liaison responded cryptically,

    Thanks. I tested the policies and it is the one that creates the folder.

    Not sure what this means for solving the customer's problem, but that was the last we heard from the customer liaison, so I guess this policy was enough to give the customer a nudge in the right direction.

  • The Old New Thing

    Solutions that require a time machine: Making applications which require compatibility behaviors crash so the developers will fix their bug before they ship


    A while ago, I mentioned that there are many applications that rely on WM_PAINT messages being delivered even if there is nothing to paint because they put business logic inside their WM_PAINT handler. As a result, Windows sends them dummy WM_PAINT messages.

    Jerry Pisk opines,

    Thanks to the Windows team going out of their way not to break poorly written applications developers once again have no incentive to clean up their act and actually write working applications. If an application requires a dummy WM_PAINT not to crash it should be made to crash as soon as possible so the developers go in and fix it before releasing their "code".

    In other words, Jerry recommends that Microsoft use the time machine that Microsoft Research has been secretly perfecting for the past few years. (They will sometimes take it out for a spin and fail to cover their tracks.)

    In 1993, Company X writes a program that relies on WM_PAINT messages arriving in a particular order relative to other messages. (And just to make things more interesting, in 1994, Company X goes out of business, or they discontinue the program in question, or the only person who understands the code leaves the company or dies in a plane crash.)

    In 1995, changes to Windows alter the order of messages, and in particular, WM_PAINT messages are no longer sent under certain circumstances. I suspect that the reason for this is the introduction of the taskbar. Before the taskbar, minimized windows appeared as icons on your desktop and therefore received WM_PAINT messages while minimized. But now that applications minimize to the taskbar, minimized windows are sent off screen and never actually paint. The taskbar button does the job of representing the program on the screen.

    Okay, now let's put Jerry in charge of solving this compatibility problem. He recommends that instead of sending a dummy WM_PAINT message to these programs to keep them happy, these programs should instead be made to crash as soon as possible, so that the developers can go in and fix the problem before they release the program.

    In other words, he wants to take the Microsoft Research time machine back to 1993 with a beta copy of Windows 95 and give it to the programmers at Company X and tell them, "Your program crashes on this future version of Windows that doesn't exist yet in your time. Fix the problem before you release your code. (Oh, and by the way, the Blue Jays are going to repeat.)"

    Or maybe I misunderstood his recommendation.

  • The Old New Thing

    Don't just stand around saying somebody should do something: Be someone


    On one of the frivolous mailing lists in the Windows project, somebody spotted some behavior that seemed pretty bad and filed a bug on it. The project was winding down, with fewer and fewer bugs being accepted by the release management team each day, so it was not entirely surprising that this particular bug was also rejected. News of this smackdown threw the mailing list into a fit of apoplexy.

    Don't they realize how bad this bug is? Somebody should reactivate this bug.
    Yeah, this is really serious. I don't think they understood the scope of the problem. Somebody should mark the bug for reconsideration.
    Definitely. Someone should reactivate the bug and reassert the issue.

    After about a half dozen messages like this, I couldn't take it any longer.

    I can't believe I'm reading this.

    I decided to Be Someone. I reactivated the bug and included a justification.

    Don't just stand around saying somebody should do something. Be that someone.

    (And even though it's not relevant to the story, the bug was ultimately accepted by the release management team the second time around because all the discussion of the bug gave the bug representative more information with which to argue why the bug should be fixed.)

  • The Old New Thing

    Some remarks on VirtualAlloc and MEM_LARGE_PAGES


    If you try to run the sample program demonstrating how to create a file mapping using large pages, you'll probably run into the error ERROR_NOT_ALL_ASSIGNED (Not all privileges or groups referenced are assigned to the caller) when calling Adjust­Token­Privileges. What is going on?

    The Adjust­Token­Privileges function enables privileges that you already have (but which are masked). Sort of like how a super hero can't use super powers while disguised as a normal mild-mannered citizen. In order to enable the Se­Lock­Memory­Privilege privilege, you must already have it. But where do you get it?

    You do this by using the group policy editor. The list of privileges says that the Se­Lock­Memory­Privilege corresponds to "Lock pages in memory".
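    Putting those two steps together, a minimal sketch of enabling the privilege might look like this. (This is an illustration, not taken from the article; note in particular that Adjust­Token­Privileges can return success even when it enabled nothing, which is why the Get­Last­Error check is essential.)

    ```c
    #include <windows.h>

    // Sketch: enable SeLockMemoryPrivilege on the current process token.
    // Assumes the account has already been granted "Lock pages in memory"
    // via group policy; otherwise AdjustTokenPrivileges "succeeds" but
    // GetLastError() reports ERROR_NOT_ALL_ASSIGNED.
    BOOL EnableLockMemoryPrivilege(void)
    {
        HANDLE hToken;
        TOKEN_PRIVILEGES tp;

        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY,
                              &hToken))
            return FALSE;

        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME,
                             &tp.Privileges[0].Luid);

        // AdjustTokenPrivileges can return TRUE even when it enabled
        // nothing, so checking GetLastError is mandatory.
        BOOL ok = AdjustTokenPrivileges(hToken, FALSE, &tp,
                                        sizeof(tp), NULL, NULL) &&
                  GetLastError() == ERROR_SUCCESS;
        CloseHandle(hToken);
        return ok;
    }
    ```
    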

    Why does allocating very large pages require permission to lock pages in memory?

    Because very large pages are not pageable. This is not an inherent limitation of large pages; the processor is happy to page them in or out, but you have to do it all or nothing. In practice, you don't want a single page-out or page-in operation to consume 4MB or 16MB of disk I/O; that's a thousand times more I/O than your average paging operation. And in practice, the programs which use these large pages are "You paid $40,000 for a monster server whose sole purpose is running my one application and nothing else" type applications, like SQL Server. Those applications don't want this memory to be pageable anyway, so adding code to allow them to be pageable is not only a bunch of work, but it's a bunch of work to add something nobody who uses the feature actually wants.

    What's more, allocating very large pages can be time-consuming. All the physical pages which are involved in a very large page must be contiguous (and must be aligned on a large page boundary). Prior to Windows XP, allocating a very large page can take 15 seconds or more if your physical memory is fragmented. (And even machines with as much as 2GB of memory will probably have highly fragmented physical memory once they're running for a little while.) Internally, allocating the physical pages for a very large page is performed by the kernel function which allocates physically contiguous memory, which is something device drivers need to do quite often for I/O transfer buffers. Some drivers behave "highly unfavorably" if their request for contiguous memory fails, so the operating system tries very hard to scrounge up the memory, even if it means shuffling megabytes of memory around and performing a lot of disk I/O to get it. (It's essentially performing a time-critical defragmentation.)

    If you followed the discussion so far, you'll see another reason why large pages aren't paged out: When they need to be paged back in, the system may not be able to find a suitable chunk of contiguous physical memory!

    In Windows Vista, the memory manager folks recognized that these long delays made very large pages less attractive for applications, so they changed the behavior so requests for very large pages from applications went through the "easy parts" of looking for contiguous physical memory, but gave up before the memory manager went into desperation mode, preferring instead just to fail. (In Windows Vista SP1, this part of the memory manager was rewritten so the really expensive stuff is never needed at all.)

    Note that the MEM_LARGE_PAGES flag triggers an exception to the general principle that MEM_RESERVE only reserves address space, MEM_COMMIT makes the memory manager guarantee that physical pages will be there when you need them, and that the physical pages aren't actually allocated until you access the memory. Since very large pages have special physical memory requirements, the physical allocation is done up front so that the memory manager knows that when it comes time to produce the memory on demand, it can actually do so.
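    As a hedged sketch of what an allocation request looks like (assuming the privilege has been enabled as described earlier): MEM_RESERVE and MEM_COMMIT must be passed together with MEM_LARGE_PAGES, and the size must be a multiple of the value reported by Get­Large­Page­Minimum.

    ```c
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        // Large page size is processor-dependent (commonly 2MB on x64);
        // zero means the processor does not support large pages.
        SIZE_T largePage = GetLargePageMinimum();
        if (largePage == 0) return 1;

        // MEM_LARGE_PAGES requires MEM_RESERVE | MEM_COMMIT together;
        // the physical pages are allocated up front, as described above.
        void *p = VirtualAlloc(NULL, largePage,
                               MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                               PAGE_READWRITE);
        if (p == NULL) {
            printf("VirtualAlloc failed: %lu\n", GetLastError());
            return 1;
        }
        VirtualFree(p, 0, MEM_RELEASE);
        return 0;
    }
    ```
    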

  • The Old New Thing

    From inside the Redmond Reality Distortion Field: Why publish documents in PDF?


    A few years ago, the Windows 7 team developed a document to introduce technology writers to the features of Windows 7. The document was released in PDF format, which created quite a stir among certain people trapped inside the Redmond Reality Distortion Field, who indignantly complained,

    Why are we releasing this document in PDF format? Shouldn't it be in docx or XPS? I would expect people interested in Windows 7 to be willing to use more Microsoft technology.

    Um, hello from the real world. It's the people who are critical of Windows 7 who are least likely to use Microsoft technology!

    "Okay, so Microsoft has this document telling me about their new product, but it's in some Microsoft proprietary file format that requires me to install a custom viewer that works only in Internet Explorer? You've gotta be kidding me."

    No wonder people hate Microsoft.

    It's like handing out brochures titled "Gründe, warum du Deutsch lernen solltest" ("Reasons why you should learn German").

    Bonus plug: Stephen Toulouse bookified his blogerations. (Part 2: The Hardbackening.) I've read the softcopy of his book. Good stuff. And I would've endorsed his book even if he didn't promise me a personalized copy.

  • The Old New Thing

    Why didn't they use the Space Shuttle to rescue the Apollo 13 astronauts?


    Many decisions make sense only in the context of history.

    Much like the moviegoers who were puzzled why NASA didn't just use the Space Shuttle to rescue the Apollo 13 astronauts, computer users of today, when looking back on historical decisions, often make assumptions based on technology that didn't exist.

    Consider, for example, pointing out that the absence of a console subsystem in Windows 3.1 was no excuse for not porting the ipconfig program as a character-mode application. "Sure maybe you didn't have a console subsystem, but why not just use the DOS box?"

    The MS-DOS prompt is a virtual machine running a copy of MS-DOS. Since it's a virtual machine, as far as the MS-DOS prompt is concerned, it's just running all by its happy self on a dedicated computer running MS-DOS. In reality, of course, it's running inside a simulator being controlled by Windows, but the point of the simulation is so that old applications can continue to run even though they think they're running under MS-DOS.

    "There wasn't any security in place with Win 3.1, so any program run from a DOS box should have been able to affect anything on the system."

    Since the MS-DOS prompt ran in a virtual machine, everything it did was under the supervision of the virtual machine manager. If it tried to access memory it didn't have permission to access, an exception would be raised and handled by the virtual machine manager. If it tried to execute a privileged instruction, an exception would be raised, and the virtual machine manager would step in with a "Nope, I'm not going to let you do that" and terminate the virtual machine. In a sense, programs running in the MS-DOS prompt actually ran with more protection and isolation than Windows applications running on the desktop, because Windows created a whole separate universe for each MS-DOS prompt.

    One of the consequences of virtualization is that programs running in the MS-DOS prompt are plain old MS-DOS applications, not Windows applications. There is no Windows API in MS-DOS, so there is no Windows API in the MS-DOS prompt either. (It's like running Windows inside a virtual machine on your Linux box and wondering why your Windows program can't call XCreateWindow. It can't call XCreateWindow because that's a function on the host system, not in the virtual machine.)

    Okay, but let's suppose, just for the sake of argument, that somebody poked a hole in the virtual machine and provided a way for MS-DOS programs to call WinSock APIs.

    You still wouldn't want ipconfig to be an MS-DOS program.

    Recall that Windows 3.1 ran in one of two modes, either standard mode or enhanced mode. Standard mode is the version designed for the 80286 processor. It didn't have virtual memory or support for virtual machines. When you ran an MS-DOS prompt, standard mode Windows would freeze all your Windows programs and effectively put itself into suspended animation. It then ran your MS-DOS program (full-screen since there was no Windows around to put it in a window), and when your MS-DOS program exited, Windows would rouse itself from its slumber and bring things back to the way they were before you opened that MS-DOS prompt.

    It would kind of suck if getting your computer's IP address meant stopping all your work, shutting down Windows (effectively), and switching the video adapter into character mode, just so it could print 16 characters to the screen.

    "Well, who cares about standard mode Windows any more? Let's say that it only works in enhanced mode. Enhanced mode can multi-task MS-DOS prompts and run them in a window."

    Recall that the minimum memory requirement for Windows 3.1 in enhanced mode was 1664KB of memory. Given that each MS-DOS box took up about 1MB of memory, you're saying that displaying 16 characters of information is going to consume over half of your computer's memory?

    "Okay, helpdesk wants to know my IP address so they can troubleshoot my computer. In order to do that, I have to run this program, but first I need to save all my work and exit all my programs in order to free up enough memory to run the program they want me to run."

    Better to just write a simple Windows application.

    Bonus commentary: 640k asks, "Why wasn't winipcfg called ipconfig?"

    Right. "Let's have two completely different and incompatible programs with the same name." See how far you get with that.

  • The Old New Thing

    There's a default implementation for WM_SETREDRAW, but you might be able to do better


    If your window doesn't have a handler for the WM_SET­REDRAW message, then Def­Window­Proc will give you a default implementation which suppresses WM_PAINT messages for your window when redraw is disabled, and re-enables WM_PAINT (and triggers a full repaint) when redraw is re-enabled. (This is internally accomplished by making the window pseudo-invisible, but that's an implementation detail you shouldn't be concerned with.)

    Although the default implementation works fine for simple controls, more complex controls can do better, and in fact they should do better, because that's sort of the point of WM_SET­REDRAW.

    The intended use for disabling redraw on a window is in preparation for making large numbers of changes to the window, where you don't want to waste time updating the screen after each tiny little change. For example, if you're going to add a hundred items to a list box, you probably want to disable redraw while adding the items so you don't have to suffer through 100 screen refreshes when only one is enough. You've probably seen the programs that forget to suppress redraw when filling a large list box: The application freezes up except for a list box whose scroll bar starts out with a big thumb that slowly shrinks as the number of items increases.
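    The list-box pattern described above can be sketched in a few lines. (The window handle and item strings here are placeholders; for a control with a custom WM_SET­REDRAW handler, the final invalidation forces the one repaint you actually want.)

    ```c
    // Sketch: suppress redraw while bulk-filling a list box, so the
    // control repaints once instead of a hundred times.
    SendMessage(hwndList, WM_SETREDRAW, FALSE, 0);

    for (int i = 0; i < 100; i++) {
        SendMessage(hwndList, LB_ADDSTRING, 0, (LPARAM)itemText[i]);
    }

    SendMessage(hwndList, WM_SETREDRAW, TRUE, 0);
    // Force a single full repaint now that all the items are in place.
    InvalidateRect(hwndList, NULL, TRUE);
    ```
    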

    I say that this is sort of the point of WM_SET­REDRAW for a complex control, because if you have a simple control (like a button), there isn't much in the way of "bulk updates" you can perform on it, so there isn't much reason for anybody to want to disable redraw on it anyway. The types of windows for which people want to disable redraw are the types of windows that would benefit most from a custom handler.

    For example, the list view control has a custom handler for WM_SET­REDRAW which sets an internal redraw has been disabled flag. Other parts of the list view control check this flag and bypass complex screen calculations if it is set. For example, when you add an item to a list view while redraw is disabled, the list view control doesn't bother recalculating the new scroll bar position; it just sets an internal flag that says, "When redraw is re-enabled, don't forget to recalculate the scroll bars." If the list view is in auto-arrange, it doesn't bother rearranging the items after each insertion or deletion; it just sets an internal flag to remember to do it when redraw is re-enabled. If you have a regional list view, it doesn't bother recalculating the region; it just sets a flag. And when you finally re-enable drawing, it sees all the little Post-It note reminders that it left lying around and says, "Okay, let's deal with all this stuff that I had been putting off." That way, if you add 100 items, it doesn't perform 99 useless scroll bar calculations, 99 useless auto-arrange repositionings, and create, compute, and then destroy 99 regions. Since some of these calculations are O(n), deferring them when redraw is disabled improves the performance of inserting n items from O(n²) to O(n).
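    The shape of such a handler can be sketched as a window-procedure fragment. (Everything here is hypothetical: the flag names and the deferred-work helpers stand in for whatever bookkeeping your control actually does.)

    ```c
    /* Sketch of a custom WM_SETREDRAW handler for a hypothetical control
       that defers its expensive layout work while redraw is disabled. */
    case WM_SETREDRAW:
        g_fRedrawEnabled = (BOOL)wParam;
        if (g_fRedrawEnabled && g_fLayoutDirty) {
            /* Deal with the deferred "Post-It notes" all at once. */
            RecalcScrollBars();     /* hypothetical deferred work */
            AutoArrangeItems();     /* hypothetical deferred work */
            g_fLayoutDirty = FALSE;
            InvalidateRect(hwnd, NULL, TRUE);
        }
        return 0;
    ```
    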

    Moral of the story: If you have a control that manages a large number of sub-items, you should have a custom WM_SET­REDRAW handler to make bulk updates more efficient.

    Bonus chatter: Note that using Lock­Window­Update as a fake version of WM_SET­REDRAW does not trigger these internal optimizations. Abusing Lock­Window­Update gets you the benefit of not repainting, but you still have to suffer through the various O(n²) calculations.

  • The Old New Thing

    When does a process ID become available for reuse?


    A customer wanted some information about process IDs:

    I'm writing some code that depends on process IDs, and I'd like to better understand the problem of process ID reuse.

    When can PIDs be reused? Does it happen when the process handle becomes signaled (but before the zombie object is removed from the system), or only after the last handle to the process is released (and the process object is removed from the system)?

    If it's the former, will OpenProcess() succeed for a zombie process (i.e., one that has been terminated but not yet removed from the system)?

    The process ID is a value associated with the process object, and as long as the process object is still around, so too will its process ID. The process object remains as long as the process is still running (the process implicitly retains a reference to itself) or as long as somebody still has a handle to the process object.

    If you think about it, this makes sense, because as long as there is still a handle to the process, somebody can call WaitForSingleObject to wait for the process to exit, or they can call GetExitCodeProcess to retrieve the exit code, and that exit code has to be stored somewhere for later retrieval.

    When all handles are closed, then the kernel knows that nobody is going to ask whether the process is still running or what its exit code is (because you need a handle to ask those questions). At which point the process object can be destroyed, which in turn destroys the process ID.

    What happens if somebody calls OpenProcess on a zombie process? The same thing that happens if they call it on a running process: They get a handle to the process. Why would you want to get a handle to a zombie process? Well, you might not know that it's a zombie yet; you're getting the handle so you can call WaitForSingleObject to see if it has exited yet. Or you might get the handle, knowing that it's a zombie, because you want to call GetExitCodeProcess to see what the exit code was.
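    A short sketch of that pattern (the PID is a placeholder, and PROCESS_QUERY_LIMITED_INFORMATION assumes Windows Vista or later): the same code path works whether the target is still running or already a zombie, because the handle keeps the process object, and therefore its exit code, alive.

    ```c
    #include <windows.h>
    #include <stdio.h>

    // Sketch: open a process by ID and report its status. Works for a
    // zombie process (terminated but with outstanding handles) as well
    // as a running one.
    void ReportProcess(DWORD pid)
    {
        HANDLE h = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION,
                               FALSE, pid);
        if (h == NULL) {
            // The PID may already have been released and reused.
            printf("OpenProcess failed: %lu\n", GetLastError());
            return;
        }

        // A zero timeout just asks "has it exited yet?" without waiting.
        if (WaitForSingleObject(h, 0) == WAIT_OBJECT_0) {
            DWORD code;
            GetExitCodeProcess(h, &code);
            printf("Process %lu exited with code %lu\n", pid, code);
        } else {
            printf("Process %lu is still running\n", pid);
        }
        CloseHandle(h);
    }
    ```
    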

  • The Old New Thing

    What's the difference between an asynchronous PIPE_WAIT pipe and a PIPE_NOWAIT pipe?


    When you operate on named pipes, you have a choice of opening them in PIPE_WAIT mode or PIPE_NOWAIT mode. When you read from a PIPE_WAIT pipe, the read blocks until data becomes available in the pipe. When you read from a PIPE_NOWAIT pipe, then the read completes immediately even if there is no data in the pipe. But how is this different from a PIPE_WAIT pipe opened in asynchronous mode by passing FILE_FLAG_OVERLAPPED?

    The difference is in when the I/O is deemed to have completed.

    When you issue an overlapped read against a PIPE_WAIT pipe, the call to Read­File returns immediately, but the completion actions do not occur until there is data available in the pipe. (Completion actions are things like setting the event, running the completion routine, or queueing a completion to an I/O completion port.) On the other hand, when you issue a read against a PIPE_NOWAIT pipe, the call to Read­File returns immediately with the completed result: if the pipe is empty, the read completes with a read of zero bytes and the error ERROR_NO_DATA.

    Here's a timeline, for people who prefer tables.

    Event             Asynchronous PIPE_WAIT            PIPE_NOWAIT
    ----------------  --------------------------------  --------------------------------
    Pipe initially
    empty
    ReadFile          Returns immediately with          Returns immediately with
                      ERROR_IO_PENDING                  ERROR_NO_DATA; I/O completes
                                                        with 0 bytes
    Time passes
    Data available    I/O completes with n > 0 bytes

    If you use the PIPE_NOWAIT flag, then the only way to know whether there is data is to poll for it. There is no way to be notified when data becomes available.
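    The overlapped (asynchronous PIPE_WAIT) side of that timeline can be sketched as follows; hPipe is assumed to have been opened with FILE_FLAG_OVERLAPPED.

    ```c
    #include <windows.h>
    #include <stdio.h>

    // Sketch: an overlapped read against a PIPE_WAIT pipe. ReadFile
    // returns immediately with ERROR_IO_PENDING; the event is signaled
    // only when data actually becomes available.
    void ReadWhenReady(HANDLE hPipe)
    {
        char buf[256];
        DWORD bytes;
        OVERLAPPED ov = {0};
        ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

        if (!ReadFile(hPipe, buf, sizeof(buf), NULL, &ov) &&
            GetLastError() == ERROR_IO_PENDING) {
            // Completion actions happen later, when data arrives.
            WaitForSingleObject(ov.hEvent, INFINITE);
            if (GetOverlappedResult(hPipe, &ov, &bytes, FALSE))
                printf("read %lu bytes\n", bytes);
        }
        CloseHandle(ov.hEvent);
    }
    ```
    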

    As the documentation notes, PIPE_NOWAIT remains solely for compatibility with LAN Manager 2.0. Since the only way to use pipes created as PIPE_NOWAIT is to poll them, this is obviously not a recommended model for a multitasking operating system.
