January, 2011

  • The Old New Thing

    Solutions that require a time machine: Making applications which require compatibility behaviors crash so the developers will fix their bug before they ship


    A while ago, I mentioned that there are many applications that rely on WM_PAINT messages being delivered even if there is nothing to paint because they put business logic inside their WM_PAINT handler. As a result, Windows sends them dummy WM_PAINT messages.

    Jerry Pisk opines,

    Thanks to the Windows team going out of their way not to break poorly written applications developers once again have no incentive to clean up their act and actually write working applications. If an application requires a dummy WM_PAINT not to crash it should be made to crash as soon as possible so the developers go in and fix it before releasing their "code".

    In other words, Jerry recommends that Microsoft use the time machine that Microsoft Research has been secretly perfecting for the past few years. (They will sometimes take it out for a spin and fail to cover their tracks.)

    In 1993, Company X writes a program that relies on WM_PAINT messages arriving in a particular order relative to other messages. (And just to make things more interesting, in 1994, Company X goes out of business, or they discontinue the program in question, or the only person who understands the code leaves the company or dies in a plane crash.)

    In 1995, changes to Windows alter the order of messages, and in particular, WM_PAINT messages are no longer sent under certain circumstances. I suspect that the reason for this is the introduction of the taskbar. Before the taskbar, minimized windows appeared as icons on your desktop and therefore received WM_PAINT messages while minimized. But now that applications minimize to the taskbar, minimized windows are sent off screen and never actually paint. The taskbar button does the job of representing the program on the screen.

    Okay, now let's put Jerry in charge of solving this compatibility problem. He recommends that instead of sending a dummy WM_PAINT message to these programs to keep them happy, these programs should instead be made to crash as soon as possible, so that the developers can go in and fix the problem before they release the program.

    In other words, he wants to take the Microsoft Research time machine back to 1993 with a beta copy of Windows 95 and give it to the programmers at Company X and tell them, "Your program crashes on this future version of Windows that doesn't exist yet in your time. Fix the problem before you release your code. (Oh, and by the way, the Blue Jays are going to repeat.)"

    Or maybe I misunderstood his recommendation.

  • The Old New Thing

    Was showing the column header in all Explorer views a rogue feature?


    User :( asks whether the Explorer feature that shows the column headers in all views was a rogue feature or a planned one.

    If it was a rogue feature, it was a horribly badly hidden one.

    One of the important characteristics of the rogue feature is that you not be able to tell that the feature is there unless you go looking for it. If the feature is right there on the screen as soon as you open an Explorer window, odds are that somebody is going to notice and say something about it. (For example, the designer who is responsible for Explorer is probably going to notice that none of the screenshots of Explorer match the spec.)

    That's why rogue features typically take the form of a hotkey or holding a modifier key in conjunction with another operation.

    Writing up this characterization of rogue features reminds me that I myself am responsible for a rogue feature of Windows 95. If you go to an MS-DOS box, select Edit, then Mark, then select a section of the window for copying, and hold the Ctrl key while dragging the mouse, the shape of the selection changes from a box to, um, I'm not sure the shape has a name. But it includes all the text between the start point and the end point, as if the contents of the window had come from an edit control. Something like this:


    Since this was a rogue feature, it was never tested, and I suspect that it didn't work on Hebrew or Arabic systems.

    I must've chickened out, or maybe my rogue feature was found out, because the code for streamed selections was ifdef'd out before Windows 95 shipped. So at least I can still honestly say that I never shipped a rogue feature.

  • The Old New Thing

    From inside the Redmond Reality Distortion Field: Why publish documents in PDF?


    A few years ago, the Windows 7 team developed a document to introduce technology writers to the features of Windows 7. The document was released in PDF format, which created quite a stir among certain people trapped inside the Redmond Reality Distortion Field, who indignantly complained,

    Why are we releasing this document in PDF format? Shouldn't it be in docx or XPS? I would expect people interested in Windows 7 to be willing to use more Microsoft technology.

    Um, hello from the real world. It's the people who are critical of Windows 7 who are least likely to use Microsoft technology!

    "Okay, so Microsoft has this document telling me about their new product, but it's in some Microsoft proprietary file format that requires me to install a custom viewer that works only in Internet Explorer? You've gotta be kidding me."

    No wonder people hate Microsoft.

    It's like handing out brochures titled "Gründe, warum du Deutsch lernen solltest" ("Reasons why you should learn German").

    Bonus plug: Stephen Toulouse bookified his blogerations. (Part 2: The Hardbackening.) I've read the softcopy of his book. Good stuff. And I would've endorsed his book even if he didn't promise me a personalized copy.

  • The Old New Thing

    The message text limit for the Marquee screen saver is 255, even if you bypass the dialog box that prevents you from entering more than 255 characters


    If you find an old Windows XP machine and fire up the configuration dialog for the Marquee screen saver, you'll see that the text field for entering the message won't let you type more than 255 characters. That's because the Marquee screen saver uses a 255-character buffer to hold the message, and the dialog box figures there's no point in letting you type in a message longer than the screen saver can display.

    A customer decided to bypass the configuration dialog and change the text in the screen saver by editing the settings directly, and then complained that the Marquee screen saver truncated the message at 255 characters.

    Well, yeah, because the limit is 255 characters. That's what the dialog box was trying to tell you. If you bypass the dialog box and whack the setting directly, that doesn't change the Marquee screen saver. Its limit is still 255 characters.

    It's like attaching a gizmo to the gas pump fuel nozzle to disable the auto-stop feature because you want to put 20 gallons of gas into your 15-gallon tank. Disabling the auto-stop will not make the gas tank any bigger. All it does is make it easier to accidentally overflow your gas tank's buffer.

  • The Old New Thing

    Why didn't they use the Space Shuttle to rescue the Apollo 13 astronauts?


    Many decisions make sense only in the context of history.

    Much like the moviegoers who were puzzled why NASA didn't just use the Space Shuttle to rescue the Apollo 13 astronauts, computer users of today, when looking back on historical decisions, often make assumptions based on technology that didn't exist at the time.

    Consider, for example, pointing out that the absence of a console subsystem in Windows 3.1 was no excuse for not porting the ipconfig program as a character-mode application. "Sure maybe you didn't have a console subsystem, but why not just use the DOS box?"

    The MS-DOS prompt is a virtual machine running a copy of MS-DOS. Since it's a virtual machine, as far as the MS-DOS prompt is concerned, it's just running all by its happy self on a dedicated computer running MS-DOS. In reality, of course, it's running inside a simulator being controlled by Windows, but the point of the simulation is so that old applications can continue to run even though they think they're running under MS-DOS.

    "There wasn't any security in place with Win 3.1, so any program run from a DOS box should have been able to affect anything on the system."

    Since the MS-DOS prompt ran in a virtual machine, everything it did was under the supervision of the virtual machine manager. If it tried to access memory it didn't have permission to access, an exception would be raised and handled by the virtual machine manager. If it tried to execute a privileged instruction, an exception would be raised, and the virtual machine manager would step in with a "Nope, I'm not going to let you do that" and terminate the virtual machine. In a sense, programs running in the MS-DOS prompt actually ran with more protection and isolation than Windows applications running on the desktop, because Windows created a whole separate universe for each MS-DOS prompt.

    One of the consequences of virtualization is that programs running in the MS-DOS prompt are plain old MS-DOS applications, not Windows applications. There is no Windows API in MS-DOS, so there is no Windows API in the MS-DOS prompt either. (It's like running Windows inside a virtual machine on your Linux box and wondering why your Windows program can't call XCreateWindow. It can't call XCreateWindow because that's a function on the host system, not in the virtual machine.)

    Okay, but let's suppose, just for the sake of argument, that somebody poked a hole in the virtual machine and provided a way for MS-DOS programs to call WinSock APIs.

    You still wouldn't want ipconfig to be an MS-DOS program.

    Recall that Windows 3.1 ran in one of two modes, either standard mode or enhanced mode. Standard mode is the version designed for the 80286 processor. It didn't have virtual memory or support for virtual machines. When you ran an MS-DOS prompt, standard mode Windows would freeze all your Windows programs and effectively put itself into suspended animation. It then ran your MS-DOS program (full-screen since there was no Windows around to put it in a window), and when your MS-DOS program exited, Windows would rouse itself from its slumber and bring things back to the way they were before you opened that MS-DOS prompt.

    It would kind of suck if getting your computer's IP address meant stopping all your work, shutting down Windows (effectively), and switching the video adapter into character mode, just so it could print 16 characters to the screen.

    "Well, who cares about standard mode Windows any more? Let's say that it only works in enhanced mode. Enhanced mode can multi-task MS-DOS prompts and run them in a window."

    Recall that the minimum memory requirement for Windows 3.1 in enhanced mode was 1664KB. Given that each MS-DOS box took up about 1MB of memory, you're saying that displaying 16 characters of information is going to consume over half of your computer's memory?

    "Okay, helpdesk wants to know my IP address so they can troubleshoot my computer. In order to do that, I have to run this program, but first I need to save all my work and exit all my programs in order to free up enough memory to run the program they want me to run."

    Better to just write a simple Windows application.

    Bonus commentary: 640k asks, "Why wasn't winipcfg called ipconfig?"

    Right. "Let's have two completely different and incompatible programs with the same name." See how far you get with that.

  • The Old New Thing

    Some remarks on VirtualAlloc and MEM_LARGE_PAGES


    If you try to run the sample program demonstrating how to create a file mapping using large pages, you'll probably run into the error ERROR_NOT_ALL_ASSIGNED (Not all privileges or groups referenced are assigned to the caller) when calling AdjustTokenPrivileges. What is going on?

    The AdjustTokenPrivileges function enables privileges that you already have (but which are masked). Sort of like how a super hero can't use super powers while disguised as a normal mild-mannered citizen. In order to enable the SeLockMemoryPrivilege privilege, you must already have it. But where do you get it?

    You get it by using the group policy editor. The list of privileges says that SeLockMemoryPrivilege corresponds to "Lock pages in memory".
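Put concretely, the two steps (enable the already-granted privilege, then allocate) look roughly like the sketch below. Error handling is abbreviated, and the allocation will still fail if "Lock pages in memory" hasn't actually been granted to the account:

```cpp
#include <windows.h>
#include <stdio.h>

int main()
{
    // Step 1: enable SeLockMemoryPrivilege, which the account must
    // already hold (granted via "Lock pages in memory" in group policy).
    HANDLE hToken;
    OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &hToken);

    TOKEN_PRIVILEGES tp = { 1 };  // PrivilegeCount = 1
    LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &tp.Privileges[0].Luid);
    tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    AdjustTokenPrivileges(hToken, FALSE, &tp, 0, NULL, NULL);
    if (GetLastError() == ERROR_NOT_ALL_ASSIGNED) {
        printf("SeLockMemoryPrivilege not held; grant it in group policy\n");
        return 1;
    }

    // Step 2: allocate one large page. MEM_LARGE_PAGES requires the size
    // to be a multiple of the large-page minimum, and the physical pages
    // are allocated up front.
    SIZE_T cb = GetLargePageMinimum();
    void *p = VirtualAlloc(NULL, cb,
                           MEM_RESERVE | MEM_COMMIT | MEM_LARGE_PAGES,
                           PAGE_READWRITE);
    printf("VirtualAlloc(%Iu bytes) %s\n", cb, p ? "succeeded" : "failed");
    if (p) VirtualFree(p, 0, MEM_RELEASE);
    return 0;
}
```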

    Why does allocating very large pages require permission to lock pages in memory?

    Because very large pages are not pageable. This is not an inherent limitation of large pages; the processor is happy to page them in or out, but you have to do it all or nothing. In practice, you don't want a single page-out or page-in operation to consume 4MB or 16MB of disk I/O; that's a thousand times more I/O than your average paging operation. And in practice, the programs which use these large pages are "You paid $40,000 for a monster server whose sole purpose is running my one application and nothing else" type applications, like SQL Server. Those applications don't want this memory to be pageable anyway, so adding code to allow them to be pageable is not only a bunch of work, but it's a bunch of work to add something nobody who uses the feature actually wants.

    What's more, allocating very large pages can be time-consuming. All the physical pages which are involved in a very large page must be contiguous (and must be aligned on a large page boundary). Prior to Windows XP, allocating a very large page could take 15 seconds or more if your physical memory was fragmented. (And even machines with as much as 2GB of memory will probably have highly fragmented physical memory once they're running for a little while.) Internally, allocating the physical pages for a very large page is performed by the kernel function which allocates physically contiguous memory, which is something device drivers need to do quite often for I/O transfer buffers. Some drivers behave "highly unfavorably" if their request for contiguous memory fails, so the operating system tries very hard to scrounge up the memory, even if it means shuffling megabytes of memory around and performing a lot of disk I/O to get it. (It's essentially performing a time-critical defragmentation.)

    If you followed the discussion so far, you'll see another reason why large pages aren't paged out: When they need to be paged back in, the system may not be able to find a suitable chunk of contiguous physical memory!

    In Windows Vista, the memory manager folks recognized that these long delays made very large pages less attractive for applications, so they changed the behavior so requests for very large pages from applications went through the "easy parts" of looking for contiguous physical memory, but gave up before the memory manager went into desperation mode, preferring instead just to fail. (In Windows Vista SP1, this part of the memory manager was rewritten so the really expensive stuff is never needed at all.)

    Note that the MEM_LARGE_PAGES flag triggers an exception to the general principle that MEM_RESERVE only reserves address space, MEM_COMMIT makes the memory manager guarantee that physical pages will be there when you need them, and that the physical pages aren't actually allocated until you access the memory. Since very large pages have special physical memory requirements, the physical allocation is done up front so that the memory manager knows that when it comes time to produce the memory on demand, it can actually do so.

  • The Old New Thing

    How to turn off the exception handler that COM "helpfully" wraps around your server


    Historically, COM placed a giant try/except around your server's methods. If your server encountered what would normally be an unhandled exception, the giant try/except would catch it and turn it into the error RPC_E_SERVERFAULT. It then marked the exception as handled, so that the server remained running, thereby "improving robustness by keeping the server running even when it encountered a problem."

    Mind you, this was actually a disservice.

    The fact that an unhandled exception occurred means that the server was in an unexpected state. By catching the exception and saying, "Don't worry, it's all good," you end up leaving a corrupted server running. For example:

    HRESULT CServer::DoOneWork(...)
    {
     CWork *pwork = m_listWorkPending.RemoveFirst();
     if (pwork) {
      pwork->UpdateTimeStamp();
      pwork->FrobTheWidget();
      pwork->ReversePolarity();
      pwork->UnfrobTheWidget();
      pwork->Complete();
     }
     return S_OK;
    }
    Suppose there's a bug somewhere that causes pwork->Reverse­Polarity() to crash. Maybe the problem is that the neutrons aren't flowing, so there's no polarity to reverse. Maybe the polarizer is not properly initialized. Whatever, doesn't matter what the problem is, just assume there's a bug that prevents it from working.

    With the global try/except, COM catches the exception and returns RPC_E_SERVERFAULT back to the caller. Your server remains up and running, ready for another request. Mind you, your server is also corrupted. The widget never got unfrobbed, the timestamp refers to work that never completed, and the CWork that you removed from the pending work list got leaked.

    But, hey, your server stayed up.

    A few hours later, the server starts returning E_OUTOFMEMORY errors (because of all the leaked work items), you get errors because there are too many outstanding frobs, and the client hangs because it's waiting for a completion notification on that work item that you lost track of. You debug the server to see why everything is so screwed up, but you can't find anything wrong. "I don't understand why we are leaking frobs. Every time we frob a widget, there's a call to unfrob right after it!"

    You eventually throw up your hands in resignation. "I can't figure it out. There's no way we can be leaking frobs."

    Even worse, the inconsistent object state can be a security hole. An attacker tricks you into reversing the polarity of a nonexistent neutron flow, which causes you to leave the widget frobbed by mistake. Bingo, frobbing a widget makes it temporarily exempt from unauthorized polarity changes, and now the bad guys can change the polarity at will. Now you have to chase a security vulnerability where widgets are being left frobbed, and you still can't find it.

    Catching all exceptions and letting the process continue running assumes that a server can recover from an unexpected failure. But this is absurd. You already know that the server is unrecoverably toast: It crashed!

    Much better is to let the server crash so that the crash dump can be captured at the point of the failure. Now you have a fighting chance of figuring out what's going on.

    But how do you turn off that massive try/except? You didn't put it in your code; COM created it for you.

    Enter IGlobalOptions: Set the COMGLB_EXCEPTION_HANDLING property to COMGLB_EXCEPTION_DONOT_HANDLE, which means "Please don't try to 'help' me by catching all exceptions. If a fatal exception occurs in my code, then go ahead and let the process crash." In Windows 7, you can ask for the even stronger COMGLB_EXCEPTION_DONOT_HANDLE_ANY, which means "Don't even try to catch 'nonfatal' exceptions."

    Wait, what's a 'fatal' exception?

    A 'fatal' exception, at least as COM interprets it, is an exception like STATUS_ACCESS_VIOLATION or STATUS_ILLEGAL_INSTRUCTION. (A complete list is in this sample Rpc exception filter.) On the other hand a 'nonfatal' exception is something like a C++ exception or a CLR exception. You probably want an unhandled C++ or CLR exception to crash your server, too; after all, it would have crashed your program if it weren't running as a server. Therefore, my personal recommendation is to use COMGLB_EXCEPTION_DONOT_HANDLE_ANY whenever possible.
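Mechanically, this is a small amount of code at server startup. A sketch (the wrapper function name is mine; error checks abbreviated):

```cpp
#include <windows.h>
#include <objbase.h>

// Hypothetical helper: call once during server initialization,
// after CoInitializeEx, to opt out of COM's global try/except.
HRESULT DisableComExceptionHandling()
{
    IGlobalOptions *pGlobalOptions;
    HRESULT hr = CoCreateInstance(CLSID_GlobalOptions, NULL,
                                  CLSCTX_INPROC_SERVER,
                                  IID_PPV_ARGS(&pGlobalOptions));
    if (SUCCEEDED(hr)) {
        // COMGLB_EXCEPTION_DONOT_HANDLE_ANY requires Windows 7;
        // on earlier systems, fall back to COMGLB_EXCEPTION_DONOT_HANDLE.
        hr = pGlobalOptions->Set(COMGLB_EXCEPTION_HANDLING,
                                 COMGLB_EXCEPTION_DONOT_HANDLE_ANY);
        pGlobalOptions->Release();
    }
    return hr;
}
```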

    "That's great, but why is the default behavior the dangerous 'silently swallow exceptions' mode?"

    The COM folks have made numerous attempts to change the default from the dangerous mode to one of the safer modes, but the application compatibility consequences have always been too great. Turns out there are a lot of servers that actually rely on COM silently masking their exceptions.

    But at least now you won't be one of them.

  • The Old New Thing

    Why does pasting a string containing an illegal filename character into a rename edit box delete the characters from the clipboard, too?


    Ane asks why, if you have a string with an illegal filename character on the clipboard, and you paste that string into a rename edit box, the illegal characters get deleted not just from the edit box but also from the clipboard.

    Basically, it's a bug, the result of a poor choice of default in an internal helper class.

    There is an internal helper class for "monitoring an edit control" with options to do things like remove illegal characters. This helper class was written back in 1998, presumably with the intention of being used somewhere, but it never did get hooked up. Maybe the feature it was originally written for got cancelled, I can't quite tell. At any rate, this helper class had many options, one of which was "When pasting text containing illegal characters, should I filter the illegal characters from the clipboard, too?", and for some reason it defaulted to Yes. (I can see why the default was Yes from a coding standpoint. It was actually less work to filter the characters from the clipboard than it was to preserve them, but it's a bad default from an API design standpoint.)

    Anyway, this helper class sat unused for a few years, but in 2000, Explorer decided to use this helper class so it could filter illegal characters out of file names when you used the Rename command. The code that uses this helper class chose which options it wanted, and probably due to oversight, the "preserve clipboard contents when pasting" flag was not specified.

    So yeah, it's just a bug. But then again, it's a bug that's been around for over a decade, so who knows if there's somebody out there that relies on it.

  • The Old New Thing

    Processes, commit, RAM, threads, and how high can you go?


    Back in 2008, Igor Levicki made a boatload of incorrect assumptions in an attempt to calculate the highest a process ID can go on Windows NT. Let's look at them one at a time.

    So if you can't create more than 2,028 threads in one process (because of 2GB per process limit) and each process needs at least one thread, that means you are capped by the amount of physical RAM available for stack.

    One assumption is that each process needs at least one thread. Really? What about a process that has exited? (Some people call these zombie processes.) There are no threads remaining in this process, but the process object hangs around until all handles are closed.

    Next, the claim is that you are capped by the amount of physical RAM available for stack. This assumes that stacks are non-pageable, which is an awfully strange assumption. User-mode stacks are most certainly pageable. In fact, everything in user-mode is pageable unless you take special steps to make it not pageable.

    Given that the smallest stack allocation is 4KB and assuming 32-bit address space:

    4,294,967,296 / 4,096 = 1,048,576 PIDs

    This assumes that all the stacks live in the same address space, but user mode stacks from different processes most certainly do not; that's the whole point of separate address spaces! (Okay, kernel stacks live in the same address space, but the discussion about "initial stack commit" later makes it clear he's talking about user-mode stacks.)

    Since they have to be a multiple of 4:

    1,048,576 / 4 = 262,144 PIDs

    It's not clear why we are dividing by four here. Yes, process IDs are a multiple of four (implementation detail, not contractual, do not rely on it), but that doesn't mean that three quarters of the stacks are no longer any good. It just means that we can't use more than 4,294,967,296/4 of them since we'll run out of names after 1,073,741,824 of them. In other words, this is not a division but rather a min operation. And we already dropped below 1 billion when we counted kernel stacks, so this min step has no effect.

    It's like saying, "This street is 80 meters long. The minimum building line is 4 meters, which means that you can have at most 20 houses on this side of the street. But house numbers on this side of the street must be even, so the maximum number of houses is half that, or 10." No, the requirement that house numbers be even doesn't cut the number of houses in half; it just means you have to be more careful how you assign the numbers.

    Having 262,144 processes would consume 1GB of RAM just for the initial stack commit assuming that all processes are single-threaded. If they committed 1MB of stack each, you would need 256GB of memory.

    Commit does not consume RAM. Commit is merely a promise from the memory manager that the RAM will be there when you need it, but the memory manager doesn't have to produce it immediately (and certainly doesn't have to keep the RAM reserved for you until you free it). Indeed, that's the whole point of virtual memory, to decouple commit from RAM! (If commit consumed RAM, then what's the page file for?)

    This calculation also assumes that process IDs are allocated "smallest available first", but it's clear that it's not as simple as that: Fire up Task Manager and look at the highest process ID. (I've got one as high as 4040.) If process IDs are allocated smallest-available-first, then a process ID of 4040 implies that at some point there were 1010 processes in the system simultaneously! Unlikely.

    Here's a much simpler demonstration that process IDs are not allocated smallest-available-first: Fire up Task Manager, tell it to Show processes from all users, go to the Processes tab, and enable the PID column if you haven't already. Now launch Calc. Look for Calc in the process list and observe that it was not assigned the lowest available PID. If your system is like mine, you have PID zero assigned to the System Idle Process (not really a process but it gets a number anyway), and PID 4 assigned to the System process (again, not really a process but it gets a number anyway), and then you have a pretty big gap before the next process ID (for me, it's 372). And yet Calc was given a process ID in the 2000's. Proof by counterexample that the system does not assign PIDs smallest-available-first.

    So if they aren't assigned smallest-available-first, what's to prevent one from having a process ID of 4000000000?

    (Advanced readers may note that kernel stacks do all share a single address space, but even in that case, a thread that doesn't exist doesn't have a stack. And it's clear that Igor was referring to user-mode stacks since he talked about 1MB stack commits, a value which applies to user mode and not kernel mode.)

    Just for fun, I tried to see how high I could get my process ID.

    #include <windows.h>
    #include <tchar.h>
    #include <stdio.h>

    int __cdecl _tmain(int argc, TCHAR **argv)
    {
     DWORD dwPid = 0;
     TCHAR szSelf[MAX_PATH];
     GetModuleFileName(NULL, szSelf, MAX_PATH);
     int i;
     for (i = 0; i < 10000; i++) {
      STARTUPINFO si = { sizeof(si) };
      PROCESS_INFORMATION pi;
      if (!CreateProcess(szSelf, TEXT("Bogus"),
            NULL, NULL, FALSE, CREATE_SUSPENDED,
            NULL, NULL, &si, &pi)) break;
      TerminateProcess(pi.hProcess, 0);
      CloseHandle(pi.hThread);
      // intentionally leak the process handle so the
      // process object is not destroyed
      // CloseHandle(pi.hProcess); // leak
      if (dwPid < pi.dwProcessId) dwPid = pi.dwProcessId;
     }
     _tprintf(_TEXT("\nCreated %d processes, ")
              _TEXT("highest pid seen was %d\n"), i, dwPid);
     _fgetts(szSelf, MAX_PATH, stdin);
     return 0;
    }
    In order to get the program to complete before I got bored, I ran it on a Windows 2000 virtual machine with 128MB of memory. It finally conked out at 5245 processes with a PID high water mark of 21776. Along the way, it managed to consume 2328KB of non-paged pool, 36KB of paged pool, and 36,092KB of commit. If you divide this by the number of processes, you'll see that a terminated process consumes about 450 bytes of non-paged pool, a negligible amount of paged pool, and 6KB of commit. (The commit is probably left over page tables and other detritus.) I suspect commit is the limiting factor in the number of processes.

    I ran the same program on a Windows 7 machine with 1GB of RAM, and it managed to create all 10,000 processes with a high process ID of 44264. I cranked the loop limit up to 65535, and it still comfortably created 65535 processes with a high process ID of 266,232, easily exceeding the limit of 262,144 that Igor calculated.

    I later learned that the Windows NT folks do try to keep the numerical values of process ID from getting too big. Earlier this century, the kernel team experimented with letting the numbers get really huge, in order to reduce the rate at which process IDs get reused, but they had to go back to small numbers, not for any technical reasons, but because people complained that the large process IDs looked ugly in Task Manager. (One customer even asked if something was wrong with his computer.)

    That's not saying that the kernel folks won't go back and try the experiment again someday. After all, they managed to get rid of the dispatcher lock. Who knows what other crazy things will change next? (And once they get process IDs to go above 65535—like they were in Windows 95, by the way—or if they decided to make process IDs no longer multiples of 4 in order to keep process IDs low, this guy's program will stop working, and it'll be Microsoft's fault.)

  • The Old New Thing

    My, what strange NOPs you have!


    While cleaning up my office, I ran across some old documents which reminded me that there are a lot of weird NOP instructions in Windows 95.

    Certain early versions of the 80386 processor (manufactured prior to 1987) are known as B1 stepping chips. These early versions of the 80386 had some obscure bugs that affected Windows. For example, if the instruction following a string operation (such as movs) uses opposite-sized addresses from that in the string instruction (for example, if you performed a movs es:[edi], ds:[esi] followed by a mov ax, [bx]) or if the following instruction accessed an opposite-sized stack (for example, if you performed a movs es:[edi], ds:[esi] on a 16-bit stack, and the next instruction was a push), then the movs instruction would not operate correctly. There were quite a few of these tiny little "if all the stars line up exactly right" chip bugs.

    Most of the chip bugs only affected mixed 32-bit and 16-bit operations, so if you were running pure 16-bit code or pure 32-bit code, you were unlikely to encounter any of them. And since Windows 3.1 did very little mixed-bitness programming (user-mode code was all-16-bit and kernel-mode code was all-32-bit), these defects didn't really affect Windows 3.1.

    Windows 95, on the other hand, contained a lot of mixed-bitness code since it was the transitional operating system that brought people using Windows out of the 16-bit world into the 32-bit world. As a result, code sequences that tripped over these little chip bugs turned up not infrequently.

    An executive decision had to be made whether to continue supporting these old chips or whether to abandon them. A preliminary market analysis of potential customers showed that there were enough computers running old 80386 chips to be worth making the extra effort to support them.

    Everybody who wrote assembly language code was alerted to the various code sequences that would cause problems on a B1 stepping, so that they wouldn't generate those code sequences themselves, and so they could be on the lookout for existing code that might have problems. To supplement the manual scan, I wrote a program that studied all the Windows 95 binaries trying to find these troublesome code sequences. When it brought one to my attention, I studied the offending code, and if I agreed with the program's assessment, I notified the developer who was responsible for the component in question.

    In nearly all cases, the troublesome code sequences could be fixed by judicious insertion of NOP instructions. If the problem was caused by "instruction of type X followed by instruction of type Y", then you could just insert a NOP between the two instructions to "break up the party" and sidestep the problem. Sometimes the standard NOP would itself be classified as an instruction of type Y, in which case you had to insert a special kind of NOP, one that was not of type Y.

    For example, here's one code sequence from a function which does color format conversion:

            push    si          ; borrow si temporarily
            ; build second 4 pixels
            movzx   si, bl
            mov     ax, redTable[si]
            movzx   si, cl
            or      ax, blueTable[si]
            movzx   si, dl
            or      ax, greenTable[si]
            shl     eax, 16     ; move pixels to high word
            ; build first 4 pixels
            movzx   si, bh
            mov     ax, redTable[si]
            movzx   si, ch
            or      ax, blueTable[si]
            movzx   si, dh
            or      ax, greenTable[si]
            pop     si
            stosd   es:[edi]    ; store 8 pixels
            db      67h, 90h    ; 32-bit NOP fixes stos (B1 stepping)
            dec     wXE

    Note that we couldn't use just any old NOP; we had to use a NOP with a 32-bit address override prefix. That's right, this isn't just a regular NOP; this is a 32-bit NOP.

    From a B1 stepping-readiness standpoint, the folks who wrote in C had a little of the good news/bad news thing going. The good news is that the compiler did the code generation and you didn't need to worry about it. The bad news is that you also were dependent on the compiler writers to have taught their code generator how to avoid these B1 stepping pitfalls, and some of them were quite subtle. (For example, there was one bug that manifested itself in incorrect instruction decoding if a conditional branch instruction had just the right sequence of taken/not-taken history, and the branch instruction was followed immediately by a selector load, and one of the first two instructions at the destination of the branch was itself a jump, call, or return. The easy workaround: Insert a NOP between the branch and the selector load.)

    On the other hand, some quirks of the B1 stepping were easy to sidestep. For example, the B1 stepping did not support virtual memory in the first 64KB of memory. Fine, don't use virtual memory there. Another quirk: if virtual memory was enabled, if a certain race condition was encountered inside the hardware prefetch, and if you executed a floating point coprocessor instruction that accessed memory at an address in the range 0x800000F8 through 0x800000FF, then the CPU would end up reading from addresses 0x000000F8 through 0x000000FF instead. This one was easy to work around: Never allocate valid memory at 0x80000xxx. This is another reason for the no man's land in the address space near the 2GB boundary.
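
    The range pairing in that last erratum (0x800000F8 reads as 0x000000F8, and so on) amounts to the CPU losing bit 31 of the address. A toy model of the failure, with b1_effective_address as a made-up name, makes the workaround obvious: as long as nothing valid is ever mapped at 0x80000xxx, the misdirected read can never land on real data.

```python
# Toy model of the B1 prefetch erratum described above; this is an
# illustration of the failure mode, not Windows code.  The range
# pairing in the text corresponds to bit 31 of the address being lost.

B1_BUGGY_RANGE = range(0x800000F8, 0x80000100)  # 0x800000F8..0x800000FF

def b1_effective_address(addr: int) -> int:
    """Address a B1-stepping coprocessor access would actually read
    (under the simplifying assumption that only this range misbehaves)."""
    if addr in B1_BUGGY_RANGE:
        return addr & 0x7FFFFFFF  # bit 31 dropped
    return addr

print(hex(b1_effective_address(0x800000F8)))  # → 0xf8
print(hex(b1_effective_address(0x80001000)))  # outside the range: unchanged
```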

    I happened to have an old computer with a B1 stepping in my office. It ran slowly, but it did run. I think the test team "re-appropriated" the computer for their labs so they could verify that Windows 95 still ran correctly on a computer with a B1 stepping CPU.

    Late in the product cycle (after Final Beta), upper management reversed their earlier decision and decided not to support the B1 chip after all. Maybe the testers were finding too many cases where other subtle B1 stepping bugs were being triggered. Maybe the cost of having to keep an eye on all the source code (and training and retraining all the developers to be aware of B1 issues) exceeded the benefit of supporting a shrinking customer base. For whatever reason, B1 stepping support was pulled, and customers with one of these older chips got an error message when they tried to install Windows 95. And just to make it easier for the product support people to recognize this failure, the error code for the error message was Error B1.
