• The Old New Thing

    I marked my parameter as [optional], so why do I get an RPC error when I pass NULL?

    • 7 Comments

    Consider the following interface declaration in an IDL file:

    // Code in italics is wrong
    
    interface IFoo : IUnknown
    {
        HRESULT Cancel([in, optional, string] LPCWSTR pszReason);
    };
    

    The idea here is that you want to be able to call the Cancel method as pFoo->Cancel(NULL) if you don't want to provide a reason.

    If you try this, you'll find that the call sometimes fails with error 0x800706F4, which decodes to HRESULT_FROM_WIN32(RPC_X_NULL_REF_POINTER). What's going on here?

    The optional attribute does not mean what you think it means. To a C or C++ programmer, an "optional" pointer parameter typically means that it is valid to pass NULL/nullptr as the parameter value. But that's not what it means to the IDL compiler.

    To the IDL compiler, optional parameters are hints to the scripting engine that the parameter should be passed as VT_ERROR/DISP_E_PARAM­NOT­FOUND. The attribute is meaningful only when applied to parameters of type VARIANT or VARIANT*.

    What you actually want is the unique attribute. This somewhat confusingly-named attribute means "The parameter is allowed to be a null pointer." Therefore, the interface should have been written as

    interface IFoo : IUnknown
    {
        HRESULT Cancel([in, unique, string] LPCWSTR pszReason);
    };
    

    At the lowest level in the marshaler, pointer parameters are marked as ref, unique, or ptr. ref parameters may not be null, whereas unique and ptr parameters are allowed to be null. Larry Osterman explained to me that the default for interface pointers (anything derived from IUnknown) is unique and the default for all other pointer types is ref. Therefore, if you want to say that NULL is a valid value for a non-interface pointer parameter, you must say so explicitly by annotating the parameter as [unique].

    It's probably too late to change the behavior of MIDL to reject the [optional] tag on non-VARIANT parameters because in the decades since the attribute was introduced, it's probably being used incorrectly approximately twenty-five bazillion times, and making it an error would break a lot of code. (Even if you just made it a warning, that wouldn't help because a lot of people treat warnings as errors.)

    Exercise: Why is the RPC_X_NULL_REF_POINTER error raised only sometimes?

  • The Old New Thing

    The stream pointer position in IDataObject::GetData and IDataObject::GetDataHere is significant

    • 5 Comments

    An oft-overlooked detail of the IData­Object::Get­Data and IData­Object::Get­Data­Here methods is the position of the stream pointer when the result is a stream. These rules are buried in the documentation, so I'm going to call them out louder.

    Let's look at IData­Object::Get­Data first.

    If IData­Object::Get­Data returns a stream, then the stream pointer must be positioned at the end of the stream before the stream is returned. In other words, the last thing you do before returning the stream is seek it to the end. The contents of the data object are assumed to extend from the start of the stream to the stream's position as returned by IData­Object::Get­Data. (And no, I don't know why this rule exists.)

    I messed this up in my demonstration of how to drag a stream. Let's fix it.

      case DATA_FILECONTENTS:
        pmed->tymed = TYMED_ISTREAM;
        pmed->pstm = SHOpenRegStream(HKEY_LOCAL_MACHINE,
           TEXT("Hardware\\Description\\System\\CentralProcessor\\0"),
           TEXT("~MHz"), STGM_READ);
        if (pmed->pstm) {
          LARGE_INTEGER liZero = { 0, 0 };
          pmed->pstm->Seek(liZero, STREAM_SEEK_END, NULL);
        }
        return pmed->pstm ? S_OK : E_FAIL;
      }
    

    But what if you don't know the stream size? For example, what if the stream is coming from a live download? What if the stream doesn't support seeking? What if the stream is infinite? In those cases, you don't really have a choice. You just leave the stream pointer at the beginning and hope for the best. (Fortunately, at least in the case of virtual file content, the shell is okay with people who leave the stream pointer at the start of the stream. Probably for reasons like this.)

    There is a similar detail with IData­Object::Get­Data­Here: If you are asked to produce the data into an existing stream, you should write the data starting at the stream's current position and leave the stream pointer at the end of the data you just wrote.

  • The Old New Thing

    The citizenship test is pass/fail; there's no blue ribbon for acing it

    • 31 Comments

    The civics portion of the United States citizenship test is an oral exam wherein you must correctly answer six out of ten questions. One of my friends studiously prepared for his examination, going so far as buying a CD with the questions and answers and listening to it every day during his commute to and from work.

    At last, the day arrived, and my friend went in to take his citizenship examination. The examiner led him to an office, and the two of them sat down for the test.

    "Who was President during World War II?"

    — Franklin D. Roosevelt.

    "Correct. How many justices are there on the Supreme Court?"

    — Nine.

    "Correct."

    And so on. Question 3, correct.

    Question 4, correct.

    Question 5, correct.

    Question 6, correct.

    And at that point, the examiner said, "Congratulations. You passed. There is a naturalization ceremony in two hours. Can you make it?"

    My friend was kind of surprised. Wasn't this a ten-question test? What about the other four questions?

    And then he realized: You only have to get six right. He got six right. How well he does on the remaining four questions is immaterial.

    My friend was hoping to get a perfect score of 10/10 on the test, or at least to find out whether he could get all ten right, just as a point of personal satisfaction, but of course the examiner doesn't care whether this guy can get all ten right. There's no blue ribbon for acing your citizenship test. It's pass/fail.

    Bonus chatter: My friend hung around for two hours and was naturalized that same day. He said that for something that could have been purely perfunctory (seeing as the people who work there have done this hundreds if not thousands of times), the ceremony was was quite well-done and was an emotionally touching experience.

    In case you hadn't noticed, today is Constitution Day, also known as Citizenship Day. One of the odd clauses in the legislation establishing the day of observance is that all schools which receive federal funding must "hold an educational program" on the United States Constitution on that day. This is why students at massage therapy schools and beauty schools have to watch a video of two Supreme Court justices.

  • The Old New Thing

    Poor man's comments: Inserting text that has no effect into a configuration file

    • 34 Comments

    Consider a program which has a configuration file, but the configuration file format does not have provisions for comments. Maybe the program has a "list of authorized users", where each line takes the form allow x or deny x, where x is a group or user. For example, suppose we have access_list that goes like this:

    allow payroll_department
    deny alice
    allow personnel_department
    allow bob
    

    This is the sort of file that can really use comments because people are going to want to know things like "Why does Bob have access?"

    One way of doing this is to embed the comments in the configuration file in a way that has no net effect. You can do this to add separator lines, too.

    deny !____________________________________________________________
    allow payroll_department
    deny !alice_is_an_intern_and_does_not_need_access_to_this_database
    deny alice
    deny !____________________________________________________________
    allow personnel_department
    deny !____________________________________________________________
    deny !temporary_access_for_auditor
    deny !see_service_request_31415
    deny !access_expires_on_2001_12_31
    allow bob
    

    Assuming that you don't have any users whose names begin with an exclamation point, the extra deny !... lines have no effect: They tell the system to deny access to a nonexistent user.

    Sometimes finding the format of a line that has no effect can take some creativity. For example, if you have a firewall configuration file, you might use URLs that correspond to no valid site.

    allow nobody http://example.com/PAYROLL_DEPARTMENT/--------------------
    allow alice http://contoso.com/payroll/
    allow nobody http://example.com/PURCHASING_DEPARTMENT/-----------------
    allow bob http://contoso.com/purchasing/
    allow nobody http://example.com/SPECIAL_REQUEST/-----------------------
    allow ceo http://www.youtube.com/
    

    Of course, these extra lines create work for the program, since it will sit there evaluating rules that will never apply. You may have to craft them in a way so that they have minimum cost. In the example above, we assigned the comments to a user called nobody which presumably will never try to access the Internet. We definitely didn't want to write the comment like

    allow * http://example.com/PAYROLL_DEPARTMENT/-------------------------
    

    because that would evaluate the dummy rule for every user.

    If you are willing to add a layer of process, you can tell everybody to stop editing the configuration files directly and instead edit an alternate file that gets preprocessed into a configuration file. For example, we might have access_list.commented that goes

    //////////////////////////////////////////////////////////////////
    allow payroll_deparment
    
    deny alice // payroll intern does not need access to this database.
    
    //////////////////////////////////////////////////////////////////
    allow personnel_department
    
    //////////////////////////////////////////////////////////////////
    allow bob // Temporary access for auditor, see SR 31415. Expires 2001/12/31.
    

    Everybody agrees to edit the access_list.commented file, and after each edit they run a script that sends the file through the C++ preprocessor and puts the result in the access_list file. By using the C++ preprocessor, you enable features like #include directives and #define macros.

  • The Old New Thing

    Microspeak: spend

    • 15 Comments

    Remember, Microspeak is not merely for jargon exclusive to Microsoft, but it's jargon you need to know.

    We don't encounter the term spend much in the engineering side of the company, but it's in common use among those who regularly deal with money and budgets.

    We are in line with company standards with regard to spend for this type of event.
    Q4 spend will be higher as a result of widget recolorization.
    Our corresponding spend will increase significantly if we adopt this proposal.

    From the above citations, it is apparent that the word spend is shorthand for expenditure.

    And then there's this citation:

    I'll let you know what we have available with respect to spend.

    So much for that theory. Here, spend means available budget.

    My new theory is that spend is shorthand for spending.

    This appears to be common use in business-speak:

    IT Spend Report Shows Tougher Times Ahead
    The spend is 15% more than the 100m T-Mobile allocated to marketing last year
  • The Old New Thing

    Enumerating all the ways of making change

    • 29 Comments

    Today's Little Program enumerates all the ways of making change for a particular amount given a set of available denominations.

    The algorithm is straightforward. To make change for a specific amount from a set of available denominations, you can take one denomination and decide how many of those you want to use. Then use the remaining denominations to make change for the remainder.

    For example, if the available coins have values [1, 5, 10, 25] and you want to make change for 60 cents, you first decide how many 25-cent pieces you want to use. If you use none, then you need to make change for 60 cents using just [1, 5, 10]. If you use one, then you need to make change for 35 cents using just [1, 5, 10]. And if you use two, then you need to make change for 10 cents using just [1, 5, 10].

    (We use the largest coin first to reduce the number of dead ends, like asking "Please make change for 83 cents using only 10-cent coins.")

    function MakeChange(coins, total, f) {
     if (total == 0) { f([]); return; }
     if (coins.length == 0) return;
    
     var coin = coins[coins.length - 1];
     var remaining = coins.slice(0, -1);
     var used = [];
     for (var target = total; target >= 0; target -= coin) {
      MakeChange(remaining, target, function(s) {
       f(used.concat(s));
      });
      used.push(coin);
     }
    }
    
    MakeChange([1, 5, 10, 25], 60, console.log.bind(console));
    

    First, we take care of some base cases. To make change for zero cents, we simply use zero coins. And it's not possible to make change for a nonzero amount with no coins.

    Otherwise, we take the highest denomination coin and try using zero of them, then one of them, and so on, until we exceed the total amount necessary.

    There are related problems, such as determining whether a particular amount of change can even be made, given a collection of denominations and calculating the number of ways change can be made rather than enumerating them. But I like enumeration problems.

  • The Old New Thing

    What did Windows 3.1 do when you hit Ctrl+Alt+Del?

    • 29 Comments

    This is the end of Ctrl+Alt+Del week, a week that sort of happened around me and I had to catch up with.

    The Windows 3.1 virtual machine manager had a clever solution for avoiding deadlocks: There was only one synchronization object in the entire kernel. It was called "the critical section", with the definite article because there was only one. The nice thing about a system where the only available synchronization object is a single critical section is that deadlocks are impossible: The thread with the critical section will always be able to make progress because the only thing that could cause it to stop would be blocking on a synchronization object. But there is only one synchronization object (the critical section), and it already owns that.

    When you hit Ctrl+Alt+Del in Windows 3.1, a bunch of crazy stuff happened. All this work was in a separate driver, known as the virtual reboot device. By convention, all drivers in Windows 3.1 were called the virtual something device because their main job was to virtualize some hardware or other functionality. That's where the funny name VxD came from. It was short for virtual x device.

    First, the virtual reboot device driver checked which virtual machine had focus. If you were using an MS-DOS program, then it told all the device drivers to clean up whatever they were doing for that virtual machine, and then it terminated the virtual machine. This was the easy case.

    Otherwise, the focus was on a Windows application. Now things got messy.

    When the 16-bit Windows kernel started up, it gave the virtual reboot device the addresses of a few magic things. One of those magic things was a special byte that was set to 1 every time the 16-bit Windows scheduler regained control. When you hit Ctrl+Alt+Del, the virtual reboot device set the byte to 0, and it also registered a callback with the virtual machine manager to say "Call me back once the critical section becomes available." The callback didn't do anything aside from remember the fact that it was called at all. And then the code waited for ¾ seconds. (Why ¾ seconds? I have no idea.)

    After ¾ seconds, the virtual reboot device looked to see what the state of the machine was.

    If the "call me back once the critical section becomes available" callback was never called, then the problem is that a device driver is stuck in the critical section. Maybe the device driver put an Abort, Retry, Ignore message on the screen that the user needs to respond to. The user saw this message:


     Procomm 


    This background non-Windows application is not responding.

    *  Press any key to activate the non-Windows application.
    *  Press CTRL+ALT+DEL again to restart your computer. You will
       lose any unsaved information.


      Press any key to continue _

    After the user presses a key, focus was placed on the virtual machine that holds the critical section so the user can address the problem. A user who is still stuck can hit Ctrl+Alt+Del again to restart the whole process, and this time, execution will go into the "If you were using an MS-DOS program" paragraph, and the code will shut down the stuck virtual machine.

    If the critical section was not the problem, then the virtual reboot device checked if the 16-bit kernel scheduler had set the byte to 1 in the meantime. If so, then it means that no applications were hung, and you got the message


     Windows 


    Although you can use CTRL+ALT+DEL to quit an application that has stopped responding to the system, there is no application in this state.

    To quit an application, use the application's quit or exit command, or choose the Close command from the Control menu.

    *  Press any key to return to Windows.
    *  Press CTRL+ALT+DEL again to restart your computer. You will
       lose any unsaved information in all applications.


      Press any key to continue _

    (Anachronism alert: The System menu was called the Control menu back then.)

    Otherwise, the special byte was still 0, which means that the 16-bit scheduler never got control, which meant that a 16-bit Windows application was not releasing control back to the kernel. The virtual reboot device then waited for the virtual machine to finish processing any pending virtual interrupts. (This allowed any pending MS-DOS emulation or 16-bit MS-DOS device drivers to finish up their work.) If things did not return to this sane state within 3¼ seconds, then you got this screen:


     Windows 


    The system is either busy or has become unstable. You can wait and see if the system becomes available again and continue working or you can restart your computer.

    *  Press any key to return to Windows and wait.
    *  Press CTRL+ALT+DEL again to restart your computer. You will
       lose any unsaved information in all applications.


      Press any key to continue _

    Otherwise, we are in the case where the system returned to a state where there are no active virtual interrupts. The kernel single-stepped the processor if necessary until the instruction pointer was no longer in the kernel, or until it had single-stepped for 5000 instructions and the instruction pointer was not in the heap manager. (The heap manager was allowed to run for more than 5000 instructions.)

    At this point, you got the screen that Steve Ballmer wrote.



    Contoso Deluxe Music Composer


      This Windows application has stopped responding to the system.

      *  Press ESC to cancel and return to Windows.
      *  Press ENTER to close this application that is not responding.
         You will lose any unsaved information in this application.
      *  Press CTRL+ALT+DEL again to restart your computer. You will
         lose any unsaved information in all applications.


    If you hit Enter, then the 16-bit kernel terminated the application by doing mov ax, 4c00h followed by int 21h, which was the system call that applications used to exit normally. This time, the kernel is making the exit call on behalf of the stuck application. Everything looks like the application simply decided to exit normally.

    The stuck application exits, the kernel regains control, and hopefully, things return to normal.

    I should point out that I didn't write any of this code. "It was like that when I got here."

    Bonus chatter: There were various configuration settings to tweak all of the above behavior. For example, you could say that Ctrl+Alt+Del always restarted the computer rather than terminating the current application. Or you could skip the check whether the 16-bit kernel scheduler had set the byte to 1 so that you could use Ctrl+Alt+Del to terminate an application even if it wasn't hung.¹ There was also a setting to restart the computer upon receipt of an NMI, the intention being that the signal would be triggered either by a dedicated add-on switch or by poking a ball-point pen in just the right spot. (This is safer than just pushing the reset button because the restart would flush disk caches and shut down devices in an orderly manner.)

    ¹ This setting was intended for developers to assist in debugging their programs because if you went for this option, the program that got terminated is whichever one happened to have control of the CPU at the time you hit Ctrl+Alt+Del. This was, in theory, random, but in practice it often guessed right. That's because the problem was usually that a program got wedged into an infinite message loop, so most of the CPU was being run in the stuck application anyway.

  • The Old New Thing

    A lie repeated often enough becomes the truth: The continuing saga of the Windows 3.1 blue screen of death (which, by the way, was never called that)

    • 36 Comments

    HN has been the only major site to report the history of the Windows 3.1 Ctrl+Alt+Del dialog correctly. But it may have lost that title due to this comment thread.

    I read here that Steve Ballmer wrote part of [the blue screen of death] too.

    The comment linked to one of may articles that erroneously reported that Steve wrote the blue screen of death.

    Somebody replied,

    Seriously?

    linking back to my article where I set the record straight.

    Undeterred, the original commenter wrote,

    LOL! so far only MSDN has been refuting the claim.

    and linked to two technology sites which reported the story incorrectly.

    Just goes to show that a lie repeated often enough becomes the truth.

    Oh, and by the way, the phrase "blue screen of death" did not really apply to the blue screen messages in Windows 3.1 or Windows 95. As we saw earlier, the Windows 3.1 fatal error message was a black screen of death, and in Windows 95, the blue screen message was more a screen of profound injury rather than death. (Windows 95 was sort of like the Black Knight, trying very hard to continue the fight despite having lost all of its limbs.)

    The phrase "blue screen of death" was generally attributed to the blue screen fatal error message of Windows NT. In Windows 95, we just called them "blue screen messages", without the "of death".

    I didn't expect this to become "blue screen week", though that's sort of what it turned into.

  • The Old New Thing

    The history of Win32 critical sections so far

    • 19 Comments

    The CRITICAL_SECTION structure has gone through a lot of changes since its introduction back oh so many decades ago. The amazing thing is that as long as you stick to the documented API, your code is completely unaffected.

    Initially, the critical section object had an owner field to keep track of which thread entered the critical section, if any. It also had a lock count to keep track of how many times the owner thread entered the critical section, so that the critical section would be released when the matching number of Leave­Critical­Section calls was made. And there was an auto-reset event used to manage contention. We'll look more at that event later. (It's actually more complicated than this, but the details aren't important.)

    If you've ever looked at the innards of a critical section (for entertainment purposes only), you may have noticed that the lock count was off by one: The lock count was the number of active calls to Enter­Critical­Section minus one. The bias was needed because the original version of the interlocked increment and decrement operations returned only the sign of the result, not the revised value. Biasing the result by 1 means that all three states could be detected: Unlocked (negative), locked exactly once (zero), reentrant lock (positive). (It's actually more complicated than this, but the details aren't important.)

    If a thread tries to enter a critical section but can't because the critical section is owned by another thread, then it sits and waits on the contention event. When the owning thread releases all its claims on the critical section, it signals the event to say, "Okay, the door is unlocked. The next guy can come in."

    The contention event is allocated only when contention occurs. (This is what older versions of MSDN meant when they said that the event is "allocated on demand.") Which leads to a nasty problem: What if contention occurs, but the attempt to create the contention event fails? Originally, the answer was "The kernel raises an out-of-memory exception."

    Now you'd think that a clever program could catch this exception and try to recover from it, say, by unwinding everything that led up to the exception. Unfortunately, the weakest link in the chain is the critical section object itself: In the original version of the code, the out-of-memory exception was raised while the critical section was in an unstable state. Even if you managed to catch the exception and unwind everything you could, the critical section was itself irretrievably corrupted.

    Another problem with the original design became apparent on multiprocessor systems: If a critical section was typically held for a very brief time, then by the time you called into kernel to wait on the contention event, the critical section was already freed!

    void SetGuid(REFGUID guid)
    {
     EnterCriticalSection(&g_csGuidUpdate);
     g_theGuid = guid;
     LeaveCriticalSection(&g_csGuidUpdate);
    }
    
    void GetGuid(GUID *pguid)
    {
     EnterCriticalSection(&g_csGuidUpdate);
     *pguid = g_theGuid;
     LeaveCriticalSection(&g_csGuidUpdate);
    }
    

    This imaginary code uses a critical section to protect accesses to a GUID. The actual protected region is just nine instructions long. Setting up to wait on a kernel object is way, way more than nine instructions. If the second thread immediately waited on the critical section contention event, it would find that by the time the kernel got around to entering the wait state, the event would say, "Dude, what took you so long? I was signaleded, like, a bazillion cycles ago!"

    Windows 2000 added the Initialize­Critical­Section­And­Spin­Count function, which lets you avoid the problem where waiting for a critical section costs more than the code the critical section was protecting. If you initialize with a spin count, then when a thread tries to enter the critical section and can't, it goes into a loop trying to enter it over and over again, in the hopes that it will be released.

    "Are we there yet? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now? How about now?"

    If the critical section is not released after the requested number of iterations, then the old slow wait code is executed.

    Note that spinning on a critical section is helpful only on multiprocessor systems, and only in the case where you know that all the protected code segments are very short in duration. If the critical section is held for a long time, then spinning is wasteful since the odds that the critical section will become free during the spin cycle are very low, and you wasted a bunch of CPU.

    Another feature added in Windows 2000 is the ability to preallocate the contention event. This avoids the dreaded "out of memory" exception in Enter­Critical­Section, but at a cost of a kernel event for every critical section, whether actual contention occurs or not.

    Windows XP solved the problem of the dreaded "out of memory" exception by using a fallback algorithm that could be used if the contention event could not be allocated. The fallback algorithm is not as efficient, but at least it avoids the "out of memory" situation. (Which is a good thing, because nobody really expects Enter­Critical­Section to fail.) This also means that requests for the contention event to be preallocated are now ignored, since the reason for preallocating (avoiding the "out of memory" exception) no longer exists.

    (And while they were there, the kernel folks also fixed Initialize­Critical­Section so that a failed initialization left the critical section object in a stable state so you could safely try again later.)

    In Windows Vista, the internals of the critical section object were rejiggered once again, this time to add convoy resistance. The internal bookkeeping completely changed; the lock count is no longer a 1-biased count of the number of Enter­Critical­Section calls which are pending. As a special concession to backward compatibility with people who violated the API contract and looked directly at the internal data structures, the new algorithm goes to some extra effort to ensure that if a program breaks the rules and looks at a specific offset inside the critical section object, the value stored there is −1 if and only if the critical section is unlocked.

    Often, people will remark that "your compatibility problems would go away if you just open-sourced the operating system." I think there is some confusion over what "go away" means. If you release the source code to the operating system, it makes it even easier for people to take undocumented dependencies on it, because they no longer have the barrier of "Well, I can't find any documentation, so maybe it's not documented." They can just read the source code and say, "Oh, I see that if the critical section is unlocked, the Lock­Count variable has the value −1." Boom, instant undocumented dependency. Compatibility is screwed. (Unless what people are saying "your compatibility problems would go away if you just open-sourced all applications, so that these problems can be identified and fixed as soon as they are discovered.")

    Exercise: Why isn't it important that the fallback algorithm be highly efficient?

  • The Old New Thing

    I wrote the original blue screen of death, sort of

    • 45 Comments

    We pick up the story with Windows 95. As I noted, the blue Ctrl+Alt+Del dialog was introduced in Windows 3.1, and in Windows 95; it was already gone. In Windows 95, hitting Ctrl+Alt+Del called up a dialog box that looked something like this:

    Close Program × 
    Explorer
    Contoso Deluxe Composer [not responding]
    Fabrikam Chart 2.0
    LitWare Chess Challenger
    Systray
    WARNING: Pressing CTRL+ALT+DEL again will restart your computer. You will lose unsaved information in all programs that are running.
    End Task
    Shut Down
    Cancel

    (We learned about Systray some time ago.)

    Whereas Windows 3.1 responded to fatal errors by crashing out to a black screen, Windows 95 switched to showing severe errors in blue. And I'm the one who wrote it. Or at least modified it last.

    I was responsible for the code that displayed blue screen messages: Asking the kernel-mode video driver to switch into text mode, filling the screen with a blue background, drawing white text, waiting for the user to press a key, restoring the screen to its original contents, and reporting the user's response back to the component that asked to display the message.¹

    When a device driver crashed, Windows 95 tried its best to limp along despite a catastrophic failure in a kernel-mode component. It wasn't a blue screen of death so much as a blue screen of lameness. For those fortunate never to have seen one, it looked like this:



     Windows 


    An exception 0D has occurred at 0028:80014812. This was called from 0028:80014C34. It may be possible to continue normally.

    * Press any key to attempt to continue.
    * Press CTRL+ALT+DEL to restart your computer. You will
      lose any unsaved information in all applications.

    Note the optimistic message "It may be possible to continue normally." Everybody forgets that after Windows 95 showed you a blue screen error, it tried its best to ignore the error and keep running anyway. I mean, sure your scanner driver crashed, so scanning doesn't work any more, but the rest of the system seems to be okay.

    (Imagine if you did that today. "Press any key to ignore this kernel panic.")

    Technically, what happened was that the virtual machine manager abandoned the event currently in progress and returned to the event dispatcher. It's the kernel-mode equivalent to swallowing exceptions in window procedures and returning to the message loop. If there was no event in progress, then the current application was terminated.

    Sometimes the problem was global, and abandoning the current event or terminating the application did nothing to solve the problem; all that happened was that the next event or application to run encountered the same problem, and you got another blue screen message a few milliseconds later. After about a half dozen of these messages, you most likely gave up hope and hit Ctrl+Alt+Del.

    Now, that's what the message looked like originally, but that message had a problem: Since the addresses at which device drivers were loaded into the kernel were not predictable, having the raw address didn't really tell you much. If you were someone who was told, "This senior executive got this crash message, can you figure out what happened?", all you had to work with was a bunch of mostly useless numbers.

    That someone might have been me.

    To help with this problem, I tweaked the message to include the name of the driver, the section number, and the offset within the section.



     Windows 


    An exception 0D has occurred at 0028:80014812 in VxD CONTOSO(03) + 00000152. This was called from 0028:80014C34 in VxD CONTOSO(03) + 00000574. It may be possible to continue normally.

    * Press any key to attempt to continue.
    * Press CTRL+ALT+DEL to restart your computer. You will
      lose any unsaved information in all applications.

    Now you had the name of the driver that crashed, which might give you a clue of where the problem is, even if you knew nothing else. And somebody with access to a MAP file for the driver could now look up the address and identify which line crashed. Not great, but better than nothing, and before I made this change, nothing is what you had.

    So you could say that I wrote the Windows 95 blue screen of death lameness to make my own life easier.

    Bonus chatter: Later, someone (I forget whether it was me, so let's say it was one of my colleagues) added some more code to inspect the crashing address, and if it was inside the kernel heap manager, the message changed slightly:



     Windows 


    A 32-bit device driver has corrupted critical system memory, resulting in an exception 0D at 0028:80001812 in VxD VMM(01) + 00001812. This was called from 0028:80014C34 in VxD CONTOSO(03) + 00000575.

    * Press any key to attempt to continue.
    * Press CTRL+ALT+DEL to restart your computer. You will
      lose any unsaved information in all applications.

    In this case, the sentence "It may be possible to continue normally" disappeared. Because we knew that, odds are, it won't be.

    Bonus chatter: Nice job, Slashdot. You considered posting a correction, but your correction was also wrong. At least you realized your mistake.

    ¹ Since this code ran in the kernel, it didn't have access to keyboard layout information. It doesn't know that if you are using the Chinese-Bopomofo keyboard layout, then the way to type "OK" is to press C, followed by L, followed by 3. Not that it would help, because there is no IME in the kernel anyway. As much as possible, the responses were mapped to language-independent keys like Enter and ESC.

Page 1 of 431 (4,310 items) 12345»