February, 2012

  • The Old New Thing

    When does an icon handler shell extension get unloaded?

    A customer had a question about the SHGetFileInfo function. They used the function to obtain the icon for a file, and they discovered that when they asked for the icon of a particular type of file, the shell extension for the associated application was loaded.

    But unfortunately the third-party shell extension is not getting unloaded, maybe because of a bug. Can we do anything in code to get this shell extension to unload?

    You already know everything you need to answer this.

    Shell extensions are COM objects, and icon handlers are in-process servers, and in-process servers remain loaded until the apartment is torn down or the application calls CoFreeUnusedLibraries (or a moral equivalent).

    Therefore, their application can follow the standard COM pattern of calling CoFreeUnusedLibraries every so often (say, after being idle for five minutes or when there is some indication that the user has stopped one task and started another). COM will then ask the shell extension, through its DllCanUnloadNow export, whether it is safe to unload, and if it responds in the affirmative, COM will unload it.
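    The real machinery lives in COM and in each DLL's DllCanUnloadNow export, and it is Windows-only. As a rough model of the protocol, the C++ sketch below uses hypothetical names: InprocServer stands in for a loaded DLL, canUnloadNow for its DllCanUnloadNow answer (assumed here to be "yes" only when it has no live objects and no server locks, which is the conventional implementation), and freeUnusedLibraries for CoFreeUnusedLibraries.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an in-process server (a DLL) loaded in the apartment.
struct InprocServer {
    std::string name;
    long liveObjects = 0;  // objects handed out and not yet released
    long serverLocks = 0;  // outstanding IClassFactory::LockServer(TRUE) calls
    bool loaded = true;

    // Stand-in for DllCanUnloadNow: "yes" only when nothing is still in use.
    bool canUnloadNow() const { return liveObjects == 0 && serverLocks == 0; }
};

// Stand-in for CoFreeUnusedLibraries: ask each loaded server whether it can
// be unloaded and unload the ones that agree. Returns how many were unloaded.
int freeUnusedLibraries(std::vector<InprocServer>& servers) {
    int unloaded = 0;
    for (InprocServer& server : servers) {
        if (server.loaded && server.canUnloadNow()) {
            server.loaded = false;  // the real thing would free the DLL here
            ++unloaded;
        }
    }
    return unloaded;
}
```

    A server that still has an outstanding object, for example because of a leaked reference, keeps answering "no" and stays loaded no matter how often the application asks.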

    The awesome Valentine's Day gift disguised as an uncreative one

    A few years ago, one of my colleagues wanted to surprise his wife with a new laptop for Valentine's Day. (As a bonus, he set the wallpaper to one of their wedding pictures.) Now, he could just give her a neatly wrapped laptop, but he wanted this one to be a super-surprise.

    First, he bought a large box of chocolates. He then carefully opened the box (preserving the bow and other wrapping), removed the chocolates and put the laptop inside, using a smaller box of chocolates to act as packing material. He then put the cover back on the box of chocolates and restored the box to its original unopened appearance.

    As a final step, he took the completed package to a local grocery store, explained what he was doing to the deli manager, and asked if they would be so kind as to re-wrap the box in shrink wrap to complete the deception. The manager was suitably touched by his story and was happy to help.

    On Valentine's Day morning, he put the large box of chocolates on his wife's chair. She woke up, wandered groggily into the room, saw the box, and said, "Whoa, that's a lot of chocolate." It took some encouragement to get her to open the box (seeing as she hadn't had her morning cup of coffee yet), but when she did and saw the laptop, she just stared at it in shock, saying, "What? ... No, what?"

    In case you couldn't figure it out, his wife was taken totally by surprise and was completely thrilled.

    And that's how my colleague surprised his wife with a new laptop for Valentine's Day. He makes the rest of us look bad.

    Related: iPad frozen into slab of chocolate, delivered to unsuspecting wife.

    Bonus: The story from his wife's point of view.

    Why did the Windows 95 Start button have a secret shortcut for closing it?

    Windows 95 had a strange feature where, if you put focus on the Start button and then hit Alt+- (That's Alt and the hyphen key), you got a system menu for the Start button which let you close it, and then the Start button vanished. Programmerman wondered why this existed.

    This was not a feature; it was just a bug. The person who first wrote up the code for the Start button accidentally turned on the WS_SYSMENU style. If you turn this style on for a child window, Windows assigns your child window a system menu. System menus for child windows may sound strange, but they are actually quite normal if you are an MDI application. And the standard hotkey for calling up the system menu of a child window is Alt+-.

    The Start button was not part of an MDI application, but since the WS_SYSMENU style was set, Windows treated it like an MDI child window, and when you pressed the hotkey, you got the system menu which let you close the window. (You could also move it, which was also kind of weird.)

    Let's add a button with an accidental system menu to our scratch program:

    BOOL
    OnCreate(HWND hwnd, LPCREATESTRUCT lpcs)
    {
        g_hwndChild = CreateWindow(
            TEXT("Button"),
            TEXT("Start"),
            // WS_SYSMENU on a child window is the accidental style being demonstrated
            WS_CHILD | WS_VISIBLE | WS_CLIPSIBLINGS | WS_SYSMENU |
            BS_PUSHBUTTON | BS_CENTER | BS_VCENTER,
            0, 0, 0, 0, hwnd, (HMENU)1, g_hinst, 0);
        return TRUE;
    }
    

    Run this program, put focus on the button, and hit Alt+-. Hey look, a child window system menu.

    To fix this bug, remove the WS_SYSMENU style. That's how the Explorer folks fixed it.
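    In terms of style bits, the fix is a single cleared bit. The C++ sketch below copies the numeric values of the relevant constants from winuser.h so that it builds without Windows headers; hasChildSystemMenu and withoutSystemMenu are hypothetical helpers that encode the child-window-plus-WS_SYSMENU combination described above, not Windows APIs.

```cpp
#include <cassert>
#include <cstdint>

// Numeric values of the styles used above, as defined in winuser.h.
constexpr std::uint32_t kWsChild        = 0x40000000;  // WS_CHILD
constexpr std::uint32_t kWsVisible      = 0x10000000;  // WS_VISIBLE
constexpr std::uint32_t kWsClipSiblings = 0x04000000;  // WS_CLIPSIBLINGS
constexpr std::uint32_t kWsSysMenu      = 0x00080000;  // WS_SYSMENU
constexpr std::uint32_t kBsCenter       = 0x00000300;  // BS_CENTER
constexpr std::uint32_t kBsVCenter      = 0x00000C00;  // BS_VCENTER
// BS_PUSHBUTTON is 0, so it contributes no bits.

// Hypothetical check: a child window that also has WS_SYSMENU gets an
// MDI-style system menu reachable with Alt+-.
constexpr bool hasChildSystemMenu(std::uint32_t style) {
    return (style & kWsChild) != 0 && (style & kWsSysMenu) != 0;
}

// The fix: the same style with the accidental WS_SYSMENU bit cleared.
constexpr std::uint32_t withoutSystemMenu(std::uint32_t style) {
    return style & ~kWsSysMenu;
}

constexpr std::uint32_t kBuggyStartButtonStyle =
    kWsChild | kWsVisible | kWsClipSiblings | kWsSysMenu | kBsCenter | kBsVCenter;
```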

    Fancy use of exception handling in FormatMessage leads to repeated "discovery" of security flaw

    Every so often, somebody "discovers" an alleged security vulnerability in the FormatMessage function. You can try it yourself:

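    #include <windows.h>
    #include <stdio.h>
    #include <string.h> // memset
    
    char buf[2048];
    char extralong[128*1024]; // zero-initialized, so the string stays null-terminated
    
    int __cdecl main(int argc, char **argv)
    {
     // Fill all but the last byte with 'x' to make a very long insertion string.
     memset(extralong, 'x', 128 * 1024 - 1);
     DWORD_PTR args[] = { (DWORD_PTR)extralong };
     // Call the ANSI version explicitly, since the strings here are char.
     FormatMessageA(FORMAT_MESSAGE_FROM_STRING |
                    FORMAT_MESSAGE_ARGUMENT_ARRAY, "%1", 0, 0,
                    buf, 2048, (va_list*)args);
     return 0;
    }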
    

    If you run this program under the debugger and you tell it to break on all exceptions, then you will find that it breaks on an access violation trying to write to an invalid address.

    eax=00060078 ebx=fffe0001 ecx=0006fa34 edx=00781000 esi=0006fa08 edi=01004330
    eip=77f5b279 esp=0006f5ac ebp=0006fa1c iopl=0         nv up ei pl nz na pe cy
    cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00010203
    ntdll!fputwc+0x14:
    77f5b279 668902           mov     [edx],ax              ds:0023:00781000=????
    

    Did you just find a buffer overflow security vulnerability?

    The FormatMessage function was part of the original Win32 interface, back in the days when you had lots of address space (two whole gigabytes) but not a lot of RAM (12 megabytes, or 16 if you were running Server). The implementation of FormatMessage reflects this historical reality by working hard to conserve RAM but not worrying too much about conserving address space. And it takes advantage of this fancy new structured exception handling feature.

    The FormatMessage function uses the "reserve a bunch of address space, but commit pages only as they are needed" pattern, illustrated in MSDN under the topic Reserving and Committing Memory. Except that the sample code on that page contains serious errors. For example, if the sample code encounters an exception other than STATUS_ACCESS_VIOLATION, it still "handles" it by doing nothing and returning EXCEPTION_EXECUTE_HANDLER. It also fails to handle random access to the buffer or access violations caused by DEP. That said, in that very specific sample it mostly works, since the protected region does only one thing, so there aren't many opportunities for other types of exceptions to occur. (Though if you're really unlucky, you might get a STATUS_IN_PAGE_ERROR.) But enough complaining about that sample.

    The FormatMessage function reserves 64KB of address space, commits the first page, and then calls an internal helper function whose job it is to generate the output, passing the start of the 64KB block of address space as the starting address and telling it to give up when it reaches 64KB. Something like this:

    struct DEMANDBUFFER
    {
      void *Base;
      SIZE_T Length;
    };
    
    int
    PageFaultExceptionFilter(DEMANDBUFFER *Buffer,
                             PEXCEPTION_RECORD ExceptionRecord)
    {
      int Result;
    
      DWORD dwLastError = GetLastError();
    
      // The only exception we handle is a continuable read/write
      // access violation inside our demand-commit buffer.
      if (ExceptionRecord->ExceptionFlags & EXCEPTION_NONCONTINUABLE)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->NumberParameters < 2)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->ExceptionInformation[0] &
          ~(EXCEPTION_READ_FAULT | EXCEPTION_WRITE_FAULT))
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->ExceptionInformation[1] -
          (ULONG_PTR)Buffer->Base >= Buffer->Length)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else {
        // If the memory is already committed, then committing memory won't help!
        // (The problem is something like writing to a read-only page.)
        void *ExceptionAddress = (void*)ExceptionRecord->ExceptionInformation[1];
        MEMORY_BASIC_INFORMATION Information;
        if (VirtualQuery(ExceptionAddress, &Information,
                         sizeof(Information)) != sizeof(Information))
          Result = EXCEPTION_CONTINUE_SEARCH;
    
        else if (Information.State != MEM_RESERVE)
          Result = EXCEPTION_CONTINUE_SEARCH;
    
        // Okay, handle the exception by committing the page.
        // Exercise: What happens if the faulting memory access
        // spans two pages?
        else if (!VirtualAlloc(ExceptionAddress, 1, MEM_COMMIT, PAGE_READWRITE))
          Result = EXCEPTION_CONTINUE_SEARCH;
    
        // We successfully committed the memory - retry the operation
        else Result = EXCEPTION_CONTINUE_EXECUTION;
      }
    
      RestoreLastError(dwLastError);
      return Result;
    }
    
    DWORD FormatMessage(...)
    {
      DWORD Result = 0;
      DWORD Error;
      DEMANDBUFFER Buffer;
      Error = InitializeDemandBuffer(&Buffer, FORMATMESSAGE_MAXIMUM_OUTPUT);
      if (Error == ERROR_SUCCESS) {
        __try {
         Error = FormatMessageIntoBuffer(&Result,
                                         Buffer.Base, Buffer.Length, ...);
        } __except (PageFaultExceptionFilter(&Buffer,
                       GetExceptionInformation()->ExceptionRecord)) {
         // never reached - we never handle the exception
        }
      }
      if (Error == ERROR_SUCCESS) {
       Error = CopyResultsOutOfBuffer(...);
      }
      DeleteDemandBuffer(&Buffer);
      if (Result == 0) {
        SetLastError(Error);
      }
      return Result;
    }
    

    The FormatMessageIntoBuffer function takes an output buffer and a buffer size, and it writes the result to the output buffer, stopping when the buffer is full. The DEMANDBUFFER structure and the PageFaultExceptionFilter function work together to create the output buffer on demand as the FormatMessageIntoBuffer function does its work.
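    The on-demand commit can be modeled without any Windows APIs. The C++ sketch below is a simulation, not the real FormatMessage code: DemandBufferModel and copyWideChars are hypothetical, nothing is actually reserved or committed, and a "fault" is simply the first write to a page not yet marked committed. It assumes 4KB pages and 2-byte characters, commits the first page up front, and bounds the copy loop by the buffer length, so it never produces an offset past the end (the kind of access PageFaultExceptionFilter would pass along with EXCEPTION_CONTINUE_SEARCH).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulation of a demand-commit buffer: page states are tracked in a vector,
// and no real memory is reserved or committed.
class DemandBufferModel {
public:
    static constexpr std::size_t kPageSize = 4096;  // assumed 4KB pages

    explicit DemandBufferModel(std::size_t length)
        : length_(length),
          committed_((length + kPageSize - 1) / kPageSize, false) {
        assert(length > 0);
        committed_[0] = true;  // the first page is committed up front
    }

    // Touch one byte. Offsets outside the buffer are refused, like accesses
    // the filter does not own. The first touch of an uncommitted page commits
    // it and counts as one fault (one first-chance exception in the real code).
    bool writeAt(std::size_t offset) {
        if (offset >= length_) return false;
        std::size_t page = offset / kPageSize;
        if (!committed_[page]) {
            committed_[page] = true;
            ++faults_;
        }
        return true;
    }

    std::size_t length() const { return length_; }
    std::size_t faults() const { return faults_; }
    std::size_t committedPages() const {
        std::size_t count = 0;
        for (bool isCommitted : committed_) {
            if (isCommitted) ++count;
        }
        return count;
    }

private:
    std::size_t length_;
    std::vector<bool> committed_;
    std::size_t faults_ = 0;
};

// Copy up to `chars` two-byte characters, bounded by the buffer's capacity the
// way _vsnwprintf is bounded by MaxChars. Returns how many were written.
std::size_t copyWideChars(DemandBufferModel& buffer, std::size_t chars) {
    const std::size_t maxChars = buffer.length() / 2;
    std::size_t written = 0;
    while (written < chars && written < maxChars) {
        buffer.writeAt(written * 2);      // first byte of the character
        buffer.writeAt(written * 2 + 1);  // second byte (same page)
        ++written;
    }
    return written;
}
```

    With a 64KB buffer, copying 2049 characters costs one fault, and an oversized copy stops at 32768 characters after 15 faults with all 16 pages committed, which matches the walkthrough that follows.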

    To make discussion easier, let's say that the FormatMessage function merely took printf-style arguments and supported only FORMAT_MESSAGE_FROM_STRING | FORMAT_MESSAGE_ALLOCATE_BUFFER.

    DWORD FormatMessageFromStringPrintfAllocateBuffer(
        PWSTR *ResultBuffer,
        PCWSTR FormatString,
        ...)
    {
      DWORD Result = 0;
      PWSTR ResultString = NULL;
      DWORD Error;
      DEMANDBUFFER Buffer;
      va_list ap;
      va_start(ap, FormatString);
      Error = InitializeDemandBuffer(&Buffer, FORMATMESSAGE_MAXIMUM_OUTPUT);
      if (Error == ERROR_SUCCESS) {
        __try {
         SIZE_T MaxChars = Buffer.Length / sizeof(WCHAR);
         int i = _vsnwprintf((WCHAR*)Buffer.Base, MaxChars,
                             FormatString, ap);
         if (i < 0 || i >= MaxChars) Error = ERROR_MORE_DATA;
         else Result = i;
        } __except (PageFaultExceptionFilter(&Buffer,
                       GetExceptionInformation()->ExceptionRecord)) {
         // never reached - we never handle the exception
        }
      }
      if (Error == ERROR_SUCCESS) {
       // Exercise: Why don't we need to worry about integer overflow?
       DWORD BytesNeeded = sizeof(WCHAR) * (Result + 1);
       ResultString = (PWSTR)LocalAlloc(LMEM_FIXED, BytesNeeded);
       if (ResultString) {
        // Exercise: Why CopyMemory and not StringCchCopy?
        CopyMemory(ResultString, Buffer.Base, BytesNeeded);
       } else {
        Error = ERROR_NOT_ENOUGH_MEMORY;
        Result = 0; // report failure so the error code is set below
       }
      }
      DeleteDemandBuffer(&Buffer);
      if (Result == 0) {
        SetLastError(Error);
      }
      *ResultBuffer = ResultString;
      va_end(ap);
      return Result;
    }
    

    Let's run this function in our head to see what happens if somebody triggers the alleged buffer overflow by calling

    PWSTR ResultString;
    DWORD Result = FormatMessageFromStringPrintfAllocateBuffer(
                       &ResultString, L"%s", VeryLongString);
    

    After setting up the demand buffer, we call _vsnwprintf to format the output into the demand buffer, but telling it not to go past the buffer's total length. The _vsnwprintf function parses the format string and sees that it needs to copy VeryLongString to the output buffer. Let's say that the DEMANDBUFFER was allocated at address 0x00780000 on a system with 4KB pages. At the start of the copy, the address space looks like this:

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | R | R | R | R | R | R | R | R | R | R | R | R | R | R | R | X |
    ^ output pointer (0x00780000)

    "C" stands for a committed page, "R" stands for a reserved page, and "X" stands for a page that, if accessed, would be a buffer overflow. We start copying VeryLongString into the output buffer. After copying 2048 characters, we fill the first committed page; copying character 2049 raises a page fault exception.

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | R | R | R | R | R | R | R | R | R | R | R | R | R | R | R | X |
        ^ output pointer (0x00781000)

    This is the point at which over-eager people observe the first-chance exception, capture the register dump above, and begin writing up their security vulnerability report, cackling with glee. (Observe that in the register dump, the address we are writing to is of the form 0x####1000.)

    As with all first-chance exceptions, it goes down the exception chain. Our custom PageFaultExceptionFilter recognizes this as an access violation in a page that it is responsible for, and the page hasn't yet been committed, so it commits the page as read/write and resumes execution.

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | C | R | R | R | R | R | R | R | R | R | R | R | R | R | R | X |
        ^ output pointer (0x00781000)

    Copying character 2049 now succeeds, as does the copying of characters 2050 through 4096. When we hit character 4097, the cycle repeats:

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | C | R | R | R | R | R | R | R | R | R | R | R | R | R | R | X |
            ^ output pointer (0x00782000)

    Again, the first-chance exception is sent down the chain, our PageFaultExceptionFilter recognizes this as a page it is responsible for, and it commits the page and resumes execution.

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | C | C | R | R | R | R | R | R | R | R | R | R | R | R | R | X |
            ^ output pointer (0x00782000)

    If you think about it, this is exactly what the memory manager does with memory that has been allocated but not yet accessed: The memory is not present, and the moment an application tries to access it, the not-present page fault is raised, the memory manager commits the page, and then execution resumes normally. It's memory-on-demand, which is one of the essential elements of virtual memory. What's going on with the DEMANDBUFFER is that we are simulating in user mode what the memory manager does in kernel mode. (The difference is that while the memory manager takes committed memory and makes it present on demand, the DEMANDBUFFER takes reserved address space and commits it on demand.)

    The cycle repeats 13 more times, and then we reach another interesting part of the scenario:

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | X |
                                      output pointer (0x0078FFFE) ^

    We are about to write the 32768th character into the DEMANDBUFFER. Once that's done, the buffer will be completely full. One more byte and we will overflow the buffer. (Not even a wafer-thin byte will fit.)

    Let's write that last character and cover our ears in anticipation.

    00780000 <----------------------- 64KB -----------------------> 00790000
    | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | C | X |
                                        output pointer (0x00790000) ^

    Oh noes! Completely full! Run for cover!

    But wait. We passed a buffer size to the _vsnwprintf function, remember? We already told it never to write more than 32768 characters. As it's about to write character 32769, it realizes, "Wait a second, this would overflow the buffer I was given. I'll return a failure code instead."

    The feared write of the 32769th character never takes place. We never write to the "X" page. Instead, the _vsnwprintf call reports that the buffer was not large enough, and that result is converted into ERROR_MORE_DATA and returned to the caller.
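    The refusal to write past the end is ordinary bounded-formatting behavior. Since _vsnwprintf is Microsoft-specific, the portable sketch below swaps in the standard std::vsnprintf, which likewise never writes more than the count it is given. Unlike _vsnwprintf, it always reserves room for a terminating null and returns the length the full output would have had, so the hypothetical formatBounded wrapper folds both "does not fit" cases into -1, the case the code above turns into ERROR_MORE_DATA.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Hypothetical wrapper: format into a fixed-size buffer without ever writing
// past `size` bytes. Returns the number of characters written, or -1 if the
// output plus its terminating null would not fit (the same test as
// "i < 0 || i >= MaxChars" in the code above).
int formatBounded(char* buffer, std::size_t size, const char* format, ...) {
    std::va_list ap;
    va_start(ap, format);
    int n = std::vsnprintf(buffer, size, format, ap);  // writes at most size bytes
    va_end(ap);
    if (n < 0 || static_cast<std::size_t>(n) >= size) return -1;
    return n;
}
```

    Placing a guard byte just past the buffer and checking it afterward is a quick way to confirm that a too-long result is rejected without the guard ever being touched.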

    If you follow through the entire story, you see that everything worked as it was supposed to and no overflow took place. The _vsnwprintf function ran up to the brink of disaster but stopped before taking that last step. This is hardly surprising; it happens whenever the _vsnwprintf function encounters a buffer too small to hold the output. The only difference is that along the way, we saw a few first-chance exceptions, exceptions that had nothing to do with avoiding the buffer overflow in the first place. They were just part of FormatMessage's fancy buffer management.
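    The counts in the walkthrough follow directly from the sizes involved, assuming 2-byte WCHARs and 4KB pages:

```latex
\begin{aligned}
\text{characters that fit} &= \frac{65536\ \text{bytes}}{2\ \text{bytes per character}} = 32768 \\
\text{characters per page} &= \frac{4096}{2} = 2048 \\
\text{pages in the buffer} &= \frac{32768}{2048} = 16 \\
\text{pages committed on demand} &= 16 - 1 = 15 = 2 + 13
\end{aligned}
```

    The first page is committed up front, so only 15 pages are committed on demand: the two cycles shown in detail plus the 13 repetitions.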

    It so happens that in Windows Vista, the fancy buffer management technique was abandoned, and the code just allocates 64KB of memory up front and doesn't try any fancy commit-on-demand games. Computer memory has become plentiful enough that a momentary allocation of 64KB has less of an impact than it did twenty years ago, and performance measurements showed that the new "Stop trying to be so clever" technique was now about 80 times faster than the "gotta scrimp and save every last byte of memory" technique.

    The change had more than just a performance effect. It also removed the first-chance exception from FormatMessage, which means that it no longer does that thing which everybody mistakes for a security vulnerability. The good news is that nobody reports this as a vulnerability in Windows Vista any more. The bad news is that people still report it as a vulnerability in Windows XP, and each time this issue comes up, somebody (possibly me) has to sit down and reverify that the previous analysis is still correct, in the specific scenario being reported, because who knows, maybe this time they really did find a problem.

    What is the effect of memory-mapped file access on GetLastError()?

    A customer was using memory-mapped files and was looking for information as to whether access to the memory-mapped data modifies the value returned by GetLastError. A member of the kernel team replied, "No, memory-mapped I/O does not ever change the value returned by GetLastError."

    That answer is simultaneously correct and wrong, a case of looking at the world through kernel-colored glasses.

    While it's true that the kernel does not ever change the value returned by GetLastError, it's also the case that you might change it.

    If you set up an exception handler, then your exception handler might perform operations that affect the last-error code, and those changes will be visible after the exception handler returns. (This applies to all exception handlers and filters, not just ones related to memory-mapped files.)

    If you intend to return EXCEPTION_CONTINUE_EXECUTION because you handled the exception, then you probably should make sure to leave the last-error code the way you found it. Otherwise, the code that you interrupted and then resumed will have had its last-error code changed asynchronously. You just sabotaged it from above.

    // This version is wrong: it does not preserve the last-error code
    
    LONG ExceptionFilter(LPEXCEPTION_POINTERS ExceptionPointers)
    {
     if (IsAnExceptionICanRepair(ExceptionPointers)) {
       RepairException(ExceptionPointers);
       // fixed up error; continuing
       return EXCEPTION_CONTINUE_EXECUTION;
     }
    
     if (IsAnExceptionICanHandle(ExceptionPointers)) {
      // We cannot repair it, but we can handle it.
      return EXCEPTION_EXECUTE_HANDLER;
     }
    
     // Not our exception. Keep looking.
     return EXCEPTION_CONTINUE_SEARCH;
    }
    

    If the IsAnExceptionICanRepair function or RepairException function does anything that affects the last-error code, then when the exception filter is executed for a repairable exception, the last-error code is magically changed without the mainline code's knowledge. All the mainline code did was execute stuff normally, and somehow during a memory access or a floating-point operation or some other seemingly harmless action, the last-error code spontaneously changed!

    If you are going to continue execution at the point the exception was raised, then you need to "put things back the way you found them" (except of course for the part where you repair the exception itself).

    LONG ExceptionFilter(LPEXCEPTION_POINTERS ExceptionPointers)
    {
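     // Assumed helper (not shown): an RAII object that saves GetLastError()
     // when constructed and restores it with SetLastError() when destroyed.
     PreserveLastError preserveLastError;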
    
     if (IsAnExceptionICanRepair(ExceptionPointers)) {
       RepairException(ExceptionPointers);
       // fixed up error; continuing
       return EXCEPTION_CONTINUE_EXECUTION;
     }
    
     if (IsAnExceptionICanHandle(ExceptionPointers)) {
      // We cannot repair it, but we can handle it.
      return EXCEPTION_EXECUTE_HANDLER;
     }
    
     // Not our exception. Keep looking.
     return EXCEPTION_CONTINUE_SEARCH;
    }
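
    The same discipline applies wherever a per-thread last-error value exists. In standard C++ the closest analog is errno, so the sketch below is a portable stand-in rather than Win32 code: PreserveErrno is a hypothetical RAII guard in the spirit of the PreserveLastError object above, and repairFilter is a hypothetical filter-like function whose repair step happens to overwrite errno. The guard restores the original value on every return path, so the caller never sees the change.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical RAII guard: remember errno on construction, put it back on
// destruction, on every path out of the enclosing scope.
class PreserveErrno {
public:
    PreserveErrno() : saved_(errno) {}
    ~PreserveErrno() { errno = saved_; }
    PreserveErrno(const PreserveErrno&) = delete;
    PreserveErrno& operator=(const PreserveErrno&) = delete;

private:
    int saved_;
};

// Hypothetical repair step that clobbers errno as a side effect, the way a
// failed library call inside a repair routine might.
void repairSomething() { errno = ERANGE; }

// Filter-like function that repairs the problem and lets the caller resume.
// Without the guard, the caller would observe ERANGE afterward.
bool repairFilter() {
    PreserveErrno preserve;
    repairSomething();
    return true;  // analogous to EXCEPTION_CONTINUE_EXECUTION
}
```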
    

    Exercise: Why isn't it important to restore the last error code if you return EXCEPTION_EXECUTE_HANDLER?

    Exercise: Is it important to restore the last error code if you return EXCEPTION_CONTINUE_SEARCH?

    The path-searching algorithm is not a backtracking algorithm

    Suppose your PATH environment variable looks like this:

    C:\dir1;\\server\share;C:\dir2
    

    Suppose that you call LoadLibrary("foo.dll") intending to load the library at C:\dir2\foo.dll. If the network server is down, the LoadLibrary call will fail. Why doesn't it just skip the bad directory in the PATH and continue searching?

    Suppose the LoadLibrary function skipped the bad network directory and kept searching. Suppose that the code which called LoadLibrary("foo.dll") was really after the file \\server\share\foo.dll. By taking the server down, you have tricked the LoadLibrary function into loading C:\dir2\foo.dll instead. (And maybe that was your DLL planting attack: If you can convince the system to reject all the versions on the PATH by some means, you can then get LoadLibrary to look in the current directory, which is where you put your attack version of foo.dll.)

    This can manifest itself in very strange ways if the two copies of foo.dll are not identical, because the program is now running with a version of foo.dll it was not designed to use. "My program works okay during the day, but it starts returning bad data when I try to run between midnight and 3am." Reason: The server is taken down for maintenance every night, so the program ends up running with the version in C:\dir2\foo.dll, which happens to be an incompatible version of the file.

    When the LoadLibrary function is unable to contact \\server\share\foo.dll, it doesn't know whether it's in the "don't worry, I wasn't expecting the file to be there anyway" case or in the "I was hoping to get that version of the file, don't substitute any bogus ones" case. So it plays it safe, assumes it's in the "don't substitute any bogus ones" case, and fails the call. The program can then perform whatever recovery it deems appropriate when it cannot load its precious foo.dll file.

    Now consider the case where there is also a c:\dir1\foo.dll file, but it's corrupted. If you do a LoadLibrary("foo.dll"), the call will fail with the error ERROR_BAD_EXE_FORMAT because it found the C:\dir1\foo.dll file, determined that it was corrupted, and gave up. It doesn't continue searching the path for a better version. The path-searching algorithm is not a backtracking algorithm. Once a file is found, the algorithm commits to trying to load that file (a "cut" in logic programming parlance), and if it fails, it doesn't backtrack and return to a previous state to try something else.
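    The rules above amount to a loop with a single "keep going" case. The C++ sketch below is a model, not the actual LoadLibrary implementation: DirState, Dir, LoadResult, SearchOutcome, and searchPath are hypothetical, and each PATH entry is reduced to what the loader would find there. Only "file not present" moves on to the next entry; an unreachable server or a corrupt file ends the search (the "cut"), and a valid file is loaded.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the loader finds when it looks for the file in one PATH entry.
enum class DirState { Missing, Unreachable, Corrupt, Valid };

struct Dir {
    std::string path;
    DirState state;
};

enum class LoadResult { Loaded, NotFound, NetworkError, BadExeFormat };

struct SearchOutcome {
    LoadResult result;
    std::string loadedFrom;  // empty unless result == Loaded
};

// Walk the PATH in order. Only "file not there" lets the search continue;
// every other finding is final, and earlier decisions are never revisited.
SearchOutcome searchPath(const std::vector<Dir>& path) {
    for (const Dir& dir : path) {
        switch (dir.state) {
        case DirState::Missing:     continue;  // keep looking
        case DirState::Unreachable: return {LoadResult::NetworkError, ""};
        case DirState::Corrupt:     return {LoadResult::BadExeFormat, ""};
        case DirState::Valid:       return {LoadResult::Loaded, dir.path};
        }
    }
    return {LoadResult::NotFound, ""};
}
```

    Under this model, taking the server down produces a failure rather than a silent switch to C:\dir2\foo.dll, and a corrupt C:\dir1\foo.dll hides any good copy later on the PATH.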

    Discussion: Why does the LoadLibrary search algorithm continue if an invalid directory or drive letter is put on the PATH?

    Vaguely related chatter: No backtracking, Part One

    Microspeak: fit

    In Microspeak, fit is a predicate noun which is never used on its own but always comes with a modifying adjective. For something to be a good fit is for something to be appropriate or suitable for a particular situation. The opposite of a good fit is not a bad fit, because that's pejorative. Rather, something that is not a good fit is referred to as a poor fit.

    The purpose of a previewer plug-in is to allow users to view the media without opening it. An image editing tool would not be a good fit for the previewing feature. (Alternatively, "would be a poor fit for the previewing feature.")

    To be a good fit with a particular group is to mesh well with that group's existing practices and conventions.

    The Datacenter Edition of the product is a poor fit for most small businesses.

    The group in question need not consist of people.

    The results are obtained incrementally, which makes it a good fit for IQueryable<T> and LINQ.

    Microsoft Human Resources loves to apply the concept of "fit" to people fitting into a job position.

    The story of the mysterious WINA20.386 file

    matushorvath was curious about the WINA20.386 file that came with some versions of MS-DOS.

    The WINA20.386 file predates my involvement, but I was able to find some information on the Internet that explained what it was for. And it's right there in KB article Q68655: Windows 3.0 Enhanced Mode Requires WINA20.386:

    Windows 3.0 Enhanced Mode Requires WINA20.386

    Windows 3.0 enhanced mode uses a modular architecture based on what are called virtual device drivers, or VxDs. VxDs allow pieces of Windows to be replaced to add additional functionality. WINA20.386 is such a VxD. (VxDs could be called "structured" patches for Windows.)

    Windows 3.0 enhanced mode considers the state of the A20 line to be the same in all MS-DOS virtual machines (VMs). When MS-DOS is loaded in the high memory area (HMA), this can cause the machine to stop responding (hang) because of MS-DOS controlling the A20 line. If one VM is running inside the MS-DOS kernel (in the HMA) and Windows task switches to another VM in which MS-DOS turns off A20, the machine hangs when switching back to the VM that is currently attempting to execute code in the HMA.

    WINA20.386 changes the way Windows 3.0 enhanced mode handles the A20 line so that Windows treats the A20 status as local to each VM, instead of global to all VMs. This corrects the problem.

    (At the time I wrote this, a certain popular Web search engine kicks up as the top hit for the exact phrase "Windows 3.0 Enhanced Mode Requires WINA20.386" a spam site that copies KB articles in order to drive traffic. Meanwhile, the actual KB article doesn't show up in the search results. Fortunately, Bing got it right.)

    That explanation is clearly written for a technical audience with deep knowledge of MS-DOS, Windows, and the High Memory Area. matushorvath suggested that "a more detailed explanation could be interesting." I don't know if it's interesting; to me, it's actually quite boring. But here goes.

    The A20 line is a signal on the address bus that specifies the contents of bit 20 of the linear address of memory being accessed. If you aren't familiar with the significance of the A20 line, this Wikipedia article provides the necessary background.

    The High Memory Area is a 64KB-sized block of memory (really, 64KB minus 16 bytes) that becomes accessible when the CPU is in 8086 mode but the A20 line is enabled. To free up conventional memory, large portions of MS-DOS relocate themselves into the HMA. When a program calls into MS-DOS, it really calls into a stub which enables the A20 line, calls the real function in the HMA, and then disables the A20 line before returning to the program. (The value of the HMA was discovered by my colleague who also discovered the fastest way to get out of virtual-8086 mode.)
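    The arithmetic behind the HMA is easy to check. In 8086 mode, an address is formed as segment * 16 + offset, and with the A20 line disabled, address bit 20 is held at zero, so anything at or above 1MB wraps around to the bottom of memory. The C++ helper linearAddress below is a hypothetical illustration of that rule.

```cpp
#include <cassert>
#include <cstdint>

// Real-mode address computation. With A20 disabled, bit 20 is masked off,
// reproducing the 8086's wraparound at 1MB.
constexpr std::uint32_t linearAddress(std::uint16_t segment, std::uint16_t offset,
                                      bool a20Enabled) {
    return ((static_cast<std::uint32_t>(segment) << 4) + offset) &
           (a20Enabled ? 0xFFFFFFFFu : ~(std::uint32_t{1} << 20));
}

// The HMA runs from FFFF:0010 (exactly 1MB) through FFFF:FFFF.
constexpr std::uint32_t kHmaStart = 0x100000;
constexpr std::uint32_t kHmaSize = (0xFFFFu * 16 + 0xFFFFu + 1) - kHmaStart;  // 65520 bytes
```

    FFFF:0010 is the first byte past 1MB when A20 is enabled and wraps to address 0 when it is disabled; the top of the HMA is FFFF:FFFF = 0x10FFEF, which is where the "64KB minus 16 bytes" comes from.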

    The issue is that by default, Windows treats all MS-DOS device drivers and MS-DOS itself as global. A change in one virtual machine affects all virtual machines. This is done for compatibility reasons; after all, those old 16-bit device drivers assume that they are running on a single-tasking operating system. If you were to run a separate copy of each driver in each virtual machine, each copy would try to talk to the same physical device, and bad things would happen because each copy assumed it was the only code that communicated with that device.

    Suppose MS-DOS device drivers were treated as local to each virtual machine. Suppose you had a device driver that controlled a traffic signal, and as we all know, one of the cardinal rules of traffic signals is that you never show green in both directions. The device driver has two variables: NorthSouthColor and EastWestColor, and initially both are set to Red. The copy of the device driver running in the first virtual machine decides to let traffic flow in the north/south direction, and it executes code like this:

    if (EastWestColor != Red) {
     SetEastWestColor(Red);
    }
    SetNorthSouthColor(Green);
    

    Since both variables are initially set to Red, this code sets the north/south lights to green.

    Meanwhile, the copy of the device driver in the second virtual machine wants to let traffic flow in the east/west direction:

    if (NorthSouthColor != Red) {
     SetNorthSouthColor(Red);
    }
    SetEastWestColor(Green);
    

    Since we have a separate copy of the device driver in each virtual machine, the changes made in the first virtual machine do not affect the values in the second virtual machine. The second virtual machine sees that both variables are set to Red, so it merely sets the east/west color to green.

    On the other hand, both of these device drivers are unwittingly controlling the same physical traffic light, and it just got told to set the lights in both directions to Green.

    Oops.
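    The failure can be reproduced in a few lines. This is a hypothetical model, not real driver code: each DriverCopy keeps its own NorthSouthColor and EastWestColor, but the Set functions also write to a single shared PhysicalLight standing in for the hardware. The second function runs the same two requests against one shared copy of the variables for comparison.

```c
#include <stdbool.h>

typedef enum { Red, Green } Color;

/* The one physical traffic light that every driver copy talks to. */
typedef struct { Color north_south; Color east_west; } PhysicalLight;

/* One copy of the driver's variables (hypothetical model). */
typedef struct {
    Color NorthSouthColor;
    Color EastWestColor;
    PhysicalLight *light; /* shared hardware */
} DriverCopy;

static void SetNorthSouthColor(DriverCopy *d, Color c) { d->NorthSouthColor = c; d->light->north_south = c; }
static void SetEastWestColor(DriverCopy *d, Color c)   { d->EastWestColor = c;   d->light->east_west = c; }

void AllowNorthSouth(DriverCopy *d)
{
    if (d->EastWestColor != Red) {
        SetEastWestColor(d, Red);
    }
    SetNorthSouthColor(d, Green);
}

void AllowEastWest(DriverCopy *d)
{
    if (d->NorthSouthColor != Red) {
        SetNorthSouthColor(d, Red);
    }
    SetEastWestColor(d, Green);
}

/* Two local copies, one light: returns true if the light ends up green both ways. */
bool BothGreenWithLocalCopies(void)
{
    PhysicalLight light = { Red, Red };
    DriverCopy vm1 = { Red, Red, &light };
    DriverCopy vm2 = { Red, Red, &light };
    AllowNorthSouth(&vm1); /* vm1's copy sees east/west Red */
    AllowEastWest(&vm2);   /* vm2's copy still sees north/south Red */
    return light.north_south == Green && light.east_west == Green;
}

/* One global copy shared by both requests, run one after the other. */
bool BothGreenWithGlobalCopy(void)
{
    PhysicalLight light = { Red, Red };
    DriverCopy shared = { Red, Red, &light };
    AllowNorthSouth(&shared);
    AllowEastWest(&shared); /* sees north/south Green and turns it Red first */
    return light.north_south == Green && light.east_west == Green;
}
```

    BothGreenWithLocalCopies reports the dangerous state; BothGreenWithGlobalCopy does not, because the second request sees what the first one did.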

    Okay, so Windows defaults drivers to global. That way, you don't run into the double-bookkeeping problem. But this causes problems for the code which manages the A20 line:

    Consider a system with two virtual machines. The first one calls into MS-DOS. The MS-DOS dispatcher enables the A20 line and calls the real function, but before the function returns, the virtual machine gets pre-empted. The second virtual machine now runs, and it too calls into MS-DOS. The MS-DOS dispatcher in the second virtual machine enables the A20 line and calls into the real function, and after the function returns, the second virtual machine disables the A20 line and returns to its caller. The second virtual machine now gets pre-empted, and the first virtual machine resumes execution. Oops: It tries to resume execution in the HMA, but the HMA is no longer there because the second virtual machine disabled the A20 line!

    The WINA20.386 driver teaches Windows that the state of the A20 line should be treated as per-virtual-machine state rather than global state. With this new information, the scenario above does not run into a problem, because changes to the A20 line made by one virtual machine have no effect on the A20 line in another virtual machine.
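    The preemption scenario can be replayed in a small model. In this hypothetical sketch (none of these names come from Windows), each virtual machine's A20 pointer refers either to one flag shared by all virtual machines, the default global treatment, or to that virtual machine's own flag, the per-virtual-machine treatment. The schedule is written out by hand to match the interleaving described above.

```c
#include <stdbool.h>

/* Hypothetical model of a virtual machine: only the A20 state matters here. */
typedef struct {
    bool *a20;    /* points at either a shared flag or this VM's own flag */
    bool own_a20;
} VirtualMachine;

/* The MS-DOS dispatcher stub, split at the point where VM 1 is preempted. */
static void stub_enter(VirtualMachine *vm) { *vm->a20 = true; }  /* enable A20, call into the HMA */
static void stub_leave(VirtualMachine *vm) { *vm->a20 = false; } /* disable A20, return to caller */

/* Returns whether VM 1 can still see the HMA when it resumes.
   per_vm selects per-virtual-machine A20 state instead of global state. */
bool vm1_can_resume_in_hma(bool per_vm)
{
    bool global_a20 = false;
    VirtualMachine vm1, vm2;
    vm1.own_a20 = false;
    vm2.own_a20 = false;
    vm1.a20 = per_vm ? &vm1.own_a20 : &global_a20;
    vm2.a20 = per_vm ? &vm2.own_a20 : &global_a20;

    stub_enter(&vm1); /* VM 1 calls into MS-DOS and is preempted here */
    stub_enter(&vm2); /* VM 2 calls into MS-DOS */
    stub_leave(&vm2); /* VM 2's call completes and disables A20 */
    return *vm1.a20;  /* VM 1 resumes: is the HMA still mapped? */
}
```

    With global state the function returns false (VM 1 resumes into memory that is no longer there); with per-virtual-machine state it returns true.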

    matushorvath goes on to add, "I would be very interested in how Windows 3.0 found and loaded this file. It seems to me there must have been some magic happening, e.g. DOS somehow forcing the driver to be loaded by Windows."

    Yup, that's what happened, and there's nothing secret about it. When Windows starts up, it broadcasts an interrupt. TSRs and device drivers can listen for this interrupt and respond by specifying that Windows should load a custom driver or request that certain ranges of data should be treated as per-virtual-machine state rather than global state (known to the Windows virtual machine manager as instance data). MS-DOS itself listens for this interrupt, and when Windows sends out the "Does anybody have any special requests?" broadcast, MS-DOS responds, "Yeah, please load this WINA20.386 driver."
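    The "special requests" round can be pictured as each listener adding an entry to a chain that Windows walks after the broadcast. The sketch below is only a conceptual model with hypothetical names (StartupRequest, respond, driver_requested, total_instance_data); the real mechanism is an interrupt broadcast with its own data structure layout, which this code does not attempt to reproduce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one listener's answer to the startup broadcast. */
typedef struct StartupRequest {
    const char *driver_to_load;        /* driver Windows should load, or NULL */
    size_t instance_data_size;         /* bytes to treat as per-VM state, or 0 */
    const struct StartupRequest *next; /* next response in the chain */
} StartupRequest;

/* A listener responds by linking its request onto the chain it was handed. */
const StartupRequest *respond(StartupRequest *req, const StartupRequest *chain)
{
    req->next = chain;
    return req;
}

/* After the broadcast, Windows (in this model) walks every response. */
bool driver_requested(const StartupRequest *chain, const char *name)
{
    for (const StartupRequest *r = chain; r != NULL; r = r->next) {
        if (r->driver_to_load != NULL && strcmp(r->driver_to_load, name) == 0) {
            return true;
        }
    }
    return false;
}

size_t total_instance_data(const StartupRequest *chain)
{
    size_t total = 0;
    for (const StartupRequest *r = chain; r != NULL; r = r->next) {
        total += r->instance_data_size;
    }
    return total;
}
```

    In this model, a TSR that only needs some of its data kept per virtual machine responds with a NULL driver and a nonzero size, while MS-DOS responds with "WINA20.386".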

    So there you have it, the story of WINA20.386. Interesting or boring?

    The compatibility constraints of error codes, episode 2

    A customer reported an incompatibility in Windows 7: If A: is a floppy drive and they call LoadLibrary("A:\\foo.dll") and there is no disk in the drive, the LoadLibrary call fails with the error ERROR_NOT_READY. Previous versions of Windows failed with the error ERROR_MOD_NOT_FOUND.

    Both error codes are reasonable responses to the situation. "The module couldn't be found because the drive is not ready." Programs should treat a failed LoadLibrary as a failed library load and shouldn't be sensitive to the precise reason for the error. (They can display a more specific error to the user based on the error code, but overall program logic shouldn't depend on the error code.)
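    Here is that principle as a portable sketch. Whether the library loaded is decided only by whether a module handle came back; the error code is consulted only to word the message. LoadResult, describe_load_error, and plugin_available are hypothetical stand-ins, not Windows APIs; the two error values are copied from winerror.h (ERROR_NOT_READY is 21, ERROR_MOD_NOT_FOUND is 126) so the sketch builds anywhere.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values mirror winerror.h; defined here so the sketch builds without windows.h. */
#define ERROR_NOT_READY     21
#define ERROR_MOD_NOT_FOUND 126

/* Hypothetical stand-in for the outcome of a LoadLibrary call. */
typedef struct {
    void *module;        /* NULL on failure, like a failed LoadLibrary */
    unsigned long error; /* what GetLastError would report on failure */
} LoadResult;

/* Display only: a more specific message when the code is recognized. */
const char *describe_load_error(unsigned long error)
{
    switch (error) {
    case ERROR_NOT_READY:     return "The drive is not ready.";
    case ERROR_MOD_NOT_FOUND: return "The module could not be found.";
    default:                  return "The library could not be loaded.";
    }
}

/* Program logic: success depends on the handle alone, never on the specific
   error code, so a change from one code to another is harmless. */
bool plugin_available(LoadResult r, const char **message)
{
    if (r.module == NULL) {
        *message = describe_load_error(r.error);
        return false;
    }
    *message = NULL;
    return true;
}
```

    Whether the loader reports 21 or 126, plugin_available returns false; only the wording shown to the user differs.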

    Fortunately, the customer discovered this discrepancy during their pre-release testing and was able to accommodate the change in their program before ever releasing it. A sigh of relief from the application compatibility team.

    Episode 1.

    When you are looking for more information, it helps to say what you need the information for

    It's often the case that when a question from a customer gets filtered through a customer liaison, some context gets lost. (I'm giving the customer the benefit of the doubt here and assuming that it's the customer liaison that removed the context rather than the customer who never provided it.) Consider the following request:

    We would like to know more information about the method the shell uses to resolve shortcuts.

    This is kind of a vague question. It's like asking "I'd like to know more about the anti-lock braking system in my car." There are any number of pieces of information that could be provided about the anti-lock braking system.

    • "It requires a Class C data bus."
    • "The tire position sensors are on the wheel-axis."
    • "It is connected to the brakes."
    • "It is shiny."

    When we ask the customer, "Could you be more specific about what type of information you are looking for?" the response is sometimes

    We want to know everything.

    This is not a helpful clarification. Do they want to start with Maxwell's Equations and build up from there?

    As it happened, in the case of wanting more information about the method the shell uses to resolve shortcuts, they just wanted to know how to disable the search-based algorithm.

    This sort of "ask for everything and figure it out later" phenomenon is quite common. I remember another customer who wanted to know "everything" about changing network passwords, and they wouldn't be any more specific than that, so we said, "Well, you can start with these documents, perhaps paying particular attention to this one, but if they tell us what they are going to be doing with the information, we can help steer them to the specific parts that will be most useful to them."

    As it turned out, all the customer really wanted to know was "When users change their password, is the new password encrypted on the wire?"

    Third example, and then I'll stop. Another customer wanted to know everything about how Explorer takes information from the file system and displays it in an Explorer window. After asking a series of questions, we eventually figured out that they in fact didn't want or need a walkthrough of the entire code path that puts results in the Explorer window. The customer simply wanted to know why two specific folders show up in their Explorer window with names that didn't match the file system name.

    When you ask for more information, explain what you need the information for, or at least be more specific about what kind of "more information" you need. That way, you save everybody lots of time. The people answering your question don't waste their time gathering information you don't need (and gathering that information can be quite time-consuming), and you don't waste your time sifting through all the information you don't want.

    You might say that these people are employing the for-if anti-pattern:

    foreach (document d in GetAllPossibleDocumentation())
    {
     if (d.Topic == "password encryption on the wire") return d;
    }
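    For readers who haven't met the for-if anti-pattern: it is a loop that visits every element only to act on the one it already knew it wanted. A minimal sketch with a hypothetical documentation table (the topics and titles are made up for illustration), contrasting the loop with going straight to the item:

```c
#include <stddef.h>

/* Hypothetical documentation catalog, indexed by topic. */
enum Topic { TOPIC_PASSWORD_CHANGE, TOPIC_PASSWORD_ON_THE_WIRE, TOPIC_ACCOUNT_LOCKOUT, TOPIC_COUNT };

static const char *const docs[TOPIC_COUNT] = {
    "How password changes are processed",
    "Is the new password encrypted on the wire?",
    "Account lockout policy",
};

/* The for-if anti-pattern: walk every document, keep the one we wanted all along. */
const char *find_doc_for_if(enum Topic wanted)
{
    for (size_t i = 0; i < TOPIC_COUNT; i++) {
        if (i == (size_t)wanted) {
            return docs[i];
        }
    }
    return NULL;
}

/* The direct version: say what you need and go straight to it. */
const char *find_doc_direct(enum Topic wanted)
{
    return docs[wanted];
}
```

    Both functions return the same document; the second simply skips the tour of everything else.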
    