February, 2012

  • The Old New Thing

    Why did the Windows 95 Start button have a secret shortcut for closing it?

    Windows 95 had a strange feature where, if you put focus on the Start button and then hit Alt+- (That's Alt and the hyphen key), you got a system menu for the Start button which let you close it, and then the Start button vanished. Programmerman wondered why this existed.

    This was not a feature; it was just a bug. The person who first wrote up the code for the Start button accidentally turned on the WS_SYSMENU style. If you turn this style on for a child window, Windows assigns your child window a system menu. System menus for child windows may sound strange, but they are actually quite normal if you are an MDI application. And the standard hotkey for calling up the system menu of a child window is Alt+-.

    The Start button was not an MDI application, but since the WS_SYSMENU style was set, Windows treated it like one, and when you pressed the hotkey, you got the system menu which let you close the window. (You could also move it, which was also kind of weird.)

    Let's add a button with an accidental system menu to our scratch program:

    BOOL
    OnCreate(HWND hwnd, LPCREATESTRUCT lpcs)
    {
        g_hwndChild = CreateWindow(
            TEXT("Button"),
            TEXT("Start"),
            WS_CHILD | WS_VISIBLE | WS_CLIPSIBLINGS | WS_SYSMENU |
            BS_PUSHBUTTON | BS_CENTER | BS_VCENTER,
            0, 0, 0, 0, hwnd, (HMENU)1, g_hinst, 0);
        return TRUE;
    }
    

    Run this program, put focus on the button, and hit Alt+-. Hey look, a child window system menu.

    To fix this bug, remove the WS_SYSMENU style. That's how the Explorer folks fixed it.

  • The Old New Thing

    How do I find out which process has a file open?

    Classically, there was no way to find out which process has a file open. A file object has a reference count, and when the reference count drops to zero, the file is closed. But there's nobody keeping track of which processes own how many references. (And that's ignoring the case that the reference is not coming from a process in the first place; maybe it's coming from a kernel driver, or maybe it came from a process that no longer exists but whose reference is being kept alive by a kernel driver that captured the object reference.)

    This falls into the category of not keeping track of information you don't need. The file system doesn't care who has the reference to the file object. Its job is to close the file when the last reference goes away.

    You do the same thing with your COM object reference counts. All you care about is whether your reference count has reached zero (at which point it's time to destroy the object). If you later discover an object leak in your process, you don't have a magic query "Show me all the people who called AddRef on my object" because you never kept track of all the people who called AddRef on your object. Or even, "Here's an object I want to destroy. Show me all the people who called AddRef on it so I can destroy them and get them to call Release."
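
    Here's a minimal sketch of that kind of bookkeeping in plain C. The Widget type and the widget_* names are made up for illustration and are not COM interfaces; the only point is that the object stores a number and nothing at all about who is holding the references.

    ```c
    #include <stdlib.h>

    /* A reference-counted object: it knows how many references exist,
       but not who holds them. (Hypothetical type, not a COM interface.) */
    typedef struct Widget {
        unsigned long refs;
    } Widget;

    Widget *widget_create(void)
    {
        Widget *w = malloc(sizeof(*w));
        if (w) w->refs = 1;       /* the creator holds the first reference */
        return w;
    }

    unsigned long widget_addref(Widget *w)
    {
        return ++w->refs;         /* no record of the caller is kept */
    }

    /* Returns the remaining count; frees the object when it reaches zero. */
    unsigned long widget_release(Widget *w)
    {
        unsigned long remaining = --w->refs;
        if (remaining == 0) free(w);
        return remaining;
    }
    ```

    Notice that there is nowhere in Widget to look up who called widget_addref. If you wanted that query, you would have to pay for a list of owners on every call, which is exactly the information nobody needed.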

    At least that was the story under the classical model.

    Enter the Restart Manager.

    The official goal of the Restart Manager is to help make it possible to shut down and restart applications which are using a file you want to update. In order to do that, it needs to keep track of which processes are holding references to which files. And it's that database that is of use here. (Why is the kernel keeping track of which processes have a file open? Because it's the converse of the principle of not keeping track of information you don't need: Now it needs the information!)

    Here's a simple program which takes a file name on the command line and shows which processes have the file open.

    #include <windows.h>
    #include <RestartManager.h>
    #include <stdio.h>
    
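    int __cdecl wmain(int argc, WCHAR **argv)
    {
     // The file name is required; argv[1] is NULL without it.
     if (argc < 2) {
      wprintf(L"Usage: %ls filename\n", argv[0]);
      return 1;
     }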
     DWORD dwSession;
     WCHAR szSessionKey[CCH_RM_SESSION_KEY+1] = { 0 };
     DWORD dwError = RmStartSession(&dwSession, 0, szSessionKey);
     wprintf(L"RmStartSession returned %d\n", dwError);
     if (dwError == ERROR_SUCCESS) {
      PCWSTR pszFile = argv[1];
      dwError = RmRegisterResources(dwSession, 1, &pszFile,
                                    0, NULL, 0, NULL);
      wprintf(L"RmRegisterResources(%ls) returned %d\n",
              pszFile, dwError);
      if (dwError == ERROR_SUCCESS) {
       DWORD dwReason;
       UINT i;
       UINT nProcInfoNeeded;
       UINT nProcInfo = 10;
       RM_PROCESS_INFO rgpi[10];
       dwError = RmGetList(dwSession, &nProcInfoNeeded,
                           &nProcInfo, rgpi, &dwReason);
       wprintf(L"RmGetList returned %d\n", dwError);
       if (dwError == ERROR_SUCCESS) {
        wprintf(L"RmGetList returned %d infos (%d needed)\n",
                nProcInfo, nProcInfoNeeded);
        for (i = 0; i < nProcInfo; i++) {
         wprintf(L"%d.ApplicationType = %d\n", i,
                                  rgpi[i].ApplicationType);
         wprintf(L"%d.strAppName = %ls\n", i,
                                  rgpi[i].strAppName);
         wprintf(L"%d.Process.dwProcessId = %d\n", i,
                                  rgpi[i].Process.dwProcessId);
         HANDLE hProcess = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION,
                                       FALSE, rgpi[i].Process.dwProcessId);
         if (hProcess) {
          FILETIME ftCreate, ftExit, ftKernel, ftUser;
          if (GetProcessTimes(hProcess, &ftCreate, &ftExit,
                              &ftKernel, &ftUser) &&
              CompareFileTime(&rgpi[i].Process.ProcessStartTime,
                              &ftCreate) == 0) {
           WCHAR sz[MAX_PATH];
           DWORD cch = MAX_PATH;
           if (QueryFullProcessImageNameW(hProcess, 0, sz, &cch) &&
               cch <= MAX_PATH) {
            wprintf(L"  = %ls\n", sz);
           }
          }
          CloseHandle(hProcess);
         }
        }
       }
      }
      RmEndSession(dwSession);
     }
     return 0;
    }
    

    The first thing we do is call, no wait, even before we call the RmStartSession function, we have the line

     WCHAR szSessionKey[CCH_RM_SESSION_KEY+1] = { 0 };
    

    That one line of code addresses two bugs!

    First is a documentation bug. The documentation for the RmStartSession function doesn't specify how large a buffer you need to pass for the session key. The answer is CCH_RM_SESSION_KEY+1.

    Second is a code bug. The RmStartSession function doesn't properly null-terminate the session key, even though the function is documented as returning a null-terminated string. To work around this bug, we pre-fill the buffer with null characters so that whatever gets written will have a null terminator (namely, one of the null characters we placed ahead of time).

    Okay, so that's out of the way. The basic algorithm is simple:

    1. Create a Restart Manager session.
    2. Add a file resource to the session.
    3. Ask for a list of all processes affected by that resource.
    4. Print some information about each process.
    5. Close the session.

    We already mentioned that you create the session by calling RmStartSession. Next, we add a single file resource to the session by calling RmRegisterResources.

    Now the fun begins. Getting the list of affected processes is normally a two-step affair. First, you ask for the number of affected processes (by passing 0 as the nProcInfo), then allocate some memory and call a second time to get the data. But this is just a sample program, so I've hard-coded a limit of ten processes. If more than ten processes are affected, I just give up. (You can see this if you ask for all the processes that have open handles to kernel32.dll.)
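
    For the record, here is roughly what the two-step version looks like. To keep the sketch self-contained and portable, it does not call the Restart Manager at all: fake_get_list is a hypothetical stand-in that only mimics the general shape of a "tell me how many, then give me the data" function, and the GET_* codes, the ProcInfo type, and the process IDs are invented for the example.

    ```c
    #include <stdlib.h>

    typedef struct { unsigned pid; } ProcInfo;   /* hypothetical record */

    enum { GET_OK = 0, GET_MORE_DATA = 1, GET_NO_MEMORY = 2 };

    static const unsigned g_pids[] = { 4, 612, 1044 };   /* made-up data */

    /* Hypothetical stand-in for a list query: always reports the number of
       entries in *needed, and fills the array only if *count is big enough. */
    int fake_get_list(unsigned *needed, unsigned *count, ProcInfo *out)
    {
        unsigned total = sizeof(g_pids) / sizeof(g_pids[0]);
        *needed = total;
        if (*count < total) {
            *count = 0;
            return GET_MORE_DATA;
        }
        for (unsigned i = 0; i < total; i++) out[i].pid = g_pids[i];
        *count = total;
        return GET_OK;
    }

    /* Step one: ask with room for zero entries to learn the size.
       Step two: allocate and ask again. It is a loop because the list
       could grow between the two calls. The caller frees *list_out. */
    int get_all(ProcInfo **list_out, unsigned *count_out)
    {
        ProcInfo *list = NULL;
        unsigned needed = 0, count = 0;
        int rc = fake_get_list(&needed, &count, NULL);
        while (rc == GET_MORE_DATA) {
            ProcInfo *bigger = realloc(list, needed * sizeof(*list));
            if (!bigger) { free(list); return GET_NO_MEMORY; }
            list = bigger;
            count = needed;
            rc = fake_get_list(&needed, &count, list);
        }
        if (rc != GET_OK) { free(list); return rc; }
        *list_out = list;
        *count_out = count;
        return GET_OK;
    }
    ```

    The loop is the part people tend to forget: the answer to "how many?" can already be stale by the time you come back with the memory.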

    The other tricky part is mapping the RM_PROCESS_INFO to an actual process. Since process IDs can be recycled, the RM_PROCESS_INFO structure identifies a process by the combination of the process ID and the process creation time. That combination is unique because two processes cannot have the same ID at the same time. We open the handle to the process via its ID, then confirm that the start times match. (If not, then the ID refers to a process that exited between the time we obtained the list and the time we actually looked at it.) Assuming it all matches, we get the image name and print it.
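
    Stripped of the Win32 details, the identity check boils down to comparing two fields instead of one. This is a sketch under my own assumptions: ProcessKey and same_process are illustrative names, and the start time is just an opaque 64-bit number rather than a FILETIME.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* A process is identified by its ID plus its creation time, because
       the ID alone can be handed to a new process after the old one exits.
       (Hypothetical type for illustration.) */
    typedef struct {
        uint32_t pid;
        uint64_t start_time;   /* opaque timestamp; only equality matters */
    } ProcessKey;

    /* True only if the live process is the one the list was talking about. */
    bool same_process(const ProcessKey *listed, const ProcessKey *live)
    {
        return listed->pid == live->pid &&
               listed->start_time == live->start_time;
    }
    ```

    A recycled ID shows up as a matching pid with a different start_time, and the sample program above skips such a process rather than printing the wrong image name.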

    And that's all there is to enumerating all the processes that have a particular file open. Of course, a more expressive interface for managing files in use is IFileIsInUse, which I mentioned some time ago. That interface not only tells you the application that has the file open (in a friendlier format than just an executable path), you can also use it to switch to the application and even ask it to close the file. (Windows 7 first tries IFileIsInUse, and if that fails, then it goes to the Restart Manager.)

  • The Old New Thing

    Things I've written that have amused other people, Episode 9

    A customer liaison reported that their customer wants to be able to access their machine without needing a password. They just want to be able to net use * \\machine\share and be able to access the files right away. I guess because passwords are confusing, easy to forget, and just get in the way. Anyway, the customer discovered that they could do so on Windows XP by going to the folder they want to share, going to the Sharing tab, then clicking on the "If you understand the security risks but want to share files without running the wizard" link, and then on the Enable File Sharing dialog, clicking "Just enable file sharing."

    What the customer wanted to know was if there was a way they could automate this process.

    My response to the customer liaison went like this:

    Your customer has chosen to ignore not one but two security warnings. Furthermore, since they are looking for an automated way of doing this, it sounds like they intend on deploying this "feature" to all the computers in their organization. Maybe they just enjoy being part of a botnet? Your customer is basically saying "I wish my computer to have no network security." They should at least restrict access to authenticated users. But if they insist on having their corporate network turned into a spam farm, they can enable the Guest account and say that it can "Access this computer from the network." Congratulations, your computers will soon be filled with malware and porn.

    That last sentence made it into some people's quotes file.

  • The Old New Thing

    When you are looking for more information, it helps to say what you need the information for

    It's often the case that when a question from a customer gets filtered through a customer liaison, some context gets lost. (I'm giving the customer the benefit of the doubt here and assuming that it's the customer liaison that removed the context rather than the customer who never provided it.) Consider the following request:

    We would like to know more information about the method the shell uses to resolve shortcuts.

    This is kind of a vague question. It's like asking "I'd like to know more about the anti-lock braking system in my car." There are any number of pieces of information that could be provided about the anti-lock braking system.

    • "It requires a Class C data bus."
    • "The tire position sensors are on the wheel-axis."
    • "It is connected to the brakes."
    • "It is shiny."

    When we ask the customer, "Could you be more specific what type of information you are looking for?" the response is sometimes

    We want to know everything.

    This is not a helpful clarification. Do they want to start with Maxwell's Equations and build up from there?

    As it happened, in the case of wanting more information about the method the shell uses to resolve shortcuts, they just wanted to know how to disable the search-based algorithm.

    This sort of "ask for everything and figure it out later" phenomenon is quite common. I remember another customer who wanted to know "everything" about changing network passwords, and they wouldn't be any more specific than that, so we said, "Well, you can start with these documents, perhaps paying particular attention to this one, but if they tell us what they are going to be doing with the information, we can help steer them to the specific parts that will be most useful to them."

    As it turned out, all the customer really wanted to know was "When users change their password, is the new password encrypted on the wire?"

    Third example, and then I'll stop. Another customer wanted to know everything about how Explorer takes information from the file system and displays it in an Explorer window. After asking a series of questions, we eventually figured out that they in fact didn't want or need a walkthrough of the entire code path that puts results in the Explorer window. The customer simply wanted to know why two specific folders show up in their Explorer window with names that didn't match the file system name.

    When you ask for more information, explain what you need the information for, or at least be more specific what kind of "more information" you need. That way, you save everybody lots of time. The people answering your question don't waste their time gathering information you don't need (and gathering that information can be quite time-consuming), and you don't waste your time sifting through all the information you don't want.

    You might say that these people are employing the for-if anti-pattern:

    foreach (document d in GetAllPossibleDocumentation())
    {
     if (d.Topic == "password encryption on the wire") return d;
    }
    
  • The Old New Thing

    The story of the mysterious WINA20.386 file

    matushorvath was curious about the WINA20.386 file that came with some versions of MS-DOS.

    The WINA20.386 file predates my involvement, but I was able to find some information on the Internet that explained what it was for. And it's right there in KB article Q68655: Windows 3.0 Enhanced Mode Requires WINA20.386:

    Windows 3.0 Enhanced Mode Requires WINA20.386

    Windows 3.0 enhanced mode uses a modular architecture based on what are called virtual device drivers, or VxDs. VxDs allow pieces of Windows to be replaced to add additional functionality. WINA20.386 is such a VxD. (VxDs could be called "structured" patches for Windows.)

    Windows 3.0 enhanced mode considers the state of the A20 line to be the same in all MS-DOS virtual machines (VMs). When MS-DOS is loaded in the high memory area (HMA), this can cause the machine to stop responding (hang) because of MS-DOS controlling the A20 line. If one VM is running inside the MS-DOS kernel (in the HMA) and Windows task switches to another VM in which MS-DOS turns off A20, the machine hangs when switching back to the VM that is currently attempting to execute code in the HMA.

    WINA20.386 changes the way Windows 3.0 enhanced mode handles the A20 line so that Windows treats the A20 status as local to each VM, instead of global to all VMs. This corrects the problem.

    (At the time I wrote this, a certain popular Web search engine kicks up as the top hit for the exact phrase "Windows 3.0 Enhanced Mode Requires WINA20.386" a spam site that copies KB articles in order to drive traffic. Meanwhile, the actual KB article doesn't show up in the search results. Fortunately, Bing got it right.)

    That explanation is clearly written for a technical audience with deep knowledge of MS-DOS, Windows, and the High Memory Area. matushorvath suggested that "a more detailed explanation could be interesting." I don't know if it's interesting; to me, it's actually quite boring. But here goes.

    The A20 line is a signal on the address bus that specifies the contents of bit 20 of the linear address of memory being accessed. If you aren't familiar with the significance of the A20 line, this Wikipedia article provides the necessary background.

    The High Memory Area is a 64KB-sized block of memory (really, 64KB minus 16 bytes) that becomes accessible when the CPU is in 8086 mode but the A20 line is enabled. To free up conventional memory, large portions of MS-DOS relocate themselves into the HMA. When a program calls into MS-DOS, it really calls into a stub which enables the A20 line, calls the real function in the HMA, and then disables the A20 line before returning to the program. (The value of the HMA was discovered by my colleague who also discovered the fastest way to get out of virtual-8086 mode.)
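
    If you want to see where "64KB minus 16 bytes" comes from, the arithmetic is short enough to write down. This is only a sketch of real-mode address translation; linear_address and hma_size are names I made up, and the a20_enabled flag models the A20 gate as simply forcing address bit 20 to zero.

    ```c
    #include <stdint.h>

    /* Real-mode address translation: linear = segment * 16 + offset.
       With the A20 line disabled, address bit 20 is forced to zero, so
       addresses just past 1MB wrap around to the bottom of memory. */
    uint32_t linear_address(uint16_t segment, uint16_t offset, int a20_enabled)
    {
        uint32_t linear = ((uint32_t)segment << 4) + offset;
        if (!a20_enabled) linear &= ~(UINT32_C(1) << 20);
        return linear;
    }

    /* Number of bytes above 1MB reachable from the highest segment, FFFF. */
    uint32_t hma_size(void)
    {
        uint32_t first = linear_address(0xFFFF, 0x0010, 1);  /* 0x100000 */
        uint32_t last  = linear_address(0xFFFF, 0xFFFF, 1);  /* 0x10FFEF */
        return last - first + 1;
    }
    ```

    The first 16 bytes of segment FFFF sit below the 1MB line, which is why the HMA comes up 16 bytes short of a full 64KB. And with A20 disabled, FFFF:0010 lands on linear address zero instead of 0x100000, which is why code living in the HMA needs the line enabled.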

    The issue is that by default, Windows treats all MS-DOS device drivers and MS-DOS itself as global. A change in one virtual machine affects all virtual machines. This is done for compatibility reasons; after all, those old 16-bit device drivers assume that they are running on a single-tasking operating system. If you were to run a separate copy of each driver in each virtual machine, each copy would try to talk to the same physical device, and bad things would happen because each copy assumed it was the only code that communicated with that device.

    Suppose MS-DOS device drivers were treated as local to each virtual machine. Suppose you had a device driver that controlled a traffic signal, and as we all know, one of the cardinal rules of traffic signals is that you never show green in both directions. The device driver has two variables: NorthSouthColor and EastWestColor, and initially both are set to Red. The copy of the device driver running in the first virtual machine decides to let traffic flow in the north/south direction, and it executes code like this:

    if (EastWestColor != Red) {
     SetEastWestColor(Red);
    }
    SetNorthSouthColor(Green);
    

    Since both variables are initially set to Red, this code sets the north/south lights to green.

    Meanwhile, the copy of the device driver in the second virtual machine wants to let traffic flow in the east/west direction:

    if (NorthSouthColor != Red) {
     SetNorthSouthColor(Red);
    }
    SetEastWestColor(Green);
    

    Since we have a separate copy of the device driver in each virtual machine, the changes made in the first virtual machine do not affect the values in the second virtual machine. The second virtual machine sees that both variables are set to Red, so it merely sets the east/west color to green.

    On the other hand, both of these device drivers are unwittingly controlling the same physical traffic light, and it just got told to set the lights in both directions to Green.

    Oops.
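
    Here is the whole accident as a small simulation you can step through. It is a toy model, not driver code: the Light and Driver types and the function names are invented for the example, and "talking to the hardware" is just writing to a shared struct.

    ```c
    #include <stdbool.h>

    typedef enum { RED, GREEN } Color;

    /* The one physical traffic light that every driver copy talks to. */
    typedef struct { Color north_south, east_west; } Light;

    /* One copy of the driver: private bookkeeping plus the shared hardware. */
    typedef struct { Color north_south, east_west; Light *hw; } Driver;

    void driver_init(Driver *d, Light *hw)
    {
        d->north_south = RED;
        d->east_west = RED;
        d->hw = hw;
    }

    void allow_north_south(Driver *d)
    {
        if (d->east_west != RED) {
            d->east_west = d->hw->east_west = RED;
        }
        d->north_south = d->hw->north_south = GREEN;
    }

    void allow_east_west(Driver *d)
    {
        if (d->north_south != RED) {
            d->north_south = d->hw->north_south = RED;
        }
        d->east_west = d->hw->east_west = GREEN;
    }

    bool both_green(const Light *hw)
    {
        return hw->north_south == GREEN && hw->east_west == GREEN;
    }
    ```

    Give each virtual machine its own Driver pointing at the same Light, call allow_north_south on one and allow_east_west on the other, and both_green comes back true. Route both calls through a single Driver and the second call sees the first one's bookkeeping and turns the north/south lights red first.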

    Okay, so Windows defaults drivers to global. That way, you don't run into the double-bookkeeping problem. But this causes problems for the code which manages the A20 line:

    Consider a system with two virtual machines. The first one calls into MS-DOS. The MS-DOS dispatcher enables the A20 line and calls the real function, but before the function returns, the virtual machine gets pre-empted. The second virtual machine now runs, and it too calls into MS-DOS. The MS-DOS dispatcher in the second virtual machine enables the A20 line and calls into the real function, and after the function returns, the second virtual machine disables the A20 line and returns to its caller. The second virtual machine now gets pre-empted, and the first virtual machine resumes execution. Oops: It tries to resume execution in the HMA, but the HMA is no longer there because the second virtual machine disabled the A20 line!

    The WINA20.386 driver teaches Windows that the state of the A20 should be treated as a per-virtual-machine state rather than a global state. With this new information, the above scenario does not run into a problem because the changes to the A20 line made by one virtual machine have no effect on the A20 line in another virtual machine.
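
    The same scenario can be replayed in a few lines of C. This is a toy model of the bookkeeping only, with names of my own invention (A20Model, set_a20, vm0_survives); it says nothing about how the real driver is implemented.

    ```c
    #include <stdbool.h>

    enum { VM_COUNT = 2 };

    typedef struct {
        bool per_vm;              /* false: one A20 state shared by all VMs */
        bool global_a20;
        bool vm_a20[VM_COUNT];
    } A20Model;

    /* Picks the variable that holds the A20 state as seen by this VM. */
    static bool *a20_slot(A20Model *m, int vm)
    {
        return m->per_vm ? &m->vm_a20[vm] : &m->global_a20;
    }

    void set_a20(A20Model *m, int vm, bool enabled)
    {
        *a20_slot(m, vm) = enabled;
    }

    bool hma_visible(A20Model *m, int vm)
    {
        return *a20_slot(m, vm);
    }

    /* Replays the scenario: VM 0 is pre-empted inside the HMA, VM 1 makes
       a complete MS-DOS call, and then VM 0 resumes. Returns whether VM 0
       can still see the HMA when it resumes. */
    bool vm0_survives(bool per_vm)
    {
        A20Model m = { per_vm, false, { false, false } };
        set_a20(&m, 0, true);     /* VM 0: stub enables A20, enters the HMA */
                                  /* VM 0 is pre-empted here */
        set_a20(&m, 1, true);     /* VM 1: stub enables A20 */
        set_a20(&m, 1, false);    /* VM 1: call returns, stub disables A20 */
        return hma_visible(&m, 0);
    }
    ```

    With the state treated as global, vm0_survives(false) reports that the HMA vanished out from under the first virtual machine. With the state treated as per-virtual-machine, vm0_survives(true) reports that it is still there.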

    matushorvath goes on to add, "I would be very interested in how Windows 3.0 found and loaded this file. It seems to me there must have been some magic happening, e.g. DOS somehow forcing the driver to be loaded by Windows."

    Yup, that's what happened, and there's nothing secret about it. When Windows starts up, it broadcasts an interrupt. TSRs and device drivers can listen for this interrupt and respond by specifying that Windows should load a custom driver or request that certain ranges of data should be treated as per-virtual-machine state rather than global state (known to the Windows virtual machine manager as instance data). MS-DOS itself listens for this interrupt, and when Windows sends out the "Does anybody have any special requests?" broadcast, MS-DOS responds, "Yeah, please load this WINA20.386 driver."
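
    To make the shape of that conversation concrete, here is a toy model of a broadcast where each listener may add a request to a chain. Everything in it is an assumption made for illustration: the StartupRequest structure, the Listener callback type, and the function names are mine and do not describe the actual interrupt interface or its data layout.

    ```c
    #include <stddef.h>

    /* One listener's answer to "Does anybody have any special requests?" */
    typedef struct StartupRequest {
        const char *driver_to_load;       /* name of a driver to load */
        struct StartupRequest *next;      /* requests from other listeners */
    } StartupRequest;

    /* A listener receives the chain so far and returns the new head. */
    typedef StartupRequest *(*Listener)(StartupRequest *chain);

    static StartupRequest g_dos_request = { "WINA20.386", NULL };

    /* Models MS-DOS: adds its own request in front of the others. */
    StartupRequest *msdos_listener(StartupRequest *chain)
    {
        g_dos_request.next = chain;
        return &g_dos_request;
    }

    /* Models a TSR with nothing to ask for: passes the chain through. */
    StartupRequest *silent_listener(StartupRequest *chain)
    {
        return chain;
    }

    /* The broadcast: give every listener a chance to respond. */
    StartupRequest *broadcast_startup(const Listener *listeners, size_t n)
    {
        StartupRequest *chain = NULL;
        for (size_t i = 0; i < n; i++) chain = listeners[i](chain);
        return chain;
    }

    size_t count_requests(const StartupRequest *chain)
    {
        size_t count = 0;
        for (; chain != NULL; chain = chain->next) count++;
        return count;
    }
    ```

    The broadcaster doesn't need to know ahead of time who is listening. It just walks whatever chain comes back and loads what it finds there.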

    So there you have it, the story of WINA20.386. Interesting or boring?

  • The Old New Thing

    What's the difference between Text Document, Text Document - MS-DOS Format, and Unicode Text Document?

    Alasdair King asks why Wordpad has three formats, Text Document, Text Document - MS-DOS Format, and Unicode Text Document. "Isn't at least one redundant?"

    Recall that in Windows, three code pages have special status.

    1. Unicode (more specifically, UTF-16LE)
    2. CP_ACP, commonly known as the ANSI code page, although that is a misnomer
    3. CP_OEM, commonly known as the OEM code page, although that too is a misnomer.

    Three text file formats. Three encodings. Hm... I wonder...

    As you might have guessed by now, the three text file formats correspond to the three special code pages. Now it's just a matter of deciding which one matches with which. The easiest one is the Unicode one; it seems clear that Unicode Text Document matches with Unicode. Okay, we now have to figure out how Text Document and Text Document - MS-DOS Format map to CP_ACP and CP_OEM. But another piece of the puzzle is pretty clear, because MS-DOS used the so-called OEM code page. Therefore, by process of elimination, Text Document corresponds to CP_ACP.

    Now that we have puzzled out what the three text formats correspond to, we can address the question "Isn't at least one redundant?"

    Michael Kaplan explained that ACP and OEM are (usually) different. And neither is the same as Unicode. So in fact all three are (usually) different.

    In the United States, the so-called ANSI code page is code page 1252, the so-called OEM code page is code page 437, and Unicode is code page 1200. Here's the string résumé expressed in each of the three encodings.

    Description                    Encoding  Code page (en-us)  Bytes
    Text Document                  CP_ACP    1252               72 E9 73 75 6D E9
    Text Document - MS-DOS Format  CP_OEM    437                72 82 73 75 6D 82
    Unicode Text Document          UTF-16LE  1200               FF FE 72 00 E9 00 73 00 75 00 6D 00 E9 00

    Three encodings, three different files. No redundancy.
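
    You can check the byte values for yourself with a small converter. This is a deliberately incomplete sketch: decode_byte knows only the parts of code pages 1252 and 437 that the sample string needs, and both function names are invented for the example. A real program would use the system's conversion functions rather than a hand-written table.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Decodes one byte to a Unicode code point. ASCII is common to both
       code pages. Beyond that, this sketch knows only that bytes A0..FF
       in code page 1252 equal their code points and that byte 82 in code
       page 437 is U+00E9. Anything else becomes U+FFFD. */
    uint16_t decode_byte(unsigned char b, int code_page)
    {
        if (b < 0x80) return b;
        if (code_page == 1252 && b >= 0xA0) return b;
        if (code_page == 437 && b == 0x82) return 0x00E9;
        return 0xFFFD;
    }

    /* Re-encodes a single-byte string as UTF-16LE with a leading byte
       order mark (FF FE). Returns the number of bytes written, or 0 if
       the output buffer is too small. */
    size_t to_utf16le(const unsigned char *in, size_t n, int code_page,
                      unsigned char *out, size_t cap)
    {
        size_t need = 2 + 2 * n;
        if (cap < need) return 0;
        out[0] = 0xFF;
        out[1] = 0xFE;
        for (size_t i = 0; i < n; i++) {
            uint16_t u = decode_byte(in[i], code_page);
            out[2 + 2 * i] = (unsigned char)(u & 0xFF);   /* low byte first */
            out[3 + 2 * i] = (unsigned char)(u >> 8);
        }
        return need;
    }
    ```

    Feed it the six bytes from either of the first two rows, with the matching code page, and you get the fourteen bytes in the third row.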

  • The Old New Thing

    The awesome Valentine's Day gift disguised as an uncreative one

    A few years ago, one of my colleagues wanted to surprise his wife with a new laptop for Valentine's Day. (As a bonus, he set the wallpaper to one of their wedding pictures.) Now, he could just give her a neatly wrapped laptop, but he wanted this one to be a super-surprise.

    First, he bought a large box of chocolates. He then carefully opened the box (preserving the bow and other wrapping), removed the chocolates and put the laptop inside, using a smaller box of chocolates to act as packing material. He then put the cover back on the box of chocolates and restored the box to its original unopened appearance.

    As a final step, he took the completed package to a local grocery store, explained what he was doing to the deli manager, and asked if they would be so kind as to re-wrap the box in shrink wrap to complete the deception. The manager was suitably touched by his story and was happy to help.

    On Valentine's Day morning, he put the large box of chocolates on his wife's chair. She woke up, wandered groggily into the room, saw the box, and said, "Whoa, that's a lot of chocolate." It took some encouragement to get her to open the box (seeing as she hadn't had her morning cup of coffee yet), but when she did and saw the laptop, she just stared at it in shock, saying, "What? ... No, what?"

    In case you couldn't figure it out, his wife was taken totally by surprise and was completely thrilled.

    And that's how my colleague surprised his wife with a new laptop for Valentine's Day. He makes the rest of us look bad.

    Related: iPad frozen into slab of chocolate, delivered to unsuspecting wife.

    Bonus: The story from his wife's point of view.

  • The Old New Thing

    The path-searching algorithm is not a backtracking algorithm

    Suppose your PATH environment variable looks like this:

    C:\dir1;\\server\share;C:\dir2
    

    Suppose that you call LoadLibrary("foo.dll") intending to load the library at C:\dir2\foo.dll. If the network server is down, the LoadLibrary call will fail. Why doesn't it just skip the bad directory in the PATH and continue searching?

    Suppose the LoadLibrary function skipped the bad network directory and kept searching. Suppose that the code which called LoadLibrary("foo.dll") was really after the file \\server\share\foo.dll. By taking the server down, you have tricked the LoadLibrary function into loading c:\dir2\foo.dll instead. (And maybe that was your DLL planting attack: If you can convince the system to reject all the versions on the PATH by some means, you can then get LoadLibrary to look in the current directory, which is where you put your attack version of foo.dll.)

    This can manifest itself in very strange ways if the two copies of foo.dll are not identical, because the program is now running with a version of foo.dll it was not designed to use. "My program works okay during the day, but it starts returning bad data when I try to run between midnight and 3am." Reason: The server is taken down for maintenance every night, so the program ends up running with the version in c:\dir2\foo.dll, which happens to be an incompatible version of the file.

    When the LoadLibrary function is unable to contact \\server\share\foo.dll, it doesn't know whether it's in the "don't worry, I wasn't expecting the file to be there anyway" case or in the "I was hoping to get that version of the file, don't substitute any bogus ones" case. So it plays it safe and assumes it's in the "don't substitute any bogus ones" case and fails the call. The program can then perform whatever recovery it deems appropriate when it cannot load its precious foo.dll file.

    Now consider the case where there is also a c:\dir1\foo.dll file, but it's corrupted. If you do a LoadLibrary("foo.dll"), the call will fail with the error ERROR_BAD_EXE_FORMAT because it found the C:\dir1\foo.dll file, determined that it was corrupted, and gave up. It doesn't continue searching the path for a better version. The path-searching algorithm is not a backtracking algorithm. Once a file is found, the algorithm commits to trying to load that file (a "cut" in logic programming parlance), and if it fails, it doesn't backtrack and return to a previous state to try something else.
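
    The policy is easier to see when the file system is taken out of the picture. The following is a simplified model, not the loader's actual code: each directory on the path is reduced to one of four probe results that I made up for the example, and search_path just applies the rules described above.

    ```c
    #include <stddef.h>

    /* What happened when we looked for the file in one directory.
       (Hypothetical categories for this model.) */
    typedef enum {
        DIR_HAS_NO_FILE,   /* directory reachable, file definitely absent */
        DIR_UNREACHABLE,   /* for example, the network server is down */
        FILE_CORRUPT,      /* file found but it cannot be loaded */
        FILE_OK            /* file found and loadable */
    } Probe;

    typedef enum { LOAD_OK, LOAD_FAILED } LoadStatus;

    /* Walks the directories in order. On return, *used is the index of
       the directory that decided the outcome, or n if none did. */
    LoadStatus search_path(const Probe *dirs, size_t n, size_t *used)
    {
        for (size_t i = 0; i < n; i++) {
            switch (dirs[i]) {
            case DIR_HAS_NO_FILE:
                continue;          /* a definite "not here": keep looking */
            case FILE_OK:
                *used = i;
                return LOAD_OK;
            case DIR_UNREACHABLE:  /* can't rule out that this was the one */
            case FILE_CORRUPT:     /* committed to this file: no backtracking */
                *used = i;
                return LOAD_FAILED;
            }
        }
        *used = n;
        return LOAD_FAILED;
    }
    ```

    The only result that lets the search move on is a definite "the file is not here." An unreachable directory and a corrupted file both end the search on the spot, even when a perfectly good copy is sitting in a later directory.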

    Discussion: Why does the LoadLibrary search algorithm continue if an invalid directory or drive letter is put on the PATH?

    Vaguely related chatter: No backtracking, Part One

  • The Old New Thing

    Fancy use of exception handling in FormatMessage leads to repeated "discovery" of security flaw

    Every so often, somebody "discovers" an alleged security vulnerability in the FormatMessage function. You can try it yourself:

    #include <windows.h>
    #include <stdio.h>
    
    char buf[2048];
    char extralong[128*1024];
    
    int __cdecl main(int argc, char **argv)
    {
     memset(extralong, 'x', 128 * 1024 - 1);
     DWORD_PTR args[] = { (DWORD_PTR)extralong };
     FormatMessage(FORMAT_MESSAGE_FROM_STRING |
                   FORMAT_MESSAGE_ARGUMENT_ARRAY, "%1", 0, 0,
                   buf, 2048, (va_list*)args);
     return 0;
    } 
    

    If you run this program under the debugger and you tell it to break on all exceptions, then you will find that it breaks on an access violation trying to write to an invalid address.

    eax=00060078 ebx=fffe0001 ecx=0006fa34 edx=00781000 esi=0006fa08 edi=01004330
    eip=77f5b279 esp=0006f5ac ebp=0006fa1c iopl=0         nv up ei pl nz na pe cy
    cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00010203
    ntdll!fputwc+0x14:
    77f5b279 668902           mov     [edx],ax              ds:0023:00781000=????
    

    Did you just find a buffer overflow security vulnerability?

    The FormatMessage function was part of the original Win32 interface, back in the days when you had lots of address space (two whole gigabytes) but not a lot of RAM (12 megabytes, or 16 if you were running Server). The implementation of FormatMessage reflects this historical reality by working hard to conserve RAM but not worrying too much about conserving address space. And it takes advantage of this fancy new structured exception handling feature.

    The FormatMessage function uses the "reserve a bunch of address space but commit pages only as they are necessary" pattern, illustrated in MSDN under the topic Reserving and Committing Memory. Except that the sample code on that page contains serious errors. For example, if the sample code encounters an exception other than STATUS_ACCESS_VIOLATION, it still "handles" it by doing nothing and returning EXCEPTION_EXECUTE_HANDLER. It fails to handle random access to the buffer or access violations caused by DEP. Though in the very specific sample, it mostly works since the protected region does only one thing, so there aren't many opportunities for the other types of exceptions to occur. (Though if you're really unlucky, you might get a STATUS_IN_PAGE_ERROR.) But enough complaining about that sample.

    The FormatMessage function reserves 64KB of address space, commits the first page, and then calls an internal helper function whose job it is to generate the output, passing the start of the 64KB block of address space as the starting address and telling it to give up when it reaches 64KB. Something like this:

    struct DEMANDBUFFER
    {
      void *Base;
      SIZE_T Length;
    };
    
    int
    PageFaultExceptionFilter(DEMANDBUFFER *Buffer,
                             EXCEPTION_RECORD *ExceptionRecord)
    {
      int Result;
    
      DWORD dwLastError = GetLastError();
    
      // The only exception we handle is a continuable read/write
      // access violation inside our demand-commit buffer.
      if (ExceptionRecord->ExceptionFlags & EXCEPTION_NONCONTINUABLE)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->NumberParameters < 2)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->ExceptionInformation[0] &
          ~(EXCEPTION_READ_FAULT | EXCEPTION_WRITE_FAULT))
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else if (ExceptionRecord->ExceptionInformation[1] -
          (ULONG_PTR)Buffer->Base >= Buffer->Length)
        Result = EXCEPTION_CONTINUE_SEARCH;
    
      else {
        // If the memory is already committed, then committing memory won't help!
        // (The problem is something like writing to a read-only page.)
        void *ExceptionAddress = (void*)ExceptionRecord->ExceptionInformation[1];
        MEMORY_BASIC_INFORMATION Information;
        if (VirtualQuery(ExceptionAddress, &Information,
                         sizeof(Information)) != sizeof(Information))
          Result = EXCEPTION_CONTINUE_SEARCH;
    
        else if (Information.State != MEM_RESERVE)
          Result = EXCEPTION_CONTINUE_SEARCH;
    
        // Okay, handle the exception by committing the page.
        // Exercise: What happens if the faulting memory access
        // spans two pages?
        else if (!VirtualAlloc(ExceptionAddress, 1, MEM_COMMIT, PAGE_READWRITE))
          Result = EXCEPTION_CONTINUE_SEARCH;
    
        // We successfully committed the memory - retry the operation
        else Result = EXCEPTION_CONTINUE_EXECUTION;
      }
    
      RestoreLastError(dwLastError);
      return Result;
    }
    
    DWORD FormatMessage(...)
    {
      DWORD Result = 0;
      DWORD Error;
      DEMANDBUFFER Buffer;
      Error = InitializeDemandBuffer(&Buffer, FORMATMESSAGE_MAXIMUM_OUTPUT);
      if (Error == ERROR_SUCCESS) {
        __try {
         Error = FormatMessageIntoBuffer(&Result,
                                         Buffer.Base, Buffer.Length, ...);
        } __except (PageFaultExceptionFilter(&Buffer,
                       GetExceptionInformation()->ExceptionRecord)) {
         // never reached - we never handle the exception
        }
      }
      if (Error == ERROR_SUCCESS) {
       Error = CopyResultsOutOfBuffer(...);
      }
      DeleteDemandBuffer(&Buffer);
      if (Result == 0) {
        SetLastError(Error);
      }
      return Result;
    }
    

    The FormatMessageIntoBuffer function takes an output buffer and a buffer size, and it writes the result to the output buffer, stopping when the buffer is full. The DEMANDBUFFER structure and the PageFaultExceptionFilter work together to create the output buffer on demand as the FormatMessageIntoBuffer function does its work.
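
    The "is this fault inside my buffer?" test in the filter leans on unsigned arithmetic: if the faulting address is below the base, the subtraction wraps around to a huge value, so a single comparison rejects addresses on either side of the buffer. Here is a minimal portable sketch of just that test. The name IsInsideDemandBuffer is made up for illustration and is not part of the code above.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper, for illustration only: returns 1 if Address
// lies inside [Base, Base + Length), 0 otherwise.
int IsInsideDemandBuffer(uintptr_t Base, size_t Length, uintptr_t Address)
{
  // If Address < Base, the unsigned subtraction wraps around to a very
  // large value, which fails the comparison in the same way that an
  // address past the end of the buffer does.
  return Address - Base < (uintptr_t)Length;
}
```

    With a 64KB buffer at 0x00780000, the addresses 0x00780000 and 0x0078FFFF pass the test, while 0x0077FFFF and 0x00790000 do not.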

    To make discussion easier, let's say that the FormatMessage function merely took printf-style arguments and supported only FORMAT_MESSAGE_FROM_STRING | FORMAT_MESSAGE_ALLOCATE_BUFFER.

    DWORD FormatMessageFromStringPrintfAllocateBuffer(
        PWSTR *ResultBuffer,
        PCWSTR FormatString,
        ...)
    {
      DWORD Result = 0;
      PWSTR ResultString = NULL;
      DWORD Error;
      DEMANDBUFFER Buffer;
      va_list ap;
      va_start(ap, FormatString);
      Error = InitializeDemandBuffer(&Buffer, FORMATMESSAGE_MAXIMUM_OUTPUT);
      if (Error == ERROR_SUCCESS) {
        __try {
         SIZE_T MaxChars = Buffer.Length / sizeof(WCHAR);
         int i = _vsnwprintf((WCHAR*)Buffer.Base, MaxChars,
                             FormatString, ap);
         if (i < 0 || i >= MaxChars) Error = ERROR_MORE_DATA;
         else Result = i;
        } __except (PageFaultExceptionFilter(&Buffer,
                       GetExceptionInformation()->ExceptionRecord)) {
         // never reached - we never handle the exception
        }
      }
      if (Error == ERROR_SUCCESS) {
       // Exercise: Why don't we need to worry about integer overflow?
       DWORD BytesNeeded = sizeof(WCHAR) * (Result + 1);
       ResultString = (PWSTR)LocalAlloc(LMEM_FIXED, BytesNeeded);
       if (ResultString) {
        // Exercise: Why CopyMemory and not StringCchCopy?
        CopyMemory(ResultString, Buffer.Base, BytesNeeded);
       } else Error = ERROR_NOT_ENOUGH_MEMORY;
      }
      DeleteDemandBuffer(&Buffer);
      if (Result == 0) {
        SetLastError(Error);
      }
      *ResultBuffer = ResultString;
      va_end(ap);
      return Result;
    }
    

    Let's run this function in our head to see what happens if somebody triggers the alleged buffer overflow by calling

    PWSTR ResultString;
    DWORD Result = FormatMessageFromStringPrintfAllocateBuffer(
                       &ResultString, L"%s", VeryLongString);
    

    After setting up the demand buffer, we call _vsnwprintf to format the output into the demand buffer, but telling it not to go past the buffer's total length. The _vsnwprintf function parses the format string and sees that it needs to copy VeryLongString to the output buffer. Let's say that the DEMANDBUFFER was allocated at address 0x00780000 on a system with 4KB pages. At the start of the copy, the address space looks like this:

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C R R R R R R R R R R R R R R R X
           ^ output pointer

    "C" stands for a committed page, "R" stands for a reserved page, and "X" stands for a page that, if accessed, would be a buffer overflow. We start copying VeryLongString into the output buffer. After copying 2048 characters, we fill the first committed page; copying character 2049 raises a page fault exception.

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C R R R R R R R R R R R R R R R X
             ^ output pointer

    This is the point at which over-eager people observe the first-chance exception, capture the register dump above, and begin writing up their security vulnerability report, cackling with glee. (Observe that in the register dump, the address we are writing to is of the form 0x####1000.)

    As with all first-chance exceptions, it goes down the exception chain. Our custom PageFaultExceptionFilter recognizes this as an access violation in a page that it is responsible for, and the page hasn't yet been committed, so it commits the page as read/write and resumes execution.

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C C R R R R R R R R R R R R R R X
             ^ output pointer

    Copying character 2049 now succeeds, as does the copying of characters 2050 through 4096. When we hit character 4097, the cycle repeats:

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C C R R R R R R R R R R R R R R X
               ^ output pointer

    Again, the first-chance exception is sent down the chain, our PageFaultExceptionFilter recognizes this as a page it is responsible for, and it commits the page and resumes execution.

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C C C R R R R R R R R R R R R R X
               ^ output pointer

    If you think about it, this is exactly what the memory manager does with memory that has been allocated but not yet accessed: The memory is not present, and the moment an application tries to access it, the not-present page fault is raised, the memory manager commits the page, and then execution resumes normally. It's memory-on-demand, which is one of the essential elements of virtual memory. What's going on with the DEMANDBUFFER is that we are simulating in user mode what the memory manager does in kernel mode. (The difference is that while the memory manager takes committed memory and makes it present on demand, the DEMANDBUFFER takes reserved address space and commits it on demand.)
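
    To make the page arithmetic concrete, here is a small portable simulation of the cycle, with no real page faults involved. It models the 64KB buffer as 16 page flags, pretends to write 2-byte characters one after another, and counts how often a write lands on a page that has not been committed yet. The name SimulateDemandCommit and the constants are invented for this sketch, and the 4KB page size is the same assumption used in the walkthrough.

```c
#include <stddef.h>

enum {
  PAGE_SIZE_BYTES = 4096,        // assumed page size, as in the walkthrough
  BUFFER_BYTES = 64 * 1024,      // size of the reserved region
  PAGE_COUNT = BUFFER_BYTES / PAGE_SIZE_BYTES,
  CHAR_BYTES = 2                 // sizeof(WCHAR) on Windows
};

// Simulates writing CharCount 2-byte characters into a buffer whose
// first page starts out committed. Returns the number of simulated
// page faults, or -1 if the write would run past the buffer.
int SimulateDemandCommit(size_t CharCount)
{
  int Committed[PAGE_COUNT] = { 1 };  // page 0 committed, the rest reserved
  int Faults = 0;
  size_t i;

  if (CharCount > BUFFER_BYTES / CHAR_BYTES) return -1;

  for (i = 0; i < CharCount; i++) {
    size_t Page = (i * CHAR_BYTES) / PAGE_SIZE_BYTES;
    if (!Committed[Page]) {
      // This is where the real code would take the access violation
      // and the exception filter would commit the page.
      Committed[Page] = 1;
      Faults++;
    }
  }
  return Faults;
}
```

    Under these assumptions, writing 2048 characters causes no faults, character 2049 causes the first one, and filling the whole buffer with 32768 characters causes 15, one for each page after the first.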

    The cycle repeats 13 more times, and then we reach another interesting part of the scenario:

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C C C C C C C C C C C C C C C C X
                          output pointer ^

    We are about to write the 32768th character into the DEMANDBUFFER. Once that's done, the buffer will be completely full. One more byte and we will overflow the buffer. (Not even a wafer-thin byte will fit.)

    Let's write that last character and cover our ears in anticipation.

    64KB at 00780000 (page n starts at 0078n000; X starts at 00790000)

    Page   0 1 2 3 4 5 6 7 8 9 A B C D E F
    State  C C C C C C C C C C C C C C C C X
                            output pointer ^

    Oh noes! Completely full! Run for cover!

    But wait. We passed a buffer size to the _vsnwprintf function, remember? We already told it never to write more than 32768 characters. As it's about to write character 32769, it realizes, "Wait a second, this would overflow the buffer I was given. I'll return a failure code instead."

    The feared write of the 32769th character never takes place. We never write to the "X" page. Instead, the _vsnwprintf call returns that the buffer was not large enough, which is converted into ERROR_MORE_DATA and returned to the caller.

    If you follow through the entire story, you see that everything worked as it was supposed to and no overflow took place. The _vsnwprintf function ran up to the brink of disaster but stopped before taking that last step. This is hardly anything surprising; it happens whenever the _vsnwprintf function encounters a buffer too small to hold the output. The only difference is that along the way, we saw a few first-chance exceptions, exceptions that had nothing to do with avoiding the buffer overflow in the first place. They were just part of FormatMessage's fancy buffer management.
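
    You can watch a bounded formatter stop at the brink in miniature. The sketch below uses the standard C swprintf as a stand-in for _vsnwprintf (the two differ in details such as null termination on truncation), an 8-character buffer instead of 32768, and a guard element playing the role of the "X" page. The function name FormatIntoSmallBuffer is made up for this example.

```c
#include <stddef.h>
#include <wchar.h>

enum { CAPACITY = 8 };   // tiny stand-in for the 32768-character limit

// Formats Text into an 8-character buffer that is followed by a guard
// element. Returns whatever swprintf returned, and sets *GuardIntact
// to 1 if the element just past the buffer was left untouched.
int FormatIntoSmallBuffer(const wchar_t *Text, int *GuardIntact)
{
  wchar_t Storage[CAPACITY + 1];
  int Written;

  Storage[CAPACITY] = L'X';   // stands in for the "X" page

  // swprintf is told the buffer holds CAPACITY characters, so it may
  // not touch Storage[CAPACITY] no matter how long Text is.
  Written = swprintf(Storage, CAPACITY, L"%ls", Text);

  *GuardIntact = (Storage[CAPACITY] == L'X');
  return Written;
}
```

    A short string such as L"short" reports 5 characters written. A string that does not fit reports a negative value. In both cases the guard element is left alone.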

    It so happens that in Windows Vista, the fancy buffer management technique was abandoned, and the code just allocates 64KB of memory up front and doesn't try any fancy commit-on-demand games. Computer memory has become plentiful enough that a momentary allocation of 64KB has less of an impact than it did twenty years ago, and performance measurements showed that the new "Stop trying to be so clever" technique was now about 80 times faster than the "gotta scrimp and save every last byte of memory" technique.
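
    For comparison, here is a rough portable sketch of the "stop trying to be so clever" shape: grab the whole scratch buffer up front, format into it with a hard limit, and hand back a right-sized copy. This is not the actual Windows code. The name FormatWithScratchBuffer is hypothetical, malloc and vswprintf stand in for whatever the real implementation uses, and on platforms where wchar_t is wider than two bytes the scratch buffer is correspondingly larger than 64KB.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

enum { MAX_OUTPUT_CHARS = 32768 };   // 64KB of 2-byte WCHARs in the article

// Hypothetical helper: formats into a scratch buffer allocated up
// front, then returns a right-sized copy that the caller must free().
// Returns NULL if the output does not fit or memory runs out.
wchar_t *FormatWithScratchBuffer(const wchar_t *Format, ...)
{
  wchar_t *Result = NULL;
  wchar_t *Scratch = malloc(MAX_OUTPUT_CHARS * sizeof(wchar_t));
  if (Scratch) {
    va_list ap;
    int Length;
    va_start(ap, Format);
    // No exception filter needed: the memory is all there already,
    // and the size limit keeps the formatter inside it.
    Length = vswprintf(Scratch, MAX_OUTPUT_CHARS, Format, ap);
    va_end(ap);
    if (Length >= 0) {
      // Copy the characters plus the terminating null.
      size_t BytesNeeded = ((size_t)Length + 1) * sizeof(wchar_t);
      Result = malloc(BytesNeeded);
      if (Result) memcpy(Result, Scratch, BytesNeeded);
    }
    free(Scratch);
  }
  return Result;
}
```

    The trade-off is the one described above: a momentary large allocation in exchange for no first-chance exceptions and no commit-on-demand bookkeeping.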

    The change had more than just a performance effect. It also removed the first-chance exception from FormatMessage, which means that it no longer does that thing which everybody mistakes for a security vulnerability. The good news is that nobody reports this as a vulnerability in Windows Vista any more. The bad news is that people still report it as a vulnerability in Windows XP, and each time this issue comes up, somebody (possibly me) has to sit down and reverify that the previous analysis is still correct, in the specific scenario being reported, because who knows, maybe this time they really did find a problem.

  • The Old New Thing

    Instead of creating something and then trying to hide it, simply don't create it in the first place

    • 16 Comments

    A customer had a question, which was sort of "I bet somebody got a really nice bonus for that feature" in reverse.

    A customer is asking if there is a way to programmatically control the icons in the notification area.

    Specifically, they want the setting for their notification icon to be "Only show notifications" rather than "Show icon and notifications" or "Hide icon and notifications."

    Icons Behaviors
    Power
    Show icon and notifications
    Fully charged (100%)
    Network
    Show icon and notifications
    Fabrikam Internet access
    Volume
    Show icon and notifications
    Speakers: 10%
    Contoso Resource Notification
    Only show notifications
    No new resources found.

    It's a good thing the customer explained what they wanted, because they started out asking for the impossible part. Arbitrary control of notification icons is not programmatically exposed because all the awesome programs would just force themselves on. But they clarified that what they really want is a way to reduce the visibility of their icon so it displays only when a notification is being shown.

    And there's a way to do that, and it doesn't involve having to programmatically configure anything.

    If you don't want your notification icon to appear in the notification area, then don't show your notification icon in the first place unless you have a notification.

    • When your program starts, don't call Shell_Notify­Icon(NIM_ADD). Since you don't call the function, you don't get a notification icon.
    • When you want to display a notification, call Shell_Notify­Icon(NIM_ADD).
    • When the situation that calls for the notification has passed, call Shell_Notify­Icon(NIM_DELETE).

    In other words, use the notification icon in the manner it was intended.

    It's sad that notification icon abuse has become so popular (and application frameworks make it so easy to create an abusive notification icon) that people forget how to create a well-behaved notification icon. Instead, they start with the abusive method, and then try to figure out how to make it less abusive.
