June, 2013

  • The Old New Thing

    2013 mid-year link clearance

    • 19 Comments

    Another round of the semi-annual link clearance.

    And, as always, the obligatory plug for my column in TechNet Magazine:

  • The Old New Thing

    It's the address space, stupid

    • 78 Comments

    Nowadays, computers have so much memory that running out of RAM is rarely the cause for an "out of memory" error.

    Actually, let's try that again. For over a decade, hard drives have been so large (and cheap) that running out of swap space is rarely the cause for an "out of memory" error.

    In user-mode, the term memory refers to virtual memory, not physical RAM chips. The amount of physical RAM doesn't affect how much memory a user-mode application can allocate; it's all about commit and swap space.¹ But swap space is disk space, and that is in the hundreds of gigabytes for hard drives. (Significantly less for SSDs, but even in that case, it's far more than 4GB.)

    The limiting factor these days is address space.

    Each thread's stack takes a megabyte, and if you're creating a lot of threads, that can add up to a lot of address space consumed just for stacks. And then you have to include the address space for the DLLs you've loaded (which quickly adds up). And then there's the address space for all the memory you allocated. (Even if you don't end up using it, it still occupies address space until you free it.)

    Typically, when you get an ERROR_OUT_OF_MEMORY error, the problem isn't physical memory or virtual memory. It's address space.

    This is one of the main benefits of moving to 64-bit computing. It's not that you actually are going to use or need all that memory. But it relieves pressure on the address space: The user-mode address space in 64-bit Windows is eight terabytes.

    When the day comes that eight terabytes is not enough, we at least won't have to redesign the application model to expand the address space. The current x86-64 hardware has support for address spaces of up to 256TB, and the theoretical address space for a 64-bit processor is sixteen exabytes.
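    To put rough numbers on the stack example, here is a small back-of-the-envelope sketch. The one-megabyte stack and the eight-terabyte figure come from the text above; the 2GB figure for a default 32-bit process is an assumption added for comparison, and MaxThreadStacks is a made-up helper name. The result is only an upper bound, since DLLs and heap allocations compete for the same address space.

```cpp
#include <cassert>
#include <cstdint>

// Binary size units.
const std::uint64_t KiB = 1024;
const std::uint64_t MiB = 1024 * KiB;
const std::uint64_t GiB = 1024 * MiB;
const std::uint64_t TiB = 1024 * GiB;

// Assumption (not stated in the article): a default 32-bit process
// gets 2GB of user-mode address space.
const std::uint64_t kUserSpace32 = 2 * GiB;

// From the article: 64-bit Windows user-mode address space.
const std::uint64_t kUserSpace64 = 8 * TiB;

// From the article: each thread's stack takes a megabyte.
const std::uint64_t kStackReserve = 1 * MiB;

// Upper bound on how many thread stacks fit in an address space,
// ignoring everything else that also needs address space.
std::uint64_t MaxThreadStacks(std::uint64_t addressSpace,
                              std::uint64_t stackReserve)
{
    assert(stackReserve != 0);
    return addressSpace / stackReserve;
}
```

    Under those assumptions, a 32-bit process runs out of room at 2048 stacks even if nothing else is loaded, whereas the 64-bit limit is in the millions.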

    ¹ Of course, physical RAM is a factor if the application is explicitly allocating physical memory, but that's the exception rather than the rule.

    Exercise: Help this customer clear up their confusion: They reported that processes were failing to start with STATUS_DLL_INIT_FAILED (0xC0000142), and our diagnosis was that the desktop heap was exhausted. "The system has 8GB of RAM installed, and Task Manager reports that only 2GB of it is being used, so it is unlikely that I am running out of any kind of heap/memory."

  • The Old New Thing

    Once you return from the WM_ENDSESSION message, your process can be terminated at any time

    • 21 Comments

    A customer had a program which performed some final I/O operations as it exited. Various C++ objects deleted files or flushed buffers as part of their destructors. The customer found that if their program was left running when the user shut down Windows, then the files never got deleted, and the buffers were never flushed. On the other hand, if they inserted an artificial delay into the shutdown procedure, so that it waited ten seconds after the program exited before continuing with shutdown, then the files did indeed get cleaned up and the buffers were indeed flushed. The customer confirmed that the program did receive the WM_END­SESSION message, but it appeared as if all disk I/O issued within five seconds of shutdown never gets committed to disk. This would appear to be a serious bug in Windows.

    Because, of course, when you find a problem with your program, your first reaction should be to assume that you found a bug in Windows so blatant it should be affecting every program on the planet, and yet somehow this horrific data loss bug eluded not only the entirety of the Windows QA team, but also every software developer for the past twenty years who had a program that saved data at shutdown.

    Or the problem could be in your code.

    The documentation for the WM_END­SESSION message says,

    wParam
    If the session is being ended, this parameter is TRUE; the session can end any time after all applications have returned from processing this message.

    What is much more likely to be happening is that when the application receives the WM_END­SESSION message, it posts a message to itself to initiate controlled shutdown. After the program returns from the WM_END­SESSION message, the message pump picks up the shutdown message and it is at this point that the program starts cleaning up objects, including running destructors and flushing buffers, and then finally calling Exit­Process.

    In other words, the problem is not that the final I/O never got committed to disk. The problem is that the final I/O was never issued by the program. Once your program returns from the WM_END­SESSION message, Windows has the right to terminate it without further warning. If your system shuts down quickly, that termination may occur before your destructors manage to run at all.

    You cannot rely on any code in your program running once you have responded to the WM_END­SESSION message. That message is your "final warning". If you need to do cleanup operations before termination, you need to do them before returning from the WM_END­SESSION message. Because once you return from that message, your process is living on borrowed time.
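    One way to structure this is to gather the final cleanup steps in one place and run them synchronously from the end-session handler instead of posting a message and hoping it gets processed. The following is a portable sketch of that idea, not Win32 code: FinalCleanup and OnEndSession are hypothetical names, and fEnding stands in for the wParam of WM_ENDSESSION. In a real program, the body of OnEndSession would sit in the window procedure's WM_ENDSESSION case.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: collects cleanup steps and runs each one at most once.
class FinalCleanup
{
public:
    void Add(std::function<void()> step)
    {
        m_steps.push_back(std::move(step));
    }

    // Runs pending steps in reverse registration order (the same order in
    // which destructors would run). Calling it again is a harmless no-op,
    // so the normal exit path can call it too.
    void RunOnce()
    {
        while (!m_steps.empty()) {
            std::function<void()> step = std::move(m_steps.back());
            m_steps.pop_back();
            step();
        }
    }

    bool Done() const { return m_steps.empty(); }

private:
    std::vector<std::function<void()>> m_steps;
};

// Models the WM_ENDSESSION handler; fEnding plays the role of wParam.
void OnEndSession(FinalCleanup& cleanup, bool fEnding)
{
    if (fEnding) {
        // Do the work now. Anything deferred past this return may never run.
        cleanup.RunOnce();
    }
}
```

    Because RunOnce is idempotent, the same object can be run from both the end-session path and the ordinary exit path without deleting files or flushing buffers twice.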

  • The Old New Thing

    The default error mode (SetErrorMode) is not zero

    • 23 Comments

    A customer put the following code at the start of their program:

    // If this assertion fires, then somebody else changed the error mode
    // and I just overwrote it with my error mode.
    ASSERT(SetErrorMode(SEM_FAILCRITICALERRORS) == 0);
    

    The customer wanted to know whether it was a valid assumption that the initial error mode for a process is zero.

    No it is not, and this is called out in the documentation for Set­Error­Mode:

    Remarks

    Each process has an associated error mode that indicates to the system how the application is going to respond to serious errors. A child process inherits the error mode of its parent process.

    The assumption that the initial error mode is zero is therefore false.

    There's another error in the above code: The call to Set­Error­Mode is placed inside an assertion. This means that in the retail build, the call disappears. The debug build has the error mode set to SEM_FAIL­CRITICAL­ERRORS, but the retail build has the default error mode. They are changing the semantics in the debug build, and are headed down the slippery slope that leads to them being forced to deploy the debug version of the program into production because that's the only build that works.

    Unfortunately, they may have already reached that point, because the customer asked, "Is it possible for the user to set the default error code to something other than zero, in which case this assertion would crash the client?" (Emphasis mine.)

    Bonus chatter: Note that you can override error mode inheritance by passing the CREATE_DEFAULT_ERROR_MODE flag to the Create­Process function.
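    If the goal is to add a flag to whatever error mode was inherited, a common pattern is to set the mode, look at what came back, and then set it again with the flag merged in. The sketch below models that pattern portably: model::SetErrorMode is a stand-in with the same "returns the previous mode" contract as the real function, AddErrorModeFlags is a hypothetical helper, and the two constants mirror the values of SEM_FAILCRITICALERRORS and SEM_NOGPFAULTERRORBOX. Note that the calls are ordinary statements rather than arguments to an assertion macro, so they do not vanish from the retail build.

```cpp
#include <cassert>

namespace model {

const unsigned kFailCriticalErrors = 0x0001; // value of SEM_FAILCRITICALERRORS
const unsigned kNoGPFaultErrorBox  = 0x0002; // value of SEM_NOGPFAULTERRORBOX

// Stand-in for the per-process error mode. A real process inherits
// this from its parent, so it is not necessarily zero.
unsigned g_errorMode = 0;

// Same contract as the real SetErrorMode: installs the new mode
// and returns the previous one.
unsigned SetErrorMode(unsigned newMode)
{
    unsigned previous = g_errorMode;
    g_errorMode = newMode;
    return previous;
}

} // namespace model

// Hypothetical helper: adds flags to the inherited mode instead of
// overwriting it. Returns the inherited mode for logging or later restore.
unsigned AddErrorModeFlags(unsigned flags)
{
    unsigned inherited = model::SetErrorMode(flags);
    model::SetErrorMode(inherited | flags);
    return inherited;
}
```

    With an inherited mode of 0x0002, calling AddErrorModeFlags(model::kFailCriticalErrors) leaves the mode at 0x0003 rather than discarding the inherited bit.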

  • The Old New Thing

    Where did the names of the fonts Marlett and Verdana come from?

    • 17 Comments

    Commenter BAA says that the -lett part of Marlett comes from the designer Virginia Howlett. BAA adds, "I forget the 'Mar' but I believe it was a co-creator."

    If so, then that co-creator was Suzan Marashi, if Vincent Connare is to be trusted. On page 17 of the PDF document From The Dark Side..., Connare identifies the authors of the font as Virginia Howlett, Rom Impas, Suzan Marashi, and Alison Grauman-Barnes. He also identifies Eliyezer Kohen as the person whose idea it was to use a special-purpose font.

    According to Virginia Howlett, the original name for the font Verdana was Ventana, which means window in Spanish. Lawyers apparently objected to the name, and the font team explored variations on verde (which means green in Spanish) and verdigris (a green pigment), thereby invoking the color associated both with Washington (The Evergreen State) and Seattle (The Emerald City). The second part of the font name comes from Howlett's granddaughter Ana, following in the tradition of font designers naming fonts after their daughters.

  • The Old New Thing

    Wait, this is not my regular bicycle commute home

    • 12 Comments

    I dreamed that I finished biking home and decided not to take the stairs. Instead I took my bicycle into the elevator to go to my dream-land 31st-floor high-rise condo. (As if.)

    For "security reasons" there were no buttons in the elevator. You had to open a secret panel and flip a circuit-breaker switch corresponding to the floor you want to go to. If you open the wrong panel, you are instead faced with a stack of hot-swappable SATA drives with labels like "Windows 95" and "Windows 95 OSR 2". And yes, I know that SATA didn't exist in 1995.

  • The Old New Thing

    Drawing content at a fixed screen position, regardless of window position

    • 10 Comments

    Today's Little Program draws content at a fixed screen position. The idea is that the window is really a viewport into some magical world. Unfortunately, our magical world just has a sign that says "Booga booga." Creating a more interesting magical world is left as an exercise.

    Start with our scratch program and make these changes:

    void OnMove(HWND hwnd, int x, int y)
    {
     InvalidateRect(hwnd, 0, TRUE);
    }
    
    void
    PaintContent(HWND hwnd, PAINTSTRUCT *pps)
    {
     POINT ptOrigin = { 0, 0 };
     ClientToScreen(hwnd, &ptOrigin);
     POINT ptOrg;
     SetWindowOrgEx(pps->hdc, ptOrigin.x, ptOrigin.y, &ptOrg);
     TextOut(pps->hdc, 200, 200, TEXT("Booga booga"), 11);
     SetWindowOrgEx(pps->hdc, ptOrg.x, ptOrg.y, nullptr);
    }
    
        HANDLE_MSG(hwnd, WM_MOVE, OnMove);
    

    Run this program and drag the window across the screen. When it reaches the "magic place", you will see the words "Booga booga". (You can resize the window to be smaller in order to make finding the magic place more of a challenge.)
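    The arithmetic behind the trick is just a subtraction: setting the window origin to the client area's screen position means that a drawing coordinate of (200, 200) lands wherever screen position (200, 200) falls inside the client area, if it falls inside at all. Here is a portable sketch of that mapping, assuming the default one-unit-per-pixel mapping mode. Point, Size, ScreenToClientModel, and IsInsideClient are illustrative names, not Win32 types or functions.

```cpp
#include <cassert>

struct Point { int x; int y; };
struct Size { int cx; int cy; };

// Given where the client area's top-left corner sits on the screen,
// compute where a fixed screen position lands in client coordinates.
Point ScreenToClientModel(Point clientOriginOnScreen, Point screenPt)
{
    return { screenPt.x - clientOriginOnScreen.x,
             screenPt.y - clientOriginOnScreen.y };
}

// The content is visible only if that position is inside the client area.
bool IsInsideClient(Point clientPt, Size clientSize)
{
    assert(clientSize.cx >= 0 && clientSize.cy >= 0);
    return clientPt.x >= 0 && clientPt.x < clientSize.cx &&
           clientPt.y >= 0 && clientPt.y < clientSize.cy;
}
```

    For example, with the client area's corner at screen position (150, 120), the text origin lands at client position (50, 80). Drag the window so its corner is at (250, 0) and the text origin moves to (-50, 200), which is off the left edge.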

  • The Old New Thing

    Of what use is the RDW_INTERNALPAINT flag?

    • 11 Comments

    For motivational purposes, let's start with a program that displays a DWM thumbnail.

    Start with the scratch program and add the following:

    #include <dwmapi.h>
    
    HWND g_hwndThumbnail;
    HTHUMBNAIL g_hthumb;
    
    void UpdateThumbnail(HWND hwndFrame, HWND hwndTarget)
    {
     if (g_hwndThumbnail != hwndTarget) {
      g_hwndThumbnail = hwndTarget;
      if (g_hthumb != nullptr) {
       DwmUnregisterThumbnail(g_hthumb);
       g_hthumb = nullptr;
      }
    
      if (hwndTarget != nullptr) {
       RECT rcClient;
       GetClientRect(hwndFrame, &rcClient);
       if (SUCCEEDED(DwmRegisterThumbnail(hwndFrame,
                             g_hwndThumbnail, &g_hthumb))) {
        DWM_THUMBNAIL_PROPERTIES props = {};
        props.dwFlags = DWM_TNP_RECTDESTINATION | DWM_TNP_VISIBLE;
        props.rcDestination = rcClient;
        props.rcDestination.top += 50;
        props.fVisible = TRUE;
        DwmUpdateThumbnailProperties(g_hthumb, &props);
       }
      }
     }
    }
    

    The Update­Thumbnail function positions a thumbnail of the target window inside the frame window. There's a small optimization in the case that the target window is the same one that the thumbnail is already viewing. Overall, no big deal.

    void
    OnDestroy(HWND hwnd)
    {
     UpdateThumbnail(hwnd, nullptr);
     PostQuitMessage(0);
    }
    

    When our window is destroyed, we need to clean up the thumbnail, which we do by updating it to a null pointer.

    For the purpose of illustration, let's say that pressing the 1 key changes the thumbnail to a randomly-selected window.

    struct RANDOMWINDOWINFO
    {
     HWND hwnd;
     int cWindows;
    };
    
    BOOL CALLBACK RandomEnumProc(HWND hwnd, LPARAM lParam)
    {
     if (hwnd != g_hwndThumbnail &&
         IsWindowVisible(hwnd) &&
         (GetWindowStyle(hwnd) & WS_CAPTION) == WS_CAPTION) {
      auto prwi = reinterpret_cast<RANDOMWINDOWINFO *>(lParam);
      prwi->cWindows++;
      if (rand() % prwi->cWindows == 0) {
       prwi->hwnd = hwnd;
      }
     }
     return TRUE;
    }
    
    void ChooseRandomWindow(HWND hwndFrame)
    {
     RANDOMWINDOWINFO rwi = {};
     EnumWindows(RandomEnumProc, reinterpret_cast<LPARAM>(&rwi));
     UpdateThumbnail(hwndFrame, rwi.hwnd);
    }
    
    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     switch (ch) {
     case TEXT('1'):
      ChooseRandomWindow(hwnd);
      break;
     }
    }
    
     HANDLE_MSG(hwnd, WM_CHAR, OnChar);
    

    The random window selector selects among windows with a caption which are visible and which are not already being shown in the thumbnail. (That last bit is so that when you press 1, it will always pick a different window.)
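    The counting trick in RandomEnumProc is a one-item reservoir sample: the k-th eligible window replaces the current choice with probability 1/k, so each of the n eligible windows ends up chosen with probability 1/n without needing to know n in advance. Here is a portable sketch of the same selection with plain integers standing in for window handles. PickOne is a hypothetical name, -1 is an arbitrary "nothing found" value, and the modulo has the same slight bias as rand() % n.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Picks one candidate in a single pass, each with (approximately) equal
// probability. Returns -1 if there are no candidates.
int PickOne(const std::vector<int>& candidates, std::mt19937& rng)
{
    int choice = -1;
    unsigned long seen = 0;
    for (int candidate : candidates) {
        ++seen;
        // The k-th candidate replaces the current choice with probability
        // 1/k. The first candidate is always taken (anything % 1 is 0).
        if (rng() % seen == 0) {
            choice = candidate;
        }
    }
    return choice;
}
```

    The appeal of this approach for an enumeration callback is that it needs only a counter and the current choice, which is exactly what RANDOMWINDOWINFO carries.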

    Run this program, and yippee, whenever you press the 1 key, you get a new thumbnail.

    Okay, but usually your program shows more than just a thumbnail. It probably incorporates the thumbnail into its other content, so let's draw some other content, too. Say, a single-character random message.

    TCHAR g_chMessage = '?';
    
    void
    PaintContent(HWND hwnd, PAINTSTRUCT *pps)
    {
     if (!IsRectEmpty(&pps->rcPaint)) {
      RECT rcClient;
      GetClientRect(hwnd, &rcClient);
      DrawText(pps->hdc, &g_chMessage, 1, &rcClient,
               DT_TOP | DT_CENTER);
     }
    }
    
    void ChooseRandomMessage(HWND hwndFrame)
    {
     g_chMessage = rand() % 26 + TEXT('A');
     InvalidateRect(hwndFrame, nullptr, TRUE);
    }
    
    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     switch (ch) {
     case TEXT('1'):
      ChooseRandomWindow(hwnd);
      break;
     case TEXT('2'):
      ChooseRandomMessage(hwnd);
      break;
     }
    }
    

    Now, if you press 2, we change the random message. There is a small optimization in PaintContent that skips the rendering if the paint rectangle is empty. Again, no big deal.

    Okay, but sometimes your program wants to update the thumbnail and the message at the same time. Like this:

    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     switch (ch) {
     case TEXT('1'):
      ChooseRandomWindow(hwnd);
      break;
     case TEXT('2'):
      ChooseRandomMessage(hwnd);
      break;
     case TEXT('3'):
      ChooseRandomWindow(hwnd);
      ChooseRandomMessage(hwnd);
      break;
     }
    }
    

    Run this program and press 3 and watch the thumbnail and message change simultaneously.

    And now we have a problem.

    You see, the Choose­Random­Window function updates the thumbnail immediately, since the thumbnail is presented by DWM, whereas the Choose­Random­Message function updates the message, but the new message doesn't appear on the screen until the next paint cycle. This means that there is a window of time where the new thumbnail is on the screen, but you still have the old message. Since painting is a low-priority activity, the window manager is going to deliver other messages to your window before it finally gets around to painting, and the visual mismatch between the thumbnail and the message can last for quite some time. (You can exaggerate this in the sample program by inserting a call to Sleep.) What can we do to get rid of this visual glitch?

    One solution would be to delay updating the thumbnail until the next paint cycle. At the paint cycle, we update the thumbnail and render the new message. Now both updates occur at the same time, and you get rid of the glitch. To trigger a paint cycle, we can invalidate a dummy 1×1 pixel in the window.

    HWND g_hwndThumbnailWanted;
    
    void
    PaintContent(HWND hwnd, PAINTSTRUCT *pps)
    {
     UpdateThumbnail(hwnd, g_hwndThumbnailWanted);
    
     if (!IsRectEmpty(&pps->rcPaint)) {
      RECT rcClient;
      GetClientRect(hwnd, &rcClient);
      DrawText(pps->hdc, &g_chMessage, 1, &rcClient,
               DT_TOP | DT_CENTER);
     }
    }
    
    void ChooseRandomWindow(HWND hwndFrame)
    {
     RANDOMWINDOWINFO rwi = {};
     EnumWindows(RandomEnumProc, reinterpret_cast<LPARAM>(&rwi));
     g_hwndThumbnailWanted = rwi.hwnd;
     RECT rcDummy = { 0, 0, 1, 1 };
     InvalidateRect(hwndFrame, &rcDummy, FALSE);
    }
    

    Now, when we want to change the thumbnail, we just remember what thumbnail we want (the "logical" current thumbnail) and invalidate a dummy pixel in our window. The invalid dummy pixel triggers a paint cycle, and in our paint cycle, we call UpdateThumbnail to synchronize the logical current thumbnail with the physical current thumbnail. And then we continue with our regular painting (in case there is also painting to be done).

    But it sure feels wasteful invalidating a pixel and forcing the Draw­Text to occur even though we really didn't update anything. Wouldn't it be great if we could just say, "Could you fire up a paint cycle for me, even though there's technically nothing to paint? Because I actually do have stuff to paint, it's just something outside your knowledge since it is not rendered by GDI."

    Enter the RDW_INTERNAL­PAINT flag.

    If you pass the RDW_INTERNAL­PAINT flag to Redraw­Window, that means, "Set the 'Yo, there's painting to be done!' flag. I know you think there's no actual painting to be done, but trust me on this." (It's not actually a flag, but you can think of it that way.)

    When the window manager then gets around to deciding whether there is any painting to be done, before it concludes, "Nope, this window is all valid," it checks if you made a special RDW_INTERNALPAINT request, and if so, then it will generate a dummy WM_PAINT message for you.

    Using this new flag is simple:

     g_hwndThumbnailWanted = rwi.hwnd;
     // RECT rcDummy = { 0, 0, 1, 1 };
     // InvalidateRect(hwndFrame, &rcDummy, FALSE);
     RedrawWindow(hwndFrame, nullptr, nullptr,
                  RDW_INTERNALPAINT);
    

    Now, when the program wants to update its thumbnail, it just schedules a fake-paint message with the window manager. These fake-paint messages coalesce with real-paint messages, so if you do an internal paint and an invalidation, only one actual paint message will be generated. If the paint message is a fake-paint message, the rcPaint will be empty, and you can test for that in your paint handler and skip your GDI painting.
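    The coalescing behavior can be pictured with a toy model. This is emphatically not how the window manager is implemented; it is a portable sketch in which PaintState, Invalidate, RequestInternalPaint, and TakePaint are made-up names standing in for the window's invalid region, InvalidateRect, RedrawWindow with RDW_INTERNALPAINT, and the BeginPaint/EndPaint pair.

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int left; int top; int right; int bottom; };

bool IsEmpty(const Rect& r)
{
    return r.left >= r.right || r.top >= r.bottom;
}

// Toy model of the per-window state that decides whether a paint is due.
class PaintState
{
public:
    // Models InvalidateRect: grow the invalid area to include r.
    void Invalidate(const Rect& r)
    {
        if (IsEmpty(r)) return;
        if (IsEmpty(m_invalid)) {
            m_invalid = r;
        } else {
            m_invalid.left   = std::min(m_invalid.left, r.left);
            m_invalid.top    = std::min(m_invalid.top, r.top);
            m_invalid.right  = std::max(m_invalid.right, r.right);
            m_invalid.bottom = std::max(m_invalid.bottom, r.bottom);
        }
    }

    // Models RedrawWindow(..., RDW_INTERNALPAINT).
    void RequestInternalPaint() { m_internal = true; }

    // A paint is due if anything is invalid or a fake paint was requested.
    bool NeedsPaint() const { return m_internal || !IsEmpty(m_invalid); }

    // Models one paint cycle: hands back rcPaint and clears both requests,
    // so an internal paint plus an invalidation produce a single paint.
    Rect TakePaint()
    {
        Rect rcPaint = m_invalid;
        m_invalid = Rect{ 0, 0, 0, 0 };
        m_internal = false;
        return rcPaint;
    }

private:
    Rect m_invalid{ 0, 0, 0, 0 };
    bool m_internal = false;
};
```

    In this model, a lone internal-paint request yields one paint with an empty rectangle, which is the case the IsRectEmpty test in PaintContent is there to catch.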

  • The Old New Thing

    Solving the problem rather than answering the question: Why does somebody want to write an unkillable process?

    • 49 Comments

    Via their customer liaison, a customer wanted to know how to create a process that runs with the context of the user, but which the user cannot terminate without elevating to administrator.

    The customer is engaging in the futile arms race between programs and users (which is more properly a walls and ladders scenario). And we saw that Windows has decided to keep users in control of their own programs and data and let them kill any process that belongs to them. (For one thing, allowing a process running in a user's context to protect itself from termination would mean that malware could make itself unkillable without requiring elevation.)

    We asked the customer liaison why their program is so important that they don't want the user to terminate it.

    The customer liaison explained, "The program is launched when the user logs in, and the customer doesn't want the user to be able to terminate the process from Task Manager."

    Observe that the customer (through the liaison) is not answering the question. They keep saying what they want to do (looking for an answer) without describing the problem they are trying to solve (developing a solution). We had to ask them a second time, adding that even if they managed to protect the program from being terminated via Task Manager, that doesn't mean the user can't get the program to crash, which is equivalent to terminating it. Now, the typical user won't use these techniques, but the typical user also won't go to the Processes tab of Task Manager and click End Task and then accept the scary warning dialog. We're talking about a determined user at this point.

    Finally, the customer coughed up the actual scenario. They have a service that monitors some central database. Based on the information in the central database, the service may decide that the user needs to log off. (Say, because the user's profile server is going offline, or because the service needs to reconfigure the system. The exact reason isn't important.) The application that launches at login listens for a notification from the service that a forced-logoff is about to occur and displays a balloon notification to the user warning them to save all their work because they are going to be logged off in N minutes.

    At this point, I realized that they didn't have a problem in the first place: The "horrible dire consequences" of the user terminating the application is simply that they don't get balloon notifications? Who cares? For one thing, balloon notifications do not have guaranteed timely delivery. For example, Explorer won't show a balloon if the user is in a fullscreen application or if the user is simply not at the computer. Furthermore, the user already has a way of not getting balloon notifications: By ignoring them when they appear on the screen!

    If the user kills the "Please save your work and log off because the system is shutting down in N minutes for maintenance" notification application, and then they get booted off the computer without warning, well, they get what they deserve. It's like somebody who intentionally pries the safety cover off the emergency power cutoff switch. If they bump into the switch and accidentally kill power to their computer, well, they went out of their way to disable the safety cover—they deserve what they get.

    The customer appears not to have been listening very closely, because they summarized the situation as follows: "Right. There are currently two ways to meet the requirements. First, launch the UI process with service privileges, so that the user does not have permission to kill it. Or launch the UI process with user privileges, but marked as unkillable. It looks like your recommendation is to log off immediately if the user terminates the UI process. This will likely be satisfactory, but how can a process force the user to log off if it is terminated?"

    I felt like I was stuck in one of those sitcom tropes where the comically obtuse character refuses to process any new information:

    "I brought you the boxes from the storeroom."

    Thanks. Please put them against the wall.

    "Should I put them on the shelf?"

    No. Please put them against the wall.

    "And then move them to the shelf?"

    No. Just put them against the wall.

    "Oh, sorry. I'll take them back to the storeroom."

    No. Leave them here. Just put them against the wall.

    "You want them here?"

    Yes. Against the wall, please.

    "Okay, I'll put them on the shelf."

    The customer took my explanation of why the process doesn't need to be unkillable and reinterpreted it as offering one of three options:

    • Launching a UI process with service privileges, which is a well-known security vulnerability.
    • Creating a process that cannot be killed by its owner, which we already noted was not possible. (Even if you deny Terminate permission, the user can simply edit the ACL to restore Terminate permission, because users always have WRITE_DAC permission on objects they own.)
    • Creating a process that logs the user off if it is terminated, which is possible with the help of the service, but is not something I ever mentioned at all.

    I had to issue a clarification to the customer liaison: "My recommendation was not to log the user off immediately if the user kills the UI process. My recommendation is not to worry. Let the users kill the process if they want. They are only harming themselves. By killing the UI process, the users went in and disabled their smoke detector. If they die in a fire, that's their problem."

    Unfortunately, this did not settle the issue with the customer, and we had another round of the scene with the comically obtuse character.

    "Good news. We found a way to hide the process from the Applications tab, but it still shows up in the Processes tab. We would like to know if there is a way to hide the UI process from the Processes tab as well."

    Aaaaarrrrgggghhh.

    The customer is totally fixated on putting the boxes on the shelf, no matter how many times we say, "Putting the boxes on the shelf will not solve your problem."

    Eventually we settled on some sort of compromise: Have the service monitor the UI process, and if it terminates, log the user off immediately or alternatively relaunch the UI process.

    Extra wrinkle: Actually, the original design of the system was that the UI process would show the balloon, then wait N minutes, and then call Exit­Windows­Ex to log the user off. That's why they didn't want the user to be able to kill the process: If they killed the process, then nobody was going to perform the logoff. The solution to this was to move the forced logoff into the service. Note that preventing the user from terminating the UI process wouldn't solve their problem anyway, because the user could simply patch out the Exit­Windows­Ex call or suspend the process so it never got a chance to call Exit­Windows­Ex in the first place. As with Web programming, you need to design on the assumption that the client has been completely compromised.

  • The Old New Thing

    AttachThreadInput is like taking two threads and pooling their money into a joint bank account, where both parties need to be present in order to withdraw any money

    • 29 Comments

    Consider this code:

    // This code is wrong
       foregroundThreadId = ::GetWindowThreadProcessId(::GetForegroundWindow(), 0); 
       myThreadId = GetCurrentThreadId(); 
     
       if (foregroundThreadId != myThreadId) 
       { 
           AttachThreadInput(foregroundThreadId, myThreadId, TRUE); 
           BringWindowToTop(myWindowHandle);
    

    If you try to step over the Attach­Thread­Input call in the debugger, both the debugger and the application being debugged will freeze. Why is that?

    This should look familiar because it's basically the same code that I warned you about several years ago. The code grabs the current foreground window and attaches its input state to the current thread. Now you're in trouble.

    Remember dual-signature bank accounts? These were bank accounts that required the signatures of both account holders in order to make a withdrawal. It can work out fine if the two parties trust each other with a shared bank account and can coordinate their actions so that when one of them needs money, it can go to the other and say, "Hey, can you sign this withdrawal slip? I need some money." (Another use case for dual-signature bank accounts was a parent wanting to monitor their child's spending.)

    Attach­Thread­Input tells the window manager, "Please take these two threads and put all their money in a dual-signature bank account."

    In the case above, the code said, "See that random person being served by the bank teller? Please take all my money and all his money and put them into a dual-signature bank account."

    As you can imagine, this is a bad idea, both for you and for the other person. You cannot withdraw any money until you can somehow track down that random person and get him to sign the withdrawal form. And it's not like you have any relationship with that person—you don't even know his name!—so the only chance you have is to go down to the bank and hang out there hoping that the other guy will show up to make a withdrawal as part of his normal course of business, and then you can say, "Hey, you there! Sign this for me, will ya?"

    The other person is in a similar predicament. When he goes to the bank to make a withdrawal, the teller will say, "I'm sorry, sir, but your money is in a dual-signature account, and your withdrawal slip has only one signature on it." He's stuck doing the same thing that you do: Whenever he wants to withdraw money, he has to go to the bank and hang around hoping that you will show up eventually.

    bank account       = input queue
    money              = input
    go to the bank     = check the message queue

    In this case, what happened was that the code grabbed the debugger and said, "Okay, we now have a dual-signature bank account!" And now you're stuck. The debugger cannot withdraw any money because it is waiting for you to go to the bank. But you can't go to the bank because you're broken into the debugger. Result: Nobody gets any money.

    This is why you shouldn't grab random people in the bank and unilaterally create dual-signature bank accounts with them.

    Reminder: Attaching input queues is not a Get Out of Jail Free card. It's a Get Into the Same Jail card.
