January, 2006

  • The Old New Thing

    Performance consequences of polling

    • 52 Comments

    Polling kills.

    A program should not poll as a matter of course. Doing so can have serious consequences for system performance. It's like checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm.

    First of all, polling means that a small amount of CPU time gets eaten up at each poll even though there is nothing to do. Even if you tune your polling loop so its CPU usage is only, say, a measly one tenth of one percent, once this program is placed on a Terminal Server with 800 simultaneous connections, your 0.1% CPU has magnified into 80% CPU.

    Next, the fact that a small snippet of code runs at regular intervals means that it (and all the code that leads up to it) cannot be pruned from the system's working set. They remain present just to say "Nope, nothing to do." If your polling code touches any instance data (and it almost certainly will), that's a minimum of one page's worth of memory per instance. On an x86-class machine, that's 4K times the number of copies of the program running. On that 800-user Terminal Server machine, you've just chewed up 3MB of memory, all of which is being kept hot just in case some rare event occurs.

    Finally, polling has deleterious effects even for people who aren't running humongous Terminal Server machines with hundreds of users. A single laptop will suffer from polling, because it prevents the CPU from going to more power-efficient sleep states, resulting in a hotter laptop and shorter battery life.
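    To make the contrast concrete, here is a minimal sketch (the names are invented for illustration) of the same wait expressed both ways. The blocking version consumes no CPU at all, and its code can be paged out, until there is actual work to do:

    #include <windows.h>

    // Polling: wakes up ten times a second forever, even when idle.
    void WaitForWorkByPolling(volatile LONG *pfWorkReady)
    {
     while (!*pfWorkReady) Sleep(100);
    }

    // Blocking: consumes no CPU until somebody calls SetEvent.
    void WaitForWorkByBlocking(HANDLE hWorkReadyEvent)
    {
     WaitForSingleObject(hWorkReadyEvent, INFINITE);
    }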

    Of course, Windows itself is hardly blame-free in this respect, but the performance team remains on the lookout for rogue polling in Windows and "politely reminds" teams they find engaging in polling that they should "strongly consider" other means of accomplishing what they're after.

  • The Old New Thing

    Why does a corrupted binary sometimes result in "Program too big to fit in memory"?

    • 44 Comments

    If you take a program and corrupt the header, or just take a large-ish file that isn't a program at all and give it a ".exe" extension, then try to run it (Warning: Save your work first!), you will typically get the error "Program too big to fit in memory". Why such a confusing error message? Why doesn't it say "Corrupted program"?

    Because the program isn't actually corrupted. Sort of.

    A Win32 executable file begins with a so-called "MZ" header, followed by a so-called "PE" header. If the "PE" header cannot be found, then the loader attempts to load the program as a Win16 executable file, which consists of an "MZ" header followed by an "NE" header.

    If neither a "PE" nor an "NE" header can be found after the "MZ" header, then the loader attempts to load the program as an MS-DOS relocatable executable. If not even an "MZ" header can be found, then the loader attempts to load the program as an MS-DOS non-relocatable executable (aka "COM format" since this is the format of CP/M .COM files).

    In pictures:

    MZ → PE   → Win32
       → NE   → Win16
       → else → MS-DOS relocatable
    else      → MS-DOS non-relocatable

    Observe that no matter what path you take through the chart, you will always end up at something. There is no exit path that says "Corrupted program".

    But where does "Program too big to fit in memory" come from?

    If the program header is corrupted, then various fields in the header, such as those that specify the amount of memory required by the program, will typically contain nonsensical values. The loader sees an MS-DOS relocatable program that requires 800KB of conventional memory, the load fails for lack of memory, and that's where "Program too big to fit in memory" comes from.

    An MS-DOS non-relocatable program contains no such information about memory requirements. The rule for loading non-relocatable programs is simply to load the program into a single 64KB chunk of memory and set it on its way. Therefore, a program with no "MZ" header but which is larger than 64KB in size won't fit in the single 64KB chunk, and the load fails with the same "too big to fit in memory" error.
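    If you're curious, you can replicate the loader's probing order with a little program. This is just a sketch using the header structures and signature constants from winnt.h; the real loader's logic is of course more involved:

    #include <windows.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
     if (argc < 2) return 1;
     FILE *f = fopen(argv[1], "rb");
     if (!f) return 1;

     IMAGE_DOS_HEADER dos;
     if (fread(&dos, sizeof(dos), 1, f) != 1 ||
         dos.e_magic != IMAGE_DOS_SIGNATURE) {         // no "MZ" header
      puts("MS-DOS non-relocatable (.COM-style)");
     } else {
      DWORD dwSig = 0;
      fseek(f, dos.e_lfanew, SEEK_SET);                // where "PE"/"NE" would live
      fread(&dwSig, sizeof(dwSig), 1, f);
      if (dwSig == IMAGE_NT_SIGNATURE) {               // "PE\0\0"
       puts("Win32 (PE)");
      } else if ((WORD)dwSig == IMAGE_OS2_SIGNATURE) { // "NE"
       puts("Win16 (NE)");
      } else {
       puts("MS-DOS relocatable");
      }
     }
     fclose(f);
     return 0;
    }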

    And since people are certain to ask:

    • "MZ" = the legendary Mark Zbikowski.
    • "NE" = "New Executable", back when Windows was "new".
    • "PE" = "Portable Executable", because one of Windows NT's claims to fame was its portability to architectures other than the x86.
    • "LE" = "Linear Executable", used by OS/2 and by Windows 95 device drivers.

  • The Old New Thing

    Why does the Recycle Bin have different file system names on FAT and NTFS?

    • 24 Comments

    On FAT drives, the directory that stores files in the Recycle Bin is called C:\RECYCLED, but on NTFS drives, its name is C:\RECYCLER. Why the name change?

    The FAT and NTFS Recycle Bins have different internal structure because NTFS has this thing called "security" and FAT doesn't. All recycled files on FAT drives are dumped into a single C:\RECYCLED directory, whereas recycled files on NTFS drives are separated based on the user's SID into directories named C:\RECYCLER\S-.... (It has nothing to do with whether you are running English or Swedish Windows.)
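    As an illustrative aside (a sketch, not the Recycle Bin's actual code), you can see which per-user directory name would apply to you by converting the current user's SID to its string form:

    #include <windows.h>
    #include <sddl.h>
    #include <stdio.h>

    int main()
    {
     HANDLE hToken;
     BYTE buf[256];
     DWORD cb;
     LPSTR pszSid;
     OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &hToken);
     GetTokenInformation(hToken, TokenUser, buf, sizeof(buf), &cb);
     if (ConvertSidToStringSidA(((TOKEN_USER *)buf)->User.Sid, &pszSid)) {
      printf("C:\\RECYCLER\\%s\n", pszSid); // e.g. C:\RECYCLER\S-1-5-21-...
      LocalFree(pszSid);
     }
     CloseHandle(hToken);
     return 0;
    }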

    Suppose the same directory name were used for both file systems, say, C:\RECYCLED. Because it is possible to upgrade a FAT drive to NTFS with the CONVERT utility, a FAT drive converted to NTFS would be left with a FAT-style Recycle Bin after the conversion. And since the names are the same, the Recycle Bin says, "Hey, look, here's a C:\RECYCLED directory. That must be my NTFS Recycle Bin!" except that it isn't. It's a FAT Recycle Bin left over from the conversion.

    Giving the NTFS Recycle Bin a different name means that the Recycle Bin shell folder won't get confused by the "wrong" type of recycle bin directory structure on an NTFS volume.

    Yes, the problem could have been solved some other way. For example, there could have been code to inspect the Recycle Bin directory, determine which format it is, and ignore it if it didn't match the actual file system. (Or, if you're feeling really ambitious, somehow convert from one format to the other.) But that would be over-engineering. You have to write and test the detection (and possibly conversion) code, there's the risk of a false positive, the code runs at every boot, and it needs to be maintained whenever either the FAT or NTFS Recycle Bin format changes. All for a scenario that happens at most once per drive.

    Or you could change one text string and be done with it. (I could make some really awful "Gordian knot"/"string" remark here but will refrain.)

  • The Old New Thing

    Taxes: Remote Desktop Connection and painting

    • 40 Comments

    An increasingly important developer tax is supporting Remote Desktop Connection properly. When the user is connected via a Remote Desktop Connection, video operations are transferred over the network connection to the client for display. Since networks have high latency and nowhere near the bandwidth of a local PCI or AGP bus, you need to adapt to the changing cost of drawing to the screen.

    If you draw a line on the screen, the "draw line" command is sent over the network to the client. If you draw text, a "draw text" command is sent (along with the text to draw). So far so good. But if you copy a bitmap to the screen, the entire bitmap needs to be transferred over the network.

    Let's write a sample program that illustrates this point. Start with our new scratch program and make the following changes:

    void Window::Register()
    {
        WNDCLASS wc;
        wc.style         = CS_VREDRAW | CS_HREDRAW;
        wc.lpfnWndProc   = Window::s_WndProc;
        ...
    }
    
    class RootWindow : public Window
    {
    public:
     virtual LPCTSTR ClassName() { return TEXT("Scratch"); }
     static RootWindow *Create();
    protected:
     LRESULT HandleMessage(UINT uMsg, WPARAM wParam, LPARAM lParam);
     LRESULT OnCreate();
     void PaintContent(PAINTSTRUCT *pps);
     void Draw(HDC hdc, PAINTSTRUCT *pps);
    private:
     HWND m_hwndChild;
    };
    
    void RootWindow::Draw(HDC hdc, PAINTSTRUCT *pps)
    {
     FillRect(hdc, &pps->rcPaint, (HBRUSH)(COLOR_WINDOW + 1));
     RECT rc;
     GetClientRect(m_hwnd, &rc);
     for (int i = -10; i < 10; i++) {
      TextOut(hdc, 0, i * 15 + rc.bottom / 2, TEXT("Blah blah"), 9);
     }
    }
    
    void RootWindow::PaintContent(PAINTSTRUCT *pps)
    {
     Draw(pps->hdc, pps);
    }
    
    LRESULT RootWindow::HandleMessage(
                              UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     switch (uMsg) {
     ...
     case WM_ERASEBKGND: return 1;
     ...
    }
    

    There is an odd division of labor here; the PaintContent method doesn't actually do anything aside from handing the work off to the Draw method to do the actual drawing. (You'll see why soon.) Make sure "Show window contents while dragging" is enabled and run this program and resize it vertically. Ugh, what ugly flicker. We fix this by the traditional technique of double-buffering.

    void RootWindow::PaintContent(PAINTSTRUCT *pps)
    {
     if (!IsRectEmpty(&pps->rcPaint)) {
      HDC hdc = CreateCompatibleDC(pps->hdc);
      if (hdc) {
       int x = pps->rcPaint.left;
       int y = pps->rcPaint.top;
       int cx = pps->rcPaint.right - pps->rcPaint.left;
       int cy = pps->rcPaint.bottom - pps->rcPaint.top;
       HBITMAP hbm = CreateCompatibleBitmap(pps->hdc, cx, cy);
       if (hbm) {
        HBITMAP hbmPrev = SelectBitmap(hdc, hbm);
        // Shift the coordinate space so the offscreen bitmap
        // lines up with the dirty rectangle on the screen.
        SetWindowOrgEx(hdc, x, y, NULL);

        Draw(hdc, pps); // render into the offscreen bitmap

        // Copy the finished image to the screen in one operation.
        BitBlt(pps->hdc, x, y, cx, cy, hdc, x, y, SRCCOPY);

        SelectObject(hdc, hbmPrev);
        DeleteObject(hbm);
       }
       DeleteDC(hdc);
      }
     }
    }
    

    Our new PaintContent method creates an offscreen bitmap and asks the Draw method to draw into it. Once that's done, the results are copied to the screen at one go, thereby avoiding flicker. If you run this program, you'll see that it resizes nice and smooth.

    Now connect to the computer via a Remote Desktop Connection and run it again. Since Remote Desktop Connection disables "Show window contents while dragging", you can't use resizing to trigger redraws, so instead maximize the program and restore it a few times. Notice the long delay before the window is resized when you maximize it. That's because we are pumping a huge bitmap across the Remote Desktop Connection as part of that BitBlt call.

    Go back to the old version of the PaintContent method, the one that just calls Draw, and run it over Remote Desktop Connection. Ah, this one is fast. That's because the simpler version doesn't transfer a huge bitmap over the Remote Desktop Connection; it just sends twenty TextOut calls on a pretty short string of text. These take up much less bandwidth than a 1024x768 bitmap.

    We have one method that is faster over a Remote Desktop Connection, and another method that is faster when run locally. Which should we use?

    We use both, choosing our drawing method based on whether the program is running over a Remote Desktop Connection.

    void RootWindow::PaintContent(PAINTSTRUCT *pps)
    {
     if (GetSystemMetrics(SM_REMOTESESSION)) {
      Draw(pps->hdc, pps);
     } else if (!IsRectEmpty(&pps->rcPaint)) {
      ... as before ...
     }
    }
    

    Now we get the best of both worlds. When run locally, we use the double-buffered drawing which draws without flickering, but when run over a Remote Desktop Connection, we use the simple Draw method that draws directly to the screen rather than to an offscreen bitmap.

    This is a rather simple example of adapting to Remote Desktop Connection. In a more complex world, you may have more complicated data structures associated with the two styles of drawing, or you may have background activities related to drawing that you may want to turn on and off based on whether the program is running over a Remote Desktop Connection. Since the user can dynamically connect and disconnect, you can't just assume that the state of the Remote Desktop Connection when your program starts will be the state for the lifetime of the program. We'll see next time how we can adapt to a changing world.
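    One way to hear about those transitions (a sketch with an invented m_fRemote member; not necessarily the approach the next entry takes) is to register for session change notifications and re-query SM_REMOTESESSION when one arrives:

    // #include <wtsapi32.h> and link with wtsapi32.lib (Windows XP and later).

    // In OnCreate: ask for notifications for our session.
    WTSRegisterSessionNotification(m_hwnd, NOTIFY_FOR_THIS_SESSION);

    // In HandleMessage:
    case WM_WTSSESSION_CHANGE:
     if (wParam == WTS_REMOTE_CONNECT || wParam == WTS_REMOTE_DISCONNECT) {
      m_fRemote = GetSystemMetrics(SM_REMOTESESSION) != 0;
      InvalidateRect(m_hwnd, NULL, TRUE); // repaint with the new strategy
     }
     break;

    // In the WM_DESTROY handler:
    WTSUnregisterSessionNotification(m_hwnd);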

  • The Old New Thing

    When programs assume that the system will never change, episode 3

    • 37 Comments

    One of the stranger application compatibility puzzles was solved by a colleague of mine who was trying to figure out why a particular program couldn't open the Printers Control Panel. Upon closer investigation, the reason became clear. The program launched the Control Panel, used FindWindow to locate the window, then accessed that window's "File" menu and extracted the strings from that menu looking for an item that contained the word "Printer". It then posted a WM_COMMAND message to the Control Panel window with the menu identifier it found, thereby simulating the user clicking on the "Printers" menu option.

    With Windows 95's Control Panel, this method fell apart pretty badly. There is no "Printers" option on the Control Panel's File menu. It never occurred to the authors of the program that this was a possibility. (Mind you, it was a possibility even in Windows 3.1: If you were running a non-English version of Windows, the name of the Printers option would be something like "Skrivare" or "Drucker". Not that it mattered, because the "File" menu would be called something like "Arkiv" or "Datei"! The developers of this program simply assumed that everyone in the world speaks English.)

    The code never checked for errors; it plowed ahead on the assumption that everything was going according to plan. The code eventually completed its rounds and sent a garbage WM_COMMAND message to the Control Panel window, which was of course ignored since it didn't match any of the valid commands on that window's menu.

    The punch line is that the mechanism for opening the Printers Control Panel was rather clearly spelled out on the very first page of the "Control Panel" chapter of the Windows 3.1 SDK:

    The following example shows how an application can start Control Panel and the Printers application from the command line by using the WinExec function:

        WinExec("control.exe printers", SW_SHOWNORMAL);
    

    In other words, they didn't even read past the first page.

    The solution: Create a "decoy" Control Panel window with the same class name as the Windows 3.1 Control Panel, so that this program would find it. The purpose of these "decoys" is to draw the attention of the offending program, taking the brunt of the mistreatment and doing what they can to mimic the original behavior enough to keep that program happy. In this case, the decoy waited patiently for the garbage WM_COMMAND message to arrive and dutifully launched the Printers Control Panel.

    Nowadays, this sort of problem would probably have been solved with the use of a shim. But this was back in Windows 95, where application compatibility technology was still comparatively immature. All that was available at the time were application compatibility flags and hot-patching of binaries, wherein a program's bytes are modified as they are loaded into memory. Using hot-patching technology was reserved for only the most extreme compatibility cases, because getting permission from the vendor to patch their program was a comparatively lengthy legal process. Patching was considered a "last resort" compatibility mechanism not only for the legal machinery necessary to permit it, but also because patching a program fixes only the versions of the program the patch was developed to address. If the vendor shipped ten versions of a program, ten different patches would have to be developed. And if the vendor shipped another version after Windows 95 was delivered to duplication, that version would be broken when Windows 95 hit the shelves.

    It is important to understand the distinction between what is a documented and supported feature and what is an implementation detail. Documented and supported features are contracts between Windows and your program. Windows will uphold its end of the contract for as long as that feature exists. Implementation details, on the other hand, are ephemeral; they can change at any time, be it at the next major operating system release, at the next service pack, even with the next security hotfix. If your program relies on implementation details, you're contributing to the compatibility cruft that Windows carries around from release to release.

    Over the next few days, I'll talk about other decoys that have been used in Windows.

    [Somebody caught my misspelling of "Drucker" while I was fixing it! - 7:45am]

  • The Old New Thing

    The decoy visual style

    • 79 Comments

    During the development of Windows XP, the visual design team were very cloak-and-dagger about what the final visual look was going to be. They had done a lot of research and put a lot of work into their designs and wanted to make sure that they made a big splash at the E3 conference when Luna was unveiled. Nobody outside the visual styles team, not even me, knew what Luna was going to look like.

    On the other hand, the programmers who were setting up the infrastructure for visual styles needed to have something to test their code against. And something had to go out in the betas.

    The visual styles team came up with two styles. In secret, they worked on Luna. In public, they worked on a "decoy" visual style called "Mallard". (For non-English speakers: A mallard is a type of duck commonly used as the model for decoys.) The ruse was so successful that people were busy copying the decoy and porting it to their own systems. (So much for copyright protection.)

  • The Old New Thing

    Pumping messages while waiting for a period of time

    • 13 Comments

    We can use the MsgWaitForMultipleObjects function (or its superset MsgWaitForMultipleObjectsEx) to carry out a non-polling "sleep while processing messages".

    #define MSGF_SLEEPMSG 0x5300
    
    BOOL SleepMsg(DWORD dwTimeout)
    {
     DWORD dwStart = GetTickCount();
     DWORD dwElapsed;
     while ((dwElapsed = GetTickCount() - dwStart) < dwTimeout) {
      DWORD dwStatus = MsgWaitForMultipleObjectsEx(0, NULL,
                        dwTimeout - dwElapsed, QS_ALLINPUT,
                        MWMO_INPUTAVAILABLE); // "wait any" is the default
      if (dwStatus == WAIT_OBJECT_0) {
       MSG msg;
       while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        if (msg.message == WM_QUIT) {
         PostQuitMessage((int)msg.wParam);
         return FALSE; // abandoned due to WM_QUIT
        }
        if (!CallMsgFilter(&msg, MSGF_SLEEPMSG)) {
         TranslateMessage(&msg);
         DispatchMessage(&msg);
        }
       }
      }
     }
     return TRUE; // timed out
    }
    

    This function pumps messages for up to dwTimeout milliseconds. The kernel of the idea is merely to use the MsgWaitForMultipleObjects/Ex function as a surrogate for WaitMessageTimeout, pumping messages until the cumulative timeout has been reached. There are a lot of small details to pay heed to, however. I've linked them to earlier postings that discuss the specific issues, if you need a refresher. The CallMsgFilter you might find gratuitous, but you'll change your mind when you realize that users might press a keyboard accelerator while you're sleeping, and you presumably want it to go through somebody's TranslateAccelerator. The message filter lets you hook into the modal loop and do your accelerator translation.

    Extending this function to "wait on a set of handles up to a specified amount of time, while pumping messages" is left as an exercise. (You can do it without changing very many lines of code.)
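    For the curious, one possible shape for that exercise (a sketch, not the official answer) keeps the loop identical and merely passes the handles through, returning which one was signaled:

    DWORD WaitMsg(const HANDLE *pHandles, DWORD cHandles, DWORD dwTimeout)
    {
     DWORD dwStart = GetTickCount();
     DWORD dwElapsed;
     while ((dwElapsed = GetTickCount() - dwStart) < dwTimeout) {
      DWORD dwStatus = MsgWaitForMultipleObjectsEx(cHandles, pHandles,
                        dwTimeout - dwElapsed, QS_ALLINPUT,
                        MWMO_INPUTAVAILABLE);
      if (dwStatus != WAIT_OBJECT_0 + cHandles) {
       return dwStatus; // handle signaled, wait abandoned, or wait failed
      }
      MSG msg;
      while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
       if (msg.message == WM_QUIT) {
        PostQuitMessage((int)msg.wParam);
        return WAIT_TIMEOUT; // abandoned due to WM_QUIT
       }
       if (!CallMsgFilter(&msg, MSGF_SLEEPMSG)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
       }
      }
     }
     return WAIT_TIMEOUT;
    }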

    [Call the right function. -2pm]

  • The Old New Thing

    The decoy display control panel

    • 25 Comments

    Last time, we saw one example of a "decoy" used in the service of application compatibility with respect to the Printers Control Panel. Today we'll look at another decoy, this time for the Display Control Panel.

    When support for multiple monitors was being developed, a major obstacle was that a large number of display drivers hacked the Display Control Panel directly instead of using the documented extension mechanism. For example, instead of adding a separate page to the Display Control Panel's property sheet for, say, virtual desktops, they would just hack into the "Settings" page and add their button there. Some drivers were so adventuresome as to do what seemed like a total rewrite of the "Settings" page. They would take all the controls, move them around, resize them, hide some, show others, add new buttons of their own, and generally speaking treat the page as a lump of clay waiting to be molded into their own image. (Here's a handy rule of thumb: If your technique works only if the user speaks English, you probably should consider the possibility that what you're doing is relying on an implementation detail rather than something that will be officially supported going forward.)

    In order to support multiple monitors, the Settings page on the Display Control Panel underwent a major overhaul. But when you tried to open the Display Control Panel on a system that had one of these aggressive drivers installed, it would crash, because the driver ran around rearranging things the way it always did, even though the things it was manipulating were no longer what the driver assumed them to be!

    The solution was to create a "decoy" Settings page that looked exactly like the classic Windows 95 Settings page. The decoy page's purpose in life was to act as bait for these aggressive display drivers and allow itself to be abused mercilessly, letting the driver have its way. Meanwhile, the real Settings page (which is the one that was shown to the user), by virtue of having been overlooked, remained safe and unharmed.

    There was no attempt to make this decoy Settings page do anything interesting at all. Its sole job was to soak up mistreatment without complaining. As a result, those drivers lost whatever nifty features their shenanigans were trying to accomplish, but at least the Display Control Panel stayed alive and allowed the user to do what they were trying to do in the first place: Adjust their display settings.

  • The Old New Thing

    ReadProcessMemory is not a preferred IPC mechanism

    • 32 Comments

    Occasionally I see someone trying to use the ReadProcessMemory function as an inter-process communication mechanism. This is ill-advised for several reasons.

    First, you cannot use ReadProcessMemory across security contexts, at least not without doing some extra work. If somebody uses "runas" to run your program under a different identity, your two processes will not be able to use ReadProcessMemory to transfer data back and forth.

    You could go to the extra work of getting ReadProcessMemory to work by adjusting the security on your process to grant PROCESS_VM_READ permission to the owner of the process you are communicating with, but this throws the doors wide open. Any process running with that identity can read the data you wanted to share, not just the process you are communicating with. If you are communicating with a process of lower privilege, you have just exposed your data to lower-privilege processes other than the one you are interested in.

    What's more, once you grant PROCESS_VM_READ permission, you grant it to your entire process. Not only can that process read the data you're trying to share, it can read anything else that is mapped into your address space. It can read all your global variables, it can read your heap, it can read variables out of your stack. It can even corrupt your stack!

    What? Granting read access can corrupt your stack?

    If a thread grows its stack into the stack guard page, the resulting guard exception is normally handled by extending the stack and establishing a new guard page. But when the guard exception is raised inside a private "catch all exceptions" handler, such as the one the IsBadReadPtr function uses, it is swallowed privately; as a result, the stack is not grown, and a new stack guard page is not created. When the stack later grows to and then past the point where the guard page used to be, what would normally be a stack guard exception is now an access violation, resulting in the death of the thread and with it likely the process.

    You might think you could catch the stack access violation and try to shut down the thread cleanly, but that is not possible for multiple reasons. First, structured exception handling executes on the stack of the thread that encountered the exception. If that thread has a corrupted stack, it becomes impossible to dispatch that exception since the stack that the exception filters want to run on is no longer viable.

    Even if you could somehow run these exception filters on some sort of "emergency stack", you still can't fix the problem. At the point of the exception, the thread could be in the middle of anything. Maybe it was inside the heap manager with the heap lock held and with heap data structures in a state of flux. In order for the process to stay alive, the heap data structures need to be made consistent and the heap lock released. But you don't know how to do that.

    There are plenty of other inter-process communication mechanisms available to you. One of them is anonymous shared memory, which I discussed a few years ago. Anonymous shared memory still has the problem that any process running under the same token as the one you are communicating with can read the shared memory block, but at least the scope of the exposure is limited to the data you explicitly wanted to share.

    (In a sense, you can't do any better than that. The process you are communicating with can do anything it wants with the data once it gets it from you. Even if you somehow arranged so that only the destination process can access the memory, there's nothing stopping that destination process from copying it somewhere outside your shared memory block, at which point your data can be read from the destination process by anybody running with the same token anyway.)
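    For reference, the heart of the anonymous shared memory technique looks roughly like this (a sketch; getting the handle into the other process, via handle inheritance or DuplicateHandle, and all error handling are omitted):

    #include <windows.h>

    // A pagefile-backed, unnamed (hence anonymous) shared memory block.
    // Since the mapping has no name, no other process can open it by name;
    // access flows only through the handle you explicitly hand out.
    HANDLE CreateSharedBlock(DWORD cb)
    {
     return CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                              PAGE_READWRITE, 0, cb, NULL);
    }

    void *MapSharedBlock(HANDLE hMap)
    {
     return MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    }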

  • The Old New Thing

    The cost of trying too hard: Splay trees

    • 22 Comments

    Often, it doesn't pay off to be too clever. Back in the 1980's, I'm told, the file system group was working out what in-memory data structures to use to represent the contents of a directory so that looking up a file by name was fast. One of the experiments they tried was the splay tree. Splay trees were developed in 1985 by Sleator and Tarjan as a form of self-rebalancing tree that provides O(log n) amortized cost for locating an item in the tree, where n is the number of items in the tree. (Amortized costing means roughly that the cost of M operations is O(M log n). The cost of an individual operation is O(log n) on average, but an individual operation can be very expensive as long as it's made up for by earlier operations that came in "under budget".)

    If you're familiar with splay trees you may already see what's about to happen.

    A very common operation in a directory is enumerating and opening every file in it, say, because you're performing a content search through all the files in the directory or because you're building a preview window. Unfortunately, when you sequentially access all the elements in a splay tree in order, this leaves the tree totally unbalanced. If you enumerate all the files in the directory and open each one, the result is a linear linked list sorted in reverse order. Locating the first file in the directory becomes an O(n) operation.
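    You can watch this happen with a small self-contained program. This sketch uses the classic top-down splay routine (after Sleator's reference implementation); the file system's actual data structures were of course different:

    #include <stdio.h>

    struct Node { int key; Node *l, *r; };

    // Top-down splay: brings the node with the given key (or the last
    // node on the search path) to the root.
    Node *Splay(int key, Node *t)
    {
     Node N, *l, *r, *y;
     if (!t) return t;
     N.l = N.r = NULL;
     l = r = &N;
     for (;;) {
      if (key < t->key) {
       if (!t->l) break;
       if (key < t->l->key) {                  // rotate right
        y = t->l; t->l = y->r; y->r = t; t = y;
        if (!t->l) break;
       }
       r->l = t; r = t; t = t->l;              // link right
      } else if (key > t->key) {
       if (!t->r) break;
       if (key > t->r->key) {                  // rotate left
        y = t->r; t->r = y->l; y->l = t; t = y;
        if (!t->r) break;
       }
       l->r = t; l = t; t = t->r;              // link left
      } else break;
     }
     l->r = t->l; r->l = t->r;                 // reassemble
     t->l = N.r; t->r = N.l;
     return t;
    }

    Node *Insert(int key, Node *t) // keys assumed distinct; leaks are fine for a demo
    {
     Node *n = new Node; n->key = key;
     if (!t) { n->l = n->r = NULL; return n; }
     t = Splay(key, t);
     if (key < t->key) { n->l = t->l; n->r = t; t->l = NULL; }
     else              { n->r = t->r; n->l = t; t->r = NULL; }
     return n;
    }

    int main()
    {
     const int n = 1000;
     Node *t = NULL;
     for (int i = 0; i < n; i++) t = Insert(i, t);
     for (int i = 0; i < n; i++) t = Splay(i, t); // "open every file in order"
     int depth = 0;
     for (Node *p = t; p->l; p = p->l) depth++;   // walk down to the smallest key
     printf("depth of smallest key: %d\n", depth); // prints 999
     return 0;
    }

    After the in-order sweep, the smallest key sits 999 links down a left spine: precisely the reverse-sorted linked list described above.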

    From a purely algorithmic-analysis point of view, the O(n) behavior of that file open operation is not a point of concern. After all, in order to get to this point, you had to perform n operations to begin with, so that very expensive operation was already "paid for" by the large number of earlier operations. However, in practice, people don't like it when the cost of an operation varies so widely from use to use. If you arrive at a client's office five minutes early every day for a month and then show up 90 minutes late one day, the explanation "Well, I was early so often that, according to amortized costing, I'm actually still ahead of schedule" will probably not impress your client.

    The moral of the story: Sometimes trying too hard doesn't work.

    (Postscript: Yes, there have been recent research results that soften the worst-case single-operation whammy of splay trees, but these results weren't available in the 1980's. Also, remember that consistency in access time is important.)
