January, 2006

  • The Old New Thing

    The decoy visual style

    • 79 Comments

    During the development of Windows XP, the visual design team were very cloak-and-dagger about what the final visual look was going to be. They had done a lot of research and put a lot of work into their designs and wanted to make sure that they made a big splash at the E3 conference when Luna was unveiled. Nobody outside the visual styles team, not even me, knew what Luna was going to look like.

    On the other hand, the programmers who were setting up the infrastructure for visual styles needed to have something to test their code against. And something had to go out in the betas.

    The visual styles team came up with two styles. In secret, they worked on Luna. In public, they worked on a "decoy" visual style called "Mallard". (For non-English speakers: A mallard is a type of duck commonly used as the model for decoys.) The ruse was so successful that people were busy copying the decoy and porting it to their own systems. (So much for copyright protection.)

  • The Old New Thing

    Performance consequences of polling

    • 52 Comments

    Polling kills.

    A program should not poll as a matter of course. Doing so can have serious consequences on system performance. It's like checking your watch every minute to see if it's 3 o'clock yet instead of just setting an alarm.

    First of all, polling means that a small amount of CPU time gets eaten up at each poll even though there is nothing to do. Even if you tune your polling loop so its CPU usage is only, say, a measly one tenth of one percent, once this program is placed on a Terminal Server with 800 simultaneous connections, your 0.1% CPU has magnified into 80% CPU.

    Next, the fact that a small snippet of code runs at regular intervals means that it (and all the code that leads up to it) cannot be pruned from the system's working set. They remain present just to say "Nope, nothing to do." If your polling code touches any instance data (and it almost certainly will), that's a minimum of one page's worth of memory per instance. On an x86-class machine, that 4K times the number of copies of the program running. On that 800-user Terminal Server machine, you've just chewed up 3MB of memory, all of which is being kept hot just in case some rare event occurs.

    Finally, polling has deleterious effects even for people who aren't running humongous Terminal Server machines with hundreds of users. A single laptop will suffer from polling, because it prevents the CPU from going to more power-efficient sleep states, resulting in a hotter laptop and shorter battery life.

    Of course, Windows itself is hardly blame-free in this respect, but the performance team remains on the lookout for rogue polling in Windows and "politely reminds" teams they find engaging in polling that they should "strongly consider" other means of accomplishing what they're after.

  • The Old New Thing

    France, she is, how you say, on sale!

    • 46 Comments

    Marketplace reports on the start of the winter sale season in France. By law, retailers are permitted sales only twice a year, so the onset of sale season generates quite a bit of shopping madness. There is also a proposal to allow more sale periods, but opponents argue that doing so would harm smaller businesses. Coming from the land of sale fatigue (we just emerged from the after-Christmas sale season and are entering the Winter White Sale season, after which comes the President's Day season...), I find a certain appeal to the idea of limiting how often things can "go on sale". Who can forget the oriental rug stores that are perpetually going out of business? It's become such a joke that The New York Times flatly refuses to run "Going Out of Business" sales for oriental rug stores.

  • The Old New Thing

    When web sites rely on security holes

    • 44 Comments

    Perhaps the biggest risk when making a change in the name of security is all the things that may have been relying on the previously-lax security settings. After all, disabling an insecure feature is easy. The hard part is disabling it while retaining compatibility with people who were relying on that feature. In the security investigations I've been involved with, perhaps the largest chunk of my time is spent trying to find a way to mitigate the security hole without breaking existing customers. (And it's the Line of Business scenario that is the biggest question mark.)

    Here's a real-life example: Consider a sports web site which sells a service to subscribers wherein the site creates a pop-up window whenever a game's score has changed or some other significant event has occurred. That way, you can leave your browser minimized and go about your day, but when something happens in the game, it will pop up an alert.

    The round of security changes in Windows XP SP2 broke this site because the rules on positioning of pop-up windows were tightened so that pop-up windows could not appear outside the browser itself. This prevents pop-up windows from being used to cover important browser elements (such as the status bar, the address bar, or a security dialog) and makes it harder for pop-ups to masquerade as system dialogs. But it also broke this company's business model.

    And of course, if Microsoft does something that cause you to lose money, you sue.

    There were probably corporations that had internal web sites that relied on the ability to position pop-ups without restriction. Those corporations no doubt also complained about this change in the name of security.

    As with most security changes that have compatibility consequences, a "safety valve" was added to return to the old insecure behavior for those customers who were relying on it. In this case, you can put the affected sites in the Trusted Sites zone and enable the "Allow script-initiated windows without size or position constraints" setting. But this is just a stop-gap, re-opening the security hole to let this site continue to operate the way it does. The real fix is not to rely on the security hole.

  • The Old New Thing

    Why does a corrupted binary sometimes result in "Program too big to fit in memory"?

    • 44 Comments

    If you take a program and corrupt the header, or just take a large-ish file that isn't a program at all and give it a ".exe" extension, then try to run it (Warning: Save your work first!), you will typically get the error "Program too big to fit in memory". Why such a confusing error message? Why doesn't it say "Corrupted program"?

    Because the program isn't actually corrupted. Sort of.

    A Win32 executable file begins with a so-called "MZ" header, followed by a so-called "PE" header. If the "PE" header cannot be found, then the loader attempts to load the program as a Win16 executable file, which consists of an "MZ" header followed by an "NE" header.

    If neither a "PE" nor an "NE" header can be found after the "MZ" header, then the loader attempts to load the program as an MS-DOS relocatable executable. If not even an "MZ" header can be found, then the loader attempt to load the program as an MS-DOS non-relocatable executable (aka "COM format" since this is the format of CP/M .COM files).

    In pictures:

    MZPEWin32
    NEWin16
    elseMS-DOS relocatable
    else MS-DOS non-relocatable

    Observe that no matter what path you take through the chart, you will always end up at something. There is no exit path that says "Corrupted program".

    But where does "Program too big to fit in memory" come from?

    If the program header is corrupted, then various fields in the header such as those which specify the amount of memory required by the program will typically be nonsensical values. The loader sees an MS-DOS relocatable program that requires 800KB of conventional memory, and that's where "Out of memory" comes from.

    An MS-DOS non-relocatable program contains no such information about memory requirements. The rule for loading non-relocatable programs is simply to load the program into a single 64KB chunk of memory and set it on its way. Therefore, a program with no "MZ" header but which is larger than 64KB in size won't fit in the single 64KB chunk and consequently results in an "Out of memory" error.

    And since people are certain to ask:

    • "MZ" = the legendary Mark Zbikowski.
    • "NE" = "New Executable", back when Windows was "new".
    • "PE" = "Portable Executable", because one of Windows NT's claims to fame was its portability to architectures other than the x86.
    • "LE" = "Linear Executable", used by OS/2 and by Windows 95 device drivers.
  • The Old New Thing

    Taxes: Remote Desktop Connection and painting

    • 40 Comments

    An increasingly important developer tax is supporting Remote Desktop Connection properly. When the user is connected via a Remote Desktop Connection, video operations are transferred over the network connection to the client for display. Since networks have high latency and nowhere near the bandwidth of a local PCI or AGP bus, you need to adapt to the changing cost of drawing to the screen.

    If you draw a line on the screen, the "draw line" command is sent over the network to the client. If you draw text, a "draw text" command is sent (along with the text to draw). So far so good. But if you copy a bitmap to the screen, the entire bitmap needs to be transferred over the network.

    Let's write a sample program that illustrates this point. Start with our new scratch program and make the following changes:

    void Window::Register()
    {
        WNDCLASS wc;
        wc.style         = CS_VREDRAW | CS_HREDRAW;
        wc.lpfnWndProc   = Window::s_WndProc;
        ...
    }
    
    class RootWindow : public Window
    {
    public:
     virtual LPCTSTR ClassName() { return TEXT("Scratch"); }
     static RootWindow *Create();
    protected:
     LRESULT HandleMessage(UINT uMsg, WPARAM wParam, LPARAM lParam);
     LRESULT OnCreate();
     void PaintContent(PAINTSTRUCT *pps);
     void Draw(HDC hdc, PAINTSTRUCT *pps);
    private:
     HWND m_hwndChild;
    };
    
    void RootWindow::Draw(HDC hdc, PAINTSTRUCT *pps)
    {
     FillRect(hdc, &pps->rcPaint, (HBRUSH)(COLOR_WINDOW + 1));
     RECT rc;
     GetClientRect(m_hwnd, &rc);
     for (int i = -10; i < 10; i++) {
      TextOut(hdc, 0, i * 15 + rc.bottom / 2, TEXT("Blah blah"), 9);
     }
    }
    
    void RootWindow::PaintContent(PAINTSTRUCT *pps)
    {
     Draw(pps->hdc, pps);
    }
    
    LRESULT RootWindow::HandleMessage(
                              UINT uMsg, WPARAM wParam, LPARAM lParam)
    {
     switch (uMsg) {
     ...
     case WM_ERASEBKGND: return 1;
     ...
    }
    

    There is an odd division of labor here; the PaintContent method doesn't actually do anything aside from handing the work off to the Draw method to do the actual drawing. (You'll see why soon.) Make sure "Show window contents while dragging" is enabled and run this program and resize it vertically. Ugh, what ugly flicker. We fix this by the traditional technique of double-buffering.

    void RootWindow::PaintContent(PAINTSTRUCT *pps)
    {
     if (!IsRectEmpty(&pps->rcPaint)) {
      HDC hdc = CreateCompatibleDC(pps->hdc);
      if (hdc) {
       int x = pps->rcPaint.left;
       int y = pps->rcPaint.top;
       int cx = pps->rcPaint.right - pps->rcPaint.left;
       int cy = pps->rcPaint.bottom - pps->rcPaint.top;
       HBITMAP hbm = CreateCompatibleBitmap(pps->hdc, cx, cy);
       if (hbm) {
        HBITMAP hbmPrev = SelectBitmap(hdc, hbm);
        SetWindowOrgEx(hdc, x, y, NULL);
    
        Draw(hdc, pps);
    
        BitBlt(pps->hdc, x, y, cx, cy, hdc, x, y, SRCCOPY);
    
        SelectObject(hdc, hbmPrev);
        DeleteObject(hbm);
       }
       DeleteDC(hdc);
      }
     }
    }
    

    Our new PaintContent method creates an offscreen bitmap and asks the Draw method to draw into it. Once that's done, the results are copied to the screen at one go, thereby avoiding flicker. If you run this program, you'll see that it resizes nice and smooth.

    Now connect to the computer via a Remote Desktop Connection and run it again. Since Remote Desktop Connection disables "Show window contents while dragging", you can't use resizing to trigger redraws, so instead maximize the program and restore it a few times. Notice the long delay before the window is resized when you maximize it. That's because we are pumping a huge bitmap across the Remote Desktop Connection as part of that BitBlt call.

    Go back to the old version of the PaintContent method, the one that just calls Draw, and run it over Remote Desktop Connection. Ah, this one is fast. That's because the simpler version doesn't transfer a huge bitmap over the Remote Desktop Connection; it just sends twenty TextOut calls on a pretty short string of text. These take up much less bandwidth than a 1024x768 bitmap.

    We have one method that is faster over a Remote Desktop Connection, and another method that is faster when run locally. Which should we use?

    We use both, choosing our drawing method based on whether the program is running over a Remote Desktop Connection.

    void RootWindow::PaintContent(PAINTSTRUCT *pps)
    {
     if (GetSystemMetrics(SM_REMOTESESSION)) {
      Draw(pps->hdc, pps);
     } else if (!IsRectEmpty(&pps->rcPaint)) {
      ... as before ...
     }
    }
    

    Now we get the best of both worlds. When run locally, we use the double-buffered drawing which draws without flickering, but when run over a Remote Desktop Connection, we use the simple Draw method that draws directly to the screen rather than to an offscreen bitmap.

    This is a rather simple example of adapting to Remote Desktop Connection. In a more complex world, you may have more complicated data structures associated with the two styles of drawing, or you may have background activities related to drawing that you may want to turn on and off based on whether the program is running over a Remote Desktop Connection. Since the user can dynamically connect and disconnect, you can't just assume that the state of the Remote Desktop Connection when your program starts will be the state for the lifetime of the program. We'll see next time how we can adapt to a changing world.

  • The Old New Thing

    If your callback fails, it's your responsibility to set the error code

    • 39 Comments

    There are many cases where a callback function is allowed to halt an operation. For example, you might decide to return FALSE to the WM_NCCREATE message to prevent the window from being created, or you might decide to return FALSE to one of the many enumeration callback functions such as the EnumWindowsProc callback. When you do this, the enclosing operation will return failure back to its caller: the CreateWindow function returns NULL; the EnumWindows function returns FALSE.

    Of course, when this happens, the enclosing operation doesn't know why the callback failed; all it knows is that it failed. Consequently, it can't set a meaningful value to be retrieved by the GetLastError function.

    If you want something meaningful to be returned by the GetLastError function when your callback halts the operation, it's the callback's responsibility to set that value by calling the SetLastError function.

    This is something that is so obvious I didn't think it needed to be said; it falls into the "because computers aren't psychic (yet)" category of explanation. But apparently it wasn't obvious enough, so now I'm saying it.

  • The Old New Thing

    When programs assume that the system will never change, episode 3

    • 37 Comments

    One of the stranger application compatibility puzzles was solved by a colleague of mine who was trying to figure out why a particular program couldn't open the Printers Control Panel. Upon closer investigation, the reason became clear. The program launched the Control Panel, used FindWindow to locate the window, then accessed that window's "File" menu and extracted the strings from that menu looking for an item that contained the word "Printer". It then posted a WM_COMMAND message to the Control Panel window with the menu identifier it found, thereby simulating the user clicking on the "Printers" menu option.

    With Windows 95's Control Panel, this method fell apart pretty badly. There is no "Printers" option on the Control Panel's File menu. It never occurred to the authors of the program that this was a possibility. (Mind you, it was a possibility even in Windows 3.1: If you were running a non-English version of Windows, the name of the Printers option will be something like "Skrivare" or "Drucker". Not that it mattered, because the "File" menu will be called something like "Arkiv" or "Datei"! The developers of this program simply assumed that everyone in the world speaks English.)

    The code never checked for errors; it plowed ahead on the assumption that everything was going according to plan. The code eventually completed its rounds and sent a garbage WM_COMMAND message to the Control Panel window, which was of course ignored since it didn't match any of the valid commands on that window's menu.

    The punch line is that the mechanism for opening the Printers Control Panel was rather clearly spelled out on the very first page of the "Control Panel" chapter of the Windows 3.1 SDK:

    The following example shows how an application can start Control Panel and the Printers application from the command line by using the WinExec function:

        WinExec("control.exe printers", SW_SHOWNORMAL);
    

    In other words, they didn't even read past the first page.

    The solution: Create a "decoy" Control Panel window with the same class name as Windows 3.1, so that this program would find it. The purpose of these "decoys" is to draw the attention of the offending program, taking the brunt of the mistreatment and doing what they can to mimic the original behavior enough to keep that program happy. In this case, it waited patiently for the garbage WM_COMMAND message to arrive and dutifully launched the Printers Control Panel.

    Nowadays, this sort of problem would probably have been solved with the use of a shim. But this was back in Windows 95, where application compatibility technology was still comparatively immature. All that was available at the time were application compatibility flags and hot-patching of binaries, wherein the values are modified as they are loaded into memory. Using hot-patching technology was reserved for only the most extreme compatibility cases, because getting permission from the vendor to patch their program was a comparatively lengthy legal process. Patching was considered a "last resort" compatibility mechanism not only for the legal machinery necessary to permit it, but also because patching a program fixes only the versions of the program the patch was developed to address. If the vendor shipped ten versions of a program, ten different patches would have to be developed. And if the vendor shipped another version after Windows 95 was delivered to duplication, that version would be broken when Windows 95 hit the shelves.

    It is important to understand the distinction between what is a documented and supported feature and what is an implementation detail. Documented and supported features are contracts between Windows and your program. Windows will uphold its end of the contract for as long as that feature exists. Implementation details, on the other hand, are ephemeral; they can change at any time, be it at the next major operating system release, at the next service pack, even with the next security hotfix. If your program relies on implementation details, you're contributing to the compatibility cruft that Windows carries around from release to release.

    Over the next few days, I'll talk about other decoys that have been used in Windows.

    [Somebody caught my misspelling of "Drucker" while I was fixing it! - 7:45am]

  • The Old New Thing

    There are two types of rebates, and you need to be on the alert

    • 34 Comments

    Many commenters to my earlier entry on sales in France had questions about rebates. Slate explained the whole rebate thing back in 2003. The short version: There are two types of rebates, manufacturer rebates and retailer rebates. Manufacturer rebates exist because they want the retail price to go down, but they are afraid that if they just lowered the wholesale price, retailers would not pass the savings on to the consumer. A manufacturer's rebate ensures that all the benefit of the price drop goes to the consumer and not to any middle-men. Retailer rebates, on the other hand, are carefully crafted schemes designed to trick the consumer into buying the product and then failing to meet all the requirements for redeeming the rebate coupon. Read the Slate article for details.

  • The Old New Thing

    Welcome to the United States, unless you're a Canadian technologist who is an invited guest at a Microsoft conference, in which case, keep out

    • 32 Comments

    Vancouver technologist Darren Barefoot was invited to Redmond by the MSN Search team but was stopped by Immigration and denied entry.

    Ultimately, the customs agents concluded that because Microsoft was covering my flight and accommodation, I was being compensated for consulting activities. In order to enter the country, I'd need a work permit.

    He says Customs, but I'm pretty sure it was really Immigration that stopped him.

    Sorry, Darren. We miss you already. We even asked the rain to take a few days off and invited the sun out to play just for you.

Page 1 of 4 (36 items) 1234