December, 2003

  • The Old New Thing

    Famous people doing mundane things = news!

    • 7 Comments

    So an actor learns a foreign language and it's news: Actor Kingsley Masters Farsi Language. Meanwhile, tens of millions of people around the world learn a foreign language without any media coverage whatsoever.

    (And if you read the article: He didn't master Farsi. He mastered basic Farsi. Whatever that means.)
  • The Old New Thing

    You can read a contract from the other side

    • 8 Comments

    An interface is a contract, but remember that a contract applies to both parties. Most of the time, when you read an interface, you look at it from the point of view of the client side of the contract, but often it helps to read it from the server side.

    For example, let's look at the interface for control panel applications.

    Most of the time, when you're reading this documentation, you are wearing your "I am writing a Control Panel application" hat. So, for example, the documentation says

    When the controlling application first loads the Control Panel application, it retrieves the address of the CPlApplet function and subsequently uses the address to call the function and pass it messages.

    With your "I am writing a Control Panel application" hat, this means "Gosh, I had better have a function called CPlApplet and export it so I can receive messages."

    But if you are instead wearing your "I am hosting a Control Panel application" hat, this means, "Gosh, I had better call GetProcAddress() to get the address of the application's CPlApplet function so I can send it messages."

    Similarly, under the "Message Processing" section it lists the messages that are sent from the controlling application to the Control Panel application. If you are wearing your "I am writing a Control Panel application" hat, this means "Gosh, I had better be ready to receive these messages in this order." But if you are wearing your "I am hosting a Control Panel application" hat, this means "Gosh, I had better send these messages in the order listed."

    And finally, when it says "the controlling application release the Control Panel application by calling the FreeLibrary function," your "I am writing a Control Panel application" hat says "I had better be prepared to be unloaded," whereas your "I am hosting a Control Panel application" hat says, "This is where I unload the DLL."

    So let's try it. As always, start with our scratch program and change the WinMain:

    #include <cpl.h>
    
    int WINAPI WinMain(HINSTANCE hinst, HINSTANCE hinstPrev,
                       LPSTR lpCmdLine, int nShowCmd)
    {
      HWND hwnd;
    
      g_hinst = hinst;
    
      if (!InitApp()) return 0;
    
      if (SUCCEEDED(CoInitialize(NULL))) {/* In case we use COM */
    
          hwnd = CreateWindow(
              "Scratch",                      /* Class Name */
              "Scratch",                      /* Title */
              WS_OVERLAPPEDWINDOW,            /* Style */
              CW_USEDEFAULT, CW_USEDEFAULT,   /* Position */
              CW_USEDEFAULT, CW_USEDEFAULT,   /* Size */
              NULL,                           /* Parent */
              NULL,                           /* No menu */
              hinst,                          /* Instance */
              0);                             /* No special parameters */
    
          if (hwnd) {
            TCHAR szPath[MAX_PATH];
            LPTSTR pszLast;
            DWORD cch = SearchPath(NULL, TEXT("access.cpl"),
                         NULL, MAX_PATH, szPath, &pszLast);
            if (cch > 0 && cch < MAX_PATH) {
              RunControlPanel(hwnd, szPath);
            }
          }
          CoUninitialize();
      }
    
      return 0;
    }
    

    Instead of showing the window and entering the message loop, we start acting like a Control Panel host. Our victim today is access.cpl, the accessibility control panel. After locating the program on the path, we ask RunControlPanel to do the heavy lifting:

    void RunControlPanel(HWND hwnd, LPCTSTR pszPath)
    {
      // Maybe this control panel application has a custom manifest
      ACTCTX act = { 0 };
      act.cbSize = sizeof(act);
      act.dwFlags = ACTCTX_FLAG_RESOURCE_NAME_VALID; // lpResourceName is meaningful only with this flag
      act.lpSource = pszPath;
      act.lpResourceName = MAKEINTRESOURCE(123);
      HANDLE hctx = CreateActCtx(&act);
      ULONG_PTR ulCookie;
      if (hctx == INVALID_HANDLE_VALUE ||
          ActivateActCtx(hctx, &ulCookie)) {
    
        HINSTANCE hinstCPL = LoadLibrary(pszPath);
        if (hinstCPL) {
          APPLET_PROC pfnCPlApplet = (APPLET_PROC)
            GetProcAddress(hinstCPL, "CPlApplet");
          if (pfnCPlApplet) {
            if (pfnCPlApplet(hwnd, CPL_INIT, 0, 0)) {
              int cApplets = pfnCPlApplet(hwnd, CPL_GETCOUNT, 0, 0);
              //  We're going to run application zero
              //  (In real life we might show the user a list of them
              //  and let them pick one)
              if (cApplets > 0) {
                CPLINFO cpli;
                pfnCPlApplet(hwnd, CPL_INQUIRE, 0, (LPARAM)&cpli);
                pfnCPlApplet(hwnd, CPL_DBLCLK, 0, cpli.lData);
                pfnCPlApplet(hwnd, CPL_STOP, 0, cpli.lData);
              }
            }
            pfnCPlApplet(hwnd, CPL_EXIT, 0, 0);
          }
    
          FreeLibrary(hinstCPL);
        }
    
        if (hctx != INVALID_HANDLE_VALUE) {
          DeactivateActCtx(0, ulCookie);
          ReleaseActCtx(hctx);
        }
      }
    }
    

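    Ignore the red lines (the ones dealing with the activation context: ACTCTX, CreateActCtx, ActivateActCtx, DeactivateActCtx, and ReleaseActCtx) for now; we'll discuss them later.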

    All we're doing is following the specification but reading it from the host side. So we load the library, locate its entry point, and call it with CPL_INIT, then CPL_GETCOUNT. If there are any control panel applications inside this CPL file, we inquire after the first one, double-click it (this is where all the interesting stuff happens), then stop it. After all that excitement, we clean up according to the rules set out for the host (namely, by sending a CPL_EXIT message.)
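
    The host-side message order can also be checked without loading a real CPL file. The following is only a sketch: MockMsg, MockApplet, and MockHostRun are hypothetical names invented for illustration, and the enum values are stand-ins, not the CPL_* constants from cpl.h. The mock applet records what it receives, and the mock host sends the messages in the same order as RunControlPanel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the CPL_* messages; the real values live in cpl.h.
enum MockMsg { MOCK_INIT, MOCK_GETCOUNT, MOCK_INQUIRE, MOCK_DBLCLK, MOCK_STOP, MOCK_EXIT };

// Applet side of the contract: receive messages and record what arrived.
struct MockApplet {
    std::vector<MockMsg> received;
    long Handle(MockMsg msg) {
        received.push_back(msg);
        switch (msg) {
        case MOCK_INIT:     return 1;  // nonzero means "initialized OK"
        case MOCK_GETCOUNT: return 1;  // this mock hosts a single applet
        default:            return 0;
        }
    }
};

// Host side of the contract: send the messages in the same order
// that RunControlPanel does.
void MockHostRun(MockApplet& applet) {
    if (applet.Handle(MOCK_INIT)) {
        if (applet.Handle(MOCK_GETCOUNT) > 0) {
            applet.Handle(MOCK_INQUIRE);
            applet.Handle(MOCK_DBLCLK);
            applet.Handle(MOCK_STOP);
        }
    }
    applet.Handle(MOCK_EXIT);
}
```

    Running MockHostRun against a fresh MockApplet leaves six entries in received, starting with MOCK_INIT and ending with MOCK_EXIT. Both "hats" read the same list; one side produces it and the other consumes it.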

    So that's all. Well, except for the red parts. What's that about?

    The red parts are to support Control Panel applications that have a custom manifest. This is something new with Windows XP and is documented in MSDN here.

    If you go down to the "Using ComCtl32 Version 6 in Control Panel or a DLL That Is Run by RunDll32.exe" section, you'll see that the application provides its manifest to the Control Panel host by attaching it as resource number 123. So that's what the red code does: It loads and activates the manifest, then invites the Control Panel application to do its thing (with its manifest active), then cleans up. If there is no manifest, CreateActCtx will return INVALID_HANDLE_VALUE. We do not treat that as an error, since many programs don't yet provide a manifest.

    Exercise: What are the security implications of passing NULL as the first parameter to SearchPath?
  • The Old New Thing

    What order do programs in the startup group execute?

    • 22 Comments

    The programs in the startup group run in an unspecified order. Some people think they execute in the order they were created. Then you upgraded from Windows 98 to Windows 2000 and found that didn't work any more. Other people think they execute in alphabetical order. Then you installed a Windows XP multilingual user interface language pack and found that didn't work any more either.

    If you want to control the order that programs in the startup group are run, write a batch file that runs them in the order you want and put a shortcut to the batch file in your startup group.
  • The Old New Thing

    Why not just block the apps that rely on undocumented behavior?

    • 47 Comments
    Because every app that gets blocked is another reason for people not to upgrade to the next version of Windows. Look at all these programs that would have stopped working when you upgraded from Windows 3.0 to Windows 3.1.
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Compatibility
    

    Actually, this list is only partial. Many times, the compatibility fix is made inside the core component for all programs rather than targeting a specific program, as this list does.

    (The Windows 2000-to-Windows XP list is stored in your C:\WINDOWS\AppPatch directory, in a binary format to permit rapid scanning. Sorry, you won't be able to browse it easily. I think the Application Compatibility Toolkit includes a viewer, but I may be mistaken.)

    Would you have bought Windows XP if you knew that all these programs were incompatible?

    It takes only one incompatible program to sour an upgrade.

    Suppose you're the IT manager of some company. Your company uses Program X for its word processor and you find that Program X is incompatible with Windows XP for whatever reason. Would you upgrade?

    Of course not! Your business would grind to a halt.

    "Why not call Company X and ask them for an upgrade?"

    Sure, you could do that, and the answer might be, "Oh, you're using Version 1.0 of Program X. You need to upgrade to Version 2.0 for $150 per copy." Congratulations, the cost of upgrading to Windows XP just tripled.

    And that's if you're lucky and Company X is still in business.

    I recall a survey taken a few years ago by our Setup/Upgrade team of corporations using Windows. Pretty much every single one had at least one "deal-breaker" program, a program which Windows absolutely must support or they won't upgrade. In a high percentage of the cases, the program in question was developed by their in-house programming staff, and it's written in Visual Basic (sometimes even 16-bit Visual Basic), and the person who wrote it doesn't work there any more. In some cases, they don't even have the source code any more.

    And it's not just corporate customers. This affects consumers too.

    For Windows 95, my application compatibility work focused on games. Games are the most important factor behind consumer technology. The video card that comes with a typical computer has gotten better over time because games demand it. (Outlook certainly doesn't care that your card can do 20 bajillion triangles a second.) And if your game doesn't run on the newest version of Windows, you aren't going to upgrade.

    Anyway, game vendors are very much like those major corporations. I made phone call after phone call to the game vendors trying to help them get their game to run under Windows 95. To a one, they didn't care. A game has a shelf life of a few months, and then it's gone. Why would they bother to issue a patch for their program to run under Windows 95? They already got their money. They're not going to make any more off that game; its three months are over. The vendors would slipstream patches and lose track of how many versions of their program were out there and how many of them had a particular problem. Sometimes they wouldn't even have the source code any more.

    They simply didn't care that their program didn't run on Windows 95. (My favorite was the one that tried to walk me through creating a DOS boot disk.)

    Oh, and that Application Compatibility Toolkit I mentioned above. It's a great tool for developers, too. One of the components is the Verifier: If you run your program under the verifier, it will monitor hundreds of API calls and break into the debugger when you do something wrong. (Like close a handle twice or allocate memory with GlobalAlloc but free it with LocalFree.)

    The new application compatibility architecture in Windows XP carries with it one major benefit (from an OS development perspective): See all those DLLs in your C:\WINDOWS\AppPatch directory? That's where many of the compatibility changes live now. The compatibility workarounds no longer sully the core OS files. (Not all classes of compatibility workarounds can be offloaded to a compatibility DLL, but it's a big help.)
  • The Old New Thing

    When programs grovel into undocumented structures...

    • 58 Comments

    Three examples off the top of my head of the consequences of grovelling into and relying on undocumented structures.

    Defragmenting things that can't be defragmented
    In Windows 2000, there are several categories of things that cannot be defragmented. Directories, exclusively-opened files, the MFT, the pagefile... That didn't stop a certain software company from doing it anyway in their defragmenting software. They went into kernel mode, reverse-engineered NTFS's data structures, and modified them on the fly. Yee-haw cowboy! And then when the NTFS folks added support for defragmenting the MFT to Windows XP, these programs went in, modified NTFS's data structures (which had changed in the meantime), and corrupted your disk.

    Of course there was no mention of this illicit behavior in the documentation. So when the background defragmenter corrupted their disks, Microsoft got the blame.

    Parsing the Explorer view data structures
    A certain software company decided that they wanted to alter the behavior of the Explorer window from a shell extension. Since there is no way to do this (a shell extension is not supposed to mess with the view; the view belongs to the user), they decided to do it themselves anyway.

    From the shell extension, they used an undocumented window message to get a pointer to one of the internal Explorer structures. Then they walked the structure until they found something they recognized. Then they knew, "The thing immediately after the thing that I recognize is the thing that I want."

    Well, the thing that they recognize and the thing that they want happened to be base classes of a multiply-derived class. If you have a class with multiple base classes, the compiler makes no guarantee about the order in which the base classes are laid out in memory. It so happened that they appeared in the order X,Y,Z in all the versions of Windows this software company tested against.

    Except Windows 2000.

    In Windows 2000, the compiler decided that the order should be X,Z,Y. So now they grovelled in, saw the "X" and said "Aha, the next thing must be a Y" but instead they got a Z. And then they crashed your system some time later.
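
    For code that does have the class definitions, the language already provides a layout-independent way to get from an object to one of its bases: a conversion such as static_cast, which the compiler adjusts for whatever order it picked. The sketch below uses hypothetical classes named after the X, Y, Z of the story (View, BaseOffset, and ReadY are likewise invented for illustration). It is not something the program in question could have used, since it never had the definitions in the first place, which is rather the point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classes standing in for the internal bases X, Y, Z.
struct X { int x = 1; };
struct Y { int y = 2; };
struct Z { int z = 3; };

// A multiply-derived class. Where each base lands inside View is up to
// the compiler, so code must not assume "Y comes right after X".
struct View : X, Y, Z {};

// Byte offset of a base subobject within a View, as the compiler laid it out.
template <typename Base>
std::ptrdiff_t BaseOffset(View& v) {
    return reinterpret_cast<char*>(static_cast<Base*>(&v)) -
           reinterpret_cast<char*>(&v);
}

// The supported way to reach Y: let the compiler apply the adjustment.
int ReadY(View& v) {
    return static_cast<Y&>(v).y;
}
```

    ReadY keeps working no matter which order the compiler chooses, whereas any code that hard-codes the value of BaseOffset<Y> is betting on one particular compiler's decision.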

    So I had to create a "fake X,Y" so when the program went looking for X (so it could grab Y), it found the fake one first.

    This took the good part of a week to figure out.

    Reaching up the stack
    A certain software company decided that it was too hard to take the coordinates of the NM_DBLCLK notification and hit-test it against the treeview to see what was double-clicked. So instead, they take the address of the NMHDR structure passed to the notification, add 60 to it, and dereference a DWORD at that address. If it's zero, they do one thing, and if it's nonzero they do some other thing.

    It so happens that the NMHDR is allocated on the stack, so this program is reaching up into the stack and grabbing the value of some local variable (which happens to be two frames up the stack!) and using it to control their logic.

    For Windows 2000, we upgraded the compiler to a version which did a better job of reordering and re-using local variables, and now the program couldn't find the local variable it wanted and stopped working.

    I got tagged to investigate and fix this. I had to create a special NMHDR structure that "looked like" the stack the program wanted to see and pass that special "fake stack".

    I think this one took me two days to figure out.

    I hope you understand why I tend to go ballistic when people recommend relying on undocumented behavior. These weren't hobbyists in their garage seeing what they could do. These were major companies writing commercial software.

    When you upgrade to the next version of Windows and you experience (a) disk corruption, (b) sporadic Explorer crashes, or (c) sporadic loss of functionality in your favorite program, do you blame the program or do you blame Windows?

    If you say, "I blame the program," the first problem is of course figuring out which program. In cases (a) and (b), the offending program isn't obvious.

  • The Old New Thing

    The cult of PowerPoint

    • 3 Comments

    Recent articles on how PowerPoint contributed to failure at NASA reminded me that this is hardly a new discovery. The Department of Defense long ago discovered that PowerPoint is a great way to hide the fact that you don't know what you're talking about.

    I remember reading an article (which I of course can't find now) about a "cult of PowerPoint". Apparently some organizations take the built-in PowerPoint template as gospel. Every presentation must follow the structure established by the PowerPoint template. The first slide is always a title; the second is always a statement of a problem; the third slide is always a summary of previous work, etc.

    Um, people, it's just a template.

    And before the Linux folks get all so gloaty: "Ravens Coach Brian Billick faults last week's [October 2001] defensive breakdown on team's switch to Linux operating system."

  • The Old New Thing

    One in five Swedes steal their Christmas tree

    • 5 Comments
    According to Aftonbladet, "Gathering Stockholm's finest news from overheard conversations on the street corner", En av fem stjäl sin julgran. ("One in five steals their Christmas tree.") This of course comes from a highly scientific online reader poll. The question is, "How do you get your Christmas tree?" and the response categories are (from top to bottom, with percentages as of 2:31pm PST): "Buy" (35.7%), "Cut with permission" (13.2%), "Cut without permission" (14.3%), "Steal from some other source" (6.1%), "Get from some other source" (4.1%) and "Don't have a tree" (26.7%).
  • The Old New Thing

    How do I determine whether I own a critical section if I am not supposed to look at internal fields?

    • 16 Comments

    Seth asks how he can perform proper exception-handling cleanup if he doesn't know whether he needs to clean up a critical section.

    I'm using SEH, and have some __try/__except blocks in which the code enters and leaves critical sections. If an exception occurs, I don't know if I'm currently in the CS or not. Even wrapping the code in __try/__finally wouldn't solve my problems.

    Answer: You know whether you own the CRITICAL_SECTION because you entered it.

    Method 1: Deduce it from your instruction pointer.
    "If I'm at this line of code, then I must be in the critical section."
    __try {
      ...
      EnterCriticalSection(x);
      __try { // if an exception happens here
         ...  // make sure we leave the CS
      } __finally { LeaveCriticalSection(x); }
      ...
    } __except (filter) {
      ...
    }
    

    Note that this technique is robust to nested calls to EnterCriticalSection. If you take the critical section again, then wrap the nested call in its own try/finally.
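
    In code that uses C++ exceptions rather than SEH, the same "my position in the code tells me what I own" idea is usually expressed with a scope guard whose destructor leaves the lock. The sketch below swaps in std::lock_guard for the __try/__finally pair and uses a hypothetical CountingLock that merely counts nesting depth, so that it runs single-threaded on any platform; it is not a CRITICAL_SECTION, and the function names are invented for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a critical section: it only counts nesting depth
// so the example can run single-threaded on any platform.
struct CountingLock {
    int depth = 0;
    void lock()   { ++depth; }
    void unlock() { --depth; }
};

// Enters the lock twice (nested), then throws while holding both.
// Each std::lock_guard leaves the lock in its destructor during unwinding.
void WorkThatThrows(CountingLock& cs) {
    std::lock_guard<CountingLock> outer(cs);
    std::lock_guard<CountingLock> inner(cs);
    throw std::runtime_error("failure inside the critical section");
}

// Returns the lock depth observed after the exception has been handled.
int DepthAfterException(CountingLock& cs) {
    try {
        WorkThatThrows(cs);
    } catch (const std::runtime_error&) {
        // Nothing to release here: the guards already did it.
    }
    return cs.depth;
}
```

    Each nested entry gets its own guard, just as each nested EnterCriticalSection gets its own try/finally, so the depth is back to zero by the time the handler runs.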

    Method 2: Deduce it from local state
    "I'll remember whether I entered the critical section."
    int cEntered = 0;
    __try {
      ...
      EnterCriticalSection(x);
      cEntered++;
      ...
      cEntered--;
      LeaveCriticalSection(x);
      ...
    } __except (filter) {
      while (cEntered--)
        LeaveCriticalSection(x);
      ...
    }
    

    Note that this technique is also robust to nested calls to EnterCriticalSection. If you take the critical section again, increment cEntered another time.

    Method 3: Track it in an object
    Wrap the CRITICAL_SECTION in another object.

    This most closely matches what Seth is doing today.

    class CritSec : CRITICAL_SECTION
    {
    public:
     CritSec() : m_dwDepth(0), m_dwOwner(0)
       { InitializeCriticalSection(this); }
     ~CritSec() { DeleteCriticalSection(this); }
     void Enter() { EnterCriticalSection(this);
        m_dwDepth++;
        m_dwOwner = GetCurrentThreadId(); }
     void Leave() { if (!--m_dwDepth) m_dwOwner=0;
        LeaveCriticalSection(this); }
     bool Owned()
       { return GetCurrentThreadId() == m_dwOwner; }
    private:
      DWORD m_dwOwner;
      DWORD m_dwDepth;
    };
    
    __try {
      assert(!cs.Owned());
      ...
      cs.Enter();
      ...
      cs.Leave();
      ...
    } __except (filter) {
      if (cs.Owned()) cs.Leave();
    }
    

    Notice that this code is not robust to nested critical sections (and correspondingly, Seth's code isn't either). If you take the critical section twice, the exception handler will only leave it once.

    Note also that we assert that the critical section is not initially owned. If it happens to be owned already, then our cleanup code may attempt to leave a critical section that it did not enter. (Imagine if an exception occurs during the first "...".)

    Method 4: Track it in a smarter object
    Wrap the CRITICAL_SECTION in a smarter object.

    Add the following method to the CritSec object above:

     DWORD Depth() { return Owned() ? m_dwDepth : 0; }
    

    Now you can be robust to nested critical sections:

    DWORD dwDepth = cs.Depth();
    __try {
      ...
      cs.Enter();
      ...
      cs.Leave();
      ...
    } __except (filter) {
      while (cs.Depth() > dwDepth)
        cs.Leave();
    }
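
    The unwind-to-a-saved-depth pattern can be tried out in isolation. This is a minimal sketch under loud assumptions: DepthLock is a hypothetical, single-threaded stand-in for CritSec that tracks only the depth and omits the owner check, and a C++ try/catch takes the place of the SEH block.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, single-threaded stand-in for the CritSec class: it tracks
// only the nesting depth and omits the owner-thread check.
class DepthLock {
public:
    void Enter() { ++m_depth; }
    void Leave() { --m_depth; }
    unsigned Depth() const { return m_depth; }
private:
    unsigned m_depth = 0;
};

// Enters the lock `extra` more times, throws while holding them, and
// unwinds back to the depth recorded on entry. Returns the final depth.
unsigned EnterThrowAndUnwind(DepthLock& cs, unsigned extra) {
    const unsigned depthOnEntry = cs.Depth();
    try {
        for (unsigned i = 0; i < extra; ++i) cs.Enter();
        throw std::runtime_error("failure while holding the lock");
    } catch (const std::runtime_error&) {
        while (cs.Depth() > depthOnEntry) cs.Leave();
    }
    return cs.Depth();
}
```

    If the caller already held the lock once on entry, the handler leaves only the extra entries and returns with the depth still at one.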
    

    Note however that I am dubious of the entire endeavor that inspired the original question.

    Cleaning up behind an exception thrown from within a critical section raises the issue of "How do you know what is safe to clean up?" You have a critical section because you are about to destabilize a data structure and you don't want others to see the data structure while it is unstable. But if you take an exception while owning the critical section - well your data structures are unstable at the point of the exception. Merely leaving the critical section will now leave your data structures in an inconsistent state, leading to harder-to-diagnose bugs later on. "How did my count get out of sync?"
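
    Here is a small illustration of that out-of-sync count. Everything in it is hypothetical: DemoLock is a counting stand-in for a critical section (single-threaded, so it runs anywhere), Inventory is an invented data structure whose invariant is that count equals the number of items, and a C++ exception plays the role of the failure.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in lock that only counts depth (single-threaded demo).
struct DemoLock {
    int depth = 0;
    void lock()   { ++depth; }
    void unlock() { --depth; }
};

// Invariant: count == items.size() whenever the lock is not held.
struct Inventory {
    DemoLock lock;
    std::size_t count = 0;
    std::vector<int> items;

    // Two-step update; a failure between the steps leaves count ahead of items.
    void Add(int item, bool failMidway) {
        std::lock_guard<DemoLock> guard(lock);
        ++count;                                    // step 1
        if (failMidway) throw std::runtime_error("failed mid-update");
        items.push_back(item);                      // step 2
    }

    bool Consistent() const { return count == items.size(); }
};

// Returns true if the structure is still consistent after a failed Add.
bool ConsistentAfterFailedAdd(Inventory& inv) {
    try {
        inv.Add(42, /*failMidway=*/true);
    } catch (const std::runtime_error&) {
        // The lock was released during unwinding, but nothing rolled back step 1.
    }
    return inv.Consistent();
}
```

    After the failed Add, the lock has been dutifully released and the next caller walks straight into a structure whose count no longer matches its contents.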

    More rants on exceptions in a future entry.

    Exercise: Why don't we need to use synchronization to protect the uses of m_dwDepth and m_dwOwner?

    Update 2004/Jan/16: Seth pointed out that I got the two branches of the ternary operator backwards in the Depth() function. Fixed.

  • The Old New Thing

    German sounds more and more like "Alles Lookenpeepers" every day

    • 3 Comments
    München wird ausgebootet. Maybe someday Alles Lookenpeepers will become proper German.
  • The Old New Thing

    Sometimes, an app just wants to crash

    • 14 Comments
    I think it was Internet Explorer 5.0, when we discovered that a third-party browser extension had a serious bug, the details of which aren't important. The point was that this bug was so vicious, it crashed IE pretty frequently. Not good. To protect the users from this horrible fate, we marked the object as "bad" so IE wouldn't load it.

    And then we got an angry letter from the company that wrote this browser extension. They demanded that we remove the marking from their object and let IE crash in flames every time the user wanted to surf the web. Why? Because they also wanted us to hook up Windows Error Reporting to detect this crash and put up a dialog that says, "A fix for the problem you experienced is available. Click here for more information," and the "more information" was a redirect to the company's web site (where you could upgrade to version x.y of Program ABC for a special price of only $nnn!). (Actually I forget whether the upgrade was free or not, but the story is funnier if you had to pay for it.)

    In other words, they were crashing on purpose in order to drive upgrade revenue.

    (Astute readers may have noticed an additional irony: If the plug-in crashed IE, then how could the user view the company's web page so they could purchase and download the latest version?)