May, 2005

  • The Old New Thing

    Why does Add or Remove Programs show a large blank space?

    • 43 Comments

    Some people have noticed that certain programs cause the Add or Remove Programs control panel to create an enormous amount of blank space. What's going on?

    These are programs that have bad custom uninstall icon registrations.

    If you go to the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Uninstall, you'll find a list of programs that have registered for appearing in the Add or Remove Programs control panel. Some of them might have been so kind as to provide a "DisplayIcon" value, thereby saving the control panel the indignity of having to guess at an appropriate icon.

    Unfortunately, if they put a bad icon registration in that registry value, the result is a bunch of blank space since the control panel is trying to reserve space for a bogus icon.

    The format of the icon registration is a filename, optionally followed by a comma and a decimal number.

    C:\full\path\to\icon\file.dll
    C:\full\path\to\icon\file.dll,123
    

    Since this is not a command line, quotation marks are not necessary (although they are tolerated). Furthermore, the number can be any value except for -1. Why is -1 forbidden? Because the ExtractIcon function treats the value -1 specially.

    If the file does not exist, if the icon does not exist in the file, or if the icon number is -1, then the icon specification is invalid and the Add or Remove Programs control panel will reserve an odd amount of space for an icon that doesn't exist.

    Perhaps the Add or Remove Programs control panel should be more tolerant of invalid icon registrations? Or should it stay the way it is, adhering to the "Don't bend over backwards to fix buggy programs; force the program authors to fix their own bugs" policy that so many of my readers advocate? (Noting furthermore that refusing to accommodate invalid icon registrations makes it look like Add or Remove Programs is the buggy one.)

  • The Old New Thing

    You can't simulate keyboard input with PostMessage

    • 15 Comments

    Some people attempt to simulate keyboard input to an application by posting keyboard input messages, but this is not reliable for many reasons.

    First of all, keyboard input is a more complicated matter than those who imprinted on the English keyboard realize. Languages with accent marks have dead keys, Far East languages have a variety of Input Method Editors, and I have no idea how complex script languages handle input. There's more to typing a character than just pressing a key.

    Second, even if you manage to post the input messages into the target window's queue, that doesn't update the keyboard shift states. When the code behind the window calls the GetKeyState function or the GetAsyncKeyState function, it's going to see the "real" shift state and not the fake state that your posted messages have generated.

    The SendInput function was designed for injecting input into Windows. If you use that function, then at least the shift states will be reported correctly. (I can't help you with the complex input problem, though.)

  • The Old New Thing

    How do I cover the taskbar with a fullscreen window?

    • 44 Comments

    For some reason, people think too hard. If you want to create a fullscreen window that covers the taskbar, just create a fullscreen window and the taskbar will automatically get out of the way. Don't go around hunting for the taskbar and poking it; let it do its thing.

    As always, start with the scratch program and add the following:

    HWND CreateFullscreenWindow(HWND hwnd)
    {
     HMONITOR hmon = MonitorFromWindow(hwnd,
                                       MONITOR_DEFAULTTONEAREST);
     MONITORINFO mi = { sizeof(mi) };
     if (!GetMonitorInfo(hmon, &mi)) return NULL;
     return CreateWindow(TEXT("static"),
           TEXT("something interesting might go here"),
           WS_POPUP | WS_VISIBLE,
           mi.rcMonitor.left,
           mi.rcMonitor.top,
           mi.rcMonitor.right - mi.rcMonitor.left,
           mi.rcMonitor.bottom - mi.rcMonitor.top,
           hwnd, NULL, g_hinst, 0);
    }
    
    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     if (ch == TEXT(' ')) {
      CreateFullscreenWindow(hwnd);
     }
    }
    
        HANDLE_MSG(hwnd, WM_CHAR, OnChar);
    

    Note that this sample program doesn't worry about destroying that fullscreen window or preventing the user from creating more than one. It's just a sample. The point is seeing how the CreateFullscreenWindow function is written.

    We use the MonitorFromWindow function to figure out which monitor we should go fullscreen to. Note that in a multiple monitor system, this might not be the same monitor that the taskbar is on. Fortunately, we don't have to worry about that; the taskbar figures it out.

    I've seen people hunt for the taskbar window and then do a ShowWindow(hwndTaskbar, SW_HIDE) on it. This is nuts for many reasons.

    First is a mental exercise you should always use when evaluating tricks like this: "What if two programs tried this trick?" Now you have two programs both of which think they are in charge of hiding and showing the taskbar, neither of which is coordinating with the other. The result is a mess. One program hides the taskbar, then the other does, then the first decides it's finished so it unhides the taskbar, but the second program wasn't finished yet and gets a visible taskbar when it thought it should be hidden. Things only go downhill from there.

    Second, what if your program crashes before it gets a chance to unhide the taskbar? The taskbar is now permanently hidden and the user has to log off and back on to get their taskbar back. That's not very nice.

    Third, what if there is no taskbar at all? It is common in Terminal Server scenarios to run programs by themselves without Explorer. In this configuration, there is no Explorer, no taskbar. Or maybe you're running on a future version of Windows that doesn't have a taskbar, it having been replaced by some other mechanism. What will your program do now?

    Don't do any of this messing with the taskbar. Just create your fullscreen window and let the taskbar do its thing automatically.

  • The Old New Thing

    Another dead computer: My personal laptop

    • 33 Comments

    I'm kind of surprised at how much people reacted to my previous dead computer story. I guess there's an audience for stories about dead computers.

    Today's dead computer is my Sony Vaio PCG-Z505LE laptop, with a 600MHz processor and 192MB of RAM. Certainly a big step up from that 486/50 with 12MB of RAM.

    Laptop computers have a comparatively short lifetime. (Hardware vendors must love them.) I've learned that the best value comes from buying used laptops. You have to accept being a bit behind the curve, but the way I use my laptop, it needs to do only a small number of things:

    • surf the web,
    • read e-mail,
    • use Remote Desktop Connection to access my desktop machine,
    • download pictures from my digital camera, and
    • compile the occasional program.

    Only that last operation requires a hefty processor, and I do it so rarely that it doesn't bother me that it's kind of slow. (I just run the command line version of the compiler, so that at least takes the IDE overhead out of the picture.)

    I bought this laptop two years ago, used, and it ran just fine until a couple months ago when the internal power supply burnt out. I was ready to abandon the model line and give away the accessories I had bought, including a $200+ double-capacity battery.

    Allow me to digress on laptop batteries. Observe that batteries for old-model laptops cost almost as much as the laptops themselves. That's because the battery is the only real consumable in a laptop computer. The other components will run practically indefinitely if you don't drop them or douse them in soda, but batteries just plain wear out. That's where the money is.

    This means that many ads for used laptops will mention "needs new battery" at the end. And those are the ones I sought out. Because I have a working battery! Most prospective buyers would be turned off by a dead battery, but that didn't bother me one bit.

    The replacement laptop arrived a few days ago, and it runs great. I wiped the drive and reinstalled Windows XP from scratch. (Challenging because the laptop doesn't come with a bootable CD-ROM drive. I had to use floppies!) I may install a handful of programs but that's all. I don't like installing software on my computer. The more programs you install, the more likely there's going to be a conflict somewhere.

    The old laptop has already started being scavenged for parts. A friend of mine needed a replacement laptop hard drive, so I gave him my old one. The battery and power brick can of course be used by the new laptop. The memory from the old Vaio is no use, since the Vaio has only one memory expansion slot. The other parts of the old laptop aren't much use for anything aside from spares. Perhaps I should put the old laptop on concrete blocks on my front lawn.

    Next time (if there is a next time), the story of the dead AlphaServer.

  • The Old New Thing

    The Microsoft corporate network: 1.7 times worse than hell

    • 17 Comments

    Today I'm going to tell a story from 1996. Why? Because I can.

    One of the tests performed by Windows Hardware Quality Labs (WHQL) was the NCT packet stress test which had the nickname "Hell". The purpose of the test was to flood a network card with an insane number of packets, in order to see how it handled extreme conditions. It uncovered packet-dropping bugs, timing problems, all sorts of great stuff. Network card vendors used it to determine what size internal hardware buffers should be in order to cover "all reasonable network traffic scenarios".

    It so happened that at the time this test had currency (1996 era), the traffic on the Microsoft corporate network was approximately 1.7 times worse than the NCT packet stress test. A card could pass the Hell test with flying colors, yet drop 90% of its packets when installed on a computer at Microsoft because the card simply couldn't keep up with the traffic.

    The open secret among network card vendors was, "If you want your card to work with Windows, submit one card to WHQL and send another to a developer on the Windows team."

    (This rule applied to hardware other than network cards. I was "gifted" a sound card from a major manufacturer and installed it on my main machine. It wasn't long before I found and fixed a crashing bug in their driver.)

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    Loading the dictionary, part 6: Taking advantage of our memory allocation pattern

    • 25 Comments

    After our latest round of optimization, the 100ms barrier teased us, just milliseconds away. Profiling the resulting program reveals that 60% of the CPU is spent in operator new. Is there anything we can do about that?

    Indeed, we can. Notice that the memory allocation pattern for the strings in our dictionary is quite special: Once a string is allocated into the dictionary, it is never modified or freed while the dictionary is in use. When the dictionary is freed, all the strings are deleted at once. This means that we can design an allocator tailored to this usage pattern.

    I don't know whether there is a standard name for this thing, so I'm just going to call it a StringPool. A string pool has the following characteristics:

    • Once you allocate a string, you can't modify or free it as long as the pool remains in existence.
    • If you destroy the string pool, all the strings in it are destroyed.

    We implement it by using the same type of fast allocator that the CLR uses: A single pointer.

     +-------------------+-------------------+
     |     allocated     |       free        |
     +-------------------+-------------------+
                         ^
                         p

    To allocate memory, we just increment p by the number of bytes we need. If we run out of memory, we just allocate a new block, point p to its start, and carve the memory out of the new block. Destroying the pool consists of freeing all the blocks.

    Note also that this memory arrangement has very good locality. Instead of scattering the strings all over the heap, they are collected into one location. Furthermore, they are stored in memory in exactly the order we're going to access them, which means no wasted page faults or cache lines. (Well, you don't know that's the order we're going to access them, but it's true. This is one of those "performance-guided designs" I mentioned a little while ago.)

    class StringPool
    {
    public:
     StringPool();
     ~StringPool();
     LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);
    
    private:
     union HEADER {
      struct {
       HEADER* m_phdrPrev;
       SIZE_T  m_cb;
      };
      WCHAR alignment;
     };
     enum { MIN_CBCHUNK = 32000,
            MAX_CHARALLOC = 1024*1024 };
    
    private:
     WCHAR*  m_pchNext;   // first available byte
     WCHAR*  m_pchLimit;  // one past last available byte
     HEADER* m_phdrCur;   // current block
     DWORD   m_dwGranularity;
     };
    

    Each block of memory we allocate begins with a StringPool::HEADER structure, which we use to maintain a linked list of blocks as well as providing enough information for us to free the block when we're done.

    Exercise: Why is HEADER a union containing a structure rather than just being a structure? What is the significance of the alignment member?

     inline DWORD RoundUp(DWORD cb, DWORD units)
    {
        return ((cb + units - 1) / units) * units;
    }
    
    StringPool::StringPool()
     : m_pchNext(NULL), m_pchLimit(NULL), m_phdrCur(NULL)
    {
     SYSTEM_INFO si;
     GetSystemInfo(&si);
     m_dwGranularity = RoundUp(sizeof(HEADER) + MIN_CBCHUNK,
                               si.dwAllocationGranularity);
    }
    

    At construction, we compute the size of our chunks. We base it on the system allocation granularity, choosing the next multiple of the system allocation granularity that is at least sizeof(HEADER) + MIN_CBCHUNK in size. Since a chunk is supposed to be a comfortably large block of memory, we need to enforce a minimum chunk size to avoid having an enormous number of tiny chunks if we happen to be running on a machine with a very fine allocation granularity.

    LPWSTR StringPool::AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd)
    {
     size_t cch = pszEnd - pszBegin + 1;
     LPWSTR psz = m_pchNext;
     if (m_pchNext + cch <= m_pchLimit) {
      m_pchNext += cch;
      lstrcpynW(psz, pszBegin, cch);
      return psz;
     }
    
     if (cch > MAX_CHARALLOC) goto OOM;
      DWORD cbAlloc = RoundUp(cch * sizeof(WCHAR) + sizeof(HEADER),
                              m_dwGranularity);
     BYTE* pbNext = reinterpret_cast<BYTE*>(
                      VirtualAlloc(NULL, cbAlloc, MEM_COMMIT, PAGE_READWRITE));
     if (!pbNext) {
    OOM:
      static std::bad_alloc OOM;
      throw(OOM);
     }
    
     m_pchLimit = reinterpret_cast<WCHAR*>(pbNext + cbAlloc);
     HEADER* phdrCur = reinterpret_cast<HEADER*>(pbNext);
     phdrCur->m_phdrPrev = m_phdrCur;
     phdrCur->m_cb = cbAlloc;
     m_phdrCur = phdrCur;
     m_pchNext = reinterpret_cast<WCHAR*>(phdrCur + 1);
    
     return AllocString(pszBegin, pszEnd);
    }
    

    To allocate a string, we first try to carve it out of the remainder of the current chunk. This nearly always succeeds.

    If the string doesn't fit in the chunk, we allocate a new chunk based on our allocation granularity. To avoid integer overflow in the computation of the desired chunk size, we check against a fixed "maximum allocation" and go straight to the out-of-memory handler if it's too big.

    Once we have a new chunk, we link it into our list of HEADERs and abandon the old chunk. (Yes, this wastes some memory, but in our usage pattern, it's not much, and trying to squeeze out those last few bytes isn't worth the added complexity.) Once that's done, we try to allocate again; this second time will certainly succeed since we made sure the new chunk was big enough. (And any decent compiler will detect this as a tail recursion and turn it into a "goto".)

    There is subtlety here. Notice that we do not update m_pchNext until after we're sure we either satisfied the allocation or allocated a new chunk. This ensures that our member variables are stable at the points where exceptions can be thrown. Writing exception-safe code is hard, and seeing the difference between code that is and isn't exception safe is often quite difficult.

    StringPool::~StringPool()
    {
     HEADER* phdr = m_phdrCur;
     while (phdr) {
       HEADER hdr = *phdr;
       VirtualFree(phdr, 0, MEM_RELEASE);
       phdr = hdr.m_phdrPrev;
     }
    }
    

    To destroy the string pool, we walk our list of chunks and free each one. Note the importance of copying the HEADER out of the chunk before we free it!

    Using this string pool requires only small changes to the rest of our program.

    struct DictionaryEntry
    {
     bool Parse(const WCHAR *begin, const WCHAR *end, StringPool& pool);
     // void Destruct() {
     //  delete[] m_pszTrad;
     //  delete[] m_pszSimp;
     //  delete[] m_pszPinyin;
     //  delete[] m_pszEnglish;
     // }
     LPWSTR m_pszTrad;
     LPWSTR m_pszSimp;
     LPWSTR m_pszPinyin;
     LPWSTR m_pszEnglish;
    };
    
    class Dictionary
    {
    public:
     Dictionary();
     // ~Dictionary();
     int Length() { return v.size(); }
    private:
      vector<DictionaryEntry> v;
     StringPool m_pool;
    };
    
    // Dictionary::~Dictionary()
    // {
    //  for (vector<DictionaryEntry>::iterator i = v.begin();
    //       i != v.end(); i++) {
    //   i->Destruct();
    //  }
    // }
    

    We no longer need to free the strings in the DictionaryEntry manually, so the Destruct method and the Dictionary destructor can go.

    bool DictionaryEntry::Parse(
       const WCHAR *begin, const WCHAR *end,
       StringPool& pool)
    {
     const WCHAR* pch = std::find(begin, end, L' ');
     if (pch >= end) return false;
     m_pszTrad = pool.AllocString(begin, pch);
     begin = std::find(pch, end, L'[') + 1;
     if (begin >= end) return false;
     pch = std::find(begin, end, L']');
     if (pch >= end) return false;
     m_pszPinyin = pool.AllocString(begin, pch);
     begin = std::find(pch, end, L'/') + 1;
     if (begin >= end) return false;
     for (pch = end; *--pch != L'/'; ) { }
     if (begin >= pch) return false;
     m_pszEnglish = pool.AllocString(begin, pch);
     return true;
    }
    
    Dictionary::Dictionary()
    {
     ...
        if (de.Parse(buf, buf + cchResult, m_pool)) {
     ...
    }
    

    And finally, we pass our string pool to DictionaryEntry::Parse so it knows where to get memory for its strings from.

    With these changes, the dictionary loads in 70ms (or 80ms if you include the time it takes to destroy the dictionary). This is 70% faster than the previous version, and over three times as fast if you include the destruction time.

    And now that we've reached our 100ms goal, it's a good time to stop. We've gotten the running time of dictionary loading down from an uncomfortable 2080ms to a peppier 70ms, a nearly 30-fold improvement, by repeatedly profiling and focusing on where the most time is being spent. (I have some more simple tricks that shave a few more milliseconds off the startup time. Perhaps I'll bring them into play if other changes to startup push us over the 100ms boundary. As things stand, the largest CPU consumers are MultiByteToWideChar and lstrcpynW, so that's where I would focus next.)

    That's the end of the first stage. The next stage will be displaying the dictionary in an owner-data listview, but you'll have to wait until next month.

  • The Old New Thing

    Loading the dictionary, part 1: Starting point

    • 33 Comments

    The first thing we'll need to do in our little dictionary program is to load the dictionary into memory. The dictionary file is a plain text file, each line of which is of the form

    Chinese [pinyin] /English 1/English 2/.../
    舉例 [ju3 li4] /to give an example/
    

    Since it was the Big5 dictionary we downloaded, the Chinese characters are in Big5 format, known to Windows as code page 950. Our program will be Unicode, so we'll have to convert it as we load the dictionary. Yes, I could've used the Unicode version of the dictionary, but it so happens that when I set out to write this program, there was no Unicode version available. Fortunately, this oversight opened up the opportunity to illustrate some other programming decisions and techniques.

    The first stage in our series of exercises will be loading the dictionary into memory.

    #define UNICODE
    #define _UNICODE
    #include <windows.h>
    #include <string>
    #include <fstream>
    #include <iostream> // for cin/cout
    #include <vector>
    
    using std::string;
    using std::wstring;
    using std::vector;
    
    struct DictionaryEntry
    {
     bool Parse(const wstring& line);
     wstring trad;
     wstring simp;
     wstring pinyin;
     wstring english;
    };
    
    bool DictionaryEntry::Parse(const wstring& line)
    {
        wstring::size_type start = 0;
        wstring::size_type end = line.find(L' ', start);
        if (end == wstring::npos) return false;
        trad.assign(line, start, end - start);
        start = line.find(L'[', end);
        if (start == wstring::npos) return false;
        end = line.find(L']', ++start);
        if (end == wstring::npos) return false;
        pinyin.assign(line, start, end - start);
        start = line.find(L'/', end);
        if (start == wstring::npos) return false;
        start++;
        end = line.rfind(L'/');
        if (end == wstring::npos) return false;
        if (end <= start) return false;
        english.assign(line, start, end-start);
        return true;
    }
    
    class Dictionary
    {
    public:
     Dictionary();
     int Length() { return v.size(); }
     const DictionaryEntry& Item(int i) { return v[i]; }
    private:
     vector<DictionaryEntry> v;
    };
    
    Dictionary::Dictionary()
    {
     std::wifstream src;
     src.imbue(std::locale(".950"));
     src.open("cedict.b5");
     wstring s;
     while (getline(src, s)) {
      if (s.length() > 0 && s[0] != L'#') {
       DictionaryEntry de;
       if (de.Parse(s)) {
        v.push_back(de);
       }
      }
     }
    }
    
    int __cdecl main(int argc, const char* argv[])
    {
     DWORD dw = GetTickCount();
     {
      Dictionary dict;
      std::cout << dict.Length() << std::endl;
      std::cout << GetTickCount() - dw << std::endl;
     }
     std::cout << GetTickCount() - dw << std::endl;
     return 0;
    }
    

    Our dictionary is just a list of words with their English definitions. The Chinese words are written in three forms (traditional Chinese, simplified Chinese, and Pinyin romanization). For those who are curious, there are two writing systems for the Mandarin Chinese language and two phonetic systems. Which one a particular Mandarin-speaking population follows depends on whether they fell under the influence of China's language reform of 1956. Traditional Chinese characters and the Bopomo phonetic system (also called Bopomofo) are used on Taiwan; simplified Chinese characters and the Pinyin system are used in China. Converting Pinyin to Bopomo isn't interesting, so I've removed that part from the program I'm presenting here.

    (The schism in the spelling of the English language follows a similar pattern. Under the leadership of Noah Webster, the United States underwent its own spelling reform, but countries which were under the influence of the British crown retained the traditional spellings. Spelling reform continues in other languages even today, and the subject is almost always highly contentious, with traditionalists and reformists pitted against each other in a battle over a language's—and by proxy, a culture's—identity.)

    The program itself is fairly straightforward. It creates a Unicode file stream wifstream and "imbues" it with code page 950 (Big5). This instructs the runtime to interpret the bytes of the file in the specified code page. We read strings out of the file, ignore the comments, and parse the rest, appending them to our vector of dictionary entries.

    Parsing the line consists of finding the spaces, brackets, and slashes, and splitting the line into the traditional Chinese, Pinyin, and English components. (We'll deal with simplified Chinese later.)

    When I run this program on my machine, the dictionary loads in 2080ms (or 2140ms if you include the time to run the destructor). This is an unacceptably long startup time, so the first order of business is to make startup faster. That will be the focus of this stage.

    Notice that as a sanity check, I print the total number of words in the dictionary. The number should match the number of lines in the cedict.b5 file (minus the one comment line). If not, then I know that something went wrong. This is an important sanity check: You might make a performance optimization that looks great when you run it past a stopwatch, only to discover that your "optimization" actually introduced a bug. For example, one of my attempted optimizations of this program resulted in a phenomenal tenfold speedup, but only because of a bug that caused it to think it was finished when it had in reality processed only 10% of the dictionary!

    As my colleague Rico Mariani is fond of saying, "It's easy to make it fast if it doesn't have to work!"

  • The Old New Thing

    Developing a Chinese/English dictionary: Introduction

    • 28 Comments

    The other day, one of my colleagues mentioned that his English name "Ben" means "stupid" in Chinese: 笨/bèn/ㄅㄣˋ. (His wife is Chinese; that's why he knows this in the first place.) Knowing that the Chinese language is rich in homophones, I fired up my Chinese/English dictionary program to see if we could find anything better. (Unfortunately, the best I could come up with was 賁/贲/bēn/ㄅㄣ, which means "energetic".)

    Ben seemed to take his appellative fate in stride; he seemed much more interested in the little dictionary program I had written. So, as an experiment, instead of developing tiny samples that illustrate a very focused topic, I'll develop a somewhat larger-scale program (though still small by modern standards) so you can see how multiple techniques come together. The task will take many stages, some of which may take only a day or two, others of which can take much longer. If a particular stage is more than two or three days long, I'll break it up with other articles, and I'll try to leave some breathing room between stages.

    Along the way, we'll learn about owner-data (also known as "virtual") listviews, listview custom-draw, designing for accessibility, window subclassing, laying out child windows, code pages, hotkeys, and optimization.

    If you're going to play along at home, beware that you're going to have to install Chinese fonts to see the program as it evolves, and when you're done, you'll have a Chinese/English dictionary program, which probably won't be very useful unless you're actually studying Chinese...

    If you're not into Win32 programming at all, then, well, my first comment to you is, "So what are you doing here?" And my second comment is, "I guess you're going to be bored for a while." You may want to go read another blog during those boring stretches, or just turn off the computer and go outside for some fresh air and exercise.

    Those who have decided to play along at home will need the following: a copy of the CEDICT Chinese-English dictionary in Big5 format (note: Big5 format) and the Chinese Encoding Converter source code (all we need is the file hcutf8.txt). We'll start digging in next time.

  • The Old New Thing

    Why are DLLs unloaded in the "wrong" order?

    • 15 Comments

    When a program starts or when a DLL is loaded, the loader builds a dependency tree of all the DLLs referenced by that program/DLL, that DLL's dependents, and so on. It then determines the correct order in which to initialize those DLLs so that no DLL is initialized until after all the DLLs upon which it is dependent have been initialized. (Of course, if you have a circular dependency, then this falls apart. And as you well know, calling the LoadLibrary function or the LoadLibraryEx function from inside a DLL's DLL_PROCESS_ATTACH notification also messes up these dependency computations.)

    Similarly, when you unload a DLL or when the program terminates, the de-initialization occurs so that a DLL is de-initialized after all its dependents.

    But when you load a DLL manually, crucial information is lost: Namely that the DLL that is calling LoadLibrary depends on the DLL being loaded. Consequently, if A.DLL manually loads B.DLL, then there is no guarantee that A.DLL will be unloaded before B.DLL. This means, for example, that code like the following is not reliable:

    HSOMETHING g_hSomething;
    typedef HSOMETHING (WINAPI* GETSOMETHING)(void);
    typedef void (WINAPI* FREESOMETHING)(HSOMETHING);
    
    GETSOMETHING GetSomething;
    FREESOMETHING FreeSomething;
    
    // Ignoring race conditions for expository purposes
    void LoadB()
    {
     HINSTANCE hinstB = LoadLibrary(TEXT("B.DLL"));
     if (hinstB) {
      GetSomething = (GETSOMETHING)
              GetProcAddress(hinstB, "GetSomething");
      FreeSomething = (FREESOMETHING)
              GetProcAddress(hinstB, "FreeSomething");
     }
    }
    
    // Ignoring race conditions for expository purposes
    HSOMETHING CacheSomethingFromB()
    {
     if (!g_hSomething &&
         GetSomething && FreeSomething) {
      g_hSomething = GetSomething();
     }
     return g_hSomething;
    }
    
    BOOL CALLBACK DllMain(HINSTANCE hinst,
          DWORD dwReason, LPVOID lpReserved)
    {
     switch (dwReason) {
     ...
     case DLL_PROCESS_DETACH:
      if (g_hSomething) {
       FreeSomething(g_hSomething); // oops
      }
      break;
     }
     return TRUE;
    }

    At the line marked "oops", there is no guarantee that B.DLL is still in memory because B.DLL does not appear in the dependency list of A.DLL, even though there is a runtime-generated dependency caused by the call to LoadLibrary.

    Why can't the loader keep track of this dynamic dependency? In other words when A.DLL calls LoadLibrary(TEXT("B.DLL")), why can't the loader automatically say "Okay, now A.DLL depends on B.DLL"?

    First of all, because as I've noted before, you can't trust the return address.

    Second, even if you could trust the return address, you still can't trust the return address. Consider:

    // A.DLL - same as before except for one line
    void LoadB()
    {
     HINSTANCE hinstB = MiddleFunction(TEXT("B.DLL"));
     if (hinstB) {
      GetSomething = (GETSOMETHING)
              GetProcAddress(hinstB, "GetSomething");
      FreeSomething = (FREESOMETHING)
              GetProcAddress(hinstB, "FreeSomething");
     }
    }
    
    // MIDDLE.DLL
    HINSTANCE MiddleFunction(LPCTSTR pszDll)
    {
     return LoadLibrary(pszDll);
    }
    

    In this scenario, the load of B.DLL happens not directly from A.DLL, but rather through an intermediary (in this case, MiddleFunction). Even if you could trust the return address, the dependency would be assigned to MIDDLE.DLL instead of A.DLL.

    "What sort of crazy person would write a function like MiddleFunction?", you ask. This sort of intermediate function is common in helper/wrapper libraries or in functions that provide additional lifetime management (although the function I have in mind no longer does so, it used to).

    Third, there is the case of the GetModuleHandle function.

    void UseBIfAvailable()
    {
     HINSTANCE hinstB = GetModuleHandle(TEXT("B"));
     if (hinstB) {
      DOSOMETHING DoSomething = (DOSOMETHING)
              GetProcAddress(hinstB, "DoSomething");
      if (DoSomething) {
       DoSomething();
      }
     }
    }
    

    Should this call to GetModuleHandle create a dependency?

    Note also that there are dependencies among DLLs that go beyond just LoadLibrary. For example, if you pass a callback function pointer to another DLL, you have created a reverse dependency.

    A final note is that this sort of implicit dependency, as hard as it is to see as written above, is even worse once you toss global destructors into the mix.

    class SomethingHolder
    {
    public:
      SomethingHolder() : m_hSomething(NULL) { }
     ~SomethingHolder()
      { if (m_hSomething) FreeSomething(m_hSomething); }
     HSOMETHING m_hSomething;
    };
    
    SomethingHolder g_SomethingHolder;
    ...
    

    The DLL dependency is now hidden inside the SomethingHolder class, and when A.DLL unloads, g_SomethingHolder's destructor will run and try to talk to B.DLL. Hilarity ensues.

  • The Old New Thing

    When people ask for security holes as features: Stealing passwords

    • 33 Comments

    Sometimes people ask for features that are such blatant security holes I don't know what they were thinking.

    Is there a way to get the current user's password? I have a program that does some stuff, then reboots the system, and I want to have the current user's password so I can log that user back in when I'm done, then my program can resume its operation.

    (Sometimes they don't bother explaining why they need the user's password; they just ask for it.)

    Imagine the fantastic security hole if this were possible. Anybody could write a program that steals your password without even having to trick you into typing it. They would just call the imaginary GetPasswordOfCurrentUser function and bingo! they have your password.

    For another angle on credential-stealing, read Larry Osterman's discussion of why delegation doesn't work over the network.

    Even if you didn't want the password itself but merely some sort of "cookie" that could be used to log the user on later, you still have a security hole. Let's call this imaginary function GetPasswordCookieOfCurrentUser; it returns a "cookie" that can be used to log the user on instead of using their password.

    This is just a thinly-disguised GetPasswordOfCurrentUser because that "cookie" is equivalent to a password. Log on with the cookie and you are now that person.
