• #### Why does Add or Remove Programs show a large blank space?

Some people have noticed that certain programs cause the Add or Remove Programs control panel to create an enormous amount of blank space. What's going on?

These are programs that have bad custom uninstall icon registrations.

If you go to the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Uninstall, you'll find a list of programs that have registered for appearing in the Add or Remove Programs control panel. Some of them might have been so kind as to provide a "DisplayIcon" value, thereby saving the control panel the indignity of having to guess at an appropriate icon.

Unfortunately, if they put a bad icon registration in that registry value, the result is a bunch of blank space since the control panel is trying to reserve space for a bogus icon.

The format of the icon registration is a filename, optionally followed by a comma and a decimal number.

C:\full\path\to\icon\file.dll
C:\full\path\to\icon\file.dll,123


Since this is not a command line, quotation marks are not necessary (although they are tolerated). Furthermore, the number can be any value except for -1. Why is -1 forbidden? Because the ExtractIcon function treats the value -1 specially.

If the icon file does not exist in the icon file, or if the icon number is -1, then the icon specification is invalid and the Add or Remove Programs control panel will reserve an odd amount of space for an icon that doesn't exist.

Perhaps the Add or Remove Programs control panel should be more tolerant of invalid icon registrations? Or should it stay the way it is, adhering to the "Don't bend over backwards to fix buggy programs; force the program authors to fix their own bugs" policy that so many of my readers advocate? (Noting furthermore that refusing to accomodate invalid icon registrations makes it look like Add or Remove Programs is the buggy one.)

• #### The effect of SetCursor lasts only until the next SetCursor

Of course the effect of the SetCursor function for a thread lasts only until that thread changes the cursor to something else. Any moron knows that, right?

The tricky part is that the SetCursor may come from an unexpected place.

THe most common place people run into this is when they do something like this:

// Put up the hourglass
HCURSOR hcurPrev = SetCursor(hcurWait);
... do some processing ...
// Restore the original cursor
SetCursor(hcurPrev);


This puts up the hourglass during the processing. But if you pump messages (or if a function you call pumps messages), then the hourglass will go away and return to the normal arrow.

That's because when you pump messages, this opens the gates for messages like WM_NCHITTEST and WM_SETCURSOR. The latter in particular will typically result in the cursor changing, either to a cursor selected by the window itself or to the class cursor if the message makes it all the way to DefWindowProc.

If you want to keep the hourglass up even while pumping messages, you need to let the window know that "If you are asked to set the cursor, please put up an hourglass instead of what you would normally display as the cursor." That window would then have to alter its WM_SETCURSOR handling to take this setting into account.

case WM_SETCURSOR:
if (ForceHourglass()) {
SetCursor(hcurWait);
return TRUE;
}
...


Note that forcing the hourglass is only the tip of the iceberg. Even though the cursor is an hourglass, the window is still active and can receive other message, such as mouse clicks and keypresses. If your program is not ready to receive new input during this phase, you need to detect this case and not go into some recursive state if the user, say, impatiently clicks the "Compute!" button while you are still computing.

• #### Understanding ternary raster operations

It's perfectly logical, which doesn't mean that it's easy to understand.

A ternary raster operation describes how three boolean values should combine to form an output boolean. I was going to write up a description but found that the existing documentation pretty much covers it, so I refer you to that. In particular, look at the table that explains where PSo and DPSoo come from.

Now, one question that has come up is "What is the algorithm for deriving the RPN raster operation description string from the operation index?" The answer is, "You stare at it and use your brain's power of pattern-matching." There is no single "correct" expression of an operation index as RPN. For example, "DPSoo" could have been written as "DSPoo" or "DSoPo" or as the equivalent but pointlessly esoteric "DPoDnPnSaao". All of these expressions have the same boolean truth table.

• #### Boil first, then mash

Last year, British schoolchildren (ages six and seven) went to a farm and were baffled by the long, pointy, orange things. (Those who specialize in plant biology have a special term for these strange objects: carrots.)

Their older siblings don't seem to be faring much better. A three-year study revealed that modern Scottish schoolchildren lacked skills such as understanding clothing care instructions, sewing a button, and knowing that you boil the potatoes before you mash them.

Okay, but let's look at those clothing care instructions. Do you know what it means if you have a square with a solid circle inside and two underlines? What about a triangle with two diagonal stripes? I don't know either. (They mean "Tumble dry, gentle cycle, no heat" and "Use non-chlorine bleach as needed", respectively. More symbols deciphered here.)

And while it's true that I can sew a button, I am unable to sew a straight line or with constant tension.

I'm sure every generation bemoans lost skills. Our ancestors probably shook their heads in disappointment when their grandchildren demonstrated themselves unable to churn butter, wash clothes by hand, or operate a mangle.

On the other hand, you really ought to know what a carrot looks like and that you boil first, then mash. I doubt carrots and mashed potatoes are going to go out of style any time soon.

• #### Why are DLLs unloaded in the "wrong" order?

When a program starts or when a DLL is loaded, the loader builds a dependency tree of all the DLLs referenced by that program/DLL, that DLL's dependents, and so on. It then determines the correct order in which to initialize those DLLs so that no DLL is initialized until after all the DLLs upon which it is dependent have been initialized. (Of course, if you have a circular dependency, then this falls apart. And as you well know, calling the LoadLibrary function or the LoadLibraryEx function from inside a DLL's DLL_PROCESS_ATTACH notification also messes up these dependency computations.)

Similarly, when you unload a DLL or when the program terminates, the de-initialization occurs so that a DLL is de-initialized after all its dependents.

But when you load a DLL manually, crucial information is lost: Namely that the DLL that is calling LoadLibrary depends on the DLL being loaded. Consequently, if A.DLL manually loads B.DLL, then there is no guarantee that A.DLL will be unloaded before B.DLL. This means, for example, that code like the following is not reliable:

HSOMETHING g_hSomething;
typedef HSOMETHING (WINAPI* GETSOMETHING)(void);
typedef void (WINAPI* FREESOMETHING)(HSOMETHING);

GETSOMETHING GetSomething;
FREESOMETHING FreeSomething;

// Ignoring race conditions for expository purposes
{
if (hinstB) {
GetSomething = (GETSOMETHING)
FreeSomething = (FREESOMETHING)
}
}

// Ignoring race conditions for expository purposes
HSOMETHING CacheSomethingFromB()
{
if (!g_hSomething &&
GetSomething && FreeSomething) {
g_hSomething = GetSomething();
}
return g_hSomething;
}

BOOL CALLBACK DllMain(HINSTANCE hinst,
DWORD dwReason, LPVOID lpReserved)
{
switch (dwReason) {
...
case DLL_PROCESS_DETACH:
if (g_hSomething) {
FreeSomething(g_hSomething); // oops
}
break;
}
return TRUE;
}

At the line marked "oops", there is no guarantee that B.DLL is still in memory because B.DLL does not appear in the dependency list of A.DLL, even though there is a runtime-generated dependency caused by the call to LoadLibrary.

Why can't the loader keep track of this dynamic dependency? In other words when A.DLL calls LoadLibrary(TEXT("B.DLL")), why can't the loader automatically say "Okay, now A.DLL depends on B.DLL"?

First of all, because as I've noted before, you can't trust the return address.

Second, even if you could trust the return address, you still can't trust the return address. Consider:

// A.DLL - same as before except for one line
{
HINSTANCE hinstB = MiddleFunction(TEXT("B.DLL"));
if (hinstB) {
GetSomething = (GETSOMETHING)
FreeSomething = (FREESOMETHING)
}
}

// MIDDLE.DLL
HINSTANCE MiddleFunction(LPCTSTR pszDll)
{
}


In this scenario, the load of B.DLL happens not directly from A.DLL, but rather through an intermediary (in this case, MiddleFunction). Even if you could trust the return address, the dependency would be assigned to MIDDLE.DLL instead of A.DLL.

"What sort of crazy person would write a function like MiddleFunction?", you ask. This sort of intermediate function is common in helper/wrapper libraries or to provide additional lifetime management functionality (although it doesn't do it any more, though it used to).

Third, there is the case of the GetModuleHandle function.

void UseBIfAvailable()
{
HINSTANCE hinstB = GetModuleHandle(TEXT("B"));
if (hinstB) {
DOSOMETHING DoSomething = (DOSOMETHING)
if (DoSomething) {
DoSomething();
}
}
}


Should this call to GetModuleHandle create a dependency?

Note also that there are dependencies among DLLs that go beyond just LoadLibrary. For example, if you pass a callback function pointer to another DLL, you have created a reverse dependency.

A final note is that this sort of implicit dependency, as hard as it is to see as written above, is even worse once you toss global destructors into the mix.

class SomethingHolder
{
public:
SomethingHolder() : m_hSomething(NULL);
~SomethingHolder()
{ if (m_hSomething) FreeSomething(m_hSomething); }
HSOMETHING m_hSomething;
};

SomethingHolder g_SomethingHolder;
...


The DLL dependency is now hidden inside the SomethingHolder class, and when A.DLL unloads, g_SomethingHolder's destructor will run and try to talk to B.DLL. Hilarity ensues.

• #### I'd like to register my stolen car, please

On NPR's All Things Considered Robert Siegel reported on the curious status of legally stolen cars in Gaza.

Gaza license plates can be red for official, green for taxis, and white for private vehicles. The lower the number on the red plates, the higher the position of the official. The number 30 designates a truck. All this is pretty conventional stuff for licence plates. But then...

"And then the cars which are written in Arabic the letters M and F, those are the stolen cars. And then there is these plates with MHF, it is a stolen car, but working for the authorities. It is a stolen government car. There is also another kind, but it is the same plate, just the numbers are different. If the number starts with 25, then it is a stolen car, but it is allowed to work as a taxi."

[Raymond is currently on vacation; this message was pre-recorded.]

• #### When is a window visible yet not visible?

Today, a quick puzzler.

Consider the following code fragment:

 ShowWindow(hwnd, SW_SHOWNORMAL);
assert(IsWindowVisible(hwnd));


We just showed the window, certainly it is visible, right? Yet the assertion can fire (even in the absence of multi-threading). Why?

Answer below - stop reading if you want to try to solve it yourself.

Take a look at the IsWindowVisible function.

If the specified window, its parent window, its parent's parent window, and so forth, have the WS_VISIBLE style, the return value is nonzero. Otherwise, the return value is zero.

The WS_VISIBLE style indicates that this window is visible in its parent. But the parent itself might not be visible, in which case IsWindowVisible returns FALSE.

[Raymond is currently on vacation; this message was pre-recorded.]

After our latest round of optimization, the 100ms barrier teased us, just milliseconds away. Profiling the resulting program reveals that 60% of the CPU is spent in operator new. Is there anything we can do about that?

Indeed, we can. Notice that the memory allocation pattern for the strings in our dictionary is quite special: Once a string is allocated into the dictionary, it is never modified or freed while the dictionary is in use. When the dictionary is freed, all the strings are deleted at once. This means that we can design an allocator tailored to this usage pattern.

I don't know whether there is a standard name for this thing, so I'm just going to call it a StringPool. A string pool has the following characteristics:

• Once you allocate a string, you can't modify or free it as long as the pool remains in existence.
• If you destroy the string pool, all the strings in it are destroyed.

We implement it by using the same type of fast allocator that the CLR uses: A single pointer. [25 May 2005: The blog server software corrupts the diagram, sorry.]

 allocated free ↑ p

To allocate memory, we just increment p by the number of bytes we need. If we run out of memory, we just allocate a new block, point p to its start, and carve the memory out of the new block. Destroying the pool consists of freeing all the blocks.

Note also that this memory arrangement has very good locality. Instead of scattering the strings all over the heap, they are collected into one location. Furthermore, they are stored in memory in exactly the order we're going to access them, which means no wasted page faults or cache lines. (Well, you don't know that's the order we're going to access them, but it's true. This is one of those "performance-guided designs" I mentioned a little while ago.)

class StringPool
{
public:
StringPool();
~StringPool();
LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);

private:
struct {
SIZE_T  m_cb;
};
WCHAR alignment;
};
enum { MIN_CBCHUNK = 32000,
MAX_CHARALLOC = 1024*1024 };

private:
WCHAR*  m_pchNext;   // first available byte
WCHAR*  m_pchLimit;  // one past last available byte
DWORD   m_dwGranularity;
}; // colorization fixed 25 May


Each block of memory we allocate begins with a StringPool::HEADER structure, which we use to maintain a linked list of blocks as well as providing enough information for us to free the block when we're done.

Exercise: Why is HEADER a union containing a structure rather than just being a structure? What is the significance of the alignment member?

inline RoundUp(DWORD cb, DWORD units)
{
return ((cb + units - 1) / units) * units;
}

StringPool::StringPool()
: m_pchNext(NULL), m_pchLimit(NULL), m_phdrCur(NULL)
{
SYSTEM_INFO si;
GetSystemInfo(&si);
si.dwAllocationGranularity);
}


At construction, we compute the size of our chunks. We base it on the system allocation granularity, choosing the next multiple of the system allocation granularity that is at least sizeof(HEADER) + MIN_CBCHUNK in size. Since a chunk is supposed to be a comfortably large block of memory, we need to enforce a minimum chunk size to avoid having an enormous number of tiny chunks if we happen to be running on a machine with a very fine allocation granularity.

LPWSTR StringPool::AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd)
{
size_t cch = pszEnd - pszBegin + 1;
LPWSTR psz = m_pchNext;
if (m_pchNext + cch <= m_pchLimit) {
m_pchNext += cch;
lstrcpynW(psz, pszBegin, cch);
return psz;
}

if (cch > MAX_CHARALLOC) goto OOM;
DWORD cbAlloc = RoundUp(cch * sizeof(WCHAR) + sizeof(HEADER),
m_dwGranularity);
BYTE* pbNext = reinterpret_cast<BYTE*>(
if (!pbNext) {
OOM:
throw(OOM);
}

m_pchLimit = reinterpret_cast<WCHAR*>(pbNext + cbAlloc);
phdrCur->m_phdrPrev = m_phdrCur;
phdrCur->m_cb = cbAlloc;
m_phdrCur = phdrCur;
m_pchNext = reinterpret_cast<WCHAR*>(phdrCur + 1);

return AllocString(pszBegin, pszEnd);
}


To allocate a string, we first try to carve it out of the remainder of the current chunk. This nearly always succeeds.

If the string doesn't fit in the chunk, we allocate a new chunk based on our allocation granularity. To avoid integer overflow in the computation of the desired chunk size, we check against a fixed "maximum allocation" and go stright to the out-of-memory handler if it's too big.

Once we have a new chunk, we link it into our list of HEADERs and abandon the old chunk. (Yes, this wastes some memory, but in our usage pattern, it's not much, and trying to squeeze out those last few bytes isn't worth the added complexity.) Once that's done, we try to allocate again; this second time will certainly succeed since we made sure the new chunk was big enough. (And any decent compiler will detect this as a tail recursion and turn it into a "goto".)

There is subtlety here. Notice that we do not update m_pchNext until after we're sure we either satisfied the allocation or allocated a new chunk. This ensures that our member variables are stable at the points where exceptions can be thrown. Writing exception-safe code is hard, and seeing the difference between code that is and isn't exception safe is often quite difficult.

StringPool::~StringPool()
{
while (phdr) {
VirtualFree(hdr.m_phdrPrev, hdr.m_cb, MEM_RELEASE);
phdr = hdr.m_phdrPrev;
}
}


To destroy the string pool, we walk our list of chunks and free each one. Note the importance of copying the HEADER out of the chunk before we free it!

Using this string pool requires only small changes to the rest of our program.

struct DictionaryEntry
{
bool Parse(const WCHAR *begin, const WCHAR *end, StringPool& pool);
// void Destruct() {
//  delete[] m_pszSimp;
//  delete[] m_pszPinyin;
//  delete[] m_pszEnglish;
// }
LPWSTR m_pszSimp;
LPWSTR m_pszPinyin;
LPWSTR m_pszEnglish;
};

class Dictionary
{
public:
Dictionary();
// ~Dictionary();
int Length() { return v.size(); }
private:
vector v;
StringPool m_pool;
};

// Dictionary::~Dictionary()
// {
//  for (vector<DictionaryEntry>::iterator i = v.begin();
//       i != v.end(); i++) {
//   i->Destruct();
//  }
// }


We no longer need to free the strings in the DictionaryEntry manually, so the Destruct method and the Dictionary destructor can go.

bool DictionaryEntry::Parse(
const WCHAR *begin, const WCHAR *end,
StringPool& pool)
{
const WCHAR* pch = std::find(begin, end, L' ');
if (pch >= end) return false;
begin = std::find(pch, end, L'[') + 1;
if (begin >= end) return false;
pch = std::find(begin, end, L']');
if (pch >= end) return false;
m_pszPinyin = pool.AllocString(begin, pch);
begin = std::find(pch, end, L'/') + 1;
if (begin >= end) return false;
for (pch = end; *--pch != L'/'; ) { }
if (begin >= pch) return false;
m_pszEnglish = pool.AllocString(begin, pch);
return true;
}

Dictionary::Dictionary()
{
...
if (de.Parse(buf, buf + cchResult, m_pool)) {
...
}


And finally, we pass our string pool to DictionaryEntry::Parse so it knows where to get memory for its strings from.

With these changes, the dictionary loads in 70ms (or 80ms if you include the time it takes to destroy the dictionary). This is 70% faster than the previous version, and over three times as fast if you include the destruction time.

And now that we've reached our 100ms goal, it's a good time to stop. We've gotten the running time of dictionary loading down from an uncomfortable 2080ms to a peppier 70ms, a nearly 30-fold improvement, by repeatedly profiling and focusing on where the most time is being spent. (I have some more simple tricks that shave a few more milliseconds off the startup time. Perhaps I'll bring them into play if other changes to startup push us over the 100ms boundary. As things stand, the largest CPU consumers are MultiByteToWideChar and lstrcpynW, so that's where I would focus next.)

That's the end of the first stage. The next stage will be displaying the dictionary in an owner-data listview, but you'll have to wait until next month.

Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let's see if we can improve that.

The best way to speed up copying strings is not to copy them in the first place. Using a wstring in our DictionaryEntry structure forces the vector class to copy the string data, when all we really need to copy is the pointer and size information. The actual characters can stay put. C++ has a copy constructor but not a "move" constructor.

Let's use plain string pointers rather than wstring objects. The "copy constructor" for a string pointer is just to copy the pointer—exactly what we want here.

struct DictionaryEntry
{
bool Parse(const WCHAR* begin, const WCHAR* end);
void Destruct() {
delete[] m_pszSimp;
delete[] m_pszPinyin;
delete[] m_pszEnglish;
}
LPWSTR m_pszSimp;
LPWSTR m_pszPinyin;
LPWSTR m_pszEnglish;
};


The DictionaryEntry is no longer a structure of wstrings but rather is just a structure of LPWSTRs, which copy much faster. The cost for this, however, is having to free all the strings manually in the dictionary destructor (which we will see below).

Since we aren't using wstrings any more, we need to allocate the memory for the strings and copy them the old fashioned way.

LPWSTR AllocString(const WCHAR* begin, const WCHAR* end)
{
int cch = end - begin + 1;
LPWSTR psz = new WCHAR[cch];
lstrcpynW(psz, begin, cch);
return psz;
}

bool DictionaryEntry::Parse(
const WCHAR* begin, const WCHAR* end)
{
const WCHAR* pch = std::find(begin, end, L' ');
if (pch >= end) return false;
begin = std::find(pch, end, L'[') + 1;
if (begin >= end) return false;
pch = std::find(begin, end, L']');
if (pch >= end) return false;
m_pszPinyin = AllocString(begin, pch);
begin = std::find(pch, end, L'/') + 1;
if (begin >= end) return false;
for (pch = end; *--pch != L'/'; ) { }
if (begin >= pch) return false;
m_pszEnglish = AllocString(begin, pch);
return true;
}


There isn't a std::rfind function, so I coded up a backwards-search-for-slash loop inline. Exercise: Why don't I have to check that pch hasn't underflowed beyond the beginning of the string?

class Dictionary
{
public:
Dictionary();
~Dictionary();
int Length() { return v.size(); }
const DictionaryEntry& Item(int i) { return v[i]; }
private:
vector<DictionaryEntry> v;
};

Dictionary::Dictionary()
{
...
if (cchResult){
// wstring line(buf, cchResult);
DictionaryEntry de;
if (de.Parse(buf, buf + cchResult)) {
...
}

Dictionary::~Dictionary()
{
for (vector<DictionaryEntry>::iterator i = v.begin();
i != v.end(); i++) {
i->Destruct();
}
}


The last bits of the change are to get rid of the temporary wstring we parsed from, since we don't need it any more, and to free all the strings in the Dictionary's destructor.

This new program clocks in at 120ms (or 290ms if you include the time it takes to destroy the dictionary). Again, we're twice as fast as the previous version, but still haven't quite crossed the 100ms barrier. And the time to destroy the dictionary isn't starting to be a more significant factor. But I have another trick up my sleeve.

[Raymond is currently on vacation; this message was pre-recorded.]