# May, 2005

• #### How do I cover the taskbar with a fullscreen window?

For some reason, people think too hard. If you want to create a fullscreen window that covers the taskbar, just create a fullscreen window and the taskbar will automatically get out of the way. Don't go around hunting for the taskbar and poking it; let it do its thing.

HWND CreateFullscreenWindow(HWND hwnd)
{
HMONITOR hmon = MonitorFromWindow(hwnd,
MONITOR_DEFAULTTONEAREST);
MONITORINFO mi = { sizeof(mi) };
if (!GetMonitorInfo(hmon, &mi)) return NULL;
return CreateWindow(TEXT("static"),
TEXT("something interesting might go here"),
WS_POPUP | WS_VISIBLE,
mi.rcMonitor.left,
mi.rcMonitor.top,
mi.rcMonitor.right - mi.rcMonitor.left,
mi.rcMonitor.bottom - mi.rcMonitor.top,
hwnd, NULL, g_hinst, 0);
}

void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
{
if (ch == TEXT(' ')) {
CreateFullscreenWindow(hwnd);
}
}

HANDLE_MSG(hwnd, WM_CHAR, OnChar);


Note that this sample program doesn't worry about destroying that fullscreen window or preventing the user from creating more than one. It's just a sample. The point is seeing how the CreateFullScreenWindow function is written.

We use the MonitorFromWindow function to figure out which monitor we should go fullscreen to. Note that in a multiple monitor system, this might not be the same monitor that the taskbar is on. Fortunately, we don't have to worry about that; the taskbar figures it out.

I've seen people hunt for the taskbar window and then do a ShowWindow(hwndTaskbar, SW_HIDE) on it. This is nuts for many reasons.

First is a mental exercise you should always use when evaluating tricks like this: "What if two programs tried this trick?" Now you have two programs both of which think they are in charge of hiding and showing the taskbar, neither of which is coordinating with the other. The result is a mess. One program hides the taskbar, then the other does, then the first decides it's finished so it unhides the taskbar, but the second program wasn't finished yet and gets a visible taskbar when it thought it should be hidden. Things only go downhill from there.

Second, what if your program crashes before it gets a chance to unhide the taskbar? The taskbar is now permanently hidden and the user has to log off and back on to get their taskbar back. That's not very nice.

Third, what if there is no taskbar at all? It is common in Terminal Server scenarios to run programs by themselves without Explorer [archived]. In this configuration, there is no Explorer, no taskbar. Or maybe you're running on a future version of Windows that doesn't have a taskbar, it having been replaced by some other mechanism. What will your program do now?

Don't do any of this messing with the taskbar. Just create your fullscreen window and let the taskbar do its thing automatically.

• #### Why does Add or Remove Programs show a large blank space?

Some people have noticed that certain programs cause the Add or Remove Programs control panel to create an enormous amount of blank space. What's going on?

These are programs that have bad custom uninstall icon registrations.

If you go to the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Uninstall, you'll find a list of programs that have registered for appearing in the Add or Remove Programs control panel. Some of them might have been so kind as to provide a "DisplayIcon" value, thereby saving the control panel the indignity of having to guess at an appropriate icon.

Unfortunately, if they put a bad icon registration in that registry value, the result is a bunch of blank space since the control panel is trying to reserve space for a bogus icon.

The format of the icon registration is a filename, optionally followed by a comma and a decimal number.

C:\full\path\to\icon\file.dll
C:\full\path\to\icon\file.dll,123


Since this is not a command line, quotation marks are not necessary (although they are tolerated). Furthermore, the number can be any value except for -1. Why is -1 forbidden? Because the ExtractIcon function treats the value -1 specially.

If the icon file does not exist in the icon file, or if the icon number is -1, then the icon specification is invalid and the Add or Remove Programs control panel will reserve an odd amount of space for an icon that doesn't exist.

Perhaps the Add or Remove Programs control panel should be more tolerant of invalid icon registrations? Or should it stay the way it is, adhering to the "Don't bend over backwards to fix buggy programs; force the program authors to fix their own bugs" policy that so many of my readers advocate? (Noting furthermore that refusing to accomodate invalid icon registrations makes it look like Add or Remove Programs is the buggy one.)

• #### Boil first, then mash

Last year, British schoolchildren (ages six and seven) went to a farm and were baffled by the long, pointy, orange things. (Those who specialize in plant biology have a special term for these strange objects: carrots.)

Their older siblings don't seem to be faring much better. A three-year study revealed that modern Scottish schoolchildren lacked skills such as understanding clothing care instructions, sewing a button, and knowing that you boil the potatoes before you mash them.

Okay, but let's look at those clothing care instructions. Do you know what it means if you have a square with a solid circle inside and two underlines? What about a triangle with two diagonal stripes? I don't know either. (They mean "Tumble dry, gentle cycle, no heat" and "Use non-chlorine bleach as needed", respectively. More symbols deciphered here.)

And while it's true that I can sew a button, I am unable to sew a straight line or with constant tension.

I'm sure every generation bemoans lost skills. Our ancestors probably shook their heads in disappointment when their grandchildren demonstrated themselves unable to churn butter, wash clothes by hand, or operate a mangle.

On the other hand, you really ought to know what a carrot looks like and that you boil first, then mash. I doubt carrots and mashed potatoes are going to go out of style any time soon.

• #### Another dead computer: My personal laptop

I'm kind of surprised at how much people reacted to my previous dead computer story. I guess there's an audience for stories about dead computers.

Today's dead computer is my Sony Vaio PCG-Z505LE laptop, with a 600MHz processor and 192MB of RAM. Certainly a big step up from that 486/50 with 12MB of RAM.

Laptop computers have a comparatively short lifetime. (Hardware vendors must love them.) I've learned that the best value comes from buying used laptops. You have to accept being a bit behind the curve, but the way I use my laptop, it needs to do only a small number of things:

• surf the web,
• use Remote Desktop Connection to access my desktop machine,
• compile the occasional program.

Only that last operation requires a hefty processor, and I do it so rarely that it doesn't bother me that it's kind of slow. (I just run the command line version of the compiler, so that at least takes the IDE overhead out of the picture.)

I bought this laptop two years ago, used, and it ran just fine until a couple months ago when the internal power supply burnt out. I was ready to abandon the model line and give away the accessories I had bought, including a \$200+ double-capacity battery.

Allow me to digress on laptop batteries. Observe that batteries for old-model laptops cost almost as much as the laptops themselves. That's because the battery is the only real consumable in a laptop computer. The other components will run practically indefinitely if you don't drop them or douse them in soda, but batteries just plain wear out. That's where the money is.

This means that many ads for used laptops will mention "needs new battery" at the end. And those are the ones I sought out. Because I have a working battery! Most prospective buyers would be turned off by a dead battery, but that didn't bother me one bit.

The replacement laptop arrived a few days ago, and it runs great. I wiped the drive and reinstalled Windows XP from scratch. (Challenging because the laptop doesn't come with a bootable CD-ROM drive. I had to use floppies!) I may install a handful of programs but that's all. I don't like installing software on my computer. The more programs you install, the more likely there's going to be a conflict somewhere.

The old laptop has already started being scavenged for parts. A friend of mine needed a replacement laptop hard drive, so I gave him my old one. The battery and power brick can of course be used by the new laptop. The memory from the old Vaio is no use, since the Vaio has only one memory expansion slot. The other parts of the old laptop aren't much use for anything aside from spares. Perhaps I should put the old laptop on concrete blocks on my front lawn.

Next time (if there is a next time), the story of the dead AlphaServer.

• #### When people ask for security holes as features: Stealing passwords

Sometimes people ask for features that are such blatant security holes I don't know what they were thinking.

Is there a way to get the current user's password? I have a program that does some stuff, then reboots the system, and I want to have the current user's password so I can log that user back in when I'm done, then my program can resume its operation.

(Sometimes they don't bother explaining why they need the user's password; they just ask for it.)

Imagine the fantastic security hole if this were possible. Anybody could write a program that steals your password without even having to trick you into typing it. They would just call the imaginary GetPasswordOfCurrentUser function and bingo! they have your password.

For another angle on credential-stealing, read Larry Osterman's discussion of why delegation doesn't work over the network.

Even if you didn't want the password itself but merely some sort of "cookie" that could be used to log the user on later, you still have a security hole. Let's call this imaginary function GetPasswordCookieOfCurrentUser; it returns a "cookie" that can be used to log the user on instead of using their password.

This is just a thinly-disguised GetPasswordOfCurrentUser because that "cookie" is equivalent to a password. Log on with the cookie and you are now that person.

The first thing we'll need to do in our little dictionary program is to load the dictionary into memory. The format of the dictionary file is as a plain text file, each line of which is of the form

Chinese [pinyin] /English 1/English 2/.../
舉例 [ju3 li4] /to give an example/


Since it was the Big5 dictionary we downloaded, the Chinese characters are in Big5 format, known to Windows as code page 950. Our program will be Unicode, so we'll have to convert it as we load the dictionary. Yes, I could've used the Unicode version of the dictionary, but it so happens that when I set out to write this program, there was no Unicode version available. Fortunately, this oversight opened up the opportunity to illustrate some other programming decisions and techniques.

The first stage in our series of exercises will be loading the dictionary into memory.

#define UNICODE
#define _UNICODE
#include <windows.h>
#include <string>
#include <fstream>
#include <iostream> // for cin/cout
#include <vector>

using std::string;
using std::wstring;
using std::vector;

struct DictionaryEntry
{
bool Parse(const wstring& line);
wstring simp;
wstring pinyin;
wstring english;
};

bool DictionaryEntry::Parse(const wstring& line)
{
wstring::size_type start = 0;
wstring::size_type end = line.find(L' ', start);
if (end == wstring::npos) return false;
start = line.find(L'[', end);
if (start == wstring::npos) return false;
end = line.find(L']', ++start);
if (end == wstring::npos) return false;
pinyin.assign(line, start, end - start);
start = line.find(L'/', end);
if (start == wstring::npos) return false;
start++;
end = line.rfind(L'/');
if (end == wstring::npos) return false;
if (end <= start) return false;
english.assign(line, start, end-start);
return true;
}

class Dictionary
{
public:
Dictionary();
int Length() { return v.size(); }
const DictionaryEntry& Item(int i) { return v[i]; }
private:
vector<DictionaryEntry> v;
};

Dictionary::Dictionary()
{
std::wifstream src;
src.imbue(std::locale(".950"));
src.open("cedict.b5");
wstring s;
while (getline(src, s)) {
if (s.length() > 0 && s[0] != L'#') {
DictionaryEntry de;
if (de.Parse(s)) {
v.push_back(de);
}
}
}
}

int __cdecl main(int argc, const char* argv[])
{
DWORD dw = GetTickCount();
{
Dictionary dict;
std::cout << dict.Length() << std::endl;
std::cout << GetTickCount() - dw << std::endl;
}
std::cout << GetTickCount() - dw << std::endl;
return 0;
}


Our dictionary is just a list of words with their English definitions. The Chinese words are written in three forms (traditional Chinese, simplified Chinese, and Pinyin romanization). For those who are curious, there are two writing systems for the Mandarin Chinese language and two phonetic systems. Which one a particular Mandarin-speaking population follows depends on whether they fell under the influence of China's language reform of 1956. Traditional Chinese characters and the Bopomo phonetic system (also called Bopomofo) are used on Taiwan; simplified Chinese characters and the Pinyin system are used in China. Converting Pinyin to Bopomo isn't interesting, so I've removed that part from the program I'm presenting here.

(The schism in the spelling of the English language follows a similar pattern. Under the leadership of Noah Webster, the United States underwent its own spelling reform, but countries which were under the influence of the British crown retained the traditional spellings. Spelling reform continues in other languages even today, and the subject is almost always highly contentious, with traditionalists and reformists pitted against each other in a battle over a language's—and by proxy, a culture's—identity.)

The program itself is fairly straightforward. It creates a Unicode file stream wifstream and "imbues" it with code page 950 (Big5). This instructs the runtime to interpret the bytes of the file interpreted in the specified code page. We read strings out of the file, ignore the comments, and parse the rest, appending them to our vector of dictionary entries.

Parsing the line consists of finding the spaces, brackets, and slashes, and splitting the line into the traditional Chinese, Pinyin, and English components. (We'll deal with simplified Chinese later.)

When I run this program on my machine, the dictionary loads in 2080ms (or 2140ms if you include the time to run the destructor). This is an unacceptably long startup time, so the first order of business is to make startup faster. That will be the focus of this stage.

Notice that as a sanity check, I print the total number of words in the dictionary. The number should match the number of lines in the cedict.b5 file (minus the one comment line). If not, then I know that something went wrong. This is an important sanity check: You might make a performance optimization that looks great when you run it past a stopwatch, only to discover that your "optimization" actually introduced a bug. For example, one of my attempted optimizations of this program resulted in a phenomenal tenfold speedup, but only because of a bug that caused it to think it was finished when it had in reality processed only 10% of the dictionary!

As my colleague Rico Mariani is fond of saying, "It's easy to make it fast if it doesn't have to work!"

• #### Developing a Chinese/English dictionary: Introduction

The other day, one of my colleagues mentioned that his English name "Ben" means "stupid" in Chinese: 笨/bèn/ㄅㄣˋ. (His wife is Chinese; that's why he knows this in the first place.) Knowing that the Chinese language is rich in homophones, I fired up my Chinese/English dictionary program to see if we could find anything better. (Unfortunately, the best I could come up with was 賁/贲/bēn/ㄅㄣ, which means "energetic".)

Ben seemed to take his appellative fate in stride; he seemed much more interested in the little dictionary program I had written. So, as an experiment, instead of developing tiny samples that illustrate a very focused topic, I'll develop a somewhat larger-scale program (though still small by modern standards) so you can see how multiple techniques come together. The task will take many stages, some of which may take only a day or two, others of which can take much longer. If a particular stage is more than two or three days long, I'll break it up with other articles, and I'll try to leave some breathing room between stages.

Along the way, we'll learn about owner-data (also known as "virtual") listviews, listview custom-draw, designing for accessibility, window subclassing, laying out child windows, code pages, hotkeys, and optimization.

If you're going to play along at home, beware that you're going to have to install Chinese fonts to see the program as it evolves, and when you're done, you'll have a Chinese/English dictionary program, which probably won't be very useful unless you're actually studying Chinese...

If you're not into Win32 programming at all, then, well, my first comment to you is, "So what are you doing here?" And my second comment is, "I guess you're going to be bored for a while." You may want to go read another blog during those boring stretches, or just turn off the computer and go outside for some fresh air and exercise.

Those who have decided to play along at home will need the following: a copy of the CEDICT Chinese-English dictionary in Big5 format (note: Big5 format) and the Chinese Encoding Converter source code (all we need is the file hcutf8.txt). We'll start digging in next time.

• #### Loading the dictionary, part 3: Breaking the text into lines

Even after moving the character conversion out of the getline function, profiling reveals that getline is still taking nearly 50% of our CPU. The fastest code is code that isn't there, so let's get rid of getline altogether. Oh wait, we still need to break the file into lines. But maybe we can break the file into lines faster than getline did.

#include <algorithm>

class MappedTextFile
{
public:
MappedTextFile(LPCTSTR pszFile);
~MappedTextFile();

const CHAR *Buffer() { return m_p; }
DWORD Length() const { return m_cb; }

private:
PCHAR   m_p;
DWORD   m_cb;
HANDLE  m_hf;
HANDLE  m_hfm;
};

MappedTextFile::MappedTextFile(LPCTSTR pszFile)
: m_hfm(NULL), m_p(NULL), m_cb(0)
{
NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
if (m_hf != INVALID_HANDLE_VALUE) {
DWORD cb = GetFileSize(m_hf, NULL);
m_hfm = CreateFileMapping(m_hf, NULL, PAGE_READONLY, 0, 0, NULL);
if (m_hfm != NULL) {
m_p = reinterpret_cast<PCHAR>
if (m_p) {
m_cb = cb;
}
}
}
}

MappedTextFile::~MappedTextFile()
{
if (m_p) UnmapViewOfFile(m_p);
if (m_hfm) CloseHandle(m_hfm);
if (m_hf != INVALID_HANDLE_VALUE) CloseHandle(m_hf);
}


This very simple class babysits a read-only memory-mapped file. (Yes, there is a bit of oddness with files greater than 4GB, but let's ignore that for now, since it's a distraction from our main point.)

Now that the file is memory-mapped, we can just scan it directly.

Dictionary::Dictionary()
{
MappedTextFile mtf(TEXT("cedict.b5"));
typedef std::codecvt<wchar_t, char, mbstate_t> widecvt;
std::locale l(".950");
const widecvt& cvt = _USE(l, widecvt); // use_facet<widecvt>(l);
const CHAR* pchBuf = mtf.Buffer();
const CHAR* pchEnd = pchBuf + mtf.Length();
while (pchBuf < pchEnd) {
const CHAR* pchEOL = std::find(pchBuf, pchEnd, '\n');
if (*pchBuf != '#') {
size_t cchBuf = pchEOL - pchBuf;
wchar_t* buf = new wchar_t[cchBuf];
mbstate_t state = 0;
char* nextsrc;
wchar_t* nextto;
if (cvt.in(state, pchBuf, pchEOL, nextsrc,
buf, buf + cchBuf, nextto) == widecvt::ok) {
wstring line(buf, nextto - buf);
DictionaryEntry de;
if (de.Parse(line)) {
v.push_back(de);
}
}
delete[] buf;
}
pchBuf = pchEOL + 1;
}
}


We simply scan the memory-mapped file for a '\n' character, which tells us where the line ends. This tells us the location and length of the line, which we use to convert it to Unicode and continue our parsing.

Exercise:Why don't we have to worry about the carriage return that comes before the linefeed?

Exercise:Why don't we have to worry about possibly reading past the end of the file when we check *pchBuf != '#'?

With this change, the program now loads the dictionary in 480ms (or 550ms if you include the time it takes to destroy the dictionary). That's over twice as fast as the previous version.

But it's still not fast enough. A half-second delay between hitting Enter and getting the visual feedback is still unsatisfying. We can do better.

[Raymond is currently on vacation; this message was pre-recorded.]

Looking at the profile for our program so far, 35% of the CPU time is spent copying strings around. Let's see if we can improve that.

The best way to speed up copying strings is not to copy them in the first place. Using a wstring in our DictionaryEntry structure forces the vector class to copy the string data, when all we really need to copy is the pointer and size information. The actual characters can stay put. C++ has a copy constructor but not a "move" constructor.

Let's use plain string pointers rather than wstring objects. The "copy constructor" for a string pointer is just to copy the pointer—exactly what we want here.

struct DictionaryEntry
{
bool Parse(const WCHAR* begin, const WCHAR* end);
void Destruct() {
delete[] m_pszSimp;
delete[] m_pszPinyin;
delete[] m_pszEnglish;
}
LPWSTR m_pszSimp;
LPWSTR m_pszPinyin;
LPWSTR m_pszEnglish;
};


The DictionaryEntry is no longer a structure of wstrings but rather is just a structure of LPWSTRs, which copy much faster. The cost for this, however, is having to free all the strings manually in the dictionary destructor (which we will see below).

Since we aren't using wstrings any more, we need to allocate the memory for the strings and copy them the old fashioned way.

LPWSTR AllocString(const WCHAR* begin, const WCHAR* end)
{
int cch = end - begin + 1;
LPWSTR psz = new WCHAR[cch];
lstrcpynW(psz, begin, cch);
return psz;
}

bool DictionaryEntry::Parse(
const WCHAR* begin, const WCHAR* end)
{
const WCHAR* pch = std::find(begin, end, L' ');
if (pch >= end) return false;
begin = std::find(pch, end, L'[') + 1;
if (begin >= end) return false;
pch = std::find(begin, end, L']');
if (pch >= end) return false;
m_pszPinyin = AllocString(begin, pch);
begin = std::find(pch, end, L'/') + 1;
if (begin >= end) return false;
for (pch = end; *--pch != L'/'; ) { }
if (begin >= pch) return false;
m_pszEnglish = AllocString(begin, pch);
return true;
}


There isn't a std::rfind function, so I coded up a backwards-search-for-slash loop inline. Exercise: Why don't I have to check that pch hasn't underflowed beyond the beginning of the string?

class Dictionary
{
public:
Dictionary();
~Dictionary();
int Length() { return v.size(); }
const DictionaryEntry& Item(int i) { return v[i]; }
private:
vector<DictionaryEntry> v;
};

Dictionary::Dictionary()
{
...
if (cchResult){
// wstring line(buf, cchResult);
DictionaryEntry de;
if (de.Parse(buf, buf + cchResult)) {
...
}

Dictionary::~Dictionary()
{
for (vector<DictionaryEntry>::iterator i = v.begin();
i != v.end(); i++) {
i->Destruct();
}
}


The last bits of the change are to get rid of the temporary wstring we parsed from, since we don't need it any more, and to free all the strings in the Dictionary's destructor.

This new program clocks in at 120ms (or 290ms if you include the time it takes to destroy the dictionary). Again, we're twice as fast as the previous version, but still haven't quite crossed the 100ms barrier. And the time to destroy the dictionary isn't starting to be a more significant factor. But I have another trick up my sleeve.

[Raymond is currently on vacation; this message was pre-recorded.]

After our latest round of optimization, the 100ms barrier teased us, just milliseconds away. Profiling the resulting program reveals that 60% of the CPU is spent in operator new. Is there anything we can do about that?

Indeed, we can. Notice that the memory allocation pattern for the strings in our dictionary is quite special: Once a string is allocated into the dictionary, it is never modified or freed while the dictionary is in use. When the dictionary is freed, all the strings are deleted at once. This means that we can design an allocator tailored to this usage pattern.

I don't know whether there is a standard name for this thing, so I'm just going to call it a StringPool. A string pool has the following characteristics:

• Once you allocate a string, you can't modify or free it as long as the pool remains in existence.
• If you destroy the string pool, all the strings in it are destroyed.

We implement it by using the same type of fast allocator that the CLR uses: A single pointer. [25 May 2005: The blog server software corrupts the diagram, sorry.]

 allocated free ↑ p

To allocate memory, we just increment p by the number of bytes we need. If we run out of memory, we just allocate a new block, point p to its start, and carve the memory out of the new block. Destroying the pool consists of freeing all the blocks.

Note also that this memory arrangement has very good locality. Instead of scattering the strings all over the heap, they are collected into one location. Furthermore, they are stored in memory in exactly the order we're going to access them, which means no wasted page faults or cache lines. (Well, you don't know that's the order we're going to access them, but it's true. This is one of those "performance-guided designs" I mentioned a little while ago.)

class StringPool
{
public:
StringPool();
~StringPool();
LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);

private:
struct {
SIZE_T  m_cb;
};
WCHAR alignment;
};
enum { MIN_CBCHUNK = 32000,
MAX_CHARALLOC = 1024*1024 };

private:
WCHAR*  m_pchNext;   // first available byte
WCHAR*  m_pchLimit;  // one past last available byte
DWORD   m_dwGranularity;
}; // colorization fixed 25 May


Each block of memory we allocate begins with a StringPool::HEADER structure, which we use to maintain a linked list of blocks as well as providing enough information for us to free the block when we're done.

Exercise: Why is HEADER a union containing a structure rather than just being a structure? What is the significance of the alignment member?

inline RoundUp(DWORD cb, DWORD units)
{
return ((cb + units - 1) / units) * units;
}

StringPool::StringPool()
: m_pchNext(NULL), m_pchLimit(NULL), m_phdrCur(NULL)
{
SYSTEM_INFO si;
GetSystemInfo(&si);
si.dwAllocationGranularity);
}


At construction, we compute the size of our chunks. We base it on the system allocation granularity, choosing the next multiple of the system allocation granularity that is at least sizeof(HEADER) + MIN_CBCHUNK in size. Since a chunk is supposed to be a comfortably large block of memory, we need to enforce a minimum chunk size to avoid having an enormous number of tiny chunks if we happen to be running on a machine with a very fine allocation granularity.

LPWSTR StringPool::AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd)
{
size_t cch = pszEnd - pszBegin + 1;
LPWSTR psz = m_pchNext;
if (m_pchNext + cch <= m_pchLimit) {
m_pchNext += cch;
lstrcpynW(psz, pszBegin, cch);
return psz;
}

if (cch > MAX_CHARALLOC) goto OOM;
DWORD cbAlloc = RoundUp(cch * sizeof(WCHAR) + sizeof(HEADER),
m_dwGranularity);
BYTE* pbNext = reinterpret_cast<BYTE*>(
if (!pbNext) {
OOM:
throw(OOM);
}

m_pchLimit = reinterpret_cast<WCHAR*>(pbNext + cbAlloc);
phdrCur->m_phdrPrev = m_phdrCur;
phdrCur->m_cb = cbAlloc;
m_phdrCur = phdrCur;
m_pchNext = reinterpret_cast<WCHAR*>(phdrCur + 1);

return AllocString(pszBegin, pszEnd);
}


To allocate a string, we first try to carve it out of the remainder of the current chunk. This nearly always succeeds.

If the string doesn't fit in the chunk, we allocate a new chunk based on our allocation granularity. To avoid integer overflow in the computation of the desired chunk size, we check against a fixed "maximum allocation" and go stright to the out-of-memory handler if it's too big.

Once we have a new chunk, we link it into our list of HEADERs and abandon the old chunk. (Yes, this wastes some memory, but in our usage pattern, it's not much, and trying to squeeze out those last few bytes isn't worth the added complexity.) Once that's done, we try to allocate again; this second time will certainly succeed since we made sure the new chunk was big enough. (And any decent compiler will detect this as a tail recursion and turn it into a "goto".)

There is subtlety here. Notice that we do not update m_pchNext until after we're sure we either satisfied the allocation or allocated a new chunk. This ensures that our member variables are stable at the points where exceptions can be thrown. Writing exception-safe code is hard, and seeing the difference between code that is and isn't exception safe is often quite difficult.

StringPool::~StringPool()
{
while (phdr) {
VirtualFree(hdr.m_phdrPrev, hdr.m_cb, MEM_RELEASE);
phdr = hdr.m_phdrPrev;
}
}


To destroy the string pool, we walk our list of chunks and free each one. Note the importance of copying the HEADER out of the chunk before we free it!

Using this string pool requires only small changes to the rest of our program.

struct DictionaryEntry
{
bool Parse(const WCHAR *begin, const WCHAR *end, StringPool& pool);
// void Destruct() {
//  delete[] m_pszSimp;
//  delete[] m_pszPinyin;
//  delete[] m_pszEnglish;
// }
LPWSTR m_pszSimp;
LPWSTR m_pszPinyin;
LPWSTR m_pszEnglish;
};

class Dictionary
{
public:
Dictionary();
// ~Dictionary();
int Length() { return v.size(); }
private:
vector v;
StringPool m_pool;
};

// Dictionary::~Dictionary()
// {
//  for (vector<DictionaryEntry>::iterator i = v.begin();
//       i != v.end(); i++) {
//   i->Destruct();
//  }
// }


We no longer need to free the strings in the DictionaryEntry manually, so the Destruct method and the Dictionary destructor can go.

bool DictionaryEntry::Parse(
const WCHAR *begin, const WCHAR *end,
StringPool& pool)
{
const WCHAR* pch = std::find(begin, end, L' ');
if (pch >= end) return false;
begin = std::find(pch, end, L'[') + 1;
if (begin >= end) return false;
pch = std::find(begin, end, L']');
if (pch >= end) return false;
m_pszPinyin = pool.AllocString(begin, pch);
begin = std::find(pch, end, L'/') + 1;
if (begin >= end) return false;
for (pch = end; *--pch != L'/'; ) { }
if (begin >= pch) return false;
m_pszEnglish = pool.AllocString(begin, pch);
return true;
}

Dictionary::Dictionary()
{
...
if (de.Parse(buf, buf + cchResult, m_pool)) {
...
}


And finally, we pass our string pool to DictionaryEntry::Parse so it knows where to get memory for its strings from.

With these changes, the dictionary loads in 70ms (or 80ms if you include the time it takes to destroy the dictionary). This is 70% faster than the previous version, and over three times as fast if you include the destruction time.

And now that we've reached our 100ms goal, it's a good time to stop. We've gotten the running time of dictionary loading down from an uncomfortable 2080ms to a peppier 70ms, a nearly 30-fold improvement, by repeatedly profiling and focusing on where the most time is being spent. (I have some more simple tricks that shave a few more milliseconds off the startup time. Perhaps I'll bring them into play if other changes to startup push us over the 100ms boundary. As things stand, the largest CPU consumers are MultiByteToWideChar and lstrcpynW, so that's where I would focus next.)

That's the end of the first stage. The next stage will be displaying the dictionary in an owner-data listview, but you'll have to wait until next month.

Page 1 of 3 (25 items) 123