• The Old New Thing

    You can't escape those AOL CDs

    • 15 Comments

    One of my colleagues was unpacking one of those $30,000 quad-processor more-memory-than-you-know-what-to-do-with super-server computers. The kind that require their own electrical substation.

    And it came with an AOL CD.

    It's like buying a Lexus and finding a 35-cents-off coupon in the glove compartment.

    Apparently, one of the questions AOL tech support asks when people call in complaining that they can't get their AOL CD to work is, "Do you have a computer?" [Opening Panel Round, second question] because so many people who don't have computers stick the CD into their stereo or DVD player and can't get it to work.

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    Loading the dictionary, part 4: Character conversion redux

    • 24 Comments

    Getting rid of getline was a big help, but 480ms is still not quite peppy enough. You need to respond to user actions within a tenth of a second for thing to seem responsive.

    Profiling the latest endeavor reveals that 40% of our CPU time is spent in codecvt::in. Some debugging reveals that codecvt::in ultimately calls MultiByteToWideChar but uses it to convert only one or two characters at a time, even though we handed it a whole line.

    Let's get rid of codecvt::in and convert the characters ourselves, calling MultiByteToWideChar exactly once to convert the entire line at a single go.

    #define CP_BIG5 950
    
    Dictionary::Dictionary()
    {
     MappedTextFile mtf(TEXT("cedict.b5"));
     // typedef std::codecvt<wchar_t, char, mbstate_t> widecvt;
     // std::locale l(".950");
     // const widecvt& cvt = _USE(l, widecvt); // use_facet<widecvt>(l);
     const CHAR* pchBuf = mtf.Buffer();
     const CHAR* pchEnd = pchBuf + mtf.Length();
     while (pchBuf < pchEnd) {
      const CHAR* pchEOL = std::find(pchBuf, pchEnd, '\n');
      if (*pchBuf != '#') {
       size_t cchBuf = pchEOL - pchBuf;
       wchar_t* buf = new wchar_t[cchBuf];
       DWORD cchResult = MultiByteToWideChar(CP_BIG5, 0,
                              pchBuf, cchBuf, buf, cchBuf);
       if (cchResult) {
        wstring line(buf, cchResult);
        DictionaryEntry de;
        if (de.Parse(line)) {
         v.push_back(de);
        }
       }
       delete[] buf;
      }
      pchBuf = pchEOL + 1;
     }
    }
    

    Instead of using the codecvt::in method to perform character conversion, we go straight to the MultiByteToWideChar function. Notice that we assume that the Big5 string will not generate more Unicode characters than its length in bytes. This happens to be a safe assumption based on our external knowledge of the Big5 encoding. (If the encoding were something else, the assumption may no longer be valid.)

    With this change, the dictionary load time has dropped to 240ms (or 300ms if you include the time it takes to destroy the dictionary). That's twice as fast the previous version, but still not quite close enough to the 100ms goal. We still have some work ahead of us.

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    Loading the dictionary, part 3: Breaking the text into lines

    • 28 Comments

    Even after moving the character conversion out of the getline function, profiling reveals that getline is still taking nearly 50% of our CPU. The fastest code is code that isn't there, so let's get rid of getline altogether. Oh wait, we still need to break the file into lines. But maybe we can break the file into lines faster than getline did.

    #include <algorithm>
    
    class MappedTextFile
    {
    public:
     MappedTextFile(LPCTSTR pszFile);
     ~MappedTextFile();
    
     const CHAR *Buffer() { return m_p; }
     DWORD Length() const { return m_cb; }
    
    private:
     PCHAR   m_p;
     DWORD   m_cb;
     HANDLE  m_hf;
     HANDLE  m_hfm;
    };
    
    MappedTextFile::MappedTextFile(LPCTSTR pszFile)
        : m_hfm(NULL), m_p(NULL), m_cb(0)
    {
     m_hf = CreateFile(pszFile, GENERIC_READ, FILE_SHARE_READ,
                       NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
     if (m_hf != INVALID_HANDLE_VALUE) {
      DWORD cb = GetFileSize(m_hf, NULL);
      m_hfm = CreateFileMapping(m_hf, NULL, PAGE_READONLY, 0, 0, NULL);
      if (m_hfm != NULL) {
       m_p = reinterpret_cast<PCHAR>
                     (MapViewOfFile(m_hfm, FILE_MAP_READ, 0, 0, cb));
       if (m_p) {
        m_cb = cb;
       }
      }
     }
    }
    
    MappedTextFile::~MappedTextFile()
    {
     if (m_p) UnmapViewOfFile(m_p);
     if (m_hfm) CloseHandle(m_hfm);
     if (m_hf != INVALID_HANDLE_VALUE) CloseHandle(m_hf);
    }
    

    This very simple class babysits a read-only memory-mapped file. (Yes, there is a bit of oddness with files greater than 4GB, but let's ignore that for now, since it's a distraction from our main point.)

    Now that the file is memory-mapped, we can just scan it directly.

    Dictionary::Dictionary()
    {
     MappedTextFile mtf(TEXT("cedict.b5"));
     typedef std::codecvt<wchar_t, char, mbstate_t> widecvt;
     std::locale l(".950");
     const widecvt& cvt = _USE(l, widecvt); // use_facet<widecvt>(l);
     const CHAR* pchBuf = mtf.Buffer();
     const CHAR* pchEnd = pchBuf + mtf.Length();
     while (pchBuf < pchEnd) {
      const CHAR* pchEOL = std::find(pchBuf, pchEnd, '\n');
      if (*pchBuf != '#') {
       size_t cchBuf = pchEOL - pchBuf;
       wchar_t* buf = new wchar_t[cchBuf];
       mbstate_t state = 0;
       char* nextsrc;
       wchar_t* nextto;
       if (cvt.in(state, pchBuf, pchEOL, nextsrc,
                       buf, buf + cchBuf, nextto) == widecvt::ok) {
        wstring line(buf, nextto - buf);
        DictionaryEntry de;
        if (de.Parse(line)) {
         v.push_back(de);
        }
       }
       delete[] buf;
      }
      pchBuf = pchEOL + 1;
     }
    }
    

    We simply scan the memory-mapped file for a '\n' character, which tells us where the line ends. This tells us the location and length of the line, which we use to convert it to Unicode and continue our parsing.

    Exercise:Why don't we have to worry about the carriage return that comes before the linefeed?

    Exercise:Why don't we have to worry about possibly reading past the end of the file when we check *pchBuf != '#'?

    With this change, the program now loads the dictionary in 480ms (or 550ms if you include the time it takes to destroy the dictionary). That's over twice as fast as the previous version.

    But it's still not fast enough. A half-second delay between hitting Enter and getting the visual feedback is still unsatisfying. We can do better.

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    The Microsoft corporate network: 1.7 times worse than hell

    • 17 Comments

    Today I'm going to tell a story from 1996. Why? Because I can.

    One of the tests performed by Windows Hardware Quality Labs (WHQL) was the NCT packet stress test which had the nickname "Hell". The purpose of the test was to flood a network card with an insane number of packets, in order to see how it handled extreme conditions. It uncovered packet-dropping bugs, timing problems, all sorts of great stuff. Network card vendors used it to determine what size internal hardware buffers should be in order to cover "all reasonable network traffic scenarios".

    It so happened that at the time this test had currency (1996 era), the traffic on the Microsoft corporate network was approximately 1.7 times worse than the NCT packet stress test. A card could pass the Hell test with flying colors, yet drop 90% of its packets when installed on a computer at Microsoft because the card simply couldn't keep up with the traffic.

    The open secret among network card vendors was, "If you want your card to work with Windows, submit one card to WHQL and send another to a developer on the Windows team."

    (This rule applied to hardware other than network cards. I was "gifted" a sound card from a major manufacturer and installed it on my main machine. It wasn't long before I found and fixed a crashing bug in their driver.)

    [Raymond is currently on vacation; this message was pre-recorded.]

  • The Old New Thing

    Loading the dictionary, part 2: Character conversion

    • 24 Comments

    When you want to optimize a program, you first need to know where the time is being spent. There's no point optimizing a function that isn't actually responsible for your poor performance. For example, if a particular function is responsible for 2% of your CPU time, then even if you optimized it down to infinite speed, your program would speed up at best by only little over 2%. In the comments to yesterday's entry, several people put forth suggestions as to how the program could be optimized, in the process quite amply demonstrating this principle. None of the people who made suggestions actually investigated the program to see where the glaring bottleneck was.

    (Rico Mariani points out that you also need to take performance in account when doing high level designs, choosing algorithms and data structures that are suitable for the level of performance you need. If profiling reveals that a fundamental design decision is responsible for a performance bottleneck, you're in big trouble. You will see this sort of performance-guided design as the program develops. And you should check out Performance Quiz #6 which starts with the very program we're developing here.)

    Upon profiling our dictionary-loader, I discovered that 80% of the CPU time was spent in getline. Clearly this is where the focus needs to be. Everything else is just noise.

    Digging a little deeper, it turns out that 29% of the CPU time was spent by getline doing character set conversion in codecvt::do_in. Some debugging revealed that codecvt::do_in was being called millions of times, each time converting just one or two characters. In fact, for each character in the file, codecvt::do_in was called once and sometimes twice!

    Let's get rid of the piecemeal character set conversion and instead convert entire lines at a time.

    Dictionary::Dictionary()
    {
     std::ifstream src;
     typedef std::codecvt<wchar_t, char, mbstate_t> widecvt;
     std::locale l(".950");
     const widecvt& cvt = _USE(l, widecvt); // use_facet<widecvt>(l);
     src.open("cedict.b5");
     string s;
     while (getline(src, s)) {
      if (s.length() > 0 && s[0] != L'#') {
       wchar_t* buf = new wchar_t[s.length()];
       mbstate_t state = 0;
       char* nextsrc;
       wchar_t* nextto;
       if (cvt.in(state, s.data(), s.data() + s.length(), nextsrc,
                       buf, buf + s.length(), nextto) == widecvt::ok) {
        wstring line(buf, nextto - buf);
        DictionaryEntry de;
        if (de.Parse(line)) {
         v.push_back(de);
        }
       }
       delete[] buf;
      }
     }
    }
    

    Instead of using a wifstream, we just use a non-Unicode ifstream and convert each line to Unicode manually. Doing it a line at a time rather than a character at a time, we hope, will be more efficient.

    We ask code page 950 for a converter, which we call cvt. Notice that the Microsoft C++ compiler requires you to use the strange _USE macro instead of the more traditional use_facet.

    For each line that isn't a comment, we convert it to Unicode. Our lives are complicated by the fact that codecvt::in requires pointers to elements rather than iterators, which means that we can't use a wstring or a vector; we need a plain boring wchar_t[] array. (Notice that we can cheat on the "from" buffer and use the string::data() function to get at a read-only array representation of the string.) If the conversion succeeds, we convert the array into a proper string and continue as before.

    With this tweak, the program now loads the dictionary in 1120ms (or 1180ms if you include the time it takes to destroy the dictionary). That's nearly twice as fast as the previous version.

    You might think that we could avoid redundant allocations by caching the temporary conversion buffer between lines. I tried that, and surprisingly, it actually slowed the program down by 10ms. Such is the counter-intuitive world of optimization. That's why it's important to identify your bottlenecks via measurement instead of just guessing at them.

  • The Old New Thing

    Loading the dictionary, part 1: Starting point

    • 33 Comments

    The first thing we'll need to do in our little dictionary program is to load the dictionary into memory. The format of the dictionary file is as a plain text file, each line of which is of the form

    Chinese [pinyin] /English 1/English 2/.../
    舉例 [ju3 li4] /to give an example/
    

    Since it was the Big5 dictionary we downloaded, the Chinese characters are in Big5 format, known to Windows as code page 950. Our program will be Unicode, so we'll have to convert it as we load the dictionary. Yes, I could've used the Unicode version of the dictionary, but it so happens that when I set out to write this program, there was no Unicode version available. Fortunately, this oversight opened up the opportunity to illustrate some other programming decisions and techniques.

    The first stage in our series of exercises will be loading the dictionary into memory.

    #define UNICODE
    #define _UNICODE
    #include <windows.h>
    #include <string>
    #include <fstream>
    #include <iostream> // for cin/cout
    #include <vector>
    
    using std::string;
    using std::wstring;
    using std::vector;
    
    struct DictionaryEntry
    {
     bool Parse(const wstring& line);
     wstring trad;
     wstring simp;
     wstring pinyin;
     wstring english;
    };
    
    bool DictionaryEntry::Parse(const wstring& line)
    {
        wstring::size_type start = 0;
        wstring::size_type end = line.find(L' ', start);
        if (end == wstring::npos) return false;
        trad.assign(line, start, end);
        start = line.find(L'[', end);
        if (start == wstring::npos) return false;
        end = line.find(L']', ++start);
        if (end == wstring::npos) return false;
        pinyin.assign(line, start, end - start);
        start = line.find(L'/', end);
        if (start == wstring::npos) return false;
        start++;
        end = line.rfind(L'/');
        if (end == wstring::npos) return false;
        if (end <= start) return false;
        english.assign(line, start, end-start);
        return true;
    }
    
    class Dictionary
    {
    public:
     Dictionary();
     int Length() { return v.size(); }
     const DictionaryEntry& Item(int i) { return v[i]; }
    private:
     vector<DictionaryEntry> v;
    };
    
    Dictionary::Dictionary()
    {
     std::wifstream src;
     src.imbue(std::locale(".950"));
     src.open("cedict.b5");
     wstring s;
     while (getline(src, s)) {
      if (s.length() > 0 && s[0] != L'#') {
       DictionaryEntry de;
       if (de.Parse(s)) {
        v.push_back(de);
       }
      }
     }
    }
    
    int __cdecl main(int argc, const char* argv[])
    {
     DWORD dw = GetTickCount();
     {
      Dictionary dict;
      std::cout << dict.Length() << std::endl;
      std::cout << GetTickCount() - dw << std::endl;
     }
     std::cout << GetTickCount() - dw << std::endl;
     return 0;
    }
    

    Our dictionary is just a list of words with their English definitions. The Chinese words are written in three forms (traditional Chinese, simplified Chinese, and Pinyin romanization). For those who are curious, there are two writing systems for the Mandarin Chinese language and two phonetic systems. Which one a particular Mandarin-speaking population follows depends on whether they fell under the influence of China's language reform of 1956. Traditional Chinese characters and the Bopomo phonetic system (also called Bopomofo) are used on Taiwan; simplified Chinese characters and the Pinyin system are used in China. Converting Pinyin to Bopomo isn't interesting, so I've removed that part from the program I'm presenting here.

    (The schism in the spelling of the English language follows a similar pattern. Under the leadership of Noah Webster, the United States underwent its own spelling reform, but countries which were under the influence of the British crown retained the traditional spellings. Spelling reform continues in other languages even today, and the subject is almost always highly contentious, with traditionalists and reformists pitted against each other in a battle over a language's—and by proxy, a culture's—identity.)

    The program itself is fairly straightforward. It creates a Unicode file stream wifstream and "imbues" it with code page 950 (Big5). This instructs the runtime to interpret the bytes of the file interpreted in the specified code page. We read strings out of the file, ignore the comments, and parse the rest, appending them to our vector of dictionary entries.

    Parsing the line consists of finding the spaces, brackets, and slashes, and splitting the line into the traditional Chinese, Pinyin, and English components. (We'll deal with simplified Chinese later.)

    When I run this program on my machine, the dictionary loads in 2080ms (or 2140ms if you include the time to run the destructor). This is an unacceptably long startup time, so the first order of business is to make startup faster. That will be the focus of this stage.

    Notice that as a sanity check, I print the total number of words in the dictionary. The number should match the number of lines in the cedict.b5 file (minus the one comment line). If not, then I know that something went wrong. This is an important sanity check: You might make a performance optimization that looks great when you run it past a stopwatch, only to discover that your "optimization" actually introduced a bug. For example, one of my attempted optimizations of this program resulted in a phenomenal tenfold speedup, but only because of a bug that caused it to think it was finished when it had in reality processed only 10% of the dictionary!

    As my colleague Rico Mariani is fond of saying, "It's easy to make it fast if it doesn't have to work!"

  • The Old New Thing

    Developing a Chinese/English dictionary: Introduction

    • 28 Comments

    The other day, one of my colleagues mentioned that his English name "Ben" means "stupid" in Chinese: 笨/bèn/ㄅㄣˋ. (His wife is Chinese; that's why he knows this in the first place.) Knowing that the Chinese language is rich in homophones, I fired up my Chinese/English dictionary program to see if we could find anything better. (Unfortunately, the best I could come up with was 賁/贲/bēn/ㄅㄣ, which means "energetic".)

    Ben seemed to take his appellative fate in stride; he seemed much more interested in the little dictionary program I had written. So, as an experiment, instead of developing tiny samples that illustrate a very focused topic, I'll develop a somewhat larger-scale program (though still small by modern standards) so you can see how multiple techniques come together. The task will take many stages, some of which may take only a day or two, others of which can take much longer. If a particular stage is more than two or three days long, I'll break it up with other articles, and I'll try to leave some breathing room between stages.

    Along the way, we'll learn about owner-data (also known as "virtual") listviews, listview custom-draw, designing for accessibility, window subclassing, laying out child windows, code pages, hotkeys, and optimization.

    If you're going to play along at home, beware that you're going to have to install Chinese fonts to see the program as it evolves, and when you're done, you'll have a Chinese/English dictionary program, which probably won't be very useful unless you're actually studying Chinese...

    If you're not into Win32 programming at all, then, well, my first comment to you is, "So what are you doing here?" And my second comment is, "I guess you're going to be bored for a while." You may want to go read another blog during those boring stretches, or just turn off the computer and go outside for some fresh air and exercise.

    Those who have decided to play along at home will need the following: a copy of the CEDICT Chinese-English dictionary in Big5 format (note: Big5 format) and the Chinese Encoding Converter source code (all we need is the file hcutf8.txt). We'll start digging in next time.

  • The Old New Thing

    How to query properties of the taskbar

    • 14 Comments

    Occasionally, people want to query properties of the taskbar. I don't quite understand why; you should just get on with your life and let the taskbar get on with its life. After all, there might not even be a taskbar, as we discussed last time.

    But if you really want to know (perhaps you're collecting usability data), here's how:

    #include <stdio.h>
    #include <windows.h>
    #include <shellapi.h>
    
    int __cdecl main(int argc, const char* argv[])
    {
     APPBARDATA abd = { sizeof(abd) };
     UINT uState = (UINT)SHAppBarMessage(ABM_GETSTATE, &abd);
     printf("Taskbar on top? %s\n",
            (uState & ABS_ALWAYSONTOP) ? "yes" : "no");
     printf("Taskbar autohide? %s\n",
            (uState & ABS_AUTOHIDE) ? "yes" : "no");
     return 0;
    }
    

    This little program uses the ABM_GETSTATE message of the SHAppBarMessage function to get the "Always on top" and "Auto-hide" properties of the taskbar.

    Since you're using the SHAppBarMessage function, if a future version of Windows changes the way it maintains the taskbar state (or perhaps even changes the name of the taskbar to something else), your program will still work because the SHAppBarMessage function will be kept in sync with whatever changes happen to the taskbar.

    You can also use the ABM_SETSTATE message to change these states. Note that doing so from a program is discouraged; these state bits belong to the user's preferences. A program shouldn't be messing with the user's preferences. (Well, unless the whole point of the program is to change the user's preferences, of course. But the frequency with which I see this question makes me wonder whether there really are that many settings-tweaking programs out there. I suspect people are using this power for evil, not for good.)

    And to stave off follow-up questions: These are the only two properties of the taskbar that are programmable. Exposing a programmable interface for something as highly visible as the taskbar is a very sensitive issue, because once you grant programmatic access to something, there is a very strong temptation for programs to start abusing it.

  • The Old New Thing

    How do I cover the taskbar with a fullscreen window?

    • 44 Comments

    For some reason, people think too hard. If you want to create a fullscreen window that covers the taskbar, just create a fullscreen window and the taskbar will automatically get out of the way. Don't go around hunting for the taskbar and poking it; let it do its thing.

    As always, start with the scratch program and add the following:

    HWND CreateFullscreenWindow(HWND hwnd)
    {
     HMONITOR hmon = MonitorFromWindow(hwnd,
                                       MONITOR_DEFAULTTONEAREST);
     MONITORINFO mi = { sizeof(mi) };
     if (!GetMonitorInfo(hmon, &mi)) return NULL;
     return CreateWindow(TEXT("static"),
           TEXT("something interesting might go here"),
           WS_POPUP | WS_VISIBLE,
           mi.rcMonitor.left,
           mi.rcMonitor.top,
           mi.rcMonitor.right - mi.rcMonitor.left,
           mi.rcMonitor.bottom - mi.rcMonitor.top,
           hwnd, NULL, g_hinst, 0);
    }
    
    void OnChar(HWND hwnd, TCHAR ch, int cRepeat)
    {
     if (ch == TEXT(' ')) {
      CreateFullscreenWindow(hwnd);
     }
    }
    
        HANDLE_MSG(hwnd, WM_CHAR, OnChar);
    

    Note that this sample program doesn't worry about destroying that fullscreen window or preventing the user from creating more than one. It's just a sample. The point is seeing how the CreateFullScreenWindow function is written.

    We use the MonitorFromWindow function to figure out which monitor we should go fullscreen to. Note that in a multiple monitor system, this might not be the same monitor that the taskbar is on. Fortunately, we don't have to worry about that; the taskbar figures it out.

    I've seen people hunt for the taskbar window and then do a ShowWindow(hwndTaskbar, SW_HIDE) on it. This is nuts for many reasons.

    First is a mental exercise you should always use when evaluating tricks like this: "What if two programs tried this trick?" Now you have two programs both of which think they are in charge of hiding and showing the taskbar, neither of which is coordinating with the other. The result is a mess. One program hides the taskbar, then the other does, then the first decides it's finished so it unhides the taskbar, but the second program wasn't finished yet and gets a visible taskbar when it thought it should be hidden. Things only go downhill from there.

    Second, what if your program crashes before it gets a chance to unhide the taskbar? The taskbar is now permanently hidden and the user has to log off and back on to get their taskbar back. That's not very nice.

    Third, what if there is no taskbar at all? It is common in Terminal Server scenarios to run programs by themselves without Explorer [archived]. In this configuration, there is no Explorer, no taskbar. Or maybe you're running on a future version of Windows that doesn't have a taskbar, it having been replaced by some other mechanism. What will your program do now?

    Don't do any of this messing with the taskbar. Just create your fullscreen window and let the taskbar do its thing automatically.

  • The Old New Thing

    When people ask for security holes as features: Stealing passwords

    • 33 Comments

    Sometimes people ask for features that are such blatant security holes I don't know what they were thinking.

    Is there a way to get the current user's password? I have a program that does some stuff, then reboots the system, and I want to have the current user's password so I can log that user back in when I'm done, then my program can resume its operation.

    (Sometimes they don't bother explaining why they need the user's password; they just ask for it.)

    Imagine the fantastic security hole if this were possible. Anybody could write a program that steals your password without even having to trick you into typing it. They would just call the imaginary GetPasswordOfCurrentUser function and bingo! they have your password.

    For another angle on credential-stealing, read Larry Osterman's discussion of why delegation doesn't work over the network.

    Even if you didn't want the password itself but merely some sort of "cookie" that could be used to log the user on later, you still have a security hole. Let's call this imaginary function GetPasswordCookieOfCurrentUser; it returns a "cookie" that can be used to log the user on instead of using their password.

    This is just a thinly-disguised GetPasswordOfCurrentUser because that "cookie" is equivalent to a password. Log on with the cookie and you are now that person.

Page 375 of 449 (4,482 items) «373374375376377»