July, 2005

  • The Old New Thing

    Converting from traditional to simplified Chinese, part 1: Loading the dictionary

    • 10 Comments

    One step we had glossed over in our haste to get something interesting on the screen in our Chinese/English dictionary program was the conversion from traditional to simplified Chinese characters.

    The format of the hcutf8.txt file is a series of lines, each of which is a UTF-8 encoded string consisting of a simplified Chinese character followed by its traditional equivalents. Often, multiple traditional characters map to a single simplified character. Much more rarely—only twice in our data set—multiple simplified characters map to a single traditional character. Unfortunately, one of the cases is the common syllable 麼, which has two simplifications, either 么 or 麽, the first of which is far more productive. We'll have to keep an eye out for that one.

    (Note also that in real life, the mapping is more complicated than a character-for-character substitution, but I'm willing to forgo that level of complexity because this is just for my personal use and people will have realized I'm not a native speaker long before I get caught up in language subtleties like that.)

    One could try to work out a fancy data structure to represent this mapping table compactly, but it turns out that simple is better here: an array of 65536 WCHARs, each producing the corresponding simplification. Most of the array will lie unused, since the characters we are interested in lie in the range U+4E00 to U+9FFF. Consequently, the active part of the table is only about 40KB, which easily fits inside the L2 cache.
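    As a rough check on that figure, the size of the active range can be worked out at compile time. This is only a sketch: the names kFirstHan, kLastHan, and ActiveTableBytes are made up for illustration, and char16_t stands in for the 2-byte WCHAR.

```cpp
#include <cstddef>

// Hypothetical names, used only for this illustration.
constexpr std::size_t kFirstHan = 0x4E00; // first code point of interest
constexpr std::size_t kLastHan  = 0x9FFF; // last code point of interest

// Bytes occupied by the table entries for the range of interest,
// assuming 2-byte entries (char16_t standing in for WCHAR).
constexpr std::size_t ActiveTableBytes()
{
    return (kLastHan - kFirstHan + 1) * sizeof(char16_t);
}

// 20,992 entries of 2 bytes each.
static_assert(ActiveTableBytes() == 41984, "active range is roughly 41KB");
```

    The whole 65536-entry array is 128KB, but only this slice is touched in practice.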

    It is important to know when a simple data structure is better than a complex one.

    The hcutf8.txt file contains a lot of fluff that we aren't interested in. Let's strip that out ahead of time so that we don't waste our time parsing it at run-time.

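    #!perl
    # Skip everything up to and including the "# Start zi" marker line.
    $_ = <> until /^# Start zi/; # ignore uninteresting characters
    while (<>) {
     s/\r//g;
     # A no-op line is the same 3-byte UTF-8 character twice plus "\n": 7 bytes.
     next if length($_) == 7 &&
             substr($_, 0, 3) eq substr($_, 3, 3); # ignore NOPs
     print;
    }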
    

    Run the hcutf8.txt file through this filter to clean it up a bit.

    Now we can write our "traditional to simplified" dictionary.

    class Trad2Simp
    {
    public:
     Trad2Simp();
     WCHAR Map(WCHAR chTrad) const { return _rgwch[chTrad]; }
    
    private:
     WCHAR _rgwch[65536]; // woohoo!
    };
    
    Trad2Simp::Trad2Simp()
    {
     ZeroMemory(_rgwch, sizeof(_rgwch));
    
     MappedTextFile mtf(TEXT("hcutf8.txt"));
     const CHAR* pchBuf = mtf.Buffer();
     const CHAR* pchEnd = pchBuf + mtf.Length();
     while (pchBuf < pchEnd) {
      const CHAR* pchCR = std::find(pchBuf, pchEnd, '\r');
      int cchBuf = (int)(pchCR - pchBuf);
      WCHAR szMap[80];
      DWORD cch = MultiByteToWideChar(CP_UTF8, 0, pchBuf, cchBuf,
                                      szMap, 80);
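      if (cch > 1) {
       WCHAR chSimp = szMap[0];
       for (DWORD i = 1; i < cch; i++) {
        if (szMap[i] != chSimp) {
         _rgwch[szMap[i]] = chSimp;
        }
       }
      }
      // Always advance to the next line, even if this one produced no
      // mapping; otherwise a blank or unconvertible line loops forever.
      pchBuf = std::find(pchCR, pchEnd, '\n');
      if (pchBuf < pchEnd) pchBuf++; // step over the '\n' if there is one
     }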
     _rgwch[0x9EBC] = 0x4E48;
    }
    

    We read the file one line at a time, convert it from UTF-8, and for each nontrivial mapping, record it in our dictionary. At the end, we do our little 么 special-case patch-up.
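    The same bookkeeping can be sketched in portable C++, without the Win32 file mapping and UTF-8 conversion. This is an illustration under stated assumptions, not the program's actual code: SimpTable, MakeTable, AddMappingLine, and Simplify are hypothetical names, each line is assumed to be already decoded to UTF-16, and passing unmapped characters through unchanged is my guess at how a caller would treat a zero entry.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Portable stand-in for the 65536-entry table; 0 means "no mapping recorded".
using SimpTable = std::vector<char16_t>;

SimpTable MakeTable() { return SimpTable(65536, 0); }

// One decoded line of the data file: the simplified character followed
// by its traditional equivalents.
void AddMappingLine(SimpTable& table, const std::u16string& line)
{
    if (line.size() < 2) return;            // nothing to map on this line
    const char16_t chSimp = line[0];
    for (std::size_t i = 1; i < line.size(); i++) {
        if (line[i] != chSimp) table[line[i]] = chSimp;
    }
}

// Character-for-character conversion. Characters whose table entry is 0
// pass through unchanged (an assumption of this sketch).
std::u16string Simplify(const SimpTable& table, std::u16string text)
{
    for (char16_t& ch : text) {
        if (table[ch] != 0) ch = table[ch];
    }
    return text;
}
```

    If the line for 么 (U+4E48) happens to come before the line for 麽 (U+9EBD), the later line overwrites the entry for 麼 (U+9EBC), since the last line to mention a traditional character wins. The final assignment of U+4E48 makes the result independent of file order.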

    Next time, we'll use this mapping table to generate simplified Chinese characters into our dictionary.

  • The Old New Thing

    The best book on ActiveX programming ever written

    • 14 Comments

    I was introduced to the glory that is the world of Mr. Bunny many years ago. Mr. Bunny's Guide to ActiveX is probably the best book on ActiveX programming ever written.

    If you haven't figured it out by now, it's a humor book, but it's the sort of madcap insane geek humor that has enough truth in it to make you laugh more.

    My favorite is the first exercise from the first chapter: Connect the dots. (Warning: It's harder than it looks!)

  • The Old New Thing

    How can I recover the dialog resource ID from a dialog window handle?

    • 4 Comments

    Occasionally, I see someone ask a question like the following.

    I have the handle to a dialog window. How can I get the original dialog resource ID that the dialog was created from?

    As we saw in our in-depth discussion of how dialogs are created from dialog templates, the dialog template itself is not saved anywhere. The purpose of a template is to act as the... well... "template" for creating a dialog box. Once the dialog box has been created, there is no need for the template any more. Consequently, there is no reason why the system should remember it.

    Besides, if the dialog were created from a runtime-generated template, saving the original parameters would leave pointers to freed memory. Furthermore, the code that created the dialog box almost certainly modified the dialog box during its WM_INITDIALOG message processing (filling list boxes with data, maybe enabling or disabling some buttons), so the dialog box you see on screen doesn't correspond to a template anywhere.

    It's like asking, "Given a plate of food, how do I recover the original cookbook and page number for the recipe?" By doing a chemical analysis of the food, you might be able to recover "a" recipe, but there is nothing in the food itself that says, "I came from The Joy of Cooking, page 253."

  • The Old New Thing

    What struck me about life in the Republic

    • 41 Comments

    When people asked me for my reaction to the most recent Star Wars movie, I replied that what struck me most was that the Republic doesn't appear to have any building codes. There are these platforms several hundred meters above the ground with no railings. For example, Padmé Amidala's fancy apartment has a front porch far above the ground. Consider: You're carrying a load of packages to the car, the kids are running around, you turn around to yell at one of them, miss a step, and over the rim you go. How many people fall to their deaths in that galaxy?

  • The Old New Thing

    What are SYSTEM_FONT and DEFAULT_GUI_FONT?

    • 22 Comments

    Among the things you can get with the GetStockObject function are two fonts called SYSTEM_FONT and DEFAULT_GUI_FONT. What are they?

    They are fonts nobody uses any more.

    Back in the old days of Windows 2.0, the font used for dialog boxes was a bitmap font called System. This is the font that SYSTEM_FONT retrieves, and it is still the default dialog box font for compatibility reasons. Of course, nobody nowadays would ever use such an ugly font for their dialog boxes. (Among other things, it's a bitmap font and therefore does not look good at high resolutions, nor can it be anti-aliased.)

    DEFAULT_GUI_FONT has an even less illustrious history. It was created during Windows 95 development in the hopes of becoming the new default GUI font, but by July 1994, Windows itself stopped using it in favor of the various fonts returned by the SystemParametersInfo function. Its existence is now vestigial.

    One major gotcha with SYSTEM_FONT and DEFAULT_GUI_FONT is that on a typical US-English machine, they map to bitmap fonts that do not support ClearType.

  • The Old New Thing

    What's the point of DeferWindowPos?

    • 23 Comments

    The purpose of the DeferWindowPos function is to move multiple child windows at one go. This reduces somewhat the amount of repainting that goes on when windows move around.

    Take that DC brush sample from a few months ago and make the following changes:

    HWND g_hwndChildren[2];
    
    BOOL
    OnCreate(HWND hwnd, LPCREATESTRUCT lpcs)
    {
     const static COLORREF s_rgclr[2] =
        { RGB(255,0,0), RGB(0,255,0) };
     for (int i = 0; i < 2; i++) {
      g_hwndChildren[i] = CreateWindow(TEXT("static"), NULL,
            WS_VISIBLE | WS_CHILD, 0, 0, 0, 0,
            hwnd, (HMENU)IntToPtr(s_rgclr[i]), g_hinst, 0);
      if (!g_hwndChildren[i]) return FALSE;
     }
     return TRUE;
    }
    

    Notice that I'm using the control ID to hold the desired color. We retrieve it when choosing our background color.

    HBRUSH OnCtlColor(HWND hwnd, HDC hdc, HWND hwndChild, int type)
    {
      Sleep(500);
      SetDCBrushColor(hdc, (COLORREF)GetDlgCtrlID(hwndChild));
      return GetStockBrush(DC_BRUSH);
    }
    
        HANDLE_MSG(hwnd, WM_CTLCOLORSTATIC, OnCtlColor);
    

    I threw in a half-second sleep. This will make the painting a little easier to see.

    void
    OnSize(HWND hwnd, UINT state, int cx, int cy)
    {
      int cxHalf = cx/2;
      SetWindowPos(g_hwndChildren[0],
                   NULL, 0, 0, cxHalf, cy,
                   SWP_NOZORDER | SWP_NOOWNERZORDER | SWP_NOACTIVATE);
      SetWindowPos(g_hwndChildren[1],
                   NULL, cxHalf, 0, cx-cxHalf, cy,
                   SWP_NOZORDER | SWP_NOOWNERZORDER | SWP_NOACTIVATE);
    }
    

    We place the two child windows side by side in our client area. For our first pass, we'll use the SetWindowPos function to position the windows.
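    The width arithmetic is worth a second look: the right-hand child gets cx-cxHalf rather than cxHalf so that an odd client width is still covered completely. A minimal sketch of just that calculation, with Span, SplitResult, and SplitClientWidth as hypothetical names that are not part of the sample program:

```cpp
// Hypothetical helper types: a left edge and a width, in client coordinates.
struct Span { int x; int cx; };
struct SplitResult { Span left; Span right; };

// Splits a client width into two side-by-side spans with no gap and no
// overlap. The right span absorbs the extra pixel when cx is odd.
SplitResult SplitClientWidth(int cx)
{
    const int cxHalf = cx / 2;
    return { { 0, cxHalf }, { cxHalf, cx - cxHalf } };
}
```

    For a client width of 101, the left child is 50 pixels wide and the right child starts at x=50 with a width of 51.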

    Compile and run this program, and once it's up, click the maximize box. Observe carefully which parts of the green rectangle get repainted.

    Now let's change our positioning code to use the DeferWindowPos function. The usage pattern for the deferred window positioning functions is as follows:

    HDWP hdwp = BeginDeferWindowPos(n);
    if (hdwp) hdwp = DeferWindowPos(hdwp, ...); // 1 [fixed 7/7]
    if (hdwp) hdwp = DeferWindowPos(hdwp, ...); // 2
    if (hdwp) hdwp = DeferWindowPos(hdwp, ...); // 3
    ...
    if (hdwp) hdwp = DeferWindowPos(hdwp, ...); // n
    if (hdwp) EndDeferWindowPos(hdwp);
    

    There are some key points here.

    • The value you pass to the BeginDeferWindowPos function is the number of windows you intend to move. It's okay if you get this value wrong, but getting it right will reduce the number of internal reallocations.
    • The return value from DeferWindowPos is stored back into the hdwp because the return value is not necessarily the same as the value originally passed in. If the deferral bookkeeping needs to perform a reallocation, the DeferWindowPos function returns a handle to the new defer information; the old defer information is no longer valid. What's more, if the deferral fails, the old defer information is destroyed. This is different from the realloc function which leaves the original object unchanged if the reallocation fails. The pattern p = realloc(p, ...) is a memory leak, but the pattern hdwp = DeferWindowPos(hdwp, ...) is not.

    That second point is important. Many people get it wrong.
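    For contrast, here is the realloc side of that comparison as a small portable sketch. Because realloc leaves the original block alive when it fails, the result has to go into a temporary first. GrowIntArray is a hypothetical helper name, and this shows the usual leak-free idiom rather than anything from the sample program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Resizes *pp to hold newCount ints. On failure, *pp is left untouched
// and still owned by the caller, so nothing leaks.
bool GrowIntArray(int** pp, std::size_t newCount)
{
    assert(newCount > 0); // realloc with size 0 is implementation-defined
    void* pNew = std::realloc(*pp, newCount * sizeof(int));
    if (!pNew) return false;       // old block is still valid; do not overwrite *pp
    *pp = static_cast<int*>(pNew); // old pointer must no longer be used
    return true;
}
```

    With DeferWindowPos no temporary is needed, because a failed call has already destroyed the old handle.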

    Okay, now that you're all probably scared of this function, let's change our repositioning code to take advantage of deferred window positioning. It's really not that hard at all. (Save these changes to a new file, though. We'll want to run the old and new versions side by side.)

    void
    OnSize(HWND hwnd, UINT state, int cx, int cy)
    {
      HDWP hdwp = BeginDeferWindowPos(2);
      int cxHalf = cx/2;
      if (hdwp) hdwp = DeferWindowPos(hdwp, g_hwndChildren[0],
                   NULL, 0, 0, cxHalf, cy,
                   SWP_NOZORDER | SWP_NOOWNERZORDER | SWP_NOACTIVATE);
      if (hdwp) hdwp = DeferWindowPos(hdwp, g_hwndChildren[1],
                   NULL, cxHalf, 0, cx-cxHalf, cy,
                   SWP_NOZORDER | SWP_NOOWNERZORDER | SWP_NOACTIVATE);
      if (hdwp) EndDeferWindowPos(hdwp);
    }
    

    Compile and run this program, and again, once it's up, maximize the window and observe which regions repaint. Observe that there is slightly less repainting in the new version compared to the old version.

  • The Old New Thing

    Answers to yesterday's holiday fun puzzles

    • 3 Comments

    Puzzle 1: This was a word search consisting of the names of the twelve streets of central downtown Seattle. The unused letters spell out the message "Issaquah year's supply hair conditioner", which takes you to the Issaquah Costco. (In the real puzzle, the secret message was much odder but relied on an inside joke.)

    Puzzle 2: The cryptogram decodes as follows:

    "There's no such thing as a stupid question." Go to the information desk at Center House and ask if they have any tickets available for the Mariners game.

    This was itself a bit of an inside joke, because my friend worked at the information desk at Center House and had to answer stupid questions like this one. (The Mariners play at Safeco Field, not Seattle Center.)

    The clues at the end told you how to map each letter. For example, the first clue "adieu" says that the letter "A" in the cryptogram maps to "U" in the cleartext. The cryptogram was easy enough that my friends didn't need the bonus help, but in case you did, here are the answers: adieu, brook, coolj, dweeb, essay, fungi, genre, humor, incus, johnq, kazoo, lyric, mymtv, novel, overt, poach, quaff, rolex, soyuz, turow, usurp, venom, wrong, xenon, yield, zebra.
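    The clue rule (first letter is the cryptogram letter, last letter is the cleartext letter) is mechanical enough to sketch in code. This is an illustration only: BuildKey, Decode, and PuzzleClues are hypothetical names, and the sketch assumes lowercase ASCII letters.

```cpp
#include <array>
#include <string>
#include <vector>

// Builds a 26-entry key from clue words: the first letter of each word is
// the cryptogram letter, the last letter is its cleartext equivalent.
std::array<char, 26> BuildKey(const std::vector<std::string>& clues)
{
    std::array<char, 26> key{}; // 0 means "no clue for this letter"
    for (const std::string& clue : clues) {
        if (clue.empty()) continue;
        const char from = clue.front();
        if (from >= 'a' && from <= 'z') key[from - 'a'] = clue.back();
    }
    return key;
}

// Substitutes every lowercase letter that has a key entry; everything
// else (spaces, punctuation, uppercase) passes through unchanged.
std::string Decode(const std::array<char, 26>& key, std::string text)
{
    for (char& ch : text) {
        if (ch >= 'a' && ch <= 'z' && key[ch - 'a'] != 0) ch = key[ch - 'a'];
    }
    return text;
}

// The clue words listed above.
const std::vector<std::string>& PuzzleClues()
{
    static const std::vector<std::string> clues = {
        "adieu", "brook", "coolj", "dweeb", "essay", "fungi", "genre",
        "humor", "incus", "johnq", "kazoo", "lyric", "mymtv", "novel",
        "overt", "poach", "quaff", "rolex", "soyuz", "turow", "usurp",
        "venom", "wrong", "xenon", "yield", "zebra"
    };
    return clues;
}
```

    Calling Decode(BuildKey(PuzzleClues()), text) on the lowercased cryptogram text applies the substitution in one pass.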

    Puzzle 3: This is a double-acrostic puzzle. (Instructions on how to solve a double-acrostic.)

    Give every book fifty pages before you commit to it or give it up. If you're over fifty, take your age and subtract it from one hundred—the result is the number of pages you should read before deciding. Time is too short to read something you don't like.

    The answers to the clues are as follows:

    1. yo-yo diet
    2. overdue
    3. University of Illinois
    4. refought
    5. fondue pot (my friend likes fondue)
    6. indecisive
    7. red-eye
    8. soggy (my friend hates soggy corn flakes)
    9. The Herb Farm
    10. stork (original clue referenced friends who are expecting their first child)
    11. eco-tourism
    12. abbey
    13. tiff
    14. thud
    15. Love-Sac (my friend has one in her living room)
    16. eighth (original clue used Seattle library trivia)
    17. audio-book
    18. pink
    19. Asteroid (original clue referenced a meal we had there)
    20. roommate (original clue referenced her current one)
    21. tagged (she plays softball)
    22. Metro bus route fourteen (with which she's very familiar)
    23. effigy
    24. notary (my friend is a notary too)
    25. trumpet

    The secret message is "Your first Seattle apartment", where she was greeted by her first roommate!

    As you can see, a lot of the clues used inside information. There are also several library-related clues since my friend volunteers for the Seattle Public Library, and the quotation itself was a bit of a gimme because my friend is a huge Nancy Pearl fan.

    Puzzle 4: A straightforward Jumble with a Seattle Center theme. Key Arena, Center House, Space Needle, Monorail and Fun Forest are the anagrams, leading to the destination Earth & Ocean, my friend's favorite dessert restaurant. At Earth & Ocean, she was treated to lunch including one of every dessert on the menu and a special visit from the dessert chef herself.

    Puzzle 5: The solution to the riddle is "Sim + foe + knee = symphony", which led her to Benaroya Hall.

    Puzzle 6: The first series consists of baseball-related terms: shortstop, pinch hitter, home run, umpire, triple, strikeout, sacrifice, infield, line drive, and center field. The second series consists of names of teams in the NBA: Super Sonics, Pistons, Hornets, Rockets, Celtics, Clippers, Cavaliers, Mavericks, Timber Wolves, and Trail Blazers. The final words are "CENTRAL LIBRARY", which of course takes her to Seattle Central Library, where the big party awaited her.

    This was a very busy day for me, constantly tracking my friend's progress through the puzzles, calling all her friends to make sure they were in position, then calling them again after she left to tell them where the party was going to be. (Didn't want to risk them letting slip the final location with a casual remark like, "See you at the library!") Amazingly, she stayed pretty close to the schedule I had sketched out, except at the very end where we needed to stall her for about half an hour so she wouldn't show up before her party guests!

  • The Old New Thing

    Using script to query information from Internet Explorer windows

    • 14 Comments

    Some time ago, we used C++ to query information from the ShellWindows object and found it straightforward but cumbersome.

    This is rather clumsy from C++ because the ShellWindows object was designed for use by a scripting language like JScript or Visual Basic.

    Let's use one of the languages the ShellWindows object was designed for to enumerate all the open shell windows. Run it with the command line cscript sample.js.

    var shellWindows = new ActiveXObject("Shell.Application").Windows();
    for (var i = 0; i < shellWindows.Count; i++) {
      var w = shellWindows.Item(i);
      WScript.StdOut.WriteLine(w.LocationName + "=" + w.LocationURL);
    }
    

    Well that was quite a bit shorter, wasn't it!

  • The Old New Thing

    Some holiday fun: Puzzle supplementary material

    • 6 Comments

    (Note: This makes sense only after you've gone through the other puzzles.)

    In case my friend got stuck, she could call me and ask for a hint. The original plan was to charge her a dare for each hint, but it turns out I was too nice a guy to make her do any of the following things. Here's the list anyway:

    • Scream "I love you, Seattle!" at a street corner
    • Skip across the street
    • Walk sideways
    • Caress a lamppost
    • Dance
    • Sing the national anthem (any country) [typo fixed 7/6]
    • Ask for directions to where you are
    • Recite six lines from a favorite book or poem
    • Wave and shout hello to a traffic light
  • The Old New Thing

    Some holiday fun: Puzzle #6

    • 2 Comments

    (Note: Read the puzzles in order from 1 to 6 for them to make sense.)

    To see the final puzzle you need an SVG- or VML-enabled browser.

    S H O R T S T O P P H T T R S U P E R S O N I C S P I S O N S

    Answers to all puzzles will be posted tomorrow. Please do not post spoilers.
