• The Old New Thing

    The apocryphal history of file system tunnelling

    • 34 Comments

    One of the file system features you may find yourself surprised by is tunneling, wherein the creation timestamp and short/long names of a file are taken from a file that existed in the directory previously. In other words, if you delete some file "File with long name.txt" and then create a new file with the same name, that new file will have the same short name and the same creation time as the original file. You can read this KB article for details on what operations are sensitive to tunneling.

    Why does tunneling exist at all?

    When you use a program to edit an existing file, then save it, you expect the original creation timestamp to be preserved, since you're editing a file, not creating a new one. But internally, many programs save a file by performing a combination of save, delete, and rename operations (such as the ones listed in the linked article), and without tunneling, the creation time of the file would seem to change even though from the end user's point of view, no file got created.

    As another example of the importance of tunneling, consider that file "File with long name.txt", whose short name is, say, "FILEWI~1.TXT". You load this file into a program that is not long-filename-aware and save it. It deletes the old "FILEWI~1.TXT" and creates a new one with the same name. Without tunneling, the associated long name of the file would be lost. Instead of a friendly long name, the file name would be corrupted into this thing with squiggly marks. Not good.
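    To make the second scenario concrete, here is a toy model of a directory that tunnels metadata. It is a sketch only, not how any real file system implements the feature: the class ToyDirectory, its members, and the length of the time window are all assumptions made up for illustration.

```cpp
#include <map>
#include <string>

// Toy model only; every name here is hypothetical.
struct FileInfo {
    std::string longName;   // e.g. "File with long name.txt"
    long creationTime;      // seconds on an arbitrary clock
};

class ToyDirectory {
public:
    // windowSeconds: how long metadata of a deleted file stays eligible
    // for tunneling (an assumed parameter, not a documented value).
    explicit ToyDirectory(long windowSeconds) : m_window(windowSeconds) {}

    // Create a file under its short name. If metadata for that name was
    // deleted recently enough, it tunnels into the new file.
    const FileInfo& Create(const std::string& shortName,
                           const std::string& longName, long now) {
        FileInfo info{longName, now};
        auto it = m_tunnel.find(shortName);
        if (it != m_tunnel.end()) {
            if (now - it->second.deletedAt <= m_window) {
                info = it->second.info;  // old long name and creation time
            }
            m_tunnel.erase(it);
        }
        return m_files[shortName] = info;
    }

    // Delete a file, remembering its metadata for a little while.
    void Delete(const std::string& shortName, long now) {
        auto it = m_files.find(shortName);
        if (it == m_files.end()) return;
        m_tunnel[shortName] = Remembered{it->second, now};
        m_files.erase(it);
    }

private:
    struct Remembered { FileInfo info; long deletedAt; };
    long m_window;
    std::map<std::string, FileInfo> m_files;
    std::map<std::string, Remembered> m_tunnel;
};
```

    In this model, a program that deletes "FILEWI~1.TXT" and immediately recreates it gets the original long name and creation time back. If it waits longer than the window, the new file starts from scratch.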

    But where did the name "tunneling" come from?

    From quantum mechanics.

    Consider the following analogy: You have two holes in the ground, and a particle is in the first hole (A) and doesn't have enough energy to get out. It only has enough energy to get as high as the dotted line.

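    [Figure: two holes in the ground, labeled A and B, with a dotted line marking the highest level the particle has enough energy to reach.]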

    You get distracted for a little while, maybe watch the Super Bowl halftime show, and when you come back, the particle somehow is now in hole B. This is impossible in classical mechanics, but thanks to the wacky world of quantum mechanics, it is not only possible, but actually happens. The phenomenon is known as tunneling, because it's as if the particle "dug a tunnel" between the two holes, thereby allowing it to get from one hole to another without ever going above the dotted line.

    In the case of file system tunneling, it is information that appears to violate the laws of classical mechanics. The information was destroyed (by deleting or renaming the file), yet somehow managed to reconstruct itself on the other side of a temporal barrier.

    The developer who was responsible for implementing tunneling on Windows 95 got kind of carried away with the quantum mechanics analogy: The fragments of information about recently-deleted or recently-renamed files are kept in data structures called "quarks".

  • The Old New Thing

    When Marketing edits your PDC talk description

    • 23 Comments

    A few years ago, I told a story of how Marketing messed up a bunch of PDC slides by "helpfully" expanding acronyms... into the wrong phrases. Today I got to see Marketing's handiwork again, as they edited my talk description. (Oh, and psst, Marketing folks, you might want to link to the full list of PDC sessions from your Conference Tracks and Sessions page. Unless, of course, y'know, you don't want people to know about it.)

    For one thing, they stuck my name into the description of the talk, thereby drawing attention to me rather than putting the focus on the actual talk topic. Because I'm not there to be me. I'm there to give a talk. If I were just there to be me, the title would be "Raymond Chen reads the newspaper for an hour while listening to music on his headphones."

    (That's why I don't do interviews. Interviews are about the interviewee, and I don't want to talk about me. People should care about the technology, not the people behind it.)

    They also trimmed my topic list but stopped before the punch line.

    ... asynchronous input queues, the hazards of attaching thread input, and other tricks and traps ...

    The punch line was "... and how it happens without your knowledge." After all, you don't care about the fine details of a feature you don't use. The point is that it's happening behind your back so you'd better know about it because you're using it whether you realize it or not.

    They also took out the reference to finger puppets.

  • The Old New Thing

    Where did the names of the computer Hearts opponents come from?

    • 13 Comments

    A Windows 95 story in commemoration of the tenth anniversary of its release to manufacturing (RTM).

    Danny Glasser explains where the names for the computer opponents in the game Hearts came from.

    I didn't myself know where the names came from, but Danny's explanation of the source of the Windows 95 names brought back memories of the child of one of our co-workers, whose name I will not reveal but you can certainly narrow it down to one of three. He/she was exceedingly well-behaved and definitely helped to make those long hours slightly more tolerable. I remember once we heard the receptionist's voice come over the public address system, which was itself quite a shock because nobody ever uses the public address system. The message was, "Will X please come to the receptionist's desk. Your son/daughter is here."

    Space Cadet JimH picks up the story and explains how he went about writing the computer player logic. (And no, the computer players don't cheat.)

  • The Old New Thing

    Converting from traditional to simplified Chinese, part 3: Highlighting differences

    • 5 Comments

    One of the things that is interesting to me as a student of the Chinese languages is to recognize where the traditional and simplified Chinese scripts differ. Since this is my program, I'm going to hard-code the color for simplified Chinese script: maroon.

    To accomplish the highlighting, we take advantage of listview's custom-draw feature. Custom-draw allows you to make minor changes to the way items are displayed on the screen. It's a middle ground between having listview do all the work (via default drawing behavior) and having the program do all the work (via owner-draw).

    The custom-draw cycle for shell common controls consists of a series of NM_CUSTOMDRAW notifications, starting with the most general and getting more specific. The reason for the break-down is multi-fold. First, it allows the listview control to short-circuit a portion of custom-draw behavior if the parent window does not indicate that it wishes to customize a particular behavior. This reduces message traffic and improves performance when large numbers of items are being drawn. Second, it allows the parent window to target its customizations to the drawing stages it is interested in.

    Listviews are peculiar among the shell common controls in that their items sometimes (but not always) have sub-items. This complicates the drawing process since it requires listview to accommodate both styles: large icon view does not use sub-items, but report view does. To address this, the CDDS_ITEMPREPAINT stage is entered when an item is about to paint, and any changes made by the parent window are considered to be effective for the entire item. If you want to make changes on a per-subitem basis, you have to respond to CDDS_ITEMPREPAINT | CDDS_SUBITEM and set your properties (or reset them if you want to return to the default) for that sub-item.
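    The short-circuiting can be pictured from the control's side with a small simulation. This is a sketch under loud assumptions: the Stage and Reply enumerations are stand-ins invented for this example (the real constants live in commctrl.h and are bit flags), PaintReportView is a hypothetical function, and a real control sends additional stages that are left out here.

```cpp
#include <functional>

// Stand-in constants for this sketch only, not the commctrl.h values.
enum Stage { PREPAINT, ITEMPREPAINT, SUBITEMPREPAINT };
enum Reply { DODEFAULT, NOTIFYITEMDRAW, NOTIFYSUBITEMDRAW };

struct DrawNotify { Stage stage; int item; int subItem; };

// Hypothetical control-side paint loop. It descends to a more specific
// stage only when the parent asked for it at the previous stage.
// Returns the number of notifications sent to the parent.
int PaintReportView(int items, int columns,
                    const std::function<Reply(const DrawNotify&)>& parent)
{
    int sent = 1;  // the initial, most general notification
    if (parent({PREPAINT, -1, -1}) != NOTIFYITEMDRAW) return sent;
    for (int i = 0; i < items; i++) {
        sent++;
        if (parent({ITEMPREPAINT, i, -1}) != NOTIFYSUBITEMDRAW) continue;
        for (int c = 0; c < columns; c++) {
            sent++;
            parent({SUBITEMPREPAINT, i, c});
        }
    }
    return sent;
}
```

    In this model, a parent that answers DODEFAULT to the first notification costs one notification no matter how many items are drawn, whereas a parent that asks for sub-item notifications receives one per item plus one per column of each item.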

    With those preliminary remarks settled, we can dive in.

    class RootWindow : public Window
    {
     ...
    protected:
     ...
     LRESULT OnLVCustomDraw(NMLVCUSTOMDRAW* pcd);
     ...
    private:
     HWND m_hwndLV;
     COLORREF m_clrTextNormal;
     Dictionary m_dict;
    };
    

    We declare our listview custom-draw handler as well as the member variable in which we remember the normal text color so that we can reset it for columns we do not intend to colorize.

    LRESULT RootWindow::OnNotify(NMHDR *pnm)
    {
     switch (pnm->code) {
     case LVN_GETDISPINFO:
      OnGetDispInfo(CONTAINING_RECORD(pnm, NMLVDISPINFO, hdr));
      break;
     case NM_CUSTOMDRAW:
      if (pnm->hwndFrom == m_hwndLV) {
       return OnLVCustomDraw(CONTAINING_RECORD(
                             CONTAINING_RECORD(pnm, NMCUSTOMDRAW, hdr),
                                                    NMLVCUSTOMDRAW, nmcd));
      }
      break;
     }
     return 0;
    }
    

    If we receive an NM_CUSTOMDRAW notification from the listview control, we call our new handler. The multiple calls to the CONTAINING_RECORD macro are necessary because the NMHDR structure is nestled two levels deep inside the NMLVCUSTOMDRAW structure.
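    For readers who have not met the macro before, here is a portable sketch of the two-step walk. The structures are mock-ups that mirror only the nesting, not the real Win32 layouts, and they deliberately put the nested members at nonzero offsets so that the pointer arithmetic has something to do. The CONTAINING macro and FromHeader function are hypothetical stand-ins written for this example.

```cpp
#include <cstddef>

// Mock structures: same nesting as NMHDR inside NMCUSTOMDRAW inside
// NMLVCUSTOMDRAW, but the members and layouts are invented.
struct Hdr          { int code; };
struct CustomDraw   { int tag; Hdr hdr; int drawStage; };
struct LVCustomDraw { int tag; CustomDraw nmcd; int clrText; };

// Same idea as CONTAINING_RECORD: step back from the address of a
// member to the start of the structure that contains it.
#define CONTAINING(ptr, type, field) \
    (reinterpret_cast<type*>( \
        reinterpret_cast<char*>(ptr) - offsetof(type, field)))

// Two levels of nesting, so two steps outward.
LVCustomDraw* FromHeader(Hdr* phdr)
{
    CustomDraw* pcd = CONTAINING(phdr, CustomDraw, hdr);
    return CONTAINING(pcd, LVCustomDraw, nmcd);
}
```

    Given a pointer to the innermost header, FromHeader recovers the outermost structure, which is the same job the nested CONTAINING_RECORD calls perform in OnNotify.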

    LRESULT RootWindow::OnLVCustomDraw(NMLVCUSTOMDRAW* pcd)
    {
     switch (pcd->nmcd.dwDrawStage) {
     case CDDS_PREPAINT: return CDRF_NOTIFYITEMDRAW;
     case CDDS_ITEMPREPAINT:
      m_clrTextNormal = pcd->clrText;
      return CDRF_NOTIFYSUBITEMDRAW;
     case CDDS_ITEMPREPAINT | CDDS_SUBITEM:
      pcd->clrText = m_clrTextNormal;
      if (pcd->iSubItem == COL_SIMP &&
        pcd->nmcd.dwItemSpec < (DWORD)Length()) {
        const DictionaryEntry& de = Item(pcd->nmcd.dwItemSpec);
        if (de.m_pszSimp) {
          pcd->clrText = RGB(0x80, 0x00, 0x00);
        }
      }
      break;
     }
     return CDRF_DODEFAULT;
    }
    

    During the CDDS_PREPAINT stage, we indicate our desire to receive CDDS_ITEMPREPAINT notifications. During the CDDS_ITEMPREPAINT stage, we save the normal text color and indicate that we want to receive sub-item notifications. It is in the sub-item notification CDDS_ITEMPREPAINT | CDDS_SUBITEM that the real work happens.

    First, we reset the color to the default on the assumption that we will not need to colorize this column. But if the column is the simplified Chinese column, if the item number is valid, and if the simplified Chinese is different from the traditional Chinese, then we set the text color to maroon.

    That's enough with the Chinese/English dictionary for now. All this time, and we don't even have search capability yet! We'll work on that next month.

  • The Old New Thing

    Converting from traditional to simplified Chinese, part 2: Using the dictionary

    • 8 Comments

    Now that we have our traditional-to-simplified pseudo-dictionary, we can use it to generate simplified Chinese words in our Chinese/English dictionary.

    class StringPool
    {
    public:
     StringPool();
     ~StringPool();
     LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);
     LPWSTR DupString(const WCHAR* pszBegin)
     {
      return AllocString(pszBegin, pszBegin + lstrlen(pszBegin));
     }
     ...
    };
    

    The DupString method is a convenience we will use below.

    Dictionary::Dictionary()
    {
     ...
        if (de.Parse(buf, buf + cchResult, m_pool)) {
         bool fSimp = false;
         for (int i = 0; de.m_pszTrad[i]; i++) {
          if (pmap->Map(de.m_pszTrad[i])) {
           fSimp = true;
           break;
          }
         }
         if (fSimp) {
          de.m_pszSimp = m_pool.DupString(de.m_pszTrad);
          for (int i = 0; de.m_pszTrad[i]; i++) {
           if (pmap->Map(de.m_pszTrad[i])) {
            de.m_pszSimp[i] = pmap->Map(de.m_pszTrad[i]);
           }
          }
         } else {
          de.m_pszSimp = NULL;
         }
         v.push_back(de);
        }
     ...
    }
    

    After we parse each entry from the dictionary, we scan the traditional Chinese characters to see if any of them have been simplified. If so, then we copy the traditional Chinese string and use the Trad2Simp object to convert it to simplified Chinese.

    If the string is the same in both simplified and traditional Chinese, then we set m_pszSimp to NULL. This may seem a bit odd, but it'll come in handy later. Yes, it makes the m_pszSimp member difficult to use. I could have created an accessor function for it (so that it falls back to traditional Chinese if the simplified Chinese is NULL), but I'm feeling lazy right now, and this is just a one-shot program.
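    For the record, the accessor I was too lazy to write might look something like this. It is a sketch, not code from the program: DictionaryEntrySketch models only the two members that matter here, wchar_t stands in for WCHAR, and the method names Simp and HasDistinctSimp are made up.

```cpp
// Minimal stand-in for DictionaryEntry; only two members are modeled.
struct DictionaryEntrySketch {
    const wchar_t* m_pszTrad;
    const wchar_t* m_pszSimp;   // nullptr when identical to traditional

    // Hypothetical accessor: fall back to the traditional string when
    // no distinct simplified string was stored.
    const wchar_t* Simp() const
    {
        return m_pszSimp ? m_pszSimp : m_pszTrad;
    }

    // The null pointer doubles as a cheap "did anything change?" flag.
    bool HasDistinctSimp() const { return m_pszSimp != nullptr; }
};
```

    Callers that just want text call Simp(), and callers that care whether the two scripts differ call HasDistinctSimp(), so neither needs to know about the null convention.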

    void RootWindow::OnGetDispInfo(NMLVDISPINFO* pnmv)
    {
     ...
      switch (pnmv->item.iSubItem) {
       case COL_TRAD:    pszResult = de.m_pszTrad;    break;
       case COL_SIMP:    pszResult =
          de.m_pszSimp ? de.m_pszSimp : de.m_pszTrad; break;
       case COL_PINYIN:  pszResult = de.m_pszPinyin;  break;
       case COL_ENGLISH: pszResult = de.m_pszEnglish; break;
      }
     ...
    }
    

    Finally, we tell our OnGetDispInfo handler what to return when the listview asks for the text that goes into the simplified Chinese column. With these changes, we can display both the traditional and simplified Chinese for each entry in our dictionary.

    Next time, a minor tweak to our display code, which happens to illustrate custom-draw as a nice side-effect.

  • The Old New Thing

    Converting from traditional to simplified Chinese, part 1: Loading the dictionary

    • 10 Comments

    One step we had glossed over in our haste to get something interesting on the screen in our Chinese/English dictionary program was the conversion from traditional to simplified Chinese characters.

    The format of the hcutf8.txt file is a series of lines, each of which is a UTF-8 encoded string consisting of a simplified Chinese character followed by its traditional equivalents. Often, multiple traditional characters map to a single simplified character. Much more rarely—only twice in our data set—multiple simplified characters map to a single traditional character. Unfortunately, one of the cases is the common syllable 麼, which has two simplifications, either 么 or 麽, the first of which is far more productive. We'll have to keep an eye out for that one.

    (Note also that in real life, the mapping is more complicated than a character-for-character substitution, but I'm willing to forgo that level of complexity because this is just for my personal use and people will have realized I'm not a native speaker long before I get caught up in language subtleties like that.)

    One could try to work out a fancy data structure to represent this mapping table compactly, but it turns out that simple is better here: an array of 65536 WCHARs, each producing the corresponding simplification. Most of the array will lie unused, since the characters we are interested in lie in the range U+4E00 to U+9FFF. Consequently, the active part of the table is only about 40KB, which easily fits inside the L2 cache.
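    The size arithmetic can be checked at compile time. The sketch below assumes a 2-byte character type and uses char16_t as a stand-in for WCHAR so that it builds without the Windows headers; the constant names and the MapSketch type are invented for this example.

```cpp
#include <cstddef>

// char16_t stands in for WCHAR (2 bytes on the platforms assumed here).
constexpr std::size_t kTableEntries = 65536;
constexpr std::size_t kTableBytes = kTableEntries * sizeof(char16_t);

// The range of characters we actually look up.
constexpr char16_t kFirst = 0x4E00, kLast = 0x9FFF;
constexpr std::size_t kActiveEntries = kLast - kFirst + 1;
constexpr std::size_t kActiveBytes = kActiveEntries * sizeof(char16_t);

static_assert(kTableBytes == 131072, "whole table is 128KB");
static_assert(kActiveBytes == 41984, "active range is roughly 41KB");

// Direct-indexed lookup: a zero entry means no simplification recorded.
struct MapSketch {
    char16_t table[kTableEntries] = {};
    char16_t Map(char16_t ch) const { return table[ch]; }
};
```

    The whole table costs 128KB of address space, but only the 20992 entries in the active range are ever touched, and a lookup is a single array index with no hashing or searching.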

    It is important to know when a simple data structure is better than a complex one.

    The hcutf8.txt file contains a lot of fluff that we aren't interested in. Let's strip that out ahead of time so that we don't waste our time parsing it at run-time.

    #!perl
    $_ = <> until /^# Start zi/; # ignore uninteresting characters
    while (<>) {
     s/\r//g;
     next if length($_) == 7 &&
             substr($_, 0, 3) eq substr($_, 3, 3); # ignore NOPs
     print;
    }
    

    Run the hcutf8.txt file through this filter to clean it up a bit.

    Now we can write our "traditional to simplified" dictionary.

    class Trad2Simp
    {
    public:
     Trad2Simp();
     WCHAR Map(WCHAR chTrad) const { return _rgwch[chTrad]; }
    
    private:
     WCHAR _rgwch[65536]; // woohoo!
    };
    
    Trad2Simp::Trad2Simp()
    {
     ZeroMemory(_rgwch, sizeof(_rgwch));
    
     MappedTextFile mtf(TEXT("hcutf8.txt"));
     const CHAR* pchBuf = mtf.Buffer();
     const CHAR* pchEnd = pchBuf + mtf.Length();
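     while (pchBuf < pchEnd) {
      const CHAR* pchCR = std::find(pchBuf, pchEnd, '\r');
      int cchBuf = (int)(pchCR - pchBuf);
      WCHAR szMap[80];
      DWORD cch = MultiByteToWideChar(CP_UTF8, 0, pchBuf, cchBuf,
                                      szMap, 80);
      if (cch > 1) {
       WCHAR chSimp = szMap[0];
       for (DWORD i = 1; i < cch; i++) {
        if (szMap[i] != chSimp) {
         _rgwch[szMap[i]] = chSimp;
        }
       }
      }
      // Always advance to the next line, even if this one was empty or
      // failed to convert; otherwise the loop would never terminate.
      const CHAR* pchLF = std::find(pchCR, pchEnd, '\n');
      pchBuf = (pchLF == pchEnd) ? pchEnd : pchLF + 1;
     }
     _rgwch[0x9EBC] = 0x4E48; // special case: map 麼 to 么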
    }
    

    We read the file one line at a time, convert it from UTF-8, and for each nontrivial mapping, record it in our dictionary. At the end, we do our little 么 special-case patch-up.

    Next time, we'll use this mapping table to generate simplified Chinese characters into our dictionary.

  • The Old New Thing

    The best book on ActiveX programming ever written

    • 14 Comments

    I was introduced to the glory that is the world of Mr. Bunny many years ago. Mr. Bunny's Guide to ActiveX is probably the best book on ActiveX programming ever written.

    If you haven't figured it out by now, it's a humor book, but it's the sort of madcap insane geek humor that has enough truth in it to make you laugh more.

    My favorite is the first exercise from the first chapter: Connect the dots. (Warning: It's harder than it looks!)

  • The Old New Thing

    How can I recover the dialog resource ID from a dialog window handle?

    • 4 Comments

    Occasionally, I see someone ask a question like the following.

    I have the handle to a dialog window. How can I get the original dialog resource ID that the dialog was created from?

    As we saw in our in-depth discussion of how dialogs are created from dialog templates, the dialog template itself is not saved anywhere. The purpose of a template is to act as the... well... "template" for creating a dialog box. Once the dialog box has been created, there is no need for the template any more. Consequently, there is no reason why the system should remember it.

    Besides, if the dialog were created from a runtime-generated template, saving the original parameters would leave pointers to freed memory. Furthermore, the code that created the dialog box almost certainly modified the dialog box during its WM_INITDIALOG message processing (filling list boxes with data, maybe enabling or disabling some buttons), so the dialog box you see on screen doesn't correspond to a template anywhere.

    It's like asking, "Given a plate of food, how do I recover the original cookbook and page number for the recipe?" By doing a chemical analysis of the food, you might be able to recover "a" recipe, but there is nothing in the food itself that says, "I came from The Joy of Cooking, page 253."

  • The Old New Thing

    What struck me about life in the Republic

    • 41 Comments

    When people asked me for my reaction to the most recent Star Wars movie, I replied that what struck me most was that the Republic doesn't appear to have any building codes. There are these platforms several hundred meters above the ground with no railings. For example, Padmé Amidala's fancy apartment has a front porch far above the ground. Consider: You're carrying a load of packages to the car, the kids are running around, you turn around to yell at one of them, miss a step, and over the rim you go. How many people fall to their deaths in that galaxy?

  • The Old New Thing

    What are SYSTEM_FONT and DEFAULT_GUI_FONT?

    • 22 Comments

    Among the things you can get with the GetStockObject function are two fonts called SYSTEM_FONT and DEFAULT_GUI_FONT. What are they?

    They are fonts nobody uses any more.

    Back in the old days of Windows 2.0, the font used for dialog boxes was a bitmap font called System. This is the font that SYSTEM_FONT retrieves, and it is still the default dialog box font for compatibility reasons. Of course, nobody nowadays would ever use such an ugly font for their dialog boxes. (Among other things, it's a bitmap font and therefore does not look good at high resolutions, nor can it be anti-aliased.)

    DEFAULT_GUI_FONT has an even less illustrious history. It was created during Windows 95 development in the hopes of becoming the new default GUI font, but by July 1994, Windows itself stopped using it in favor of the various fonts returned by the SystemParametersInfo function. Its existence is now vestigial.

    One major gotcha with SYSTEM_FONT and DEFAULT_GUI_FONT is that on a typical US-English machine, they map to bitmap fonts that do not support ClearType.
