July, 2005

  • The Old New Thing

    The Northwest Mahler Festival performs Mahler's Second Symphony ("Resurrection")

    • 8 Comments

    Last night I attended the Northwest Mahler Festival's performance of Mahler's Second Symphony (The Resurrection). The concert opened with Copland's El Salón México and Barber's Prayers of Kierkegaard. [Typo fixed 12:30pm]

    The Copland was kind of shaky, in a way that I couldn't quite put a finger on. The wind balance seemed a bit off, and it somehow didn't seem to "come together". By contrast, my knowledge of the Barber was zero, so they could've pulled out kazoos and I wouldn't've known that something was amiss.

    The Mahler demands quite a bit from both the woodwind and brass sections, but I was relieved to find that the tricky problem of getting them to play friendly appeared to be nonexistent. The Mahler "came together". (Well, duh, this is the Northwest Mahler Festival, after all.) I was so at ease with it that I started to ignore the occasional technical error...

    Performances of Mahler symphonies have a significant visual component. It's always a blast to see the clarinets playing "Schalltrichter auf", and for the Second, I was oddly fascinated by the rute. (I think my favorite Mahler percussion instrument is the "large hammer striking a wooden block" from the Sixth Symphony. When you see the percussionist raise that huge mallet, you know it's coming... and when the blow finally comes, it sends shock waves through your body.)

    Anyway, there's no real point to this entry. Just babbling about a symphony concert that I found very satisfying.

  • The Old New Thing

    Why does FindFirstFile find short names?

    • 21 Comments

    The FindFirstFile function matches both the short and long names. This can produce somewhat surprising results. For example, if you ask for "*.htm", this also gives you the file "x.html" since its short name is "X~1.HTM".
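    To make the "*.htm" surprise concrete, here is a minimal sketch that models the behavior. It is not how Windows implements matching: the `WildcardMatch` helper handles only `*` and `?`, and the `DirEntry` structure and `EntryMatches` function are hypothetical names invented for the illustration. The short name is supplied by hand rather than generated.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive wildcard match supporting only '*' and '?'.
// A simplified stand-in, not the real Windows matching rules.
bool WildcardMatch(const std::string& pattern, const std::string& name,
                   std::size_t p = 0, std::size_t n = 0)
{
    assert(p <= pattern.size() && n <= name.size());
    while (p < pattern.size()) {
        if (pattern[p] == '*') {
            // Let '*' absorb zero or more characters of the name.
            for (std::size_t skip = n; skip <= name.size(); ++skip) {
                if (WildcardMatch(pattern, name, p + 1, skip)) return true;
            }
            return false;
        }
        if (n >= name.size()) return false;
        if (pattern[p] != '?' &&
            std::toupper(static_cast<unsigned char>(pattern[p])) !=
            std::toupper(static_cast<unsigned char>(name[n]))) {
            return false;
        }
        ++p;
        ++n;
    }
    return n == name.size();
}

// Hypothetical directory entry carrying both names of one file.
struct DirEntry {
    std::string longName;
    std::string shortName;
};

// An entry is a hit if either of its names matches the pattern.
bool EntryMatches(const std::string& pattern, const DirEntry& entry)
{
    return WildcardMatch(pattern, entry.longName) ||
           WildcardMatch(pattern, entry.shortName);
}
```

    In this model, the long name "x.html" does not match "*.htm" by itself, but the entry is still reported because its short name "X~1.HTM" does.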

    Why does it bother matching short names? Shouldn't it match only long names? After all, only old 16-bit programs use short names.

    But that's the problem: 16-bit programs use short names.

    Through a process known as generic thunks, a 16-bit program can load a 32-bit DLL and call into it. Windows 95 and the Windows 16-bit emulation layer in Windows NT rely heavily on generic thunks so that they don't have to write two versions of everything. Instead, the 16-bit version just thunks up to the 32-bit version.

    Note, however, that this would mean that 32-bit DLLs would see two different views of the file system, depending on whether they are hosted from a 16-bit process or a 32-bit process.

    "Then make the FindFirstFile function check to see who its caller is and change its behavior accordingly," doesn't fly because you can't trust the return address.

    Even if this problem were solved, you would still have the problem of 16/32 interop across the process boundary.

    For example, suppose a 16-bit program calls WinExec("notepad X~1.HTM"). The 32-bit Notepad program had better open the file X~1.HTM even though it's a short name. What's more, a common way to get properties of a file such as its last access time is to call FindFirstFile with the file name, since the WIN32_FIND_DATA structure returns that information as part of the find data. (Note: GetFileAttributesEx is a better choice, but that function is comparatively new.) If the FindFirstFile function did not work for short file names, then the above trick would fail for short names passed across the 16/32 boundary.

    As another example, suppose the DLL saves the file name in a location external to the process, say a configuration file, the registry, or a shared memory block. If a 16-bit program calls into this DLL, it would pass short names, whereas if a 32-bit program calls into the DLL, it would pass long names. If the file system functions returned only long names for 32-bit programs, then the copy of the DLL running in a 32-bit program would not be able to read the data written by the DLL running in a 16-bit program.

  • The Old New Thing

    What is the deal with the ES_OEMCONVERT flag?

    • 14 Comments

    The ES_OEMCONVERT edit control style is a holdover from 16-bit Windows. This ancient MSDN article from the Windows 3.1 SDK describes the flag thus:

    ES_OEMCONVERT causes text entered into the edit control to be converted from ANSI to OEM and then back to ANSI. This ensures proper character conversion when the application calls the AnsiToOem function to convert a Windows string in the edit control to OEM characters. ES_OEMCONVERT is most useful for edit controls that contain filenames.

    Set the wayback machine to, well, January 31, 1992, the date of the article.

    At this time, the predominant Windows platform was Windows 3.0. Windows 3.1 was still a few months away from release, and Windows NT 3.1 was over a year away. The predominant file system was 16-bit FAT, and the relevant feature of FAT of this era for the purpose of this discussion is that file names were stored on disk in the OEM character set. (We discussed the history behind the schism between the OEM and ANSI code pages in an earlier article.)

    Since GUI programs used the ANSI character set, but file names were stored in the OEM character set, the only characters that could be used in file names from GUI programs were those that exist in both character sets. If a character existed in the ANSI character set but not the OEM character set, then there would be no way of using it as a file name; and if a character existed in the OEM character set but not the ANSI character set, the GUI program couldn't manipulate it.

    The ES_OEMCONVERT flag on an edit control ensures that only characters that exist in both the ANSI and OEM character sets are used, hence the remark "ES_OEMCONVERT is most useful for edit controls that contain filenames".
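    The round trip can be sketched with a toy conversion table. This is only an illustration under stated assumptions: the table holds just two entries from the Windows-1252 to code page 437 mapping, the function names are made up, and the `_` substitute for unmappable characters is an arbitrary choice (the real conversion picks its own replacement).

```cpp
#include <map>
#include <string>

// Excerpt of a Windows-1252 ("ANSI") to code page 437 ("OEM") table.
// Only two accented letters are listed; a real table covers far more.
const std::map<unsigned char, unsigned char>& AnsiToOemTable()
{
    static const std::map<unsigned char, unsigned char> table = {
        {0xE9, 0x82},  // e with acute accent
        {0xF1, 0xA4},  // n with tilde
    };
    return table;
}

// Assumed placeholder for characters with no counterpart.
const unsigned char kSubstitute = '_';

unsigned char ToOem(unsigned char ansi)
{
    if (ansi < 0x80) return ansi;  // ASCII is shared by both sets
    auto it = AnsiToOemTable().find(ansi);
    return it != AnsiToOemTable().end() ? it->second : kSubstitute;
}

unsigned char ToAnsi(unsigned char oem)
{
    if (oem < 0x80) return oem;
    for (const auto& pair : AnsiToOemTable()) {
        if (pair.second == oem) return pair.first;
    }
    return kSubstitute;
}

// ANSI -> OEM -> ANSI: characters present in both sets survive unchanged,
// everything else comes back as the substitute.
std::string RoundTrip(const std::string& ansiText)
{
    std::string result;
    for (char ch : ansiText) {
        unsigned char oem = ToOem(static_cast<unsigned char>(ch));
        result += static_cast<char>(ToAnsi(oem));
    }
    return result;
}
```

    A string that survives the round trip unchanged is one that can be stored as an OEM file name and read back intact, which is the property the flag was designed to enforce.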

    Fast-forward to today.

    All the popular Windows file systems support Unicode file names and have for ten years. There is no longer a data loss converting from the ANSI character set to the character set used by the file system. Therefore, there is no need to filter out any characters to forestall the user typing a character that will be lost during the conversion to a file name. In other words, the ES_OEMCONVERT flag is pointless today. It's a leftover from the days before Unicode.

    Indeed, if you use this flag, you make your program worse, not better, because it unnecessarily restricts the set of characters that the user will be allowed to use in file names. A user running the US-English version of Windows would not be allowed to enter Chinese characters as a file name, for example, even though the file system is perfectly capable of creating files whose names contain those characters.

  • The Old New Thing

    Watching the game of "Telephone" play out on the Internet

    • 21 Comments

    Let's see if I can get this straight.

    First, Chris Pirillo says (timecode 37:59) he's not entirely pleased with the word "podcast" in Episode 11 of This Week in Tech. The Seattle-PI then reports that the sentiment is shared with "several Microsoft employees" who have coined the word "blogcast" to replace it. Next, c|net picks up the story and says that the word "podcast" is a "faux-pas" on Microsoft campus. [Typo fixed: 9am]

    In this manner, a remark by someone who isn't even a Microsoft employee becomes, through rumor, speculation, and wild extrapolation, a word-ban at Microsoft.

    Pretty neat trick.

  • The Old New Thing

    If InitCommonControls doesn't do anything, why do you have to call it?

    • 13 Comments

    One of the problems beginners run into when they start using shell common controls is that they forget to call the InitCommonControls function. But if you were to disassemble the InitCommonControls function itself, you would see that it, like the FlushInstructionCache function, doesn't actually do anything.

    Then why do you need to call it?

    As with FlushInstructionCache, what's important is not what it performs, but just the fact that you called it.

    Recall that merely listing a lib file in your dependencies doesn't actually cause your program to be bound to the corresponding DLL. You have to call a function in that DLL in order for there to be an import entry for that DLL. And InitCommonControls is that function.

    Without a call to the InitCommonControls function, a program that wants to use the shell common controls library would have no reference to COMCTL32.DLL in its import table. This means that when the program loads, COMCTL32.DLL is not loaded and therefore is not initialized. Which means that it doesn't register its window classes. Which means that your call to the CreateWindow function fails because the window class has not been registered.

    That's why you have to call a function that does nothing. It's for your own good.

    (Of course, there's the new InitCommonControlsEx function that lets you specify which classes you would like to be registered. Only the classic Windows 95 classes are registered when COMCTL32.DLL loads. For everything else you have to ask for it explicitly.)

  • The Old New Thing

    The apocryphal history of file system tunnelling

    • 34 Comments

    One of the file system features you may find yourself surprised by is tunneling, wherein the creation timestamp and short/long names of a file are taken from a file that existed in the directory previously. In other words, if you delete some file "File with long name.txt" and then create a new file with the same name, that new file will have the same short name and the same creation time as the original file. You can read this KB article for details on what operations are sensitive to tunneling.

    Why does tunneling exist at all?

    When you use a program to edit an existing file, then save it, you expect the original creation timestamp to be preserved, since you're editing a file, not creating a new one. But internally, many programs save a file by performing a combination of save, delete, and rename operations (such as the ones listed in the linked article), and without tunneling, the creation time of the file would seem to change even though from the end user's point of view, no file got created.

    As another example of the importance of tunneling, consider that file "File with long name.txt", whose short name is, say, "FILEWI~1.TXT". You load this file into a program that is not long-filename-aware and save it. It deletes the old "FILEWI~1.TXT" and creates a new one with the same name. Without tunneling, the associated long name of the file would be lost. Instead of a friendly long name, the file would end up with that corrupted-looking name with squiggly marks. Not good.
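    Here is a rough model of the idea, under loud assumptions: the `FileInfo` structure and `TunnelingDirectory` class are hypothetical, time is a plain count of seconds, and the length of the tunneling window is a constructor parameter rather than a claim about what the file system really uses. A brand-new file is given a short name equal to its long name purely to keep the sketch small.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-file metadata that tunneling preserves.
struct FileInfo {
    std::string longName;
    std::string shortName;
    long creationTime;  // seconds on an arbitrary clock
};

class TunnelingDirectory {
public:
    explicit TunnelingDirectory(long windowSeconds) : m_window(windowSeconds)
    {
        assert(windowSeconds >= 0);
    }

    // Deleting (or renaming away) a file remembers its metadata
    // under both of its names.
    void Remove(const FileInfo& file, long now)
    {
        m_cache[file.longName] = Remembered{file, now};
        m_cache[file.shortName] = Remembered{file, now};
    }

    // Creating a file under either name revives the remembered
    // metadata if the removal was recent enough.
    FileInfo Create(const std::string& name, long now) const
    {
        auto it = m_cache.find(name);
        if (it != m_cache.end() && now - it->second.removedAt <= m_window) {
            return it->second.file;
        }
        return FileInfo{name, name, now};  // no tunnel: a genuinely new file
    }

private:
    struct Remembered {
        FileInfo file;
        long removedAt;
    };
    long m_window;
    std::map<std::string, Remembered> m_cache;
};
```

    With this model, a program that deletes "FILEWI~1.TXT" and immediately recreates it gets back an entry whose long name is still "File with long name.txt" and whose creation time is unchanged, while the same creation long after the window has passed yields a fresh file.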

    But where did the name "tunneling" come from?

    From quantum mechanics.

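    Consider the following analogy: You have two holes in the ground, and a particle is in the first hole (A) and doesn't have enough energy to get out. It only has enough energy to get as high as the dotted line.

    [Diagram: two holes in the ground, labeled A and B, with a dotted line marking the highest point the particle can reach.]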

    You get distracted for a little while, maybe watch the Super Bowl halftime show, and when you come back, the particle somehow is now in hole B. This is impossible in classical mechanics, but thanks to the wacky world of quantum mechanics, it is not only possible, but actually happens. The phenomenon is known as tunneling, because it's as if the particle "dug a tunnel" between the two holes, thereby allowing it to get from one hole to another without ever going above the dotted line.

    In the case of file system tunneling, it is information that appears to violate the laws of classical mechanics. The information was destroyed (by deleting or renaming the file), yet somehow managed to reconstruct itself on the other side of a temporal barrier.

    The developer who was responsible for implementing tunneling on Windows 95 got kind of carried away with the quantum mechanics analogy: The fragments of information about recently-deleted or recently-renamed files are kept in data structures called "quarks".

  • The Old New Thing

    When Marketing edits your PDC talk description

    • 23 Comments

    A few years ago, I told a story of how Marketing messed up a bunch of PDC slides by "helpfully" expanding acronyms... into the wrong phrases. Today I got to see Marketing's handiwork again, as they edited my talk description. (Oh, and psst, Marketing folks, you might want to link to the full list of PDC sessions from your Conference Tracks and Sessions page. Unless, of course, y'know, you don't want people to know about it.)

    For one thing, they stuck my name into the description of the talk, thereby drawing attention to me rather than putting the focus on the actual talk topic. Because I'm not there to be me. I'm there to give a talk. If I were just there to be me, the title would be "Raymond Chen reads the newspaper for an hour while listening to music on his headphones."

    (That's why I don't do interviews. Interviews are about the interviewee, and I don't want to talk about me. People should care about the technology, not the people behind it.)

    They also trimmed my topic list but stopped before the punch line.

    ... asynchronous input queues, the hazards of attaching thread input, and other tricks and traps ...

    The punch line was "... and how it happens without your knowledge." After all, you don't care about the fine details of a feature you don't use. The point is that it's happening behind your back so you'd better know about it because you're using it whether you realize it or not.

    They also took out the reference to finger puppets.

  • The Old New Thing

    Where did the names of the computer Hearts opponents come from?

    • 13 Comments

    A Windows 95 story in commemoration of the tenth anniversary of its release to manufacturing (RTM).

    Danny Glasser explains where the names for the computer opponents in the game Hearts came from.

    I didn't myself know where the names came from, but Danny's explanation of the source of the Windows 95 names brought back memories of the child of one of our co-workers, whose name I will not reveal but you can certainly narrow it down to one of three. He/she was exceedingly well-behaved and definitely helped to make those long hours slightly more tolerable. I remember once we heard the receptionist's voice come over the public address system, which was itself quite a shock because nobody ever uses the public address system. The message was, "Will X please come to the receptionist's desk. Your son/daughter is here."

    Space Cadet JimH picks up the story and explains how he went about writing the computer player logic. (And no, the computer players don't cheat.)

  • The Old New Thing

    Converting from traditional to simplified Chinese, part 3: Highlighting differences

    • 5 Comments

    One of the things that is interesting to me as a student of the Chinese languages is to recognize where the traditional and simplified Chinese scripts differ. Since this is my program, I'm going to hard-code the color for simplified Chinese script: maroon.

    To accomplish the highlighting, we take advantage of listview's custom-draw feature. Custom-draw allows you to make minor changes to the way items are displayed on the screen. It's a middle ground between having listview do all the work (via default drawing behavior) and having the program do all the work (via owner-draw).

    The custom-draw cycle for shell common controls consists of a series of NM_CUSTOMDRAW notifications, starting with the most general and getting more specific. The reason for the break-down is multi-fold. First, it allows the listview control to short-circuit a portion of custom-draw behavior if the parent window does not indicate that it wishes to customize a particular behavior. This reduces message traffic and improves performance when large numbers of items are being drawn. Second, it allows the parent window to target its customizations to the drawing stages it is interested in.

    Listviews are peculiar among the shell common controls in that their items sometimes (but not always) have sub-items. This complicates the drawing process since it requires listview to accommodate both styles: large icon view does not use sub-items, but report view does. To address this, the CDDS_ITEMPREPAINT stage is entered when an item is about to paint, and any changes made by the parent window are considered to be effective for the entire item. If you want to make changes on a per-subitem basis, you have to respond to CDDS_ITEMPREPAINT | CDDS_SUBITEM and set your properties (or reset them if you want to return to the default) for that sub-item.
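    The short-circuiting can be modeled with a small simulation. This is a sketch of the protocol's shape only, not the listview's code: the `Stage` and `Reply` enumerations and the `PaintAll` function are hypothetical stand-ins for the real draw stages and return codes, and the function merely counts how many notifications the parent would receive.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the draw stages and the parent's replies.
enum Stage { PrePaint, ItemPrePaint, SubItemPrePaint };
enum Reply { DoDefault, NotifyItemDraw, NotifySubItemDraw };

// Simulates one paint cycle and returns how many notifications were sent.
// Each level is entered only if the parent asked for it at the level above.
int PaintAll(int itemCount, int subItemCount,
             const std::function<Reply(Stage, int, int)>& parent)
{
    assert(itemCount >= 0 && subItemCount >= 0);
    int sent = 1;
    if (parent(PrePaint, -1, -1) != NotifyItemDraw) {
        return sent;  // parent is not interested in items at all
    }
    for (int item = 0; item < itemCount; ++item) {
        ++sent;
        if (parent(ItemPrePaint, item, -1) != NotifySubItemDraw) {
            continue;  // parent is not interested in this item's sub-items
        }
        for (int sub = 0; sub < subItemCount; ++sub) {
            ++sent;
            parent(SubItemPrePaint, item, sub);
        }
    }
    return sent;
}
```

    In this model, a parent that ignores custom-draw costs a single notification per paint cycle no matter how many items there are, while a parent that opts in all the way down pays for every item and every sub-item.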

    With those preliminary remarks settled, we can dive in.

    class RootWindow : public Window
    {
     ...
    protected:
     ...
     LRESULT OnLVCustomDraw(NMLVCUSTOMDRAW* pcd);
     ...
    private:
     HWND m_hwndLV;
     COLORREF m_clrTextNormal;
     Dictionary m_dict;
    };
    

    We declare our listview custom-draw handler as well as the member variable in which we remember the normal text color so that we can reset it for columns we do not intend to colorize.

    LRESULT RootWindow::OnNotify(NMHDR *pnm)
    {
     switch (pnm->code) {
     case LVN_GETDISPINFO:
      OnGetDispInfo(CONTAINING_RECORD(pnm, NMLVDISPINFO, hdr));
      break;
     case NM_CUSTOMDRAW:
      if (pnm->hwndFrom == m_hwndLV) {
       return OnLVCustomDraw(CONTAINING_RECORD(
                             CONTAINING_RECORD(pnm, NMCUSTOMDRAW, hdr),
                                                    NMLVCUSTOMDRAW, nmcd));
      }
      break;
     }
     return 0;
    }
    

    If we receive an NM_CUSTOMDRAW notification from the listview control, we call our new handler. The multiple calls to the CONTAINING_RECORD macro are necessary because the NMHDR structure is nestled two levels deep inside the NMLVCUSTOMDRAW structure.
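    For readers who have not met CONTAINING_RECORD before, here is a portable sketch of the pointer arithmetic behind the two-step unwrapping. Everything in it is a stand-in: `CONTAINING_RECORD_SKETCH` is a home-made macro built on `offsetof`, and the three structures are hypothetical ones that merely mimic the nesting. They deliberately carry an extra leading field so that the offsets are non-zero and the subtraction is visible.

```cpp
#include <cassert>
#include <cstddef>

// Portable stand-in for CONTAINING_RECORD: given a pointer to a member,
// step back by the member's offset to reach the enclosing structure.
#define CONTAINING_RECORD_SKETCH(address, type, field) \
    (reinterpret_cast<type*>( \
        reinterpret_cast<char*>(address) - offsetof(type, field)))

// Hypothetical structures that mimic the two-level nesting.
struct Header       { int code; };
struct DrawInfo     { int stage; Header hdr; };         // plays the role of NMCUSTOMDRAW
struct ListDrawInfo { int textColor; DrawInfo nmcd; };  // plays the role of NMLVCUSTOMDRAW

// Walks outward one level at a time, as the nested macro calls do.
ListDrawInfo* FromHeader(Header* pnm)
{
    assert(pnm != nullptr);
    DrawInfo* inner = CONTAINING_RECORD_SKETCH(pnm, DrawInfo, hdr);
    return CONTAINING_RECORD_SKETCH(inner, ListDrawInfo, nmcd);
}
```

    Each application of the macro peels off exactly one layer, which is why a header buried two structures deep needs two applications.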

    LRESULT RootWindow::OnLVCustomDraw(NMLVCUSTOMDRAW* pcd)
    {
     switch (pcd->nmcd.dwDrawStage) {
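     // Start of the paint cycle: ask for per-item notifications.
     case CDDS_PREPAINT: return CDRF_NOTIFYITEMDRAW;
     case CDDS_ITEMPREPAINT:
      // Remember the default text color, then ask for sub-item notifications.
      m_clrTextNormal = pcd->clrText;
      return CDRF_NOTIFYSUBITEMDRAW;
     case CDDS_ITEMPREPAINT | CDDS_SUBITEM:
      // Assume the default color; override it only for the simplified
      // column when the simplified text differs from the traditional.
      pcd->clrText = m_clrTextNormal;
      if (pcd->iSubItem == COL_SIMP &&
        pcd->nmcd.dwItemSpec < (DWORD)Length()) {
        const DictionaryEntry& de = Item(pcd->nmcd.dwItemSpec);
        if (de.m_pszSimp) {
          pcd->clrText = RGB(0x80, 0x00, 0x00); // maroon
        }
      }
      break;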
     }
     return CDRF_DODEFAULT;
    }
    

    During the CDDS_PREPAINT stage, we indicate our desire to receive CDDS_ITEMPREPAINT notifications. During the CDDS_ITEMPREPAINT stage, we save the normal text color and indicate that we want to receive sub-item notifications. It is in the sub-item notification CDDS_ITEMPREPAINT | CDDS_SUBITEM that the real work happens.

    First, we reset the color to the default on the assumption that we will not need to colorize this column. But if the column is the simplified Chinese column, if the item number is valid, and if the simplified Chinese is different from the traditional Chinese, then we set the text color to maroon.

    That's enough with the Chinese/English dictionary for now. All this time, and we don't even have search capability yet! We'll work on that next month.

  • The Old New Thing

    Converting from traditional to simplified Chinese, part 2: Using the dictionary

    • 8 Comments

    Now that we have our traditional-to-simplified pseudo-dictionary, we can use it to generate simplified Chinese words in our Chinese/English dictionary.

    class StringPool
    {
    public:
     StringPool();
     ~StringPool();
     LPWSTR AllocString(const WCHAR* pszBegin, const WCHAR* pszEnd);
     LPWSTR DupString(const WCHAR* pszBegin)
     {
      return AllocString(pszBegin, pszBegin + lstrlen(pszBegin));
     }
     ...
    };
    

    The DupString method is a convenience we will use below.

    Dictionary::Dictionary()
    {
     ...
        if (de.Parse(buf, buf + cchResult, m_pool)) {
         bool fSimp = false;
         for (int i = 0; de.m_pszTrad[i]; i++) {
          if (pmap->Map(de.m_pszTrad[i])) {
           fSimp = true;
           break;
          }
         }
         if (fSimp) {
          de.m_pszSimp = m_pool.DupString(de.m_pszTrad);
          for (int i = 0; de.m_pszTrad[i]; i++) {
           if (pmap->Map(de.m_pszTrad[i])) {
            de.m_pszSimp[i] = pmap->Map(de.m_pszTrad[i]);
           }
          }
         } else {
          de.m_pszSimp = NULL;
         }
         v.push_back(de);
        }
     ...
    }
    

    After we parse each entry from the dictionary, we scan the traditional Chinese characters to see if any of them have been simplified. If so, then we copy the traditional Chinese string and use the Trad2Simp object to convert it to simplified Chinese.
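    The same conversion can be sketched in isolation with standard containers. This is not the program's code: `MapChar` and `ToSimplified` are hypothetical names, a `std::map` stands in for the Trad2Simp object, and the sketch copies the string up front in a single pass, whereas the code above scans first so that it only allocates a copy from the pool when something actually changes.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the traditional-to-simplified lookup.
// Returns 0 when the character has no simplified form.
wchar_t MapChar(const std::map<wchar_t, wchar_t>& table, wchar_t ch)
{
    auto it = table.find(ch);
    return it != table.end() ? it->second : L'\0';
}

// Converts a traditional string character by character. On return,
// `changed` is true only if at least one character was replaced; the
// false case corresponds to storing NULL in m_pszSimp.
std::wstring ToSimplified(const std::map<wchar_t, wchar_t>& table,
                          const std::wstring& trad, bool& changed)
{
    std::wstring simp = trad;
    changed = false;
    for (wchar_t& ch : simp) {
        if (wchar_t mapped = MapChar(table, ch)) {
            ch = mapped;
            changed = true;
        }
    }
    return simp;
}
```

    Characters without an entry in the table pass through untouched, so a word made entirely of such characters comes back identical and flagged as unchanged.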

    If the string is the same in both simplified and traditional Chinese, then we set m_pszSimp to NULL. This may seem a bit odd, but it'll come in handy later. Yes, it makes the m_pszSimp member difficult to use. I could have created an accessor function for it (so that it falls back to traditional Chinese if the simplified Chinese is NULL), but I'm feeling lazy right now, and this is just a one-shot program.
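    For the less lazy, the accessor mentioned above might look something like this. It is only a sketch: `DictionaryEntrySketch` is a cut-down, hypothetical version of the entry holding just the two relevant members, and `SimpOrTrad` is a made-up name.

```cpp
#include <cassert>

// Hypothetical trimmed-down entry with only the two fields that matter here.
struct DictionaryEntrySketch {
    const wchar_t* m_pszTrad;
    const wchar_t* m_pszSimp;  // null when identical to m_pszTrad

    // Never returns null: falls back to the traditional string when
    // no separate simplified string was stored.
    const wchar_t* SimpOrTrad() const
    {
        assert(m_pszTrad != nullptr);
        return m_pszSimp ? m_pszSimp : m_pszTrad;
    }
};
```

    Keeping the raw member alongside the accessor preserves the ability to ask "does the simplified form differ?" by testing m_pszSimp against NULL.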

    void RootWindow::OnGetDispInfo(NMLVDISPINFO* pnmv)
    {
     ...
      switch (pnmv->item.iSubItem) {
       case COL_TRAD:    pszResult = de.m_pszTrad;    break;
       case COL_SIMP:    pszResult =
          de.m_pszSimp ? de.m_pszSimp : de.m_pszTrad; break;
       case COL_PINYIN:  pszResult = de.m_pszPinyin;  break;
       case COL_ENGLISH: pszResult = de.m_pszEnglish; break;
      }
     ...
    }
    

    Finally, we tell our OnGetDispInfo handler what to return when the listview asks for the text that goes into the simplified Chinese column. With these changes, we can display both the traditional and simplified Chinese for each entry in our dictionary.

    Next time, a minor tweak to our display code, which happens to illustrate custom-draw as a nice side-effect.
