December, 2004

  • The Old New Thing

    How to open those plastic packages of electronics without injuring yourself

    Small electronics nowadays come in those impossible-to-open plastic packages. A few weeks ago I tried to open one and managed not to slice my hand with the knife I was using. (Those who know me know that knives and I don't get along well.) Unfortunately, I failed to pay close attention to the sharp edges of the cut plastic and ended up cutting three of my fingers.

    The next day, I called the manufacturer's product support number and politely asked, "How do I open the package?"

    The support person recommended using a pair of very heavy scissors. (I tried scissors, but mine weren't heavy enough and couldn't cut through the thick plastic.) Cut across the top, then down the sides, being careful to avoid the sharp shards you're creating. (You might want to wear gloves.)

    If you bought someone a small electronics thingie, consider keeping a pair of heavy scissors on hand. That's my tip for the season.

  • The Old New Thing

    Sometimes people don't like it when you enforce a standard

    Your average computer user wouldn't recognize a standards document if they were hit in the face with it.

    I'm reminded of a beta bug report back in 1996 regarding how Outlook Express (then called "Microsoft Internet Mail and News") handled percent signs in email addresses (I think). The way Outlook Express did it was standards-conformant, and I sent the relevant portion of the RFC to the person who reported the bug. Here's what I got back:

    I have never read the RFC's (most people, I'm sure, haven't) but I know when something WORKS in one mail reader (Netscape) and DOESN'T WORK in another (MSIMN).

    The problem, restated to comply with your RFC:

    MS Internet Mail and News DO NOT HANDLE PERCENT SIGNS like the RFC says.

    That first sentence pretty much captures the reaction most of the world has to standards documents: They are meaningless. If Outlook Express doesn't behave the same way as Netscape, then it's a bug in Outlook Express, regardless of what the standards documents say.

    There are many "strangenesses" in the way Internet Explorer handles certain aspects of HTML when you don't run it in strict mode. For example, did you notice that the font you set via CSS for your BODY tag doesn't apply to tables? Or that invoking the submit method on a form does not fire the onsubmit event? That's because Netscape didn't do it either, and Internet Explorer had to be bug-for-bug compatible with Netscape because web sites relied on this behavior.

    The last paragraph in the response is particularly amusing. The person is using the word "RFC" as a magic word, not knowing what it means. Apparently if you want to say that something doesn't work as you expect, you say that it doesn't conform to the RFC. Whether your expectation agrees with the RFC is irrelevant. (By his own admission, the person who filed the bug didn't even read the RFC.)

  • The Old New Thing

    You can create an infinitely recursive directory tree

    It is possible to create an infinitely recursive directory tree. This throws many recursive directory-traversal functions into disarray. Here's how you do it. (Note: Requires NTFS.)

    Create a directory in the root of your C: drive, call it C:\C, for lack of a more creative name. Right-click My Computer and select Manage, then click on the Disk Management snap-in.

    From the Disk Management snap-in, right-click the C drive and select "Change Drive Letter and Paths...".

    From the "Change Drive Letter and Paths for C:" dialog, click "Add", then where it says "Mount in the following empty NTFS folder", enter "C:\C". Click OK.

    Congratulations, you just created an infinitely recursive directory.

    C:\> dir
    
     Volume in drive C has no label
     Volume Serial Number is A035-E01D
    
     Directory of C:\
    
    08/19/2001  08:43 PM                 0 AUTOEXEC.BAT
    12/23/2004  09:43 PM    <JUNCTION>     C
    05/05/2001  04:09 PM                 0 CONFIG.SYS
    12/16/2001  04:34 PM    <DIR>          Documents and Settings
    08/10/2004  12:00 AM    <DIR>          Program Files
    08/28/2004  01:08 PM    <DIR>          WINDOWS
                   2 File(s)              0 bytes
                   4 Dir(s)   2,602,899,968 bytes free
    
    C:\> dir C:\C
    
     Volume in drive C has no label
     Volume Serial Number is A035-E01D
    
     Directory of C:\C
    
    08/19/2001  08:43 PM                 0 AUTOEXEC.BAT
    12/23/2004  09:43 PM    <JUNCTION>     C
    05/05/2001  04:09 PM                 0 CONFIG.SYS
    12/16/2001  04:34 PM    <DIR>          Documents and Settings
    08/10/2004  12:00 AM    <DIR>          Program Files
    08/28/2004  01:08 PM    <DIR>          WINDOWS
                   2 File(s)              0 bytes
                   4 Dir(s)   2,602,899,968 bytes free
    
    
    C:\> dir C:\C\C\C\C\C\C
    
     Volume in drive C has no label
     Volume Serial Number is A035-E01D
    
     Directory of C:\C\C\C\C\C\C
    
    08/19/2001  08:43 PM                 0 AUTOEXEC.BAT
    12/23/2004  09:43 PM    <JUNCTION>     C
    05/05/2001  04:09 PM                 0 CONFIG.SYS
    12/16/2001  04:34 PM    <DIR>          Documents and Settings
    08/10/2004  12:00 AM    <DIR>          Program Files
    08/28/2004  01:08 PM    <DIR>          WINDOWS
                   2 File(s)              0 bytes
                   4 Dir(s)   2,602,899,968 bytes free
    

    Go ahead and add as many "\C"s as you like. You'll just get your own C drive back again.

    Okay, now that you've had your fun, go back to the "Change Drive Letter and Paths for C:" dialog and Remove the "C:\C" entry. Do this before you create some real havoc.

    Now imagine what happens if you had tried a recursive treecopy from that mysterious C:\C directory. Or if you ran a program that did some sort of recursive operation starting from C:\C, like, say, trying to add up the sizes of all the files in it.

    If you're writing such a program, you need to be aware of reparse points (that thing that shows up as <JUNCTION> in the directory listing). You can identify them because their file attributes include the FILE_ATTRIBUTE_REPARSE_POINT flag. Of course, what you do when you find one of these is up to you. I'm just warning you that these strange things exist and if you aren't careful, your program can go into an infinite loop.
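    For illustration, here is a minimal C++ sketch of the per-entry decision a recursive walker might make. It hard-codes the documented values of FILE_ATTRIBUTE_DIRECTORY (0x10) and FILE_ATTRIBUTE_REPARSE_POINT (0x400) so that it compiles without windows.h. The helper name ShouldRecurseInto is hypothetical, and treating every reparse point as a leaf is only one possible policy; a program that wants to follow some of them would need its own loop detection.

```cpp
#include <cassert>
#include <cstdint>

// Documented Win32 attribute bits, repeated here so the sketch does not
// depend on <windows.h>.
constexpr std::uint32_t kAttrDirectory    = 0x00000010; // FILE_ATTRIBUTE_DIRECTORY
constexpr std::uint32_t kAttrReparsePoint = 0x00000400; // FILE_ATTRIBUTE_REPARSE_POINT

// Hypothetical helper: given the attributes reported for an entry
// (for example WIN32_FIND_DATA::dwFileAttributes), decide whether a
// recursive walker should descend into it. Ordinary directories: yes.
// Reparse points (junctions, mount points, symbolic links): no. They are
// treated as leaves, so a loop such as the C:\C mount point cannot trap the walk.
constexpr bool ShouldRecurseInto(std::uint32_t attributes)
{
    return (attributes & kAttrDirectory) != 0 &&
           (attributes & kAttrReparsePoint) == 0;
}

// Compile-time checks of the policy.
static_assert(ShouldRecurseInto(kAttrDirectory), "plain directory");
static_assert(!ShouldRecurseInto(kAttrDirectory | kAttrReparsePoint),
              "junction or mount point");
```

    Given the attributes of a mount point like C:\C (a directory that is also a reparse point), the sketch declines to descend, so the walk stops there instead of going around forever.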

  • The Old New Thing

    Tintin goes to the neurologist

    The Canadian Medical Association Journal traditionally runs an offbeat research paper in its Christmas edition, for which there is apparently huge competition. This year, Tintin goes to the neurologist. The feedback is fun to read too.

    My first exposure to Tintin was—of course—in Sweden. (Why "of course"? Because it seems that everything I do ties back to Sweden somehow...)

    While browsing through a music store's clearance bin, I found an audio dramatization of Den svarta ön. I recognized "Tintin" as the name of a popular children's character, though I myself had never read any of the stories.

    I started listening to the CD and found the story amazingly dull. However, I chalked this up to my bad Swedish listening comprehension, figuring that if only I understood more of it, the story would be more enjoyable.

    Some months later, I tested this theory: I went to the library, found a copy of The Black Island in English translation, and read it.

    It was an amazingly dull story.

    During my most recent trip to Taiwan, the person I was telling this story to couldn't figure out what children's character I was talking about. We happened to be in a bookstore and I stumbled across a copy of the same story in Chinese translation. (The Chinese translation of Tintin's name is 丁丁 dīng-dīng, in case anybody else finds themselves in the same jam.) Of course, having found the book, I had to buy it; it's sort of become a collection now. Someday I'll try to read it, but not quite yet. My Chinese is barely at phrase-book level right now.

    It has been pointed out that even though Tintin is ostensibly a journalist, over his 45-year career he filed but one story. You'd think his editor would be kind of upset by now.

    I also have copies of Harry Potter and the Philosopher's Stone in all (but one) of the various languages I know or am trying to learn. And I'm counting the American and British English versions as different. Because they are.

  • The Old New Thing

    How to get more hits on Google than even Steve Ballmer

    A German blogger named Tony wrote that Robert Scoble is gaining on Steve Ballmer. In other words, a Google search for "Robert Scoble" finds about 172,000 pages, while a Google search for "Steve Ballmer" turns up about 302,000. He asked whether anyone could find another Microsoft employee who gets more Google hits than Robert Scoble.

    Sure, that's easy, and I'll let you in on the secret.

    Search for "David Smith". Oh my goodness! 869,000 results! Even more than Steve Ballmer!

    All you have to do is find a person with a very common name.

  • The Old New Thing

    Why did Windows 95 run the timer at 55ms?

    The story behind the 55ms timer tick rate goes all the way back to the original IBM PC BIOS. The original IBM PC's timer ran from a 1.19MHz clock, and 65536 cycles at 1.19MHz equals approximately 55ms. (More accurately, it was more like 1.19318MHz and 54.92ms.)

    But that just pushes the question to another level. Why 1.19...MHz, then?

    With that clock rate, 2^16 ticks (that is, 2^32 cycles of the 1.19MHz clock) add up to approximately 3600 seconds, which is one hour. (If you do the math it's more like 3599.59 seconds.) [Update: 4pm, change 2^32 to 2^16; what was I thinking?]

    What's so special about one hour?

    The BIOS checked once an hour to see whether the clock had crossed midnight. When it had, it needed to increment the date. Making the hourly check happen precisely when a 16-bit tick count overflowed saved a few valuable bytes in the BIOS.

    Another reason for the 1.19MHz clock speed was that it was exactly one quarter of the original CPU speed, namely 4.77MHz, which was in turn 4/3 times the NTSC color burst frequency of about 3.58MHz (3.579545MHz, to be precise). Recall that back in those days, personal computers sent their video output to a television set. Monitors were for the rich kids. Using a timer related to the video output signal saved a few dollars on the motherboard.
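    The chain of ratios above is easy to check. This C++ sketch starts from the NTSC color burst frequency (315/88 MHz, about 3.579545MHz), multiplies by 4/3 for the CPU clock, divides by 4 for the timer clock, and then works out the length of a 65536-cycle tick and how long a 16-bit count of those ticks takes to wrap. The constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>

// NTSC color burst frequency: 315/88 MHz, about 3.579545 MHz.
constexpr double kColorBurstHz = 315e6 / 88.0;

// CPU clock: 4/3 of the color burst, about 4.77 MHz.
constexpr double kCpuHz = kColorBurstHz * 4.0 / 3.0;

// Timer clock: one quarter of the CPU clock, about 1.19318 MHz.
constexpr double kTimerHz = kCpuHz / 4.0;

// One timer tick is 65536 (2^16) cycles of the timer clock.
constexpr double kTickSeconds = 65536.0 / kTimerHz;

// A 16-bit tick count wraps after 2^16 ticks, i.e. 2^32 timer cycles.
constexpr double kWrapSeconds = 65536.0 * kTickSeconds;

// The wrap lands just short of one hour.
static_assert(kWrapSeconds > 3599.0 && kWrapSeconds < 3600.0,
              "16-bit tick count wraps roughly hourly");

void PrintTimerMath()
{
    std::printf("CPU clock:   %.6f MHz\n", kCpuHz / 1e6);
    std::printf("Timer clock: %.6f MHz\n", kTimerHz / 1e6);
    std::printf("Tick:        %.3f ms\n", kTickSeconds * 1e3);
    std::printf("Wrap:        %.2f s\n", kWrapSeconds);
}
```

    The numbers land where the article says: a tick of about 54.925ms, and about 3599.59 seconds for the count to wrap, a fraction of a second short of an hour.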

    Calvin Hsia has another view of the story behind the 4.77MHz clock.

    (Penny-pinching was very common at this time. The Apple ][ had its own share of penny-saving hijinks.)

  • The Old New Thing

    Researchers find connection between lack of sleep and weight gain

    Shortage of sleep is linked to obesity, according to research published yesterday.

    Lack of sleep boosts levels of a hormone that triggers appetite and lowers levels of a hormone that tells your body it is full, according to the team. The scientists will now study whether obese people should sleep more to lose weight.

  • The Old New Thing

    Computing the size of a directory is more than just adding file sizes

    One might think that computing the size of a directory would be a simple matter of adding up the sizes of all the files in it.

    Oh, if only it were that simple.

    There are many things that make computing the size of a directory difficult, some of which even call into doubt the very existence of the concept "size of a directory".

    Reparse points
    We mentioned this last time. Do you want to recurse into reparse points when you are computing the size of a directory? It depends on why you're computing the directory size.

    If you're computing the size in order to show the user how much disk space they will gain by deleting the directory, then you do or don't, depending on how you're going to delete the reparse point.

    If you're computing the size in preparation for copying, then you probably do. Or maybe you don't: should the copy merely copy the reparse point instead of tunneling through it? What do you do if the user doesn't have permission to create reparse points? Or if the destination doesn't support reparse points? Or if the user is creating a copy because they are making a back-up?

    Hard links
    Hard links are multiple directory entries for the same file. If you're calculating the size of a directory and you find a hard link, do you count the file at its full size? Or do you say that each directory entry for a hard link carries a fraction of the "weight" of the file? (So if a file has two hard links, then each entry counts for half the file size.)

    Dividing the "weight" of the file among its hard links avoids double-counting (or higher), so that when all the hard links are found, the file's total size is correctly accounted for. And it represents the concept that all the hard links to a file "share the cost" of the resources the file consumes. But what if you don't find all the hard links? Is it correct that the file was undercounted? [Minor typo fixed, 12pm]
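    To make the trade-off concrete, here is a minimal C++ sketch that totals the same list of directory entries three ways: full size for every entry, an equal 1/N share per entry (N being the file's hard-link count), and once per distinct file. The FoundEntry record and the function names are hypothetical; on Windows, a walker could get the link count and a file identifier from GetFileInformationByHandle (the nNumberOfLinks and nFileIndexHigh/nFileIndexLow fields).

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical record for one directory entry found during a walk.
struct FoundEntry {
    std::uint64_t fileId;    // identifies the underlying file
    std::uint64_t size;      // logical size of the file
    std::uint32_t linkCount; // total hard links to the file, found or not
};

// Every entry at full size: a file whose links are found several
// times is counted several times.
std::uint64_t TotalFullSize(const std::vector<FoundEntry>& entries)
{
    std::uint64_t total = 0;
    for (const FoundEntry& e : entries) total += e.size;
    return total;
}

// Each entry carries an equal share of the file. Correct when every
// link is found; undercounts when some links lie outside the walk.
double TotalSharedSize(const std::vector<FoundEntry>& entries)
{
    double total = 0;
    for (const FoundEntry& e : entries) {
        assert(e.linkCount > 0);
        total += static_cast<double>(e.size) / e.linkCount;
    }
    return total;
}

// Each distinct file once, no matter how many of its links were found.
std::uint64_t TotalDistinctFiles(const std::vector<FoundEntry>& entries)
{
    std::set<std::uint64_t> seen;
    std::uint64_t total = 0;
    for (const FoundEntry& e : entries) {
        if (seen.insert(e.fileId).second) total += e.size;
    }
    return total;
}
```

    With a 100-byte file that has two links, both inside the directory, the full-size total is 200 while the other two give 100. If only one of the two links lies inside the directory, the shared total drops to 50: the undercount the question above is about.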

    If you're copying a file and you discover that it has multiple hard links, what do you do? Do you break the links in the copy? Do you attempt to reconstruct them? What if the destination doesn't support hard links?

    Compressed files
    By this I'm talking about filesystem compression rather than external compression algorithms like ZIP.

    When adding up the size of the files in a directory, do you add up the logical size or the physical size? If you're computing the size in preparation for copying, then you probably want the logical size, but if you're computing to see how much disk space would be freed up by deleting it, then you probably want physical size.

    But if you're computing for copying and the copy destination supports compression, do you want to use the physical size after all? Now you're assuming that the source and destination compression algorithms are comparable.

    Sparse files
    Sparse files have the same problems as compressed files. Do you want to add up the logical or physical size?
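    One way to keep that decision visible is to make the caller state why it wants the number. In this hypothetical C++ sketch, each file carries both a logical size and an allocated size (the two differ for compressed and sparse files), and the purpose picks one, following the rules of thumb above. On Windows, the allocated figure for a compressed or sparse file is what GetCompressedFileSize reports; the types and names here are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-file record.
struct FileSizes {
    std::uint64_t logical;   // the size programs see when they read the file
    std::uint64_t allocated; // bytes actually stored (smaller if compressed or sparse)
};

// Hypothetical reasons for computing a directory's size.
enum class SizePurpose {
    EstimateCopy,   // bytes a copy has to read and write
    EstimateDelete, // disk space that deleting would free
};

std::uint64_t DirectorySize(const std::vector<FileSizes>& files,
                            SizePurpose purpose)
{
    std::uint64_t total = 0;
    for (const FileSizes& f : files) {
        // A copy moves the logical contents (ignoring any compression at
        // the destination); deleting frees the allocated bytes.
        total += (purpose == SizePurpose::EstimateCopy) ? f.logical
                                                        : f.allocated;
    }
    return total;
}
```

    As the article warns, the copy estimate still ignores what compression at the destination would do; handling that would need yet another input.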

    Cluster rounding
    Even for uncompressed non-sparse files, you may want to take into account the size of the disk blocks. A directory with a lot of small files takes up more space on disk than just the sum of the file sizes. Do you want to reflect this in your computations? If you traversed across a reparse point, the cluster size may have changed as well.
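    The rounding itself is simple arithmetic. This C++ sketch assumes a single fixed cluster size and ignores compression, sparse files, and the fact that some filesystems (NTFS among them) can keep very small files inside their metadata rather than in a cluster of their own; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round a file's size up to a whole number of clusters.
std::uint64_t RoundUpToCluster(std::uint64_t size, std::uint64_t clusterSize)
{
    assert(clusterSize > 0);
    return (size + clusterSize - 1) / clusterSize * clusterSize;
}

// The plain sum of the sizes next to the space their clusters occupy.
struct SizeTotals {
    std::uint64_t sumOfSizes;
    std::uint64_t onDisk;
};

SizeTotals TotalWithClusters(const std::vector<std::uint64_t>& sizes,
                             std::uint64_t clusterSize)
{
    SizeTotals totals{0, 0};
    for (std::uint64_t size : sizes) {
        totals.sumOfSizes += size;
        totals.onDisk += RoundUpToCluster(size, clusterSize);
    }
    return totals;
}
```

    For example, ten 100-byte files on 4096-byte clusters add up to 1,000 bytes but occupy 40,960 bytes of clusters.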

    Alternate data streams
    Alternate data streams are another place where a file can occupy disk space that is not reflected in its putative "size".

    Bookkeeping overhead
    There is always bookkeeping overhead associated with file storage. In addition to the directory entry (or entries), space also needs to be allocated for the security information, as well as the information that keeps track of where the file's contents can be found. For a highly-fragmented file, this information can be rather extensive. Do you want to count that towards the size of the directory? If so, how?

    There is no single answer to all of the above questions. You have to consider each one, apply it to your situation, and decide which way you want to go.

    (And copying a directory tree is even scarier. What do you do with the ACLs? Do you copy them too? Do you preserve the creation date? It all depends on why you're copying the tree.)

  • The Old New Thing

    BOOL vs. VARIANT_BOOL vs. BOOLEAN vs. bool

    Still more ways of saying the same thing. Why so many?

    Because each was invented by different people at different times to solve different problems.

    BOOL is the oldest one. Its definition is simply

    typedef int BOOL;
    

    The C programming language uses "int" as its boolean type, and Windows 1.0 was written back when C was the cool language for systems programming.

    Next came BOOLEAN.

    typedef BYTE  BOOLEAN;
    

    This type was introduced by the OS/2 NT team when they decided to write a new operating system from scratch. It lingers in Win32 in the places where the original NT design peeks through, like the security subsystem and interacting with drivers.

    Off to the side came VARIANT_BOOL.

    typedef short VARIANT_BOOL;
    #define VARIANT_TRUE ((VARIANT_BOOL)-1)
    #define VARIANT_FALSE ((VARIANT_BOOL)0)
    

    This was developed by the Visual Basic folks. Basic uses -1 to represent "true" and 0 to represent "false", and VARIANT_BOOL was designed to preserve this behavior.

    Common bug: When manipulating VARIANTs of type VT_BOOL, if you want to set a boolean value to "true", you must use VARIANT_TRUE. Many people mistakenly use TRUE or true, which are not the same thing as VARIANT_TRUE: TRUE is 1 and true converts to 1, whereas VARIANT_TRUE is -1. You can cause problems with scripting languages if you get them confused. (For symmetry, you should also use VARIANT_FALSE instead of FALSE or false. All three have the same numerical value, however. Consequently, a mistake when manipulating "false" values is not fatal.)
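    One simple habit that avoids the mistake is to route every conversion through a small helper. This C++ sketch repeats the VARIANT_BOOL definitions shown above, guarded so it also compiles where the Windows headers are absent; the helper names ToVariantBool and FromVariantBool are hypothetical. In real code, the result of ToVariantBool would go into the boolVal member of a VARIANT whose vt is VT_BOOL.

```cpp
#include <cassert>

// The definitions from the Windows headers, repeated for a standalone build.
#ifndef VARIANT_TRUE
typedef short VARIANT_BOOL;
#define VARIANT_TRUE ((VARIANT_BOOL)-1)
#define VARIANT_FALSE ((VARIANT_BOOL)0)
#endif

// Hypothetical helper: the one place a C++ bool (or a BOOL, which
// converts to bool) becomes a VT_BOOL value.
//   v.boolVal = TRUE;                 // wrong: stores 1
//   v.boolVal = ToVariantBool(TRUE);  // right: stores VARIANT_TRUE (-1)
inline VARIANT_BOOL ToVariantBool(bool value)
{
    return value ? VARIANT_TRUE : VARIANT_FALSE;
}

// Hypothetical helper for the other direction. Comparing against
// VARIANT_FALSE rather than VARIANT_TRUE means a value that somebody
// wrongly set to 1 still reads as true.
inline bool FromVariantBool(VARIANT_BOOL value)
{
    return value != VARIANT_FALSE;
}

// TRUE is 1 and true converts to 1, but VARIANT_TRUE is -1.
static_assert(VARIANT_TRUE != 1, "VARIANT_TRUE is not TRUE");
static_assert(VARIANT_FALSE == 0, "the false values all agree");
```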

    Newest on the scene is bool, which is a C++ data type that has the value true or false. You won't see this used much (if at all) in Win32 because Win32 tries to remain C-compatible.

    (Note that C-compatible isn't the same as C-friendly. Although you can do COM from C, it isn't fun.)

  • The Old New Thing

    Using fibers to simplify enumerators, part 1: When life is easier for the enumerator

    The COM model for enumeration (enumeration objects) is biased towards making life easy for the consumer and hard for the producer. The enumeration object (producer) needs to be structured as a state machine, which can be quite onerous for complicated enumerators, for example, tree walking or composite enumeration.

    On the other hand, the callback model for enumeration (used by most Win32 functions) is biased towards making life easy for the enumerator and hard for the consumer. This time, it is the consumer that needs to be structured as a state machine, which is more work if the consumer is doing something complicated with each callback. (And even if not, you have to create a context structure to pass state from the caller, through the enumerator, to the callback.)
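    To see where the producer's state machine comes from, compare the two shapes on a toy problem. In this C++ sketch, an in-memory Node tree stands in for the file system and all names are hypothetical. The callback-style walker is a plain recursive function, while the pull-style enumerator, loosely modeled on the COM Next() pattern, has to keep an explicit stack of nodes still to visit, because it must return to its caller after every item. The consumer here is trivial, so a lambda capture does the job of the context pointer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a directory tree.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Callback model: easy for the producer. The recursion itself remembers
// where the walk is; the consumer is called once per node.
template <typename Callback>
void WalkTree(const Node& node, Callback& callback)
{
    callback(node);
    for (const Node& child : node.children) {
        WalkTree(child, callback);
    }
}

// Enumerator model: easy for the consumer, who just calls Next().
// The producer returns after every item, so it keeps its position in an
// explicit stack of nodes that still have to be visited.
class TreeEnumerator {
public:
    explicit TreeEnumerator(const Node& root) : m_pending{&root} {}

    // Produces the next node in depth-first (pre-order) order.
    // Returns false when there are no more nodes.
    bool Next(const Node** ppNode)
    {
        assert(ppNode != nullptr);
        if (m_pending.empty()) return false;
        const Node* node = m_pending.back();
        m_pending.pop_back();
        // Push children in reverse so the first child comes out next.
        for (auto it = node->children.rbegin();
             it != node->children.rend(); ++it) {
            m_pending.push_back(&*it);
        }
        *ppNode = node;
        return true;
    }

private:
    std::vector<const Node*> m_pending;
};

std::vector<std::string> NamesViaCallback(const Node& root)
{
    std::vector<std::string> names; // consumer state lives in the capture
    auto collect = [&names](const Node& node) { names.push_back(node.name); };
    WalkTree(root, collect);
    return names;
}

std::vector<std::string> NamesViaEnumerator(const Node& root)
{
    std::vector<std::string> names;
    TreeEnumerator enumerator(root);
    const Node* node = nullptr;
    while (enumerator.Next(&node)) {
        names.push_back(node->name);
    }
    return names;
}
```

    Both functions produce the names in the same depth-first order; the difference is who has to remember where the walk is. In the enumerator that is m_pending, and a real directory enumerator would also need per-level state such as its open find handles.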

    For example, suppose we want to write a routine that walks a directory structure, allowing the caller to specify what to do at each decision point. Let's design this first using the callback paradigm:

    #include <windows.h>
    #include <shlwapi.h>
    #include <stdio.h>
    
    enum FERESULT {
     FER_CONTINUE,      // continue enumerating
                        // (if directory: recurse into it)
     FER_SKIP,          // skip this file/directory
     FER_STOP,          // stop enumerating
    };
    
    enum FEOPERATION {
     FEO_FILE,          // found a file
     FEO_DIR,           // found a directory
     FEO_LEAVEDIR,      // leaving a directory
    };
    
    typedef FERESULT (CALLBACK *FILEENUMCALLBACK)
        (FEOPERATION feo,
         LPCTSTR pszDir, LPCTSTR pszPath,
         const WIN32_FIND_DATA* pwfd,
         void *pvContext);
    
    FERESULT EnumDirectoryTree(LPCTSTR pszDir,
        FILEENUMCALLBACK pfnCB, void* pvContext);
    

    The design here is that the caller calls EnumDirectoryTree and provides a callback function that is informed of each file found and can decide how the enumeration should proceed.

    Designing this as a callback makes life much simpler for the implementation of EnumDirectoryTree.

    FERESULT EnumDirectoryTree(
        LPCTSTR pszDir,
        FILEENUMCALLBACK pfnCB, void *pvContext)
    {
     FERESULT fer = FER_CONTINUE;
     TCHAR szPath[MAX_PATH];
     if (PathCombine(szPath, pszDir, TEXT("*.*"))) {
      WIN32_FIND_DATA wfd;
      HANDLE hfind = FindFirstFile(szPath, &wfd);
      if (hfind != INVALID_HANDLE_VALUE) {
       do {
        if (lstrcmp(wfd.cFileName, TEXT(".")) != 0 &&
            lstrcmp(wfd.cFileName, TEXT("..")) != 0 &&
            PathCombine(szPath, pszDir, wfd.cFileName)) {
         FEOPERATION feo = (wfd.dwFileAttributes &
                         FILE_ATTRIBUTE_DIRECTORY) ?
                         FEO_DIR : FEO_FILE;
         fer = pfnCB(feo, pszDir, szPath, &wfd, pvContext);
         if (fer == FER_CONTINUE) {
          if (feo == FEO_DIR) {
           fer = EnumDirectoryTree(szPath, pfnCB, pvContext);
           if (fer == FER_CONTINUE) {
            fer = pfnCB(FEO_LEAVEDIR, pszDir, szPath,
                        &wfd, pvContext);
           }
          }
         } else if (fer == FER_SKIP) {
          fer = FER_CONTINUE;
         }
        }
        // Stop as soon as the callback or a nested call returns FER_STOP.
       } while (fer == FER_CONTINUE && FindNextFile(hfind, &wfd));
       FindClose(hfind);
      }
     }
     return fer;
    }
    

    Note: I made no attempt to make this function at all efficient since that's not my point here. It's highly wasteful of stack space (which can cause problems when walking deep directory trees). This function also doesn't like paths longer than MAX_PATH; fixing this is beyond the scope of this series. Nor do I worry about reparse points, which can induce infinite loops if you're not careful.

    Well, that wasn't so hard to write. But that's because we made life hard for the consumer. The consumer needs to maintain state across each callback. For example, suppose you wanted to build a list of directories and their sizes (both including and excluding subdirectories).

    class EnumState {
    public:
     EnumState()
       : m_pdirCur(new Directory(NULL)) { }
     ~EnumState() { Dispose(); }
     FERESULT Callback(FEOPERATION feo,
        LPCTSTR pszDir, LPCTSTR pszPath,
        const WIN32_FIND_DATA* pwfd);
     void FinishDir(LPCTSTR pszDir);
    
    private:
    
     struct Directory {
      Directory(Directory* pdirParent)
       : m_pdirParent(pdirParent)
       , m_ullSizeSelf(0)
       , m_ullSizeAll(0) { }
      Directory* m_pdirParent;
      ULONGLONG m_ullSizeSelf;
      ULONGLONG m_ullSizeAll;
     };
     Directory* Push();
     void Pop();
     void Dispose();
    
     Directory* m_pdirCur;
    };
    
    EnumState::Directory* EnumState::Push()
    {
     Directory* pdir = new Directory(m_pdirCur);
     if (pdir) {
      m_pdirCur = pdir;
     }
     return pdir;
    }
    
    void EnumState::Pop()
    {
     Directory* pdir = m_pdirCur->m_pdirParent;
     delete m_pdirCur;
     m_pdirCur = pdir;
    }
    
    void EnumState::Dispose()
    {
     while (m_pdirCur) {
      Pop();
     }
    }
    
    void EnumState::FinishDir(LPCTSTR pszDir)
    {
      m_pdirCur->m_ullSizeAll +=
        m_pdirCur->m_ullSizeSelf;
      printf("Size of %s is %I64u (%I64u)\n",
       pszDir, m_pdirCur->m_ullSizeSelf,
       m_pdirCur->m_ullSizeAll);
    }
    
    ULONGLONG FileSize(const WIN32_FIND_DATA *pwfd)
    {
      return 
        ((ULONGLONG)pwfd->nFileSizeHigh << 32) +
        pwfd->nFileSizeLow;
    }
    
    FERESULT EnumState::Callback(FEOPERATION feo,
        LPCTSTR pszDir, LPCTSTR pszPath,
        const WIN32_FIND_DATA* pwfd)
    {
     if (!m_pdirCur) return FER_STOP;
    
     switch (feo) {
     case FEO_FILE:
      m_pdirCur->m_ullSizeSelf += FileSize(pwfd);
      return FER_CONTINUE;
    
     case FEO_DIR:
      if (Push()) {
       return FER_CONTINUE;
      } else {
       return FER_SKIP;
      }
    
     case FEO_LEAVEDIR:
      FinishDir(pszPath);
    
     /* Propagate size into parent */
      m_pdirCur->m_pdirParent->m_ullSizeAll +=
        m_pdirCur->m_ullSizeAll;
      Pop();
      return FER_CONTINUE;
    
     default:
      return FER_CONTINUE;
     }
     /* notreached */
    }
    
    FERESULT CALLBACK EnumState_Callback(
        FEOPERATION feo,
        LPCTSTR pszDir, LPCTSTR pszPath,
        const WIN32_FIND_DATA* pwfd,
        void* pvContext)
    {
     EnumState* pstate =
        reinterpret_cast<EnumState*>(pvContext);
     return pstate->Callback(feo, pszDir,
                pszPath, pwfd);
    }
    
    int __cdecl main(int argc, char **argv)
    {
     EnumState state;
     if (EnumDirectoryTree(TEXT("."),
            EnumState_Callback,
            &state) == FER_CONTINUE) {
      state.FinishDir(TEXT("."));
     }
     return 0;
    }
    

    Boy, that sure was an awful lot of typing, and what's worse, the whole structure of the program has been obscured by the explicit state management. It sure is hard to tell at a glance what this chunk of code is trying to do. Instead, you have to stare at the EnumState class and reverse-engineer what's going on.

    (Yes, I could have simplified this code a little by using a built-in stack class, but as I have already noted in the context of smart pointers, I try to present these articles in "pure" C++ so people won't get into arguments about which class library is best.)

    Tomorrow, we'll look at how the world would be if the function EnumDirectoryTree were spec'd out by the caller rather than the enumerator!
