August, 2011

  • The Old New Thing

    Modernizing our simple program that retrieves information about the items in the Recycle Bin

    Last time, we wrote a simple program to print various properties of the items in the Recycle Bin, and we did so in the classical style, using item ID lists and IShell­Folders. One thing you may have noticed is that a lot of functions take the combination of an IShell­Folder and a PCUITEMID_CHILD. In the shell namespace, operations on items usually happen by means of the pair (folder, child), and one of the common mistakes made by beginners is failing to keep track of the pairing and passing child pidls to the wrong parent folder.

    Even if you're not a beginner and are good at keeping track of which child pidls correspond to which parent folders, it's still extra work you have to do, and it means that a lot of functions take two parameters in order to describe one thing.

    Enter IShell­Item.

    The IShell­Item encapsulates the pair (folder, child). This solves two problems:

    1. You only have to pass one thing around (the IShell­Item) instead of two (the IShell­Folder and the PCUITEMID_CHILD).
    2. By keeping track of the two items as a single unit, it reduces the risk that you'll accidentally use a child pidl with the wrong parent folder.

    Another complexity of the classic shell interface is that there are a bunch of ways of obtaining COM objects from a shell folder:

    • IShell­Folder::Bind­To­Object
    • IShell­Folder::Bind­To­Storage
    • IShell­Folder::Create­View­Object
    • IShell­Folder::Get­UI­Object­Of
    • IUnknown::Query­Interface (thanks to the desktop special case we saw last time).

    The IShell­Item::Bind­To­Handler method hides these special cases by dealing with them under the covers so you don't have to. You just call IShell­Item::Bind­To­Handler and it figures out where to get the object and what weird special cases apply. (It also takes care of the weird S_FALSE return value from IShell­Folder::Enum­Objects.)

    And then there's the annoyance of IShell­Folder::Get­Display­Name­Of using the kooky STRRET structure. The IShell­Item::Get­Display­Name function encapsulates that away for you by doing the work to convert that STRRET into a boring string pointer.
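
    To make the "kooky" part concrete, here is a portable C model of the three shapes a STRRET can take: a heap string the receiver must free, an offset into the item ID itself, or a string stored inline. The struct strret_model and the function strret_to_heap are hypothetical stand-ins written for illustration; they use plain char instead of wide characters and are not the real STRRET declaration or the real StrRetToStr. Only the discriminator values are taken from STRRET_WSTR, STRRET_OFFSET, and STRRET_CSTR.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical char-based model of STRRET. The real structure uses a
   wide-character pointer for the first case; the discriminator values
   match STRRET_WSTR, STRRET_OFFSET, and STRRET_CSTR. */
enum { MODEL_STRRET_WSTR = 0, MODEL_STRRET_OFFSET = 1, MODEL_STRRET_CSTR = 2 };

struct strret_model {
    unsigned uType;
    union {
        char    *pOleStr;   /* heap string owned by the receiver */
        unsigned uOffset;   /* byte offset of the string inside the item ID */
        char     cStr[260]; /* string stored inline in the structure */
    } u;
};

/* Collapse any of the three forms into one heap-allocated string.
   id_bytes is the item ID that uOffset is relative to. In the pointer
   case the function takes ownership of pOleStr and frees it.
   Returns NULL on failure; the caller frees the result. */
char *strret_to_heap(struct strret_model *sr, const char *id_bytes)
{
    const char *src;
    switch (sr->uType) {
    case MODEL_STRRET_WSTR:
        src = sr->u.pOleStr;
        break;
    case MODEL_STRRET_OFFSET:
        if (id_bytes == NULL) return NULL;
        src = id_bytes + sr->u.uOffset;
        break;
    case MODEL_STRRET_CSTR:
        src = sr->u.cStr;
        break;
    default:
        return NULL;
    }
    if (src == NULL) return NULL;

    size_t n = strlen(src) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, src, n);

    if (sr->uType == MODEL_STRRET_WSTR) {
        free(sr->u.pOleStr);      /* release the original heap string */
        sr->u.pOleStr = NULL;
    }
    return copy;
}
```

    Every caller has to carry that three-way switch and the ownership rule around, which is exactly the bookkeeping IShell­Item::Get­Display­Name absorbs.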

    First up in modernizing our sample program is to change Bind­To­Csidl to return a shell item instead of a shell folder.

    HRESULT BindToCsidlItem(int csidl, IShellItem ** ppsi)
    {
     *ppsi = NULL;
     HRESULT hr;
     PIDLIST_ABSOLUTE pidl;
     hr = SHGetSpecialFolderLocation(NULL, csidl, &pidl);
     if (SUCCEEDED(hr)) {
      hr = SHCreateShellItem(NULL, NULL, pidl, ppsi);
      CoTaskMemFree(pidl);
     }
     return hr;
    }
    

    But wait, since we're modernizing, we may as well upgrade to SHGet­Known­Folder­ID­List:

    HRESULT BindToKnownFolderItem(REFKNOWNFOLDER rfid, IShellItem ** ppsi)
    {
     *ppsi = NULL;
     HRESULT hr;
     PIDLIST_ABSOLUTE pidl;
     hr = SHGetKnownFolderIDList(rfid, 0, NULL, &pidl);
     if (SUCCEEDED(hr)) {
      hr = SHCreateShellItem(NULL, NULL, pidl, ppsi);
      CoTaskMemFree(pidl);
     }
     return hr;
    }
    

    Hey wait, there's a function for this already in Windows 7! It's called SHGet­Known­Folder­Item. Yay, now we can delete the function entirely.

    Next, we convert Print­Display­Name to use IShell­Item and the item-based display name flags SIGDN.

    void PrintDisplayName(IShellItem *psi, SIGDN sigdn, PCTSTR pszLabel)
    {
     LPWSTR pszName;
     HRESULT hr = psi->GetDisplayName(sigdn, &pszName);
     if (SUCCEEDED(hr)) {
      _tprintf(TEXT("%s = %ws\n"), pszLabel, pszName);
      CoTaskMemFree(pszName);
     }
    }
    

    And then we convert Print­Detail to use IShell­Item. Oh wait, now we've hit a snag: The IShell­Item interface doesn't have a helper method that wraps IShell­Folder2::Get­Details­Ex. Fortunately, there is a way to ask IShell­Item to regurgitate the IShell­Folder and PITEMID_CHILD that it is wrapping: You use the IParent­And­Item::Get­Parent­And­Item method.

    void PrintDetail(IShellItem *psi,
        const SHCOLUMNID *pscid, PCTSTR pszLabel)
    {
     IParentAndItem *ppni;
     HRESULT hr = psi->QueryInterface(IID_PPV_ARGS(&ppni));
     if (SUCCEEDED(hr)) {
      IShellFolder *psf;
      PITEMID_CHILD pidl;
      hr = ppni->GetParentAndItem(NULL, &psf, &pidl);
      if (SUCCEEDED(hr)) {
       // GetDetailsEx lives on IShellFolder2, so ask the parent for it
       IShellFolder2 *psf2;
       hr = psf->QueryInterface(IID_PPV_ARGS(&psf2));
       if (SUCCEEDED(hr)) {
        VARIANT vt;
        hr = psf2->GetDetailsEx(pidl, pscid, &vt);
        if (SUCCEEDED(hr)) {
         hr = VariantChangeType(&vt, &vt, 0, VT_BSTR);
         if (SUCCEEDED(hr)) {
          _tprintf(TEXT("%s: %ws\n"), pszLabel, V_BSTR(&vt));
         }
         VariantClear(&vt);
        }
        psf2->Release();
       }
       psf->Release();
       CoTaskMemFree(pidl);
      }
      ppni->Release();
     }
    }
    

    Wow, it looks like we lost ground there. Ah, but Windows Vista extends IShell­Item with the IShell­Item2 interface, and that has a bunch of new methods for retrieving properties.

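    void PrintDetail(IShellItem2 *psi,
        const SHCOLUMNID *pscid, PCTSTR pszLabel)
    {
     PROPVARIANT vt;
     HRESULT hr = psi->GetProperty(*pscid, &vt);
     if (SUCCEEDED(hr)) {
      // A PROPVARIANT is converted with the PropVariant helpers
      // (propvarutil.h), not with VariantChangeType.
      PROPVARIANT vtStr;
      hr = PropVariantChangeType(&vtStr, vt, PVCHF_DEFAULT, VT_LPWSTR);
      if (SUCCEEDED(hr)) {
       _tprintf(TEXT("%s: %ws\n"), pszLabel, vtStr.pwszVal);
       PropVariantClear(&vtStr);
      }
      PropVariantClear(&vt);
     }
    }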
    

    But wait, there's more. There's a special accessor just for retrieving properties as strings!

    void PrintDetail(IShellItem2 *psi2,
        const SHCOLUMNID *pscid, PCTSTR pszLabel)
    {
     LPWSTR pszValue;
     HRESULT hr = psi2->GetString(*pscid, &pszValue);
     if (SUCCEEDED(hr)) {
      _tprintf(TEXT("%s: %ws\n"), pszLabel, pszValue);
      CoTaskMemFree(pszValue);
     }
    }
    

    Okay, that's more like it. Now let's update the main program.

    int __cdecl _tmain(int argc, PTSTR *argv)
    {
     HRESULT hr = CoInitialize(NULL);
     if (SUCCEEDED(hr)) {
      IShellItem *psiRecycleBin;
      hr = SHGetKnownFolderItem(FOLDERID_RecycleBinFolder, KF_FLAG_DEFAULT,
                                NULL, IID_PPV_ARGS(&psiRecycleBin));
      if (SUCCEEDED(hr)) {
       IEnumShellItems *pesi;
       hr = psiRecycleBin->BindToHandler(NULL, BHID_EnumItems,
                                         IID_PPV_ARGS(&pesi));
       if (hr == S_OK) {
        IShellItem *psi;
        while (pesi->Next(1, &psi, NULL) == S_OK) {
         IShellItem2 *psi2;
         if (SUCCEEDED(psi->QueryInterface(IID_PPV_ARGS(&psi2)))) {
          _tprintf(TEXT("------------------\n"));
    
          PrintDisplayName(psi2, SIGDN_PARENTRELATIVE,
                                 TEXT("ParentRelative"));
          PrintDisplayName(psi2, SIGDN_NORMALDISPLAY, TEXT("Normal"));
          PrintDisplayName(psi2, SIGDN_FILESYSPATH, TEXT("FileSys"));
    
          PrintDetail(psi2, &SCID_OriginalLocation, TEXT("Original Location"));
          PrintDetail(psi2, &SCID_DateDeleted, TEXT("Date deleted"));
          PrintDetail(psi2, &PKEY_Size, TEXT("Size"));
          psi2->Release();
         }
         psi->Release();
        }
        pesi->Release();
       }
       psiRecycleBin->Release();
      }
      CoUninitialize();
     }
     return 0;
    }
    

    Okay, so now we know how to enumerate the contents of the Recycle Bin and obtain properties of the items in it. How do we purge or restore items? We'll look at that next time.

    How can I get information about the items in the Recycle Bin?

    For some reason, a lot of people are interested in programmatic access to the contents of the Recycle Bin. They never explain why they care, so it's possible that they are looking at their problem the wrong way.

    For example, one reason for asking, "How do I purge an item from the Recycle Bin given a path?" is that some operation in their program results in the files going into the Recycle Bin and they want them to be deleted entirely. The correct solution is to clear the FOF_ALLOW­UNDO flag when deleting the items in the first place. Moving to the Recycle Bin and then purging is the wrong solution because your search-and-destroy mission may purge more items than just the ones your program put there.

    The Recycle Bin is somewhat strange in that it can have multiple items with the same name. Create a text file called TEST.TXT on your desktop, then delete it into the Recycle Bin. Create another text file called TEST.TXT on your desktop, then delete it into the Recycle Bin. Now open your Recycle Bin. Hey look, you have two TEST.TXT files with the same path!

    Now look at that original problem: Suppose the program, as part of some operation, moves the file TEST.TXT from the desktop to the Recycle Bin, and then the second half of the program goes into the Recycle Bin, finds TEST.TXT and purges it. Well, there are actually three copies of TEST.TXT in the Recycle Bin, and only one of them is the one you wanted to purge.

    Okay, I got kind of sidetracked there. Back to the issue of getting information about the items in the Recycle Bin.

    The Recycle Bin is a shell folder, and the way to enumerate the contents of a shell folder is to bind to it and enumerate its contents. The low-level interface to the shell namespace is via IShell­Folder. There is an easier-to-use medium-level interface based on IShell­Item, and there's a high-level interface based on Folder designed for scripting.

    I'll start with the low-level interface. As usual, the program starts with a bunch of header files.

    #include <windows.h>
    #include <stdio.h>
    #include <tchar.h>
    #include <shlobj.h>
    #include <shlwapi.h>
    #include <propkey.h>
    

    The Bind­To­Csidl function binds to a folder specified by a CSIDL. The modern way to do this is via KNOWN­FOLDER, but just to keep you old fogeys happy, I'm doing things the classic way since you refuse to upgrade from Windows XP. (We'll look at the modern way later.)

    HRESULT BindToCsidl(int csidl, REFIID riid, void **ppv)
    {
     HRESULT hr;
     PIDLIST_ABSOLUTE pidl;
     hr = SHGetSpecialFolderLocation(NULL, csidl, &pidl);
     if (SUCCEEDED(hr)) {
      IShellFolder *psfDesktop;
      hr = SHGetDesktopFolder(&psfDesktop);
      if (SUCCEEDED(hr)) {
       if (pidl->mkid.cb) {
        hr = psfDesktop->BindToObject(pidl, NULL, riid, ppv);
       } else {
        hr = psfDesktop->QueryInterface(riid, ppv);
       }
       psfDesktop->Release();
      }
      CoTaskMemFree(pidl);
     }
     return hr;
    }
    

    The subtlety here is in the test for pidl->mkid.cb. The IShell­Folder::Bind­To­Object method is for binding to child objects (or grandchildren or deeper descendants). If the object you want is the desktop itself, then you can't use IShell­Folder::Bind­To­Object since the desktop is not a child of itself. In fact, if the object you want is the desktop itself, then you already have the desktop, so we just Query­Interface for it. It's an annoying special case which usually lurks in your code until somebody tries something like "Save file to desktop" or changes the location of a special folder to the desktop, and then boom you trip over the fact that the desktop is not a child of itself. (See further discussion below.)
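
    Why does a zero mkid.cb mean "the desktop"? An absolute ID list is a run of SHITEMID records, each starting with a 2-byte cb that counts itself, and the list ends with a record whose cb is zero. The desktop is the root of the namespace, so its ID list is empty: just the terminator. Here is a portable sketch that walks such a buffer; idlist_count and idlist_is_desktop are hypothetical helpers, and the code assumes a well-formed list with cb stored as a native little-endian 16-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 2-byte little-endian cb field at p. */
static uint16_t read_cb(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Count the SHITEMIDs in an ID list: each record is cb bytes long
   (cb includes the 2-byte field itself) and a cb of 0 ends the list.
   Assumes the list is well formed. */
size_t idlist_count(const unsigned char *pidl)
{
    size_t count = 0;
    uint16_t cb;
    while ((cb = read_cb(pidl)) != 0) {
        count++;
        pidl += cb;
    }
    return count;
}

/* An empty ID list (first cb == 0) names the desktop itself, which is
   not a child of the desktop and so cannot be bound as one. */
int idlist_is_desktop(const unsigned char *pidl)
{
    return read_cb(pidl) == 0;
}
```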

    Another helper function prints the display name of a shell namespace item. There isn't much interesting here either.

    void PrintDisplayName(IShellFolder *psf,
        PCUITEMID_CHILD pidl, SHGDNF uFlags, PCTSTR pszLabel)
    {
     STRRET sr;
     HRESULT hr = psf->GetDisplayNameOf(pidl, uFlags, &sr);
     if (SUCCEEDED(hr)) {
      PTSTR pszName;
      hr = StrRetToStr(&sr, pidl, &pszName);
      if (SUCCEEDED(hr)) {
       _tprintf(TEXT("%s = %s\n"), pszLabel, pszName);
       CoTaskMemFree(pszName);
      }
     }
    }
    

    Our last helper function retrieves a property from the shell namespace and prints it. (Obviously, if we wanted to do something other than print it, we could coerce the type to something other than VT_BSTR.)

    void PrintDetail(IShellFolder2 *psf, PCUITEMID_CHILD pidl,
        const SHCOLUMNID *pscid, PCTSTR pszLabel)
    {
     VARIANT vt;
     HRESULT hr = psf->GetDetailsEx(pidl, pscid, &vt);
     if (SUCCEEDED(hr)) {
      hr = VariantChangeType(&vt, &vt, 0, VT_BSTR);
      if (SUCCEEDED(hr)) {
       _tprintf(TEXT("%s: %ws\n"), pszLabel, V_BSTR(&vt));
      }
      VariantClear(&vt);
     }
    }
    

    Okay, now we can get down to business. The properties we will display from each item in the Recycle Bin are the item name and path, the original location (before the item was deleted), the date the item was deleted, and the size of the item.

    Getting the name and path is done with various combinations of flags to IShell­Folder::Get­Display­Name­Of, whereas getting the other properties involves talking to the shell property system. (My colleague Ben Karas covers the shell property system on his blog.) The SHCOLUMN­ID documentation says that the displaced property set applies to items which have been moved to the Recycle Bin, so we can define those column IDs based on the values provided in shlguid.h:

    const SHCOLUMNID SCID_OriginalLocation =
       { PSGUID_DISPLACED, PID_DISPLACED_FROM };
    const SHCOLUMNID SCID_DateDeleted =
       { PSGUID_DISPLACED, PID_DISPLACED_DATE };
    

    The other property we want is System.Size, which the documentation says is defined as PKEY_Size by the propkey.h header file.

    Okay, let's roll!

    int __cdecl _tmain(int argc, PTSTR *argv)
    {
     HRESULT hr = CoInitialize(NULL);
     if (SUCCEEDED(hr)) {
      IShellFolder2 *psfRecycleBin;
      hr = BindToCsidl(CSIDL_BITBUCKET, IID_PPV_ARGS(&psfRecycleBin));
      if (SUCCEEDED(hr)) {
       IEnumIDList *peidl;
       hr = psfRecycleBin->EnumObjects(NULL,
         SHCONTF_FOLDERS | SHCONTF_NONFOLDERS, &peidl);
       if (hr == S_OK) {
        PITEMID_CHILD pidlItem;
        while (peidl->Next(1, &pidlItem, NULL) == S_OK) {
         _tprintf(TEXT("------------------\n"));
    
         PrintDisplayName(psfRecycleBin, pidlItem,
                          SHGDN_INFOLDER, TEXT("InFolder"));
         PrintDisplayName(psfRecycleBin, pidlItem,
                          SHGDN_NORMAL, TEXT("Normal"));
         PrintDisplayName(psfRecycleBin, pidlItem,
                          SHGDN_FORPARSING, TEXT("ForParsing"));
    
         PrintDetail(psfRecycleBin, pidlItem,
                     &SCID_OriginalLocation, TEXT("Original Location"));
         PrintDetail(psfRecycleBin, pidlItem,
                     &SCID_DateDeleted, TEXT("Date deleted"));
         PrintDetail(psfRecycleBin, pidlItem,
                     &PKEY_Size, TEXT("Size"));
    
         CoTaskMemFree(pidlItem);
        }
        peidl->Release();
       }
       psfRecycleBin->Release();
      }
      CoUninitialize();
     }
     return 0;
    }
    

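    The only tricky part is the test for whether the call to IShell­Folder::Enum­Objects succeeded: the code checks hr == S_OK rather than SUCCEEDED(hr). According to the rules for IShell­Folder::Enum­Objects, the method is allowed to return S_FALSE to indicate that there are no children, in which case it sets peidl to NULL. Since S_FALSE is a success code, a SUCCEEDED test would let the code go on to call Next through a null pointer.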
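
    Here is a portable sketch of the arithmetic behind that test. The HRESULT typedef and the S_OK, S_FALSE, and SUCCEEDED definitions are reproduced locally so it compiles without the Windows SDK (their values follow winerror.h), and fake_enum_objects and have_enumerator are hypothetical stand-ins for an empty folder's EnumObjects and for the sample's check.

```c
#include <stddef.h>
#include <stdint.h>

/* Local copies so this compiles without the Windows SDK; the values
   follow winerror.h (S_OK = 0, S_FALSE = 1, success = non-negative). */
typedef int32_t HRESULT;
#define S_OK          ((HRESULT)0)
#define S_FALSE       ((HRESULT)1)
#define SUCCEEDED(hr) (((HRESULT)(hr)) >= 0)

/* Hypothetical stand-in for EnumObjects on a folder with no children:
   it reports S_FALSE and produces no enumerator. */
HRESULT fake_enum_objects(void **ppenum)
{
    *ppenum = NULL;
    return S_FALSE;
}

/* The check the sample uses: only S_OK guarantees an enumerator. */
int have_enumerator(HRESULT hr)
{
    return hr == S_OK;
}
```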

    If you are willing to call functions new to Windows Vista, you can simplify the Bind­To­Csidl function by using the helper function SHBind­To­Object. This does the work of getting the desktop folder and handling the desktop special case.

    HRESULT BindToCsidl(int csidl, REFIID riid, void **ppv)
    {
     HRESULT hr;
     PIDLIST_ABSOLUTE pidl;
     hr = SHGetSpecialFolderLocation(NULL, csidl, &pidl);
     if (SUCCEEDED(hr)) {
      hr = SHBindToObject(NULL, pidl, NULL, riid, ppv);
      CoTaskMemFree(pidl);
     }
     return hr;
    }
    

    But at this point, I'm starting to steal from the topic I scheduled for next time, namely modernizing this program to take advantage of some new helper functions and interfaces. We'll continue next time.

    Why can't I use PSGUID_STORAGE like a GUID?

    The stgprop.h header file defines a GUID called PSGUID_STORAGE, but a customer was having trouble using it.

        GUID guid;
        ...
        // This generates a strange compiler error
        if (IsEqualGUID(guid, PSGUID_STORAGE)) { ... }
    

    The strange compiler error the customer referred to is the following:

    test.cpp(136) : error C2143: syntax error : missing ')' before '{'
    test.cpp(136) : error C2059: syntax error : ')'
    test.cpp(136) : error C2143: syntax error : missing ';' before '{'
    test.cpp(136) : error C2059: syntax error : '{'
    test.cpp(136) : error C2059: syntax error : ')'
    test.cpp(137) : error C2059: syntax error : '}'
    test.cpp(137) : error C2143: syntax error : missing ';' before '}'
    test.cpp(137) : error C2059: syntax error : '}'
    

    "I don't see what the compiler is complaining about. The parentheses appear to be properly matched before the left brace."

    Remember, what you see is not necessarily what the compiler sees. Let's take another look at this mysterious GUID:

    #define PSGUID_STORAGE  { 0xb725f130,           \
                              0x47ef, 0x101a,       \
                              { 0xa5, 0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac } }
    

    Well there's your problem. After the preprocessor does its substitution, the line becomes

        if (IsEqualGUID(guid, { 0xb725f130,
                  0x47ef, 0x101a,
                  { 0xa5, 0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac } })) { ... }
    

    and that's not legal C/C++. (Though with a little tweaking, you can get GCC to accept it.) The PSGUID_STORAGE symbol is intended to be used as an initializer:

    const GUID StorageGuid = PSGUID_STORAGE;
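
    If you really want the comparison, here is a portable C sketch of two ways to get there. The GUID struct is declared locally (matching the Windows layout) so the example compiles without the SDK, and is_storage_guid and is_storage_guid_c99 are hypothetical helpers. The first materializes the initializer into a variable once, as above; the second uses a C99 compound literal, which turns the braces into an expression. Compound literals are standard C99 but not standard C++.

```c
#include <stdint.h>
#include <string.h>

/* Local stand-in for the Windows GUID layout (16 bytes, no padding). */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} GUID;

/* Same shape as the header's macro: it expands to a brace initializer. */
#define PSGUID_STORAGE  { 0xb725f130, 0x47ef, 0x101a, \
                          { 0xa5, 0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac } }

/* Option 1: use the macro where it belongs, as an initializer. */
static const GUID StorageGuid = PSGUID_STORAGE;

static int guid_equal(const GUID *a, const GUID *b)
{
    return memcmp(a, b, sizeof(GUID)) == 0;
}

int is_storage_guid(const GUID *g)
{
    return guid_equal(g, &StorageGuid);
}

/* Option 2 (C99): a compound literal wraps the braces in a cast so
   the expansion becomes an object you can take the address of. */
int is_storage_guid_c99(const GUID *g)
{
    return guid_equal(g, &(GUID)PSGUID_STORAGE);
}
```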
    

    "How did you know that?"

    I didn't, but I went to the effort of looking at the definition in the header file and figuring it out from inspection.

    Why is it defined this way instead of

    DEFINE_GUID(PSGUID_STORAGE, 0xb725f130, 0x47ef,
            0x101a, 0xa5, 0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac);
    

    ?

    Because this GUID is used as the FMTID of a PROPERTY­KEY. The PROPERTY­KEY structure looks like this:

    typedef struct {
      GUID  fmtid;
      DWORD pid;
    } PROPERTYKEY;
    

    The intended usage is evidently

    const PROPERTYKEY
    PKEY_STORAGE_DIRECTORY = { PSGUID_STORAGE, PID_STG_DIRECTORY };
    

    Since the C language does not permit global variables to be initialized from other global variables (or at least it didn't at the time PROPERTY­KEYs were defined; who knows what crazy features will show up in C1X), PSGUID_STORAGE needs to be a macro which expands to an initializer rather than being a global variable.
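
    The difference is easy to see in a portable C sketch. GUID and PROPERTYKEY are declared locally with the Windows layouts, and the value given to PID_STG_DIRECTORY is an assumption for illustration (take the real one from stgprop.h). The macro version compiles at file scope; the variable version, shown in the comment, is not valid standard C.

```c
#include <stdint.h>

/* Local stand-ins for the Windows GUID and PROPERTYKEY layouts. */
typedef struct {
    uint32_t Data1;
    uint16_t Data2;
    uint16_t Data3;
    uint8_t  Data4[8];
} GUID;

typedef struct {
    GUID     fmtid;
    uint32_t pid;
} PROPERTYKEY;

#define PSGUID_STORAGE  { 0xb725f130, 0x47ef, 0x101a, \
                          { 0xa5, 0xf1, 0x02, 0x60, 0x8c, 0x9e, 0xeb, 0xac } }

/* Assumed value for illustration; use the definition from stgprop.h. */
#define PID_STG_DIRECTORY 2

/* Works: the macro expands to a nested brace initializer made entirely
   of constants, which is what a file-scope initializer requires. */
const PROPERTYKEY PKEY_STORAGE_DIRECTORY =
    { PSGUID_STORAGE, PID_STG_DIRECTORY };

/* Would not be valid standard C if the GUID were a variable, because a
   const object is not a constant expression:
     static const GUID StorageGuid = PSGUID_STORAGE;
     static const PROPERTYKEY k = { StorageGuid, PID_STG_DIRECTORY };  */
```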

    Today's question was really just settling the prerequisites for tomorrow's topic. Stay tuned.

    Random musings on the introduction of long file names on FAT

    Tom Keddie thinks that the format of long file names on FAT deserves an article. Fortunately, I don't have to write it; somebody else already did.

    So go read that article first. I'm just going to add some remarks and stories.

    Hi, welcome back.

    Coming up with the technique of setting Read-only, System, Hidden, and Volume attributes to hide LFN entries took a bit of trial and error. The volume label was the most important part, since that was enough to get 90% of programs which did low-level disk access to lay off those directory entries. The other bits were added to push the success rate ever so close to 100%.

    The linked article mentions rather briefly that the checksum is present to ensure that the LFN entries correspond to the SFN entry that immediately follows. This is necessary so that if the directory is modified by code that is not LFN-aware (for example, maybe you dual-booted into Windows 3.1), and the file is deleted and the directory entry is reused for a different file, the LFN fragments won't be erroneously associated with the new file. Instead, the fragments are "orphans", directory entries for which the corresponding SFN entry no longer exists. Orphaned directory entries are treated as if they were free.
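
    For the curious, the checksum is computed over the 11-byte short name exactly as it is stored in the SFN entry (name and extension space-padded, no dot): rotate the running sum right by one bit, then add the next byte. That is the algorithm given in Microsoft's FAT specification; the sketch below is a minimal C rendering of it, and the function name lfn_checksum is made up.

```c
#include <stdint.h>

/* Checksum of an 11-byte 8.3 short name as stored in the SFN entry
   ("LONGFI~1TXT": name padded to 8, extension padded to 3, no dot).
   Each LFN fragment records this value so it can be matched to its SFN. */
uint8_t lfn_checksum(const uint8_t short_name[11])
{
    uint8_t sum = 0;
    for (int i = 0; i < 11; i++) {
        /* rotate sum right by one bit, then add the next byte mod 256 */
        sum = (uint8_t)(((sum & 1) << 7) + (sum >> 1) + short_name[i]);
    }
    return sum;
}
```

    Changing any single byte of the short name always changes the result, because each step (rotate, then add) is a one-to-one mapping on the 8-bit sum. Two different names can still collide, though, so the checksum is a sanity check rather than a guarantee.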

    The cluster value in an LFN entry is always zero for compatibility with disk utilities that assume that a nonzero cluster means that the directory entry refers to a live file.

    The linked article wonders what happens if the ordinals are out of order. Simple: If the ordinals are out of order, then they are invalid. The file system simply treats them as orphans. Here's an example of how out-of-order ordinals can be created. Start with the following directory entries:

    (2) "e.txt"
    (1) "Long File Nam"
    "LONGFI~1.TXT"
    (2) "e2.txt"
    (1) "Long File Nam"
    "LONGFI~2.TXT"

    Suppose this volume is accessed by a file system that does not support long file names, and the user deletes LONGFI~1.TXT. The directory now looks like this:

    (2) "e.txt"
    (1) "Long File Nam"
    (free)
    (2) "e2.txt"
    (1) "Long File Nam"
    "LONGFI~2.TXT"

    Now the volume is accessed by a file system that supports long file names, and the user renames Long File Name2.txt to Wow that's a really long file name there.txt.

    (2) "e.txt"
    (4) "e.txt"
    (3) "ile name ther"
    (2) "really long f"
    (1) "Wow that's a "
    "WOWTHA~1.TXT"

    Since the new name is longer than the old name, more LFN fragments need to be used to store the entire name, and oh look isn't that nice, there are some free entries right above the ones we're already using, so let's just take those. Now if you read down the table, you see that the ordinal goes from 2 up to 4 (out of order) before continuing in the correct order. When the file system sees this, it knows that the entry with ordinal 2 is an orphan.
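
    Here is a deliberately simplified model of that check in C. Each array element is the ordinal of a directory entry, listed top to bottom, with the SFN entry directly below the last element. The function counts how many entries, working upward from the SFN, form the run 1, 2, 3, and so on; anything above that run is an orphan. The real file system also checks the "last entry" flag on the highest ordinal and the checksum, which this model ignores, and valid_lfn_run is a hypothetical name.

```c
#include <stddef.h>

/* ordinals[count - 1] is the entry directly above the SFN entry.
   Returns how many entries, counting upward from the SFN, carry the
   ordinals 1, 2, 3, ... in order. Entries above that run are orphans. */
size_t valid_lfn_run(const int *ordinals, size_t count)
{
    size_t run = 0;
    while (run < count && ordinals[count - 1 - run] == (int)run + 1) {
        run++;
    }
    return run;
}
```

    Fed the directory from the example above, { 2, 4, 3, 2, 1 }, it reports a run of 4, leaving the topmost entry (ordinal 2) as the orphan.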

    One last historical note: The designers of this system didn't really expect Windows NT to adopt long file names on FAT, since Windows NT already had its own much-better file system, namely, NTFS. If you wanted long file names on Windows NT, you'd just use NTFS and call it done. Nevertheless, the decision was made to store the file names in Unicode on disk, breaking with the long-standing practice of storing FAT file names in the OEM character set. The decision meant that long file names would take up twice as much space (and this was back in the days when disk space was expensive), but the designers chose to do it anyway "because it's the right thing to do."

    And then Windows NT added support for long file names on FAT and the decision taken years earlier to use Unicode on disk proved eerily clairvoyant.

    Stupid command-line trick: Counting the number of lines in stdin

    On Unix, you can use wc -l to count the number of lines in stdin. Windows doesn't come with wc, but there's a sneaky way to count the number of lines anyway:

    some-command-that-generates-output | find /c /v ""
    

    It is a special quirk of the find command that the null string is treated as never matching. The /v flag reverses the sense of the test, so now it matches everything. And the /c flag returns the count.

    It's pretty convoluted, but it does work.
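
    To pin down the semantics, here is a small C model of find /c with and without /v, including the quirk that an empty search string never matches. It illustrates the behavior described above and is not the actual find.exe logic: it splits lines on \n only, ignores code pages and CR/LF subtleties, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* find-style substring test: the empty string never matches. */
static int find_matches(const char *line, size_t len, const char *pat)
{
    size_t plen = strlen(pat);
    if (plen == 0 || plen > len) return 0;
    for (size_t i = 0; i + plen <= len; i++) {
        if (memcmp(line + i, pat, plen) == 0) return 1;
    }
    return 0;
}

/* Model of `find /c "pat"` (invert = 0) and `find /c /v "pat"`
   (invert = 1): count the lines that match, or that don't. */
size_t find_count(const char *text, const char *pat, int invert)
{
    size_t count = 0;
    const char *p = text;
    while (*p) {
        const char *nl = strchr(p, '\n');
        size_t len = nl ? (size_t)(nl - p) : strlen(p);
        if (find_matches(p, len, pat) != invert) count++;
        p += len;
        if (nl) p++;   /* step over the newline */
    }
    return count;
}
```

    With an empty pattern, every line "doesn't match", so inverting the test counts every line, which is the whole trick.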

    (Remember, I provide the occasional tip on batch file programming as a public service to those forced to endure it, not as an endorsement of batch file programming.)

    Now come da history: Why does the find command say that a null string matches nothing? Mathematically, the null string is a substring of every string, so it should be that if you search for the null string, it matches everything. The reason dates back to the original MS-DOS version of find.exe, which according to the comments appears to have been written in 1982. And back then, pretty much all of MS-DOS was written in assembly language. (If you look at your old MS-DOS floppies, you'll find that find.exe is under 7KB in size.) Here is the relevant code, though I've done some editing to get rid of distractions like DBCS support.

            mov     dx,st_length            ;length of the string arg.
            dec     dx                      ;adjust for later use
            mov     di, line_buffer
    lop:
            inc     dx
            mov     si,offset st_buffer     ;pointer to beg. of string argument
    
    comp_next_char:
            lodsb
            cmp     al,byte ptr [di]
            jnz     no_match
    
            dec     dx
            jz      a_matchk                ; no chars left: a match!
            call    next_char               ; updates di
            jc      no_match                ; end of line reached
            jmp     comp_next_char          ; loop if chars left in arg.
    

    If you're rusty on your 8086 assembly language, here's how it goes in pseudocode:

     int dx = st_length - 1;
     char *di = line_buffer;
    lop:
     dx++;
     char *si = st_buffer;
    comp_next_char:
     char al = *si++;
     if (al != *di) goto no_match;
     if (--dx == 0) goto a_matchk;
     if (!next_char(&di)) goto no_match;
     goto comp_next_char;
    

    In sort-of-C, the code looks like this:

     int l = st_length - 1;
     char *line = line_buffer;
    
     l++;
     char *string = st_buffer;
     while (*string++ == *line && --l && next_char(&line)) {} 
    

    The weird - 1 followed by l++ is an artifact of code that I deleted, which needed the decremented value. If you prefer, you can look at the code this way:

     int l = st_length;
     char *line = line_buffer;
     char *string = st_buffer;
     while (*string++ == *line && --l && next_char(&line)) {} 
    

    Notice that if the string length is zero, there is an integer underflow, and we end up reading off the end of the buffers. The comparison loop does stop, because we eventually hit bytes that don't match. (No virtual memory here, so there is no page fault when you run off the end of a buffer; you just keep going and reading from other parts of your data segment.)

    In other words, due to an integer underflow bug, a string of length zero was treated as if it were a string of length 65536, which doesn't match anywhere in the file.
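
    Here is the DX bookkeeping from the loop above transcribed into C, with uint16_t standing in for the 16-bit register so that it wraps the same way. The function (a hypothetical name) reports how many consecutive character matches the loop needs before dec dx / jz a_matchk declares success.

```c
#include <stdint.h>

/* Model of the DX register in the original loop. */
uint32_t matches_required(uint16_t st_length)
{
    uint16_t dx = (uint16_t)(st_length - 1);  /* dec dx: 0 wraps to 0xFFFF */
    dx++;                                     /* inc dx at label lop */
    uint32_t matched = 0;
    for (;;) {
        matched++;                 /* one more character compared equal */
        dx--;                      /* dec dx */
        if (dx == 0) return matched;   /* jz a_matchk */
    }
}
```

    For a three-character search string it returns 3, as you'd expect. For an empty string, the first dec takes DX from 0 to 0xFFFF, so the loop wants 65536 consecutive matches.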

    This bug couldn't be fixed, because by the time anybody got around to trying, people had already discovered the behavior and written batch files that relied on it. The bug became a feature.

    The integer underflow itself was fixed, but the code is careful to treat null strings as never matching, in order to preserve existing behavior.

    Exercise: Why is the loop label called lop instead of loop?

    Magic dirt, the fate of former professional athletes, and other sports randomness

    A sports-related (mostly baseball) link dump.

    What do SizeOfStackReserve and SizeOfStackCommit mean for a DLL?

    Nothing.

    Those fields in the IMAGE_OPTIONAL_HEADER structure are meaningful only when they appear in the EXE. The values provided in DLLs are ignored.

    Size­Of­Heap­Reserve and Size­Of­Heap­Commit fall into the same category. In general, flags and fields which control process settings have no effect when declared in a DLL. We've seen a few examples already, like the /LARGE­ADDRESS­AWARE flag or the markers which indicate the default layout direction.

    Why doesn't the Open Files list in the Shared Folders snap-in show all my open files?

    A customer wanted a way to determine which users were using specific files on their server. They fired up the Shared Folders MMC snap-in and went to the Open Files list. They found that the results were inconsistent. Some file types like .exe and .pdf did show up in the list when they were open, but other file types like .txt did not. The customer asked for an explanation of the inconsistency and for a list of which file types work and which ones don't.

    The customer is confusing two senses of the term open file. From the file system point of view, an open file is one that has an outstanding handle reference. This is different from the user interface concept of "There is an open window on my screen showing the contents of the file."

    The Open Files list shows files which are open in the file system sense, not in the user interface sense.

    Whether a file shows up in the Open Files list depends on the application that is used to open the file (in the user interface sense). Text files are typically opened by Notepad, and Notepad reads the entire contents of the file into memory and closes the file handle. Therefore, the file is open (in the file system sense) only when it is in the process of being loaded or saved.
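
    The read-everything-then-close pattern looks roughly like this in portable C. Once load_file returns, no handle to the file remains, so a tool that lists open handles has nothing to show. This is a sketch of the general pattern, not Notepad's actual code, and load_file is a hypothetical name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a NUL-terminated heap buffer and close the
   handle before returning. Returns NULL on failure; caller frees. */
char *load_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;

    size_t cap = 4096, len = 0, n;
    char *buf = malloc(cap);
    while (buf != NULL && (n = fread(buf + len, 1, cap - len - 1, f)) > 0) {
        len += n;
        if (cap - len - 1 == 0) {          /* buffer full: grow it */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) { free(buf); buf = NULL; break; }
            buf = bigger;
            cap *= 2;
        }
    }
    fclose(f);   /* the file is "open" only for the duration of the read */

    if (buf == NULL) return NULL;
    buf[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return buf;
}
```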

    There is no comprehensive list of which types of files fall into which category because the behavior is not a function of the file type but rather a function of the application being used to view the file. (If you open a .txt file in Word, I believe it will keep the file system handle open until you close the document window.)

    The customer seemed satisfied with the explanation. They ran some experiments and observed, "Hey, check it out, if I load a really big text file into Notepad, I can see it show up in the Open Files list momentarily." They never did come back with any follow-up questions, so I don't know how they went about solving the original problem. (Maybe they used a SACL to audit who was opening the files.)

    You don't make something easier to find by hiding it even more deeply

    Commenter rolfhub suggested that, to help people recover from accidentally going into Tiny Footprint Mode, the Task Manager could display a right-click context menu with an entry to return to normal mode.

    My initial reaction to this was, "Huh? Who right-clicks on nothing?" Tiny Footprint Mode is itself already a bad secret hidden setting. Having the exit from the mode be a right-click menu on a blank space is a double-secret hidden setting.

    If I had dictatorial control over all aspects of the shell, I would put a Restore button in the upper right corner to let people return to normal mode.

    Why are the alignment requirements for SLIST_ENTRY so different on 64-bit Windows?

    The Interlocked­Push­Entry­SList function stipulates that all list items must be aligned on a MEMORY_ALLOCATION_ALIGNMENT boundary. For 32-bit Windows, MEMORY_ALLOCATION_ALIGNMENT is 8, but the SLIST_ENTRY structure itself does not have a DECLSPEC_ALIGN(8) attribute. Even more confusingly, the documentation for SLIST_ENTRY says that the 64-bit structure needs to be 16-byte aligned but says nothing about the 32-bit structure. So what are the memory alignment requirements for a 32-bit SLIST_ENTRY, 8 or 4?

    It's 8. No, 4. No wait, it's both.

    Officially, the alignment requirement is 8. Earlier versions of the header file did not stipulate 8-byte alignment, and changing the declaration would have changed the size and layout of existing structures that (inadvertently) misaligned the field. So the 32-bit structure was sort-of grandfathered in. You should still align it on 8-byte boundaries, but the header file doesn't enforce it, to avoid breaking existing code.

    Fortunately, when the 64-bit version was introduced, the proper alignment directive was introduced right off the bat. How about that: sometimes Microsoft learns from its mistakes after all.

    Why are the alignment requirements greater than the natural word size? To avoid the ABA problem. A standard workaround for the ABA problem is to append additional information (a "tag") to the pointer so that when the value changes from B back to A, the tag ensures that the second A still looks different from the first one. Many CPU architectures have a "double-pointer-sized atomic compare-and-swap" instruction, and some of them have the additional requirement that the double-pointer needs to be on a double-pointer boundary (8 bytes for 32-bit pointers and 16 bytes for 64-bit pointers).
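
    Here is a sketch of the tag idea using C11 atomics. It is a simplification, not the SLIST design: nodes live in a fixed pool and are named by 32-bit index, so the index and a 32-bit tag fit together in one 64-bit word that a single compare-and-swap can update. Every successful push or pop bumps the tag, so a head that comes back to the same node (A, then B, then A again) no longer compares equal to a stale snapshot. The names push, pop, and next_of are hypothetical, the sketch assumes lock-free 64-bit atomics, and in principle the tag can wrap after 2^32 updates.

```c
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 16
#define NIL UINT32_MAX

/* Next links for the pool nodes; atomic so concurrent reads are safe. */
static _Atomic uint32_t next_of[POOL_SIZE];
/* Head word: low 32 bits = node index, high 32 bits = tag. */
static _Atomic uint64_t stack_head = NIL;   /* empty stack, tag 0 */

static uint64_t make_head(uint32_t index, uint32_t tag)
{
    return ((uint64_t)tag << 32) | index;
}
static uint32_t head_index(uint64_t h) { return (uint32_t)h; }
static uint32_t head_tag(uint64_t h) { return (uint32_t)(h >> 32); }

void push(uint32_t i)
{
    uint64_t old = atomic_load(&stack_head);
    uint64_t desired;
    do {
        atomic_store(&next_of[i], head_index(old));
        desired = make_head(i, head_tag(old) + 1);   /* bump the tag */
    } while (!atomic_compare_exchange_weak(&stack_head, &old, desired));
}

uint32_t pop(void)
{
    uint64_t old = atomic_load(&stack_head);
    uint64_t desired;
    do {
        uint32_t top = head_index(old);
        if (top == NIL) return NIL;
        /* If another thread changed the head meanwhile, the tag differs
           and the compare-and-swap fails, so a stale next is discarded. */
        desired = make_head(atomic_load(&next_of[top]), head_tag(old) + 1);
    } while (!atomic_compare_exchange_weak(&stack_head, &old, desired));
    return head_index(old);
}
```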

    "But wait, the double-pointer compare-and-swap is used on the SLIST_HEADER, not on the SLIST_ENTRY. Why does the SLIST_ENTRY need to be double-pointer aligned, too?"

    While it's true that many CPU architectures have a "double-pointer-sized atomic compare-and-swap" instruction, some support only a "pointer-sized atomic compare-and-swap". For example, the original AMD64 architecture did not have a CMPXCHG16B instruction; the largest data size for an atomic compare-and-swap was 8 bytes. As a result, the Slist functions need to pack a 64-bit pointer, a list depth, and tag information into a single 64-bit value. One of the tricks they used was imposing a memory alignment of 16 bytes. This freed up four bits in the pointer for use as a tag.
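
    The alignment trick can be sketched like this. If every node is 16-byte aligned, the low four bits of its address are always zero, so they can carry a small tag inside a single pointer-sized value. This shows only the bit manipulation; the real SLIST_HEADER layout on 64-bit Windows packs its fields differently, and pack_tagged and its companions are hypothetical names.

```c
#include <stdint.h>

#define TAG_BITS 4u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1u))   /* 0xF */

/* A node whose alignment is forced to 16 bytes, so the low four bits
   of its address are always zero. */
struct node {
    _Alignas(16) struct node *next;
    int value;
};

/* Pack a 16-byte-aligned pointer and a 4-bit tag (taken mod 16). */
uintptr_t pack_tagged(struct node *p, unsigned tag)
{
    return (uintptr_t)p | ((uintptr_t)tag & TAG_MASK);
}

/* Recover the pointer by clearing the tag bits. */
struct node *tagged_ptr(uintptr_t w)
{
    return (struct node *)(w & ~TAG_MASK);
}

/* Recover the tag from the low bits. */
unsigned tagged_tag(uintptr_t w)
{
    return (unsigned)(w & TAG_MASK);
}
```

    Four bits is a small tag, which is why it helps to have more spare bits elsewhere, or a wider compare-and-swap when the processor offers one.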
