August, 2011

  • The Old New Thing

    Ow, I'm too safe!

    • 26 Comments

    One of my friends is a geek, and, naturally, fully researches everything he does, from cement pouring to bicycle parts, perhaps a bit obsessively. He made sure to get five-point restraints for his children's car seats, for example. And he naturally tightens the belts snugly when putting his children in the car.

    At one point, as he was strapping his daughter in, she complained, "Ow! I'm too safe!"

    Because as far as she was concerned, "being safe" was a synonym for "having a tight seat belt." I leave you to figure out how she came to this conclusion.

  • The Old New Thing

    Why does IFileOperation skip junctions even though I passed FOFX_NOSKIPJUNCTIONS?

    • 5 Comments

    The IFile­Operation::Set­Operation­Flags method accepts a number of flags to modify the file operation, among them today's subject FOFX_NO­SKIP­JUNCTIONS. A customer reported that they couldn't get this flag to work: Whether they set it or not, the IFile­Operation skipped over file system junctions.

    The term junction evolved two independent meanings. The shell team invented the term shell namespace junction in Windows 95 to refer to a point in the shell namespace in which one type of namespace extension is grafted into another. For example, a directory of the form name.{guid} serves as the transition point between the default file system namespace and a custom namespace.

    Meanwhile, the file system team developed the term NTFS junction point to refer to a directory entry which links to another location.

    If you just hear the word junction by itself, you need to use context to determine whether it is short for shell namespace junction or NTFS junction point.

    Since IFile­Operation::Set­Operation­Flags is a shell interface, the shell interpretation is more likely (and is the correct one in this case). The FOFX_NO­SKIP­JUNCTIONS flag has no effect on the behavior of the IFile­Operation interface on NTFS junction points; it modifies the behavior on shell namespace junctions.

  • The Old New Thing

    Starting up inside the box

    • 41 Comments

    The shell team received two customer questions about a month apart which seemed unrelated but had the same root cause.

    I found that in Windows Vista, the xcopy command is ten times slower than it was in Windows XP. What is the source of this slowdown, and how can I fix it?
    We have an application which takes much longer to start up on Windows Vista than it did in Windows XP. We noticed that the slowdown occurs only if we set the application to autostart.

    Let's look at the second one first, since that customer provided a useful piece of information: The slowdown occurs only if they set the program to run automatically at logon. In Windows Vista, programs which are set to run automatically at logon run with reduced priority. This was done in response to the fact that application developers went angling for a bonus and decided to slow down the operating system overall in order to get their program to start up faster. To counteract this tragedy of the commons, the performance team runs these programs inside a job object with reduced CPU, I/O, and paging priority—which the performance team informally calls boxing— for 60 seconds, so that the user isn't forced to sit and wait for all these startup programs to finish doing whatever "really important" stuff they want to do.

    Okay, back to the first customer, the one who reported that xcopy was taking a long time. It took a bit of back-and-forth, but eventually the customer revealed that they were performing the xcopy in a batch file which they placed in the Startup group. Once they volunteered that information, the reason for the slowdown became obvious: Their batch file was running inside the box, and consequently ran with low-priority I/O.

    There is no way to escape the box, but it so happens that logon-triggered scheduled tasks are not placed inside a box. That's your escape hatch. Don't abuse it. (Of course, now that I've told everybody how to avoid being put in a box, everybody will now take advantage of it, because eventually, nothing is special any more.)

    Oh, and if you look more closely at the Delay_Sec setting on a Windows 7 machine, you'll see that it's set to zero, so the boxing behavior is effectively disabled on Windows 7. I guess the performance team gave up. "Fine, if you want your computer to run like a dog when it starts up, then go right ahead. I won't try to save you from yourself any more."

    Bonus chatter: You can explicitly "put yourself inside a box" by using the PROCESS_MODE_BACKGROUND_BEGIN process priority mode. Programs which are intended to run in the background with minimal impact on the rest of the system can use this mode.
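
    As a rough illustration, here is a minimal sketch of a program boxing itself for the duration of some work. The RunInBackgroundMode helper is a name made up for this example; the Win32 pieces are SetPriorityClass with PROCESS_MODE_BACKGROUND_BEGIN and PROCESS_MODE_BACKGROUND_END. The _WIN32 guards are there only so that the sketch compiles elsewhere, where it simply runs the work unboxed.

```cpp
#include <cassert>
#include <functional>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not from the article): runs `work` with the current
// process in background mode, then switches back to normal mode.
// Returns true if background mode was actually entered.
bool RunInBackgroundMode(const std::function<void()>& work)
{
    assert(work && "work must be callable");
    bool boxed = false;
#ifdef _WIN32
    // This call fails if the process is already in background mode;
    // in that case we just carry on without changing anything.
    boxed = SetPriorityClass(GetCurrentProcess(),
                             PROCESS_MODE_BACKGROUND_BEGIN) != FALSE;
#endif
    work(); // note: if this throws, the sketch does not leave background mode
#ifdef _WIN32
    if (boxed) {
        SetPriorityClass(GetCurrentProcess(), PROCESS_MODE_BACKGROUND_END);
    }
#endif
    return boxed;
}
```

    A caller would wrap its low-urgency work, for example RunInBackgroundMode([] { /* index files, compact a cache, ... */ });. A production version would probably want a scope guard so that background mode is left even when the work throws.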

  • The Old New Thing

    Why does creating a shortcut to a file change its last-modified time... sometimes?

    • 15 Comments

    A customer observed that sometimes, the last-modified timestamp on a file would change even though nobody modified the file, or at least consciously took any steps to modify the file. In particular, they found that simply double-clicking the file in Explorer was enough to trigger the file modification.

    It took a while to puzzle out, but here's what's going on:

    When you double-click a file in Explorer, Explorer adds it to the Recent Items list. Internally, this is done by creating a shortcut to the item. The nice thing about a shortcut is that it knows how to track its target. That way, if you move an item, then try to open it from the Recent Items list, the shortcut tracking code will try to find where you moved it to. You moved the file. The shortcut still works. Magic.

    Shortcut target tracking magic is accomplished with the assistance of object identifiers, and object identifiers, as we saw earlier, are created on demand the moment somebody first asks for one.

    And that's where the file modification is coming from. If the file is freshly-created, it won't have an object identifier. When you create a shortcut to it (which happens implicitly when it is added to the Recent Items list), that triggers the creation of an object identifier, which in turn updates the last-modified time on the file.

    Frustratingly, the Link­Resolve­Ignore­Link­Info and No­Resolve­Track policies do not prevent the creation of object identifiers. Those policies control whether the tracking information is used during the resolve process, but they don't control whether the tracking information is obtained during shortcut creation. (Who knows, maybe you're creating the shortcut to be used on a machine where those policies are not in effect.) To suppress collecting the volume information and object identifier at shortcut creation time, you need to pass the SLDF_FORCE_NO_LINKINFO and SLDF_FORCE_NO_LINKTRACK flags to the IShell­Link­Data­List::Set­Flags method when you create the shortcut.

  • The Old New Thing

    Why does the runas command require its command line to be quoted?

    • 18 Comments

    Commenter teo complained that the runas command requires its command line to be quoted.

    Well, if you think about it, why single out runas? Pretty much all programs require their command line to be quoted if they contain special characters (like spaces that you want to be interpreted as part of a file name instead of as an argument separator). The runas command is just doing things the way everybody else does.

    Recall that on Windows, programs perform their own command line parsing. This isn't unix where the command shell does the work of parsing quotation marks and globs before handing the (now-partly-parsed) command line to the child process. Mind you, most programs do not do their own custom parsing; they rely on the C runtime library to do the parsing into arguments.

    Okay, but let's single out the runas command anyway, because runas does live in a slightly different world. It is a convention dating back to MS-DOS that programs which accept a command line as an argument do so without requiring quoting. The archetypal example of this is the command processor itself. Whatever you pass after the /C flag is treated as the command line to execute. Once the /C is encountered, parsing stops and everything from there to the end of the raw command line is treated as the argument. (It also imposes the requirement that /C be the last parameter on the command line.) (Note also that there is a special weirdo rule in the cmd.exe parser with respect to the /C and /K switches; see cmd /? for details.)

    (Therefore, if you want a program that forwards its command line to another program, the way to do this is not to parse your command line and then try to unparse it but rather to just forward the original command line.)
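
    A minimal sketch of that approach: take the raw command line (on Windows, the string returned by GetCommandLineW), step over the program name, and pass everything else along untouched. TailOfCommandLine is a made-up helper name, and the rule it uses to find the end of the program name (a leading quotation mark runs to the next quotation mark; otherwise the name ends at the first space or tab) is a simplifying assumption, not a description of what any particular program does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the part of a raw command line that follows
// the program name, without parsing or re-quoting any of it.
std::wstring TailOfCommandLine(const std::wstring& raw)
{
    std::size_t i = 0;
    if (i < raw.size() && raw[i] == L'"') {
        // Quoted program name: runs up to the closing quotation mark.
        ++i;
        while (i < raw.size() && raw[i] != L'"') ++i;
        if (i < raw.size()) ++i; // step over the closing quotation mark
    } else {
        // Unquoted program name: runs up to the first space or tab.
        while (i < raw.size() && raw[i] != L' ' && raw[i] != L'\t') ++i;
    }
    // Skip the whitespace separating the program name from the rest.
    while (i < raw.size() && (raw[i] == L' ' || raw[i] == L'\t')) ++i;
    assert(i <= raw.size());
    return raw.substr(i); // forwarded verbatim, quotation marks and all
}
```

    The returned string can be handed to the child as-is, so whatever quoting the user typed survives the trip.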

    The authors of the runas program appeared not to be aware of this historical convention at the time they wrote it. They just used the regular C runtime library command line parser, unaware that "programs which accept a command line on the command line" fall into a special alternate reality. Hence the need for the double-extra-quoting.

    Back when the runas program was being developed, I pointed out this historical alternate reality to the people responsible for the runas program. They took my remarks under advisement but evidently chose to stick with the "standard" parsing rules rather than entering the little-known alternate reality. (As a consolation prize, they did add some examples at the end of the runas /? output to explain how quotation marks should be used.)

  • The Old New Thing

    ReadDirectoryChangesW reads directory changes, but what if the directory doesn't change?

    • 11 Comments

    The Read­Directory­ChangesW function reads changes to a directory. But not all changes that happen to files in a directory result in changes to the directory.

    Now, it so happens that nearly all changes to a file in a directory really do result in something happening to the file's directory entry. If you write to a file, the last-write time in the directory entry changes. If you rename a file, the name in the directory entry changes. If you create a file, a new directory entry is created.

    But there are some changes that do not affect the directory entry. I've heard rumors that if you write to a file via a memory-mapped view, that will not update the last-write time in the directory entry. (I don't know if it's true, but if it's not, then just pick some other file-modifying operation that doesn't affect the directory entry, like modifying the contents of a file through a hard link in another directory, or explicitly suppressing file timestamp changes by calling Set­File­Time with a timestamp of 0xFFFFFFFF`FFFFFFFF.) The point is that since these changes have no effect on the directory, they are not recognized by Read­Directory­ChangesW. The Read­Directory­ChangesW function tells you about changes to the directory; if something happens that doesn't change the directory, then Read­Directory­ChangesW will just shrug its shoulders and say, "Hey, not my job."

    If you need to track all changes, even those which do not result in changes to the directory, you need to look at other techniques like the change journal (a.k.a. USN journal).

    The intended purpose of the Read­Directory­ChangesW function is to assist programs like Windows Explorer which display the contents of a directory. If something happens that results in a change to the directory listing, then it is reported by Read­Directory­ChangesW. In other words, Read­Directory­ChangesW tells you when the result of a Find­First­File/Find­Next­File loop changes. The intended usage pattern is doing a Find­First­File/Find­Next­File to collect all the directory entries, and then using the results from Read­Directory­ChangesW to update that collection incrementally.

    In other words, Read­Directory­ChangesW allows you to optimize a directory-viewing tool so it doesn't have to do full enumerations all the time.

    This design philosophy also explains why, if too many changes have taken place in the directory between calls to Read­Directory­ChangesW, the function will fail with an error called ERROR_NOTIFY_ENUM_DIR. It's telling you, "Whoa, like so much happened that I couldn't keep track of it all, so you'll just have to go back and do another Find­First­File/Find­Next­File loop."
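
    The usage pattern can be sketched without any Win32 calls at all. In this illustration, every name (DirectoryCache, Notification, Change) is hypothetical: the enumerate callback stands in for a FindFirstFile/FindNextFile loop, and the overflow flag stands in for ReadDirectoryChangesW failing with ERROR_NOTIFY_ENUM_DIR. Only additions and removals are modeled; a real viewer would also deal with renames and modifications.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

enum class ChangeKind { Added, Removed };

struct Change {
    ChangeKind kind;
    std::string name;
};

// One round of notifications: either a list of changes, or "too much
// happened, go enumerate again" (the ERROR_NOTIFY_ENUM_DIR case).
struct Notification {
    bool overflow;
    std::vector<Change> changes;
};

class DirectoryCache {
public:
    using Enumerator = std::function<std::set<std::string>()>;

    // Start with a full enumeration of the directory.
    explicit DirectoryCache(Enumerator enumerate)
        : enumerate_(std::move(enumerate)), names_(enumerate_())
    {
    }

    void Apply(const Notification& n)
    {
        if (n.overflow) {
            names_ = enumerate_(); // incremental data lost: start over
            return;
        }
        for (const Change& c : n.changes) {
            if (c.kind == ChangeKind::Added) names_.insert(c.name);
            else names_.erase(c.name);
        }
    }

    const std::set<std::string>& names() const { return names_; }

private:
    Enumerator enumerate_;          // declared first so it is ready for names_
    std::set<std::string> names_;
};
```

    The point of the design is that the full enumeration is the source of truth and the notifications are merely an optimization; when the optimization gives up, you fall back to the source of truth.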

  • The Old New Thing

    The ways people mess up IUnknown::QueryInterface, episode 4

    • 27 Comments

    One of the rules for IUnknown::Query­Interface is so obvious that nobody even bothers to state it explicitly as a rule: "If somebody asks you for an interface, and you return S_OK, then the pointer you return must point to the interface the caller requested." (This feels like the software version of dumb warning labels.)

    During compatibility testing for Windows Vista, we found a shell extension that behaved rather strangely. Eventually, the problem was traced to a broken IUnknown::Query­Interface implementation which depended subtly on the order in which interfaces were queried.

    The shell asked for the IExtract­IconA and IExtract­IconW interfaces in the following order:

    // not the actual code but it gets the point across
    IExtractIconA *pxia;
    IExtractIconW *pxiw;
    punk->QueryInterface(IID_IExtractIconA, &pxia);
    punk->QueryInterface(IID_IExtractIconW, &pxiw);
    

    One particular shell extension would return the same pointer to both queries; i.e., after the above code executed, pxia == pxiw even though neither interface derived from the other. The two interfaces are not binary-compatible, because IExtract­IconA::Get­Icon­Location operates on ANSI strings, whereas IExtract­IconW::Get­Icon­Location operates on Unicode strings.

    The shell called pxiw->Get­Icon­Location, but the object interpreted the szIcon­File as an ANSI string buffer; as a result, when the shell went to look at it, it saw gibberish.

    Further experimentation revealed that if the order of the two Query­Interface calls were reversed, then pxiw->Get­Icon­Location worked as expected. In other words, the first interface you requested "locked" the object into that interface, and asking for any other interface just returned a pointer to the locked interface. This struck me as very odd; coding up the object this way seems to be harder than doing it the right way!

    // this code is wrong - see discussion above
    class CShellExtension : public IExtractIcon
    {
     enum { MODE_UNKNOWN, MODE_ANSI, MODE_UNICODE };
      HRESULT QueryInterface(REFIID riid, void **ppv)
      {
       *ppv = NULL;
       if (riid == IID_IUnknown) *ppv = this;
       else if (riid == IID_IExtractIconA) {
        if (m_mode == MODE_UNKNOWN) m_mode = MODE_ANSI;
        *ppv = this;
       } else if (riid == IID_IExtractIconW) {
        if (m_mode == MODE_UNKNOWN) m_mode = MODE_UNICODE;
        *ppv = this;
       }
       if (*ppv) AddRef();
       return *ppv ? S_OK : E_NOINTERFACE;
      }
      ... AddRef / Release ...
    
      HRESULT GetIconLocation(UINT uFlags, LPTSTR szIconFile, UINT cchMax,
                              int *piIndex, UINT *pwFlags)
      {
       if (m_mode == MODE_ANSI) lstrcpynA((char*)szIconFile, "foo", cchMax);
       else lstrcpynW((WCHAR*)szIconFile, L"foo", cchMax);
       ...
      }
      ...
    }
    

    Instead of implementing both IExtract­IconA and IExtract­IconW, my guess is that they implemented just one of the interfaces and made it alter its behavior based on which interface it thinks it needs to pretend to be. It never occurred to them that the single interface might need to pretend to be two different things at the same time.

    The right way of supporting two interfaces is to actually implement two interfaces and not write a single morphing interface.

    class CShellExtension : public IExtractIconA, public IExtractIconW
    {
      HRESULT QueryInterface(REFIID riid, void **ppv)
      {
       *ppv = NULL;
       if (riid == IID_IUnknown ||
           riid == IID_IExtractIconA) {
        *ppv = static_cast<IExtractIconA*>(this);
       } else if (riid == IID_IExtractIconW) {
        *ppv = static_cast<IExtractIconW*>(this);
       }
       if (*ppv) AddRef();
       return *ppv ? S_OK : E_NOINTERFACE;
      }
      ... AddRef / Release ...
    
      HRESULT GetIconLocation(UINT uFlags, LPSTR szIconFile, UINT cchMax,
                              int *piIndex, UINT *pwFlags)
      {
       lstrcpynA(szIconFile, "foo", cchMax);
       return GetIconLocationCommon(uFlags, piIndex, pwFlags);
      }
    
      HRESULT GetIconLocation(UINT uFlags, LPWSTR szIconFile, UINT cchMax,
                              int *piIndex, UINT *pwFlags)
      {
       lstrcpynW(szIconFile, L"foo", cchMax);
       return GetIconLocationCommon(uFlags, piIndex, pwFlags);
      }
      ...
    }
    

    We worked around this in the shell by simply changing the order in which we perform the calls to IUnknown::Query­Interface and adding a comment explaining why the order of the calls is important.

    (This is another example of how the cost of a compatibility fix is small potatoes. The cost of deciding whether or not to apply the fix far exceeds the cost of just doing it for everybody.)

    A different shell extension had a compatibility problem that also was traced back to a dependency on the order in which the shell asked for interfaces. The shell extension registered as a context menu extension, but when the shell tried to create it, it got E_NO­INTERFACE back:

    CoCreateInstance(CLSID_YourAwesomeExtension, NULL,
                     CLSCTX_INPROC_SERVER, IID_IContextMenu, &pcm);
    // returns E_NOINTERFACE?
    

    This was kind of bizarre. I mean, the shell extension went to the effort of registering itself as a context menu extension, but when the shell said, "Okay, it's show time, let's do the context menu dance!" it replied, "Sorry, I don't do that."

    The vendor explained that the shell extension relies on the order in which the shell asked for interfaces. The shell used to create and initialize the extension like this:

    // error checking and other random bookkeeping removed
    IShellExtInit *psei;
    IContextMenu *pcm;
    
    CoCreateInstance(CLSID_YourAwesomeExtension, NULL,
                     CLSCTX_INPROC_SERVER, IID_IShellExtInit, &psei);
    psei->Initialize(...);
    psei->QueryInterface(IID_IContextMenu, &pcm);
    psei->Release();
    // use pcm
    

    We changed the order in a manner that you would think should be equivalent:

    CoCreateInstance(CLSID_YourAwesomeExtension, NULL,
                     CLSCTX_INPROC_SERVER, IID_IContextMenu, &pcm);
    pcm->QueryInterface(IID_IShellExtInit, &psei);
    psei->Initialize(...);
    psei->Release();
    

    (Of course, it's not written in so many words in the code; the various parts are spread out over different components and helper functions, but this is the sequence of calls the shell extension sees.)

    The vendor explained that their shell extension will not respond to any shell extension interfaces (aside from IShell­Ext­Init) until it has been initialized, because it is at that point that they decide which extensions they want to support. Unfortunately, this violates the first of the four explicit rules for IUnknown::Query­Interface, namely that the set of interfaces must be static. (It's okay to have an object expose different interfaces conditionally, as long as it understands that once it says yes or no to a particular interface, it is committed to answering the same way for the lifetime of the object.)

  • The Old New Thing

    Slim reader/writer locks don't remember who the owners are, so you'll have to find them some other way

    • 2 Comments

    The slim reader/writer lock is a very convenient synchronization facility, but one of the downsides is that it doesn't keep track of who the current owners are. When your thread is stuck waiting to acquire a slim reader/writer lock, a natural thing to want to know is which threads own the resource your stuck thread is waiting for.

    Since there's no facility for going from the waiting thread to the owning threads, you'll just have to find the owning threads some other way. Here's the thread that is waiting for the lock in shared mode:

    ntdll!ZwWaitForKeyedEvent+0xc
    ntdll!RtlAcquireSRWLockShared+0x126
    dbquery!CSearchSpace::Validate+0x10b
    dbquery!CSearchSpace::DecomposeSearchSpace+0x3c
    dbquery!CQuery::AddConfigs+0xdc
    dbquery!CQuery::ResolveProviders+0x89
    dbquery!CResults::CreateProviders+0x85
    dbquery!CResults::GetProviders+0x61
    dbquery!CResults::CreateResults+0x11c
    

    Okay, how do you find the thread that owns the lock?

    First, slim reader/writer locks are usable only within a process, so the candidate threads are the ones within the process.

    Second, the usage pattern for locks is nearly always something like

        enter lock
        do something
        exit lock
    

    It is highly unusual for a function to take a lock and exit to external code with the lock held. (It might exit to other code within the same component, transferring the obligation to exit the lock to that other code.) Therefore, you want to look for threads that are still inside dbquery.dll, possibly even still inside CSearch­Space (if the lock is a per-object lock rather than a global one).

    Of course, it's possible that the code that entered the lock messed up and forgot to release it, but if that's the case, no amount of searching will find anything, since the culprit is long gone. Since debugging is an exercise in optimism, we may as well proceed on the assumption that we're not in that case. If we fail to find the lock owner, then we may have to revisit the assumption.

    Finally, the last trick is knowing which threads to ignore. For now, you can also ignore the threads that are waiting for the lock, since they are the victims not the cause. (Again, if we fail to find the lock owner, we can revisit the assumption that they are not the cause; for example, they may be attempting to acquire the lock recursively.)

    As it happens, there is only one thread in the process that passes all the above filters.

    dbquery!CProp::Marshall+0x3b
    dbquery!CRequest::CRequest+0x24c
    dbquery!CQuery::Execute+0x668
    dbquery!CResults::FillParams+0x1c4
    dbquery!CResults::AddProvider+0x4e
    dbquery!CResults::AddConfigs+0x1c5
    dbquery!CResults::CreateResults+0x145
    

    This may not be the source of the problem, but it's a good start. (Actually, it looks very promising since the problem is probably that the process on the other side of the marshaller is stuck.)

  • The Old New Thing

    Why does the Shift+F10 menu differ from the right-click menu?

    • 35 Comments

    The Shift+F10 key is a keyboard shortcut for calling up the context menu on the selected item. But if you look closely, you might discover that the right-click menu and the Shift+F10 menu differ in subtle ways. Shouldn't they be the same? After all, that's the point of being a keyboard shortcut, right?

    Let's set aside the possibility that a program might be intentionally making them different, in violation of UI guidelines. For example, a poorly-designed program might use the WM_RBUTTON­UP message as the trigger to display the context menu instead of using the WM_CONTEXT­MENU message, in which case Shift+F10 won't do anything at all. Or the poorly-designed program may specifically detect that the WM_CONTEXT­MENU message was generated from the keyboard and choose to display a different menu. (This on top of the common error of forgetting to display a keyboard-invoked context menu at the currently selected item.) If somebody intentionally makes them different, then they'll be different.

    Okay, so the program is not intentionally creating a distinction between mouse-initiated and keyboard-initiated context menus. Shift+F10 and right-click both generate the WM_CONTEXT­MENU message, and therefore the same menu-displaying code is invoked. The subtle difference is that when you press Shift+F10, the shift key is down, and as we all know, holding the shift key while calling up a context menu is a Windows convention for requesting the extended context menu rather than the normal context menu.

    You get a different menu not because the program is going out of its way to show you a different menu, but because the use of the shift key accidentally triggers the extended behavior. It's like why when you look at yourself in the mirror, your eyes are always open, or why when you call your own phone number, the line is always busy. To avoid this, use the Menu key (confusingly given the virtual key name VK_APPS) to call up the context menu. (This is the key that has a picture of a menu on it, usually to the right of your space bar.) When you press that key, the code which decides whether to show a normal or extended context menu will see that the shift key is not held down, and it'll go for the normal context menu.

    Of course, you can also press Shift+AppMenu, but then you'll have come full circle.

  • The Old New Thing

    What does the CreateProcess function do if there is no space between the program name and the arguments?

    • 25 Comments

    In an old discussion of why the Create­Process function modifies its command line, commenter Random832 asked, "What if there is no space between the program name and arguments - like "cmd/?" - where does it put the null then?"

    The Create­Process function requires a space between the program name and arguments. If you leave out the space, then the arguments are considered as part of the program name (and you'll almost certainly get ERROR_FILE_NOT_FOUND back).

    It sounds like Random832 has confused Create­Process command line parsing with cmd.exe command line parsing. Clearly the two parsers are different; you can see this even without playing with spaces between the program name and the arguments:

    C:\>C:\Program Files\Windows NT\Accessories\wordpad.exe
    'C:\Program' is not recognized as an internal or external command,
    operable program or batch file.
    

    If the command line had been parsed by Create­Process, this would have succeeded in running the Wordpad program, because, as I noted in the original article, the Create­Process function modifies its command line in order to find where the program name ends and the command line arguments begin, an example of which can be found in the Create­Process documentation. In this case, it would have plunked the null character into each of the spaces in the command line, finding that each one failed, until it finally tried treating the entire string as the program name, at which point it would have succeeded. The fact that it failed demonstrates that Create­Process didn't do the parsing.
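
    To make the search concrete, here is a simulation of the procedure just described. It is not the actual CreateProcess code. FindProgramNameEnd and the exists callback are names made up for this sketch, and the real function does things this ignores, such as handling a quoted program name and appending a default extension.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Simulation only: finds where the program name ends by temporarily writing
// a null terminator over each space in turn and asking `exists` whether the
// resulting prefix names a file. Returns the length of the program name, or
// std::string::npos if no candidate exists. The buffer is restored on return.
std::size_t FindProgramNameEnd(std::string& commandLine,
                               const std::function<bool(const char*)>& exists)
{
    assert(exists && "exists must be callable");
    for (std::size_t i = 0; i < commandLine.size(); ++i) {
        if (commandLine[i] != ' ') continue;
        commandLine[i] = '\0';                         // plunk in the null
        bool found = exists(commandLine.c_str());      // sees only the prefix
        commandLine[i] = ' ';                          // put the space back
        if (found) return i;
    }
    // No prefix worked: try treating the entire string as the program name.
    return exists(commandLine.c_str()) ? commandLine.size()
                                       : std::string::npos;
}
```

    Feeding it the Wordpad path from the example, with an exists callback that recognizes only that file, rejects C:\Program and C:\Program Files\Windows before accepting the whole string. Feeding it cmd/? finds no spaces to try, so the entire string is treated as the program name and the lookup fails.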

    The cmd.exe program permits the space between a program name and its arguments to be elided if the arguments begin with a character not permitted in file names. Once it figures out what you're running, and it determines that what you're running is a program, it calls the Create­Process function with an explicit application and command line.

    But you don't have to take my word for it. You can just see for yourself. (In fact, this is exactly what I did to investigate the issue in the first place.)

    C:\>ntsd -2 cmd.exe
    

    Two windows will open, one for your debugger and one for cmd.exe. (You are welcome to replace ntsd with your favorite debugger. I chose ntsd because—at least until Windows XP—it came preinstalled, thereby avoiding multiplying the problem from one to two.)

    In the debugger, set a breakpoint on kernel32!Create­ProcessW, then resume execution. In the cmd.exe window, type cmd/?. The breakpoint will fire, and you can look at the parameters:

    Breakpoint 0 hit
    eax=0046f600 ebx=00000000 ecx=004f8de0 edx=00000000 esi=00000000 edi=00000001
    eip=757820ba esp=0046f544 ebp=0046f704 iopl=0         nv up ei pl zr na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000246
    kernel32!CreateProcessW:
    757820ba 8bff            mov     edi,edi
    0:000> dd esp l4
    0046f544  4a5e3dd7 004f5420 004f8db0 00000000
    0:000> du 004f5420
    004f5420  "C:\Windows\system32\cmd.exe"
    0:000> du 004f8db0
    004f8db0  "cmd /?"
    

    Observe that cmd.exe did its own manual path search to arrive at an executable of C:\Windows\system32\cmd.exe, and also that it secretly inserted a space between the cmd and the slash.
