December, 2006

  • The Old New Thing

    What does an invalid handle exception in LeaveCriticalSection mean?


    Internally, a critical section is a bunch of counters and flags, and possibly an event. (Note that the internal structure of a critical section is subject to change at any time; in fact, it changed between Windows XP and Windows Server 2003. The information provided here is therefore intended for troubleshooting and debugging purposes and not for production use.) As long as there is no contention, the counters and flags are sufficient because nobody has had to wait for the critical section (and therefore nobody had to be woken up when the critical section became available).

    If a thread needs to be blocked because the critical section it wants is already owned by another thread, the kernel creates an event for the critical section (if there isn't one already) and waits on it. When the owner of the critical section finally releases it, the event is signaled, thereby alerting all the waiters that the critical section is now available and they should try to enter it again. (If there is more than one waiter, then only one will actually enter the critical section and the others will return to the wait loop.)

    If you get an invalid handle exception in LeaveCriticalSection, it means that the critical section code thought that there were other threads waiting for the critical section to become available, so it tried to signal the event, but the event handle was no good.
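
    To make that failure mode concrete, here is a minimal single-threaded sketch in portable C++. It is a toy model and not the real implementation (which, as noted above, is subject to change). `ToyEventTable`, `ToyCriticalSection`, and all of their members are hypothetical names invented for illustration, and a thrown `std::runtime_error` stands in for the invalid handle exception.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Toy stand-in for a handle table: a handle is valid only while it is in
// the set. (Hypothetical; this is not a Windows API.)
struct ToyEventTable {
    std::set<int> live;
    int next = 4;

    int create() { int h = next; next += 4; live.insert(h); return h; }
    void close(int h) { live.erase(h); }

    // Signaling a handle that is not live models the invalid handle exception.
    void signal(int h) const {
        if (live.count(h) == 0) throw std::runtime_error("invalid handle");
    }
};

// Toy critical section: counters and flags, plus an event that is created
// only when somebody first has to wait.
struct ToyCriticalSection {
    ToyEventTable& events;
    bool owned = false;
    int waiters = 0;
    int event = 0;   // 0 means "no event has been created yet"

    explicit ToyCriticalSection(ToyEventTable& t) : events(t) {}

    // Returns true if the caller now owns the section, false if it must wait.
    bool enter() {
        if (!owned) { owned = true; return true; }   // no contention
        if (event == 0) event = events.create();     // first waiter: create the event
        ++waiters;
        return false;
    }

    // Releases ownership and signals the event only if somebody is waiting.
    void leave() {
        assert(owned);
        owned = false;
        if (waiters > 0) {
            events.signal(event);   // throws if the handle is no good
            waiters = 0;            // all waiters are alerted and will retry
        }
    }
};
```

    In this model, an uncontended enter and leave never touches the event at all, which is why the bad handle is noticed only when the leave believes there is a waiter to wake.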

    Now you get to use your brain to come up with reasons why this might be.

    One possibility is that the critical section has been corrupted, and the memory that normally holds the event handle has been overwritten with some other value that happens not to be a valid handle.

    Another possibility is that some other piece of code passed an uninitialized variable to the CloseHandle function and ended up closing the critical section's handle by mistake. This can also happen if some other piece of code has a double-close bug, and the handle (now closed) just happened to be reused as the critical section's event handle. When the buggy code closes the handle the second time by mistake, it ends up closing the critical section's handle instead.
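
    The double-close scenario is easier to follow when replayed against a toy handle table. The sketch below is portable C++ and purely illustrative: `ToyHandleTable` and `event_survives_double_close` are hypothetical names, and the policy of handing a freed handle value straight back out is an assumption made so that the reuse happens deterministically.

```cpp
#include <cassert>
#include <vector>

// Toy handle table. Handles are slot indices, and a freed slot is handed
// out again by the next open(). That reuse policy is an assumption for
// illustration; real handle values are not guaranteed to be reused this
// promptly.
class ToyHandleTable {
public:
    int open() {
        if (!free_.empty()) {
            int h = free_.back();
            free_.pop_back();
            live_[h] = true;
            return h;
        }
        live_.push_back(true);
        return static_cast<int>(live_.size()) - 1;
    }

    // Returns false if the handle was not open, like a failed close.
    bool close(int h) {
        if (!is_valid(h)) return false;
        live_[h] = false;
        free_.push_back(h);
        return true;
    }

    bool is_valid(int h) const {
        return h >= 0 && h < static_cast<int>(live_.size()) && live_[h];
    }

private:
    std::vector<bool> live_;
    std::vector<int> free_;
};

// Replays the double-close bug and reports whether the critical section's
// event handle is still valid afterward.
bool event_survives_double_close() {
    ToyHandleTable table;
    int buggy = table.open();      // handle owned by the buggy code
    table.close(buggy);            // first close: legitimate
    int cs_event = table.open();   // the event happens to reuse the same value
    table.close(buggy);            // second close: hits the event instead
    return table.is_valid(cs_event);
}
```

    Note that the second close succeeds from the buggy code's point of view, so nothing flags the error at the point where it is actually made.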

    Of course, the problem might be that the critical section is not valid because it was never initialized in the first place. The values in the fields are just uninitialized garbage, and when you try to leave this uninitialized critical section, that garbage gets used as an event handle, raising the invalid handle exception.

    Then again, the problem might be that the critical section is not valid because it has already been destroyed. For example, one thread might have code that goes like this:

    EnterCriticalSection(&cs);
    ... do stuff...
    LeaveCriticalSection(&cs);

    While that thread is busy doing stuff, another thread calls DeleteCriticalSection(&cs). This destroys the critical section while another thread was still using it. Eventually that thread finishes doing its stuff and calls LeaveCriticalSection, which raises the invalid handle exception because the DeleteCriticalSection already closed the handle.

    All of these are possible reasons for an invalid handle exception in LeaveCriticalSection. To determine which one you're running into will require more debugging, but at least now you know what to be looking for.

    Postscript: One of my colleagues from the kernel team points out that the Locks and Handles checks in Application Verifier are great for debugging issues like this.

  • The Old New Thing

    Do not overload the E_NOINTERFACE error


    One of the more subtle ways people mess up IUnknown::QueryInterface is returning E_NOINTERFACE when the problem wasn't actually an unsupported interface. The E_NOINTERFACE return value has a very specific meaning. Do not use it as your generic "gosh, something went wrong" error. (Use an appropriate error such as E_OUTOFMEMORY or E_ACCESSDENIED.)

    Recall that the rules for IUnknown::QueryInterface are that (in the absence of catastrophic errors such as E_OUTOFMEMORY) if a request for a particular interface succeeds, then it must always succeed in the future for that object. Similarly, if a request fails with E_NOINTERFACE, then it must always fail in the future for that object.

    These rules exist for a reason.

    In the case where COM needs to create a proxy for your object (for example, to marshal the object into a different apartment), the COM infrastructure does a lot of interface caching (and negative caching) for performance reasons. For example, if a request for an interface fails, COM remembers this so that future requests for that interface are failed immediately rather than being marshalled to the original object only to have the request fail anyway. Requests for unsupported interfaces are very common in COM, and optimizing that case yields significant performance improvements.

    If you start returning E_NOINTERFACE for problems other than "The object doesn't support this interface", COM will assume that the object really doesn't support the interface and may not ask for it again even if you really do support it. This in turn leads to very strange bugs that defy debugging: You are at a call to IUnknown::QueryInterface, you set a breakpoint on your object's implementation of IUnknown::QueryInterface to see what the problem is, you step over the call and get E_NOINTERFACE back without your breakpoint ever hitting. Why? Because at some point in the past, you said you didn't support the interface, and COM remembered this and "saved you the trouble" of having to respond to a question you already answered. The COM folks tell me that they and their comrades in product support end up spending hours debugging customers' problems like "When my computer is under load, sometimes I start getting E_NOINTERFACE for interfaces I definitely support."
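
    Here is a rough sketch of how negative caching turns a misreported error into a permanent one. It is a toy model in portable C++ and makes no claim about how the COM proxy is really implemented. `Result`, `ToyObject`, and `ToyProxy` are hypothetical names, with the enumerators standing in for S_OK, E_NOINTERFACE, and E_OUTOFMEMORY, and interface IDs reduced to plain integers.

```cpp
#include <cassert>
#include <set>

// Stand-ins for S_OK, E_NOINTERFACE, and E_OUTOFMEMORY.
enum class Result { Ok, NoInterface, OutOfMemory };

// A toy object that supports interface 1 but can be temporarily under load.
struct ToyObject {
    bool under_load = false;
    bool misreport_errors = false;   // the bug: every failure becomes NoInterface
    int calls = 0;                   // how often the object was really asked

    Result QueryInterface(int iid) {
        ++calls;
        if (iid != 1) return Result::NoInterface;   // genuinely unsupported
        if (under_load) {
            return misreport_errors ? Result::NoInterface    // wrong
                                    : Result::OutOfMemory;   // right
        }
        return Result::Ok;
    }
};

// Toy proxy with negative caching: once the object answers NoInterface for
// an interface, the proxy answers on its behalf from then on.
struct ToyProxy {
    ToyObject& target;
    std::set<int> known_unsupported;

    explicit ToyProxy(ToyObject& t) : target(t) {}

    Result QueryInterface(int iid) {
        if (known_unsupported.count(iid) != 0) return Result::NoInterface;
        Result r = target.QueryInterface(iid);
        if (r == Result::NoInterface) known_unsupported.insert(iid);
        return r;
    }
};
```

    With `misreport_errors` set, one failure under load is enough: later queries through the proxy fail without `calls` ever increasing, which is the toy equivalent of the breakpoint that never hits.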

    Save yourself and the COM folks several hours of frustration. Don't return E_NOINTERFACE unless you really mean it.

  • The Old New Thing

    Okay, I changed my mind, I wrote a book after all


    Back in 2003, I wrote that I'm doing this instead of writing a book. That was true then, but last year I decided to give this book thing another go, only to find that publishers generally aren't interested in this stuff any more.

    "Does the world really need another book on Win32? Nobody buys Win32 books any more, that dinosaur!"

    "A conversational style book? People want books with step-by-step how-to's and comprehensive treatments, not water cooler anecdotes!"

    "Just 200 pages? There isn't enough of an audience for a book that small!"

    Luckily, I found a sympathetic ear from the folks at Addison-Wesley Professional who were willing to take a chance on my unorthodox proposal. But I caved on the length, bringing it up to 500 pages. Actually, I came up with more like 700 pages of stuff, and they cut it back to 500, because 700 pages would take the book into the next price tier, and "There isn't enough of an audience for a book that big!"

    Eighteen months later, we have The Old New Thing: Practical Development Throughout the Evolution of Windows, following in what appears to be the current fad of giving your book a title of the form Catchy Phrase: Longer Explanation of What the Catchy Phrase Means.

    It's a selection of entries from this blog, loosely organized, and with new material sprinkled in. There are also new chapters that go in depth into parts of Win32 you use every day but may not fully understand (the dialog manager, window messages), plus a chapter dedicated to Taxes. (For some reason, the Table of Contents on the book web site is incomplete.)

    Oh, and those 200 pages that got cut? They'll be made available for download as "bonus chapters". (The bonus chapters aren't up yet, so don't all rush over there looking for them.)

    The nominal release date for the book is January 2007, which is roughly in agreement with the book web site which proclaims availability on December 29th. Just in time for Christmas for your favorite geek, if your favorite geek can't read a calendar.

    Now I get to see how many people were lying when they said, "If you wrote a book based on this blog, I'd buy it."

    (Update: The bonus chapters are now available.)

    (Update: Now available in Japanese! ISBN 978-4756150004.)

    (Update: Now available in Chinese! ISBN 7111219194.)

  • The Old New Thing

    Stop the madness: Subdirectories of My Documents


    As a follow-up to the difference between My Documents and Application Data, I'd like to rant about all the subdirectories of My Documents that programs create because they think they're so cool.

    • Visual Studio Projects
    • My eBooks
    • My Received Files
    • Remote Desktops
    • My Scans
    • My Data Sources
    • My Virtual Machines
    • My Archives

    I'm sure there are more.

    The user should be able to point to everything in the My Documents folder and say, "I remember creating that file on such-and-such date when I did a 'Save' from Program Q." If a file doesn't pass that test, then don't put it into My Documents. Use Application Data.

    And don't create subdirectories off of My Documents. If the user wants to organize their documents into subdirectories, that's their business. You just ask them where they want their documents and let it go at that.

    (Yes, I'm not a fan of My Music, My Videos, and My Pictures, either.)

    Omar Shahine points out that Apple has similar guidelines for the Macintosh. I wonder how well people follow them.

  • The Old New Thing

    Do not write in-process shell extensions in managed code


    Jesse Kaplan, one of the CLR program managers, explains why you shouldn't write in-process shell extensions in managed code. The short version is that doing so introduces a CLR version dependency which may conflict with the CLR version expected by the host process. Remember that shell extensions are injected into all processes that use the shell namespace, either explicitly by calling SHGetDesktopFolder or implicitly by calling a function like SHBrowseForFolder, ShellExecute, or even GetOpenFileName. Since only one version of the CLR can be loaded per process, it becomes a race to see who gets to load the CLR first and establish the version that the process runs, and everybody else who wanted some other version loses.

    Update 2013: Now that version 4 of the .NET Framework supports in-process side-by-side runtimes, is it now okay to write shell extensions in managed code? The answer is still no.

  • The Old New Thing

    Nailing down what constitutes valuable consideration


    Last time, I introduced a friend I called "Bob" for the purposes of this story. At a party earlier this year, I learned second-hand what Bob had been up to more recently.

    The team Bob worked for immediately prior to his retirement gave him a call. "Hi, Bob. We're trying to ship version N+1 of Product X, and we really need your help. I know you're all retired and stuff, and you don't live in the area any more, but you're the only guy who can save us. Could you come out of retirement just for a few months?"

    Bob said, "Okay. This is a favor to you guys since I like you so much."

    When he sat down to sign the paperwork, he took the contract and crossed out the amount of money he would be paid and wrote in its place, "One dollar". Because he wasn't taking this job to get rich. He was doing it as a favor to his old team. He then signed it and returned the contract to the agency.

    The contracting agency was flabbergasted. "You can't do this for just one dollar! That's completely unheard of!" The real reason the agency was so upset is probably that their fee was a percentage of whatever Bob made, and if Bob made only one dollar, they would effectively be doing all the paperwork and getting paid a stick of chewing gum.

    Bob said, "Okay, then, if you want me to get paid 'for real', send me a contract with 'real money'."

    The agency sent him the original contract (before he changed it to "one dollar"), and Bob sent it back, indignant. "I said 'real money'. This amount is an insult."

  • The Old New Thing

    Why do user interface actions tend to occur on the release, not on the press?


    If you pay close attention, you'll notice that most user interface actions tend to occur on the release, not on the press. When you click on a button, the action occurs when the mouse button is released. When you press the Windows key, the Start menu pops up when you release it. When you tap the Alt key, the menu becomes active when you release it. (There are exceptions to this general principle, of course, typing being the most notable one.) Why do most actions wait for the release?

    For one thing, waiting for the completion of a mouse action means that you create the opportunity for the user to cancel it. For example, if you click the mouse while it is over a button (a radio button, push button, or check box), then drag the mouse off the control, the click is cancelled.
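
    As a rough illustration of that cancel behavior, here is a toy button in portable C++ that arms itself on the press and acts only on the release. `ToyButton` and its members are hypothetical names, and the sketch ignores details such as mouse capture that a real control has to deal with.

```cpp
#include <cassert>

// Toy push button occupying the rectangle [0, width) x [0, height).
struct ToyButton {
    int width;
    int height;
    bool pressed = false;   // true between a press on the button and the release
    int clicks = 0;         // number of times the button's action has run

    ToyButton(int w, int h) : width(w), height(h) { assert(w > 0 && h > 0); }

    bool contains(int x, int y) const {
        return x >= 0 && x < width && y >= 0 && y < height;
    }

    // The press only arms the button; nothing happens yet.
    void mouse_down(int x, int y) { pressed = contains(x, y); }

    // The action runs on the release, and only if the mouse is still over
    // the button. Releasing anywhere else cancels the click.
    void mouse_up(int x, int y) {
        if (pressed && contains(x, y)) ++clicks;
        pressed = false;
    }
};
```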

    But a more important reason for waiting for the release is to ensure that the press won't get confused with the action itself. For example, suppose you are in a mode where objects disappear when the user clicks on them. It might be a customization dialog with two columns, one showing available objects and another showing objects in use. Clicking on an available object moves it to the list of in-use objects and vice versa. Now, suppose you acted on the press rather than the release. When the mouse button goes down while the mouse is over an item, you remove it from the list and add it to the opposite list. This moves the item the user clicked on, so that the item beneath the mouse is now some other item that moved into the original item's position. And then the mouse button is released, and you get a WM_LBUTTONUP message for the new item. Now you have two problems: First, the item the user clicked on got a WM_LBUTTONDOWN and no corresponding WM_LBUTTONUP, and second, the new item got a WM_LBUTTONUP with no corresponding WM_LBUTTONDOWN.
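
    The mismatch can be reproduced with a small simulation. The sketch below is portable C++ with hypothetical names (`ToyEvent`, `ToyDialog`); rows are addressed by index as a stand-in for hit-testing, and the log records which item would have received each button message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyEvent {
    std::string item;   // the item under the mouse when the message arrived
    std::string kind;   // "down" or "up"
};

// Toy customization dialog: clicking an item in `available` moves it to
// `in_use`. Only the left-hand list is clickable in this sketch.
struct ToyDialog {
    std::vector<std::string> available;
    std::vector<std::string> in_use;
    std::vector<ToyEvent> log;
    bool act_on_press;

    explicit ToyDialog(bool on_press)
        : available{"A", "B", "C"}, act_on_press(on_press) {}

    void move_row(std::size_t row) {
        assert(row < available.size());
        in_use.push_back(available[row]);
        available.erase(available.begin() + static_cast<std::ptrdiff_t>(row));
    }

    void button_down(std::size_t row) {
        if (row >= available.size()) return;
        log.push_back({available[row], "down"});
        if (act_on_press) move_row(row);    // the list shifts under the mouse
    }

    void button_up(std::size_t row) {
        if (row >= available.size()) return;
        log.push_back({available[row], "up"});
        if (!act_on_press) move_row(row);   // safe: both messages already delivered
    }
};
```

    Running a press and release on row 0 with `act_on_press` set logs a "down" for A followed by an "up" for B, whereas the release-driven version logs both messages for A before anything moves.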

    You can also get into a similar situation with the keyboard, though it takes more work. For example, if you display a dialog box while the Alt key is still pressed rather than waiting for the release, the Alt key may autorepeat and end up delivered to the dialog box. This prevents the dialog box from appearing since it's stuck in the menu mode that was initiated by the Alt key, and it is waiting for you to finish your menu operation before it will display itself.

    Now, this type of mismatch situation is not often a problem, but when it does cause a problem, it's typically a pretty nasty one. This is particularly true if you're using some sort of windowless framework that tries to associate mouse and keyboard events with the corresponding windowless objects. When the ups and downs get out of sync, things can get mighty confusing.

    (This entry was posted late because a windstorm knocked out power to the entire Seattle area. My house still doesn't have electricity.)

  • The Old New Thing

    The name WinMain is just a convention


    Although the function WinMain is documented in the Platform SDK, it's not really part of the platform. Rather, WinMain is the conventional name for the user-provided entry point to a Windows program.

    The real entry point is in the C runtime library, which initializes the runtime, runs global constructors, and then calls your WinMain function (or wWinMain if you prefer a Unicode entry point).
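
    One way to observe that ordering without any Windows-specific code is to have a global object record when its constructor runs. This is a small portable C++ sketch; `startup_log`, `ToyGlobal`, and `note_entry_point_reached` are hypothetical names, and the same ordering applies whether your entry point is main, WinMain, or wWinMain.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order in which things run. A function-local static is used
// so the log exists no matter which global is constructed first.
std::vector<std::string>& startup_log() {
    static std::vector<std::string> log;
    return log;
}

// A global object. Its constructor is run by the runtime's startup code
// before control reaches the user-provided entry point.
struct ToyGlobal {
    ToyGlobal() { startup_log().push_back("global constructor"); }
};
ToyGlobal g_toy;

// Call this first thing from your entry point.
void note_entry_point_reached() {
    startup_log().push_back("entry point");
}
```

    If you call `note_entry_point_reached()` at the top of your entry point, the log already contains the global constructor's entry, because the runtime did that work before your function was called.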

  • The Old New Thing

    If you let people read a file, then they can copy it


    Here's a question that floated past my view:

    How do I set the ACLs on a file so users can read it but can't copy it? I can't find a "Copy" access mask that I can deny. If I can't deny copying, I'd at least like to audit it, so I can tell who made a copy of the file.

    There is no "Copy" access mask because copying is not a fundamental file operation. Copying a file is just reading it into memory and then writing it out. Once the bytes come off the disk, the file system has no control any more over what the user does with them.
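
    To see why there is nothing to deny, consider what a copy looks like when written out by hand. The following is a minimal sketch in portable C++; `copy_by_reading` is a hypothetical helper, and it works on generic streams so that the same loop applies to files or anything else that can be read.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies everything readable from `in` to `out` and returns the number of
// bytes copied. The only thing required of the source is that it can be
// read; no separate "copy" operation is involved.
std::size_t copy_by_reading(std::istream& in, std::ostream& out) {
    char buffer[4096];
    std::size_t total = 0;
    while (in) {
        in.read(buffer, sizeof(buffer));
        std::streamsize got = in.gcount();   // bytes obtained by this read
        if (got <= 0) break;
        out.write(buffer, got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

    To copy an actual file, pass a `std::ifstream` opened on the source and a `std::ofstream` opened on the destination, both in binary mode. The source side needs nothing beyond the ability to read.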

  • The Old New Thing

    I bet somebody is looking to get a really nice bonus for that feature: Attention


    "I bet somebody is looking to get a really nice bonus for that feature."

    A customer was having trouble with one of their features that scans for resources that their program can use, and, well, the details aren't important. What's important is that their feature ran in the Startup group, and as soon as it found a suitable resource, it displayed a balloon tip: "Resource XYZ has been found. Click here to add it to your resource portfolio."

    We interrupted them right there.

    — Why are you doing this?

    "Oh, it's a great feature. That way, when users run our program, they don't have to go looking for the resources they want to operate with. We already found the resources for them."

    — But why are you doing it even when your program isn't running? The user is busy editing a document or working on a spreadsheet or playing a game. The message you're displaying is out of context: You're telling users about a program they aren't even using.

    "Yeah, but this feature is really important to us. It's crucial in order to remain competitive in our market."

    — The message is not urgent. It's a disruption. Why don't you wait until they launch your program to tell them about the resources you found? That way, the information appears in context: They're using your program, and the program tells them about these new resources.

    "We can't do that! That would be annoying!"
