November, 2009

  • The Old New Thing

    The magic of chocolate milk

    • 19 Comments

    While enjoying a meal with my nieces (at the time, ages 3 and 5), I diluted my chocolate milk to cut the sweetness. The nieces then demanded that I dilute their chocolate milk as well, because as far as they could determine, it was a magical way to create more chocolate milk.

  • The Old New Thing

    When you want to copy a file into a folder, make sure you have a folder

    • 19 Comments

    This story is inspired by an actual customer problem.

    The program LitWare.exe is used for TPS management, and when you want to create a new TPS report, you have to pick a cover sheet. The program shows you the cover sheets that have been defined, which it loads from the C:\TPS Cover Sheets directory.

    The customer found that on one of the machines, the cover sheets weren't showing up, even though the standard system setup copies a sample cover sheet into the C:\TPS Cover Sheets directory. The error message they got was "Cannot load cover sheets. The directory name is invalid."

    The customer did some troubleshooting and determined that "The cover sheet directory is missing, and we have a file instead."

    C:\>dir
     Volume in drive C is INITECH
     Volume Serial Number is BAAD-F00D
    
     Directory of C:\
    
    09/18/2006  02:43 PM                24 autoexec.bat
    09/18/2006  02:43 PM                10 config.sys
    03/18/2009  10:30 AM    <DIR>          Program Files
    11/21/2008  01:04 PM             1,677 TPS Cover Sheets
    02/20/2008  10:39 AM    <DIR>          Users
    05/29/2009  02:23 PM    <DIR>          Windows
                   2 File(s)          1,711 bytes
                   3 Dir(s)  229,031,751,680 bytes free
    

    One of my colleagues employed psychic powers to determine that at the time the customer tried to install the sample cover sheet on the machine, the C:\TPS Cover Sheets directory did not yet exist, and that the batch file they used to set up a new computer just does a copy \\server\TPSConfig\Sample.tps "C:\TPS Cover Sheets", which results in a file being created with the name C:\TPS Cover Sheets.

    The customer was surprised by this conclusion. "I would think that copy will fail if the C:\TPS Cover Sheets directory doesn't exist, but this might be our problem. We'll look into it." (I guess this customer never used the copy command to copy a file to a new name.)

    If the destination of a copy command exists and is a directory, then the source files are copied into that directory. If the destination of a copy command does not exist, or if it exists and is a file, then the destination is treated as the name of the destination file. (If there is more than one source file, then they are concatenated as if they were text files.)

    The customer went back and checked the scripts, and the line they used was almost exactly what my colleague predicted:

    copy "\\INITECH\Defaults\Sample cover sheet.tps" "C:\TPS Cover Sheets" /Y
    

    If the C:\TPS Cover Sheets directory hasn't been created yet, then that would explain the behavior they're seeing: The copy command sees that the destination doesn't exist and assumes you are doing a file-to-file copy (as opposed to a file-to-directory copy). In this case, the problem was that copying a sample cover sheet was a step they added to their setup scripts, but they added it before the step that creates the cover sheet directory. Reordering the two steps fixed the problem.
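
    For illustration, here is a minimal C++17 sketch of the destination rule and of the reordered setup step, written with std::filesystem rather than with the copy command itself. ResolveCopyDestination and CopyIntoDirectory are hypothetical helper names, and the sketch models only the single-source case (no concatenation).

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Hypothetical helper: models where a single-file copy ends up.
// Existing directory -> the file lands inside it under its own name.
// Anything else (missing, or an existing file) -> the destination itself
// is taken as the name of the new file.
fs::path ResolveCopyDestination(const fs::path& source, const fs::path& destination)
{
    if (fs::is_directory(destination))
        return destination / source.filename();
    return destination;
}

// Hypothetical helper: the file-to-directory copy the setup script intended.
// Creating the directory first removes the ambiguity.
fs::path CopyIntoDirectory(const fs::path& source, const fs::path& directory)
{
    fs::create_directories(directory);   // does nothing if it already exists
    assert(fs::is_directory(directory));
    fs::path target = directory / source.filename();
    // overwrite_existing plays the role of the /Y switch in the script
    fs::copy_file(source, target, fs::copy_options::overwrite_existing);
    return target;
}
```

    Calling ResolveCopyDestination before the directory exists returns the destination path unchanged, which is the "file named TPS Cover Sheets" outcome the customer saw. Calling it after CopyIntoDirectory has run returns a path inside the directory.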

  • The Old New Thing

    I want to take all your chocolate milk

    • 17 Comments

    My older niece visited me at work one day, and I got her a carton of chocolate milk, which she very much enjoyed. Some days later, she told me, "I want to go to your work."

    "Why?" I asked.

    "I want to take all your chocolate milk."

    Missing from the story is that upon returning home after that first visit, she told everybody about her awesome visit with her uncle, and that he even got her a chocolate milk from the refrigerator. "And the chocolate milk is free, you can just take it!"

    Her uncle (not me, a different uncle) told her, "Then you should go there with a knapsack and take all the chocolate milk."

    That uncle is clearly a troublemaker. I'll have to keep an eye on him.

  • The Old New Thing

    You thought reasoning about signals was bad, reasoning about a total breakdown of normal functioning is even worse

    • 17 Comments

    A customer came to the Windows team with a question, the sort of question which on its face seems somewhat strange, which is itself a sign that the question is merely the tip of a much more dangerous iceberg.

    Under what circumstances will the GetEnvironmentVariable function hang?

    This is kind of an open-ended question. I mean, for example, somebody might sneak in and call SuspendThread on your thread while GetEnvironmentVariable is running, which will look like a hang because the call never completes because the thread is frozen.

    But the real question for the customer is, "What sort of problem are you seeing that is manifesting itself in an apparent hang in the GetEnvironmentVariable function?"

    The customer was kind enough to elaborate.

    We have a global unhandled exception filter in our application so we can log all failures. After we finish logging, we call ExitProcess, but we find that the application never actually exits. If we connect a debugger to the stuck application, we see it hung in GetEnvironmentVariable.

    Your gut response should be, "Holy cow, I'm surprised you even got that far!"

    This isn't one of those global unhandled exception filters that got installed because your program plays some really clever game with exceptions. No, this is an "Oh no, my program just crashed and I want to log it" exception handler. In other words, when this exception handler "handles" an exception, it's because your program has encountered some sort of serious internal programming error from which the program did not know how to recover. We saw earlier that you can't do much in a signal handler because you might have interrupted a block of code which was in the middle of updating some data structures, leaving them momentarily inconsistent. But this exception filter is in an even worse state: Not only is there a good chance that the program is in the middle of updating something and left it in an inconsistent state, you are in fact guaranteed that the system is in a corrupted state.

    Why is this a guarantee? Because if the system were in a consistent state, you wouldn't have crashed!

    Programming is about establishing invariants, perturbing them, and then re-establishing them. It is a game of stepping-stone from one island of consistency to another. But the code that does the perturbing and the re-establishing assumes that it's starting from a consistent state to begin with. For example, a function that removes a node from a doubly-linked list manipulates some backward and forward link pointers (temporarily violating the linked list invariant), and then when it's finished, the linked list is back to a consistent state. But this code assumes that the linked list is not corrupted to begin with!
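
    The linked list example can be made concrete with a small sketch. This is an illustration only, assuming a circular doubly-linked list with a sentinel head; Node, IsConsistent, InitList, InsertAfter, and Unlink are hypothetical names, not anything from Windows.

```cpp
#include <cassert>

struct Node {
    Node* prev;
    Node* next;
};

// The invariant: for every node n, n->next->prev == n and n->prev->next == n.
bool IsConsistent(const Node* head)
{
    const Node* n = head;
    do {
        if (n->next->prev != n || n->prev->next != n)
            return false;
        n = n->next;
    } while (n != head);
    return true;
}

void InitList(Node* head) { head->prev = head->next = head; }

void InsertAfter(Node* pos, Node* n)
{
    n->prev = pos;
    n->next = pos->next;
    pos->next->prev = n;
    pos->next = n;
}

// Assumes the invariant holds on entry. It is violated after the first
// statement and re-established by the second.
void Unlink(Node* n)
{
    n->prev->next = n->next;   // n->next->prev still points at n here
    n->next->prev = n->prev;   // consistent again
    n->prev = n->next = n;     // leave the removed node pointing at itself
}
```

    If one link is already wrong when Unlink is called, the function still runs to completion, but the node may remain reachable from the head. Nothing in Unlink notices, because it never checks the assumption it depends on.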

    Let's look again at that call to ExitProcess. That's going to detach all the DLLs, calling each DLL's DllMain with the DLL_PROCESS_DETACH notification. But of course, those DllMain are going to assume that the data structures are intact and nothing is corrupted. On the other hand, you know for a fact that these prerequisites are not met—the program crashed precisely because something is corrupted. One DLL might walk a linked list—but you might have crashed because that linked list is corrupted. Another DLL might try to delete a critical section—but you might have crashed because the data structure containing the critical section is corrupted.

    Heck, the crash might have been inside somebody's DLL_PROCESS_DETACH handler to begin with, for all you know. The way to leave without running any of that code is TerminateProcess.

    "Yeah, but the documentation for TerminateProcess says that it does not clean up shared memory."

    Well, it depends on what you mean by clean up. The reference count on the shared memory is properly decremented when the handle is automatically closed as part of process cleanup, and the shared memory will be properly freed once there are no more references to it. It is not cleaned up in the sense of "corruption is repaired"—but of course the operating system can't do that because it doesn't know what the semantics of your shared memory block are.

    But this is hardly anything to get concerned about because your program doesn't know how to un-corrupt the data either.

    "It also says that DLLs don't receive their DLL_PROCESS_DETACH notification."

    As we saw before, this is a good thing in the case of a corrupted process, because the code that runs in DLL_PROCESS_DETACH assumes that your process has not been corrupted in the first place. There's no point running it when you know the process is corrupted. You're just making a bad situation worse.

    "It also says that I/O will be in an indeterminate state."

    Well yeah, but that's no worse than what you have now, which is that your I/O is in an indeterminate state. You don't know what buffers your process hasn't flushed, but since your process is corrupted, you have no way of finding out anyway.

    "Are you seriously recommending that I use TerminateProcess to exit the last chance exception handler?!?"

    Your process is unrecoverably corrupted. (This is a fact, because if there were a way to recover from it, you would have done it instead of crashing.) What other options are there?

    Quit while you're behind.
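
    As a rough sketch of the approach discussed in this post, a last-chance filter can log what it knows and then leave via TerminateProcess instead of ExitProcess. The names FormatCrashLine, LogAndTerminate, and InstallCrashFilter are hypothetical, and OutputDebugStringA is only a stand-in for whatever logging the application really does. The formatting helper writes into a caller-supplied buffer, on the assumption that the heap may be one of the corrupted things. The Windows-specific part is guarded so the helper can be compiled elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: formats one log line into a caller-supplied buffer.
// Returns the number of characters written (as reported by snprintf).
int FormatCrashLine(char* buffer, std::size_t size,
                    unsigned long code, const void* address)
{
    return std::snprintf(
        buffer, size, "Unhandled exception 0x%08lX at 0x%llX\n", code,
        static_cast<unsigned long long>(
            reinterpret_cast<std::uintptr_t>(address)));
}

#ifdef _WIN32
// Hypothetical last-chance filter: log, then terminate without running
// any DLL_PROCESS_DETACH code.
LONG WINAPI LogAndTerminate(EXCEPTION_POINTERS* info)
{
    char line[128];
    FormatCrashLine(line, sizeof(line),
                    info->ExceptionRecord->ExceptionCode,
                    info->ExceptionRecord->ExceptionAddress);
    OutputDebugStringA(line);   // stand-in for the application's own logging

    TerminateProcess(GetCurrentProcess(),
                     info->ExceptionRecord->ExceptionCode);
    return EXCEPTION_EXECUTE_HANDLER;   // not reached
}

void InstallCrashFilter()
{
    SetUnhandledExceptionFilter(LogAndTerminate);
}
#endif
```

    The program would call InstallCrashFilter once at startup. The less the filter does before terminating, the fewer assumptions it makes about a process that is already known to be corrupted.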

  • The Old New Thing

    Caches are nice, but they confuse memory leak detection tools

    • 16 Comments

    Knowledge Base article 139071 has the technically correct but easily misinterpreted title "FIX: OLE Automation BSTR caching will cause memory leak sources in Windows 2000." The title is misleading because it makes you think, "Oh, this is a fix for a memory leak in OLE Automation," but that's not what it is.

    The BSTR is the string type used by OLE Automation, and since strings are used a lot, OLE Automation maintains a cache of recently-freed strings which it can re-use when somebody allocates a new one. Caches are nice (though you need to make sure you have a good replacement policy), but they confuse memory leak detection tools, because the memory leak detection tool will not be able to match up the allocator with the deallocator. What the memory leak detection tool sees is not the creation and freeing of strings but rather the allocation and deallocation of memory. And if there is a string cache (say, of just one entry, for simplicity), what the memory leak detection tool sees is only a part of the real story.

    • Program (line 1): Creates string 1.
    • String manager: Allocates memory block A for string 1.
    • Program (line 2): Frees string 1.
    • String manager: Puts memory block A into cache.
    • Program (line 3): Creates string 2.
    • String manager: Re-uses memory block A for string 2.
    • Program (line 4): Creates string 3.
    • String manager: Allocates memory block B for string 3.
    • Program (line 5): Frees string 3.
    • String manager: Puts memory block B into cache.
    • Program (line 6): Frees string 2.
    • String manager: Deallocates memory block A since there is no room in the cache.

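    Your program sees only the lines marked Program:, and the memory leak detection tool sees only the String manager: steps that actually allocate or deallocate memory (not the ones that move a block into or out of the cache). As a result, the memory leak detection tool sees a warped view of the program's string usage: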

    • Line 1 of your program allocates memory block A.
    • Line 4 of your program allocates memory block B.
    • Line 6 of your program deallocates memory block A.

    Notice that the memory leak detection tool thinks that line 6 freed the memory allocated by line 1, even though the two lines of the program are unrelated. Line 6 is freeing string 2, and line 1 is creating string 1!

    Notice also that the memory leak detection tool will report a memory leak, because it sees that you allocated two memory blocks but deallocated only one of them. The memory leak detection tool will say, "Memory allocated at line 4 is never freed." And you stare at line 4 of your program and insist that the memory leak detection tool is on crack because there, you freed it right at the very next line! You chalk this up as "Stupid memory leak detection tool, it has all these useless false positives."

    Even worse: Suppose somebody deletes line 6 of your program, thereby introducing a genuine memory leak. Now the memory leak detection tool will report two leaks:

    • Memory allocated at line 1 is never freed.
    • Memory allocated at line 4 is never freed.

    You already marked the second report as bogus during your last round of investigation. Now you look at the first report, and decide that it too is bogus; I mean look, we free the string right there at line 2!

    Result: A memory leak is introduced, the memory leak detection tool finds it, but you discard it as another bug in the memory leak detection tool.

    When you're doing memory leak detection, it helps to disable your caches. That way, the high-level object creation and destruction performed in your program maps more directly to the low-level memory allocation and deallocation functions tracked by the memory leak detection tool. In our example, if there were no cache, then every Create string would map directly to an Allocate memory call, and every Free string would map directly to a Deallocate memory call.
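
    The six-line example can be replayed in a small simulation. This is only a model of the scenario described above, not of OLE Automation's real cache: StringManager, Event, LeakedLines, and RunExample are hypothetical names, memory blocks are represented by integer IDs, and the "tool" is just a log of allocations and deallocations tagged with the program line that was running.

```cpp
#include <cassert>
#include <vector>

// What the leak detection tool records.
struct Event {
    int line;            // program line running when the allocator was called
    bool isAllocation;   // true = allocate, false = deallocate
    int block;           // block ID (1 plays the role of A, 2 of B, ...)
};

// String manager with an optional one-entry cache of freed blocks.
class StringManager {
public:
    explicit StringManager(bool cacheEnabled) : cacheEnabled_(cacheEnabled) {}

    int Create(int line)
    {
        if (cached_ != 0) {              // re-use the cached block; tool sees nothing
            int block = cached_;
            cached_ = 0;
            return block;
        }
        int block = nextBlock_++;
        log_.push_back({line, true, block});
        return block;
    }

    void Free(int line, int block)
    {
        assert(block != 0);
        if (cacheEnabled_ && cached_ == 0) {   // room in the cache; tool sees nothing
            cached_ = block;
            return;
        }
        log_.push_back({line, false, block});
    }

    const std::vector<Event>& Log() const { return log_; }

private:
    bool cacheEnabled_;
    int cached_ = 0;       // 0 means the cache is empty
    int nextBlock_ = 1;
    std::vector<Event> log_;
};

// The tool's report: lines whose allocation has no matching deallocation.
std::vector<int> LeakedLines(const std::vector<Event>& log)
{
    std::vector<int> leaked;
    for (const Event& a : log) {
        if (!a.isAllocation) continue;
        bool freed = false;
        for (const Event& d : log)
            if (!d.isAllocation && d.block == a.block) freed = true;
        if (!freed) leaked.push_back(a.line);
    }
    return leaked;
}

// Replays the example program; includeLine6 = false introduces the real leak.
std::vector<int> RunExample(bool cacheEnabled, bool includeLine6)
{
    StringManager strings(cacheEnabled);
    int s1 = strings.Create(1);
    strings.Free(2, s1);
    int s2 = strings.Create(3);
    int s3 = strings.Create(4);
    strings.Free(5, s3);
    if (includeLine6) strings.Free(6, s2);
    return LeakedLines(strings.Log());
}
```

    With the cache enabled, the report blames line 4 for the correct program, and lines 1 and 4 once line 6 is deleted. With the cache disabled, the correct program reports nothing, and deleting line 6 produces a single report that points at line 3, which is where the leaked string was really created.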

    What KB article 139071 is trying to say is "FIX: OLE Automation BSTR cache cannot be disabled in Windows 2000." Windows XP already contains support for the OANOCACHE environment variable, which disables the BSTR cache so you can investigate those BSTR leaks more effectively. The hotfix adds support for OANOCACHE to Windows 2000.

    Bonus chatter: Why do we have BSTR anyway? Why not just use null-terminated strings everywhere?

    The BSTR data type was introduced by Visual Basic. They couldn't use null-terminated strings because Basic permits nulls to be embedded in strings. Whereas Win32 is based on the K&R C way of doing things, OLE automation is based on the Basic way of doing things.

  • The Old New Thing

    The difference between assignment and attachment with ATL smart pointers

    • 15 Comments

    Last time, I presented a puzzle regarding a memory leak. Here's the relevant code fragment:

    CComPtr<IStream> pMemoryStream;
    CComPtr<IXmlReader> pReader;
    UINT nDepth = 0;
    
    //Open read-only input stream
    pMemoryStream = ::SHCreateMemStream(utf8Xml, cbUtf8Xml);
    

    The problem here is assigning the return value of SHCreateMemStream to a smart pointer instead of attaching it.

    The SHCreateMemStream function creates a memory stream and returns a pointer to it. That pointer has a reference count of one, in accordance with COM rules that a function which produces a reference calls AddRef, and the responsibility is placed upon the recipient to call Release. The assignment operator for CComPtr<T> is a copy operation: It AddRefs the pointer and saves it. You're still on the hook for the reference count of the original pointer.

    ATLINLINE ATLAPI_(IUnknown*) AtlComPtrAssign(IUnknown** pp, IUnknown* lp)
    {
            if (lp != NULL)
                    lp->AddRef();
            if (*pp)
                    (*pp)->Release();
            *pp = lp;
            return lp;
    }
    
    template <class T>
    class CComPtr
    {
    public:
            ...
    
            T* operator=(T* lp)
            {
                    return (T*)AtlComPtrAssign((IUnknown**)&p, lp);
            }
    

    Observe that assigning a T* to a CComPtr<T> AddRefs the incoming pointer and Releases the old pointer (if any). When the CComPtr<T> is destructed, it will release the pointer, undoing the AddRef that was performed by the assignment operator. In other words, assignment followed by destruction has a net effect of zero on the pointer you assigned. The operation behaves like a copy.

    Another way of putting a pointer into a CComPtr<T> is with the Attach method. This is a transfer operation:

            void Attach(T* p2)
            {
                    if (p)
                            p->Release();
                    p = p2;
            }
    

    Observe that there is no AddRef here. When the CComPtr<T> is destructed, it will perform the Release, which doesn't undo any operation performed by the Attach. Instead, it releases the reference count held by the original pointer you attached.

    Let's put this in a table, since people seem to like tables:

    Operation      Behavior                   Semantics
    Attach()       Takes ownership            Transfer semantics
    operator=()    Creates a new reference    Copy semantics

    You use the Attach method when you want to assume responsibility for releasing the pointer (ownership transfer). You use the assignment operator when you want the original pointer to continue to be responsible for its own release (no ownership transfer).

    There is also a Detach method which is the opposite of Attach: Detaching a pointer from the CComPtr<T> means "I am taking over responsibility for releasing this pointer." The CComPtr<T> gives you its pointer and then forgets about it; you're now on your own.
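
    To see the bookkeeping in action, here is a stripped-down model of the three operations. It is a sketch under loud assumptions: FakeUnknown and FakePtr are hypothetical stand-ins for a COM object and for CComPtr<T>, only the reference count is modeled, and nothing is ever deleted.

```cpp
#include <cassert>

// Stand-in for a COM object: only the reference count is modeled.
struct FakeUnknown {
    long refs = 1;   // a creator function hands the object out with one reference
    long AddRef() { return ++refs; }
    long Release()
    {
        assert(refs > 0);
        return --refs;   // a real object would destroy itself at zero
    }
};

// Stand-in for CComPtr<T>, reduced to the operations being compared.
class FakePtr {
public:
    FakePtr() = default;
    ~FakePtr() { if (p_) p_->Release(); }
    FakePtr(const FakePtr&) = delete;
    FakePtr& operator=(const FakePtr&) = delete;

    // Copy semantics: takes a new reference of its own.
    FakeUnknown* operator=(FakeUnknown* lp)
    {
        if (lp) lp->AddRef();
        if (p_) p_->Release();
        p_ = lp;
        return lp;
    }

    // Transfer semantics: takes over the caller's reference.
    void Attach(FakeUnknown* lp)
    {
        if (p_) p_->Release();
        p_ = lp;
    }

    // Hands the reference back to the caller and forgets the pointer.
    FakeUnknown* Detach()
    {
        FakeUnknown* lp = p_;
        p_ = nullptr;
        return lp;
    }

private:
    FakeUnknown* p_ = nullptr;
};

// Reference count left over once the smart pointer has been destroyed.
long RefsLeftAfterAssign()
{
    FakeUnknown stream;       // refs == 1, as if just returned by a creator function
    {
        FakePtr sp;
        sp = &stream;         // AddRef: refs == 2
    }                         // destructor Release: refs == 1
    return stream.refs;       // the creator's reference was never released
}

long RefsLeftAfterAttach()
{
    FakeUnknown stream;       // refs == 1
    {
        FakePtr sp;
        sp.Attach(&stream);   // no AddRef: refs stays 1
    }                         // destructor Release: refs == 0
    return stream.refs;
}
```

    In this model, assignment followed by destruction leaves one outstanding reference that nobody is going to release, whereas Attach followed by destruction leaves none.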

    The memory leak in the code fragment above occurs because the assignment operator has copy semantics, but we wanted transfer semantics, since we want the smart pointer to take the responsibility for releasing the pointer when it is destructed.

    pMemoryStream.Attach(::SHCreateMemStream(utf8Xml, cbUtf8Xml));
    

    The CComPtr<T>::operator=(T*) method is definitely one of the more dangerous methods in the CComPtr<T> repertoire, because it's so easy to assign a pointer to a smart pointer without giving it a moment's thought. (Another dangerous method is the T** CComPtr<T>::operator&(), but at least that has an assertion to try to catch the bad usages. Even nastier is the secret QI'ing assignment operator.) I have to say that there is merit to Ben Hutchings' recommendation simply not to allow a simple pointer to be assigned to a smart pointer, precisely because the semantics are easily misunderstood. (The Boost library, for example, follows Ben's recommendation.)

    Here's another exercise based on what you've learned:

    Application Verifier told us that we have a memory leak, and we traced it back to the function GetTextAsInteger.

    BSTR GetInnerText(IXMLDOMNode *node)
    {
        BSTR bstrText = NULL;
        node->get_text(&bstrText);
        return bstrText;
    }
    
    DWORD GetTextAsInteger(IXMLDOMNode *node)
    {
        DWORD value = 0;
    
        CComVariant innerText = GetInnerText(node);
        HRESULT hr = VariantChangeType(&innerText, &innerText, 0, VT_UI4);
        if (SUCCEEDED(hr))
        {
            value = V_UI4(&innerText);
        }
    
        return value;
    }
    

    Obviously, the problem is that we passed the same input and output pointers to VariantChangeType, causing the output integer to overwrite the input BSTR, resulting in the leak of the BSTR. But when we fixed the function, we still got the leak:

    DWORD GetTextAsInteger(IXMLDOMNode *node)
    {
        DWORD value = 0;
    
        CComVariant innerText = GetInnerText(node);
        CComVariant textAsValue;
        HRESULT hr = VariantChangeType(&innerText, &textAsValue, 0, VT_UI4);
        if (SUCCEEDED(hr))
        {
            value = V_UI4(&textAsValue);
        }
    
        return value;
    }
    

    Is there a leak in the VariantChangeType function itself?

    Hint: It is in fact explicitly documented that the output parameter to VariantChangeType can be equal to the input parameter, which results in an in-place conversion. There was nothing wrong with the original call to VariantChangeType.

  • The Old New Thing

    News flash: Healthy people live longer

    • 15 Comments

    Researchers have determined that people in good physical condition live longer. Who'd'a thunk it?

  • The Old New Thing

    When asked to choose among multiple options, the politician will pick all of them

    • 14 Comments

    During the run-up to a local election some time ago, the newspaper posed the same set of questions to each of the candidates and published the responses in a grid format so the readers could easily compare them.

    The candidates agreed on some issues, had opposing positions on others, but the question whose answers struck me was one of the form "If budget cuts forced you to eliminate one of the following four programs, which would you cut?"

    • Candidate 1: "I have no intention of letting our budget get into a situation in which this would become an issue. All of these programs are very important to our community, and under my leadership, they will continue to be funded."
    • Candidate 2: "I don't believe we need to eliminate any of these popular programs. If we review our financial situation, we will find that we can continue to provide for all of them."
    • Candidate 3: "Much as I personally enjoy Program X, it ranks as a lower priority to me than the other options. Program X was originally a community-run program, and I would encourage residents and the business community to step forward and keep alive this program which has greatly benefited our community over the years."

    Notice that the first two candidates, when asked to make a tough decision, opted to make no decision at all. (Compare another election in which the mainstream candidates rated everything as high priority.) The first candidate said, "This would never happen." The second candidate said, "It's not happening." The third candidate is the only one who sat down and made the call to cut one of the programs. The first two were playing politics, afraid to make a decision for fear that it would alienate some portion of the electorate. The third understood the situation and made the hard decision.

    I voted for the third candidate.

    Today is Election Day in the United States. Don't forget to vote. (Void where prohibited.)

  • The Old New Thing

    How do I create a toolbar that sits in the taskbar?

    • 12 Comments

    Commenter Nick asks, "How would you go about creating a special toolbar to sit on the taskbar like the Windows Media Player 10 minimised toolbar?"

    You would look at the DeskBand API SDK Sample in the Windows Platform SDK.

    The magic word is DeskBand. This MSDN page has an overview.

    Bonus chatter: I've seen some online speculation as to whether a DeskBand counts as a shell extension, because of the guidance against writing shell extensions in managed code. As with all guidance, you need to understand the rationale behind the guidance so you can apply the guidance intelligently instead of merely following it blindly off a cliff. Summarizing the rationale: Since only one version of the CLR can exist in a process, any shell extension which runs inside the host process which uses the CLR may inject a version of the CLR that conflicts with the version of the CLR the host process (or some other component in the host process) wants to use. Now that you understand the reason, you also can answer the question, "Is a DeskBand a shell extension (for the purpose of this guidance)?" Yes, because DeskBands (like all other COM objects registered as in-process servers) run inside the host process.

    As another example of how understanding the rationale behind guidance lets you know when the guidance no longer applies: In the time since the original guidance was developed, the CLR team came up with a way to run multiple versions of the CLR inside a single process (for specific values of "multiple"). Therefore, if you use one of those "I won't conflict with other versions of the CLR inside the same process" versions, then you can see that the rationale behind the guidance no longer applies.

  • The Old New Thing

    What a drag: You can be a drag in managed code, too

    • 12 Comments

    David Anson digests my earlier series on virtual drag/drop and translates it into managed code. His example of dragging his entire RSS feed is an excellent illustration of dragging dynamically-generated virtual content. (I didn't use an example like that because the purpose of the What a drag series was to get something done in the least amount of code, and generating a stream from a URL takes an awful lot of code when doing it from the unmanaged side, which would ultimately detract from the point of the example.)

    Bonus: He takes the example further by adding asynchronous support.
