November, 2009

  • The Old New Thing

    Can I talk to that William fellow? He was so helpful


    Today we're going to take a little trip in the wayback machine with the help of my colleague Seth Manheim, who was there when this happened.

    Set the date to November 22, 1989, twenty years ago and one day. Bill Gates is being taken on a guided tour of the product support department's new office building, and during his visit, he asks one of the people manning the phones, "Mind if I take this call?"

    Bill puts on a headset, sits down, and answers the phone. "Hello, this is Microsoft Product Support, William speaking. How can I help you?"

    Bill talks with the customer, collects the details of the problem, searches in the product support Knowledge Base, sifts through the search results, finds the solution, and patiently walks the customer through fixing the problem.

    The customer is thrilled that William was able to fix the problem so quickly, and with such a pleasant attitude. Bill wraps up the call. "And thank you for using Microsoft products."

    At no point did Bill identify himself as anything other than William. The customer had no idea that the product support engineer who took the call was none other than Bill Gates.

    But the story doesn't end there.

    Even though this story took place while most of the support staff were on their lunch break, news travels quickly, and soon everybody in the department knows about The time Bill took a product support call.

    Some time later, the same customer calls back with a follow-up question.

    "Hi, I called you folks with a problem with XYZ, and I talked with a nice man named William who straightened it all out. But I have another question. Can I speak with William?"

    "Okay, let me see if William is available." The product support engineer brings up the customer's service record and looks at the name of the support engineer who handled the earlier call: billg.

    "Yeah, um, I'm sorry, but William is not available right now. His friends call him Bill, by the way. The person who helped you last time? That was Bill Gates."

    "Oh my God."

    While I'm tinkering with the wayback machine, I may as well point you to a story from a few years ago with a similar (but less dramatic) punch line.

  • The Old New Thing

    Hey, is there somebody around to accept this award?


    Back in the late 1990s, some large Internet association conducted a survey in order to bestow awards in categories like Best Web server and Best Web browser, and one of the categories was Best Web authoring tool.

    We didn't find out about this until the organization contacted the Windows team and said, "Hi, we would like to present Microsoft with the award for Best Web authoring tool. Please let us know who the author of Notepad is, so that we can invite them to the award ceremony."

    Yup, Notepad won the award for Best Web authoring tool.

    The mail went out to the team. "Hey, does anybody remember who wrote Notepad?"

    Even a decade ago, the original authorship of Notepad was lost to the mists of time. I think the person who ended up going was the original author of the multi-line edit control, since that's where the guts of Notepad lie.

  • The Old New Thing

    Little-known command line utility: clip


    Windows Vista includes a tiny command line utility called clip. All it does is paste its stdin onto the clipboard.

    dir | clip
    echo hey | clip

    For the opposite direction, I use a little perl script:

    use Win32::Clipboard;
    print Win32::Clipboard::GetText();

  • The Old New Thing

    Where did WIN32_LEAN_AND_MEAN come from?

    Commenter asdf wonders where WIN32_LEAN_AND_MEAN came from.

    The WIN32_LEAN_AND_MEAN symbol was introduced in the Windows 95 time frame as a way to exclude a bunch of Windows header files when you include windows.h. You can take a look at your windows.h file to see which ones they are.

    The symbol was added as part of the transition from 16-bit Windows to 32-bit Windows. The 16-bit windows.h header file didn't include all of those header files, and defining WIN32_LEAN_AND_MEAN brought you back to the 16-bit Windows philosophy of a minimal set of header files for writing a bare-bones Windows program. This appeased the programmers who liked to micro-manage their header files, and it was a big help because, at the time the symbol was introduced, precompiled header files were not in common use. As I recall, on a 50MHz 80486 with 8MB of memory, switching to WIN32_LEAN_AND_MEAN shaved three seconds off the compile time of each C file. When your project consists of 20 C files, that's a whole minute saved right there.

    Moore's Law and precompiled headers have conspired to render the WIN32_LEAN_AND_MEAN symbol relatively useless. It doesn't really save you much any more. But at one point, it did.

  • The Old New Thing

    How do I get the command line of another process?


    Win32 doesn't expose a process's command line to other processes. From Win32's point of view, the command line is just a conveniently initialized parameter to the process's startup code, some data copied from the launching process to the new process and forgotten. We'll get back to the Win32 point of view a little later.

    If you look around in WMI, you'll find a Win32_Process object, and lo and behold, it has a CommandLine property. Let's check it out, using the standard WMI application:

    strComputer = "."
    Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")
    Set colItems = objWMIService.ExecQuery("Select * from Win32_Process")
    For Each objItem in colItems
         Wscript.Echo objItem.Name
         Wscript.Echo objItem.CommandLine
    Next

    I fully anticipate that half of my readers will stop right there. "Thanks for the script. Bye!" And they won't bother reading the analysis. "Because analysis is boring, and it'll just tell me stuff I don't want to hear. The analysis is going to tell me why this won't work, or why it's a bad idea, and that just cramps my style."

    Remember that from Win32's point of view, the command line is just a string that is copied into the address space of the new process. How the launching process and the new process interpret this string is governed not by rules but by convention.

    What's more, since the string is merely a "preinitialized variable", a process could in principle (and many do in practice, although usually inadvertently) write to the memory that holds the command line, in which case, if you go snooping around for it, you'll see the modified command line. There is no secret hiding place where the kernel keeps the "real original command line," any more than there is a secret hiding place where the C compiler keeps the "real original parameters to a function."

    This is just another manifestation of the principle of not keeping track of information you don't need.

    What does this mean for people who disregard this principle and go after the command line of another process? You have to understand that what you are getting is non-authoritative information. In fact, it's worse. It's information the application itself may have changed in order to try to fool you, so don't use it to make important decisions.
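
    To make the "preinitialized variable" idea concrete, here is a toy model in Python. It is not Win32 code: launch, tokenize_in_place, and snoop are hypothetical names, and a bytearray stands in for the memory that holds the command line. The parser behaves like a strtok-style tokenizer, which is one plausible way a process might modify its command line without meaning to.

```python
def launch(command_line):
    """Model of process creation: the string is copied into the new process once."""
    return bytearray(command_line.encode("ascii") + b"\0")


def tokenize_in_place(buffer):
    """strtok-style parsing: overwrites each space with a NUL terminator."""
    args, start = [], 0
    for i, byte in enumerate(buffer):
        if byte in (0x20, 0x00):
            if i > start:
                args.append(buffer[start:i].decode("ascii"))
            buffer[i] = 0          # the "command line" memory is modified here
            start = i + 1
    return args


def snoop(buffer):
    """What an outside observer reads back: everything up to the first NUL."""
    return bytes(buffer).split(b"\0", 1)[0].decode("ascii")


memory = launch("backup.exe /source C:\\data /quiet")
before = snoop(memory)             # the full string, as launched
args = tokenize_in_place(memory)   # the process parses its own command line
after = snoop(memory)              # now only "backup.exe"
print(before, args, after, sep="\n")
```

    In this model, snoop returns the full string before parsing and only backup.exe afterward. There is no second copy to consult.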

  • The Old New Thing

    We found the author of Notepad, sorry you didn't go to the award ceremony


    I've received independent confirmations as to the authorship of Notepad, so I'm inclined to believe it. Sorry you didn't get to go to the award ceremony.

    The original author of Notepad also served as the development manager for Windows 95. His job was to herd the cats that made up the programmers who worked on Windows 95, a job which you can imagine falls into the "not easy" category.

    After Windows 95, he retired from the software industry and became a high school science teacher. At a social event some years later, I met him again and asked about the transition from software development manager to high school science teacher.

    His response: "You'd be surprised how many of the skills transfer."

  • The Old New Thing

    We're using a smart pointer, so we can't possibly be the source of the leak


    A customer reported that there was a leak in the shell, and they included the output from Application Verifier as proof. And yup, the memory that was leaked was in fact allocated by the shell:

    VERIFIER STOP 00000900 : pid 0x3A4: A heap allocation was leaked.
            497D0FC0 : Address of the leaked allocation.
            002DB580 : Adress to the allocation stack trace.
            0D65CFE8 : Address of the owner dll name.
            6F560000 : Base of the owner dll.
    1: kd> du 0D65CFE8
    0d65cfe8  "SHLWAPI.dll"
    1: kd> !heap -p -a 497D0FC0
    1: kd> dps 002DB580

    On the other hand, SHCreateMemStream is an object creation function, so it's natural that the function allocate some memory. The responsibility for freeing the memory belongs to the caller.

    We suggested that the customer had probably leaked the interface pointer. Perhaps there's a hole where they called AddRef and managed to avoid the matching Release.

    "Oh no," the customer replied, "that's not possible. We call this function in only one place, and we use a smart pointer, so a leak is impossible." The customer was kind enough to include a code snippet and even highlighted the lines that proved they weren't leaking.

    CComPtr<IStream> pMemoryStream;
    CComPtr<IXmlReader> pReader;
    UINT nDepth = 0;
    //Open read-only input stream
    pMemoryStream = ::SHCreateMemStream(utf8Xml, cbUtf8Xml);

    The exercise for today is to identify the irony in the highlighted lines.

    Hint. Answers (and more discussion) tomorrow.

  • The Old New Thing

    Caches are nice, but they confuse memory leak detection tools


    Knowledge Base article 139071 has the technically correct but easily misinterpreted title "FIX: OLE Automation BSTR caching will cause memory leak sources in Windows 2000." The title is misleading because it makes you think, "Oh, this is a fix for a memory leak in OLE Automation," but that's not what it is.

    The BSTR is the string type used by OLE Automation, and since strings are used a lot, OLE Automation maintains a cache of recently-freed strings which it can re-use when somebody allocates a new one. Caches are nice (though you need to make sure you have a good replacement policy), but they confuse memory leak detection tools, because the memory leak detection tool will not be able to match up the allocator with the deallocator. What the memory leak detection tool sees is not the creation and freeing of strings but rather the allocation and deallocation of memory. And if there is a string cache (say, of just one entry, for simplicity), what the memory leak detection tool sees is only a part of the real story.

    • Program (line 1): Creates string 1.
    • String manager: Allocates memory block A for string 1.
    • Program (line 2): Frees string 1.
    • String manager: Puts memory block A into cache.
    • Program (line 3): Creates string 2.
    • String manager: Re-uses memory block A for string 2.
    • Program (line 4): Creates string 3.
    • String manager: Allocates memory block B for string 3.
    • Program (line 5): Frees string 3.
    • String manager: Puts memory block B into cache.
    • Program (line 6): Frees string 2.
    • String manager: Deallocates memory block A since there is no room in the cache.

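    Your program sees only the lines marked Program:, and the memory leak detection tool sees only the String manager steps that actually allocate or deallocate a memory block (it never sees a block going into the cache or being re-used from it). As a result, the memory leak detection tool sees a warped view of the program's string usage: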

    • Line 1 of your program allocates memory block A.
    • Line 4 of your program allocates memory block B.
    • Line 6 of your program deallocates memory block A.

    Notice that the memory leak detection tool thinks that line 6 freed the memory allocated by line 1, even though the two lines of the program are unrelated. Line 6 is freeing string 2, and line 1 is creating string 1!

    Notice also that the memory leak detection tool will report a memory leak, because it sees that you allocated two memory blocks but deallocated only one of them. The memory leak detection tool will say, "Memory allocated at line 4 is never freed." And you stare at line 4 of your program and insist that the memory leak detection tool is on crack because there, you freed it right at the very next line! You chalk this up as "Stupid memory leak detection tool, it has all these useless false positives."

    Even worse: Suppose somebody deletes line 6 of your program, thereby introducing a genuine memory leak. Now the memory leak detection tool will report two leaks:

    • Memory allocated at line 1 is never freed.
    • Memory allocated at line 4 is never freed.

    You already marked the second report as bogus during your last round of investigation. Now you look at the first report, and decide that it too is bogus; I mean look, we free the string right there at line 2!

    Result: A memory leak is introduced, the memory leak detection tool finds it, but you discard it as another bug in the memory leak detection tool.

    When you're doing memory leak detection, it helps to disable your caches. That way, the high-level object creation and destruction performed in your program maps more directly to the low-level memory allocation and deallocation functions tracked by the memory leak detection tool. In our example, if there were no cache, then every Create string would map directly to an Allocate memory call, and every Free string would map directly to a Deallocate memory call.
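
    The walkthrough above can be replayed in a few lines of Python. This is a sketch with invented names (LeakTracker, StringManager, run_program), not how OLE Automation implements its cache. The only assumptions are a one-entry cache and a tool that sees just the raw allocations and deallocations.

```python
class LeakTracker:
    """Stands in for the leak detection tool: it sees only raw memory calls."""

    def __init__(self):
        self.live = {}   # block name -> program line that triggered the allocation
        self.log = []    # what the tool observed, in order

    def allocate(self, block, line):
        self.live[block] = line
        self.log.append(f"line {line} allocates {block}")

    def deallocate(self, block, line):
        del self.live[block]
        self.log.append(f"line {line} deallocates {block}")

    def leaks(self):
        return sorted(self.live.values())


class StringManager:
    """Hypothetical string manager with a cache of at most one freed block."""

    def __init__(self, tracker, use_cache):
        self.tracker = tracker
        self.use_cache = use_cache
        self.cached = None                # the single cache slot
        self.names = iter("ABCDEFGH")     # labels for fresh blocks

    def create(self, line):
        if self.cached is not None:
            block, self.cached = self.cached, None   # re-use: the tool sees nothing
            return block
        block = next(self.names)
        self.tracker.allocate(block, line)
        return block

    def free(self, block, line):
        if self.use_cache and self.cached is None:
            self.cached = block           # parked in the cache: the tool sees nothing
        else:
            self.tracker.deallocate(block, line)


def run_program(use_cache):
    """The six-line program from the walkthrough."""
    tracker = LeakTracker()
    strings = StringManager(tracker, use_cache)
    s1 = strings.create(line=1)
    strings.free(s1, line=2)
    s2 = strings.create(line=3)
    s3 = strings.create(line=4)
    strings.free(s3, line=5)
    strings.free(s2, line=6)
    return tracker


cached = run_program(use_cache=True)
print(cached.log)        # the tool's warped three-line view
print(cached.leaks())    # [4]: "memory allocated at line 4 is never freed"

uncached = run_program(use_cache=False)
print(uncached.leaks())  # []: every create and free reached the tool
```

    With the cache on, the tracker's log matches the warped view and blames line 4. With the cache off, every create and free reaches the tracker and nothing is reported.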

    What KB article 139071 is trying to say is "FIX: OLE Automation BSTR cache cannot be disabled in Windows 2000." Windows XP already contains support for the OANOCACHE environment variable, which disables the BSTR cache so you can investigate those BSTR leaks more effectively. The hotfix adds support for OANOCACHE to Windows 2000.

    Bonus chatter: Why do we have BSTR anyway? Why not just use null-terminated strings everywhere?

    The BSTR data type was introduced by Visual Basic. They couldn't use null-terminated strings because Basic permits nulls to be embedded in strings. Whereas Win32 is based on the K&R C way of doing things, OLE automation is based on the Basic way of doing things.

  • The Old New Thing

    Stories of anticipating dead computers: Windows Home Server


    Like most geeks, I have a bit of history with dead computers. In the past, I used the "wait until it breaks, and then panic" model, but recently I've begun being a bit more anticipatory, like replacing an old laptop before it actually expires.

    Anticipating another future dead computer, I bought an external USB hard drive for backing up important files, but upon reading the description on the box, I started to have second thoughts. It came with its own backup software that reportedly installed automatically when you plugged in the drive (!). I didn't want that; I just wanted a boring USB hard drive.

    One of my friends (who used to work with USB devices) cautioned me: "Those things are evil. Some of them enumerate as a keyboard and 'type in' a device driver so they can own your machine even if you have autorun disabled." Wow, that's a level of craziness I previously had not been aware of.

    Upon further discussion, I was convinced to return the external hard drive unopened and instead get a copy of Windows Home Server. I went for the Acer Aspire EasyStore H340 instead of trying to build my own reduced-footprint low-power quiet-fan computer. And amazingly, the EasyStore comes with only two pieces of shovelware, the excellent LightsOut add-in, which I kept, and some annoying trialware, which was easily uninstalled.

    I felt kind of weird getting a Home Server since I have only one home computer of consequence, so I'd basically have a one-computer network. (I do have that laptop, but I'm careful not to keep anything on it that isn't already backed up somewhere else.) And because the Home Server would easily be the most powerful computer in the house, even though all it does is sit there doing nothing most of the time. But the convenience is hard to beat. It just sits there quietly and does its job of backing up the other computer every night. (And seeing as I had the machine anyway, I also have it back up my laptop, even though there's nothing really important on it. Most nights, the laptop backup takes only five minutes. And just because I can, I even back up the old laptop that doesn't even do anything any more aside from surf the Internet!)

    Of course, the first thing you do with a new gadget is tinker with it, and I installed Whiist and created a photo album. It was so easy to do, I feel like I'm losing my geek cred. I mean, this sort of thing is supposed to involve hours of staring at the screen, scouring the Internet for information, and groveling through hundreds of settings trying to get things working. If anybody can get a home server up and running with automatic nightly backups and an online photo album by just clicking on some fluffy GUI buttons, then what will I have to feel superior about?

    I'm kidding. My hat's off to the legendary Charlie Kindel and the Windows Home Server team. They hit this one out of the park. It's an awesome product.

    Now that backing up is so painless, it has set a new baseline behavior: Now, I feel kind of uneasy making large-scale changes to files on my home computer unless I have a complete backup. (Backups are the reason I bought the server. All the other features, like the photo album, are just gravy.)

    And yes, every few weeks, I restore a randomly-selected file from backup just to make sure the backups are working.

    FTC disclaimer: Although Windows Home Server is a product of Microsoft Corporation (my employer), no compensation was tied to this review. (I didn't even get an employee discount.) I'm just a happy customer.

  • The Old New Thing

    You thought reasoning about signals was bad, reasoning about a total breakdown of normal functioning is even worse


    A customer came to the Windows team with a question, the sort of question which on its face seems somewhat strange, which is itself a sign that the question is merely the tip of a much more dangerous iceberg.

    Under what circumstances will the GetEnvironmentVariable function hang?

    This is kind of an open-ended question. I mean, for example, somebody might sneak in and call SuspendThread on your thread while GetEnvironmentVariable is running, which will look like a hang because the call never completes because the thread is frozen.

    But the real question for the customer is, "What sort of problem are you seeing that is manifesting itself in an apparent hang in the GetEnvironmentVariable function?"

    The customer was kind enough to elaborate.

    We have a global unhandled exception filter in our application so we can log all failures. After we finish logging, we call ExitProcess, but we find that the application never actually exits. If we connect a debugger to the stuck application, we see it hung in GetEnvironmentVariable.

    Your gut response should be, "Holy cow, I'm surprised you even got that far!"

    This isn't one of those global unhandled exception filters that got installed because your program plays some really clever game with exceptions. No, this is an "Oh no, my program just crashed and I want to log it" exception handler. In other words, when this exception handler "handles" an exception, it's because your program has encountered some sort of serious internal programming error for which the program did not know how to recover. We saw earlier that you can't do much in a signal handler because you might have interrupted a block of code which was in the middle of updating some data structures, leaving them momentarily inconsistent. But this exception filter is in an even worse state: Not only is there a good chance that the program is in the middle of updating something and left it in an inconsistent state, you are in fact guaranteed that the system is in a corrupted state.

    Why is this a guarantee? Because if the system were in a consistent state, you wouldn't have crashed!

    Programming is about establishing invariants, perturbing them, and then re-establishing them. It is a game of stepping-stone from one island of consistency to another. But the code that does the perturbing and the re-establishing assumes that it's starting from a consistent state to begin with. For example, a function that removes a node from a doubly-linked list manipulates some backward and forward link pointers (temporarily violating the linked list invariant), and then when it's finished, the linked list is back to a consistent state. But this code assumes that the linked list is not corrupted to begin with!
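
    The linked list example can be sketched in Python. The names here (Node, link, is_consistent, remove) are made up for illustration. The point is only that remove re-establishes the invariant when it starts from a consistent list, and spreads the damage when it doesn't.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def link(values):
    """Build a doubly-linked list and return its nodes in order."""
    nodes = [Node(v) for v in values]
    for left, right in zip(nodes, nodes[1:]):
        left.next = right
        right.prev = left
    return nodes


def is_consistent(head):
    """The invariant: every forward link is mirrored by a backward link."""
    node = head
    while node is not None and node.next is not None:
        if node.next.prev is not node:
            return False
        node = node.next
    return True


def remove(node):
    """Unlink node. Assumes the list is consistent on entry; nothing here checks."""
    if node.prev is not None:
        node.prev.next = node.next   # invariant is broken here...
    if node.next is not None:
        node.next.prev = node.prev   # ...and re-established here
    node.prev = node.next = None


# Starting from a consistent list, removal lands on another island of consistency.
a, b, c = link("abc")
remove(b)
ok_after_remove = is_consistent(a)   # a <-> c

# Same code, but y.prev was overwritten before the call (simulated corruption).
x, y, z = link("xyz")
stray = Node("stray")
y.prev = stray
remove(y)
still_reachable = x.next is y        # x still points at the "removed" node
wrote_to_stray = stray.next is z     # and the unlink scribbled on an unrelated node
print(ok_after_remove, still_reachable, wrote_to_stray, is_consistent(x))
```

    The second half is the situation a crashed process is in: code that is correct on its own terms runs against a structure that already violates its assumptions.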

    Let's look again at that call to ExitProcess. That's going to detach all the DLLs, calling each DLL's DllMain with the DLL_PROCESS_DETACH notification. But of course, those DllMain are going to assume that the data structures are intact and nothing is corrupted. On the other hand, you know for a fact that these prerequisites are not met—the program crashed precisely because something is corrupted. One DLL might walk a linked list—but you might have crashed because that linked list is corrupted. Another DLL might try to delete a critical section—but you might have crashed because the data structure containing the critical section is corrupted.

    Heck, the crash might have been inside somebody's DLL_PROCESS_DETACH handler to begin with, for all you know.

    "Yeah, but the documentation for TerminateProcess says that it does not clean up shared memory."

    Well, it depends on what you mean by clean up. The reference count on the shared memory is properly decremented when the handle is automatically closed as part of process cleanup, and the shared memory will be properly freed once there are no more references to it. It is not cleaned up in the sense of "corruption is repaired"—but of course the operating system can't do that because it doesn't know what the semantics of your shared memory block are.

    But this is hardly anything to get concerned about because your program doesn't know how to un-corrupt the data either.

    "It also says that DLLs don't receive their DLL_PROCESS_DETACH notification."

    As we saw before, this is a good thing in the case of a corrupted process, because the code that runs in DLL_PROCESS_DETACH assumes that your process has not been corrupted in the first place. There's no point running it when you know the process is corrupted. You're just making a bad situation worse.

    "It also says that I/O will be in an indeterminate state."

    Well yeah, but that's no worse than what you have now, which is that your I/O is in an indeterminate state. You don't know what buffers your process hasn't flushed, but since your process is corrupted, you have no way of finding out anyway.

    "Are you seriously recommending that I use TerminateProcess to exit the last chance exception handler?!?"

    Your process is unrecoverably corrupted. (This is a fact, because if there were a way to recover from it, you would have done it instead of crashing.) What other options are there?

    Quit while you're behind.
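
    The same advice can be illustrated outside Win32, with the loud caveat that this is only an analogy in Python and not the customer's scenario: atexit handlers stand in for DLL_PROCESS_DETACH code, sys.excepthook stands in for the last chance exception filter, and os._exit stands in for TerminateProcess. The child script and its messages are invented for the demonstration.

```python
import subprocess
import sys
import textwrap

# The child process: crashes, logs from its last-chance hook, and leaves at once.
CHILD = textwrap.dedent("""
    import atexit, os, sys

    # Stand-in for DLL_PROCESS_DETACH code: it assumes the process state is sane.
    atexit.register(lambda: print("cleanup ran", flush=True))

    def last_chance(exc_type, exc, tb):
        print("logged:", exc_type.__name__, flush=True)  # do the minimum
        os._exit(3)                                      # exit without running cleanup

    sys.excepthook = last_chance
    raise RuntimeError("state is corrupted")
""")

result = subprocess.run([sys.executable, "-c", CHILD],
                        capture_output=True, text=True)
print(result.returncode, result.stdout.splitlines())
```

    The child logs the failure and exits with code 3, and the cleanup handler never runs. Replacing os._exit(3) with sys.exit(3) would run the cleanup code inside a process that has already declared itself broken.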
