• The Old New Thing

    The time one of my colleagues debugged a line-of-business application for a package delivery service

    • 41 Comments

    Back in the days of Windows 95 development, one of my colleagues debugged a line-of-business application for a major delivery service. This was a program that the company gave to its top-tier high-volume customers, so that they could place and track their orders directly. And by directly, I mean that the program dialed the modem (since that was how computers communicated with each other back then) to contact the delivery service's mainframe (it was all mainframes back then) and upload the new orders and download the status of existing orders.¹

    Version 1.0 of the application had a notorious bug: Ninety days after you installed the program, it stopped working. They forgot to remove the beta expiration code. I guess that's why they have a version 1.01.

    Anyway, the bug that my colleague investigated was that if you entered a particular type of order with a particular set of options in a particular way, then the application crashed your system. Setting up a copy of the application in order to replicate the problem was itself a bit of an ordeal, but that's a whole different story.

    Okay, the program is set up, and yup, it crashes exactly as described when run on Windows 95. Actually, it also crashes exactly as described when run on Windows 3.1. This is just plain an application bug.

    Here's why it crashed: After the program dials up the mainframe to submit the order, it tries to refresh the list of orders that have yet to be delivered. The code that does this assumes that the list of undelivered orders is the control with focus. But if you ask for labels to be printed, then the printing code changes focus in order to display the "Please place the label on the package exactly like this" dialog, and as a result, the refresh code can't find the undelivered order list and crashes on a null pointer. (I'm totally making this up, by the way. The details of the scenario aren't important to the story.)

    Okay, well, that's no big deal. A null pointer fault should just put up the Unrecoverable Application Error dialog box and close the program. Why does this particular null pointer fault crash the entire system?

    The developers of the program saw that their refresh code sometimes crashed on a null pointer, and instead of fixing it by actually fixing the code so it could find the list of undelivered orders even if it didn't have focus, or fixing it by adding a null pointer check, they fixed it by adding a null pointer exception handler. (I wish to commend myself for resisting the urge to put the word fixed in quotation marks in that last sentence.)

    Now, 16-bit Windows didn't have structured exception handling. The only type of exception handler was a global exception handler, and this wasn't just global to the process. This was global to the entire system. Your exception handler was called for every exception everywhere. If you screwed it up, you screwed up the entire system. (I think you can see where this is going.)

    The developers of the program converted their global exception handler to a local one by going to every function that had a "We seem to crash on a null pointer and I don't know why" bug and making these changes:

    extern jmp_buf caught;
    extern BOOL trapExceptions;
    
    void scaryFunction(...)
    {
     if (setjmp(&caught)) return;
     trapExceptions = TRUE;
     ... body of function ...
     trapExceptions = FALSE;
    }
    

    Their global exception handler checks the trapExceptions global variable, and if it is TRUE, they set it back to FALSE and do a longjmp which sends control back to the start of the function, which detects that something bad must have happened and just returns out of the function.

    Yes, things are kind of messed up as a result of this. Yes, there is a memory leak. But at least their application didn't crash.

    On the other hand, if the global variable is FALSE, because their application crashed in some other function that didn't have this special protection, or because some other totally unrelated application crashed, the global exception handler decided to exit the application by running around freeing all the DLLs and memory associated with their application.

    Okay, so far so good, for certain values of good.

    These system-wide exception handlers had to be written in assembly code because they were dispatched with a very strange calling convention. But the developers of this application didn't write their system-wide exception handler in assembly language. Their application was written in MFC, so they just went to Visual C++ (as it was then known), clicked through some Add a Windows hook wizard, and got some generic HOOKPROC. (I don't know if Visual C++ actually had an Add a Windows hook wizard; they could just have copied the code from somewhere.) Nevermind that these system-wide exception handlers are not HOOKPROCs, so the function has the wrong prototype. What's more, the code they used marked the hook function as __loadds. This means that the function saves the previous value of the DS register on entry, then changes the register to point to the application's data, and on exit, the function restores the previous value of DS.

    Okay, now we're about to enter the set piece at the end of the movie: Our hero's fear of spiders, his girlfriend's bad ankle from an old soccer injury, the executive toy on the villain's desk, and all the other tiny little clues dropped in the previous ninety minutes come together to form an enormous chain reaction.

    The application crashes on a null pointer. The system-wide custom exception handler is called. The crash is not one that is being protected by the global variable, so the custom exception handler frees the application from memory. The system-wide custom exception handler now returns, but wait, what is it returning to?

    The crash was in the application, which means that the DS register it saved on entry to the custom exception handler points to the application's data. The custom exception handler freed the application's data and then returned, declaring the exception handled. As the function exited, it tried to restore the original DS register, but the CPU said, "Nice try, but that is not a valid value for the DS register (because you freed it)." The CPU reported this error by (dramatic pause) raising an exception.

    That's right, the system-wide custom exception handler crashed with an exception.

    Okay, things start snowballing. This is the part of the movie where the director uses quick cuts between different locations, maybe with a little slow motion thrown in.

    Since an exception was raised, the custom exception handler is called recursively. Each time through the recursion, the custom exception handler frees all the DLLs and memory associated with the application. But that's okay, right? Because the second and subsequent times, the memory was already freed, so the attempts to free them again will just fail with an invalid parameter error.

    But wait, their list of DLLs associated with the application included USER, GDI, and KERNEL. Now, Windows is perfectly capable of unloading dependent DLLs when you unload the main DLL, so when they unloaded their main program, the kernel already decremented the usage count on USER, GDI, and KERNEL automatically. But they apparently didn't trust Windows to do this, because after all, it was Windows that was causing their application to crash, so they took it upon themselves to free those DLLs manually.

    Therefore, each time through the loop, the usage counts for USER, GDI, and KERNEL drop by one. Zoom in on the countdown clock on the ticking time bomb.

    Beep beep beep beep beep. The reference count finally drops to zero. The window manager, the graphics subsystem, and the kernel itself have all been unloaded from memory. There's nothing left to run the show!

    Boom, bluescreen. Hot flaming death.

    The punch line to all this is that whenever you call the company's product support line and describe a problem you encountered, their response is always, "Yeah, we're really sorry about that one."

    Bonus chatter: What is that whole different story mentioned near the top?

    Well, when the delivery service sent the latest version of the software to the Windows 95 team, they also provided an account number to use. My colleague used that account number to try to reproduce the problem, and since the problem occurred only after the order was submitted, she would have to submit delivery requests, say for a letter to be picked up from 221B Baker Street and delivered to 62 West Wallaby Street, or maybe for a 100-pound package of radioactive material to be picked up from 1600 Pennsylvania Avenue and delivered to 10 Downing Street.

    After about two weeks of this, my colleague got a phone call from Microsoft's shipping department. "What the heck are you doing?"

    It turns out that the account number my colleague was given was Microsoft's own corporate account number. As in a real live account. She was inadvertently prank-calling the delivery company and sending actual trucks all over the country to pick up nonexistent letters and packages. Microsoft's shipping department and people from the delivery service's headquarters were frantic trying to trace where all the bogus orders were coming from.

    ¹ Mind you, this sort of thing is the stuff that average Joe customers can do while still in their pajamas, but back in those days, it was a feature that only top-tier customers had access to, because, y'know, mainframe.

  • The Old New Thing

    Outdoor Trek: Mirror, Mirror starts this weekend

    • 1 Comments

    As previously noted, Outdoor Trek will be staging live performances of the Star Trek episode Mirror, Mirror. The schedule is up.

    Three weekends starting this Saturday at Blanche Lavizzo Park. Saturday performances are 7pm; Sunday performances are 2pm. Admission is free.

    Attend if you dare.

  • The Old New Thing

    What does it mean when GetQueuedCompletionStatus return ERROR_SEM_TIMEOUT?

    • 9 Comments

    A customer asked for assistance interpreting a failure of the Get­Queued­Completion­Status function.

    We are observing that Get­Queued­Completion­Status is intermittently behaving as follows:

    • The handle is a SOCKET.
    • The function returns FALSE.
    • lpOverlapped != NULL.
    • Get­Last­Error reports ERROR_SEM_TIMEOUT: "The semaphore timeout period has expired."

    That's all the information we have in our log files. We don't know the value of number­Of­Bytes or completion­Key, sorry.

    We realize that this is a rather vague question, but when this problem hits our machines, it causes our internal logic to go into a reset state since it doesn't know what the error means or how to recover. Resetting is expensive, and we would prefer to handle this error in a less drastic manner, if only we knew what it meant.

    The error code ERROR_SEM_TIMEOUT is a rather bad translation of the underlying status code STATUS_IO_TIMEOUT, which is much more meaningful. It means that the I/O operation timed out.

    Colleagues of mine from the networking team chimed in with additional information:

    A common source of this error with TCP sockets is that the maximum retransmission count and timeout have been reached on a bad (or broken) link.

    If you know that the handle is a socket, then you can use WSA­Get­Overlapped­Result on the lpOverlapped that got returned. Winsock will convert the status code to something more Winsocky. In this case, it would have given you WSA­ETIMED­OUT, which makes it clearer what happened.

  • The Old New Thing

    How do I configure Windows Update programmatically?

    • 20 Comments

    First of all, normal programs shouldn't be messing with Windows Update configuration. That's something the user (or the user's administrator) decides. If you're an IT administrator, then you can use Group Policy to configure Windows Update on your network.

    But maybe you're a special case where the above remarks don't apply. Say you're a data center and all the systems are running inside of virtual machines and you don't want them installing updates or rebooting without your permission, so you want to run a script when you set up the image to disable updates.

    You can use the Microsoft.Update.Auto­Update object, known to native code as IAutomatic­Updates. Here's a script that prints your current update settings:

    var AU = new ActiveXObject("Microsoft.Update.AutoUpdate");
    var Settings = AU.Settings;
    WScript.StdOut.WriteLine(Settings.NotificationLevel);
    

    The notification levels are documented as Automatic­Updates­Notification­Level. If you want to change the notification level, you can update the level in the Settings object, and then save it.

    var AU = new ActiveXObject("Microsoft.Update.AutoUpdate");
    var Settings = AU.Settings;
    Settings.NotificationLevel = 1; // aunlDisabled
    Settings.Save();
    

    All the various settings are documented in MSDN, though you have to dig through IAutomatic­Updates­Settings, IAutomatic­Updates­Settings2, and IAutomatic­Updates­Settings3 to find them all.

  • The Old New Thing

    Why does Outlook map Ctrl+F to Forward instead of Find, like all right-thinking programs?

    • 97 Comments

    It's a widespread convention that the Ctrl+F keyboard shortcut initiates a Find operation. Word does it, Excel does it, Wordpad does it, Notepad does it, Internet Explorer does it. But Outlook doesn't. Why doesn't Outlook get with the program?

    Rewind to 1995.

    The mail team was hard at work on their mail client, known as Exchange (code name Capone, in keeping with all the Chicago-related code names from that era). Back in those days, the Ctrl+F keyboard shortcut did indeed call up the Find dialog, in accordance with convention.

    And then a bug report came in from a beta tester who wanted Ctrl+F to forward rather than find, because he had become accustomed to that keyboard shortcut from the email program he used before Exchange.

    That beta tester was Bill Gates.

  • The Old New Thing

    Enumerating integer compositions (the return of the binomial coefficients)

    • 12 Comments
    In number theory, a composition of an integer is an ordered sequence of positive integers which sum to the target value. For example, the value 3 can be written as 3, 1+2, 2+1, or 1+1+1.

    You can think about the target number as a string of stars, and a composition is a way of breaking the stars into groups. For example, here are the compositions of 3:

    * * *3
    *|* *1+2
    * *|*2+1
    *|*|*  1+1+1

    How would you generate all compositions of a particular length? In the above example, the compositions of length 2 would be 1+2 and 2+1. Let's take a look at the last star in the composition. If it is immediately preceded by a space, then removing it results in a string one star shorter, but with the same number of groups (but the last group is one star smaller). In other words, what's left behind is a composition of n − 1 of length k. You can recover the original string by adding a star at the end.

    On the other hand, if the last star is immediately preceded by a vertical line, then removing it deletes an entire group, so what remains is a string one star shorter with one fewer group. In other words, what's left behind is a composition of n − 1 of length k − 1. You can recover the original string by adding a separator and a star at the end.

    Therefore, our algorithm goes like this:

    • Handle base cases.
    • Otherwise,
      • Recursively call Compositions(n − 1, k) and add a star to the end. (I.e., increment the last term.)
      • Recursively call Compositions(n − 1, k − 1) and add a vertical line and a star to the end. (I.e., add a +1 to the end.)
    function Compositions(n, k, f) {
     if (n == 0) { return; }
     if (k == 1) { f([n]); return; }
     Compositions(n-1, k, function(s) {
      f(s.map(function(v, i) { // increment the last element
        return i == s.length - 1 ? v + 1 : v;
      }));
     });
     Compositions(n-1, k-1, function(s) {
      f(s.concat(1)); // append a 1
     });
    }
    
    Compositions(5, 3, function(s) { console.log(s.join("+")); });
    

    Once again, this algorithm should look awfully familiar, because we've seen it twice before, once in the context of enumerating subsets with binomial coefficients, and again when Enumerating bit strings with a specific number of bits set. All we're doing is decorating the results differently.

    Here's a way to see directly how compositions are the same as subset selection. Let's ignore the stars and instead look at the gaps between them.

    * * * * *
     ^ ^ ^ ^
    

    Each of the gaps can hold either a space or a vertical line. Breaking n into k pieces is the same as drawing k − 1 vertical lines in the n − 1 gaps. In other words, you have n − 1 locations and you want to choose k − 1 of them: Ta da, we converted the problem into generating subsets of size k − 1 from a collection of size n − 1. (In mathematics, this visualization is known as stars and bars.)

    Therefore, we could have made the Subsets function do the work:

    function Compositions(n, k, f) {
     Subsets(n-1, k-1, function(s) {
      s.push(n);
      f(s.map(function(v, i) { return v - (s[i-1]||0); }));
      s.pop();
     });
    }
    

    The callback merely calculates the differences between adjacent elements of the subset, which is the number of stars between each line. There is a little extra playing around in order to create a virtual vertical bar at the beginning and end.

    Since there is an incremental way of enumerating subsets, there should be an incremental way of enumerating compositions. If you look at how the incremental subset enumerator works, you can see how it maps to incremental composition enumeration: Incrementing an index is the same as moving a bar to the right, which maps to incrementing one term and decrementing the subsequent term. Resetting subsequent indices to the minimum values corresponds to setting the corresponding term to 1. The only trick is maintaining the value of the final term, which gathers all the values squeezed out of earlier terms.

    function NextComposition(s) {
     var k = s.length;
     for (var i = k - 1; i >= 1; i--) {
      if (s[i] > 1) {
       s[i]--;
       s[i-1]++;
       for (; i < k - 1; i++) { s[k-1] += s[i] - 1; s[i] = 1; }
       return true;
      }
     }
     return false;
    }
    

    Exercise: If you wanted to generate all compositions of any length, you could do it by generating all compositions of length 1, then compositions of length 2, and so on up to length n. What's the easier way of doing it?

    Bonus chatter: If you want to generate all partitions (which is like compositions, except that order doesn't matter), you can use this recursive version or this iterative one.

  • The Old New Thing

    If I duplicate a handle, can I keep using the duplicate after closing the original?

    • 10 Comments

    A customer asked whether it was okay to use a duplicated handle even after the original handle was closed.

    Yes. That's sort of why you would duplicate it.

    Duplicating a handle creates a second handle which refers to the same underlying object as the original. Once that's done, the two handles are completely equivalent. There's no way to know which was the original and which is the duplicate. Either handle can be used to access the underlying object, and the underlying object is not torn down until all handles to it have been closed.

    One tricky bit here is that since you have two ways to refer to the same thing, changes made to the object via one handle will be reflected when observed through the other handle. That's because the changes you're making are to the object itself, not to the handle. For example, if you duplicate the handle to an event, then you can set the event via either handle.

    That may all sound obvious, but one thing to watch out for is the case of file handles: The current file position is a property of the file object, not the handle. Say you duplicate a file handle and give the original to one component and the duplicate to another. Now, when either component reads from or writes to the file, it's going to change the current position of the file object, and consequently may confuse the other component (who may not have expected the current position to be changing). Also, if the underlying file is a synchronous file handle, the file operations on the underlying file will be synchronized. If one component starts a read, the other component won't be able to access the file object until that read completes.

    If you want to create a second handle to a file that has its own file pointer and is not synchronized against the first file handle, you can use the Re­Open­File function to create a second file object with its own synchronization and its own file position, but which refers to the same underlying file.

    (Don't forget to get your sharing modes right! The second file object's access and sharing modes must be compatible with access and sharing modes of the original file object. Otherwise the call will fail with a sharing violation.)

  • The Old New Thing

    Using the REFIID/void** pattern for returning COM objects for future-proofing and to avoid problems with dependencies

    • 36 Comments

    Suppose you have a function that creates a reference to a COM object:

    // pixie.h
    
    STDAPI CreateShellItemFromPixieDust(
        const PIXIEDUST *ppd,
        IShellItem **ppsi);
    

    There are a few issues with this design.

    First of all, it requires that whoever uses your header file must have included shlobj.h first, since that's where IShell­Item is defined. You could solve that problem by putting #include <shlobj.h> at the top of pixie.h, but that creates its own problems. For example, many header files alter their behavior based on symbols that have been #defined, and including that other header file as part of pixie.h means that it's going to use the settings that were active at the time pixie.h was included (which may not be what the clients of your header file are expecting). For example:

    #include <windows.h>
    #include <ole2.h>
    #include <pixie.h>
    #define STRICT_TYPED_ITEMIDS
    #include <shlobj.h>
    

    This program wants to use strict typed item IDs, so it defines the magic symbol before including shlobj.h. Unfortunately, that request is ignored because the pixie.h header file secretly included shlobj.h prematurely.

    This can get particularly messy if somebody wants to include shlobj.h with particular preprocessor tricks temporarily active.

    #include <windows.h>
    #include <ole2.h>
    #include <pixie.h>
    
    // The WINDOWDATA structure added in Windows Vista conflicts
    // with the one we defined back in 2000, so rename it to
    // WINDOWDATA_WINDOWS.
    #define WINDOWDATA WINDOWDATA_WINDOWS
    #include <shlobj.h>
    #undef WINDOWDATA
    
    // Here's our version of WINDOWDATA
    #include <contosotypes.h>
    

    This code works around a naming conflict that was created when Windows Vista added a structure called WINDOW­DATA to shlobj.h. The application already had a structure with the same name, so it has to rename the one in shlobj.h to some other name to avoid a redefinition error.

    If you made pixie.h include shlobj.h on its own, it would do so without this fancy renaming, and the developers of that code will curse and say something like this:

    #include <windows.h>
    #include <ole2.h>
    
    // The WINDOWDATA structure added in Windows Vista conflicts
    // with the one we defined back in 2000, so rename it to
    // WINDOWDATA_WINDOWS.
    #define WINDOWDATA WINDOWDATA_WINDOWS
    
    // pixie.h secretly includes shlobj.h so we have to put its #include
    // under WINDOWDATA protection.
    #include <pixie.h>
    
    #include <shlobj.h>
    
    #undef WINDOWDATA
    
    
    // Here's our version of WINDOWDATA
    #include <contosotypes.h>
    

    Another problem with the Create­Shell­Item­From­Pixie­Dust function is that it hard-codes the output interface to IShell­Item. When everybody moves on to IShell­Item2, all the callers will have to follow the Create­Shell­Item­From­Pixie­Dust call with a Query­Interface to get the interface they really want. (Which, if your object is out-of-process, could mean another round trip to the server.)

    The solution to both of these problems is to simply make the caller specify what type of object they want.

    // pixie.h
    
    STDAPI CreateShellItemFromPixieDust(
        const PIXIEDUST *ppd,
        REFIID riid,
        void **ppv);
    

    Now that we are no longer mentioning IShell­Item explicitly, we don't need to include shlobj.h any more. And if the caller wants IShell­Item2, they can just ask for it.

    Your creation function used to look like this:

    STDAPI CreateShellItemFromPixieDust(
        const PIXIEDUST *ppd,
        IShellItem **ppsi)
    {
        *ppsi = nullptr;
        IShellItem *psiResult;
        HRESULT hr = ... do whatever ...;
        if (SUCCEEDED(hr))
        {
           *ppsi = psiResult;
        }
        return hr;
    }
    

    You simply have to tweak the way you return the pointer:

    STDAPI CreateShellItemFromPixieDust(
        const PIXIEDUST *ppd,
        REFIID riid,
        void **ppv)
    {
        *ppsi = nullptr;
        IShellItem *psiResult;
        HRESULT hr = ... do whatever ...;
        if (SUCCEEDED(hr))
        {
           hr = psiResult->QueryInterface(riid, ppv);
           psiResult->Release();
        }
        return hr;
    }
    

    Callers of your function would go from

    IShellItem *psi;
    hr = CreateShellItemFromPixieDust(ppd, &psi);
    

    to

    IShellItem *psi;
    hr = CreateShellItemFromPixieDust(ppd, IID_PPV_ARGS(&psi));
    

    If the caller decides that they really want an IShell­Item2, they merely have to change their variable declaration; the call to the creation function is unchanged.

    IShellItem2 *psi;
    hr = CreateShellItemFromPixieDust(ppd, IID_PPV_ARGS(&psi));
    
  • The Old New Thing

    Why are only some of the Windows 7 system notification icons colorless?

    • 52 Comments

    André decided to play "gotcha" by noting that not all of the notification icons went colorless and wondered what the criteria was for deciding which ones to make colorless and which ones to make colorful.

    It's very simple: The design team generated colorless versions of the most commonly-seen notification icons. They didn't have the time to generate colorless versions of all notification icons, so they focused on the ones that gave them the most benefit.

    This is the standard tradeoff you have to make whenever you have finite resources. Eventually the marginal benefit of redrawing one more icon exceeds its marginal cost, at which point you stop. The marginal cost is measured not only in actual resources (designers can redraw only so many icons per day, and you have money to hire only so many designers) but also in opportunity cost (time spent redrawing icons to be colorless is time not spent on other design tasks).

    This is the same reason that not all icons in Windows XP were given the full-color perspective-view treatment. For example, nearly all of the icons in the Administrative Tools section are the old Windows 2000-style 16-color flat (or isometric) icons. The design team focused on the 100ish most commonly used icons and went to the effort of redrawing them in the Windows XP style. The more rarely-used icons were left in the old style because the cost of converting them did not merit the benefit.

    The same thing happened in Windows Vista, when the icon design changed yet again. The style became less stylized and more realistic, but not quite photorealistic, and the angle of presentation changed. The design team had the resources to convert the most commonly used icons, and the rest were left as they were.

    It's the Pareto Principle again. If you have finite resources (and who doesn't) you may find that you can get 80% of the benefit by doing only 20% of the work. And that leaves 80% of your capacity available to address some other problem.

  • The Old New Thing

    How do I turn off email reminders for my Windows Live calendar, and disable the birthday calendar while I'm at it?

    • 12 Comments

    While having lunch with my colleagues a few days ago, they complained that they get annoying email for all their calendar events.

    I mentioned, "So turn off the email notifications."

    "You can do that?" they replied, incredulous.

    Here's how you do it.

    First, go to calendar.live.com and sign in to your calendar.

    In the upper right corner of the screen, click on the gear icon.

    Sven Svensson
    👨

    From the drop-down menu, select Options. From the Options page, select the calendar you want to change.

    Edit your reminder and calendar settings
      Sven's calendar
      Birthday calendar

    In the Calendar settings, untick the checkboxes.

    Notifications
    Email notifications
    Get reminders for events and tasks
    Get daily agenda

    And then click Save to save your changes.

    My colleagues also noted "It used to mean something when somebody remembered your birthday. But now, everybody gets a notification on your birthday, and you get birthday wishes from random people you haven't talked to in years." Which led to the next complaint, "I just want to turn off the birthday calendar."

    No problem. Go back and click on the gear icon, and instead of selecting Options, look under Filter Calendars and untick the check box next to Birthday calendar.

    If you have a Windows Phone, you can remove it from your phone calendar by opening the Calendar app, opening the appbar, selecting settings, and then unticking the check box next to Birthday calendar.

Page 7 of 432 (4,317 items) «56789»