• The Old New Thing

    How did the invalid floating point operand exception get raised when I disabled it?


    Last time, we learned about the dangers of uninitialized floating point variables but left with a puzzle: Why wasn't this caught during internal testing?

    I dropped a hint when I described how SNaNs work: You have to ask the processor to raise an exception when it encounters a signaling NaN, and the program disabled that exception. Why was an exception being raised when it had been disabled?

    The clue to the cause was that the customer that was encountering the crash reported that it tended to happen after they printed a report. It turns out that the customer's printer driver was re-enabling the invalid operand exception in its DLL_PROCESS_ATTACH handler. Since the exception was enabled, the SNaN exception, which was previously masked, was now live, and it crashed the program.

    I've also seen DLLs change the floating point rounding state in their DLL_PROCESS_ATTACH handler. This behavior can be traced back to old versions of the C runtime library which reset the floating point state as part of their DLL_PROCESS_ATTACH; this behavior was corrected as long ago as 2002 (possibly even earlier; I don't know for sure). Obviously that printer driver was even older. Good luck convincing the vendor to fix a bug in a driver for a printer they most likely don't even manufacture any more. If anything, they'll probably just treat it as incentive for you to buy a new printer.

    When you load external code into your process, you implicitly trust that the code won't screw you up. This is just another example of how a DLL can inadvertently screw you up.


    One might argue that the LoadLibrary function should save the floating point state before loading a library and restore it afterwards. This is an easy suggestion to make in retrospect. Writing software would be so much easier if people would just extend the courtesy of coming up with a comprehensive list of "bugs applications will have that you should protect against" before you design the platform. That way, when a new class of application bugs is found, and they say "You should've protected against this!", you can point to the list and say, "Nuh, uh, you didn't put it on the list. You had your chance."

    As a mental exercise for yourself: Come up with a list of "all the bugs that the LoadLibrary function should protect against" and how the LoadLibrary function would go about doing it.

  • The Old New Thing

    What other programs are filtered from the Start menu's list of frequently-used programs?


    We already saw that programs in the pin list are pruned from the most-frequently-used programs list because they would be redundant. Another fine-tuning rule was introduced after the initial explorations with the new Windows XP Start menu: Programs with specific "noise" names are removed from consideration.

    Many "noise" programs were showing up as frequently used because they happened to be shortcuts to common helper programs like Notepad or Wordpad to display a "Read Me" document. These shortcuts needed to be filtered out so that they couldn't be nominated as, say, the Notepad representative. The list of English "poison words" is given in Knowledge Base article 282066.

    (Incidentally, a program can also register itself as not eligible for inclusion in the front page of the Start menu by creating a NoStartPage value in its application registration.)

    We'll see in the epilogue that Windows Vista uses an improved method for avoiding the "unwanted representative" problem.

  • The Old New Thing

    Points are earned by programs, not by shortcuts


    The first subtlety of the basic principle that determines which programs show up in the Start menu is something you may not have noticed when I stated it:

    Each time you launch a program, it "earns a point", and the longer you don't launch a program, the more points it loses.

    Notice that the rule talks about programs, not shortcuts.

    The "points" for a program are tallied from all the shortcuts that exist on the All Programs section of the Start menu. Many programs install multiple shortcuts, say one to the root of the All Programs menu and another to a deep folder. It doesn't matter how many shortcuts you have; if they all point to the same program, then it is that program that earns the points when you use any of the shortcuts.

    One the Start menu decides that a program has earned enough points to make it to the front page, it then has to choose which shortcut to use to represent that program. This is an easy decision if there's only one shortcut. If there are multiple shortcuts to the same program, then the most-frequently-used shortcut is selected as the one to appear on the front page of the Start menu.

    If you paid really close attention, you may have noticed a subtlety to this subtlety. We'll take that up next time.

    Please hold off your questions until the (two-week!) series is complete, because I suspect a later entry will answer them. (This series is an expansion upon the TechNet column on the same topic. If you've read the TechNet article, then a lot of this series will be review.)*


    *I wrote this last time, but that didn't stop people from asking questions anyway. I don't expect it'll work today either, but who knows, maybe you'll surprise me.

  • The Old New Thing

    Psychic debugging: IP on heap


    Somebody asked the shell team to look at this crash in a context menu shell extension.

    IP_ON_HEAP:  003996d0
    ChildEBP RetAddr
    00b2e1d8 68f79ca6 0x3996d0
    00b2e1f4 7713a7bd ATL::CWindowImplBaseT<
                               ATL::CWindow,ATL::CWinTraits<2147483648,0> >
    00b2e220 77134be0 USER32!InternalCallWinProc+0x23
    00b2e298 7713a967 USER32!UserCallWinProcCheckWow+0xe0
    eax=68f79c63 ebx=00000000 ecx=00cade10 edx=7770df14 esi=002796d0 edi=000603cc 
    eip=002796d0 esp=00cade4c ebp=00cade90 iopl=0         nv up ei pl nz na pe nc 
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206 
    002796d0 c744240444bafb68 mov     dword ptr [esp+4],68fbba44

    You should be able to determine the cause instantly.

    I replied,

    This shell extension is using a non-DEP-aware version of ATL. They need to upgrade to ATL 8 or disable DEP.

    This was totally obvious to me, but the person who asked the question met it with stunned amazement. I guess the person forgot that older versions of ATL are notorious DEP violators. You see a DEP violation, you see that it's coming from ATL, and bingo, you have your answer. When DEP was first introduced, the base team sent out mail to the entire Windows division saying, "Okay, folks, we're turning it on. You're going to see a lot of application compatibility problems, especially this ATL one."

    Psychic powers sometimes just means having a good memory.

    Even if you forgot that information, it's still totally obvious once you look at the scenario and understand what it's trying to do.

    The fault is IP_ON_HEAP which is precisely what DEP protects against. The next question is why IP ended up on the heap. Was it a mistake or intentional?

    Look at the circumstances surrounding the faulting instruction again. The faulting instruction is the window procedure for a window, and the action is storing a constant into the stack. The symbols of the caller tell us that it's some code in ATL, and you can even go look up the source code yourself:

    template <class TBase, class TWinTraits>
    LRESULT CALLBACK CWindowImplBaseT< TBase, TWinTraits >
      ::StartWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam) {
        CWindowImplBaseT< TBase, TWinTraits >* pThis =
                  (CWindowImplBaseT< TBase, TWinTraits >*)
        pThis->m_hWnd = hWnd; 
        pThis->m_thunk.Init(pThis->GetWindowProc(), pThis); 
        WNDPROC pProc = pThis->m_thunk.GetWNDPROC(); 
        ::SetWindowLongPtr(hWnd, GWLP_WNDPROC, (LONG_PTR)pProc);
        return pProc(hWnd, uMsg, wParam, lParam);

    Is pProc corrupted and we're jumping to a random address on the heap? Or was this intentional?

    ATL is clearly generating code on the fly (the window procedure thunk), and it is in execution of the thunk that we encounter the DEP exception.

    Now, you didn't need to have the ATL source code to realize that this is what's going on. It is a very common pattern in framework libraries to put a C++ wrapper around window procedures. Since C++ functions have a hidden this parameter, the wrappers need to sneak that parameter in somehow, and one common technique is to generate some code on the fly that sets up the hidden this parameter before calling the C++ function. The value at [esp+4] is the window handle, something that can be recovered from the this pointer, so it's a handly thing to replace with this before jumping to the real C++ function.

    The address being stored as the this parameter is 68fbba44, which is inside the DLL in question. (You can tell this because the return address, which points to the ATL thunk code, is at 68f79ca6 which is in the same neighborhood as the mystery pointer.) Therefore, this is almost certainly an ATL thunk for a static C++ object.

    In other words, this is extremely unlikely be a jump to a random address. The code at the address looks too good. It's probably jumping there intentionally, and the fact that it's coming from a window procedure thunk confirms it.

    But our tale is not over yet. The plot thickens. We'll continue next time.

  • The Old New Thing

    Why doesn't String.Format throw a FormatException if you pass too many parameters?


    Welcome to CLR Week 2009. As always, we start with a warm-up.

    The String.Format method doesn't throw a FormatException if you pass too many parameters, but it does if you pass too few. Why the asymmetry?

    Well, this is the type of asymmetry you see in the world a lot. You need a ticket for each person that attends a concert. If you have too few tickets, they won't let you in. If you have too many, well, that's a bit wasteful, but you can still get in; the extras are ignored. If you create an array with 10 elements and use only the first five, nobody is going to raise an ArrayBiggerThanNecessary exception. Similarly, the String.Format message doesn't mind if you pass too many parameters; it just ignores the extras. There's nothing harmful about it, just a bit wasteful.

    Besides, you probably don't want this to be an error:

    if (verbose) {
      format = "{0} is not {1} (because of {2})";
    } else {
      format = "{0} not {1}";
    String.Format(format, "Zero", "One", "Two");

    Think of the format string as a SELECT clause from the dataset provided by the remaining parameters. If your table has fields ID and NAME and you select just the ID, there's nothing wrong with that. But if you ask for DATE, then you have an error.

  • The Old New Thing

    2007 Q3 link clearance: Microsoft blogger edition


    A few random links that I've collected from other Microsoft bloggers.

  • The Old New Thing

    Why can't I pass a reference to a derived class to a function that takes a reference to a base class by reference?


    "Why can't I pass a reference to a derived class to a function that takes a reference to a base class by reference?" That's a confusing question, but it's phrased that way because the simpler phrasing is wrong!

    Ths misleading simplified phrasing of the question is "Why can't I pass a reference to a derived class to a function that takes a base class by reference?" And in fact the answer is "You can!"

    class Base { }
    class Derived : Base { }
    class Program {
      static void f(Base b) { }
      public static void Main()
          Derived d = new Derived();

    Our call to f passes a reference to the derived class to a function that takes a reference to the base class. This is perfectly fine.

    When people ask this question, they are typically wondering about passing a reference to the base class by reference. There is a double indirection here. You are passing a reference to a variable, and the variable is a reference to the base class. And it is this double reference that causes the problem.

    class Base { }
    class Derived : Base { }
    class Program {
      static void f(ref Base b) { }
      public static void Main()
          Derived d = new Derived();
          f(ref d); // error

    Adding the ref keyword to the parameter results in a compiler error:

    error CS1503: Argument '1': cannot convert from 'ref Derived' to 'ref Base'

    The reason this is disallowed is that it would allow you to violate the type system. Consider:

      static void f(ref Base b) { b = new Base(); }

    Now things get interesting. Your call to f(ref d) passes a reference to a Derived by reference. When the f function modifies its formal parameter b, it's actually modifying your variable d. What's worse, it's putting a Base in it! When f returns, your variable d, which is declared as being a reference to a Derived is actually a reference to the base class Base.

    At this point everything falls apart. Your program calls some method like d.OnlyInDerived(), and the CLR ends up executing a method on an object that doesn't even support that method.

    You actually knew this; you just didn't know it. Let's start from the easier cases and work up. First, passing a reference into a function:

    void f(SomeClass s);
       T t = new T();

    The function f expects to receive a reference to a SomeClass, but you're passing a reference to a T. When is this legal?

    "Duh. T must be SomeClass or a class derived from SomeClass."

    What's good for the goose is good for the gander. When you pass a parameter as ref, it not only goes into the method, but it also comes out. (Not strictly true but close enough.) You can think of it as a bidirectional parameter to the function call. Therefore, the rule "If a function expects a reference to a class, you must provide a reference to that class or a derived class" applies in both directions. When the parameter goes in, you must provide a reference to that class or a derived class. And when the parameter comes out, it also must be a reference to that class or a derived class (because the function is "passing the parameter" back to you, the caller).

    But the only time that S can be T or a subclass, while simultaneously having T be S or a subclass is when S and T are the same thing. This is just the law of antisymmetry for partially-ordered sets: "if a ≤ b and b ≤ a, then a = b."

  • The Old New Thing

    If you pin a program, it doesn't show up in the frequently-used programs list


    After the initial explorations with the Windows XP Start menu, we had to add a rule that fine-tuned the results: If a program is pinned, then it is removed from consideration as a frequently-used program.

    For example, if you right-click Lotus Notes and select "Pin to Start menu", then it goes into the pin list and will never show up in the dynamic portion of the front page of the Start menu. This tweak was added to avoid the ugly situation where you have two icons for the same program on the front page of the Start menu, when only one would do the job.

    This is another manifestation of the "Don't show me something I already know" principle, which we saw earlier when we discussed why the All Programs list doesn't use Intellimenus. After all, you pinned the program to your Start menu because you run it often. There's no point in showing it again at the top of your "frequently-used" list; you knew that already! Use that scarce real estate to show the user something that is actually of value.

    Next time, another fine-tuning rule that tries to filter the noise from the results.

  • The Old New Thing

    The program doesn't have to be run from the Start menu to earn Start menu points


    There's a second subtlety to the basic principle that determines which programs show up in the Start menu:

    Each time you launch a program, it "earns a point", and the longer you don't launch a program, the more points it loses.

    Since programs earn points and not shortcuts, a program can earn points even if you don't use the Start menu to run it.

    In usability studies, we often see people who run programs by digging through their Program Files directory until they find an icon that looks promising and then double-click it. If there is a shortcut on the All Programs section of the Start menu that points to the same program, then that shortcut will eventually work its way onto the front page, assuming the user runs the program often enough.

    This is why you will see a program appear on the front page of the Start menu even though you never ran it from the Start menu. The program earned points because you ran the program manually, or because you opened a document that is associated with that program. Promoting a program run this way helps users realize that they can run Backgammon from the Start menu instead of having to open My Computer, then click on my C drive, then click on Program Files, then MSN Gaming Zone, then Windows, and then double-click the icon with the strange name bckgzm. I've seen usability sessions where the users did this repeatedly, and they considered it perfectly normal, albeit frustrating. "Computers are so hard to use."

    Next time, we'll look at how the pin list influences the list of frequently-used programs.

  • The Old New Thing

    Data breakpoints are based on the linear address, not the physical address


    When you ask the debugger to set a read or write breakpoint, the breakpoint fires only if the address is read from or written to by the address you specify. If the memory is mapped to another address and modified at that other address, then your breakpoint won't see it.

    For example, if you have multiple views on the same data, then modifications to that data via alternate addresses will not trigger the breakpoint.

    The hardware breakpoint status is part of the processor context, which is maintained on a per-thread basis. Each thread maintains its own virtualized hardware breakpoint status. You don't notice this in practice because debuggers are kind enough to replicate the breakpoint state across all threads in a process so that the breakpoint fires regardless of which thread triggers it. But that replication typically doesn't extend beyond the process you're debugging; the debugger doesn't bother replicating your breakpoints into other processes! This means that if you set a write breakpoint on a block of shared memory, and the write occurs in some other process, your breakpoint won't fire since it's not your process that wrote to it.

    When you call into kernel mode, there is another context switch, this time between user mode and kernel mode, and the kernel mode context of course doesn't have your data breakpoint. Which is a good thing, because if that data breakpoint fired in kernel mode, how is your user-mode debugger expected to be able to make any sense of it? The breakpoint fired when executing code that user mode doesn't have permission to access, and it may have fired while the kernel mode code owned an important critical section or spinlock, a critical section the debugger itself may very well need. Imagine if the memory were accessed by the keyboard driver. Oops, now your keyboard processing has been suspended. Even worse, what if the memory were accessed by a a hardware interrupt handler? Hardware interrupt handlers can't even access paged memory, much less allow user-mode code to run.

    This "program being debugged takes a lock that the debugger itself needs" issue isn't usually a problem when a user-mode debugger debugs a user-mode process, because the locks held by a user-mode process typically affect only that process. If a process takes a critical section, sure that may deadlock the process, but the debugger is not part of the process, so it doesn't care.

    Of course, the "debugger is its own world" principle falls apart if the debugger is foolish enough to require a lock that the program being debugged also uses. Debugger authors therefore must be careful to avoid these sorts of cross-process dependencies. (News flash: Writing a debugger is hard.) You can still run into trouble if the program being debugged has done something with global consequences like create a fullscreen topmost window (thereby covering the debugger) or installed a global keyboard hook (thereby interfering with typing). If you've tried debugging a system service, you may have run into this sort of cross-process deadlock. For example, if you debug the service that is responsible for the networking client, and the debugger tries to access the network (for example, to load symbols), you've created a deadlock since the debugger needs to access the network, which it can't do because the networking service is stopped in the debugger.

    Hardware debugging breakpoints are a very convenient tool for chasing down bugs, but you have to understand their limitations.

    Additional reading: Data breakpoint oddities.

Page 7 of 449 (4,482 items) «56789»