• The Old New Thing

    Why can't I declare a type that derives from a generic type parameter?

    A lot of questions about C# generics come from the starting point that they are just a cutesy C# name for C++ templates. But while the two may look similar in the source code, they are actually quite different.

    C++ templates are macros on steroids. No code gets generated when a template is "compiled"; the compiler merely hangs onto the source code, and when you finally instantiate it, the actual type is inserted and code generation takes place.

    // C++ template
    template<class T>
    class Abel
    {
    public:
     int DoBloober(T t, int i) { return t.Bloober(i); }
    };
    

    This is a perfectly legal (if strange) C++ template class. But when the compiler encounters this template, there are a whole bunch of things left unknown. What is the return type of T::Bloober? Can it be converted to an int? Is T::Bloober a static method? An instance method? A virtual instance method? A method on a virtual base class? What is the calling convention? Does T::Bloober take an int argument? Or maybe it's a double? Even stranger, it might accept a Canoe which gets converted from an int by a converting constructor. Or maybe it's a function that takes two parameters, but the second parameter has a default value.

    Nobody knows the answers to these questions, not even the compiler. It's only when you decide to instantiate the template

    Abel<Baker> abel;
    

    that these burning questions can be answered, overloaded operators can be resolved, conversion operators can be hunted down, parameters can get pushed on the stack in the correct order, and the correct type of call instruction can be generated.

    In fact, the compiler doesn't even care whether or not Baker has a Bloober method, as long as you never call Abel<Baker>::DoBloober!

    void f()
    {
     Abel<int> a; // no error!
    }
    
    void g()
    {
     Abel<int> a;
     a.DoBloober(0, 1); // error here
    }
    
    Only if you actually call the method does the compiler start looking for how it can generate code for the DoBloober method.
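
    To make those deferred questions concrete, here is a minimal sketch in which the same template compiles against two very different Bloober methods. The Baker and Charlie definitions are hypothetical, invented purely for illustration; only Abel comes from the example above.

```cpp
#include <cassert>

// Hypothetical type: Bloober takes a double and has a defaulted second
// parameter. The int passed by Abel is converted only at instantiation time.
struct Baker {
    double Bloober(double d, int scale = 2) { return d * scale; }
};

// Hypothetical type: a static method also satisfies t.Bloober(i).
struct Charlie {
    static int Bloober(int i) { return i + 100; }
};

template<class T>
class Abel {
public:
    int DoBloober(T t, int i) { return t.Bloober(i); }
};

// Legal even though int has no Bloober: DoBloober is never called,
// so its body is never instantiated for T = int.
inline int SizeOfAbelInt() {
    Abel<int> a;
    return static_cast<int>(sizeof(a));
}

// Instantiates Abel<Baker>::DoBloober: 3 -> 3.0, times 2 -> 6.0, back to int.
inline int CallBaker() {
    Abel<Baker> a;
    return a.DoBloober(Baker{}, 3);
}

// Instantiates Abel<Charlie>::DoBloober, which ends up calling a static method.
inline int CallCharlie() {
    Abel<Charlie> a;
    return a.DoBloober(Charlie{}, 3);
}
```

    With these definitions, CallBaker() returns 6 and CallCharlie() returns 103. The same line of template source produced a conversion plus a defaulted argument in one case and a static call in the other.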

    C# generics aren't like that.

    Unlike C++, where a non-instantiated template exists only in the imaginary world of potential code that could exist but doesn't, a C# generic results in code being generated, but with placeholders where the type parameter should be inserted.

    This is why you can use generics implemented in another assembly, even without the source code to that generic. This is why a generic can be recompiled without having to recompile all the assemblies that use that generic. The code for the generic is generated when the generic is compiled. By comparison, no code is generated for C++ templates until the template is instantiated.

    What this means for C# generics is that if you want to do something with your type parameter, it has to be something that the compiler can figure out how to do without knowing what T is. Let's look at the example that generated today's question.

    class Foo<T>
    {
     class Bar : T
     { ... }
    }
    

    This is flagged as an error by the compiler:

    error CS0689: Cannot derive from 'T' because it is a type parameter
    

    Deriving from a generic type parameter is explicitly forbidden by 25.1.1 of the C# language specification. Consider:

    class Foo<T>
    {
     class Bar : T
     {
       public void FlubberMe()
       {
         Flubber(0);
       }
     }
    }
    

    The compiler doesn't have enough information to generate the IL for the FlubberMe method. One possibility is

    ldarg.0        // "this"
    ldc.i4.0    // integer 0 - is this right?
    call T.Flubber // is this the right type of call?
    

    The line ldc.i4.0 is a guess. If the method T.Flubber were actually void Flubber(long l), then the line would have to be ldc.i4.0; conv.i8 to load an 8-byte integer onto the stack instead of a 4-byte integer. Or perhaps it's void Flubber(object o), in which case the zero needs to be boxed.

    And what about that call instruction? Should it be a call or callvirt?

    And what if the method returned a value, say, string Flubber(int i)? Now the compiler also has to generate code to discard the return value from the top of the stack.

    Since the source code for a generic is not included in the assembly, all these questions have to be answered at the time the generic is compiled. Besides, you can write a generic in Managed C++ and use it from VB.NET. Even saving the source code won't be much help if the generic was implemented in a language you don't have the compiler for!
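
    For contrast, here is a rough C++ counterpart of the forbidden C# class. It is legal in C++ precisely because code generation waits until T is known. The TakesLong and ReturnsString types are hypothetical stand-ins for the "what if Flubber looks like this?" cases discussed above.

```cpp
#include <cassert>
#include <string>

template<class T>
struct Foo {
    // Deriving from the template parameter is allowed in C++.
    struct Bar : T {
        // "this->" defers the lookup of Flubber until T is known.
        void FlubberMe() { this->Flubber(0); }
    };
};

// Hypothetical base: Flubber takes a long, so the 0 gets widened.
struct TakesLong {
    long seen = -1;
    void Flubber(long l) { seen = l; }
};

// Hypothetical base: Flubber returns a value, which FlubberMe discards.
struct ReturnsString {
    int calls = 0;
    std::string Flubber(int i) { ++calls; return std::to_string(i); }
};

inline long RunTakesLong() {
    Foo<TakesLong>::Bar bar;
    bar.FlubberMe();
    return bar.seen;   // 0 after the call
}

inline int RunReturnsString() {
    Foo<ReturnsString>::Bar bar;
    bar.FlubberMe();
    return bar.calls;  // 1 after the call
}
```

    Each instantiation gets its own answer to the widening and return-value questions, which is exactly the information a C# compiler does not have when it compiles the generic once.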

  • The Old New Thing

    Why doesn't String.Format throw a FormatException if you pass too many parameters?

    Welcome to CLR Week 2009. As always, we start with a warm-up.

    The String.Format method doesn't throw a FormatException if you pass too many parameters, but it does if you pass too few. Why the asymmetry?

    Well, this is the type of asymmetry you see in the world a lot. You need a ticket for each person that attends a concert. If you have too few tickets, they won't let you in. If you have too many, well, that's a bit wasteful, but you can still get in; the extras are ignored. If you create an array with 10 elements and use only the first five, nobody is going to raise an ArrayBiggerThanNecessary exception. Similarly, the String.Format method doesn't mind if you pass too many parameters; it just ignores the extras. There's nothing harmful about it, just a bit wasteful.

    Besides, you probably don't want this to be an error:

    if (verbose) {
      format = "{0} is not {1} (because of {2})";
    } else {
      format = "{0} not {1}";
    }
    String.Format(format, "Zero", "One", "Two");
    

    Think of the format string as a SELECT clause from the dataset provided by the remaining parameters. If your table has fields ID and NAME and you select just the ID, there's nothing wrong with that. But if you ask for DATE, then you have an error.
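
    The asymmetry falls out naturally from how a positional formatter works. The following is a toy sketch, not String.Format itself: FormatPositional is a hypothetical helper that understands only single-digit {N} placeholders, with no escaping and no format specifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical mini-formatter: replaces {0}..{9} with the matching argument.
// Asking for an argument that was not supplied is an error;
// supplying arguments that are never asked for is not.
inline std::string FormatPositional(const std::string& format,
                                    const std::vector<std::string>& args)
{
    std::string out;
    for (std::size_t i = 0; i < format.size(); ++i) {
        const bool isPlaceholder =
            format[i] == '{' && i + 2 < format.size() &&
            format[i + 1] >= '0' && format[i + 1] <= '9' &&
            format[i + 2] == '}';
        if (!isPlaceholder) {
            out += format[i];
            continue;
        }
        const std::size_t index = static_cast<std::size_t>(format[i + 1] - '0');
        if (index >= args.size()) {
            throw std::out_of_range("placeholder refers to a missing argument");
        }
        out += args[index];
        i += 2;  // skip the digit and the closing brace
    }
    return out;
}
```

    The formatter only ever looks at the arguments the format string selects, so the unused "Two" in the example above is never touched, while a {2} with only two arguments supplied has nothing to select from.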

  • The Old New Thing

    Is DEP on or off on Windows XP Service Pack 2?

    Last time, we traced an IP_ON_HEAP failure to a shell extension that used an older version of ATL which was not DEP-friendly. But that led to a follow-up question:

    Why aren't we seeing this same crash in the main program as in the shell extension? That program uses the same version of ATL, but it doesn't crash.

    The reason is given in this chart. Notice that the default configuration is OptIn, which means that DEP is off for all processes by default, but is on for all Windows system components. That same part of the page describes how you can change to OptOut so that the default is to turn on DEP for all processes except for the ones you put on the exception list. There's more information in this excerpt from the "Changes to Functionality in Microsoft Windows XP Service Pack 2" document.

    The program that comes with the shell extension is not part of Windows, so DEP is disabled by default. But Explorer is part of Windows, so DEP is enabled for Explorer by default. That's why only Explorer encounters this problem.
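
    The decision described above can be written down as a tiny model. This is only a sketch of the two configurations mentioned here, not a Windows API: DepPolicy, ProcessInfo and IsDepEnabled are hypothetical names, and the model ignores other policy values and any explicit opt-in by individual programs.

```cpp
#include <cassert>

// Hypothetical model of the two system-wide configurations named in the text.
enum class DepPolicy { OptIn, OptOut };

struct ProcessInfo {
    bool isWindowsComponent;  // e.g. Explorer
    bool onExceptionList;     // only consulted under OptOut
};

// Returns whether DEP applies to the process under the simplified rules above.
inline bool IsDepEnabled(DepPolicy policy, const ProcessInfo& process)
{
    switch (policy) {
    case DepPolicy::OptIn:
        // Off by default, on for Windows system components.
        return process.isWindowsComponent;
    case DepPolicy::OptOut:
        // On by default, off for processes on the exception list.
        return !process.onExceptionList;
    }
    return false;  // not reached for the two values above
}
```

    Under OptIn, the shell extension's own program (not a Windows component) runs without DEP, while the same extension loaded into Explorer runs with it.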

    (This little saga does illustrate the double-edged sword of extensibility. If you make your system extensible, you allow other people to add features to it. On the other hand, you also allow other people to add bugs to it.)

    The saga of the DEP exception is not over, however, because it turns out I've been lying to you. More information tomorrow.

  • The Old New Thing

    2007 Q3 link clearance: Microsoft blogger edition

    A few random links that I've collected from other Microsoft bloggers.

  • The Old New Thing

    Psychic debugging: IP on heap

    Somebody asked the shell team to look at this crash in a context menu shell extension.

    IP_ON_HEAP:  003996d0
    
    ChildEBP RetAddr
    00b2e1d8 68f79ca6 0x3996d0
    00b2e1f4 7713a7bd ATL::CWindowImplBaseT<
                               ATL::CWindow,ATL::CWinTraits<2147483648,0> >
                         ::StartWindowProc+0x43
    00b2e220 77134be0 USER32!InternalCallWinProc+0x23
    00b2e298 7713a967 USER32!UserCallWinProcCheckWow+0xe0
    ...
    
    eax=68f79c63 ebx=00000000 ecx=00cade10 edx=7770df14 esi=002796d0 edi=000603cc 
    eip=002796d0 esp=00cade4c ebp=00cade90 iopl=0         nv up ei pl nz na pe nc 
    cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00010206 
    002796d0 c744240444bafb68 mov     dword ptr [esp+4],68fbba44
    

    You should be able to determine the cause instantly.

    I replied,

    This shell extension is using a non-DEP-aware version of ATL. They need to upgrade to ATL 8 or disable DEP.

    This was totally obvious to me, but the person who asked the question met it with stunned amazement. I guess the person forgot that older versions of ATL are notorious DEP violators. You see a DEP violation, you see that it's coming from ATL, and bingo, you have your answer. When DEP was first introduced, the base team sent out mail to the entire Windows division saying, "Okay, folks, we're turning it on. You're going to see a lot of application compatibility problems, especially this ATL one."

    Psychic powers sometimes just means having a good memory.

    Even if you forgot that information, it's still totally obvious once you look at the scenario and understand what it's trying to do.

    The fault is IP_ON_HEAP which is precisely what DEP protects against. The next question is why IP ended up on the heap. Was it a mistake or intentional?

    Look at the circumstances surrounding the faulting instruction again. The faulting instruction is the window procedure for a window, and the action is storing a constant into the stack. The symbols of the caller tell us that it's some code in ATL, and you can even go look up the source code yourself:

    template <class TBase, class TWinTraits>
    LRESULT CALLBACK CWindowImplBaseT< TBase, TWinTraits >
      ::StartWindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam) {
        CWindowImplBaseT< TBase, TWinTraits >* pThis =
                  (CWindowImplBaseT< TBase, TWinTraits >*)
                      _AtlWinModule.ExtractCreateWndData();
        pThis->m_hWnd = hWnd; 
        pThis->m_thunk.Init(pThis->GetWindowProc(), pThis); 
        WNDPROC pProc = pThis->m_thunk.GetWNDPROC(); 
        ::SetWindowLongPtr(hWnd, GWLP_WNDPROC, (LONG_PTR)pProc);
        return pProc(hWnd, uMsg, wParam, lParam);
    } 
    

    Is pProc corrupted and we're jumping to a random address on the heap? Or was this intentional?

    ATL is clearly generating code on the fly (the window procedure thunk), and it is in execution of the thunk that we encounter the DEP exception.

    Now, you didn't need to have the ATL source code to realize that this is what's going on. It is a very common pattern in framework libraries to put a C++ wrapper around window procedures. Since C++ functions have a hidden this parameter, the wrappers need to sneak that parameter in somehow, and one common technique is to generate some code on the fly that sets up the hidden this parameter before calling the C++ function. The value at [esp+4] is the window handle, something that can be recovered from the this pointer, so it's a handy thing to replace with this before jumping to the real C++ function.

    The address being stored as the this parameter is 68fbba44, which is inside the DLL in question. (You can tell this because the return address, which points to the ATL thunk code, is at 68f79ca6 which is in the same neighborhood as the mystery pointer.) Therefore, this is almost certainly an ATL thunk for a static C++ object.

    In other words, this is extremely unlikely to be a jump to a random address. The code at the address looks too good. It's probably jumping there intentionally, and the fact that it's coming from a window procedure thunk confirms it.
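
    The "looks too good" judgment can be checked by hand against the bytes in the dump. Here is a small sketch that recognizes the 32-bit x86 encoding of mov dword ptr [esp+4], imm32 and extracts the immediate. The function name is made up for illustration, and it decodes only this one instruction form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if the eight bytes encode
// "mov dword ptr [esp+4], imm32" and stores the immediate in 'immediate'.
//   C7       opcode: mov r/m32, imm32 (with /0 in the ModRM reg field)
//   44       ModRM: mod=01 (disp8), reg=000, rm=100 (SIB byte follows)
//   24       SIB: base=esp, no index
//   04       disp8: +4
//   xx*4     imm32, little-endian
inline bool DecodeStoreToFirstArgument(const std::uint8_t (&code)[8],
                                       std::uint32_t& immediate)
{
    if (code[0] != 0xC7 || code[1] != 0x44 ||
        code[2] != 0x24 || code[3] != 0x04) {
        return false;
    }
    immediate = static_cast<std::uint32_t>(code[4]) |
                (static_cast<std::uint32_t>(code[5]) << 8) |
                (static_cast<std::uint32_t>(code[6]) << 16) |
                (static_cast<std::uint32_t>(code[7]) << 24);
    return true;
}
```

    Feeding it the bytes from the dump, c7 44 24 04 44 ba fb 68, yields 0x68fbba44, the value the thunk stores over the window handle.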

    But our tale is not over yet. The plot thickens. We'll continue next time.

  • The Old New Thing

    Data breakpoints are based on the linear address, not the physical address

    When you ask the debugger to set a read or write breakpoint, the breakpoint fires only if the memory is read from or written to via the address you specify. If the memory is mapped to another address and modified at that other address, then your breakpoint won't see it.

    For example, if you have multiple views on the same data, then modifications to that data via alternate addresses will not trigger the breakpoint.
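
    Here is one way to end up with two views of the same data. This sketch uses POSIX mmap rather than the Windows file-mapping functions, purely so that it is short and self-contained; the names are hypothetical, and error handling and cleanup are minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <sys/mman.h>
#include <unistd.h>

struct TwoViews {
    int* first;
    int* second;
};

// Maps the same temporary file twice with MAP_SHARED, producing two
// different addresses backed by the same memory.
// Returns {nullptr, nullptr} on failure. The sketch never unmaps the views;
// a real program would call munmap on each.
inline TwoViews MapSameMemoryTwice()
{
    TwoViews views = { nullptr, nullptr };
    std::FILE* file = std::tmpfile();
    if (!file) return views;

    const int fd = fileno(file);
    const std::size_t size = 4096;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) {
        std::fclose(file);
        return views;
    }

    void* a = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void* b = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    std::fclose(file);  // the mappings remain valid after the file is closed
    if (a == MAP_FAILED || b == MAP_FAILED) return views;

    views.first = static_cast<int*>(a);
    views.second = static_cast<int*>(b);
    return views;
}
```

    A write through views.second changes the value you read through views.first, but a data breakpoint set on the address views.first would not fire for that write, because the write used a different address.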

    The hardware breakpoint status is part of the processor context, which is maintained on a per-thread basis. Each thread maintains its own virtualized hardware breakpoint status. You don't notice this in practice because debuggers are kind enough to replicate the breakpoint state across all threads in a process so that the breakpoint fires regardless of which thread triggers it. But that replication typically doesn't extend beyond the process you're debugging; the debugger doesn't bother replicating your breakpoints into other processes! This means that if you set a write breakpoint on a block of shared memory, and the write occurs in some other process, your breakpoint won't fire since it's not your process that wrote to it.

    When you call into kernel mode, there is another context switch, this time between user mode and kernel mode, and the kernel mode context of course doesn't have your data breakpoint. Which is a good thing, because if that data breakpoint fired in kernel mode, how is your user-mode debugger expected to be able to make any sense of it? The breakpoint fired when executing code that user mode doesn't have permission to access, and it may have fired while the kernel mode code owned an important critical section or spinlock, a critical section the debugger itself may very well need. Imagine if the memory were accessed by the keyboard driver. Oops, now your keyboard processing has been suspended. Even worse, what if the memory were accessed by a hardware interrupt handler? Hardware interrupt handlers can't even access paged memory, much less allow user-mode code to run.

    This "program being debugged takes a lock that the debugger itself needs" issue isn't usually a problem when a user-mode debugger debugs a user-mode process, because the locks held by a user-mode process typically affect only that process. If a process takes a critical section, sure that may deadlock the process, but the debugger is not part of the process, so it doesn't care.

    Of course, the "debugger is its own world" principle falls apart if the debugger is foolish enough to require a lock that the program being debugged also uses. Debugger authors therefore must be careful to avoid these sorts of cross-process dependencies. (News flash: Writing a debugger is hard.) You can still run into trouble if the program being debugged has done something with global consequences, like creating a fullscreen topmost window (thereby covering the debugger) or installing a global keyboard hook (thereby interfering with typing). If you've tried debugging a system service, you may have run into this sort of cross-process deadlock. For example, if you debug the service that is responsible for the networking client, and the debugger tries to access the network (for example, to load symbols), you've created a deadlock since the debugger needs to access the network, which it can't do because the networking service is stopped in the debugger.

    Hardware debugging breakpoints are a very convenient tool for chasing down bugs, but you have to understand their limitations.

    Additional reading: Data breakpoint oddities.

  • The Old New Thing

    Why can't I pass a reference to a derived class to a function that takes a reference to a base class by reference?

    "Why can't I pass a reference to a derived class to a function that takes a reference to a base class by reference?" That's a confusing question, but it's phrased that way because the simpler phrasing is wrong!

    The misleading simplified phrasing of the question is "Why can't I pass a reference to a derived class to a function that takes a base class by reference?" And in fact the answer is "You can!"

    class Base { }
    class Derived : Base { }
    
    class Program {
      static void f(Base b) { }
    
      public static void Main()
      {
          Derived d = new Derived();
          f(d);
      }
    }
    

    Our call to f passes a reference to the derived class to a function that takes a reference to the base class. This is perfectly fine.

    When people ask this question, they are typically wondering about passing a reference to the base class by reference. There is a double indirection here. You are passing a reference to a variable, and the variable is a reference to the base class. And it is this double reference that causes the problem.

    class Base { }
    class Derived : Base { }
    
    class Program {
      static void f(ref Base b) { }
    
      public static void Main()
      {
          Derived d = new Derived();
          f(ref d); // error
      }
    }
    

    Adding the ref keyword to the parameter results in a compiler error:

    error CS1503: Argument '1': cannot convert from 'ref Derived' to 'ref Base'
    

    The reason this is disallowed is that it would allow you to violate the type system. Consider:

      static void f(ref Base b) { b = new Base(); }
    

    Now things get interesting. Your call to f(ref d) passes a reference to a Derived by reference. When the f function modifies its formal parameter b, it's actually modifying your variable d. What's worse, it's putting a Base in it! When f returns, your variable d, which is declared as being a reference to a Derived, is actually a reference to the base class Base.

    At this point everything falls apart. Your program calls some method like d.OnlyInDerived(), and the CLR ends up executing a method on an object that doesn't even support that method.

    You actually knew this; you just didn't know it. Let's start from the easier cases and work up. First, passing a reference into a function:

    void f(SomeClass s);
    
    ...
       T t = new T();
       f(t);
    

    The function f expects to receive a reference to a SomeClass, but you're passing a reference to a T. When is this legal?

    "Duh. T must be SomeClass or a class derived from SomeClass."

    What's good for the goose is good for the gander. When you pass a parameter as ref, it not only goes into the method, but it also comes out. (Not strictly true but close enough.) You can think of it as a bidirectional parameter to the function call. Therefore, the rule "If a function expects a reference to a class, you must provide a reference to that class or a derived class" applies in both directions. When the parameter goes in, you must provide a reference to that class or a derived class. And when the parameter comes out, it also must be a reference to that class or a derived class (because the function is "passing the parameter" back to you, the caller).

    But the only time that S can be T or a subclass of T, while T is simultaneously S or a subclass of S, is when S and T are the same thing. (Here S is the type the function declares for its parameter, and T is the type of the variable you pass.) This is just the law of antisymmetry for partially-ordered sets: "if a ≤ b and b ≤ a, then a = b."
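
    The same rule shows up in C++, used here only as an analogy to the C# ref case: a pointer passed by value converts from derived to base, but a pointer passed by non-const reference must match exactly. A minimal sketch, with Base and Derived mirroring the classes above and the helper names invented for illustration:

```cpp
#include <cassert>
#include <type_traits>

struct Base { virtual ~Base() {} };
struct Derived : Base { int OnlyInDerived() const { return 42; } };

// Passing by value goes one way only (in), so derived-to-base is fine.
static_assert(std::is_convertible<Derived*, Base*>::value,
              "a Derived* can be passed where a Base* is expected");

// Passing by reference goes both in and out, so the types must match.
static_assert(!std::is_convertible<Derived*&, Base*&>::value,
              "a Derived* variable cannot be bound to a Base*& parameter");

// What the rule protects against: the callee may store any Base* in the
// caller's variable.
inline void Replace(Base*& b, Base* replacement) { b = replacement; }

// With matching types the call is allowed, and the caller's variable
// really is overwritten.
inline bool SameTypePointerCanBeReplaced()
{
    Derived d;
    Base other;
    Base* p = &d;
    Replace(p, &other);
    return p == &other;
}
```

    If the second static_assert did not hold, Replace could leave a Derived* variable pointing at a plain Base, and a later call to OnlyInDerived would run on an object that does not have that method.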

  • The Old New Thing

    How did the invalid floating point operand exception get raised when I disabled it?

    Last time, we learned about the dangers of uninitialized floating point variables but left with a puzzle: Why wasn't this caught during internal testing?

    I dropped a hint when I described how SNaNs work: You have to ask the processor to raise an exception when it encounters a signaling NaN, and the program disabled that exception. Why was an exception being raised when it had been disabled?

    The clue to the cause was that the customer that was encountering the crash reported that it tended to happen after they printed a report. It turns out that the customer's printer driver was re-enabling the invalid operand exception in its DLL_PROCESS_ATTACH handler. Since the exception was enabled, the SNaN exception, which was previously masked, was now live, and it crashed the program.

    I've also seen DLLs change the floating point rounding state in their DLL_PROCESS_ATTACH handler. This behavior can be traced back to old versions of the C runtime library which reset the floating point state as part of their DLL_PROCESS_ATTACH; this behavior was corrected as long ago as 2002 (possibly even earlier; I don't know for sure). Obviously that printer driver was even older. Good luck convincing the vendor to fix a bug in a driver for a printer they most likely don't even manufacture any more. If anything, they'll probably just treat it as incentive for you to buy a new printer.

    When you load external code into your process, you implicitly trust that the code won't screw you up. This is just another example of how a DLL can inadvertently screw you up.
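
    If you know a particular piece of external code misbehaves this way, one defensive option on the application side is to save the floating point environment before calling into it and restore it afterward. The sketch below uses the standard <cfenv> functions; FloatingPointStateGuard and MisbehavingLibraryInit are hypothetical names, and the latter merely simulates a library that changes the rounding mode.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical RAII guard: captures the floating point environment
// (exception flags, rounding mode, and other implementation-defined state)
// on construction and restores it on destruction.
class FloatingPointStateGuard {
public:
    FloatingPointStateGuard() { std::fegetenv(&saved_); }
    ~FloatingPointStateGuard() { std::fesetenv(&saved_); }
    FloatingPointStateGuard(const FloatingPointStateGuard&) = delete;
    FloatingPointStateGuard& operator=(const FloatingPointStateGuard&) = delete;
private:
    std::fenv_t saved_;
};

// Stand-in for external code that tampers with the floating point state.
inline void MisbehavingLibraryInit() { std::fesetround(FE_UPWARD); }

// Returns true if the rounding mode is unchanged after the guarded call.
inline bool RoundingModeSurvivesGuardedCall()
{
    const int before = std::fegetround();
    {
        FloatingPointStateGuard guard;
        MisbehavingLibraryInit();
    }   // guard restores the saved environment here
    return std::fegetround() == before;
}
```

    This protects only the calls you remember to wrap. It does nothing for code that runs at a time you don't control, which is the harder part of the problem.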

    Sidebar

    One might argue that the LoadLibrary function should save the floating point state before loading a library and restore it afterwards. This is an easy suggestion to make in retrospect. Writing software would be so much easier if people would just extend the courtesy of coming up with a comprehensive list of "bugs applications will have that you should protect against" before you design the platform. That way, when a new class of application bugs is found, and they say "You should've protected against this!", you can point to the list and say, "Nuh, uh, you didn't put it on the list. You had your chance."

    As a mental exercise for yourself: Come up with a list of "all the bugs that the LoadLibrary function should protect against" and how the LoadLibrary function would go about doing it.

  • The Old New Thing

    2007 year-end link clearance

    A few random links that I've collected over the last six months.

    And then the obligatory plug for my column in TechNet Magazine, which, despite the fact that Microsoft's name is on the magazine cover, does not establish the official Microsoft position on anything.

  • The Old New Thing

    Don't be helpless: You can put things together, it doesn't have to be a single command

    Humans are distinguished among all animal species by their advanced development of and heavy reliance on tools. Don't betray your ancestors. Use those tools you have.

    For example, during the debugging of a thread pool problem, it looked like somebody did a PostThreadMessage to a thread pool thread and left the message unprocessed after the thread pool function returned. Who could it have been? Well, one idea was to see if there were any DLLs in the system which called both QueueUserWorkItem and PostThreadMessage.

    I did a little legwork and contributed the following analysis to the mail thread:

    Of all the DLLs loaded into the process, the following call PostThreadMessage:

    SHLWAPI.dll 77D72436 221 PostThreadMessageA
    SHELL32.dll 77D78596 222 PostThreadMessageW
    ole32.dll 77D78596 222 PostThreadMessageW
    ... (list trimmed; you get the idea) ...

    Of those DLLs, these also call QueueUserWorkItem:

    shlwapi.dll
    shell32.dll
    ... (list trimmed; you get the idea) ...

    Astounded, somebody wanted to know how I came up with that list.

    Nothing magic. You have the tools, you have a brain, so connect the dots.

    The lm debugger command lists all the DLLs loaded into the process. Copy the output from the debugger window and paste it into a text file. Now write a little script that takes each line of the text file and does a link /dump /imports on the corresponding DLL. I happen to prefer perl for this sort of thing, but you can use a boring batch file if you like.

    for /f %i in (dlls.txt) do ^
    @echo %i & link /dump /imports %i | findstr PostThreadMessage
    

    Scrape the results off the screen, prune out the misses, and there you have it.

    "I tried that, but the result wasn't in the same format as what you posted."

    Well, yeah. There's no law that says that I can't manually reformat the data before presenting it in an email message. Since there were only a dozen hits, it's not worth writing a script to do that type of data munging. Typing "backspace, home, up-arrow" twelve times is a lot faster than writing a script to take the output of the above batch file and turn it into the output I used in the email message.

    Another boring batch file filters the list to those DLLs that also call QueueUserWorkItem. Writing it (or a script in your favorite language) is left as an exercise.
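
    If you would rather combine the two result lists in a program than in a batch file, the filtering step is a case-insensitive intersection. Here is a minimal sketch; the function names are made up, and it assumes you have already collected the two lists of DLL names by whatever means you prefer.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Lowercases a name so that SHLWAPI.dll and shlwapi.dll compare equal.
inline std::string ToLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Returns the entries of 'candidates' that also appear in 'filter',
// ignoring case and preserving the order and spelling of 'candidates'.
inline std::vector<std::string> KeepThoseAlsoIn(
    const std::vector<std::string>& candidates,
    const std::vector<std::string>& filter)
{
    std::set<std::string> wanted;
    for (const std::string& name : filter) {
        wanted.insert(ToLower(name));
    }

    std::vector<std::string> result;
    for (const std::string& name : candidates) {
        if (wanted.count(ToLower(name)) != 0) {
            result.push_back(name);
        }
    }
    return result;
}
```

    Pass the DLLs that import PostThreadMessage as candidates and the DLLs that import QueueUserWorkItem as the filter, and what comes back is the list of suspects.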

    No rocket science here. Just taking a bunch of tools and putting them together to solve a problem. That's what your brain is for, after all.
