Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.
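
For concreteness, here is a minimal sketch of a test harness that exercises the scenario; the Program class is hypothetical and not part of the customer's code, but running it first against the original DLLs and then with only the new Bravo.DLL swapped in reproduces the result described above.

public class Program // Hypothetical test harness referencing all three DLLs
{
  public static void Main()
  {
    Charlie c = new Charlie();
    c.M();
    // Compiled and run against the original Bravo.DLL:      Charlie / Alpha
    // Run again with only the new Bravo.DLL hot-swapped in
    // (Charlie.DLL not recompiled):                          still Charlie / Alpha
  }
}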

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method that the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR would fall back to calling Alpha.M if there were no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is the compiler is generating code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what "base.Foo" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – are identified at runtime based on what the runtime type of the receiver has stuffed into the virtual method slot.
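
For example, using the classes above (the Demo wrapper is hypothetical):

public static class Demo
{
  public static void Run()
  {
    Alpha a = new Charlie();

    // Virtual call: the compile-time type Alpha identifies the slot for M, but
    // the contents of that slot are determined at runtime from the actual type
    // of the receiver, which is Charlie. The call therefore dispatches to Charlie.M.
    a.M();
  }
}

The base call inside Charlie.M, by contrast, does not consult the contents of any slot at runtime; its target was fixed when Charlie.DLL was compiled.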

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."
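
To put the two mental models side by side in code, here is a hedged sketch; the Feed implementations are invented for illustration:

public class Animal
{
  public virtual void Feed() { Console.WriteLine("Animal"); }
}

public class Mammal : Animal
{
  // No override of Feed in the version the developer compiled against. The whole
  // argument is about what base.Feed below should do if a later version adds one.
}

public class Giraffe : Mammal
{
  public override void Feed()
  {
    Console.WriteLine("Giraffe");
    // Mental model one: this always means "call Animal.Feed", the method known at
    // compile time. Mental model two: it means "call the nearest base implementation
    // that exists at runtime", which would become Mammal.Feed if a new Mammal.DLL
    // supplied an override.
    base.Feed();
  }
}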

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime. The base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that could crash the runtime if someone later introduced a new private method: the original compilation of Charlie calls Bravo.M even though there is no such method, and if a later version of Bravo turns out to contain an inaccessible one, the call crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which would crash the runtime, and generates a call to Alpha.M instead.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was currently causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class -- a point which I personally disagree with, but I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyways) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of the surrounding context. I'm not saying that every time someone ships a service pack that has a few bug fixes you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.

  • >Without recompiling Charlie.DLL, should Charlie.DLL now start calling Bravo.M(int) ?

    No it should not. Here's why:

    In my mental model, the phrase "calling a method" is an abbreviation for "sending a message to an object, which reacts by executing a method". That's where the word "method" comes from: it's the object's method of responding to a message.

    Overload resolution is about determining the message to send, not the method to execute. Overload resolution _conceptually_ happens at the call site, when compiling the call. Choosing the method to react to a particular message _conceptually_ happens at the receiving site. As the receiving site may have been compiled before or after the calling site, the call site cannot statically use any information about it, except when the receiving site is compiled together with a call site, because it is in the same module.

    Note that the same is true for inlining a call. A hypothetical super-optimizing C# compiler would be allowed to inline a non-virtual call (before compiling to IL that is) if and only if the callee is compiled into the same module as the caller.

    Another point of view, compatible with the previous one, is that overloads are actually different methods, with different messages (message = signature + parameter values). It's like the good old C++ compilers, that use(d) name mangling to enable overloading method names. I think of overloads as methods with different names.

    To complete the point: virtual methods are very different from overloaded methods, as a virtual method and its override are two different methods reacting to the same message (or slot in the VMT), where two overloaded methods react to two different messages (and if they happen to be virtual, have two different slots in the VMT).
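
    To make the distinction concrete, here is a rough sketch; the M(int) overload is the hypothetical addition under discussion:

       public class Alpha
       {
           public virtual void M(object o) { Console.WriteLine("Alpha.M(object)"); }
       }

       public class Bravo : Alpha
       {
           // Added in a later version of Bravo.DLL: a different message entirely,
           // occupying its own slot, not an override of Alpha.M(object).
           public void M(int i) { Console.WriteLine("Bravo.M(int)"); }
       }

       public class Charlie : Bravo
       {
           public void Test()
           {
               // Overload resolution ran when Charlie.DLL was compiled, against a
               // Bravo that had no M(int); the call is bound to Alpha.M(object) and
               // stays bound to it until Charlie is recompiled.
               M(123);
           }
       }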

  • To commenters:

    1) Whether you agree or not, other people may depend on the current behavior without even realizing it.  They would be quite surprised if their code stopped working and required them to change their entire inheritance model.

    2) There are many other ways to accomplish the desired behavior

    3) Do you really want your code to depend on such a subtle and misunderstood detail? Wouldn't it be better if such an important architectural detail were spelled out explicitly?

    4) If you really can't recompile, see point 2

    5) To those of you who say Microsoft should change to match the customer's mental model, well, other customers may have different mental models than you.  Please don't break my code because you think yours is more important.

  • Eric wrote:

    > One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

    And share all the interesting parts with us in a lengthy series of blog posts, I hope. :)

  • Count me in on the "surprised" crowd.

    If an assembly is not strongly named, then it is possible and allowed to run against a version that is different from the one originally compiled against. Heck, even if it _is_ strongly named you can do it. The scenario here is a library upgrade/patch. Yes, if it is your code you should recompile and retest against the new library. But what about the case of a library that is used by another library, or an executable? This is a valid scenario.

    Steve Bjorg hit it on the head with the example that clearly shows it is the wrong behavior. I have a system that uses a 3rd party library (hardly uncommon in open source tool installations). Unbeknownst to me (as a customer), the implementor derived from a base class in the library.

    Example:

    LibraryA has the following:

       public class LoggerBase : IDisposable
       {
           protected virtual void Dispose(bool disposing)
           {
           }

           public void Dispose()
           {
               GC.SuppressFinalize(this);
               Dispose(true);
           }
       }

       public class FileLogger : LoggerBase
       {
           private FileStream file;

           public FileLogger()
           {
               file = new FileStream("Log.log", FileMode.Append);
           }

           // Probably should implement dispose, but we're going to fix this class later anyway.
       }

    This is part of a framework used by some build system I have or something, and they inherit SpecialLogger from FileLogger.  

    Then the library implementors decided that using a managed resource was too "expensive", and decided to go an unmanaged route.  They put out a new version with:

       public class FileLogger : LoggerBase
       {
           private int filePtr;

           public FileLogger()
           {
               filePtr = AllocateUnmanagedResource();
           }

           protected override void Dispose(bool disposing)
           {
               ReleaseUnmanagedResource(filePtr);
               base.Dispose(disposing);
           }
       }

    If I patch my library, then I start leaking handles because only HALF of the modified code is executing. In what way is that _ever_ correct behavior?

    "Surely what is good for some non-virtual calls is good for all non-virtual calls, no?"

    This is really the crux of the different viewpoints. This question doesn't even make sense to me, because until now I assumed that base.Foo() WAS virtual. I guess to you it makes sense that it isn't, because that's how it was done.

    So of course the new non-virtual M(int) shouldn't be called. It wasn't a virtual call, it was a non-virtual call to Alpha.M(object). Somehow I don't see the conflict here.

  • Couldn't this unusual possibility also occur?

       public class Bravo : Delta // In new Bravo.dll, now pointing to a new base class
       {
           public override void M()
           {
               Console.WriteLine("Bravo");
               base.M();
           }
       }

    ...where Delta looks like...

       public class Delta
       {
           public virtual void M()
           {
               Console.WriteLine("Delta");
           }
       }

    Now what happens?

  • Eric,

    This has been a fascinating discussion. Is there any way to share these design notes? I am sure these notes would be far more interesting read than the spec. I know, I know, you will not be able to share those design notes, but it wouldn't hurt to ask, right? :)

  • I see the problem with the update. And I see why it's not a problem for Java - there's no "new" or "override" keyword so it's impossible for a method inserted into the hierarchy with the same name and signature NOT to be an override.

    I'm not sure I agree with the tradeoff; I'd rather have a crash in the case where someone introduces a new private method with the same name as an override, than have the wrong behavior silently succeed. But at least I understand the reason for the tradeoff.

    And it does, indeed, seem to be one without a good solution, unfortunately. I'd argue for the new CLI opcode, but obviously the C# team tried that and failed. Perhaps a compiler option to switch to calling Bravo.M() even if it meant that certain kinds of changes to Bravo might cause the code to crash? To my mind, "certain kinds of changes to Bravo might cause my code to crash" is better than "certain kinds of changes to Bravo might cause my code to completely ignore all the invariants that the author of Bravo was trying to enforce, but still continue to run without giving any error messages".

    The only other solution I can come up with is to have a way for the compiler to automatically insert a do-nothing stub for every non-overridden base class virtual method, equivalent to override M() { base.M(); }. But that'd enlarge the code quite a lot and mostly unnecessarily.
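
    Concretely, such a generated stub might look like this (a sketch of the idea, not something any shipping compiler emits):

       public class Bravo : Alpha
       {
           // Hypothetical compiler-generated pass-through: it gives Bravo its own
           // entry for M, so a base call compiled into Charlie could bind to Bravo.M
           // and still end up at Alpha.M until a real override is written.
           public override void M()
           {
               base.M();
           }
       }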

  • Eric,

    I am always impressed by the thoughtfulness and thoroughness of the C# compiler team - the fact that they identified this issue in the early days is impressive - and the complexity of the problems that have to be coordinated between the CLR team and the language teams is also remarkable. The insight this kind of discussion provides into language and runtime design is wonderful, thank you.

    One question that occurs to me though.  When adding a private new M(), why can't the compiler also generate an overriding M() that pass-through calls base.M()? Presumably this would allow the CLR at runtime to correctly select between the two. I understand that normally (in user-written C# code) you can't both override a method M() and supply a private method M() directly, but what prevents the compiler from being able to do so? Presumably the private/override versions of M() are placed in different "slots", yes?

  • If I understand the behavior correctly, it used to work as most of us expect. However, swapping out the Bravo assembly with one that has

       public class Bravo : Alpha
       {
           new private void M()
           {
               Console.WriteLine("Bravo");
           }
       }

    would cause your program to crash unless you recompile. Then they fixed it to make that scenario work, but broke the scenario where a maintenance release (fixing bugs, patching security holes) adds an override to a class that didn't have one before.

    So instead of an extremely unlikely situation breaking my program and crashing, an unusual, but still less unlikely, situation silently breaks my program's invariants with no way to debug? I think I like the old behavior better.

    I don't see "new private" methods as something you add to a class; I see them as something you already had in your class, but you had to add the "new" only because a base class was modified to include a new method of the same name. If I'm the author of Bravo class, and I already know that my base class Alpha has a method M, why would I create a new private method M? Unless I really wanted to confuse all maintainers of the class, I would pick a name for the method that doesn't have the same name as an unrelated method of the base class.

    Am I missing something here? How often does somebody add a new private method to a class that has the same name as a virtual method of a base class?

  • Aren't all the calls we're discussing made using a particular mdtoken representing the target method, and not by name?  So I'm a little surprised that there exists an mdtoken for B::M() when B didn't override A::M(), and I'm doubly surprised that the metadata token which represents that potential B::M() is structured in such a way that it could match "B::new M()" instead of "The implementation of A::M() in class B" which clearly is that sort of "truncated search virtcall" that fixing this problem requires.  In fact, use of Reflection.Emit certainly led me to believe that the metadata token was of the latter style.

    In fact, it sounds like the CLR could provide the desired behavior without a new instruction.  I propose that if you upcast arg.0 and then do a virtcall, you ought to get the "truncated search".  Currently you'd do this to call the most overridden method prior to a "new" method, it seems like it shouldn't break any existing code to define the truncated search if this is used in the absence of a "new" method (and then it continues to work if a "new" method if added by an update to the assembly defining an intermediate parent class).

  • > Whether you agree or not, other people may depend on the current behavior without even realizing it.  They would be quite surprised if their code stopped working and required them to change their entire inheritance model.

    Note that this works both ways - there may also be existing code which is written using the "wrong" mental model, and is therefore buggy, without people who wrote it knowing about that.

  • @Ben

    > Aren't all the calls we're discussing made using a particular mdtoken representing the target method, and not by name?  So I'm a little surprised that there exists an mdtoken for B::M() when B didn't override A::M()

    Have a look at Ecma-335's description of the "call" instruction. I had cited the relevant paragraph in an earlier comment, but here it is again, for convenience:

    "If the method does not exist in the class specified by the metadata token, the base classes are searched to find the most derived class which defines the method and that method is called.

    [Rationale: This implements “call base class” behavior. end rationale]"

    >  and I'm doubly surprised that the metadata token which represents that potential B::M() is structured in such a way that it could match "B::new M()" instead of "The implementation of A::M() in class B"

    There is no difference between the two. On MSIL level, a non-newslot method named M will override any method M with a matching signature inherited from a base class. Thus, a token for B::M always references M in B, whether it overrides anything or not.

    .override is a different thing, but it's orthogonal to all this.

    > In fact, it sounds like the CLR could provide the desired behavior without a new instruction.  I propose that if you upcast arg.0 and then do a virtcall, you ought to get the "truncated search".  Currently you'd do this to call the most overridden method prior to a "new" method, it seems like it shouldn't break any existing code to define the truncated search if this is used in the absence of a "new" method (and then it continues to work if a "new" method if added by an update to the assembly defining an intermediate parent class).

    It seems to me that it would break any existing IL that presently contains no-op upcasts on ldarg.0. While the upcasts are meaningless, such IL is perfectly valid, and has a well-defined meaning already.

  • Hrm... so the behavior was as a lot of us expected, but a really obscure case would crash the CLR. So a breaking change to the C# compiler was made in 2003.

    I really don't see how this would have been a breaking change in the CLR. After all, a compiler would still be free to make any sort of call it wanted. It would really be a "do a virtual dispatch call on object X, treating it as type Y for method lookup". Could be a useful instruction in other cases.

    It seems that would have been a not too difficult and fairly safe _addition_ to the CLR, in order to prevent a breaking change to the compiler.

    Weird.

  • The current compiler behaviour can still cause runtime crashes.  Suppose you have four classes in the hierarchy.

    Insert an extra class Aleph in the hierarchy between Alpha and Bravo, and give Bravo an override of method M.  When Charlie is compiled, the base call to M goes to Bravo.  In a future update Bravo loses its override of M and, at runtime, the call to Bravo.M ends up at Alpha.M and everything still works.  But then another update comes along adding a private method M to Aleph, and the code crashes at runtime, as sketched below.  But maybe this is a little far-fetched.
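
    A sketch of that sequence (class names as described above, bodies invented for illustration):

       // As things stood when Charlie.DLL was compiled:
       public class Alpha           { public virtual void M() { } }
       public class Aleph : Alpha   { /* no M here yet */ }
       public class Bravo : Aleph   { public override void M() { base.M(); } }
       public class Charlie : Bravo { public override void M() { base.M(); } } // base call binds to Bravo.M

       // Update 1: Bravo.DLL drops its override. Charlie's baked-in call to Bravo.M
       // falls back to Alpha.M at runtime and everything still works.

       // Update 2: Aleph.DLL adds "new private void M()". The runtime's search for
       // Bravo.M now finds Aleph's inaccessible private M, and the call fails.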

    P.S. I know the blogging software isn't your problem, but the captcha image doesn't appear in FireFox.  Since it's a JPEG with an aspx extension I'm guessing the server isn't sending the correct mimetype.

  • This has been a very interesting discussion, and I am surprised by the number of people who have not been aware of this. Obviously, the majority of readers (or at least posters) are not aware of the basic tenet:

    "If you develop a NON-SEALED class, you MUST test the implementation via (often multiple) derived classes."

    To minimize the impact, one technique is to override every virtual method whose implementation you may possibly want to change at a later date with a simple call to base. [We actually do this via our automatic "class setup" wizard].
