Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.
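For concreteness, here is the sort of harness you could use to observe this; the Program class and its compilation into a fourth assembly are my own sketch, not part of the customer's report:

class Program // compiled into Program.EXE, referencing all three DLLs
{
  static void Main()
  {
    new Charlie().M(); // prints "Charlie" then "Alpha"
  }
}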

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL, and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.
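You can see this by disassembling Charlie.DLL with a tool like ILDASM. The body of Charlie.M comes out roughly like the following; this is my reconstruction of the typical codegen, simplified, not an actual disassembly of the customer's binaries:

.method public hidebysig virtual instance void M() cil managed
{
  ldstr      "Charlie"
  call       void [mscorlib]System.Console::WriteLine(string)
  ldarg.0
  // The base call: a plain (non-virtual) call whose metadata token names Alpha.M directly.
  call       instance void [Alpha]Alpha::M()
  ret
}

Nothing in that token mentions Bravo, so nothing you do to Bravo.DLL can change what it calls.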

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR will fall back to calling Alpha.M if there is no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is that the compiler generates code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call, we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what a base call such as "base.M" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – are identified at runtime, based on what the runtime type of the receiver stuffed into the virtual method slot.
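To illustrate the slot distinction with an ordinary example (my own hypothetical hierarchy, not the customer's code):

using System;

class Animal
{
  public virtual void Feed() { Console.WriteLine("Animal.Feed"); }  // declares a virtual slot
  public void Sleep() { Console.WriteLine("Animal.Sleep"); }        // non-virtual; no slot
}

class Giraffe : Animal
{
  public override void Feed() { Console.WriteLine("Giraffe.Feed"); } // replaces the slot's contents
}

class SlotDemo
{
  static void Main()
  {
    Animal animal = new Giraffe();
    animal.Feed();  // virtual: the slot is identified at compile time, its contents at runtime – prints "Giraffe.Feed"
    animal.Sleep(); // non-virtual: bound entirely at compile time to Animal.Sleep – prints "Animal.Sleep"
  }
}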

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M, then this crashes the runtime: the base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.
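To make the failure concrete: under that scheme, the base call in Charlie would have been emitted as something like this (again my reconstruction, not actual IL from that era):

ldarg.0
call instance void [Bravo]Bravo::M() // resolved by name and signature when Charlie runs

When the hot-swapped Bravo.DLL above is loaded, that token now resolves to the new private Bravo.M, a method which code in Charlie has no right to call – the crash described above.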

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that sometimes crashed the runtime if you introduced a new private method: the original compilation of Charlie calls Bravo.M, even if there is no such method. If later there turns out to be an inaccessible one, it crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which will crash the runtime, and generates a call to Alpha.M.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was causing breakage and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class -- a point with which I personally disagree, though I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyway) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of its surrounding context. I'm not saying that every time someone ships a service pack with a few bug fixes, you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated, then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.

  • +1 for another programmer who thought the customer's mental model was right.

    That being said, dynamically swapping out base classes is risky business, and I would not be surprised if it bit me.

    You admit in your article that there are two equally valid mental models, but imply that your belief is that most programmers expect the static model. Based on the limited sample of your blog comments, that belief appears to be incorrect. Would you reconsider your opinion if, for the sake of argument, the compiler actually was surprising the vast majority of developers?

    All that being said, fixing this doesn't even come close to meeting the -100 point barrier for me. Spend your time on a decent metaprogramming system.

  • Unlike most of the commenters, I actually agree with Eric on this one. The customer controlled all 3 dlls, there is no reason not to distribute the recompiled Charlie.dll with the recompiled Bravo.dll. Compilers aren't some big scary program (on the usage side of things): you just hit CTRL+SHIFT+B and DLLs magically spit out into your output directory.

  • @Robert: "The customer controlled all 3 dlls, there is no reason not to distribute the recompiled Charlie.dll with the recompiled Bravo.dll"

    That may be the case in this scenario but it's certainly not always the case. If you're the developer of a library, do you necessarily know to tell all your consumers that they have to recompile any time your library is updated? How are you supposed to patch bugs in your library if you can't get every customer to recompile their code?

    Does .NET document rules for binary compatibility? As in "if you limit your changes to adding new methods and classes and ... and ... and ... then existing code will work without recompilation"? I've always presumed "adding or removing an override to a virtual method" to be one of those binary-compatibility-preserving operations. Apparently it isn't.

  • @Stuart - no matter the language or operating environment, experience has told me, when in doubt, recompile. If I upgrade to a new version of a library, I'm going to recompile and test, and I would expect the same of people who upgrade to the latest and greatest of any library I wrote to do the same. Recompilation is cheap.

  • Even if the object semantics are broken, it's quite natural for a statically-checked language to make this optimisation. Finding at runtime which base method should be called would slow down each and every virtual call.

    Where I work, when we ship hotfixes, we just follow the rule of always delivering a fresh Charlie.dll when just Bravo.dll has changed. This ensures we can never run into this problem.

  • Eric, I must agree with you. Especially when there is such a simple way to get the customer’s expected behavior...

    public class Bravo : Alpha // In Bravo.DLL
    {
      public override void M()
      {
        base.M();
      }
    }

    ... if the above code had been in Bravo before Charlie was compiled, then the method call would have worked as he expected.

  • I'm adding my voice to the chorus - the customer's right, and the compiler's behavior is quite surprising - surprising enough to be wrong.

    You say that base.M() is a non-virtual call - but that's a misleading statement; it merely happens to be implemented as such currently. The code is asking its _base_ class (in this context, Bravo) to execute a method - not just any method, but a virtual method. The fact that Bravo happens to have implemented M() by silently falling through to Alpha's implementation isn't particularly discoverable nor relevant - who cares where Bravo got the implementation from?

    The current resolution means that adding or removing code such as the following is a breaking change (even without reflection or whatnot):

    class A {public virtual int F() {...}}

    class B:A { public override int F(){return base.F();}}

    B's implementation of F clearly looks like a no-op, and it's extremely surprising that removing it changes the semantics of the program.

    Then there's the problem of asymmetry: You say that you don't expect Charlie to call Bravo when Bravo suddenly implements M() - but the other way around does work - if you remove Bravo's M() then, as expected, things just work - you don't get an assembly load error complaining of a missing function; rather, the superclass's method is chosen. So, base.M() walks the inheritance chain upwards and picks the first M implemented at compile time, and then walks the inheritance chain upwards again at runtime - hardly a sane mental model anywhere. So, is base.M virtual or not? It's marked virtual, it behaves as a virtual call when you remove the implementation, but it behaves like a static call when you add an implementation - that doesn't make _any_ sense.

    The safety argument rings hollow - if you're meddling with various assemblies and overriding virtual methods, and somebody with the rights to rewrite a particular assembly does so, how is it surprising that new or changed implementations will be picked up? That's kind of the point of virtual methods in the first place. If you don't trust code, then don't trust an inherited virtual method to do what it happened to do when you compiled it. Using that as a security boundary is asking for trouble.

  • I don't agree with the customer model at all in this case, contrary to most here it seems.

    If you suddenly decide to override a method in a class that someone else down the line is consuming, you are in a way changing the contract with said class, so you need to recompile not only the middle class but anything that comes after.

    I find it much more logical and safe than the other behaviour.

  • Adam Robinson wrote:

    "While I haven't yet encountered this issue, I know that I would have been utterly confounded by the fact that the compiler exhibited this behavior."

    And I would have to say, I would completely expect this behavior (oddly enough)...although I've never had to deal with it.

    Then again, I look at it from the point of view of having toyed around with building my own compilers in the past. To me, this sort of falls in the category of "it's better to be safe and recompile" rather than assume anything about how things will be handled under the hood.

  • I'm with the customer too. If M is virtual, I always thought that base.M() is a special virtual call that skips the implementation in the current class.

    If M is not virtual, I expect it to be resolved statically. So, if someone adds a non-virtual M in Bravo, I'd expect the call in Charlie to still resolve to Alpha. In any case, newly introducing a method in Bravo that hides a non-virtual base class method, although Bravo is not sealed and has already been derived from, sounds to me like bad programming practice. It would only be legitimate if it can't break anything.

  • With respect to your update, Eric:

    > How is that situation made any different from the customer's perspective by making the non-virtual method call a base call instead of an ordinary non-virtual method call? Surely what is good for some non-virtual calls is good for all non-virtual calls, no?

    I think the point of contention is that base.M() is not obviously a "non-virtual call" - in fact, those opposing it would rather have it behave like a virtual call. This is even explicitly mentioned in some of the comments above.

    The informal definition of base-calls that many (most?) programmers use is "call the method of the nearest base class". More formally, this would amount to "do a virtual call, as if the type of the object was that of the immediate base".

    So, as a question, it is, perhaps, narrower than it really should be - it makes certain assumptions that are themselves contested by those who'd answer it differently from you.

  • "Just recompile" is not an answer in general, because all 3 assemblies may come from different vendors. Don't forget that Charlie may also be declared in some library which your code is using, and for which no source code is provided.

  • Focus wrote:

    "If you suddenly decide to override a method in a class that someone else down the line is consuming you are in a way changing the contract with said class, so you need to recompile not only the middle class  but anything that comes after."

    It depends. If the newly introduced override can't break derived classes, i.e. doesn't change observable state or behaviour in a way that matters to derived classes, then introducing a new override should be ok. I suppose that in the customer scenario everything would have been fine if the C# compiler behaviour had matched the expectation of the Bravo.dll developer.

  • I always thought that if I didn't override some virtual Foo method, it behaves _exactly_ as if I had written "public override void Foo(){base.Foo();}". So if there is nothing to do beyond calling the base method, I can safely delete this override, just like I can remove a default constructor with an empty body or trivial add/remove accessors for events.

    The actual behavior clearly creates a non-obvious problem for maintaining binary compatibility. Before, I was sure that nobody could skip my methods and directly call base methods, bypassing my overrides. Now I can't be sure that my class invariants are preserved.

    It seems I can now get two different behaviors simultaneously: when this method is part of an interface and I call it via that interface, versus via an instance of the concrete most derived class. What happens with a call via the middle class, ((Bravo)new Charlie()).M()? I guess you will also get “Charlie / Alpha” and Bravo.M will be skipped, even though there is clearly a call to Bravo.M().

    I always read "base" as "a virtual call, excluding the current class and everything below it in the hierarchy".

    So this behavior is clearly a bug for me. I vote to change it.

    I see pros for the current behavior: a non-virtual call is much faster, this is a very rare case (though it looks quite possible), and there is no such "half-virtual" call instruction in IL (is there really not?). But in the first place this is wrong behavior, so it cannot be excused by these technical reasons.

    I want you to change this, either by replacing the non-virtual call with a "half-virtual" call, or by having IL verification check whether any overrides are being jumped over by a non-virtual call, and forcing recompilation if so.

  • Quick extra note: some people noted that this is an optimization; if so, that optimization can still occur at assembly load time; there's no need for the compiler to bother with it, and you still retain the lower cost of non-virtual dispatch at runtime.

    I'm also curious how fixing this bug is supposed to break existing code - It strikes me as very odd that it's even _possible_ to call Alpha's M() without Bravo's consent - that's certainly not easy (or even possible?) to achieve normally on a virtual function like that - so it's very unlikely there's any code relying on that very hard-to-trigger behavior - right?
