Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.
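For contrast, it may help to see what a recompile buys you. If Charlie.DLL is rebuilt against the new Bravo, the base call re-binds to Bravo.M; compiling all three classes into one program stands in for that recompile. (A sketch; the Program entry point is mine, added only to make it runnable.)

```csharp
using System;

public class Alpha
{
    public virtual void M() => Console.WriteLine("Alpha");
}

public class Bravo : Alpha
{
    public override void M()
    {
        Console.WriteLine("Bravo");
        base.M();
    }
}

public class Charlie : Bravo
{
    public override void M()
    {
        Console.WriteLine("Charlie");
        base.M(); // freshly compiled, this binds to Bravo.M
    }
}

public static class Program
{
    public static void Main() => new Charlie().M(); // prints Charlie / Bravo / Alpha
}
```

Compiled fresh, this prints “Charlie / Bravo / Alpha” — the output the customer expected to get from hot-swapping alone.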

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL, and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR will fall back to calling Alpha.M if there is no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is the compiler is generating code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what "base.M" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – is identified at runtime, based on what the runtime type of the receiver has stuffed into the virtual method slot.
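The slot-versus-contents distinction can be seen directly in C#. In this sketch, 'override' replaces the contents of the slot the base class declared, while 'new' introduces a fresh slot and leaves the original alone. (The Animal and Mammal names come from the discussion that follows; Hider is a hypothetical class of my own.)

```csharp
using System;

class Animal
{
    public virtual string Feed() => "Animal.Feed";
}

class Mammal : Animal
{
    // 'override' fills the slot declared by Animal.Feed.
    public override string Feed() => "Mammal.Feed";
}

class Hider : Animal
{
    // 'new' declares a brand-new slot; the Animal.Feed slot still
    // holds Animal's implementation.
    public new string Feed() => "Hider.Feed";
}

static class SlotDemo
{
    static void Main()
    {
        Animal a = new Mammal();
        Animal b = new Hider();
        Console.WriteLine(a.Feed()); // Mammal.Feed -- slot contents were replaced
        Console.WriteLine(b.Feed()); // Animal.Feed -- original slot untouched
    }
}
```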

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime: the base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that sometimes crashed the runtime if you introduced a new private method: the original compilation of Charlie calls Bravo.M, even if there is no such method. If later there turns out to be an inaccessible one, it crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which will crash the runtime, and generates a call to Alpha.M.
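The recompiled half of that story is easy to reproduce today in a single assembly, where the compiler can see the intervening private method. (A sketch; the Program class is mine, added only to make it runnable.)

```csharp
using System;

public class Alpha
{
    public virtual void M() => Console.WriteLine("Alpha");
}

public class Bravo : Alpha
{
    // Private, so invisible to Charlie; 'new' hides rather than overrides,
    // leaving Alpha.M's virtual slot untouched.
    new private void M() => Console.WriteLine("Bravo");
}

public class Charlie : Bravo
{
    public override void M() // overrides Alpha.M; the private Bravo.M is not a candidate
    {
        Console.WriteLine("Charlie");
        base.M(); // the compiler skips the inaccessible Bravo.M and binds to Alpha.M
    }
}

public static class Program
{
    public static void Main() => new Charlie().M(); // prints Charlie / Alpha
}
```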

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team agreed that the ideal behaviour would be to always dynamically bind to the closest base class -- a position I personally disagree with, though I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyway) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of the surrounding context. I'm not saying that every time someone ships a service pack that has a few bug fixes you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003, and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented-programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.

  • (Before reading this article, I actually thought the base call was a "magic virtual base call" that called the nearest base class function.)

    But..., the whole point of the "runtime-assembly-binding bindingRedirect" in configuration files is to be able to replace assemblies with new versions without recompiling the code.

    It's extremely important that one doesn't place code in new overrides. When Microsoft releases service packs for the framework, do they (you) never create new overrides in non-sealed public classes? I thought this was happening frequently, especially in UI-control-libraries where you are supposed to inherit from base classes. If they do, everyone has to re-compile all the applications for them to work properly.

  • I seem to remember reading something about the Java developers fixing this bug (yes, I consider it a bug) in JDK1.2 or so.

    I'm with the customer: I find it pretty horrific that this wasn't fixed. I understand the logic of being conservative about changing things, but this strikes me as dangerous.

    People who have the mental model that the customer has, are likely to use (and in fact I frequently DO use) the ability to override something as a sort of 'security wrapper' around the base class methods. I put security in quotes intentionally: I know it's bypassable if you have full trust or the ability to use reflection or whatever. It's a guard against doing something *accidentally*, to enforce invariants. For example, consider this in a world pre-generics:

    public class StringList : ArrayList
    {
      public override int Add(object o)
      {
        if (!(o is string)) throw new ArgumentException("parameter must be a string");
        return base.Add(o);
      }

      // and override other methods as well similarly
    }

    Suppose that somebody else inherited from a prior version of StringList in which the StringList developer forgot to override Add. Now they can add things into the StringList that aren't strings. Your approach means that the developer of StringList can't assume that his overrides will actually be called. Is that really the "conservative" approach?

  • Couldn't there be some sort of attribute that tells the compiler that we want bravo to be inserted between alpha and charlie?  Then we could expose this technique in situations where we expect our Type II development to insert method calls, while preventing exposure of other calls?  

  • In fact... I seem to remember that the Java developers considered the bug so problematic that not only did they change the compiler in the next version, but they also *changed the specification of how the runtime handled non-virtual calls to virtual methods* so that code compiled with the previous, buggy versions of the compiler would still run correctly. I can't remember how they changed it - I think it struck me as hacky, and off the top of my head, I can't see an algorithm that'd have the desired effect. But I'm fairly sure that's what they did.

    The C# language syntax for the feature points pretty clearly to the customer's interpretation, too. It's "base.whatever()". The base class of Charlie is Bravo, not Alpha. So a plain reading of base.whatever() is "call Bravo's implementation of this" - which may, in the end, delegate to Alpha's.

    Note to self: add do-nothing overrides for every virtual method every time I inherit from a class, from now on...

  • The customer-proposed behavior seems to fit better with dynamic languages to me.  A lot of C# is about pushing as much verification as possible into the compiler, such as type analysis and deciding what method to call.  As more work gets moved to runtime, you become more dependent on automated tests for correctness.  In C#, you will almost never end up with a production bug because you didn't specify the right number of arguments or called a non-existent method, especially if you recompile against new versions of binaries.  In a dynamic language, these things can happen easily without sufficient testing.  So, while the current behavior may seem non-intuitive at first, I think it is the correct behavior for a static language like C#.  I may start to feel differently depending on the direction which the new dynamic features in the language take.

  • It seems like 'you' (the fake one having the fake conversation with the fake customer) actually want to be able to write a C# equivalent to 'this->Alpha::M()'.  When you say "A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure" it's true only so far as you are actually sure what 'base.Bleck()' is calling.  The only way to *be* sure in the current model is to read the source of the class that you are inheriting from and actually check whether the method is overridden or not.  That seems a bit sketchy to me, but I can see where you are coming from: we generally expect things to get modified at the leaves, and test or design to that effect.

    So, time for new syntax and breaking changes, hurray!

  • More to the point, if this is your mental model:

    "I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure."

    This is an example of the "it rather involved being on the other side of this airtight hatchway" attitude: You're running code linking to a different version of Bravo.dll!

    Unless the ONLY thing you do with Bravo.dll is make base class methods that are expected to skip it entirely and call through to Alpha, you're ALREADY in an untested scenario and you can't presume any of your invariants to hold.

    Use some kind of strong naming and assembly binding to guarantee that your code won't run if the version of Bravo.dll isn't exactly the one you tested against.

    The current behavior means that the author of Bravo.dll is getting HIS invariants silently bypassed without any ability to do anything about it. If you have to write code that presumes that any time you override a method your override might be silently skipped, that makes it really hard to enforce invariants!

  • By coincidence I posed a similar question last week, both at StackOverflow (http://stackoverflow.com/questions/2476754/method-binding-to-base-method-in-external-library-cant-handle-new-virtual-method) and to you Eric directly, so I've been waiting on this post to go live...

    I can understand your arguments but I still don't fully agree with you so I'll do my best to make you reconsider.

    Firstly, as far as I can see from talking to other developers and from the existence of the customer in your post (ok, maybe this is not statistically significant but still...), most developers not actually working on the C# compiler do seem to believe that the base call is some kind of magic virtual call.

    I don't really buy the argument that making the base call semi-virtual would be less safe or predictable, if the evil hackers were able to get you to use their version of the dll they can probably steal your data and/or crash your program anyway. This is why there are such things as signed libraries.

    The reason why you never heard this question before is probably because most of the time it doesn't matter; it mostly only matters when you develop a library containing classes meant to be derived from and where you deploy new versions of the library without recompiling the applications. This is, in my opinion, a not-so-crazy scenario. (In our case we make a framework for building web sites, containing, among other things, controls of various kinds. Web controls have lots of life cycle event methods, OnInit, OnLoad, OnPreRender etc, and at times changes to our basic controls make it necessary to add an override for one of these methods at a new level in the hierarchy. On the other hand we like to be able to deploy new versions of our product to live sites with a minimum of downtime and tricky manual steps like recompiling the entire site...)

    I understand that changing something like this is probably more complicated than I think, but in my opinion this would be a change with very small negative consequences, and the result would be that the base call would work the way most people seem to think it does. As it is right now we are tempted to just add slews of "empty" overrides just in case we might want to add some code there some time in the future.

  • I think that the real surprise here is not in the behavior of "base" per se, so much as in the apparent mismatch between "base" and "override" semantics. If "override M" in Charlie would have overridden Alpha.M and ignored Bravo.M here as well, an argument for a consistent rule could be made: always determine those things at compile time. But this is not the case, so from the perspective of a developer who wants "100% of the safe, predictable, understandable, non-dynamic, testable behavior" - well, you don't have that here already, either way!

    And the reason why a match here is important is that, in practice, "base" is used together with "override" - in an overridden method, to call a base implementation -  9 times out of 10, if not more often than that. So, for many developers, the mental model of what "override" means is _defined in terms of "base"_!

  • I'm pretty sure you're wrong on this one, Eric, particularly with respect to the principle of least surprise.

    The OO mental model for base.Foo() is that you pass the message along to your superclass. That's it. It's out of your hands then. Static analysis doesn't come into the equation - it's just an implementation detail.

  • I'm not sure I can agree with you. As a matter of fact, you are the one who convinced me that you're wrong: http://stackoverflow.com/questions/2323401/how-to-call-base-base-method/2327821#2327821

    It seems that you said that calling "base.base.M()" is illegal in C# because it could break the invariants of the base class. However today you say that I am stuck with "base.base.M()" even when what I meant was "base.M()" because it maintains my invariants.

    What good is maintaining my invariants if it breaks the invariants of my base class? How can I possibly expect to maintain my invariants if I'm doing something known to break the invariants of my base class?

    Well, at least now I know how to call "base.base.M()". I just have to compile against a version of the base class that doesn't override M, and then run against the real version of the class that does override M.

  • I'm with the customer on this one too.

    This is quite contrary to my expectations.

    As others have said, I would expect to be able to switch and change existing DLLs and have related code automatically be updated without re-compilation.

    More fundamentally, though, I would expect the compiler to honor my virtual code, even if it thinks it knows better. Such optimizations belong in the runtime, not baked into the compiled IL.

    I now fear that I will need to review many virtual calls at the IL level and possibly patch the IL...

    I used to manually compile variance code before C# 4 which now supports it. Now I may need to do this on virtual calls... maybe I should just do all of my coding in IL.

    This is terrible. Please fix it.

  • > Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR will fall back to calling Alpha.M if there is no implementation of Bravo.M.

    > No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky.

    I discovered that  a while ago, and was rather surprised myself:

    http://stackoverflow.com/questions/1456785/a-definite-guide-to-api-breaking-changes-in-net/1522718#1522718

    But it is, in fact, well-specified by Ecma-335 (PII 3.19 "call"):

    "If the method does not exist in the class specified by the metadata token, the base classes are searched to find the most derived class which defines the method and that method is called.

    [Rationale: This implements “call base class” behavior. end rationale]"

    Interesting; if I understand the Rationale correctly, they have in fact specified it this way precisely so as to provide a way to implement the "base" semantics that the customer is asking for!

  • This is not crazy-seeming at all. In fact, I'm pretty sure this will be causing hard-to-find bugs when using existing binaries on .NET 4 - surely some non-sealed classes got new overrides?

    I suppose this is also the reason why this was fixed in Java.

    Yes it's a breaking change and too late to fix that now (especially for existing .NET 2 binaries); but please fix it for the next compiler version! The meaning of "base" should be the immediate base class, not "whatever base class happened to have that method when the assembly was compiled".
