Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.
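For concreteness, here is a minimal driver; the Program class and project setup are my own sketch, not part of the customer's report:

public class Program // References Alpha.DLL, Bravo.DLL and Charlie.DLL
{
  public static void Main()
  {
    Alpha a = new Charlie();
    a.M(); // Virtual dispatch finds Charlie.M, which then base-calls upward:
           // prints "Charlie" then "Alpha" with the original Bravo.DLL
  }
}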

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method that the base call can be referring to is Alpha.M. So we generate code that says "make a non-virtual call to Alpha.M". That code is baked into Charlie.DLL and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.
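To make that concrete, here is Charlie again, with comments sketching roughly what gets emitted for the base call; the IL shown is my paraphrase, assuming the assemblies are named after their DLLs:

public class Charlie : Bravo // As compiled against the original, empty Bravo
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    // Emitted as a non-virtual call whose target is fixed at compile time,
    // roughly: ldarg.0; call instance void [Alpha]Alpha::M()
    base.M();
  }
}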

Customer: You know, if you generated code that said "make a non-virtual call to Bravo.M", the CLR would fall back to calling Alpha.M if there is no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is that the compiler is generating code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario it arguably is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what "base.Foo" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method call identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – are identified at runtime based on what the runtime type of the receiver has stuffed into that virtual method slot.
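Here is a quick sketch of the slot-versus-contents distinction; the hierarchy below is purely illustrative and not part of the customer's scenario:

public class Animal
{
  public virtual void Feed() { Console.WriteLine("Animal.Feed"); }
}

public class Giraffe : Animal
{
  public override void Feed() { Console.WriteLine("Giraffe.Feed"); }
}

public class Demo
{
  public static void Main()
  {
    Animal animal = new Giraffe();
    // Virtual call: the compiler picks the slot (Animal.Feed); the runtime
    // fills that slot from the runtime type of the receiver, so this
    // prints "Giraffe.Feed".
    animal.Feed();
  }
}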

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime. The base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like: the original compilation of Charlie generates a call to Bravo.M even though there is no such method at the time, and if at runtime there turns out to be an inaccessible one, the program crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which would crash the runtime, and generates a call to Alpha.M instead.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class -- a position with which I personally disagree, but I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyway) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of the surrounding context. I'm not saying that every time someone ships a service pack with a few bug fixes you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented-programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.

  • Suppose you have a bug fix you inserted into Bravo that makes Charlie work correctly. Charlie was created by the Type 2 team at a distant location, or there could be several different versions of Charlie created by various Type 2 teams around the world. Are you saying that all of them have to recompile their Charlie because we added a bug fix implementation in Bravo?

  • @Gavin

    In addition to silently failing in cryptic ways, it is also possible to replace _some_ of the code in a base class and to have some other code not execute. I still don't see how this is ever acceptable. Either fail to load, or load, but don't half load.

  • I'm impressed. Sounds like a sensible way to avoid (the .net version of) DLL hell.

  • > just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime

    I propose we refer to this performance penalty as an "extra base hit".

  • I can see the arguments for both sides, but I tend to agree with the customer (and the majority of comments). Having to recompile *every* bit of .NET code every time there's a service pack or security patch is not an acceptable option; neither is manually inserting no-op overrides of every virtual method from your base classes "just in case".

    If Bravo v2 breaks Charlie v1, you'll always need to recompile. With the pre-2003 compiler, non-breaking changes wouldn't require a recompilation; with the current version, they *might*, and you have no way of knowing until things start breaking in interesting ways.

    If the base method call works 99% of the time when Bravo.dll is replaced, and crashes 1% of the time when BravoCorp do something stupid, surely that's better than not working quite as you expect 99% of the time and the other 1% also not quite working as you expected (but not crashing).

    In other words, if there's something wrong, it's better to crash than to do the wrong thing.

  • I'm completely with Eric on this. If you change a component you should recompile and re-test every component that references it - directly or indirectly.

    Even if we didn't agree on this, changing the compiler as the customer requires would not achieve anything: there are many other situations in which our code would go berserk if we just swapped a DLL without recompiling. Constants and enumerations come to mind... maybe they can be fixed too, to a point, but I cannot imagine how that might happen without giving up at least constant folding in the process. No, thanks.

  • I read the article but skimmed through the comments so if this question has already been asked I apologize.

    What happens if the order of operations is changed just a little? What if M exists in Bravo at the time that Charlie is compiled, and then M is removed from Bravo and Bravo is recompiled (without recompiling Charlie)?

    I tried this and got Alpha / Charlie once I recompiled Bravo. Is this the correct behavior?

    If this is correct, then the scenario the "customer" wants can be done; it's a strange workaround, but it can be done.

  • Wow, I was annoyed when I read your article the first time, and now I am downright angry. You give advice that totally contradicts the entire versioning model of the CLR. You said:

    "If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy."

    Yet one of the core features of .NET is that it is generally backward compatible. In fact the CLR team seems to take pains to keep it backward compatible. (A bunch of stuff changed from 1.1, and some backward compatibility was broken, but the majority of my 1.1 code and dlls run fine on 2.0 and later.) It would be absolutely absurd if we had to recompile everything every time the CLR is updated or patched, something that just can't be done without modifying base classes.

  • Intuitively I side with the customer, but practically I side with the status quo.

    Is it counter-intuitive? Yes.

    Should the JIT'er do the base resolution? Maybe

    BUT ... who in their right mind would ever just chuck a new dll without testing in the first place? In my entire life I've never had an update from a 3rd party that didn't break some existing code. Updates from your own team are worse as they generally don't follow any sort of rigor when designing assemblies.

    Nobody said creating API's was easy, and how many project dll's do you have that are 1.0.0.0?

    It does beg the question though why the runtime lets you do this in the first place if it's so dangerous / error prone. If everything truly is that dependent then the runtime should kill it instead of allowing it to continue. Maybe saying the checksum of bravo.dll is 123 but on disk is 456 -> Fail.

    Maybe the solution is to have an attribute on the class that will instruct the compiler to create stub methods for all overrides. That way (assuming there are no other breaking changes), if you need to replace Bravo.dll, then when compilation occurs the compiler can say "OK, I'll give you stub methods so all base() calls from your descendants will always go to you", which lets you do this replacement scenario (sketched below).

    At least that way the CLR doesn't change, the compiler has minimal change, it's very explicit, and the safe/good defaults stay in place.
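    Concretely, such a hand-written stub might look like this (a sketch only; no attribute or compiler support is assumed):

    public class Bravo : Alpha
    {
      // Hand-written "stub" override: it reserves the virtual slot on Bravo, so a
      // derived class compiled against this version binds its base call to Bravo.M.
      // For now it just forwards to Alpha.M.
      public override void M()
      {
        base.M();
      }
    }

    If the original Bravo.DLL had shipped with such a stub, Charlie's base call would have been bound to Bravo.M from the start, and a later, genuine override in a hot-swapped Bravo.DLL would be picked up without recompiling Charlie.DLL.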

  • The question is where the “middle” is. Is it within MSIL, or JIT/NGEN?

    An object using polymorphism has a hidden "vtable". When to bind the vtable to the object is the key question that confronts both sides.

    Eric's side wins if people agree the vtable should be built while the C# compiler is compiling the code.

    Eric's side loses if people agree the C# compiler is just producing the MSIL, and that the JIT or NGEN should do the binding based on the managed DLLs that are actually used.

    Personally, I like the first mental model at work. But from a backward compatibility point of view, I support the second mental model, because that is what the unmanaged C++ vtable looks like, and by that time there is no unknown there.

    I think the ball should roll back to the CLR team. Or someone should redefine what "virtual" means in the .NET Framework instead of adopting the "virtual" from C++, if the problem is too large to be addressed there.

  • Exciting reading as always in your excellent blog Eric.

    I fully understand that this topic is not a clear-cut question with a "everyone is happy - no problems whatsoever"-answer.

    I thought that I would look at another real-life scenario. (Apart from the issues we have with this in a product in our company.)

    I looked at the service pack changes made in one of the WPF-assemblies by Microsoft.

    I compared the 3.0.6920.0 version (the version you get if you installed the original 3.5-framework) of PresentationFramework.dll with the version I get if  I update it using Windows Update (3.0.6920.4902).

    There were some methods and properties that have new overrides in the new service pack version. Some of the methods were:

    System.Windows.Controls.TreeViewItem: MeasureOverride

    System.Windows.Controls.VirtualizingStackPanel: OnGotKeyboardFocus and OnLostKeyboardFocus

    System.Windows.Documents.Table: BeginInit and EndInit

    I haven't checked what code was added in the functions mentioned, but I suspect that at least some bad things will happen if it is not executed by derived classes that were compiled with a version earlier than 3.5 SP1.

    Keeping binary compatibility is not easy. You constantly have to make decisions what changes can be made, and what impact will they have. Like some of you have mentioned before me, there are some things you can change and some things you can't, and still have it binary compatible. But not being able to add new overrides in classes and be somewhat confident that the base-code will be executed (by the standard base-call), severely limits the things you can change.

    Apparently at least the WPF developers seem to think so, since they have created new overrides on some of the classes in a service pack. Or maybe those changes were necessary, and they estimated that not much derived code would break by releasing the service packs. (My guess is they didn't think they were introducing problems just by overriding those methods.)

    Of course, this doesn't mean that this behavior has to change. It's just one more example of what some (and I'll go out on a limb and say most) developers expect from the "base-call". The solution of course is, when in doubt, recompile. But that equates to: "Every time Microsoft releases a new hotfix or service pack, recompile and update all installations of the application." That is not a very preferable scenario.

    Disclaimer: I compared the 3.0.6920.0 version from a Vista machine with the 3.0.6920.4902 version on my Windows 7 machine. Since I haven't looked at the actual code added in the SP, all the changes they made could possibly be of the type that doesn't have to be executed. Or they expected the changes to be breaking changes. My analysis of the assembly may also be faulty. If so, my apologies to the WPF team for thinking that you didn't fully understand all the details of how the compiler/framework works.

  • I don't understand these comments.  The majority seems to be saying "I want to be able to change the method signatures of my base class without recompiling any derived classes."  That's just crazy. Even by adding an override, you are changing the exposed surface of the class.

    As I understand it, you don't _have_ to recompile Bravo.dll every time you make a change, but only  if you add a method or change the signature of any exposed method.  That sounds reasonable to me.

    If you change a _method_ signature, you have to recompile all code that references it, yes?  Likewise, if you change a class signature, you have to recompile all the classes that derive from it.  How can you safely derive from a class if you don't even know what methods it exposes?  Besides, there are other ways to accomplish this without all the fuss.

  • nikov said: I wonder, what behavior the customer would expect, if Bravo were modified as follows:

    The parameterless M() method is still the best match. You would need a hidebyname method (e.g. if it were written in Visual Basic with the Shadows keyword) for this to change even if you _did_ recompile Charlie. Even if you'd picked a better example (say, Alpha had the params object[] overload), the difference is still that it is overriding the _same_ method, rather than being a different method with the same name or the same signature. You'd no more expect to pick it up without recompiling than if it were another method added to Alpha.

    Incidentally - Intellisense erroneously shows hidden methods as overloads in a hidebyname situation (even though C# itself prevents access to the hidden methods without a cast)

    Gavin Greig said: I don't think those concerns should override common sense; by default, compiled code should not change its behaviour when a change like this occurs - when completely unknown code is inserted where no method existed before. That should only occur after the new method's been explicitly approved through recompilation (and retesting, etc.).

    Except it is not "no method existed before" - and, taking your argument to its logical conclusion, a new DLL should only be usable _at all_ by recompiling. What's so special about a new override (not a new method or even a new overload) that requires being "explicitly approved through recompilation (and retesting, etc)" when other code changes in the library, such as changing an "existing" override [whose body was a single base call] to do something else, or even an added override in Bravo being called _on an instance of Bravo_, do not? You haven't defined "a change like this" clearly in a way that allows your interpretation to be 'common sense' and that means that the existing behavior in all [any!] other cases follows 'common sense'.

    BW said: I'm impressed. Sounds like a sensible way to avoid (the .net version of) DLL hell.

    Sounds more like a way to create it: if Bravo.M2 [which exists in both versions] is modified in version 2 to depend on Bravo.M having been called, those changes _will_ silently take effect _without_ Bravo.M having been called. If you're going to say as a philosophical point that any change to a base class ought to require recompiling any code that derives from it, then you need to make sure that _any change to a base class_ will make classes derived from previous versions fail to run, rather than merely silently leaving out _some_ (but obviously not all) of the changes.

  • I'm sorry to say that the C# team is wrong wrong wrong on this one. I've never run into this issue, and I don't think too many have since it hasn't cropped up all that much, but nonetheless it's IMHO a pretty big problem that should be fixed ASAP.

    I'm not really going to go into any technicalities, but my reasoning as to why the current behaviour is so wrong goes as follows:

    Imagine the original Bravo.dll implementation were something like this:

    class Bravo : Alpha
    {
      protected virtual bool DoM()
      {
        ...
        return whatever;
      }
    }

    Charlie implementation is:

    class Charlie : Bravo
    {
      public override void M()
      {
        if (base.DoM())
          base.M();
      }
    }

    When I see this code I understand (at least up to now) base as the immediately less derived class in the inheritance chain. In this case that would be Bravo. I don't really care if the implementation of the methods I'm calling is in Bravo or in Alpha; I'm still calling THROUGH Bravo, because that's what base intuitively means to me.

    If we suddenly change the Bravo implementation and add an override of M(), what happens to the code in Charlie? Well, we have the really bad situation of having base mean two entirely different things in the same declaration body. That is, base.DoM() really means base.DoM(), but base.M() actually means base.base.M().

    I'm sorry but that is plain wrong and it goes against all that Eric has written in previous posts about the design philosophy of C#.

    I should add that base in the current implementation always means two different things. But it's when we change the implementation in Bravo that the "two meanings" ambiguity becomes obvious.

    I can find thousands of places in this blog where Eric has stated that one of the C# design principles is that something can't have two meanings in the same declaration scope. Sadly it seems to not be the case with the base keyword. I just found out thanks to this blog, though.
