Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method that the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR would fall back to calling Alpha.M if there were no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is the compiler is generating code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what "base.M" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – is identified at runtime based on what the runtime type of the receiver has stuffed into the virtual method slot.
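To make the distinction concrete, here is a minimal sketch; the Animal, Giraffe and Demo classes below are illustrative and not part of the customer's scenario. The virtual call chooses the slot's contents based on the runtime type of the receiver; the non-virtual call names its exact target when the caller is compiled.

using System;

public class Animal
{
  public virtual void Speak() // declares the virtual slot
  {
    Console.WriteLine("Animal");
  }

  public void Describe() // non-virtual: callers bind to exactly this method at compile time
  {
    Console.WriteLine("I am an animal");
  }
}

public class Giraffe : Animal
{
  public override void Speak() // fills the slot for Giraffe instances
  {
    Console.WriteLine("Giraffe");
  }
}

public class Demo
{
  public static void Main()
  {
    Animal a = new Giraffe();
    a.Speak();    // virtual: the slot's contents are chosen by the runtime type, so this prints "Giraffe"
    a.Describe(); // non-virtual: bound at compile time to Animal.Describe, so this prints "I am an animal"
  }
}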

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime. The base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.
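To see the design principle from the other direction, consider what freshly compiled code does with this new Bravo: the private M is simply invisible to it. A minimal sketch; the Demo class is editorial and not part of the customer's code:

public class Demo
{
  public static void Main()
  {
    // Code compiled against the new Bravo never sees the private Bravo.M;
    // this call binds to Alpha.M and prints "Alpha".
    new Bravo().M();
  }
}

The crash can only come from a stale, already-compiled call baked into Charlie.DLL, because the non-virtual call instruction matches the private method by signature.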

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that sometimes crashed the runtime if you introduced a new private method: the original compilation of Charlie calls Bravo.M, even if there is no such method. If later there turns out to be an inaccessible one, it crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which will crash the runtime, and generates a call to Alpha.M.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was currently causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class -- a point which I personally disagree with, but I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyways) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of the surrounding context. I'm not saying that every time someone ships a service pack that has a few bug fixes you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented-programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.
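To give one concrete, deliberately simplified example of such an approach: composition. If Charlie wraps a Bravo rather than deriving from it, its call goes through ordinary virtual dispatch on the Bravo instance, so a Bravo that later adds an override of M is picked up without recompiling Charlie. A sketch only, reusing the Alpha and Bravo classes from the top of this article:

public class Charlie // composition instead of inheritance
{
  private readonly Bravo bravo = new Bravo();

  public void M()
  {
    Console.WriteLine("Charlie");
    // An ordinary virtual call: whichever override is most derived at runtime wins,
    // even if Bravo.DLL was updated after Charlie.DLL was compiled.
    bravo.M();
  }
}

With the original Bravo this prints “Charlie / Alpha”; with the updated Bravo it prints “Charlie / Bravo / Alpha”, with no recompilation of Charlie.DLL. Of course, this Charlie is no longer an Alpha, so it is a different design rather than a drop-in replacement.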

  • This is clearly a thorny issue with strong arguments for both sides.  Even in light of good arguments, I still feel that recompiling when one of your dependencies changes is the safest thing to do.  If the objects in question were interface implementations, I think everyone would agree that it is not sensible to hot-swap in a new interface definition that had added methods since the previous version.  The key difference from that case is that with a virtual call there is a fall-back candidate.  The CLR's behavior of falling back to the next candidate in the virtual call chain is as worrisome to me as the compiler's current behavior.  I would prefer the CLR to validate that the runtime method chain is consistent with the compile-time method chain, and error out if not.  So in this case, the CLR could detect that in Charlie.M(), base.M() resolves to Alpha.M(), bypassing Bravo.M(), which is almost certainly not intended.

    IMHO, binary compatible changes should be limited to tightly encapsulated implementation details. Protected methods are clearly visible on the outside, and that makes them part of an object's contract.  When the contract changes, beliefs which were previously true cannot be assumed to still be so.

  • I wouldn't expect this to change.  To me, in addition to the logic above, this seems like an intuitive part of static or pre-compiled code versus a dynamic language.  In a dynamic environment I can swap out B and have different results.  In a static compilation (even if I've only compiled to an interim language) I can't change source code or swap out a portion of the compiled stack.

    If you prefer the behavior associated with being able to swap out a portion of the class hierarchy, use a dynamic language.  If you expect the static class inheritance behavior, use a static language.  Some static languages will offer a dynamic option (C# 4.0 being one such) and within that dynamic or interpreted area I expect the behavior to support changes to the class definition after compilation.

    The advantages of compilation are a static environment and speedier execution, while an interpreted or dynamic environment allows for greater flexibility with the corresponding difficulty of testing all of the alternatives.  This is part of the evaluation of which language to use when creating a solution.

  • @Bill Sheldon

    I don't agree, as the current state of affairs does in fact let you swap, as you say, with no problems whatsoever. Another issue is whether this should be done without recompiling Charlie.

    If the original implementation of Bravo.dll did have an override of M and I decided to change the implementation of M later on and issue a new version of Bravo, Charlie would consume the new version fine. It's the fact that sometimes Charlie WON'T notice the changes made in Bravo that is unexpected and IMHO not right.

    These seemingly equivalent implementations of Bravo turn out to be quite different and that is disconcerting to say the least.

    class Bravo : Alpha {}

    class Bravo : Alpha { public override void M() { base.M(); } }
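    To spell out why they differ (an illustrative sketch using the article's classes): if Bravo ships the explicit forwarding override from day one,

    public class Bravo : Alpha // first version of Bravo.DLL, with a deliberate "hollow" override
    {
      public override void M()
      {
        base.M(); // adds no behaviour yet, but gives Charlie's base call a Bravo.M to bind to
      }
    }

    then Charlie's base call is compiled against Bravo.M, and swapping in the later Bravo that writes "Bravo" before calling base produces "Charlie / Bravo / Alpha" without recompiling Charlie.DLL.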

  • This has been stated before, I just want to emphasize this once more. How can we be sure that UI libraries built on top of WindowsForms or WebForms or WPF (ones with deep inheritance hierarchies) compiled under .NET 3.5 will still run normally on .NET 4?

    It's quite possible that in some control a new override appeared in 4.0, and the developer who put it there assumed that this is a backwards-compatible change that does not need documentation.

  • To the commenters who think that you should always recompile all applications whenever you swap in a new dll: you seem to be missing the point. The point is that you *can* swap in a new library dll and most people thought that there were only two possibilities when you did that:

    1. The library contained no breaking changes and the application would still work, running the code in the new version of the library or

    2. The library did contain breaking changes, there would be a runtime error and you would be forced to recompile and perhaps change some code that depended on the library.

    Now we see that there is a third possibility, namely that the library can contain changes that don't cause a runtime error but will silently fail to run the new code.

    If adding new method overrides is a breaking change, then it should cause a runtime error when swapping in dll's containing new method overrides. This would probably break a great number of applications all over the world...

    If adding new method overrides is *not* a breaking change then those new overrides should not be bypassed and the base call should work the way most intuitively think it does. I cannot see any way this can break existing applications, the only real counter-argument is that this means work for the compiler team (and maybe for the CLR team too) and since they don't have unlimited time they have to prioritize and there may be other things that take precedence.

  • We shouldn't forget that object declares virtual methods as well.

    public class DerivedClass : BaseClass
    {
      private readonly int _i;

      public DerivedClass(int i)
      {
        _i = i;
      }

      public override bool Equals(object obj)
      {
        return _i == ((DerivedClass)obj)._i && base.Equals(obj);
      }

      public override int GetHashCode()
      {
        return base.GetHashCode() ^ _i;
      }
    }

    What if BaseClass changes, and overrides Equals?  DerivedClass would ignore this change, and call object.Equals() instead.

    This is not what I expected at all.
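    To make that concrete, here is an editorial sketch of a hypothetical later version of BaseClass (not code from the comment above):

    public class BaseClass // a later version that starts overriding Equals and GetHashCode
    {
      protected int Id;

      public override bool Equals(object obj)
      {
        BaseClass other = obj as BaseClass;
        return other != null && other.Id == Id;
      }

      public override int GetHashCode()
      {
        return Id;
      }
    }

    The already-compiled DerivedClass keeps its baked-in non-virtual calls to object.Equals and object.GetHashCode, so these new overrides are silently skipped until DerivedClass is recompiled.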

  • > I wouldn't expect this to change. To me in addition to the logic above, this seems like an intuitive part of static or pre-compiled code versus a dynamic language.

    Well, doesn't DLL mean "dynamic link library"? :-) Maybe it's time to consider a different file extension for assemblies.

  • I'd like to join the majority of the commenters and endorse the customer's point of view.

    The examples from Tobias Berg and Steve Bjorg really show that there are valid scenarios for expecting a completely runtime based resolution of method calls. Being forced to add dummy overrides in your assemblies just to ensure correct future behavior seems to negate the purpose of having virtual calls in the first place, and certainly renders the "compile-time resolution is safer" argument invalid.

  • np: the fundamental question here, I think, is whether the user IS changing the signature or not. I don't think they are--but that's because I had assumed that every method that I didn't implement really did have that little magical implicit "send it on up the chain":

    class Bravo2 : Alpha { public override void M() { base.M(); } }

    This is how a LOT of things work, I think. Routed events will route through everything in the middle, whether they declare a handler or not, and if I add a new event handler half-way up the chain on the fly, my code accommodates it.

    So, +1 for the customer.

  • I fully agree with the customer.

    The above arguments are not convincing.

    I also agree with Sandro's comment. Why would you allow virtual resolution to a private method? Is there any practical or even technical value for this? As you said, "private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them". Fix this at the CLR level and then you are free to generate the appropriate code that nearly all of us are expecting.

  • @Tobias Berg:

    Your scenario is indeed compelling, but it is unfortunately far from reality:

    1) there could be breaking changes in Bravo.dll that do not cause Charlie.dll to fail at all. A trivial case in which this is true is when Charlie isn't using all the features of Bravo, so the modified code is either never called, or is called with parameters that will cause the same results as before to be generated.

    2) there could be breaking changes in Bravo that cause Charlie to generate invalid results without crashing. I proposed the most blatant one (a change in the values of an enumeration or a constant; see the sketch after this comment), but there are many other cases in which this is true.

    3) a non breaking change in Bravo could still cause Charlie to fail, with or without a run-time error. This is common whenever the author of Charlie relied on bugs or undocumented features of Bravo (just to make a common example, Charlie might rely on the order in which events are fired by Bravo).

    As you see, the concept of "breaking change" is an elusive one... the only definition that makes sense to me is that a change in Bravo is breaking (with respect to Charlie) if it causes Charlie's test suite to fail. But since Charlie's test suite isn't going to be around when you swap in the new Bravo, re-testing is mandatory.

    Which boils down to Eric's original statement.
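    The constant case mentioned in point 2 deserves a sketch of its own, because the value is baked into the consuming assembly at compile time much like the base call target is. The names below are illustrative, not from the comment above:

    public static class BravoLimits // in Bravo.dll, first version
    {
      public const int MaxItems = 10;            // const: the literal 10 is copied into Charlie.dll when Charlie is compiled
      public static readonly int MaxRetries = 3; // static readonly: read out of Bravo.dll at runtime
    }

    If a later Bravo.dll changes MaxItems to 20, an un-recompiled Charlie keeps using 10, while a change to MaxRetries is picked up immediately. That is the const versus static readonly distinction another commenter raises below.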

  • *decloaking, delurking*

    This must be one of Eric's best ever.

    Partly for his original post but mostly for all the comments. Awesome.

    Eric updated his article and said there are two different mindsets.

    Still he gets flak for being 'customer' vs. 'Eric'.

    I still find it amusing that people think Eric is the 'supreme leader of .NET' just because he blogs about C#.

    And in that vein: while prof. Barbara Liskov had a lot to say about OO, she does not hold a badge or have any mandate on how to implement OO. Nor do C++ and Java; nor Pascal, etc., ad nauseam.

    Me, myself and I were totally amazed by all the comments voting for the 'customer's' view.

    We all must understand that assemblies are not easy to hot-swap.

    It is even on the exam if you want to get certified on .NET and C#.

    What is the difference between a 'const' and a 'static readonly'?

    Your answer can earn you +1 or epic fail on the exam.

    How assemblies work is mandatory knowledge if you want Microsoft to call you a pro.

    But I had no idea that assemblies were so bound at compile time that a hot-swapped .dll might even hurt 'normal' calling conventions on base.

    Guess you learn something new everyday.

    Let's take this down to a level that does not involve 'customer' and 'eric'.

    Let's keep it simple. I'm with Robert Davis on this one: recompilation is cheap!

    If you hotswap a .dll without any tests or even a compile, you are back to dll-hell. There is no pity for you.

    I would never ever 'inject' a new .dll without compiling. That's just wrong.

    Summary:

    If you ever get burned on this 'static behavior' you should seriously consider your deployment strategy.

    Fast poll:

    Who ever got burned by this?

    1. me

    2. not me

    3. won't tell

    4. don't know.

    My guess is that the 'customer' voters will vote 4. I wanna be in category 2.

    Please don't 'fix' what is not broken.

  • I just can't believe that the CLR designers really expected us to recompile our apps every time a DLL is revised.

    Imagine that you're a vendor with a graphical control library that depends on WPF. Every time a new service pack is released for WPF, you need to recompile your library, right? Only it's not just that you have to recompile: you also need to get your new DLLs to your clients. But your clients aren't the end-user; the clients of your custom control library are app authors. This means that your clients' apps will all start misbehaving when *their* customers download a service pack.

    So every time there's a new revision of WPF, you have to compile a new set of DLLs, distribute them to your clients (assuming you even know who they are), and then they have to distribute the new DLLs to their customers (assuming they even know who they are).

    But wait, there's more! Your clients don't know which revision of WPF will be on their customers' machines so they have to ship an install package with a different set of DLLs for each possible WPF revision. That way the right set can be installed at setup time, and the correct version can be swapped in if the app detects that the user has installed a WPF service pack.

    How is this different from any other change in a service pack that breaks your DLL? Almost any other kind of breaking change is something you can detect by looking at the version and adjusting your behavior accordingly. Changing a base class's overrides, though, requires a separate version for each revision of the base class DLL because while each revision could add overrides, it could also remove them. This means that there is no single version of your DLL that can work with all revisions of the base classes.

    I don't know about you, but I would prefer a system that *didn't* require what I just described.

  • I would also like to put in a WOW!  I imagine that the debate within Microsoft was even more passionate than the responses in these posts.  Especially since, "The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class"

    It had to be very frustrating for the C# team to not be able to create the functionality that they thought was best, and then to actually have to make the situation worse for their users between version 1.0 and 2.0 so that the C# compiler and other tools would be simpler and faster.  I bet this was a bitter pill to swallow for some of the team.

    Thanks for sharing some of these intimate details of the decision process of the creation and evolution of C#.

    Gabe: It could be IUnknown, IDispatch and mismatching guids as a dll-hell saga. That's fun, if you have the time for it. regsvr32 was oldschool when it was named COM+. Now you have to drag-n-drop your COM+. That is just on the plus side.  =)

    Maybe above was a red herring and a moot point, but:

    As far as I can tell, in the real world you sign your 'dynamic link libraries' and go by public key token in your .config and tell what is supported and what is required (which are almost opposites compared to daily speech). If you do it right, the 'new bravo' wouldn't even be loaded just because you put a file in some folder.

    I have such a hard time figuring out when this would ever be an issue?

    For real, if the bravo-dude codes his bravo.dll and then later figures he should 'just add some code that might break stuff', there is something seriously wrong with the bravo package. Maybe we shouldn't have used bravo in the first place? If bravo was commercial and expensive, can we haz our monziez back nao?

    If you are giving me a bravo2.dll, fine. If you give me bravo.dll and do a .M2() or even .M17(), you've been doing too much DirectX and need therapy, by the idiom that you never break an interface/contract. But if you give me a bravo.dll that looks and smells like a bravo.dll and you 'forget' to tell consumers that your new override of .M() has breaking changes in the oo-hierarchy, then I am all of a sudden all for capital punishment.

    I'm not that smart, so I would like dudes like Eric, and some java-dudes, to analyze what the 'cost' would be of doing the runtime check that would make the OO academics happy. .NET already starts slow, especially asp.net. I have no idea, but how much slower would a runtime check make it?

    The proper solution, as far as I can tell, was already provided above: the bravo team should just do hollow calls to base for all virtual (virgin) calls. I'd put that into the 'refactoring' bucket, and if Visual Studio 2015 won't do it, maybe ReSharper will sport such a feature for the few people that need it.

    Now, I just love to be proven wrong. I usually am. But please, pretty please, give me a real world example of when this would ever be an issue?
