Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.
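
(To make the scenario concrete, here is a sketch of a trivial call site in some fourth assembly; the Program wrapper is purely illustrative and not part of the customer's code.)

using System;

class Program
{
  static void Main()
  {
    new Charlie().M();
    // With the original Bravo.DLL:                       Charlie / Alpha
    // With the new Bravo.DLL hot-swapped in, but
    // Charlie.DLL not recompiled:                        still Charlie / Alpha
  }
}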

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method that the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.
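
To see exactly what gets baked in, here is Charlie again with comments sketching the emitted call; the IL in the comments is approximate, but the important point is that it names Alpha directly:

public class Charlie : Bravo
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    // Compiled as a non-virtual call that names Alpha directly, roughly:
    //   ldarg.0
    //   call instance void Alpha::M()
    // Bravo.DLL is not consulted for this call at runtime.
    base.M();
  }
}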

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR would fall back to calling Alpha.M if there is no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially what is happening here is the compiler is generating code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in an unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what "base.M" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – is identified at runtime based on what the runtime type of the receiver has stuffed into the virtual method slot.
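
Here is a quick sketch of that slot model, using hypothetical Animal and Giraffe classes similar to the example below; the SlotDemo wrapper and its output comments are just illustrative.

using System;

public class Animal
{
  // Declaring a virtual method creates the slot.
  public virtual void Feed() { Console.WriteLine("Animal.Feed"); }
}

public class Giraffe : Animal
{
  // Overriding replaces the contents of that slot for Giraffe instances.
  public override void Feed() { Console.WriteLine("Giraffe.Feed"); }
}

class SlotDemo
{
  static void Main()
  {
    Animal a = new Giraffe();
    a.Feed(); // virtual call: the slot is chosen at compile time, its contents
              // at runtime from the receiver's actual type, so this prints "Giraffe.Feed"
  }
}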

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime. The base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.
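
To spell that out, here is the hot-swapped Bravo again with comments; the failure mode is described loosely in the comments, since the point is simply that the call is rejected rather than falling back to Alpha.M.

public class Bravo : Alpha
{
  // Same name and signature as Alpha.M, but private, and occupying no virtual slot.
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

// Under the scheme you propose (and that the C# 1.0 compiler used), the old
// Charlie.DLL contains, in effect, "call Bravo::M". The call instruction
// matches by name and signature, finds this private method, fails the
// accessibility check, and the program dies at that call instead of quietly
// falling back to Alpha.M.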

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that sometimes crashed the runtime if you introduced a new private method: the original compilation of Charlie calls Bravo.M, even if there is no such method. If later there turns out to be an inaccessible one, it crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which will crash the runtime, and generates a call to Alpha.M.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued thusly until 2003, at which point again the C# team brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was currently causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the slot as known at compile time. The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class -- a point which I personally disagree with, but I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyways) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.
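
(And to be explicit about what recompiling buys you - a sketch, assuming you rebuild Charlie.DLL against the new Bravo.DLL; the AfterRecompile wrapper is just illustrative:)

class AfterRecompile
{
  static void Main()
  {
    // Once Charlie.DLL is recompiled against the new Bravo.DLL, the compiler
    // sees Bravo.M as the nearest implementation and bakes the base call to it.
    new Charlie().M(); // now prints "Charlie", then "Bravo", then "Alpha"
  }
}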

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of the surrounding context. I'm not saying that every time someone ships a service pack that has a few bug fixes, you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented-programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.
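
(For illustration only, one such alternative is composition; this is a hypothetical sketch, not a prescription. A class that holds a Bravo rather than deriving from it makes an ordinary virtual call, which picks up whatever the currently loaded Bravo does.)

using System;

public class ComposedCharlie
{
  private readonly Bravo helper = new Bravo();

  public void M()
  {
    Console.WriteLine("Charlie");
    // An ordinary virtual call through the helper: whichever override of M the
    // currently loaded Bravo.DLL provides (or Alpha.M, if it provides none) runs.
    helper.M();
  }
}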

  • To all of those who say: "recompile to be safe" - there are scenarios where you cannot recompile.

    1) Our application has a plugin model. Most plugins are rarely updated compared with our application, so it is important for us to allow old plugins to run in new application versions. We don't have any source code for these plugins available. So we try to keep our application binary compatible with previous versions.

    This issue means for us that we are unable to ever add any overrides in any non-sealed class. This includes quite a lot of user controls (WPF and WinForms) where adding overrides to existing methods is considered normal. Plugins that also override these methods expect "base.OnEvent(...)" to mean "handle that event the usual way", not "bypass the app's usual handling and call WPF directly, breaking the app's invariants".

    2) We're producing a WPF control library. It runs fine on both .NET 3.5 and 4.0. However, as we've learned in this post, using a .NET 3.5 compiled library may break WPF's invariants by calling the wrong base method (unless the WPF team avoided adding any overrides) - so from now on, we and our customers have to deal with maintaining two separate builds where one could suffice if this issue was fixed.

    In summary, this issue makes binary compatibility very, very fragile. I thought binary compatibility of assemblies was an important feature of C#, but it appears I was mistaken :(

  • While I have tremendous respect for Eric, I have to agree with the majority of those who have commented here that the behavior is unexpected and probably undesirable (at the very least violating the principle of least surprise).

    First, is this behavior consistent with the spec, particularly the last paragraph of §7.5.8: "When a base-access references a virtual function member (a method, property, or indexer), the determination of which function member to invoke at run-time (§7.4.4) is changed. The function member that is invoked is determined by finding the most derived implementation (§10.6.3) of the function member with respect to B (instead of with respect to the run-time type of this, as would be usual in a non-base access). Thus, within an override of a virtual function member, a base-access can be used to invoke the inherited implementation of the function member."

    Second, the "evil hackers" argument seems like something of a red herring.  As you yourself stated in a previous entry: "When you call a method in Smith and pass in some data, you are essentially trusting Smith to (1) make proper use of your data[...], and (2) take some action on your behalf. "  If Bravo is compromised yet we trust it sufficiently to derive from a class defined within it, we already lost, no?  Besides, either the assembly is strongly-signed and is trusted (in which case we trust it to not be evil) or it isn’t (in which case nothing is stopping evil code anyway).

    Third, as an author of class C, if I call base.M() should I care if (or do I even have any means to know) whether M() is immediately defined by the parent class?  By not defining trivial overrides (protected override void M() {base.M();}), is B explicitly abrogating its ability to override virtual calls to M from derived classes?

    Fourth, the key difference between the example the compiler faced and the counterexample you defined in the epilogue is that of M() being a virtual method.  I think it fair as a developer to expect that a call to a virtual method can change at runtime to an implementation defined by a subclass, and to adequately prepare for this possibility.  For a non-virtual method, I would probably expect a call to (this.M()) to equate to the compile-time equivalent (in this case, ((Alpha)this).M()).

    It would seem that by calling a virtual method to begin with, we are already, as you state, "giv[ing] up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested".  If this is unacceptable, why would we swap out the DLL for a base class in this fashion to begin with?

  • I was going to post a response to your update but Pavel's post at 11:06am that starts "With respect to your update, Eric:" says almost exactly what I'd want to say. To elaborate slightly:

    In my mental model, a base call is not a non-virtual call: in fact, my mental model of C# doesn't include a concept of a "non-virtual CALL" at all. My mental model is that non-virtualness applies to *methods*, not to *calls*. So in the scenario in your update, you are making a call to the non-virtual method Alpha.M(). In the original scenario, you're making a call to the *virtual* method M(). The "base" prefix has a special meaning that indicates the virtual lookup should proceed from the immediate base class of the current class, but it doesn't negate the "virtual"ness. It doesn't turn a virtual method into a non-virtual method, just changes where the virtual lookup starts from. And "base" refers to the base class of the caller, which is to say that "base" means Bravo, not Alpha.

  • > Before I was sure that nobody can skip my methods and directly call base methods omitting call my overrides.

    Note that this was never true, and will not be true even if this issue is treated as a defect by the C# team and fixed for C#. The reason is that you cannot assume that other code running on .NET is written in C#, and so any limits imposed by the C# compiler which aren't also backed by the CLR can be worked around by simply using a different language (or even CIL directly).

    With respect to calling virtual methods non-virtually, it should be understood that the CLR allows such methods to be targeted by the "call" instruction (which is for non-virtual calls). For non-verifiable code, _any_ accessible virtual method on _any_ type can be called non-virtually by _any_ other type!

    Verifiable code has additional restrictions, but those are actually in line with the present C# behavior:

    "When using the call opcode to call a non-final virtual method on an instance other than a boxed value type, verification checks that the instance reference to the method being called is the result of ldarg.s 0, ldarg 0 and the caller’s body does not contain starg.s 0, starg 0 or ldarga.s 0, ldarga 0.

    [Rationale: This means that non-virtually calling a non-final virtual method is only verifiable in the case where the subclass methods calls one of its superclasses using the same this object reference, where “same” is easy to verify. This means that an override implementation effectively "hides" the superclass' implementation, and can assume that the override implementation cannot be bypassed by code outside the class hierarchy...."

    So, in short, even in verifiable code, non-virtual calls to virtual methods are okay on _any_ method, so long as the receiver ("this") is the same as for the calling method. So skipping levels of hierarchy is perfectly okay. Note how the rationale even acknowledges that by saying "... cannot be bypassed by code _outside the class hierarchy_".

    So there is no guarantee that any class invariants will be preserved, if they are maintained by virtue of overriding methods - any descendant can always ignore the override.

    > and there is no such "half-virtual" call instruction in IL (really isn't?).

    There is. As noted above, if the IL specifies a non-virtual call to Bravo::Foo, even if Bravo does not itself declare the method Foo, the call will be correctly dispatched at runtime (to Alpha::Foo, or however high up the hierarchy is needed to find an implementation).

  • I'm going to have to go with Eric's side.  With the given example we're dealing with strings and it's clear that the method in question is incorrect for both B and C, since the string returned shows the inheritance hierarchy.  In a more complex example though, you might have a calculation that for the incorrect version of B is correct for C.  So when fixing B you would actually break C.

    Regardless of whether or not another company makes C, it isn't safe to assume that fixing B also fixes C, and it would be as likely that fixing B now breaks C.

    We're not talking about tweaking the code.  We're talking about adding a completely new function (on the level of B).

  • Sorry, one clarification on my last comment, where I said my mental model is that there's no such thing as a non-virtual call: I do know that the *implementation* in the CLR is that there *is* in fact such a thing as a non-virtual call, and that in the C# language non-virtual calls are used both for "base.Foo()" and for regular calls to non-virtual methods. But I consider that an implementation detail and not a reason to suppose that those two things should behave the same way.

    I consider it the job of the compiler to translate the C# language into CLR constructs in whatever way makes the most sense. The CLR lacks a "base call" primitive that would map exactly to my mental model of base.Foo(), so using a nonvirtual call to the base class method is obviously the way to implement that. But the desired behavior should be what drives the implementation, rather than the implementation as a nonvirtual call driving the choice of behavior.

  • As Ayende has said a while ago, when most customers have a different mental model than the Microsoft team, it's the Microsoft team that should change how things behave. Being Ayende, he actually got the MVC team to make that change :)

    Note: this is the second time I see Eric have a "weird" mental image of what should happen; the previous one was regarding static virtual methods (also known in Delphi as class virtual).

  • @Mark:

    > I'm going to have to go with Eric's side.  With the given example we're dealing with strings and it's clear that the method in question is incorrect for both B and C, since the string returned shows the inheritance hierarchy.  In a more complex example though, you might have a calculation that for the incorrect version of B is correct for C.  So when fixing B you would actually break C.

    This interpretation is inconsistent with proper use of virtual & override. The class that introduces a virtual method specifies the contract for that method, which all derived classes are expected to adhere to. If they do not, they violate LSP, since a client could call the override via a reference to a base class type without even knowing it; he has the right to expect the contract to be upheld regardless of the effective type of the referenced object.

    Consequently, whether B overrides A.M or not, the implementation of C can expect that base.M - regardless of what it ends up calling - will uphold the contract for A.M. If the newly introduced B.M does not do so, it would break far more than this particular edge case - it would break any client code that makes virtual calls to A.M in a situation where B may be the effective type of receiver.

  • Just to make the distinction even clearer: If the new version of Bravo looked like this:

    public class Bravo : Alpha
    {
      public new void M() { ... }
    }

    ie "new" instead of "override"...

    THEN I would expect that Charlie's base.M() should still end up calling Alpha.M() instead of Bravo.M(), until Charlie got recompiled.

  • If you call M() from a Charlie type reference (e.g., Charlie c = new Charlie()) then later drop in a recompiled Bravo.dll you won't get the new Bravo.M() call.

    To get that, you have to use a Bravo type reference.  If you call M() from a Bravo type reference (e.g., Bravo b = new Bravo()) then later drop in a recompiled Bravo then you will execute Bravo.M() followed by a call to Alpha.M().

    Lastly, dropping in an override in Bravo doesn't make Bravo.M() virtual.  The only method that's virtual is Alpha.M().

  • Also, I want to add that when you say base.Feed in Giraffe, you might not think of it as calling Animal.Feed. Instead you should think: my base class is a black box (generally I have no access to its code), and I am calling whatever implementation of this method that black box provides. Because I don't really know how the box provides this method - and even if I did know, I can't rely on a particular implementation - I simply ask the box to handle the call. You can only ask this box, the class one level up in the hierarchy, not any class above it. And it can do with the call whatever it wants: provide its own implementation, or forward the call to its base class by omitting one of its own; you don't need to care which.

    So from my point of view, "base" is a substitute for the name of the class one level up (exactly the class I explicitly derived from), and I don't know and don't care about what goes on above that type.

    If I understand correctly, from your point of view "base" is a substitute for the name of whichever class actually implements the particular method, so the actual substitution depends on the method name after the dot. But then why do we have no syntax to access a method of any base class (or just any class) and make a non-virtual call to it?

    I guess the actual behavior of "base" is connected to the fact that I cannot have any (direct or indirect) base type less accessible than my class, and all information about which class overrides which method is exposed and public. So whether a type overrides a particular method or not is part of its public contract. That is the thing I cannot agree with. I think it should make absolutely no difference to the public contract of your class whether you override a method, stick with the base implementation by omitting your own, or stick with the base implementation by providing a one-line override that just calls it.

  • I would like to share my two cents. Before I do, I will say that a strong case can be made for both positions of this issue. Furthermore, it's entirely unclear what the appropriate or most desirable behavior should actually be, and given that there are potentially significant consequences to the large body of deployed code already shipped, a lot of thought (by folks smarter than me) needs to take place before such a change should be made. That said...

    After reading the article and your (Eric's) explanation of what happens, I have to say I was surprised. My "mental model" leaned in the direction that the customer, and many other responders, have described - namely that base calls in a virtual method send the message up the inheritance chain, and don't skip directly to the base version that existed at compile time.

    Second, similar to the topic of how the stack works in .NET/C#, I've always treated the mechanism of the base call as being an implementation detail - an abstraction that I didn't have to worry about. However, in this case, it turns out to be a leaky abstraction - you have to know a great deal about how the compiler wires a base call to be able to design and implement inheritance hierarchies that behave correctly - particularly in special cases like runtime substitution of compiled code. I have to say that I find that a bit unsettling because the vast majority of C# developers likely do not have this level of understanding about how their code is translated into IL.

    Third, I now realize that there are potentially a large number of cases where this type of problem can be introduced. Take, for example, the .NET framework itself. It's quite common to inherit from classes in the .NET BCL - these classes can, in turn, inherit from yet others within the BCL. Now imagine a case where some user-defined class inherits from a version 2.0 .NET class - but at runtime, the compiled code is loaded into a different version of the CLR than what the code was originally compiled against (there are numerous cases where this actually happens in real world code) and dynamically bound against different versions of the BCL assemblies. It's entirely plausible that the BCL code itself has added some overrides that may have previously been omitted. Consequently, at runtime the behavior continues to call the wrong override. Another example would be service releases of the .NET framework itself, which could introduce overrides of calls within the inheritance model that didn't previously exist. Most shipped applications are not going to get recompiled when a service pack is deployed ... nor in most cases is that even possible. It seems that the authors of the .NET BCL will have to be careful in the future to be aware of such impacts.

    Fourth, I don't view base calls and overload resolution as the same thing. In the update to the article, you mention that swapping in an M(int) override in Bravo would exhibit the same type of problem as the virtual method's base call does.

    Fifth, I am surprised that this should be such a rare case. It would seem to me that this behavior should occur more frequently than it does. The fact that this issue hasn't been more common and impactful is hard to reconcile with my intuition about how frequently such a change could be introduced and the amount of existing, compiled code out in the real world.

  • The argument about "calling the code I tested against" is invalid. If you want that (and you probably do want that), you need to sign your assemblies(*). If you don't sign them, _all_ calls into an assembly are calls into the unknown, not just base calls.

    Secondly, many people see turning virtual calls into non-virtual ones as an optimization, that shouldn't change semantics. For example, if a method in a sealed class calls a virtual method in that same class, the call can be optimized into a non-virtual call, because both methods are in the same module. Calling a virtual base method can be optimized statically only if the base method is in the same module. If not, the optimization should be postponed until runtime, when all information is known. Note that the JIT can (and should) do the optimization, such that there is a cost (if any) only on the first call.

    (*) Even signing doesn't give you this protection. I can easily recompile an assembly, even when modified, to have the same strong name as the previous version. In fact, I do that regularly when my changes don't break the API (for example, they're bug fixes or optimizations). In fact, I use that as a feature. See also http://blogs.u2u.net/kris/post/2007/07/20/Versioning-NET-Assemblies.aspx.

  • @Pavel:

    I see your point, I hadn't thought that far into it, but after posting I was trying to come up with a real-world example of what I was saying and couldn't come up with anything that wasn't hacky at best.

    With that said, I think I'm swaying sides a bit on this, though I think Leo said it well that it's unclear what's really desirable.

    When it comes to why this hasn't been seen more often with releases of .NET, I would assume it has to do with the assembly manifest pointing to the specific version of .NET an assembly was built against.  I don't completely know what the default is for this though.

  • " I want something on some base class to be called."

    I wonder: if the customer introduces a new Bravo.dll where the base class of Bravo is no longer Alpha, but it still has a method with the same signature, does he expect it to hot-swap into an entirely different hierarchy?
