Putting a base in the middle

Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently. There are three DLLs involved, Alpha.DLL, Bravo.DLL and Charlie.DLL. The classes in each are:

public class Alpha // In Alpha.DLL
{
  public virtual void M()
  {
    Console.WriteLine("Alpha");
  }
}

public class Bravo: Alpha // In Bravo.DLL
{
}

public class Charlie : Bravo // In Charlie.DLL
{
  public override void M()
  {
    Console.WriteLine("Charlie");
    base.M();
  }
}

Perfectly sensible. You call M on an instance of Charlie and it says “Charlie / Alpha”.
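
For concreteness, here is a minimal driver (my addition, not part of the customer's scenario) that exercises the hierarchy when all three assemblies are referenced:

public class Program // hypothetical test harness, not in any of the three DLLs
{
  public static void Main()
  {
    // Virtual dispatch selects Charlie.M; Charlie's base call was bound
    // to Alpha.M when Charlie.DLL was compiled, because Bravo declared
    // no override at that time.
    Alpha a = new Charlie();
    a.M(); // prints "Charlie" then "Alpha"
  }
}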

Now the vendor who supplies Bravo.DLL ships a new version which has this code:

public class Bravo: Alpha
{
  public override void M()
  {
    Console.WriteLine("Bravo");
    base.M();
  }
}

The question is: what happens if you call Charlie.M without recompiling Charlie.DLL, but you are loading the new version of Bravo.DLL?

The customer was quite surprised that the output is still “Charlie / Alpha”, not “Charlie / Bravo / Alpha”.
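
To spell out the cases (the last line is what the customer expected, and is what you get if you do recompile Charlie.DLL against the new Bravo.DLL):

new Charlie().M();
// Original Bravo.DLL:                          Charlie / Alpha
// New Bravo.DLL, Charlie.DLL not recompiled:   Charlie / Alpha   (Bravo.M is skipped)
// New Bravo.DLL, Charlie.DLL recompiled:       Charlie / Bravo / Alpha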

This is a new twist on the brittle base class failure; at least, it’s new to me.

Customer: What’s going on here?

When the compiler generates code for the base call, it looks at all the metadata and sees that the nearest valid method that the base call can be referring to is Alpha.M. So we generate code that says “make a non-virtual call to Alpha.M”. That code is baked into Charlie.DLL and it has the same semantics no matter what Bravo.DLL says. It calls Alpha.M.

Customer: You know, if you generated code that said “make a non-virtual call to Bravo.M”, the CLR would fall back to calling Alpha.M if there is no implementation of Bravo.M.

No, I didn’t know that actually. I’m slightly surprised that this doesn’t produce a verification error, but, whatever. Seems like a plausible behaviour, albeit perhaps somewhat risky. A quick look at the documented semantics of the call instruction indicates that this is the by-design behaviour, so it would be legal to do so.

Customer: Why doesn’t the compiler generate the call as a call to Bravo.M? Then you get the right semantics in my scenario!

Essentially, what is happening here is that the compiler generates code on the basis of today's static analysis, not on the basis of what the world might look like at runtime in some unknown future. When we generate the code for the base call we assume that there are not going to be changes in the base class hierarchy after compilation. That seemed at the time to be a reasonable assumption, though I can see that in your scenario, arguably it is not.

As it turns out, there are two reasons to do it the current way. The first is philosophical and apparently unconvincing. The second is practical.

Customer: What’s the philosophical justification?

There are two competing "mental models" of what "base.M" means.

The mental model that matches what the compiler currently implements is “a base call is a non-virtual call to the nearest method on any base class, based entirely on information known at compile time.”

Note that this matches exactly what we mean by "non-virtual call". An early-bound call to a non-virtual method is always a call to a particular method identified at compile time. By contrast, a virtual method call is based at least in part on runtime analysis of the type hierarchy. More specifically, a virtual method identifies a "slot" at compile time but not the "contents" of that slot. The "contents" – the actual method to call – is identified at runtime, based on what the runtime type of the receiver stuffed into the virtual method slot.
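
A small sketch of the distinction, using the classes above (the comments are mine):

Alpha a = new Charlie();
a.M(); // Virtual call: the slot for M is identified at compile time,
       // but the slot's contents (here Charlie.M) are looked up at runtime.

// Inside Charlie.M, by contrast, "base.M()" is a non-virtual call: the exact
// target (Alpha.M, per the metadata visible at compile time) is burned into
// Charlie.DLL once and for all; no slot is consulted at runtime.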

Your mental model is “a base call is a virtual call to the nearest method on any base class, based on both information known at runtime about the actual class hierarchy of the receiver, and information known at compile time about the compile-time type of the receiver.”

In your model the call is not actually virtual, because it is not based upon the contents of a virtual slot of the receiver. But neither is it entirely based on the compile-time knowledge of the type of the receiver! It's based on a combination of the two. Basically, it’s what would have been the non-virtual call in the counterfactual world where the compiler had been given correct information about what the types actually would look like at runtime.

A developer who has the former mental model (like, say, me) would be deeply surprised by your proposed behavior. If the developer has classes Giraffe, Mammal and Animal, Giraffe overrides virtual method Animal.Feed, and the developer says base.Feed in Giraffe, then the developer is thinking either like me:

I specifically wish Animal.Feed to be called here; if at runtime it turns out that evil hackers have inserted a method Mammal.Feed that I did not know about at compile time, I still want Animal.Feed to be called. I have compiled against Animal.Feed, I have tested against that scenario, and that call is precisely what I expect to happen. A base call gives me 100% of the safe, predictable, understandable, non-dynamic, testable behavior of any other non-virtual call. I rely upon those invariants to keep my customer's data secure.

Basically, this position is "I trust only what I can see when I wrote the code; any other code might not do what I want safely or correctly".

Or like you:

I need the base class to do some work for me. I want something on some base class to be called. Animal.Feed or Mammal.Feed, I don't care, just pick the best one - whichever one happens to be "most derived" in some future version of the world - by doing that analysis at runtime. In exchange for the flexibility of being able to hot-swap in new behavior by changing the implementation of my base classes without recompiling my derived classes, I am willing to give up safety, predictability, and the knowledge that what runs on my customer's machines is what I tested.

Basically, this position is "I trust that the current version of my class knows how to interpret my request and will do so safely and correctly, even if I've never once tested that."

Though I understand your point of view, I’m personally inclined to do things the safe, boring and sane way rather than the flexible, dangerous and interesting way. However, based on the several dozen comments on the first version of this article, and my brief poll of other members of the C# compiler team, I am in a small minority that believes that the first mental model is the more sensible one.

Customer: The philosophical reason is unconvincing; I see a base call as meaning “call the nearest thing in the virtual hierarchy”. What’s the practical concern?

In the autumn of 2000, during the development of C# 1.0, the behaviour of the compiler was as you expect: we would generate a call to Bravo.M and allow the runtime to resolve that as either a call to Bravo.M if there is one or to Alpha.M if there is not. My predecessor Peter Hallam then discovered the following case. Suppose the new hot-swapped Bravo.DLL is now:

public class Bravo: Alpha
{
  new private void M()
  {
    Console.WriteLine("Bravo");
  }
}

Now what happens? Bravo has added a private method, and one of our design principles is that private methods are invisible implementation details; they do not have any effect on the surrounding code that cannot see them. If you hot-swap in this code and the call in Charlie is realized as a call to Bravo.M then this crashes the runtime: the base call resolves as a call to a private method from outside the class, which is not legal. Non-virtual calls do matching by signature, not by virtual slot.
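
A single-assembly analogue makes the accessibility problem vivid; the classes Bravo2 and Elsewhere below are hypothetical names of mine, not part of the customer's code:

public class Bravo2 : Alpha // stand-in for the hot-swapped Bravo
{
  new private void M() // hides Alpha.M, but only inside Bravo2
  {
    Console.WriteLine("Bravo2 (private)");
  }
}

public class Elsewhere
{
  public static void Demo(Bravo2 b)
  {
    // This compiles and binds to the public virtual Alpha.M; the private M
    // is invisible here. An IL-level non-virtual call aimed at Bravo2's
    // private M from another class would fail the CLR's member access check,
    // which is exactly the crash the C# 1.0 code generation exposed.
    b.M();
  }
}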

The CLR architects and the C# architects considered many possible solutions to this problem, including adding a new instruction that would match by slot, changing the semantics of the call instruction, changing the meaning of "private", implementing name mangling in the compiler, and so on. The decision they arrived at was that all of the above were insanely dangerous considering how late in the ship cycle it was, how unlikely the scenario is, and the fact that this would be enabling a scenario which is directly contrary to good sense; if you change a base class then you should recompile your derived classes. We don't want to be in the business of making it easier to do something dangerous and wrong.

So they punted on the issue. The C# 1.0 compiler apparently did it the way you like, and generated code that sometimes crashed the runtime if you introduced a new private method: the original compilation of Charlie calls Bravo.M, even if there is no such method. If later there turns out to be an inaccessible one, it crashes. If you recompile Charlie.DLL, then the compiler notices that there is an intervening private method which will crash the runtime, and generates a call to Alpha.M.

This is far from ideal. The compiler is designed so that for performance reasons it does not load the potentially hundreds of millions of bytes of metadata about private members from referenced assemblies; now we have to load at least some of that. Also, this makes it difficult to use tools such as ASMMETA which produce "fake" versions of assemblies which are then later replaced with real assemblies. And of course there is always still the crashing scenario to worry about.

The situation continued this way until 2003, at which point the C# team again brought this up with the CLR team to see if we could get a new instruction defined, a "basecall" instruction which would provide an exact virtual slot reference, rather than doing a by-signature match as the non-virtual call instruction does now. After much debate it was again determined that this obscure and dangerous scenario did not meet the bar for making an extremely expensive and potentially breaking change to the CLR.

Concerned over all the ways that this behaviour was causing breaks and poor performance, in 2003 the C# design team decided to go with the present approach of binding directly to the method known at compile time. The team all agreed that the desirable behaviour was to always dynamically bind to the closest base class -- a position with which I personally disagree, though I see their point. But given the costs of doing so safely, and the fact that hot-swapping in new code in the middle of a class hierarchy is not exactly a desirable scenario to support, it's better to sometimes force a recompilation (that you should have done anyways) than to sometimes crash and die horribly.

Customer: Wow. So, this will never change, right?

Wow indeed. I learned an awful lot today. One of these days I need to sit down and just read all five hundred pages of the C# 1.0 and 2.0 design notes.

I wouldn’t expect this to ever change. If you change a base class, recompile your derived classes. That’s the safe thing to do. Do not rely on the runtime fixing stuff up for you when you hot-swap in a new class in the middle of a class hierarchy.

UPDATE: Based on the number of rather histrionic comments I've gotten over the last 24 hours, I think my advice above has been taken rather out of the surrounding context. I'm not saying that every time someone ships a service pack that has a few bug fixes you are required to recompile all your applications and ship them again. I thought it was clear from the context that what I was saying was that if you depend upon a base type which has been updated then:

(1) at the very least test your derived types with the new base type -- your derived types are relying on the mechanisms of the base types; when a mechanism changes, you have to re-test the code which relies upon that mechanism.

(2) if there was a breaking change, recompile, re-test and re-ship the derived type. And

(3) you might be surprised by what is a breaking change; adding a new override can potentially be a breaking change in some rare cases.

I agree that it is unfortunate that adding a new override is in some rare cases a semantically breaking change. I hope you agree that it is also unfortunate that adding a new private method was in some rare cases a crash-the-runtime change in C# 1.0. Which of those evils is the lesser is of course a matter of debate; we had that debate between 2000 and 2003 and I don't think it's wise or productive to second-guess the outcome of that debate now.

The simple fact of the matter is that the brittle base class problem is an inherent problem with the object-oriented-programming pattern. We have worked very hard to design a language which minimizes the likelihood of the brittle base class problem biting people. And the base class library team works very hard to ensure that service pack upgrades introduce as few breaking changes as possible while meeting our other servicing goals, like fixing existing problems. But our hard work only goes so far, and there are more base classes in the world than those in the BCL.

If you find that you are getting bitten by the brittle base class problem a lot, then maybe object oriented programming is not actually the right solution for your problem space; there are other approaches to code reuse and organization which are perfectly valid that do not suffer from the brittle base class problem.

  • @Michael Starberg - Did you actually read Eric's original post? Eric actually gives a real world example:  "Here’s a crazy-seeming but honest-to-goodness real customer scenario that got reported to me recently."

    It really is not very hard to imagine real world examples.  Try using your imagination.

  • DRBlaise: Did you read all the comments? Also, I hardly think there is a bravo.dll. I _am_ trying to use my imagination. Even tried by closing my eyes. Still zero. I see nothing. Do you? Please help me understand why you would hot-swap a .dll in the first place without at least doing a ctrl-shift-b on your local laptop before you even commit to the repos?

    But let's say there was a true case. Eric says there is at least 1. Hence at least an array. How many are bothered by this? 7 people in the world? 130 people? Would 16 bits be enough to hold them? Should I go 128-bit just to care for future people who complain? Even so, I say it is their pain.

    I will hold my ground and stand by my position: If you put any file that can be run on disk and then open it without knowing what you are doing, without any documentation, then you might get surprises.

    Guess I just don't like the idea of hot swapping .DLL's all willy nilly. Been there, done that. Enough already!

    Back to the 'const' vs. 'static readonly' question. Miss that one, and you will flunk the entire test. That is how Microsoft sifts out see-sharpers from java-dudes.. Just kidding. Or, am I? I don't know, as I passed. =)

    Fair to say is that Eric really tries to explain how stuff works.

    Can we at least agree that that is a virtue and makes us all happy?

  • oh, I think the clue is the title: - Putting a base in the middle. =)

  • @Michael,

    Not sure why this is so difficult, but here goes:

    1. You distribute a .NET 3.5 application to your customers.

    2. Customers install .NET 3.5 SP1.

    3. Hosed

  • genious: So you are placing/voting yourself in category 1?

    What I have found as of yet is that when you go .NET 4 with code contracts and then have to revert to .NET 3.5 in panic, msbuild.exe gets very upset and refuses to build. Just a few lines in the .csproj xml that make the build confused.

    And so I have to manually compile and deploy the .dll via ftp until .NET 4.0 sails. Hence I wrote that I wanna be voting 2, but am factually in cat 1.

    Well not really, as this is just a few lines in xml that could be done with. I'll stick around doing it manually until 12 April.

    Would you still be 'Hosed' if you recompiled, sift through all warnings and set a 'supported to 3.5 sp1' that would make windows update kick in, assuming we are talking windows OS? I am not nagging, but asking.

  • genious: Not begging the question correctly, and this is getting off topic, but doing your 1. 2. 3. and getting 'Hosed', isn't that a good thing? Of course not. Especially if it is transactional data or Jack Bauer remoting a nuclear power plant. But surely a SP1 is better than vanilla and worse than SP2?

    Your users can't plan for a hypothetical upgrade of the OS, .NET run-time nor its frameworks and BCL. But you can plan for it. Or wait, your entire point is that you can't plan ahead. I stand corrected. Luckily, thanks to this post; now we know we shouldn't leave 'virgins' empty in C#. I kinda start to like that keyword. If C++ has friends, why can't C# have virgins?

    But no. I am trying to say that I would not like to get a performance hit just to solve the anti-pattern of 'base in the middle'. I don't know what that performance hit would imply, but for me it sounds like swatting flies with a bulldozer.

    Hehe, I don't think Liskov is following this thread, but if she would have, she'd probably be hiring assassins. Oops. Eric can always hide behind 'design notes', but I am taking a true stand for 'NO FIX'.

    It's not a bug. Working as designed.

    One good thing has come out of this though. I bet you all an atto-dollar that the C# crew is etcha-sketching on private overrides for nested classes. If not virgins, nor friends, what do you call those? poked?  Whoops.

    Seriously

    - Michael Starberg

  • Eric, I agree with your latest update, that some of the comments have been a little dramatic and emotional, but I have to say that your latest update seems somewhat defensive and emotional to me.  You have not added any additional information and have basically re-stated your points and added some BOLD typing.  It reminds me of an SNL skit about Pres. Bush:  “… we are working HARD for the American people … working late … working weekends …”

    In my opinion, it is disingenuous to compare the C# 1.0 “new private method” bug with the C# 2.0 “new override” bug.  They may both be rare, but when comparing them to each other, I would say the C# 2.0 “new override” bug has to be at least 100 times more likely to happen, to have more serious and subtle consequences, and to be harder to debug.  (I understand that you and a few others do not believe the "new override" behavior is a bug, but I strongly and emotionally disagree! :))

    I strongly disagree that I am disingenuous. I strongly disagree that any members of the design team are disingenuous. I also strongly disagree with any implication that we do not make every decision with the needs of the customer in mind. You might not always agree with those decisions (and neither do I). And indeed, not every decision we make turns out for the best; we are imperfect humans working with imperfect information, like everyone else.

    As for the likelihood of the change producing an error: the whole motivation for this article in the first place was that this issue was first reported to me a couple of months ago, and then again a couple of weeks ago. Neither were bug reports describing an actual user-affecting problem; both were "I happened to notice that this was the compiler behaviour, is it by design?" Of course, not every question from every user gets run by me, but the fact that I've seen this question a grand total of twice in the last five years tells me that this is not exactly cutting a swath of destruction through the industry. It's a small unexpected "gotcha" behaviour among many thousands of other such small unexpected behaviours, and I don't see what the big deal is. Compare it, say, to "lambdas close over the variable of a loop, not its values", which I get a couple of times a week; that one really is hundreds of times more likely, and we're probably not fixing it either.

    My point is, and continues to be, that when you make a change to a base class, you've got to test your derived classes, and possibly recompile them. I don't see this as controversial, surprising, or onerous.

    -- Eric

    From the developer notes you shared, after the failure to get the CLR team to make the necessary change to implement the “desired” functionality, the C# team’s main reasons for making the change were to simplify and speed up the compiler and other tools.  I can understand these reasons, and appreciate you sharing them with us.

  • Eric, thank you for your response and your honesty – I appreciate your passion to defend yourself and your team.  I am sorry that I implied that you or your team are disingenuous in any way.   I should have written:  “It is my opinion you are fooling yourself if you think the two bugs can be compared equally.”  But I wanted to expand my vocabulary after your use of  “rather histrionic comments.”  (I had to look up the definition.)

    Thanks again for this blog and the great insights you provide on a weekly basis.  Please continue to provide the unique looks into the C# design team’s decision process.  I do know that your decisions are made with us in mind and that a lot of decisions are difficult compromises.

  • Eric: Thanks for not being 'disingenuous'. I don't even see the gotcha.  It is working as designed. To boldly go where no language has gone before.

    Maybe it would just be what Monty Python did with the 'smart penguin 2 meters tall' sketch with John Cleese narrating 'This was later known as the complete-waste-of-time-theory' but my curious mind would still like to get some idea of what a 'fix' would imply.

    I am sorry to say that I don't have the time in my life, nor the skillz,  to write my own compiler and runtime and test what a 'true virtual' call would cost. And I am not asking for you to mess with the compiler. Seems like java does it. But you could use your brains and insights and maybe do a new post on what a change would take.

    While I rather see the C#-team spend time/brains on private overrides for nested classes. Now THAT is not a complete waste of time theory.

  • DRBlaise: Well, I am one who does appreciate *your* passion. Why can't coding and compilers be an emotional topic? I've read every word you wrote and actually learned something. However, it is safer to do Star Trek and Python jokes than to compare what you think is a 'flaw' with George Bush. That is taking it way off-topic.

    I have also learned something that I have seen before. If you want to get Mr. Lippert's attention, all you have to do is insult his brain. For me this is still an untested theorem, but the day I really want his attention I am so gonna play the disingenuous card. By a sample of one, it seems to work =)

    Happy Easter

  • "My point is, and continues to be, that when you make a change to a base class, you've got to test your derived classes, and possibly recompile them. I don't see this as controversial, surprising, or onerous." --Eric

    Having come in late to the discussion, I think the problem is that if you restate the situation - as a number of commentators have! - it's more obviously problematic;

    "When you - or somebody else like Microsoft or another upstream library vendor - makes changes to a base class, you've got to test your derived classes - although you may not know what you're even testing *for*! - and possibly recompile them. And do this before any of your users in-the-field install, say, a .NET service pack containing the new base classes."

    If a .NET SP comes out that adds some new overrides (as has been indicated, this does happen!) your classes will now not be calling those overrides. The fact that, say, your call to base.PreInit() "bypasses" the new override may - in the worst case - open a security hole. How are you meant to test for that? You don't have full details of every change MS has made and what the possible ramifications of "bypassing" a new override are! I guess to be safe, you have to assume that bypassing the override may be problematic, and *always* recompile and release a new version of your software.

    So it sounds like the only "correct" way to deal with the situation is, whenever a .NET SP comes out, email all the users of your software and tell them under no circumstances to install it until you've had a chance to recompile all your software and release updated versions compiled against the new libraries. God help them if another vendor orders them to install the SP straight away because it contains security fixes...

    (Luckily the fact that Eric's not encountered many problems like this implies that while there are pieces of software out there 'bypassing' the new overrides in some situations, none of them have caused any security or other serious issues. That we know of. Of course, maybe Microsoft is particularly careful to make sure no new overrides added in service packs will cause any problems if they're 'bypassed' but this seems unlikely - correct me if I'm wrong!)

    @Michael: Doesn't Eric touch on the required changes in the article itself - basically, the CLR already supports exactly the required behaviour so it would "just" involve changing the CLR opcodes output by the compiler for a base method call - no runtime changes required?

  • @Michael Starberg, So do you recompile separate versions of your unmanaged applications for each version of windows? (new version of kernel32.dll!)

    The simple fact is, there are breaking changes and non-breaking changes. Adding an override to a virtual method ought to be a non-breaking change (assuming the new code is specifically intended to work in the situations the old code is called in, which is rather the point of virtual methods now isn't it) just as modifying the body of an existing method is. In any other situation than a base.M() call from a derived class, it _is_ a non-breaking change. You seem to take the extremist attitude that _every_ change should be considered a breaking change: "If you hotswap a .dll without any tests or even a compile, you are back to dll-hell. There is no pity for you."

    As for the real issue... The "base.x" syntax conceals the fact that you are making a non-virtual call to a class other than the actual base class. If C# is truly meant to work this way, the syntax should be something like Alpha::M, with Bravo::M being a _compile-time error_ when there is no override. This would incidentally allow you to continue calling Alpha::M (and skipping Bravo::M) after the change is made, until you not only recompile but also edit your source.

  • I agree with what GRico said somewhere above.

    base can actually have two very different meanings in the same declaration space. It might be intended that way but it doesn't feel right at all.

    protected override void Whatever()
    {
      if (base.whatever())
        base.whateverElse();
    }

    If whatever() is overridden in the "base" class (understanding "base class" as the immediately preceding class in the inheritance hierarchy) and whateverElse() isn't, then the base keyword unexpectedly has two altogether different meanings.

    I wasn't aware of this behavior at all. To me base.X intuitively was a virtual call, so all this is new. I still think that C# made a mistake here as my intuition seems to coincide with the majority of posters and probably with the majority of coders who aren't even aware of this issue at all.

    I'd like to point out that this is completely independent of the "dll swap" problem this post talks about and of whether you should recompile or not. That issue only makes the dual meaning of the base keyword visible, which is my gripe with the language.

  • Unless I misunderstand (which would not surprise me, since I am reading fast), the current design only breaks if a class in the middle of an inheritance chain changes its _implemented_ interface. Unfortunately, this is disguised by the virtual mechanism so that the _visible_ interface is unchanged.

    Regardless, changing the interface (even invisibly) is by definition a potentially breaking change. The party responsible for Bravo should adjust version numbers, strong names, etc. as appropriate to indicate that the new release is not hot-swap backward compatible in the case of an added interface element, and not hot-swap compatible at all in the case of a removed interface element.

    (For the purposes of this comment, I am trying to interpret the existing design. At the moment, I have no opinion on what is "right".)
