Why is covariance of value-typed arrays inconsistent?

Another interesting question from StackOverflow:

uint[] foo = new uint[10];
object bar = foo;
Console.WriteLine("{0} {1} {2} {3}",       
  foo is uint[], // True
  foo is int[],  // False
  bar is uint[], // True
  bar is int[]); // True

What the heck is going on here?

This program fragment illustrates an interesting and unfortunate inconsistency between the CLI type system and the C# type system.

The CLI has the concept of "assignment compatibility". If a value x of known data type S is "assignment compatible" with a particular storage location y of known data type T, then you can store x in y. If not, then doing so is not verifiable code and the verifier will disallow it.

The CLI type system says, for instance, that subtypes of reference type are assignment compatible with supertypes of reference type. If you have a string, you can store it in a variable of type object, because both are reference types and string is a subtype of object. But the opposite is not true; supertypes are not assignment compatible with subtypes. You can't stick something only known to be object into a variable of type string without first casting it.
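
A quick sketch to make that concrete:

string s = "hello";
object o = s;         // fine: string is assignment compatible with object
// string t = o;      // not allowed without a cast; the verifier would reject it
string t = (string)o; // legal, but the cast is checked at run time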

Basically "assignment compatible" means "it makes sense to stick these exact bits into this variable". The assignment from source value to target variable has to be "representation preserving".

One of the rules of the CLI is "if X is assignment compatible with Y then X[] is assignment compatible with Y[]".

That is, arrays are covariant with respect to assignment compatibility. As I've discussed already, this is actually a broken kind of covariance.

That is not a rule of C#. C#'s array covariance rule is "if X is a reference type implicitly convertible to reference type Y (via a reference or identity conversion) then X[] is implicitly convertible to Y[]". That is a subtly different rule!

In the CLI, uint and int are assignment compatible; therefore uint[] and int[] are too. But in C#, the conversion between int and uint is explicit, not implicit, and these are value types, not reference types. So in C# it is not legal to convert an int[] to a uint[]. But it is legal in the CLI. So now we are faced with a choice.
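
A small sketch of the difference: the reference-type case that C# permits, and the value-type case that it rejects:

string[] ss = new string[3];
object[] os = ss;               // legal in C#: an identity-preserving reference conversion

int i = -1;
// uint u = i;                  // compile error: the int-to-uint conversion is explicit
uint u = (uint)i;               // fine; u is now 4294967295

int[] xs = new int[10];
// uint[] us = (uint[])xs;      // compile error: C# has no conversion between int[] and uint[]
uint[] us = (uint[])(object)xs; // compiles, and the CLR permits the cast at run time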

1) Implement "is" so that when the compiler cannot determine the answer statically, it actually calls a method which checks all the C# rules for identity-preserving convertibility. This is slow, and 99.9% of the time matches what the CLR rules are. But we take the performance hit so as to be 100% compliant with the rules of C#.

2) Implement "is" so that when the compiler cannot determine the answer statically, it does the incredibly fast CLR assignment compatibility check, and live with the fact that this says that a uint[] is an int[], even though that would not actually be legal in C#.

We chose the latter. It is unfortunate that C# and the CLI specifications disagree on this minor point but we are willing to live with the inconsistency.
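
As an aside, you can observe the CLR's answer directly through reflection. A small sketch, assuming that Type.IsAssignableFrom follows the CLR's assignment compatibility rules rather than C#'s (which, as far as I know, it does):

Console.WriteLine(typeof(int[]).IsAssignableFrom(typeof(uint[]))); // True: CLR rules apply
Console.WriteLine(typeof(int[]).IsAssignableFrom(typeof(long[]))); // False: elements differ in size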

So what's going on here is that in the "foo" cases, the compiler can determine statically what the answer is going to be according to the rules of C#, and generates code to produce "True" and "False". But in the "bar" case, the compiler no longer knows what exact type is in bar, so it generates code to make the CLR answer the question, and the CLR gives a different opinion.
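
One way to see both code paths side by side; the Is<T> helper below is my own illustration, not anything in the framework. Putting the test behind a generic method hides the static type from the compiler, which forces the run-time check:

static bool Is<T>(object o) { return o is T; }

uint[] foo = new uint[10];
Console.WriteLine(foo is int[]);   // False: the compiler answers using the C# rules
Console.WriteLine(Is<int[]>(foo)); // True: the CLR answers at run time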

 

  • The really crazy part is that it's not just "is" that does it:

        var x = new int[] { -1 };
        uint[] y = (uint[])(object)x;
        Console.WriteLine(y[0]); // 4294967295
        x[0] = int.MinValue;     // prove that x really is the same object as y
        Console.WriteLine(y[0]); // 2147483648

    At least it's guaranteed to always be two's complement.

  • Excellent explanation as always Mr. Lippert, but then it brought up other scenarios to my mind.

    Why don't struct value types also behave the same way?  DateTime has 64 bits and long has 64 bits, but the behavior is different.  Why are they not CLI "assignment compatible"?

    DateTime[] foo = new DateTime[10];
    object bar = foo;
    Console.WriteLine("{0} {1} {2} {3}",
     foo is long[], // False
     foo is DateTime[],  // True
     bar is long[], // False
     bar is DateTime[]); // True

    Why don't non-array value types also have the same behavior?

    uint foo = new uint();
    object bar = foo;
    Console.WriteLine("{0} {1} {2} {3}",
     foo is uint, // True
     foo is int,  // False
     bar is uint, // True
     bar is int); // False

  • This also applies to enums, which are also assignment-compatible with their underlying base type - in fact, for them it's even more relaxed, because it doesn't even have to involve arrays - you can box int and then unbox to enum. But if you use "is" to check for a type, it will tell you that a boxed enum value isn't int. On the other hand, if you box int, you cannot unbox it as uint.
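
    A quick sketch of the enum case, with DayOfWeek standing in for any int-backed enum:

        object o = 1;                      // box an int
        Console.WriteLine(o is DayOfWeek); // False: "is" reports the exact boxed type
        DayOfWeek d = (DayOfWeek)o;        // succeeds: unboxing a boxed int to an int-backed enum is allowed
        // uint u = (uint)o;               // would throw InvalidCastException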

    IntPtr is even more interesting. It is assignment-compatible with either int or long, depending on architecture; so this:

        object o = new[] { (IntPtr)1 };
        Console.WriteLine(o is int[]);
        Console.WriteLine((int[])o);

    will print "True" and otherwise work fine on 32-bit .NET, but throw InvalidCastException on 64-bit.

  • @DRBlaise, if arbitrary value types were assignment compatible like that, it would provide an easy way to defeat encapsulation. DateTime contains a private field; you're not supposed to mess with it and e.g. break its invariants, but if you could cast a long[] with arbitrary value in it to DateTime[], that's precisely what you could do. In contrast, treating int[] as uint[] is safe in that the types have exact, well-defined 1-to-1 value correspondence, so there's nothing you could possibly gain in terms of circumventing encapsulation or type safety by such a cast.

    And before you mention reflection, remember that it requires certain CAS permission checks for the calling code, while casting int[] to uint[] does not.

  • @Pavel - Thanks for the explanations.  The inconsistencies with enum boxing, unboxing, and "is" were particularly mind-blowing.

  • Off-topic (?):

    I would like to see from Microsoft, within a decade from now, a completely new framework base without these kinds of incoherent behaviors, natively supporting:

    Co- and contravariance at the type system level (not just for delegates and interfaces), tuples, immutability, STM, weak events, unloadable app-domains, and whatever else tomorrow's computing will need.

    Stop forcing compatibility with legacy technologies and bring something CONSISTENT.

    Then, for compatibility, make special types (like dynamic in C# 4.0) or virtualize.

  • Nestor:

    I think you miss the point.  Microsoft was doing exactly what you ask: building a new framework that was consistent and easy to use.  They called it .NET and shipped it in 2002.  Seven years later we find that a few (and very few, by the way) of the decisions made back in 1998 and 1999 are different from what we would have chosen.

    The problem is that without the benefit of a time machine, there is no reason to believe they would do any better this time than last time.  I'm sure they would never have done anonymous delegates in C# 2.0 if they had known that lambdas were coming in C# 3.0.  The new system would sprout its own warts and be just as inconsistent as .NET within a month of release.

    Furthermore, it seems most of what you ask for can be (and in many instances is being) implemented without a top-to-bottom rewrite.  Read Joel's admonition on why you should never start over from scratch: http://www.joelonsoftware.com/articles/fog0000000069.html

  • Interesting post. Is this going to change, along with the generic co/contravariance changes in C#4.0?

  • @John

    The article by Spolsky is a bit drastic. And after the (good) initial assumption that writing a program from scratch will take a long time, it continues with a long list of "what ifs" that really don't demonstrate anything. I will reply to his article with my own questions:

    - what if the old program REALLY is a mess?

    - what if the old program was badly patched/maintained for 10 years?

    - what if the new team DOES have more experience?

    - what if the new team gets better requirements than the old team?

    - what if the old program (service) is still making an astonishing amount of revenue, while we develop the new one?

    - what if I want to switch platform?

    My company switched platforms for its core business from mainframe to .NET 4 years ago. With a total rewrite. It was long and hard, but it worked.

    But, obviously, it's not a rule. Recently, we developed a new version of another program, switching from .NET to Java, with a rewrite from zero. The results were... uhm... less brilliant :-)

  • @Filini:

    Or perhaps the results were more brillant:  http://thedailywtf.com/articles/the_brillant_paula_bean.aspx

  • > Stop forcing compatibility with legacy technologies and bring something CONSISTENT.

    It's funny how .NET is already "legacy", when just 3-4 years ago you could still hear a lot of people moaning about how COM (usually in VB6 context) was perfectly fine, and how evil MS is because it "killed" (eh?) it with .NET.

  • John:

    I was not talking about a full rewrite of everything. I just wanted to put in perspective that, within the next decade, .NET will come of age, and there is IMHO a need to sacrifice legacy compatibility toward reaching a consistent multiparadigm coexistence in the same framework.

    Considering the evolution of such paradigms (functional composability, meta-programming, etc.), plus multicore hardware, the Cloud and so on... I think it is becoming necessary to think ahead, as in the 1995-1999 years when .NET was conceived.

    Let's think about the ".CLOUD" (or ".CORE") framework structural basis ;)

    I'm sure these kinds of things are being discussed at some level at MS, but of course they cannot share such speculative thinking, for the same reason the USAF doesn't talk much about Area 51.

  • It seems like option 3 would have been to use the CLR assignment compatibility check, plus an extra check for the known differences between the C# rules and the CLR rules (which as you said are extremely limited). This would limit the performance impact but still maintain consistency.

    Broadly speaking, I much prefer consistency over performance shortcuts. Although you describe "a method which checks all the C# rules for identity-preserving convertibility" as "slow", in reality it would be blindingly fast; it would only be slow compared to the even-more-blindingly-fast CLR check. We all know the cliche about premature optimization, but frankly it is a cliche because it is so often true. I would rather suffer an incredibly tiny performance degradation and avoid these kinds of inconsistencies that lead to subtle bugs, which cost time, money, and user confidence.

  • I was amazed to learn that the C# team had made a trade-off such as this one. You chose speed over correctness (w.r.t. the C# specification)!

    Yep. If it helps you get through the grieving process, think of it not as an incorrectness but as a special extra feature. An extension of the language, as it were. -- Eric

    Had you chosen speed over prettiness, it could be understood; but not this.

    You bent the rules and figured no one was ever going to use this, so no one would ever notice... but who knows how many hours have been lost debugging such a thing.

    We'll never know. But considering that I've personally seen a grand total of one user mention this issue in the last four years, my guess would be that the number is small. -- Eric

    I am glad that I am so often lazy, not using uint or even arrays of ints (I prefer ints and generic ILists for almost everything I do, as long as it doesn't become a performance bottleneck)!

  • > You bent the rules and figured no one was ever going to use this, so no one would ever notice... but who knows how many hours have been lost debugging such a thing.

    Can you come up with a realistic scenario where allowing int[] to be treated as uint[] would be harmful and require "hours of debugging"?

    Actually, can you come up with a realistic scenario where the strict array variance semantics described by the C# spec are fundamental to the design of the code in question, and would be broken on VC#?
