Type inference woes, part one

Type inference woes, part one

Rate This
  • Comments 7

The C# compiler has a subtle violation of the specification which raises an interesting question for some of the new LINQ-related features. The specification for the ?: operator states the following:

The second and third operands of the ?: operator control the type of the conditional expression. Let X and Y be the types of the second and third operands. Then,

  • If X and Y are the same type, then this is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from X to Y, but not from Y to X, then Y is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from Y to X, but not from X to Y, then X is the type of the conditional expression.
  • Otherwise, no expression type can be determined, and a compile-time error occurs.

This makes a lot of sense. If you have

Giraffe g = (a > b) ? new Giraffe() : new Mammal();

then you ought to get a type error. The type of the ternary expression should be Mammal, not Giraffe, because Giraffe goes to Mammal but Mammal does not go to Giraffe.

If both types go to the other (which can happen if both types define an implicit conversion operator for the other) then we can't decide which one is better and give up.

If neither type goes to the other, then notice that we do not attempt to find the "nearest encompassing type". For example, if we had

Mammal m = (a > b) ? new Dog() : new Cat();

then we would again get a type error. We want to be able to have a unique type as the type of the ternary expression. We do not want to get into the business of saying "well, Dog and Cat are both subclasses of Mammal, but they are also both implementors of IHousepet, so which one is the "real" nearest encompassing type?" It's just too big a can of worms. We like the principle that the type of the expression must be the type of something in the expression.

So where's the spec violation? Well, according to the specification above, is this legal?

short s = 123;
int i = (a > b) ? 0 : s;

X is int, Y is short. Y goes to X, X does not go to Y, therefore X is the type of the expression, so this should work. But in fact, C# reports this as an error, because the rule C# actually implements is:

Let B and C be the second and third operands. Let X and Y be the types of the second and third operands. Then,

  • If X and Y are the same type, then this is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from B to Y, but not from C to X, then Y is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from C to X, but not from B to Y, then X is the type of the conditional expression.
  • Otherwise, no expression type can be determined, and a compile-time error occurs.

Since the types are not the same, and literal zero goes to short, and s goes to int, no expression type can be determined and we report an error.

One could make the argument that this is the better behaviour. It certainly does seem that in this case it's arguably ambiguous what the best type for the expression is since both halves can be converted to both int and short.

Now, since this is an error case, surely we can simply fix the bug. We'd be turning a case which is presently an error into something which works. Turning broken code into working code is by definition not a breaking change -- a breaking change is turning working code into broken code, or working code with different semantics.

Unfortunately, fixing this would be a breaking change:

public delegate int D();
...
D d = (a > b) ? (D)(delegate() { return 1; }) : delegate() { return 2; };

Notice that the latter expression has no type at all. Yes, anonymous methods expressions are expressions without a type. If we used the "expression" version of the algorithm then the second expression is convertible to the type of the first, but the first is not convertible to the non-existant type of the second, so the type of the expression is the type of the first. If we used the spec-compliant "type" version then the second expression doesn't even have a type, so there's no type we can infer!

Next time I'll talk about an algorithm that might save the day here, and also discuss how we'll need to generalize this to make some of the LINQ-related language features work.

  • <blockquote>Now, since this is an error case, surely we can simply fix the bug. We'd be turning a case which is presently an error into something which works. Turning broken code into working code is by definition not a breaking change -- a breaking change is turning working code into broken code, or working code with different semantics.</blockquote>

    Someone should tell who-ever closed this (still lurking) bug:

    http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=9810731d-021a-4de1-90aa-24fdc0d259b4
  • There's more than one bug closed as "Won't fix" despite not being a breaking change, based on that criteria.
  • I have no opinion whatsoever on the merits or drawbacks of "wont fix"ing the specific bug you mention. However I will note that making a change in the _runtime_ behaviour of a framework function is _fundamentally_ riskier than making a change in the _compile time_ behaviour of the compiler. It is reasonably safe to say that no person is _relying_ upon the compiler _failing_ to compile code which under the specification really ought to compile.  It's not like if we figure out a way to fix this that suddenly a fortune 500 company's web site will stop working.  With runtime behaviour changes it is extremely difficult to know who is relying on "incorrect" behaviour continuing to be incorrect.  
  • So why do anonymous method expressions have no type?  Why not give them a type like "() -> int", which could be implicitly convertible to D?
  • First off, we _don't_ know the return type. I'll be discussing problems with inferring the return type in future posts.  Second, how would you represent it in metadata?  Third, how would you name this type so that you could use it in an "is" expression?  Fourth, what Type object would typeof return?  

    Giving a thing a type is surprisingly tricky.  We want the C# type system to map reasonably well to the .NET CLR type system, so things like anonymous methods, lambdas, and method groups are considered to have no type.
  • Surely since the specification describes *only* the behavior for things where asking about "the type of this expression" makes sense, what you do with these untyped expressions is already outside the scope of the spec?

    It seems like it would be perfectly possible to implement the specced behavior in any case where both expressions have types, but do what you do now in the case where one or both don't. And then in the next release of the spec work to get it amended to cover this case too.

    Stuart.
  • This brings up a more general question actually that I'd be really curious to know your take on. The CLR's type system improved radically with the introduction of generics in 2.0. From what I've read the Orcas CLR is going to have a much more limited scope of changes. But I've encountered quite a lot of situations already, both in my own work and in relation to situations like this where the issues follow naturally from the C# team's own work on future language versions, where enhancements to the CLR's type system would really help.

    1) Anonymous types could really use some CLR-level existence so that equivalent anonymous types could stay equivalent across assemblies.
    2) I'd love to be able to say class Foo<T> { T? val = null; } and have that compile, with val being a Nullable<T> if T:struct or just a T otherwise. If this had existed when generic IDictionary was being specced, it could have been more compatible with the non-generic version in the case where the key isn't present... Correct me if I'm wrong, but I don't think that can be done at purely the language level.
    3) Variance in generics / wildcard types: http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=0819914e-5016-4f90-ba06-f54945576e70
    4) The ability to express the concept of a function (eg int->bool or whatever). This manifests in situations where Func<T, bool> can't be cast to Predicate<T> despite being entirely equivalent. Solving this might have benefits for solving the problem with the ternary mentioned here.

    My question is (a) whether there is a group actively working on improvements to the type system for Orcas+1, and (b) are any of them blogging? ;)
Page 1 of 1 (7 items)