Fabulous Adventures In Coding

Eric Lippert's Blog

Type inference woes, part one

The C# compiler has a subtle violation of the specification which raises an interesting question for some of the new LINQ-related features. The specification for the ?: operator states the following:

The second and third operands of the ?: operator control the type of the conditional expression. Let X and Y be the types of the second and third operands. Then,

  • If X and Y are the same type, then this is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from X to Y, but not from Y to X, then Y is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from Y to X, but not from X to Y, then X is the type of the conditional expression.
  • Otherwise, no expression type can be determined, and a compile-time error occurs.

This makes a lot of sense. If you have

Giraffe g = (a > b) ? new Giraffe() : new Mammal();

then you ought to get a type error. The type of the ternary expression should be Mammal, not Giraffe, because Giraffe goes to Mammal but Mammal does not go to Giraffe.

If both types go to the other (which can happen if both types define an implicit conversion operator for the other) then we can't decide which one is better and give up.

If neither type goes to the other, then notice that we do not attempt to find the "nearest encompassing type". For example, if we had

Mammal m = (a > b) ? new Dog() : new Cat();

then we would again get a type error. We want to be able to have a unique type as the type of the ternary expression. We do not want to get into the business of saying "well, Dog and Cat are both subclasses of Mammal, but they are also both implementors of IHousepet, so which one is the "real" nearest encompassing type?" It's just too big a can of worms. We like the principle that the type of the expression must be the type of something in the expression.

So where's the spec violation? Well, according to the specification above, is this legal?

short s = 123;
int i = (a > b) ? 0 : s;

X is int, Y is short. Y goes to X, X does not go to Y, therefore X is the type of the expression, so this should work. But in fact, C# reports this as an error, because the rule C# actually implements is:

Let B and C be the second and third operands. Let X and Y be the types of the second and third operands. Then,

  • If X and Y are the same type, then this is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from B to Y, but not from C to X, then Y is the type of the conditional expression.
  • Otherwise, if an implicit conversion exists from C to X, but not from B to Y, then X is the type of the conditional expression.
  • Otherwise, no expression type can be determined, and a compile-time error occurs.

Since the types are not the same, and literal zero goes to short, and s goes to int, no expression type can be determined and we report an error.

One could make the argument that this is the better behaviour. It certainly does seem that in this case it's arguably ambiguous what the best type for the expression is since both halves can be converted to both int and short.

Now, since this is an error case, surely we can simply fix the bug. We'd be turning a case which is presently an error into something which works. Turning broken code into working code is by definition not a breaking change -- a breaking change is turning working code into broken code, or working code with different semantics.

Unfortunately, fixing this would be a breaking change:

public delegate int D();
...
D d = (a > b) ? (D)(delegate() { return 1; }) : delegate() { return 2; };

Notice that the latter expression has no type at all. Yes, anonymous methods expressions are expressions without a type. If we used the "expression" version of the algorithm then the second expression is convertible to the type of the first, but the first is not convertible to the non-existant type of the second, so the type of the expression is the type of the first. If we used the spec-compliant "type" version then the second expression doesn't even have a type, so there's no type we can infer!

Next time I'll talk about an algorithm that might save the day here, and also discuss how we'll need to generalize this to make some of the LINQ-related language features work.

Published Wednesday, May 24, 2006 12:16 PM by Eric Lippert
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Marc Brooks said:

<blockquote>Now, since this is an error case, surely we can simply fix the bug. We'd be turning a case which is presently an error into something which works. Turning broken code into working code is by definition not a breaking change -- a breaking change is turning working code into broken code, or working code with different semantics.</blockquote>

Someone should tell who-ever closed this (still lurking) bug:

http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=9810731d-021a-4de1-90aa-24fdc0d259b4
May 24, 2006 6:55 PM
 

Peter Ritchie said:

There's more than one bug closed as "Won't fix" despite not being a breaking change, based on that criteria.
May 24, 2006 10:09 PM
 

Eric Lippert said:

I have no opinion whatsoever on the merits or drawbacks of "wont fix"ing the specific bug you mention. However I will note that making a change in the _runtime_ behaviour of a framework function is _fundamentally_ riskier than making a change in the _compile time_ behaviour of the compiler. It is reasonably safe to say that no person is _relying_ upon the compiler _failing_ to compile code which under the specification really ought to compile.  It's not like if we figure out a way to fix this that suddenly a fortune 500 company's web site will stop working.  With runtime behaviour changes it is extremely difficult to know who is relying on "incorrect" behaviour continuing to be incorrect.  
May 24, 2006 10:55 PM
 

Evin said:

So why do anonymous method expressions have no type?  Why not give them a type like "() -> int", which could be implicitly convertible to D?
May 25, 2006 12:32 PM
 

Eric Lippert said:

First off, we _don't_ know the return type. I'll be discussing problems with inferring the return type in future posts.  Second, how would you represent it in metadata?  Third, how would you name this type so that you could use it in an "is" expression?  Fourth, what Type object would typeof return?  

Giving a thing a type is surprisingly tricky.  We want the C# type system to map reasonably well to the .NET CLR type system, so things like anonymous methods, lambdas, and method groups are considered to have no type.
May 25, 2006 1:04 PM
 

Stuart Ballard said:

Surely since the specification describes *only* the behavior for things where asking about "the type of this expression" makes sense, what you do with these untyped expressions is already outside the scope of the spec?

It seems like it would be perfectly possible to implement the specced behavior in any case where both expressions have types, but do what you do now in the case where one or both don't. And then in the next release of the spec work to get it amended to cover this case too.

Stuart.
May 26, 2006 2:13 PM
 

Stuart Ballard said:

This brings up a more general question actually that I'd be really curious to know your take on. The CLR's type system improved radically with the introduction of generics in 2.0. From what I've read the Orcas CLR is going to have a much more limited scope of changes. But I've encountered quite a lot of situations already, both in my own work and in relation to situations like this where the issues follow naturally from the C# team's own work on future language versions, where enhancements to the CLR's type system would really help.

1) Anonymous types could really use some CLR-level existence so that equivalent anonymous types could stay equivalent across assemblies.
2) I'd love to be able to say class Foo<T> { T? val = null; } and have that compile, with val being a Nullable<T> if T:struct or just a T otherwise. If this had existed when generic IDictionary was being specced, it could have been more compatible with the non-generic version in the case where the key isn't present... Correct me if I'm wrong, but I don't think that can be done at purely the language level.
3) Variance in generics / wildcard types: http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=0819914e-5016-4f90-ba06-f54945576e70
4) The ability to express the concept of a function (eg int->bool or whatever). This manifests in situations where Func<T, bool> can't be cast to Predicate<T> despite being entirely equivalent. Solving this might have benefits for solving the problem with the ternary mentioned here.

My question is (a) whether there is a group actively working on improvements to the type system for Orcas+1, and (b) are any of them blogging? ;)
May 26, 2006 3:03 PM

Leave a Comment

(required) 
(optional)
(required) 

  
Enter Code Here: Required
Submit

About Eric Lippert

Eric Lippert is a senior developer on the Microsoft C# compiler team. Before that he worked on the framework of Visual Studio Tools For Office. Before that, he worked on the compilers, runtimes and tools for VBScript, JScript, Windows Script Host and other Microsoft Scripting technologies. He lives in Seattle and spends his free time editing books about programming languages, playing the piano, and trying to keep his tiny sailboat upright in Puget Sound.

This Blog

Syndication


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker