C# 3.0 Return Type Inference Does Not Work On Method Groups

C# 3.0 Return Type Inference Does Not Work On Method Groups

Rate This
  • Comments 17

UPDATE:

A lot of people have linked to this article which is somewhat out of date now. This article describes why in C# 3.0, return type inference does not work when the argument is a method group and the formal parameter is a generic delegate. As you can see, there was a lot of feedback on this article; as a result, we revisited that design decision for C# 4.0. In C# 4.0, return type inference works on method group arguments when the method group can be associated unambiguously with a completely fixed set of argument types deduced from the delegate. Once the argument types associated with the method group are known, then overload resolution can determine unambiguously which method in the method group is the one associated with the delegate formal parameter; we can then make a return type inference from the specific method to the delegate return type.

I return you now to the original article.


Thanks all for your excellent feedback on my variance series. I am still reading and digesting the feedback and I shall take it to the language design committee later this month.

Returning to less hypothetical but equally complicated issues, today, what I thought would be a rather obscure issue with C# 3.0 type inference which multiple people have reported to me since the last beta. Perhaps it is not as obscure as I thought!

Consider this (contrived, but simplified from real code) example where we want to see if a collection of customers has every member from London or not. The LINQ-to-Objects library has an All extension method sequence operator which returns true if a predicate is true of all members of a collection, so we can use that:

static string GetCity(Customer c)
{
  return c.City;
}
static bool IsLondon(string s)
{
  return s == "London";
}

This works just fine:

var cities = customers.Select(c=>GetCity(c));
bool allLondon = cities.All(s=>IsLondon(s));

Now, assuming that this is LINQ-to-Objects and those lambdas are going to be turned into delegates, why do we need the lambdas at all? We can just turn the method groups into delegates directly. This works just fine too:

var cities = customers.Select(c=>GetCity(c));
bool allLondon = cities.All(IsLondon);

But this does not work:

var cities = customers.Select(GetCity); // Inference error

That fails with a type inference error stating that the type arguments for Select could not be determined. If you supply them explicitly, everything works:

var cities = customers.Select<Customer, string>(GetCity); // No problem.

Why does this fail on the Select but not on the All? And is that the correct behaviour according to the specification?

Unfortunately, the published 3.0 specification states that this should work for the Select. I did not realize that the implementation was out of line with the specification until the spec was already on the web, and upon review, we realized that the implementation was correct and the specification was wrong. (We will batch up the specification errors we’ve discovered so far and fix them all at once in a future revision; we do not want to fix specification errors piecemeal.)

Let’s go through the process of type inference and we’ll see where things go wrong. The definition of Select is:

static IEnumerable<R> Select<C, R>(this IEnumerable<C> collection, Func<C, R> projection) { …

When we see

customers.Select(c=>GetCity(c));

the type inference process reasons as follows. From the first argument we unambiguously deduce that C is Customer. That means that the second argument must be a Func<Customer, R>, but we do not know what R is. We then analyze the lambda as if it were written (Customer c)=>{ return GetCity(c); }. We do overload resolution on GetCity(c) and make an exact match to the string GetCity(Customer) method. We therefore deduce that the lambda returns a string, and therefore deduce that R is string.

At this point we have now deduced both C and R. We verify that the arguments are all compatible with the deduced types of the formal parameters and life is good.

Now consider what happens when we see:

customers.Select(GetCity);

This time, type inference goes like this: again, we unambiguously deduce that C is Customer. The second argument is a Func<Customer, R>, but we do not know what R is. We have a method group as the second argument. In order to determine what method the method group refers to, we must do overload resolution on it. But… we do not have any arguments! How can we do overload resolution on GetCity if we do not know what the arguments to the call are? Sure, in this case there is only one method in the method group and it is not generic, so we could just pick it. But we ought to solve the general problem here.

Now, you might point out that we manage to do overload resolution correctly for this case:

Func<Customer, string> f = GetCity;

Overload resolution works just fine there even though we have no arguments to GetCity. In that case we take the signature of the delegate and use the type of each parameter in the delegate signature to do overload resolution. That is, we do it as though we were doing overload resolution on GetCity(default(Customer)). That is then enough information to do overload resolution on GetCity.

I seem to be digging myself in deeper. Why don’t we do the same thing in the type inference case above?

Because I skipped a step in overload resolution. What if we need to do type inference on GetCity?

Suppose instead of the simple example above we had something like

S GetCity<X, S>(X x) { …

Now consider what happens when we have

Func<Customer, string> f = GetCity;

In this case before we do overload resolution we first do type inference FROM Func<Customer, string> TO GetCity<X, S> and deduce that X and S are Customer and string. Then overload resolution proceeds normally.

Am I still digging myself deeper? No. Perhaps now you see the problem. Suppose we had a generic GetCity and we said:

customers.Select(GetCity);

We get as far as deducing that we have Func<Customer, R>, and now we are in a hideous bind. In order to determine R we need to determine the return type of GetCity. To determine the return type of GetCity we need to do overload resolution on the method group. To do overload resolution on the method group, we need to do type inference on the method group. To do type inference on it we need to know what delegate type it is going to. It is going to Func<Customer, R>. But R is what we are trying to figure out! We do not yet know the delegate type that it is going to, so overload resolution is in general impossible. This is a circle that will stay unbroken (by and by Lord, by and by.)

It does not actually make any sense to try to do overload resolution in the general type inference scenario. We could have come up with a weaker form of overload resolution that worked in an environment where type inference might not succeed, but that would then introduce a brand new and subtly incompatible set of overload resolution rules which would only come into effect in these obscure inference scenarios where one type inference triggers another. We want the language semantics to remain understandable and testable, and we’re pushing that edge pretty hard already with the new type inference features.

Therefore this scenario is illegal; you cannot do return type inference on a method group, only on a lambda. The compiler can peer into the lambda and determine what type it would return were all its arguments known, but the compiler cannot do that with a method group. There is simply less information available with a method group.

Why then does the All(IsLondon) case work?

Because the signature of All is

static bool All<C>(this IEnumerable<C> collection, Func<C, bool> predicate)

By the type we need to do overload resolution on the method group the type of the second parameter is entirely known. The rule that return type inference cannot be done on method groups does not apply because there is no return type inference necessary.

[UPDATE: In the C# specification, an expression naming a method is called a "method group". Unfortunately, in the compiler source code they are called "member groups", and this gets me in the bad habit of using imprecise language. In the original version of this article I referred to "member groups" throughout; I've fixed those up. Sorry for any confusion.]

  • Eric, I think you just committed "Developer Overloading" <grin>, My brain HURTS.

    Actually a very well written explaination. I have often tried to explain the inference process to my clients, only to be met with blank stares. This post will definately help in that regard.

    Cheerrs!

  • I'm a bit confused:

    S GetCity<X, S>(X x) { …

    var fails = customers.Select(GetCity); // [1]

    var alsoFails = customers.Select(c=>GetCity(c)); // [2]

    If we had overload resolution that only fails when type information cannot be derived, is there a case where [2] would work and [1] would not?  (I don't see how this could be possible.)  Perhaps the devil is in the details, but I don't see why it's that hard to specify that overload resolution fails if all the types cannot be determined.  Could you shed some light on some of the "subtly incompatible set of overload resolution rules" that would be needed?

  • You are correct in that if overload resolution is going to fail for the method group, then it is going to fail in the lambda as well.

    The subtlety is this: suppose we decided to go ahead and do overload resolution anyway.  In this case we would then deduce that S is R, and therefore deduce that R is... R.  

    What should we do in that case?  Should we have overload resolution pick a different method?  That would then be changing the overload resolution rules in a subtle and confusing way.

    Suppose we didn't do that. Suppose we added R to R's boundary set. Now suppose we had a different inference path, say from a third argument, from which we would deduce that R is string.  Now what do we do to resolve the conflict?  There is no implicit conversion from string to R or R to string, so we'd have to have inference fail.

    Which seems bad.  We could then say well, never add R to R's bounds set.  Just ignore it if that happens.  Keep on trucking. Which is another subtle change to the type inference algorithm.

    And then you have to think about what happens if we are deducing not that R is R, but that, say, R is IConvertible<R>.  Or what if there are more complex relationships between the type variable, the inferred return type, and other types in the bounds set.

    It gets to be a big mess, and all because we have a circularity whereby we are trying to make an inference both from and to the same thing at the same time. It's logically weird and we just don't want to go there.

  • Sorry I'm lost in your inference. If we have already get Func<Customer, R>, didn't we already know the argument list of GetCity()? Then it should be straightforward to pick an overload of GetCity() and find the return type. Why does it have to determine return type for overload resolution?

  • After working on this problem for a while, I realized the same thing as qrli.  You should never have made the return and argument type parameters in your posts on variance different than Func<A, R>. :-p  I can't decide whether this is important though, due to the fact that we need to imply a delegate type.  Is there no way for the type inference engine to know that a method's arguments are completely specified by Func<Customer, R> and that only one overload could possibly be chosen?

    You describe the unbroken circle problem when dealing with the generic version of GetCity.  However, is it not possible to try overloads that would be a better match than your generic version, overloads for which there are no unknown type parameters?  If both the generic and non-generic versions exist and I write the lambda as above, the non-generic version will be chosen.  (I assume this is due to §14.4.2.2 of the C# spec.)

    If the generic version of GetCity is the only one that exists, [2] fails, so I don't know why we're worrying about making [1] work.

    Does overload resolution with lambda methods include any of the subtleties you are trying to avoid with non-lambda overload resolution?

  • Perhaps this is a stupid question (I don't know C# very well) but... are there any cases where automatically rewriting:

    var cities = customers.Select(GetCity);

    as:

    var cities = customers.Select(c=>GetCity(c));

    will do the wrong thing? It seems blatantly obvious that the former is unambiguous, in the given case, so it seems to me to be a bug if C# can't work that out for itself.

  • > Why does it have to determine return type for overload resolution?

    Overload resolution goes like this (simplified):

    * Add all nongeneric methods in the method group to the candidate set.

    * Do type inference on all generic methods and add all successful inferences to the candidate set.

    * Remove all the nonapplicable methods (ie, where arguments do not go to parameter types)  from the candidate set.

    * Choose the unique best candidate from the applicable candidates.

    You have to determine the return type in order to do step two -- make type inferences on generic methods.

  • > However, is it not possible to try overloads that would be a better match than your generic version, overloads for which there are no unknown type parameters?

    Yes. That is exactly what I mean by making a subtle change to the overload resolution algorithm just for this case.

    We could mess with the overload resolution algorithm in all the ways you suggest, and we'd then have _two_ overload resolution algorithms, one for regular code and one for nested type inference on method groups, which would produce sublty different results.  We don't want to go there.

  • > If both the generic and non-generic versions exist and I write the lambda as above, the non-generic version will be chosen.

    That rule only comes into effect when there is an exact tie, ie, you have Foo(int) and Foo<T>(T) in the candidate set and you call Foo(1).  Type inference makes Foo<int> a candidate and the nongeneric is then chosen by the betterness algorithm.

    One of the subtleties that you'd run into by messing with the overload resolution algorithm is that now generic methods that otherwise would be chosen as the better match disappear from the candidate set seemingly inexplicably.

  • > It seems blatantly obvious that the former is unambiguous

    Sure, it is. And we could write a new rule, say, that return type inference from method groups is legal only when it is "blatantly obvious" what method the method group refers to. But we'd then have to write a new "blatantly obvious overload resolution" algorithm which was different from the regular overload resolution algorithm. And now, again, we have two overload resolution algorithms.

    I think my point here has been somewhat lost.  Let me restate it:

    * the specification for return type inference from a lambda explicitly does NOT require the target delegate type to be known.

    * the specification for overload resolution on a method group explicitly DOES require the target delegate type to be known.  It's not necessary in absolutely every case, but the algorithm does call for knowing what the return type is for overload resolution on a method group containing generic methods.

    * in the type inference scenario, the target delegate type is exactly what we are trying to infer; it is unknown.

    * therefore it is rather nonsensical to specify that we ought to use overload resolution as an explicit part of return type inference

    Or, even more briefly:

    * We have designed the type inference algorithm so that we always infer from known to unknown at every step. Return type inference on method groups as specified requires sometimes inferring from unknown to unknown. That wasn't our intention, it wasn't what we implemented, and it got into the published specification by accident.

  • > One of the subtleties that you'd run into by messing with the overload resolution algorithm is that now generic methods that otherwise would be chosen as the better match disappear from the candidate set seemingly inexplicably.

    I keep returning to this and it's bothering me -- do you have an example of such a generic method?

  • For now, I can understand the case when there is a GetCity<S, R>() generic overload, and it is understandable that the generic parameters cannot be deduced. But I think it will surprise most programmers that it won't work if there's only non-generic GetCity().

    I tend to think that there's something wrong with method-group-to-delegate conversion, or accurately, deduction. The conversion from method group to delegate appears to be so significant althought the syntax intents to hide the complexity. I used to think it is trivial but I was surprised to find that Control.Invoke() doesn't accept a method group.

    So I got the same question as Richard. Why cannot you do the conversion in the lambda way instead of the method-group-to-delegate way? It seems more powerful to treat method group as a terser form of lambda expression.

  • qrli, Richard,

    do you mean resolving the method group as if it were a lambda call, or actually generating a method that does the call?

    Just resolving it like this would be fine with me, although I figure Eric's gonna tell us that this is too much of a special case. Still, it would remove this horrible case where noone will understand why referring does not work!

    Acutally generating this method would result in some overhead, and I'm not sure if it is acceptable to introduce this overhead just to enable deferring.

    Just creating an actual method if the method group cannot be resolved otherwise would create a speial case too, plus it would mean that you have to guess if any overhead is invisibly created, you won't see it in the code. This is hardly how languages like C# work. (Then again, while it is easier to figure out whether an anonymous delegate/lambda results in a closure might be easier than this, it could still be considered a precedent that we don't actually care about such minor performance penalties. After all, the costs of a closure are MUCH higher than creating a method for just passing stuff through).

    I understand the language designer's perspective, but in this case I'd vote for some kind of fix that resolves this from a user's perspective. I think it might even be easier to understand the rules of inference if they are the same for method groups as for lambdas. But then again, it might not, and it might lead to breaking changes with existing C# 2.0 code.

  • Eric, have you abandoned this post (or perhaps been ultra-busy)?  The question in my comment above was for you...

  • Ayende在使用.Net3.0的时候遇到了这样一个问题Csc.exeanddelegateinference,or:WhyC#hasawkwardsyntax Co...

Page 1 of 2 (17 items) 12