Covariance and Contravariance in C#, Part Eight: Syntax Options


As I discussed last time, were we to introduce interface and delegate variance in a hypothetical future version of C# we would need a syntax for it. Here are some possibilities that immediately come to mind.

Option 1:

interface IFoo<+T, -U> { T Foo(U u); }

The CLR uses the convention I have been using so far in this series of “+ means covariant, - means contravariant”. Though this does have some mnemonic value (because + means “is compatible with a bigger type”), most people (including members of the C# design committee!) have a hard time remembering exactly which is which.

This convention is also used by the Scala programming language.

Option 2:

interface IFoo<T:*, *:U> { …

This more graphically indicates “something which is extended by T” and “something which extends U”.  This is similar to Java’s “wildcard types”, where they say “? extends U” or “? super T”.

Though this isn’t terrible, I think it’s a bit of a conflation of the notions of extension and assignment compatibility. I do not want to imply that IEnumerable<Animal> is a base of IEnumerable<Giraffe>, even if Animal is a base of Giraffe. Rather, I want to say that IEnumerable<Giraffe> is convertible to IEnumerable<Animal>, or assignment compatible, or some such thing. I don’t want to conceptually overwork the inheritance mechanism. It's bad enough IMO that we conflate base classes with base interfaces.
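The distinction between "is a base of" and "is convertible to" can be observed through reflection. The following is only a sketch: it assumes a runtime in which IEnumerable<T> is already declared covariant (as it is from .NET 4 onward), and the Animal and Giraffe classes and the helper names are hypothetical, made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Animal { }
class Giraffe : Animal { }

static class ConversionVersusInheritance
{
    // True when a value of type 'source' may be assigned to a variable of type 'target'.
    public static bool IsAssignable(Type source, Type target) =>
        target.IsAssignableFrom(source);

    // True when 'candidate' is one of the interfaces that 'type' implements or inherits.
    public static bool ListsAsBaseInterface(Type type, Type candidate) =>
        type.GetInterfaces().Contains(candidate);

    public static void Main()
    {
        Type giraffes = typeof(IEnumerable<Giraffe>);
        Type animals = typeof(IEnumerable<Animal>);

        // The conversion exists...
        Console.WriteLine(IsAssignable(giraffes, animals));
        // ...but IEnumerable<Animal> is not a base interface of IEnumerable<Giraffe>.
        Console.WriteLine(ListsAsBaseInterface(giraffes, animals));
    }
}
```

Under those assumptions the first check succeeds and the second does not, which is the sense in which variance adds a conversion without adding an inheritance relationship.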

Option 3:

interface IFoo<T, U> where T: covariant, U: contravariant { …

Again, not too bad. The danger here is similar to that of the plus and minus: that no one remembers what “contravariant” and “covariant” mean. This has the benefit at least that you can do a web search on the keywords and get a reasonable explanation.

Option 4:

interface IFoo<[Covariant] T, [Contravariant] U>  { …

Similar to option 3.

Option 5:

interface IFoo<out T, in U> { …

We are taking a different tack with this syntax. In all the options so far we have been describing how the user of the interface may treat the interface with respect to the type system rules for implicit conversions – that is, what are the legal variances on the type parameters. Here we are instead describing this in the language of how the implementer of the interface intends to use the type parameters.
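To make the implementer's point of view concrete, here is a small sketch written with the "out"/"in" modifiers, which are the form that C# 4.0 and later accept. Everything other than the IFoo declaration itself (the Animal and Giraffe classes, GiraffeFromAnimal, Option5Demo) is a hypothetical name invented for the example.

```csharp
using System;

class Animal { public virtual string Name => "animal"; }
class Giraffe : Animal { public override string Name => "giraffe"; }

// T is only ever produced (a return type); U is only ever consumed (a parameter).
interface IFoo<out T, in U>
{
    T Foo(U u);
}

// Hypothetical implementation: accepts any Animal and always hands back a Giraffe.
class GiraffeFromAnimal : IFoo<Giraffe, Animal>
{
    public Giraffe Foo(Animal u) => new Giraffe();
}

static class Option5Demo
{
    public static string Describe()
    {
        IFoo<Giraffe, Animal> specific = new GiraffeFromAnimal();

        // Covariant in T (Giraffe to Animal) and contravariant in U (Animal to Giraffe).
        IFoo<Animal, Giraffe> general = specific;

        return general.Foo(new Giraffe()).Name;
    }

    public static void Main() => Console.WriteLine(Describe()); // prints "giraffe"
}
```

The implementer promises only to hand T values out and take U values in, and the compiler derives the legal conversions from that promise.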

I like this one a lot; the downside is of course that, as I described a few posts ago, you end up with situations like

delegate void Meta<out T>(Action<T> action);

where the "out" T is clearly used in an input position.
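A sketch of why that declaration is nevertheless legal, using the "out" form as accepted by C# 4.0 and later. The Animal and Giraffe classes and the MetaDemo helper are hypothetical names for the illustration.

```csharp
using System;

class Animal { }
class Giraffe : Animal { }

// T reaches Meta only through an input parameter, yet "out" is accepted:
// Action<T> is contravariant in T, and the two reversals cancel out.
delegate void Meta<out T>(Action<T> action);

static class MetaDemo
{
    public static string Run()
    {
        string seen = "";

        // Feeds a Giraffe to whatever action it is given.
        Meta<Giraffe> giraffeMeta = action => action(new Giraffe());

        // Covariant conversion, permitted because T is marked "out".
        Meta<Animal> animalMeta = giraffeMeta;

        // The caller supplies an action on Animals; at run time it receives a Giraffe.
        animalMeta(a => seen = a.GetType().Name);
        return seen;
    }

    public static void Main() => Console.WriteLine(Run()); // prints "Giraffe"
}
```

The conversion is safe because anything that can act on an arbitrary Animal can certainly act on the Giraffe that the underlying delegate supplies.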

Option 6:

Do something else I haven’t thought of. Anyone who has bright ideas, please leave comments.

Next time: what problems are introduced by adding this kind of variance?

  • How about re-using the existing C# syntax and allowing re-ordering of the parameters

    interface IFoo<T, U>

      where T: Mammal

      where Mammal : U

    {

    }

  • >  I still don't understand how *:U expresses "something which U extends".

    You and Bradley are right, that is badly worded. I intended to say "something which extends U". I'll fix it.

  • Stuart: I want to be able to declare a variable of type List<?>

    Indeed, when we were designing anonymous types we considered doing that kind of inference. We're calling those "mumble types", as in "I have a list of... mumble mumble mumble".

    Obviously we didn't end up doing them for C# 3.0 but it does seem like a generally useful addition to the type system, so we'll consider it for hypothetical future versions.

  • I'm going to cast my vote for Luke's "is * / * is" idea.  In present-day C# where we don't have cov/cnv, I would have to test and cast the interface manually, and that is precisely the test I'd be using.  It seems natural to put it in the definition instead.

    I strongly dislike the overly verbose versions (#3 and #4).  I thought that the reason we used C# instead of VB is that we don't particularly like the verbosity.  Plus, even though I can remember what covariant and contravariant mean on a basic level, I'm never going to remember the exact effect each one has within the language.

    I agree that #2 is conflating two very different concepts, but they're already conflated in Generics in exactly this way.  If I declare IFoo<T> where T : IBar, and I have a class MyClass which implements IBar, I can legally use an IFoo<MyClass>.  Furthermore, this doesn't bother me and I don't find it the slightest bit confusing; it mirrors the real inherits/implements syntax.
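    For comparison, here is a minimal sketch of what an ordinary constraint does and does not give you. IFoo, IBar and MyClass follow the names in the paragraph above; the Foo class and the ConstraintDemo helper are hypothetical additions for the example.

```csharp
using System;

interface IBar { }
class MyClass : IBar { }

// The constraint restricts which type arguments are legal...
interface IFoo<T> where T : IBar { }
class Foo<T> : IFoo<T> where T : IBar { }

static class ConstraintDemo
{
    // ...so IFoo<MyClass> is a legal constructed type.
    public static bool ConstraintAllowsMyClass()
    {
        IFoo<MyClass> foo = new Foo<MyClass>();
        return foo != null;
    }

    // But with no variance annotation on T, IFoo<MyClass> is not an IFoo<IBar>.
    public static bool ConvertibleToFooOfIBar()
    {
        object foo = new Foo<MyClass>();
        return foo is IFoo<IBar>;
    }

    public static void Main()
    {
        Console.WriteLine(ConstraintAllowsMyClass()); // prints "True"
        Console.WriteLine(ConvertibleToFooOfIBar());  // prints "False"
    }
}
```

    A constraint governs which types may be substituted for T; variance governs how two already-constructed types relate to each other.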

    #1 is okay.  It conveys a bit more information about what's actually happening than the verbose versions, but I still know I'm going to screw it up eventually if I have to contend with it.  And if I were to hand the code off to an unfamiliar programmer, he might very well look at it and go, "huh"?

    #5 bugs me because in/out are already reserved words that mean something totally different.  It does make sense, but please, no more overloaded keywords.

    Also, I don't think any of these suggestions truly address the Meta issue.  It's a conceptual problem, a double-negative in a sense.  I think if people really want to program like this (I can't imagine why), then they probably already know that they're playing with fire.  It's the rest of us, who might have no idea how dangerous it is, that you should be worrying about. :-)

  • Eric, that's fantastic news that you're considering "mumble types" for a future version. I really hope it's a sooner rather than later future version - in particular I hope it ends up being at least the same version as the changes you're discussing now. I think that introducing both concepts together would help to come up with a coherent syntax between the two. And I'll shut up now about pushing for the feature because you know how I feel about it, and go back to discussing what we're talking about now.

    I'm trying to think of ways to express what Foo<* is T> actually means in English to see if there's a better way to express it in the language.

    What it really means is "Foo<T1> is Foo<T2> if T1 is T2". I find it really hard to read that meaning into *any* of the proposals (except the "in/out" one which everyone hates). Similarly contravariance means "Foo<T1> is Foo<T2> if T2 is T1"

    So how about this as an actual syntax proposal:

    public interface IEnumerable<T>

       where IEnumerable<this> is IEnumerable<base> {

     ...

    }

    public delegate void Action<T>(T t)

       where Action<base> is Action<this>;

    When there are multiple type parameters it becomes a little less obvious. Some possible approaches:

    public delegate R Func<R, A>(A a)

       where Func<this, base> is Func<base, this>;

    (problem of how to express a parameter that isn't variant at all - "this" on both sides?)

    public delegate R Func<R, A>(A a)

       where Func<this, *> is Func<base, *>

       where Func<*, base> is Func<*, this>;

    (which I think I like better except that it still has the confusion of people expecting * to mean "pointer" rather than "wildcard". There are other symbols that could be used but I can't think of any actual words that carry the right meaning...)

  • Ooh, or:

    public delegate R Func<R, A>(A a)

       where Func<this, A> is Func<base, A>

       where Func<R, base> is Func<R, this>;

    Yes, it's verbose, and probably not terribly easy to write correctly (although I suspect the IDE could help a LOT with this kind of clause) but code is read much more often than it's written. And having the code written this way gives some kind of intuitive sense of what it means.

    Plus it doesn't have the meta problem:

    public delegate void Action<T>(T t)

       where Action<base> is Action<this>;

    public delegate void Meta<T>(Action<T> action)

       where Meta<this> is Meta<base>;

    It accurately expresses what's actually going on there without presuming the meaning as input or output.

  • just a thought experiment:

    instead of

    interface IFoo<T:*, *:U>

    we could write

    interface IFoo<T, U> : IFoo<T:*, *:U>

    or, using Luke's syntax,

    interface IFoo<T, U> : IFoo<T is *, * is U>

    this is not beautiful, but it gives a better hint as to what it really implies for the interface.

    now ":" goes to mean "assignable from" in addition to "implements" and "extends" (which is worse, because the latter two were easily identifiable due to the I-prefix for interfaces). so no actual syntax recommendation, but just a way that makes it more understandable for me, and might inspire other ideas.

    we could take Luke's way even further:

    interface IFoo<T, U> is IFoo<T is *, * is U>

    even more explicit:

    interface IFoo<T, U>

     is IFoo<A, B>

       where T is A

       where B is U

    ugly, I know.

    plus, I'm still unhappy with the positional thing. Makes me think more than it should. end of thought experiment.

    Thinking about in/out, I believe automatic variance would not be a bad thing in every case. I know, I voted against it even before you explained it, but think about this: when are any of the problems really present in the case of delegates? I believe automatic co/contravariance on delegates is both easier to resolve and less likely to introduce breaking changes. also, we could use a single keyword/attribute to indicate that we want co/contravariance, and let the compiler detect which one.

    here's how I got there. The Meta<> Problem

    delegate void Action<in T> (T arg);

    delegate void Meta<out T>(Action<T> action);

    If we were able to say that we want to use Action<T> as the input parameter, not T, the compiler could figure out that T needs to be covariant, because Action<T> is already contravariant on T. So we would need some way to indicate that "input" refers to Action<T>, not just T. The easiest thing would be to indicate that with the action parameter, but there it is already clear that it is an input parameter. So why bother?

    It's harder for interfaces though. They are more complex, and while it's quite clear that adding/modifying parameters in a delegate would change assignment semantics, it's more unexpected in interfaces (and essentially impossible to control, who thinks about variance whenever changing or adding interface members?). Plus, there's the exponential growth problem. Can we come up with something better?

    Let's take an example in +/- syntax:

    interface IDoer<-T> {

     void Do(T obj);

    }

    interface IDoable<+T> {

     void ApplyDoer (IDoer<T> doer);

    }

    now, we obviously would like to write IDoer<in T>, but absolutely not IDoable<out T>. Now imagine this in the syntax I recommended above:

    interface IDoer<T>

     contravariant on T

    interface IDoable<T>

     covariant on T

    if we replace contravariant with "in" and covariant with "out", we end up having the same problem. but what if, instead of "covariant on T", we could just say "contravariant on IDoer<T>"? Just like with implicit delegate co/contravariance, the compiler could easily figure out that in order for IDoer<T> to be contravariant, T needs to be covariant.

    interface IDoer<T>

     variance: in T

    interface IDoable<T>

     variance: in IDoer<T>

    now the wording sucks. we need better keywords/syntax. but besides that, I really love this. we have in/out semantics, it is correct, and it is easily resolvable by the compiler, because it doesn't need to look at the entire interface.

    an additional problem is that documentation generation tools like ndoc could not tell that the user wrote "in IDoer<T>" instead of "out T". this is essential for documentation, so the compiler would somehow have to annotate this.

  • another thought: specifying variance for interfaces is really just a promise to not use T in any other way than specified (input/output). once this has been made clear, there's simply no reason why the compiler would not assume co/contravariance as needed, but the programmer would have to comply with that statement, which the compiler could check.

    you're not going to keep design by contract away from us forever. there's even fascinating research at MS Research (Spec#). how do those fit together? if variance can be derived from an explicit constraint (contract), you might want to use similar syntax, or at least align them in some way. then again, it's probably too early to know what C#'s hypothetical future DBC syntax is going to look like. still, this could be a hint when trying to cook up a syntax for what I've proposed above.

    (note that "variance: in IDoer<T>" above also implies that T is not used directly by this interface, only via IDoer<T>. this would probably have to be made explicit when using a constraint syntax.)

  • Well, since Stefan already opened the door, I love Spec#, and I want contracts in C# :).

  • that's right, mike. DBC would have a greater impact on the way we code than any co/contravariance, however fascinating this is. (even without the amazing static analysis spec# does)

    (while the door is open, can we please automatically generate unit tests from those contracts too?)

    but then I want memberof (ldtoken for members), generics and lambdas in attributes, I want Expression<Func<T>> parameterized (lambdas) so that the expression tree does not have to be parsed and processed for every call (I'll just say i4o, the indexed variant of LINQ 2 objects), and ... - no, wait, let's stay focused for a minute, OK? ;-)

    (ok, I can't hold it back. pattern matching. now I've said it. but before we get into macros and meta-programming, let's get back to that covariance syntax problem of that hypothetical... hey, can we give it an imaginary version number, like C# 4i? or does that sound too much like oracle?)

  • interface IFoo<T, U> where T is Animal, Mammal is U

    looks very intuitive - no need to remember anything.

  • In my opinion it is important to have a clear verbose format. This is something quite complicated that should not be hidden away in a small added + or -.

    I am just brainstorming here, but how about this (introducing a new keyword "assignable to"):

    interface IFoo<T,U>

     assignable to IFoo<R, S> where R is T where U is S;

    {

     ...

    }

    This has the advantage of making it extremely clear for the user of the interface what flexibility the co-/contravariance gives him, at the cost of making it unclear for the implementor which restrictions this imposes on him.

    Since interface-users should clearly outnumber implementors - especially for the types of interface where co- and contravariance is an issue - this should be an ok tradeoff.

    I'm very much against a very verbose syntax.  In my experience, though symbols are initially confusing, you get used to them and they're much easier to skim (since they're easier to ignore).  Frequently, you'll be perfectly happy to ignore covariance and contravariance, so that's a plus.

    Consider also the competition which C# faces from dynamic languages.  These are often preferable precisely because they contain less "superfluous" non-behavioural syntax.  Succinctly: they just say what needs to be done, nothing more.

    I'm also hoping that a good number of other "restraints" might make it into the language.  Currently, it's not possible to demand a type have a  particular static member (except the parameterless constructor).  That capability would be immensely useful almost instantly for easy requirement of things like operators, to implementations of wrapping types which add functionality such as .Equals based on .CompareTo or whatnot.

    What core problem does covariance and contravariance solve?

    It allows more generic code.  It reduces the problem of "leaky abstractions".  Basically, it improves encapsulation.  But it's not a big win.

    For example, the ability to implement an interface as defined by a class (i.e. to optionally _not_ inherit its implementation) might be a far greater win.  Or the ability to "add" features to a class. Or the ability to "uncurry" an instance method into a static method.

    A lot of code I encounter isn't properly structured because it's too much work to do so in C#.  If I have a custom list which implements, say, a priority queue, then implementing that as a class (with all the appropriate interfaces, including IEnumerable, IEnumerable<T>, ICollection, etc...) is a lot of work, so instead, everyone generally opts for the less encapsulated, more spaghetti-code but easier option:  just implement the key features of a priority queue internally and don't abstract it out, don't allow code reuse, and leave the data structure implicit, not explicit.

    And not just "framework" creators need these kinds of abilities (the ability to easily generate a class or instance with a particular set of "interfaces" or behaviours).  Many real-world frameworks require the "user" of the framework to provide objects which fit certain interfaces and conventions.  This pattern is sometimes called "Inversion of Control" or "Dependency Injection" - but what it means is that:

    Everyone needs to be able to flexibly, easily, and comprehensibly implement interfaces and other functionality contracts.

    Improving Covariance/Contravariance in C# should not imply that doing so becomes more complex, since that would defeat the purpose of language improvement.

  • Eamon, I don't disagree with a lot of your argument, but I don't think it applies to this particular scenario.

    The basic principle is "simple things should be simple, and complicated things should be possible". The fact that it doesn't say "complicated things should be simple" is *deliberate* - trying to achieve that always inherently hurts the simple things.

    I agree that it should be a lot simpler than it is to implement ICollection<T>, and for that matter IList<T> and IDictionary<K,V> (although I'm not sure how much easier it could be to implement IEnumerable than it is today with "yield" - that's already practically trivial). These are simple things and shouldn't be so damn hard.
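    As an illustration of the "practically trivial" case, here is a minimal sketch of a complete IEnumerable<int> implementation built on an iterator block. The Countdown class is a hypothetical example type, not something from the discussion above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A complete IEnumerable<int> implementation: the compiler generates the
// enumerator state machine from the iterator block.
class Countdown : IEnumerable<int>
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = start; i >= 0; i--)
        {
            yield return i;
        }
    }

    // The non-generic interface simply forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class CountdownDemo
{
    public static string Render(int start) => string.Join(", ", new Countdown(start));

    public static void Main() => Console.WriteLine(Render(3)); // prints "3, 2, 1, 0"
}
```

    Nothing comparable exists for ICollection<T> or IList<T>, where every member still has to be written by hand.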

    But covariance and contravariance are not simple things. They're advanced features for advanced framework designers. I can't think of a single use of this kind of variance outside of a framework. (The use-site "mumble type" variance could be used by everyone, which is one of the reasons I like it better overall). That means that (a) writing variant interfaces and delegates will be rare, and done by technically advanced users, (b) USING variant interfaces and delegates will be common by people who AREN'T so proficient, and (c) the less proficient people will only occasionally encounter variant types, so they won't have any opportunity to "get used to" the feature.

    All of that means that it's ok to write a little bit more (and we're only talking about one extra line of code per variant type parameter, here, even in the more verbose suggestions such as mine) when you're implementing the type, because the person doing that NEEDS to know what they're doing, but it's very important that someone READING the code be able to understand what it means without being an expert. One of the reasons C# is so much more friendly than C++ is that there aren't miscellaneous punctuation symbols all over the place that radically change the meaning of your code.

    Even the comments here show that the meaning of this stuff isn't at all intuitive. Eric has spent eight blog posts explaining variance in detail, along with various syntax suggestions, and yet Kerneltrap is making a syntax suggestion that demonstrates a complete lack of grasp of what variance actually means. (Not picking on you, Kerneltrap, it's NOT obvious and it IS hard to get your head around. Hence why it's so important to have a syntax that helps rather than being more obscure)

    I agree with the notion that in this case, verbosity hurts less than perl-like obscurity. rasmus' notation is similar to my last proposed "is" notation (just replaces "is" with "assignable to"). i don't think "assignable to" is bad, but I do think that "is" is clear enough in this case, no need for another keyword (would avoid the atypical two-word keyword too). but that's just me.

    still, of the two syntaxes, i like the first one better:

    interface IFoo<T, U> is IFoo<T is *, * is U>

    (instead of crossing out explicit type parameter names, which is maybe easier to read if you do it the very first time and have to spend minutes reading it anyway, but harder if you do it on a regular basis. I posted the crossed-out syntax merely as a trigger for other people's ideas, but I don't like it as it is).

    disclaimer: i don't think that verbosity works for delegates, where variance will arguably be more common, and the overhead of verbose syntax would probably double a typical delegate declaration. i still favor automatic co/contravariance for delegates, or alternatively a single keyword that indicates that I want it, but leaves the details to the compiler.

    Eric: would variance be limited to interfaces, or could we have it for generic classes too? (after all, delegates are just classes anyway)
