What Are The Semantics Of Multiple Implicitly Typed Declarations? Part Two

Many thanks for all your input in my informal poll yesterday. The results were similar to other "straw polls" we've done over the last couple of months. In this particular poll the results were:

var a=A, b=B; where the expressions are of different types should:

  • have the same semantics as var a=A; var b=B;: 12
  • replace the var with some type for both: 3
  • give an error: 6

There were 18 comments; a few people voted twice, which is fine with me.

The way the feature is specified is that the var is to be replaced with the best type compatible with all the expressions, to maintain the invariant that parallel declarations like this always give the same type to each variable. Many people that we've polled believe that this is the "intuitively obvious" choice, including much of the language design team. A larger group of language users believes that "infer each variable type separately" is the "intuitively obvious" choice.

So what to do? We have a relatively unimportant edge-case feature where customers strongly disagree as to what the code "obviously" means, and the difference can lead to subtle bugs. That's clearly badness. Given this feedback, amply confirmed by you all, we are probably going to simply remove multiple implicitly typed declarations from the C# 3.0 language.
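To make the disagreement concrete, here is a minimal sketch of the two readings of var a=1, b=2.5; written out by hand, since a single var with several declarators is exactly the form being removed. The class and method names are hypothetical, and the "common type" reading assumes double is the best type compatible with both an int and a double initializer.

```csharp
using System;

static class MultiVarDemo
{
    // Reading 1: "infer each variable separately",
    // i.e. the same as var a = 1; var b = 2.5;
    public static double HalfOfASeparately()
    {
        var a = 1;     // inferred as int
        var b = 2.5;   // inferred as double
        return a / 2;  // integer division: the result is 0
    }

    // Reading 2: "replace var with one type for both",
    // assumed here to be double.
    public static double HalfOfAWithCommonType()
    {
        double a = 1, b = 2.5;
        return a / 2;  // floating-point division: the result is 0.5
    }

    static void Main()
    {
        Console.WriteLine(HalfOfASeparately());
        Console.WriteLine(HalfOfAWithCommonType());
    }
}
```

The same expression a / 2 silently changes meaning depending on which reading the compiler picks, which is the kind of subtle bug described above.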

Thanks for your feedback!

  • Does this mean that you're going with the "give an error" option or are you not allowing "var a=1,b=1"?
  • The plan right now is to disallow the whole thing.  That is, if var is being used as a contextual keyword, then you get one declaration per var, not a list of declarations per var.
  • so even though
      int a=1, b;
    is semantically equivalent to
      int a=1; int b;

    the exact same form with var instead of int will be illegal? Is there any other case where replacing a fixed type name with "var" would cause a declaration error?


  • Since "var" is _only_ legal in a local variable declaration with an assignment, the answer to your question is "yes -- all other cases are such cases".

    Any local variable declaration in which the type of the initializing expression cannot be determined will also fail.  For example, Func<int, int> f = c=>c+1; succeeds, but var f = c=>c+1; fails because we have no idea what the desired type of the lambda is.
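A small sketch of why the lambda case fails: the same lambda text takes on a different meaning depending on the delegate type it is assigned to, so on its own it gives var nothing to infer from. The class and method names below are hypothetical.

```csharp
using System;

static class LambdaInferenceDemo
{
    public static int AddOneAsInt(int x)
    {
        // The declared delegate type tells the compiler that c is an int.
        Func<int, int> f = c => c + 1;
        return f(x);
    }

    public static long AddOneAsLong(long x)
    {
        // Identical lambda text, different target type: here c is a long.
        Func<long, long> g = c => c + 1;
        return g(x);
    }

    // var h = c => c + 1;   // rejected: nothing determines the type of c

    static void Main()
    {
        Console.WriteLine(AddOneAsInt(41));
        Console.WriteLine(AddOneAsLong(41L));
    }
}
```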

  • Even though I voted for "equivalent to var a=A; var b=B;" I agree with your decision. I actually never use multiple declarations in a single line anyway so it won't affect me in the slightest, and considering that there are clearly a large number of (weird) people whose intuition is backwards ;) it's definitely better to disallow code that could be read ambiguously. This is similar I think to requiring break or goto at the end of each case in a switch statement - leaving it out would mean that the "obvious" meaning to a C/C++ programmer would be the exact opposite of the "obvious" meaning to everyone else :)
  • Why even allow the var type at all then?  Granted, I do not work with C# or any other C-derived, strongly-typed language, but it seems to me that the only purpose of it is to allow the declaration of variables without the programmer actually deciding what type of information they will hold...which is incredibly lazy and probably dangerous to some degree.
  • Two reasons. The not particularly good reason is that

    Dictionary<string, List<int>> mydict = new Dictionary<string, List<int>>();

    is somewhat redundant and gross looking.

    The really good reason is "because C# 3.0 will have _anonymous_ types".  Obviously if a type cannot be named then there is no way to declare a variable of that type without some kind of type inference.
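Both reasons can be sketched in a few lines. The names and sample values here are illustrative only.

```csharp
using System;
using System.Collections.Generic;

static class VarMotivationDemo
{
    public static int CountPrimes()
    {
        // The not particularly good reason: the type is written once, not twice.
        var mydict = new Dictionary<string, List<int>>();
        mydict["primes"] = new List<int> { 2, 3, 5 };
        return mydict["primes"].Count;
    }

    public static string DescribePerson()
    {
        // The really good reason: this type has no name that could be
        // written on the left-hand side, so var is the only option.
        var person = new { Name = "Alice", Age = 30 };
        return person.Name + " is " + person.Age;
    }

    static void Main()
    {
        Console.WriteLine(CountPrimes());
        Console.WriteLine(DescribePerson());
    }
}
```

In both cases the variable is still statically typed; person.Age is an int and assigning a string to it would not compile.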

  • So, then, what is the advantage of anonymous types?  I'm not trying to be a pain, by the way, I'm just curious.  As I said, I don't work with C-derived languages. I only do scripting, where the declaration of variables is almost always optional anyway; so I don't really understand the advantages/disadvantages of being required to define a type for a variable.
  • I will leave the enumeration of the advantages of static typing vs dynamic typing for another day.

    There are two main advantages of anonymous types.

    First, anonymous is generally goodness.  We already have "anonymous variables" in C#.  That is, you can write:

    a = b + c * d;

    See the anonymous variable in there?  Of course you don't.  We are so used to anonymous variables that we don't even see them anymore.  The C# compiler of course is actually generating the equivalent of

    temp = c * d;
    a = b + temp;

    C# 2.0, JScript, etc., have anonymous methods, which are also handy.

    Anonymous types are just one more step in this direction.  You ought to be able to say "I want a name, age, phone number triplet" and have that be a statically typed entity without having to give that thing a name.

    Second, having anonymous types makes query comprehensions much easier to write:

    var results = from c in customers where c.City == "London" select new {c.Name, c.Age};

    Now suppose that we didn't have anonymous types or type inferencing.  You'd have to write:

    internal class NameAndAge { private string name; private int age; internal string Name { get ... blah blah blah, and then

    IEnumerable<NameAndAge> results = from c in customers where c.City == "London" select new NameAndAge(c.Name, c.Age);

    Now you decide that you want phone number in there as well and you have to define ANOTHER new type!  What a pain!  And then you have to update the type of results too.  

    The point of all of these new features is to make query comprehensions work _painlessly_ without giving up static typing.
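For readers who want to try the query, here is a self-contained sketch. The Customer class, its properties, and the sample data are assumptions made up for illustration; only the shape of the query comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data type standing in for whatever "customers" contains.
class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string City { get; set; }
}

static class QueryDemo
{
    public static List<string> LondonSummaries(IEnumerable<Customer> customers)
    {
        // Each result is an anonymous type with Name and Age properties;
        // no NameAndAge class has to be declared anywhere.
        var results = from c in customers
                      where c.City == "London"
                      select new { c.Name, c.Age };

        // The results are still statically typed: r.Name is a string, r.Age an int.
        return results.Select(r => r.Name + ":" + r.Age).ToList();
    }

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Name = "Ann", Age = 41, City = "London" },
            new Customer { Name = "Bob", Age = 29, City = "Paris" },
        };
        foreach (var summary in LondonSummaries(customers))
            Console.WriteLine(summary);
    }
}
```

Adding a phone number to the projection would mean changing only the select clause, with no separate type to update.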
  • What bothers me about anonymous types is that there's no way to pass them between methods. So you end up with entire chunks of code that are impossible to perform "extract method" refactoring on, because a variable that would need to be passed to or returned from the new method can't be.

    I like anonymous types, but I'd like them much better if they were first-class types that could be used in any context. So suppose I have code like:

    foreach (var info in from p in people select new {p.Name, p.Age}) {
     // do lots of very long and complex processing here using info
    }

    I could refactor this:

    foreach (var info in from p in people select new {p.Name, p.Age}) {
     doProcessing(info);
    }

    void doProcessing(@{string Name, int Age} info) {
     // long and complex processing here
    }

    The @{...} syntax is the new bit I'm proposing, of course. I think without something like this there's a danger that anonymous types could really hurt long-term maintainability of code due to the inability to refactor.

    Also is my foreach line actually legal?
  • I would also prefer anonymous types to be first-class. However, doing it right requires changing the CLR type system, not just the C# type system.  (The versioning issues are considerable.)

    Given that we're not going to change the CLR type system, I am hoping that there are things we can do to make this work.  Suppose, for example, that refactoring to extract a method upon code that references an anonymous type caused a new nominal type to be emitted into your source code.  Would that make you feel better?

    I'm not _saying_ that we're going to do that, of course.  But it's definitely an idea that's been kicking around here for a while. :-)

    (And sure, that foreach looks good to me.)
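As a sketch of what "emit a new nominal type on extract method" might produce for the foreach example, assuming a Person class with Name and Age properties: the NameAndAge class below is hypothetical, written by hand here, and not the output of any actual tool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for "people".
class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// The named type that would replace new { p.Name, p.Age }.
class NameAndAge
{
    public string Name { get; private set; }
    public int Age { get; private set; }

    public NameAndAge(string name, int age)
    {
        Name = name;
        Age = age;
    }
}

static class ExtractMethodDemo
{
    // The extracted method can now name its parameter type.
    public static string DoProcessing(NameAndAge info)
    {
        return info.Name + " (" + info.Age + ")";
    }

    public static List<string> ProcessAll(IEnumerable<Person> people)
    {
        var output = new List<string>();
        foreach (var info in from p in people select new NameAndAge(p.Name, p.Age))
            output.Add(DoProcessing(info));
        return output;
    }
}
```

The cost is the one described earlier in the thread: the projection and the class must now be kept in sync by hand.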

  • I agree that doing anonymous types right requires changing the CLR type system (heh, which is why I keep asking which version that's going to happen in ;) ).

    The problem of course is cross-assembly calls.

    I'm wondering if a sufficiently close approximation could be done by naming conventions along with a little help from the compiler.

    Suppose that @{string Name, int Age} compiled under the hood into a class called something like __anon__mscorlib__System_Int32__Age__mscorlib__System_String__Name (or with some otherwise illegal characters in there to make sure it was impossible to clash with any real name, but I don't know what the CLR allows here. Also note I sorted the names into alphabetical order because @{int Age, string Name} should mean the same). Now make these types public as far as the CLR is concerned (they have no code that could be used as an attack surface so this is safe). And make them inherit from, I dunno, System.AnonType, and have the compiler disallow inheriting from that manually (much like ValueType).

    So suppose that assembly1 contains a method foo(@{string Name, int Age} val) and you want to call this from assembly2.

    The problem is that assembly1's version of __anon__mscorlib__System...blahblahblah is different from assembly2's, so you're passing the wrong thing. BUT the compiler knows this at compile time, and also knows that the types in question are anonymous because of the AnonType inheritance. So when it encounters an attempt to call a method with the wrong parameter types, but the only thing wrong is that they're both anonymous types with the exact same name from two different assemblies, instead of emitting a compile error, it emits a call to a method instead (perhaps inside AnonType), and that method is declared as "T2 Convert<T1, T2>(T1 val) where T1 : AnonType where T2 : AnonType". This doesn't of course only apply in the context of method calls, the same thing could be done as what amounts to an implicit cast operator in both directions, to cover all cases.

    That method could be implemented using reflection - and without even any privileged code, since the properties in question are public.

    And furthermore with a suitable enhancement to the CLR's type system in the future I'm *fairly* sure that all this under-the-hood crap could be done away with without any programmer-visible backward-incompatible changes.

    Am I missing anything fundamental?
  • There are length restrictions on how long a type name can be, so we'd have to use some kind of crypto strong hash to shorten that down.

    But modulo that, yes, that's the kind of crazy magic that we could do.

    An alternate approach would be to have some kind of standard "loosely typed property bag" type for import and export that could be converted to/from the appropriate strongly typed tuple.

    There are any number of ways to skin this particular problem.  However we want this release to have as many new features as necessary to make query comprehensions work, but no more.  We the C# team also do not want to take dependencies on changes in the CLR if we can possibly avoid it.
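The "property bag" idea can be approximated today with reflection. The following is only a sketch: the PropertyBag class and its method names are invented for illustration, and it relies on the observed behavior that a compiler-generated anonymous type has one public constructor whose parameter names match its property names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

static class PropertyBag
{
    // Export: copy every public property into a loosely typed bag.
    public static Dictionary<string, object> ToBag(object source)
    {
        var bag = new Dictionary<string, object>();
        foreach (PropertyInfo p in source.GetType().GetProperties())
            bag[p.Name] = p.GetValue(source, null);
        return bag;
    }

    // Import: the prototype argument is used only so that T can be inferred
    // as an anonymous type; its values are ignored.
    public static T FromBag<T>(Dictionary<string, object> bag, T prototype)
    {
        ConstructorInfo ctor = typeof(T).GetConstructors().Single();
        object[] args = ctor.GetParameters().Select(p => bag[p.Name]).ToArray();
        return (T)ctor.Invoke(args);
    }

    static void Main()
    {
        var original = new { Name = "Alice", Age = 30 };
        var bag = ToBag(original);

        // A different anonymous type: same properties, different order.
        var copy = FromBag(bag, new { Age = 0, Name = "" });
        Console.WriteLine(copy.Name + " " + copy.Age);
    }
}
```

Because the bag is matched by property name, the order in which the properties were declared does not matter, which is the same effect the sorted-name scheme above was after. All type checking of the conversion is deferred to run time, though.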
  • What sort of changes would be required for making anonymous types first class? What's the difference between a local variable and one that can be returned from or passed to a function?
  • Adding new stuff to the CLR type system has a major impact on all languages. For example, all languages are now required to be able to talk to generic types if they want to be CLI-compliant languages.  That's a major burden on language implementors and we do not take imposing it lightly.  

    The difference between a local and something returned, in this case, is that a local never escapes into any context in which its type can be part of a publicly visible contract.  If a method could be 'var' then we'd either have to say "only private/internal methods can be var", which is gross, or come up with some standardized, versionable, secure, safe way to represent public methods that return anonymous types.

    By keeping anonymous types restricted to being used only inside contexts in which they cannot "leak out" we don't have to solve any of those hard problems.  They can be solved in future versions of the CLR.  We've got to ship this thing! If we wait for the type system to be perfect, we'll wait forever.