C# 3.0 is still statically typed, honest!

Since LINQ was announced I've seen a lot of really positive feedback and a lot of questions and concerns. Keep 'em coming! We need this feedback so that we can both correct misconceptions early and get the design right now.

One of the misconceptions that I've seen a lot over the last few days in forums, blog posts and private emails is a confusion about what the new "type inferencing" feature implies for the type safety of the language. Apparently we have not been sufficiently clear on this point: C# 3.0 will be statically typed, just like C# 1.0 and 2.0. The var declaration style does not introduce dynamic typing or duck typing to C#.

I think the confusion may arise from familiarity with other languages such as JScript. In JScript this is perfectly legal:

var foo = new Blah();
foo = 123;
foo = "hello";

JScript is a dynamically typed language. You can assign any value of any type to a var.

In C# 3.0, the var statement means "look at the type of the thing assigned to the variable, and act as though the variable were declared with that type." In other words, in C# the code above is just syntactic sugar for

Blah foo = new Blah();
foo = 123;
foo = "hello";

which of course would produce a type error on the second and third lines.
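To make that concrete, here is a minimal sketch of the same idea as a complete program. Blah is the placeholder class from the snippet above, and StaticTypeOf is a hypothetical helper introduced only for this illustration: because its type parameter is chosen at compile time, it reports the static type the compiler gave the variable. The two rejected assignments are left as comments, since a program containing them would not compile.

```csharp
using System;

// Placeholder class standing in for the "Blah" used in the post.
class Blah { }

static class StaticVarDemo
{
    // Hypothetical helper: T is fixed by the compiler from the argument's
    // static type, so typeof(T) is the compile-time type, not the runtime type.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    static void Main()
    {
        var foo = new Blah();   // treated exactly like: Blah foo = new Blah();

        // The variable's static type is Blah, decided at compile time.
        Console.WriteLine(StaticTypeOf(foo) == typeof(Blah));   // prints "True"

        // foo = 123;      // would not compile: int is not convertible to Blah
        // foo = "hello";  // would not compile: string is not convertible to Blah

        foo = new Blah();  // fine: assigning another Blah is still allowed
    }
}
```

Uncommenting either of the two marked lines should turn the program into a compile-time error rather than a run-time surprise, which is the whole point.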

If you take a look at section 26.1 of the C# 3.0 specification you'll see that the var statement has a lot of restrictions on it to ensure that the compiler always has enough information to make the correct type inference. Namely:

  • the declarator must include an initializer, so that we can infer the type of the variable from the type of the initializer
  • the initializer has to be something that we can figure out the type of – not null or a collection initializer
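A rough sketch of how those two restrictions play out in practice follows. The class and method names (VarRestrictionsDemo, Describe) are made up for this example, and the rejected declarations are shown as comments because the compiler refuses them.

```csharp
using System;
using System.Collections.Generic;

static class VarRestrictionsDemo
{
    // Hypothetical helper that declares a few implicitly typed locals
    // and reports the types the compiler inferred for them.
    public static string Describe()
    {
        var count = 42;                                   // inferred as int
        var name = "hello";                               // inferred as string
        var lists = new Dictionary<string, List<int>>();  // inferred from the constructor call

        // var a;                // rejected: no initializer to infer from
        // var b = null;         // rejected: null has no type of its own
        // var c = { 1, 2, 3 };  // rejected: a bare initializer list has no type

        var d = (string)null;    // accepted: the cast gives the initializer a type

        lists[name] = new List<int> { count };
        return count.GetType().Name + "," + name.GetType().Name
             + "," + lists[name].Count + "," + (d == null);
    }

    static void Main()
    {
        Console.WriteLine(Describe());   // prints "Int32,String,1,True"
    }
}
```

In every accepted case the type is fully determined by the initializer alone, so the compiler never has to look at later assignments.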

Compare this to JScript .NET, which has a much stronger type inference mechanism. JScript .NET does not require initializers in var statements; the compiler tracks all assignments to the variable and infers the best type. If, say, only strings are assigned to a variable then it will infer the string type. JScript .NET also infers return types of functions by a similar mechanism. But the goal of the JScript .NET type inference mechanism was to increase the performance of legacy dynamically typed code. If we can infer a type and thereby generate faster, smaller code, we do so. If not, we don't.

Then why introduce this syntactic sugar in C# 3.0? C# doesn't have a body of legacy dynamic code like JScript and already generates efficient code.

There are two reasons: one that exists today, and one that will crop up in 3.0.

The first reason is that this code is incredibly ugly because of all the redundancy:

Dictionary<string, List<int>> mylists = new Dictionary<string, List<int>>();

And that's a simple example – I've written worse. Any time you're forced to type exactly the same thing twice, that's a redundancy that we can remove. Much nicer to write

var mylists = new Dictionary<string, List<int>>();

and let the compiler figure out what the type is based on the assignment.

Second, C# 3.0 introduces anonymous types. Since anonymous types by definition have no names, you need to be able to infer the type of the variable from the initializing expression if its type is anonymous.
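As a small illustration, here is a sketch of a local variable whose type cannot be written down in source code. The names AnonymousTypeDemo and QuotientPlusRemainder, and the Quotient and Remainder properties, are chosen just for this example.

```csharp
using System;

static class AnonymousTypeDemo
{
    public static int QuotientPlusRemainder(int x, int y)
    {
        // The type of 'result' has no name we could write in a declaration,
        // so 'var' is how a local gets that exact static type.
        var result = new { Quotient = x / y, Remainder = x % y };

        // Member access is still checked at compile time: a misspelling
        // such as result.Quotent would be a compile error, not a run-time one.
        return result.Quotient + result.Remainder;
    }

    static void Main()
    {
        Console.WriteLine(QuotientPlusRemainder(17, 5));   // 3 + 2, prints "5"
    }
}
```

Even though the type is anonymous, the variable is as statically typed as any other local.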

We'll discuss the reasoning behind anonymous types in another post.

  • This is actually the really cool part about C# 3.0 anonymous types and type inference - because it's still statically typed, you get all the compiler and IDE support you'd expect from explicit typing, and you don't take the performance hit that you have to put up with for using a dynamic language.

    It's purely syntactic sugar, and will save me creating an awful lot of useless classes, just to get sets of related data from the backend.

    I'm guessing that the behind-the-scenes monkeying is all built on top of generics?

    Out of curiosity, I assume that you can't cast an anonymous type without un/boxing it?

    I'll stop with the unmitigated praise now, but I really am happy about the new features in the spec.
  • Any way we can get 'var' replaced with 'dim' and the '=' replaced with 'as'? :)
  • That's hilarious! I will share your suggestion with the language design committee, but I don't think they'll go for it.
  • I'm getting an internal compiler error when I do

    static void Main(string[] args)
    {
        var foo = "";
        foreach (var bar in foo) Console.Out.WriteLine(bar);
    }
  • The more minimal

    foreach (var x in "") ;

    shows this as well.
  • Here's a different way to get an ICE with inferencing

    static void Main(string[] args)
    {
        var x = new[]{};
    }
  • And the program

    static void Main(string[] args)
    {
        var x = (object[]) new[]{null};
    }

    compiles but gives the extraordinarily mysterious "bad image format exception".
  • Any good reason a function can't be 'var' typed and have its return type inferred from its return statement?

    I would like to do something like this:

    var divmod = Div(x, y);

    var Div(int x, int y)
    {
        return new { Quotient = x / y, Remainder = x % y };
    }

    In other words, this would make it really easy to return multiple values from a function without having to declare a type ahead of time or use cumbersome out parameters.
  • Gabe: Separate compilation is an obvious limit to the amount of type inferencing that can occur (unless you always plan to compile your entire program at one go, which doesn't scale well).
  • Why can I do

    int[] x = {1, 2, 3};

    but I have to do

    var x = new[] {1, 2, 3};

    when I want to do

    var x = {1, 2, 3};

    ?

    The new[] seems like a bit of syntactic cruft in this case since it just adds type information that is already inferrable.
  • Is there any reason you've chosen the 'var' keyword unlike the C++ standards people who are doing much the same thing with the 'auto' keyword?

    Is it because you were first, or you were doing it simultaneously or is it just to be different?

    I only ask because it can be a pain if you have to use both languages where there is a different keyword for the same thing.

    On the flipside I suppose javascript people would ask the same question if you'd used "auto", but from my point of view it represents a different concept so should maybe have a different name. Also I suppose "auto" isn't a very good name, but the C++ people went with it to avoid adding a new keyword since they already had one that no one uses.
  • ++ to Stewart's comment.

    While I'm not jumping for joy over the term 'auto' it's a better term than 'var' (IMO).

    Raimond Brookman mentioned the term 'infer' over at http://blogs.msdn.com/danielfe/archive/2005/09/22/472884.aspx . I'd like this over 'auto'.
  • 1. If your code initializes to a return value of a method you can no longer tell what type it is:

    var item = MyCollection[key];

    What type is item? Is it object (MyCollection defined as IDictionary), or string (MyCollection is a StringCollection), or any other type (including primitives) if MyCollection is a generic collection? Is it what it's supposed to be?

    Even worse:

    var item = MyService.LookupItem(param);

    Without knowing the MyService class the reader of the code has simply no way of knowing the type of item.

    2. Your code will strongly type to the actual type even though a supertype would be more appropriate.

    var collection = new SortedDictionary();

    Would type the collection to SortedDictionary even though IEnumerable is what you wanted.
  • It's possible to write hard-to-read code in any language. In your first example, I would say "if it's hard to figure out what the type is by reading the initializer, and it is important that the reader know the type, then call out the type."

    In your second example, I would say that if you want the variable to be typed as a less-derived type, then nothing is stopping you from typing it however you want.

    Remember, inference is a _convenience_ feature. You don't have to use it if you don't want to or if you feel that it makes your code less clear, or if it doesn't have the semantics you want.

    The argument that some people will misuse it and therefore it's bad doesn't hold much water with me. C# is an "enough rope" programming language -- there are many, many idioms in C# that can be abused, and we trust that our developers are professional enough not to do so.
  • Just because there are things that can be abused does not mean we have a green light to add more :) Personally I don't share the view that a programming language should be designed to require as little typing as possible. It should be designed to be unambiguous so there's as little guessing involved as possible.