Why no var on fields?


In my recent request for things that make you go hmmm, a reader notes that you cannot use "var" on fields. Boy, would I ever like that. I write this code all the time:

private static readonly Dictionary<TokenKind, string> niceNames =
  new Dictionary<TokenKind, string>()
  {
    {TokenKind.Integer, "int"}, ...

Yuck. It would be much nicer to be able to write

private static readonly var niceNames =
  new Dictionary<TokenKind, string>()...

You'd think this would be straightforward; we could just take the code that we use to determine the type of a local variable declaration and use it on a field. Unfortunately, it is not nearly that easy. Doing so would actually require a deep re-architecture of the compiler.

Let me give you a quick oversimplification of how the C# compiler works. First we run through every source file and do a "top level only" parse. That is, we identify every namespace, class, struct, enum, interface, and delegate type declaration at all levels of nesting. We parse all field declarations, method declarations, and so on. In fact, we parse everything except method bodies; those, we skip and come back to them later.

Once we've done that first pass we have enough information to do a full static analysis to determine the type of everything that is not in a method body. We make sure that inheritance hierarchies are acyclic and whatnot. Only once everything is known to be in a consistent, valid state do we then attempt to parse and analyze method bodies. We can then do so with confidence because we know that the type of everything the method might access is well known.

There's a subtlety there. The field declarations have two parts: the type declaration and the initializer. The type declaration that associates a type with the name of the field is analyzed during the initial top-level analysis so that we know the type of every field before method bodies are analyzed. But the initialization is actually treated as part of the constructor; we pretend that the initializations are lines that come before the first line of the appropriate constructor.
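A minimal sketch of that "pretend the initializer is the first line of the constructor" behavior (Widget, Size, and ComputeSize are invented names for illustration):

```csharp
using System;

class Widget
{
    // The declared type (int) is known after the top-level pass; the
    // initializer expression is analyzed later, as part of the constructor.
    public readonly int Size = ComputeSize();

    public Widget()
    {
        // The compiler treats the field initializer roughly as if it were
        // the first statement here:
        //   this.Size = ComputeSize();
        Console.WriteLine(Size);
    }

    static int ComputeSize() => 42;
}
```

The key point: the type annotation is consumed during top-level analysis, while the initializer expression is not analyzed until method-body time.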

So immediately we have one problem; if we have "var" fields then the type of the field cannot be determined until the expression is analyzed, and that happens after we already need to know the type of the field.

But it gets worse. What if the field initializer in a "var" field refers to another (static) "var" field? What if there are long chains, or even cycles, in those references? There can be arbitrary expressions in those initializers, expressions which contain lambdas which contain expressions which require method type inference or overload resolution. All of these algorithms in the compiler were written with the assumption that when they run, the type of every top-level program entity is already known. All of those algorithms would have to be rewritten and tested in a world where top-level type information is determined by them rather than consumed by them.
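To make the chicken-and-egg problem concrete, here is a hypothetical and deliberately illegal sketch; "var" fields do not exist, and Cyclic, a, b, and M are invented names:

```csharp
static class Cyclic
{
    // To infer the type of 'a' we must resolve M(b), which requires the
    // type of 'b'; but inferring 'b' requires the type of 'a'. Method
    // type inference on M<T> cannot even begin until one of them is known.
    static var a = M(b);
    static var b = M(a);

    static T M<T>(T item) => item;
}
```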

It gets worse still. If you have "var" fields then the initializer could be of anonymous type. Suppose the field is public. There is not yet any standard in the CLR or the CLS about what the right way to expose a field of anonymous type is. We don't have good policies for documenting them, versioning them, or interoperating with them across languages. Doing this feature would potentially cause huge costs across the division.

Inferred locals have none of these problems; inferred locals never have cycles or refer to things that haven't been analyzed yet. Inferred locals never escape into public visibility.
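For contrast, a small sketch of why inferred locals are easy (TokenKind here is a stand-in enum with invented values, echoing the example at the top of the post):

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the TokenKind enum from the post (hypothetical values).
enum TokenKind { Integer, String }

static class Locals
{
    public static Dictionary<TokenKind, string> NiceNames()
    {
        // Inside a method body every top-level type is already known, so
        // inferring the type of this local is a simple left-to-right job:
        var niceNames = new Dictionary<TokenKind, string>
        {
            { TokenKind.Integer, "int" },
        };
        // The local cannot appear in its own initializer and never escapes
        // into public view, so no cycles or anonymous-type issues arise.
        return niceNames;
    }
}
```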

So apparently this simple-seeming feature has the potential to cause really, really bad implementation issues in multiple ways, and all in order to avoid a small redundancy. This seems like it is possibly not worth the cost. If our goal is to remove the redundancy, I would therefore prefer to remove it the other way. Make this legal:

private static readonly Dictionary<TokenKind, string> niceNames =
  new()...

That is, state the type unambiguously in the declaration and then have the "new" operator be smart about figuring out what type it is constructing based on what type it is being assigned to. This would be much the same as how the lambda operator is smart about figuring out what its body means based on what it is being assigned to.
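The lambda analogy can be sketched as follows (Lambdas and Square are invented names; the point is the direction in which type information flows):

```csharp
using System;

static class Lambdas
{
    // "x * x" gets its meaning from the delegate type on the left; the
    // proposed "new()" would flow type information in the same direction,
    // from the declared field type into the constructor call.
    public static readonly Func<int, int> Square = x => x * x;
}
```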

Thoughts?

  • I would say the ‘new’ operator is obviously a feature of the kind ‘nice to have but not really important’. In contrast to ‘var’ or type inference for lambdas, which can greatly improve the readability of a method, the ‘new’ operator would save you at most one second of parsing – I mean, a declaration of a field is generally so obvious that you don’t have to read it twice or spend several minutes to understand it. Therefore the gained profit is probably not worth the effort to implement and test it.

    But there is something in your post which confused me a little bit: “What if there are long chains, or even cycles in those references?” First I wanted to respond: hey, it’s not possible to have cycles in the definition of a field because you can’t refer to other fields inside the definition – but I just found out that this is not true for static fields. An example:

    public static class Foo1
    {
        public static List<int> Bar = new List<int>()
        {
            Foo2.Bar.Count,
        };
    }

    public static class Foo2
    {
        public static List<int> Bar = new List<int>()
        {
            Foo1.Bar.Count,
        };
    }

    The code compiles and throws a NullReferenceException as expected, because either Foo1.Bar is not yet constructed when Foo2.Bar is accessed or vice versa. Which brings me to my question: why is it possible to reference other static fields inside the definition of a static field? Since no order of compilation is guaranteed at all, I can hardly think of any possible use of this.

  • Order of compilation is irrelevant. What is relevant is the order in which the static field initializers run, and that is well-defined. See section 10.12 of the specification for details. (It is arguably a bad programming practice to rely upon these details, but it is legal.)

  • Why not limit "var" fields to some well-defined number of constructs, like constants and object creation expressions. This would probably cover 80% of cases, allow for future expansion of the feature and doesn't look too binding for future. As for performance of compiler, parsing should already be done at this point and you can easily detect if var is valid from AST. As for resolving the type for top-level structure: for constants you know it, and for object creation expression it is the same as resolving type specification to the left of the field's name.

  • I know I am a nitpicking smartass (I'm really sorry about that :( ) but in 10.12 the specification only makes a statement about classes with static constructors. In combination with 10.5.5.1: "If a static constructor (§10.12) exists in the class, execution of the static field initializers occurs immediately prior to executing that static constructor. Otherwise, the static field initializers are executed at an implementation-dependent time prior to the first use of a static field of that class." Therefore I would say it's not really defined which of the two Foos throws an exception, regardless of the access pattern of Foo1.Bar and Foo2.Bar.

    But I got your point: the behaviour is more or less defined and could be of use in some cases.

    Thanks for the clarification.

  • The "<Type> Id = new()" doesn't feel very good -- not that that's complete reasoning by itself. It's a weird "value" that could never stand alone, so it doesn't seem proper to have on the rhs. Of course, it also only solves the particular issue when constructing a type directly. It still wouldn't work in the cases where you want to use a method to construct the value.

    Overall, this is only a small part of C#'s verbosity (although, having 300+ character fields is still quite insane). In the examples above, why do we even need to specify the types at all? Or when defining methods, why must we manually calculate each generic type parameter and all the constraints necessary? Or for that matter, even specify the types of the parameters? Even being able to partially specify type parameters at a callsite would be a good start. (Like being able to say Foo<Bar,?> and let the compiler figure out ?.) (And in general, yes, I know, overloading is a PITA, for starters.)

    I'm not sure there is a solution that keeps C# style and backcompat, and doesn't mean completely re-implementing things. So, as to the " = new()" idea, I don't see nearly enough benefit for that feature alone.

  • Not a big fan of the new() idea - as MichaelGG said, it doesn't "feel" right.

    C# started off as an extremely clean language, but since C# 3.0 it feels as though a large number of kludges were added solely for LINQ.

    I recently did a demonstration of C# 3.0 for our development team and most of them said "Ughhh" to the language extensions before I showed them LINQ.

    The real draw of C# was that it was straightforward and clean. The C# 3.0 extensions feel forced and as though the language is heading down the wrong path - loading on unnecessary solutions for fringe cases. Overall the language will suffer.

    Languages don't need to evolve with every product release; it really feels like at this point the C# language team is trying to justify its existence and not really improving the language (no offense!). C# 2.0 was as close to a "perfect" strongly typed language as you could get, and 3.0 really destroyed that. Let's not go further down that path - I'd rather see C# stay the way it is (now a somewhat mature language) and focus put into the compiler, BCL, and CLR.

    Sorry!

  • @Eric,

    The only reason C# 3.0 feels "dirty" is precisely because all these things were added only for specific cases, namely LINQ's cases. Nothing feels like it was designed for the language as a whole. OTOH, it tries to be a C-ish syntax language, and by the time you finish cleaning it up and simplifying the syntax... not sure you'd end up with anything C-like.

    C# 3 was a major step forward, but it was only a start, and I was so hoping that C# 4 would follow through with the apparent path set out. But as Eric Lippert said before, too many users thought this was too hard, too much, too complex. With the recent announcement of C# and VB going to be "equal, just different looking", it's clear what the future path for MS .NET languages is.

  • I don't like the idea of the "{type} {name} = new(...)" syntax much, mainly because of the inconsistency with the way that var works. It would feel very weird to be able to specify the type only on the RHS within method body as we do now, and then be able to specify it only on the LHS when it's a field.

    And then what happens in the future if you do re-architect the compiler so that declaring fields using var would be possible, and if the CLR/CLS does come up with a specification for anonymous types to be exposed and shared between languages? Then you're left with an inconsistent syntactic wart which arose from technical issues rather than being designed into the language as the best way to do things.

    Sure, it's redundant type information, but (a) there's a fair bit of that in C# anyway, and (b) it's not really that painful, especially if you have something like ReSharper which will fill in the RHS for you anyway by simply hitting TAB. I'd say either do it the way that would be ideal (var) or don't do it at all (until technically possible).

  • Visual Basic has a special syntax to avoid repetition of the type in the most common case:

    dim x as Object = new Object()

    becomes:

    dim x as new Object()

    Of course that syntax only works in VB because it fit naturally with how things were already declared. C# goes 'type name', which would naively mean something like "new List<Widget>(128) widgets". I would suggest the following syntax:

    var x: new Object()

  • I don't think that an alternate new syntax should be introduced.  Firstly, new syntax is extra mental weight, so it better be worth it.  Secondly, I don't think C# should encourage the use of constructors at all - constructors already have weird semantics as is.  Many constructors have various implicit initializations - i.e. call one overload and you get a class loaded from DB, another and you get an "uninitialized" object, another and you get an inline-initialized object, etc.  These initializations are bad since they're unnamed; that is, a reader of code (and by extension the intellisense-using writer) cannot easily determine which overload to use.

    Constructors are one of the few methods where people find it acceptable for the "same" method to have vastly different semantics amongst overloads.  That's not a good habit.

    Adding such a syntax would seduce programmers into adding "handy" constructors and make a bad situation worse; even more functionality would be put into constructors.

    Constructors are already one of the most amorphous aspects of the language; they come across as a bag of various features only loosely coupled to a vague intent.  Why are constructors the only static methods able to be required by a generic type parameter?  Why are only constructors able to guarantee a non-null return value?  Why are constructors not able to return null?  Why are only constructors unable to return a subclass of their normal return value (leading to overcomplicated factory methods)?  Why are only constructors able to require that a subclass call them?  Why is the collection and object initializer syntax only available for constructors?

    I'd much prefer the language evolve toward dissociating these many features and making them generally useful than to convolute the constructor even further.

    So, in the name of avoiding unnecessary syntax baggage, and in the name of making the language "general", I'd vote for not implementing such syntax.

  • Array initializers state the type only once:

    int[] values = { 0, 1, 2 };

    Perhaps the syntax could be extended to collection initializers:

    List<int> values = { 0, 1, 2 };

    Dictionary<TokenKind, string> niceNames = { };

    And object initializers:

    Point point = { X = 0, Y = 1 };

  • Eric,

    "type name = new()" not only looks very un-C#-like, it also does not cover the case where type inference would actually save a lot more typing (and reading!):

    var name = Foo(...);

    where Foo has a fixed return type or does some nice type inference itself. I did run into some situations where this would have helped a lot. (Also, calling constructors for anything but primitive objects is a bit old school anyway. Inference from arbitrary expressions would obviously not help IoC users, but it could be used for fields that get initialized by calls to factories or service repositories. A special 'new' syntax gives us nothing of this, but encourages users to keep using constructors directly, with all the downstream problems for testability and extensibility.)

    Do we care when the compiler tells us to be explicit whenever it runs into complex or even cyclic dependencies? Obviously, C# programmers are used to that already:

    error CS0411: The type arguments for XXXX cannot be inferred from the usage. Try specifying the type arguments explicitly.

    What's so different about a field that needs its type to be specified explicitly?

    Referencing fields in field initializers is not good practice anyway; one could argue that it _should_ be punished. Personally, I would disallow it for public fields too, because I believe a change in an expression should not change the class's public interface. This could have too many trickle-down effects. Same for public fields of anonymous types. (What would be next? Guessing a method's return type? I'm not saying that this would be a bad thing; other languages do that. But that wouldn't be C# anymore. I'd rather have return type inference the other way - guessing a return statement's type from the declared return type.)

    Would it be hard to change the compiler? Probably. But then again, you never accepted the argument that a proposed language feature would be relatively easy to implement either! ;-)

    PS: This blog engine ate my post - two times now. (And only the second time was I smart enough to keep a copy.) This is ridiculous. I've had that before. I really believe you should take this to the admins. (We later engaged in a short email conversation, in case you want to give them the text to look for a reason, like some special character sequences. However, this time around the posting got confirmed, while last time I was just taken to the home page.) Few things are more frustrating than typing lots of stuff, seeing it confirmed (that's when I stopped caring about the copy I had in the clipboard), but never seeing it appear.

  • PPS: I posted using Firefox 3 this time (previous attempts using IE7 did not work). Don't know if that made a difference but the posting appeared. Well, maybe that's just Microsoft trying to make the EU happy ;-)

  • I agree with Ilya, except that I would expand the proposal further: you can use var fields, but your initialization expression cannot access other var fields. This would probably cover 95% of the useful cases.

    But I do appreciate that a change like this messes up the architecture of the compiler.

    Igor Ostrovsky

  • First off, I also had trouble posting with IE7 - this post was created with Firefox.

    As for the idea to introduce a new syntax for new, I think that it is a bad idea for several reasons. First, unlike var, it could only be used to initialize members that are concrete types - since there would be no way to infer the type to construct when the left hand side is an interface or abstract class. The compiler can certainly warn you .. but it's an awkward inconsistency that doesn't buy you much.

    Second, and more important, this syntax may actually allow the semantics of a program to change subtly without the developer being aware. Take the following example:

    class Animal { public override string ToString() { return "Animal"; } }

    class Dog : Animal { public override string ToString() { return "Dog"; } }

    class Vet { public readonly Dog ThePatient = new(); }

    Now some brilliant developer comes along, and without thinking too deeply about it, says: hey, we should expose ThePatient as a reference to the base type Animal. Well, as a result, the compiler infers that the type to create should now be an instance of Animal rather than Dog. The developer may not have intended this ... they just didn't realize that this inference is taking place. (Yes, developers should pay attention to what they're doing and understand the language, but it's an easy thing to overlook.) The compiler won't complain ... it will happily change the runtime type instantiated - potentially leading to subtle and difficult to track down bugs.

    The var keyword doesn't have this issue because the compiler isn't deciding what type to instantiate - just what type of reference to assign to. In other words, var never results in a different method than you expect getting invoked.
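That distinction can be made concrete with a small self-contained sketch (Inference and Demo are invented names; Dog derives from Animal here so the override comparison works):

```csharp
using System;

class Animal { public override string ToString() => "Animal"; }
class Dog : Animal { public override string ToString() => "Dog"; }

static class Inference
{
    public static string Demo()
    {
        var d = new Dog();      // the RHS alone decides: d is a Dog
        Animal a = new Dog();   // widening the declared type changes the
                                // reference type, never which constructor ran
        return d.ToString() + "/" + a.ToString();   // both say "Dog"
    }
}
```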

    I think that allowing the compiler to make inferences about what runtime types to instantiate is a bad idea - this is a case where C# should favor correctness rather than convenience. IMHO.
