Uses and misuses of implicit typing

One of the most controversial features we've ever added was implicitly typed local variables, aka "var". Even now, years later, I still see articles debating the pros and cons of the feature. I'm often asked what my opinion is, so here you go.

Let's first establish what the purpose of code is in the first place. For this article, the purpose of code is to create value by solving a business problem.

Now, sure, that's not the purpose of all code. The purpose of the assembler I wrote for my CS 242 assignment all those years ago was not to solve any business problem; rather, its purpose was to teach me how assemblers work; it was pedagogic code. The purpose of the code I write to solve Project Euler problems is not to solve any business problem; it's for my own enjoyment. I'm sure there are people who write code just for the aesthetic experience, as an art form. There are lots of reasons to write code, but for the sake of this article I'm going to make the reasonable assumption that people who want to know whether they should use "var" or not are asking in their capacity as professional programmers working on complex business problems on large teams.

Note that by "business" problems I don't necessarily mean accounting problems; if analyzing the human genome in exchange for National Science Foundation grant money is your business, then writing software to recognize strings in a sequence is solving a business problem. If making a fun video game, giving it away for free and selling ads around it is your business, then making the aliens blow up convincingly is solving a business problem. And so on. I'm not putting any limits here on what sort of software solves business problems, or what the business model is.

Second, let's establish what decision we're talking about here. The decision we are talking about is whether it is better to write a local variable declaration as:

TheType theVariable = theInitializer;

or

var theVariable = theInitializer;

where "TheType" is the compile-time type of theInitializer. That is, I am interested in the question of whether to use "var" in scenarios where doing so does not introduce a semantic change. I am explicitly not interested in the question of whether

IFoo myFoo = new FooStruct();

is better or worse than

var myFoo = new FooStruct();

because those two statements do different things, so it is not a fair comparison. Similarly I am not interested in discussing bizarre and unlikely corner cases like "what if there is a struct named var in scope?" and so on.
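To make "do different things" concrete, here is a minimal sketch. It assumes, purely for illustration, that FooStruct is a mutable struct implementing IFoo; the Count and Increment members and the BoxingDemo class are hypothetical and are not from any real library. Declaring the local as IFoo boxes the struct, so copies of the variable share one object; with "var" the inferred type is FooStruct and copies are independent values.

```csharp
using System;

// Hypothetical stand-ins for the IFoo and FooStruct named above.
public interface IFoo
{
    int Count { get; }
    void Increment();
}

public struct FooStruct : IFoo
{
    public int Count { get; private set; }
    public void Increment() { Count++; }
}

public static class BoxingDemo
{
    // Declared as the interface: the struct is boxed, and copying the
    // variable copies a reference to that single boxed object.
    public static int CountAfterCopyViaInterface()
    {
        IFoo myFoo = new FooStruct();
        IFoo alias = myFoo;     // refers to the same box as myFoo
        alias.Increment();
        return myFoo.Count;     // the mutation is visible through myFoo
    }

    // Declared with var: the inferred type is FooStruct, a value type,
    // so assignment copies the whole value.
    public static int CountAfterCopyViaVar()
    {
        var myFoo = new FooStruct();
        var copy = myFoo;       // an independent copy
        copy.Increment();
        return myFoo.Count;     // myFoo is unchanged
    }
}
```

Under these assumptions the first method returns 1 and the second returns 0, which is why the two declarations are not interchangeable and are out of scope for this discussion.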

In this same vein, I'm interested in discussing the pros and cons when there is a choice. If you have already decided to use anonymous types then the choice of whether to use implicit typing has already been made:

var query = from c in customers select new {c.Name, c.Age};

The question of whether it is better to use nominal or anonymous types is a separate discussion; if you've decided that anonymous types are worthwhile then you are almost certainly going to be using "var" because there is no good alternative.

Given that the overarching purpose is assumed to be solving business problems, what makes good code? Obviously that's a huge topic but three relevant factors come to mind. Good code:

  • works correctly according to its specification to actually solve the stated problem
  • communicates its meaning to the reader who needs to understand its operation
  • allows for relatively low-cost modification to solve new problems as the business environment changes.

In evaluating whether or not to use "var" we can dismiss the first concern; I'm only interested in pros and cons of cases where using var does not change the meaning of a program, only its textual representation. If we change the text without changing its semantics then by definition we have not changed its correctness. Similarly, using or not using "var" does not change other observable characteristics of the program, such as its performance. The question of whether or not to use "var" hinges upon its effect on the human readers and maintainers of the code, not on its effect upon the compiled artefact.

What then is the effect of this abstraction on the reader of the code?

All code is of course an abstraction; that's the whole reason why we have high-level languages rather than getting out our voltmeters and programming at the circuit level. Code abstractions necessarily emphasize some aspects of the solution while "abstracting away" other aspects. A good abstraction hides what is irrelevant and makes salient what is important. You might know that on x86 chips C# code will typically put the value returned by a method in EAX and typically put the "this" reference in ECX, but you don't need to know any of that to write C# programs; that fact has been abstracted away completely.

It is clearly not the case that more information in the code is always better. Consider that query I referred to earlier. Which is easier to understand: that query, or the equivalent written with a nominal type and no query comprehension:

IEnumerable<NameAndAge> query = Enumerable.Select<Customer, NameAndAge>(customers, NameAndAgeExtractor);

along with the implementations of the NameAndAge class, and the NameAndAgeExtractor method? Clearly the query syntax is much more abstract and hides a lot of irrelevant or redundant information, while emphasizing what we wish to be the salient details: that we are creating a query which selects the name and age of a table of customers. The query emphasizes the business purpose of the code; the expansion of the query emphasizes the mechanisms used to implement that purpose.
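For a sense of how much mechanism the expanded form drags along, here is a rough sketch of the two versions side by side. The Customer class, the shape of NameAndAge, and the QueryComparison wrapper are assumptions made up for illustration; only the names NameAndAge and NameAndAgeExtractor come from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the article only says "customers".
public class Customer
{
    public Customer(string name, int age) { Name = name; Age = age; }
    public string Name { get; private set; }
    public int Age { get; private set; }
}

// The nominal type that the anonymous type new {c.Name, c.Age} replaces.
public class NameAndAge
{
    public NameAndAge(string name, int age) { Name = name; Age = age; }
    public string Name { get; private set; }
    public int Age { get; private set; }
}

public static class QueryComparison
{
    // The projection that the query expression writes inline.
    public static NameAndAge NameAndAgeExtractor(Customer c)
    {
        return new NameAndAge(c.Name, c.Age);
    }

    // Mechanism-first version: every type is spelled out.
    public static List<string> DescribeExplicit(IEnumerable<Customer> customers)
    {
        IEnumerable<NameAndAge> query =
            Enumerable.Select<Customer, NameAndAge>(customers, NameAndAgeExtractor);
        List<string> result = new List<string>();
        foreach (NameAndAge item in query)
        {
            result.Add(item.Name + " is " + item.Age);
        }
        return result;
    }

    // Purpose-first version: the element type is anonymous, so "var" is required.
    public static List<string> DescribeImplicit(IEnumerable<Customer> customers)
    {
        var query = from c in customers select new { c.Name, c.Age };
        var result = new List<string>();
        foreach (var item in query)
        {
            result.Add(item.Name + " is " + item.Age);
        }
        return result;
    }
}
```

Both methods produce the same strings for the same input; the difference is only in how much of the plumbing the reader has to wade through.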

The question then of whether "var" makes code better or worse for the reader comes down to two linked questions:

1) is ensuring salience of the variable's type important to the understanding of the code? and,
2) if yes, is stating the type in the declaration necessary to ensure salience?

Let's consider the first question first. Under what circumstances is it necessary for a variable's type to be clearly understood when reading the code? Only when the mechanism of the code -- the "how it works" -- is more important to the reader than the semantics -- the "what it's for".

In a high-level language used to solve business problems, I like the mechanisms to be abstracted away and the salient features of the code to be the business domain logic. That's not always the case of course; sometimes you really do care that this thing is a uint, it has got to be a uint, we are taking advantage of the fact that it is a uint, and if we turned it into a ulong or a short, or whatever, then the mechanism would break.

For example, suppose you did something like this:

var distributionLists = MyEmailStore.Contacts(ContactKind.DistributionList);

Suppose the elided type is DataTable. Is it important to the reader to know that this is a DataTable? That's the key question. Maybe it is. Maybe the correctness and understandability of the rest of the method depends completely on the reader understanding that distributionLists is a DataTable, and not a List<Contact> or an IQueryable<Contact> or something else.

But hopefully it is not. Hopefully the rest of the method is perfectly understandable with only the semantic understanding, that distributionLists represents a collection of distribution lists fetched from a storage containing email contacts.

Now, to the crowd who says that of course it is better to always know the type, because knowing the type is important for the reader, I would ask a pointed question. Consider this code:

decimal rate = 0.0525m;
decimal principal = 200000.00m;
decimal annualFees = 100.00m;
decimal closingCosts = 1000.00m;
decimal firstPayment = principal * (rate / 12) + annualFees / 12 + closingCosts;

Let's suppose that you believe that it is important for all the types to be stated so that the code is more understandable. Why then is it not important for the types of all those subexpressions to be stated? There are at least four subexpressions in that last statement where the types are not stated. If it is important for the reader to know that 'rate' is of type decimal, then why is it not also important for them to know that (rate / 12) is of type decimal, and not, say, int or double?

The simple fact is that the compiler does huge amounts of type analysis on your behalf already, types which never appear in the source code, because for the most part those types would be distracting noise rather than helpful information. Sometimes the declared type of a variable is distracting noise too.
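As a sketch of what "stating every type" would actually look like, here is the same computation with each subexpression given its own declared local. The intermediate names (monthlyRate and so on) and the FirstPaymentDemo class are hypothetical, invented only for this illustration.

```csharp
using System;

public static class FirstPaymentDemo
{
    // The computation with a declared type on every subexpression.
    public static decimal FirstPaymentFullyAnnotated()
    {
        decimal rate = 0.0525m;
        decimal principal = 200000.00m;
        decimal annualFees = 100.00m;
        decimal closingCosts = 1000.00m;

        decimal monthlyRate = rate / 12;   // the int literal 12 is converted to decimal
        decimal monthlyInterest = principal * monthlyRate;
        decimal monthlyFees = annualFees / 12;
        decimal interestPlusFees = monthlyInterest + monthlyFees;
        decimal firstPayment = interestPlusFees + closingCosts;
        return firstPayment;
    }

    // The original one-line form; the compiler works out the same types silently.
    public static decimal FirstPaymentOriginal()
    {
        decimal rate = 0.0525m;
        decimal principal = 200000.00m;
        decimal annualFees = 100.00m;
        decimal closingCosts = 1000.00m;
        return principal * (rate / 12) + annualFees / 12 + closingCosts;
    }

    // The type the compiler assigns to the unnamed subexpression (rate / 12).
    public static Type TypeOfRateOverTwelve()
    {
        decimal rate = 0.0525m;
        return (rate / 12).GetType();
    }
}
```

The two methods compute the same value, and TypeOfRateOverTwelve reports decimal. Nobody writes the first version, which suggests that "always state the type" is not really the principle people are following.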

Now consider the second question. Suppose for the sake of argument it is necessary for the reader to understand the storage type. Is it necessary to state it? Often it is not:

var prices = new Dictionary<string, List<decimal>>();

It might be necessary for the reader to understand that prices is a dictionary mapping strings to lists of decimals, but that does not mean that you have to say

Dictionary<string, List<decimal>> prices = new Dictionary<string, List<decimal>>();

Clearly use of "var" does not preclude that understanding.

So far I've been talking about reading code. What about maintaining code? Again, var can sometimes hurt maintainability and sometimes help it. I have many times written code something like:

var attributes = ParseAttributeList();
foreach(var attribute in attributes)
{
    if (attribute.ShortName == "Obsolete") ...

Now suppose I, maintaining this code, change ParseAttributeList to return a ReadOnlyCollection<AttributeSyntax> instead of List<AttributeSyntax>. With "var" I don't have to change anything else; all the code that used to work still works. Using implicitly typed variables helps make refactorings that do not change semantics succeed with minimal edits. (And if refactoring changes semantics, then you'll have to edit the code's consumers regardless of whether you used var or not.)
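A small sketch of that refactoring, under some loudly stated assumptions: AttributeSyntax here is a made-up one-property class, the two parser classes stand in for the "before" and "after" versions of ParseAttributeList, and the sample attribute names are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical stand-in for the real syntax node type.
public class AttributeSyntax
{
    public AttributeSyntax(string shortName) { ShortName = shortName; }
    public string ShortName { get; private set; }
}

// Before the refactoring: the parser returns a List<AttributeSyntax>.
public static class ParserBefore
{
    public static List<AttributeSyntax> ParseAttributeList()
    {
        return new List<AttributeSyntax>
        {
            new AttributeSyntax("Obsolete"),
            new AttributeSyntax("Serializable"),
            new AttributeSyntax("Obsolete")
        };
    }
}

// After the refactoring: same contents, different return type.
public static class ParserAfter
{
    public static ReadOnlyCollection<AttributeSyntax> ParseAttributeList()
    {
        return ParserBefore.ParseAttributeList().AsReadOnly();
    }
}

public static class AttributeConsumer
{
    public static int CountObsoleteBefore()
    {
        var attributes = ParserBefore.ParseAttributeList();
        int count = 0;
        foreach (var attribute in attributes)
        {
            if (attribute.ShortName == "Obsolete") count++;
        }
        return count;
    }

    // Apart from the parser being called, this body is textually identical
    // to the one above; no declaration had to be edited.
    public static int CountObsoleteAfter()
    {
        var attributes = ParserAfter.ParseAttributeList();
        int count = 0;
        foreach (var attribute in attributes)
        {
            if (attribute.ShortName == "Obsolete") count++;
        }
        return count;
    }
}
```

Had the consumer declared List<AttributeSyntax> attributes explicitly, the second version would not compile until that declaration was edited too.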

Sometimes critics of implicitly typed locals come up with elaborate scenarios in which it allegedly becomes difficult to understand and maintain code:

var square = new Shape();
var round = new Hole();
... hundreds of lines later ...
bool b = CanIPutThisPegInThisHole(square, round); // Works!

Which then later gets "refactored" to:

var square = new BandLeader("Lawrence Welk");
var round = new Ammunition();
... hundreds of lines later ...
bool b = CanIPutThisPegInThisHole(square, round); // Fails!

In practice these sorts of contrived situations do not arise often, and the problem is actually more due to the bad naming conventions and unlikely mixtures of business domains than the lack of explicit typing.

Summing up, my advice is:

  • Use var when you have to; when you are using anonymous types.
  • Use var when the type of the declaration is obvious from the initializer, especially if it is an object creation. This eliminates redundancy.
  • Consider using var if the code emphasizes the semantic "business purpose" of the variable and downplays the "mechanical" details of its storage.
  • Use explicit types if doing so is necessary for the code to be correctly understood and maintained.
  • Use descriptive variable names regardless of whether you use "var". Variable names should represent the semantics of the variable, not details of its storage; "decimalRate" is bad; "interestRate" is good.

Comments
  • @Ben, I feel like some context is being lost in this thread. To summarize: Olivier said "consider that it is possible to use a cast as a type annotation" and provided code comparing type annotation as a cast and type annotation as an explicit type on a local. In response, I said "I would caution against using a cast in that way". Then you said "the compiler will error out on a cast that is determined at compile-time to be impossible". My response (summarized) was that your statement, while true, is not relevant to the reasons that it is not advisable to use a cast solely as a way to make a type explicit in code. Based on your latest comment, you apparently agree that it is not advisable to use a cast as a way of making a type explicit in code, you also seem to agree that making the type explicit in code is of limited value and so, like me, you use var most of the time.

    Since we're in agreement, I don't have much to add except that perhaps you interpreted my comment "better to be explicit with the type of the local variable" separately from the context. I can see that it would have been much clearer if I had said "if you think stating the type explicitly in code is important and if you are considering using a cast solely for that purpose, don't do that because it would be better to be explicit with the type of the local variable".

  • @Alex, I admit I didn't read Olivier's comment closely. Now that I look again, I see why you took the tack that you did, and I absolutely agree that using var just to make declarations line up is a bad practice. Apologies for the misunderstanding. :-)

  • I was surprised by Stuart's comment. It even seemed to me that he must have done something wrong (sorry mate), but the quick test showed that yes, indeed my compiler behaves in the same way:

       interface IThingy
       {
           bool Orly();
       }

       class Thingy : IThingy
       {
           public bool Orly()
           {
               return true;
           }
       }

       class TestClass
       {
           public IThingy CreateThingyFromSomething(dynamic whatever)
           {
               return new Thingy();
           }
       }

       class Program
       {
           static void Main(string[] args)
           {
               dynamic val = 42;
               var myThingy = new TestClass().CreateThingyFromSomething(val);
               myThingy.Orly();
               myThingy.WhyAmIDynamic(); //why indeed? this gives an error at runtime, not at compile time
           }
       }

    Now I would like to apologize to Eric for my laziness - maybe I didn't read the C# specification thoroughly enough. What is the reason for such behaviour? Or is it really just a bug (with a simple workaround, I admit)?

  • I think the behavior of dynamic is completely self-consistent, even if it's a little unexpected. "dynamic" means "all operations with this value will be resolved at runtime". Not "We'll make some attempt sometimes to resolve some operations at compile time but if we can't do that it'll be done at runtime".

    So I think the current behavior is technically correct, but it'd be nice if there were some way to override it.

  • @Stuart-"I sort of want a way to declare that I'm using a 'dynamic' undynamically"

    C# already has a way to treat a 'dynamic' undynamically or to "override" the behavior: cast to object.

    var myThingy = new TestClass().CreateThingyFromSomething((object)val);

    I know it feels unnatural to cast a dynamic variable to object on a function that takes a dynamic parameter so that the method returns its actual type, but I don't think I would want C# to introduce another syntax just to handle this situation.

  • @Eggbound: "I've worked on very large codebases where var is the norm, and can't remember ever having to wonder what the type of a particular variable is, because variables have descriptive names."

    That is what Hungarian notation is: you move your type information into your variable names, and we all know where that went. Unless somehow you are able to deduce that "customers", which is a pretty descriptive name, has to actually be ICollection, or even IList, instead of just IEnumerable. Or that eventCount must be long and not just an int because it's counting events that have the potential to overflow an int-sized counter.

  • @Stuart - oh, yes, now I see where I was wrong.

    Dynamic allows double dispatch, so if the reference to Thingy were obtained using anything more complex than a "new" operator, the resulting type would be unknown.

    Unknown means either object or dynamic, and since we are already dealing with dynamics, the latter option is more reasonable.

    So everything is correct except my intuition :)

  • iMil42: What? The static type of myThingy can be inferred from the CreateThingyFromSomething method. Your intuition is correct. The implicit type of myThingy should be IThingy.

  • A real life case where 'var' went wrong for me:

    void Foo()
    {
        var data = GetState();
        ThreadPool.QueueUserWorkItem(_Runner, data);
    }

    void _Runner(object state)
    {
        var data = (DataA)state;
        ... use data ...
    }

    Then GetState changed to return a DataB.  All fine at compile time...  Big bang when this (infrequently called) code path was used -- yes, I caught it in developer testing, but I might not have...

    So my one rule: Do not use var when passing the variable to an 'untyped' (i.e. type object) parameter...

  • Any chance for "var everywhere" in C# 5.0?  Fields would be nice...

    class Stuff
    {
        private Dictionary<string, List<decimal>> _prices = new Dictionary<string, List<decimal>>();
    }

    could be written with less noise and less distraction as simply:

    class Stuff
    {
        private var _prices = new Dictionary<string, List<decimal>>();
    }

    Thanks.

  • @Mark:

    Eric has previously covered that, see: blogs.msdn.com/.../why-no-var-on-fields.aspx

  • @Alan:

    Appreciate these issues, we all tell others that "it's not so easy and here's why"...but...

    "Doing so would actually require a deep re-architecture of the compiler."

    There was a recent video on Channel 9 with Eric and Erik and "compiler as a service"...low level rewrite...any hope?

    -M

  • Alan,

    How is 'var' the cause for the bug in your case? Let's say you wrote it as

    DataB data = GetState();
    ThreadPool.QueueUserWorkItem(_Runner, data);

    Your _Runner method will still fail as there is no type check performed by the compiler. Using or not using 'var' in this scenario won't make any difference.

    Yogi

  • I don't know if there's any relation, but var makes me think of duck typing.

    I'm more on the pro-var side, personally - I don't care what type GetThingy() returns as long as the Thing it returns will Frob() as requested.

  • Another disadvantage with using implicit typing: "Find usages"/"Find references" on a type will not find occurrences where implicit typing is used.
