Uses and misuses of implicit typing

One of the most controversial features we've ever added was implicitly typed local variables, aka "var". Even now, years later, I still see articles debating the pros and cons of the feature. I'm often asked what my opinion is, so here you go.

Let's first establish what the purpose of code is in the first place. For this article, the purpose of code is to create value by solving a business problem.

Now, sure, that's not the purpose of all code. The purpose of the assembler I wrote for my CS 242 assignment all those years ago was not to solve any business problem; rather, its purpose was to teach me how assemblers work; it was pedagogic code. The purpose of the code I write to solve Project Euler problems is not to solve any business problem; it's for my own enjoyment. I'm sure there are people who write code just for the aesthetic experience, as an art form. There are lots of reasons to write code, but for the sake of this article I'm going to make the reasonable assumption that people who want to know whether they should use "var" or not are asking in their capacity as professional programmers working on complex business problems on large teams.

Note that by "business" problems I don't necessarily mean accounting problems; if analyzing the human genome in exchange for National Science Foundation grant money is your business, then writing software to recognize strings in a sequence is solving a business problem. If making a fun video game, giving it away for free and selling ads around it is your business, then making the aliens blow up convincingly is solving a business problem. And so on. I'm not putting any limits here on what sort of software solves business problems, or what the business model is.

Second, let's establish what decision we're talking about here. The decision we are talking about is whether it is better to write a local variable declaration as:

TheType theVariable = theInitializer;

or

var theVariable = theInitializer;

where "TheType" is the compile-time type of theInitializer. That is, I am interested in the question of whether to use "var" in scenarios where doing so does not introduce a semantic change. I am explicitly not interested in the question of whether

IFoo myFoo = new FooStruct();

is better or worse than

var myFoo = new FooStruct();

because those two statements do different things, so it is not a fair comparison. Similarly I am not interested in discussing bizarre and unlikely corner cases like "what if there is a struct named var in scope?" and so on.
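To make the "different things" concrete, here is a minimal sketch. ICounter, CounterStruct and BoxingDemo are hypothetical stand-ins for IFoo and FooStruct, invented purely for illustration. Declaring the local with the interface type boxes the struct, so a second variable assigned from it refers to the same boxed object; with "var" the local has the struct type, and assignment copies the value.

```csharp
using System;

// Hypothetical stand-ins for IFoo and FooStruct.
public interface ICounter
{
    int Count { get; }
    void Increment();
}

public struct CounterStruct : ICounter
{
    private int count;
    public int Count => count;
    public void Increment() { count++; }
}

public static class BoxingDemo
{
    // Interface-typed locals: the struct is boxed once, and "alias"
    // refers to the same boxed object as "original".
    public static int CountSeenThroughInterface()
    {
        ICounter original = new CounterStruct();
        ICounter alias = original;
        alias.Increment();
        return original.Count;
    }

    // Implicitly typed locals: the inferred type is CounterStruct, so
    // "copy" is an independent value and "original" is unaffected.
    public static int CountSeenThroughVar()
    {
        var original = new CounterStruct();
        var copy = original;
        copy.Increment();
        return original.Count;
    }
}
```

The first method observes the increment and the second does not, which is why swapping one declaration for the other is a semantic change rather than a stylistic one.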

In this same vein, I'm interested in discussing the pros and cons when there is a choice. If you have already decided to use anonymous types then the choice of whether to use implicit typing has already been made:

var query = from c in customers select new {c.Name, c.Age};

The question of whether it is better to use nominal or anonymous types is a separate discussion; if you've decided that anonymous types are worthwhile then you are almost certainly going to be using "var" because there is no good alternative.

Given that the overarching purpose is assumed to be solving business problems, what makes good code? Obviously that's a huge topic but three relevant factors come to mind. Good code:

  • works correctly according to its specification to actually solve the stated problem
  • communicates its meaning to the reader who needs to understand its operation
  • allows for relatively low-cost modification to solve new problems as the business environment changes.

In evaluating whether or not to use "var" we can dismiss the first concern; I'm only interested in pros and cons of cases where using var does not change the meaning of a program, only its textual representation. If we change the text without changing its semantics then by definition we have not changed its correctness. Similarly, using or not using "var" does not change other observable characteristics of the program, such as its performance. The question of whether or not to use "var" hinges upon its effect on the human readers and maintainers of the code, not on its effect upon the compiled artefact.

What then is the effect of this abstraction on the reader of the code?

All code is of course an abstraction; that's the whole reason why we have high-level languages rather than getting out our voltmeters and programming at the circuit level. Code abstractions necessarily emphasize some aspects of the solution while "abstracting away" other aspects. A good abstraction hides what is irrelevant and makes salient what is important. You might know that on x86 chips C# code will typically put the value returned by a method in EAX and typically put the "this" reference in ECX, but you don't need to know any of that to write C# programs; that fact has been abstracted away completely.

It is clearly not the case that more information in the code is always better. Consider that query I referred to earlier. What is easier to understand, that query, or to choose to use a nominal type and no query comprehension:

IEnumerable<NameAndAge> query = Enumerable.Select<Customer, NameAndAge>(customers, NameAndAgeExtractor);

along with the implementations of the NameAndAge class, and the NameAndAgeExtractor method? Clearly the query syntax is much more abstract and hides a lot of irrelevant or redundant information, while emphasizing what we wish to be the salient details: that we are creating a query which selects the name and age of a table of customers. The query emphasizes the business purpose of the code; the expansion of the query emphasizes the mechanisms used to implement that purpose.
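For comparison, here is roughly what the expanded version would need in full. This is only a sketch: the Customer class, its properties, and the QueryDemo wrapper are assumptions made up for the example, and the element type is assumed to be Customer. The anonymous-type version formats its results as strings only so that they can leave the method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; the article only says "customers".
public class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string City { get; set; }
}

// The nominal type that the anonymous type new { c.Name, c.Age } replaces.
public class NameAndAge
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class QueryDemo
{
    public static NameAndAge NameAndAgeExtractor(Customer c)
    {
        return new NameAndAge { Name = c.Name, Age = c.Age };
    }

    // Nominal types, explicit type arguments, no query comprehension.
    public static List<string> Expanded(IEnumerable<Customer> customers)
    {
        IEnumerable<NameAndAge> query =
            Enumerable.Select<Customer, NameAndAge>(customers, NameAndAgeExtractor);
        return query.Select(x => x.Name + ":" + x.Age).ToList();
    }

    // Query comprehension with an anonymous type and "var".
    public static List<string> Comprehension(IEnumerable<Customer> customers)
    {
        var query = from c in customers select new { c.Name, c.Age };
        return query.Select(x => x.Name + ":" + x.Age).ToList();
    }
}
```

Both methods produce the same sequence; the difference is entirely in how much mechanism the reader has to wade through to see the intent.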

The question then of whether "var" makes code better or worse for the reader comes down to two linked questions:

1) is ensuring salience of the variable's type important to the understanding of the code? and,
2) if yes, is stating the type in the declaration necessary to ensure salience?

Let's consider the first question first. Under what circumstances is it necessary for a variable's type to be clearly understood when reading the code? Only when the mechanism of the code -- the "how it works" -- is more important to the reader than the semantics -- the "what it's for".

In a high-level language used to solve business problems, I like the mechanisms to be abstracted away and the salient features of the code to be the business domain logic. That's not always the case of course; sometimes you really do care that this thing is a uint, it has got to be a uint, we are taking advantage of the fact that it is a uint, and if we turned it into a ulong or a short, or whatever, then the mechanism would break.

For example, suppose you did something like this:

var distributionLists = MyEmailStore.Contacts(ContactKind.DistributionList);

Suppose the elided type is DataTable. Is it important to the reader to know that this is a DataTable? That's the key question. Maybe it is. Maybe the correctness and understandability of the rest of the method depends completely on the reader understanding that distributionLists is a DataTable, and not a List<Contact> or an IQueryable<Contact> or something else.

But hopefully it is not. Hopefully the rest of the method is perfectly understandable with only the semantic understanding, that distributionLists represents a collection of distribution lists fetched from a storage containing email contacts.

Now, to the crowd who says that of course it is better to always know the type, because knowing the type is important for the reader, I would ask a pointed question. Consider this code:

decimal rate = 0.0525m;
decimal principal = 200000.00m;
decimal annualFees = 100.00m;
decimal closingCosts = 1000.00m;
decimal firstPayment = principal * (rate / 12) + annualFees / 12 + closingCosts;

Let's suppose that you believe that it is important for all the types to be stated so that the code is more understandable. Why then is it not important for the types of all those subexpressions to be stated? There are at least four subexpressions in that last statement where the types are not stated. If it is important for the reader to know that 'rate' is of type decimal, then why is it not also important for them to know that (rate / 12) is of type decimal, and not, say, int or double?

The simple fact is that the compiler does huge amounts of type analysis on your behalf already, types which never appear in the source code, because for the most part those types would be distracting noise rather than helpful information. Sometimes the declared type of a variable is distracting noise too.
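As a rough illustration of what "stating every type" would actually look like, the sketch below computes the same payment twice: once as written above, and once with every subexpression pulled into its own explicitly typed local. The class, method and intermediate variable names are invented for the example.

```csharp
using System;

public static class SubexpressionTypes
{
    // The declaration as written in the article.
    public static decimal FirstPaymentConcise()
    {
        decimal rate = 0.0525m;
        decimal principal = 200000.00m;
        decimal annualFees = 100.00m;
        decimal closingCosts = 1000.00m;
        return principal * (rate / 12) + annualFees / 12 + closingCosts;
    }

    // The same computation with the type of every subexpression stated.
    public static decimal FirstPaymentFullyAnnotated()
    {
        decimal rate = 0.0525m;
        decimal principal = 200000.00m;
        decimal annualFees = 100.00m;
        decimal closingCosts = 1000.00m;

        decimal monthlyRate = rate / 12;                 // decimal / int is decimal
        decimal monthlyInterest = principal * monthlyRate;
        decimal monthlyFees = annualFees / 12;
        decimal interestAndFees = monthlyInterest + monthlyFees;
        decimal firstPayment = interestAndFees + closingCosts;
        return firstPayment;
    }

    // The compiler already knows the type of (rate / 12) without being told.
    public static Type TypeOfMonthlyRate()
    {
        decimal rate = 0.0525m;
        return (rate / 12).GetType();
    }
}
```

Few people would argue that the second method is more readable than the first, yet it is the logical conclusion of insisting that every type be visible in the source.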

Now consider the second question. Suppose for the sake of argument it is necessary for the reader to understand the storage type. Is it necessary to state it? Often it is not:

var prices = new Dictionary<string, List<decimal>>();

It might be necessary for the reader to understand that prices is a dictionary mapping strings to lists of decimals, but that does not mean that you have to say

Dictionary<string, List<decimal>> prices = new Dictionary<string, List<decimal>>();

Clearly use of "var" does not preclude that understanding.

So far I've been talking about reading code. What about maintaining code? Again, var can sometimes hurt maintainability and sometimes help it. I have many times written code something like:

var attributes = ParseAttributeList();
foreach(var attribute in attributes)
{
    if (attribute.ShortName == "Obsolete") ...

Now suppose I, maintaining this code, change ParseAttributeList to return a ReadOnlyCollection<AttributeSyntax> instead of List<AttributeSyntax>. With "var" I don't have to change anything else; all the code that used to work still works. Using implicitly typed variables helps make refactorings that do not change semantics succeed with minimal edits. (And if refactoring changes semantics, then you'll have to edit the code's consumers regardless of whether you used var or not.)
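Here is a small sketch of that refactoring. AttributeSyntax is reduced to a single property and ParseAttributeList returns canned data; both are placeholders for the example, not the real compiler types. The point is only that the body of the consumer is textually identical whether the method is declared to return List<AttributeSyntax> or ReadOnlyCollection<AttributeSyntax>.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Placeholder for the real syntax node type.
public class AttributeSyntax
{
    public string ShortName { get; set; }
}

public static class AttributeDemo
{
    // After the refactoring. Before it, the declared return type was
    // List<AttributeSyntax> and the method returned "parsed" directly.
    public static ReadOnlyCollection<AttributeSyntax> ParseAttributeList()
    {
        var parsed = new List<AttributeSyntax>
        {
            new AttributeSyntax { ShortName = "Serializable" },
            new AttributeSyntax { ShortName = "Obsolete" },
        };
        return parsed.AsReadOnly();
    }

    // The consumer needs no edits: "var" picks up whichever type the
    // method returns, and both types can be enumerated with foreach.
    public static bool HasObsolete()
    {
        var attributes = ParseAttributeList();
        foreach (var attribute in attributes)
        {
            if (attribute.ShortName == "Obsolete") return true;
        }
        return false;
    }
}
```

Had the consumer declared List<AttributeSyntax> attributes explicitly, the change of return type would have forced an edit there even though nothing about the loop's meaning changed.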

Sometimes critics of implicitly typed locals come up with elaborate scenarios in which it allegedly becomes difficult to understand and maintain code:

var square = new Shape();
var round = new Hole();
... hundreds of lines later ...
bool b = CanIPutThisPegInThisHole(square, round); // Works!

Which then later gets "refactored" to:

var square = new BandLeader("Lawrence Welk");
var round = new Ammunition();
... hundreds of lines later ...
bool b = CanIPutThisPegInThisHole(square, round); // Fails!

In practice these sorts of contrived situations do not arise often, and the problem is actually more due to the bad naming conventions and unlikely mixtures of business domains than the lack of explicit typing.

Summing up, my advice is:

  • Use var when you have to; when you are using anonymous types.
  • Use var when the type of the declaration is obvious from the initializer, especially if it is an object creation. This eliminates redundancy.
  • Consider using var if the code emphasizes the semantic "business purpose" of the variable and downplays the "mechanical" details of its storage.
  • Use explicit types if doing so is necessary for the code to be correctly understood and maintained.
  • Use descriptive variable names regardless of whether you use "var". Variable names should represent the semantics of the variable, not details of its storage; "decimalRate" is bad; "interestRate" is good.
  • Eric, I like your analysis, but believe you left out one case (at least explicitly) that has been important in actual projects I work on. That is when you want the code to BREAK in the event of a type change.

    Consider a control system. Many elements have On() and Off() methods. There are many cases where there is no relationship between the types (i.e. no common base classes or interfaces); there is only the similarity that both have methods with those signatures.

    Now I write code:

    var thing = SomeFactory.GetThing(); // Returns something that is safe to turn off...

    thing.Off();

    Then later a change is made to the Factory and that method now returns something completely different, which happens to have severe consequences if it is arbitrarily turned off [having such a design is debatable for many reasons - but they are outside the defined scope of your post].

    By using var, the previous code will compile without complaint, even though the return type may have changed from "ReadingLamp" to "LifeSupportSystem".

    I believe (based on my experiences as a "traveling consultant") that there are more times when there is the possibility of an "unintended side-effect" caused by a change in the type than there are times where the change in type has no bearing on the code that consumes it. As a result, I very rarely use var. Even when the return type is obvious (such as the LHS of a "new"), I find it easier to be consistent.

  • I'll deal with it if the team I'm on dictates it, but as for my personal preference, you'll have to pry explicitness from my cold dead hands. Within reason, of course, as I concede the point on anonymous typing, even enumerable query results in general, and (of course) sub-expressions.

    While I am rationally aware that the static typing is the same as always, it just reminds my irrational brain too much of weak typing in scripting languages and VB and makes me cringe.

  • @David - in your example I would have to imagine your factory is responsible for creating similar objects given the business domain. If your factory could return a "ReadingLamp" or "LifeSupportSystem", then I would assume your factory is called "ThingsThatTurnOffFactory" and might involve revisiting a design decision.

  • The fact that the type should be spelt out only if needed, seems more obvious to me when I consider that it is possible to use a cast as a type annotation. Compare:

    TheType theVariable = theInitializer;

    var theVariable = (TheType)theInitializer;

    With var, all declarations become nicely lined up, *and* optional explicit typing is still possible.

    I use var everywhere (btw C# also needs "val" from Scala).

  • Easily the most clear, succinct, and relevant analysis of the pros and cons of var I have ever read.

    @David V. Corbin,

    That is exactly the kind of contrived scenario that Eric was referring to as being due to bad design rather than the use of implicit typing. If you really have a method that may realistically change from returning a lamp to returning a life support system, you have MUCH bigger problems than whether or not to use var.

  • Despite being very used to explicitly typed locals prior to the introduction of var, implicitly typed locals are now the default for me.

    Type mechanisms are rarely so important to the business problem being solved that the type needs to be immediately visible in the code.

    Even when a type is important to understanding, most code is viewed in an editor and it is trivial to hover over var to see the tooltip containing the actual type of the variable.

    Even when viewing code on dead trees or in simple text editors, sufficient type information can usually be obtained from surrounding code. Parameters have explicit types in C# and most code depends on common libraries for which the reader has some prior knowledge of the types.

    For these reasons explicitly stating the type of a local in code is usually just noise as far as I'm concerned.

    Explicitly typed locals are still useful in a couple of situations though. When I use explicitly typed variables I think of them chiefly in one of two ways:

    1. As a compile time assertion that the result of an expression is of a particular type. The key consideration here is that I know I want to trigger a compile error when certain refactorings occur. This is typically useful when strongly typed code needs to interface with weakly typed code. Rather than pass an implicitly typed local (or an expression) to a function that takes an Object, I'll declare an explicitly typed local and pass that.

    (Another way of thinking about this is that I've conceptually created a strongly typed function that wraps the weakly typed function and then optimized away all the text that makes up the wrapper function except for the _parameters_ which become local variables in the calling function).

    2. As a way of constraining the dependencies on external code to a minimal set either for correctness of my current code or to preserve options for changing an external dependency without changing my code. Commonly in this situation, the explicit type is an interface.

    (Another way of thinking about this is that I've conceptually created a wrapper function around an external dependency where the _return type_ is a facade then I've optimized away both the wrapper function and the separate facade because the underlying object already implements the facade as a separate interface)

    In both cases, use of explicitly typed locals is related to managing dependencies that are external to the local function, refactoring and code maintenance are key to the decision process, and the use of explicitly typed locals can be viewed as optimizations of techniques that would otherwise require me to create separate functions.

  • Olivier, I would caution against using a cast in that way. If your initialization expression is not actually of the type you're casting to, you could end up with an exception at runtime instead of being notified by the compiler. Better to be explicit with the variable type (or if lining up the text is super important, emulate a C++-style static_cast with a generic function that just returns its argument).

  • I've encountered a situation where explicitly typing a variable caused problems. This was several years ago when Linq and Linq to Sql were new and not well understood by most developers. There was some code consuming a LinqToSql data source and the developer explicitly stated a type for one of his intermediate queries. The type he stated was IEnumerable<T>, when what would have been implied was something that implemented IQueryable<T> (and therefore IEnumerable<T> as well). Of course the problem wasn't detected until the project rolled into production and had a massive database sitting behind it. The result was that the entire table of data was shipped to the application and the query handled in application code. Had he only typed "var" instead of trying to be explicit, things would have been fine.

    This example clearly falls into the category of changing the semantic meaning of the code, but in a subtle enough way that the developer (and presumably some QA engineer) failed to detect it.

  • I know that not all development happens in Visual Studio, but most does. If you REALLY need to know the type of a variable, you can hover your mouse over it and VS will tell you what type it is. AutoComplete also readily shows what methods are available. I get the warm fuzzies knowing that C# is strongly typed and that prevents a whole class of errors but it is rare that needing to know the type of something is ever that important.

  • @David: I believe in those cases you should abstract the methods into an interface definition, so that you can state the type explicitly and still change the concrete implementation as long as the semantics are preserved.

    But overall I prefer using var to explicitly declaring types. If there is one more thing the compiler can do for me, and one less I have to care about, it's a nice feature.

  • Eclipse has a nice feature that highlights (and lets you refactor) code that uses types more derived than necessary. If all you do is enumerate over the return value of ParseAttributeList() then you should type it to IEnumerable (or its generic version). That way your code explicitly states that you are free to change the actual type to anything you want as long as it implements IEnumerable. Using var does not show that. In my opinion it makes the code more difficult to read because it is not clear about what the return type is expected to be. There's nothing to say that if you are going to change it you must implement IEnumerable, unless of course you scan *all* the code that uses the return value. In your example it's right there; in real-world code it may not be. It may be enumerated in a different method we're passing the return value to.

    That said, anonymous types, and maybe even things such as collection iterators, are a reasonable place to use var. But other than that I believe it leads to code that is easier to write but difficult to read.

  • I've more or less switched over to using "var" for almost everything, but I've found one unexpected situation where it's caused problems - and that's with the "dynamic" type. I've got a little helper library that implements json as "dynamic" so you can do "var json = Json.NewObject(); json.x = "y";" and have json.ToString() return {x: "y"}. The 'var' in this case evaluates to 'dynamic' - so far, so good.

    Then you have a method that takes one of these as a parameter - say, IThingy CreateThingyFromJson(dynamic json).

    Then you write code like this:

    var myThingy = CreateThingyFromJson(json);

    ... and 'var' evaluates to 'dynamic' rather than to 'IThingy', even when there's only one CreateThingyFromJson method in scope and it literally takes a 'dynamic' as its parameter. Oops. That was unexpected.

    I sort of want a way to declare that I'm using a 'dynamic' undynamically...

  • I've been using var almost exclusively since the betas.

    One thing that has not been mentioned, and which figures strongly into my motivations for using var, is its effect on design of code.  Var, by its nature, can act as a forcing function toward simplicity of the code.  That is:

    If your source cannot be reasonably understood using implicit typing, then you should consider rewriting it.

    Doing so helps keep me honest about maintainability.

    @Stuart:  That feels like a compiler bug to me, not an issue with implicit typing in itself.

  • Unfortunately a simple comment doesn't lend itself well to a proper response. You really need a full blog post to do so.

    That being said... I use var almost everywhere, (even in simple for loops!). About the only place I don't is where Jerry Pisk mentioned already. I know others who use var only in the case of linq. One interesting reason mentioned is refactoring code could potentially cause a situation where one doesn't know what the return type of a method call was when it breaks. My preference is what Eric and sukru mention. It is one more thing the compiler can do for me and when refactoring etc. as long as I don't change the semantics it is all good and should work. This is true progress!

    In my opinion, anyone who is dependent completely on explicit typing is either old school and resistant to change or creating methods with much too much complexity. Your methods should do as little as possible and do it well. I think this is where the real "argument" occurs. If you see a lot of explicit typing in your code then your code is quite possibly in need of refactoring.

  • "Contrived" apparently means "any situation which doesn't support me".

    I do think that "var" hurts maintaining and later changing of the code.  When I change the return type of a method, I want it to be obvious that I messed up an assignment:

    1:  var foo = someMethod();

    ...

    45: foo.methodNotThereOnNewType();

    I want it to fail on line 1, not line 45.  Especially if I didn't write the code the first time around.  I hate wasting time on things like that.  Not contrived at all.

    The useful and non-confusing scenario would be "var foo = new Something();" where it is obvious what the type is.  I don't need to look through some method in some other file to know it is a "Something", and it can't fail in a third location by someone changing a method return type in another module.
