Uses and misuses of implicit typing

One of the most controversial features we've ever added was implicitly typed local variables, aka "var". Even now, years later, I still see articles debating the pros and cons of the feature. I'm often asked what my opinion is, so here you go.

Let's first establish what the purpose of code is in the first place. For this article, the purpose of code is to create value by solving a business problem.

Now, sure, that's not the purpose of all code. The purpose of the assembler I wrote for my CS 242 assignment all those years ago was not to solve any business problem; rather, its purpose was to teach me how assemblers work; it was pedagogic code. The purpose of the code I write to solve Project Euler problems is not to solve any business problem; it's for my own enjoyment. I'm sure there are people who write code just for the aesthetic experience, as an art form. There are lots of reasons to write code, but for the sake of this article I'm going to make the reasonable assumption that people who want to know whether they should use "var" or not are asking in their capacity as professional programmers working on complex business problems on large teams.

Note that by "business" problems I don't necessarily mean accounting problems; if analyzing the human genome in exchange for National Science Foundation grant money is your business, then writing software to recognize strings in a sequence is solving a business problem. If making a fun video game, giving it away for free and selling ads around it is your business, then making the aliens blow up convincingly is solving a business problem. And so on. I'm not putting any limits here on what sort of software solves business problems, or what the business model is.

Second, let's establish what decision we're talking about here. The decision we are talking about is whether it is better to write a local variable declaration as:

TheType theVariable = theInitializer;

or

var theVariable = theInitializer;

where "TheType" is the compile-time type of theInitializer. That is, I am interested in the question of whether to use "var" in scenarios where doing so does not introduce a semantic change. I am explicitly not interested in the question of whether

IFoo myFoo = new FooStruct();

is better or worse than

var myFoo = new FooStruct();

because those two statements do different things, so it is not a fair comparison. Similarly I am not interested in discussing bizarre and unlikely corner cases like "what if there is a struct named var in scope?" and so on.
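
To see concretely why they do different things, here is a minimal sketch; IFoo and FooStruct are hypothetical stand-ins for the types in the snippets above. Because FooStruct is a value type, declaring the variable with the interface type boxes the struct, and that difference is observable:

interface IFoo { int Value { get; } void Increment(); }

struct FooStruct : IFoo
{
    public int Value { get; private set; }
    public void Increment() { Value += 1; }
}

// ... inside some method:
IFoo boxedFoo = new FooStruct();  // the struct is boxed; boxedFoo refers to the box
var valueFoo = new FooStruct();   // valueFoo has compile-time type FooStruct; no box

IFoo alias = boxedFoo;            // copies a reference; both variables share one box
var copy = valueFoo;              // copies the struct; copy is an independent value

boxedFoo.Increment();             // mutates the shared boxed instance
valueFoo.Increment();             // mutates the local value directly

Console.WriteLine(alias.Value);   // 1 -- the mutation is visible through the shared box
Console.WriteLine(copy.Value);    // 0 -- the copy was taken before the mutation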

In this same vein, I'm interested in discussing the pros and cons when there is a choice. If you have already decided to use anonymous types then the choice of whether to use implicit typing has already been made:

var query = from c in customers select new {c.Name, c.Age};

The question of whether it is better to use nominal or anonymous types is a separate discussion; if you've decided that anonymous types are worthwhile then you are almost certainly going to be using "var" because there is no good alternative.

Given that the overarching purpose is assumed to be solving business problems, what makes good code? Obviously that's a huge topic but three relevant factors come to mind. Good code:

  • works correctly according to its specification to actually solve the stated problem
  • communicates its meaning to the reader who needs to understand its operation
  • allows for relatively low-cost modification to solve new problems as the business environment changes.

In evaluating whether or not to use "var" we can dismiss the first concern; I'm only interested in pros and cons of cases where using var does not change the meaning of a program, only its textual representation. If we change the text without changing its semantics then by definition we have not changed its correctness. Similarly, using or not using "var" does not change other observable characteristics of the program, such as its performance. The question of whether or not to use "var" hinges upon its effect on the human readers and maintainers of the code, not on its effect upon the compiled artefact.

What then is the effect of this abstraction on the reader of the code?

All code is of course an abstraction; that's the whole reason why we have high-level languages rather than getting out our voltmeters and programming at the circuit level. Code abstractions necessarily emphasize some aspects of the solution while "abstracting away" other aspects. A good abstraction hides what is irrelevant and makes salient what is important. You might know that on x86 chips C# code will typically put the value returned by a method in EAX and typically put the "this" reference in ECX, but you don't need to know any of that to write C# programs; that fact has been abstracted away completely.

It is clearly not the case that more information in the code is always better. Consider that query I referred to earlier. Which is easier to understand: that query, or the equivalent written with a nominal type and no query comprehension:

IEnumerable<NameAndAge> query = Enumerable.Select<Customer, NameAndAge>(customers, NameAndAgeExtractor);

along with the implementations of the NameAndAge class, and the NameAndAgeExtractor method? Clearly the query syntax is much more abstract and hides a lot of irrelevant or redundant information, while emphasizing what we wish to be the salient details: that we are creating a query which selects the name and age of a table of customers. The query emphasizes the business purpose of the code; the expansion of the query emphasizes the mechanisms used to implement that purpose.
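
For a sense of scale, here is a sketch of the supporting pieces that expansion requires; the class shape and the extractor signature are assumptions based on the query above:

class NameAndAge
{
    public string Name { get; set; }
    public int Age { get; set; }
}

static NameAndAge NameAndAgeExtractor(Customer c)
{
    return new NameAndAge { Name = c.Name, Age = c.Age };
}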

The question then of whether "var" makes code better or worse for the reader comes down to two linked questions:

1) is ensuring salience of the variable's type important to the understanding of the code? and,
2) if yes, is stating the type in the declaration necessary to ensure salience?

Let's consider the first question first. Under what circumstances is it necessary for a variable's type to be clearly understood when reading the code? Only when the mechanism of the code -- the "how it works" -- is more important to the reader than the semantics -- the "what it's for".

In a high-level language used to solve business problems, I like the mechanisms to be abstracted away and the salient features of the code to be the business domain logic. That's not always the case of course; sometimes you really do care that this thing is a uint, it has got to be a uint, we are taking advantage of the fact that it is a uint, and if we turned it into a ulong or a short, or whatever, then the mechanism would break.

For example, suppose you did something like this:

var distributionLists = MyEmailStore.Contacts(ContactKind.DistributionList);

Suppose the elided type is DataTable. Is it important to the reader to know that this is a DataTable? That's the key question. Maybe it is. Maybe the correctness and understandability of the rest of the method depends completely on the reader understanding that distributionLists is a DataTable, and not a List<Contact> or an IQueryable<Contact> or something else.

But hopefully it is not. Hopefully the rest of the method is perfectly understandable with only the semantic understanding: that distributionLists represents a collection of distribution lists fetched from a store of email contacts.

Now, to the crowd who says that of course it is better to always know the type, because knowing the type is important for the reader, I would ask a pointed question. Consider this code:

decimal rate = 0.0525m;
decimal principal = 200000.00m;
decimal annualFees = 100.00m;
decimal closingCosts = 1000.00m;
decimal firstPayment = principal * (rate / 12) + annualFees / 12 + closingCosts;

Let's suppose that you believe that it is important for all the types to be stated so that the code is more understandable. Why then is it not important for the types of all those subexpressions to be stated? There are at least four subexpressions in that last statement where the types are not stated. If it is important for the reader to know that 'rate' is of type decimal, then why is it not also important for them to know that (rate / 12) is of type decimal, and not, say, int or double?

The simple fact is that the compiler does huge amounts of type analysis on your behalf already, types which never appear in the source code, because for the most part those types would be distracting noise rather than helpful information. Sometimes the declared type of a variable is distracting noise too.
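
To make that point concrete, here is roughly what the last line of the example above would look like if every inferred subexpression type had to be stated with a redundant cast (an illustration, not a recommendation):

decimal firstPayment =
    (decimal)((decimal)principal * (decimal)((decimal)rate / (decimal)12))
    + (decimal)((decimal)annualFees / (decimal)12)
    + (decimal)closingCosts;

The program is exactly the same; the extra annotations are pure noise.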

Now consider the second question. Suppose for the sake of argument it is necessary for the reader to understand the storage type. Is it necessary to state it? Often it is not:

var prices = new Dictionary<string, List<decimal>>();

It might be necessary for the reader to understand that prices is a dictionary mapping strings to lists of decimals, but that does not mean that you have to say

Dictionary<string, List<decimal>> prices = new Dictionary<string, List<decimal>>();

Clearly use of "var" does not preclude that understanding.

So far I've been talking about reading code. What about maintaining code? Again, var can sometimes hurt maintainability and sometimes help it. I have many times written code something like:

var attributes = ParseAttributeList();
foreach(var attribute in attributes)
{
    if (attribute.ShortName == "Obsolete") ...

Now suppose I, maintaining this code, change ParseAttributeList to return a ReadOnlyCollection<AttributeSyntax> instead of List<AttributeSyntax>. With "var" I don't have to change anything else; all the code that used to work still works. Using implicitly typed variables helps make refactorings that do not change semantics succeed with minimal edits. (And if refactoring changes semantics, then you'll have to edit the code's consumers regardless of whether you used var or not.)
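
For contrast, here is a sketch of the explicitly typed caller, assuming ParseAttributeList originally returned List<AttributeSyntax>:

List<AttributeSyntax> attributes = ParseAttributeList();

Once the return type changes to ReadOnlyCollection<AttributeSyntax> this declaration no longer compiles, because there is no implicit conversion between the two collection types, even though nothing the caller actually does with the attributes has changed; the "var" version keeps compiling as-is.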

Sometimes critics of implicitly typed locals come up with elaborate scenarios in which it allegedly becomes difficult to understand and maintain code:

var square = new Shape();
var round = new Hole();
... hundreds of lines later ...
bool b = CanIPutThisPegInThisHole(square, round); // Works!

Which then later gets "refactored" to:

var square = new BandLeader("Lawrence Welk");
var round = new Ammunition();
... hundreds of lines later ...
bool b = CanIPutThisPegInThisHole(square, round); // Fails!

In practice these sorts of contrived situations do not arise often, and the problem is actually due more to the bad naming conventions and unlikely mixtures of business domains than to the lack of explicit typing.

Summing up, my advice is:

  • Use var when you have to; when you are using anonymous types.
  • Use var when the type of the declaration is obvious from the initializer, especially if it is an object creation. This eliminates redundancy.
  • Consider using var if the code emphasizes the semantic "business purpose" of the variable and downplays the "mechanical" details of its storage.
  • Use explicit types if doing so is necessary for the code to be correctly understood and maintained.
  • Use descriptive variable names regardless of whether you use "var". Variable names should represent the semantics of the variable, not details of its storage; "decimalRate" is bad; "interestRate" is good.

  • @Alex Stockton, the compiler will error out on a cast that is determined at compile-time to be impossible. Just as you cannot "Car a = new Elephant()", you cannot "var a = (Car)new Elephant()".

  • @Ben. Right, but your examples don't negate the point that a cast can easily change something that you want to be a compile time failure into a run time failure. C# casts should not be used for documentation purposes nor to assert things that you want to ensure at compile time.

    The following are not equivalent:

    1. var x = (Car)CreateTaxi();

    2. Car x = CreateTaxi();

    3. var x = CreateTaxi();

    Suppose the return type of CreateTaxi was originally Car, and then CreateTaxi gets updated so that the return type becomes Vehicle. A cast from Vehicle to Car is possible, so 1 will still compile, and will result in a runtime cast that can fail.

    Suppose the object returned from the new version of CreateTaxi is actually of type Boat. In this situation 1 will fail at run time, 2 will fail at compile time, and 3 will not fail at all.

    Obviously you have to decide for yourself what behavior you want now and how you want your code to behave when the code around it changes, but compile time failures are preferable to run time failures, so don't cast unless you have to (and you don't have to if what you're trying to do is either fix or document the compile time type of an expression).
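
    A compact sketch of that scenario, using the hypothetical Vehicle, Car, and Boat types (Car derives from Vehicle; Boat does not derive from Car):

    // CreateTaxi originally returned Car; it is later changed to return Vehicle,
    // and the object it actually returns becomes a Boat.
    var x1 = (Car)CreateTaxi();  // 1: still compiles; the cast now fails at run time
    Car x2 = CreateTaxi();       // 2: no longer compiles; no implicit Vehicle-to-Car conversion
    var x3 = CreateTaxi();       // 3: compiles and runs; x3 is simply a Vehicle now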

  • My God, it's like the VB6 "Variant" datatype never happened.

    The problem we VB programmers had back in the day was Evil Type Coercion(tm). Even if there is no stated type, the programmer will assume a particular type when reading the code. When the compiler picks a different type than the one you assume, hilarity ensues.

    And by hilarity, I mean "intermittent, untraceable, data corrupting bugs".

  • The var keyword is nothing like the Variant datatype.  There is no type coercion.  The compiler correctly determines the static type of the variable.

  • Fewer dependencies mean code that is easier to maintain, and without var you're dependent on a type name.

    I try to eliminate dependencies from my code as much as possible. Yes, there is a disadvantage sometimes, but the advantage greatly outweighs it.

    Yes birds can hit wind turbines, but regular power plants produce too much pollution.

    Your choice :)

  • Having the name of the type isn't really a dependency; there are refactoring tools if the name changes, and if the return type of the function could change you should probably be using an interface anyway.

    I would also have to argue against var making code more maintainable, at least on a project the size I am on.  It isn't so much that it makes it "harder" to maintain per se, more just frustrating. Sure you can read tooltips to figure out what they are, but trying to keep them straight is annoying and sort of a momentum killer (at least to me)...plus there's nothing like getting really focused on debugging a difficult bug and then having to stop and switch your brain into "what is this variable?  All I can tell from the code is it has a property named Value that is an object or numeric data type."

    Actually, after I joined, I had to debug some year-plus-old code by someone no longer on the team that was riddled with vars nested inside foreach loops of vars, and everything was the return value of something else. That led me to push for a ban on the use of var on our project (which we eventually did); a lot of times it doesn't hurt, but it is just abused too much in my experience.

  • Sean, it sounds more like your code was littered with poorly named variables, i.e. 'value'. I've worked on very large codebases where var is the norm, and can't remember ever having to wonder what the type of a particular variable is, because variables have descriptive names. It's also aided by the fact that 99.999% of the time I don't *care* what the type is, debugging or not. I mean...why would you?

  • Lovely . . . more analysis, discussion, and debate about using the "var" keyword.  My solution is simple:  K.I.S.S.  

    I don't see how the KISS principle applies. Is it simpler to state the type redundantly, or simpler to let the compiler infer the type? It seems to me to be a judgment call which is simpler. -- Eric

    It's preferable to state something explicitly that needn't be stated than to fail to state something explicitly that should be.  So . . . state types explicitly unless var is a syntactic necessity (such as when using anonymous types), and get back to solving the *real* business problem.

    I suspect that you apply your principle inconsistently. In my mortgage example, would you insert casts indicating that the subexpressions were of type decimal? You say that it is "preferable" to state something explicitly, but the types of all those subexpressions are inferred silently, not stated. When, in your opinion, does your principle apply? For example, do you always state the type when calling a generic method, or do you sometimes let the compiler infer the type? How do you decide? Do you always insert redundant casts on method arguments indicating their types to the reader, or do you let the compiler work those out for you? How do you decide? I think your decision-making process is considerably more complex than simply choosing more explicitness every time. I think you probably have some "metric" for determining when explicitness adds value and when it impedes clarity. What is that metric? -- Eric
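
    As a small illustration of the generic-method point (assuming customers is an IEnumerable<Customer>, as in the query earlier in the post), these two calls are equivalent; in the second the compiler infers the type arguments:

    // Type arguments stated explicitly:
    IEnumerable<string> explicitNames = customers.Select<Customer, string>(c => c.Name);

    // Type arguments inferred by the compiler; this is exactly the same call:
    IEnumerable<string> inferredNames = customers.Select(c => c.Name);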

  • Hi Eric,

    With regard to the use of the "var" keyword, I was suggesting that the KISS principle be applied in the following manner:

    1.  There are times when explicit typing is preferable to implicit typing.

    2.  Determining the situations in which implicit typing (with var) is appropriate has proved to be a very contentious issue in the developer community.

    3.  The risks associated with always using explicit typing (when possible) are minimal, whereas the risks associated with always using implicit typing are more significant (my opinion).

    4.  So . . . for the sake of simplicity, always use explicit typing.  Some may call this a cop out, but I would call it being practical.  Given the current rate of technology change and the amount of information developers must absorb, I welcome simplicity wherever possible . . . sometimes at a cost.  I'd prefer to not have to work my way through a checklist every time I declare a variable unless absolutely necessary.

    You are correct.  I suppose I apply this rule inconsistently.  I was thinking primarily of the use of "var" in scenarios such as  "var customer = new Customer();" (these scenarios seem to have received the most attention in many forums).  Looking at things from a broader perspective, I would certainly not want to declare types explicitly in all scenarios.  : )

    In any case, thanks for taking the time to reply.

    Chris

  • I'll second Alex Stockton's notion of using explicitly typed locals as assertions. I like being able to use var where the type of the expression is easily inferred (by humans and the compiler) from the RHS of the expression, as is the case with

    var theList = new List<int>();

    and I love var when I have to do something like

    var map = new Dictionary<int, List<string>>();

    However, if the type of the local depends on the return value of a method, I prefer to explicitly type it so that if the return type of the method changes in an incompatible way, the calling code breaks as well. This prevents the class of errors which David Corbin mentioned earlier. Basically, the farther apart the type information for the LHS is from the RHS for the human reader, the more likely I am to explicitly type the variable.

  • There is also a scenario I've run into that would not have happened if the var keyword was used in foreach loops.  Regardless of naming issues and the wisdom of such a design, I've actually seen this type of code in the real world.  

    Say you have the following:  ITest2 inherits from ITest, but Test2 does not inherit from Test.

    IEnumerable<ITest2> items = new ITest2[] { new Test2(), new Test2(), new Test2() };

    foreach (ITest2 item in items)
    {
        Console.WriteLine(item.GetValue());
    }

    Then you change the enumerable to: IEnumerable<ITest> items = new ITest[] { new Test(), new Test(), new Test() };

    The code will compile, but it will fail at runtime with a cast exception.
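
    The explicit loop still compiles after that change because the foreach statement silently inserts a cast from the element type to the declared loop-variable type; that hidden cast is what throws. For comparison, here is a sketch of the implicitly typed loop, assuming GetValue is declared on ITest:

    foreach (var item in items)   // item is now inferred as ITest
    {
        Console.WriteLine(item.GetValue());   // no hidden ITest-to-ITest2 cast, so no runtime failure
    }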

  • @Alex Stockton, your point is clearer now, but the example is not realistic. Why would I write "var car = (Car)CreateTaxi()" unless CreateTaxi() already returned a type that required the cast? In this case, "var car" is no worse than "Car car", since the compile-time error check (on the cast) is the same.

    I agree that casts "should not be used for documentation purposes", but I think another example is required to illustrate this ideal where var is concerned. (I leave it to you to provide one.)

    FWIW, I use var nearly everywhere. As for K.I.S.S., I find the principle applies: var visually and lexically simplifies my code, which translates into faster, easier reading.

  • Considering the way Haskell programmers complain about how often they have to write types in F# programs, I'd have to say that naming types almost seems like a micro-optimization. Why worry about types except in the few cases where it matters?

  • Great, now C# has the equivalent of BASIC's Dim.

    Next up is GOTO: continuations!

  • @Dim=var

    No it doesn't

    Dim doesn't have any equivalent in C#.  What C# has (with var) is the equivalent of Option Infer (actually, I think C# got var first).

    While we're talking about VB, most people seem to say that var is good in situations like

    var o = new SomeObject();

    instead of having to repeat the type.  This is something that VB already has by being able to say

    Dim o As New SomeObject

    I think removing the extra typing in both cases is good.  I personally would probably prefer a specific syntax for this circumstance (as in VB) but that comes from my general dislike of var.

    My feelings towards var (and Option Infer) are that they are removing some of the benefit I get from a strongly typed language, hence I prefer to be explicit about types.
