Closing over the loop variable considered harmful

Closing over the loop variable considered harmful

Rate This

(This is part one of a two-part series on the loop-variable-closure problem. Part two is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The "for" loop will not be changed. We return you now to our original article.


I don't know why I haven't blogged about this one before; this is the single most common incorrect bug report we get. That is, someone thinks they have found a bug in the compiler, but in fact the compiler is correct and their code is wrong. That's a terrible situation for everyone; we very much wish to design a language which does not have "gotcha" features like this.

But I'm getting ahead of myself. What's the output of this fragment?

var values = new List<int>() { 100, 110, 120 };
var funcs = new List<Func<int>>();
foreach(var v in values)
  funcs.Add( ()=>v );
foreach(var f in funcs)
  Console.WriteLine(f());

Most people expect it to be 100 / 110 / 120.  It is in fact 120 / 120 / 120. Why?

Because ()=>v means "return the current value of variable v", not "return the value v was back when the delegate was created". Closures close over variables, not over values. And when the methods run, clearly the last value that was assigned to v was 120, so it still has that value.

This is very confusing. The correct way to write the code is:

foreach(var v in values)
{
  var v2 = v;
  funcs.Add( ()=>v2 );
}

Now what happens? Every time we re-start the loop body, we logically create a fresh new variable v2. Each closure is closed over a different v2, which is only assigned to once, so it always keeps the correct value.

Basically, the problem arises because we specify that the foreach loop is a syntactic sugar for

  {
    IEnumerator<int> e = ((IEnumerable<int>)values).GetEnumerator();
    try
    {
      int m; // OUTSIDE THE ACTUAL LOOP
      while(e.MoveNext())
      {
        m = (int)(int)e.Current;
        funcs.Add(()=>m);
      }
    }
    finally
    {
      if (e != null) ((IDisposable)e).Dispose();
    }
  }

If we specified that the expansion was

    try
    {
      while(e.MoveNext())
      {
        int m; // INSIDE
        m = (int)(int)e.Current;
        funcs.Add(()=>m);
      }

then the code would behave as expected.

It's compelling to consider fixing this for a hypothetical future version of C#, and I'd like to hear your feedback on whether we should do so or not. The reasons FOR making the change are clear; this is a big confusing "gotcha" that real people constantly run into, and LINQ, unfortunately, only makes it worse, because it is likely to increase the number of times a customer is going to use a closure in a loop. Also, it seems reasonable that the user of the foreach loop might think of there being a "fresh" loop variable every time, not just a fresh value in the same old variable. Since the foreach loop variable is not mutable by user code, this reinforces the idea that it is a succession of values, one per loop iteration, and not "really" the same variable over and over again. And finally, the change has no effect whatsoever on non-closure semantics. (In fact, in C# 1 the spec was not clear about whether the loop variable went inside or outside, since in a world without closures, it makes no difference.)

But that said, there are some very good reasons for not making this change.

The first reason is that obviously this would be a breaking change, and we hates them, my precious. Any developers who depend on this feature, who require the closed-over variable to contain the last value of the loop variable, would be broken. I can only hope that the number of such people is vanishingly small; this is a strange thing to depend on. Most of the time, people do not expect or depend on this behaviour.

Second, it makes the foreach syntax lexically inconsistent. Consider foreach(int x in M()) The header of the loop has two parts, a declaration int x and a collection expression, M(). The int x is to the left of the M(). Clearly the M() is not inside the body of the loop; that thing only executes once, before the loop starts. So why should something to the collection expression's left be inside the loop? This seems inconsistent with our general rule that stuff to the left logically "happens before" stuff to the right. The declaration is lexically NOT in the body of the loop, so why should we treat it as though it were?

Third, it would make the "foreach" semantics inconsistent with "for" semantics. We have this same problem in "for" blocks, but "for" blocks are much looser about what "the loop variable" is; there can be more than one variable declared in the for loop header, it can be incremented in odd ways, and it seems implausible that people would consider each iteration of the "for" loop to contain a fresh crop of variables. When you say for(int i; i < 10; i += 1) it seems dead obvious that the "i += 1" means "increment the loop variable" and that there is one loop variable for the whole loop, not a new fresh variable "i" every time through! We certainly would not make this proposed change apply to "for" loops.

And fourth, though this is a nasty gotcha, there is an easy workaround, and tools like ReSharper detect this pattern and suggest how to fix it. We could take a page from that playbook and simply issue a compiler warning on this pattern. (Though adding new warnings brings up a whole raft of issues of its own, which I might get into in another post.) Though this is vexing, it really doesn't bite that many people that hard, and it's not a big deal to fix, so why go to the trouble and expense of taking a breaking change for something with an easy fix?

Design is, of course, the art of compromise in the face of many competing principles. "Eliminate gotchas" in this case directly opposes other principles like "no breaking changes", and "be consistent with other language features". Any thoughts you have on pros or cons of us taking this breaking change in a hypothetical future version of C# would be greatly appreciated.

(This is part one of a two-part series on the loop-variable-closure problem. Part two is here.)

 

  • I agree with Mike Van Til and the others stating the same thing.  Once you read it and think about what is going on, it makes perfect sense.  I would really prefer not to change it as that would break previous code, and make inconsistencies in how things function.  

  • I love the fact that you're looking at pain points and attempting to eliminate them.  I have found this gotcha in many other languages and usually people give the standard answer, "That's the way it is."  I think new language constructs can deal with this rather well.  For instance, I think the ForEach method is much clearer and does what the user expects.  For instance:

    values.ForEach(v => funcs.add( () => v));

    funcs.ForEach(f => Console.WriteLine(f()));

  • "...issue a warning in C# 4, and then change the behaviour for C# 5. That way you'd have to be in a situation where two people are compiling the same code, one in VS2012 and one in VS2008, to see a silent change in behaviour."

    Outside the corporate world you might upgrade your toolchain as every new release comes out, but inside it things are far more glacial. In the C++ world there are plenty of systems still based around Visual C++ 6.0 (1998!), and it's not uncommon for teams to jump two or 3 versions when push finally comes to shove (just as many businesses leapt from Windows 95 to Windows XP will leap to Windows 7 bypassing Vista).

    Hopefully C# based development hasn't plummeted to these depths just yet, but as it matures this will start to become an issue. C#3 is already providing plenty of bang-for-your-buck though, so C#4 and C#5 have a considerable amount work to do to keep the upgrade momentum going. Fortunately C#4 is looking enticing enough to do just that :-)

  • This is a weak point, but I distinctly remember my thinking before I web searched to understand why this was acting the way it does.

    IMO, the debugger helps people believe (like me when I ran into this the first time) that there variable inside the foreach loop is new each time because it highlights the type on each iteration.  In

    foreach(var value in values)  { doSomething(value) }

    it highlights the "in" first, and the "var value" next helping me think that there was a new value variable declared on each iteration.  Had it just highlighted "value in values" instead I might have realized there was only one variable being assigned..

  • How about looking at a more general issue - closing over any mutable value.

    Although the foreach mistake is easier to make - even for developers with good knowledge of the workings of closures, I and others still easily make the mistake when closing over, say, the mutable member variable of a class instance.

    Detecting whether or not a variable referenced in a closure is mutable would mean providing a way of declaring any variable as immutable, but I think that would be a nice language feature anyway.

    We are considering language features that let you enforce constraints on regions where mutations can be legal/visible; being able to make specific variables more immutable would be a part of that. But these are very long-term research-y ideas. -- Eric

    However, if I do close over a mutable variable, I'm not sure what should happen. Should it be illegal to close over a mutable variable? (you could always close over an immutable container for a mutable value, much like F#'s ref type - that would probably prove you knew what you were doing.)

    However, that might be way too harsh, and would break a hell of a lot of code. Maybe force a syntax that acknowledges you are closing over a mutable variable? Instead of => use ==> or something? That still doesn't feel correct to me.

    Any opinions?

  • My vote: make the change.

    Which change: the concept of foreach from <changing a read-only variable> to <multiple variables with the same name>

    I seriously doubt there is any (working) code that depends on lambdas binding to a foreach variable. The only possible usage I can think of is to (repeatedly) bind to the variable and then have some break condition.

    Initially, I was opposed to the change ("closures close over variables, not values" is just one of the learning points of C#, and every language has them no matter how well-designed). However, I have to agree with the "each" wording argument. It just seems more natural to work the other way.

  • Make the compiler spit out a warning.  I see this way too often during code reviews.  It's too easy to miss.  Then anyone can compile with warnings-as-errors and easily catch them... though I like the idea of MS buying Jetbrains too. :)  ReSharper is a great product!

  • This more like a programmer issue than a compiler issue. I would rather vote for the warning from a compiler than changing it. Good experienced programmer would catch it, but again the warnings should be enough for someone to pay attention to and fix it. The only danger is that, most ignore the warnings and expect the result as it seems from the code.

    Still i think compiler warnings is the way to go for now. Please don't change the way it works.

    Thanks,

  • Don't change this. Javascript handles its closures the exact same way, and that is the proper way.

    Again, just to be clear, I am not proposing that we change how closures are handled. Closures close over variables, great. What I'm proposing is that we change the loop variable of the foreach loop to go logically insidethe body of the loop. -- Eric

  • After reading even more great comments, my opinion has changed yet again.  I definitely don't like the idea of a warning message.  After Frank Bakker's comment, I realized there would be too many valid examples of "closure over a foreach loop variable" where the closure never leaves the foreach loop.  As some others have suggested I think new C# context keywords are the best answer. I suggest 3 versions of foreach:

    foreach (var item in collection) -- error on closure of item (this is now deprecated old-style)

    foreach (var item inside collection) -- item is inside body and is instantiated once per loop (closure allowed)

    foreach (var item outside collection) -- item is outside body and is instantiated once (closure allowed)

    There would need to be an easy way to convert older "in" versions of foreach to a new (inside/outside) version with the default being inside.  However the older "in" version of foreach would still be accepted except when using closure.

  • I find it interesting that some people claim that "good understanding of lambdas and closures, this code does exactly what you would expect it to do." The folks who wrote the Common Lisp standard, who presumably understood both those things, still saw fit to include in the documentation of DOTIMES and related looping constructs: "It is implementation-dependent whether dotimes establishes a new binding of var on each iteration or whether it establishes a binding for var once at the beginning and then assigns it on any subsequent iterations." Which is of course the question you face. They chose not to choose because they wanted to leave implementations some latitude in how they compiled these constructs. But if you expect people to frequently close over the loop variable you might choose to choose: say whether it's one binding or one per iteration. But it doesn't seem to me that the choice is a logical consequence of the nature of closures.

  • Make it an error in C# 4.0 (and provide an explanation how to achieve _both_ behaviors)

    Make it behave as expected in C# 5.0

    Silently going from one behavior to another is bad. Doing it this way gives a hope that there wouldn't be any single person in existence who not only depends on weird behavior, but also  would manage to jump from exactly C# 3.0 over 4.0 to 5.0 or further.

    By the way, "int item; foreach (item in data) ..." is an error already. So much for `for` compatibility.

  • +1 for warning message (which could always be turned off with #pragma warning of course). it is clearly a problem but a breaking change is a more of a problem.

  • >No, you're right. Usually its a failure to understand that lambdas close over variables, not values. But who cares

    I would say that I care because you're proposing to fix a symptom instead of a problem which leaves you with a sort of patchwork of fixes that aren't worth the 100 points to get you back into positive territory and certainly not enough points to cancel out whatever the penalty is for introducing a breaking change.  There's also a "give a mouse a cookie" argument in here somewhere too.  Once you fix this, then something else will be the thing that causes confusion with lambdas and you'll need to fix THAT too because now things are even more inconsistant than they were when you started.

    That said, I cannot think of a scenario where you would rely on the "old" behavior so my backwards compatibly demagoguery may not apply.

  • Some food for thought:

    Imports System.Collections.Generic

    Module Program

       Sub Main

           Dim fs As New List(Of Func(Of Integer))

           For Each x In Enumerable.Range(0, 10)

               fs.Add(Function() x)

           Next

           For Each f In fs

               Console.WriteLine(f())

           Next

       End Sub

    End Module

    C:\Temp\test.vb(8) : warning BC42324: Using the iteration variable in a lambda expression may have unexpected results.  Instead, create a local variable within the loop and assign it the value of the iteration variable.

               fs.Add(Function() x)

Page 5 of 9 (134 items) «34567»