Closing over the loop variable, part two

Closing over the loop variable, part two

Rate This
  • Comments 51

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The "for" loop will not be changed. We return you now to our original article.


Thanks to everyone who left thoughtful and insightful comments on last week's post.

More countries really ought to implement Instant Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form "I'd prefer to make the change, but if you can't do that then make it a warning". Or "don't make the change, do make it a warning", and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

Commenters who expressed support for a warning: 26
Commenters who expressed the sentiment "it's better to not make the change": 24
Commenters who expressed the sentiment "it's better to make the change": 25

Wow. I guess we'll flip a coin. :-)    (*)

Four people suggested to actually make it an error to do this. That's a pretty big breaking change, particularly since we would be breaking not just "already broken" code, but plenty of code that works perfectly well today -- see below. That's not likely to happen.

People also left a number of interesting suggestions. I thought I'd discuss some of those a little bit.

First off, I want to emphasize that what we're attempting to address here is the problem that the language encourages people to write code that has different semantics than they think it has. The problem is NOT that the language has no way to express the desired semantics; clearly it does. Just introduce a new variable explicitly into the loop.

A number of suggestions were for ways that the language could more elegantly express that notion. Some of the suggestions:

foreach(var x in c) inner
foreachnew(var x in c)
foreach(new var x in c)
foreach(var x from c)
foreach(var x inside c)

Though we could do any of those, none of them by themselves solve the problem at hand. Today, you have to know to use a particular pattern with foreach to get the semantics you want: declare a variable inside the loop. With one of these changes, you still have to know to use a particular keyword to get the semantics you want, and it is still easy to accidentally do the wrong thing.

Furthermore, a change so small and so targetted at such a narrow scenario probably does not provide enough benefit to justify the large cost of creating a new syntax, particularly one which is still easily confused with an existing syntax.

C++ luminary Herb Sutter happened to be in town and was kind enough to stop by my office to describe to me how they are solving a related problem in C++. Apparently the next version of the C++ standard will include lambdas, and they're doing this:

[q, &r] (int x) -> int { return M(x, q, r); }

This means that the lambda captures outer variable q by value, captures r by reference, takes an int and returns an int. Whether the lambda captures values or references is controllable! An interesting approach but one that doesn't immediately solve our problem here; we cannot make lambdas capture by value by default without a huge breaking change. Capturing by value would have to require new syntax, and then we're in the same boat again: the user has to know to use the new syntax when in a foreach loop.

A number of people also asked what the down sides of adding a warning are. The down side is that a warning which warns about correct behaviour is a very bad warning; it makes people change working code, and frequently they break working code in order to eliminate a warning that shouldn't have been present in the first place. Consider:

foreach(var insect in insects)
{
  var query = frogs.Where(frog=>frog.Eats(insect));
  Console.WriteLine("{0} is eaten by {1} frogs.", insect, query.Count());
}

This makes a lambda closed over insect; the lambda never escapes the loop, so there's no problem here. But the compiler doesn't know that. The compiler sees that the lambda is being passed to a method called Where, and Where is allowed to do anything with that delegate, including storing it away to be called later. Which is exactly what Where does! Where stores away the lambda into a monad that represents the execution of the query. The fact that the query object doesn't survive the loop is what keeps this safe. But how is the compiler supposed to suss out that tortuous chain of reasoning? We'd have to give a warning for this case, even though it is perfectly safe.

It gets worse. A lot of people are required by their organizations to compile with "warnings are errors" turned on. Therefore, any time we introduce a new warning for a pattern that is often actually safe and frequently used, we are effectively causing an enormous breaking change. A vaccine which kills more healthy people than the disease would have is probably not a good bet. (**)

This is not to say that a warning is a bad idea, but that it is not the obvious slam dunk good idea that it initially appears to be.

A number of people suggested that the problem was in the training of the developers, not in the design of the language. I disagree. Obviously modern languages are complex tools that require training to use, but we are working hard to make a language where people's natural intuitions about how things work lead them to write correct code. I have myself made this error a number of times, usually in the form of writing code like the code above, and then refactoring it in such a manner that suddenly some part of it escapes the loop and the bug is introduced. It is very easy to make this mistake, even for experienced developers who thoroughly understand closure semantics. That's a flaw in the design of the language.

And finally, a number of people made suggestions of the form "make it a warning in C# 4, and an error in C# 5", or some such thing. FYI, C# 4 is DONE. We are only making a few last-minute "user is electrocuted"-grade bug fixes, mostly based on your excellent feedback from the betas. (If you have bug reports from the beta, please keep sending them, but odds are good they won't get fixed for the initial release.) We are certainly not capable of introducing any sort of major design change or new feature at this point. And we try to not introduce semantic changes or new features in service packs. We're going to have to live with this problem for at least another cycle, unfortunately.

********

(*) Mr. Smiley Face indicates that Eric is indulging in humourous japery.

(**) I wish to emphasize that I am 100% in favour of vaccinations for deadly infectious diseases, even vaccines that are potentially dangerous. The number of people made ill or killed by the smallpox vaccine was tiny compared to the number of people who did not contract this deadly, contagious (and now effectively extinct) disease as a result of mass vaccination. I am a strong supporter of vaccine research. I'm just making an analogy here people.

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)

 

  • "The fact that the query object doesn't survive the loop is what keeps this safe. But how is the compiler supposed to suss out that tortuous chain of reasoning?"

    Damn... I've been thinking about suggesting that very enhancement to the Resharper team!

    Using a mixture of Linq and foreach means that I constantly get that warning in situations where it is pure noise. Sometimes I needlessly copy the value to another variable. Sometimes I put in the special comments to make Resharper shut the hell up. But sometimes Resharper is usefully telling me that I messed up, so I can't disable it entirely.

    And sometimes you can rewrite the loop as a query. For example, in my trivial frogs example, that could have been something like "var q = from insect in insects let count = (from frog in frogs where frog.Eats(insect)).Count() select new {insect, count}", and then you iterate over q to get the results. -- Eric

    Is it theoretically possible for the compiler (or a tool like Resharper) to figure this out? Obviously they would have to dig into the implementation of Where to find out if it stores the predicate-delegate for later use - but they've got the IL to examine, so I (naively) imagine it would be possible.

    It depends on what exactly you mean by "figure this out". If all you want to know is if the closure survives the loop, then clearly it is possible for a computer program to figure that out because the garbage collector figures it out at runtime, and the GC is a computer program. Imagine you did an aggressive garbage collection at the end of each loop; if the closure gets collected then it must have been dead. If not, then it must be alive. The GC just does graph analysis to see if a particular node in the graph is reachable or not; the same analysis could be done at compile time. Now, whether it is feasible for a compile-time analysis tool to do that level of flow analysis is another question; though this is in theory possible, it is a lot of work.

    However, just because the closure survives garbage collection does not guarantee that it is ever USED in a dangerous manner. Maybe it survives by some accident, and is never called again. If figuring out that is what you mean, then no, that's not solvable in general by computers; that's equivalent to The Halting Problem. -- Eric

    The other thing I've done is the usual thing of writing an extension method called ForEach that takes an Action to run for each item. That eliminates this problem. But it also doesn't play well with yield return, and also you (Eric) advised against it because it mixes side-effect code with functional style in a misleading way. But it is the closest thing I think there is to an "easy" solution to this problem.

  • As a junior-ish ASP.NET developer working individually within a small company, I'm woefully behind on things that frankly do not come up so often in my world; either because they're not necessary, or because I find another way to do whatever I need to do since I don't know any better.

    I found this blog a few weeks ago, and find it equal parts fascinating, educational (I had no idea that in foreach(var x in y), x wasn't a different variable for each iteration), and frustrating. The last part is because honestly quite a few of the topics and coding samples casually floated during the discussion often fly two or three feet over my head. I think this general topic is one of those examples. I would appreciate a recommendation or two for resources to help me wrap my brain around lambdas, Func(TResult) delegates (and probably delegates in general, as I've not often resorted to using them), and the scoping considerations involved.

               var ints = new List<int>() { 1, 2, 3 };
               var funcs = new List<Func<int>>();
               foreach (int i in ints)
                   funcs.Add(() => i);
               foreach (var func in funcs)
                   Console.WriteLine(func());

    func() always evaluating to 3 is bringing me to my knees.

    Welcome aboard.

    Two books I recommend are "C# In Depth" and "Essential C#", by Jon Skeet and Mark Michaelis, respectively. I was technical editor of both; they're both great. They are well-titled; Jon's book goes a lot deeper than most mainstream books about programming languages, and Mark's book does a good job of covering the essential material you need to know to understand the language. -- Eric

  • not really related to this post, but something I'd like to see would be a "what compiler warnings do you get for this *and why?*" WRT:

       class Program
       {
          static void Main(string[] args)
           {
               var x = 5; // not used
               var y = (int)5; // not used
               var z = (int?)5;  // no warning ?!
           }
       }

     

  • SuggestionBoxBob: I'd expect all you'd get would be three "variable is assigned but never used"s. Why, what other warnings might you get? Everything else about it is completely fine, with the possible exception of the redundant cast of y to the type that it already is - but I'm fairly sure that an unnecessary cast isn't a warning in C#.

    Your expectations would be wrong. You only get two warnings, on x and y. (I'll add some comments to the code.) The reason why we suppress the warning on z is kinda interesting; I blogged about this a couple years ago.

    http://blogs.msdn.com/ericlippert/archive/2007/04/23/write-only-variables-considered-harmful-or-beneficial.aspx 

    We suppress a "not used" warning on a local if a non-constant value is written to it. The compiler interprets (int?)5 as "new Nullable<int>(5)", which it sees as a non-constant computation which you might want to examine in the debugger. 

     -- Eric

  • > Though we could do any of those, none of them by themselves solve the problem at hand. Today, you have to know to use a particular pattern with foreach to get the semantics you want: declare a variable inside the loop. With one of these changes, you still have to know to use a particular keyword to get the semantics you want, and it is still easy to accidentally do the wrong thing.

    Eric, what about two suggestions combined (I believe I mentioned it in one of the comments)?

    1) Add a new foreach syntax for new semantics - doesn't really matter which one.

    2) Give a warning (or maybe even error) for old semantics.

    That way, one cannot accidentally do the wrong thing, because the compiler will complain. And yet it is also not a _silent_ breaking change - at least if someone's code is actually intentionally relying on existing behavior (which I find extremely unlikely), they'll get a clear error message telling them what's wrong, and, ideally, how to make it right.

  • > Is it theoretically possible for the compiler (or a tool like Resharper) to figure this out? Obviously they would have to dig into the implementation of Where to find out if it stores the predicate-delegate for later use - but they've got the IL to examine, so I (naively) imagine it would be possible.

    Thing is, Where() actually does store the predicate for later use, inside the (deferred) enumerable object that it returns - remember that recurring statement in MSDN member documentation? "This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic."

    It can, of course, further figure out that object returned by Where() does not escape the loop body, and therefore neither does the lambda... but hopefully it's clear now why this is actually that much more tricky.

  • Great follow-up.

    I also like your stance on the issue that it's not just a training issue. There's always so much to do when properly writing code. It's often difficult to always be able to look at other people's code and pick up a nuanced edge case such as this.

    Since the race for 'what to do' was so close, I'd vote for the breaking change.

    I must admit I always perceive the current iteration value as being a new variable, and yes for the record to others, I do understand delegates and variable capture.

    Can you (or someone) elucidate me as to where you'd want the results to be 3,3,3? I'm no idiot and have been programming C# for a long time, but I also would have failed the interview test here. OK that's not totally correct - I can figure out the how, but it's the why that eludes me.

    Thanks

    AC

  • By the way, how would you handle a case with 3 sides tied, by flipping a coin, in a fair way (such that every side gets equal chance)? It looks to me like this isn't an option, either.

    (and yes, I've seen the *)

    We'll flip two coins. Your side can have two heads, the other side can have two tails, and I'll take one head, one tail. -- Eric

  • OK, to allow fixing the issue in C#5, I change my vote and support Pavel:

    1) Add new syntax "foreach (new var x in c)" for new semantics (inside expansion).  The keyword new gives the best hint about the difference between the two fkavours.

    2) Give a warning for old syntax and semantics (outside expansion). This avoids the breaking change. Wrong code can be found. Correct code will have to use "#pragma warning" to get rid of the warning.

  • OT: Geeks should really stop promoting Instant Runoff Voting when there are other voting systems with the same look & feel that are much more mathematically sound. The Schultze method, for instance, which is used by such groups as Wikimedia, Gentoo, Debian, Ubuntu, FSF in Europe and Latin America, BoardGameGeek, and the Pirate Party. IRV is easy to implement and its flaws aren't obvious, but they are serious.

  • It sounds like the ship has sailed on this one. Hopefully the lesson is learned and the next language in the Java/C++/C# tradition will not have this problem.

    Personally, I am far more annoyed by the fact that I can write

    var list = new List<int>();

    ...

    foreach(int i in list) { ... }

    then change list to be, say, a List<double>() and not get a compiler error, but have it crash at runtime. I've had this bite me on numerous occasions.

  • "...a warning which warns about correct behaviour is..."

    An FxCop rule. :)

    Seriously, this is the sort of case FxCop handles really well.

    So one option would be to catch it with FxCop instead of the compiler.

    Advantage is that you can do it today.

    Disadvantage is that most people who would be helped by this probably don't run FxCop.

  • @Craig:

    FXCop can't catch that, because it analyzes compiled code, not source code. It is possible to detect such a pattern in compiled code, but given what lambdas extract to in IL, I would be surprised if FXCop is smart enough in its present state to handle that kind of thing. StyleCop could catch it, though.

  • > We'll flip two coins. Your side can have two heads, the other side can have two tails, and I'll take one head, one tail.

    The probabilities would then be 1/4, 1/4 and 1/2 - hardly fair.

    OK, then we'll Rock-Paper-Scissors for it. First you and the other side have a round, and then I'll play the winner. -- Eric

  • I know Eric is joking, but the real solution is pretty easy.

    Flip 2 coins:

    Head Head side 1 wins

    Head Tail side 2 wins

    Tail Tail side 3 wins

    Tail Head goto Flip 2 coins;

Page 1 of 4 (51 items) 1234