Closing over the loop variable, part two

Closing over the loop variable, part two

Rate This
  • Comments 51

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)


UPDATE: We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The "for" loop will not be changed. We return you now to our original article.


Thanks to everyone who left thoughtful and insightful comments on last week's post.

More countries really ought to implement Instant Runoff Voting; it would certainly appeal to the geek crowd. Many people left complex opinions of the form "I'd prefer to make the change, but if you can't do that then make it a warning". Or "don't make the change, do make it a warning", and so on. But what I can deduce from reading the comments is that there is a general lack of consensus on what the right thing to do here is. In fact, I just did a quick tally:

Commenters who expressed support for a warning: 26
Commenters who expressed the sentiment "it's better to not make the change": 24
Commenters who expressed the sentiment "it's better to make the change": 25

Wow. I guess we'll flip a coin. :-)    (*)

Four people suggested to actually make it an error to do this. That's a pretty big breaking change, particularly since we would be breaking not just "already broken" code, but plenty of code that works perfectly well today -- see below. That's not likely to happen.

People also left a number of interesting suggestions. I thought I'd discuss some of those a little bit.

First off, I want to emphasize that what we're attempting to address here is the problem that the language encourages people to write code that has different semantics than they think it has. The problem is NOT that the language has no way to express the desired semantics; clearly it does. Just introduce a new variable explicitly into the loop.

A number of suggestions were for ways that the language could more elegantly express that notion. Some of the suggestions:

foreach(var x in c) inner
foreachnew(var x in c)
foreach(new var x in c)
foreach(var x from c)
foreach(var x inside c)

Though we could do any of those, none of them by themselves solve the problem at hand. Today, you have to know to use a particular pattern with foreach to get the semantics you want: declare a variable inside the loop. With one of these changes, you still have to know to use a particular keyword to get the semantics you want, and it is still easy to accidentally do the wrong thing.

Furthermore, a change so small and so targetted at such a narrow scenario probably does not provide enough benefit to justify the large cost of creating a new syntax, particularly one which is still easily confused with an existing syntax.

C++ luminary Herb Sutter happened to be in town and was kind enough to stop by my office to describe to me how they are solving a related problem in C++. Apparently the next version of the C++ standard will include lambdas, and they're doing this:

[q, &r] (int x) -> int { return M(x, q, r); }

This means that the lambda captures outer variable q by value, captures r by reference, takes an int and returns an int. Whether the lambda captures values or references is controllable! An interesting approach but one that doesn't immediately solve our problem here; we cannot make lambdas capture by value by default without a huge breaking change. Capturing by value would have to require new syntax, and then we're in the same boat again: the user has to know to use the new syntax when in a foreach loop.

A number of people also asked what the down sides of adding a warning are. The down side is that a warning which warns about correct behaviour is a very bad warning; it makes people change working code, and frequently they break working code in order to eliminate a warning that shouldn't have been present in the first place. Consider:

foreach(var insect in insects)
{
  var query = frogs.Where(frog=>frog.Eats(insect));
  Console.WriteLine("{0} is eaten by {1} frogs.", insect, query.Count());
}

This makes a lambda closed over insect; the lambda never escapes the loop, so there's no problem here. But the compiler doesn't know that. The compiler sees that the lambda is being passed to a method called Where, and Where is allowed to do anything with that delegate, including storing it away to be called later. Which is exactly what Where does! Where stores away the lambda into a monad that represents the execution of the query. The fact that the query object doesn't survive the loop is what keeps this safe. But how is the compiler supposed to suss out that tortuous chain of reasoning? We'd have to give a warning for this case, even though it is perfectly safe.

It gets worse. A lot of people are required by their organizations to compile with "warnings are errors" turned on. Therefore, any time we introduce a new warning for a pattern that is often actually safe and frequently used, we are effectively causing an enormous breaking change. A vaccine which kills more healthy people than the disease would have is probably not a good bet. (**)

This is not to say that a warning is a bad idea, but that it is not the obvious slam dunk good idea that it initially appears to be.

A number of people suggested that the problem was in the training of the developers, not in the design of the language. I disagree. Obviously modern languages are complex tools that require training to use, but we are working hard to make a language where people's natural intuitions about how things work lead them to write correct code. I have myself made this error a number of times, usually in the form of writing code like the code above, and then refactoring it in such a manner that suddenly some part of it escapes the loop and the bug is introduced. It is very easy to make this mistake, even for experienced developers who thoroughly understand closure semantics. That's a flaw in the design of the language.

And finally, a number of people made suggestions of the form "make it a warning in C# 4, and an error in C# 5", or some such thing. FYI, C# 4 is DONE. We are only making a few last-minute "user is electrocuted"-grade bug fixes, mostly based on your excellent feedback from the betas. (If you have bug reports from the beta, please keep sending them, but odds are good they won't get fixed for the initial release.) We are certainly not capable of introducing any sort of major design change or new feature at this point. And we try to not introduce semantic changes or new features in service packs. We're going to have to live with this problem for at least another cycle, unfortunately.

********

(*) Mr. Smiley Face indicates that Eric is indulging in humourous japery.

(**) I wish to emphasize that I am 100% in favour of vaccinations for deadly infectious diseases, even vaccines that are potentially dangerous. The number of people made ill or killed by the smallpox vaccine was tiny compared to the number of people who did not contract this deadly, contagious (and now effectively extinct) disease as a result of mass vaccination. I am a strong supporter of vaccine research. I'm just making an analogy here people.

(This is part two of a two-part series on the loop-variable-closure problem. Part one is here.)

 

  • A new foreach syntax  for this very particular case would be overkill, as would be a warning.

    The interaction of foreach and closures probably should have been defined differently, but now it is too late to bother fixing it. After all, no language can evolve as much as C# did since 1.0 without having some awkward legacy issues. Just keep a note for the future designers of whatever language will come after C#.

    And it's not only the languages that have this kind of problems, e.g. still occasionally having to use pre-generics System collections brings me on the verge of tears.

  • (Started to think about it, wild guesses)

    If I were to change language for two enumeration styles (I'm not yet sure if it should be done), I'd go with the syntax:

    for(var i in list) // i is same variable for all loop iterations

    foreach(var i in list) // i is new variable for each iteration

    If foreach body is not "pure" enough (subject to define), warning could be issued. This warning could easily be fixed by converting it to "for" loop, if needed.  Seems like there are some opportunities for "foreach" parallelization, also.  

  • Andre: I have a simpler solution. A "three sided coin". Since that is geometrically improbable, I'll do you one better - a cube, with two sides for each possible outcome.

    Why do programmers always try to reinvent the wheel?

  • Thinking about purity of foreach block, that could be defined as "block is pure, if extracting it as separate method of the void ExtractedBlock(iterationVariable) form is pure". Then we get that problem of checking if call to Where, Select and any other method getting delegate is pure. It can be defined with some form of PureAttribute, when it means "I'm pure if this and that parameter's body is pure". Delegate body is again subject to the transformation above, like if it can be replaced with extracted pure method, than it is pure. If by chance it is using method group, than method it resolves to should be [Pure] already.

    Again, those are just some ideas. Discussion is welcome!

  • I love that in todays world you have to qualify that vaccination statement.

  • @Eric "we'll Rock-Paper-Scissors for it. First you and the other side have a round, and then I'll play the winner."

    Japery, indeed.  Guess it was a good vacation.

    I say we settle the choice by havting you think of a double between 0+ and 0-, and we'll try to approximate it.

  • "A number of people suggested that the problem was in the training of the developers, not in the design of the language."

    ...And a number of people say that that the music CDs are "dead", "electronic", "machine-like", whereas the vinyl records are "warm" and "live". And some time before, a number of people said the same thing about those very vinyl records, lamenting the muderous effects of technology on the spirit, or art, or whatever.

    Will we ever stop envying and resenting the next generation for having it easier than we had???

    God forgive me, but I think there is a simple psychological issue behind the "hard-line" statements like that: "if I had to have it the hard way, then so must you, little punk, and don't you dare to have more fun than I did when I was your age!" -- something like that.

    Of course the next generation of programming languages should make it easier to implement the ideas that pop up in your head -- fast, not after so much nose-to-the-grindstone time that you forget what you had in mind in the beginning.

    But then again: it's tempting for the people who have spent a lot of time mastering the "deep dark secrets" of a certain technology or language to protect the status of High Initiates, the Keepers of the Misteries of Programming, they may have thus obtained. I myself am a little bit guilty of that... :-)

  • Eric,

    Correct me if I'm wrong, but if I understand you correctly, you give some very good arguments against making it a warning, and you basically reject the make-it-an-error option. You also repeat some argumentss against leaving the language as it is.

    Does that mean that you're on the let's-make-the-change side?

    After reading this post, I certainly repeat my "make the change" vote.

  • I said in the last post that you should at least make it a warning. I changed my mind. On many occasions I use the results of a LINQ query with the iteration variable right away, just like in the example you provided, and in those cases the warning message makes no sense.

    There's only one option left: make the change you suggested.

  • Make using lambdas in foreach loops illegal.

    That would solve the problems for sure.

    No closures - no problem.

  • "...a warning which warns about correct behaviour is a very bad warning;"

    Static code analysis tools, like Lint for C++, are hugely popular for the very reason that they help point out *legal* code that may not be what you intended. I'm jolly grateful that my C++ compiler always points out when I do this:-

       if (x = y)

    Very rarely I may want to do just this, but I'll write the slightly longer form to placate the compiler:-

       if ((file = fopen(...) != NULL)

    C# may not have nearly as many dark corners as C++, but it has some. The Visual C++ team added a separate switch "/analyze" as less intrusive method which may be a preferable approach. Those who know what they're doing and dislike being nannied can do so. The rest of us who know that we are fallible have the opportunity to get all the help we can lay our hands on.

  • Isn't this known as lexical closures? Switching from this to closures that behave like they do in scheme or ruby seems more like adding a new language feature than fixing a bug.

    First off, scheme closures and ruby closures both use lexical closures. You seem to be implying that they use dynamic scoping.

    Second, I don't see what the difference between lexical and dynamic scoping has to do with it. The question of whether a closure ought to close over a variable or a value seems germane, but I don't see why whether the binding is lexical or dynamic is relevant. Can you clarify why you think this is relevant?

    Third, the proposal at hand is NOT to change how closures work; closures in C# are lexical closures that close over variables and that's not changing. The proposals at hand are (1) to move the declaration of the foreach loop variable to be logically inside the loop, or (2) to produce a warning that the closure closes over a single variable, not one variable per loop iteration.

    I'm therefore very confused by your comment. Can you elaborate on what you're getting at here, because I'm not following you. -- Eric

  • > Scheme closures use lexical scoping; ruby closures use dynamic scoping.

    Eric, can you elaborate on that?

    I'm not a Ruby expert, but from what I've seen all closures in Ruby are lexical. They do have three different kinds:

    - blocks, which aren't first-class (you can pass a block as an argument, but you cannot store it in a variable);

    - lambdas, which are first-class, and are really most like C# lambdas in pretty much everything;

    - procs, which are like lambdas, but also "capture" flow control (so a proc can do an early return from the function in which it was created, or break from a loop if it was created in the loop body).

    All three kinds of closures are lexical, however, in that when you reference a variable "a" inside a closure, it will always be the variable visible in the scope in which the closure was created - not the nearest declared in the scope from which it is called.

    Or did you mean something else by "dynamic scoping"?

    I was misremembering. I was probably thinking of some other language that has dynamic scoping for closures, like Perl (optionally) or PostScript (by default). However, the point remains that I don't understand the comment. I'll fix up the text. Thanks. -- Eric

  • I'm for #1, 'to move the declaration of the foreach loop variable to be logically inside the loop'.

    While it may be a breaking change, I find it unlikely anyone would **deliberately** depend on that behavior.

    Besides the whole idea of the foreach variable being readonly and yet changing on each iteration seems a bit like cheating to me.  :)

  • Still no answers to my question of "What is a real-world scenario where you'd want the current behavior?" It's all well and good to demonstrate the capture problem, but does someone have a real example?

    IMHO it's only a breaking change if there exists code out there that uses it in this way and meant to.

    I have a feeling there's a lot of code using the workaround to get the value they really want

    foreach( i in list ) {

      // copy i to v, foreach capture varialble issue see http://blogs.msdn.com/ericlippert/

      // will be fixed in c#5 we hope

      var v = i;

      funcs.Add( () => v );

    }

    In this case, fixing the #1 foreach fix (declaration inside loop) makes no semantic difference to the resulting code.

    Otherwise, the code is wrong and nobody tested it.

    I wouldn't put it as a warning either (for the warnings-as-errors crowd), but variable capture of the foreach variable could be an 'information' message in C# 4.

    Also, you can still explicitly allow capturing by just declaring a variable y which is outside the foreach, which would make it clear what the value of the captured variable will be. So if there is real-world code that needs this bizarre behavior, it can still work with only minor modification.

Page 2 of 4 (51 items) 1234