Iterator Blocks Part Seven: Why no anonymous iterators?

This annotation to a comment in part five I think deserves to be promoted to a post of its own.

Why do we disallow anonymous iterators? I would love to have anonymous iterator blocks.  I want to say something like:

IEnumerable<int> twoints = ()=>{ yield return x; yield return x*10; };
foreach(int i in twoints) ...

It would be totally awesome to be able to build yourself a little sequence generator in-place that closed over local variables. The reason why not is straightforward: the benefits don't outweigh the costs. The awesomeness of making sequence generators in-place is actually pretty small in the grand scheme of things and nominal methods do the job well enough in most scenarios. So the benefits are not that compelling.
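To make the "nominal methods do the job" point concrete, here is a minimal sketch of the named-method equivalent of the two-int generator above. The class name, method name, and the value 4 are illustrative assumptions, not anything from the compiler or the framework.

```csharp
using System;
using System.Collections.Generic;

public static class TwoIntsExample
{
    // A named ("nominal") iterator block. The value that the anonymous
    // version would have captured from the enclosing scope is passed in
    // as an ordinary parameter instead.
    public static IEnumerable<int> TwoInts(int x)
    {
        yield return x;
        yield return x * 10;
    }

    public static void Main()
    {
        int x = 4; // hypothetical local that the lambda would have closed over
        foreach (int i in TwoInts(x))
            Console.WriteLine(i); // prints 4, then 40
    }
}
```

One difference worth keeping in mind: the parameter is a copy of x taken at the call, whereas a closure would capture the variable itself and observe later writes to it.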

The costs are large. Iterator rewriting is the most complicated transformation in the compiler, and anonymous method rewriting is the second most complicated. Anonymous methods can be inside other anonymous methods, and anonymous methods can be inside iterator blocks. Therefore, we first rewrite all anonymous methods so that they become methods of a closure class. This is the second-last thing the compiler does before emitting IL for a method. Once that step is done, the iterator rewriter can assume that there are no anonymous methods in the iterator block; they've all been rewritten already. Therefore the iterator rewriter can just concentrate on rewriting the iterator, without worrying that there might be an unrealized anonymous method in there.
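As a rough illustration of what "become methods of a closure class" means, the sketch below puts a lambda next to a hand-written equivalent. This is an approximation for exposition, not the compiler's actual output: the real generated class has a compiler-chosen name, and the names Closure, Scale, Original and Rewritten are assumptions made up for this example.

```csharp
using System;

public static class ClosureSketch
{
    // Hand-written stand-in for a compiler-generated closure class.
    public sealed class Closure
    {
        public int factor;                              // the hoisted local
        public int Scale(int n) { return n * factor; }  // the lambda body
    }

    // What the programmer writes.
    public static Func<int, int> Original()
    {
        int factor = 10;
        Func<int, int> scale = n => n * factor;
        factor = 100; // the lambda sees this write: it captured the variable
        return scale;
    }

    // Roughly what it means after the anonymous method has been rewritten.
    public static Func<int, int> Rewritten()
    {
        Closure c = new Closure();
        c.factor = 10;
        Func<int, int> scale = c.Scale;
        c.factor = 100; // same write, now to a field of the closure object
        return scale;
    }

    public static void Main()
    {
        Console.WriteLine(Original()(2));  // prints 200
        Console.WriteLine(Rewritten()(2)); // prints 200
    }
}
```

After this step the method body contains only ordinary locals, fields and method calls, which is why a later pass can assume there are no lambdas left to worry about.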

Also, iterator blocks never "nest", unlike anonymous methods. The iterator rewriter can assume that all iterator blocks are "top level".

If anonymous methods are allowed to contain iterator blocks, then both those assumptions go out the window. You can have an iterator block that contains an anonymous method that contains an anonymous method that contains an iterator block that contains an anonymous method, and... yuck. Now we have to write a rewriting pass that can handle nested iterator blocks and nested anonymous methods at the same time, merging our two most complicated algorithms into one far more complicated algorithm. It would be really hard to design, implement, and test. We are smart enough to do so, I'm sure. We've got a smart team here. But we don't want to take on that large burden for a "nice to have but not necessary" feature.
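For a sense of scale, here is a hand-written approximation of the kind of state machine that even a trivial top-level iterator block turns into. It is a simplified sketch under stated assumptions, not the real generated code: the class name TwoIntsMachine and the state numbering are invented for illustration, and the real rewriter also has to deal with try/finally blocks, hoisted locals and enumerator reuse.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Approximates the rewriting of:
//   static IEnumerable<int> TwoInts(int x) { yield return x; yield return x * 10; }
public sealed class TwoIntsMachine : IEnumerable<int>, IEnumerator<int>
{
    private readonly int x;
    private int state;   // 0 = not started, 1 = after first yield, 2 = after second, -1 = finished
    private int current;

    public TwoIntsMachine(int x) { this.x = x; }

    public int Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    // Each call resumes where the previous "yield return" left off.
    public bool MoveNext()
    {
        switch (state)
        {
            case 0: current = x; state = 1; return true;
            case 1: current = x * 10; state = 2; return true;
            default: state = -1; return false;
        }
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }

    // Simplification: hand out a fresh machine for every enumeration.
    public IEnumerator<int> GetEnumerator() { return new TwoIntsMachine(x); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class Program
{
    public static void Main()
    {
        foreach (int i in new TwoIntsMachine(4))
            Console.WriteLine(i); // prints 4, then 40
    }
}
```

Every local and parameter of the iterator becomes a field, and control flow becomes a switch on the saved state. Doing that correctly while closure classes are still being introduced inside the same body is the combined pass described above.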

  • @Jay - of course this is a subjective discussion, but I have to disagree. Too many times I have found even anonymous methods to be a MAJOR source of testability issues. With very few exceptions, I have chosen to go with well named, well factored methods.

    Inlining does not [IMHO] improve readability or understandability. Well factored code where a given item performs one well defined function does improve understanding of the code, and allows a person to focus on just the specific area instead of being bound into a given context.

  • Hi Eric. You say that the two algorithms for rewriting anonymous methods and iterator blocks would interact too much if anonymous iterators were allowed. I would be very interested in seeing a more detailed explanation of this. Why can't the algorithm for rewriting anonymous methods simply rewrite anonymous methods irrespective of whether they contain 'yield' or not? In other words, does the anonymous methods rewriter really need to care whether any particular anonymous method is an iterator block? The way I see it, it doesn't; you already have the algorithm in place to handle nested anonymous methods, so an anonymous iterator block nested inside an anonymous method doesn't seem like a barrier.

    Once anonymous methods have been rewritten into actual (top-level) methods, the iterator-block rewriter can work its magic, and at this point does not need to care whether any particular iterator block used to be an anonymous method and whether it used to be nested.

    Thus, to me the two features seem completely orthogonal. So orthogonal, in fact, that the relevant rewriting algorithms should be orthogonal too.

    What is the detail I am missing that makes them interact in complex ways?

  • Another limitation of iterator blocks and anonymous methods that I'd love to see a post on - or better yet, just a bug fix in the next version, because it clearly is a bug, and by the standards of what we're talking about, a very simple one to fix.

    Why can't you use "base" in an anonymous delegate or iterator block without getting an 'unverifiable code' warning?

    That's fixed in C# 4. I regret not getting the fix into the final release of C# 3, but it was simply too dangerous given all the radical changes we'd made to the anonymous function binding code to make lambdas work. It should have made it into the service release, but there was some scheduling mixup that I don't recall the details of, so it didn't make it in there either. In C# 4, we do the right thing; generate a helper method for you and call it. -- Eric

    I know exactly why you can't from a *technical* perspective: because no outside class is allowed to make a non-virtual call to a base class virtual method, and the delegate or iterator is being implemented as a separate class by the compiler under the hood. But compared to the complexity of the magic that the compiler is *already* doing to generate that separate class, especially in the case of an iterator, the fix (generate a private helper method on the containing class to do the necessary base class operation) is trivial.

    I note that "relatively trivial" does not logically imply "trivial". -- Eric

    I raised this question shortly after C#2 was released and got the answer that yes, the compiler team knew that and intended to fix it but there was no time to do so prior to the C#2 release. Well, I'm now using C#3.5 which has seen a plethora of radical new features in the compiler since C#2 - but still no fix for this simple bug? Is it fixed in C#4?

    Your idea of what's a simple bug and my idea of what's a simple bug are perhaps rather different. A simple bug, for example, does not require me to consider whether fixing the bug will subtly change the order in which metadata is emitted, and thereby hit unperformant code paths in the unmanaged metadata emitter provided by the CLR, code paths which we work hard to avoid. A simple bug doesn't require cross-team communication with the jitter and verifier teams to determine whether the new codegen is likely to result in any additional verification problems or hit any unexpectedly bad performing jit scenarios. Simple bugs do not require the construction of new visitor passes that have to happen in the precisely correct order related to other passes that do rewriting on "base" calls and anonymous function and iterator rewritings. Simple bugs take a few minutes of code review, not several hours poring over the semantic analyzer with multiple senior team members. In my opinion, this was not a simple bug, which is why it went unfixed for an entire version. You are probably smarter than me and find these sorts of things simple, but I do not. -- Eric

    Also, will we ever get a 'yield foreach'?

    It's on the list, but it's not real high on the list. I wouldn't hold my breath waiting if I were you. Particularly since I have heard rumours that Erik's solution for C-Omega has been shown to have certain shortcomings; apparently his transformations do not correctly handle some cases in which the nested iterator throws exceptions. Since the existence of incorrect codegen scenarios implies that fixing the algorithm is an open research problem, and since the compiler team is not really in the business of solving open research problems -- that's MSR's department -- that's more points against the feature. I'd love to have it, but I don't think it is likely any time soon. -- Eric

  • "Your idea of what's a simple bug and my idea of what's a simple bug are perhaps rather different."

    Touché. I think what I meant was that it is 'simply' a bug, as opposed to a language feature with benefits and downsides and tradeoffs like most of the other things that have been argued about here. In retrospect I'm not at all surprised that the implementation was bloody hard and I'm sorry for implying that it wouldn't be.

    "Particularly since I have heard rumours that Erik's solution for C-Omega has been shown to have certain shortcomings; apparently his transformations do not correctly handle some cases in which the nested iterator throws exceptions."

    I'd think even a really naïve approach would be worth implementing from a code readability perspective. If, in the first pass of implementation, you simply translated "yield foreach foo;" into "foreach (var x in foo) yield return x;" it'd make code nicer to read without precluding doing smarter, optimized codegen when the research problem is solved, wouldn't it?
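    The naive translation described in this comment can be written out by hand today. The sketch below shows it on a recursive tree walk; the Node and Traversal types are assumptions made up for the example, and "yield foreach" itself is not valid C#, so it appears only in comments.

    ```csharp
    using System;
    using System.Collections.Generic;

    public sealed class Node
    {
        public int Value;
        public Node Left, Right;
        public Node(int value, Node left, Node right) { Value = value; Left = left; Right = right; }
    }

    public static class Traversal
    {
        public static IEnumerable<int> InOrder(Node n)
        {
            if (n == null) yield break;

            // would be: yield foreach InOrder(n.Left);
            foreach (int v in InOrder(n.Left)) yield return v;

            yield return n.Value;

            // would be: yield foreach InOrder(n.Right);
            foreach (int v in InOrder(n.Right)) yield return v;
        }

        public static void Main()
        {
            Node tree = new Node(2, new Node(1, null, null), new Node(3, null, null));
            foreach (int v in InOrder(tree))
                Console.WriteLine(v); // prints 1, then 2, then 3
        }
    }
    ```

    With this expansion every element is re-yielded once per level of nesting, so a deep tree pays a per-element cost proportional to its depth. That overhead is what a smarter codegen would be trying to remove.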

  • (things I forgot in the prior post)

    "You are probably smarter than me and find these sorts of things simple, but I do not."

    More like, I have a tendency to frequently forget that the C# language team is, in fact, made up of mortals. Naturally, from the results of your work, I tend to assume you are all extreme programming gods ;)

    (quoting myself) "make code nicer to read without precluding doing smarter, optimized codegen when the research problem is solved, wouldn't it?" - Not to mention that, presuming the research problem will eventually be solved, having a naive implementation now would mean that when that solution is found, C# can take advantage of it to produce drastically better performing binaries from *existing* code. If you wait for the solution to the research problem before adding the language feature that could take advantage of it, then everyone will need to scan their code for "foreach" loops that could benefit. Or the compiler's optimizer will need to be a lot cleverer to identify yields in foreach blocks that could also be transformed in the same way.

  • @Steve Bjorg:

    Eric requested in a comment a year ago (http://blogs.msdn.com/oldnewthing/archive/2008/08/15/8868267.aspx) that programmers not make "clever use" of iterators.

  • This feature isn't a big loss. 99% of the cases where I'd wanted to use it, new[] { a } is more appropriate... especially considering that any iterator will be a new object *anyway*.

    On the C# side, the language has essentially everything one could desire at this point (except A<T> : T, which opens up a whole world of great possibilities for generics, but...).

    *.Net* has bigger problems -- for example, if a struct or array of structs has no object pointers, *I should be able to modify it at a pointer level without going to unsafe code*. This is also the appropriate way to allow access to SIMD features...

  • "I should be able to modify it at a pointer level without going to unsafe code"

    If you modify or access it through a pointer (as in C style) then you are being unsafe, as you have no bounds checking and can read or write anywhere. How is that not unsafe in the managed world?

    Or did you mean pointers with bounds checking (which sound rather tricky to integrate alongside raw pointers while keeping the syntax sane)?

    I would like SIMD support; perhaps Mono's testing of the waters would be an option (I haven't personally tried it), but actually what I really want is intrinsics. If I'm running on a machine with crc32 I really want to be able to get to it without having to do a managed/unmanaged transition. Having the JIT transparently give me a software based implementation when it's not present is nice, but I'd even accept it throwing NotSupportedException.

  • Hi Eric,

    You have gone to the trouble of supporting anonymous async/await methods.

    Will you take the trouble to bring iterators into line with this feature? ;)

    Keep up the good work!

    James Miles

    http://enumeratethis.com

  • Thank you for writing this article.

    It is easy to forget the huge complexity underneath. On the face of it, it seemed trivial to just have the compiler wrap it into a private helper method, since I wasn't using any other lambda features (I didn't access any variable from an external scope). Your two initial points about iterator blocks and anonymous methods being the two most complicated language features, compiler-wise, quickly set me straight. I went from proudly professing my newfound functional high ground to being humble.

    Let's just take it as a positive signal that more and more laypeople are endeavouring into functional paradigms. And as a reminder that even C# will at some time be replaced.

    I hope to see you at the NDC 2011.

  • It's disappointing that VB 5 supports anonymous iterators and C# 5 doesn't, while C# has supported named iterators since C# 2.  I understand that it's not a generally useful feature that outweighs all of its associated development costs, but why promise language parity and then decide that this feature outweighs its costs for VB only?  I'd really like to use anonymous iterators in my reactive parsers library, and I'll bet I'd find lots more unrelated uses if it were available in C#.

  • Just thought of another great use for anonymous iterators: Rx Experimental contains an overload of Observable.Create that accepts an iterator block. It provides a way to write Async-like code that generates an observable sequence (as opposed to a scalar-valued Task). The fact that I have to use a named method for the iterator argument takes away from the "flow" of a reactive LINQ query.

  • Thanks for the post, that's a nice explanation. It would definitely be "nice to have" and I hope it's on the list *somewhere* (probably near the bottom), but there are many other features I'd consider more important :)
