“foreach” vs “ForEach”

“foreach” vs “ForEach”

Rate This
  • Comments 45

A number of people have asked me why there is no Microsoft-provided “ForEach” sequence operator extension method. The List<T> class has such a method already of course, but there’s no reason why such a method could not be created as an extension method for all sequences. It’s practically a one-liner:

public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
{
  // argument null checking omitted
  foreach(T item in sequence) action(item);
}

My usual response to “why is feature X not implemented?” is that of course all features are unimplemented until someone designs, implements, tests, documents and ships the feature, and no one has yet spent the money to do so. And yes, though I have famously pointed out that even small features can have large costs, this one really is dead easy, obviously correct, easy to test, and easy to document. Cost is always a factor of course, but the costs for this one really are quite small.

Of course, that cuts the other way too. If it is so cheap and easy, then you can do it yourself if you need it. And really what matters is not the low cost, but rather the net benefit. As we’ll see, I think the benefits are also very small, and therefore the net benefit might in fact be negative. But we can go a bit deeper here. I am philosophically opposed to providing such a method, for two reasons.

The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects. The purpose of an expression is to compute a value, not to cause a side effect. The purpose of a statement is to cause a side effect. The call site of this thing would look an awful lot like an expression (though, admittedly, since the method is void-returning, the expression could only be used in a “statement expression” context.) It does not sit well with me to make the one and only sequence operator that is only useful for its side effects.

The second reason is that doing so adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:

foreach(Foo foo in foos){ statement involving foo; }

into this code:

foos.ForEach((Foo foo)=>{ statement involving foo; });

which uses almost exactly the same characters in slightly different order. And yet the second version is harder to understand, harder to debug, and introduces closure semantics, thereby potentially changing object lifetimes in subtle ways.

When we provide two subtly different ways to do exactly the same thing, we produce confusion in the industry, we make it harder for people to read each other’s code, and so on. Sometimes the benefit added by having two different textual representations for one operation (like query expressions versus their underlying method call form, or + versus String.Concat) is so huge that it’s worth the potential confusion. But the compelling benefit of query expressions is their readability; this new form of “foreach” is certainly no more readable than the “normal” form and is arguably worse.

If you don’t agree with these philosophical objections and find practical value in this pattern, by all means, go ahead and write this trivial one-liner yourself.

UPDATE: The method in question has been removed from the "Metro" version of the base class library.

  • Your argument is "you cannot prevent some bad behaviours, therefore you should encourage them" ? -- Eric

    First , I think that describing the use of ForEach as "bad behavior" to perform actions on a sequence is not entirely legit and can not be put in the same box as Select() And Where(). That is because the intent of the latter constructs is different and expressed so by their nomenclature. You yourself contended that ForEach by its very nature would be used to have side effects. I am merely stating that since the semantics and the intention of ForEach w.r.t being side-effect-full is explicit, the convenience and nicety (both subjective) of writing:

    seq.ForEach( (s) => s.Prop = val ) ;

    trumps writing:

    foreach(var s in seq )

    {

    s.Prop = val;

    }

  • On the other hand, the goal of object orient programming is to keep data and behavior together.  Which class has the data?  Which data?  The collection elements, so the collection object has the data.  Then the collection object should be responsible for iterating over the collection of elements.  Why should the consumer of the collection know how to iterate over each element?  

    Instead, providing iteration method on the collections leads to a cleaner distribution of responsibilities:  The consumer knows what they want to do with each element, so they provide the action; the list knows about the elements so it does the work.

    Having this kind of mindset elements unnecessary language constructs like "for" and "foreach" and helps to reinforce good encapsulation.

  • @Daniel - You cannot easily query the exception object to provide a meaningful message to the end user.  Exception and error handling should a) provide meaningful message to the one reading the message be they an end user or internal support person.  An exception giving the method, line number and generic error message ('null reference exception') does not convey that file d:\data\ABC.TXT is not acessible by the application due to permission issues.

    A generic exception.tostring() is not useful for supporting a production quality commercial product.  I've seen this in multiple different offshore and onshore written applications (asp.net, c#, vb.net).

    @Puneet - Your state that the foreach is self documentating ("the semantics and the intention of ForEach w.r.t being side-effect-full is explicit").  Self documenting code has been an ongoing and unresolved argument within computer software development since the early 1970s.  Self documenting code advocates often ignore the aspect that code with significant business logic documented in code comments is much easier and much lest costly to develop and maintain.

    General: Linq like constructs work well when you have a high enough level development tool to develop the system using process workflows.  Tools that let business analysts design the workflow,  develop business rules for it and generate code or state tables for developers work best.   They promote maintainablity at the business level first and at the lower level code secondly.

  • @Greg

    I don't think my comments are geared towards making the case for ForEach() because it is "self-documenting". I usually hear that term in the context of developers NOT wanting to write documentation for their methods:) and so it has a negative connotation for me.

    My argument is based purely on the expressed intent of ForEach, communicated via its naming AND its documented behavior. This behavior is exactly what one would expect if they did sequence.ForEach(..) and so there are no "unintended side effects" vis-a-vis doing foreach(...) { }.... just ovelooked ones.

  • I don't mind there's no foreach extension but in a lot of the situations where I end up using foreach is for applying a function on all the elements to create a result, (just like sum or average in effect does). An extension method for applying any function to all elements would be nice.

  • @Greg:

    @Aaron:

    Thanks for the interesting discussion, guys.

    "Steve, code to print all code lines neglects any error handling ... Try to effectivly determine what error happened and where it happened with your code example is difficult"

    Error handling can happily exist wherever you like. Here's the code again, for reference;

       01 string[] classNames = rootDir

       02    .GetDirectories()         // all directories, recursively

       03    .GetFiles("*.cs")         // all c# files

       04    .Select(GetFileContent)   // the file content as a string

       05    .SelectMany(content => content.Split('\n')) // all the lines

       06    .Where(IsClassDefinition) // class definition lines

       07    .Select(GetClassName)     // class names

       08    .ToArray();

    Let's take the GetFileContent() method called on line 04. I can quite happliy wrap that code in any error handling I like;

       string GetFileContent(string path)

       {

           try

           {

               ....

           }

           catch

           {

               // your error handling here.

           }

       }

    So that we don't lose any expressive power in terms of adding error handling code.

    Secondly, if there is an unhandled exception in that function, your stack trace reflects it, with something like;

       FileNotFound at

         GetFileContent in

         PrintClassNames ...

    Debugging is a little harder, but only really because of a generic debugging problem in visual studio. There's no way I can debug the return value of a function. So if I write;

       string GetFileContent(string path)

       {

           return System.IO.File.ReadAllText(path);

       }

    then there's no way to debug that result. That's annoying, and it's more pronounced when you write a lot of small functions like this, but it's more a problem with visual studio's debugging facilities. You can still put a breakpoint in the code happily; so if you want to debug the call to IsClassDefinition() on line 06, you can put a breakpoint right there and debug away.

    "I consider your new example a false dilemma, because the "sequential" version should be a 4-line method broken out into subroutines/methods."

    My version is *already* broken out into subroutines; the methods GetDirectories(), GetFiles(), GetFileContent(), IsClassDefinition(), GetClassName() are all standard C# functions. But the actual gluing together of these functions appears as a linear sequence of steps which filter, convert, and correlate data together. All I've done is glue old-skool functions together using standard forms of glue.

    I suppose this is the important thing, from my POV; we have real custom work we need doing (getting file content, parsing files) and then we have glue. foreach(in) is a type of glue. So is Select(). What I've found is that I often end up making glue and reusing it in different parts of my app. Almost all this glue code deals with collections of things; I make custom filters, iterators, and conversions.

    So the question arises, really -- let's say you discover some great new kind of glue. For instance, you realise that this code is exactly the same thing as a SQL CROSS JOIN;

       foreach(var a in listOfA)

       {

           foreach(var b in listOfB)

           {

               if (a.x == b.y)

               {

                   // do something involving A,B

               }

           }

       }

    Let's say this pattern appears fifteen times in different parts of your app. How do you avoid repeating yourself? How do you avoid the copying and pasting of this kind of structural or glue code?

    Myself, I'd write a CrossJoin() method with this signature;

       public static IEnumerable<Pair<T1, T2>> CrossJoin<T1, T2>(this IEnumerable<T1> seq1, IEnumerable<T2> seq2)

    And then call it like this;

       a.CrossJoin(b).Where( (a,b) => a.x == b.y );

    To my mind, I've introduced a useful abstraction (cross join) and been able to name it, call it throughout the application, and remove duplication.

    So my question to you guys, and anyone else; let's say you'd discovered this pattern that repeats over and over. How do you pull it out and reuse it without using a linq-like syntax?

  • Многие люди спрашивают меня – почему Microsoft не предоставила для последовательностей метод расширения

  • Alg&uacute;n tiempo despues de que diera mi opini&oacute;n sobre un potencial m&eacute;todo extensor

  • If you don't read Eric Lippert's Fabulous Adventures In Coding , you're really missing out on some great

  • If you don&#39;t read Eric Lippert&#39;s Fabulous Adventures In Coding , you&#39;re really missing out

  • Well, the Reactive Extensions team at Microsoft Research chose differently.

    They included 2 imperative operators in their extensions: Run (same as your ForEach), and Do.

    Check it out here: http://community.bartdesmet.net/blogs/bart/archive/2009/12/26/more-linq-with-system-interactive-the-ultimate-imperative.aspx

     Omer.

  • Great blog - I generally agree with the post, ForEach is very imperative. One of the only redeeming qualities of it though, is that it means that when I'm writing mostly functional code, the order is *consistent*: every time I have to use foreach it feels "out of place" compared to the order I've been writing all my other statements in like Select and Zip (as the above poster mentions, I use Rx's "Run" instead). Of course, you could argue that it *should* feel weird, since foreach is bad :)

    The other advantage is that you can use Method Groups, so for example:

    new[] {1,2,3,4,5}.Select(x => x*5).Run(Console.WriteLine)

  • The Reactive Extensions library provides the Run extension method, which I think is a better name.

  • This probably isnt directly related to answer but its pretty funny of what I have just found out.

    ForEach, delete (works)

    List<int> list = new List<int>(){ 1, 2, 3, 4, 5, 6};

    list.ForEach(x => {

       Console.WriteLine(x);

       list.Remove(x);

    });

    foreach, delete (crashes)

    // throws exception

    foreach (var x in list)

    {

       Console.WriteLine(x);

       list.Remove(x);

    }

    ForEach, insert (...)

    // goes in infinite loop...

    list.ForEach(x => {

       list.Add(1);

    });

    foreach, insert (crashes)

    // throws exception

    foreach (var x in list)

    {

       Console.WriteLine(x);

       list.Add(x);

    }

    So anyone who is talking here about mutability or different confusing layers etc, I think it is completely half implemented feature by Visual Team because enumeration will always cause problems if the collection is modified.

    Despite of arguments, I still see a no reason that ForEach should allow modifications, it is purely used for enumeration and it makes no difference whether its foreach(var item in x) syntax or x.ForEach(x=>{}) syntax.

    I disagree with Eric, I just see it simply that BCL team has implemented this feature on IEnumerable as well as list.ForEach is faulty.

    "Why" is completely subjective,for example, Silverlight has added complex hashing algorithms and left MD5 behind where else everywhere else we use MD5 so widely. Its more of how much anything is in demand and who is the one who chooses whether to include it or not in framework.

    There is no logical or philosophical reason at all for not having ForEach in IEnumerable. There are many such missing points which I think .NET will improve over time.

  • Also the List<T>.ForEach method is *uselessly* slower than the foreach(...) equivalent (since under the hood it calls a delegate as pure overhead).

    The key is the "uselessly" word. If there was *any* gain (like in readability, conveying semantics, whatever), worrying about it would be premature optimization.

    But we have code which is already harder to read, semantically misleading, causing side-effects... and it also is slower!

    What worries me is that people are calling this a functional approach and praising it for being easier parallelize (when it's not and should not be parallelized) and for being easier to optimize (when in reality it's just slower).

Page 3 of 3 (45 items) 123