“foreach” vs “ForEach”

“foreach” vs “ForEach”

Rate This
  • Comments 45

A number of people have asked me why there is no Microsoft-provided “ForEach” sequence operator extension method. The List<T> class has such a method already of course, but there’s no reason why such a method could not be created as an extension method for all sequences. It’s practically a one-liner:

public static void ForEach<T>(this IEnumerable<T> sequence, Action<T> action)
{
  // argument null checking omitted
  foreach(T item in sequence) action(item);
}

My usual response to “why is feature X not implemented?” is that of course all features are unimplemented until someone designs, implements, tests, documents and ships the feature, and no one has yet spent the money to do so. And yes, though I have famously pointed out that even small features can have large costs, this one really is dead easy, obviously correct, easy to test, and easy to document. Cost is always a factor of course, but the costs for this one really are quite small.

Of course, that cuts the other way too. If it is so cheap and easy, then you can do it yourself if you need it. And really what matters is not the low cost, but rather the net benefit. As we’ll see, I think the benefits are also very small, and therefore the net benefit might in fact be negative. But we can go a bit deeper here. I am philosophically opposed to providing such a method, for two reasons.

The first reason is that doing so violates the functional programming principles that all the other sequence operators are based upon. Clearly the sole purpose of a call to this method is to cause side effects. The purpose of an expression is to compute a value, not to cause a side effect. The purpose of a statement is to cause a side effect. The call site of this thing would look an awful lot like an expression (though, admittedly, since the method is void-returning, the expression could only be used in a “statement expression” context.) It does not sit well with me to make the one and only sequence operator that is only useful for its side effects.

The second reason is that doing so adds zero new representational power to the language. Doing this lets you rewrite this perfectly clear code:

foreach(Foo foo in foos){ statement involving foo; }

into this code:

foos.ForEach((Foo foo)=>{ statement involving foo; });

which uses almost exactly the same characters in slightly different order. And yet the second version is harder to understand, harder to debug, and introduces closure semantics, thereby potentially changing object lifetimes in subtle ways.

When we provide two subtly different ways to do exactly the same thing, we produce confusion in the industry, we make it harder for people to read each other’s code, and so on. Sometimes the benefit added by having two different textual representations for one operation (like query expressions versus their underlying method call form, or + versus String.Concat) is so huge that it’s worth the potential confusion. But the compelling benefit of query expressions is their readability; this new form of “foreach” is certainly no more readable than the “normal” form and is arguably worse.

If you don’t agree with these philosophical objections and find practical value in this pattern, by all means, go ahead and write this trivial one-liner yourself.

UPDATE: The method in question has been removed from the "Metro" version of the base class library.

  • @Aaron G: No, chained where's will NOT cause iteration to occur multiple times. Remember that all LINQ operations use delayed evaluation via "yield return". This means multiple predicate delegates will be called for each item, but you won't be looping multiple times.

    @Ben: Depending on your definition of "finalizes", there are other LINQ operations that finalize as well. Average, Aggregate, etc.  ForEach just finalizes with a void result, which is not possible in functional languages.

    Personally, I find the reasons given here for not having Enumerable.ForEach to be lacking.

    And that's fine with me; you might want to re-read the last line of the post. -- Eric

    1. Violation of functional principles: C# isn't a functional language. Further, even functional languages include looping constructs to deal with impure functional operations, such as input/output (for example, "loop" in Lisp/Scheme). If they can get their hands dirty, it's crazy not to do so in a non-functional language such as C#.

    2. No representational advantage/duplicate functionality: If you don't remove the other ForEach methods, then this is a cop out for a personal preference. The cat's already out of the bag, so at least be consistent.

    Everything stated in this blog post comes down to someone's preferences, and not a technical argument, IMHO.

    You are correct. That's why I twice called out that my objections were philosophical and not technical. Apologies if that was in some way unclear. -- Eric

  • The problem with implementing myself (which I have done), is that now EVERYONE implements this. Not very DRY.

  • Cost vs coolness

    Aaron is correct in that Linq (i.e., lisp like) expressions are much less readable and maintainable.    Lisp-like constructs (LINQ) are extremely powerful but do not scale well for extremly complex systems.  They work in environments where one focuses on business/logic rules for the system and not on writing code such as generating logic tables and autogenerating code from the tables.

    Code should be written with a total cost over the expected product lifetime in mind and not just the speed of which the orginal developer progresses.    This is why simpler constructs, algorithms last longer than higher level, trendy, pattern of the week code.

    The next developer to take over the code should be in your concern when writing or maintaing a system.

  • I have had this debate personally many times. I tend to agree with Eric, that having multiple ways of accomplishing the same thing is inherently counter-productive; even if there are some small gains in being able to call ForEach functionally, they do not overcome that inherent downside. However, I am not sure that having thousands of developers implementing their own nearly-identical-but-subtly-different ForEach method is better in the long run than biting the bullet and creating a standard method in the platform.

    What bothers me most though is that there wasn't more of an effort made to come to a philosophical agreement when considering the ForEach method on List<T>. IMHO, putting that method on List<T> was the worst possible decision; either it should have been an extension method on IEnumerable<T>, or it should have been left out together. The "different teams" argument doesn't fly with me; .NET is a platform, and like it or not, its components ARE tightly coupled with each other. Someone needs to have the responsibility of maintaining consistency across the platform, so that things like this don't happen.

  • Ike: That is not correct; extension methods have nothing to do with closures.

  • Eric,

    I understand that the extension methods such as Where(...), Select(..) etc do not prevent me from changing state of free variables (or having other side effects). Since you guys do not (or can not) enforce that philosophical underpinning of functional programming, why the objection to inclusion of ForEach due to its "Explicit desire" to have such side-effects.

    Your argument is "you cannot prevent some bad behaviours, therefore you should encourage them" ? -- Eric 

    I would contend that the fact that ForEach expressly admits that it wants to change state unlike Where and Select is a good thing and makes its semantics obvious.

     

  • There is one thing you overlook in your comparison.

    I also overlook that the method version can take a method group as an argument. -- Eric

    When you compare ForEach() with foreach, your ForEach() example for some reason includes the type. Which can be implied of course.

    If you let the compiler infer the type you get nice intellisense advantages too. Sure you can get those with a foreach block too. But I think

    ".F Tab ( c => c."

    is a lot easier to write than

    "foreach (var c in sequence) c."

    IMHO Intellisense is super important, and shouldn't be left out of the debate.

    That said it is trivial to add the method. So I just add it whenever I need it. Problem is that I do this in every project because I don't have a library I share across every project... and after a while all that trivial coding starts to add up.

  • I don't like mixing in IntelliSense in the debate. Sure it's nice but the language shouldn't be designed around some external tool. The tool should be designed around the language.

  • @AaronG: "First of all, I think it's more difficult to read than the vanilla non-Linq version"

    @Greg: "Aaron is correct in that Linq (i.e., lisp like) expressions are much less readable and maintainable."

    Both of these are really style questions. It's not the style you'd see in older C# programs, but the style is fantastically productive and perfectly efficient. The alternative is complex nesting, which I find much harder to follow. Consider these two pieces of pseudocode;

       take all the lines

       remove comments

       remove blank lines

       print what's left

    vs

       for every line:

         if it is not a comment:

           if it is not blank:

             print it.

    For me, the simple sequence is easier to get my head around, and that's what you see with

        items

          .where

          .where

          .foreach

    Nesting makes it harder for me to follow the intent of the code.

    For larger processes this is especially pronounced. Let's say you have a dozen steps to transform your source items into some target. You have a choice between twelve linear steps using Linq, and twelve nested 'for' and 'if' blocks without.

    For instance; let's say you're writing a code scanner and you start with a directory containing C# source code. What you want out is a list of class names defined in C# files in that directory structure. So you go;

       string[] classNames = rootDir

         .GetDirectories()         // all directories, recursively

         .GetFiles("*.cs")         // all c# files

         .Select(GetFileContent)   // the file content as a string

         .SelectMany(content => content.Split('\n')) // all the lines

         .Where(IsClassDefinition) // class definition lines

         .Select(GetClassName)     // class names

         .ToArray();

    The alternative is something like this;

       List<string> classNameList = new List<string>();

       // find all the directories, recursively

       foreach(var dir in GetDirectories(rootDir))

       {

           // find all the C# files in the directory

           foreach(var file in GetFiles(dir, "*.cs"))

           {

               // the file content as a string

               var content = GetFileContent(file);

               // all the lines in the file

               var lines = content.Split('\n');

               foreach(var line in lines)

               {

                   // is it a class definition line

                   if (IsClassDefinition(line))

                   {

                       // get the class name

                       var className = GetClassName(line);

                       classNameList.Add(className);

                   }

               }

           }

       }

       string[] classNames = classNameList.ToArray();

    Now, I've been programming using the former style for a while now, and I find it significantly easier to understand. Given the choice of a linear, stepwise process, or a large number of nested 'if' and 'for' blocks, I'll take the linear steps.

    ---------

    @AaronG: "secondly the chained "wheres" will cause the entire array to be iterated twice, unless the compiler performs some kind of optimization on those particular extension methods"

    As wekempf has commented, this isn't true -- the second Where only receives items that passed through the first Where. You get two interations, but the second is on a subset. Iteration is so fast that programming differently for it would be a premature optimisation.

  • Eric, seems your argument is more pedantic than semantic.

    Why have the foreach method at all then? For would suffice, the following code would work perfectly well:

    Perhaps you missed the bit where I said that sometimes the benefit of the improvement in textual representation pays for the cost of having two ways to do something. -- Eric

               int[] loop = new[] { 4354, 234, 23, 34, 3, 4 };

               var e = loop.GetEnumerator();

               for (e.Reset(); e.MoveNext(); )

               {

                   Console.WriteLine(e.Current);

               }

    Or we could just use while and get rid of For as well.

    Of cource the reason we use foreach is that it is easier.

    The same goes for the Linq version of ForEach. It makes code simpler and easier to read.

    loop.ForEach(i => Console.WriteLine(i));

    I use this all the time, as I find it very useful, it seem more natural when working on a collection, as I don't need to output a collection and then run a foreach on it.

    Great. Perhaps you missed the bit at the end where I said that if you find this useful and don't share my objections, you should go ahead and use it. -- Eric

    Your side-effect argument is also a little weak, can you explain how ToArray() is side-effect free in that case?

    No, because I don't understand the question. -- Eric

    I hae a direct link to the iterators here if any one wants a full example: http://code.google.com/p/ximura/source/browse/trunk/Ximura%20Base/Helper/Linq/Linq.cs

    There are ForEach, ForIndex, and ForBigIndex iterators.

    Paul Stancer

  • @Steve Cooper: "You get two interations (sic), but the second is on a subset. Iteration is so fast that programming differently for it would be a premature optimisation."

    No, there's a single iteration. The first Where looks at the first item.  If it's accepted, "yield return" creates a continuation point and immediately passes control to the second Where. The second Where then makes the same decision, and eventually returns to the continuation point in the first Where, where the second item is processed.  This means we "loop" only once, and pass the results up through the chain of operations, through the use of continuations created by "yield return".  This construct is highly efficient... at least as efficient as you'd code imperatively.

  • One way to avoid the philosophical objection is to only manipulate sequences of Actions. For example, we can already do this:

       var purge = new DirectoryInfo("c:\\stuff")

                   .GetFiles()

                   .Where(file => (file.Attributes == FileAttributes.Normal))

                   .Select(file => new Action(file.Delete));

    The "purge" is now a list of actions to be executed later. And I can tack on a few actions to send notification emails after the purge:

       string[] recipients = new[] {"fred@nowhere.com", "sid@somewhere.com"};

       purge = purge.Concat(

           recipients.Select(recipient =>

               new Action(() => new SmtpClient().Send("do-not-reply", recipient, "Hey!", "Finished purge"))));

    The only thing we need a new extension method for is this:

       Action doPurge = purge.Aggregate();

    Which can be implemented thusly:

       public static Action Aggregate(this IEnumerable<Action> source)

       {

           Action combined = null;

           foreach (Action action in source)

               combined += action;

           return combined;

       }

    Just turns the sequence of Actions into a single multi-cast delegate.

    Note that up to this point, there have been no side-effects whatsoever. We've only been manipulating lists of actions, not actually executing them. So nothing we've done can possibly invalidate that philosophical objection. :)

    The last thing we do is:

       doPurge();

    Which is where all the side-effects happen at last, and who could possibly be surprised by side-effects from a statement like that?

  • Steve, code to print all code lines neglects any error handling.  Try to effectivly determine what error happened and where it happened with your code example is difficult and greatly increases the cost to support and maintain in an ongoing basis.

    Your example:

    string[] classNames = rootDir

        .GetDirectories()         // all directories, recursively

        .GetFiles("*.cs")         // all c# files

        .Select(GetFileContent)   // the file content as a string

        .SelectMany(content => content.Split('\n')) // all the lines

        .Where(IsClassDefinition) // class definition lines

        .Select(GetClassName)     // class names

        .ToArray();

    Errors/exceptions that may happen:

       - permission error at any one of

               - directory

               - file

       - Memory errors (file too big, too many lines)

       - File content errors (binary file with .CS extension)

       - Generic .net errors

    A generic exception.tostring() is not useful for supporting a production quality commercial product.

    This is what is lost in making calls a.b(w).c(x).d(y).e(z) since any particular call in the chain may fail or (much worse) induce side effects that affect later calls in the chain.  It was used early on in C++ systems but fell out of favor for that reason.

    The sequential code is much easier to test and maintain if you factor out into a new function the code to open a CS file and print its contents.

    For compactness, you could even do the same thing in the cmd shell without having to ship a C# application (using FOR, findstr, grep and if adventurous (SED)).

  • @Greg - the example you quote from Steve does not have side effects. It only performs "read" operations and makes copies, so it is perfectly safe for it to only partially execute.

    Regarding exceptions, if an exception is thrown at any stage, that exception will capture a call stack that can be logged, and which precisely pinpoints where the error occurred; if customised handling or recovered is needed (unlikely, given that a partial result would be of no use) then the exception object can be queried for information.

    What would be gained by breaking it up into separate statements? Nothing, from an exception handling perspective, unless you wrapped each step in a separate try/catch. And even then, if every catch simply logged error information and aborted the whole operation, then nothing at all would be gained but readability would be shot to pieces.

    Finally, your suggestion of a pipeline of Unix tools would be logically (and in terms of maintenance and robustness) no different from what Steve's example does within C#, except it wouldn't be statically typed. He has simply constructed a similar pipeline of operations on statically typed objects.

  • @Steve:  I consider your new example a false dilemma, because the "sequential" version should be a 4-line method broken out into subroutines/methods.  As others have pointed out, the Linq-ified version is also impossible to debug in spite of being easier to read.  But even ignoring those two issues, how does it justify a "ForEach" extension method instead of a regular foreach loop?  If ForEach returns void (otherwise it would be a Select), then it would always have to be the last statement in a Linq construct, so there's no way for it to eliminate nested loops or conditionals or other such verbosity.  At most, it would eliminate a little bit of whitespace and possibly a pair of braces.

    I'm not arguing against Linq here - I've been using it since the first CTPs - I just think that too many people are using it because it's slightly easier for them to write, or just because they can, without paying any heed to testing or maintenance.  There are times when a functional style is appropriate, and there are times when object-oriented or straight procedural styles are more appropriate.

Page 2 of 3 (45 items) 123