Continuing to an outer loop


When you have a nested loop, sometimes you want to “continue” the outer loop, not the inner loop. For example, here we have a sequence of criteria and a sequence of items, and we wish to determine if there is any item which matches every criterion:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      // No point in checking anything further; this is not
      // a match. We want to “continue” the outer loop. How?
    }
  }
  match = item;
  break;
}

There are many ways to achieve this. Here are a few, in order of increasing awesomeness:

Technique #1 (ugly): When you have a loop, the “continue” statement is essentially a “goto” which branches to the bottom of the loop, just before it does the loop condition test and pops back up to the top of the loop. (Apparently when you spell “goto” as “continue”, the “all gotos are all evil all the time” crowd doesn’t bother you as much.) You can of course simply make this explicit:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      goto OUTERCONTINUE;
    }
  }
  match = item;
  break;
OUTERCONTINUE:
  ;
}

Technique #2 (better): When I see a nested loop, I almost always consider refactoring the interior loop into a method of its own.

match = null;
foreach(var item in items)
{
  if (MeetsAllCriteria(item, criteria))
  {
    match = item;
    break;
  }
}

where the body of MeetsAllCriteria is obviously

foreach(var criterion in criteria)
  if (!criterion.IsMetBy(item))
    return false;
return true;
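
Spelled out as a complete method, with a hypothetical Item type and a Criterion type exposing IsMetBy standing in for whatever the real types are, that might be:

static bool MeetsAllCriteria(Item item, IEnumerable<Criterion> criteria)
{
  foreach(var criterion in criteria)
  {
    // One failed criterion disqualifies the item immediately.
    if (!criterion.IsMetBy(item))
      return false;
  }
  // Every criterion was met.
  return true;
}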

Technique #3 (awesome): The “mechanism” code here is emphasizing how the code works and hiding the meaning of the code. The purpose of the code is to answer the question “what is the first item, if any, that meets all the criteria?” If that’s the meaning of the code, then write code that reads more like that meaning:

var matches = from item in items
              where criteria.All(
                criterion=>criterion.IsMetBy(item))
              select item;
match = matches.FirstOrDefault();

That is, search the items for items that meet every criterion, and take the first of them, if there is one. The code now reads like what it is supposed to be implementing, and of course, if there are no loops in the first place then there’s no need to “continue” at all!

The biggest thing I love about LINQ is that it enables a whole new way to think about problem solving in C#. It’s gotten to the point now that whenever I write a loop, I stop and think about two things before I actually type “for”:

* Have I described anywhere what this code is doing semantically, or does the code emphasize more what mechanisms I am using at the expense of obscuring the semantics?

* Is there any way to characterize the solution to my problem as the result of a query against a data source, rather than as the result of a set of steps to be followed procedurally?
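
For example, here is the kind of rewrite those questions tend to prompt. This is just a sketch, with a hypothetical sequence of orders, each having an IsPaid flag and an Amount:

// Mechanism-first: the code emphasizes *how* the total is computed.
decimal total = 0;
foreach(var order in orders)
{
  if (order.IsPaid)
    total += order.Amount;
}

// Meaning-first: the code says *what* is being computed.
decimal total2 = orders.Where(o => o.IsPaid).Sum(o => o.Amount);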


  • It appears the LINQ code is requesting a list of matches, and then choosing one.  That result may be different from the first found match.

    Is the “matches” list ever realized?  A big, or slow, list of items will perform badly.

    If IsMetBy() has side effects, can we see the whole list of items being tested?

    If IsMetBy() has no side effects, can the compiler “see” the optimization?
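
    To make the first two questions concrete, a sketch like this (with a deliberately chatty stand-in for the criteria check) shows exactly how much of the sequence gets enumerated:

    // Deferred execution: Where builds no list, and FirstOrDefault
    // stops pulling items at the first hit.
    // (Assumes using System; using System.Linq;)
    var items = Enumerable.Range(1, 1000);
    var matches = items.Where(item =>
    {
        Console.WriteLine("testing " + item);  // observe each test
        return item % 7 == 0;                  // stand-in criteria check
    });
    var match = matches.FirstOrDefault();  // prints "testing 1" through "testing 7", then stops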

  • For me, the interesting part of this post is the use of the All method as basically a short-circuiting AND over a list of predicates.  I never thought of using the All function for that purpose and had created my own predicate concatenation methods.  Now I realize that All can be used for AND and Any can be used for OR.
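
    For example (a sketch, with Predicate<int> standing in for whatever the real criteria are; assumes using System, System.Collections.Generic, and System.Linq):

    var criteria = new List<Predicate<int>>
    {
        x => x > 0,
        x => x % 2 == 0,
        x => x < 100
    };
    bool meetsAll = criteria.All(c => c(42));  // AND: true, every predicate passes
    bool meetsAny = criteria.Any(c => c(-3));  // OR: true, x < 100 passes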

    Thanks again, Eric, for a thought provoking post!

  • Inspired by this article, I just replaced a thorny nested foreach loop with a query in my project. You rock!

    Quick question: I don't think there's any way to remove this loop, but I could be wrong:

    IEnumerable<Node> EnumerateAncestors(Node node)
    {
        for (Node n = node; n != null; n = n.Parent())
            yield return n;
    }

  • I would far and away prefer technique 2.  

    Technique 3 may be easier to understand in terms of *why* it gets the right answer, but it's terrible in terms of *how* it gets to that answer.  Get an object representing all matches and then take just the first one?  Terribly inefficient!

    Yes, I know that LINQ is lazy, and the LINQ code in technique 3 isn't significantly slower than looping, but it *looks* as if it would perform poorly, and that's a problem.  This is my fundamental problem with LINQ.  We developers are bad enough about writing efficient code -- LINQ just makes it that much harder to recognize efficient code when we see it.

    All coding is working with abstractions. How does your argument not also apply to, say, local variables in C? Used to be before C that you got to decide which locals were enregistered and which were not; now the C compiler does it for you and you cannot recognize what is efficient use of registers and what is not. Would you rather write everything in assembler? -- Eric

  • I have to disagree with Alan: Clarity and robustness are usually far more important than micromanaging efficiency. And if you *must* micromanage efficiency, we have C and C++, which force you to do so.

    No one *ever* chose a language such as C# because garbage collection is more efficient at managing memory than direct memory management. Just by choosing C#, you've decided to trust (or be indifferent to) the language/environment developers to do the grunt work acceptably well. Once you trust the environment to do memory management, you might as well trust it to do queries.

    In more than 30 years of application development, I don't think I've ever seen a substantial performance problem caused by sub-optimal micro-management. Every one of the substantial performance problems has been caused by architectural or design failures. And I've been persuaded that while LINQ by no means makes it impossible to create a bad architecture or design, it does make it easier overall to implement a good architecture and design, and makes those elements *much* clearer.

  • Larry H.: Your EnumerateAncestors is an unfold operation; using Rx:

    return EnumerableEx.Generate<Node, Node>(node, n => n != null, n => n, n => n.Parent());

    (not sure if type inference would work on that)

  • @Alan - I understand your point about LINQ or any lazy execution having the potential to lead to inefficient code, but it also has the potential to lead to more efficient code and better understood code.  It is a tool that must be used wisely.  In this particular code snippet, I believe it improves things "awesomely."

    The actual potential performance bottleneck in this code snippet is the same with all three techniques: the criteria variable.  If the criteria variable has lazy evaluation and slow enumeration, then it could be better to cache it in a list before using it.  So how is the caching question answered?  If this code snippet were a method with the signature below, should we cache criteria?

    T FirstMatch<T>(IEnumerable<T> source, IEnumerable<Predicate<T>> criteria)

    Definitely not: "premature optimization is the root of all evil."  However, it would be great if the documentation of the method pointed out that criteria is a potential performance bottleneck and would potentially be enumerated once per item in source.  This is something I believe should be added to the documentation of the Microsoft LINQ Enumerable methods such as Join, Intersect, etc.  The documentation should make clear the performance characteristics of these methods and the potential bottlenecks.
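
    A sketch of what that might look like on the hypothetical FirstMatch (assumes using System, System.Collections.Generic, and System.Linq):

    /// <summary>
    /// Returns the first item in source that meets every criterion,
    /// or default(T) if there is none.
    /// Performance note: criteria may be enumerated once per item in
    /// source, so a slow or lazily computed criteria sequence should
    /// be cached in a list by the caller first.
    /// </summary>
    static T FirstMatch<T>(IEnumerable<T> source, IEnumerable<Predicate<T>> criteria)
    {
        return source.FirstOrDefault(item => criteria.All(c => c(item)));
    }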

  • @Alan/@DRBlaise - I think that LINQ will (over the long run) absolutely result in more efficient and scalable code than traditional looping/iteration constructs. Here's why.

    As Moore's law morphs from "more cpu cycles / unit-time" to "more cores per chip", the importance and value of a query abstraction layer like LINQ will increase. Taking a query written in procedural, a-follows-b form and parallelizing it is incredibly hard. It generally requires human understanding of the unstated conditions and constraints; a machine transformation is extremely difficult, if not impossible. By contrast, a semantic query is much easier to parallelize with something like PLINQ:

    var matches = from item in items.AsParallel()
                  where criteria.All(
                    criterion => criterion.IsMetBy(item))
                  select item;
    match = matches.FirstOrDefault();

    Just as .NET already makes it possible to optimize your code at runtime (via JIT) for a particular instruction set (x86, x64, etc.), I imagine that with things like PLINQ and TPL it will soon be able to optimize it for a particular multi-processing architecture (# of cpu cores). In fact I would be surprised if there aren't significant efforts going on now to figure out ways to make it even easier to solve such problems more transparently (i.e. without manually injecting the AsParallel() call).

    I think the improvements in rapid development, maintainability, and comprehension that LINQ adds to C#/.NET development will be paying dividends to those developers that understand it and apply it in the right contexts.

  • For the concise crowd:

    var match = items.FirstOrDefault(item => item.All(criterion => criterion.IsMetBy(item)));

  • Let me try that again:

    var match = items.FirstOrDefault(item => criteria.All(criterion => criterion.IsMetBy(item)));

  • Very nice idea.  I like it.  You are right about it changing the way you think to solve problems.  It will be interesting to see how far LINQ changes high level language design.

  • Some interesting observations there, Anthony P. Could you clarify your comment below?

    "Introduce a nested loop where I was going through that same list of 80000 integers and comparing it with another list of 66,667 integers (or whatever the number). All integers that were equal between the two lists were added to a third. Incidentally, there were 53,333 matches. It tooks the nested loops 33,319 milliseconds to arrive at that result. LINQ did it in 53."

    Is 53 meant to be 53 milliseconds (3 orders of magnitude faster?) or 53,000 milliseconds (approaching half the speed?).

  • This is the one and only time I've been jealous of PHP developers.  As I remember, in PHP the syntax is simply 'continue 2' (or 3, 4, 5, etc.).  Now THAT's a nifty language feature.

  • Sebastian: You're not precisely replacing a foreach with a query per se, but quite interesting nonetheless. Thanks!

  • @Trees, 53 milliseconds. Blazingly fast when compared to nested for loops in the admittedly contrived scenario of matching thousands of ints against other thousands of ints.
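
    That gap makes sense if the LINQ version used something like Intersect or a join: those operators build a hash set internally, so the work grows with the sum of the two list sizes rather than their product. Roughly (a sketch, with list1 and list2 standing in for the two lists):

    // Nested loops: on the order of 80,000 * 66,667 comparisons.
    var matchesSlow = new List<int>();
    foreach (var a in list1)
        foreach (var b in list2)
            if (a == b) { matchesSlow.Add(a); break; }

    // Intersect builds a hash set from one list and streams the other
    // through it: roughly one hash lookup per element.
    var matchesFast = list1.Intersect(list2).ToList();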
