Continuing to an outer loop

When you have a nested loop, sometimes you want to “continue” the outer loop, not the inner loop. For example, here we have a sequence of criteria and a sequence of items, and we wish to determine if there is any item which matches every criterion:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      // No point in checking anything further; this is not
      // a match. We want to “continue” the outer loop. How?
    }
  }
  match = item;
  break;
}

There are many ways to achieve this. Here are a few, in order of increasing awesomeness:

Technique #1 (ugly): When you have a loop, the “continue” statement is essentially a “goto” which branches to the bottom of the loop, just before it does the loop condition test and pops back up to the top of the loop. (Apparently when you spell “goto” as “continue”, the “all gotos are all evil all the time” crowd doesn’t bother you as much.) You can of course simply make this explicit:

match = null;
foreach(var item in items)
{
  foreach(var criterion in criteria)
  {
    if (!criterion.IsMetBy(item))
    {
      goto OUTERCONTINUE;
    }
  }
  match = item;
  break;
OUTERCONTINUE:
  ;
}

Technique #2 (better): When I see a nested loop, I almost always consider refactoring the interior loop into a method of its own.

match = null;
foreach(var item in items)
{
  if (MeetsAllCriteria(item, criteria))
  {
    match = item;
    break;
  }
}

where the body of MeetsAllCriteria is obviously

foreach(var criterion in criteria)
  if (!criterion.IsMetBy(item))
    return false;
return true;

Technique #3 (awesome): The “mechanism” code here is emphasizing how the code works and hiding the meaning of the code. The purpose of the code is to answer the question “what is the first item, if any, that meets all the criteria?” If that’s the meaning of the code, then write code that reads more like that meaning:

var matches = from item in items
              where criteria.All(
                criterion => criterion.IsMetBy(item))
              select item;
match = matches.FirstOrDefault();

That is, search the items for items that meet every criterion, and take the first of them, if there is one. The code now reads like what it is supposed to be implementing, and of course, if there are no loops in the first place then there’s no need to “continue” at all!

The biggest thing I love about LINQ is that it enables a whole new way to think about problem solving in C#. It’s gotten to the point now that whenever I write a loop, I stop and think about two things before I actually type “for”:

* Have I described anywhere what this code is doing semantically, or does the code emphasize more what mechanisms I am using at the expense of obscuring the semantics?

* Is there any way to characterize the solution to my problem as the result of a query against a data source, rather than as the result of a set of steps to be followed procedurally?

  • @Anthony - I can just repeat what Eric already suspected; you're only measuring the creation of the query, not its actual execution. If you for example do:

    var myList = from i in new int[] { 1, 2, 3, 4 } where i < 3 select i;

    then myList, at least from my understanding, does not contain an int list with { 1, 2 }, but instead the query-to-be-executed itself. The actual resulting list's elements are only retrieved (the query is only executed) when you actually loop through the elements of myList. I think you should include that looping in your benchmark, or force the query to be evaluated by calling a .ToArray() on myList.
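The deferred execution described in this comment can be seen directly by counting how often the predicate runs. A minimal sketch (the side-effecting where clause is purely for demonstration, not something to do in real code):

```csharp
using System;
using System.Linq;

class DeferredExecutionDemo
{
    static void Main()
    {
        var source = new[] { 1, 2, 3, 4 };
        int evaluations = 0;

        // Building the query executes nothing; it only captures the pipeline.
        var query = from i in source
                    where ++evaluations > 0 && i < 3
                    select i;

        Console.WriteLine(evaluations);               // 0 - nothing has run yet

        // Enumerating (here via ToArray) is what actually executes the query.
        var results = query.ToArray();
        Console.WriteLine(evaluations);               // 4 - predicate ran once per element
        Console.WriteLine(string.Join(",", results)); // 1,2
    }
}
```

This is why benchmarking only the query construction measures almost nothing; forcing enumeration with .ToArray() or .ToList(), as suggested above, is what measures the real work.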

  • @Chris J and @Eric - I believe I'm measuring it post-execution,  but I do not wish to clog the comments section with my code. I've forwarded the code I'm using as a basis for my observations to Eric via the contact link, so perhaps he can point out the errors of my reasoning or any compiler optimizations that are happening that I'm not aware of.

    The quick summary is that for the LINQ version, I'm creating the query, calling .ToList(), and then getting the count of list and outputting that to the screen along with the time duration. In a quick test not in the code sent to Eric, I even wrote the first few matches to the screen from the LINQ version to ensure execution has occurred, all still very fast. I'll leave it up to Eric to possibly tell us how woeful my original nested loops must have been!

  • @Anthony P - this is just a guess, but I think the information you are leaving out that is confusing people is that you are replacing 2 nested for loops with a LINQ Join statement.  Basically, you created your own Join logic using nested loops and then tried the LINQ Join.  The LINQ join will be amazingly faster because it does not use nested loops, it creates a keyed hash table for the second enumeration and does a lookup.  For 2 lists of lengths m and n, your execution time is O(m*n) and the LINQ Join method is O(m+n).  Did I guess correctly?

    Wow! You are exactly right. Nice psychic debugging! I've taken a look at Anthony's code and in fact this is not an apples-to-apples comparison; the LINQ version optimizes for speed and uses a lot of extra space; the foreach version optimizes for small use of space and is much slower. If you do an apples-to-apples comparison, no matter how you slice it, there is in many cases a large performance penalty for using the higher level of abstraction, particularly if you use it naively.

    But making the code look more like the business logic lets you then abstract this stuff away. It lets you mock up queries against in-memory objects that then work against databases. It lets you optimize things at the query level. It lets you write code that is flexible and reusable and easily understood, and those things are often worth some performance penalty. -- Eric

  • Well, yes, it was stated that it was a nested loop producing a third list, and a LINQ join version producing the third list. I do not believe I left any of that out of my original description. But you're right: if LINQ actually goes about it in a different way than the nested loops do, then that explains the difference, but it also contradicts what Eric said: "Behind the scenes, it's doing all the same loops as the original program, it's just providing a more convenient syntax." I guess the truth is it's doing the same loops as an entirely more efficient version of the original might have done, had I implemented such a feature!
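To make the difference being discussed concrete, here is a sketch (illustrative names and data, not Anthony's actual code) of a nested-loop join next to the hash-based approach that Enumerable.Join uses internally via its lookup. The hash version trades extra memory for O(m+n) time instead of O(m*n):

```csharp
using System;
using System.Collections.Generic;

class JoinComparison
{
    // O(m*n): compare every element of 'left' against every element of 'right'.
    static List<int> NestedLoopJoin(int[] left, int[] right)
    {
        var matches = new List<int>();
        foreach (var l in left)
            foreach (var r in right)
                if (l == r)
                    matches.Add(l);
        return matches;
    }

    // O(m+n): build a hash set of 'right' once, then probe it per element of 'left'.
    // This is essentially what Enumerable.Join does with its internal lookup,
    // though a real join produces one result per matching pair; this sketch
    // ignores duplicates for simplicity.
    static List<int> HashJoin(int[] left, int[] right)
    {
        var lookup = new HashSet<int>(right);
        var matches = new List<int>();
        foreach (var l in left)
            if (lookup.Contains(l))
                matches.Add(l);
        return matches;
    }

    static void Main()
    {
        var a = new[] { 1, 3, 5, 7 };
        var b = new[] { 3, 4, 5, 6 };
        Console.WriteLine(string.Join(",", NestedLoopJoin(a, b))); // 3,5
        Console.WriteLine(string.Join(",", HashJoin(a, b)));       // 3,5
    }
}
```

Same answer, very different cost curves as the lists grow, which is consistent with the timing differences reported above.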

  • I personally like option 2 and the option as proposed by Mauvaisours (at the start of the comments) more than option 3. Yes, option 3 makes it clear what you want rather than how you want to get it, but precisely because of that, it hides the fact that it's going to do a lot of work when executed. What you are doing is an O(|items|*|criteria|) operation. Having clear loops makes it easy to recognise that you'll be busy executing that loop for a while if |items| and/or |criteria| is very large. The LINQ hides that to some degree. It's just like you expect the 'get' part of a property to return fast, while a GetSpecificStuff() tells you that this might take a while to return.

  • I want to thank Eric for the improved code he sent me. He's right, the smarter version of the foreach is fast: faster than the LINQ join I was using (which is still pretty fast, all told). The nested foreach is orders of magnitude slower than either, and I suppose the LINQ equivalent of that (which is matching in a where clause instead of using a join) is *ridiculously* slow. In fact, it's still executing, and it has been several minutes since the program started! So, yes, my comparison was nowhere near apples-to-apples, and I'm glad I at least knew enough to do the join instead of the where. The foreach/hash is another story, unfortunately.

  • @Anthony, DRBlaise and Eric - It's been interesting to follow the performance discussion for the int[]-to-int[] join scenario. In short, what LINQ offers is to compress loop/hash code that is more complex than the OP's nested-loop example into yet another short query, possibly at a nominal performance penalty. I think this still reinforces the OP. Thank you all for the commentary.

  • Wouldn't the easiest solution be to loop over criteria outside and items inside?
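Swapping the loops is possible, but it changes the bookkeeping: with criteria on the outside you can no longer bail out per item, so you instead whittle down a set of surviving candidates one criterion at a time. A sketch under those assumptions (names and data illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CriteriaOutsideDemo
{
    static void Main()
    {
        // Stand-ins for the article's criteria and items.
        var criteria = new List<Func<int, bool>> { i => i > 2, i => i % 2 == 0 };
        var items = new[] { 1, 2, 3, 4, 5, 6 };

        // With criteria in the outer loop, each pass filters the surviving
        // candidates instead of testing one item against all criteria.
        IEnumerable<int> candidates = items;
        foreach (var criterion in criteria)
            candidates = candidates.Where(criterion).ToList();

        // First surviving candidate, if any (null mirrors the article's match = null).
        var match = candidates.Select(i => (int?)i).FirstOrDefault();
        Console.WriteLine(match); // 4
    }
}
```

Note that this runs every item through the first criterion before the second is ever consulted, so unlike the item-outside version it cannot stop at the first full match; for finding a single match over a large item set, the original loop order is usually preferable.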

  • I perfectly agree with Stilgar: There is no reason to hate GOTOs nowadays.

    This GOTO-hater issue all comes from the times when it was possible to jump out of one function into the middle of another one and continue there, when one could accidentally have two identical labels in the script and the GOTO would just take the first one found, and so on. In the context of such scripting languages I also would recommend:

    Never use a GOTO!

    But those times are gone. GOTOs are strictly compiler-checked, I cannot jump across methods, and even within a method I have to respect the code blocks. I can't accidentally have a label twice or misspelled. And GOTO is extremely fast.

    I personally would use:

    "Technique #1" (solid, medium readability, fast)

    - for high performance loops

    - if I cannot easily extract the inner loop

    "Technique #2" (solid, good readability, medium fast)

    - whenever I can

    "Technique #3" (sophisticated, with bad readability, unknown performance)

    - when I want only .NET professionals to understand the code

    - to show off

    It may sound funny, but LINQ also has the huge disadvantage of often not being available (I have customers who work with .NET 2.0, and one has a project in a discontinued .NET language: although it's .NET 3.5, it has only .NET 1.1 language integration. No generics, no nullables, no LINQ, etc.).

    America may be different from Europe, but here the motto often is "never touch a running system". People often don't migrate their projects to a higher .NET version if they don't have to. They won't pay an external programmer a huge amount of money to rewrite 100,000 lines of working code only to have the same functionality in a modern language - even if I need three or more times longer for every extension I have to implement for them, not to mention the time I need to find a bug without an IDE with an integrated debugger.

    Anyway, here my conclusion in short:

    GOTOs are not evil anymore, but LINQ still often is!

    PS: Just playing the devil's advocate, I like LINQ in most cases.

    PPS: One more to top it: I especially like LINQ in conjunction with VB.NET, for example when working with XML and (XML-)namespace-imports that are simply not supported by the C#-LINQ provider.
