Recently I was looking for a nice way to process large text files while keeping memory consumption at a minimum.

I also wanted to keep the input logic strictly separated from the filtering and processing components. I figured this is similar to (but simpler than) what LINQ does in principle. I came across the C# yield keyword, which seemed ideal for my intention.

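yield is used inside an iterator block, which is typically the body of a method that returns IEnumerable<T> (IEnumerator<T> and the non-generic variants work as well). The result is usually consumed with foreach.

It is a contextual keyword rather than an operator: the compiler turns the method into a small state machine that produces one element at a time, on demand, so the whole sequence never has to be held in memory at once.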

I’ve put together an example where a large text file is read line by line and filtered so that only lines containing a specific search string are returned as an enumeration of strings.

The key requirement is that the I/O code must not contain any filter logic, even though the lines cannot all be read into memory at once.

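    using (TextReader reader = File.OpenText("AwesomeHuge.txt"))
    {
        // Nothing is read yet: both calls only set up lazy sequences.
        var lines = ReadLinesFromFile(reader);
        var filteredLines = FilterLines(lines, "SearchString");

        // Iterating pulls one line at a time through the reader and the filter.
        foreach (var fl in filteredLines)
            Console.WriteLine(fl);
    }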

The ReadLinesFromFile method (I/O) looks straightforward:

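    private IEnumerable<string> ReadLinesFromFile(TextReader reader)
    {
        while (true)
        {
            var line = reader.ReadLine();

            // ReadLine returns null at the end of the input: end the sequence.
            if (line == null)
                yield break;

            // Hand one line to the caller, then pause here until the next one is requested.
            yield return line;
        }
    }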
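To get a feeling for what yield saves you from writing, here is a rough sketch of a hand-written enumerator that does the same job as ReadLinesFromFile. The class name LineEnumerator and the demo class are made up for illustration, and this is a simplification, not the code the compiler actually emits.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical hand-written counterpart to the iterator method above.
public sealed class LineEnumerator : IEnumerator<string>
{
    private readonly TextReader _reader;
    private string _current;

    public LineEnumerator(TextReader reader)
    {
        _reader = reader;
    }

    public string Current { get { return _current; } }

    object IEnumerator.Current { get { return _current; } }

    // Reads exactly one line per call and returns false at the end of the input.
    public bool MoveNext()
    {
        _current = _reader.ReadLine();
        return _current != null;
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }

    // The reader is owned by the caller, so there is nothing to release here.
    public void Dispose()
    {
    }
}

public static class LineEnumeratorDemo
{
    public static void Main()
    {
        // A StringReader stands in for the large file.
        using (var e = new LineEnumerator(new StringReader("first\nsecond")))
        {
            while (e.MoveNext())
                Console.WriteLine(e.Current);
        }
    }
}
```

Running it prints "first" and then "second". Only the current line is ever stored, which is the same property the yield version gives you in a handful of lines.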

The FilterLines method (filter) makes use of yield once more, so that in the end only those lines that contain the search string are passed on.

    private IEnumerable<string> FilterLines(IEnumerable<string> lines, string searchString)
    {
        foreach (string line in lines)
        {
            // Pass on only the lines that contain the search string.
            if (line.Contains(searchString))
            {
                Console.WriteLine("Include Line: " + line);
                yield return line;
            }
        }
    }

If you have never come across anything similar, I guess the best way to understand what’s actually happening is to step through the code line by line in the debugger.

The most important point is that the body of a yielding method only gets executed once someone iterates over its IEnumerable return value. Calling the method merely creates the iterator object. That sounds kind of recursive, but it actually is not, as the implementation details show.
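If you don't have a debugger at hand, a small log makes the execution order visible. This is only a sketch; the Numbers method and the log list are hypothetical names I made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredExecutionDemo
{
    // Hypothetical iterator that records when each part of its body runs.
    public static IEnumerable<int> Numbers(List<string> log)
    {
        log.Add("body started");
        yield return 1;
        log.Add("between 1 and 2");
        yield return 2;
        log.Add("body finished");
    }

    public static void Main()
    {
        var log = new List<string>();

        // Calling the method only creates the iterator object; no body code has run yet.
        IEnumerable<int> numbers = Numbers(log);
        Console.WriteLine(log.Count); // prints 0

        // Each round of the foreach resumes the body until the next yield return.
        foreach (int n in numbers)
            log.Add("consumer got " + n);

        foreach (string entry in log)
            Console.WriteLine(entry);
    }
}
```

After the 0, the log prints in this order: "body started", "consumer got 1", "between 1 and 2", "consumer got 2", "body finished". The producer and the consumer take turns, and neither one calls itself.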

That means the enumeration only ever hands out the relevant lines (those containing the search string), and the whole file content is never held in memory, as a quick glance at the code might suggest.
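One way to convince yourself of this is to count how many lines the reader has pulled when the consumer stops early. The sketch below reuses the two methods from this post as static methods (without the trace output) and adds a LinesRead counter, which is my addition for illustration only. A StringReader stands in for the large file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class StreamingPipelineDemo
{
    // Counts how many lines the reader has pulled so far (illustration only).
    public static int LinesRead;

    public static IEnumerable<string> ReadLinesFromFile(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            LinesRead++;
            yield return line;
        }
    }

    public static IEnumerable<string> FilterLines(IEnumerable<string> lines, string searchString)
    {
        foreach (string line in lines)
        {
            if (line.Contains(searchString))
                yield return line;
        }
    }

    public static void Main()
    {
        LinesRead = 0;
        var reader = new StringReader("a\nb needle\nc\nd needle\ne");

        foreach (string match in FilterLines(ReadLinesFromFile(reader), "needle"))
        {
            Console.WriteLine(match);
            break; // stop after the first match
        }

        // Only the lines up to and including the first match were read.
        Console.WriteLine(LinesRead);
    }
}
```

This prints "b needle" followed by 2: the remaining three lines are never read, because nobody asked for them.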

For further reading have a look at this article: http://msdn.microsoft.com/en-us/library/9k7k7cf0(v=VS.100).aspx