LINQ to Text Files

(Sept 30, 2008 - I've changed my approach for querying text files.  The new approach is detailed in LINQ to TEXT and LINQ to CSV.) 

Lazy evaluation is an important technique in functional programming. There is an entertaining article on functional programming here.

I was thinking about lazy evaluation the other day, and about the issues of processing huge text files using streaming techniques, and I realized that LINQ could do really cool things if I implemented an extension method for StreamReader that iterated over the lines of a text file. Because of lazy evaluation, this technique doesn't read the entire file into memory, provided the queries use only streaming operators such as where and select. In the example in this post, each line is read, processed by both LINQ query expressions, and output to the console before the next line is read. However, be aware that certain LINQ operators, such as orderby, force the entire file to be read into memory.

The following example is somewhat artificial; it could be implemented with a single query expression, but I wanted to use two query expressions to demonstrate the laziness of processing. No lines are read from the text file until the program iterates over the results of the second query expression.

I just gotta say, this is the expressiveness that I REALLY wanted many years ago when I was writing Unix shell scripts to manipulate text files.

The implementation is trivial. The following listing contains both the extension method and the code that uses it:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Linq;

namespace LinqToText
{
  public static class StreamReaderSequence
  {
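    public static IEnumerable<string> Lines(this StreamReader source)
    {
      // Validate eagerly: a check inside the iterator body would not run
      // until the caller starts enumerating.
      if (source == null)
        throw new ArgumentNullException("source");
      return LinesIterator(source);
    }

    private static IEnumerable<string> LinesIterator(StreamReader source)
    {
      String line;

      // Yield one line at a time so callers can stream the file.
      while ((line = source.ReadLine()) != null)
      {
        yield return line;
      }
    }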
  }

  class Program
  {
    static void Main(string[] args)
    {
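      // The using statement closes the reader even if a query throws.
      using (StreamReader sr = new StreamReader("TextFile.txt"))
      {
        // Skip comment lines before splitting, then format three columns.
        var t1 =
          from line in sr.Lines()
          where !line.StartsWith("#")
          let items = line.Split(',')
          select String.Format("{0}{1}{2}",
              items[1].PadRight(16),
              items[2].PadRight(16),
              items[3].PadRight(16));

        var t2 =
          from line in t1
          select line.ToUpper();

        // Nothing has been read yet; iterating t2 pulls lines one at a time.
        foreach (var t in t2)
          Console.WriteLine(t);
      }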
    }
  }
}

 

If you run this example with the following text file:

#This is a comment
1,Eric,White,Writer
2,Bob,Jones,Programmer
3,Orville,Wright,Inventor
4,Thomas,Jefferson,Statesman
5,George,Washington,President

it produces the following output:

ERIC            WHITE           WRITER
BOB             JONES           PROGRAMMER
ORVILLE         WRIGHT          INVENTOR
THOMAS          JEFFERSON       STATESMAN
GEORGE          WASHINGTON      PRESIDENT
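
To see the laziness for yourself, here is a small sketch that traces every read. It is not part of the original listing: TracedLines is a hypothetical variant of the Lines method that takes a TextReader, so a StringReader can stand in for the file, and the sample text is made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LazyDemo
{
    // Hypothetical variant of Lines: works on any TextReader and prints
    // a trace message each time a line is actually read.
    static IEnumerable<string> TracedLines(TextReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
        {
            Console.WriteLine("  [read] " + line);
            yield return line;
        }
    }

    static void Main()
    {
        // Made-up sample data standing in for a text file.
        const string text = "3,Orville\n1,Eric\n2,Bob";

        // Streaming query: where and select pass lines through one at a time.
        using (var reader = new StringReader(text))
        {
            var streaming =
                from line in TracedLines(reader)
                where !line.StartsWith("#")
                select line.ToUpper();

            Console.WriteLine("Streaming query built; nothing read yet.");
            foreach (var s in streaming)
                Console.WriteLine("  [out]  " + s);
        }

        // Buffering query: orderby must see every line before yielding the first.
        using (var reader = new StringReader(text))
        {
            var sorted =
                from line in TracedLines(reader)
                orderby line
                select line;

            Console.WriteLine("Sorted query built; nothing read yet.");
            foreach (var s in sorted)
                Console.WriteLine("  [out]  " + s);
        }
    }
}
```

In the first query the [read] and [out] messages should alternate, one line at a time. In the second, all three [read] messages appear before the first [out], because orderby has to buffer the whole input before it can return anything.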

 

 

  • Oleg Tkachenko has a nice post comparing the StAX (java) and XmlReader (.NET and XmlLite) approaches...
  • PingBack from http://mhinze.com/18-links-today-2007-09-06/

  • Here is another way using the Microsoft.VisualBasic library:

    using System;
    using System.Collections.Generic;
    using Microsoft.VisualBasic.FileIO;

    namespace ConsoleApplication1
    {
       class Program
       {
           static void Main(string[] args)
           {
               TextFieldParser tfp = new TextFieldParser("TextFile.txt") {
                                           CommentTokens = new string[] { "#" } };
               tfp.SetDelimiters(",");
               List<string> d = new List<string>();
               while (!tfp.EndOfData)
               {
                   var q = tfp.ReadFields();
                   d.Add (String.Format("{0}{1}{2}",
                           q[1].ToUpper().PadRight(16),
                           q[2].ToUpper().PadRight(16),
                           q[3].ToUpper().PadRight(16)));
               }
               tfp.Close();
               foreach (var t in d)
                   Console.WriteLine(t);
           }
       }
    }

  • To the post directly above this one, that's not using the Microsoft.VisualBasic library.  It's still C#

  • Quite some time ago, I wrote a blog post on how you can stream text files as input into LINQ queries

  • Following are a few additional notes regarding the Linq to Text Files example. Taking Advantage of Multiple

  • How can you link a text file to a database? After that, can you manipulate the data inside the text file (e.g. averaging numbers inside a text file)?

    We badly need it. Please reply.

  • Interesting. Everything ingenious is simple.

  • How to solve this question

    1. Close VS 2008.

    2. Open the project file containing the LINQ To SQL item in Notepad.

    3. Remove the following lines:

    <ItemGroup>

        <Service Include="{3259AA49-8AA1-44D3-9025-A0B520596A8C}" />

    </ItemGroup>

    The Setup Project will now build successfully. However, if you double-click the DBML file to open the designer in VS 2008 the Setup Project will stop building again. The above lines do *not* get re-added to the project file but the Setup Project will stop building anyway. Just restart VS 2008 and it will work again — until you open the DBML designer again. Once the Setup Project fails due to this problem it will never build successfully until after you restart VS 2008.

  • Meh, splitting a line with Split(',') is not a good way to go. The line is a plain string, so this only works for simple files like this one. As soon as you get something like 1,"Some, crazy",stuff it is all over.
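
The last comment is right that Split(',') breaks on quoted fields. As a rough illustration of what a quote-aware split could look like, here is a minimal sketch. SplitCsvLine is a hypothetical helper, not something from the post or from any library, and it assumes the common CSV convention that a doubled quote inside a quoted field stands for one literal quote. It does not handle fields that span multiple lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvSketch
{
    // Hypothetical helper: splits one line on commas, treating text inside
    // double quotes as a single field. A doubled quote ("") inside a quoted
    // field is read as a literal quote (an assumed convention).
    public static string[] SplitCsvLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;
                }
                else if (c == '"')
                    inQuotes = false;    // closing quote
                else
                    current.Append(c);   // commas are kept while inside quotes
            }
            else if (c == '"')
                inQuotes = true;         // opening quote
            else if (c == ',')
            {
                fields.Add(current.ToString());
                current.Length = 0;      // start the next field
            }
            else
                current.Append(c);
        }
        fields.Add(current.ToString());  // last field has no trailing comma
        return fields.ToArray();
    }

    static void Main()
    {
        // The example line from the comment: prints 1, then Some, crazy, then stuff.
        foreach (var field in SplitCsvLine("1,\"Some, crazy\",stuff"))
            Console.WriteLine(field);
    }
}
```

In a query, the helper would replace the call to line.Split(',') in the let clause. For real-world files, the TextFieldParser class shown in an earlier comment is probably the safer choice, since it has a HasFieldsEnclosedInQuotes property for exactly this case.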
