Streaming From Text Files to XML

Quite some time ago, I wrote a blog post on how you can stream text files as input into LINQ queries by writing an extension method that yields lines using the yield return statement.

You can then write a LINQ query that processes the text file lazily, reading each line only when the query asks for it. If you use System.Xml.Linq.XStreamingElement to stream the output, the transform from text file to XML needs only a small amount of memory, regardless of the size of the source file. You can transform a million records, and your working set will stay very small.

The following text file, People.txt, is the source for this example.

#This is a comment
1,Tai,Yee,Writer
2,Nikolay,Grachev,Programmer
3,David,Wright,Inventor

The following code contains an extension method that streams the lines of the text file in a deferred fashion.

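using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamReaderExtension
{
    // Yields the lines of the reader one at a time; nothing is read until
    // the caller starts enumerating.
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        String line;

        if (source == null)
            throw new ArgumentNullException("source");
        while ((line = source.ReadLine()) != null)
            yield return line;
    }
}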

 

class Program
{
    static void Main(string[] args)
    {
        using (StreamReader sr = new StreamReader("People.txt"))
        {
            // XStreamingElement holds the query, not its results; the lines
            // are read only when the tree is serialized.
            XStreamingElement xmlTree = new XStreamingElement("Root",
                from line in sr.Lines()
                where !line.StartsWith("#")   // skip comment lines before splitting
                let items = line.Split(',')
                select new XElement("Person",
                           new XAttribute("ID", items[0]),
                           new XElement("First", items[1]),
                           new XElement("Last", items[2]),
                           new XElement("Occupation", items[3])
                       )
            );
            Console.WriteLine(xmlTree);
        }
    }
}

This example produces the following output:

<Root>
  <Person ID="1">
    <First>Tai</First>
    <Last>Yee</Last>
    <Occupation>Writer</Occupation>
  </Person>
  <Person ID="2">
    <First>Nikolay</First>
    <Last>Grachev</Last>
    <Occupation>Programmer</Occupation>
  </Person>
  <Person ID="3">
    <First>David</First>
    <Last>Wright</Last>
    <Occupation>Inventor</Occupation>
  </Person>
</Root>
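
One caveat about the example: Console.WriteLine(xmlTree) calls ToString, which serializes the whole document into a single in-memory string before printing it. That is fine for three records, but for a large input the low-memory benefit comes from writing straight to a file or stream. The following is a minimal sketch of that variation, not a definitive implementation. The StreamingSave class, the Transform method, and the People.xml file name in the usage note are hypothetical names introduced here for illustration; XStreamingElement.Save is the real API being shown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamReaderExtension
{
    // Same idea as the Lines method in the post: yield one line at a time.
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
            yield return line;
    }
}

// Hypothetical helper class, named here only for illustration.
public static class StreamingSave
{
    // Reads inputPath lazily and writes the resulting XML to outputPath.
    public static void Transform(string inputPath, string outputPath)
    {
        using (StreamReader sr = new StreamReader(inputPath))
        {
            XStreamingElement xmlTree = new XStreamingElement("Root",
                from line in sr.Lines()
                where !line.StartsWith("#")
                let items = line.Split(',')
                select new XElement("Person",
                    new XAttribute("ID", items[0]),
                    new XElement("First", items[1]),
                    new XElement("Last", items[2]),
                    new XElement("Occupation", items[3])));

            // Save enumerates the query and writes each Person element as it
            // is produced, so the full document is never held in memory.
            xmlTree.Save(outputPath);
        }
    }
}
```

Calling StreamingSave.Transform("People.txt", "People.xml") from Main would write the same XML shown above to a file. The Save call has to stay inside the using block, because the query reads from the StreamReader only while Save is running.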

Posted: Sunday, May 20, 2007 4:55 PM by EricWhite

Comments

Chris said:

Can you translate this code into VB? This is useless unless you're a C# coder.

# December 13, 2007 4:23 PM

EricWhite said:

It is not very easy to translate this code into VB. It doesn't actually translate directly, as there is no yield return statement in VB. Instead, you have to write your own iterator, implementing the Current property, and the Reset and MoveNext methods.

For more information on the yield return keyword, and an example of an iterator implemented without it, see:

http://blogs.msdn.com/ericwhite/pages/The-Yield-Contextual-Keyword.aspx
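
To give a rough idea of what a hand-written iterator involves, here is a sketch in C# of a line iterator that does not use yield return. It is an illustration only and is not taken from the linked page: the class name LineEnumerator is hypothetical, and it accepts a TextReader (an assumption made here so that it also works with StringReader). The Current, MoveNext, and Reset members are the ones a VB port would need to provide.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;

// Hypothetical hand-written equivalent of the Lines iterator, without yield return.
public sealed class LineEnumerator : IEnumerator<string>
{
    private readonly TextReader source;
    private string current;

    public LineEnumerator(TextReader source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        this.source = source;
    }

    // The line most recently read by MoveNext.
    public string Current { get { return current; } }

    object IEnumerator.Current { get { return current; } }

    // Reads the next line; returns false at end of input.
    public bool MoveNext()
    {
        current = source.ReadLine();
        return current != null;
    }

    // A forward-only reader cannot rewind, so Reset is not supported.
    public void Reset()
    {
        throw new NotSupportedException();
    }

    // The caller owns the reader, so there is nothing to dispose here.
    public void Dispose() { }
}
```

To use a class like this in a LINQ query, you would also need a small IEnumerable<string> wrapper whose GetEnumerator method returns a new LineEnumerator.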

# December 13, 2007 5:10 PM