May, 2007

  • Eric White's Blog

    Streaming From Text Files to XML


    Quite some time ago, I wrote a blog post on how you can stream text files as input into LINQ queries by writing an extension method that yields lines using the yield return statement.

    You can then write a LINQ query that processes the text file in a lazy, deferred fashion. If you also use XStreamingElement (in the System.Xml.Linq namespace) to stream the output, you can create a transform from the text file to XML that uses a minimal amount of memory, regardless of the size of the source text file. You can transform a million records, and your working set will stay very small.
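    To make "lazy, deferred" concrete, here is a minimal sketch that counts how many lines have been read at two points: right after the XStreamingElement is constructed, and after it has been serialized. The names DeferredDemo, CountedLines, and LinesRead are hypothetical and exist only for this illustration, and the input is an in-memory string rather than a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    // Hypothetical counter, used only to observe when lines are actually read.
    public static int LinesRead = 0;

    // Yields lines one at a time; nothing is read until the sequence is enumerated.
    public static IEnumerable<string> CountedLines(TextReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
        {
            LinesRead++;
            yield return line;
        }
    }

    public static void Main()
    {
        using (TextReader reader = new StringReader("a\nb\nc"))
        {
            // Constructing the element only stores the query; it does not run it.
            XStreamingElement root = new XStreamingElement("Root",
                from line in CountedLines(reader)
                select new XElement("Line", line));

            Console.WriteLine(LinesRead);   // prints 0: no lines read yet

            // Serializing the element enumerates the query, one line at a time.
            root.Save(Console.Out);
            Console.WriteLine();

            Console.WriteLine(LinesRead);   // prints 3: all lines read during Save
        }
    }
}
```

    The counter stays at zero until the element is written, which is why the serialization has to happen while the reader is still open. The full example that follows applies the same idea to a file on disk.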

    The following text file, People.txt, is the source for this example.

    #This is a comment

    The following code contains an extension method that streams the lines of the text file in a deferred fashion.

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Xml.Linq;

    public static class StreamReaderExtension
    {
        // Yields the lines of the reader one at a time, as they are requested.
        public static IEnumerable<string> Lines(this StreamReader source)
        {
            String line;
            if (source == null)
                throw new ArgumentNullException("source");
            while ((line = source.ReadLine()) != null)
                yield return line;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using (StreamReader sr = new StreamReader("People.txt"))
            {
                XStreamingElement xmlTree = new XStreamingElement("Root",
                    from line in sr.Lines()
                    let items = line.Split(',')
                    where !line.StartsWith("#")
                    select new XElement("Person",
                               new XAttribute("ID", items[0]),
                               new XElement("First", items[1]),
                               new XElement("Last", items[2]),
                               new XElement("Occupation", items[3])));
                // Writing the element runs the query, so do it while the reader is open.
                Console.WriteLine(xmlTree);
            }
        }
    }

    This example produces the following output:

      <Person ID="1">
      <Person ID="2">
      <Person ID="3">
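    Printing an XStreamingElement with Console.WriteLine first converts the whole document to a single string, so for a very large input it is usually better to write straight to a file or stream. The following is a sketch of that variation using XStreamingElement.Save. The StreamingTransform class, the Transform method, the file names, and the sample rows are all made up for illustration and are not the contents of the original People.txt.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamingTransform
{
    // Same lazy line reader idea as in the post, repeated so this sketch compiles alone.
    static IEnumerable<string> Lines(StreamReader source)
    {
        string line;
        while ((line = source.ReadLine()) != null)
            yield return line;
    }

    // Transforms a comma-separated text file into XML, writing directly to outputPath.
    public static void Transform(string inputPath, string outputPath)
    {
        using (StreamReader sr = new StreamReader(inputPath))
        {
            XStreamingElement xmlTree = new XStreamingElement("Root",
                from line in Lines(sr)
                where !line.StartsWith("#")      // skip comment lines
                let items = line.Split(',')
                select new XElement("Person",
                    new XAttribute("ID", items[0]),
                    new XElement("First", items[1]),
                    new XElement("Last", items[2]),
                    new XElement("Occupation", items[3])));

            // Save enumerates the query and writes each Person element as it is produced.
            xmlTree.Save(outputPath);
        }
    }

    public static void Main()
    {
        // Made-up sample rows (assumed layout: ID,First,Last,Occupation).
        File.WriteAllLines("SamplePeople.txt", new[]
        {
            "#This is a comment",
            "1,Ann,Example,Engineer",
            "2,Bob,Sample,Teacher"
        });

        Transform("SamplePeople.txt", "SamplePeople.xml");
        Console.WriteLine(File.ReadAllText("SamplePeople.xml"));
    }
}
```

    Filtering out comment lines before splitting avoids a needless Split call on lines that are discarded anyway; the result is the same as in the query above.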
