Lots of people seem to want to know how to read managed pdbs (especially from managed code). As a pet project, I was going to write an XmlReader that operated on managed pdbs. I'd call it "XmlPdbReader", and it sounded kind of cool to me. After all, one of the alleged perks of XmlReader is that it's abstract, so you can implement it over your own non-XML data source (on MSDN alone, there's this article, this architectural overview of XmlReader, and this example of extending XML readers). The same goes for XPathNavigator. This sounds like a good time for me to make the disclaimer that although I think XML is neat, I really am no XML whiz.

Derive from XmlReader?
As I drilled deeper, I noticed that there are a lot of methods to override on both XmlReader and XPathNavigator. Here's the list of abstract members on XmlReader:

  1. public abstract XmlNodeType NodeType { get; }
  2. public abstract string LocalName { get; }
  3. public abstract string NamespaceURI { get; }
  4. public abstract string Prefix { get; }
  5. public abstract bool HasValue { get; }
  6. public abstract string Value { get; }
  7. public abstract int Depth { get; }
  8. public abstract string BaseURI { get; }
  9. public abstract bool IsEmptyElement { get; }
  10. public abstract int AttributeCount { get; }
  11. public abstract string GetAttribute(string name);
  12. public abstract string GetAttribute(string name, string namespaceURI);
  13. public abstract string GetAttribute(int i);
  14. public abstract bool MoveToAttribute(string name);
  15. public abstract bool MoveToAttribute(string name, string ns);
  16. public abstract bool MoveToFirstAttribute();
  17. public abstract bool MoveToNextAttribute();
  18. public abstract bool MoveToElement();
  19. public abstract bool ReadAttributeValue();
  20. public abstract bool Read();
  21. public abstract bool EOF { get; }
  22. public abstract void Close();
  23. public abstract ReadState ReadState { get; }
  24. public abstract XmlNameTable NameTable { get; }
  25. public abstract string LookupNamespace(string prefix);
  26. public abstract void ResolveEntity();

That's a lot to implement. I'd expect most people to be very intimidated by the prospect of writing their own custom XmlReader.
And even if you do implement them all, how will you test them to make sure they all work properly?

There's got to be a better way...
It almost looks easier to just convert your non-XML data source into an XML text file and then use an XmlTextReader. The downside is that you need to pre-generate the entire file... or do you?

There's another alternative: write your own custom TextReader (I talk more about TextReader here; I actually wrote that blog entry specifically with this one in mind) that takes in your data source but spits it out in XML format. In other words, rather than implement an XML stream (hard to do with XmlReader), implement a text stream (easy to do with TextReader), and then use the already-existing, already-tested XmlTextReader class to convert the text stream into an XML stream.
Yes, that's evil. But it's a lot easier to write a TextReader than an XmlReader, so this hack reduces a hard problem to a simple one.
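The whole trick rests on XmlTextReader accepting any TextReader and doing all of the XML tokenizing itself. Here is a minimal sketch of that: a StringReader stands in for a custom TextReader, and the illustrative DescribeNodes helper (not part of the original code) just records the node stream that XmlTextReader produces from the raw characters.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TextToXmlDemo
{
    // Feeds any TextReader to XmlTextReader and records each node as "NodeType name-or-value".
    public static List<string> DescribeNodes(TextReader text)
    {
        var nodes = new List<string>();
        using (XmlReader reader = new XmlTextReader(text))
        {
            while (reader.Read())
            {
                // Elements are identified by their name; text nodes by their value.
                nodes.Add(reader.NodeType + " " + (reader.HasValue ? reader.Value : reader.Name));
            }
        }
        return nodes;
    }
}
```

For the input `<range><int>1</int></range>`, this records `Element range`, `Element int`, `Text 1`, `EndElement int`, `EndElement range`. XmlTextReader never knows or cares where the characters came from.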

As a simple case study, let's say you want an XML data source over a range of numbers. So I ask for the range (1,3), and it gives back some XML. Ideally, I'd like to be able to say something like:

XmlReader reader = new XmlRangeReader(1, 3);
XmlDocument doc = new XmlDocument();
doc.Load(reader);
Console.WriteLine(doc.OuterXml);

And it gives back this XML:

<range>
    <int>1</int>
    <int>2</int>
    <int>3</int>
</range>

Now, to implement the XmlRangeReader class, I would need to provide and test all 26 abstract members of XmlReader. That's a lot of work.
Instead, I could get the XML reader like this:

    XmlReader reader = new XmlTextReader(new RangeXmlTextReader(1, 3));

where RangeXmlTextReader is a TextReader for which I only need to implement a single method. So this expression:
    new RangeXmlTextReader(1, 3).ReadToEnd()
evaluates to this string (the EnumeratorTextReader helper emits each yielded string as its own line):
    <range>
    <int>1</int>
    <int>2</int>
    <int>3</int>
    </range>

Here's an implementation of RangeXmlTextReader, which uses the EnumeratorTextReader helper class I recently blogged about.

    public class RangeXmlTextReader : EnumeratorTextReader
    {
        int m_start, m_end;
        public RangeXmlTextReader(int start, int end)
        {
            m_start = start;
            m_end   = end;
        }
        protected override IEnumerator<String> GetEnumerator()
        {
            yield return "<range>";
            for (int i = m_start; i <= m_end; i++)            
                yield return "<int>" + i + "</int>";
            yield return "</range>";            
        }
    }

You've got to love those yield statements! They make it clear that this approach could scale to arbitrarily complex data stores and isn't limited to a range of numbers. Any code that prints a data store as XML could easily be converted into an EnumeratorTextReader by replacing the "Console.WriteLine" / print calls with "yield return".
Even if you derived from TextReader directly and provided a Read() override, that's still just implementing a single method, and still much easier than implementing the 26 abstract members of XmlReader.
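For comparison, here is a sketch of deriving from TextReader directly, without the EnumeratorTextReader helper. DirectRangeTextReader is a hypothetical name. It overrides only Read() and leaves Peek() at the base behavior (returning -1), which is fine here because ReadToEnd() and, as far as I can tell, XmlTextReader pull characters through the bulk Read(char[], int, int), whose base implementation calls Read(). The cost of skipping "yield" is that the position in the output has to be tracked by hand.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical: derives straight from TextReader and overrides only Read().
public class DirectRangeTextReader : TextReader
{
    readonly int m_end;
    long m_next;          // next integer to emit (long, so end == int.MaxValue can't overflow)
    int m_stage;          // 0 = "<range>" pending, 1 = emitting <int> elements, 2 = done
    string m_chunk = "";  // piece of text currently being handed out
    int m_pos;            // index of the next character within m_chunk

    public DirectRangeTextReader(int start, int end)
    {
        m_next = start;
        m_end = end;
    }

    // Produces the next piece of XML text, or null once everything has been emitted.
    string NextChunk()
    {
        if (m_stage == 0)
        {
            m_stage = 1;
            return "<range>";
        }
        if (m_stage == 1)
        {
            if (m_next <= m_end)
                return "<int>" + (m_next++).ToString(CultureInfo.InvariantCulture) + "</int>";
            m_stage = 2;
            return "</range>";
        }
        return null;
    }

    // Returns the next character, or -1 at end of stream.
    public override int Read()
    {
        while (m_pos >= m_chunk.Length)
        {
            string next = NextChunk();
            if (next == null) return -1;
            m_chunk = next;
            m_pos = 0;
        }
        return m_chunk[m_pos++];
    }
}
```

Wrapped as `new XmlTextReader(new DirectRangeTextReader(1, 3))`, it loads into the same document as the iterator version. The m_stage bookkeeping is exactly the state machine that the compiler writes for you when you use yield.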

To summarize:
We've implemented a fully functional XmlReader by implementing a single simple method, and done it in a very intuitive, natural, and testable way.


Why is this solution good?
1) This is very simple to do.
2) This is very easy to test. I keep coming back to this: even if you derived from XmlReader, how would you test it? Just because XmlDocument.Load() reads in your document doesn't mean the reader is correct. Are you going to call MoveToNextAttribute() in every single possible state your reader could be in?
In contrast, a TextReader is much simpler to test because it really is just a linear series of Read() calls. There's exactly one way to consume the API, so if "new RangeXmlTextReader(1, 3).ReadToEnd()" returns the XML string that you expect, you can be very confident your reader works for those inputs.
3) This technique is consistent with Krzysztof's XML guidelines.
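To make point 2 concrete, here is a sketch of what such a test might look like. Because the snippet has to stand alone, ChunkTextReader (a hypothetical, newline-free cousin of EnumeratorTextReader) and RangeReaderTests are illustrative names, not part of the original code. The first check is the single linear ReadToEnd() comparison; the second pushes the same stream through XmlTextReader and collects the integers as a sanity check.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical helper: exposes a sequence of strings as one continuous text stream.
// Unlike EnumeratorTextReader, it does not insert newlines between the pieces.
public class ChunkTextReader : TextReader
{
    readonly IEnumerator<string> m_chunks;
    string m_current = "";
    int m_pos;

    public ChunkTextReader(IEnumerable<string> chunks)
    {
        m_chunks = chunks.GetEnumerator();
    }

    public override int Read()
    {
        // Skip over exhausted (or empty) chunks until a character is available.
        while (m_pos >= m_current.Length)
        {
            if (!m_chunks.MoveNext()) return -1;
            m_current = m_chunks.Current ?? "";
            m_pos = 0;
        }
        return m_current[m_pos++];
    }
}

public static class RangeReaderTests
{
    // Same shape as RangeXmlTextReader.GetEnumerator().
    public static IEnumerable<string> RangeXml(int start, int end)
    {
        yield return "<range>";
        for (int i = start; i <= end; i++)
            yield return "<int>" + i.ToString(CultureInfo.InvariantCulture) + "</int>";
        yield return "</range>";
    }

    // Text-level test: one linear pass, compared against a literal string.
    public static bool TextMatches(int start, int end, string expected)
    {
        using (TextReader reader = new ChunkTextReader(RangeXml(start, end)))
        {
            return reader.ReadToEnd() == expected;
        }
    }

    // XML-level sanity check: parse the same stream and collect every text node
    // (here, only the contents of the <int> elements).
    public static List<int> ReadInts(int start, int end)
    {
        var values = new List<int>();
        using (XmlReader reader = new XmlTextReader(new ChunkTextReader(RangeXml(start, end))))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Text)
                    values.Add(int.Parse(reader.Value, CultureInfo.InvariantCulture));
            }
        }
        return values;
    }
}
```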

Why is this solution bad?
Anytime somebody takes a hard problem (implementing all of XmlReader) and converts it into an easy problem (just implementing TextReader and feeding that into XmlTextReader), I have to ask "What's the catch?" This solution works because we leverage the existing XmlTextReader to do all the hard work. It's optimized for end-user simplicity, but it does have some drawbacks:
1) The solution here has bad perf. We first compose the XML as strings (extra garbage) and then parse those strings back (extra work); that can't be helping perf.
2) Why does XmlReader have all those methods if we only need a single Read call? There's a lot of redundancy in the XmlReader methods. For example, there are 8 *Attribute* members (4 Move*Attribute() + 3 GetAttribute() + AttributeCount). The base class really only needs 1 method to get a collection of attributes, and then it could implement all the enumeration itself. Another example is members like Depth and EOF: the base class could keep track of these things itself without having to ask the derived class.
3) The XmlReader interface is more powerful than just getting a text stream. For example, an XmlReader can skip things, whereas a TextReader must touch everything. That said, you could derive from XmlTextReader to get at these advanced methods.
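Point 2 can be illustrated with a hypothetical sketch. AttributeCursorBase, FixedAttributes, CurrentName and CurrentValue are made-up names, the namespace-aware overloads are left out, and this is a standalone class rather than a drop-in XmlReader base. The derived class hands over the element's attributes through one method; the base class implements the count, lookups, and Move* navigation itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not the real XmlReader API.
public abstract class AttributeCursorBase
{
    IList<KeyValuePair<string, string>> m_attrs;
    int m_index = -1; // -1 means "on the element itself, not on an attribute"

    // The single method a derived class has to supply.
    protected abstract IList<KeyValuePair<string, string>> GetAttributes();

    IList<KeyValuePair<string, string>> Attrs
    {
        get { return m_attrs ?? (m_attrs = GetAttributes()); }
    }

    public int AttributeCount { get { return Attrs.Count; } }
    public string GetAttribute(int i) { return Attrs[i].Value; }

    public string GetAttribute(string name)
    {
        int i = IndexOf(name);
        return i < 0 ? null : Attrs[i].Value;
    }

    public bool MoveToAttribute(string name)
    {
        int i = IndexOf(name);
        if (i < 0) return false;
        m_index = i;
        return true;
    }

    public bool MoveToFirstAttribute()
    {
        if (Attrs.Count == 0) return false;
        m_index = 0;
        return true;
    }

    // As with XmlReader, moving "next" from the element lands on the first attribute.
    public bool MoveToNextAttribute()
    {
        if (m_index + 1 >= Attrs.Count) return false;
        m_index++;
        return true;
    }

    // Returns true if we were on an attribute and moved back to the element.
    public bool MoveToElement()
    {
        bool wasOnAttribute = m_index >= 0;
        m_index = -1;
        return wasOnAttribute;
    }

    // Current attribute; only valid after a successful MoveTo*Attribute call.
    public string CurrentName { get { return Attrs[m_index].Key; } }
    public string CurrentValue { get { return Attrs[m_index].Value; } }

    int IndexOf(string name)
    {
        for (int i = 0; i < Attrs.Count; i++)
            if (Attrs[i].Key == name) return i;
        return -1;
    }
}

// Example derived class: the only thing it implements is GetAttributes().
public class FixedAttributes : AttributeCursorBase
{
    readonly IList<KeyValuePair<string, string>> m_source;
    public FixedAttributes(IList<KeyValuePair<string, string>> source) { m_source = source; }
    protected override IList<KeyValuePair<string, string>> GetAttributes() { return m_source; }
}
```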

Other commentary:
The CLR is cross-language, and you could imagine language features that would make it even easier to represent everything as XML.
For example, our reader currently yields an IEnumerator<String>, and then we parse that into XML tokens at runtime. Instead, a compiler could turn the XML strings into XML tokens at compile time, and our reader could then yield an IEnumerator<XmlToken>.
Here's what we currently have:
    yield return "<range>";
If we had a compiler with great XML language support, perhaps we could mark that string as XML, and it would transform it into something like:
    yield return XmlBeginElementToken();  yield return XmlElementNameToken("range"); yield return XmlEndElementToken();
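Even without compiler support, the token idea can be approximated at runtime. In this sketch, XmlToken, XmlTokenKind, and TokenRange are hypothetical names (the hypothetical token calls above collapsed into one struct). The generator yields structured tokens instead of strings, and a consumer replays them into a real System.Xml XmlWriter, so no XML text is ever parsed. It only illustrates the token stream; it is not an XmlReader.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Xml;

public enum XmlTokenKind { StartElement, Text, EndElement }

// Hypothetical token type: Data holds the element name (StartElement) or the text (Text).
public struct XmlToken
{
    public readonly XmlTokenKind Kind;
    public readonly string Data;
    public XmlToken(XmlTokenKind kind, string data) { Kind = kind; Data = data; }
}

public static class TokenRange
{
    // The range source, yielding structured tokens instead of XML text.
    public static IEnumerable<XmlToken> Range(int start, int end)
    {
        yield return new XmlToken(XmlTokenKind.StartElement, "range");
        for (int i = start; i <= end; i++)
        {
            yield return new XmlToken(XmlTokenKind.StartElement, "int");
            yield return new XmlToken(XmlTokenKind.Text, i.ToString(CultureInfo.InvariantCulture));
            yield return new XmlToken(XmlTokenKind.EndElement, null);
        }
        yield return new XmlToken(XmlTokenKind.EndElement, null);
    }

    // Replays tokens into an XmlWriter; nothing is parsed along the way.
    public static void Replay(IEnumerable<XmlToken> tokens, XmlWriter writer)
    {
        foreach (XmlToken t in tokens)
        {
            switch (t.Kind)
            {
                case XmlTokenKind.StartElement: writer.WriteStartElement(t.Data); break;
                case XmlTokenKind.Text: writer.WriteString(t.Data); break;
                case XmlTokenKind.EndElement: writer.WriteEndElement(); break;
            }
        }
    }

    // Convenience: serialize a token stream to a string without an XML declaration.
    public static string ToXmlString(IEnumerable<XmlToken> tokens)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            Replay(tokens, writer);
        }
        return sb.ToString();
    }
}
```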
 



Here's the full code:

// Sample to easily implement an XmlReader.
// Implement XmlReader by using an XmlTextReader on top of a TextReader.
// http://blogs.msdn.com/jmstall 

using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.IO;

#region Text Reader Helper classes
// A useful TextReader needs real implementations of both Read() and Peek() (the base versions just return -1).
// This is a helper class that implements both of those based on a derived implementation of ReadLine().
// This is useful if a derived TextReader can implement ReadLine() more easily than Read().
// See http://blogs.msdn.com/jmstall/archive/2005/08/06/ReadLineTextReader.aspx for details.
public abstract class ReadLineTextReader : TextReader
{
    // The default TextReader.Peek() implementation just returns -1. How lame!
    // We can build a real implementation on top of Read().
    public override int Peek()
    {
        FillCharCache();
        return m_charCache;
    }

    // Reads one character. The base TextReader's bulk reads (Read(char[], int, int), ReadToEnd()) are built on this.
    public override int Read()        
    {
        FillCharCache();
        int ch = m_charCache;
        ClearCharCache();
        return ch;
    }

#region Character cache support
    int m_charCache = -2; // -2 means the cache is empty. -1 means eof.
    void ClearCharCache() 
    {
        m_charCache = -2;
    }
    void FillCharCache() 
    {            
        if (m_charCache != -2) return; // cache is already full
        m_charCache = GetNextCharWorker();
    }
#endregion 

#region Worker to get the next single character from a ReadLine()-based source
    // The whole point of this helper class is that the derived class is going to 
    // implement ReadLine() instead of Read(). So mark that we don't want to use TextReader's 
    // default implementation of ReadLine(). Null return means eof.
    public abstract override string ReadLine();

    // Gets the next char and advances the cursor.
    int GetNextCharWorker()
    {
        // Fetch the next line if we've used up the current one.
        if (m_line == null)
        {
            m_line = ReadLine(); // virtual
            m_idx = 0;
            if (m_line == null)
            {
                return -1; // eof
            }
            m_line += "\r\n"; // ReadLine() strips the newline, so re-add it
        }
        char c = m_line[m_idx];
        m_idx++;
        if (m_idx >= m_line.Length)
        {
            m_line = null; // tell us next time around to get a new line.
        }
        return c;
    }

    // Current buffer
    int m_idx = int.MaxValue;
    string m_line;
#endregion
}

// Helper class to build a TextReader around an IEnumerator<string> collection.
// The derived class implements a string collection (via IEnumerator<string>), and this class
// exposes each item as a line via ReadLine(), which ReadLineTextReader turns into a full TextReader.
// See http://blogs.msdn.com/jmstall/archive/2005/08/08/textreader_yield.aspx for details.
public abstract class EnumeratorTextReader : ReadLineTextReader
{
    IEnumerator<string> m_source;

    // Derived class implements a string collection, which this class exposes as a TextReader.
    protected abstract IEnumerator<string> GetEnumerator();

    // Override TextReader.ReadLine. Our base class, ReadLineTextReader, implements TextReader.Read()
    // using this ReadLine() function too.
    public override string ReadLine()
    {
        // Fetch the enumerator lazily rather than in a constructor: a base constructor runs before
        // the derived constructor has set its fields, so a non-iterator GetEnumerator() would see defaults.
        if (m_source == null) m_source = this.GetEnumerator();
        if (!m_source.MoveNext()) return null;
        return m_source.Current;
    }
}

#endregion Text Reader Helper classes


// XML text stream.
// XML text stream over a range of integers, implemented with the "yield" keyword.
public class RangeXmlTextReader : EnumeratorTextReader
{
    int m_start, m_end;
    public RangeXmlTextReader(int start, int end)
    {
        m_start = start;
        m_end   = end;
    }
    protected override IEnumerator<String> GetEnumerator()
    {
        yield return "<range>";
        for (int i = m_start; i <= m_end; i++)            
            yield return "<int>" + i + "</int>";
        yield return "</range>";            
    }
}

class Program
{
    static void Main(string[] args)
    {        
        // Test just printing to the console:
        Console.WriteLine(new RangeXmlTextReader(1, 3).ReadToEnd());

        XmlReader reader = new XmlTextReader(new RangeXmlTextReader(1, 3));                

        // Test reading into an XML document
        XmlDocument doc = new XmlDocument(); 
        doc.Load(reader); 
        Console.WriteLine(doc.OuterXml);
    }
}