Oleg Tkachenko has a nice post comparing the StAX (Java) and XmlReader (.NET and XmlLite) approaches to streaming over a potentially large XML data source and filtering out unwanted elements. He concludes:
if you work with StAX you can readily work with .NET XmlReader and the other way. Great unification saves hours learning for developers. I wonder if streaming XML processing API should be standardized?
We've been discussing how to add streaming capabilities to LINQ to XML for some time now. The value proposition is something like this: our target audience will sometimes encounter large documents or arbitrary streams of XML. They want the ease of use that LINQ to XML offers, but they don't want to have to load an entire data source into an in-memory tree before starting to work with it. They could use XmlReader, of course, but that is a considerably lower-level API that requires attention to all sorts of details of XML syntax that we know mainstream developers don't want to worry about. Let's offer some easy-to-use methods that allow LINQ to XML users to load a well-structured XML data source in definable chunks that can be worked on one at a time.
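To make the "definable chunks, one at a time" idea a bit more tangible, here is a rough sketch in Java on top of StAX. It is an analogy only, not the LINQ to XML design: the class name, the chunks helper, the sample order data, and the assumption that every chunk contains only flat, text-only child elements are all made up for illustration.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ChunkedXmlSketch {

    // Made-up sample data; a real source would be a large file or network stream.
    static final String SAMPLE =
        "<orders>"
        + "<order><customer>Ann</customer><total>120</total></order>"
        + "<order><customer>Bob</customer><total>80</total></order>"
        + "<order><customer>Cy</customer><total>300</total></order>"
        + "</orders>";

    // Hypothetical helper: lazily yields one small map per <chunkName> element.
    // Assumption: every child of a chunk element is a text-only element.
    static Stream<Map<String, String>> chunks(String xml, final String chunkName) {
        final XMLStreamReader reader = open(xml);
        Iterator<Map<String, String>> it = new Iterator<Map<String, String>>() {
            private Map<String, String> pending; // chunk read but not yet handed out
            private boolean done;

            @Override public boolean hasNext() {
                if (pending == null && !done) {
                    pending = readNextChunk(reader, chunkName);
                    done = (pending == null);
                }
                return pending != null;
            }

            @Override public Map<String, String> next() {
                if (!hasNext()) throw new NoSuchElementException();
                Map<String, String> current = pending;
                pending = null;
                return current;
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    static XMLStreamReader open(String xml) {
        try {
            return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Advances to the next chunk element and reads it; returns null at end of input.
    static Map<String, String> readNextChunk(XMLStreamReader reader, String chunkName) {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && chunkName.equals(reader.getLocalName())) {
                    Map<String, String> fields = new LinkedHashMap<>();
                    // nextTag() skips whitespace and stops on the next start or end tag.
                    while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
                        String name = reader.getLocalName();
                        fields.put(name, reader.getElementText());
                    }
                    return fields; // the reader now sits on the chunk's end tag
                }
            }
            reader.close();
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // The caller states what it wants; chunks are pulled from the reader on demand.
    static List<String> bigSpenders(String xml) {
        return chunks(xml, "order")
            .filter(order -> Integer.parseInt(order.get("total")) > 100)
            .map(order -> order.get("customer"))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(bigSpenders(SAMPLE)); // prints [Ann, Cy]
    }
}
```

The low-level reader details live in one place, and the calling code never sees a start tag or an end tag. Whether a real API should look anything like this is exactly the open question.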
The obvious way to do this is imperatively, much as StAX or XOM do: the user writes a filter function or subclass, and the XML API uses it to determine which elements in the XML source to pass through to the calling application. We think, however, that the better way is to do it more declaratively: specifying what to do rather than how to do it. We're not ready to publicize a specific streaming input API, but let's talk about why it's worthwhile to avoid the easy (and arbitrarily powerful!) imperative filtering approach.
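For a sense of what the imperative filtering style looks like, here is a minimal Java sketch over a StAX cursor. StAX has its own StreamFilter hook for this; the sketch uses a plain Predicate and a hand-written driver loop instead so that the control flow stays visible. The class name, the feed/item/title vocabulary, and the status attribute are assumptions invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ImperativeFilterSketch {

    // Made-up sample data for illustration.
    static final String SAMPLE =
        "<feed>"
        + "<item status=\"draft\"><title>Rough notes</title></item>"
        + "<item status=\"published\"><title>Streaming XML</title></item>"
        + "<item status=\"published\"><title>Query styles</title></item>"
        + "</feed>";

    // Hypothetical driver: the caller supplies a filter function, and the loop
    // uses it to decide which <item> subtrees to pass through or skip.
    static List<String> titlesOfAcceptedItems(String xml, Predicate<XMLStreamReader> acceptItem) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) continue;
                    String name = reader.getLocalName();
                    if ("item".equals(name) && !acceptItem.test(reader)) {
                        skipSubtree(reader); // rejected: discard everything inside
                    } else if ("title".equals(name)) {
                        titles.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return titles;
    }

    // Expects the reader on a start tag; leaves it on the matching end tag.
    static void skipSubtree(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    // The filter is arbitrary code: the driver cannot inspect or optimize it.
    static List<String> publishedTitles(String xml) {
        return titlesOfAcceptedItems(xml,
            r -> "published".equals(r.getAttributeValue(null, "status")));
    }

    public static void main(String[] args) {
        System.out.println(publishedTitles(SAMPLE)); // prints [Streaming XML, Query styles]
    }
}
```

Note how much of the code is about how to walk the stream (event types, depth counting, reader positions) rather than what the application wants. The filter itself is an opaque callback, which is what makes this approach both arbitrarily powerful and hard for the API to reason about.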
Consider, for example, Eric White's post on using the querying style that LINQ supports rather than the traditional imperative approach to process large text files. He follows up with another post explaining why he thinks this is so cool. Taking Eric's points and elaborating / extending them a bit, here are some concrete reasons for using the declarative / functional style rather than the imperative style:
OK, I admit that a lot of these concrete advantages may only be realized in some fuzzy future timeframe, not immediately. Nevertheless, we're trying to design a solid foundation today on which to build tomorrow, so we want to avoid quick and easy design choices that will limit our options in the future.
Finally, Ralf Lämmel will be presenting a paper on this topic at the XML 2006 Conference, and I'm sure he will be able to explain it all in much more detail than I can!