Welcome to MSDN Blogs Sign in | Join | Help
Writing Robust LINQ to XML Code that Performs Well

In the last three posts, in addition to the information regarding how we want to alter the markup in an Open XML document, I've made a few observations about how to write LINQ to XML code when modifying an XML tree in such a way that it becomes harder to introduce bugs.  In addition, I've also mentioned a couple of other points that you can take into consideration - you can write code that performs better if you know a bit about how LINQ to XML operates under the covers.

In Using LINQ to XML to Accept Revisions, I mention a couple of approaches to modifying/transforming an XML tree, and tell why I picked the approach I did for the particular problem I'm solving.

In Using LINQ to XML to Remove Personal Information, I go through a couple of idioms in detail, showing how to write robust code to modify an XML tree.

In Using LINQ to XML to Remove Comments, I discuss one issue to take into consideration so that you can write queries that perform well.

While my primary motivation in writing these posts is to show how to modify the Open XML markup to acheive the desired goals, there is some good information for LINQ to XML developers.  I felt that this might be lost behind the focus on Open XML markup, so I'm calling it out specifically here.

Posted: Monday, July 14, 2008 5:02 AM by EricWhite
Filed under: ,

Comments

int19h said:

One trick is to never, ever write queries or updates which retrieve or insert siblings before some particular element. In other words, avoid XNode.AddBeforeSelf(), XNode.ElementsBeforeSelf(), XNode.PreviousNode. This is because XNodes in an XContainer form a singly linked list, so only forward traversal is efficient, while backward requires restarting from the beginning of the container.

Because of this, IsBefore() and IsAfter() are not particularly efficient, either...

Interestingly enough, XPathDocument does not have this problem.

# July 14, 2008 1:56 AM

EricWhite said:

@int19h,

That is a good point.  I think that the design decision was to optimize for the most common scenarios, and keep the working set small.

-Eric

# July 14, 2008 5:45 AM

int19h said:

I understand the reasoning, but it's a real pity that MSDN docs don't say a word on this, and have outright deceiving samples (if you look the one for PreviousNode, it uses it in a while-loop to traverse nodes in reverse document order - that's O(n^2)!).

I've created a ticket for that at VS Connect:

https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=355102

# July 15, 2008 7:48 AM

EricWhite said:

Well, that would be my fault, since I wrote the LINQ to XML documentation. :P

Actually, I can't remember, but I vaguely recollect that the fact that nodes were forward only linked was an implementation detail that could in theory change in the future, so should not be put in the docs.

But what I can tell you is that I wasn't focused on node traversal as a primary scenario.  Instead, I wanted to teach people about using queries, and the functional approach.  There are a couple of scenarios where a developer will want to do node traversal, but in those cases, the developer will be aware that nodes are forward linked.

I'm not sure what a good example would be for previous node.  Suggestions?

-Eric

# July 15, 2008 11:28 AM

Doug Mahugh said:

Eric White on Linq and Open XML. Eric White has been posting some great code samples lately, including

# July 15, 2008 10:09 PM

int19h said:

> Actually, I can't remember, but I vaguely recollect that the fact that nodes were forward only linked was an implementation detail that could in theory change in the future, so should not be put in the docs.

You might not want to put the exact algorithmic complexity in the docs, but I don't see how it would hurt to write something like "the complexity of this operation is not guaranteed to be O(1)" in Remarks section.

> But what I can tell you is that I wasn't focused on node traversal as a primary scenario.  Instead, I wanted to teach people about using queries, and the functional approach.  There are a couple of scenarios where a developer will want to do node traversal, but in those cases, the developer will be aware that nodes are forward linked.

XLINQ is indeed mostly about queries, but this particular example I linked to was certainly about node traversal.

> I'm not sure what a good example would be for previous node.  Suggestions?

Ironically enough, reverse traversal is perhaps the only thing it's actually useful for ... so no other ideas here, sorry.

In reality, as it stands, any use of PreviousNode in a loop (even an implicit one, such as a LINQ query) is suspect.

# July 23, 2008 6:46 AM

EricWhite said:

Hi int19h,

Well, FWIW, I absolutely agree with you.  It should be mentioned.  I'll send an email to the current XML doc person.

-Eric

# July 23, 2008 9:15 AM

int19h said:

Great to hear; I'm looking forward to seeing that ticket closed as "Fixed" at Connect ;)

By the way, I sort of missed this:

> Well, that would be my fault, since I wrote the LINQ to XML documentation.

And just wanted to say that, on the whole, you did a great job there. It is one of the more interesting section of the Framework docs, with all the examples on how to do things (and how they can break if done wrong).

And of course the topics on pure/functional vs mutable/imperative approaches - "Functional Programming vs. Imperative Programming", "Refactoring Into Pure Functions", and "Introduction to Pure Functional Transformations" - I can only hope more people would read those. Though what makes me wonder is why the first two of those are under LINQ to XML, as they are really just as applicable to LINQ and C# in general; IMO, they should have been right there in the "Introduction to C#" sections (and thus our mission of world domination shall be complete, my brethren in lambda calculus... muhahhahaa!) ;)

# July 24, 2008 8:49 AM
Leave a Comment

(required) 

(required) 

(optional)

(required) 

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Page view tracker