This blog is inactive. New blog: EricWhite.com/blog
I was in a meeting this afternoon, and someone said they wished there were a comparison of XPath expressions and LINQ to XML queries. Well, this already exists in the LINQ to XML documentation. The topic title is LINQ to XML for XPath Users. If you already speak XPath, it can give you a head start on learning how to write LINQ to XML queries.
I might have missed something, but at a glance it omits one very important fact: XPath expressions treat node-sets as true sets and combine them using set union, not plain concatenation. This matters in cases like this:
Consider XML like this:
The XPath expression above would yield two <bar/> nodes in the tree, each one just once. On the other hand, according to those MSDN articles, here's the corresponding XLINQ query:
However, in practice, this one will yield three nodes, because bar[@id=1] will appear in the resulting sequence twice (it's a descendant of both foo elements). Of course, bringing it in line is a matter of a simple call to Distinct(), but it is worth mentioning.
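The original sample did not survive on this page, so here is a minimal sketch of the effect. The XML and the XPath expression `//foo//bar` are assumptions chosen to match the description (a nested foo, so that bar[@id=1] is a descendant of both); they are not necessarily what was originally posted.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical document: the inner <foo> sits inside the outer one,
// so bar[@id=1] is a descendant of both <foo> elements.
var doc = XDocument.Parse(
    "<root><foo><bar id='0'/><foo><bar id='1'/></foo></foo></root>");

// XPath evaluates to a node-set, so each <bar> is returned once.
int xpathCount = doc.XPathSelectElements("//foo//bar").Count();

// Chained Descendants() concatenates the results for each <foo>,
// so bar[@id=1] shows up once per enclosing <foo>.
var chained = doc.Descendants("foo").Descendants("bar").ToList();

// XElement does not override Equals, so Distinct() compares references,
// which is exactly node identity here.
var deduped = chained.Distinct().ToList();

Console.WriteLine($"{xpathCount} {chained.Count} {deduped.Count}");
```

With this document the three counts come out as 2, 3 and 2.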
By the way, a similar misconception is present in the translation of the XPath union operator. The article suggests using Concat() for it, which has the same problem - it won't discard duplicates automatically. Union() would be a more faithful choice, even if slower.
The translation of "//" is also incorrect - it suggests using Descendants(), which will work in most cases, but not when someone tries to translate position() as well (which is the subject of another article there). Technically, the correct translation of ".//b" that would also allow translating position() to an index-taking lambda (as in Select() or Where()) would be this:
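To make the difference concrete, here is a small sketch. The document and the expression `//item[@a] | //item[@b]` are invented for illustration; the only point is that the two operands overlap in one element.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical document: item 2 carries both attributes,
// so it belongs to both operands of the union.
var doc = XDocument.Parse(
    "<root><item id='1' a='x'/><item id='2' a='x' b='y'/><item id='3' b='y'/></root>");

var withA = doc.Descendants("item").Where(e => e.Attribute("a") != null);
var withB = doc.Descendants("item").Where(e => e.Attribute("b") != null);

// The XPath union operator returns each node once.
int xpathCount = doc.XPathSelectElements("//item[@a] | //item[@b]").Count();

// Concat() simply appends the second sequence, so item 2 is repeated.
int concatCount = withA.Concat(withB).Count();

// Union() drops the repeat; InDocumentOrder() then restores document order,
// since Union() on its own only keeps first-seen order.
var unionIds = withA.Union(withB)
                    .InDocumentOrder()
                    .Select(e => (string)e.Attribute("id"))
                    .ToList();

Console.WriteLine($"{xpathCount} {concatCount} {string.Join(",", unionIds)}");
```

Here XPath and Union() both give three elements (ids 1, 2, 3), while Concat() gives four.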
Actually, sorry, I was wrong about that last one - it's even more complicated now that I think of it. In fact, the topic on translating position() could really try to cover some more complicated (and real) cases. Consider this XPath:
/root/foo/bar[position() mod 2 = 1]
A naive XLINQ translation, blindly following the guidelines set in the MSDN article, would be this (the index i is zero-based, so odd positions are even indexes):
doc.Elements("root").Elements("foo").Elements("bar").Where((x, i) => i % 2 == 0)
But, of course, this doesn't work, since position() inside an XPath predicate is relative to the nodes selected by that step for each context node (here, the bar children of one particular foo), not to the result of the expression as a whole. The following XML demonstrates the problem.
The earlier XPath expression will yield nodes with id=1,3,5. The XLINQ expression will yield nodes with id=1,3,6.
The correct translation in this case would be:
from foo in doc.Elements("root").Elements("foo")
from bar in foo.Elements("bar").Where((x, i) => i % 2 == 0)
select bar
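Here is a runnable sketch of the comparison. The XML is an assumption, reconstructed only from the ids quoted above (three bar elements under the first foo, two under the second, no id=4); the predicate selects odd positions, which corresponds to even zero-based indexes in LINQ.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical document reconstructed from the ids mentioned in the text.
var doc = XDocument.Parse(
    "<root>" +
    "<foo><bar id='1'/><bar id='2'/><bar id='3'/></foo>" +
    "<foo><bar id='5'/><bar id='6'/></foo>" +
    "</root>");

Func<XElement, string> id = e => (string)e.Attribute("id");

// XPath: position() is counted separately among the <bar> children of each <foo>.
string xpathIds = string.Join(",",
    doc.XPathSelectElements("/root/foo/bar[position() mod 2 = 1]").Select(id));

// Naive translation: the index runs over the flattened sequence of all <bar> elements.
string naiveIds = string.Join(",",
    doc.Elements("root").Elements("foo").Elements("bar")
       .Where((x, i) => i % 2 == 0)
       .Select(id));

// Per-step translation: the index restarts for every <foo>.
string perStepIds = string.Join(",",
    (from foo in doc.Elements("root").Elements("foo")
     from bar in foo.Elements("bar").Where((x, i) => i % 2 == 0)
     select bar).Select(id));

Console.WriteLine($"XPath: {xpathIds}  naive: {naiveIds}  per-step: {perStepIds}");
```

On this document the XPath and per-step results are both 1,3,5, while the naive query gives 1,3,6.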
All this also applies to "//", of course. But, since the only time there is a difference between "//" and "/descendant::" is when position() is involved, it's more obvious there.
In general, a good rule of thumb for a faithful XPath -> XLINQ translation is that each step has to be rewritten as a distinct "from" clause of a LINQ query, and each filter should immediately follow the "from" it belongs to.
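As a sketch of the same rule applied to "//", take `//bar[1]`, which is short for `/descendant-or-self::node()/child::bar[1]`. The document below is made up for illustration, and treating the document plus all its elements as the context nodes is my own shortcut: only those node kinds can have element children, so other nodes contribute nothing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical document: two <foo> parents, each with its own <bar> children.
var doc = XDocument.Parse(
    "<root>" +
    "<foo><bar id='1'/><bar id='2'/><bar id='3'/></foo>" +
    "<foo><bar id='5'/><bar id='6'/></foo>" +
    "</root>");

Func<XElement, string> id = e => (string)e.Attribute("id");

// "//bar[1]" picks the first <bar> child of every parent.
string slashSlashIds = string.Join(",", doc.XPathSelectElements("//bar[1]").Select(id));

// "/descendant::bar[1]" picks the first <bar> in the whole document.
string descendantIds = string.Join(",", doc.XPathSelectElements("/descendant::bar[1]").Select(id));

// Flat translation with Descendants(): matches /descendant::bar[1], not //bar[1].
string flatIds = string.Join(",",
    doc.Descendants("bar").Where((x, i) => i == 0).Select(id));

// One "from" per step: first the descendant-or-self step, then child::bar with its filter.
var contexts = new XContainer[] { doc }.Concat<XContainer>(doc.Descendants());
string perStepIds = string.Join(",",
    (from ctx in contexts
     from bar in ctx.Elements("bar").Where((x, i) => i == 0)
     select bar).Select(id));

Console.WriteLine($"{slashSlashIds} | {descendantIds} | {flatIds} | {perStepIds}");
```

Here `//bar[1]` and the per-step query both give 1,5, while `/descendant::bar[1]` and the flat Descendants() query give only 1.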