Out of the Angle Brackets
XPath expressions are pretty flexible, and this flexibility allows for very creative ways of using XPath. Unfortunately, some of them are suboptimal and hurt application performance. This is especially visible in Xslt transformations, where a stylesheet can contain tens if not hundreds of XPath expressions. Here is the list of the most common bad practices (or even anti-patterns) I have seen:
Using // everywhere is a very common pattern that very often leads to serious performance problems. It works by flattening the whole subtree (most often I have seen it applied to the entire Xml document) and then searching the flattened set for the specified elements. In the .NET Framework there are currently no specific optimizations for this pattern, so using it *is costly*. I tried to figure out in which scenarios // is really useful or necessary, and the only real case I found was Xml designed to have elements with the same name on different levels (e.g. a recursive Xml document describing a file system, where a <folder> element can have <folder> child elements). Other than that, I could not find any usage that could not be replaced with a more efficient counterpart. Usually being as specific as possible in your query helps (and is enough). Why do people use // then?
Yes, I know that many books, articles and blogs that teach Xslt use //. I suppose the reason is that when you start learning Xslt you probably don’t know XPath yet, and // is the simplest thing that works. Unfortunately, even after people have learned XPath they continue to use // all over the place and then complain that Xml/Xslt/XPath is slow. A real story: I was refactoring a slow stylesheet used by a web application, where the transformation took 2 minutes. Just by getting rid of every // (none of them was necessary) I brought the transformation time down to 5 seconds.
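To make the difference concrete, here is a minimal sketch in Python using the standard library's ElementTree, which supports a small XPath subset. The sample documents and element names are made up for illustration. It shows that in a flat document a specific path returns exactly what // returns, and that the descendant search only earns its keep in the recursive <folder> case.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: <item> elements live at one known level.
CATALOG = ET.fromstring(
    "<catalog>"
    "<section><item>a</item><item>b</item></section>"
    "<section><item>c</item></section>"
    "</catalog>"
)

# Descendant search: every node below the root is a candidate.
via_descendants = [e.text for e in CATALOG.findall(".//item")]
# Specific path: only <section> children, then their <item> children.
via_path = [e.text for e in CATALOG.findall("section/item")]
same_items = via_descendants == via_path

# Hypothetical recursive document: <folder> can nest inside <folder>.
FS = ET.fromstring(
    "<folder name='root'>"
    "<folder name='src'><folder name='lib'/></folder>"
    "<folder name='docs'/>"
    "</folder>"
)
# Here the descendant search is justified: folders exist on several levels.
all_folders = [f.get("name") for f in FS.findall(".//folder")]
# A plain child step only sees the first level.
top_level = [f.get("name") for f in FS.findall("folder")]
```

ElementTree is not the .NET XPath engine, so this says nothing about actual timings there. It only illustrates which nodes each expression has to consider.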
While it may sometimes be beneficial to use an absolute XPath expression (i.e. one starting from the root of the document, /), this is not usually the case. Yet I have seen Xslt stylesheets that used only absolute XPath expressions. This is not only costly (though not as costly as //) but in a lot of cases it is just plain wrong. If you are positioned on an element and you want to process all the child elements of this element, an absolute path will work only if the element you are positioned on does not have any sibling elements with the same name. Otherwise you will end up processing the child elements of all the siblings that share a name with the element you are positioned on.
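The sibling problem can be sketched in Python with the standard library's ElementTree. The <orders> document is a made-up example. ElementTree does not accept absolute paths on an element, so the absolute path /orders/order/item is emulated here by querying from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with two same-named sibling elements.
ORDERS = ET.fromstring(
    "<orders>"
    "<order id='1'><item>pen</item><item>ink</item></order>"
    "<order id='2'><item>paper</item></order>"
    "</orders>"
)

def items_relative(order):
    # Relative path from the current <order>: only its own <item> children.
    return [i.text for i in order.findall("item")]

def items_absolute(root):
    # Stand-in for /orders/order/item: ignores which <order> is current.
    return [i.text for i in root.findall("order/item")]

# Process each <order> in turn, the way a template or for-each would.
relative = {o.get("id"): items_relative(o) for o in ORDERS.findall("order")}
absolute = {o.get("id"): items_absolute(ORDERS) for o in ORDERS.findall("order")}
```

With the relative path each order sees only its own items. With the absolute path every order sees the items of all orders, which is both wrong and more work.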
Take a look at the following expression: a/b/c[../../@id = 3]. What is happening here? We drill into the tree only to go back up and check a condition on one of the ancestor elements. As a result we may process hundreds of <b> and <c> elements for no reason (because the condition is not satisfied). In addition, the expression is not really readable. How about this expression: a[@id = 3]/b/c? The result should be the same as in the case above, but it can be much faster because we no longer process elements we are not interested in. It is also much more readable: I don’t need to mentally traverse the Xml document to tell which element the condition applies to.
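Here is a rough Python sketch of the two strategies, using the standard library's ElementTree. Its XPath subset cannot express a predicate like [../../@id = 3], so the first function emulates that expression by hand. The sample document and the visit counter are assumptions added for illustration, and the attribute is compared as a string because ElementTree does not do numeric comparison.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: only the last <a> has id 3.
DOC = ET.fromstring(
    "<root>"
    "<a id='1'><b><c>x1</c><c>x2</c></b></a>"
    "<a id='2'><b><c>y1</c></b></a>"
    "<a id='3'><b><c>z1</c><c>z2</c></b></a>"
    "</root>"
)

def drill_then_climb(root):
    """Emulates a/b/c[../../@id = 3]: reach every <c>, then test its grandparent."""
    visited, matches = 0, []
    for a in root.findall("a"):
        for c in a.findall("b/c"):
            visited += 1                 # every <c> is touched
            if a.get("id") == "3":       # condition checked only at the bottom
                matches.append(c.text)
    return matches, visited

def filter_first(root):
    """a[@id = 3]/b/c: discard non-matching <a> elements before descending."""
    matches = [c.text for c in root.findall("a[@id='3']/b/c")]
    return matches, len(matches)

climb_matches, climb_visited = drill_then_climb(DOC)
filter_matches, filter_visited = filter_first(DOC)
```

Both functions return the same <c> elements, but the first one touches all five <c> elements while the second only reaches the two under the matching <a>.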
The document() function has to go and access an external file or a network resource. Accessing the file system (let alone a network resource) is much more expensive than accessing memory. So instead of calling document() in a loop (by loop I mean either xsl:for-each or an xsl:template invoked more than once), select what you really need once, store it in a variable and use the variable instead of the document() function.
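The same "load once, reuse the variable" idea, sketched in Python rather than Xslt. The load_lookup function is a hypothetical stand-in for document(), and the lookup file and its <entry> elements are made up. Whether a given Xslt processor really re-reads the resource on every call is implementation dependent; the sketch assumes the uncached case described above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

calls = {"parse": 0}  # counts how often the external file is read

def load_lookup(path):
    """Hypothetical stand-in for document(): reads and parses an external file."""
    calls["parse"] += 1
    return ET.parse(path).getroot()

def labels_slow(codes, path):
    # Re-reads the file for every code, like document() inside xsl:for-each.
    return [load_lookup(path).find(f"entry[@code='{c}']").text for c in codes]

def labels_fast(codes, path):
    # Load once, keep the result in a variable, reuse it inside the loop.
    lookup = load_lookup(path)
    return [lookup.find(f"entry[@code='{c}']").text for c in codes]

with tempfile.TemporaryDirectory() as tmp:
    lookup_path = os.path.join(tmp, "lookup.xml")
    with open(lookup_path, "w", encoding="utf-8") as f:
        f.write("<codes><entry code='A'>Alpha</entry>"
                "<entry code='B'>Beta</entry></codes>")

    slow = labels_slow(["A", "B", "A"], lookup_path)
    slow_reads = calls["parse"]

    calls["parse"] = 0
    fast = labels_fast(["A", "B", "A"], lookup_path)
    fast_reads = calls["parse"]
```

The output is identical in both cases; only the number of file reads differs (one per item versus one in total).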
XPath has 13 different axes. From what I have noticed, people don’t know about XPath axes even though they use them in almost every XPath expression they write. The reason is that the default axis is the child axis and can be omitted. As a result /child::a/child::b (child:: denotes the child axis) becomes /a/b. Similarly, . is a shortcut for self::node() and .. is a shortcut for parent::node() (so parent::node()/child::b becomes ../b). With these shortcuts (and //, which is another shortcut, this time for /descendant-or-self::node()/, and which you should not be using without a good reason) people are able to address most of their needs without even having to know about axes. It’s more like “this is how XPath works” than “I really want to navigate the tree this way” (although in most cases the result of the two will be pretty much the same).
Here is an example: I want to access the next element on the same level as the element I am positioned on. It’s pretty easy with the following-sibling axis: following-sibling::*[1] (without the [1] predicate the expression selects all the following siblings, not just the next one). Without the following-sibling axis the expression can get complicated. You would probably have to capture the current position in a variable, go to the parent element and then address the element you are interested in using an index calculated from the variable.
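To show how clumsy the alternative is, here is a sketch of the index-based workaround in Python. The standard library's ElementTree has no following-sibling axis (and no parent pointers), so the code has to do by hand what the axis does in a single step. The <list> document and the next_sibling helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with three elements on the same level.
DOC = ET.fromstring(
    "<list><entry>one</entry><entry>two</entry><entry>three</entry></list>"
)

def next_sibling(root, current):
    """Manual replacement for the following-sibling axis (next element only)."""
    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    parent = parent_of.get(current)
    if parent is None:               # the root element has no siblings
        return None
    siblings = list(parent)
    index = siblings.index(current)  # capture the current position
    # Address the wanted element by an index computed from that position.
    return siblings[index + 1] if index + 1 < len(siblings) else None

first, second, third = list(DOC)
after_first = next_sibling(DOC, first)
after_last = next_sibling(DOC, third)
```

Finding the parent, computing the position and indexing back in is exactly the kind of detour that picking the right axis avoids.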
If you happen to use any of the above patterns, or you think you are not using the right axes, take a closer look at your stylesheets and consider whether you could improve things and give a nice boost to your queries or transformations. Have I mentioned that you should avoid using //?