Update: Transformation times for the Saxon processors have been remeasured and updated based on feedback from Dimitre Novatchev and Michael Kay. I also slightly altered the text below to reflect the change in Saxon command-line arguments.
Interestingly enough, the first live.com hit for the "XslCompiledTransform Performance" query at the moment is this post by Jeff Prosise, where he says he was disappointed that XslCompiledTransform ran just 3 times faster than XslTransform on a "fairly simple style sheet". He is concerned that XslCompiledTransform is not fast enough compared to the good old MSXML 4.0. Well, as we will see very soon, XslCompiledTransform may easily outperform MSXML 4.0 by several times!
Here I compare the transformation speed of several widely used XSLT processors on a few arbitrarily chosen stylesheets. I deliberately do not consider many other important aspects, such as working set, start-up time, compilation time, and scalability, and focus on pure transformation time only. I tried to make all processors compete on equal terms; however, I may have missed some important details, especially for Saxon, which I know very little about. So this post should in no way be considered a thorough comparison of XSLT processors; you are encouraged to run your own scenarios with different processors and pick the one that best fits your needs.
Let's first briefly describe today's contestants:
To run tests with MSXML I used the Msxsl.exe command-line utility. I had to tweak its code a little, because the -t option for measuring load and transformation times failed to work on CPUs faster than 2 GHz. The utility was developed around September 2000, and apparently some Microsoft developers did not realize how fast processors would become in 6 years! More precisely, this part of the Timer class constructor retrieves the frequency of the high-resolution performance counter and rejects any value above INT_MAX = 2,147,483,647:
if (!::QueryPerformanceFrequency((LARGE_INTEGER *)&_freq) || _freq > INT_MAX)
// Counter not available
_freq = 0;
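To make the failure mode concrete, here is a minimal portable sketch of that check, written without the Windows API so it can be compiled anywhere. It assumes, as the 2 GHz threshold suggests, that the counter frequency on such machines is close to the CPU clock rate. The function names are hypothetical, and the relaxed variant is only one possible fix, not necessarily the change I made to Msxsl.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Mirrors the original check: a failed query, or any frequency above
// INT_MAX, is treated as "counter not available" and mapped to 0.
std::int64_t frequency_with_int_max_cap(bool query_ok, std::int64_t freq) {
    if (!query_ok || freq > INT_MAX) {
        return 0;  // counter reported as unavailable
    }
    return freq;
}

// Hypothetical relaxed check: reject only a failed query or a
// non-positive frequency, and keep the full 64-bit value otherwise.
std::int64_t frequency_without_cap(bool query_ok, std::int64_t freq) {
    if (!query_ok || freq <= 0) {
        return 0;
    }
    return freq;
}

// Converts a tick delta to milliseconds in floating point, so that a
// frequency above INT_MAX cannot overflow an intermediate integer.
double ticks_to_ms(std::int64_t ticks, std::int64_t freq) {
    assert(freq > 0);
    return static_cast<double>(ticks) * 1000.0 / static_cast<double>(freq);
}
```

With a counter frequency of 3,000,000,000 ticks per second, the capped version returns 0 and timing is silently disabled, while the relaxed version keeps the value.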
Below are the command-line arguments I used with Msxsl and Saxon. The number after -u specifies the version of MSXML to use; -o nul redirects output to the NUL device, so that file input/output affects our measurements as little as possible. The undocumented -9 option forces Saxon to repeat the transform 9 times in a row, so that we obtain the transformation time for a "warm" process. Unfortunately, the Msxsl utility does not provide a similar option, so for now MSXML 3.0/4.0 will be at a slight disadvantage. Both Saxon processors were run under Java™ 2 Runtime Environment version 1.4.2.
C:\XsltPerf>msxsl.exe -t -o nul -u 3.0 Kasparov-Karpov.xml chess.xsl
C:\XsltPerf>msxsl.exe -t -o nul -u 4.0 Kasparov-Karpov.xml chess.xsl
C:\XsltPerf>java -jar saxon6.5.5\saxon.jar -t -o nul -9 Kasparov-Karpov.xml chess.xsl
C:\XsltPerf>java -jar saxon8.7.3\saxon8.jar -t -o nul -9 Kasparov-Karpov.xml chess.xsl
Finally, for XslTransform and XslCompiledTransform I used the XsltPerf utility, presented in my previous post. The System.Data.SqlXml assembly was NGen'd, though I doubt it could considerably affect performance in the "warm" case. As a separate step, I verified that all processors produce the correct output.
For the first test, let's try the Queens stylesheet I used in the previous post. To save you from rereading it, recall that this XSLTMark benchmark stylesheet, developed by Oren Ben-Kiki, finds all possible solutions to the problem of placing N queens on an N×N chess board without any queen attacking another. XSLTMark uses N = 6, and the issue I immediately encountered was that a single run of this scenario finished too quickly for the measurements to be reliable. So I tweaked its input file, which originally looked like <BoardSize>6</BoardSize>, to make the stylesheet solve the same problem 20 times:
... 18 identical lines skipped here ...
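For readers unfamiliar with the problem, the kind of backtracking search involved can be sketched in C++ as follows. This is only an illustration of the algorithm, not a translation of the Queens stylesheet, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Returns true if a queen at (row, col) is not attacked by the queens
// already placed in rows 0..row-1; cols[r] is the column used in row r.
bool is_safe(const std::vector<int>& cols, int row, int col) {
    for (int r = 0; r < row; ++r) {
        bool same_column = (cols[r] == col);
        bool same_diagonal = (std::abs(cols[r] - col) == row - r);
        if (same_column || same_diagonal) {
            return false;
        }
    }
    return true;
}

// Tries every column in the current row and recurses into the next row;
// returning from the recursion is the "backtrack" step.
int place_queens(std::vector<int>& cols, int row, int n) {
    if (row == n) {
        return 1;  // all n queens placed: one complete solution
    }
    int count = 0;
    for (int col = 0; col < n; ++col) {
        if (is_safe(cols, row, col)) {
            cols[row] = col;
            count += place_queens(cols, row + 1, n);
        }
    }
    return count;
}

// Counts all placements of n non-attacking queens on an n x n board.
int count_queens_solutions(int n) {
    assert(n >= 0);
    std::vector<int> cols(n, 0);
    return place_queens(cols, 0, n);
}
```

For the board size used by XSLTMark, count_queens_solutions(6) returns 4.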
Below are results for my Intel® Xeon® 3GHz box. Since XslCompiledTransform performance is affected by JIT-compilation on first use, as I described in my previous post, I give execution times of the first Transform call for this processor in parentheses. For example, for this stylesheet the first Transform takes about 53 ms, and subsequent ones take about 34 ms.
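The distinction between the first call and subsequent "warm" calls can be captured by a small timing harness. The sketch below is a generic C++ illustration of that measurement pattern, not the code of the XsltPerf utility; the type and function names are hypothetical, and the workload is passed in as a callback.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Time of the first call and the average time of the calls that follow it.
struct TimingResult {
    double first_ms;
    double warm_avg_ms;
};

// Runs `work` once and times it separately (this call pays any one-time
// costs), then runs it `warm_runs` more times and averages those runs.
TimingResult measure_first_and_warm(const std::function<void()>& work,
                                    int warm_runs) {
    assert(warm_runs > 0);
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    auto start = clock::now();
    work();
    double first = ms(clock::now() - start).count();

    start = clock::now();
    for (int i = 0; i < warm_runs; ++i) {
        work();
    }
    double warm_total = ms(clock::now() - start).count();

    return {first, warm_total / warm_runs};
}
```

Reporting the two numbers separately keeps a one-time cost such as JIT compilation from being averaged into the steady-state figure.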
As you can see, MSXML 4.0 and XslCompiledTransform are much faster than the other processors on this test; moreover, the latter is about 4 times faster than the former. I would like to note that the Queens stylesheet is rather artificial: it implements a backtracking algorithm in a language designed mainly for XML transformations. While it cannot be considered a real-world scenario, XslCompiledTransform performs really well even in that area. In the past, performance issues might have forced you to implement similar helper functions in a general-purpose programming language, such as C# or JScript, and call them through embedded scripts or extension objects; now there is a greater chance you can implement those functions in XSLT itself and still get good performance.
For the following tests we take a couple of Sarvega XSLT Benchmark stylesheets, which represent real-world XSLT transforms. The Chess-FO stylesheet, developed by Anton Dovgyallo from the Russian Academy of Sciences, reads the sequence of moves in a chess game and produces a set of chess board diagrams, representing every intermediate position as a graphical image in the XSL-FO format:
Kasparov–Karpov 1990 World Championship Game
Again, MSXML 4.0 and XslCompiledTransform are several times faster than the other processors. Although the first transformation for XslCompiledTransform takes twice as long as for MSXML 4.0 due to JIT compilation, subsequent ones are 4 times faster.
The DocBook-XHTML stylesheet, developed by Norman Walsh, transforms documents written in the DocBook format to XHTML. The input document used in the Sarvega XSLT Benchmark is rather small (under 100 KB) and produces dozens of messages during its transformation. I had to redirect those messages to a file to minimize the influence of xsl:message instructions on transformation time.
DocBook-XHTML is a huge stylesheet with thousands of templates, global parameters, and variables, and you can see how badly JIT compilation affects the first stylesheet run in the case of XslCompiledTransform: 1970 ms versus 60 ms for subsequent runs. It would be really nice to be able to pre-compile and "pre-JIT" stylesheets, so that you would not pay this price again on each application run, but currently the .NET Framework 2.0 does not provide a means for that.
One can draw a couple of conclusions from the results above:
Now it does not seem a coincidence that the latest release of the Java platform, J2SE 5.0, replaced the Xalan interpreting processor with the XSLTC compiling processor as the default XSLT engine, or that Michael Kay, the creator of Saxon, is experimenting in the same direction. However, turning an interpreter into a compiler is a highly nontrivial task. As you remember, Microsoft had to discard the old interpreter code base and start from scratch twice, and their efforts led to the swift and reliable XslCompiledTransform and MSXML 4.0 XSLT processors.