Yesterday we released our first Service Pack for SQL Server 2005. You might remember that a while back I wrote a small post about being hard at work on SP1. Back then I avoided giving any details for fear of running afoul of corporate policy, but now that the bits are officially out I can talk about the changes that were made to the XML datatype functionality.

 

The very first thing we did was increase the number of schemas we can import by accepting some non-deterministic schemas. In short, non-deterministic content models are now accepted, provided that all occurrence constraints are “0”, “1” or “unbounded”. For other values, we have to keep rejecting the schemas. The reason is that with “0”, “1” and “unbounded” we can build a basic automaton to handle validation of instances. When you use integer values strictly greater than one in your constraints, you get into cases where validating properly might sometimes require backtracking, and the implementation becomes more complex. So far I haven’t seen a single industry schema that contains non-deterministic content models where occurrence constraints aren’t all “0”, “1” or “unbounded” (if you know of any, please let me know).
With this improvement we’re now able to import the complete set of Office 2003 schemas into SQL Server 2005 (provided you remove all unique, key and keyref nodes and change all lax wildcards to skip, but as we saw in this week’s other post, that’s not very hard to do).
The developer in charge of this change was Brandon, who contributes to the SQL Programmability & API Development Team Blog. If you were tired of seeing the “XML instances of the content model of type or model group '...' can be validated in multiple ways and are not supported.” error message and this fix changed your life, he’s the man to thank.
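To make the restriction concrete, here is a minimal sketch of a non-deterministic content model that uses only “0” and “1” as occurrence constraints. The collection name NonDetDemo and the root/a element names are made up for illustration, and the expectation that this particular schema is rejected by RTM and accepted by SP1 follows from the description above; treat it as an assumption rather than a verified result.

```sql
-- Hypothetical schema: after reading a single <a>, a validator cannot tell
-- whether it matched the optional particle or the required one, so the
-- content model is non-deterministic.
CREATE XML SCHEMA COLLECTION NonDetDemo AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="a" type="xs:string" minOccurs="0"/>
        <xs:element name="a" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
GO

-- Both instances satisfy the content model (one or two <a> elements).
DECLARE @one xml(NonDetDemo), @two xml(NonDetDemo);
SET @one = N'<root><a>x</a></root>';
SET @two = N'<root><a>x</a><a>y</a></root>';
SELECT @one AS one_a, @two AS two_a;
```

Changing the first particle to maxOccurs="2" would put the schema back in the rejected category, since “2” is neither “0”, “1” nor “unbounded”.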


Our second improvement has to do with query performance. Static typing makes things slower when querying typed XML vs. untyped XML. In particular, when dealing with very large schema collections and complex XPath expressions, static typing was a big consumer of both memory and CPU cycles. Our static typing dev, AdrianB, came up with implementation changes that speed up the computation of XQuery static types a great deal.
For example, we imported all the fpML schemas (available from http://www.fpml.org) into a schema collection and compiled the following query:
select convert(xml(fpML), '').query('//*:party')
On my office server this used to take 99.952 seconds with the RTM bits. With SP1 installed, it goes down to 0.328 seconds. That’s over 300 times faster! As for memory consumption, the number of pages is divided by almost 140. You shouldn’t expect such drastic improvements on all queries, but if you use the descendant or descendant-or-self axis in your XPath expressions you’re likely to notice the difference.
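If you want to measure the difference on your own server, one way is to let SQL Server report parse and compile times. The sketch below assumes a schema collection named fpML already exists, as in the query above. SET STATISTICS TIME is a standard session option, but this is not necessarily how the numbers in this post were obtained.

```sql
-- Report parse/compile and execution times in the Messages output.
SET STATISTICS TIME ON;
GO

-- Same query as above: the static type of //*:party against the fpML
-- schema collection is computed while the statement is compiled.
SELECT CONVERT(xml(fpML), '').query('//*:party');
GO

SET STATISTICS TIME OFF;
GO
```

The figure to watch is the “parse and compile time” line; keep in mind that it is only paid when the plan is not already cached. For reference, 99.952 / 0.328 is roughly 305, which is where the “over 300 times” figure comes from.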
The code changes were significant and required something in the neighborhood of 175 new hand-crafted tests. Fortunately, Adrian is a great guy to work with, and in the end everything came together fine. He doesn’t have a blog yet, but if you experience significant performance increases with your queries you should know he’s the one who made it happen.

 

I hope this post will make you want to try SP1 for yourself. Don’t hesitate to drop me a comment to share your experiences, good or bad.

 

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples is subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.