Author: Kun ChengReviewers: Peter Carlin, Mike Ruthruff, Thomas Kejser, Nicholas Dritsas
Here are 3 quick tips from query performance perspective based on real customer experience.
1. Promoting key XML attribute/property to relational column. Promoting to relational column will get you to on-par performance with relational queries, the best you can get out of modern RDBMS. To achieve optimal performance and scalability with XML data, I strongly recommend you first consider this option. The idea is identifying hot elements or attributes that your workload queries on the most. Then add computed columns using user functions to promote the elements or attributes out of XML data. Note it does require application code change to modify queries to take advantage of the promoted columns. A simple example below is a table with “XMLDATA” that has a hot element “a1”. Here are steps to add a computed column to promote the element (Note adding the computed column is offline operation for the table).
CREATE FUNCTION udf_get_a1 (@xData xml)
DECLARE @a1 varchar(30)
SELECT @a1 = @xData.value('(/Dept/orders/a1/text())', 'varchar(30)')
ALTER TABLE Orders
ADD a1 AS dbo.udf_get_a1(XMLDATA) PERSISTED
CREATE INDEX ind_a1 ON Orders(a1)
Now the original query below
FROM Orders t
WHERE t.XMLDATA.exist(N'(/Dept/orders/a1/text())[.="Company10001Special"]') = 1
Needs to be rewritten as
In a customer lab, we observed exponential performance gain by using this method (1000+ times faster!!). To caution against over-promotion, when you add too many promoted columns, the overhead of index change could reduce the performance gain.
2. Typed or Untyped XML data? Typed XML means there is validating schema defined against the data. For untyped XML, the whole XML data is treated as a big string. To decide whether or not to define typed XML, you need to examine your XML query access pattern. Generally speaking, schema speeds up lookups since the data types of XML elements/attributes are known. But it would slow down INSERT due to overhead of validating new data. For UPDATE, it would benefit from faster lookups like SELECT, but incur same schema validation overhead like INSERT.
Typed (w/ schema)
Untyped (w/o schema)
3. What XML indexes to add? Again depending on query access pattern, you may choose different XML indexes. Note XML index strategy is different than relational index. Popular DTA (Database Tuning Advisor) tool wouldn’t be able to recommend XML indexes. Fortunately there are only 3 types of secondary XML indexes to consider in addition to the required primary XML index.
Secondary XML index type
Benefits this type of queries
XPath queries with explicit path expression (XMLcol.exist ('/Dept/Orders/[@id="10001"]') = 1)
XPath queries with no explicit path expression but with value predicate (XMLcol.exist ('//[@id="10001"]') = 1)
XPath queries with multiple row results
As straight forward as it sounds, you may find it difficult to examine your application and find out what appropriate indexes to add. The best way I recommend is to add all 3 types of XML indexes and test your workload to find out which XML indexes were used to determine the ones to keep (Be aware of rebuilding index affecting system availability). You can build XML indexes on either untyped or typed data. But you will get more gain with indexes on typed data. I see up to 50 times faster query response (SELECT queries) compared to plain untyped XML! However for DML queries, they could be significantly impacted by the overhead of XML index change plus schema validation. So again the key is to test your workload to find out if it makes sense to add XML indexes.
In conclusion, Promotion to relational column is the primary option you should consider when working with XML data. Whether or not to use XML schema and XML indexes, you need to be careful about your decision. Test your workload and identify the XQueries, which are critical to the overall performance. There is no DTA to help you, no query hints apply to XQueries either. Examine the query plans of those XQueries thoroughly to eliminate inefficiency.