XML publishing functionality exposed as “FOR XML” in Microsoft SQL Server 2000 evolved quite a bit in SQL Server 2005. So, I thought I would document the matrix of what FOR XML results in. This article should help understand some performance characteristics and limitations of FOR XML as well as the difference between the following queries:

 

SELECT 1 a FOR XML RAW

SELECT (SELECT 1 a FOR XML RAW)

SELECT 1 a FOR XML RAW, TYPE

 

In SQL Server 2000 the server side XML publishing - FOR XML (see http://msdn2.microsoft.com/en-us/library/ms178107(SQL.90).aspx) - was implemented in the layer of code between the query processor and the data transport layer. Without FOR XML a SELECT query is executed by the query processor and the resulting rowset is sent to the client side by the server side TDS code. When a SELECT statement contains FOR XML the query processor produces the result the same way as without FOR XML and then FOR XML code formats the rowset as XML. For maximum XML publishing performance FOR XML does steaming XML formatting of the resulting rowset and directly sends its output to the server side TDS code in small chunks without buffering whole XML in the server space. The chunk size is 2033 UCS-2 characters. Thus, XML larger than 2033 UCS-2 characters is sent to the client side in multiple rows each containing a chunk of the XML. SQL Server uses a predefined column name for this rowset with one column of type NTEXT - “XML_F52E2B61-18A1-11d1-B105-00805F49916B” – to indicate chunked XML rowset in UTF-16 encoding. This requires special handling of the XML chunk rowset by the APIs to expose it as a single XML instance on the client side. In ADO.Net, one needs to use ExecuteXmlReader, and in ADO/OLEDB one should use the ICommandStream interface.

 

I’d like to discourage relying on 2033 character chunk size since this behavior can potentially change in future versions of the server.

 

For SQLOLEDB provider SQL Server 2000 introduces more optimizations. It addresses the most CPU-intensive operation during formatting a rowset as XML - conversion of various SQL types to strings. For that, when sending to OLEDB client, the server encodes XML into a proprietary binary XML format that allows transporting SQL types. This format is chunked the same way textual XML is and multiple binary XML chunks are sent to the client side in a rowset with one column of type IMAGE with the same predefined name. It’s then decoded in SQLOLEDB and exposed as a single XML stream. An added benefit of binary XML encoding is that it is often smaller comparing to textual XML. This optimization saves precious server CPU time and offloads the CPU-intensive conversion to the client.

 

The above should explain the following FOR XML limitations in SQL Server 2000:

1)      client code should either use a dedicated API (ExecuteXmlReader/ICommandStream) or special processing in order to view FOR XML result as a single XML instance/stream;

2)      FOR XML results cannot be easily persisted in the server space in the same statement (one usual workaround is to use an extra loopback connection into the same server using sp_OA* procedures);

3)      FOR XML is only supported with SELECT statements and not with data modification (DML) statements INSERT-SELECT/INSERT/UPDATE/DELETE;

4)      FOR XML results from a linked server which was linked through SQLOLEDB provider cannot be retrieved properly from a non-SQLOLEDB provider on the client side – the client is not able to decode the SQL Server 2000 binary XML format generated by the linked server.

The reason for all of them is improving XML publishing performance through streaming processing and using SQL Server 2000 binary XML format.

 

Among multiple SQL Server 2005 FOR XML enhancements (http://msdn.microsoft.com/library/en-us/dnsql90/html/forxml2k5.asp) there are

o       FOR XML is allowed in sub-query;

o       It is possible to assign FOR XML results to a parameter/variable or use sub-query FOR XML results in DML;

o       FOR XML can produce string typed XML or, with TYPE option, native XML type.

 

For example

DECLARE @xml_var XML

SET @xml_var =

(

  SELECT

    *,

    (

      SELECT *

      FROM Orders

      WHERE Orders.CustomerID=Customers.CustomerID

      FOR XML AUTO, TYPE

    )

  FROM Customers WHERE CustomerID='ALFKI'

  FOR XML AUTO, TYPE

)

 

This was possible by implementing FOR XML logic in an order sensitive extended aggregating operator. You can observe it by examining query plan for these new SQL Server 2005 FOR XML extensions:

 

However, performance characteristics of  FOR XML operator are slightly different from SQL Server 2000 FOR XML since the operator has to produce a single scalar value and cannot produce a rowset of XML chunks. For FOR XML with TYPE directive the operator produces scalar value of type XML, and value of type NVARCHAR(MAX) when there’s no TYPE directive. In order to preserve performance compatibility with SQL Server 2000 top level FOR XML with no TYPE in SQL Server 2005 is processed the same way it was in SQL Server 2000 – outside the query processor. That means that the SQL Server 2000 FOR XML limitations resulting from processing the top-level FOR XML outside the query processor are still present in SQL Server 2005 for top level FOR XML with no TYPE.

 

Also, because of some SQL parser restrictions, the following statement

 

INSERT t(xml_col)
SELECT 1 a FOR XML RAW

 

will still give a syntax error (“Msg 6819, Level 16, State 1, Line 2

The FOR XML clause is not allowed in a INSERT statement.”) but the following will work

 

INSERT t(xml_col)
SELECT (SELECT 1 a FOR XML RAW, TYPE)

 

For performance reasons the new XML type in SQL Server 2005 is stored within the server space in an optimized binary representation. This includes XML-typed result of SELECT…FOR XML…,TYPE. The optimized representation is different from SQL Server 2000 binary XML transport format. You can get more details on it in our XML Symposium 2005 publication at http://blogs.msdn.com/sqlprogrammability/attachment/570537.ashx. This format has to be converted to textual XML when sent to the clients that do not support it. .NET 2.0 SqlClient supports SQL Server 2005 internal XML representation. It is exposed as SqlXml object. Thus, when XML type is sent to .NET 2.0 SqlClient a XML conversion is not performed in the transport code layer.

 

More performance discussion.

Now you may wonder whether the new ways of XML publishing that SQL Server 2005 exposed – nesting FOR XML with sub-query syntax and adding TYPE directive - perform better than the good old SQL Server 2000-compatible top level FOR XML with no TYPE. There’s no straightforward answer since factors affecting XML publishing performance inside and outside the query processor space are different. From the feedback we have received so far the performance is adequate in most use cases whichever syntax you use. If adding “FOR XML …” to a query slows it down unacceptably you should try different ways to do the XML formatting both on the client and the server side and measure the performance. Performance factors to consider (server side FOR XML only):

o       In case of top level FOR XML with no TYPE the server does not have to form the whole result before outputting it since the result is sent to the client side chunk by chunk. FOR XML aggregating operator would form its output in a temporary LOB. To minimize performance impact temporary LOBs are cached in memory for up to 1MB of total size of all temporary LOBs per query (more for parallel queries). Larger LOBs are spilled into TEMPDB but the TEMPDB buffers will likely to stay in memory unless the query is long running and there’s strong memory pressure. Overall, for larger XML results the top level FOR XML with no TYPE may have slightly less latency and may perform a bit better comparing to sub-query FOR XML or FOR XML…,TYPE for simple FOR XML queries. This is when other factors do not have considerable impact.

o       AUTO and EXPLICIT mode of FOR XML generally require specific ordering of the relational input in order for the XML tag grouping heuristics to work properly. Such ORDER BY may translate into a costly SORT operator in the query execution plan. When XML tag hierarchy is formed by nesting FOR XML with sub-query syntax ORDER BY clause is optional (only needed for ordering sibling XML elements and not for nesting XML tags). Thus, the new SQL Server 2005 FOR XML syntax may help avoid costly SORTs.

o       Nesting FOR XML with sub-query syntax currently in most practical cases translates into Nested Loop join. However, there can be a combination of indices when joins and unions in AUTO and EXPLICIT modes of FOR XML are translated into more efficient MERGE or HASH joins, or MERGE unions. Thus, old style XML tag nesting may utilize more efficient join algorithms in presence of proper indices.

o       I already mentioned that top level FOR XML with no TYPE produces binary XML when the client provider is SQLOLEDB. This includes SQL Native Client – SNAC (SQLNCLI) – OLEDB provider introduced in SQL Server 2005. Generation of SQL Server 2000 binary XML format 1) requires less CPU for SQL type conversion on the server, 2) the result is often more compact than textual UTF-16 XML. While 1) is normally considered a good thing for the server, 2) may be important too if the result is sent over slower connections.
FOR XML…,TYPE produces binary internal representation of XML type which is also usually smaller than the textual representation and it is sent as is to .NET 2.0 SQL client. So, it can be a consideration when the result is sent over slower networks. In this case the SQL types to string conversion does happen since FOR XML…,TYPE results in untyped XML.

Now that I described the most important factors I still recommend measuring performance on your data. Of course, you don’t have a choice between the new and the old ways for FOR XML in assignment and DML statements but you may produce nested XML tags by nesting FOR XML AUTO, TYPE and FOR XML EXPLICIT, TYPE with the sub-query syntax.

 

In conclusion, let’s revisit what the 3 queries in the beginning produce:

 

SELECT 1 a FOR XML RAW

-- returns the same in SQL Server 2000 and 2005 -

--    a rowset of chunks containing resulting XML

--    with the predefined column name -

--    binary XML for SQLELEDB provider and textual UCS-2 XML otherwise.

 

SELECT (SELECT 1 a FOR XML RAW)

-- returns single row with

--    single unnamed column of type

--    NVARCHAR(MAX) containing the XML result

 

SELECT 1 a FOR XML RAW, TYPE

-- returns single row with

--    single unnamed column of type XML

--    containing the XML result;

-- it is sent as is to .NET 2.0 clients and

--    converted to textual XML otherwise.

 

Also, I’d like to point out that the first assignment statement in

 

DECLARE @xml_var XML

SET @xml_var = (SELECT 1 a FOR XML RAW)

SET @xml_var = (SELECT 1 a FOR XML RAW, TYPE)

 

involves NVARCHAR(MAX)àXML conversion while the second does not.

 

 

Eugene Kogan,

Technical Lead

Microsoft SQL Server Relational Engine

 

with thanks to Michael Rys – Program Manager at Microsoft SQL Server Relational Engine – for review and comments.