Welcome to MSDN Blogs Sign in | Join | Help

XQuery Candidate Recommendation Documents Released

The official candidate recommendation versions of the XQuery specifications have been published by the W3C at http://www.w3.org/TR/#CR. The details of the specific documents are as follows:

XQuery 1.0: An XML Query Language
http://www.w3.org/TR/2005/CR-xquery-20051103/

XQuery 1.0 and XPath 2.0 Functions and Operators
http://www.w3.org/TR/2005/CR-xpath-functions-20051103/

XQuery 1.0 and XPath 2.0 Data Model (XDM)
http://www.w3.org/TR/2005/CR-xpath-datamodel-20051103/

XQuery 1.0 and XPath 2.0 Formal Semantics
http://www.w3.org/TR/2005/CR-xquery-semantics-20051103/

XSLT 2.0 and XQuery 1.0 Serialization
http://www.w3.org/TR/2005/CR-xslt-xquery-serialization-20051103/

Posted by mrorke | 0 Comments

Dynamic XQuery Expressions

We have had several requests through our message boards for details on how to execute dynamic XQuery expressions. A dynamic XQuery expression is one whose exact details are not know at the time when the query is written. For example, consider the following XML fragment stored in a single XML data-type instance:

<foo id="1">foo 1</foo>
<foo id="2">foo 2</foo>
<foo id="3">foo 3</foo>

Suppose we have some client side logic where the user is able to choose which of the 3 foo elements that want to return. The XQuery expression used in this case is going to depend on the particular item requested and will not be known until the user makes their decision (i.e. it can not simply be hard-coded as a string). Since the XML data-type methods require a string constant as their input, there is no way to directly specify dynamic XQuery expressions. There are some workarounds for this problem, depending on exactly what part of the query is dynamic.

The easiest option is where the query simply needs to be parametrized. As is the case with the example above, the dynamic part of the query simply consists of a constant value that is different for each query execution. The easiest way to parametrize a query is to use the sql:variable() XQuery function and store the parameter value in the associated T-SQL variable. For example:

declare @index int
set @index=2

select data.query( N'
/foo[position()=sql:variable("@index")]
 ') from test

In this case, the index for the required item is stored in the T-SQL variable @index and this is then referenced in the XQuery expression through the sql:variable() function. Notice that we can not simply use the value of the sql:variable() function as an ordinal predicate - rather, we need to explicitly compare it's value to that of the position() function. This is due to a limitation of the XQuery implementation inside SQL Server 2005 that requires ordinal predicates to be numeric literal values (i.e. they can not be something that changes at query execution time).

While the parametrization of queries is a powerful tool, it does not solve the general problem of being able to execute truly dynamic XQuery expressions. In order to do this, you need to use the T-SQL exec function. The T-SQL exec function allows any string value to be executed as a T-SQL expression. For example:

declare @query nvarchar(200);

set @query=N'
select data.query( N''
/foo[1]
 '') from test1'

exec( @query )

The part of the query marked in red is the dynamic XQuery expression. Notice that the entire T-SQL expression (i.e. including the select ... from part of the statement) needs to be included in the dynamic query string. Also, notice that the ' characters that are used to contain the XQuery part of the expression need to be escaped (as '') in order to make the query string valid.

While this is not an ideal solution to the problem of dynamic queries, most cases can be handled by the parametrization option and the exec function should only be used as a last resort.

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 1 Comments

XQuery Test Suite - new version 0.7 released

The XQuery Testing Task Force is pleased to announce the publication of version 0.7 of our XQuery test suite. Please visit http://www.w3.org/XML/Query/test-suite/ for details.
Posted by mrorke | 0 Comments

Static Typing Consequences for the Parent Axis and Attribute Types

I posted a previous blog entry dealing with the static typing consequences of using the parent axis in your XQuery expressions over XML data inside SQL Server 2005 (Static Typing Consequences of the Parent Axis). In addition to the general issue of typing information loss associated with the parent axis, there is an additional issue that is related to attributes in particular. This issue is best explained through an example. Consider the following XSD schema definition:

create xml schema collection attribute_types as N'
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">	
	<xs:element name="root">
		<xs:complexType>
			<xs:choice maxOccurs="unbounded">
				<xs:element name="pointer">
					<xs:complexType>
						<xs:choice>
							<xs:element name="sibling" type="xs:string" />
						</xs:choice>
						<xs:attribute name="singleton_attribute" type="xs:IDREF" />
						<xs:attribute name="list_attribute" type="xs:IDREFS" />
					</xs:complexType>
				</xs:element>
				<xs:element name="target">
					<xs:complexType>
						<xs:attribute name="ID" type="xs:ID" />
					</xs:complexType>
				</xs:element>
			</xs:choice>
		</xs:complexType>
	</xs:element>
</xs:schema>'

In this case, we have defined a schema for a piece of XML that uses ID and IDREF types to create links between different elements. The important thing to notice here is the difference between the types of the singleton_attribute and list_attribute attributes of the pointer element. Note that the singleton_attribute has a type of xs:IDREF (implying a single IDREF value), while the list_attribute has a type xs:IDREFS (implying a list of IDREF values). Another key piece of information is the way in which list types are handled in XQuery. XQuery does not have a concept of list types as such, and handles these by simply treating their values as a sequence of singleton simple types. This is best illustrated by examining the static types of some queries over an instance of this data. Taking the following XML instance:

declare @x xml(attribute_types)
set @x='
<root>
	<pointer singleton_attribute="A">
		<sibling>test</sibling>
	</pointer>
	<pointer list_attribute="A B">
		<sibling>test</sibling>
	</pointer>

	<target ID="A"/>
	<target ID="B"/>
</root>'

If we execute a query to select the value of a single instance of the singleton_attribute:

select @x.query('
data( (/root/pointer)[1]/@singleton_attribute )
')

Examining the static type generated by this query, we get the following - xs:IDREF ? i.e. a single instance (optional) of an IDREF value. If we now execute a query to select the value of a single instance of the list_attribute:

select @x.query('
data( (/root/pointer)[2]/@list_attribute )
')

Examining the static type generated by this query, we get the following - xs:IDREF * i.e. multiple (optional) IDREF values. So, the result of selecting a single attribute of an XSD list type is a sequence of values, not a singleton. This is very useful since we can now use standard XQuery semantics (e.g. the ordinal predicate or FLWOR iteration) to navigate through this list of values and find the one we want.

This all works fine since the data function is able to handle sequences of values. But, remember that a lot of the functions and operators in XQuery require a singleton value and these will fail if applied to the value(s) of the list_attribute.

Now, remembering that the parent axis causes a loss of type fidelity and understanding that the above semantics require that the type of the attribute be known - what happens when we combine the parent axis with these attributes. Taking the following query against the singleton_attribute:

select @x.query('
data( (/root/pointer)[1]/sibling/../@singleton_attribute )
')

If we examine the static type generated by this query, we now get the following: xdt:anyAtomicType *. Two things to notice here, firstly we have lots type fidelity (IDREF has become  anyAtomicType) and secondly we have gone from an (optional) singleton value to a sequence! In fact, if we run a similar query against the list_attribute:

select @x.query('
data( (/root/pointer)[2]/sibling/../@list_attribute )
')

We get the exact same static type generated! The reasoning behind this is that, since we have lost all type fidelity by using the parent axis and there is a possibility that the attribute selected was a list type, we assume for the purposes of static typing that the result is a sequence i.e. we have lost the fact the the singleton_attribute was a singleton value!

This in and of itself is a problem, but (in most cases) the operations defined to deal with IDREF values can handle sequences of values, so this problem is moot in this case. But, armed with our knowledge of how the parent axis affects both the type and cardinality of attribute types, take the following new XSD schema:

create xml schema collection attribute_types2 as N'
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">	
	<xs:element name="root">
		<xs:complexType>
			<xs:choice maxOccurs="unbounded">
				<xs:element name="value">
					<xs:complexType>
						<xs:choice>
							<xs:element name="sibling" type="xs:string" />
						</xs:choice>
						<xs:attribute name="integer" type="xs:integer" />
					</xs:complexType>
				</xs:element>
			</xs:choice>
		</xs:complexType>
	</xs:element>
</xs:schema>'

In this case, we have an attribute that stores a single integer value. Taking the following instance of the above schema:

declare @x xml(attribute_types2)
set @x='
<root>
	<value integer="12">
		<sibling>test</sibling>
	</value>
</root>
'

We can use this attribute value in conjunction with the arithmetic operators to do some calculations:

select @x.query('
(/root/value)[1]/@integer + 10
')

This works fine and returns the expected result of 22. Now, if we add in a parent axis to the mix:

select @x.query('
(/root/value)[1]/sibling/../@integer + 10
')

We get the following error message returned:

Msg 2389, Level 16, State 1, Line 11
XQuery [query()]: '+' requires a singleton (or empty sequence), found operand of type 'xdt:anyAtomicType *'

So, even though we had no list types in our schema at all. Because of the loss of typing fidelity from the parent operator and the necessary assumption that any attribute might be a list type in this case, we loose cardinality fidelity too.

Again, the moral of the story is to use the parent operator with extreme caution :)

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 0 Comments

XML Data-type Usage By Other SQL Server 2005 Features

Several of the other features within SQL Server 2005 leverage the XML data-type to pass information around. For example, both Service Broker and triggers transfer data around the server in XML data-type instances. This means that you can easily leverage the power of the XQuery language and other XML data-type methods when using these features.

To illustrate this, we create a simple trigger on our database that fires when we create a new table:

create trigger CreateTableTrigger on database for create_table as 
select eventdata()

In this case, our trigger is simply outputting the result of the eventdata() function each time a table is created. If we now create a table called Test in this database, we get the following XML instance returned:

<EVENT_INSTANCE>
  <EventType>CREATE_TABLE</EventType>
  <PostTime>...</PostTime>
  <SPID>51</SPID>
  <ServerName>...</ServerName>
  <LoginName>...</LoginName>
  <UserName>dbo</UserName>
  <DatabaseName>master</DatabaseName>
  <SchemaName>dbo</SchemaName>
  <ObjectName>Test</ObjectName>
  <ObjectType>TABLE</ObjectType>
  <TSQLCommand>
    <SetOptions ANSI_NULLS="ON" ANSI_NULL_DEFAULT="ON" ANSI_PADDING="ON" QUOTED_IDENTIFIER="ON" ENCRYPTED="FALSE" />
    <CommandText>create table [Test](c1 int )</CommandText>
  </TSQLCommand>
</EVENT_INSTANCE>

I have removed some of the information from the XML for security reasons :) Basically, we see from this that the eventdata() function, called from within a trigger, returns an instance of the XML data-type. This means that we are free to go ahead and apply any of the XML data-type methods to this in order to customize the output from our trigger. If we drop the trigger we created earlier and create a new one with the following definition:

drop trigger CreateTableTrigger on database
go

create trigger CreateTableTrigger on database for create_table as 
select eventdata().value('( /EVENT_INSTANCE/ObjectName )[1]', 'nvarchar(max)')

Now, if we create a new table (say Test2), we get the following output:

Test2

Which is exactly what we expect if we run the value() method above over the XML returned from the event.

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 1 Comments

Static Typing Consequences of the Parent Axis

SQL Server 2005 has an XQuery implementation that provides full support for the parent axis, as defined in the XPath specification. Since our XQuery implementation is statically typed, we also support the static typing of the parent axis, as defined in the XQuery Formal Semantics document. The definition of static typing of the parent axis has a number of nuances that may not be entirely obvious without a detailed examination of it's definition. In essence, the main side effect of using this axis is that typing information is lost! This is best illustrated with an example.

Firstly, we create a simple schema with some typed elements that we will later query:

create xml schema collection small_type as N'
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<xs:element name="small">
		<xs:complexType>
			<xs:sequence>
				<xs:element maxOccurs="unbounded" name="item">
					<xs:complexType mixed="true">
						<xs:sequence>
							<xs:element name="embed-string" type="xs:string" nillable="true"/>
							<xs:element name="embed-int" type="xs:int" />
						</xs:sequence>
						<xs:attribute name="index" type="xs:unsignedByte" use="required" />
					</xs:complexType>
				</xs:element>
			</xs:sequence>
		</xs:complexType>
	</xs:element>
</xs:schema>'

Then we create a table and insert some instance data:

create table small_typed( id int IDENTITY PRIMARY KEY, xml_data xml(small_type), empty int)
go

insert into small_typed( xml_data ) values ('
<small>
	<item index="1">
		Item 1
		<embed-string>String Item 1</embed-string>
		<embed-int>1</embed-int>
	</item>
</small>
')

The next step is to set up an XQuery static type SQL profiler trace so we can examine the effects of various path expressions on the static type of the query. Once that is done, we execute the following path expression:

select xml_data.exist('
/small/item/embed-string
') from small_typed

From the SQL profiler trace, we see that the static type of this expression is:

element(embed-string,xs:string) *

Which is exactly what we expected, based on our XSD schema. Lets try another query, this time targeting the index attribute of the item element:

select xml_data.exist('
/small/item/@index
') from small_typed

In this case, the static type returned is the following:

attribute(index,xs:unsignedByte) *

Again, precisely what we expected based on our XSD schema. Now, throwing the parent axis into the mix, we execute the following query:

select xml_data.exist('
/small/item/embed-string/../@index
') from small_typed

If we take a look at the static type that results from this query, we now get the following:

attribute(index,xdt:anyAtomicType) *

Notice that the static type of the expression is now the generic xdt:anyAtomicType type, instead of xs:unsignedByte (which is what we would expect based on our XSD schema!). This loss of typing has consequences when the result of these expressions are used for further processing e.g. if they are passed through to functions.

Consider the following query which uses the XQuery number() function to extract and parse the value of the index attribute:

select xml_data.query('
number( (/small/item)[1]/@index )
') from small_typed

This query executes fine. We now alter the path expression to use the parent axis as follows:

select xml_data.query('
number( (/small/item/embed-string/..)[1]/@index )
') from small_typed

We now get the following error message:

Msg 2364, Level 16, State 1, Line 2
XQuery [small_typed.xml_data.query()]: Cannot implicitly convert from 'xdt:anyAtomicType ?' to 'xs:double'

Essentially, the use of the parent axis (and it's associated loss of typing) can mean that the resultant static type precludes the use of the expression with many of the XQuery functions and other expressions that are type dependent. As a further example of this, consider the following query where we use the + operator on the value of the index attribute:

select xml_data.query('
(/small/item)[1]/@index + 1
') from small_typed

This query executes and returns the expected results. Once again, if we alter the path expression to include the parent operator as follows:

select xml_data.query('
(/small/item/embed-string/..)[1]/@index + 1
') from small_typed

The query now returns the following error message:

Msg 9308, Level 16, State 1, Line 2
XQuery [small_typed.xml_data.query()]: The argument of '+' must be of a single numeric primitive type or 'http://www.w3.org/2004/07/xpath-datatypes#untypedAtomic'. Found argument of type 'xdt:anyAtomicType ?'.

The moral of the story - use the parent axis with caution :)

-
Disclaimer:
This posting is provided “AS IS” with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 0 Comments

XML Query Test Suite - version 0.6 released

The W3C XQuery Testing Task Force (of which I am a member) has released version 0.6 of the XML Query Test Suite. This is currently available at http://www.w3.org/XML/Query/test-suite/. The test suite consists of XQuery expressions, results and source files that test various different aspects of an XQuery implementation against the latest published drafts of the various XQuery specifications.
Posted by mrorke | 0 Comments

Querying Over Constructed XML Using Sub-queries

There are a number of cases where it would be useful to be able to run XQuery statements over the result of an XQuery expression which constructs new values. Unfortunately, directly querying over constructed XML is not supported in SQL Server 2005, so we need to use a work-around. Since the result of a query() method invocation is a new instance of the XML data-type, we can simply assign the results of a query to a variable, then perform further querying over that variable. While this will be acceptable for most cases, it is also possible to do this in a more direct manner, using sub-queries.

To illustrate this, I will use an example from our customer message boards. This particular case involved extracting the individual values from a list-typed simple value, using the nodes() method to create a separate relational row for each entry in the list. To start off with, I will create a simple schema which defines a the value of the foo element to be a string based list type:

create xml schema collection listExample as '
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="foo">
	<xs:simpleType>
		<xs:list itemType="xs:string" />
	</xs:simpleType>
</xs:element>
</xs:schema>'

We now create an instance of the xml data-type using this schema and some values:

declare @xml_data xml(listExample)
set @xml_data='<foo>Entry1 Entry2 Entry3</foo>'

The XQuery data() function will extract each individual entry from the list type. Thus, if we run the following query, we get the 3 'entry' items returned:

select @xml_data.query( 'data(/foo)' )

At this point, we would simply execute the nodes method over this XQuery expression to extract each of the individual values into a separate relational row. For example, using the following query:

select ref.value('.', 'nvarchar(100)')
from @xml_data.nodes('data(/foo)') T(ref)

Unfortunately, if you try to execute this query, you get the following error message returned:

Msg 2374, Level 16, State 1, Line 8
XQuery [nodes()]: A node or set of nodes is required for 'nodes()'

Since the data() function causes the creation of new items which are not stored in the base XML data-type instance, we are unable to execute the nodes() method over the result of this expression! In order to run a nodes() method over this, we have to construct a new XML data-type instance in a sub-query and run nodes over the result of that. This can be done as follows:

select T.ref.value('.', 'nvarchar(100)')
from 
(
      select [Xml]=@xml_data.query('
			for $i in data(/foo) return 
			element temp { $i }
		')
) A
cross apply A.Xml.nodes('/temp') T(ref)

The sub-query (shown in green) extracts the items out of the list-typed element node foo, returning each individual item as the text node of a newly constructed element named temp. Since the result of the query expression is a new XML data-type instance, we can now run nodes() over this sub-query to get the desired results.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 0 Comments

Working with XML from Multiple Instances

There are often cases where we need to perform operations based on multiple XML instances. To illustrate this, consider the example XML given below. Basically, we have two XML instances. The first instance (call this the results instance) contains information about the execution of a set of tests. Each test entry contains a name for the test, the result of the execution and an ID to the section of the specification that the test targets. The second instance (call this the sections instance) contains detailed information about the various sections of the specification, along with an identifying ID for each section.

Results Instance

<results>
	<item name="First Item" section-id="Section1" result="PASS" />
	<item name="Second Item" section-id="Section1" result="PASS" />
	<item name="Third Item" section-id="Section2" result="PASS" />
	<item name="Fourth Item" section-id="Section3" result="FAIL" />
	<item name="Fifth Item" section-id="Section3" result="PASS" />
	<item name="Sixth Item" section-id="Section3" result="FAIL" />
</results>

Sections Instance

<sections>
	<section name="First Section" id="Section1" />
	<section name="Second Section" id="Section2" />
	<section name="Third Section" id="Section3" />
</sections>

The report we would like to produce over this data shows the results of each tests, along with a detailed description of the section of the specification that the test targets. For example, the final data we want may look like the following:

<result name="First Item" section="First Section">PASS</result>
<result name="Second Item" section="First Section">PASS</result>
<result name="Third Item" section="Second Section">PASS</result>
<result name="Fourth Item" section="Third Section">FAIL</result>
<result name="Fifth Item" section="Third Section">PASS</result>
<result name="Sixth Item" section="Third Section">FAIL</result>

In order to do this, we essentially need to perform a join across the two instances. Using standard XQuery and the document function, we could write a FLWOR expression to do this, along the lines of the following:

let $sections := document("sections.xml")/sections
for $result in document("results.xml")/results/item
return
<result 
	name="{$result/@name}" 
	section="{$sections/section[id=$result/@id]/@name}">
{ $result/text() }
</result>

Unfortunately, XQuery in SQL Server 2005 will not execute this query. Firstly, we do not support the let syntax. Secondly, there is no document() (or equivalent) function. So, we need to re-write this query to use T-SQL to get around the limitations of our XQuery implementation. In order to perform the join operation, we need to extract the relevant fields out of the XML instances and turn them into relational data, so we can use them in the standard T-SQL context - the recommended method for doing this is a combination of the the value() method for converting the XML into a relational type, and the nodes() method for extracting the multiple values from the single XML instance.

We use nodes() to extract the multiple item and section elements from the pair of XML instances. Once these are extracted, we can use the value() method on each individual entry to convert it to a relational value, using these to perform the join. The query to do this is as follows:

select R.i.value('@name', 'varchar(30)')   [Name], 
       S.s.value('@name', 'varchar(30)')   [Section],
       R.i.value('@result', 'varchar(30)') [Result]
from Results  cross apply xml_data.nodes('/results/item') R(i),
     Sections cross apply xml_data.nodes('/sections/section') S(s)
where R.i.value('@section-id', 'varchar(30)') = S.s.value('@id', 'varchar(30)')

The join operation in this case is being done through the where clause of the select statement. Running this query will produce the following relational result set:

First Item	First Section	PASS
Second Item	First Section	PASS
Third Item	Second Section	PASS
Fourth Item	Third Section	FAIL
Fifth Item	Third Section	PASS
Sixth Item	Third Section	FAIL

If a relational results is what you wanted, then you are done :) The initial goal was to output XML though, so we can add a for xml statement to this query to take these relational results and re-compose them back into XML. The query including the for xml statement is as follows:

select R.i.value('@name', 'varchar(30)')   [@name], 
       S.s.value('@name', 'varchar(30)')   [@section],
       R.i.value('@result', 'varchar(30)') [text()]
from Results  cross apply xml_data.nodes('/results/item') R(i),
     Sections cross apply xml_data.nodes('/sections/section') S(s)
where R.i.value('@section-id', 'varchar(30)') = S.s.value('@id', 'varchar(30)')
for xml path('result'), type

This query then produces the expected results from above. The for xml statement itself is pretty self-explanatory. The string (result) given as the path parameter is the name of the container element, while the column names of the various values indicate how they will be stored. Notice that the column names for @name and @section correspond to the name and section attributes respectively, while the text() column becomes the text value of the element.

Thanks again to Jinghao for help with the for xml part of the query.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 2 Comments

Namespace Nuances

XQuery uses the concept of a namespace, adopted from the XML 1.0 specification. But, due to the complex nature of the XQuery language and it's dual roles as both query and serialization/XML construction language, there are several nuances to the use of namespaces that may not be immediately obvious.

The first issue I will deal with is the concept of a default namespace. In XML 1.0, if you used an xmlns attribute without an associated name, all elements under that scope were considered to be in the namespace referenced by the associated URI. Notice I said all elements - attributes are not affected by default namespaces in XML. This functionality is mirrored in XQuery, either through the use of the declare default element namespace prolog entry, or by directly constructing the xmlns attribute on your constructed XML. Again, in XQuery the default element namespace does not affect non-element data i.e. attributes, function names and variables (all of which may have associated namespace prefixes) must be explicitly namespaced if required (NOTE: SQL Server 2005 XQuery does not currently allow variables to be placed into a namespace, even though this is allowed in the XQuery specification). Function names in XQuery have their own default namespace entry, accessible through the declare default function namespace prolog option.

In XQuery a previously declared namespace prefix can be used to add a namespace to elements, attributes, function calls and variables. Unlike the default options, the same namespace prefix can be used to apply to all 4 different identifier types.

Lastly, when doing XML construction, the order that a namespace is declared in does not affect the scope, since attribute order in XML is considered non normative. Take the following XQuery expression as an example:

<foo:element foo:index="1" xmlns:foo="http://example.com/foo" />

In this simple case, we have constructed an element in the foo namespace, containing an attribute which is also in the foo namespace. Notice that we were able to use the foo namespace prefix even before it was technically defined (by the xmlns attribute). In this case, the result of the query was pretty obvious. Now, consider the following XQuery:

declare namespace foo="http://example.com/foo1";
<foo:element xmlns:foo="http://example.com/foo2" />

In this case, we initially declare that the namespace prefix foo points to the URI http://example.com/foo1. But, when we construct the new element, the namespace prefix foo is redefined to point to the URI http://example.com/foo2, which is the URI used for the constructed XML element. Again, these results are pretty obvious, but the trick comes when we combine XML construction with XQuery expressions to generate dynamic content. Consider the following query:

declare namespace foo="http://example.com/foo1";
<foo:element index="{/foo:index}" xmlns:foo="http://example.com/foo2" />

Initially, the namespace prefix foo is declared to point to the URI http://example.com/foo1. The element constructor in the query contains an attribute constructor for the attribute index which uses an XQuery expression to dynamically calculate the value for this attribute. Inside this dynamic attribute content expression, the namespace foo is referenced. Even though the redefinition of the foo namespace prefix technically occurs after the XQuery expression used to generate the contents of the index attribute, since the scope of the redefinition applies to the entire element constructor, within the dynamic attribute expression, the namespace prefix foo is defined to point to the URI http://example.com/foo2!

The easiest thing is to avoid such inconsistencies by placing any namespace prefix definitions as the first attributes in an element constructor. Thus, the order of assignment of URI's to namespace prefixes will be more immediately apparent.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 3 Comments

Performing XSLT Transforms on XML Data Stored in SQL Server 2005

A common task when dealing with XML data is to apply an XSLT style-sheet to the raw XML data in order to display it better. In a previous post to this blog, I showed how to append a processing instruction to your XML data in order to get IE to do the transformation of your XML data. This approach required that you know the location of the XSLT transform file and also that you were looking at your data through a file and not extracted directly from the server. Using the integration of the CLR components with SQL Server 2005, we can perform this same transformation within the server process itself and output plain HTML that will be viewable by any browser.

Firstly, we need to write the code that will apply the XSLT transformation. Following is the code I used to accomplish this:

using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;
using System.Data.SqlTypes;

public class XSLTTransform
{
	/// 
	/// Performs XSLT transforms on XML data stored in SQL Server 2005
	/// 
	public static SqlXml Transform( SqlXml inputDataXML, SqlXml inputTransformXML )
	{
		MemoryStream	memoryXml	= new System.IO.MemoryStream();
		XslCompiledTransform	xslt	= new XslCompiledTransform();
		XmlReader		output	= null;

		xslt.Load( inputTransformXML.CreateReader() );

		// Output the newly constructed XML
		XmlTextWriter outputWriter = new XmlTextWriter( memoryXml, System.Text.Encoding.Default );
		xslt.Transform( inputDataXML.CreateReader(), null, outputWriter, null);
		memoryXml.Seek( 0, System.IO.SeekOrigin.Begin );
		output = new XmlTextReader( memoryXml );

		return new SqlXml( output );
	}
}

Notice the return type of this function, as well as the types of the various input parameters i.e. the type SqlXml. This type resides in the System.Data.SqlTypes namespace, which is contained in the System.data.dll assembly that ships with SQL Server 2005 and is used to represent data stored in the XML data type of SQL Server 2005 within the CLR. The two important actions that we need to accomplish with this type are to create a generic XmlReader instance (which is done by calling the CreateReader() method for this type) and to serialize a generic XmlReader instance back into a SqlXml instance (which is done using a constructor). The rest of this code is just generic code for performing XSLT transforms.

In order to compile this code into an acceptable CLR assembly, make sure you use the version of the CLR that shipped with the SQL Server 2005 version you have installed. Once compiled, you add the assembly into the SQL Server process space using the following DDL:

create assembly XsltTransform
from 'C:/XsltTransform.dll'

In this case, I assume that the assembly is stored in the XsltTransform DLL and that this DLL is present in the root directory on the C drive. Once you have created the assembly in the SQL Server 2005 process, you need to map the static function it contains into the SQL UDF.  This is accomplished using the following DDL:

create function ApplyXsltTransform( @inputXML xml, @inputTransform xml )
returns xml
as external name XSLTTransform.XSLTTransform.Transform

In this case, we are creating a new SQL UDF which takes two, untyped XML data type instances as it's parameters and returns a further XML data type. You could also add a schema collection declaration to these parameters if you wished to enforce type checking on your input/output XML data. The 'external name' in this case is constructed using the following naming convention:

<assembly name>.<class name>.<static method name>

At this point, you are now ready to perform your XSLT transformation! As an example of how you might use this UDF, the following T-SQL statement with take the data stored in the XML file at CustomerData.xml and use the XSLT transform stored in the file CustomerDataDisplay.xsl to create an HTML version of this data:

declare @xml  xml
declare @xslt xml

select @xml  = BulkColumn from openrowset( Bulk 'C:\CustomerData.xml', SINGLE_BLOB ) as x 
select @xslt = BulkColumn from openrowset( Bulk 'C:\CustomerDataDisplay.xsl', SINGLE_BLOB ) as x 

select dbo.ApplyXsltTransform( @xml, @xslt )

Of course, you do not need to read your XML or XSLT data from files and can easily pass in any XML data type instance to this function in order to transform the XML data. Since the return type of this function is also an XML data type instance, it can be used to re-insert the transformed data, or even to define a computed column/view over your XML.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 3 Comments

Serialization and Concatenating String Values

A common requirement for an XML based application is the ability to do some form of simple string building. For example, the following query outputs a set of elements whose values are built up as a string, depending on some other value that is passed is as part of an iteration:

declare @x xml
set @x=''

select @x.query('
for $index in (1, 2, 3)
return
<element>
{ "Item" }{ $index }
</element>
')

The result of this query is the following set of 3 nodes:

<element>Item1</element>
<element>Item2</element>
<element>Item3</element>

In this instance, we constructed the string value for the element by creating 2 separate text nodes - one containing the string literal "Item" and the other containing the iterator value. As part of XML serialization, consecutive text nodes in the XML infoset are merged into a single text node, thus providing us with our desired output. Alternatively, we could have used the XQuery concat() function:

declare @x xml
set @x=''

select @x.query('
for $index in (1, 2, 3)
return
<element>
{ concat( "Item", $index cast as xs:string ? ) }
</element>
')

Notice that we had to add an explicit cast to the iterator value since the concat() function will only accept string values as it's arguments.

A further option would be to build your string value as a sequence of atomic values:

declare @x xml
set @x=''

select @x.query('
for $index in (1, 2, 3)
return
<element>
{ ( "Item", $index ) }
</element>
')

But, there is a subtle difference in the output of this query which is as follows:

<element>Item 1</element>
<element>Item 2</element>
<element>Item 3</element>

Notice that there is a space between the "Item" string value and the iterator value. Sequences of items are serialized with a space between each of the entries!

So, if you need to build strings - use the concat() function explicitly or rely on the XML infoset serialization rules and build multiple, consecutive text nodes. Remember that sequences are serialized with a space between each item.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 0 Comments

Efficient Property Promotion from XML Instances (Part 2)

Last week I covered the basics of property promotion i.e. automatically extracting data from an XML instance and storing this as relational data in order to improve query readability and/or performance. If you missed this previous post, you can access it at http://blogs.msdn.com/mrorke/archive/2005/05/24/421576.aspx.

In the earlier post, I made the assumption that there was always a single value that we wanted to extract per XML instance i.e. the promoted properties had a 1-to-1 relationship with the instance they were extracted from. Often though, XML is used to aggregate lots of different data values into a single place. In these cases, we may often find that we want to extract multiple values for a given property from a single XML instance. Take the XML instance described by the XSD schema shown below:

create xml schema collection CompanyDataXSD as N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:company="http://www.example.com/mycompany"
    targetNamespace="http://www.example.com/mycompany" elementFormDefault="qualified">
  <xs:element name="company">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="employee">
          <xs:complexType>
            <xs:all>
              <xs:element name="name" type="xs:string" />
            </xs:all>
            <xs:attribute name="id" type="xs:ID" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>'

This simple schema allows us to store multiple 'employee' elements (each with a name and an employee ID). We then create a table with an XML data-type instance that references this schema and insert some data:

create table CompanyData( id int primary key identity, xml_data xml( CompanyDataXSD ))
insert into CompanyData values( '
<company xmlns="http://www.example.com/mycompany">
  <employee id="EMP1"><name>Joe</name></employee>
  <employee id="EMP2"><name>Fred</name></employee>
  <employee id="EMP3"><name>Tom</name></employee>
</company>
')

Notice that this is the same data I used in the previous posting - only this time the individual employee elements are all contained inside a single XML instance. To re-produce the data from the first posting, we can run the following nodes() query to extract the various employee elements:

with xmlnamespaces( default 'http://www.example.com/mycompany' )
  select ref.query('.')
from CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)

The nodes() method allows us to crack a particular XML instance up into multiple rows. In the original posting, we wanted to extract the employee names and IDs out of the XML data and store these in standard relational columns. We can update the query above to do just this:

with xmlnamespaces( default 'http://www.example.com/mycompany' )
select
  id as [Database ID], 
  ref.value( './@id', 'nvarchar(10)' ) as [Employee ID],
  ref.value( './name', 'nvarchar(100)' ) as [Name]
from CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)

Executing this query produces the following result:

Database ID  Employee ID  Name
1            EMP1         Joe
1            EMP2         Fred
1            EMP3         Tom

Which is exactly what we had in the original post, other than the fact that the 'Database ID' in this case is always the same - indicating that the data all came from the same instance.

Since the result of the queries above is a set of rows, this type of query can not be used to define computed columns. We can still create a view to encapsulate this data as follows:

create view EmployeeDataProperties
with schemabinding
as
with xmlnamespaces( default 'http://www.example.com/mycompany' )
select
  id as [Database ID],
  ref.value( './@id', 'nvarchar(10)' ) as [Name],
  ref.value( './name', 'nvarchar(100)' ) as [Employee ID]
from dbo.CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)

As explained in the first posting though, we are unable to index this view (and thus materialize it) since it contains XML data-type methods. This means that each access of the view causes the query above to be re-executed instead of just reading the values from an index. The work-around in the first posting was to wrap each call to an XML data-type method into a UDF and use these UDFs in the view definition. This will not work in this case, since we are returning multiple rows for each invocation of the query.

It is possible to wrap this functionality into a table valued function, as shown below:

create function GetEmployeeDetails( @input xml( CompanyDataXSD ))
returns @employeeDetails table( 
  [Database ID] int,
  [Employee ID] nvarchar(10),
  [Name] nvarchar(100)
)
as
begin
  with xmlnamespaces( default 'http://www.example.com/mycompany' )
    insert @employeeDetails select
      id as [Database ID],
      ref.value( './@id', 'nvarchar(10)' ) as [Employee ID],
      ref.value( './name', 'nvarchar(100)' ) as [Name]
    from dbo.CompanyData cross apply xml_data.nodes('/company/employee') as T(ref)
  return
end

Unfortunately, we are still unable to create a view over this. The choice of whether to use a view directly, or create a table valued function to encapsulate this functionality depends entirely on your usage scenario and which syntax looks/works better for you.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 1 Comments

Efficient Property Promotion from XML Instances

In many cases, when storing XML data, some of the elements/attributes will be more important for storage/querying than others. Access to values inside XML data-type instances requires additional overhead when compared with just selecting relational data from a table. Thus, there are many times when we would like to take some of the data out of an instance and promote this into relational tables – this is often referred to as ‘property promotion’. SQL Server 2005 allows the user to define views and computed columns over XML data-type instance. These views/computed columns are useful for simplifying queries, but they will not provide any additional performance unless their contents are materialized somehow. SQL Server 2005 allows us to create an index over a view which will materialize the values stored and for computed columns, we can mark them as ‘persisted’ in order to have their values materialized in the database. In the case of an index view or a persisted computed column, queries of this data will simply retrieve the materialized values instead of re-executing the view/column definition, thus providing a performance benefit. Unfortunately, due to the complex nature of the XML data-type inside SQL Server 2005, there are several restrictions and caveats when it comes to using XML data-type methods inside view and/or computed column definitions. For this posting, I will walk the user through the steps required to create a materialized view over a set of XML data-type instances. For details on computed columns, Michael Rys has an article on his blog (Using XQuery Methods in CHECK Constraints) which deals specifically with check constraints on XML data-type columns, but the content is also applicable to computed columns.

Firstly, we will create a database with some stored XML data, typed according to the following schema:

create xml schema collection EmployeeDataXSD as N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:company="http://www.example.com/mycompany" 
	targetNamespace="http://www.example.com/mycompany" elementFormDefault="qualified">

	<xs:element name="employee">
		<xs:complexType>
			<xs:all>
				<xs:element name="name" type="xs:string" />
			</xs:all>
			<xs:attribute name="id" type="xs:ID" />
		</xs:complexType>
	</xs:element>
</xs:schema>'

This is a simple schema, concocted to store some arbitrary data about employees at a company. Adding this XSD to an XML schema collection called ‘EmployeeDataXSD’, we then create the following table:

create table EmployeeData( id int primary key identity, xml_data xml( EmployeeDataXSD ))

The following pieces of XML are then inserted:

insert into EmployeeData values( '<employee id="EMP1" xmlns="http://www.example.com/mycompany"><name>Joe</name></employee>')
insert into EmployeeData values( '<employee id="EMP2" xmlns="http://www.example.com/mycompany"><name>Fred</name></employee>')
insert into EmployeeData values( '<employee id="EMP3" xmlns="http://www.example.com/mycompany"><name>Tom</name></employee>')

Within our hypothetical company, the important information for querying over our employee’s is their name and employee identification number. Instead of forcing us to use an XML data-type method to extract this data each time we need it, we can instead define a view over this XML data that automatically extracts this. Most people’s initial reaction will be to define a view that uses the XML data-type methods directly, like the following:

create view EmployeeDataProperties
with schemabinding
as
      with xmlnamespaces( default 'http://www.example.com/mycompany' )
      select 
            id as [Database ID], 
            xml_data.value( '/employee[1]/name[1]', 'nvarchar(100)' ) as [Employee ID],
            xml_data.value( '/employee[1]/@id', 'nvarchar(10)' ) as [Name]
      from dbo.EmployeeData

In fact, this works fine and we are able to query this view to find the names and employee ID’s. All well and good, until we try and materialize this view by creating the following index:

create unique clustered index EmployeeDataProperties_IDX on EmployeeDataProperties( [Database ID], [Employee ID], [Name] )

Attempting to create this index results in the following error:

Msg 1985, Level 16, State 1, Line 1
Cannot create index on view 'dbo.EmployeeDataProperties'. It contains one or more xml data type methods.

At first glance this appears to be a dead end solution as you are not allowed to have any XML data-type methods inside a view definition! But, we can get around this by encapsulating the XML data-type method calls into UDF functions and using these as part of the view definition. To this end, I define the following two functions:

--- UDF to extract an employee name from an XML fragment
create function GetNamePropertyUDF( @input xml )
      returns nvarchar(100)
with schemabinding
as
begin
      return @input.value( 'declare default element namespace "http://www.example.com/mycompany"; /employee[1]/name[1]', 'nvarchar(100)' )
end
go
 
--- UDF to extract an employee ID from an XML fragment
create function GetIDPropertyUDF( @input xml )
      returns nvarchar( 10 )
with schemabinding
as
begin
      return @input.value( 'declare default element namespace "http://www.example.com/mycompany"; /employee[1]/@id', 'nvarchar(10)' )
end
go

Notice that the input XML is provided as an argument to the function, rather than selecting directly from the base table. This is to avoid another caveat of views which does not allow you to use UDF’s that perform data access. Also notice that we pass untyped XML into these functions. There is nothing to stop us from associating an XML schema collection with the input parameter, but this will add needless overhead since the original storage already performed the required validation. Now that we have our two UDF functions, we redefine our view in terms of these:

create view EmployeeDataProperties
with schemabinding
as
      select 
            id as [Database ID], 
            dbo.GetIDPropertyUDF( xml_data ) as [Employee ID],
            dbo.GetNamePropertyUDF( xml_data ) as [Name]
      from dbo.EmployeeData 

This time, we are able to successfully materialize the view by adding the index specified above. Querying this view provides us with the following list of employee names and ID’s:

Database ID	Employee ID	Name
1		EMP1		Joe
2		EMP2		Fred
3		EMP3		Tom

To demonstrate that the indexed view is actually being kept up to date with the data in the underlying XML data-type instance, we can run the following XML DML statement to update the value of the employee ID for the employee named ‘Joe’:

with xmlnamespaces( default 'http://www.example.com/mycompany' )
update EmployeeData set xml_data.modify( 'replace value of /employee[1]/@id with xs:ID("EMP4")' )
where xml_data.exist( '/employee[name="Joe"]' ) = 1

If we now query the data from the materialized view again, we get the following:

Database ID	Employee ID	Name
1		EMP4		Joe
2		EMP2		Fred
3		EMP3		Tom
-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 4 Comments

Adding a Style-sheet Declaration to your XML Output

A common requirement for formatting XML output from XQuery expressions, is to be able to add a style-sheet declaration. Style-sheet declarations consist of a standard processing-instruction node that appears at the top-level of the XML. In the following example, we are attempting to construct a PI node, then select a peice of XML that will make up the body of the result:

declare @x xml
set @x='<myXML/>'

select @x.query('
<?xml-stylesheet type="text/xsl" href="my-style-sheet.xsl" ?>
/.
')

But, when you try and execute this query, you receive the following error:

Msg 2373, Level 16, State 1, Line 6
XQuery [query()]: '/' is not supported with constructed XML

The problem here is that the query above becomes a single XQuery expression, which equates to building a PI and then querying it out (with the /.). In order to create both the PI and select the existing data from the source instance, we need to use the sequence constructor (,) to break this query up into two seperate expressions - one for generating the PI and the other for serializing out the existing data. Thus, the query above becomes:

declare @x xml
set @x='<myXML/>'

select @x.query('
<?xml-stylesheet type="text/xsl" href="my-style-sheet.xsl" ?>,
/.
')

Notice the comma character after the style-sheet PI construction. This is the XQuery sequence constructor syntax and allows us to chain multiple XQuery expressions together to produce a single output. This query will now return the required peice of XML:

<?xml-stylesheet type="text/xsl" href="my-style-sheet.xsl" ?>
<myXML />

Similarly, if we simply wanted to construct the XML as part of the XQuery expression itself, we would still need to seperate the constructor expressions for each of the XML nodes with a sequence constructor. For example:

declare @x xml
set @x=''

select @x.query('
<?xml-stylesheet type="text/xsl" href="my-style-sheet.xsl" ?>,
<MyXML />
')

Without the sequence constructor, this query would also raise an error.

-
Disclaimer:
This posting is provided “AS IS” with no waranties, and confers no rights.
Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.
Posted by mrorke | 0 Comments
More Posts Next page »
 
Page view tracker