As I wrote in my previous post, complex XML documents will produce multiple outputs when you're using the XML Source adapter. Most of the time it will be easier to pre-process your source file with XSLT to de-normalize it a bit. Reducing the number of outputs greatly simplifies your data flow.
Let's take the same XML document I used in the last example:
<extract date="2007-12-05">
  <counters>
    <counter category="dispatcher" name="server1">
      <runtime>6</runtime>
      <queue>3</queue>
      <maxrequest>8</maxrequest>
      <color>blue</color>
      <host>
        <name>svo2555</name>
        <path>\\dispatcher</path>
        <lastaccessed>2007-02-03</lastaccessed>
      </host>
    </counter>
    <counter category="gateway" name="server1">
      <runtime>1</runtime>
      <queue>10</queue>
      <maxrequest>10</maxrequest>
      <color>purple</color>
      <host>
        <path>\\gateway</path>
      </host>
    </counter>
  </counters>
</extract>
We want to flatten this out a bit using an XSL transform like this one (forgive my novice XSLT skills):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/extract">
    <xsl:variable name="extractDate" select="/extract/@date" />
    <!-- A single root element keeps the result a well-formed document -->
    <counters>
      <xsl:for-each select="counters/counter">
        <!-- One flat <counter> row per source counter -->
        <counter>
          <extractDate><xsl:value-of select="$extractDate"/></extractDate>
          <category><xsl:value-of select="@category"/></category>
          <name><xsl:value-of select="@name"/></name>
          <runtime><xsl:value-of select="runtime"/></runtime>
          <queue><xsl:value-of select="queue"/></queue>
          <maxrequest><xsl:value-of select="maxrequest"/></maxrequest>
          <color><xsl:value-of select="color"/></color>
          <hostName><xsl:value-of select="host/name"/></hostName>
          <path><xsl:value-of select="host/path"/></path>
          <lastaccessed><xsl:value-of select="host/lastaccessed"/></lastaccessed>
        </counter>
      </xsl:for-each>
    </counters>
  </xsl:template>
</xsl:stylesheet>
We'll apply the transform with an XML Task. Add one to your package and open the editor. Change the OperationType property to XSLT, set SaveOperationResult to True, and set up all of the file connections (Source, SecondOperand, and Destination).
Note that Source should be your XML source document and SecondOperand should be your XSLT document.
The processed XML looks like this:
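<?xml version="1.0" encoding="utf-8"?>
<counters>
  <counter>
    <extractDate>2007-12-05</extractDate>
    <category>dispatcher</category>
    <name>server1</name>
    <runtime>6</runtime>
    <queue>3</queue>
    <maxrequest>8</maxrequest>
    <color>blue</color>
    <hostName>svo2555</hostName>
    <path>\\dispatcher</path>
    <lastaccessed>2007-02-03</lastaccessed>
  </counter>
  <counter>
    <extractDate>2007-12-05</extractDate>
    <category>gateway</category>
    <name>server1</name>
    <runtime>1</runtime>
    <queue>10</queue>
    <maxrequest>10</maxrequest>
    <color>purple</color>
    <hostName />
    <path>\\gateway</path>
    <lastaccessed />
  </counter>
</counters>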
Add a Data Flow Task, and set up your XML Source to use the processed XML document. You'll need to update or regenerate the schema (XSD) for your document to account for the new format. Notice that there is now only one output to deal with.
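For reference, a hand-written schema for the flattened document might look roughly like the sketch below. It assumes the processed file has a single counters root wrapping repeated counter elements. The type choices (xs:int for the numeric counters, xs:string for lastaccessed) are assumptions made for illustration, and the XSD that the XML Source editor generates for you may well differ in its details.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Assumed root: one <counters> element wrapping the flattened rows -->
  <xs:element name="counters">
    <xs:complexType>
      <xs:sequence>
        <!-- Each <counter> holds only simple values, so it maps to one row -->
        <xs:element name="counter" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="extractDate" type="xs:date" />
              <xs:element name="category" type="xs:string" />
              <xs:element name="name" type="xs:string" />
              <xs:element name="runtime" type="xs:int" />
              <xs:element name="queue" type="xs:int" />
              <xs:element name="maxrequest" type="xs:int" />
              <xs:element name="color" type="xs:string" />
              <xs:element name="hostName" type="xs:string" />
              <xs:element name="path" type="xs:string" />
              <!-- Kept as a string (assumption) because the value can be
                   empty when a counter has no lastaccessed in the source -->
              <xs:element name="lastaccessed" type="xs:string" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because every child of counter is a simple element, there is no nested complex type left to split off into a second output.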
Thanks for the tip. I've recently been on a project where we needed to do this exact same thing but we hit a slight problem. We noticed that if the original XML file is larger than 100MB it would cause a System.OutOfMemoryException when it tries to transform the XML. Have you heard of this occurring?
Hi Tanner,
I believe both the XML Task and XML Source read the entire XML document into memory before performing any operations. They were originally designed to work on smaller XML files.
I'd suggest trying a similar operation using a script task, and see if that works better. You'll have more control over how the XSL is executed.
~Matt
Are there any plans to get this upgraded to XSLT 2.0 and to make it more robust?
Mark
Hi Mark,
(Wow, sorry for the delay! I completely missed this comment).
Yes, improving the performance of our XSLT usage is one of our work items for the next release. Hopefully they'll make it in!
Thanks for the article!
Mohammad
Thanks for the article
A few questions:
1. Is it required to create the XSLT file? I just have an XML file; how can I create the XSLT file?
2. What is the second operand?
Regards,
Eshwar.
Hi Matt,
Is there a better way to create the XSLT file than writing it manually?
Thanks
Matt,
I am trying to do the same thing, yanking data that's at the root level, but my file is also much simpler because I don't have sub-element data. Can anyone here please help me generate the XSLT file that I need to produce the final XML source file? Thanks
<ResponseFile Date="2011-11-11" StatusCode="s" MessageText="FAIL">
<Response TranCode="15" ID="4444" BorrowerID="1101" Status="5"/>
<Response TranCode="15" ID="4444" BorrowerID="7777" Status="5"/>
<Response TranCode="15" ID="4444" BorrowerID="8888" Status="5"/>
</ResponseFile>
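For a file shaped like the one above, a stylesheet along these lines might be a starting point. It is only a sketch that follows the same pattern as the stylesheet in the post: capture the root-level attributes once, then emit one flat row per Response. The output element names (Responses, FileDate, and so on) are hypothetical, chosen purely for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/ResponseFile">
    <!-- Capture the root-level attributes once -->
    <xsl:variable name="fileDate" select="@Date"/>
    <xsl:variable name="statusCode" select="@StatusCode"/>
    <xsl:variable name="messageText" select="@MessageText"/>
    <!-- Hypothetical root element wrapping the flattened rows -->
    <Responses>
      <xsl:for-each select="Response">
        <!-- One row per Response, with the file-level values repeated -->
        <Response>
          <FileDate><xsl:value-of select="$fileDate"/></FileDate>
          <StatusCode><xsl:value-of select="$statusCode"/></StatusCode>
          <MessageText><xsl:value-of select="$messageText"/></MessageText>
          <TranCode><xsl:value-of select="@TranCode"/></TranCode>
          <ID><xsl:value-of select="@ID"/></ID>
          <BorrowerID><xsl:value-of select="@BorrowerID"/></BorrowerID>
          <Status><xsl:value-of select="@Status"/></Status>
        </Response>
      </xsl:for-each>
    </Responses>
  </xsl:template>
</xsl:stylesheet>
```

With the sample file, this should yield three Response rows, each carrying the Date, StatusCode, and MessageText values from the ResponseFile element alongside its own attributes.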