Using XML Source

Using XML Source

Rate This
  • Comments 22

Providing a generic way to extract data from XML documents of any format is pretty complicated. The SSIS XML Source Adapter tries to do just that, to various degrees of success.

XML Source is easiest to use when your input has a simple element/sub-element structure:

<rootgoo>
    <goo>
        <subgoo>value</subgoo>
        <moregoo>1</moregoo>
    </goo>
    <goo>
        <subgoo>value</subgoo>
        <moregoo>2</moregoo>
    </goo>
</rootgoo>

Or when the values are listed as attributes (such as the output you get from SELECT ... FOR XML RAW statements in SQL Server).

<root>
   <row CustomerID="1" TerritoryID="1" AccountNumber="AW00000001" />
   <row CustomerID="2" TerritoryID="1" AccountNumber="AW00000002" />
</root>

Both of these XML documents would produce a single output in the XML Source. When your XML data starts to get more complicated (multiple levels of elements / attributes), you start getting more outputs. These outputs are linked with IDs, which you will have to join yourself with a merge join transform.

Dealing with multiple outputs

In this completely made up example. the XML is a little more complex.

<extract date="2007-12-05">
    <counters>
        <counter category="dispatcher" name="server1">
            <runtime>6</runtime>
            <queue>3</queue>
            <maxrequest>8</maxrequest>
            <color>blue</color>
            <host>
                <name>svo2555</name>
                <path>\\dispatcher</path>
                <lastaccessed>2007-02-03</lastaccessed>
            </host>
        </counter>
        <counter category="gateway" name="server1">
            <runtime>1</runtime>
            <queue>10</queue>
            <maxrequest>10</maxrequest>
            <color>purple</color>
            <host>
                <name>svo2555</name>
                <path>\\gateway</path>
                <lastaccessed>2007-02-03</lastaccessed>
            </host>
        </counter>
    </counters>
</extract>

Let's see how XML Source treats this.

xmlsrc

As you see, it created three outputs - counters, counter, and host - and created _id columns to identify the rows. Our destination table wants all of this data in a single row, which means we'll have to merge.

image

First we'll mark that the data has been sorted already, using the generated _id columns as the Sort Key. This will let us merge without having to insert a Sort transform.

- Right click on the XML Source, and bring up the Advanced Editor

- Select the host output, and set the IsSorted property to True.

image

- Expand the host output, and then expand Output Columns. Select the _id field, and set the SortKeyPosition property to 1

image

Follow the same steps for the counter output. Make sure you set the SortKeyPosition value on the "counter_id" column, and not the "counters_id" column - we're going to ignore the counters output all together.

Add a Merge Join transform. Connect the host and counter outputs from the XML Source to it.

image

You can then open the merge join transform, and take the columns from both inputs.

image

Voila!

Leave a Comment
  • Please add 8 and 3 and type the answer here:
  • Post
  • As I wrote in my previous post , complex XML documents will produce multiple outputs when you're using

  • How it can done using vb macro? can u explain

  • Nice. I've never read this before. Shows how valuable a bit of out-of-the-box thinking can be.

    -Jamie

  • hello there,

    Just as your example had 3 outputs, one of the XML I am working on produced 5 outputs. So I tried using multiple "merge and joins". The outputs were merged fine, but when I try to give the outputs of the resultant merge and joins to the next one, en error shows up saying that

    "The IsSorted property must be set to true on both sources of this transformation"

    How do I set that on a merge and join transform ??

  • Hi Kalcee,

    Looks like you might have missed a step. Take a look at the step that starts with "Right click on the XML Source, and bring up the Advanced Editor" ... you can force the IsSorted property to true on the "Input and Output Properties" page.

    ~Matt

  • Hi,

    Great information, I can remove most of my manual sorts!  It will clear my task up a lot!

    Quick question though, in your XML you have your main tag with a 'date' attribute, taking a look at your list of record sets you don't have a extract set.  I've run into the same problem as the parent tag has values in which I need.  Do you know how to access this data?

    In my case it's for the report date, I need to know for which date the data in the XML is for, unfortunately the date is held within the main tag like in your example.

    <extract date="2007-12-05">

       <counters>

    The date is not accessible according to my SSIS

    Cheers

    Ian

  • Hi Ian,

    I noticed this too - it looks like our XML Source doesn't pick up attribute values from the root element.

    Your best bet is to massage the data bit with XSLT before processing. See my other post about this here:

    http://blogs.msdn.com/mattm/archive/2007/12/15/xml-source-making-things-easier-with-xslt.aspx

    ~Matt

  • As a side note to using XML as a source, there is an XML destination adapter for SSIS available at http://www.keelio.com

  • XML source and multiple data outputs for elements.

  • Excellent! Looked all over to find a good and simple explanation of how to get the XML files into SQL Server

  • Hello,

    I have a single XML file and I use your technique (great post by the way..).

    But my problem is that I have more that two outputs coming from my XML Source Adapter.

    How could I deal with this without having to "cascade" the merge join ?

    Any help is welcome.

    Thanks.

  • what if I don't want to merge all the data in one table, what if I want to have one to many tables with a FK key?

    I did get the keys generated from SSIS but they don't seem work give right relationship.

    for example I got a table called events which hold the callid and eventid and I was hoping it would have the relationship there, but nope.

    the ids don't match in both tables, in fact I can't even find the eventid that is linked to the callid...

    at this point I got my data mapped to my SQL tables fine, but with out a relationship. I can't tell how many events are for one call.

    I am about to re do everything using C# and xml objects to import the data. it's hard to believe that this could not be done in SSIS.

    please help.

    Here is the example of the xml I am working with

    - <calls>

    - <call id="@BR8" callkey="1" date="09062008" acd="0" starttime="2118" endtime="2181" type="1" result="1" from="" to="x82185 CUSTOMS" dnis="2623" ansby="x82185 CUSTOMS" requeues="0" intf="0" overflowed="0" transfd="0" held="0" confd="0" timetoabn="0" timetoans="6" tottalktm="57" totconftm="0" totholdtm="0">

    - <events>

    - <event starttime="2118" callkey="1" endtime="2118" type="1" ani="" dnis="" digits="82185" cause="0" montd="0" rqud="0" intf="0" overflowed="0">

     <trnk key="default" id="C04B20" name="" nick="" />

     </event>

    - <event starttime="2118" callkey="1" endtime="2124" type="2" ani="" dnis="2623" digits="82185" cause="22" montd="0" rqud="0" intf="0" overflowed="0">

     <ext key="slap" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     <ext key="default" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     <trnk key="from" id="C04B20" name="" nick="" />

     </event>

    - <event starttime="2124" callkey="1" endtime="2181" type="3" ani="" dnis="2623" digits="" cause="0" montd="0" rqud="0" intf="0" overflowed="0">

     <ext key="slap" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     <ext key="default" id="82185" name="CUSTOMS" nick="" vport="0" autoattend="0" qcc="0" />

     </event>

     <event starttime="2181" callkey="1" endtime="2182" type="22" ani="" dnis="" digits="" cause="0" montd="0" rqud="0" intf="0" overflowed="0" />

     </events>

     </call>

  • Ive got the same problem with reading the date attibute from the root element.

    Its a standard, and i have a lot of different sources. Is there an update to fix this to get a 2nd output? or another work around?

    I tried adding an output manually but get this Data Flow Task [XML Source [1]]: An output cannot be added to the outputs collection.

  • Hi Diwakar - if your XML documents have different schemas (i.e. different columns and/or data types), you will need to use a separate data flow task for each schema.

  • Hi Matt,

    I am doing impact analysis of ssis package. here i am taking xml file of various .dtsx packages. its structure is changing as per the C# script used or not, SQL task is used or not.

    I want all the xml of different .dtsx should be in a common format so that i can use a common source format and fetch all the xml files one by one using for each loop container.

    Is it possible.

Page 1 of 2 (22 items) 12