Package Format Changes in SQL Server Denali

Package Format Changes in SQL Server Denali

Rate This
  • Comments 2

We made some significant changes to the .dtsx file package format in SQL Server Denali. The goal of these changes was to make the SSIS package format easier to read, and easier to diff when working with source control systems. Some of the changes around data flows (specifically, the way we persist lineage IDs) also make it somewhat easier to merge changes between packages as well. This post will explore more of these changes, providing XML samples so you can see exactly what was changed.

Side note: Did you know that the current DTSX package file format is available online? If you like reading about XML schemas, it’s a highly entertaining read!

It’s Pretty (Printed)

If you open up a .dtsx file in an XML/Text editor, one of the first things you’ll notice is that the package XML is now pretty-printed. It’s also more concise – instead of having separate elements for each property, these fields are now persisted as attributes (except for the PackageFormatVersion, which was left in the same format so 2005/2008 versions of DTEXEC could identify that the package was for a new version of SQL Server). Another feature that makes packages more concise is that we no longer persist properties that have default values (ex. VersionMajor = 1), and attributes are always listed alphabetically. The code snippets below provide an example of these changes.

2008 Format
  1. <?xml version="1.0"?><DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:ExecutableType="SSIS.Package.2">
  2. <DTS:Property DTS:Name="PackageFormatVersion">3</DTS:Property>
  3. <DTS:Property DTS:Name="VersionComments"></DTS:Property>
  4. <DTS:Property DTS:Name="CreationDate" DTS:DataType="7">1/6/2011 3:58:59 PM</DTS:Property>
  5. <DTS:Property DTS:Name="PackageType">5</DTS:Property>
  6. <DTS:Property DTS:Name="ProtectionLevel">1</DTS:Property>
  7. <DTS:Property DTS:Name="MaxConcurrentExecutables">-1</DTS:Property>
  8. <DTS:Property DTS:Name="PackagePriorityClass">0</DTS:Property>
  9. <DTS:Property DTS:Name="VersionMajor">1</DTS:Property>
  10. <DTS:Property DTS:Name="VersionMinor">0</DTS:Property>
  11. <DTS:Property DTS:Name="VersionBuild">505</DTS:Property>
  12. <DTS:Property DTS:Name="VersionGUID">{D1F82D76-CC8D-400C-A9F8-706DF686C4B7}</DTS:Property>
  13. <DTS:Property DTS:Name="EnableConfig">0</DTS:Property>
  14. <DTS:Property DTS:Name="CheckpointFileName"></DTS:Property>
  15. <DTS:Property DTS:Name="SaveCheckpoints">0</DTS:Property>
  16. <DTS:Property DTS:Name="CheckpointUsage">0</DTS:Property>
  17. <DTS:Property DTS:Name="SuppressConfigurationWarnings">0</DTS:Property>
  18. <DTS:ConnectionManager>
  19. <DTS:Property DTS:Name="DelayValidation">0</DTS:Property>
  20. <DTS:Property DTS:Name="ObjectName">(local).DI-DW</DTS:Property>
  21. <DTS:Property DTS:Name="DTSID">{1B70C7B8-BFB6-4EBD-855B-0BAF7C417ADD}</DTS:Property>
  22. <DTS:Property DTS:Name="Description"></DTS:Property>
  23. <DTS:Property DTS:Name="CreationName">OLEDB</DTS:Property><DTS:ObjectData><DTS:ConnectionManager>
  24. <DTS:Property DTS:Name="Retain">0</DTS:Property>
  25. <DTS:Property DTS:Name="ConnectionString">Data Source=(local);Initial Catalog=DI-DW;Provider=SQLNCLI11;</DTS:Property></DTS:ConnectionManager></DTS:ObjectData></DTS:ConnectionManager>

Denali Format
  1. <?xml version="1.0"?>
  2. <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts"
  3.   DTS:refId="Package"
  4.   DTS:CreationDate="1/6/2011 3:58:59 PM"
  5.   DTS:CreationName="SSIS.Package.3"
  6.   DTS:DTSID="{FC5F3400-D2B4-40E4-9B51-ED252DC63634}"
  7.   DTS:ExecutableType="SSIS.Package.3"
  8.   DTS:LocaleID="1033"
  9.   DTS:ObjectName="Version4"
  10.   DTS:PackageType="5"
  11.   DTS:VersionBuild="506"
  12.   DTS:VersionGUID="{00D49CE3-FD6D-4CE8-8793-4F79F1F6B19C}">
  13.   <DTS:Property
  14.     DTS:Name="PackageFormatVersion">6</DTS:Property>
  15.   <DTS:ConnectionManagers>
  16.     <DTS:ConnectionManager
  17.       DTS:refId="Package.ConnectionManagers[(local).DI-DW]"
  18.       DTS:CreationName="OLEDB"
  19.       DTS:DTSID="{1B70C7B8-BFB6-4EBD-855B-0BAF7C417ADD}"
  20.       DTS:ObjectName="(local).DI-DW">
  21.       <DTS:ObjectData>
  22.         <DTS:ConnectionManager
  23.           DTS:ConnectionString="Data Source=(local);Initial Catalog=DI-DW;Provider=SQLNCLI11;" />
  24.       </DTS:ObjectData>
  25.     </DTS:ConnectionManager>

One final thing to note in the samples above is that elements that can appear multiple times (such as <DTS:ConnectionManager>, <DTS:EventHandler>) are now contained within a parent element (<DTS:ConnectionManagers>, <DTS:EventHandlers>).

Goodbye Lineage IDs

Most objects within a package that can be referred to by other objects now have a refId attribute defined on in the package XML. The refId value is a unique, human readable (and understandable) string that other elements can refer to. It’s a bit easier than using GUIDs or integer values (i.e. lineage IDs), since the text itself has actual meaning (the path to the object in the package, its type, and name).

Let’s examine the following example which provides the definition for an output column in a Flat File Source:

Output Column Format
  1. <?xml version="1.0" encoding="utf-8"?>
  2. <outputColumn
  3.   refId="Package\Historical Loads\Load DailyMarket\DailyMarket.Outputs[Flat File Source Output].Columns[DM_DATE]"
  4.   dataType="dbDate"
  5.   errorOrTruncationOperation="Conversion"
  6.   errorRowDisposition="FailComponent"
  7.   externalMetadataColumnId="Package\Historical Loads\Load DailyMarket\DailyMarket.Outputs[Flat File Source Output].ExternalColumns[DM_DATE]"
  8.   lineageId="Package\Historical Loads\Load DailyMarket\DailyMarket.Outputs[Flat File Source Output].Columns[DM_DATE]"
  9.   name="DM_DATE"
  10.   truncationRowDisposition="FailComponent">
  11.   <properties>
  12.     <property
  13.       dataType="System.Boolean"
  14.       description="Indicates whether the column uses the faster, locale-neutral parsing routines."
  15.       name="FastParse">false</property>
  16.     <property
  17.       dataType="System.Boolean"
  18.       description="Indicates whether the data is in binary format."
  19.       name="UseBinaryFormat">false</property>
  20.   </properties>
  21. </outputColumn>

This snippet defines the XML for the DM_DATE column on a Flat File Source named DailyMarket. Note the lack of lineage IDs, which are integer values. Instead of persisting an integer value, we persist the refId string. While lineage IDs are still used within the runtime, they are no longer persisted, and are actually regenerated when the package is loaded.

If we break down the pieces of the refId, we see that it looks a lot like path values used for 2005/2008 package configurations (although a little shorter, and supports data flow objects as well).

image

In the previous example, we can see that this output column is linked to an ExternalMetadataColumn by refId as well.

externalMetadataColumnId="Package\Historical Loads\Load DailyMarket\DailyMarket.Outputs[Flat File Source Output].ExternalColumns[DM_DATE]"

If you try to merge changes between two versions of a package, the refId can be used in find/replace operations to ensure that all references to that object have been correctly updated.

For comparison, here is the 2008 package format of that output column (pretty-printed to make it more readable):

Output Column (2008)
  1. <outputColumn id="10"
  2.               name="DM_DATE"
  3.               description=""
  4.               lineageId="10"
  5.               precision="0"
  6.               scale="0"
  7.               length="0"
  8.               dataType="dbDate"
  9.               codePage="0"
  10.               sortKeyPosition="0"
  11.               comparisonFlags="0"
  12.               specialFlags="0"
  13.               errorOrTruncationOperation="Conversion"
  14.               errorRowDisposition="FailComponent"
  15.               truncationRowDisposition="FailComponent"
  16.               externalMetadataColumnId="9"
  17.               mappedColumnId="0">
  18.     <properties>
  19.         <property id="11"
  20.                   name="FastParse"
  21.                   dataType="System.Boolean"
  22.                   state="default"
  23.                   isArray="false"
  24.                   description="Indicates whether the column uses the faster, locale-neutral parsing routines."
  25.                   typeConverter=""
  26.                   UITypeEditor=""
  27.                   containsID="false"
  28.                   expressionType="None">false</property>
  29.         <property id="12"
  30.                   name="UseBinaryFormat"
  31.                   dataType="System.Boolean"
  32.                   state="default"
  33.                   isArray="false"
  34.                   description="Indicates whether the data is in binary format."
  35.                   typeConverter=""
  36.                   UITypeEditor=""
  37.                   containsID="false"
  38.                   expressionType="None">false</property>
  39.     </properties>
  40. </outputColumn>

Show me the Annotations!

The DTSX file format stores both the package logic, and its layout information. One of the first improvements we did in Denali was move to a new WPF-based design surface, which allowed us to change the way we persisted all of our layout information. In 2005/2008, the layout information was stored as encoded XML within a PackageVariableValue node. Annotations were stored as binary streams that were very hard to extra and understand. In Denali, layout information is now contained with a CData section, without any additional encoding. Annotations are persisted in clear text, so they can easily be extracted for automated documentation generation.

Summary

Many changes have been made to the Denali package format to make it easier to diff and merge. Below are a summary of the changes that were covered in this article:

  • Pretty-printed XML
  • Property elements now persisted as attributes
  • Default values are not persisted
  • Attributes are sorted alphabetically
  • All elements that appear more than once are contained within parent elements
  • References identified by refId attribute
  • Tasks and Components are appear alphabetically in the XML
  • Readable layout nodes
  • Annotations persisted in clear text
Leave a Comment
  • Please add 6 and 5 and type the answer here:
  • Post
  • Matt,

    Do you think the XML is so friendly now that I can start hand-writing SSIS code in XML versus having to rely on the designer?

  • Sure - I think you might find it easier to use the programmatic API, or something like BIML (agilebi.com/.../biml-functionality-in-bids-helper), but it's definitely possible to create a package using the raw XML.

Page 1 of 1 (2 items)