It's Spann, not spam

Technical blog to provide content that developers find useful.

SSIS Performance

SSIS Performance

  • Comments 1

I am a .NET developer who likes to write code in C#.  When given the task to transform customers data from an Oracle 8i database to SQL Server 2005 Database, I immediately though of SSIS.   Well what I didn't know was the learning curve needed to write efficient packages.

Originally, I designed a package that relied heavily on scripts.  In SSIS, scripts can be used to do complex tasks.  For example: splitting out First Name, Middle Name, and Last Name from a single column.  Scripts in SSIS are not necessarily good to use when you are looping a dataset, especially one that is over 34K records.  The time for one control flow to execute on that dataset was over four hours.  Now if that was the only thing I had to worry about, I would have just moved on.  But because I am testing the package with just a subset of the data, that was not going to work.  The entire dataset from production is over 2.2 million records.  Now you can do the math as to how long that would take so, back to the drawing board.

I noticed that the data flow tasks were very efficient, compared to the control flow that contained the scripts.  So I set out to use just a data flow to do the same job.  I was able to convert all my scripts into multiple data flows.  It took some creative techniques in SQL to get a query that could be used to transform the data.  Because SQL is so powerful, I was able to break the data down into sets that could be used. 

For example:  I had a column that was positional as well as informative.  What I mean by this is you have a char(5) column that based on which position contained a number greater than 0 would be your lookup.  So you could have a value of 00100 which would tell me one review for X type was performed.

Select *

From (

         Select c.ID, Substring(review, 3, 1) as MyValue

         From Person AS p Inner Join Case as c ON p.ID = c.pID Inner Join stageTable m ON p.SSN

        ) AS RefTable

Where MyValue > 0

I have five of these queries and five Data Flows that insert my data into the correct format.

Now when I run the packages the execution time over the same dataset is much better.  I went from over four hours to ~8 seconds!  Now that is a performance gain to brag about.

I also recommend the following books in regards to help performance tune your packages.

"Microsoft SQL Server 2005: Integration Services" by Kirk Haselden (Development Manager on the Integration Services team)

"Expert SQL Server 2005 Integration Services" by Brian Knight and Erik Veerman (both SQL Server MVP's)

There is also a great article on the web that also points you in the right direction with regards to performance tuning.

"Microsoft SQL Server 2005 Integration Services: Performance Tuning Techniques" by Elizabeth Vitt

Page 1 of 1 (1 items)
Leave a Comment
  • Please add 6 and 6 and type the answer here:
  • Post