Welcome to MSDN Blogs Sign in | Join | Help

Speaking for the Vancouver BI PASS Chapter

I’ll be presenting for the Vancouver BI PASS Chapter next Friday, July 10th, at the Microsoft Canada offices. I really enjoyed meeting the Pacific Northwest SQL Server User Group a couple of weeks ago, and I’m looking forward to sharing SSIS with my fellow Canadians in Vancouver. I’ll be going over some common SSIS pitfalls, and it seems like I’ll have plenty of time to go over any other topics that come up (perhaps there will be interest in CDC & Merge?)

Hope to see you there!

Friday, July 10th 2009 
8:30 - 11:30 AM

Microsoft Canada
1111 W. Georgia, Suite 1100 (11th floor) 
Vancouver, British Columbia 
V6E 4M3

Posted by mmasson | 0 Comments
Filed under:

Scripts and assembly caching in 2005

A recent discussion with MVP Andy Leonard about a new property which showed up on the Script Task in SQL Server 2005 SP3 reminded me that I’ve been meaning to blog about how the script task and script component cache their precompiled code at runtime. This behavior has changed in every SP since 2005 RTM, and while there are a couple of KB articles that describe the issues that prompted the changes, I haven’t seen anyone document the behavior itself.

Note: The behavior in SQL Server 2008 is different from 2005 because of the switch from VSA to VSTA. 2008 does not have any of the problems or limitations listed below.

Script Assemblies

First a bit of background information on how the script task and script component work. I think the BOL entry for the script task does a good job of explaining the compilation behavior:

VSA scripts must be compiled before the package runs. The Script task includes the option to precompile script into binary code when the package is saved. When script is precompiled, the language engine is not loaded at run time and the package runs more quickly; however, compiled binary files consume significant disk space. If you do not use the precompile option, the script is compiled at run time, which slows package execution but consumes less disk space. If storing large packages is not a problem, you should precompile your VSA scripts. Also, only precompiled script can be run in a 64-bit environment.

Whether you’re using the PreCompile option or not, the script will eventually be compiled into a .NET assembly. Loading the assembly takes time, uses memory, and requires a separate AppDomain

2005 RTM

In the initial 2005 release, SSIS did not cache these Script Assemblies. This meant that if you were executing a script task inside of a loop, a new AppDomain would be created for each iteration, and the assembly is reloaded. This wasn’t good for performance, and because we can’t unload/destroy an AppDomain until the process finishes, could result in memory issues.

2005 SP1

Script assembly caching was introduced in SP1 to address the performance and memory issues. This functionality made sure that the script assembly was only loaded once, and then reused on subsequent executions. However, each script was associated with a GUID… if you copy/paste the .DTSX file, the GUID for the script task or script component stays the same (even if you later go in and modify the script code). This led to unexpected behavior – if the script code was completely different, but came from a copy/pasted script task, you’d end up reusing one of the script assemblies for both tasks. The related KB article describes the problem in more detail.

2005 SP2

This behavior was fixed in SP2 by changing the GUID every time the script is modified. This meant that if the script code was the same, and the name of the script task or component matched, the script assembly would only be loaded once.

2005 SP3

There was one remaining side effect of script caching that hadn’t turned up yet. This KB article describes the problem you might run into, but doesn’t give much detail (other than that it’s related to caching).

The issue was that if the same script is executed in parallel (multiple execute package tasks running the same child package in process), they might share the same Dts runtime object (which has collections for variables, and connection managers). So if a script in the first instance sets variable Foo to “A”, and the second instance sets it to “B”, the first instance would end up with a value of “B” as well.

You can see why this would be a problem (and extremely confusing).

Unfortunately this problem isn’t easily fixed without entirely disabling script caching. Because we don’t want to go back to the RTM behavior (no caching at all), we introduced a new property to the Script Task / Component to control caching - OptimizeScriptExecution. If you’re running a child package with scripts multiple times in parallel, be sure to set this value to False, which disables the cache and prevents the scripts from sharing a runtime object.

I hope that helps!

Posted by mmasson | 0 Comments
Filed under:

Speaking at the PNW SSUG

I’ll be speaking at the Pacific Northwest SQL Server User Group meeting next Wednesday, June 10th. I’ll be giving an updated version of the Avoiding Common Pitfalls in SSIS talk I gave at TechEd a couple of weeks ago. It will be my first time attending a user group meeting, and I’m really looking forward to meeting some more SQL people from the area.

See you there!

Posted by mmasson | 0 Comments

Tutorial: SSIS Performance Videos

This post is coming a little late (these have already been announced here and here), but incase you haven’t seen these before, I wanted to bring attention to four new performance related created by the SQLCAT and SSIS teams. They mostly focus on larger scale/Enterprise level package deployments, but there are good tips for all levels of SSIS usage.

From the SQLCAT post:

Measuring and Understanding the Performance of Your SSIS Packages in the Enterprise (SQL Server Video)
Link | Watch this video

Author: Denny Lee, Microsoft Corporation

This video demonstrates how to measure and understand the performance of packages, based on lessons learned from enterprise customers. In this video, you will learn the following guidelines for improving performance:

  • How the limits of the source system affect performance.
  • Why disk I/O is important.
  • Why you should establish a package performance baseline

Tuning Your SSIS Package Data Flow in the Enterprise (SQL Server Video)
Link | Watch this video

Author: David Noor, Microsoft Corporation

This video demonstrates how to improve the performance of the data flow in an Integration Services package. In this video, you will learn how to tune the following phases of the data flow:

  • Extraction
  • Transformation
  • Loading

You can apply these performance tuning tips when you design, develop, and run the data flow.

Understanding SSIS Data Flow Buffers (SQL Server Video)
Link | Watch this video

Author: Bob Bojanic, Microsoft Corporation

This video looks at the memory buffers that transfer data to the data flow of an Integration Services package. The video includes the following items:

  • Demonstration of a simple package that shows you how Integration Services divides the data along the data flow paths and how buffers carry data along those paths.
    This simple package performs the following operations:
    • Data extraction
    • Character mapping
    • Creation of new columns by using a synchronous transformation
    • Multicasting
    • Sorting by using a blocking (asynchronous) transformation.
  • Design guidelines for building and maintaining the data flow of a package.

Designing Your SSIS Packages for Parallelism (SQL Server Video)
Link | Watch this video

Author: Matt Carroll, Microsoft Corporation

This video demonstrates how to increase the performance of your Integration Services packages by designing for parallelism.

------

These four videos join the growing list of SSIS tutorial videos in the SQL Server Video series. Others that might interest you:

Avoiding Common Pitfalls in SSIS

I’ve uploaded the slides for the Chalk Talk I gave at TechEd last week. I always appreciate feedback (good or bad), and I’ve gotten some great comments so far. If you attended the session (or just like the slides), feel free to comment below.

Posted by mmasson | 2 Comments
Filed under:

Samples for the CDC & Merge demo

I’ve put together the samples I used in the Incremental Data Warehousing talk I did at the SSWUG.Org Virtual Conference. You can download them from my sky drive, here:

The zip has a readme.txt file that should explain how to setup and run the samples. Please let me know if you have any questions.

For those keeping track – these are the same demos I did at the last PASS Conference in Seattle. I updated the talk a bit for the SSWUG conference, but the demos are essentially the same.

Posted by mmasson | 0 Comments

Sample videos posted for the SSWUG.ORG Virtual Conference

They’ve posted 10 minute previews for some of the talks at the upcoming SSWUG.ORG Virtual Conference. Here are some of the SSIS related topics:

Matt Masson - Incremental Data Warehouse Loads with Merge & CDC

John Welch -  Doing More (ETL) With Less (Effort) by Automating SSIS

Brian Knight - Loading a Data Warehouse With SSIS

Anthony D’Angelo - Deep Dive: Extending SSIS with .NET Development

They also have a free community event coming up on April 17th. If you’re not sure how the whole virtual conference thing works, this event is a great way to try it out.

Posted by mmasson | 0 Comments

SQL Server 2008 Service Pack 1 is now available

SQL Server 2008 SP1 is now available for download, as well as the updated Feature Pack.

Happy downloading!

Posted by mmasson | 0 Comments

SSWUG.ORG Ultimate Virtual Conference – April 2009

468

I flew to Tucson last week to record my sessions for the upcoming SSWUG.ORG Spring Virtual Conference. I recorded three sessions for the BI track:

How to Avoid Common Pitfalls with SSIS

Starting up a new project with SQL Server Integration Services? Save time and money by learning about how to avoid common issues that users hit when implementing their ETL solutions. Drawing on the most frequently asked questions in the MSDN Integration Services forum and some of the more interesting packages we've seen over the years, we'll highlight common mistakes, clear up some misconceptions, and provide best practices to help you efficiently and effectively use Integration Services.

Incremental Data Warehouse Loads with MERGE and Change Data Capture

This talk describes the new MERGE statement and Change Data Capture (CDC) feature introduced in SQL Server 2008, and how to take advantage of them in SQL Server Integration Services. We'll cover the traditional ways of doing incremental data loads, and see how these new key features make the process a whole lot easier. If you're interested in learning best practices for this new SQL Server 2008 functionality, you won't want to miss this session!

New Connectivity Options in SSIS 2008

Figuring out the best way to connect to your data sources can be tricky. SQL Server Integration Services 2008 has made it easier with the addition of new high performance connectors for Oracle, Teradata, and SAP BI. This talk will walk through their features, and take a look at how they can drastically improve the performance of your data loads.

They put together a very impressive list of speakers again (I’m still surprised – and honored – to see my name next to these people!), and the sessions look pretty good as well. I’m sure I’ll be attending a number of the sessions myself. Some of the other SSIS ones include:

Anthony D'Angelo - Deep Dive: Extending SSIS with .NET Development

Brian Knight - Dynamically Configuring SSIS packages, Loading a Data Warehouse in SSIS, Performance Tuning SSIS 2005 and 2008

John Welch - Doing More (ETL) with Less (Effort) by Automating SSIS, Processing Flat Files with SSIS

Ted Malone - Agile Business Intelligence - Proven Techniques that move BI Development into the Agile World

They still have an early bird registration rate going, and you can use the vip code SPVMMASP09 to get an additional discount. The conference runs from April 22-24th… hope to see you there!

Posted by mmasson | 1 Comments
Filed under:

SSIS Team Blog – Now in Chinese

Xiaochen Wu, a member of our Shanghai team, has started a Chinese SSIS blog. He’s started it off by translating some of the posts I’ve done, and I’m sure he’ll find plenty of content in the future.

I don’t know what the following text means, but I assume it’s the title of the blog :)

SSIS 技术博客

分享SSIS使用心得和技巧

Posted by mmasson | 0 Comments
Filed under:

We Loaded 1TB in 30 Minutes with SSIS, and So Can You

A popular SSIS talk at recent conferences has been about the ETL Record that we announced last year. They’ve converted the talk to a white paper, and it is now available on MSDN. It’s a very interesting read.

Summary: In February 2008, Microsoft announced a record-breaking data load using Microsoft® SQL Server® Integration Services (SSIS): 1 TB of data in less than 30 minutes. That data load, using SQL Server Integration Services, was 30% faster than the previous best time using a commercial ETL tool. This paper outlines what it took: the software, hardware, and configuration used. We will describe what we did to achieve that result, and offer suggestions for how to relate these techniques to typical scenarios. Even for customers who don't have needs quite like this benchmark, such efforts can teach a lot about getting optimal performance.

Posted by mmasson | 0 Comments
Filed under:

API Sample – Programmatically create and export configuration files for your packages

The previous sample shows how to add a reference to an existing configuration file. This sample shows how to create a new configuration file for a package.

Things to note:

  • The first step is to flag the property you want to export by calling DtsProperty.SetExport(). The first argument is a reference to the object the property belongs to.
  • The CreatePackage() method just creates a package (it doesn’t matter what is in it). Typically you’d be using this code to add a configuration to an existing package (doing a Package.LoadFromXml() or something similar)
class ConfigurationExport
{
    static void Main(string[] args)
    {
        //
        // Export a configuration for all connection managers in a package
        //
        Package package = CreatePackage();

        // Enable configurations
        package.EnableConfigurations = true;

        // Flag all connection manager connection strings as exportable
        foreach (var cm in package.Connections)
        {
            DtsProperty connectionStringProp = cm.Properties["ConnectionString"];
            connectionStringProp.SetExport(cm, true);
        }

        // Export the configuration file
        package.ExportConfigurationFile(@"c:\temp\config.dtsconfig");
    }

    static Package CreatePackage()
    {
        var p = new Package();

        // Add Data Flow Task
        Executable dataFlowTask = p.Executables.Add("STOCK:PipelineTask");

        // Set the name (otherwise it will be a random GUID value)
        var taskHost = dataFlowTask as TaskHost;
        Debug.Assert(taskHost != null, "Unexpected task type");

        taskHost.Name = "Data Flow Task";

        // We need a reference to the InnerObject to add items to the data flow
        var pipeline = taskHost.InnerObject as MainPipe;
        Debug.Assert(pipeline != null, "Unexpected InnerObject type");

        // Create a package variable to store the row count value
        p.Variables.Add("RowCountVar", false, "User", 0);

        // Add connection manager
        ConnectionManager connection = p.Connections.Add("OLEDB");
        connection.Name = "MyConnection";
        connection.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorksDW2008;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;";

        // Add OLEDB Source
        IDTSComponentMetaData100 srcComponent = pipeline.ComponentMetaDataCollection.New();
        srcComponent.ComponentClassID = "DTSAdapter.OleDbSource";
        srcComponent.ValidateExternalMetadata = true;
        IDTSDesigntimeComponent100 srcDesignTimeComponent = srcComponent.Instantiate();
        srcDesignTimeComponent.ProvideComponentProperties();
        srcComponent.Name = "OleDb Source";

        // Configure it to read from the given table
        srcDesignTimeComponent.SetComponentProperty("AccessMode", 0);
        srcDesignTimeComponent.SetComponentProperty("OpenRowset", "[DimCustomer]");

        // Set the connection manager
        srcComponent.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.GetExtendedInterface(connection);
        srcComponent.RuntimeConnectionCollection[0].ConnectionManagerID = connection.ID;

        // Retrieve the column metadata
        srcDesignTimeComponent.AcquireConnections(null);
        srcDesignTimeComponent.ReinitializeMetaData();
        srcDesignTimeComponent.ReleaseConnections();

        // Add Row Count transform
        IDTSComponentMetaData100 rowCount = pipeline.ComponentMetaDataCollection.New();
        rowCount.ComponentClassID = "DTSTransform.RowCount";
        CManagedComponentWrapper instance = rowCount.Instantiate();
        instance.ProvideComponentProperties();

        // Set the variable name property
        instance.SetComponentProperty("VariableName", "User::RowCountVar");

        // Connect the OLEDB Source and the Row Count
        IDTSPath100 path = pipeline.PathCollection.New();
        path.AttachPathAndPropagateNotifications(srcComponent.OutputCollection[0], rowCount.InputCollection[0]);

        return p;
    }
}
Posted by mmasson | 1 Comments
Filed under:

API Sample - Programmatically add a configuration file to a package

Update 2009-03-03: Darren Green has a post which covers the other configuration types as well.

This sample programmatically adds a reference to an existing configuration file to a package.

Things to note:

  • The CreatePackage() method just creates a package (it doesn’t matter what is in it). Typically you’d be using this code to add a configuration to an existing package (doing a Package.LoadFromXml() or something similar)
  • For this approach to work, the paths in the configuration file must be the same in every package.
class ConfigurationReference
{
    static void Main(string[] args)
    {
        //
        // Add a reference to an existing configuration file.
        // Note all property paths must match.
        //
        Package package = CreatePackage();

        // Enable configurations
        package.EnableConfigurations = true;

        // Create the configuration reference
        Configuration config = package.Configurations.Add();
        config.Name = "Configuration";
        config.ConfigurationType = DTSConfigurationType.ConfigFile;
        config.ConfigurationString = @"c:\temp\config.dtsconfig";
    }

    static Package CreatePackage()
    {
        var p = new Package();

        // Add Data Flow Task
        Executable dataFlowTask = p.Executables.Add("STOCK:PipelineTask");

        // Set the name (otherwise it will be a random GUID value)
        var taskHost = dataFlowTask as TaskHost;
        Debug.Assert(taskHost != null, "Unexpected task type");

        taskHost.Name = "Data Flow Task";

        // We need a reference to the InnerObject to add items to the data flow
        var pipeline = taskHost.InnerObject as MainPipe;
        Debug.Assert(pipeline != null, "Unexpected InnerObject type");

        // Create a package variable to store the row count value
        p.Variables.Add("RowCountVar", false, "User", 0);

        // Add connection manager
        ConnectionManager connection = p.Connections.Add("OLEDB");
        connection.Name = "MyConnection";
        connection.ConnectionString = "Data Source=localhost;Initial Catalog=AdventureWorksDW2008;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;";

        // Add OLEDB Source
        IDTSComponentMetaData100 srcComponent = pipeline.ComponentMetaDataCollection.New();
        srcComponent.ComponentClassID = "DTSAdapter.OleDbSource";
        srcComponent.ValidateExternalMetadata = true;
        IDTSDesigntimeComponent100 srcDesignTimeComponent = srcComponent.Instantiate();
        srcDesignTimeComponent.ProvideComponentProperties();
        srcComponent.Name = "OleDb Source";

        // Configure it to read from the given table
        srcDesignTimeComponent.SetComponentProperty("AccessMode", 0);
        srcDesignTimeComponent.SetComponentProperty("OpenRowset", "[DimCustomer]");

        // Set the connection manager
        srcComponent.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.GetExtendedInterface(connection);
        srcComponent.RuntimeConnectionCollection[0].ConnectionManagerID = connection.ID;

        // Retrieve the column metadata
        srcDesignTimeComponent.AcquireConnections(null);
        srcDesignTimeComponent.ReinitializeMetaData();
        srcDesignTimeComponent.ReleaseConnections();

        // Add Row Count transform
        IDTSComponentMetaData100 rowCount = pipeline.ComponentMetaDataCollection.New();
        rowCount.ComponentClassID = "DTSTransform.RowCount";
        CManagedComponentWrapper instance = rowCount.Instantiate();
        instance.ProvideComponentProperties();

        // Set the variable name property
        instance.SetComponentProperty("VariableName", "User::RowCountVar");

        // Connect the OLEDB Source and the Row Count
        IDTSPath100 path = pipeline.PathCollection.New();
        path.AttachPathAndPropagateNotifications(srcComponent.OutputCollection[0], rowCount.InputCollection[0]);

        return p;
    }
}
Posted by mmasson | 2 Comments
Filed under:

Microsoft Tech-Ed North America 2009

Tech Ed Lives HereAre you going to TechEd in May? I’ll be down there in lovely Los Angeles answering questions at the SSIS booth and doing a Chalk Talk on avoiding common pitfalls with SSIS. There’s only a couple of SSIS related breakout sessions this time around, but plenty on the database platform and BI in general (there’s 430 breakout sessions in total).

BIN301 Connectivity Options for Data Integration with Microsoft SQL Server Integration Services

BIN207 Upgrading DTS Packages to Microsoft SQL Server Integration Services

Let me know if you’re going! Maybe we can arrange some sort of party for SSIS on the side…

Posted by mmasson | 0 Comments
Filed under:
More Posts Next page »
 
Page view tracker