Welcome to MSDN Blogs Sign in | Join | Help

Lookup Pattern: Incremental persistent cache updates

The Cache Transform isn't currently able to update (i.e. append to) an existing persistent cache file. This pattern presents a way to incrementally build up your lookup cache if your data flow process is responsible for adding new rows to your reference table. For an alternative approach, please see the Other Resources section at the end of this post.

The process is split up across multiple packages. Because a Cache Connection Manager reads its cache file once, and keeps the cache in memory until package execution is complete, you can't read and update a cache file in the same package.

Control Package

This is the control package which uses a script task to check if the cache file already exists. If it doesn't, it executes the "Create Cache" package. Once the cache is there, the main data flow package which performs your general ETL logic is executed. The "Data Flow" package will write out any new rows for the reference cache to a temporary file, and update a variable in the control package to indicate how many rows were written out. The parent checks if any rows were written, and if so, runs the "Update Cache" package to update the persistent cache file

image 

Create Cache

The cache creation package reads the source reference data, and then writes it to two separate locations. First it uses the Cache Transform to create a persistent cache, and then it writes the same data out to a RAW file. This RAW file will be incrementally updated by the Data Flow package to store the latest data for our reference set.

image

Data Flow

In the main ETL package, we use a key generation pattern to generate and insert new values into our reference table. Instead of rejoining the main flow right away, we multicast it out to a Row Count, followed by a RAW File Destination.

The Row Count writes to a local variable. After this data flow, we'll use a script task to copy this variable's value over to a variable in the parent package. You can see Jamie Thomson's post for more about how this works.

The RAW File Destination is able to append to an existing RAW File. Here we use it to add the new rows to the "incremental update" file we created in the "Create Cache" package.

image 

Update Cache

If the main ETL package wrote any new rows to the incremental update RAW file, the control package will run the Update Cache package. This package simply reads the incremental update file (which will contain the entire cache, since it's simply being appended to each time), and recreate the cache file.

image

The sample packages used in this post can be downloaded from my SkyDrive share.

 

Other Resources

Published Sunday, November 23, 2008 12:16 PM by mmasson
Filed under:

Comments

# Lookup Patterns

From the Lookup presentation I put together for the MS BI conference in October, here is a series of

Sunday, November 23, 2008 3:18 PM by SSIS Team Blog

# Learn more about Lookup and its new features in 2008

The SQL Server Integration Services team added valuable new caching options (and scalability) to the

Monday, January 19, 2009 5:10 PM by Douglas Laudenschlager
Anonymous comments are disabled
 
Page view tracker