• SQL Server Cast

    Synchronize two tables using SQL Server Integration Services (SSIS)-Part II of II


    Please refer to the first part of this post for the sample scenario and the initial package.

    In this part we continue with the same sample scenario, where “table A” in “database A” needs to be synchronized with “table B” in data warehouse “database B”. In the first part of this post we discussed how to deal with new records in “table A”, but we did not explain how to proceed when existing records in “table A” are updated.

    We can implement different methods within SSIS to discover the records that differ between the source and destination tables. For example, we can use the EXCEPT operator to extract those differences and update the destination table accordingly. However, I found that the tablediff utility is probably the fastest way to implement this in SSIS. tablediff is used in SQL Server replication to compare two tables and return detailed information about what differences exist between them.
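    As a quick illustration of the EXCEPT approach, the following query returns the rows that exist in the source table but are missing or different in the destination. The database names are the ones used later in this post; the column names are assumptions for the sake of the example:

    ```sql
    -- Rows present in the source but absent or different in the destination.
    -- Column names (No_, Name, City) are illustrative; adjust to your schema.
    SELECT No_, Name, City
    FROM SSISDBSource.dbo.Customer
    EXCEPT
    SELECT No_, Name, City
    FROM SSISDBDest.dbo.Customer;
    ```

    Note that EXCEPT only tells you which rows differ; you would still have to write the UPDATE/INSERT logic yourself, which is exactly what tablediff automates.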

    The great thing about tablediff is that it can not only compare tables but also script out the differences between them, so the tables can be synchronized just by running the generated script. The caveat is that this utility only works with SQL Server, so if you are trying to synchronize tables hosted in a different database engine, you are out of luck with this post.

    In my case this is what I did to synchronize the two tables:

    1. Ensure the tablediff utility is installed on your SQL Server. The tablediff.exe executable is found under C:\Program Files\Microsoft SQL Server\<version>\COM

    2. Add the tablediff.exe path to the Windows system path using Computer Properties > Advanced System Settings > Advanced > Environment Variables > System Variables > PATH

    3. Make sure the xp_cmdshell advanced SQL Server option is enabled by running sp_configure from SQL Server Management Studio (SSMS):

    [Screenshot: checking the xp_cmdshell option with sp_configure in SSMS]

    Please read the xp_cmdshell documentation carefully and understand the implications of using this option in your environment. xp_cmdshell creates a Windows process with the same security rights as the SQL Server service account, which means members of the sysadmin role can access operating system functions that their own Windows accounts may not have rights to. By default, only members of the SQL Server sysadmin fixed server role are authorized to run this extended procedure. If your company has no guidelines about which permissions the SQL Server service accounts have and who belongs to the sysadmin fixed server role, carefully evaluate the xp_cmdshell option before enabling it.
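    For reference, enabling xp_cmdshell from SSMS takes two sp_configure calls, because it is an advanced option:

    ```sql
    -- xp_cmdshell is an advanced option, so advanced options must be visible first
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    -- Enable xp_cmdshell
    EXEC sp_configure 'xp_cmdshell', 1;
    RECONFIGURE;
    ```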

    4. Using the same SSIS project we created during the first part of this post, create two “Execute SQL Task” objects under the Control Flow section in BIDS. The first task, called “Execute tablediff” in my example, takes care of executing the tablediff.exe command. Here is the code in my case:

    exec master..xp_cmdshell 'tablediff.exe -sourceserver SQL2008R2\KILIMANJARO64 -sourcedatabase SSISDBSource -sourcetable Customer -destinationserver SQL2008R2\KILIMANJARO64 -destinationdatabase SSISDBDest -destinationtable Customer -f C:\Temp\Diff'

    [Screenshot: the “Execute tablediff” Execute SQL Task in BIDS]

    The important part here is the -f switch, which creates a T-SQL script with the changes that have to be applied to the destination table so that it contains the same information as the source. Here is an example of this automatically generated script:

    [Screenshot: sample of the T-SQL script generated by tablediff]
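    The fix script that tablediff generates is plain T-SQL keyed on the primary key of the table. As a rough sketch of what it may contain (the statements, column names, and values below are illustrative only; the real script depends entirely on your schema and data, and the IDENTITY_INSERT lines appear only when the table has an identity column):

    ```sql
    -- Illustrative only: actual output depends on your schema and data
    SET IDENTITY_INSERT [dbo].[Customer] ON
    INSERT INTO [dbo].[Customer] ([No_],[Name]) VALUES (1042, N'Contoso Ltd.')
    UPDATE [dbo].[Customer] SET [Name]=N'Fabrikam Inc.' WHERE [No_] = 1001
    DELETE FROM [dbo].[Customer] WHERE [No_] = 987
    SET IDENTITY_INSERT [dbo].[Customer] OFF
    ```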

    The second task, called “Execute SQL Script” in my example, takes care of running the saved C:\Temp\Diff.sql script against the database, applying the changes from the source table to the destination table:
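    One way to run the saved script from this second task is to call sqlcmd through xp_cmdshell, in the same style as the first task (the -E switch uses Windows authentication; adjust the server name and file path to your environment):

    ```sql
    -- Run the generated diff script with sqlcmd (-E = Windows authentication)
    exec master..xp_cmdshell 'sqlcmd -S SQL2008R2\KILIMANJARO64 -E -i C:\Temp\Diff.sql'
    ```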

    [Screenshot: the “Execute SQL Script” Execute SQL Task]

     

    5. Optionally, you can combine the “Data Flow” task we created during the first part of this post with these two “Execute SQL Task” objects to build a full synchronization package.

    [Screenshot: the complete synchronization package in the Control Flow]


    Synchronize two tables using SQL Server Integration Services (SSIS)–Part I of II


    There are situations where a SQL Server DBA needs to keep two different tables in sync. Usually, the DBA needs to keep a copy of a table in a data warehouse repository that is used as a solution for archiving and/or reporting.

    SQL Server provides a robust method for keeping data synchronized across databases using Replication, but there are situations when all we need is to keep an online copy of a single table for archiving or reporting purposes, and we would prefer not to tinker with SQL Server replication.

    This post is divided into two parts: Part I explains how to update a destination table with the information that is added to a source table, while Part II explains how to replicate changes made to existing records in the source table to the destination. Please take into account that this is just one way of doing this; there are many other possibilities, but the one described here is probably one of the fastest to implement and modify on demand.

    This procedure is based on the following scenario: A “table A” on “database A” is periodically replicated using SSIS into “table B” on “database B”. “Table A” is updated with new records and we need to copy those records into “table B” too. Our final implementation will look like this in SQL Server Business Intelligence Development Studio:

    [Screenshot: the final Data Flow in BIDS]

     

    Let’s see how this works:

    1. “Source Table” is “table A” in “database A”, while “Dest Table” is destination “table B” in “database B”. We start by creating two different OLE DB connections in the Data Flow section in SSIS, using the source and destination tables as data sources.

    2. We need to perform a JOIN operation on the two data sources to copy the information we need from one table into the other. For this JOIN to work correctly, the data has to be sorted, as described in the following MSDN excerpt:

    In Integration Services, the Merge and Merge Join transformations require sorted data for their inputs. The input data must be sorted physically, and sort options must be set on the outputs and the output columns in the source or in the upstream transformation. If the sort options indicate that the data is sorted, but the data is not actually sorted, the results of the merge or merge join operation are unpredictable.
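    In practice this means each OLE DB source should return its rows ordered by the join key, and the source output must be flagged as sorted (IsSorted on the output, SortKeyPosition on the key column) in the Advanced Editor, or an explicit Sort transformation must be added upstream. A sketch of such a source query, assuming the Customer table and the “No_” key used in this example (the other column names are placeholders):

    ```sql
    -- Source query ordered by the join key so the Merge Join receives sorted input.
    -- Name and City are illustrative columns; adjust to your schema.
    SELECT No_, Name, City
    FROM dbo.Customer
    ORDER BY No_;
    ```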

    The “Merge Join” operation is where we separate the data that has been added to the source table (the rows we need) from the data that already exists in both tables (the rows we do not need). In our case the source table (on the left) contributes all the columns we want to copy, while the destination table (on the right) contributes only the primary key column “No_”. Here is the configuration of this task:

    [Screenshot: the Merge Join transformation editor]

    Here is the key part of the process: the Left Outer Join retrieves all the records in the source table, but for those that do not exist in the destination table, the join key column “No_” coming from the destination side is returned as NULL. This behavior is also described in the product documentation:

    To include all products, regardless of whether a review has been written for one, use an ISO left outer join. The following is the query:

    USE AdventureWorks2008R2;
    GO
    SELECT p.Name, pr.ProductReviewID
    FROM Production.Product p
    LEFT OUTER JOIN Production.ProductReview pr
    ON p.ProductID = pr.ProductID

    The LEFT OUTER JOIN includes all rows in the Product table in the results, whether or not there is a match on the ProductID column in the ProductReview table. Notice that in the results where there is no matching product review ID for a product, the row contains a null value in the ProductReviewID column.

    3. Then we have to split the data we need from the data we do not need. For this we use a “Conditional Split” task that keeps only those records where the join key “No_” is NULL, i.e. only the new records. Here is the configuration of this conditional split task:
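    The split condition itself is a one-line SSIS expression. Assuming the destination-side join key column arrives at the transformation under the name No_ (in your package it may be aliased, for example Dest_No_, depending on how the Merge Join output columns are named), the condition for the “new records” output is:

    ```
    ISNULL(No_)
    ```

    ISNULL() in the SSIS expression language returns a boolean, so the expression evaluates to true exactly for the rows that had no match in the destination table.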

    [Screenshot: the Conditional Split transformation editor]

    4. Finally, we INSERT the resulting data from the Conditional Split task into the destination table, the same table we used as the second data source at the very beginning.

    Because the Join task is configured this way, you can rerun the package as many times as you want: records already present on the destination table will not be duplicated, and only the new records will be inserted into the destination table.
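    When both tables happen to live on the same instance, the whole Data Flow above is logically equivalent to a single T-SQL statement, which can help when reasoning about or testing the package (database and key names are taken from this example; the other column names are assumptions):

    ```sql
    -- Insert only the source rows whose key does not yet exist in the destination.
    -- Name and City are illustrative columns; adjust to your schema.
    INSERT INTO SSISDBDest.dbo.Customer (No_, Name, City)
    SELECT s.No_, s.Name, s.City
    FROM SSISDBSource.dbo.Customer AS s
    LEFT OUTER JOIN SSISDBDest.dbo.Customer AS d
        ON s.No_ = d.No_
    WHERE d.No_ IS NULL;  -- NULL join key = row is new
    ```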
