Today I will explain a way to track historic changes in a data warehouse and share a little program I wrote that will help you do this in minutes. It’s based on a method I’ve use many times and which is generic enough to cover many different scenarios. I’ll use an example to clarify how this works. In this example, I have a data warehouse database called MyDW, and this database contains a table called ImportTable which I’ll assume to be a table which is imported from another database by the data warehouse’s ETL process. For the purpose of this example I’ve kept thus table very simple:
CREATE TABLE [dbo].[ImportTable]( [ID] [int] NULL, [Name] [nvarchar](100) NULL, [Country] [nvarchar](100) NULL ) ON [PRIMARY]
I called this method of tracking history “snapshot-based versioning”. It does not rely on database triggers or on CDC. While those features might be useful for other scenarios, I’ve found them too limiting for data warehouse scenarios. For example, these features don’t work well if your ETL process refreshes the data by truncating the existing table and then reloading it.
Snapshot-based versioning instead relies on TSQL’s MERGE statement. The MERGE statement is supported in SQL Server 2008 or higher. A limitation is that you cannot track changes to fields of type image, ntext and text. This is usually not a blocker because in real-life scenarios most tables have a “modified” field that is updated whenever a change to the record is changed.
The idea is to have a second table called H_ImportTable which has the same fields as the original ImportTable. This second table, which I’ll refer to as the “history table”, has a versioned copy of the data in H_ImportTable. In other words, it works like a log which appends a row for any change to ImportTable. Obviously, if the table in the external database has changed multiple times between two ETL runs, this will appear only as one change in H_ImportTable, hence the name “snapshot-based versioning”.
You can create H_ImportTable as follows:
CREATE TABLE [dbo].[H_ImportTable] ( [_ACTION] [char](1) NOT NULL, [_KEY] [int] IDENTITY(1,1) NOT NULL, [_EFF_FROM] [datetime] NOT NULL, [_EFF_TO] [datetime] NOT NULL, [ID] [int] NULL, [Name] [nvarchar](100) NULL, [Country] [nvarchar](100) NULL )
After refreshing ImportTable, the ETL process should update H_ImportTable using the following SQL statement:
INSERT INTO [dbo].[H_ImportTable] SELECT _ACTION, SYSDATETIME() AS _EFF_FROM, '9999-12-31' AS _EFF_TO,[ID],[Name],[Country] FROM ( MERGE [dbo].[H_ImportTable] AS T USING [dbo].[ImportTable] AS S ON (S.[ID]=T.[ID] AND T._EFF_TO='9999-12-31') WHEN NOT MATCHED BY TARGET THEN INSERT(_ACTION,_EFF_FROM,_EFF_TO,[ID],[Name],[Country]) VALUES('A',SYSDATETIME(),'9999-12-31',[S].[ID],[S].[Name],[S].[Country]) WHEN MATCHED AND T._EFF_TO='9999-12-31' AND (T._ACTION='D' OR ([S].[Name]<>[T].[Name] OR [S].[Country]<>[T].[Country])) THEN UPDATE SET T._EFF_TO=SYSDATETIME() WHEN NOT MATCHED BY SOURCE AND T._EFF_TO='9999-12-31' AND T._ACTION<>'D' THEN UPDATE SET T._EFF_TO=SYSDATETIME() OUTPUT $Action Action_Out ,CASE WHEN S.[ID] IS NULL THEN 'D' WHEN Inserted._ACTION='D' THEN 'U' ELSE 'C' END AS _ACTION ,ISNULL(S.[ID],Deleted.[ID]) AS ID,[S].[Name],[S].[Country] ) AS MERGE_OUT WHERE MERGE_OUT.Action_Out = 'UPDATE'
You can adjust this SQL code to work with the tables in your data warehouse. To do so, adjust all the parts that are underlined. If you need to do this for many tables, making these changes manually is laborious and error-prone, This is why I wrote the program “Table Merge Scripts” which you can download here. This program generates the SQL code based on an existing table in the database. You can see the UI below:
First, edit the connection string by entering the correct server name and database name. Then click Connect.
Next select the table and the field that is the primary key.
The first button in the bottom half of the screen copies the SQL code to create the history table to the clipboard. The new table will have the same name as the selected source table, prefixed with “H_”. The code also creates indexes for _ACTION, _EFF_FROM, _EFF_TO, _KEY and the field which you designated as the primary key in the source table.
The next button copies the SQL code to update the history table to the clipboard. This consists of a MERGE command similar to the example shown above.
The final button copies the same SQL code as the second table, but wrapped in a stored procedure.