This blog provides information, news, tips, and announcements about the SQL Server Data Quality Services (DQS) feature introduced in SQL Server 2012.
OH22 data has just released a free SSIS transform task for SQL Server Data Quality Services (DQS) matching. The solution is published on codeplex. With this custom transform task, you can use the Knowledge Base (KB) created in DQS to automate data matching through SSIS.
In this article, I am going to walk through using the new SSIS transform to automate matching project which I described in the previous article: Matching related and duplicate Customer Records using SQL Server Data Quality Services (DQS).
To summarize, we have the following data set which we would like to identify duplicate and related records:
In the previous article, we created a DQS Knowledge Base, MyCustomerKB, with the following matching rules:
We were using Data Quality Client tool to (manually) execute a matching project using MyCustomerKB against the data input above. In this article, we will be automating the process using the new SSIS transform task.
The following are the steps to automate:
A. Install DQS Matching codeplex project
B. Create SSIS project and configure DQS matching transform task
C. Execute the SSIS project and review the results
You must map all domains used by matching rules in your domain. The following error message appears if you click OK without mapping all domains: "Not all domains have been assigned or has been assigned twice. Please assign all domains properly"
When you use composite domain as part of your matching rule, you can either map the composite domain or all of the single domains that make up the composite domain. If you map a column to a composite domain, then the single domains removed as drop down options. Conversely, if you select one of the single domains that are part of the composite domain, then the composite domain disappear from the drop down option. You can redo selection by un-selecting the column checkbox from the top section.
Optionally, you can also go to the Advanced tab to specify the minimal matching rule threshold. By default, DQS match only records with matching score of 80% and above. You can raise the minimal score by changing it from the advanced tab.
Create destination table. The SSIS DQS Matching produce 2 outputs: Matched Output and Unmatched Output. TheUnmatched Output will create the same output schema as your input data source. The Matched output add the following columns:
You do not need to include all the metadata above in your matched output table. I suggest to include at least the following metadata:
The scripts to create the matched and unmatched table based on the data source we previously used:
CREATE TABLE MyCustomers_Matched( CustomerID INT, CustomerName NVARCHAR(255), City NVARCHAR(32), Province NVARCHAR(32), LastUpdate DATETIME, [RecordId] nvarchar(255), [ClusterId] nvarchar(255), [RuleId] nvarchar(255), [SiblingId] nvarchar(255), [MatchingScore] nvarchar(255))
CREATE TABLE MyCustomers_UnMatched( CustomerID INT, CustomerName NVARCHAR(255), City NVARCHAR(32), Province NVARCHAR(32), LastUpdate DATETIME)
The final data flow task should look like below:
In this article, I describe how to use the SSIS DQS Matching transform task developed by OH22. Using this transform task, you can automate the DQS record matching using the Knowledge Base created from DQS client. You can download the codeplex project from https://ssisdqsmatching.codeplex.com/.
Excellent blog, data matching process provides high quality of data which helps to identify replica and maintain the database. For good quality of data matching service visit<a href=” http://www.elisthunter.com/”>elisthunter</a>
Impressive blog, Data matching service is a best way to maintain database…. To get more details visit<a href=”www.b2bmarketingarchives.com”>b2bmarketing archives</a>
Great Post... we provide different data services that enhances your email marketing at <a href="www.b2bmarketingpartners.com/.../">Data Matching Services</a>
Great post, thank u for the details its nice to gain knowledge from experts to get a Required set of email list of industries visit www.pioneerlists.com/.../it-industry-email-list
Informative Post . to know more about Data Matching visit <a href="www.b2bmarketingpartners.com/.../">Data Matching </a>
I have big problem with component. I send to input 509980 records, matched out 176061 and unmatfched 315976.
Sum of matched and unmatched 492037, not 509980!
The DQS Matching transform is inadequate and unstable (while the columns are mapped). Why do we need to specify minimum Matching Score on the advanced tab when Matching policy , with multiple matching rules, can be already defined in DQS. Why can't this component leverage the matching rules already defined in DQS? This component does not support multiple matching rules. How can Microsoft ignore the automated matching through an SSIS transformation and instead something untrustworthy from CodePlex is being pushed?