This blog provides information, news, tips, and announcements about the SQL Server Data Quality Services (DQS) feature introduced in SQL Server 2012.
In the previous blog article, I described how to use the free SSIS DQS Domain Value Import from OH22 data to automate importing values into a DQS Knowledge Base domain. In this article, I will go over the optional configuration to import invalid/error value types as well as to set synonyms.
First, let's quickly recap the domain type and synonym.
In DQS, you can define a domain value with different types:
Note: The difference between Error and Invalid is often subtle. Whether you set a domain value as Error or Invalid, DQS cleansing gives the same result and marks records containing those values as Invalid. Many DQS implementation combine the Error and Invalid and only use Correct or Invalid type to simplify.
When setting up domain value type to Error or Invalid, you can specify Correct To value. DQS cleansing then automatically replace the value with the Correct To (i.e. synonym).
The SSIS DQS Domain Value Import can automate importing values with different types and setting up synonym above. To illustrate, I use the Knowledge Base from the previous article:
Consider that we would like to insert new values to the domain Name from a SQL table:
CREATE TABLE DQSCOLORS2 ( NAME NVARCHAR(64), DomainType INT )-- Note: that the column containing domain value must have NVARCHAR/NCHAR and Domain type must be set to INT data type.
INSERT INTO DQSCOLORS2 VALUES ('white', 0) ,('tbd', 1) ,('oliver',2)
The domain type column must be set to one of the following integer value:
In this example, 'white' is a correct color name, 'tbd' is an error, and 'oliver' is an invalid color name.
Configure the SSIS DQS Domain Value Import as follows:
Make sure the DQS Knowledge Base Colors is published before executing the SSIS package. After the SSIS package completed, open the Knowledge Base to review the result:
The domain values are added with the specified type.
A couple of things to consider when using the Domain Type option:
In this example, I set up a new table with the following:
CREATE TABLE DQSCOLORS3 ( NAME NVARCHAR(64), Synonym NVARCHAR(64) )
INSERT INTO DQSCOLORS3 VALUES ('white', 'bright white') ,('DQS_NULL', 'not specified') ,('clear', 'transparent')
The first row sets synonym to an existing valid value (e.g. 'white') in the domain. The second row sets synonym to a special domain value called DQS_NULL (this value can be used to specify how to handle NULL during DQS cleansing). The last row set synonym where both the leading value and its synonym are not currently in the domain.
Make sure the DQS Knowledge Base Colors is published, then execute the SSIS package and review the result:
The synonyms are successfully added to the DQS domain.
'bright white' and 'not specified' are set as Correct to 'white' and DQS_NULL accordingly. This means, for example, that when you are cleansing records containing 'not specified', it will be corrected to null. Lastly, both 'clear' and 'transparent' are added with 'clear' set as the leading value. Note that when importing leading value and synonyms using SSIS DQS Domain Value Import, the synonyms are always set to invalid type (e.g. 'transparent') while the leading value are always set to correct (e.g. 'clear').
A few things to consider when using the Synonym option:
In this last example, we will import data that include both type and synonym:
CREATE TABLE DQSCOLORS4 ( NAME NVARCHAR(64), DomainType INT, Synonym NVARCHAR(64) )
INSERT INTO DQSCOLORS4VALUES ('red',NULL ,'dark red') ,('blue',0 ,'dark blue') ,('blue',1 ,'light blue') ,('blue',3 ,'bright blue') ,('yellow',0 , NULL)
DQS Domain Value import is configured to map all three input columns:
The result of the SSIS package execution shows the following:
A few things to consider when using both Domain Type and Synonym option:
Let's proceed to the next article where I describe the options you can use to handle error during import.
Nice sharing , this tutorial is helpful ! I have a quick question needs your advice. If I import to a domain without any domain values, SSIS works. But if that domain already has values inside, SSIS always fails and returns error messages below
[DQS Domain Value Import] Error: It's not possible to add domain values to a Domain, when the knowledge base is already opened and not in state "domain management".
[DQS Domain Value Import] Error: Publish the knowledge base before import new domain values.
[DQS Domain Value Import] Error: Make sure that no one makes changes to the knowledge base during package execution.
[DQS Domain Value Import ] Error: System.Exception: Open Knowledge Base
at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)
[SSIS.Pipeline] Error: DQS Domain Value Import failed the pre-execute phase and returned error code 0x80131500.
My KB already gets published, I try to unlock the KB and logout the DQS Client but they don't work. Is there anything I miss to configure ?
You should be able to import to a non empty kb (the examples I used in the blog articles were non empty KB). Make sure that you click "Finish" when you open the KB from DQS Client and select "Publish" on the subsequent popup dialog window. You should receive a popup confirmation : "The Knowledge Base was successfully published" and the "state" of the KB (on the kb list when you click "Open" from the main DQS screen) should be empty (you should also receive no lock). Another thing to watch out is the default setup in SSIS DQS Domain Value import is "Publish when there is no error". So make sure to publish if you encounter error the first time around.
This was very helpful, but is there a way to load value relations in a complex domain in an automated way? I have large complex domains. I want to load the individual columns this way, but then also load some cross domain rules to validate the complex domain. Is this possible??
The concept although sounds fairly simple, in reality, it is a night mere to import domain values using this route. I would rather use the feature within DQS. Useful post though.
I misinterpreted this blog. It does work very well to my surprise :) Thanks for the demo/post!
good and nice job