Configuring SSIS DQS Domain Value Import

Configuring SSIS DQS Domain Value Import

Rate This
  • Comments 7

In the previous blog article, I described how to use the free SSIS DQS Domain Value Import from OH22 data to automate importing values into a DQS Knowledge Base domain. In this article, I will go over the optional configuration to import invalid/error value types as well as to set synonyms.

First, let's quickly recap the domain type and synonym.

Domain type

In DQS, you can define a domain value with different types:

  • Correct: This is a valid value that belongs to the domain. For example, in the color name domain, red is a valid color name.
  • Error: This is a value that is incorrect. For example, 'tbd' is not a correct color name. 
  • Invalid: This is a value that may be correct but does not belong to the domain. For example: oliver is a correct person name but it is not a correct color name.

Note: The difference between Error and Invalid is often subtle. Whether you set a domain value as Error or Invalid, DQS cleansing gives the same result and marks records containing those values as Invalid.  Many DQS implementation combine the Error and Invalid and only use Correct or Invalid type to simplify. 

Synonym

When setting up domain value type to Error or Invalid, you can specify Correct To value. DQS cleansing then automatically replace the value with the Correct To (i.e. synonym).

The SSIS DQS Domain Value Import can automate importing values with different types and setting up synonym above.  To illustrate, I use the Knowledge Base from the previous article:

Importing domain values with different types

Consider that we would like to insert new values to the domain Name from a SQL table:

CREATE TABLE DQSCOLORS2 (
 NAME NVARCHAR(64),
 DomainType INT
 )
-- Note: that the column containing domain value must have NVARCHAR/NCHAR and Domain type must be set to INT data type.

INSERT INTO DQSCOLORS2
VALUES
 ('white', 0)
 ,('tbd', 1)
 ,('oliver',2)

The domain type column must be set to one of the following integer value:

  • 0 (correct)
  • 1 (error)
  • 2 (invalid)

In this example, 'white' is a correct color name, 'tbd' is an error, and 'oliver' is an invalid color name.

Configure the  SSIS DQS Domain Value Import as follows:

  • Specify connection to the DQS server
  • Select the Data Quality Knowledge Base from the drop down (e.g. Colors).
  • Select the Domain from the drop down (e.g. Name)
  • Specify input column for the Leading Value (e.g. Name)
  • Specify input column for the type (e.g. DomainType). Note: The drop down only shows column with INT data type. Check the data type if you do not see the column name from the drop down.
  • Accept all other default setting (In part 3 of this series, I will explain how to use error handling options)
  • Click OK to finish

Make sure the DQS Knowledge Base Colors is published before executing the SSIS package. After the SSIS package completed, open the Knowledge Base  to review the result:

The domain values are added with the specified type.

A couple of things to consider when using the Domain Type option:

  • Domain type source column must be set to INT and contained only 0,1,2 and does not have NULL -- any violation to this will result in error during package execution.  See the next article for more information on error handling.
  • You can't update an existing domain type (e.g. you want to change green from Correct to Invalid). This must be done through DQS Client.

Importing domain values with synonym

In this example, I set up a new table with the following:


CREATE TABLE DQSCOLORS3 (
 NAME NVARCHAR(64),
 Synonym  NVARCHAR(64)
 )

INSERT INTO DQSCOLORS3 VALUES
 ('white', 'bright white')
 ,('DQS_NULL', 'not specified')
 ,('clear', 'transparent')

The first row sets synonym to an existing valid value (e.g. 'white') in the domain. The second row sets synonym to a special domain value called DQS_NULL (this value can be used to specify how to handle NULL during DQS cleansing).  The last row set synonym where both the leading value and its synonym are not currently in the domain.

Configure the  SSIS DQS Domain Value Import as follows:

Make sure the DQS Knowledge Base Colors is published, then execute the SSIS package and review the result:

The synonyms are successfully added to the DQS domain.

'bright white' and 'not specified' are set as Correct to 'white' and DQS_NULL accordingly. This means, for example, that when you are cleansing records containing 'not specified', it will be corrected to null. Lastly, both 'clear' and 'transparent' are added with 'clear' set as the leading value. Note that when importing leading value and synonyms using SSIS DQS Domain Value Import, the synonyms are always set to invalid type (e.g. 'transparent') while the leading value are always set to correct (e.g. 'clear').

A few things to consider when using the Synonym option:

  • Synonyms are created as new value with domain type always set to Invalid
  • You can set synonym to an existing Correct domain values (e.g. 'white" and 'DQS_NULL') -- If you try to set synonym to an Invalid/Error domain, you will get an error during package execution. See the next article for more information on error handling.
  • You can set synonym to BOTH a new domain value and a new synonym pair (e.g. 'clear' and 'transparent' are new and not previously exists in the Color Name domain)

Importing domain values with both different types and synonym

In this last example, we will import data that include both type and synonym:

CREATE TABLE DQSCOLORS4 (
 NAME NVARCHAR(64),
 DomainType INT,
 Synonym  NVARCHAR(64)
 )

INSERT INTO DQSCOLORS4
VALUES
 ('red',NULL ,'dark red')
 ,('blue',0 ,'dark blue')
 ,('blue',1 ,'light blue')
 ,('blue',3 ,'bright blue')
 ,('yellow',0         , NULL)

DQS Domain Value import is configured to map all three input columns:

The result of the SSIS package execution shows the following:

A few things to consider when using both Domain Type and Synonym option:

  • Domain type is ignored when you have synonym. The new synonym always have Invalid type. (e.g. see 'dark red', 'dark blue', 'light blue' examples above)
  • When the synonym is NULL, then the Domain Type must have non NULL valid of either 0,1,2. In another word, you can't have both domain type and synonym to be NULL. This will result in an error during package execution.

Let's proceed to the next article where I describe the options you can use to handle error during import.

Leave a Comment
  • Please add 7 and 6 and type the answer here:
  • Post
  • Nice sharing , this tutorial is helpful ! I have a quick question needs your advice. If I import to a domain without any domain values, SSIS works. But if that domain already has values inside, SSIS always fails and returns error messages below

    [DQS Domain Value Import] Error: It's not possible to add domain values to a Domain, when the knowledge base is already opened and not in state "domain management".

    [DQS Domain Value Import] Error: Publish the knowledge base before import new domain values.

    [DQS Domain Value Import] Error: Make sure that no one makes changes to the knowledge base during package execution.

    [DQS Domain Value Import [2]] Error: System.Exception: Open Knowledge Base

      at oh22is.SqlServer.DQS.DomainValueDestination.PreExecute()

      at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

    [SSIS.Pipeline] Error: DQS Domain Value Import failed the pre-execute phase and returned error code 0x80131500.

    My KB already gets published,  I try to unlock the KB and logout the DQS Client  but they don't work.  Is there anything I miss to configure ?

  • Nick,

    You should be able to import to a non empty kb (the examples I used in the blog articles were non empty KB). Make sure that you click "Finish" when you open the KB from DQS Client and select "Publish" on the subsequent popup dialog window. You should receive a popup confirmation :  "The Knowledge Base was successfully published" and the "state" of the KB (on the kb list when you click "Open" from the main DQS screen) should be empty (you should also receive no lock). Another thing to watch out is the default setup in SSIS DQS Domain Value import is "Publish when there is no error". So make sure to publish if you encounter error the first time around.

  • This was very helpful, but is there a way to load value relations in a complex domain in an automated way? I have large complex domains. I want to load the individual columns this way, but then also load some cross domain rules to validate the complex domain. Is this possible??

  • The concept although sounds fairly simple, in reality, it is a night mere to import domain values using this route. I would rather use the feature within DQS. Useful post though.

  • I misinterpreted this blog. It does work very well to my surprise :) Thanks for the demo/post!

  • good and nice job

  • thanks

Page 1 of 1 (7 items)