This blog provides information, news, tips, and announcements about the SQL Server Data Quality Services (DQS) feature introduced in SQL Server 2012.
Data Quality Services (DQS) internally stores the domain values in a knowledge base in Unicode format. To compare the domain values with your source data during cleansing and matching, it uses the trigram algorithm, which is language agnostic. In practice, this means that you can use DQS to cleanse and match data in any language that is supported by the Windows operating system.
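To illustrate why a trigram comparison is language agnostic, here is a minimal Python sketch of one common variant: Jaccard similarity over sets of three-character sequences. This is not the DQS implementation. The function names, the padding scheme, and the scoring formula are assumptions made for illustration only. The point is that the code only slices Unicode strings into three-character windows, so it needs no knowledge of the language.

```python
import unicodedata


def trigrams(text):
    """Return the set of 3-character sequences in text (illustrative helper)."""
    # Normalize so that visually identical strings share one code point sequence.
    normalized = unicodedata.normalize("NFC", text).lower()
    # Pad so that short strings and word boundaries still produce trigrams
    # (an assumed convention, not something DQS documents).
    padded = "  " + normalized + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard similarity of the two trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # The same code handles Latin, Greek, and Devanagari text unchanged.
    for left, right in [("Seattle", "Seatle"), ("Αθήνα", "Αθηνα"), ("नमस्ते", "नमस्ते")]:
        print(left, right, round(trigram_similarity(left, right), 3))
```

Identical strings score 1.0, strings with no shared trigrams score 0.0, and near-duplicates fall in between, regardless of the script they are written in.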
While creating a string domain in DQS, you can specify the language for the domain.
The Language drop-down list displays a limited set of languages, and this selection is only applicable for the Speller feature in DQS. The Speller feature works only for those languages that are listed in the Language drop-down list.
However, if you have values in a language that is not listed (for example, Greek or Chinese), you must select Other from the Language drop-down list.
NOTE: Selecting Other from the Language drop-down list disables the Speller feature for the domain.
The collation setting determines the rules for comparing data in SQL Server. Although DQS stores all values in Unicode format, the collation setting still influences how those values are compared. For example, two characters that are treated as different under one collation might be treated as the same under another. Therefore, choose a collation other than the default while installing DQS only if you fully understand its comparison rules and are sure that you want DQS to apply those same rules for cleansing and matching.
If you are not sure about the comparison rules in a collation, install DQS with the default server collation, which should work fine for you. For more information about installing DQS and specifying collation settings, see here.
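The effect of comparison rules can be approximated with a small Python sketch. This is a rough model of only two collation properties, case sensitivity and accent sensitivity. It does not reproduce SQL Server's collation logic, which covers many more rules. The function names and flags below are hypothetical and exist only for illustration.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after decomposing the text (illustrative helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_under(a, b, case_sensitive, accent_sensitive):
    """Compare two strings under a simplified, assumed model of collation rules."""
    if not accent_sensitive:
        a, b = strip_accents(a), strip_accents(b)
    if not case_sensitive:
        a, b = a.casefold(), b.casefold()
    # Normalize last so that composed and decomposed forms compare equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    pair = ("Résumé", "resume")
    # The same two values are equal under one set of rules and different under another.
    print(equal_under(*pair, case_sensitive=False, accent_sensitive=False))
    print(equal_under(*pair, case_sensitive=True, accent_sensitive=True))
```

Under the insensitive rules the two values match, and under the sensitive rules they do not, which is the kind of difference that changes the results of cleansing and matching.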
To see DQS in action on a non-listed language, see the following blog post where DQS helps in finding duplicates in the Hindi (India) language: govindkanshi.wordpress.com/.../data-quality-servicesdqs-deduplicationmatching-hindi-data-with-sql-server.