The sound of silence – null, empty, space, missing, dummy and other animals

The sound of silence – null, empty, space, missing, dummy and other animals

Rate This
  • Comments 1

Null, space, dummy, “unknown”, 99999, empty.  All these small and innocent words represent missing data that may have a bad impact on our business execution and analysis.

  • How can a customer participate in my new email campaign if he doesn’t have an email?
  • How can I ship my merchandise quickly and get mailing discounts from the post office if my data misses a zip code?
  • How can I report correctly on my business if my Segment field has the Value “Unknown”?

       So how DQS addresses the case of missing information?

  • DQS_NULL
    Each domain has a seeded value – DQS_NULL, which represents null values. When the data has a null or any number of spaces, it will be associated with this value.
    This DQS_NULL cannot be deleted and can be set as Correct or Invalid.

    The following diagram presnets the DQS_NULL in domain values:

 

  • Null equivalence
    Null Equivalence is related to the cases where your data consists actual values that are equivalent to an empty field. Common examples are “Unknown” , “Do not Know”, or 99999 for numeric values and 1/1/1111 for date values. To define these values as Null Equivalence, all you need to do is add them to the domain, and then link them to the DQS_NULL value as synonyms.

    The following diagrams presents Null equivalence values in Domain values:



  • Completeness
    DQS Integrated profiling has a specific metric that tracks missing values. This metric will track all the nulls and null equivalent values in your data.
    So if you’ve set the “Unknown” and DQS_NULL as null equivalent and your data has 5% populated with spaces and 10% populated with “Unknown” , the completeness metric will show that 15% of the data is missing.

    The following diagrams presents the profiling completeness in DQS activity, before and after the definition of Null Equivalence:

 

 

 

  • Missing data is Invalid
    If you want, you can set the DQS_NULL and all the null equivalence values as invalid, or set a domain rule that will set them to Invalid.

    The following diagrams present In Valid values in Domain Management and how they are tracked in a cleansing project:

 

 

 

       This concludes our discussion on Null Values .

Now go out and hunt them down !!! 

Leave a Comment
  • Please add 2 and 7 and type the answer here:
  • Post
  • Hello,

    "If you want, you can set the DQS_NULL and all the null equivalence values as invalid, or set a domain rule that will set them to Invalid. "

    Do you have any examples of domain rules that set a null value as invalid ? I'm afraid that patterns and regular expressions are not working well in this case.

    Thanks.

Page 1 of 1 (1 items)