<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>SSIS Team Blog : Data Profiling</title><link>http://blogs.msdn.com/mattm/archive/tags/Data+Profiling/default.aspx</link><description>Tags: Data Profiling</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Accessing a data profile programmatically</title><link>http://blogs.msdn.com/mattm/archive/2008/03/12/accessing-a-data-profile-programmatically.aspx</link><pubDate>Thu, 13 Mar 2008 09:03:31 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8179037</guid><dc:creator>mmasson</dc:creator><slash:comments>11</slash:comments><comments>http://blogs.msdn.com/mattm/comments/8179037.aspx</comments><wfw:commentRss>http://blogs.msdn.com/mattm/commentrss.aspx?PostID=8179037</wfw:commentRss><description>&lt;p&gt;The new Data Profiling Task in SQL Server 2008 generates an XML output file. The output is easy to read, and can be used to make decisions within a control flow. For example, you could decide whether to process your most recent data set based on criteria in the profile (are any values in certain columns NULL? Do the values fall within expected ranges?)&lt;/p&gt;  &lt;p&gt;While you can parse this XML file in traditional ways, those of you who aren't afraid of using undocumented, unsupported APIs can make use of the classes in the Microsoft.SqlServer.DataProfiler assembly that we use internally to load the XML profile. &lt;/p&gt;  &lt;p&gt;Let's use a simple scenario: we want to check one of the columns (AddressLine1) in our staging table to make sure that it contains no NULL values. If it is clean, we process it in a data flow task. 
If NULL values are found, we want to send an email to the DBA.&lt;/p&gt;  &lt;p&gt;Here's what the control flow looks like: &lt;/p&gt;  &lt;p&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="301" alt="image" src="http://blogs.msdn.com/blogfiles/mattm/WindowsLiveWriter/Accessingadataprofileprogrammatically_13B9E/image_9.png" width="426" border="0" /&gt; &lt;/p&gt;  &lt;p&gt;We first run the data profile, then process the results in a script task. If the data meets our criteria (i.e. there are no NULLs in the AddressLine1 column), the script task returns success and we run the &amp;quot;Process&amp;quot; data flow. If it doesn't, we fail the script task and run the Send Mail task instead.&lt;/p&gt;  &lt;p&gt;Note that the Data Profiling task can store its results in a variable (as a string) instead of writing them to a file on disk.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/mattm/WindowsLiveWriter/Accessingadataprofileprogrammatically_13B9E/image_6.png"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="484" alt="image" src="http://blogs.msdn.com/blogfiles/mattm/WindowsLiveWriter/Accessingadataprofileprogrammatically_13B9E/image_thumb_2.png" width="572" border="0" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now for the interesting part: the script task.&lt;/p&gt;  &lt;p&gt;First, add a reference to the Microsoft.SqlServer.DataProfiler DLL. It can be found at %ProgramFiles%\Microsoft SQL Server\100\DTS\Binn\Microsoft.SqlServer.DataProfiler.dll (and should also be in the GAC).&lt;/p&gt;  &lt;p&gt;The following code loads the data profile XML from the package variable, deserializes it into a DataProfile object, and iterates through the profiles until it finds the one it's looking for. &lt;/p&gt;  &lt;div class="csharpcode"&gt;   &lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;using&lt;/span&gt; Microsoft.DataDebugger.DataProfiling;&lt;/pre&gt;

  &lt;pre&gt;&amp;#160;&lt;/pre&gt;

  &lt;pre class="alt"&gt;&lt;span class="kwrd"&gt;const&lt;/span&gt; &lt;span class="kwrd"&gt;string&lt;/span&gt; ColumnName = &lt;span class="str"&gt;&amp;quot;AddressLine1&amp;quot;&lt;/span&gt;;&lt;/pre&gt;

  &lt;pre&gt;&lt;span class="kwrd"&gt;readonly&lt;/span&gt; &lt;span class="kwrd"&gt;long&lt;/span&gt; Threshold = 0;&lt;/pre&gt;

  &lt;pre class="alt"&gt;&amp;#160;&lt;/pre&gt;

  &lt;pre&gt;&lt;span class="kwrd"&gt;public&lt;/span&gt; &lt;span class="kwrd"&gt;void&lt;/span&gt; Main()&lt;/pre&gt;

  &lt;pre class="alt"&gt;{&lt;/pre&gt;

  &lt;pre&gt;    Dts.TaskResult = (&lt;span class="kwrd"&gt;int&lt;/span&gt;)ScriptResults.Success;&lt;/pre&gt;

  &lt;pre class="alt"&gt;&amp;#160;&lt;/pre&gt;

  &lt;pre&gt;    &lt;span class="rem"&gt;// Retrieve the profile from the package variable&lt;/span&gt;&lt;/pre&gt;

  &lt;pre class="alt"&gt;    &lt;span class="kwrd"&gt;string&lt;/span&gt; dataProfleXML = Dts.Variables[&lt;span class="str"&gt;&amp;quot;User::DataProfile&amp;quot;&lt;/span&gt;].Value.ToString();&lt;/pre&gt;

  &lt;pre&gt;&amp;#160;&lt;/pre&gt;

  &lt;pre class="alt"&gt;    &lt;span class="rem"&gt;// Deserialize&lt;/span&gt;&lt;/pre&gt;

  &lt;pre&gt;    DataProfileXmlSerializer serializer = &lt;span class="kwrd"&gt;new&lt;/span&gt; DataProfileXmlSerializer();&lt;/pre&gt;

  &lt;pre class="alt"&gt;    DataProfile profile = serializer.Deserialize(&lt;span class="kwrd"&gt;new&lt;/span&gt; System.IO.StringReader(dataProfleXML));&lt;/pre&gt;

  &lt;pre&gt;&amp;#160;&lt;/pre&gt;

  &lt;pre class="alt"&gt;    &lt;span class="rem"&gt;// Cycle through the profiles to find the one we're looking for&lt;/span&gt;&lt;/pre&gt;

  &lt;pre&gt;    &lt;span class="kwrd"&gt;foreach&lt;/span&gt; (Profile p &lt;span class="kwrd"&gt;in&lt;/span&gt; profile.DataProfileOutput.Profiles)&lt;/pre&gt;

  &lt;pre class="alt"&gt;    {&lt;/pre&gt;

  &lt;pre&gt;        &lt;span class="rem"&gt;// Check the profile type&lt;/span&gt;&lt;/pre&gt;

  &lt;pre class="alt"&gt;        &lt;span class="kwrd"&gt;if&lt;/span&gt; (p &lt;span class="kwrd"&gt;is&lt;/span&gt; ColumnNullRatioProfile)&lt;/pre&gt;

  &lt;pre&gt;        {&lt;/pre&gt;

  &lt;pre class="alt"&gt;            &lt;span class="rem"&gt;// Match the column name&lt;/span&gt;&lt;/pre&gt;

  &lt;pre&gt;            ColumnNullRatioProfile nullProfile = p &lt;span class="kwrd"&gt;as&lt;/span&gt; ColumnNullRatioProfile;&lt;/pre&gt;

  &lt;pre class="alt"&gt;            &lt;span class="kwrd"&gt;if&lt;/span&gt; (nullProfile.Column.Name.Equals(ColumnName, StringComparison.InvariantCultureIgnoreCase))&lt;/pre&gt;

  &lt;pre&gt;            {&lt;/pre&gt;

  &lt;pre class="alt"&gt;                &lt;span class="rem"&gt;// Make sure it's within our threshold&lt;/span&gt;&lt;/pre&gt;

  &lt;pre&gt;                &lt;span class="kwrd"&gt;if&lt;/span&gt; (nullProfile.NullCount &amp;gt; Threshold)&lt;/pre&gt;

  &lt;pre class="alt"&gt;                {&lt;/pre&gt;

  &lt;pre&gt;                    &lt;span class="rem"&gt;// Fail the task&lt;/span&gt;&lt;/pre&gt;

  &lt;pre class="alt"&gt;                    Dts.TaskResult = (&lt;span class="kwrd"&gt;int&lt;/span&gt;)ScriptResults.Failure;&lt;/pre&gt;

  &lt;pre&gt;                }&lt;/pre&gt;

  &lt;pre class="alt"&gt;&amp;#160;&lt;/pre&gt;

  &lt;pre&gt;                &lt;span class="kwrd"&gt;break&lt;/span&gt;;&lt;/pre&gt;

  &lt;pre class="alt"&gt;            }&lt;/pre&gt;

  &lt;pre&gt;        }&lt;/pre&gt;

  &lt;pre class="alt"&gt;    }           &lt;/pre&gt;

  &lt;pre&gt;}&lt;/pre&gt;
&lt;/div&gt;
&lt;style type="text/css"&gt;
.csharpcode, .csharpcode pre
{
	font-size: small;
	color: black;
	font-family: consolas, "Courier New", courier, monospace;
	background-color: #ffffff;
	/*white-space: pre;*/
}
.csharpcode pre { margin: 0em; }
.csharpcode .rem { color: #008000; }
.csharpcode .kwrd { color: #0000ff; }
.csharpcode .str { color: #006080; }
.csharpcode .op { color: #0000c0; }
.csharpcode .preproc { color: #cc6633; }
.csharpcode .asp { background-color: #ffff00; }
.csharpcode .html { color: #800000; }
.csharpcode .attr { color: #ff0000; }
.csharpcode .alt 
{
	background-color: #f4f4f4;
	width: 100%;
	margin: 0em;
}
.csharpcode .lnum { color: #606060; }&lt;/style&gt;

&lt;p&gt;As I mentioned above, this API is intended for internal use and is subject to change (it already changed between CTP5 and CTP6, and will change again in the upcoming CTP Refresh). I just thought some people out there might find this interesting. &lt;/p&gt;</description><category domain="http://blogs.msdn.com/mattm/archive/tags/Data+Profiling/default.aspx">Data Profiling</category></item><item><title>Data profiling task improvements</title><link>http://blogs.msdn.com/mattm/archive/2008/03/11/data-profiling-task-improvements.aspx</link><pubDate>Tue, 11 Mar 2008 19:37:36 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8155115</guid><dc:creator>mmasson</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/mattm/comments/8155115.aspx</comments><wfw:commentRss>http://blogs.msdn.com/mattm/commentrss.aspx?PostID=8155115</wfw:commentRss><description>&lt;p&gt;There has been some great feedback coming in about the new Data Profiling task. Jamie Thomson has a &lt;a href="http://blogs.conchango.com/jamiethomson/archive/2008/03/02/ssis-data-profiling-task-part-1-introduction.aspx"&gt;set of blog posts&lt;/a&gt; that covers all of the different profile options, and yesterday I read an interesting post from John Welch that describes a &lt;a href="http://agilebi.com/cs/blogs/jwelch/archive/2008/03/11/using-the-data-profiling-task-to-profile-all-the-tables-in-a-database.aspx"&gt;clever way to profile an entire database&lt;/a&gt;. 
&lt;/p&gt;  &lt;p&gt;I wanted to mention that we've been incorporating the feedback we've received, and to list some of the improvements and changes that should show up in the CTP Refresh.&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The Quick Profile page now has an &amp;quot;All Tables&amp;quot; option&lt;/li&gt;    &lt;li&gt;Clarification that the task uses ADO.NET connection managers&lt;/li&gt;    &lt;li&gt;A button to create ADO.NET connection managers directly from the task&lt;/li&gt;    &lt;li&gt;Progress events that show the task's completion percentage&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Keep the feedback coming!&lt;/p&gt;</description><category domain="http://blogs.msdn.com/mattm/archive/tags/Katmai/default.aspx">Katmai</category><category domain="http://blogs.msdn.com/mattm/archive/tags/Data+Profiling/default.aspx">Data Profiling</category></item></channel></rss>