<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Michael Entin's notebook : Katmai</title><link>http://blogs.msdn.com/michen/archive/tags/Katmai/default.aspx</link><description>Tags: Katmai</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>SQL 2008 &amp; VS 2008</title><link>http://blogs.msdn.com/michen/archive/2007/12/04/SQL-2008-and-VS-2008.aspx</link><pubDate>Wed, 05 Dec 2007 09:42:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6663107</guid><dc:creator>michen</dc:creator><slash:comments>5</slash:comments><comments>http://blogs.msdn.com/michen/comments/6663107.aspx</comments><wfw:commentRss>http://blogs.msdn.com/michen/commentrss.aspx?PostID=6663107</wfw:commentRss><wfw:comment>http://blogs.msdn.com/michen/rsscomments.aspx?PostID=6663107</wfw:comment><description>Currently SQL Business Intelligence Development Studio (BIDS) and all the project types (AS, IS and RS) live in Visual Studio 2005. So don't try to open a solution that contains IS project in VS 2008 yet. 
What about the final SQL 2008 release? Now that Visual Studio 2008 is released, what are the plans for BIDS and support of BI projects in VS 2008?...(&lt;a href="http://blogs.msdn.com/michen/archive/2007/12/04/SQL-2008-and-VS-2008.aspx"&gt;read more&lt;/a&gt;)</description><category domain="http://blogs.msdn.com/michen/archive/tags/SSIS/default.aspx">SSIS</category><category domain="http://blogs.msdn.com/michen/archive/tags/Katmai/default.aspx">Katmai</category></item><item><title>SQL 2008 November CTP is available</title><link>http://blogs.msdn.com/michen/archive/2007/11/19/November_5F00_CTP.aspx</link><pubDate>Mon, 19 Nov 2007 20:56:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6405966</guid><dc:creator>michen</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/michen/comments/6405966.aspx</comments><wfw:commentRss>http://blogs.msdn.com/michen/commentrss.aspx?PostID=6405966</wfw:commentRss><wfw:comment>http://blogs.msdn.com/michen/rsscomments.aspx?PostID=6405966</wfw:comment><description>&lt;P&gt;Get it from &lt;A href="https://connect.microsoft.com/SQLServer"&gt;https://connect.microsoft.com/SQLServer&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;One of the improvements in this build is the ability to persist Lookup reference data and to use non-OLEDB sources for Lookup.&lt;/P&gt;</description><category domain="http://blogs.msdn.com/michen/archive/tags/SSIS+Lookup/default.aspx">SSIS Lookup</category><category domain="http://blogs.msdn.com/michen/archive/tags/Katmai/default.aspx">Katmai</category></item><item><title>Custom transforms: determining end of rowset</title><link>http://blogs.msdn.com/michen/archive/2007/08/31/Buffer.EndOfRowset.aspx</link><pubDate>Fri, 31 Aug 2007 21:41:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:4674204</guid><dc:creator>michen</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/michen/comments/4674204.aspx</comments><wfw:commentRss>http://blogs.msdn.com/michen/commentrss.aspx?PostID=4674204</wfw:commentRss><wfw:comment>http://blogs.msdn.com/michen/rsscomments.aspx?PostID=4674204</wfw:comment><description>&lt;P&gt;With SQL 2008 CTPs starting to get into the hands of customers and ISVs, we've found that some custom transforms and script transforms use incorrect code to check for the end of the rowset (incoming data), which could cause problems when you move the code to SQL 2008. Unfortunately, some Microsoft docs helped to spread this error, so let me explain the problem and show the correct usage pattern.&lt;/P&gt;
&lt;P&gt;The &lt;STRONG&gt;EndOfRowset&lt;/STRONG&gt; property of the SSIS buffer is often used incorrectly. The property exists on the &lt;A class="" href="http://msdn2.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.pipelinebuffer.endofrowset.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.pipelinebuffer.endofrowset.aspx"&gt;PipelineBuffer&lt;/A&gt; object and the &lt;A class="" href="http://msdn2.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.scriptbuffer.endofrowset.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/microsoft.sqlserver.dts.pipeline.scriptbuffer.endofrowset.aspx"&gt;ScriptBuffer&lt;/A&gt; object, so this applies to both custom transforms and script transforms.&lt;/P&gt;
&lt;P&gt;The buffer.EndOfRowset property indicates that the current buffer is the very last one for a particular input, and that the component will not see any more buffers on this input.&lt;/P&gt;
&lt;P mce_keep="true"&gt;Note that:&lt;BR&gt;1) Value of this property does not depend on whether you’ve iterated over rows in that buffer,&lt;BR&gt;2) The value should not be used to check whether the buffer contains any rows.&lt;/P&gt;
&lt;P mce_keep="true"&gt;The last point is the most important here. In SQL 2005, the data flow engine sends an additional empty buffer with the EndOfRowset indicator at the end. So a buffer in SQL 2005 either contains some rows or carries the EndOfRowset indicator. But don't rely on either of these.&lt;/P&gt;
&lt;P mce_keep="true"&gt;In SQL 2008, this dummy empty buffer was deemed too expensive from a performance point of view, and the data flow engine no longer sends it. Instead it simply sets the EndOfRowset flag on the last buffer. Thus you get a series of buffers that contain data, and at the very end a buffer that contains both data and the EndOfRowset indicator.&lt;/P&gt;
&lt;P mce_keep="true"&gt;In some samples and forum posts I've seen incorrect usage of the &lt;STRONG&gt;EndOfRowset&lt;/STRONG&gt; property that relies on the SQL 2005 behavior (exclusive OR: &lt;EM&gt;either&lt;/EM&gt; end-of-rowset &lt;EM&gt;or&lt;/EM&gt; some data), i.e. the code does not process the incoming rows if it gets end of rowset. In other cases the code assumes it will receive an empty buffer, and uses the row count as the end-of-rowset indicator. If you have code that uses one of these incorrect patterns, e.g.:&lt;/P&gt;&lt;PRE&gt;&lt;FONT color=#ff0000&gt;/* warning - incorrect code */&lt;/FONT&gt;&lt;BR&gt;&lt;FONT color=#0000ff&gt;if&lt;/FONT&gt; (!buffer.EndOfRowset)&lt;BR&gt;{&lt;BR&gt;&lt;FONT color=#0000ff&gt;&amp;nbsp;&amp;nbsp;while&lt;/FONT&gt;(buffer.NextRow())&lt;BR&gt;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;FONT color=#008000&gt;// do something with the row&lt;BR&gt;&lt;/FONT&gt;&amp;nbsp; }&lt;BR&gt;}&lt;BR&gt;&lt;FONT color=#0000ff&gt;else&lt;/FONT&gt;&lt;BR&gt;{&lt;BR&gt;&lt;FONT color=#008000&gt;&amp;nbsp; // do something at the end&lt;BR&gt;&lt;/FONT&gt;}&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;or&lt;/P&gt;&lt;PRE&gt;&lt;FONT color=#ff0000&gt;/* warning - another incorrect code */&lt;BR&gt;&lt;/FONT&gt;&lt;FONT color=#0000ff&gt;if&lt;/FONT&gt; (buffer.RowCount != 0)&lt;BR&gt;{&lt;BR&gt;&lt;FONT color=#0000ff&gt;&amp;nbsp;&amp;nbsp;while&lt;/FONT&gt;(buffer.NextRow())&lt;BR&gt;&amp;nbsp; {&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;FONT color=#008000&gt;// do something with the row&lt;BR&gt;&lt;/FONT&gt;&amp;nbsp; }&lt;BR&gt;}&lt;BR&gt;&lt;FONT color=#0000ff&gt;else&lt;/FONT&gt;&lt;BR&gt;{&lt;BR&gt;&lt;FONT color=#008000&gt;&amp;nbsp; // do something at the end&lt;BR&gt;&lt;/FONT&gt;}&amp;nbsp;&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Please change it to:&lt;/P&gt;&lt;PRE&gt;&lt;FONT color=#0000ff&gt;while&lt;/FONT&gt; (buffer.NextRow())&lt;BR&gt;{&lt;BR&gt;&lt;FONT color=#008000&gt;&amp;nbsp; // do something with the row&lt;BR&gt;&lt;/FONT&gt;}&lt;BR&gt;&lt;BR&gt;&lt;FONT color=#0000ff&gt;if&lt;/FONT&gt; (buffer.EndOfRowset)&lt;BR&gt;{&lt;BR&gt;&lt;FONT color=#008000&gt;&amp;nbsp; // do something at the end&lt;BR&gt;&lt;/FONT&gt;}&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Otherwise you risk missing the contents of the last buffer in SQL 2008 (first incorrect sample) or failing to detect the end of data (second incorrect sample). The last version works correctly in both SQL 2005 and SQL 2008.&lt;/P&gt;
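&lt;P&gt;To see why the order of the checks matters, here is a minimal, self-contained simulation written in Python rather than against the real SSIS API. FakeBuffer, next_row and the three *_input functions are hypothetical names invented for this sketch; the two buffer sequences simply follow the SQL 2005 and SQL 2008 behavior described above.&lt;/P&gt;

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeBuffer:
    """Hypothetical stand-in for an SSIS buffer; not the real PipelineBuffer API."""
    rows: List[str]
    end_of_rowset: bool = False
    _pos: int = -1

    def next_row(self) -> bool:
        # Advance the cursor and report whether it now points at a valid row.
        self._pos += 1
        return len(self.rows) > self._pos

    def current(self) -> str:
        return self.rows[self._pos]


def correct_input(buffers):
    """Recommended pattern: always drain the rows, then check the flag."""
    seen, finished = [], False
    for buf in buffers:
        while buf.next_row():
            seen.append(buf.current())
        if buf.end_of_rowset:
            finished = True
    return seen, finished


def flag_first_input(buffers):
    """First incorrect pattern: ignores rows whenever the flag is set."""
    seen, finished = [], False
    for buf in buffers:
        if not buf.end_of_rowset:
            while buf.next_row():
                seen.append(buf.current())
        else:
            finished = True
    return seen, finished


def row_count_input(buffers):
    """Second incorrect pattern: treats an empty buffer as the end marker."""
    seen, finished = [], False
    for buf in buffers:
        if len(buf.rows) != 0:
            while buf.next_row():
                seen.append(buf.current())
        else:
            finished = True
    return seen, finished


def sql2005_buffers():
    # Data buffers followed by an extra empty buffer that carries the flag.
    return [FakeBuffer(["a", "b"]), FakeBuffer(["c"]), FakeBuffer([], True)]


def sql2008_buffers():
    # The flag rides on the last data buffer; there is no extra empty buffer.
    return [FakeBuffer(["a", "b"]), FakeBuffer(["c"], True)]
```

&lt;P&gt;In this toy model the first incorrect pattern drops row "c" on the SQL 2008 sequence, the second one never reports the end of data, and the recommended pattern returns all three rows plus the end flag on both sequences.&lt;/P&gt;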
&lt;P mce_keep="true"&gt;Update: see this follow-up post:&lt;BR&gt;&lt;A href="http://blogs.msdn.com/michen/archive/2008/10/19/does-buffer-nextrow-skips-the-first-row-in-a-buffer.aspx"&gt;http://blogs.msdn.com/michen/archive/2008/10/19/does-buffer-nextrow-skips-the-first-row-in-a-buffer.aspx&lt;/A&gt;&lt;/P&gt;</description><category domain="http://blogs.msdn.com/michen/archive/tags/SSIS+Scripting/default.aspx">SSIS Scripting</category><category domain="http://blogs.msdn.com/michen/archive/tags/Katmai/default.aspx">Katmai</category></item><item><title>SSIS Backpressure Mechanism</title><link>http://blogs.msdn.com/michen/archive/2007/06/12/ssis-backpressure-mechanism.aspx</link><pubDate>Wed, 13 Jun 2007 05:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3260307</guid><dc:creator>michen</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/michen/comments/3260307.aspx</comments><wfw:commentRss>http://blogs.msdn.com/michen/commentrss.aspx?PostID=3260307</wfw:commentRss><wfw:comment>http://blogs.msdn.com/michen/rsscomments.aspx?PostID=3260307</wfw:comment><description>&lt;P&gt;One of the mechanisms that the SSIS data flow engine uses to achieve high performance is “back pressure”.&lt;/P&gt;
&lt;P&gt;Let’s consider a simple package with a source and a destination. What happens if the source is fast and the destination is slow? Say the source is a huge local raw file and the destination is a remote relational database. If we read the source data at full speed, we would need a lot of memory to hold all the data we’ve read but could not write yet. In many cases we’d simply run out of memory or have to swap data to the hard drive.&lt;/P&gt;
&lt;P&gt;To avoid this, SSIS limits the speed of sources by controlling the number of active buffers inside each execution tree. This way you can process fast sources without running out of memory.&lt;/P&gt;
&lt;P&gt;If a source or an async component is too fast (compared to the transformations or destination down the path), it is suspended when its execution tree accumulates too many buffers (currently this limit is fixed at 5 buffers). If the source is slow (i.e. transforms and destinations can process data faster than the source generates it), the back-pressure mechanism does not kick in and the source can run at full speed.&lt;/P&gt;
&lt;P&gt;Unfortunately, in SQL 2005 we did not have any diagnostics that would tell the user which part of the flow is slow: the source or the transforms/destination. In Katmai (SQL 2008), if back pressure kicks in during package execution, at the end of the execution we report the total time that the source had to wait because of this mechanism.&lt;/P&gt;
&lt;P&gt;What would you do with this information? If you run the package and see that the time a particular source was suspended is zero or&amp;nbsp;relatively low (compared to the total execution time), you know the source is the slowest part of the data flow, and you need to focus on optimizing the source. E.g. you may remove unused columns from the query, simplify SQL statements, create indexes, etc.&lt;/P&gt;
&lt;P&gt;But if a source reports that it has been suspended most of the time (a considerable part of the total package execution time), you know the source is fast enough and you need to concentrate on the performance of the transforms and destinations.&lt;/P&gt;
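&lt;P&gt;The effect can be illustrated with a small deterministic model in Python. This is only a sketch under simplifying assumptions, not how the engine is implemented: one source, one destination, a fixed cost per buffer on each side, and the source is allowed to start a new buffer only while fewer than max_active buffers (5, as mentioned above) are still waiting to be consumed. The function name simulate and its parameters are invented for this example.&lt;/P&gt;

```python
def simulate(num_buffers, source_cost, dest_cost, max_active=5):
    """Return (total time the source was suspended, total run time).

    Toy model: the source needs source_cost time units per buffer, the
    destination needs dest_cost per buffer and handles buffers in order.
    """
    done = []      # completion time of each buffer at the destination
    waited = 0.0   # accumulated time the source spent suspended
    clock = 0.0    # moment the source is ready to start its next buffer
    for i in range(num_buffers):
        start = clock
        if i >= max_active:
            # Back pressure: wait until the oldest active buffer is consumed.
            start = max(start, done[i - max_active])
        waited += start - clock
        emitted = start + source_cost
        previous_done = done[-1] if done else 0.0
        # The destination starts once it is free and the buffer has arrived.
        done.append(max(emitted, previous_done) + dest_cost)
        clock = emitted
    return waited, done[-1]


# Fast source, slow destination: the source spends part of the run suspended.
fast_source = simulate(num_buffers=10, source_cost=1, dest_cost=4)

# Slow source, fast destination: back pressure never kicks in.
slow_source = simulate(num_buffers=10, source_cost=4, dest_cost=1)
```

&lt;P&gt;With these made-up numbers the fast source is suspended for 12 of 41 time units, which points at the destination as the part to tune; the slow source is never suspended, which points at the source itself.&lt;/P&gt;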
&lt;P&gt;Another implication of back pressure is that the source database connection can stay open longer than would otherwise be needed to execute the query. If you see that a source has been suspended for several minutes, and you don't want to hold the corresponding database connection for such a long time, consider staging source data in raw files as suggested by Jamie Thomson at&lt;BR&gt;&lt;A href="http://blogs.conchango.com/jamiethomson/archive/2006/12/12/SSIS_3A00_-Dropping-data-into-a-raw-file.aspx"&gt;http://blogs.conchango.com/jamiethomson/archive/2006/12/12/SSIS_3A00_-Dropping-data-into-a-raw-file.aspx&lt;/A&gt;&lt;/P&gt;</description><category domain="http://blogs.msdn.com/michen/archive/tags/SSIS/default.aspx">SSIS</category><category domain="http://blogs.msdn.com/michen/archive/tags/Perf/default.aspx">Perf</category><category domain="http://blogs.msdn.com/michen/archive/tags/Katmai/default.aspx">Katmai</category></item><item><title>Katmai SSIS data flow task improvements</title><link>http://blogs.msdn.com/michen/archive/2007/06/11/katmai-ssis-data-flow-improvements.aspx</link><pubDate>Mon, 11 Jun 2007 10:00:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3223657</guid><dc:creator>michen</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/michen/comments/3223657.aspx</comments><wfw:commentRss>http://blogs.msdn.com/michen/commentrss.aspx?PostID=3223657</wfw:commentRss><wfw:comment>http://blogs.msdn.com/michen/rsscomments.aspx?PostID=3223657</wfw:comment><description>&lt;P&gt;With the first Katmai (SQL Server 2008) CTP out, I think it is time to blog about some performance and scalability improvements in this release.&lt;/P&gt;
&lt;P&gt;I'll assume readers are familiar with SSIS data flow performance concepts; if not, make sure you've read these two articles:&lt;BR&gt;&lt;A href="http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx" mce_href="http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx"&gt;http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx&lt;/A&gt;&lt;BR&gt;&lt;A href="http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/" mce_href="http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/"&gt;http://www.simple-talk.com/sql/sql-server-2005/sql-server-2005-ssis-tuning-the-dataflow-task/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The most important concepts for today are:&lt;BR&gt;&lt;STRONG&gt;Asynchronous components&lt;/STRONG&gt; - components that create new rows and thus new data flow buffers (compare to synchronous components that modify data in the incoming buffers, but can't create or remove rows). Both blocking and partially blocking components (in terms of ssisperf whitepaper linked above) are asynchronous.&lt;BR&gt;&lt;STRONG&gt;Execution trees&lt;/STRONG&gt; - the tree describing the path a data buffer takes through the components, from start to end. The tree starts with a source or async component and usually ends with a destination or sometimes with a transform. It is a tree (not a simple list) because some transforms, e.g. Multicast "split" the data flow into multiple logical paths, but avoid buffer duplication - all sub-trees use the same "physical" buffer.&lt;/P&gt;
&lt;P&gt;In SQL 2005 each execution tree was assigned a single OS thread ("worker thread"), and under some conditions (complex packages with a small value of the data flow EngineThreads property) several execution trees could share a thread. One benefit of this approach is that all thread scheduling was done in advance during the pre-execution phase, so the data flow did not have to spend any time assigning threads at runtime.&lt;/P&gt;
&lt;P&gt;But there were several drawbacks. Since the data flow did not know the relative amount of work per execution tree, the scheduling could be suboptimal. Also, if you had a simple package with just one or two execution trees, you would only use one or two processors, and the package might not benefit from a high-end multiprocessor machine. You might have a lot of synchronous components, but if they shared the same execution tree, they shared the thread too. Even if you logically split the data flow using Multicast (a synchronous transform), all output paths of Multicast belong to the same execution tree, and thus are executed serially by the Yukon (SQL 2005) data flow task.&lt;/P&gt;
&lt;P&gt;To achieve a high level of parallelism on a multiprocessor machine, you had to split an execution tree, either by splitting the tree into independent paths, or by inserting async transforms to create a new tree. UnionAll could be used for the latter, as described in the ssisperf whitepaper. If you insert a UnionAll with one input, it does not change the output, but it splits the execution tree into two new trees, each of which can be executed on its own processor. The drawback is that UnionAll is an async transform, and thus has to copy data, so it might have a noticeable performance overhead. You should only use this trick if you have checked that your package benefits from the extra parallelism. Usually packages already have multiple data flows, or multiple execution trees inside a single data flow, so they don't need to do this. But if you have a very high-end machine and a pipeline that takes too long and does not use all the processors, you should try it (make a backup copy of the original package and compare the performance before committing to this change).&lt;/P&gt;
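&lt;P&gt;As a rough illustration of how an async transform splits an execution tree, the Python sketch below groups components into trees using the definition given earlier: a tree starts at a source or at an async component and continues through synchronous components until it reaches a destination or the next async component. The function execution_trees, the component names and the dictionaries are all made up for this example; the real engine computes its trees internally.&lt;/P&gt;

```python
def execution_trees(kinds, edges):
    """Map each tree root (source or async component) to its members.

    kinds: component name -> "source", "sync", "async" or "destination"
    edges: list of (upstream, downstream) component name pairs
    """
    downstream = {}
    for up, down in edges:
        downstream.setdefault(up, []).append(down)

    trees = {}
    for name, kind in kinds.items():
        if kind not in ("source", "async"):
            continue
        members, stack = [], list(downstream.get(name, []))
        while stack:
            node = stack.pop()
            members.append(node)
            if kinds[node] == "sync":
                # Sync components reuse the incoming buffer: same tree goes on.
                stack.extend(downstream.get(node, []))
            # Async components and destinations are leaves of this tree.
        trees[name] = sorted(members)
    return trees


# A flow with only synchronous transforms forms a single execution tree.
BEFORE_KINDS = {
    "source": "source",
    "derive": "sync",
    "multicast": "sync",
    "dest1": "destination",
    "dest2": "destination",
}
BEFORE_EDGES = [
    ("source", "derive"),
    ("derive", "multicast"),
    ("multicast", "dest1"),
    ("multicast", "dest2"),
]

# The same flow with a one-input UnionAll (async) inserted after "derive".
AFTER_KINDS = dict(BEFORE_KINDS, union="async")
AFTER_EDGES = [
    ("source", "derive"),
    ("derive", "union"),
    ("union", "multicast"),
    ("multicast", "dest1"),
    ("multicast", "dest2"),
]

before = execution_trees(BEFORE_KINDS, BEFORE_EDGES)
after = execution_trees(AFTER_KINDS, AFTER_EDGES)
```

&lt;P&gt;Here before contains one tree rooted at the source (Multicast does not split it), while after contains two trees, one rooted at the source and one rooted at the inserted UnionAll, which is what gave SQL 2005 a second thread to work with.&lt;/P&gt;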
&lt;P&gt;Now the good news: you don't need to worry about any of this or use the UnionAll trick anymore! In Katmai, the data flow was redesigned to do dynamic scheduling and can now execute multiple components in parallel, even if they belong to the same execution tree. The overhead of dynamic scheduling is very small. We sometimes saw a very small (~1%) performance loss on single-processor machines, but on multiprocessor machines you'll usually see a performance improvement, especially if you previously had to use the UnionAll trick to introduce more parallelism and you remove it when you move to Katmai.&lt;/P&gt;
&lt;P&gt;As most server machines now have two or more processors, or at least Hyper-Threading, we think most users will see performance improvements from this change. And you get the best performance automatically: there is no need to decide whether to introduce more parallelism by adding a UnionAll transform, so that is one less thing to worry about.&lt;/P&gt;</description><category domain="http://blogs.msdn.com/michen/archive/tags/SSIS/default.aspx">SSIS</category><category domain="http://blogs.msdn.com/michen/archive/tags/Perf/default.aspx">Perf</category><category domain="http://blogs.msdn.com/michen/archive/tags/Katmai/default.aspx">Katmai</category></item></channel></rss>