<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Craig Freedman's SQL Server Blog : I/O</title><link>http://blogs.msdn.com/craigfr/archive/tags/I_2F00_O/default.aspx</link><description>Tags: I/O</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>OPTIMIZED Nested Loops Joins</title><link>http://blogs.msdn.com/craigfr/archive/2009/03/18/optimized-nested-loops-joins.aspx</link><pubDate>Wed, 18 Mar 2009 17:34:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9486997</guid><dc:creator>craigfr</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/9486997.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=9486997</wfw:commentRss><description>In my past two posts, I explained how SQL Server may add a sort to the outer side of a nested loops join and showed how &lt;A title="Optimizing I/O Performance by Sorting - Part 2" href="http://blogs.msdn.com/craigfr/archive/2009/03/04/optimizing-i-o-performance-by-sorting-part-2.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2009/03/04/optimizing-i-o-performance-by-sorting-part-2.aspx"&gt;this sort can significantly improve performance&lt;/A&gt;.&amp;nbsp; In &lt;A title="Random Prefetching" href="http://blogs.msdn.com/craigfr/archive/2008/10/07/random-prefetching.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2008/10/07/random-prefetching.aspx"&gt;an earlier post&lt;/A&gt;, I discussed how SQL Server can use random prefetching to improve the performance of a nested loops join.&amp;nbsp; In this post, I'm going to explore one more nested loops join performance feature.&amp;nbsp; I'll use the same database that I 
used in &lt;A title="Optimizing I/O Performance by Sorting - Part 1" href="http://blogs.msdn.com/craigfr/archive/2009/02/25/optimizing-i-o-performance-by-sorting-part-1.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2009/02/25/optimizing-i-o-performance-by-sorting-part-1.aspx"&gt;my two prior posts&lt;/A&gt;.&amp;nbsp; Let's start with the following simple query: 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T&lt;BR&gt;WHERE RandKey &amp;lt; 1000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1011]=COUNT_BIG([T].[Data]), [Expr1012]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1010]) &lt;B&gt;OPTIMIZED&lt;/B&gt; WITH UNORDERED PREFETCH)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (1000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;Notice that the nested loops join includes an extra keyword: OPTIMIZED.&amp;nbsp; This keyword indicates that the nested loops join may try to reorder the input rows to improve I/O performance.&amp;nbsp; This behavior is similar to the explicit sorts that we saw in my two previous posts, but unlike a full sort it is more of a best effort.&amp;nbsp; That is, the results from an optimized nested loops join may not be (and in fact are highly unlikely to be) fully sorted.&lt;/P&gt;
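&lt;P&gt;The post does not say how the join reorders its input, so the following Python sketch is only an illustration of what a "best effort" ordering can look like.&amp;nbsp; It assumes, hypothetically, that rows are sorted in bounded batches rather than all at once; the function name and the batch size are made up for the example and are not taken from SQL Server.&lt;/P&gt;

```python
def batch_sorted(keys, batch_size):
    """Sort keys within consecutive fixed-size batches (hypothetical model).

    This is NOT SQL Server's algorithm; it only shows how a bounded,
    best-effort reorder differs from a full sort.
    """
    reordered = []
    for start in range(0, len(keys), batch_size):
        reordered.extend(sorted(keys[start:start + batch_size]))
    return reordered


# Clustered index keys in the order an index seek on RandKey might return them.
keys = [9, 2, 7, 4, 8, 1, 6, 3]
partly_sorted = batch_sorted(keys, batch_size=4)
print(partly_sorted)                   # ordered within each batch of four
print(partly_sorted == sorted(keys))   # False: not fully sorted overall
```

&lt;P&gt;Each batch comes out in key order, but the sequence as a whole does not, which is consistent with the description above of results that are unlikely to be fully sorted.&lt;/P&gt;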
&lt;P&gt;SQL Server uses an optimized nested loops join only when the optimizer concludes, based on its cardinality and cost estimates, that a sort is most likely not required but that a sort could still help if those estimates turn out to be incorrect.&amp;nbsp; In other words, an optimized nested loops join may be thought of as a "safety net" for those cases where SQL Server chooses a nested loops join but would have done better to choose an alternative plan such as a full scan or a nested loops join with an explicit sort.&amp;nbsp; For the above query, which joins only a few rows, the optimization is unlikely to have any impact at all.&lt;/P&gt;
&lt;P&gt;Let's look at an example where the optimization actually helps:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T&lt;BR&gt;WHERE RandKey &amp;lt; 100000000 AND&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Flags &amp;amp; 0x1 = 0x1 AND&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Flags &amp;amp; 0x2 = 0x2 AND&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Flags &amp;amp; 0x4 = 0x4 AND&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Flags &amp;amp; 0x8 = 0x8&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1014]=(0) THEN NULL ELSE [Expr1015] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1014]=COUNT_BIG([T].[Data]), [Expr1015]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1013]) &lt;B&gt;OPTIMIZED&lt;/B&gt; WITH UNORDERED PREFETCH)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (100000000)),&amp;nbsp; WHERE:(([T].[Flags]&amp;amp;(1))=(1) AND ([T].[Flags]&amp;amp;(2))=(2) AND ([T].[Flags]&amp;amp;(4))=(4) AND ([T].[Flags]&amp;amp;(8))=(8)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;The Flags column contains the value 0xFF in every row.&amp;nbsp; Thus, every one of the bitwise AND predicates evaluates to true, and this query returns about 2.5 million rows, or 10% of the table.&amp;nbsp; Ordinarily, when faced with a query like this one, SQL Server would resort to a sequential scan of the entire table.&amp;nbsp; Indeed, if you try this query without the extra bitwise filters, you will get a sequential scan.&amp;nbsp; However, SQL Server does not realize that these predicates are always true, estimates a much lower cardinality of less than 10,000 rows, and chooses a simple nested loops join plan.&amp;nbsp; Note that I would generally recommend against using predicates like these in a real world application precisely because they lead to cardinality estimation errors and poor plans.&lt;/P&gt;
&lt;P&gt;To see what effect the optimized nested loops join has, let's compare the above plan with an "un-optimized" nested loops join.&amp;nbsp; We can eliminate the optimization by using the following UPDATE STATISTICS statement to trick SQL Server into believing that the table is very small:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;UPDATE STATISTICS T WITH ROWCOUNT = 1, PAGECOUNT = 1&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I'll compare the above query with the following simpler query which uses essentially the same plan and touches the same data but has an "un-optimized" nested loops join:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T WITH (INDEX (IRandKey))&lt;BR&gt;WHERE RandKey &amp;lt; 100000000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1009]=COUNT_BIG([T].[Data]), [Expr1010]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (100000000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;We can reset the statistics using the following statement:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;UPDATE STATISTICS T WITH ROWCOUNT = 25600000, PAGECOUNT = 389323&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;As in my last post, I'm going to simulate a larger table by reducing the memory available to the server to 1 GByte with SP_CONFIGURE 'MAX SERVER MEMORY' and I'm also going to flush the buffer pool between runs with DBCC DROPCLEANBUFFERS.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Note that you will NOT want to run these statements on a production server.&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;I ran both of the above queries with three different constants.&amp;nbsp; Here are my results.&amp;nbsp; Keep in mind that these results depend greatly on the specific hardware.&amp;nbsp; If you try this experiment, your results may vary.&lt;/P&gt;
&lt;TABLE border=1 cellSpacing=0 cellPadding=0&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD vAlign=top rowSpan=2 width=255 colSpan=2&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=255 colSpan=2&gt;
&lt;P align=center&gt;&lt;B&gt;Execution Time&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom rowSpan=2 width=128&gt;
&lt;P align=right&gt;&lt;B&gt;Increase&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P align=right&gt;&lt;B&gt;OPTIMIZED&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P align=right&gt;&lt;B&gt;"un-OPTIMIZED"&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD rowSpan=3 width=128&gt;
&lt;P&gt;&lt;B&gt;Constant&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;&lt;B&gt;10,000,000&lt;BR&gt;(1% of rows)&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;6.5 minutes&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;26 minutes&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;4x&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;&lt;B&gt;100,000,000&lt;BR&gt;(10% of rows)&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;10.4 minutes&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;4.3 hours&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;25x&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;&lt;B&gt;250,000,000&lt;BR&gt;(25% of rows)&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;11.3 minutes&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;10.6 hours&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;56x&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
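&lt;P&gt;The Increase column is the ratio of the two execution times.&amp;nbsp; As a quick check, this Python snippet recomputes it from the timings in the table, with hours converted to minutes; the variable and function names are only for illustration.&lt;/P&gt;

```python
# (constant, OPTIMIZED minutes, "un-OPTIMIZED" minutes), copied from the table.
RESULTS = [
    ("10,000,000", 6.5, 26.0),
    ("100,000,000", 10.4, 4.3 * 60),   # 4.3 hours
    ("250,000,000", 11.3, 10.6 * 60),  # 10.6 hours
]


def slowdown(optimized_minutes, unoptimized_minutes):
    """How many times longer the un-optimized plan ran."""
    return unoptimized_minutes / optimized_minutes


for constant, optimized, unoptimized in RESULTS:
    print(f"RandKey below {constant}: {slowdown(optimized, unoptimized):.0f}x")
```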
&lt;P mce_keep="true"&gt;Clearly the optimized nested loops join can have a huge impact on performance.&amp;nbsp; Moreover, as the plan touches more rows the benefit of the optimization grows dramatically.&amp;nbsp; Although a full scan or a nested loops join with an explicit sort would be faster, the optimized nested loops join really is a safety net protecting against a much worse alternative.&lt;/P&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Joins/default.aspx">Joins</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/I_2F00_O/default.aspx">I/O</category></item><item><title>Optimizing I/O Performance by Sorting – Part 2</title><link>http://blogs.msdn.com/craigfr/archive/2009/03/04/optimizing-i-o-performance-by-sorting-part-2.aspx</link><pubDate>Wed, 04 Mar 2009 20:27:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9458717</guid><dc:creator>craigfr</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/9458717.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=9458717</wfw:commentRss><description>&lt;P&gt;In my last post, I discussed how SQL Server can use sorts to transform random I/Os into sequential I/Os.&amp;nbsp; In this post, I'll demonstrate directly how such a sort can impact performance.&amp;nbsp; For the following experiments, I'll use the same 3 GByte database that I created &lt;A title="Optimizing I/O Performance by Sorting - Part 1" href="http://blogs.msdn.com/craigfr/archive/2009/02/25/optimizing-i-o-performance-by-sorting-part-1.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2009/02/25/optimizing-i-o-performance-by-sorting-part-1.aspx"&gt;last week&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;The system I'm using to run this test has 8 GBytes of memory.&amp;nbsp; To exaggerate the performance effects and simulate an even larger table that does not fit in main memory, I'm going to adjust the 'MAX SERVER MEMORY' SP_CONFIGURE option to allow SQL Server to use just 1 GByte of memory.&amp;nbsp; I'm going to use CHECKPOINT to ensure that the newly created database is completely flushed to disk before running any experiments.&amp;nbsp; Finally, I'm going to run DBCC DROPCLEANBUFFERS before each test to ensure that none of the data is cached in the buffer pool between tests.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CHECKPOINT&lt;/P&gt;
&lt;P mce_keep="true"&gt;EXEC SP_CONFIGURE 'SHOW ADVANCED OPTIONS', '1'&lt;BR&gt;RECONFIGURE&lt;BR&gt;EXEC SP_CONFIGURE 'MAX SERVER MEMORY', '1024'&lt;BR&gt;RECONFIGURE&lt;/P&gt;
&lt;P mce_keep="true"&gt;DBCC DROPCLEANBUFFERS&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;B&gt;Note that you will NOT want to run these statements on a production server.&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;As I discussed last week, SQL Server can use one of three plans for the following query depending on the value of the constant:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T&lt;BR&gt;WHERE RandKey &amp;lt; &lt;I&gt;constant&lt;/I&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;To recap, if the constant is small, SQL Server uses a non-clustered index seek and a bookmark lookup.&amp;nbsp; If the constant is large, SQL Server uses a clustered index scan to avoid performing many random I/Os.&amp;nbsp; Finally, if the constant is somewhere in the middle, SQL Server uses the non-clustered index seek but sorts the rows prior to performing the bookmark lookup to reduce the number of random I/Os.&amp;nbsp; You can review last week's post to see examples of each of these plans.&amp;nbsp; I'm going to focus on the third and final plan with the sort.&lt;/P&gt;
&lt;P&gt;To demonstrate the benefit of the sort, I need to be able to run the same query with and without the sort.&amp;nbsp; A simple way to make SQL Server remove the sort is to use the following UPDATE STATISTICS statement to trick SQL Server into believing that the table is really small.&amp;nbsp; To ensure that I still get the plan with the non-clustered index seek and the bookmark lookup, I need to add an INDEX hint.&amp;nbsp; I'm also adding a RECOMPILE query hint to ensure that SQL Server generates a new plan after I've altered the statistics.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;UPDATE STATISTICS T WITH ROWCOUNT = 1, PAGECOUNT = 1&lt;/P&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T WITH (INDEX (IRandKey))&lt;BR&gt;WHERE RandKey &amp;lt; &lt;I&gt;constant&lt;BR&gt;&lt;/I&gt;OPTION (RECOMPILE)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I can also reset the statistics using the following statement:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;UPDATE STATISTICS T WITH ROWCOUNT = 25600000, PAGECOUNT = 389323&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Here is an example of the default plan with the real statistics and with the sort:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1010]=COUNT_BIG([T].[Data]), [Expr1011]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;|--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1009]) WITH UNORDERED PREFETCH)&lt;BR&gt;&lt;B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[PK] ASC))&lt;BR&gt;&lt;/B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (2000000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P mce_keep="true"&gt;Here is an example of the plan after running UPDATE STATISTICS and without the sort:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1009]=COUNT_BIG([T].[Data]), [Expr1010]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (2000000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P mce_keep="true"&gt;Here are my results running this query with two values of the constant both with and without the sort.&amp;nbsp; Keep in mind that these results depend greatly on the specific hardware.&amp;nbsp; If you try this experiment, your results may vary.&lt;/P&gt;
&lt;TABLE border=1 cellSpacing=0 cellPadding=0&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD vAlign=top rowSpan=2 width=255 colSpan=2&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=255 colSpan=2&gt;
&lt;P align=center&gt;&lt;B&gt;Execution Time&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom rowSpan=2 width=128&gt;
&lt;P align=right&gt;&lt;B&gt;% Increase&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P align=right&gt;&lt;B&gt;with Sort&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P align=right&gt;&lt;B&gt;with&lt;I&gt;out&lt;/I&gt; Sort&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD rowSpan=2 width=128&gt;
&lt;P&gt;&lt;B&gt;Constant&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;&lt;B&gt;2,000,000&lt;BR&gt;(0.2% of rows)&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;91 seconds&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;352 seconds&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;286%&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;&lt;B&gt;4,000,000&lt;BR&gt;(0.4% of rows)&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;97 seconds&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;654 seconds&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=bottom width=128&gt;
&lt;P align=right&gt;574%&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;&lt;B&gt;% Increase&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P&gt;100%&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P align=right&gt;6%&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P align=right&gt;86%&lt;/P&gt;&lt;/TD&gt;
&lt;TD vAlign=top width=128&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
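&lt;P&gt;The "% Increase" cells follow from the usual percent-change formula.&amp;nbsp; This short Python check recomputes them from the timings in the table; the function name is only for illustration.&amp;nbsp; The 286% and 6% cells appear to be truncated rather than rounded.&lt;/P&gt;

```python
def percent_increase(before, after):
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100.0


# Across each row: with the sort vs. without the sort (seconds).
print(f"2,000,000: {percent_increase(91, 352):.1f}%")
print(f"4,000,000: {percent_increase(97, 654):.1f}%")

# Down each column: constant 2,000,000 vs. constant 4,000,000.
print(f"rows touched: {percent_increase(2_000_000, 4_000_000):.1f}%")
print(f"with sort:    {percent_increase(91, 97):.1f}%")
print(f"without sort: {percent_increase(352, 654):.1f}%")
```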
&lt;P mce_keep="true"&gt;There are two points worth noting regarding these results.&amp;nbsp; First, it should be very clear that the plan with the sort is significantly faster (up to 7 times faster) than the plan without the sort.&amp;nbsp; This result clearly shows the benefit of sequential vs. random I/Os.&amp;nbsp; Second, doubling the number of rows touched had hardly any effect on the execution time for the plan with the sort but nearly doubled the execution time for the plan without the sort.&amp;nbsp; Adding I/Os to the plan with the sort adds only a small incremental cost since the I/Os are sequential and the disk head will pass over the required data exactly once either way.&amp;nbsp; Adding I/Os to the plan without the sort adds disk seeks and increases the execution time in proportion to the increase in the number of rows.&amp;nbsp; In fact, if the constant is increased further, the execution time of the plan with the sort will continue to increase only gradually while the execution time of the plan without the sort will continue to increase rapidly.&lt;/P&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Joins/default.aspx">Joins</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/I_2F00_O/default.aspx">I/O</category></item><item><title>Optimizing I/O Performance by Sorting – Part 1</title><link>http://blogs.msdn.com/craigfr/archive/2009/02/25/optimizing-i-o-performance-by-sorting-part-1.aspx</link><pubDate>Wed, 25 Feb 2009 20:29:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9444352</guid><dc:creator>craigfr</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/9444352.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=9444352</wfw:commentRss><description>In &lt;A 
title="Sequential Read Ahead" href="http://blogs.msdn.com/craigfr/archive/2008/09/23/sequential-read-ahead.aspx"&gt;this post&lt;/A&gt; from last year, I discussed how random I/Os are slower than sequential I/Os (particularly for conventional rotating hard drives).&amp;nbsp; For this reason, SQL Server often favors query plans that perform sequential scans of an entire table over plans that perform random lookups of only a portion of a table.&amp;nbsp; (See the last example in &lt;A title="Index Examples and Tradeoffs" href="http://blogs.msdn.com/craigfr/archive/2006/07/13/664902.aspx"&gt;this post&lt;/A&gt; for a simple demonstration.)&amp;nbsp; In other cases, instead of performing a sequential scan, SQL Server introduces a sort operator whose sole purpose is to convert random I/Os into sequential I/Os. 
&lt;P&gt;Let's look at an example of such a sort.&amp;nbsp; To measure the performance effects, we'll need a reasonably large table.&amp;nbsp; The following script creates a 25.6 million row table that consumes about 3 GBytes of storage.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE DATABASE IOTest&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ON ( NAME = IOTest_Data, FILENAME = '...\IOTest_Data.mdf', SIZE = 4 GB )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; LOG ON ( NAME = IOTest_Log, FILENAME = '...\IOTest_Log.ldf', SIZE = 200 MB )&lt;BR&gt;GO&lt;BR&gt;ALTER DATABASE IOTest SET RECOVERY SIMPLE&lt;BR&gt;GO&lt;BR&gt;USE IOTest&lt;BR&gt;GO&lt;BR&gt;CREATE TABLE T (&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; PK INT IDENTITY PRIMARY KEY,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RandKey INT,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Flags TINYINT,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Data INT,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; Pad CHAR(100))&lt;BR&gt;GO&lt;BR&gt;SET NOCOUNT ON&lt;BR&gt;DECLARE @I INT&lt;BR&gt;SET @I = 0&lt;BR&gt;WHILE @I &amp;lt; 100000&lt;BR&gt;&amp;nbsp; BEGIN&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; WITH&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X2 (R) AS ( SELECT RAND() UNION ALL SELECT RAND() ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X4 (R) AS ( SELECT R FROM X2 UNION ALL SELECT R FROM X2 ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X8 (R) AS ( SELECT R FROM X4 UNION ALL SELECT R FROM X4 ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X16 (R) AS ( SELECT R FROM X8 UNION ALL SELECT R FROM X8 ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X32 (R) AS ( SELECT R FROM X16 UNION ALL SELECT R FROM X16 ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X64 (R) AS ( SELECT R FROM X32 UNION ALL SELECT R FROM X32 ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;X128 (R) AS ( SELECT R FROM X64 UNION ALL SELECT R FROM X64 ),&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X256 (R) AS ( SELECT R FROM X128 UNION ALL SELECT R FROM X128 )&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; INSERT T (RandKey, Flags, Data, Pad)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT R 
* 1000000000, 0xFF, 1, '' FROM X256&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SET @I = @I + 1&lt;BR&gt;&amp;nbsp; END&lt;BR&gt;GO&lt;BR&gt;CREATE INDEX IRandKey on T (RandKey, Flags)&lt;BR&gt;GO&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Due to the fixed width Pad column, each row of T consumes 113 bytes (plus overhead).&amp;nbsp; Roughly 65 rows fit on a single 8 Kbyte page.&amp;nbsp; (The Flags column is unused in this example, but I will make use of it in a subsequent post.)&lt;/P&gt;
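&lt;P&gt;The row-size and rows-per-page figures can be checked with a little arithmetic.&amp;nbsp; The Python sketch below takes the column widths from the CREATE TABLE statement and the row and page counts from the UPDATE STATISTICS statements in the follow-up posts (ROWCOUNT = 25600000, PAGECOUNT = 389323).&amp;nbsp; It assumes that this page count describes the clustered index, and it lumps row overhead, page headers and free space together into a single leftover figure.&lt;/P&gt;

```python
# Fixed column widths in bytes, from the CREATE TABLE statement.
COLUMN_BYTES = {"PK": 4, "RandKey": 4, "Flags": 1, "Data": 4, "Pad": 100}

ROW_COUNT = 25_600_000   # ROWCOUNT used in the follow-up posts
PAGE_COUNT = 389_323     # PAGECOUNT used in the follow-up posts (assumed: clustered index)
PAGE_BYTES = 8 * 1024    # one 8 Kbyte page

data_bytes = sum(COLUMN_BYTES.values())
rows_per_page = ROW_COUNT / PAGE_COUNT

# Bytes per row not explained by column data: row overhead, page header,
# and any free space, all lumped together.
leftover_bytes_per_row = PAGE_BYTES / rows_per_page - data_bytes

print(f"column data per row: {data_bytes} bytes")
print(f"average rows per page: {rows_per_page:.1f}")
print(f"leftover bytes per row: {leftover_bytes_per_row:.1f}")
```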
&lt;P&gt;The RandKey column, as the name suggests, contains random values.&amp;nbsp; Notice that we have a non-clustered index on this column.&amp;nbsp; Given a predicate on the RandKey column, SQL Server can use this index to fetch qualifying rows from the table.&amp;nbsp; However, because the values in this column are random, the selected rows will be scattered randomly throughout the clustered index.&lt;/P&gt;
&lt;P&gt;If we select just a few rows from the table using a filter on RandKey, SQL Server will use the non-clustered index:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T&lt;BR&gt;WHERE RandKey &amp;lt; 1000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1011]=(0) THEN NULL ELSE [Expr1012] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1011]=COUNT_BIG([T].[Data]), [Expr1012]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1010]) OPTIMIZED WITH UNORDERED PREFETCH)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (1000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;The non-clustered index seek selects a few rows (the use of random keys means that the exact number may vary each time the table is loaded) and &lt;A title="Bookmark Lookup" href="http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx"&gt;looks them up in the clustered index&lt;/A&gt; to get the value of the Data column for the SUM aggregate.&amp;nbsp; The non-clustered index seek is very efficient - it likely touches only one page - but the clustered index seek generates a random I/O for each row.&lt;/P&gt;
&lt;P&gt;If we select a large number of rows, SQL Server recognizes that the random I/Os are too expensive and switches to a clustered index scan:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T&lt;BR&gt;WHERE RandKey &amp;lt; 10000000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1009]=(0) THEN NULL ELSE [Expr1010] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1009]=COUNT_BIG([T].[Data]), [Expr1010]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[PK__T__...]), WHERE:([T].[RandKey]&amp;lt;(10000000)))&lt;/P&gt;
&lt;P&gt;This query touches only 1% of the data.&amp;nbsp; Still, because the qualifying rows are scattered at random and roughly 65 rows share each page, the query is going to touch roughly half of the pages in the clustered index, so it is faster to scan the entire clustered index than to perform on the order of 256,000 random I/Os.&lt;/P&gt;
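&lt;P&gt;To see why selecting 1% of the rows touches so many pages, consider a simple model: if each row qualifies independently with probability p and a page holds r rows, a page is skipped only when all r of its rows fail the predicate, so the expected fraction of pages touched is 1 - (1 - p)^r.&amp;nbsp; The Python sketch below evaluates this under the assumption of uniformly random keys and 65 rows per page; it is a back-of-the-envelope estimate, not a measurement.&lt;/P&gt;

```python
ROW_COUNT = 25_600_000
ROWS_PER_PAGE = 65  # approximate figure from the post


def expected_fraction_of_pages_touched(selectivity, rows_per_page):
    """Probability that a page holds at least one qualifying row."""
    return 1.0 - (1.0 - selectivity) ** rows_per_page


# Selectivities for RandKey below 2,500,000 and below 10,000,000
# (RandKey is uniform over a range of 1,000,000,000).
for selectivity in (0.0025, 0.01):
    lookups = ROW_COUNT * selectivity
    fraction = expected_fraction_of_pages_touched(selectivity, ROWS_PER_PAGE)
    print(f"{selectivity:.2%} of rows: about {lookups:,.0f} lookups, "
          f"about {fraction:.0%} of pages touched")
```

&lt;P&gt;Under this model the 1% query lands on close to one half of all pages, while the 0.25% query lands on far fewer, which is why a seek-based plan remains attractive for the smaller constant.&lt;/P&gt;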
&lt;P&gt;Somewhere in between these two extremes things get a little more interesting:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT SUM(Data)&lt;BR&gt;FROM T&lt;BR&gt;WHERE RandKey &amp;lt; 2500000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1010]=COUNT_BIG([T].[Data]), [Expr1011]=SUM([T].[Data])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([T].[PK], [Expr1009]) WITH UNORDERED PREFETCH)&lt;BR&gt;&lt;B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[PK] ASC))&lt;BR&gt;&lt;/B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([T].[IRandKey]), SEEK:([T].[RandKey] &amp;lt; (2500000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([T].[PK__T__...]), SEEK:([T].[PK]=[T].[PK]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;This query touches a mere 0.25% of the data.&amp;nbsp; The plan uses the non-clustered index to avoid unnecessarily touching many rows.&amp;nbsp; Yet, performing 64,000 random I/Os is still rather expensive so SQL Server adds a sort.&amp;nbsp; By sorting the rows on the clustered index key, SQL Server transforms the random I/Os into sequential I/Os.&amp;nbsp; Thus, we get the efficiency of the seek - touching only those rows that qualify - with the performance of the sequential scan.&lt;/P&gt;
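&lt;P&gt;To make the effect of the sort concrete, here is a minimal sketch in Python.&amp;nbsp; It is only a model and not how SQL Server is implemented: the table size, the ROWS_PER_PAGE value, and both helper functions are assumptions that I made up for illustration.&amp;nbsp; It counts how many times a series of lookups moves to a different clustered index page, and how many of those moves go backwards, with and without sorting the keys first.&lt;/P&gt;

```python
import random

ROWS_PER_PAGE = 100  # hypothetical: rows stored on each clustered index page


def page_switches(keys):
    """Count how often consecutive lookups land on a different page."""
    switches = 0
    last_page = None
    for key in keys:
        page = key // ROWS_PER_PAGE
        if page != last_page:
            switches += 1
            last_page = page
    return switches


def backward_jumps(keys):
    """Count lookups that move to an earlier page than the previous lookup."""
    pages = [key // ROWS_PER_PAGE for key in keys]
    return sum(1 for prev, cur in zip(pages, pages[1:]) if prev > cur)


rng = random.Random(0)
# Clustered keys of the qualifying rows (0.25% of a hypothetical 1,000,000 row
# table), in the effectively random order the non-clustered index returns them.
qualifying = rng.sample(range(1_000_000), 2_500)

unsorted_switches = page_switches(qualifying)
sorted_switches = page_switches(sorted(qualifying))
distinct_pages = len({key // ROWS_PER_PAGE for key in qualifying})

print("page switches without sort:", unsorted_switches)
print("page switches with sort:   ", sorted_switches)
print("distinct pages touched:    ", distinct_pages)
print("backward jumps without sort:", backward_jumps(qualifying))
print("backward jumps with sort:   ", backward_jumps(sorted(qualifying)))
```

&lt;P&gt;With the sort, each page is visited exactly once and the lookups only ever move forward through the index, which is what gives the storage layer a chance to treat them as sequential I/O.&amp;nbsp; The model works purely in terms of logical page order.&lt;/P&gt;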
&lt;P&gt;It is worth pointing out that sorting on the clustered index key will yield rows that are in the logical index order.&amp;nbsp; Due to fragmentation or due simply to the multiple layers of abstraction between SQL Server and the actual hard drives, there is no guarantee that the physical order on disk matches the logical order.&lt;/P&gt;In my next post, I'll run some of these queries and demonstrate the performance implications of the sort.&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9444352" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Joins/default.aspx">Joins</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/I_2F00_O/default.aspx">I/O</category></item><item><title>Random Prefetching</title><link>http://blogs.msdn.com/craigfr/archive/2008/10/07/random-prefetching.aspx</link><pubDate>Wed, 08 Oct 2008 00:44:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8987905</guid><dc:creator>craigfr</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/8987905.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=8987905</wfw:commentRss><description>In &lt;A title="Sequential Read Ahead" href="http://blogs.msdn.com/craigfr/archive/2008/09/23/sequential-read-ahead.aspx"&gt;my last post&lt;/A&gt;, I explained the importance of asynchronous I/O &amp;nbsp;and described how SQL Server uses sequential read ahead to boost the performance of scans.&amp;nbsp; In this post, I'll discuss how SQL Server uses random prefetching.&amp;nbsp; Let's begin with a simple example of a query plan that performs many random I/Os.&amp;nbsp; As in my prior post, all of the examples in this post use a 1GB scale factor &lt;A href="http://www.tpc.org/tpch/default.asp"&gt;TPC-H&lt;/A&gt; database.&amp;nbsp; The following query returns the number of line items associated with each order placed on March 15, 1998: 
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT O_ORDERKEY, COUNT(*)&lt;BR&gt;FROM ORDERS JOIN LINEITEM ON O_ORDERKEY = L_ORDERKEY&lt;BR&gt;WHERE O_ORDERDATE = '1998-03-15'&lt;BR&gt;GROUP BY O_ORDERKEY&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1008]=CONVERT_IMPLICIT(int,[Expr1012],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([ORDERS].[O_ORDERKEY]) DEFINE:([Expr1012]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([ORDERS].[O_ORDERKEY], [Expr1011]) OPTIMIZED &lt;B&gt;WITH UNORDERED PREFETCH&lt;/B&gt;)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([ORDERS].[O_ORDERDATE_CLUIDX]), SEEK:([ORDERS].[O_ORDERDATE]='1998-03-15') ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([LINEITEM].[L_ORDERKEY_IDX]), SEEK:([LINEITEM].[L_ORDERKEY]=[ORDERS].[O_ORDERKEY]) ORDERED FORWARD)&lt;/P&gt;
&lt;P mce_keep="true"&gt;This query plan uses an index &lt;A title="Nested Loops Join" href="http://blogs.msdn.com/craigfr/archive/2006/07/26/679319.aspx"&gt;nested loops join&lt;/A&gt;.&amp;nbsp; The clustered index seek on the ORDERS table returns the 661 orders that were placed on March 15, 1998.&amp;nbsp; For each of these 661 orders, SQL Server performs an index seek on the LINEITEM table to look up the records associated with this order.&amp;nbsp; Each of these index seeks potentially represents a series of random I/Os to navigate from the root of the B-tree index to the leaf page(s) where the records for that order are stored.&amp;nbsp; To minimize the cost of these I/Os, SQL Server enhances the nested loops join with prefetching.&amp;nbsp; (Notice the WITH UNORDERED PREFETCH keywords associated with the nested loops join.)&amp;nbsp; The prefetching mechanism peers ahead in the clustered index seek on the ORDERS table and issues asynchronous I/Os for the pages that will ultimately be needed by the index seek on the LINEITEM table.&amp;nbsp; As in the sequential read ahead scenario, we can see the prefetching in action by checking the output of SET STATISTICS IO ON.&amp;nbsp; Look at the read-ahead reads for the LINEITEM table:&lt;/P&gt;
&lt;P&gt;Table 'LINEITEM'. Scan count 661, logical reads 5165, physical reads 2, &lt;B&gt;read-ahead reads 5000&lt;/B&gt;, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.&lt;BR&gt;Table 'ORDERS'. Scan count 1, logical reads 15, physical reads 2, read-ahead reads 19, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.&lt;/P&gt;
&lt;P&gt;You may have noticed that the prefetch in this example was UNORDERED.&amp;nbsp; Indeed, there are two types of prefetch:&amp;nbsp; UNORDERED and ORDERED.&amp;nbsp; Although a nested loops join ordinarily preserves the order of the rows from its outer input (in this case the ORDERS table), a nested loops join WITH UNORDERED PREFETCH does not preserve order.&amp;nbsp; Instead, the rows are returned in the order that the asynchronous I/Os complete.&amp;nbsp; However, if the order of the rows is important, SQL Server can use a nested loops join WITH ORDERED PREFETCH.&amp;nbsp; For example, observe what happens to the plan if we add an ORDER BY clause to the above query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT O_ORDERKEY, COUNT(*)&lt;BR&gt;FROM ORDERS JOIN LINEITEM ON O_ORDERKEY = L_ORDERKEY&lt;BR&gt;WHERE O_ORDERDATE = '1998-03-15'&lt;BR&gt;GROUP BY O_ORDERKEY&lt;BR&gt;&lt;B&gt;ORDER&lt;/B&gt;&lt;B&gt; BY O_ORDERKEY&lt;/B&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1008]=CONVERT_IMPLICIT(int,[Expr1012],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([ORDERS].[O_ORDERKEY]) DEFINE:([Expr1012]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;|--Nested Loops(Inner Join, OUTER REFERENCES:([ORDERS].[O_ORDERKEY], [Expr1011]) &lt;B&gt;WITH ORDERED PREFETCH&lt;/B&gt;)&lt;BR&gt;&lt;B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([ORDERS].[O_ORDERKEY] ASC))&lt;BR&gt;&lt;/B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([ORDERS].[O_ORDERDATE_CLUIDX]), SEEK:([ORDERS].[O_ORDERDATE]='1998-03-15 00:00:00.000') ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([LINEITEM].[L_ORDERKEY_IDX]), SEEK:([LINEITEM].[L_ORDERKEY]=[ORDERS].[O_ORDERKEY]) ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;Notice that SQL Server chooses to push the sort below the nested loops join.&amp;nbsp; For this sort to satisfy the ORDER BY clause, the nested loops join must preserve the order of the rows that it returns.&amp;nbsp; Thus, this time SQL Server uses a nested loops join WITH ORDERED PREFETCH.&lt;/P&gt;
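&lt;P&gt;The difference between the two prefetch modes can be sketched with a toy model in Python.&amp;nbsp; This is an illustration under assumptions of my own and not SQL Server's actual implementation: the row names and latencies are hypothetical, and every I/O is assumed to be issued at time 0.&amp;nbsp; The unordered variant emits a row as soon as its I/O completes, while the ordered variant holds each row back until all earlier outer rows have been emitted.&lt;/P&gt;

```python
# Hypothetical model: every inner-side I/O is issued at time 0 and finishes
# after the listed latency (ms). Rows are listed in outer-input order.
OUTER_ROWS = [("order_1", 9), ("order_2", 3), ("order_3", 6), ("order_4", 1)]


def unordered_prefetch(rows):
    """Emit each row as soon as its own I/O completes."""
    by_completion = sorted(rows, key=lambda row: row[1])
    return [(key, latency) for key, latency in by_completion]


def ordered_prefetch(rows):
    """Emit rows in outer-input order; a row waits for every earlier row."""
    emitted = []
    ready_at = 0
    for key, latency in rows:
        # A row cannot be emitted before its own I/O or before its predecessors.
        ready_at = max(ready_at, latency)
        emitted.append((key, ready_at))
    return emitted


unordered = unordered_prefetch(OUTER_ROWS)
ordered = ordered_prefetch(OUTER_ROWS)

print("unordered:", [key for key, _ in unordered])
print("ordered:  ", [key for key, _ in ordered])
print("emit times (ordered):", [time for _, time in ordered])
```

&lt;P&gt;In this model both variants finish at the same time since all of the I/Os overlap, but only the ordered variant preserves the order of the outer input.&amp;nbsp; The price is that rows whose I/Os complete early must wait behind slower rows that precede them.&lt;/P&gt;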
&lt;P&gt;SQL Server can also use random prefetching to speed up &lt;A title="Bookmark Lookup" href="http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx"&gt;bookmark lookups&lt;/A&gt; and certain update and delete statements.&amp;nbsp; For instance, consider the following two statements:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT *&lt;BR&gt;FROM LINEITEM&lt;BR&gt;WHERE L_ORDERKEY BETWEEN 5000000 AND 5001000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([Uniq1002], [LINEITEM].[L_SHIPDATE], [Expr1004]) OPTIMIZED &lt;B&gt;WITH UNORDERED PREFETCH&lt;/B&gt;)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([LINEITEM].[L_ORDERKEY_IDX]), SEEK:([LINEITEM].[L_ORDERKEY] &amp;gt;= (5000000) AND [LINEITEM].[L_ORDERKEY] &amp;lt;= (5001000)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([LINEITEM].[L_SHIPDATE_CLUIDX]), SEEK:([LINEITEM].[L_SHIPDATE]=[LINEITEM].[L_SHIPDATE] AND [Uniq1002]=[Uniq1002]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;UPDATE LINEITEM&lt;BR&gt;SET L_DISCOUNT = 0.1&lt;BR&gt;WHERE L_ORDERKEY BETWEEN 5000000 AND 5001000&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Clustered Index Update(OBJECT:([LINEITEM].[L_SHIPDATE_CLUIDX]), SET:([LINEITEM].[L_DISCOUNT] = RaiseIfNull([ConstExpr1010])) &lt;B&gt;WITH UNORDERED PREFETCH&lt;/B&gt;)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([ConstExpr1010]=CONVERT_IMPLICIT(money,[@1],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Top(ROWCOUNT est 0)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([LINEITEM].[L_ORDERKEY_IDX]), SEEK:([LINEITEM].[L_ORDERKEY] &amp;gt;= [@2] AND [LINEITEM].[L_ORDERKEY] &amp;lt;= [@3]) ORDERED FORWARD)&lt;/P&gt;
&lt;P&gt;Both plans use a non-clustered index on the LINEITEM table to identify rows that match the L_ORDERKEY predicate.&amp;nbsp; In the case of the SELECT statement, SQL Server performs a bookmark lookup - recall that a bookmark lookup is just a special case of a nested loops join - to fetch the columns of the LINEITEM table that are not covered by the non-clustered index. &amp;nbsp;In the case of the UPDATE statement, SQL Server needs to locate the correct page and row in the clustered index and update the L_DISCOUNT column.&amp;nbsp; The resulting I/O sequence is the same as the bookmark lookup.&amp;nbsp; In both cases, to minimize the cost of the I/Os, SQL Server adds prefetching to the plan.&amp;nbsp; Just as in the original example above, the prefetch mechanism peers ahead in the non-clustered index seek on the LINEITEM table and issues asynchronous I/Os for the pages of the clustered index that will be needed.&lt;/P&gt;For systems with many hard drives, random prefetching can dramatically improve performance.&amp;nbsp; However, prefetching can adversely affect concurrency as I explained in &lt;A title="Read Committed and Bookmark Lookup" href="http://blogs.msdn.com/craigfr/archive/2007/06/07/read-committed-and-bookmark-lookup.aspx"&gt;this post&lt;/A&gt;.&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8987905" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Joins/default.aspx">Joins</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Updates/default.aspx">Updates</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/I_2F00_O/default.aspx">I/O</category></item><item><title>Sequential Read Ahead</title><link>http://blogs.msdn.com/craigfr/archive/2008/09/23/sequential-read-ahead.aspx</link><pubDate>Tue, 23 Sep 2008 23:45:00 GMT</pubDate><guid 
isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8962716</guid><dc:creator>craigfr</dc:creator><slash:comments>9</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/8962716.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=8962716</wfw:commentRss><description>&lt;P&gt;Balancing CPU and I/O throughput is essential to achieve good overall performance and to maximize hardware utilization.&amp;nbsp; SQL Server includes two asynchronous I/O mechanisms - sequential read ahead and random prefetching - that are designed to address this challenge.&lt;/P&gt;
&lt;P&gt;To understand why asynchronous I/O is so important, consider the CPU to I/O performance gap.&amp;nbsp; The memory subsystem on a modern CPU can deliver data sequentially at roughly 5 Gbytes per second per socket (or, on non-NUMA machines, for all sockets sharing the same bus) and (depending on how you measure it) can fetch random memory locations at roughly 10 to 50 million accesses per second.&amp;nbsp; By comparison, a high-end 15K SAS hard drive can read only 125 Mbytes per second sequentially and can perform only 200 random I/Os per second (IOPS).&amp;nbsp; Solid State Disks (SSDs) can reduce the gap between sequential and random I/O performance by eliminating the moving parts from the equation, but a performance gap remains.&amp;nbsp; In an effort to close this performance gap, it is not uncommon for servers to have a ratio of 10 or more drives for every CPU.&amp;nbsp; (It is also important to consider and balance the entire I/O subsystem, including the number and type of disk controllers and not just the drives themselves, but that is not the focus of this post.)&lt;/P&gt;
&lt;P&gt;Unfortunately, a single CPU issuing only synchronous I/Os can keep only one spindle active at a time.&amp;nbsp; For a single CPU to exploit the available bandwidth and IOPS of multiple spindles effectively, the server must issue multiple I/Os asynchronously.&amp;nbsp; Thus, SQL Server includes the aforementioned read ahead and prefetching mechanisms.&amp;nbsp; In this post, I'll take a look at sequential read ahead.&lt;/P&gt;
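&lt;P&gt;The arithmetic behind these claims can be worked through in a few lines of Python.&amp;nbsp; The throughput figures are the rough numbers quoted above; the aggregate_iops function is a simplification of my own which assumes that outstanding I/Os spread evenly across the drives and that each drive services one I/O at a time.&lt;/P&gt;

```python
MEM_SEQ_BYTES_PER_SEC = 5 * 10**9     # roughly 5 Gbytes/s per socket
DISK_SEQ_BYTES_PER_SEC = 125 * 10**6  # roughly 125 Mbytes/s per drive
MEM_RANDOM_LOW = 10 * 10**6           # random memory accesses per second (low)
MEM_RANDOM_HIGH = 50 * 10**6          # random memory accesses per second (high)
DISK_RANDOM_IOPS = 200                # random I/Os per second per drive
DRIVES = 10                           # drives per CPU

seq_gap = MEM_SEQ_BYTES_PER_SEC // DISK_SEQ_BYTES_PER_SEC
random_gap_low = MEM_RANDOM_LOW // DISK_RANDOM_IOPS
random_gap_high = MEM_RANDOM_HIGH // DISK_RANDOM_IOPS


def aggregate_iops(outstanding_ios, drives=DRIVES, per_drive=DISK_RANDOM_IOPS):
    """Random IOPS achievable with a given number of I/Os in flight.

    Simplified model (an assumption): I/Os spread evenly across drives and
    each drive services one I/O at a time.
    """
    busy_drives = min(outstanding_ios, drives)
    return busy_drives * per_drive


print("sequential gap: %dx" % seq_gap)
print("random gap: %dx to %dx" % (random_gap_low, random_gap_high))
print("synchronous (1 I/O in flight):", aggregate_iops(1), "IOPS")
print("asynchronous (10 I/Os in flight):", aggregate_iops(10), "IOPS")
```

&lt;P&gt;Under these assumptions, a memory subsystem is about 40 times faster than a single drive for sequential access and tens of thousands of times faster for random access, and only by keeping at least one I/O outstanding per drive can a CPU reach the combined IOPS of all ten drives.&lt;/P&gt;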
&lt;P&gt;When SQL Server performs a sequential &lt;A title="Scans vs. Seeks" href="http://blogs.msdn.com/craigfr/archive/2006/06/26/647852.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/06/26/647852.aspx"&gt;scan&lt;/A&gt; of a large table, the storage engine initiates the read ahead mechanism to ensure that pages are in memory and ready to scan before they are needed by the query processor.&amp;nbsp; The read ahead mechanism tries to stay 500 pages ahead of the scan.&amp;nbsp; We can see the read ahead mechanism in action by checking the output of SET STATISTICS IO ON.&amp;nbsp; For example, I ran the following query on a 1GB scale factor &lt;A href="http://www.tpc.org/tpch/default.asp" mce_href="http://www.tpc.org/tpch/default.asp"&gt;TPC-H&lt;/A&gt; database.&amp;nbsp; The LINEITEM table has roughly 6 million rows.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SET STATISTICS IO ON&lt;/P&gt;
&lt;P&gt;SELECT COUNT(*) FROM LINEITEM&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Table 'LINEITEM'. Scan count 3, logical reads 22328, physical reads 3, &lt;B&gt;read-ahead reads 20331&lt;/B&gt;, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Repeating the query a second time shows that the table is now cached in the buffer pool:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT COUNT(*) FROM LINEITEM&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Table 'LINEITEM'. Scan count 3, logical reads 22328, physical reads 0, &lt;B&gt;read-ahead reads 0&lt;/B&gt;, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
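&lt;P&gt;If you want to compare such outputs programmatically, the following Python sketch parses a STATISTICS IO line and computes what share of the pages read from disk arrived via read ahead.&amp;nbsp; The two function names are my own, and the calculation assumes that the physical reads and read-ahead reads counters do not overlap.&lt;/P&gt;

```python
PAGE_BYTES = 8192  # SQL Server data pages are 8 KB


def parse_statistics_io(line):
    """Split one STATISTICS IO line into (table name, counter dict)."""
    table_part, counter_part = line.split("'. ", 1)
    table = table_part.split("'", 1)[1]
    counters = {}
    for item in counter_part.rstrip(".").split(", "):
        name, value = item.rsplit(" ", 1)
        counters[name] = int(value)
    return table, counters


def read_ahead_share(counters):
    """Fraction of pages read from disk that read ahead brought in."""
    from_disk = counters["physical reads"] + counters["read-ahead reads"]
    return counters["read-ahead reads"] / from_disk if from_disk else 0.0


COLD = ("Table 'LINEITEM'. Scan count 3, logical reads 22328, physical reads 3, "
        "read-ahead reads 20331, lob logical reads 0, lob physical reads 0, "
        "lob read-ahead reads 0.")
WARM = ("Table 'LINEITEM'. Scan count 3, logical reads 22328, physical reads 0, "
        "read-ahead reads 0, lob logical reads 0, lob physical reads 0, "
        "lob read-ahead reads 0.")

table, cold = parse_statistics_io(COLD)
_, warm = parse_statistics_io(WARM)
scanned_mb = cold["logical reads"] * PAGE_BYTES // 2**20

print(table, "scanned about", scanned_mb, "MB")
print("read-ahead share, cold cache:", round(read_ahead_share(cold), 4))
print("read-ahead share, warm cache:", read_ahead_share(warm))
```

&lt;P&gt;For the first run, nearly all of the pages that came from disk were brought in by read ahead rather than by synchronous physical reads.&amp;nbsp; For the second run, nothing was read from disk at all.&lt;/P&gt;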
&lt;P&gt;For sequential I/O performance, it is important to distinguish between allocation ordered and index ordered scans.&amp;nbsp; An allocation ordered scan tries to read pages in the order in which they are physically stored on disk while an index ordered scan reads pages according to the order in which the data on those index pages is sorted.&amp;nbsp; (Note that in many cases there are multiple levels of indirection such as RAID devices or SANs between the logical volumes that SQL Server sees and the physical disks.&amp;nbsp; Thus, even an allocation ordered scan may in fact not be truly optimally ordered.)&amp;nbsp; Although SQL Server tries to sort and read pages in allocation order even for an index ordered scan, an allocation ordered scan is generally going to be faster since pages are read in the order that they are written on disk with the minimal number of seeks.&amp;nbsp; Heaps have no inherent order and, thus, are always scanned in allocation order.&amp;nbsp; Indexes are scanned in allocation order only if the isolation level is read uncommitted (or the NOLOCK hint is used) and only if the query processor does not request an ordered scan.&amp;nbsp; Defragmenting indexes can help to ensure that index ordered scans perform on par with allocation ordered scans.&lt;/P&gt;In my next post, I'll take a look at random prefetching.&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8962716" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Scans+and+Seeks/default.aspx">Scans and Seeks</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/I_2F00_O/default.aspx">I/O</category></item></channel></rss>