<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Craig Freedman's SQL Server Blog : Isolation Levels</title><link>http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx</link><description>Tags: Isolation Levels</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Query Failure with Read Uncommitted</title><link>http://blogs.msdn.com/craigfr/archive/2007/06/12/query-failure-with-read-uncommitted.aspx</link><pubDate>Tue, 12 Jun 2007 22:54:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3256115</guid><dc:creator>craigfr</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/3256115.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=3256115</wfw:commentRss><description>&lt;P&gt;Over the past month or so, I've looked at pretty much every isolation level except for read uncommitted or nolock.&amp;nbsp; Today I'm going to wrap up this series of posts with a discussion of read uncommitted.&amp;nbsp; Plenty has already been written about the dangers of nolock.&amp;nbsp; For example, see these excellent posts by &lt;A title="Previously committed rows might be missed if NOLOCK hint is used" href="http://blogs.msdn.com/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx"&gt;Lubor Kollar of the SQL Server Development Customer Advisory Team&lt;/A&gt; and by &lt;A title="Timebomb - The Consistency problem with NOLOCK / READ UNCOMMITTED" href="http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx"&gt;Tony Rogerson&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;I'd like to demonstrate just one additional hazard of nolock.&amp;nbsp; Begin by creating two tables as follows:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t1 (k int, data int)&lt;BR&gt;insert t1 values (0, 0)&lt;BR&gt;insert t1 values (1, 1)&lt;/P&gt;
&lt;P&gt;create table t2 (pk int primary key)&lt;BR&gt;insert t2 values (0)&lt;BR&gt;insert t2 values (1)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Next, in session 1 lock the first row of t2 using the following update:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t2 set pk = pk where pk = 0&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, in session 2 run the following query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select *&lt;BR&gt;from t1 with (nolock)&lt;BR&gt;where exists (select * from t2 where t1.k = t2.pk)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;This query uses the following plan:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Nested Loops(Left Semi Join, WHERE:([t1].[k]=[t2].[pk]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t1]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t2].[PK__t2__71D1E811]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;The table scan fetches the first row of t1 without acquiring any locks and then tries to join this row with t2.&amp;nbsp; Since we've locked the first row of t2 and since the clustered index scan of t2 runs at the default read committed isolation level, the query blocks.&lt;/P&gt;
&lt;P&gt;Finally, in session 1 delete the first row of t1 and commit the transaction:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;delete t1 where k = 0&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The query in session 2 is now free to continue.&amp;nbsp; However, we deleted the row that it is trying to join while it was blocked.&amp;nbsp; The query tries to retrieve more data from the deleted row and fails with the following error:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Msg 601, Level 12, State 3, Line 1&lt;BR&gt;Could not continue scan with NOLOCK due to data movement.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;As you can see, not only can a read uncommitted or nolock scan cause unexpected results, it can even cause a query to fail entirely!&lt;/P&gt;
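&lt;P&gt;If a nolock query must stay in place, one way to keep this failure from surfacing as an unhandled error is to wrap the query in a TRY...CATCH block (available in SQL Server 2005 and later).&amp;nbsp; The following is only a sketch: it assumes that the error is raised in a form that TRY...CATCH can intercept on your build (the message above reports level 12), and what to do after catching it (retry, log, or give up) is left to the caller:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin try&lt;BR&gt;&amp;nbsp; select *&lt;BR&gt;&amp;nbsp; from t1 with (nolock)&lt;BR&gt;&amp;nbsp; where exists (select * from t2 where t1.k = t2.pk)&lt;BR&gt;end try&lt;BR&gt;begin catch&lt;BR&gt;&amp;nbsp; -- report the error instead of failing; 601 is the data movement error shown above&lt;BR&gt;&amp;nbsp; select error_number() as error_number, error_message() as error_message&lt;BR&gt;end catch&lt;/P&gt;&lt;/BLOCKQUOTE&gt;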
&lt;P&gt;SQL Server 2000 can also generate this error if a query plan includes a bookmark lookup and if a row is deleted after it is returned by a non-clustered index seek but before the base table row is fetched by the bookmark lookup.&amp;nbsp; SQL Server 2005 does not generate an error in this case.&amp;nbsp; Recall that &lt;A title="Bookmark Lookup" href="http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx"&gt;in SQL Server 2005 a bookmark lookup is just a join&lt;/A&gt;.&amp;nbsp; Thus, if the bookmark lookup cannot find a matching base table row, it simply discards it just like any other join.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3256115" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item><item><title>Read Committed and Bookmark Lookup</title><link>http://blogs.msdn.com/craigfr/archive/2007/06/07/read-committed-and-bookmark-lookup.aspx</link><pubDate>Fri, 08 Jun 2007 00:27:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3149120</guid><dc:creator>craigfr</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/3149120.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=3149120</wfw:commentRss><description>In my last two posts, I discussed two scenarios - one involving &lt;A class="" title="Read Committed and Updates" href="http://blogs.msdn.com/craigfr/archive/2007/05/22/read-committed-and-updates.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/05/22/read-committed-and-updates.aspx"&gt;updates&lt;/A&gt; and another involving &lt;A class="" title="Read Committed and Large Objects" href="http://blogs.msdn.com/craigfr/archive/2007/05/31/read-committed-and-large-objects.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/05/31/read-committed-and-large-objects.aspx"&gt;large objects&lt;/A&gt; - where 
SQL Server extends the duration of read committed locks until the end of a statement instead of releasing the locks as soon as each row is released.&amp;nbsp;&amp;nbsp; In this post - which I promise will be the last in this series on read committed locks - I will discuss a final scenario involving &lt;A class="" title="Bookmark Lookup" href="http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/06/30/652639.aspx"&gt;bookmark lookups&lt;/A&gt; where SQL Server holds read committed locks longer than expected. 
&lt;P&gt;As you might expect, if you've read my prior two posts, this final scenario occurs when there is a blocking operator between the non-clustered index seek and the bookmark lookup operation.&amp;nbsp;&amp;nbsp; There are basically three cases:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A sort on the outer side of the nested loops join.&lt;/LI&gt;
&lt;LI&gt;An "OPTIMIZED" nested loops join.&lt;/LI&gt;
&lt;LI&gt;A nested loops join "WITH PREFETCH."&lt;/LI&gt;&lt;/UL&gt;
&lt;P&gt;The following example demonstrates a bookmark lookup with a nested loops join with a prefetch:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t (a int, b int, c char(1000))&lt;BR&gt;create unique clustered index ta on t(a)&lt;/P&gt;
&lt;P&gt;set nocount on&lt;BR&gt;declare @i int&lt;BR&gt;set @i = 0&lt;BR&gt;while @i &amp;lt; 1000&lt;BR&gt;&amp;nbsp; begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; insert t values (@i, @i, @i)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; set @i = @i + 1&lt;BR&gt;&amp;nbsp; end&lt;BR&gt;set nocount off&lt;/P&gt;
&lt;P&gt;create index tb on t(b)&lt;/P&gt;
&lt;P&gt;select c from t where b &amp;lt; 25&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Here is the query plan for the select:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([t].[a], [Expr1004]) WITH UNORDERED PREFETCH)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Seek(OBJECT:([t].[tb]), SEEK:([t].[b] &amp;lt; (25)) ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([t].[ta]), SEEK:([t].[a]=[t].[a]) LOOKUP ORDERED FORWARD)&lt;/P&gt;
&lt;P mce_keep="true"&gt;Note the "WITH UNORDERED PREFETCH" keywords on the nested loops join.&lt;/P&gt;
&lt;P&gt;I am not going to demonstrate it, but when SQL Server executes this query, it holds S locks on the rows returned by the index seek until the query finishes executing. &amp;nbsp;If you want to view these locks, you can set up an experiment similar to the ones from my prior two posts.&lt;/P&gt;
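&lt;P&gt;For reference, here is a sketch of such an experiment, modeled on the ones in those posts.&amp;nbsp; The choice of row (a = 24) is an assumption made for illustration, and the locks you observe depend on the plan that SQL Server actually chooses.&amp;nbsp; In session 1, lock one of the base table rows that the lookup must fetch:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t set c = c where a = 24&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In session 2, note the spid and run the select, which should block when the lookup reaches the locked row:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select @@spid&lt;/P&gt;
&lt;P&gt;select c from t where b &amp;lt; 25&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;While session 2 is blocked, check its locks from session 1, and then roll back the transaction to release the row:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select resource_type, request_mode, request_type, request_status&lt;BR&gt;from sys.dm_tran_locks&lt;BR&gt;where request_session_id = &lt;I&gt;&amp;lt;session_2_spid&amp;gt;&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;rollback tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;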
&lt;P&gt;Why does SQL Server acquire these extra locks?&amp;nbsp; Suppose that SQL Server did &lt;I&gt;not &lt;/I&gt;hold the S locks on the index seek until the end of the statement but rather released them immediately.&amp;nbsp; The prefetch results in a delay between when the index seek returns rows and when the join processes the rows and executes the clustered index seek.&amp;nbsp; During this delay, there would be no locks held on rows returned by the seek but not yet processed by the join.&amp;nbsp; Another session could slip in, modify the row, and cause an inconsistency similar to the one that I demonstrated with index intersections in &lt;A class="" title="Query Plans and Read Committed Isolation Level" href="http://blogs.msdn.com/craigfr/archive/2007/05/02/query-plans-and-read-committed-isolation-level.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/05/02/query-plans-and-read-committed-isolation-level.aspx"&gt;this post&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Before I wrap up with this topic, I want to emphasize a couple more points.&amp;nbsp; First, updates can also use prefetching.&amp;nbsp; So, if you have a query plan with an update that includes a prefetch (again, look for the "WITH PREFETCH" keywords), SQL Server holds locks on the source of the rows to be updated just as it does when the plan includes any other blocking operator such as a sort or spool.&amp;nbsp; Second, to avoid corrupting data, updates always acquire locks even when run&amp;nbsp;at read uncommitted isolation level and, if necessary due to blocking operators in the plan, hold these locks until the end of the statement.&amp;nbsp; Queries that access large objects and queries with bookmark lookups do not acquire or hold locks when run at read uncommitted isolation level.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3149120" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item><item><title>Read Committed and Large Objects</title><link>http://blogs.msdn.com/craigfr/archive/2007/05/31/read-committed-and-large-objects.aspx</link><pubDate>Thu, 31 May 2007 19:00:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3009750</guid><dc:creator>craigfr</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/3009750.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=3009750</wfw:commentRss><description>&lt;P&gt;In &lt;A class="" title="Read Committed and Updates" href="http://blogs.msdn.com/craigfr/archive/2007/05/22/read-committed-and-updates.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/05/22/read-committed-and-updates.aspx"&gt;my last post&lt;/A&gt;, I explained that SQL Server holds read committed locks until the end of an update statement (instead of releasing the locks as soon as each row is released) if 
there is a blocking operator between the scan or seek of the rows to be updated and the update itself.&amp;nbsp; In this post, I'll take a look at a similar result involving large objects.&lt;/P&gt;
&lt;P&gt;Normally, when SQL Server moves data through a blocking operator such as a sort, it makes a copy of the data.&amp;nbsp; Once SQL Server makes a copy, there is no need to preserve the original row or source of the data.&amp;nbsp; However, since large objects (e.g., varchar(max)) can store up to 2 Gbytes, it is generally not practical to make copies of large objects.&amp;nbsp; Instead, whenever possible, SQL Server uses "pointers" to the data instead of making copies of the data.&amp;nbsp; To ensure that the pointers remain valid, SQL Server does not release any locks on the rows that contain the large objects until the statement completes.&lt;/P&gt;
&lt;P&gt;Let's observe this behavior.&amp;nbsp; Begin by creating the following table:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t (pk int primary key, i int, lob varchar(max))&lt;BR&gt;insert t values (1, 1, 'abc')&lt;BR&gt;insert t values (2, 2, 'def')&lt;BR&gt;insert t values (3, 3, 'ghi')&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;In session 1 lock the third row:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t set i = i where pk = 3&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, in session 2 check the spid (we'll use it later to look at the locks) and run this query which scans the table and reads each large object (using the default read committed isolation level):&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select @@spid&lt;/P&gt;
&lt;P mce_keep="true"&gt;select lob from t&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;This query uses a trivial plan which consists of a clustered index scan:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[PK__t__2D27B809]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;Because session 1 has a lock on the third row of the table, the scan blocks.&amp;nbsp; While it is blocked, we can check which locks it holds by running the following query in session 1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select resource_type, request_mode, request_type, request_status&lt;BR&gt;from sys.dm_tran_locks&lt;BR&gt;where request_session_id = &lt;I&gt;&amp;lt;session_2_spid&amp;gt;&lt;/I&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;resource_type  request_mode  request_type  request_status
-------------  ------------  ------------  ---------------
DATABASE       S             LOCK          GRANT
PAGE           IS            LOCK          GRANT
KEY            S             LOCK          WAIT
OBJECT         IS            LOCK          GRANT&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;We see that the scan is not holding any key locks and is waiting for the one key lock held by session 1.&lt;/P&gt;
&lt;P&gt;Next, kill the scan in session 2 and try the following query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select lob from t order by i&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The plan for this query includes a sort:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Sort(ORDER BY:([t].[i] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[PK__t__2D27B809]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;As I explained above, when we check the locks held by this query, we find that due to the large object and the blocking sort, this query holds key locks on each row it touches:&lt;/P&gt;&lt;PRE&gt;resource_type  request_mode  request_type  request_status
-------------  ------------  ------------  ---------------
DATABASE       S             LOCK          GRANT
KEY            S             LOCK          GRANT
PAGE           IS            LOCK          GRANT
KEY            S             LOCK          WAIT
OBJECT         IS            LOCK          GRANT
KEY            S             LOCK          GRANT&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;If you have any doubt whether the extra key locks are due to the large object, repeat this experiment with the following nearly identical query which does not return the large object:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select i from t order by i&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Finally, try one more query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select i from t where lob &amp;gt; 'a'&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The plan for this query uses an explicit filter operator to evaluate the predicate on the large object:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Filter(WHERE:([t].[lob]&amp;gt;[@1]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[PK__t__2D27B809]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;The filter is not a blocking operator and the query does not return the large object.&amp;nbsp; Nevertheless, if you run this query, you will observe that SQL Server once again holds locks on each row touched by this query.&amp;nbsp; In this example, SQL Server is overly conservative and retains the locks even though they are technically unnecessary.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3009750" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item><item><title>Read Committed and Updates</title><link>http://blogs.msdn.com/craigfr/archive/2007/05/22/read-committed-and-updates.aspx</link><pubDate>Wed, 23 May 2007 00:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2800332</guid><dc:creator>craigfr</dc:creator><slash:comments>7</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/2800332.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=2800332</wfw:commentRss><description>&lt;P&gt;Let's try an experiment.&amp;nbsp; Begin by creating the following simple schema:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t1 (a int, b int)&lt;BR&gt;create clustered index t1a on t1(a)&lt;BR&gt;insert t1 values (1, 1)&lt;BR&gt;insert t1 values (2, 2)&lt;BR&gt;insert t1 values (3, 3)&lt;/P&gt;
&lt;P mce_keep="true"&gt;create table t2 (a int)&lt;BR&gt;insert t2 values (9)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;In session 1, lock the third row of table t1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t1 set b = b where a = 3&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, in session 2 check the spid (you'll need it later) and run the following update at the default read committed isolation level:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select @@spid&lt;/P&gt;
&lt;P mce_keep="true"&gt;update t1 set t1.b = t1.b&lt;BR&gt;where exists (select * from t2 where t2.a = t1.b)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;This update uses the following plan:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Clustered Index Update(OBJECT:([t1].[t1a]), SET:([t1].[b] = [t1].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Top(ROWCOUNT est 0)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Left Semi Join, WHERE:([t2].[a]=[t1].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t1].[t1a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t2]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;This plan scans table t1 and looks each row up in table t2 to see whether the row must be updated.&amp;nbsp; The scan acquires U locks on each row of t1.&amp;nbsp; If the row is updated, the update upgrades the lock to an X lock.&amp;nbsp; If the row is not updated, the scan releases the row and the lock since we are running in read committed isolation.&lt;/P&gt;
&lt;P&gt;Since session 1 is holding a lock on the third row of table t1, the update blocks when the scan of t1 reaches the third row.&amp;nbsp; At this point, we can check what locks session 2 is holding by running the following query in session 1 (or any other session):&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select resource_type, request_mode, request_type, request_status&lt;BR&gt;from sys.dm_tran_locks&lt;BR&gt;where request_session_id = &lt;I&gt;&amp;lt;session_2_spid&amp;gt;&lt;/I&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;resource_type  request_mode  request_type  request_status
-------------  ------------  ------------  --------------
DATABASE       S             LOCK          GRANT
OBJECT         IS            LOCK          GRANT
KEY            U             LOCK          WAIT
PAGE           IU            LOCK          GRANT
OBJECT         IX            LOCK          GRANT&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;As expected, we see only one outstanding U lock request.&lt;/P&gt;
&lt;P&gt;Next, return to session 2, abort the blocked update, and run the following statement:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;update t1 set t1.a = t1.a&lt;BR&gt;where exists (select * from t2 where t2.a = t1.b)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Notice that this time we are updating the clustering key of the index.&amp;nbsp; Updates to the clustering key can cause rows to move within the index.&amp;nbsp; To ensure that a row is not updated, encountered again by the same scan, and updated a second time (which would be incorrect), SQL Server must add a blocking operator between the scan and update of table t1.&amp;nbsp; This requirement is known as "Halloween protection."&amp;nbsp; Indeed, the new plan includes a sort:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Clustered Index Update(OBJECT:([t1].[t1a]), SET:([t1].[a] = [t1].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Top(ROWCOUNT est 0)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(DISTINCT ORDER BY:([t1].[a] ASC, [Uniq1002] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, WHERE:([t2].[a]=[t1].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t1].[t1a]), ORDERED FORWARD)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t2]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;Once again this update blocks.&amp;nbsp; Let's check which locks it is holding by running the above query on the sys.dm_tran_locks DMV:&lt;/P&gt;&lt;PRE&gt;resource_type  request_mode  request_type  request_status
-------------  ------------  ------------  --------------
DATABASE       S             LOCK          GRANT
OBJECT         IS            LOCK          GRANT
KEY            U             LOCK          WAIT
KEY            U             LOCK          GRANT
KEY            U             LOCK          GRANT
PAGE           IU            LOCK          GRANT
OBJECT         IX            LOCK          GRANT&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;This time we see that there are two granted U locks.&amp;nbsp; What's going on?&amp;nbsp; Shouldn't these locks have been released since we are running a read committed scan?&amp;nbsp; Not so fast!&amp;nbsp; With the blocking sort operator in the plan, no rows are updated until the scan completes.&amp;nbsp; If SQL Server simply released each U lock when the scan of t1 released each row, none of the rows would be locked when the update started.&amp;nbsp; Without any locks, another session could slip in and modify the rows that we'd already scanned and which we were planning to update.&amp;nbsp; Allowing another session to modify these rows could lead to incorrect results and data corruption.&amp;nbsp; Thus, SQL Server retains these locks until the statement (not the transaction) finishes executing.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2800332" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Updates/default.aspx">Updates</category></item><item><title>Serializable vs. 
Snapshot Isolation Level</title><link>http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx</link><pubDate>Wed, 16 May 2007 21:45:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2678249</guid><dc:creator>craigfr</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/2678249.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=2678249</wfw:commentRss><description>&lt;P&gt;Both the serializable and snapshot isolation levels provide a read consistent view of the database to all transactions.&amp;nbsp; In either of these isolation levels, a transaction can only read data that has been committed.&amp;nbsp; Moreover, a transaction can read the same data multiple times without ever &lt;B&gt;observing&lt;/B&gt; any concurrent transactions making changes to this data.&amp;nbsp; The unexpected read committed and repeatable read results that I demonstrated in my prior few posts are not possible in serializable or snapshot isolation level.&lt;/P&gt;
&lt;P&gt;Notice that I used the phrase "without ever &lt;B&gt;observing&lt;/B&gt; any ... changes."&amp;nbsp; This choice of words is deliberate.&amp;nbsp; In serializable isolation level, SQL Server acquires key range locks and holds them until the end of the transaction.&amp;nbsp; A key range lock ensures that, once a transaction reads data, no other transaction can alter that data - not even to insert phantom rows - until the transaction holding the lock completes.&amp;nbsp; In snapshot isolation level, SQL Server does not acquire any locks when reading data.&amp;nbsp; Thus, it is possible for a concurrent transaction to modify data that a second transaction has already read.&amp;nbsp; The second transaction simply does not &lt;B&gt;observe&lt;/B&gt; the changes and continues to read an old copy of the data.&lt;/P&gt;
&lt;P&gt;Serializable isolation level relies on pessimistic concurrency control.&amp;nbsp; It guarantees consistency by assuming that two transactions might try to update the same data and uses locks to ensure that they do not, but at a cost of reduced concurrency - one transaction must wait for the other to complete and two transactions can deadlock.&amp;nbsp; Snapshot isolation level relies on optimistic concurrency control.&amp;nbsp; It allows transactions to proceed without locks and with maximum concurrency, but may need to fail and roll back a transaction if two transactions attempt to modify the same data at the same time.&lt;/P&gt;
&lt;P&gt;It is clear there are differences in the level of concurrency that can be achieved and in the failures (deadlocks vs. update conflicts) that are possible with the serializable and snapshot isolation levels.&lt;/P&gt;
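&lt;P&gt;To make the update conflict case concrete, here is a minimal sketch.&amp;nbsp; The table name conflict_demo is hypothetical, and the sketch assumes that the database already allows snapshot isolation.&amp;nbsp; Begin by creating a table with a single row:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table conflict_demo (id int primary key, v int)&lt;BR&gt;insert conflict_demo values (1, 0)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In session 1, begin a snapshot transaction and update the row without committing:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;set transaction isolation level snapshot&lt;BR&gt;begin tran&lt;BR&gt;update conflict_demo set v = 1 where id = 1&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In session 2, begin a second snapshot transaction, read the row so that the snapshot is established, and then try to update the same row.&amp;nbsp; The update waits for the lock held by session 1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;set transaction isolation level snapshot&lt;BR&gt;begin tran&lt;BR&gt;select * from conflict_demo&lt;BR&gt;update conflict_demo set v = 2 where id = 1&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Finally, commit the transaction in session 1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Once session 1 commits, the update in session 2 should fail with error 3960 (snapshot isolation transaction aborted due to update conflict) rather than overwrite a change that its snapshot never saw.&lt;/P&gt;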
&lt;P&gt;How about transaction isolation?&amp;nbsp; How do serializable and snapshot differ in terms of the transaction isolation that they confer?&amp;nbsp; It is simple to understand serializable.&amp;nbsp; For the outcome of two transactions to be considered serializable, it must be possible to achieve this outcome by running &lt;B&gt;one transaction at a time&lt;/B&gt; in some order.&lt;/P&gt;
&lt;P&gt;Snapshot does &lt;B&gt;not&lt;/B&gt; guarantee this level of isolation.&amp;nbsp; A few years ago, &lt;A href="http://research.microsoft.com/~Gray/" mce_href="http://research.microsoft.com/~Gray/"&gt;Jim Gray&lt;/A&gt; shared with me the following excellent example of the difference.&amp;nbsp; Imagine that we have a bag containing a mixture of white and black marbles.&amp;nbsp; Suppose that we want to run two transactions.&amp;nbsp; One transaction turns each of the white marbles into black marbles.&amp;nbsp; The second transaction turns each of the black marbles into white marbles.&amp;nbsp; If we run these transactions under serializable isolation, we must run them one at a time.&amp;nbsp; The first transaction will leave a bag with marbles of only one color.&amp;nbsp; After that, the second transaction will change all of these marbles to the other color.&amp;nbsp; There are only two possible outcomes:&amp;nbsp; a bag with only white marbles or a bag with only black marbles.&lt;/P&gt;
&lt;P&gt;If we run these transactions under snapshot isolation, there is a third outcome that is not possible under serializable isolation.&amp;nbsp; Each transaction can simultaneously take a snapshot of the bag of marbles as it exists before we make any changes.&amp;nbsp; Now one transaction finds the white marbles and turns them into black marbles.&amp;nbsp; At the same time, the other transaction finds the black marbles - but only those marbles that were black when we took the snapshot - not those marbles that the first transaction changed to black - and turns them into white marbles.&amp;nbsp; In the end, we still have a mixed bag of marbles with some white and some black.&amp;nbsp; In fact, we have precisely switched each marble.&lt;/P&gt;
&lt;P&gt;The following graphic illustrates the difference:&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;IMG src="http://blogs.msdn.com/photos/craigfr/images/2678168/original.aspx" border=0 mce_src="http://blogs.msdn.com/photos/craigfr/images/2678168/original.aspx"&gt;&lt;/P&gt;
&lt;P&gt;We can demonstrate this outcome using SQL Server.&amp;nbsp; Note that snapshot isolation is available only in SQL Server 2005 and later and must be explicitly enabled on your database:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;alter database &lt;I&gt;database_name&lt;/I&gt; set allow_snapshot_isolation on&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
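&lt;P&gt;To confirm that the setting took effect, you can check the sys.databases catalog view.&amp;nbsp; This is a small sketch that assumes you run it while connected to the database in question:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select name, snapshot_isolation_state_desc&lt;BR&gt;from sys.databases&lt;BR&gt;where name = db_name()&lt;/P&gt;&lt;/BLOCKQUOTE&gt;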
&lt;P&gt;Begin by creating a simple table with two rows representing two marbles:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table marbles (id int primary key, color char(5))&lt;BR&gt;insert marbles values(1, 'Black')&lt;BR&gt;insert marbles values(2, 'White')&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Next, in session 1 begin a snapshot transaction:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;set transaction isolation level snapshot&lt;BR&gt;begin tran&lt;BR&gt;update marbles set color = 'White' where color = 'Black'&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, before committing the changes, run the following in session 2:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;set transaction isolation level snapshot&lt;BR&gt;begin tran&lt;BR&gt;update marbles set color = 'Black' where color = 'White'&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Finally, commit the transaction in session 1 and check the data in the table:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;commit tran&lt;BR&gt;select * from marbles&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Here are the results:&lt;/P&gt;&lt;PRE&gt;id          color
----------- -----
1           White
2           Black&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;As you can see marble 1 which started out black is now white and marble 2 which started out white is now black.&amp;nbsp; If you try this same experiment with serializable isolation, one transaction will wait for the other to complete and, depending on the order, both marbles will end up either white or black.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2678249" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item><item><title>Repeatable Read Isolation Level</title><link>http://blogs.msdn.com/craigfr/archive/2007/05/09/repeatable-read-isolation-level.aspx</link><pubDate>Wed, 09 May 2007 17:57:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2504802</guid><dc:creator>craigfr</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/2504802.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=2504802</wfw:commentRss><description>&lt;P&gt;In my last two posts, I showed how queries running at read committed isolation level may generate unexpected results in the presence of concurrent updates.&amp;nbsp; Many but not all of these results can be avoided by running at repeatable read isolation level.&amp;nbsp; In this post, I'll explore how concurrent updates may affect queries running at repeatable read.&lt;/P&gt;
&lt;P&gt;Unlike a read committed scan, a repeatable read scan retains locks on every row it touches until the end of the transaction.&amp;nbsp; Even rows that do not qualify for the query result remain locked.&amp;nbsp; These locks ensure that the rows touched by the query cannot be updated or deleted by a concurrent session until the current transaction completes (whether it is committed or rolled back).&amp;nbsp; These locks do not protect rows that have not yet been scanned from updates or deletes and do not prevent the insertion of new rows amid the rows that are already locked.&amp;nbsp; The following graphic illustrates this point:&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;IMG src="http://blogs.msdn.com/photos/craigfr/images/2504516/original.aspx" border=0 mce_src="http://blogs.msdn.com/photos/craigfr/images/2504516/original.aspx"&gt;&lt;/P&gt;
&lt;P&gt;Note that the capability to insert new "phantom" rows between locked rows that have already been scanned is the principal difference between the repeatable read and serializable isolation levels.&amp;nbsp; A serializable scan acquires a key range lock which prevents the insertion of any new rows anywhere within the range (as well as the update or deletion of any existing rows within the range).&lt;/P&gt;
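&lt;P&gt;As a rough sketch of how to observe this difference, the following batch can be run once the table t from the first example below has been created.&amp;nbsp; It assumes SQL Server 2005 or later (for sys.dm_tran_locks).&amp;nbsp; While the transaction is open, the key locks held by the session should typically show range modes such as RangeS-S, and an insert into the scanned range from another session should block until the rollback.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;select * from t with (serializable)&lt;BR&gt;-- list the locks still held by this session&lt;BR&gt;select resource_type, request_mode from sys.dm_tran_locks where request_session_id = @@spid&lt;BR&gt;rollback tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;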
&lt;P&gt;In the remainder of this post, I'll give a couple of examples of how we can get unexpected results even while running queries at repeatable read isolation level.&amp;nbsp; These examples are similar to the ones from my previous two posts.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Row Movement&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;First, let's see how we can move a row and cause a repeatable read scan to miss it.&amp;nbsp; As with all of the other examples in this series of posts, we'll need two sessions.&amp;nbsp; Begin by creating this simple table:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t (a int primary key, b int)&lt;BR&gt;insert t values (1, 1)&lt;BR&gt;insert t values (2, 2)&lt;BR&gt;insert t values (3, 3)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Next, in session 1 lock the second row:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t set b = 2 where a = 2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now, in session 2 run a repeatable read scan of the table:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select * from t with (repeatableread)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;This scan reads the first row then blocks waiting for session 1 to release the lock it holds on the second row.&amp;nbsp; While the scan is blocked, in session 1 let's move the third row to the beginning of the table before committing the transaction and releasing the exclusive lock blocking session 2:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;update t set a = 0 where a = 3&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;As we expect, session 2 completely misses the third row and returns just two rows:&lt;/P&gt;&lt;PRE&gt;a           b           c
----------- ----------- -----------
1           1           1
2           2           2&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Note that if we change the experiment so that session 1 tries to touch the first row in the table, it will cause a deadlock with session 2 which holds a lock on this row.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Phantom Rows&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;Let's also take a look at how phantom rows can cause unexpected results.&amp;nbsp; This experiment is similar to the nested loops join experiment from my previous post.&amp;nbsp; Begin by creating two tables:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t1 (a1 int primary key, b1 int)&lt;BR&gt;insert t1 values (1, 9)&lt;BR&gt;insert t1 values (2, 9)&lt;/P&gt;
&lt;P mce_keep="true"&gt;create table t2 (a2 int primary key, b2 int)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now, in session 1 lock the second row of table t1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t1 set a1 = 2 where a1 = 2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Next, in session 2 run the following outer join at repeatable read isolation level:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;set transaction isolation level repeatable read&lt;BR&gt;select * from t1 left outer join t2 on b1 = a2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The query plan for this join uses a nested loops join:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Nested Loops(Left Outer Join, WHERE:([t1].[b1]=[t2].[a2]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t1].[PK__t1]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t2].[PK__t2]))&lt;/P&gt;
&lt;P&gt;This plan scans the first row from t1, tries to join it with t2, finds there are no matching rows, and outputs a null extended row.&amp;nbsp; It then blocks waiting for session 1 to release the lock on the second row of t1.&amp;nbsp; Finally, in session 1, insert a new row into t2 and release the lock:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;insert t2 values (9, 0)&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Here is the output from the outer join:&lt;/P&gt;&lt;PRE&gt;a1          b1          a2          b2
----------- ----------- ----------- -----------
1           9           NULL        NULL
2           9           9           0&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Notice that we have both a null extended and a joined row for the same join key!&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Summary&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;As I pointed out at the conclusion of my previous post, I want to emphasize that the above results are not incorrect but rather are a side effect of running at a reduced isolation level. &amp;nbsp;SQL Server guarantees that the committed data is consistent at all times.&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;EM&gt;CLARIFICATION 8/26/2008: The above examples work as I originally described if they are executed in tempdb.&amp;nbsp; However, the SELECT statements in session 2 may not block as described if the examples are executed in other databases due to an optimization where SQL Server avoids acquiring read committed locks when it knows that no data has changed on a page.&amp;nbsp; If you encounter this problem, either run these examples in tempdb or change the UPDATE statements in session 1 so that they actually change the data in the updated row.&amp;nbsp; For instance, for the first example try "update t set b = 12 where a = 2".&lt;/EM&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2504802" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Scans+and+Seeks/default.aspx">Scans and Seeks</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item><item><title>Query Plans and Read Committed Isolation Level</title><link>http://blogs.msdn.com/craigfr/archive/2007/05/02/query-plans-and-read-committed-isolation-level.aspx</link><pubDate>Wed, 02 May 2007 21:48:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2379019</guid><dc:creator>craigfr</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/2379019.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=2379019</wfw:commentRss><description>&lt;P&gt;&lt;A class="" title="Read Committed Isolation Level" href="http://blogs.msdn.com/craigfr/archive/2007/04/25/read-committed-isolation-level.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/04/25/read-committed-isolation-level.aspx"&gt;Last week&lt;/A&gt; I looked at how concurrent updates may cause a scan running at read committed isolation level to return the same row multiple times or to miss a row 
entirely.&amp;nbsp; This week I'm going to take a look at how concurrent updates may affect slightly more complex query plans.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Nested Loops Join&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;Let's begin by considering this simple query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table Customers (CustId int primary key, LastName varchar(30))&lt;BR&gt;insert Customers values (11, 'Doe')&lt;/P&gt;
&lt;P mce_keep="true"&gt;create table Orders (OrderId int primary key, CustId int foreign key references Customers, Discount float)&lt;BR&gt;insert Orders values (1, 11, 0)&lt;BR&gt;insert Orders values (2, 11, 0)&lt;/P&gt;
&lt;P mce_keep="true"&gt;select * from Orders O join Customers C on O.CustId = C.CustId&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The plan for this query uses a &lt;A class="" title="Nested Loops Join" href="http://blogs.msdn.com/craigfr/archive/2006/07/26/679319.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/07/26/679319.aspx"&gt;nested loops join&lt;/A&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Nested Loops(Inner Join, OUTER REFERENCES:([O].[CustId]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([Orders].[PK__Orders] AS [O]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Seek(OBJECT:([Customers].[PK__Customers] AS [C]), SEEK:([C].[CustId]= [O].[CustId]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;Recall that the nested loops join executes its inner input once for each row from its outer input.&amp;nbsp; In this example, the Orders table is the outer table and we have two orders so we will execute two seeks of the Customers table.&amp;nbsp; Moreover, both orders were placed by the same customer.&amp;nbsp; What happens if we change the customer data between the two index seeks?&amp;nbsp; We can setup an experiment to find out.&amp;nbsp; First, lock the second order in session 1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update Orders set Discount = 0.1 where OrderId = 2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, in session 2 run the join:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select * from Orders O join Customers C on O.CustId = C.CustId&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The join will scan the first order and join it with the Customers table.&amp;nbsp; Then it will try to scan the second order and block waiting for the lock held by session 1.&amp;nbsp; Finally, in session 1 update the customer name and commit the transaction to release the lock and allow the query in session 2 to complete:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;update Customers set LastName = 'Smith' where CustId = 11&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Here are the results of the join:&lt;/P&gt;&lt;PRE&gt;OrderId     CustId      Discount               CustId      LastName
----------- ----------- ---------------------- ----------- ------------------------------
1           11          0                      11          Doe
2           11          0.1                    11          Smith&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Notice that the customer data is different for the two orders even though the customer id is the same!&lt;/P&gt;
&lt;P&gt;&lt;B&gt;Full Outer Join&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;Next, consider the following full outer join query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t1 (a1 int, b1 int)&lt;BR&gt;insert t1 values (1, 1)&lt;BR&gt;insert t1 values (2, 2)&lt;/P&gt;
&lt;P mce_keep="true"&gt;create table t2 (a2 int, b2 int)&lt;BR&gt;insert t2 values (1, 1)&lt;/P&gt;
&lt;P&gt;select * from t1 full outer loop join t2 on t1.a1 = t2.a2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The nested loops join does not support full outer joins directly so this query generates a two part plan (which I described in my original nested loops join post):&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Concatenation&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;|--Nested Loops(Left Outer Join, WHERE:([t1].[a1]=[t2].[a2]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t1]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t2]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([t1].[a1]=NULL, [t1].[b1]=NULL))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Left Anti Semi Join, WHERE:([t1].[a1]=[t2].[a2]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t2]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t1]))&lt;/P&gt;
&lt;P&gt;Although the original query only references each table once, this query plan includes two scans of each table.&amp;nbsp; It is possible for the data to change between the two scans.&amp;nbsp; Let's see what happens.&amp;nbsp; Begin by locking the second row of t1 in session 1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t1 set a1 = 2 where a1 = 2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, in session 2 run the join:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select * from t1 full outer loop join t2 on t1.a1 = t2.a2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The query plan begins by scanning the first row of t1 and joining it with t2 (where there is a matching row).&amp;nbsp; It then blocks.&amp;nbsp; Finally, in session 1 delete the first row from t1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;delete t1 where a1 = 1&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;When the join query plan resumes it scans t2 and performs an anti-semi join with t1 to search for rows that exist in t2 and not in t1.&amp;nbsp; These rows are required for the full outer join result, but cannot be generated by the left outer nested loops join.&amp;nbsp; Keep in mind that by this point in the query plan we have already joined the first row of t1 with the row in t2.&amp;nbsp; However, because we deleted the row from t1, the anti-semi join finds the row in t2, cannot match it with a row in t1, and generates a null extended row.&amp;nbsp; Here is the result:&lt;/P&gt;&lt;PRE&gt;a1          b1          a2          b2
----------- ----------- ----------- -----------
1           1           1           1
2           2           NULL        NULL
NULL        NULL        1           1&lt;/PRE&gt;
&lt;P&gt;As the above analysis suggests, the result includes both a joined and a null extended row!&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;I&gt;CLARIFICATION 8/26/2008: The above example works as I originally described if it is executed in tempdb.&amp;nbsp; However, the SELECT statement in session 2 may not block as described if the example is executed in other databases due to an optimization where SQL Server avoids acquiring read committed locks when it knows that no data has changed on a page.&amp;nbsp; If you encounter this problem, either run this example in tempdb or change the UPDATE statement in session 1 so that it actually changes the value of column b1.&amp;nbsp; For example, try "update t1 set b1 = 12 where a1 = 2".&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Index Intersection&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Finally, consider the following query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;create table t (a int primary key, b int, c int, check (b = c))&lt;BR&gt;create index tb on t(b)&lt;BR&gt;create index tc on t(c)&lt;/P&gt;
&lt;P mce_keep="true"&gt;insert t values (1, 1, 1)&lt;BR&gt;insert t values (2, 2, 2)&lt;/P&gt;
&lt;P&gt;select * from t with (index(tb, tc))&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;I've forced an index intersection plan.&amp;nbsp; The query plan scans and joins columns from both non-clustered indexes to form the final result:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Hash Match(Inner Join, HASH:([t].[a])=([t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Scan(OBJECT:([t].[tb]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Scan(OBJECT:([t].[tc]))&lt;/P&gt;
&lt;P mce_keep="true"&gt;Recall that the &lt;A class="" title="Hash Join" href="http://blogs.msdn.com/craigfr/archive/2006/08/10/687630.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/08/10/687630.aspx"&gt;hash join&lt;/A&gt; scans its entire build input (tb) and then scans its entire probe input (tc).&amp;nbsp; What happens if the contents of the indexes change between the two scans?&amp;nbsp; Once again, let's find out.&amp;nbsp; Begin by locking the second row in session 1:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t set b = 4, c = 4 where a = 2&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Now, in session 2 run the select statement:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;select * from t with (index(tb, tc))&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The inex scan of tb will read the first row and then block on the second row.&amp;nbsp; Finally, in session 1, update the first row:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;update t set b = 3, c = 3 where a = 1&lt;BR&gt;commit tran&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The select statement resumes, finishes the scan of index tb, scans index tc where it finds the updated first row, and returns a joined row that consists partially of a row prior to the update and partially of a row after the update:&lt;/P&gt;&lt;PRE&gt;a           b           c
----------- ----------- -----------
1           1           3
2           4           4&lt;/PRE&gt;
&lt;P&gt;Notice that this result even seems to violate the check constraint!&lt;/P&gt;
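&lt;P&gt;One way to confirm that the stored data itself never violates the constraint is to ask SQL Server to re-validate it after both sessions have finished.&amp;nbsp; This is only a sketch, assuming the table t from this example; the command reports any rows that violate a constraint and should report none here.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;dbcc checkconstraints ('t')&lt;/P&gt;&lt;/BLOCKQUOTE&gt;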
&lt;P&gt;&lt;B&gt;Summary&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;I've demonstrated three ways that queries may behave unexpectedly when running in read committed isolation level.&amp;nbsp; I want to emphasize that these results are &lt;EM&gt;not&lt;/EM&gt; incorrect. &amp;nbsp;SQL Server guarantees that the committed data is consistent at &lt;EM&gt;all&lt;/EM&gt; times and does &lt;EM&gt;not&lt;/EM&gt; permit any constraints to be violated.&amp;nbsp; These results are merely a consequence of running at a relatively low isolation level.&amp;nbsp; If we repeat the above experiments with snapshot read committed enabled or at a higher isolation level, we do not see these results.&amp;nbsp; The benefit of read committed is higher concurrency with less blocking and fewer deadlocks.&amp;nbsp; The disadvantage is lower consistency guarantees.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2379019" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Joins/default.aspx">Joins</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item><item><title>Read Committed Isolation Level</title><link>http://blogs.msdn.com/craigfr/archive/2007/04/25/read-committed-isolation-level.aspx</link><pubDate>Wed, 25 Apr 2007 19:59:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:2274063</guid><dc:creator>craigfr</dc:creator><slash:comments>7</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/2274063.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=2274063</wfw:commentRss><description>&lt;P&gt;SQL Server 2000 supports four different isolation levels: read uncommitted (or nolock), read committed, repeatable read, and serializable.&amp;nbsp; SQL Server 2005 adds two new isolation levels: read committed snapshot and snapshot.&amp;nbsp; These isolation levels determine what locks SQL Server takes when accessing data and, therefore, by 
extension they determine the level of concurrency and consistency that statements and transactions experience.&amp;nbsp; All of these isolation levels are described in &lt;A class="" title="SET TRANSACTION ISOLATION LEVEL (Transact-SQL)" href="http://msdn2.microsoft.com/en-us/library/ms173763.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/ms173763.aspx"&gt;Books Online&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;In this post, I'm going to take a closer look at the default isolation level of read committed.&amp;nbsp; When SQL Server executes a statement at the read committed isolation level, it acquires short lived share locks on a row by row basis.&amp;nbsp; The duration of these share locks is just long enough to read and process each row; the server generally releases each lock before proceeding to the next row.&amp;nbsp; Thus, if you run a simple select statement under read committed and check for locks (e.g., with sys.dm_tran_locks), you will typically see at most a single row lock at a time.&amp;nbsp; The sole purpose of these locks is to ensure that the statement only reads and returns committed data.&amp;nbsp; The locks work because updates always acquire an exclusive lock which blocks any readers trying to acquire a share lock.&lt;/P&gt;
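&lt;P&gt;As a minimal sketch of how to check for these locks, the following query lists the locks requested by one session.&amp;nbsp; The session id 53 is only a placeholder (an assumption for illustration); substitute the id of the session running the select statement, which that session can obtain from @@spid.&lt;/P&gt;
&lt;P&gt;-- 53 is a placeholder session id&lt;BR&gt;select resource_type, resource_description, request_mode, request_status&lt;BR&gt;from sys.dm_tran_locks&lt;BR&gt;where request_session_id = 53&lt;/P&gt;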
&lt;P&gt;Now, let's suppose that we scan an entire table at read committed isolation level.&amp;nbsp; Since the scan locks only one row at a time, there is nothing to prevent a concurrent update from moving a row before or after our scan reaches it.&amp;nbsp; The following graphic illustrates this point:&lt;/P&gt;
&lt;P mce_keep="true"&gt;&lt;IMG src="http://blogs.msdn.com/photos/craigfr/images/2273874/original.aspx" border=0&gt;&lt;/P&gt;
&lt;P&gt;Let's try an experiment to see this effect in action.&amp;nbsp; We'll need two server sessions for this experiment.&amp;nbsp; First, create a simple table with three rows:&lt;/P&gt;
&lt;P&gt;create table t (a int primary key, b int)&lt;BR&gt;insert t values (1, 1)&lt;BR&gt;insert t values (2, 2)&lt;BR&gt;insert t values (3, 3)&lt;/P&gt;
&lt;P mce_keep="true"&gt;Next, in session 1 lock the second row:&lt;/P&gt;
&lt;P&gt;begin tran&lt;BR&gt;update t set b = 2 where a = 2&lt;/P&gt;
&lt;P&gt;Now, in session 2 run a simple scan of the table:&lt;/P&gt;
&lt;P&gt;select * from t&lt;/P&gt;
&lt;P mce_keep="true"&gt;This scan will read the first row and then block waiting for session 1 to release the lock it holds on the second row.&amp;nbsp; While the scan is blocked, in session 1 we can swap the first and third rows and then commit the transaction and release the exclusive lock blocking session 2:&lt;/P&gt;
&lt;P&gt;update t set a = 4 where a = 1&lt;BR&gt;update t set a = 0 where a = 3&lt;BR&gt;select * from t&lt;BR&gt;commit tran&lt;/P&gt;
&lt;P mce_keep="true"&gt;Here are the new contents of the table following these updates:&lt;/P&gt;
&lt;P&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b&lt;BR&gt;----------- -----------&lt;BR&gt;0&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 3&lt;BR&gt;2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;BR&gt;4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;Finally, here is the result of the scan from session 2:&lt;/P&gt;
&lt;P&gt;a&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b&lt;BR&gt;----------- -----------&lt;BR&gt;1&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;1&lt;BR&gt;2&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 2&lt;BR&gt;4&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1&lt;/P&gt;
&lt;P&gt;Notice that in this output the first row was scanned prior to the updates while the third row was scanned following the updates.&amp;nbsp; In fact, these two rows are really the same row from before and after the update.&amp;nbsp; Moreover, the original third row that had the value (3, 3) is not output at all.&amp;nbsp; (We could claim that changing the primary key effectively deleted one row and created a new row, but we could also achieve the same effect on a non-clustered index.)&lt;/P&gt;
&lt;P&gt;Finally, try repeating this experiment, but add a unique column to the table:&lt;/P&gt;
&lt;P&gt;create table t (a int primary key, b int, c int unique)&lt;BR&gt;insert t values (1, 1, 1)&lt;BR&gt;insert t values (2, 2, 2)&lt;BR&gt;insert t values (3, 3, 3)&lt;/P&gt;
&lt;P mce_keep="true"&gt;You'll get the same result, but you'll see "duplicates" in the unique column.&lt;/P&gt;
&lt;P&gt;If the above results are not acceptable, you can either enable read committed snapshot for your database or you can run at a higher isolation level (albeit with somewhat lower concurrency).&lt;/P&gt;
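&lt;P&gt;For reference, here is a minimal sketch of each option.&amp;nbsp; The database name MyDatabase is a placeholder (an assumption for illustration), and the alter database statement generally cannot complete while other connections are active in that database.&lt;/P&gt;
&lt;P&gt;-- option 1: enable read committed snapshot (MyDatabase is a placeholder name)&lt;BR&gt;alter database MyDatabase set read_committed_snapshot on&lt;/P&gt;
&lt;P&gt;-- option 2: run the scan at a higher isolation level&lt;BR&gt;set transaction isolation level serializable&lt;BR&gt;select * from t&lt;/P&gt;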
&lt;P&gt;&lt;EM&gt;CLARIFICATION 8/26/2008: The above example works as I originally described if it is executed in tempdb.&amp;nbsp; However, the SELECT statement in session 2 may not block as described if the example is executed in other databases due to an optimization where SQL Server avoids acquiring read committed locks when it knows that no data has changed on a page.&amp;nbsp; If you encounter this problem, either run this example in tempdb or change the UPDATE statement in session 1 so that it actually changes the value of column b.&amp;nbsp; For example, try "update t set b = 12 where a = 2".&lt;/EM&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=2274063" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Scans+and+Seeks/default.aspx">Scans and Seeks</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Isolation+Levels/default.aspx">Isolation Levels</category></item></channel></rss>