Fast ordered delete

Fast ordered delete

  • Comments 6

We have a visit from MySpace database team in our lab this week. I have used this opportunity and asked Kevin Stephenson, MySpace’s Senior Database Developer, about “pain points” he encounters while doing his everyday job supporting SQL Server. He brought up a problem of efficient ordered delete. He needs to delete large portions of older custom log entries periodically and he needs to delete them in certain order. He also knows that it is efficient to break huge update operations into more small ones to allow truncating log. Kevin is aware of several methods. He immediately illustrated his problem on a small example:

drop table t1
create table t1 (a int primary key, b char (100))
declare @i int
set @i=1
SET NOCOUNT ON
while (@i<100000)
begin
insert into t1 values (@i,'x');
set @i=@i+1
end
 
set statistics io on
set statistics time on
delete from t1 where a in (select top (10000) a from t1 order by a);
set statistics time off
set statistics io off

And I got the following result messages:

Table 't1'. Scan count 2, logical reads 30564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 20152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 78 ms,  elapsed time = 80 ms.

The delete query in the above script has the following plan (before performing the delete):

Pic1

We can see the table t1 is accessed twice; the lower branch is performing the “select top (10000) a from t1” subquery and joining the result to the second occurrence of the same table t1. So we need to read the “deleted” portion of the table twice to identify the rows to delete and then once more to perform the delete. The question was, is it possible to perform the same operation more efficiently?

The answer is using a tip – a view with ORDER BY. You cannot use ORDER BY in a view definition unless you use also TOP in the SELECT clause in the same view. And we know we want to delete 10000 rows. So the view definition is as follows:

create view v1 as (select top (10000) * from t1 order by a)

and we can delete the “top” rows using simply

delete from v1

The query plan for this delete is much simpler.

Pic2

and the I/O and cpu statistics demonstrate the improvement:

Table 't1'. Scan count 1, logical reads 654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 75 ms.

Kevin Stephenson and Lubor Kollar

Leave a Comment
  • Please add 7 and 2 and type the answer here:
  • Post
  • PingBack from http://microsoft-sharepoint.simplynetdev.com/fast-ordered-delete/

  • such a simple solution - I've just been working around applying an update based upon a TOP which of course will not take an order by - this makes a simple way around it - thanks.

  • The simplest way to delete the top X rows is use to use a derived table. delete t1 from ( select top

  • Another way is using CTE

    with d  as

    (

    select top (10000) a from t

    )

    delete from d;

  • Thank you Simon and sunwei for you additions to my blog. Both derived tables and CTEs are great alternatives for views and I'm happy to see that they both work also in the case of the "efficient" delete!

  • Not to forget the inline table-valued functions that behave like parameterized views, allowing thus to specify the number of records to be returned and deleted:

    CREATE FUNCTION dbo.TestUDF(

    @number_records int)

    RETURNS TABLE

    AS

    RETURN (

       SELECT top (@number_records) a, b

       FROM t1

       ORDER BY a

    )

    DELETE FROM dbo.TestUDF(10001)

Page 1 of 1 (6 items)