Fast ordered delete
We have a visit from MySpace database team in our lab this week. I have used this opportunity and asked Kevin Stephenson, MySpace’s Senior Database Developer, about “pain points” he encounters while doing his everyday job supporting SQL Server. He brought up a problem of efficient ordered delete. He needs to delete large portions of older custom log entries periodically and he needs to delete them in certain order. He also knows that it is efficient to break huge update operations into more small ones to allow truncating log. Kevin is aware of several methods. He immediately illustrated his problem on a small example:
drop table t1
create table t1 (a int primary key, b char (100))
declare @i int
set @i=1
SET NOCOUNT ON
while (@i<100000)
begin
insert into t1 values (@i,'x');
set @i=@i+1
end
set statistics io on
set statistics time on
delete from t1 where a in (select top (10000) a from t1 order by a);
set statistics time off
set statistics io off
And I got the following result messages:
Table 't1'. Scan count 2, logical reads 30564, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 1, logical reads 20152, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 80 ms.
The delete query in the above script has the following plan (before performing the delete):
We can see the table t1 is accessed twice; the lower branch is performing the “select top (10000) a from t1” subquery and joining the result to the second occurrence of the same table t1. So we need to read the “deleted” portion of the table twice to identify the rows to delete and then once more to perform the delete. The question was, is it possible to perform the same operation more efficiently?
The answer is using a tip – a view with ORDER BY. You cannot use ORDER BY in a view definition unless you use also TOP in the SELECT clause in the same view. And we know we want to delete 10000 rows. So the view definition is as follows:
create view v1 as (select top (10000) * from t1 order by a)
and we can delete the “top” rows using simply
delete from v1
The query plan for this delete is much simpler.
and the I/O and cpu statistics demonstrate the improvement:
Table 't1'. Scan count 1, logical reads 654, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 75 ms.
Kevin Stephenson and Lubor Kollar