Lubor Kollar
Partition elimination is very important when SQL Server executes queries against partitioned tables or partitioned views. In general, SQL Server does an excellent job of not scanning partitions that are excluded by the query predicates. Recently we discovered one scenario where partition elimination does not work against partitioned tables in SQL Server 2005. This post describes the conditions leading to the problem as well as easy workarounds. You will also learn how to find out whether partition elimination works for your query, and what static and dynamic partition elimination are.
The most reliable way to find out if partition elimination happens in your query is to use the SET STATISTICS PROFILE ON command, run the query and investigate the output. But let me start with building our example table:
create partition function PF1 (int) as range for values (100,200,300,400);
create partition scheme PS1 as partition PF1 all to ([PRIMARY]);
go
create table t1 (a int, b int) on PS1 (a);
declare @i int;
set @i=1;
set nocount on;
while (@i<22)
begin;
insert into t1 values (20*@i, @i);
set @i=@i+1;
end;
The following query shows the distribution of all rows in table t1 across the five partitions:
select $partition.PF1(a) [Partition Number], a, b from t1
Partition Number a b
1 20 1
1 40 2
1 60 3
1 80 4
1 100 5
2 120 6
2 140 7
2 160 8
2 180 9
2 200 10
3 220 11
3 240 12
3 260 13
3 280 14
3 300 15
4 320 16
4 340 17
4 360 18
4 380 19
4 400 20
5 420 21
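The same distribution can be summarized per partition. The following is a minimal sketch that reuses only the PF1 function and the t1 table defined above; the column aliases are my own. Because PF1 is created without the RIGHT keyword, it is a RANGE LEFT function, so each boundary value belongs to the partition on its left.

```sql
-- Row count and value range per partition of t1
select $partition.PF1(a) as [Partition Number],
       count(*)          as [Row Count],
       min(a)            as [Min a],
       max(a)            as [Max a]
from t1
group by $partition.PF1(a)
order by [Partition Number];

-- Boundary check: 100 falls in partition 1, 101 in partition 2
select $partition.PF1(100) as [Partition For 100],
       $partition.PF1(101) as [Partition For 101];
```

With the 21 rows inserted above, partitions 1 through 4 hold five rows each and partition 5 holds the single row with a=420.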
First, I will show five examples of partition elimination working correctly in SQL Server 2005, and I will explain the difference between static and dynamic partition elimination. Here is a small batch; we will investigate its output below.
set statistics profile on;
declare @i1 int;
declare @i2 int;
set @i1=50;
set @i2=250 ;
select * from t1 where a<50 or a>450; -- (Q1) only two partitions are scanned
select * from t1 where a in (50,450); -- (Q2) only two partitions are scanned
select * from t1 where a<@i2 and a>100; -- (Q3) only two partitions are scanned
select * from t1 where a=100;-- (Q4) only one partition is scanned - static partition elimination
select * from t1 where a=@i2; -- (Q5) only one partition is scanned - dynamic partition elimination
set statistics profile off;
You will see the result set followed by the showplan, with the columns “Rows” and “Executes” in front of it, for each of the five queries above. For the query Q1
select * from t1 where a<50 or a>450
the showplan output is
Rows Executes StmtText
2 1 select * from t1 where a<50 or a>450; -- (Q1) only two partitions
2 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR
2 1 |--Constant Scan(VALUES:(((1)),((5))))
2 2 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d
The scan of the partitioned table becomes a Nested Loops join that loops over the partitions. We can already see in the Constant Scan that only partitions 1 and 5 will be scanned. The “Executes” value of 2 on the Table Scan confirms that two individual partitions were scanned.
The IN predicate “a in (50,450)” in Q2 is turned into “a = 50 OR a = 450”, and SQL Server accesses only the two partitions, 1 and 5, that can contain qualifying rows:
0 1 select * from t1 where a in (50,450); -- (Q2) only two partitions are sc
0 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR
0 2 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[dbo].[t
The third query, Q3,
select * from t1 where a<@i2 and a>100
yields
7 1 select * from t1 where a<@i2 and a>100; -- (Q3) only two partitions
7 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR
2 1 |--Filter(WHERE:([PtnIds1004]<=RangePartitionNew([@i2],(0)
4 1 | |--Constant Scan(VALUES:(((2)),((3)),((4)),((5))))
7 2 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d
In the above plan we see that partition 1 has been statically eliminated from the Constant Scan because of the a>100 predicate, and that the predicate a<@i2 is used to potentially eliminate more partitions through the Filter above the Constant Scan. The latter is dynamic elimination, because how many additional partitions are eliminated depends on the run-time value of @i2. By looking at the “Executes” of the Table Scan we see that again only 2 individual partitions are scanned.
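To see what the Filter in the Q3 plan works out to for a given run-time value, you can evaluate the partition function on the variable yourself. This is only a sketch of the arithmetic, assuming that $partition.PF1 returns the same partition number that RangePartitionNew produces in the plan; the column alias is my own.

```sql
declare @i2 int;
set @i2 = 250;

-- Partition number that the run-time value of @i2 maps to.
-- With boundaries (100,200,300,400) the value 250 maps to partition 3,
-- so of the candidates (2,3,4,5) in the Constant Scan only 2 and 3 pass
-- the Filter PtnIds <= 3, which matches the "Executes" value of 2.
select $partition.PF1(@i2) as [Partition For @i2];
```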
The query Q4
select * from t1 where a=100
has no Constant Scan at all. This is because SQL Server already knows at compile time which single partition will be accessed to retrieve the complete result. Here is the plan:
1 1 SELECT * FROM [t1] WHERE [a]=@1
1 1 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[dbo]. [t1].[a]=(100)) PARTITION ID:((1)))
Now compare the above with the plan for query Q5
select * from t1 where a=@i2.
Here too we know that only a single partition will be scanned, but we do not know which one at the time the query is compiled. Therefore both the WHERE predicate and the PARTITION ID are expressed in terms of the parameter:
0 1 select * from t1 where a=@i2; -- (Q5) only one partition is scanned
0 1 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[dbo].[t1].[a]=[@i2]) PARTITION ID:(RangePartitionNew([@i2],(0),(100),(200),(300),(400))))
All of the above partition elimination worked as expected, and we saw that SQL Server skipped as many partitions as possible. Now investigate the following cases:
set @i2=250;
select * from t1 where a<50 or a>@i2; -- (Q6)
select * from t1 where a<@i1 or a>@i2; -- (Q7)
select * from t1 where a in (@i1,@i2); -- (Q8)
Taking into consideration the boundary values (100,200,300,400) and the values @i1=50 and @i2=250, both Q6 and Q7 should be able to skip the second partition safely. However, the plans are
11 1 select * from t1 where a<50 or a>@i2 -- (Q6) all partitions are
11 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PAR
5 1 |--Constant Scan(VALUES:(((1)),((2)),((3)),((4)),((5))))
11 5 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d
and
11 1 select * from t1 where a<@i1 or a>@i2 -- (Q7) all partitions are
Q8 should scan only two partitions, 1 and 3, but it again scans all five:
0 1 select * from t1 where a in (@i1,@i2) -- (Q8)
0 5 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d
In all three cases SQL Server 2005 scans all partitions. Partition elimination in SQL Server 2005 does not work if the eliminating predicates are combined with OR and at least one of them is parameterized. Since an IN predicate with at least two elements in its list is transformed into an OR predicate, the same is true for IN lists that contain at least one parameter.
There are several possible workarounds, and I will show the most convenient one with the UNION ALL replacing the OR. In our case the query Q6 will become
select * from t1 where a<50
UNION ALL
select * from t1 where a>@i2
with the query plan
11 1 select * from t1 where a<50 UNION ALL select * from t1 where a
11 1 |--Concatenation
2 1 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d
9 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1011]
3 1 |--Filter(WHERE:([PtnIds1011]>=RangePartitionNew([@i2
5 1 | |--Constant Scan(VALUES:(((1)),((2)),((3)),((4))
9 3 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([tes
The query plan above has static elimination for the predicate a<50 and dynamic partition elimination for the predicate a>@i2. The first branch ends up accessing a single partition, and the second scans the last three partitions because of the @i2 value of 250.
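One caveat with replacing OR by UNION ALL: the two forms return the same rows only when no row can satisfy both predicates. With @i2=250 the ranges a<50 and a>@i2 cannot overlap, but for a value below 50 they do, and a plain UNION ALL would return the overlapping rows twice. The sketch below adds a guard to the second branch so that each row is returned once. The guard is my own addition, and I have not checked how it affects the plan, so verify partition elimination with SET STATISTICS PROFILE ON before relying on it.

```sql
declare @i2 int;
set @i2 = 30;   -- deliberately below 50 so that the two ranges overlap

-- Original Q6 form: every qualifying row is returned once (all 21 rows here)
select * from t1 where a<50 or a>@i2;

-- UNION ALL form with a guard: the second branch skips rows
-- that the first branch has already returned (again 21 rows)
select * from t1 where a<50
UNION ALL
select * from t1 where a>@i2 and a>=50;
```

Without the guard, the row with a=40 would appear in both branches for this value of @i2.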
The second query, Q7, will be rewritten to
select * from t1 where a<@i1
UNION ALL
select * from t1 where a>@i2
with the query plan
11 1 select * from t1 where a<@i1 UNION ALL select * from t1 where
2 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1010]
1 1 | |--Filter(WHERE:([PtnIds1010]<=RangePartitionNew([@i1
5 1 | | |--Constant Scan(VALUES:(((1)),((2)),((3)),((4))
2 1 | |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([tes
And the third query, Q8, is equivalent to
select * from t1 where a =@i1
UNION ALL
select * from t1 where a =@i2
0 1 select * from t1 where a =@i1 -- (Q7) UNION ALL select * fro
0 1 |--Concatenation
0 1 |--Table Scan(OBJECT:([test].[dbo].[t1]), WHERE:([test].[d
You can also use dynamic SQL to remove the parameters, by converting them to strings and concatenating them with the rest of the query. In SQL Server 2000 and earlier, using dynamic SQL inside a stored procedure caused a permission failure for users of the stored procedure who had no explicit access to the objects referenced in the dynamic query. In SQL Server 2005 this problem can be avoided by using the EXECUTE AS clause, which defines the execution context of the statement.
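A minimal sketch of the dynamic SQL variant for Q6 could look like the following. The procedure name is hypothetical, and EXECUTE AS OWNER is just one possible choice of execution context; pick the context that fits your security model.

```sql
-- Hypothetical procedure: embeds the parameter value as a literal so that
-- the optimizer sees only constants in the OR predicate.
create procedure dbo.usp_t1_outside_range
    @i2 int
with execute as owner
as
begin
    declare @sql nvarchar(200);
    -- @i2 is typed as int, so the cast can only produce digits and an
    -- optional sign; do not concatenate untrusted strings this way.
    set @sql = N'select * from dbo.t1 where a<50 or a>'
             + cast(@i2 as nvarchar(12)) + N';';
    exec (@sql);
end;
go

exec dbo.usp_t1_outside_range @i2 = 250;
```

The trade-off is that each distinct value produces a different statement text, so the server may compile and cache a separate plan for each value.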
Summary
I have shown how you can find out whether SQL Server eliminates partitions in your queries. Most of the time SQL Server does a great job of partition elimination, scanning or seeking only the partitions that could yield rows satisfying the WHERE, IN or other row-eliminating clauses. Even so, you should still check the plans of your queries against partitioned tables if you suspect that performance is, or may become, an issue. You should be aware that SQL Server employs both static partition elimination, performed when the query is optimized, and dynamic partition elimination, where the choice of scanned and skipped partitions is made at query execution time.
I have shown an example in which SQL Server does not eliminate all the partitions it could: a query with at least one parameter combined with an IN or OR clause. I have also shown a workaround that uses UNION ALL instead of the OR, after which SQL Server again eliminates all partitions that cannot yield any row satisfying the query. The SQL Server development team is planning to address this partition elimination problem in a future SQL Server service pack or release.
Hi,
Great article, though do you happen to know why the optimizer is NOT grabbing the correct partition when a @variable is used versus a direct value?
Funny thing is, the plan doesn't change between estimated and actual (you would think it would pick up the correct partition after the execution). In addition, if this is actually by design, how can you verify that the optimizer really is pulling the correct partition?
The optimizer cannot grab the correct partition since it does not know the value of the @variable at compile time. You are correct that the correct partition is picked up during execution. The plan stays the same during the execution: it is a read-only structure used to navigate the execution, much like your C# or VB program is, and only the flow of execution may differ from one execution to the next. You are also correct that we could explicitly report which partition has been visited in the showplan's "run time information" (see chapter 2 of "T-SQL Querying" from the "Inside SQL Server 2005" series if you are interested in more details), but you can still figure it out implicitly from the plan because only one branch is executed.
Lubor:
set @i1=120;
select * from t1 where a < @i1;
Any reason why the “Executes” value is 2 here, and not 1?
120 is the first value in Partition#2, thus <120 should only be Partition#1...
Thanks for the post, it has provided some much needed direction with Partitioned Tables.
....
I'm not sure what partition function you have in mind... In the one I have in the blog the first boundary value is 100, thus the predicate <120 must scan the first two partitions. Or post your partition function, scheme, and table so I can see what exactly you mean.
Ok, I see it now.
It's the boundary, not the rows which satisfy.
I was (errantly) thinking that because only rows in Part1 satisfied the <120 condition, it would only scan Part1. But now I see it has to scan Part2 for 101-119 as well (my condition overlapping with the boundary definition).
Thanks for the followup.
I think it would be interesting to see the breakdown of these types of situations:
declare @holder table ( hValue int )
insert into @holder (hValue) values ( 20 )
insert into @holder (hValue) values ( 40 )
insert into @holder (hValue) values ( 60 )
--Derived Table Example
select a, b from t1
join ( select hValue as HV from @holder h ) as derived1
on t1.a = derived1.HV
--EXISTS Example
select a, b from t1 tOne
where EXISTS
( select null from @holder h where h.hValue = tOne.a)
I'm still using your examples to learn how to fine-tune my eye for partition elimination.
Man, what would be nice is something like:
set partitions profile on; --<this does not exist
And then it would just show you:
Table PartitionTouches
t1 1
Because it gets cumbersome/confusing sometimes.
I guess I can dream.
But thank you again for the post...I'd be lost/frustrated without it.
Folks,
I have VLDB on SQL2000. Our primary table contains slightly less than 1B records, and I am anticipating 450M records per year.
We have implemented distributed view on 2000, and it works reasonably well. As we are migrating to SQL2005, I am not certain of the maturity of partitioning to migrate to that architecture, as well as having everything on a single server. My first reaction is to architect a hybrid solution, where I would use partitioning on the local servers, and then have a distributed view on top of all the partitions.
Any thoughts would be appreciated.
I'm Jungsun Kim, SQL Server MVP in Korea.
This is good article, I think.
(of course, too old. ^^)
I would like to introduce this article to my buddies.
so I just translated to Korean and posted in my blog,
"http://blog.naver.com/visualdb/50027176087"
Thank you.