SELECT from a view slower than "equivalent" SELECT from the base table

SELECT from a view slower than "equivalent" SELECT from the base table

Rate This
  • Comments 5

Sometime we get complaints that a query is slower than it could be because a filter isn’t pushed very deeply down into a plan.  For example, consider this hypothetical poor performance scenario (my apologies in advance for the lack of normalization):

 

USE tempdb

IF OBJECT_ID ('Sales') IS NOT NULL DROP TABLE Sales

IF OBJECT_ID ('SalesSummary') IS NOT NULL DROP VIEW SalesSummary

CREATE TABLE Sales (

  SalesPerson varchar(30) NOT NULL,

  SalesAmount money NOT NULL, Comments char(200))

INSERT INTO Sales VALUES ('Green ', 19011.87, '')

INSERT INTO Sales VALUES ('Green', 2478.42, '')

INSERT INTO Sales VALUES ('Green ', 1975.11, '')

INSERT INTO Sales VALUES ('White', 3007.01, '')

INSERT INTO Sales VALUES ('White', 5312.44, '')

INSERT INTO Sales VALUES ('Brown', 843.20, '')

CREATE INDEX idx ON Sales (SalesPerson, SalesAmount)

GO

CREATE VIEW SalesSummary AS

SELECT SalesPerson, SUM (SalesAmount) AS TotalSales

FROM Sales

GROUP BY SalesPerson

GO

 

SET STATISTICS PROFILE ON

GO

-- This query uses an index seek to retrieve only the rows

-- where SalesPerson = 'Green'

SELECT *

FROM SalesSummary

WHERE SalesPerson = 'Green'

GO

 

-- This query uses an index scan, then filters the rows later.

SELECT *

FROM SalesSummary

WHERE SalesPerson LIKE 'Green'

GO

SET STATISTICS PROFILE OFF

GO

 

The first query filters on “SalesPerson = 'Green'”.  The second filters on “SalesPerson LIKE 'Green'”.  Here’s the first query’s plan:

 

  |--Stream Aggregate(DEFINE:([Expr1004]=SUM([Sales].[SalesAmount]), [Sales].[SalesPerson]=ANY([Sales].[SalesPerson])))

       |--Index Seek(OBJECT:([Sales].[idx]), SEEK:([Sales].[SalesPerson]='Green') ORDERED FORWARD)

 

You can see that the first thing the plan does is a very efficient index seek to narrow the set of rows down to those that pass the “SalesPerson = ‘Green’” filter.  Then a Stream Aggregate operator computes the SUM(SalesAmount) expression for each SalesPerson returned by the index seek. 

 

The second plan, though, scans every row in the table and computes the SUM for every SalesPerson.  Only after it has scanned and aggregated every row does it filter out those values that don’t survive the LIKE predicate. 

 

  |--Filter(WHERE:([Sales].[SalesPerson] like 'Green'))

       |--Stream Aggregate(GROUP BY:([Sales].[SalesPerson]) DEFINE:([Expr1004]=SUM([Sales].[SalesAmount])))

            |--Index Scan(OBJECT:([Sales].[idx]), ORDERED FORWARD)

 

Here’s a general rule off thumb you can follow when looking for tuning opportunities in a query plan: for best performance, you usually want to push the most selective predicates as deeply as possible into the plan.  If you do the most selective operation first, the remaining operators have fewer rows to process, and that means faster overall query execution.  Based on this rule of thumb, the first plan here is clearly preferable from a performance perspective.  So what gives?  Why does the use of “LIKE” instead of “=” make SQL refuse to push the filter down?  Even more baffling, why do you get the more efficient plan when you bypass the view and select directly from the table, even if you use "LIKE"?  Run this and you’ll see what I mean:

 

SELECT SalesPerson, SUM (SalesAmount) AS TotalSales

FROM Sales

WHERE SalesPerson LIKE 'Green'

GROUP BY SalesPerson

 

This uses an efficient index seek-based plan, just like the first query. 

 

There are several things going on here:

  • Views must behave like a table
  • The “LIKE” and “=” operators use subtly different rules for string matching 
  • GROUP BY uses the same string comparison rules as “=” for the purposes of determining which rows end up in the same bucket 

What I mean by “views must behave like a table” is that the output of a select from a view must be the same as what you could get by materializing the view (e.g. selecting it into a temp table), then querying the materialized view.  In this case, the GROUP BY in the view could return a different total sales amount for a SalesPerson if SQL chose a plan that pushed a LIKE predicate below the Stream Aggregate.  Here’s proof:

 

SELECT *

FROM SalesSummary

WHERE SalesPerson LIKE 'Green '

 

-- Query 1 output:

SalesPerson                    TotalSales           

------------------------------ ---------------------

Green                          23465.4000

 

 

SELECT SalesPerson, SUM (SalesAmount) AS TotalSales

FROM Sales

WHERE SalesPerson LIKE 'Green '

GROUP BY SalesPerson

 

-- Query 2 output:

SalesPerson                    TotalSales           

------------------------------ ---------------------

Green                          20986.9800

 

The first query selects from the view, while the second moves the view logic into the query and selects directly from the base table.  Other than that, they are identical, yet the SUM(SalesAmount) calculation is different.  Recall that I mentioned that “=” and “LIKE” have different string comparison semantics.  In particular, LIKE considers trailing blanks in the right-hand operand to be significant, while the “=” operator ignores trailing blanks.  For the set of three rows with group ID “Green”, only two will qualify for the filter “WHERE SalesPerson LIKE ‘Green ‘” because they have trailing blanks.  The third “Green” row doesn’t have any trailing blanks and won’t survive the LIKE filter.  When you push this LIKE filter below the aggregate, you end up SUMming a different set of rows.  That’s not allowed if the GROUP BY is part of a view; if a WHERE clause applied to a view can change a property of a row instead of just filtering it out, it would mean that the view didn’t behave like a materialized table.  Put another way, a filter on a view that includes a GROUP BY is only allowed to eliminate entire groups; it's not legal for it to eliminate some base rows in a group but retain others, changing the group's membership.  It’s therefore by design that the slower scan-based plan is selected for the select from the view with LIKE.  In contrast, the query from the view with the "SalesPerson='Green'" filter can be pushed because GROUP BY uses the same string comparison rules as the "=" operator.  It's safe for the QO to assume that pushing the "=" filter below the Stream Aggregate will not change the view's semantics. 

 

This isn’t just about trailing blanks – you can see the attached script for a couple of examples that demonstrate the exact same thing (pushing LIKE below an aggregate changes the output of the aggregate) for a couple of interesting non-blank characters.  And it isn’t only about “LIKE” vs. “=”, either; this is just the example that was close at hand when I wrote this (we have a case open for this scenario right now). 

 

Finally, be aware that derived tables (and CTEs) also provide the same guarantee.  For example, note that this query selects from the base table but also does a full scan followed by filter, just like the above select from the view:

 

SELECT *

FROM (

   SELECT SalesPerson, SUM (SalesAmount) AS TotalSales

   FROM Sales

   GROUP BY SalesPerson) AS t

WHERE SalesPerson LIKE 'Green'

 

So, to net out all of this: Generally you want your filters to be pushed deep into the query plan -- as deeply as possible.  But when you’re selecting from a view, there will be some limits to what can be pushed.  Some filters can’t be pushed beneath parts of the view without changing the view’s semantics, and that would break a contract that SQL is required to maintain.  

 

 

UPDATE (2 March 2009): Fabiano Amorim pointed out that the Query #1 and Query #2 use the same plan on SQL 2008.  He's right; a new performance optimization causes the LIKE predicate to be pushed below the GROUP BY's aggregate operator.  I think this is actually a bug -- it does result in a faster plan, but it breaks the "views behave like tables" rule that SQL follows in all other cases.  The general rule stands: not all predicates can be pushed below a view's GROUP BY, even in SQL 2008 with this fairly aggressive performance optimization.  (And don't be surprised if this optimization gets removed in a future release. ;)

 

Attachment: like_views_nopush.sql
Leave a Comment
  • Please add 8 and 2 and type the answer here:
  • Post
  • Very interesting post.

    What I will remember from this post is that I should pay special attention to my execution plans each time that I'm using LIKE against a view. I have checked some of my current queries though (because I'm just happening to do that right now) and there the LIKE-clause seems to be pushed al the way to the bottom (I'm not using GROUP BY).

    Are there some simple rules that we can follow as in: if you make sure that you do this and this and this and never this, than your predicate will always be pushed to the bottom? Or is the guideline simply, if you're LIKE and performance is critical, don't use a view?
  • Unfortunately this isn't restricted to LIKE, or even to views with GROUP BY. Here's another example that shows how a TOP in a view can (actually, *must*) prevent pushing of an equality filter.  Notice that the two queries seem equivalent on the surface, but return different results!  The chance of different results is the reason that SQL can't push the filter predicate down into the view in this case.

    CREATE TABLE t1 (c1 int)
    INSERT INTO t1 VALUES (1)
    INSERT INTO t1 VALUES (2)
    GO
    CREATE VIEW v1 AS SELECT TOP 1 * FROM t1 ORDER BY c1
    GO
    SELECT TOP 1 * FROM v1 WHERE c1 = 2 -- filter not pushed
    SELECT TOP 1 * FROM t1 WHERE c1 = 2 -- filter pushed
    GO

    Check out KB 297469 for a more complete discussion of this particular scenario. (BTW, this is a good reason to avoid using TOP 100 PERCENT just so that you can stuff an ORDER BY inside a view.  The technique doesn't come for free -- better to leave ORDER BYs in the outer SELECT where they belong.)

    Use of the new ROW_NUMBER() function in SQL 2005 can have a similar effect.  If the filter was pushed down to the bottom of the tree, the row numbers assigned to the view's rows would change.  That would produce different results than a materialized view, which is illegal.

    DROP TABLE t1 DROP VIEW v1
    GO
    CREATE TABLE t1 (c1 int, c2 int)
    INSERT INTO t1 VALUES (1, 2)
    GO
    CREATE VIEW v1 AS
    SELECT *, ROW_NUMBER() OVER (ORDER BY c2) AS rn
    FROM t
    GO
    -- filter gets pushed
    SELECT *, ROW_NUMBER() OVER (ORDER BY c2) AS rn FROM t WHERE c1< 5
    -- filter not pushed
    SELECT * FROM v1 WHERE c1< 5
    GO

    So a full list of red flags would include GROUP BY, LIKE, ROW_NUMBER, and TOP, but that isn't a complete list.  I know from experience that OUTER JOIN in a view can also prevent predicate pushdown in some cases (I'm drawing a blank on an example right now, though... if you'd like to see an example script let me know and I'll see if I can put together a simple demo).  
  • I hadn't thought of that yet, but indeed it is logical that a view would behave differently than a base table in these cases. I have checked my current views an I appear to be using only simple stuff like simple columns, base-tables, INNER JOINs and UNIONs so I guess I'm in the clear.

    On a side note. This is information that I have been wondering about for a very long time but that you simply do not find in BOL. To me, this seems to be vital information though that I would expect to find in BOL under views|performance. Perhaps something that you can let your docwriters know. :-)

    Thank you for the detailed response!
  • EXCELLENT!  This is perfect for my current engagement.

  • Good article, thanks:)

Page 1 of 1 (5 items)