Craig Freedman's SQL Server Blog

A discussion of query processing, query execution, and query plans in SQL Server.

Implied Predicates and Query Hints

Implied Predicates and Query Hints

  • Comments 9

In this post, I want to take a look at how two seemingly unrelated features of SQL Server can interact to cause a problem.  The idea for this post came from a question submitted by a reader.  Let's begin.  Consider the following trivial schema and query:

CREATE TABLE T1 (A INT, B INT)
CREATE TABLE T2 (A INT, B INT)

SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.B = 0
OPTION (HASH JOIN)

Not surprisingly, this query yields the following plan:

  |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]), RESIDUAL:([T2].[A]=[T1].[A]))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[B]=(0)))
       |--Table Scan(OBJECT:([T2]))

In fact, this query yields this plan with or without the hint.  Now let's make a small modification to the WHERE clause of the query see what happens:

SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A = 0
OPTION (HASH JOIN)

Now this query yields the following error message:

Msg 8622, Level 16, State 1, Line 1

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

What happened?  Why does this seemingly innocuous change to the query cause it to fail?  To find the answer, let's run the query without the HASH JOIN hint and look at the plan:

SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A = 0

  |--Nested Loops(Inner Join)
       |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]=(0)))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

There are two things about this plan that are especially notable:

First, the plan includes the predicate "T2.A = 0".  Although we did not specify this predicate in the original query, SQL Server derives it from the predicates that we did specify.  This derivation is a good thing.  It allows SQL Server to filter out rows from the scan of T2 earlier than would otherwise be possible.

Second, the original equijoin predicate "T1.A = T2.A" appears nowhere in this plan.  This predicate is redundant with the original predicate "T1.A = 0" and the derived predicate "T2.A = 0" so SQL Server eliminates it.  Normally, eliminating a redundant predicate is a good thing.  By the time the rows from the scans reach the join, there is no reason to evaluate this predicate.  It would be a waste of time.  Unfortunately, in this case, the eliminated predicate also happens to be the only equijoin predicate and hash joins (and merge joins) require at least one equijoin predicate.  Thus, the query with the HASH JOIN hint fails.

Note that the loss of the hash join and merge join alternatives for the above plan is not a big deal from a performance perspective.  With or without the join predicate, the query is a cross join since all rows from T1 will join with all rows from T2.

If we throw in a third, briefly lived feature of SQL Server 2008, the situation gets even more complex.  SQL Server 2008 RTM has an optimization that substitutes constants for parameter and variable values if we use the RECOMPILE hint.  Note that this optimization was removed from SQL Server 2008 Cummulative Update 4 and Service Pack 1 to fix an issue.  If your instance of SQL Server has the fix, you will not be able to reproduce the following scenario.

Let's see a simple example of this optimization in action.  Compare the plans for the following two identical queries:

DECLARE @P INT
SET @P = 0

SELECT * FROM T1 WHERE T1.A = @P

  |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=[@P]))

SELECT * FROM T1 WHERE T1.A = @P
OPTION (RECOMPILE)

  |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

Observe how the WHERE clause in the first plan references the @P variable while the WHERE clause in the second plan which includes the RECOMPILE hint references the constant 0.  This optimization is safe since the plan generated for the query with the RECOMPILE hint will be used only once and then discarded rather than cached.

The preceding plans are "actual" rather than "estimated" query plans.  That is, I collected them using SET STATISTICS PROFILE ON rather than SET SHOWPLAN_TEXT ON.  The parameter substitution optimization can only be applied when the query is recompiled immediately prior to execution and the actual parameter values are known.

Now let's create a simple stored procedure:

CREATE PROCEDURE MY_SP (@P1 INT, @P2 INT)
AS
SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A BETWEEN @P1 AND @P2
OPTION (HASH JOIN, RECOMPILE)

If we run this stored procedure with two different parameters, it works just fine:

EXEC MY_SP 0, 1

Here is the query plan:

  |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]), RESIDUAL:([T2].[A]=[T1].[A]))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]>=(0) AND [T1].[A]<=(1)))
       |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]>=(0) AND [T2].[A]<=(1)))

However, if we run the stored procedure with the same parameter, it fails with the same error that we saw at the beginning of this post.  For example, the following fails:

EXEC MY_SP 0, 0

Msg 8622, Level 16, State 1, Line 1

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Why does the stored procedure work with some parameters but not with others?  If we alter the stored to remove the HASH JOIN hint, we can see what has happened:

ALTER PROCEDURE MY_SP (@P1 INT, @P2 INT)
AS
SELECT *
FROM T1 INNER JOIN T2 ON T1.A = T2.A
WHERE T1.A BETWEEN @P1 AND @P2
OPTION (RECOMPILE)

EXEC MY_SP 0, 0

  |--Nested Loops(Inner Join)
       |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]=(0)))
       |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

SQL Server has transformed the predicate "T1.A BETWEEN @P1 AND @P2" first into "T1.A BETWEEN 0 AND 0" and then into "T1.A = 0".  Now we have exactly the same scenario that we had in the first failed query above.

Query hints can be powerful tools, but they can also backfire - often in mysterious ways.  In these examples, two and even three separate features all interact to induce the error.  Even a small change to a query or to a stored procedure's parameters can be the difference between success and failure.
  • Big thanks for your essential posts! It's very useful information for SQL developers. I set label "IMPORTANT" for each your article in my Outlook :))

  • I agree that understanding this is important.

    You say that "Even a small change to a query or to a stored procedure's parameters can be the difference between success and failure."  

    To me, this would really indicate that the compiler is actually rather unstable.  Let's say that the word "robust" does not come to mind.

    Nevertheless, thank you for your observations.

  • Nice paper.

    However, how do I get this to work?

    Thanks

    select *

    from table1

    inner join table2 on (

    (

    LTRIM(rtrim(table2.firstname)) like LTRIM(rtrim(table1.firstname))

    or LTRIM(rtrim(table2.middlename)) like LTRIM(rtrim(table1.firstname))

    or LTRIM(rtrim(table2.lastname)) like LTRIM(rtrim(table1.firstname))

    or LTRIM(rtrim(table2.wholename)) like ('%'+LTRIM(rtrim(table1.firstname))+'%')

    )

    and

    (

    LTRIM(rtrim(table2.firstname)) like LTRIM(rtrim(table1.lastname))

    or LTRIM(rtrim(table2.middlename)) like LTRIM(rtrim(table1.lastname))

    or LTRIM(rtrim(table2.lastname)) like LTRIM(rtrim(table1.lastname))

    or LTRIM(rtrim(table2.wholename)) like ('%'+LTRIM(rtrim(table1.lastname))+'%')

    )

    )

    option (hash join,recompile)

  • Hash and merge join both require equijoin predicates.  (See the table in this post blogs.msdn.com/.../702828.aspx.)  Since the above query does note have an equijoin predicate, you must remove the hash join hint and allow the optimizer to choose a nested loops join.  Unfortunately, if the two tables are large, this query may take some time as it will effectively compare all rows in one table to all rows in the other table.

  • @ W Moore.  Perhaps I chose my words poorly.  In the last sentence of this post, I was referring to how hints can prevent the optimizer from finding a plan if the hints request a plan that is not valid for the given query.  Small differences in a query (e.g., changing "T1.a = T2.b" to "T1.a > T2.b") can change the set of valid plans and cause a hint that worked into a hint that causes an error.  In the absence of hints, SQL Server will find and execute a plan for any valid query.  This is why hints should be used cautiously.

  • This issue is fixed in SQL Server 2012 RC0.

    I tested this scenario and correct action is now chosen by Query Optimizer :  Invalid hint is ignored and query uses Nested Loops.

    Thanks again for highlighting this peculiar case.

  • Hi craige,

    can you help me ?

    there are two TSQL sentences :sentence 1 and sentence 2

    The sentence 1 can be excuted successfully ,but there will be throw an exception when I excuted the sentence 2,why?

    --sentence 1

    use [AdventureWorks]

    go

    DECLARE @c VARCHAR(10)

    SET @c = ''

    SELECT CustomerID

    FROM dbo.Customers AS C

    WHERE CustomerID IN (SELECT  @c + CustomerID FROM dbo.Orders)

    OPTION (HASH JOIN, RECOMPILE)

    --sentence 2

    use [AdventureWorks]

    go

    DECLARE @c VARCHAR(10)

    SET @c = ''

    SELECT CustomerID

    FROM dbo.Customers AS C

    WHERE CustomerID IN (SELECT  @c + CustomerID FROM dbo.Orders)

    OPTION (HASH JOIN)

  • Hi,

    The first query succeeds because as noted in the above post, with the recompile hint, the optimizer substitutes the empty string constant for the variable @c.  You can see the substitution by checking the compute scalar in the plan:

     |--Hash Match(Left Semi Join, HASH:([C].[CustomerID])=([Expr1006]), RESIDUAL:([Northwind].[dbo].[Customers].[CustomerID] as [C].[CustomerID]=[Expr1006]))

          |--Index Scan(OBJECT:([Northwind].[dbo].[Customers].[Region] AS [C]))

          |--Compute Scalar(DEFINE:([Expr1006]=N''+[Northwind].[dbo].[Orders].[CustomerID]))

               |--Index Scan(OBJECT:([Northwind].[dbo].[Orders].[CustomersOrders]))

    Without the recomile hint, the optimizer cannot do the substitution.  I believe it then incorrectly concludes that @c is not a runtime constant and further concludes that it cannot safely use the hash join since it would need to reevaluate the orders subquery for each row from the customers table.  I'm afraid that I'm not aware of any workaround other than to remove hash join hint.

    Craig

  • Further to Luke's comment it does now work on 2012 RTM but doesn't just ignore the hint. The plans for the two queries are

     |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]), RESIDUAL:([T2].[A]=[T1].[A]))

          |--Table Scan(OBJECT:([T1]), WHERE:([T1].[B]=(0)))

          |--Table Scan(OBJECT:([T2]))

     |--Hash Match(Inner Join, HASH:([T1].[A])=([T2].[A]), RESIDUAL:([T2].[A] = [T1].[A]))

          |--Table Scan(OBJECT:([T1]), WHERE:([T1].[A]=(0)))

          |--Table Scan(OBJECT:([T2]), WHERE:([T2].[A]=(0)))

Page 1 of 1 (9 items)
Leave a Comment
  • Please add 7 and 3 and type the answer here:
  • Post