Craig Freedman's SQL Server Blog

A discussion of query processing, query execution, and query plans in SQL Server.

Recursive CTEs continued ...

Recursive CTEs continued ...

  • Comments 8
In this post, I will finish the discussion of recursive CTEs that I began in my last post.  I will continue to use the CTE examples from Books Online.  To run these examples, you'll need to install the Adventure Works Cycles OLTP sample database.

In my last post, I explained that all recursive queries follow the same pattern of one or more anchor sub-selects and one or more recursive sub-selects combined by a UNION ALL.  Similarly, all recursive query plans also follow the same pattern which looks like so:

  |--Index Spool(WITH STACK)
       |--Concatenation
            |--Compute Scalar(DEFINE:([Expr10XX]=(0)))
            |    |--... anchor sub-select plan(s) ...
            |--Assert(WHERE:(CASE WHEN [Expr10ZZ]>(100) THEN (0) ELSE NULL END))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr10YY], [Recr10XX], ...))
                      |--Compute Scalar(DEFINE:([Expr10ZZ]=[Expr10YY]+(1)))
                      |    |--Table Spool(WITH STACK)
                      |--... recursive sub-select plan(s) ...

Because of this basic plan shape, SQL Server cannot execute recursive queries using parallel plans.  There are two basic problems.  First, the stack spool does not support parallel execution.  Second, the nested loops join does not support parallel execution on its inner input.

Finally, let's look at how the placement of a WHERE clause can have a big impact on the performance of a recursive query.  In my last post, I dissected a recursive query that returns a list of all employees and their level within the organization.  Suppose that we want to run the same query but limit the results to those employees in the first two levels.  We could write the following query (which you'll also find in Books Online) with an extra WHERE clause to limit the results:

WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports
WHERE EmployeeLevel <= 2

This query computes uses essentially the same plan as the query without the extra WHERE clause.  The WHERE clause simply introduces a filter operator at the root of the plan:

Rows   Executes
34     1        |--Filter(WHERE:([Recr1012]<=(2)))
290    1             |--Index Spool(WITH STACK)
290    1                  |--Concatenation
0      0                       |--Compute Scalar(DEFINE:([Expr1013]=(0)))
0      0                       |    |--Compute Scalar(DEFINE:([Expr1003]=(0)))
1      1                       |         |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=NULL) ...)
289    1                       |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
289    1                            |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
0      0                                 |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
290    1                                 |    |--Table Spool(WITH STACK)
0      0                                 |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
289    290                                    |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=[Recr1007]) ...)

As you can see from the row counts, this query computes the level of every employee in the organization before discarding the results that we do not want.  Now, suppose that we instead move the WHERE clause into the recursive sub-select:

WITH DirectReports(ManagerID, EmployeeID, EmployeeLevel) AS
(
    SELECT ManagerID, EmployeeID, 0 AS EmployeeLevel
    FROM HumanResources.Employee
    WHERE ManagerID IS NULL
    UNION ALL
    SELECT e.ManagerID, e.EmployeeID, EmployeeLevel + 1
    FROM HumanResources.Employee e
        INNER JOIN DirectReports d
        ON e.ManagerID = d.EmployeeID 
    WHERE EmployeeLevel < 2
)
SELECT ManagerID, EmployeeID, EmployeeLevel
FROM DirectReports

This query is semantically identical to the previous query, but if we look at the plan, we can see that the filter moves into the recursive portion of the plan:

Rows   Executes
34     1        |--Index Spool(WITH STACK)
34     1             |--Concatenation
0      0                  |--Compute Scalar(DEFINE:([Expr1013]=(0)))
0      0                  |    |--Compute Scalar(DEFINE:([Expr1003]=(0)))
1      1                  |         |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=NULL) ...)
33     1                  |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
33     1                       |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
0      0                            |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
34     1                            |    |--Table Spool(WITH STACK)
0      0                            |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
33     34                                |--Nested Loops(Inner Join)
7      34                                     |--Filter(WHERE:(STARTUP EXPR([Recr1008]<(2))))
7      7                                      |    |--Constant Scan
33     7                                      |--Index Seek(OBJECT:([Employee].[IX_Employee_ManagerID]), SEEK:([ManagerID]=[Recr1007]) ...)

Notice from the row counts that the new plan computes only the portion of the result set that really interests us and then terminates the recursion.

You may also observe than the plan now has a startup filter.  An ordinary filter reads rows from its input tree and then evaluates a Boolean expression on each row to determine whether to return it.  A startup filter evaluates its expression only once per execution.  The expression does not depend on the input rows.  If the expression evaluates to true, the startup filter returns all of its input rows.  If the expression evaluates to false, the startup filter returns no rows.

In this example, if the startup filter expression evaluates to true, the constant scan returns one row to the nested loops join immediately above the filter and this join then executes the index seek on its inner input.  However, if the startup filter expression evaluates to false, the filter and then the nested loops join return no rows.

While there are many scenarios where a startup filter is genuinely useful, this plan is slightly more complex than is really necessary.  The plan could have used an ordinary filter:

  |--Index Spool(WITH STACK)
       |--Concatenation
            |--Compute Scalar(DEFINE:([Expr1013]=(0)))
            |    |--Compute Scalar(DEFINE:([Expr1003]=(0)))
            |         |--Index Seek(OBJECT:([HumanResources].[Employee].[IX_Employee_ManagerID]), SEEK:([HumanResources].[Employee].[ManagerID]=NULL) ORDERED FORWARD)
            |--Assert(WHERE:(CASE WHEN [Expr1015]>(100) THEN (0) ELSE NULL END))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1015], [Recr1006], [Recr1007], [Recr1008]))
                      |--Filter(WHERE:([Recr1008]<(2)))
                      |    |--Compute Scalar(DEFINE:([Expr1015]=[Expr1014]+(1)))
                      |         |--Table Spool(WITH STACK)
                      |--Compute Scalar(DEFINE:([Expr1009]=[Recr1008]+(1)))
                           |--Index Seek(OBJECT:([HumanResources].[Employee].[IX_Employee_ManagerID] AS [e]), SEEK:([e].[ManagerID]=[Recr1007]) ORDERED FORWARD)

Note that this plan is artificial.  You cannot get SQL Server to generate it.
  • You've discussed how adding the filter inside the recursive portion of the plan, but what about getting a filter on the CTE to apply to the recursive temporary results rather than the completed results?  Is there a method to make a CTE expand essentially to evaluate the criteria on the CTE as part of the recursive operation (thereby limiting the recursion)?

  • SQL Server will push filters down to the anchor sub-select in certain very limited scenarios.  For example:

    with cte(a, b) as

    (

    select a, b from t

    union all

    select cte.a, t.b from t join cte on t.a = cte.a

    )

    select * from cte where a = 0

    The filter "a = 0" will push down.  Unfortunately, if you change the recursive sub-select to return t.a instead of cte.a, the filter will not pushdown even though the columns are equivalent.

  • Can I use this "WITH statement" inside a subquery?

  • Hi Deb,

    The WITH statement can only be used at the start of a statement and cannot be embedded within a subquery.  However, you can use the WITH statement to declare multiple comma separated CTEs and you can reference a CTE within a subquery.  See the following pages for more information:

    msdn.microsoft.com/.../ms190766(v=sql.105).aspx

    msdn.microsoft.com/.../ms175972(v=sql.110).aspx

    HTH

    Craig

  • Hi,

    I need also the posibility to sort the result on each level. In Oracle the statement is called "order  siblings by".

    I have a tree similar to the directory tree of the Microsoft Explorer and I have to sort each directory on each level

    Uwe

  • Hi Uwe,

    I would recommend looking at Inside Microsoft SQL Server 2008: T-SQL Querying, Chapter 12 (Graphs, Trees, Hierarchies, and Recursive Queries) [www.amazon.com/.../0735626030] for some ideas as to how to solve this problem.  An earlier version of this chapter is also available in the SQL Server 2005 edition of the book as Chapter 9 [www.amazon.com/.../0735623139].

    HTH

    Craig

  • i'm trying to translate this oracle query that, given an employee (i.e JONES, using the oracle scott.emp table) will return his ancestors (KING) and his descendants (FORD, SCOTT and their descendants) and provide the level for each

    WITH union_results AS

    (

    SELECT LEVEL AS lvl

    , ename

    , 'Bottom-Up' AS direction

    FROM scott.emp

    START WITH ename = 'JONES'

    CONNECT BY NOCYCLE empno = PRIOR mgr

       UNION

    SELECT LEVEL AS lvl

    , ename

    , 'Top-Down' AS direction

    FROM scott.emp

    START WITH ename =  'JONES'

    CONNECT BY NOCYCLE mgr = PRIOR empno

    )

    SELECT DISTINCT

     (SELECT MAX (lvl) FROM union_results WHERE direction = 'Bottom-Up')

     + ( CASE

             WHEN  direction = 'Bottom-Up'

           THEN  1 - lvl

           ELSE lvl - 1

         END

       ) AS rev_level

    ,  ename

    FROM  union_results

    ORDER BY  rev_level;

    REV_LEVEL ENAME

    ----------           ----------

    1                     KING

    2                    JONES

    3                    FORD

    3                    SCOTT

    4                    ADAMS

    4                    SMITH

  • Hi Ion,

    I'm not familiar with the Oracle syntax, but perhaps the following would work?

    WITH

    reports (empno, ename, mgr, lvl) AS

    (

    SELECT empno, ename, mgr, 0 as lvl

    FROM emp

    WHERE ename = 'JONES'

    UNION ALL

    SELECT e.empno, e.ename, e.mgr, lvl + 1

    FROM emp e INNER JOIN reports r ON e.mgr = r.empno

    ),

    mgrs (empno, ename, mgr, lvl) AS

    (

    SELECT empno, ename, mgr, 0 as lvl

    FROM emp

    WHERE ename = 'JONES'

    UNION ALL

    SELECT e.empno, e.ename, e.mgr, lvl - 1

    FROM emp e INNER JOIN mgrs m ON e.empno = m.mgr

    )

    SELECT lvl - (SELECT MIN(lvl) FROM mgrs) + 1 AS rev_level, ename

    FROM (SELECT lvl, ename FROM reports UNION SELECT lvl, ename FROM mgrs) u

    ORDER BY lvl

    go

    HTH

    Craig

Page 1 of 1 (8 items)
Leave a Comment
  • Please add 8 and 4 and type the answer here:
  • Post