Craig Freedman's SQL Server Blog

A discussion of query processing, query execution, and query plans in SQL Server.

The UNPIVOT Operator

The UNPIVOT Operator

Rate This
  • Comments 3

The UNPIVOT operator is the opposite of the PIVOT operator.  As I explained in my earlier post, the PIVOT operator takes a normalized table and transforms it into a new table where the columns of the new table are derived from the values in the original table.  The UNPIVOT operator takes a pivoted table and transforms it back into a normalized form with one row per data point using the column names as values in the result.  For example, suppose we have the following data:

CREATE TABLE PIVOT_Sales(EmpId INT, [2005] MONEY, [2006] MONEY, [2007] MONEY)
INSERT PIVOT_Sales VALUES(1, 12000, 18000, 25000)
INSERT PIVOT_Sales VALUES(2, 15000, 6000, NULL)
INSERT PIVOT_Sales VALUES(3, NULL, 20000, 24000)

This is the output of the PIVOT operation from my earlier posts.  There is one row for each employee with up to three years of sales data per row.  If there is no sales data for a particular employee for a particular year, we simply insert NULL.  We can transform this table back to its original form with a single row per employee per year using the following UNPIVOT statement:

SELECT EmpId, CAST (Yr AS INT) AS Yr, Sales
FROM (SELECT EmpId, [2005], [2006], [2007] FROM PIVOT_Sales) AS p
UNPIVOT (Sales FOR Yr IN ([2005], [2006], [2007])) AS s

I've explicitly casted the Yr column to integer.  The default type for the pivot column is NVARCHAR(128).  This type is based on the maximum column name length of 128 characters.

The resulting output is:

EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
2           2005        15000.00
2           2006        6000.00
3           2006        20000.00
3           2007        24000.00

Unlike PIVOT operations which may not be reversible, all UNPIVOT operations are reversible (so long as all of the input data is preserved).  That is, we can always transform the output of an UNPIVOT operation back into the original table using an appropriate PIVOT operation.  Unlike PIVOT operator, the UNPIVOT operator does not require or support aggregation functions.

Let's look at the plan for the above query:

  |--Compute Scalar(DEFINE:([Expr1010]=CONVERT(int,[Expr1009],0)))
       |--Filter(WHERE:([Expr1008] IS NOT NULL))
            |--Nested Loops(Left Outer Join, OUTER REFERENCES:([PIVOT_Sales].[2005], [PIVOT_Sales].[2006], [PIVOT_Sales].[2007]))
                 |--Compute Scalar(DEFINE:([PIVOT_Sales].[EmpId]=[PIVOT_Sales].[EmpId]))
                 |    |--Table Scan(OBJECT:([PIVOT_Sales]))
                 |--Constant Scan(VALUES:((N'2005',[PIVOT_Sales].[2005]),(N'2006',[PIVOT_Sales].[2006]),(N'2007',[PIVOT_Sales].[2007])))

This query plan simply takes each row of the input table and joins it with a constant scan that generates three rows - one for each of the three columns listed in the UNPIVOT IN clause - for each input row.  The plan then filters out any rows that have NULL data.  (Note that [Expr1008] is the Sales column and [Expr1009] is the Yr column.)  There are a couple of points worth noting about this query plan.  First, the join must be a nested loops join because the constant scan operator uses the correlated parameters from the outer side of the join to generate rows.  There is no way to generate these rows without these correlated parameters.  Second, the join need not be a left outer join.  The constant scan always produces exactly three rows and, thus, the outer rows always join and are never NULL extended.  Nevertheless, the outer join is harmless in this context and behaves like an inner join.

Note that we can write the original query as:

SELECT p.EmpId, Yr, Sales
FROM PIVOT_Sales AS p CROSS APPLY
    (
    SELECT EmpId, 2005 AS Yr, [2005] AS Sales UNION ALL
    SELECT EmpId, 2006, [2006] UNION ALL
    SELECT EmpId, 2007, [2007]
    ) AS s
WHERE Sales IS NOT NULL

This query yields a nearly identical query plan.  The UNION ALL syntax produces a similar result to the constant scan except that there are now three constant scans and a concatenation operator:

  |--Filter(WHERE:([Union1007] IS NOT NULL))
       |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[2005], [p].[2006], [p].[2007]))
            |--Table Scan(OBJECT:([tempdb].[dbo].[PIVOT_Sales] AS [p]))
            |--Concatenation
                 |--Constant Scan(VALUES:(((2005))))
                 |--Constant Scan(VALUES:(((2006))))
                 |--Constant Scan(VALUES:(((2007))))

In this plan, [Union1007] is the Sales column.  We can actually see the definition of [Union1007] from the DefinedValues column of the concatenation operator in the SET SHOWPLAN_ALL ON output:

[Union1006] = ([Expr1003], [Expr1004], [Expr1005]), [Union1007] = ([p].[2005], [p].[2006], [p].[2007])

Notice that the values for [Union1007] are actually derived directly from the correlated parameters of the cross apply (from the PIVOT_Sales table) and not from the constant scans.  [Union1006] is the Yr column and the values are derived from the constant scans.

  • Hi Craig,

    Thanks for writing this post. I have been trying to understand the constant scan for almost 1 month.

    On the other hand, I am wondering why this execution plan needs a Compute Scalar after the Table Scan?

    I ask this because I do not find any "scalar value" referenced later on within the query plan and according to BOL: "The Compute Scalar operator evaluates an expression to produce a computed scalar value. This may then be returned to the user, referenced elsewhere in the query, or both. An example of both is in a filter predicate or join predicate."

                    |--Compute Scalar(DEFINE:([PIVOT_Sales].[EmpId]=[PIVOT_Sales].[EmpId]))

                    |    |--Table Scan(OBJECT:([PIVOT_Sales]))

                    |--Constant Scan(VALUES:((N'2005',[PIVOT_Sales].[2005]),(N'2006',[PIVOT_Sales].[2006]),(N'2007',[PIVOT_Sales].[2007])))

  • This compute scalar is not really needed in this example.  It has no impact on function or performance since it just redefines a column as itself.  Note that the query does output EmpId.  If we modify the query to return a function of EmpId, you can see that the compute scalar actually computes the function:

    SELECT EId, CAST (Yr AS INT) AS Yr, Sales

    FROM (SELECT EmpId*10 AS EId, [2005], [2006], [2007] FROM PIVOT_Sales) AS p

    UNPIVOT (Sales FOR Yr IN ([2005], [2006], [2007])) AS s

  • Thank you!

    I have been searching for a straight-forward explanation of UNPIVOT forever!  So far as I can tell, yours is the only one that exists online...

Page 1 of 1 (3 items)
Leave a Comment
  • Please add 8 and 6 and type the answer here:
  • Post