In my next few posts, I'm going to look at how SQL Server implements the PIVOT and UNPIVOT operators. Let's begin with the PIVOT operator. The PIVOT operator takes a normalized table and transforms it into a new table where the columns of the new table are derived from the values in the original table. For example, suppose we want to store annual sales data by employee. We might create a schema such as the following:

CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)

INSERT Sales VALUES(1, 2005, 12000)

INSERT Sales VALUES(1, 2006, 18000)

INSERT Sales VALUES(1, 2007, 25000)

INSERT Sales VALUES(2, 2005, 15000)

INSERT Sales VALUES(2, 2006, 6000)

INSERT Sales VALUES(3, 2006, 20000)

INSERT Sales VALUES(3, 2007, 24000)

Notice that this schema has one row per employee **per year**. Moreover, notice that in the sample data employees 2 and 3 only have sales data for two of the three years worth of data. Now suppose that we'd like to transform this data into a table that has one row per employee **with all three years of sales data in each row**. We can achieve this conversion very easily using PIVOT:

SELECT EmpId, [2005], [2006], [2007]

FROM (SELECT EmpId, Yr, Sales FROM Sales) AS s

PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p

I'm not going to delve into the PIVOT syntax which is already documented in Books Online. Suffice it to say that this statement sums up the sales for each employee for each of the specified years and outputs one row per employee. The resulting output is:

EmpId 2005 2006 2007 ----------- --------------------- --------------------- --------------------- 1 12000.00 18000.00 25000.00 2 15000.00 6000.00 NULL 3 NULL 20000.00 24000.00

Notice that SQL Server inserts NULLs for the missing sales data for employees 2 and 3.

The SUM keyword (or some other aggregate) is required. If the *Sales* table includes multiple rows for a particular employee for a particular year, PIVOT does aggregate them - in this case by summing them - into a single data point in the result. Of course, in this example, since the entry in each "cell" of the output table is the result of summing a single input row, we could just as easily have used another aggregate such as MIN or MAX. I've used SUM since it is more intuitive.

This PIVOT example is reversible. The information in the output table can be used to reconstruct the original input table using an UNPIVOT operation (which I will cover in a later post). However, not all PIVOT operations are reversible. To be reversible, a PIVOT operation must meet the following criteria:

- All of the input data must be transformed. If we include a filter of any kind including on the IN clause, some data may be omitted from the PIVOT result. For example, if we altered the above example only to output sales for 2006 and 2007, clearly we could not reconstruct the 2005 sales data from the result.
- Each cell in the output table must derive from a single input row. If multiple input rows are aggregated into a single cell, there is no way to reconstruct the original input rows.
- The aggregate function must be an identity function (when used on a single input row). SUM, MIN, MAX, and AVG all return the single input value unchanged and, thus, can be reversed. COUNT does not return its input value unchanged and, thus, cannot be reversed.

Here is an example of a non-reversible PIVOT operation. This example, calculates the total sales for all employees for all three years. It does not itemize the output by employee.

SELECT [2005], [2006], [2007]

FROM (SELECT Yr, Sales FROM Sales) AS s

PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p

Here is the output. Each cell represents the sum of two or three rows from the input table.

2005 2006 2007 --------------------- --------------------- --------------------- 27000.00 44000.00 49000.00

In my next post, I'll look at some example PIVOT query plans.