<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Craig Freedman's SQL Server Blog : Aggregation</title><link>http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx</link><description>Tags: Aggregation</description><dc:language>en</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Query Processing Presentation</title><link>http://blogs.msdn.com/craigfr/archive/2008/05/15/query-processing-presentation.aspx</link><pubDate>Thu, 15 May 2008 19:39:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8508493</guid><dc:creator>craigfr</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/8508493.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=8508493</wfw:commentRss><description>&lt;P mce_keep="true"&gt;Last week, I had the opportunity to &lt;A class="" title="[New England] NESQL Special Meeting, Featuring Craig Freedman" href="http://sqlblog.com/blogs/adam_machanic/archive/2008/05/02/new-england-nesql-special-meeting-featuring-craig-freedman.aspx" mce_href="http://sqlblog.com/blogs/adam_machanic/archive/2008/05/02/new-england-nesql-special-meeting-featuring-craig-freedman.aspx"&gt;talk&lt;/A&gt; to the &lt;A title="New England SQL Server Users Group" href="http://www.nesql.org/"&gt;New England SQL Server Users Group&lt;/A&gt;.&amp;nbsp; I would like to thank the group for inviting me, &lt;A title="Adam Machanic" href="http://sqlblog.com/blogs/adam_machanic/"&gt;Adam Machanic&lt;/A&gt; for organizing the event, and &lt;A title="Red Gate" href="http://www.red-gate.com/"&gt;Red Gate&lt;/A&gt; for sponsoring it.&amp;nbsp; My talk was an introduction to query processing, query execution, 
and query plans in SQL Server.&amp;nbsp; I've had a request for the slides, so here they are.&amp;nbsp; Enjoy!&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8508493" width="1" height="1"&gt;</description><enclosure url="http://blogs.msdn.com/craigfr/attachment/8508493.ashx" length="759608" type="application/pdf" /><category domain="http://blogs.msdn.com/craigfr/archive/tags/Scans+and+Seeks/default.aspx">Scans and Seeks</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Joins/default.aspx">Joins</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Updates/default.aspx">Updates</category></item><item><title>Ranking Functions: RANK, DENSE_RANK, and NTILE</title><link>http://blogs.msdn.com/craigfr/archive/2008/03/31/ranking-functions-rank-dense-rank-and-ntile.aspx</link><pubDate>Tue, 01 Apr 2008 00:04:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:8346526</guid><dc:creator>craigfr</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/8346526.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=8346526</wfw:commentRss><description>&lt;P&gt;In &lt;A title="Ranking Functions: ROW_NUMBER" href="http://blogs.msdn.com/craigfr/archive/2008/03/19/ranking-functions-row-number.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2008/03/19/ranking-functions-row-number.aspx"&gt;my previous post&lt;/A&gt;, I discussed the ROW_NUMBER ranking function which was introduced in SQL Server 2005.&amp;nbsp; In this post, I'll take a look at the other ranking functions - RANK, DENSE_RANK, and NTILE.&amp;nbsp; Let's begin with RANK and DENSE_RANK.&amp;nbsp; These functions are similar - both in functionality and implementation - to ROW_NUMBER.&amp;nbsp; The difference is that while the ROW_NUMBER function assigns a unique (and 
ascending) value to each row without regard for ties in the ORDER BY values, the RANK and DENSE_RANK functions assign the same value to rows that have the same ORDER BY value.&amp;nbsp; The difference between the RANK and DENSE_RANK functions is in how values are assigned to rows following a tie.&amp;nbsp; The easiest way to illustrate the difference between all of these functions is with a simple example:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE T (PK INT IDENTITY, A INT, B INT, C INT)&lt;BR&gt;CREATE UNIQUE CLUSTERED INDEX TPK ON T(PK)&lt;/P&gt;
&lt;P mce_keep="true"&gt;INSERT T VALUES (0, 1, 6)&lt;BR&gt;INSERT T VALUES (0, 1, 4)&lt;BR&gt;INSERT T VALUES (0, 3, 2)&lt;BR&gt;INSERT T VALUES (0, 3, 0)&lt;BR&gt;INSERT T VALUES (1, 0, 7)&lt;BR&gt;INSERT T VALUES (1, 0, 5)&lt;BR&gt;INSERT T VALUES (0, 2, 3)&lt;BR&gt;INSERT T VALUES (0, 2, 1)&lt;/P&gt;
&lt;P&gt;SELECT *,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ROW_NUMBER() OVER (ORDER BY B) AS RowNumber,&lt;BR&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;RANK() OVER (ORDER BY B) AS Rank,&lt;BR&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;DENSE_RANK() OVER (ORDER BY B) AS DenseRank&lt;BR&gt;FROM T&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;PK    A     B     C     RowNumber  Rank       DenseRank
----- ----- ----- ----- ---------- ---------- ----------
5     1     0     7     1          1          1
6     1     0     5     2          1          1
1     0     1     6     3          3          2
2     0     1     4     4          3          2
7     0     2     3     5          5          3
8     0     2     1     6          5          3
3     0     3     2     7          7          4
4     0     3     0     8          7          4&lt;/PRE&gt;
&lt;P&gt;Notice how the ROW_NUMBER function ignores the duplicate values for column B and assigns the unique integers from 1 to 8 to the 8 rows while the RANK and DENSE_RANK functions assign the same value to each of the pairs of duplicate rows.&amp;nbsp; Moreover, notice how the RANK function counts the duplicate rows even while it assigns the same value to each duplicate row whereas the DENSE_RANK function does not count the duplicate rows.&amp;nbsp; For example, both the RANK and DENSE_RANK functions assign a rank of 1 to the first two rows, but the RANK function assigns a rank of 3 to the third row - as it is the third row - while the DENSE_RANK function assigns a rank of 2 to the third row - as it contains the second distinct value for column B.&amp;nbsp; Note that the maximum value returned by the DENSE_RANK function is exactly equal to the number of distinct values in column B.&lt;/P&gt;
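&lt;P&gt;To make the three definitions concrete, here is a minimal Python sketch (not SQL Server code) that recomputes the RowNumber, Rank, and DenseRank columns for the column B values from the example.&amp;nbsp; The variable names are my own and are only for illustration.&lt;/P&gt;
&lt;PRE&gt;
```python
# Column B values from the example table, listed in PK order 1..8.
b_values = [1, 1, 3, 3, 0, 0, 2, 2]

ordered = sorted(b_values)          # the rows in ORDER BY B order
distinct = sorted(set(b_values))    # the distinct B values in order

rows = []
for position, b in enumerate(ordered, start=1):
    row_number = position                # unique value, ties are ignored
    rank = ordered.index(b) + 1          # position of the first row of the tie
    dense_rank = distinct.index(b) + 1   # ordinal of the distinct value
    rows.append((b, row_number, rank, dense_rank))

for row in rows:
    print(row)
```
&lt;/PRE&gt;
&lt;P&gt;The printed tuples match the B, RowNumber, Rank, and DenseRank columns of the result above, and the largest DenseRank equals the number of distinct values in column B.&lt;/P&gt;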
&lt;P&gt;The query plan for computing all three of these functions is essentially the same and, in fact, as long as all three of the functions share the same PARTITION BY and ORDER BY clauses, a single sequence project operator can compute all three functions simultaneously:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Sequence Project(DEFINE:([Expr1003]=row_number, [Expr1004]=rank, [Expr1005]=dense_rank))&lt;BR&gt;&lt;B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[B])]&lt;BR&gt;&lt;/B&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:()]&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[B] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[TPK]))&lt;/P&gt;
&lt;P&gt;The only real difference between this plan and a plan that computes only the ROW_NUMBER function is the presence of the extra segment operator that groups by column B.&amp;nbsp; This segment operator detects ties and notifies the sequence project operator when to output the same value and when to output a new value for the RANK and DENSE_RANK functions.&lt;/P&gt;
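&lt;P&gt;The interaction between the segment and sequence project operators can be sketched in Python as a pair of generators.&amp;nbsp; This is only an illustration under my own assumptions: the names segment and sequence_project are hypothetical, and the sketch models only the segment that groups by column B, not the outer segment with the empty GROUP BY list.&lt;/P&gt;
&lt;PRE&gt;
```python
def segment(rows, key):
    # Pass each row through with a flag that is True when the row starts
    # a new group. The input must already be sorted on the key.
    previous = object()  # sentinel that never equals a real key value
    for row in rows:
        current = key(row)
        yield row, current != previous
        previous = current


def sequence_project(flagged_rows):
    # Maintain three counters and use the flag to decide when the
    # RANK and DENSE_RANK values change.
    row_number = rank = dense_rank = 0
    for row, new_group in flagged_rows:
        row_number += 1
        if new_group:
            rank = row_number   # RANK jumps to the current row number
            dense_rank += 1     # DENSE_RANK advances by exactly one
        yield row + (row_number, rank, dense_rank)


# (PK, A, B, C) rows from the example table.
table = [
    (1, 0, 1, 6), (2, 0, 1, 4), (3, 0, 3, 2), (4, 0, 3, 0),
    (5, 1, 0, 7), (6, 1, 0, 5), (7, 0, 2, 3), (8, 0, 2, 1),
]
sorted_rows = sorted(table, key=lambda r: r[2])   # Sort(ORDER BY B)
result = list(sequence_project(segment(sorted_rows, key=lambda r: r[2])))
for row in result:
    print(row)
```
&lt;/PRE&gt;
&lt;P&gt;Each operator sees one row at a time, which is why a single pass over the sorted input is enough to compute all three functions.&lt;/P&gt;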
&lt;P&gt;All of the issues from my previous post on ROW_NUMBER (such as the use of an index to avoid the sort or the impact of mixing ranking functions with different PARTITION BY and/or ORDER BY clauses) apply equally well to the RANK and DENSE_RANK functions so I will not repeat them here.&lt;/P&gt;
&lt;P&gt;Now let's take a look at the NTILE function.&amp;nbsp; This function breaks an input set down into N groups of equal size (or as nearly equal as possible when the number of rows is not divisible by N).&amp;nbsp; To determine how many rows belong in each group, SQL Server must first determine the total number of rows in the input set.&amp;nbsp; If the NTILE function includes a PARTITION BY clause, SQL Server must compute the number of rows in each partition separately.&amp;nbsp; Once we know the number of rows in each partition, we can write the NTILE function as&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;NTILE(N) := (N * (ROW_NUMBER() - 1) / COUNT(*)) + 1&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;where COUNT(*) is the number of rows in each partition and the division is integer division.&amp;nbsp; We need the -1 and +1 in this equation because the ROW_NUMBER and NTILE functions are 1-based rather than 0-based.&lt;/P&gt;
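&lt;P&gt;As a quick check of the arithmetic, the formula can be evaluated in Python for a partition of 6 rows and a partition of 2 rows with N = 2.&amp;nbsp; The function name ntile_from_formula is hypothetical, and the sketch evaluates only the formula as written above, not SQL Server's implementation.&lt;/P&gt;
&lt;PRE&gt;
```python
def ntile_from_formula(n, row_number, count):
    # Floor division (//) stands in for integer division; the -1 and +1
    # convert between 1-based row numbers and 0-based arithmetic.
    return (n * (row_number - 1)) // count + 1


six_rows = [ntile_from_formula(2, rn, 6) for rn in range(1, 7)]
two_rows = [ntile_from_formula(2, rn, 2) for rn in range(1, 3)]
print(six_rows)   # [1, 1, 1, 2, 2, 2]
print(two_rows)   # [1, 2]
```
&lt;/PRE&gt;
&lt;P&gt;One caveat: the SQL Server documentation states that when the row count is not divisible by N, the larger groups come first.&amp;nbsp; The formula does not always preserve that ordering (for example, 10 rows with N = 4 yield group sizes of 3, 2, 3, and 2), so treat it as a description of the idea rather than an exact specification of the built-in function.&lt;/P&gt;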
&lt;P&gt;We already know how to compute the ROW_NUMBER function, so the real question is how to compute the COUNT(*) function.&amp;nbsp; It turns out that SQL Server also supports the OVER clause, which we've already seen with the ranking functions, with nearly all &lt;A href="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx"&gt;aggregate functions&lt;/A&gt;.&amp;nbsp; An aggregate function used this way is sometimes referred to as a "window aggregate function."&amp;nbsp; With a window aggregate function, instead of returning a single row per group, SQL Server returns the same aggregate result with each row in the group.&amp;nbsp; Let's look at an example:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT *, COUNT(*) OVER (PARTITION BY A) AS Cnt FROM T&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;PK    A     B     C     Cnt
----- ----- ----- ----- ----------
1     0     1     6     6
2     0     1     4     6
3     0     3     2     6
4     0     3     0     6
7     0     2     3     6
8     0     2     1     6
5     1     0     7     2
6     1     0     5     2&lt;/PRE&gt;
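&lt;P&gt;The difference between a window aggregate and an ordinary GROUP BY aggregate can be sketched in a few lines of Python.&amp;nbsp; This is only an illustration of the semantics using the example rows; the variable names are my own.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import Counter

# (PK, A, B, C) rows from the example table.
table = [
    (1, 0, 1, 6), (2, 0, 1, 4), (3, 0, 3, 2), (4, 0, 3, 0),
    (5, 1, 0, 7), (6, 1, 0, 5), (7, 0, 2, 3), (8, 0, 2, 1),
]

counts = Counter(row[1] for row in table)   # COUNT(*) per value of A

# GROUP BY A: one output row per group.
grouped = sorted(counts.items())

# COUNT(*) OVER (PARTITION BY A): every input row survives and carries
# the count of its own group.
windowed = [row + (counts[row[1]],) for row in table]

print(grouped)
for row in windowed:
    print(row)
```
&lt;/PRE&gt;
&lt;P&gt;The grouped form collapses the 8 rows to 2, while the windowed form keeps all 8 rows along with every column.&lt;/P&gt;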
&lt;P&gt;Notice that unlike a scalar aggregate query, which can return only aggregate functions, or an aggregate query with a GROUP BY clause, which can return only columns from the GROUP BY clause and aggregate functions, this query can return any columns from the table.&amp;nbsp; Moreover, since each row in the group gets the same result, the ORDER BY clause is not supported for this scenario in SQL Server 2005 and 2008.&amp;nbsp; Now let's take a look at the plan for this query:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Nested Loops(Inner Join)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[A])]&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[A] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[TPK]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, WHERE:((1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1005],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1005]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;|--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;/P&gt;
&lt;P&gt;While this plan is certainly more complex than a typical aggregation query plan, it's actually fairly straightforward.&amp;nbsp; The sort and segment operators merely break the rows into groups in the same way as a stream aggregate.&amp;nbsp; The table spool above the segment operator is a special type of common subexpression spool known as a segment spool.&amp;nbsp; This spool operator makes a copy of all rows in one group and then, when the segment operator indicates the start of a new group, it returns a single row to the topmost &lt;A title="Nested Loops Join" href="http://blogs.msdn.com/craigfr/archive/2006/07/26/679319.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/07/26/679319.aspx"&gt;nested loops join&lt;/A&gt; which immediately executes its inner side.&amp;nbsp; At this point, the table spool replays the rows from the current group and the stream aggregate counts them.&amp;nbsp; Finally, the stream aggregate returns the count to the bottommost nested loops join which joins this count with the rows from the current group - the table spool replays these rows a second time.&lt;/P&gt;
&lt;P&gt;Note that this plan needs the segment spool because there is no way to know a priori how many rows are in the same group with the current row without scanning the entire group.&amp;nbsp; But after scanning the entire group and determining the count, we need some way to go back and apply the count to each of the original rows.&amp;nbsp; The spool makes it possible to go back.&amp;nbsp; A typical aggregate query with a GROUP BY clause does not have this issue because after computing the aggregate function (be it a COUNT or any other function), it simply outputs the result without going back to the original rows.&lt;/P&gt;
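&lt;P&gt;Here is a rough Python sketch of the spool-and-replay idea described above.&amp;nbsp; It is not how SQL Server implements the operators; the function names are hypothetical, and a plain list stands in for the spool's worktable.&lt;/P&gt;
&lt;PRE&gt;
```python
def replay_with_count(spool):
    # First replay: the stream aggregate counts the spooled rows.
    count = sum(1 for _ in spool)
    # Second replay: the join attaches the count to each spooled row.
    for row in spool:
        yield row + (count,)


def segment_spool_count(sorted_rows, key):
    spool = []  # copy of the rows in the current group
    for row in sorted_rows:
        # A change in the key marks the start of a new group, so the
        # previous group is complete and can be replayed.
        if spool and key(row) != key(spool[0]):
            yield from replay_with_count(spool)
            spool = []
        spool.append(row)
    if spool:
        yield from replay_with_count(spool)


# (PK, A, B, C) rows from the example table.
table = [
    (1, 0, 1, 6), (2, 0, 1, 4), (3, 0, 3, 2), (4, 0, 3, 0),
    (5, 1, 0, 7), (6, 1, 0, 5), (7, 0, 2, 3), (8, 0, 2, 1),
]
sorted_rows = sorted(table, key=lambda r: r[1])   # Sort(ORDER BY A)
result = list(segment_spool_count(sorted_rows, key=lambda r: r[1]))
for row in result:
    print(row)
```
&lt;/PRE&gt;
&lt;P&gt;No row of a group can be output until the whole group has been read, which is exactly why the rows must be buffered somewhere.&lt;/P&gt;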
&lt;P&gt;For an example of a different query that yields a very similar query plan involving a segment spool, check out my discussion of subqueries on pages 171 through 173 of &lt;A title="Inside SQL Server 2005" href="http://www.insidesqlserver.com/index.html" mce_href="http://www.insidesqlserver.com/index.html"&gt;Inside Microsoft SQL Server 2005&lt;/A&gt;: &lt;A title="Inside Microsoft® SQL Server&lt;sup&gt;TM&lt;/sup&gt; 2005: Query Tuning and Optimization" href="http://www.microsoft.com/MSPress/books/8565.aspx" mce_href="http://www.microsoft.com/MSPress/books/8565.aspx"&gt;Query Tuning and Optimization&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;It's now a simple matter to see how to compute the NTILE function.&amp;nbsp; For example, the following query groups the rows by column A and then for each group computes which rows are below the median (denoted by a 1) and which rows are above the median (denoted by a 2):&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT *, NTILE(2) OVER (PARTITION BY A ORDER BY B) AS NTile FROM T&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Here are results for this query:&lt;/P&gt;&lt;PRE&gt;PK    A     B     C     NTile
----- ----- ----- ----- ------
1     0     1     6     1
2     0     1     4     1
7     0     2     3     1
8     0     2     1     2
3     0     3     2     2
4     0     3     0     2
5     1     0     7     1
6     1     0     5     2&lt;/PRE&gt;
&lt;P&gt;And here is the plan for this query:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; |--Sequence Project(DEFINE:([Expr1003]=ntile))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1007]=(1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[A])]&lt;BR&gt;&lt;/STRONG&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[A])]&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[A] ASC, [T].[B] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index 
Scan(OBJECT:([T].[TPK]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, WHERE:((1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;/P&gt;
&lt;P&gt;This plan is basically the same as the plan for the previous COUNT(*) query.&amp;nbsp; The only difference is the addition of the extra segment and sequence project operators which compute the NTILE function by computing ROW_NUMBER and applying the formula I provided above.&amp;nbsp; Using this formula, we could also compute NTILE manually as follows:&lt;/P&gt;
&lt;P&gt;SELECT *, (2*(RowNumber-1)/Cnt)+1 AS MyNTile&lt;BR&gt;FROM&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;(&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT *,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;ROW_NUMBER() OVER (PARTITION BY A ORDER BY B) AS RowNumber,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;COUNT(*) OVER (PARTITION BY A ) AS Cnt,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;NTILE(2) OVER (PARTITION BY A ORDER BY B) AS NTile&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM T&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) TSeq&lt;/P&gt;
&lt;P&gt;Unfortunately, the optimizer does not recognize that it can use the same plan fragment to compute both the ranking functions and the COUNT(*) window aggregate function and ends up generating the following inefficient plan which computes the COUNT(*) twice:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1006]=((2)*([Expr1003]-(1)))/CONVERT_IMPLICIT(bigint,[Expr1004],0)+(1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[A])]&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sequence Project(DEFINE:([Expr1003]=row_number, [Expr1005]=ntile))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1010]=(1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[A])]&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
|&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;|&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Segment [GROUP BY:([T].[A])]&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[A] ASC, [T].[B] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[TPK]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
|&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;|--Nested Loops(Inner Join, WHERE:((1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1007]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;|--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Nested Loops(Inner Join, 
WHERE:((1)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1012],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1012]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;/P&gt;I'll leave parsing this query plan as an exercise.&amp;nbsp; Enjoy!&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=8346526" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Ranking+Functions/default.aspx">Ranking Functions</category></item><item><title>Partial Aggregation</title><link>http://blogs.msdn.com/craigfr/archive/2008/01/18/partial-aggregation.aspx</link><pubDate>Fri, 18 Jan 2008 22:12:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7152155</guid><dc:creator>craigfr</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/7152155.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=7152155</wfw:commentRss><description>&lt;P&gt;In some of my past posts, I've discussed how SQL Server implements &lt;A 
href="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx"&gt;aggregation&lt;/A&gt;, including the &lt;A title="Stream Aggregate" href="http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx"&gt;stream aggregate&lt;/A&gt; and &lt;A title="Hash Aggregate" href="http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx"&gt;hash aggregate&lt;/A&gt; operators.&amp;nbsp; I also used hash aggregation as an initial example in &lt;A title="Introduction to Parallel Query Execution" href="http://blogs.msdn.com/craigfr/archive/2006/10/11/introduction-to-parallel-query-execution.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/10/11/introduction-to-parallel-query-execution.aspx"&gt;my introductory post on parallel query execution&lt;/A&gt;.&amp;nbsp;&amp;nbsp; In this post, I'll look at partial aggregation.&amp;nbsp; Partial aggregation is a technique that SQL Server uses to optimize parallel aggregation.&amp;nbsp; Before I begin, I just want to note that I also discuss partial aggregation in &lt;A title="Inside SQL Server 2005" href="http://www.insidesqlserver.com/index.html" mce_href="http://www.insidesqlserver.com/index.html"&gt;Inside Microsoft SQL Server 2005&lt;/A&gt;: &lt;A title="Inside Microsoft® SQL Server&lt;sup&gt;TM&lt;/sup&gt; 2005: Query Tuning and Optimization" href="http://www.microsoft.com/MSPress/books/8565.aspx" mce_href="http://www.microsoft.com/MSPress/books/8565.aspx"&gt;Query Tuning and Optimization&lt;/A&gt;.&amp;nbsp; (See the bottom of page 187.)&lt;/P&gt;
&lt;P&gt;Let's begin with a simple scalar aggregation example.&amp;nbsp; Recall that a scalar aggregate is an aggregate without a GROUP BY clause.&amp;nbsp; A scalar aggregate always produces a single output row.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE T (A INT, B INT IDENTITY, C INT, D INT)&lt;BR&gt;CREATE CLUSTERED INDEX TA ON T(A)&lt;/P&gt;
&lt;P&gt;SELECT COUNT(*) FROM T&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Not surprisingly, this query yields a trivial stream aggregate plan:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1005],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1005]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;|--Clustered Index Scan(OBJECT:([T].[TA]))&lt;/P&gt;
&lt;P&gt;Now, suppose that we want to parallelize this query.&amp;nbsp; Because this query outputs a single row, we cannot simply put an &lt;A title="The Parallelism Operator (aka Exchange)" href="http://blogs.msdn.com/craigfr/archive/2006/10/25/the-parallelism-operator-aka-exchange.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/10/25/the-parallelism-operator-aka-exchange.aspx"&gt;exchange (i.e., parallelism operator)&lt;/A&gt; at the root of the plan and divide the work of counting among multiple threads.&amp;nbsp; Such a strategy would yield one output row per thread which is clearly not the correct result.&lt;/P&gt;
&lt;P&gt;Alternatively, we could put a gather streams exchange between the stream aggregate and clustered index scan operators.&amp;nbsp; This strategy would permit us to use a parallel scan while still counting in a single thread and outputting a single row.&amp;nbsp; However, we would end up moving every row from the scan through the exchange - a rather costly operation.&amp;nbsp; Thus, this option, while valid, would not yield nearly the performance we'd like to see.&lt;/P&gt;
&lt;P&gt;Fortunately, there is a third option. &amp;nbsp;We can use a parallel scan, divide the work of counting among multiple threads (as in the first option), use an exchange to gather the per-thread counts into a single thread, and finally sum the per-thread counts to generate the grand total.&amp;nbsp; This strategy is much more efficient as we need only move a single row per thread through the exchange.&amp;nbsp; To get the optimizer to generate this plan, we need to add lots of data to the table.&amp;nbsp; To save time, I'm going to use UPDATE STATISTICS to trick the optimizer into thinking that we've added rows to the table:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;UPDATE STATISTICS T WITH ROWCOUNT = 1000000, PAGECOUNT = 100000&lt;BR&gt;SELECT COUNT(*) FROM T OPTION (RECOMPILE)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;We need the RECOMPILE query hint to force the optimizer to generate a new plan with the new statistics. &amp;nbsp;Here is the plan we get:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[globalagg1006],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([globalagg1006]=SUM([partialagg1005])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Parallelism(Gather Streams)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([partialagg1005]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[TA]))&lt;/P&gt;
&lt;P&gt;We call the bottommost aggregate operator a partial aggregate because it computes only part of the result.&amp;nbsp; We also sometimes refer to this operator as a local aggregate because it computes the portion of the result that is local to the thread where it executes.&amp;nbsp; We refer to the topmost aggregate as a global aggregate because it computes the full result.&lt;/P&gt;
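&lt;P&gt;As a rough illustration only (a hand-written emulation, not what the optimizer actually executes), we can mimic the local/global split in T-SQL.&amp;nbsp; The sketch below assumes that column B is an integer column and uses the hypothetical expression B % 4 to stand in for four threads; the names PartialAgg, PartialCount, and GlobalCount are made up for this example:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Inner query: one partial count per bucket (a stand-in for one count per thread)&lt;BR&gt;-- Outer query: sum the partial counts, as the global aggregate does&lt;BR&gt;SELECT SUM(PartialCount) AS GlobalCount&lt;BR&gt;FROM (&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT COUNT(*) AS PartialCount&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM T&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY B % 4&lt;BR&gt;) AS PartialAgg&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;On a non-empty table, this should return the same value as SELECT COUNT(*) FROM T.&amp;nbsp; One difference is worth keeping in mind: on an empty table the emulation returns NULL rather than 0 because the inner GROUP BY produces no rows to sum.&lt;/P&gt;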
&lt;P&gt;SQL Server is able to use partial aggregation for most aggregate functions including the standard built-ins: COUNT, SUM, AVG, MIN, and MAX.&amp;nbsp; While partial aggregation is necessary to parallelize scalar aggregates, it is also useful even for aggregates with a GROUP BY clause.&amp;nbsp; Whether the optimizer chooses to use partial aggregation depends on the number of unique groups and the size of these groups.&amp;nbsp; If the optimizer anticipates that a query will generate few large groups (such as in the scalar aggregation case), it will use partial aggregation.&amp;nbsp; However, if the optimizer expects that a query will generate many small groups, it may choose to use a single level aggregation.&amp;nbsp; With small groups, a partial aggregate cannot reduce the number of rows significantly and merely adds overhead to the query.&amp;nbsp; Moreover, with many groups it is easy to parallelize the aggregation by hashing on the GROUP BY keys and distributing different groups to different threads.&lt;/P&gt;
&lt;P&gt;Let's see how the optimizer makes this choice.&amp;nbsp; Column B in our example has the IDENTITY property.&amp;nbsp; Although we have no real data, this property is sufficient to trick the optimizer into concluding that this column is mostly unique.&amp;nbsp; (Without a unique index, the optimizer cannot be certain that the column is indeed unique and must assume that it is not.)&amp;nbsp; Suppose we aggregate on this column:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT COUNT(*) FROM T GROUP BY B&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Parallelism(Gather Streams)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([T].[B]) DEFINE:([Expr1007]=COUNT(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([T].[B]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[TA]))&lt;/P&gt;
&lt;P&gt;Notice that this query yields a normal single-level, albeit parallel, aggregate operator.&amp;nbsp; Now suppose we aggregate on column C, which does not have the IDENTITY property:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT COUNT(*) FROM T GROUP BY C&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[globalagg1006],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Parallelism(Gather Streams)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([T].[C]) DEFINE:([globalagg1006]=SUM([partialagg1005])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([T].[C] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([T].[C]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Partial Aggregate, HASH:([T].[C]), RESIDUAL:([T].[C] = [T].[C]) DEFINE:([partialagg1005]=COUNT(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([T].[TA]))&lt;/P&gt;
&lt;P&gt;This time we get a partial aggregate.&amp;nbsp; Also, observe that the partial aggregate is a hash aggregate while the global aggregate is a stream aggregate.&amp;nbsp; The optimizer is free to choose either physical aggregation operator (stream or hash) for the partial and global aggregates in a plan with partial aggregation.&amp;nbsp; The decision of which operator to use is cost based.&lt;/P&gt;
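&lt;P&gt;The two-level shape of this plan can also be sketched by hand.&amp;nbsp; The query below is only an emulation under assumptions: B % 4 is a hypothetical stand-in for the thread on which a row happens to be scanned (it assumes B is an integer column), and PartialAgg, PartialCount, and GlobalCount are illustrative names:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Inner query: partial counts per (C, bucket), like the Partial Aggregate below the exchange&lt;BR&gt;-- Outer query: combine the partial counts for each C, like the global Stream Aggregate&lt;BR&gt;SELECT C, SUM(PartialCount) AS GlobalCount&lt;BR&gt;FROM (&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT C, COUNT(*) AS PartialCount&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM T&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY C, B % 4&lt;BR&gt;) AS PartialAgg&lt;BR&gt;GROUP BY C&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Each value of C can appear in up to four partial rows, so the outer aggregate still has to group by C to produce one row per group.&amp;nbsp; This mirrors why the plan repartitions on C between the partial and global aggregates.&lt;/P&gt;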
&lt;P&gt;Finally, it is worth noting that, while a stream aggregate behaves identically whether it is computing a partial or global aggregate, a partial hash aggregate does differ slightly from a normal hash aggregate.&amp;nbsp; First, a partial hash aggregate requests only a fixed minimal memory grant as it presumes that it will be computing a relatively small number of groups.&amp;nbsp; Second, a partial hash aggregate never spills rows to tempdb.&amp;nbsp; If a partial hash aggregate runs out of memory, it simply stops aggregating and begins returning non-aggregated rows.&amp;nbsp; This behavior is safe since the global aggregate will always compute the correct final results regardless of what the partial aggregate does.&amp;nbsp; The partial aggregate is merely a performance optimization and the goal is to prevent it from stealing resources from the global aggregate.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=7152155" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Parallelism/default.aspx">Parallelism</category></item><item><title>GROUPING SETS in SQL Server 2008</title><link>http://blogs.msdn.com/craigfr/archive/2007/10/11/grouping-sets-in-sql-server-2008.aspx</link><pubDate>Thu, 11 Oct 2007 19:13:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:5402551</guid><dc:creator>craigfr</dc:creator><slash:comments>4</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/5402551.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=5402551</wfw:commentRss><description>In my last two posts, I gave examples of aggregation &lt;A title="Aggregation WITH ROLLUP" href="http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx" 
mce_href="http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx"&gt;WITH ROLLUP&lt;/A&gt; and &lt;A title="Aggregation WITH CUBE" href="http://blogs.msdn.com/craigfr/archive/2007/09/27/aggregation-with-cube.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/09/27/aggregation-with-cube.aspx"&gt;CUBE&lt;/A&gt;.&amp;nbsp; SQL Server 2008 continues to support this syntax but also introduces a new, more powerful syntax that complies with ANSI SQL 2006.&amp;nbsp; In this post, I'll give an overview of the changes. 
&lt;P&gt;First, let's see how we rewrite simple WITH ROLLUP and CUBE queries using the new syntax.&amp;nbsp; I'll use the same schema and queries as in my previous posts:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)&lt;BR&gt;INSERT Sales VALUES(1, 2005, 12000)&lt;BR&gt;INSERT Sales VALUES(1, 2006, 18000)&lt;BR&gt;INSERT Sales VALUES(1, 2007, 25000)&lt;BR&gt;INSERT Sales VALUES(2, 2005, 15000)&lt;BR&gt;INSERT Sales VALUES(2, 2006, 6000)&lt;BR&gt;INSERT Sales VALUES(3, 2006, 20000)&lt;BR&gt;INSERT Sales VALUES(3, 2007, 24000)&lt;/P&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr WITH ROLLUP&lt;/P&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr WITH CUBE&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;We can rewrite these two queries using the new syntax as:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY ROLLUP(EmpId, Yr)&lt;/P&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY CUBE(EmpId, Yr)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;These new queries are semantically equivalent to the original queries and use the same query plans.&amp;nbsp; Note that the new ROLLUP and CUBE syntax is only available in compatibility level 100.&amp;nbsp; The more general GROUPING SETS syntax, which I will discuss next, is also available in earlier compatibility levels.&lt;/P&gt;
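&lt;P&gt;If the ROLLUP(...) or CUBE(...) syntax is rejected, the database's compatibility level is worth checking.&amp;nbsp; The following is a minimal sketch; MyDatabase is a placeholder name, and changing the compatibility level affects other behavior as well, so treat the second statement as an example rather than a recommendation:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Show the compatibility level of the current database&lt;BR&gt;SELECT name, compatibility_level&lt;BR&gt;FROM sys.databases&lt;BR&gt;WHERE name = DB_NAME()&lt;/P&gt;
&lt;P&gt;-- Raise the level to 100 (MyDatabase is a hypothetical database name)&lt;BR&gt;ALTER DATABASE MyDatabase SET COMPATIBILITY_LEVEL = 100&lt;/P&gt;&lt;/BLOCKQUOTE&gt;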
&lt;P&gt;The new GROUPING SETS syntax is considerably more powerful.&amp;nbsp; It allows us to specify precisely which aggregations we want to compute.&amp;nbsp; As the following table illustrates, our simple two-dimensional schema has a total of only four possible aggregations:&lt;/P&gt;
&lt;TABLE class="" cellSpacing=0 cellPadding=0 border=1&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96 colSpan=2 rowSpan=2&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96 colSpan=4&gt;
&lt;P align=center&gt;&lt;B&gt;Yr&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;2005&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;2006&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;2007&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=133&gt;
&lt;P&gt;&lt;B&gt;ALL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" width=96 rowSpan=4&gt;
&lt;P&gt;&lt;B&gt;EmpId&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;1&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" width=288 colSpan=3 rowSpan=3&gt;
&lt;P&gt;GROUP BY (EmpId, Yr)&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" width=133 rowSpan=3&gt;
&lt;P&gt;GROUP BY (EmpId)&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;2&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;3&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;ALL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" width=288 colSpan=3&gt;
&lt;P&gt;GROUP BY (Yr)&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=133&gt;
&lt;P&gt;GROUP BY ()&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;ROLLUP and CUBE are just shorthand for two common usages of GROUPING SETS.&amp;nbsp; We can express the above ROLLUP query as:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, Yr), (EmpId), ())&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00&lt;/PRE&gt;
&lt;P&gt;This query explicitly asks SQL Server to aggregate sales by employee and year, to aggregate by employee only, and to compute the total for all employees for all years.&amp;nbsp; The () syntax with no GROUP BY columns denotes the total.&amp;nbsp; Similarly, we can express the above CUBE query by asking SQL Server to compute all possible aggregate combinations:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, Yr), (EmpId), (Yr), ())&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
2           2005        15000.00
NULL        2005        27000.00
1           2006        18000.00
2           2006        6000.00
3           2006        20000.00
NULL        2006        44000.00
1           2007        25000.00
3           2007        24000.00
NULL        2007        49000.00
NULL        NULL        120000.00
1           NULL        55000.00
2           NULL        21000.00
3           NULL        44000.00&lt;/PRE&gt;
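&lt;P&gt;One way to read a GROUPING SETS list is as a UNION ALL of ordinary GROUP BY queries, one per grouping set, with NULL filling in for each column that a given set does not group on.&amp;nbsp; As a sketch of that reading (not a claim about the plan SQL Server chooses), the ROLLUP form GROUPING SETS((EmpId, Yr), (EmpId), ()) corresponds to:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- (EmpId, Yr): sales by employee and year&lt;BR&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr&lt;BR&gt;UNION ALL&lt;BR&gt;-- (EmpId): sales by employee for all years&lt;BR&gt;SELECT EmpId, NULL, SUM(Sales)&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId&lt;BR&gt;UNION ALL&lt;BR&gt;-- (): grand total for all employees and all years&lt;BR&gt;SELECT NULL, NULL, SUM(Sales)&lt;BR&gt;FROM Sales&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;With the sample data, this should produce the same rows as the GROUPING SETS query, although possibly in a different order.&amp;nbsp; Adding a fourth branch that groups by Yr alone gives the CUBE form.&lt;/P&gt;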
&lt;P&gt;We can also use GROUPING SETS to compute other results.&amp;nbsp; For example, we can perform a partial rollup aggregating sales by employee and year and by employee only but without computing the total for all employees for all years:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, Yr), (EmpId))&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00&lt;/PRE&gt;
&lt;P&gt;We can skip certain rollup levels.&amp;nbsp; For example, we can compute the total sales by employee and year and the total sales for all employees and all years without computing any of the intermediate results:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, Yr), ())&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
2           2005        15000.00
2           2006        6000.00
3           2006        20000.00
3           2007        24000.00
NULL        NULL        120000.00&lt;/PRE&gt;
&lt;P&gt;We can even compute multiple unrelated aggregations along disparate dimensions.&amp;nbsp; For example, we can compute the total sales by employee and the total sales by year:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId), (Yr))&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
NULL        2005        27000.00
NULL        2006        44000.00
NULL        2007        49000.00
1           NULL        55000.00
2           NULL        21000.00
3           NULL        44000.00&lt;/PRE&gt;
&lt;P&gt;Note that we could also write GROUPING SETS (EmpId, Yr) without the extra set of parentheses, but the extra parentheses make the intent of the query more explicit and clearly differentiate the previous query from the following query, which just performs a normal aggregation by employee and year:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, Yr))&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
2           2005        15000.00
1           2006        18000.00
2           2006        6000.00
3           2006        20000.00
1           2007        25000.00
3           2007        24000.00&lt;/PRE&gt;
&lt;P&gt;Here are some additional points worth noting about the GROUPING SETS syntax:&lt;/P&gt;
&lt;P&gt;As with any other aggregation query, if a column appears in the SELECT list and is not part of an aggregate function, it must appear somewhere in the GROUP BY clause.&amp;nbsp; Thus, the following is not valid:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId), ())&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Msg 8120, Level 16, State 1, Line 1&lt;BR&gt;Column 'Sales.Yr' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The order of the columns within each GROUPING SET and the order of the GROUPING SETS do not matter.&amp;nbsp; So both of the following queries compute the same CUBE, although the order in which the rows are output differs:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS&lt;B&gt;((EmpId, Yr), (EmpId), (Yr), ())&lt;/B&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS&lt;B&gt;((), (Yr), (EmpId), (Yr, EmpId))&lt;/B&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;If the order in which the rows are output matters, use an explicit ORDER BY clause to enforce that order.&lt;/P&gt;
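&lt;P&gt;For instance, a minimal sketch of the CUBE query with an explicit ordering might look like this:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Sort by employee, then by year; the same ORDER BY works for either ordering of the grouping sets&lt;BR&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, Yr), (EmpId), (Yr), ())&lt;BR&gt;ORDER BY EmpId, Yr&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Keep in mind that SQL Server sorts NULL before all other values in ascending order, so with this ORDER BY the total rows (those with NULL in EmpId or Yr) come before the detail rows they summarize.&lt;/P&gt;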
&lt;P&gt;We can nest CUBE and ROLLUP within a GROUPING SETS clause as shorthand for expressing more complex GROUPING SETS.&amp;nbsp; This shorthand is most useful when we have more than three dimensions in our schema.&amp;nbsp; For example, suppose we add a month column to our sales table:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE Sales (EmpId INT, Month INT, Yr INT, Sales MONEY)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Now, suppose we want to compute sales for each employee by month and year, by year, and total.&amp;nbsp; We could write out all of the GROUPING SETS explicitly:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Month, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS((EmpId, &lt;B&gt;Yr, Month&lt;/B&gt;), (EmpId, &lt;B&gt;Yr&lt;/B&gt;), (EmpId))&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Or we can use ROLLUP to simplify the query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Month, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY GROUPING SETS(&lt;STRONG&gt;(&lt;/STRONG&gt;EmpId, &lt;STRONG&gt;ROLLUP(Yr, Month))&lt;/STRONG&gt;)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Note that once again the correct use of parentheses is critical.&amp;nbsp; If we omit one set of parentheses from the above query, the meaning changes significantly and we end up separately aggregating by employee and then computing the year and month ROLLUP for all employees.&lt;/P&gt;The new GROUPING SETS syntax is available in all of the &lt;A title="Microsoft SQL Server 2008" href="http://www.microsoft.com/sql/2008/default.mspx" mce_href="http://www.microsoft.com/sql/2008/default.mspx"&gt;SQL Server 2008&lt;/A&gt; Community Technology Preview (CTP) releases.&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=5402551" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Rollup+and+Cube/default.aspx">Rollup and Cube</category></item><item><title>Aggregation WITH CUBE</title><link>http://blogs.msdn.com/craigfr/archive/2007/09/27/aggregation-with-cube.aspx</link><pubDate>Thu, 27 Sep 2007 21:30:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:5172072</guid><dc:creator>craigfr</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/5172072.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=5172072</wfw:commentRss><description>&lt;P&gt;In my last post, I wrote about how&amp;nbsp;&lt;A title="Aggregation WITH ROLLUP" href="http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx"&gt;aggregation WITH ROLLUP&lt;/A&gt; works.&amp;nbsp; In this post, I will discuss how aggregation WITH CUBE works.&amp;nbsp; Like the WITH ROLLUP clause, the WITH CUBE clause permits us to compute multiple "levels" of aggregation in a single statement.&amp;nbsp; To understand the difference between these two clauses, let's 
look at an example.&amp;nbsp; We'll use the same fictitious sales data from last week's example.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)&lt;BR&gt;INSERT Sales VALUES(1, 2005, 12000)&lt;BR&gt;INSERT Sales VALUES(1, 2006, 18000)&lt;BR&gt;INSERT Sales VALUES(1, 2007, 25000)&lt;BR&gt;INSERT Sales VALUES(2, 2005, 15000)&lt;BR&gt;INSERT Sales VALUES(2, 2006, 6000)&lt;BR&gt;INSERT Sales VALUES(3, 2006, 20000)&lt;BR&gt;INSERT Sales VALUES(3, 2007, 24000)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Consider the following query from last week:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr WITH ROLLUP&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;It will be easier to see what is happening if we pivot the sales data:&lt;/P&gt;
&lt;TABLE class="" cellSpacing=0 cellPadding=0 border=1&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=213 colSpan=2 rowSpan=2&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=426 colSpan=4&gt;
&lt;P align=center&gt;&lt;B&gt;Yr&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2005&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2006&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2007&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;ALL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" width=106 rowSpan=4&gt;
&lt;P&gt;&lt;B&gt;EmpId&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;1&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;12000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;18000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;25000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;55000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;15000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;6000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;21000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;3&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;20000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;24000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;44000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;ALL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;120000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;The table clearly shows that the WITH ROLLUP clause computes the total for each employee for all years and the grand total for all employees and all years.&amp;nbsp; The query does not compute the totals for each year for all employees.&amp;nbsp; Moreover, the order of the columns in the GROUP BY clause determines in which order the data is totaled.&lt;/P&gt;
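&lt;P&gt;To make the point about column order concrete, here is a sketch of the same query with the GROUP BY columns swapped:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Same data, but the rollup now proceeds from (Yr, EmpId) to (Yr) to the grand total&lt;BR&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY Yr, EmpId WITH ROLLUP&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;With the sample data, this variant should return the totals for each year for all employees (27000.00 for 2005, 44000.00 for 2006, and 49000.00 for 2007) along with the grand total of 120000.00, but not the totals for each employee.&amp;nbsp; The rest of this post uses the original column order, GROUP BY EmpId, Yr.&lt;/P&gt;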
&lt;P&gt;Now let's repeat the same query but replace the WITH ROLLUP clause with a WITH CUBE clause:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr WITH CUBE&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This query computes all possible sub-totals and totals:&lt;/P&gt;
&lt;TABLE class="" cellSpacing=0 cellPadding=0 border=1&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=213 colSpan=2 rowSpan=2&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=426 colSpan=4&gt;
&lt;P align=center&gt;&lt;B&gt;Yr&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2005&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2006&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2007&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;ALL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" width=106 rowSpan=4&gt;
&lt;P&gt;&lt;B&gt;EmpId&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;1&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;12000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;18000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;25000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;55000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;2&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;15000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;6000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;21000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;3&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;20000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;24000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;44000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;ALL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;&lt;I&gt;27000.00&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;&lt;I&gt;44000.00&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;&lt;I&gt;49000.00&lt;/I&gt;&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=106&gt;
&lt;P&gt;&lt;B&gt;120000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;
&lt;P mce_keep="true"&gt;Because the WITH CUBE clause causes the query to compute all possible totals, the order of the columns in the GROUP BY clause does not matter.&amp;nbsp; Of course, by default, SQL Server does not pivot the results of either of the above queries.&amp;nbsp; Here is the actual output from the WITH CUBE query:&lt;/P&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00
NULL        2005        27000.00
NULL        2006        44000.00
NULL        2007        49000.00&lt;/PRE&gt;
&lt;P&gt;Next, let's look at the query plan for the WITH CUBE query:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Concatenation&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[EmpId], [Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([Sales].[EmpId] ASC, [Sales].[Yr] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[Yr], [Sales].[EmpId]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([Sales].[Yr] ASC, [Sales].[EmpId] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table 
Scan(OBJECT:([Sales]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1012]=NULL))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Spool&lt;/P&gt;
&lt;P&gt;This plan consists of two parts.&amp;nbsp; SQL Server has effectively rewritten our query as follows:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr WITH ROLLUP&lt;BR&gt;UNION ALL&lt;BR&gt;SELECT NULL, Yr, SUM(Sales)&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY Yr&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The first part of the plan computes the result for the WITH ROLLUP query above.&amp;nbsp; I described how this query plan works in last week's post.&amp;nbsp; The second part of this plan computes the missing year sub-totals yielding the entire CUBE result.&amp;nbsp; Note that this plan employs a common sub-expression spool.&amp;nbsp; As I discussed in &lt;A title="Optimized Non-clustered Index Maintenance in Per-Index Plans" href="http://blogs.msdn.com/craigfr/archive/2007/08/22/optimized-non-clustered-index-maintenance-in-per-index-plans.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/08/22/optimized-non-clustered-index-maintenance-in-per-index-plans.aspx"&gt;this post&lt;/A&gt;, a common sub-expression spool copies its input rows into a worktable and then reads and returns the rows from the worktable multiple times - in this case twice.&amp;nbsp; The spool is meant to improve performance although, in this example, it has little impact since the server could just as easily have re-read the original Sales table. &amp;nbsp;However, if the input to the aggregation was more complex and cost more to evaluate, the spool would help.&lt;/P&gt;
&lt;P&gt;If we use the WITH CUBE clause when aggregating on more than two columns, SQL Server simply generates increasingly complex plans with additional inputs to the concatenation operator.&amp;nbsp; As with the simple two-column example, the idea is to compute the whole CUBE by computing all of the individual ROLLUPs that compose it.&lt;/P&gt;
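&lt;P&gt;To make the rewrite concrete, here is a minimal Python sketch of the same idea: compute the ROLLUP on (EmpId, Yr), then add the per-year sub-totals that the ROLLUP does not produce.&amp;nbsp; This only models the logical result, not SQL Server's operators; the names group_sum, rollup and cube are hypothetical helpers, None stands in for the NULL that marks an ALL row, and the sketch simply re-reads the rows instead of using a spool.&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import defaultdict

# Sample rows (EmpId, Yr, Sales) copied from the Sales table used in this post.
SALES = [
    (1, 2005, 12000), (1, 2006, 18000), (1, 2007, 25000),
    (2, 2005, 15000), (2, 2006, 6000),
    (3, 2006, 20000), (3, 2007, 24000),
]


def group_sum(rows, key):
    """Sum Sales per group; key maps a row to its (EmpId, Yr) output key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


def rollup(rows):
    """Model GROUP BY EmpId, Yr WITH ROLLUP (None plays the role of NULL)."""
    result = {}
    result.update(group_sum(rows, lambda r: (r[0], r[1])))   # detail rows
    result.update(group_sum(rows, lambda r: (r[0], None)))   # per-employee totals
    result.update(group_sum(rows, lambda r: (None, None)))   # grand total
    return result


def cube(rows):
    """Model WITH CUBE as the ROLLUP plus the missing per-year sub-totals."""
    result = rollup(rows)
    result.update(group_sum(rows, lambda r: (None, r[1])))   # GROUP BY Yr
    return result


if __name__ == "__main__":
    for (emp, yr), total in cube(SALES).items():
        print(emp, yr, total)
```
&lt;/PRE&gt;
&lt;P&gt;With the sample data, the ROLLUP part yields 11 rows and the extra GROUP BY Yr part adds 3 more, for the 14 rows of the full CUBE.&lt;/P&gt;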
&lt;P&gt;Finally, we can actually combine WITH CUBE and PIVOT to generate the above table in a single simple statement.&amp;nbsp; (I actually proposed a variation of this query in an answer to a reader's &lt;A title="The PIVOT Operator" href="http://blogs.msdn.com/craigfr/archive/2007/07/03/the-pivot-operator.aspx#comments" mce_href="http://blogs.msdn.com/craigfr/archive/2007/07/03/the-pivot-operator.aspx#comments"&gt;comment&lt;/A&gt; on my post about the PIVOT operator but I like this solution better.)&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, [2005], [2006], [2007], [ALL]&lt;BR&gt;FROM&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; (&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SELECT&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CASE WHEN GROUPING(EmpId) = 0&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; THEN CAST (EmpId AS CHAR(7))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ELSE 'ALL'&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; END AS EmpId,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CASE WHEN GROUPING(Yr) = 0&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; THEN CAST (Yr AS CHAR(7))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ELSE 'ALL'&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; END AS Yr,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(Sales) AS Sales&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; FROM Sales&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; GROUP BY EmpId, Yr WITH CUBE&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ) AS s&lt;BR&gt;PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007], [ALL])) AS 
p&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Here is the output from this query:&lt;/P&gt;&lt;PRE&gt;EmpId   2005                  2006                  2007                  ALL
------- --------------------- --------------------- --------------------- ---------------------
1       12000.00              18000.00              25000.00              55000.00
2       15000.00              6000.00               NULL                  21000.00
3       NULL                  20000.00              24000.00              44000.00
ALL     27000.00              44000.00              49000.00              120000.00&lt;/PRE&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=5172072" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Rollup+and+Cube/default.aspx">Rollup and Cube</category></item><item><title>Aggregation WITH ROLLUP</title><link>http://blogs.msdn.com/craigfr/archive/2007/09/21/aggregation-with-rollup.aspx</link><pubDate>Fri, 21 Sep 2007 20:36:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:5036687</guid><dc:creator>craigfr</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/5036687.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=5036687</wfw:commentRss><description>&lt;P mce_keep="true"&gt;In this post, I'm going to discuss how aggregation WITH ROLLUP works.&amp;nbsp; The WITH ROLLUP clause permits us to execute multiple "levels" of aggregation in a single statement.&amp;nbsp; For example, suppose we have the following fictitious sales data.&amp;nbsp; (This is the same data that I used for my series of posts on the PIVOT operator.)&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)&lt;BR&gt;INSERT Sales VALUES(1, 2005, 12000)&lt;BR&gt;INSERT Sales VALUES(1, 2006, 18000)&lt;BR&gt;INSERT Sales VALUES(1, 2007, 25000)&lt;BR&gt;INSERT Sales VALUES(2, 2005, 15000)&lt;BR&gt;INSERT Sales VALUES(2, 2006, 6000)&lt;BR&gt;INSERT Sales VALUES(3, 2006, 20000)&lt;BR&gt;INSERT Sales VALUES(3, 2007, 24000)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;We can write a simple aggregation query to compute the total sales by year:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY Yr&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;As expected, this query returns three rows - one for each year:&lt;/P&gt;&lt;PRE&gt;Yr          Sales
----------- ---------------------
2005        27000.00
2006        44000.00
2007        49000.00&lt;/PRE&gt;
&lt;P&gt;The query plan is a simple &lt;A title="Stream Aggregate" href="http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx"&gt;stream aggregate&lt;/A&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1010]=COUNT_BIG([Sales].[Sales]), [Expr1011]=SUM([Sales].[Sales])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([Sales].[Yr] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([Sales]))&lt;/P&gt;
&lt;P&gt;Now suppose that we want to compute not just the sales by year but the total sales as well.&amp;nbsp; We could write a UNION ALL query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY Yr&lt;BR&gt;UNION ALL&lt;BR&gt;SELECT NULL, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This query works and does give the correct result:&lt;/P&gt;&lt;PRE&gt;Yr          Sales
----------- ---------------------
2005        27000.00
2006        44000.00
2007        49000.00
NULL        120000.00&lt;/PRE&gt;
&lt;P&gt;However, the query plan performs two scans and two aggregations (one to compute the sales by year and one to compute the total sales):&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Concatenation&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1023]=(0) THEN NULL ELSE [Expr1024] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1023]=COUNT_BIG([Sales].[Sales]), [Expr1024]=SUM([Sales].[Sales])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([Sales].[Yr] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([Sales]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1010]=NULL))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1025]=COUNT_BIG([Sales].[Sales]), [Expr1026]=SUM([Sales].[Sales])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([Sales]))&lt;/P&gt;
&lt;P&gt;We can do better by adding a WITH ROLLUP clause to the original query:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY Yr WITH ROLLUP&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This query is simpler to write and uses a more efficient query plan with only a single scan:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([Sales].[Yr] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([Sales]))&lt;/P&gt;
&lt;P&gt;The bottom stream aggregate in this query plan is the same as the stream aggregate in the original non-ROLLUP query.&amp;nbsp; This aggregation is a normal aggregation and, as such, it can be implemented using a stream aggregate (as in this example) or a &lt;A title="Hash Aggregate" href="http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx"&gt;hash aggregate&lt;/A&gt; (try adding an OPTION (HASH GROUP) clause to the above query).&amp;nbsp; It can also be parallelized.&lt;/P&gt;
&lt;P&gt;The top stream aggregate is a special aggregate that computes the ROLLUP.&amp;nbsp; (Unfortunately, in SQL Server 2005 there is no way to discern from the query plan that this aggregate implements a ROLLUP.&amp;nbsp; This issue will be fixed in SQL Server 2008 graphical and XML plans.)&amp;nbsp; A ROLLUP aggregate is always implemented using a stream aggregate and cannot be parallelized.&amp;nbsp; In this simple example, the ROLLUP stream aggregate merely returns each pre-aggregated input row while maintaining a running total of the Sales column.&amp;nbsp; After outputting the final input row, the aggregate also returns one additional row with the final sum.&amp;nbsp; Since SQL lacks a concept of an ALL value, the Yr column is set to NULL for this final row.&amp;nbsp; If NULL is a valid value for Yr, we can identify the ROLLUP row using the GROUPING(Yr) construct.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CASE WHEN GROUPING(Yr) = 0&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; THEN CAST (Yr AS CHAR(5))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ELSE 'ALL'&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; END AS Yr,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY Yr WITH ROLLUP&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;Yr    Sales
----- ---------------------
2005  27000.00
2006  44000.00
2007  49000.00
ALL   120000.00&lt;/PRE&gt;
&lt;P&gt;We can also compute multiple ROLLUP levels in a single query.&amp;nbsp; For example, suppose that we want to compute the sales first by employee and then for each employee by year:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr, SUM(Sales) AS Sales&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr WITH ROLLUP&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00&lt;/PRE&gt;
&lt;P&gt;There are a couple of points worth noting about this query.&amp;nbsp; First, since the combination of the EmpId and Yr columns is unique, in the absence of the WITH ROLLUP clause, this query would just return the original data.&amp;nbsp; However, with the WITH ROLLUP clause the query produces a useful result.&amp;nbsp; Second, the order of the columns in the GROUP BY clause is relevant with the WITH ROLLUP clause.&amp;nbsp; To see why, simply try the same query but reverse the EmpId and Yr columns.&amp;nbsp; Instead of computing the sales first by employee, it will compute the sales first by year.&lt;/P&gt;
&lt;P&gt;The query plan for this query is identical to the query plan for the prior query except that it groups on both the EmpId and Yr columns instead of on just the Yr column.&amp;nbsp; Like the prior query plan, this query plan includes two stream aggregates: the bottom one, which is a normal stream aggregate, and the top one, which computes the ROLLUP.&amp;nbsp; This ROLLUP stream aggregate actually computes two running totals: one which computes the total sales for an employee for all years and one which computes the total sales for all employees and all years.&amp;nbsp; This table shows how the ROLLUP computation proceeds:&lt;/P&gt;
&lt;P&gt;
&lt;TABLE class="" cellSpacing=0 cellPadding=0 border=1&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;EmpId&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;Yr&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;SUM(Sales) BY EmpId, Yr&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;SUM(Sales) BY EmpId&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;SUM(Sales)&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2005&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;12000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;12000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;12000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2006&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;18000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;30000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;30000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;1&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2007&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;25000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;55000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;55000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;1&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;NULL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;55000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;55000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2005&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;15000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;15000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;70000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2006&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;6000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;21000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;76000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;2&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;NULL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;21000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;76000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;3&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2006&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;20000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;20000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;96000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;3&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;2007&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;24000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;44000.00&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;120000.00&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;3&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;NULL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;44000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;120000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;NULL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;NULL&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;&lt;/B&gt;&amp;nbsp;&lt;/P&gt;&lt;/TD&gt;
&lt;TD class="" vAlign=top width=96&gt;
&lt;P&gt;&lt;B&gt;120000.00&lt;/B&gt;&lt;/P&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;/P&gt;
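&lt;P&gt;The computation in the table above can be sketched in a few lines of Python.&amp;nbsp; This is only an illustration of the two running totals, not SQL Server's actual implementation: rollup_stream is a hypothetical name, the input is assumed to be the pre-aggregated rows already sorted by EmpId and Yr, and None stands in for the NULL of a ROLLUP row.&lt;/P&gt;
&lt;PRE&gt;
```python
# Pre-aggregated rows (EmpId, Yr, Sales), sorted by EmpId then Yr.
SALES = [
    (1, 2005, 12000), (1, 2006, 18000), (1, 2007, 25000),
    (2, 2005, 15000), (2, 2006, 6000),
    (3, 2006, 20000), (3, 2007, 24000),
]


def rollup_stream(rows):
    """Yield the rows of GROUP BY EmpId, Yr WITH ROLLUP in a single pass."""
    emp_total = 0        # running SUM(Sales) BY EmpId
    grand_total = 0      # running SUM(Sales) over all rows
    current_emp = None
    seen_any = False
    for emp, yr, sales in rows:
        if seen_any and emp != current_emp:
            # EmpId changed: emit the sub-total row for the previous employee
            # and restart the per-employee running total.
            yield (current_emp, None, emp_total)
            emp_total = 0
        current_emp = emp
        seen_any = True
        emp_total += sales
        grand_total += sales
        yield (emp, yr, sales)
    if seen_any:
        # After the final input row: last employee sub-total, then grand total.
        # Empty input is outside the scope of this sketch, so nothing is
        # emitted for it.
        yield (current_emp, None, emp_total)
        yield (None, None, grand_total)


if __name__ == "__main__":
    for row in rollup_stream(SALES):
        print(row)
```
&lt;/PRE&gt;
&lt;P&gt;Because the sub-total for an employee is emitted only when the EmpId value changes, the sketch depends on the sort order of its input, which is one way to see why the ROLLUP aggregate is a stream aggregate.&lt;/P&gt;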
&lt;P mce_keep="true"&gt;In my next post, I'll take a look at the WITH CUBE clause.&amp;nbsp; I'll discuss how it differs from WITH ROLLUP both in terms of function and in terms of its implementation.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=5036687" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Rollup+and+Cube/default.aspx">Rollup and Cube</category></item><item><title>PIVOT Query Plans</title><link>http://blogs.msdn.com/craigfr/archive/2007/07/09/pivot-query-plans.aspx</link><pubDate>Mon, 09 Jul 2007 21:05:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3784411</guid><dc:creator>craigfr</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/3784411.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=3784411</wfw:commentRss><description>&lt;P&gt;In &lt;A title="The PIVOT Operator" href="http://blogs.msdn.com/craigfr/archive/2007/07/03/the-pivot-operator.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2007/07/03/the-pivot-operator.aspx"&gt;my last post&lt;/A&gt;, I gave an overview of the PIVOT operator.&amp;nbsp; In this post, I'm going to take a look at the query plans generated by the PIVOT operator.&amp;nbsp; As we'll see, SQL Server generates a surprisingly simple query plan that is essentially just a fancy aggregation query plan.&lt;/P&gt;
&lt;P&gt;Let's use the same schema and queries from my previous post:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)&lt;BR&gt;INSERT Sales VALUES(1, 2005, 12000)&lt;BR&gt;INSERT Sales VALUES(1, 2006, 18000)&lt;BR&gt;INSERT Sales VALUES(1, 2007, 25000)&lt;BR&gt;INSERT Sales VALUES(2, 2005, 15000)&lt;BR&gt;INSERT Sales VALUES(2, 2006, 6000)&lt;BR&gt;INSERT Sales VALUES(3, 2006, 20000)&lt;BR&gt;INSERT Sales VALUES(3, 2007, 24000)&lt;/P&gt;
&lt;P mce_keep="true"&gt;SELECT [2005], [2006], [2007]&lt;BR&gt;FROM (SELECT Yr, Sales FROM Sales) AS s&lt;BR&gt;PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;This query generates the following query plan:&lt;/P&gt;&lt;PRE&gt;  |--Compute Scalar(DEFINE:(
                 [Expr1006]=CASE WHEN [Expr1024]=(0) THEN NULL ELSE [Expr1025] END,
                 [Expr1007]=CASE WHEN [Expr1026]=(0) THEN NULL ELSE [Expr1027] END,
                 [Expr1008]=CASE WHEN [Expr1028]=(0) THEN NULL ELSE [Expr1029] END))
       |--Stream Aggregate(DEFINE:(
                      [Expr1024]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1025]=SUM(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1026]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1027]=SUM(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1028]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1029]=SUM(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END)))
            |--Table Scan(OBJECT:([Sales]))&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;This is just a basic &lt;A title=Aggregation href="http://blogs.msdn.com/craigfr/archive/2006/09/06/743116.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/06/743116.aspx"&gt;scalar aggregation&lt;/A&gt; query plan!&amp;nbsp; It calculates one SUM aggregate for each year.&amp;nbsp; Like any SUM aggregate, each aggregate actually computes both the count and the sum.&amp;nbsp; If the count is zero, the query plan returns NULL else it returns the sum.&amp;nbsp; (The compute scalar handles this logic.)&lt;/P&gt;
&lt;P&gt;The only twist is that each SUM aggregate is actually computed over a CASE statement that filters for those rows that match the year for which it is summing sales.&amp;nbsp; The CASE statement returns the value of the &lt;I&gt;Sales &lt;/I&gt;column for those rows that match the year and NULLs for all other rows.&amp;nbsp; To clarify what is happening, we can view the results of the CASE statements without the aggregation:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, Yr,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CASE WHEN Yr = 2005 THEN Sales END AS [2005],&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; CASE WHEN Yr = 2006 THEN Sales END AS [2006],&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;CASE WHEN Yr = 2007 THEN Sales END AS [2007]&lt;BR&gt;FROM Sales&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;PRE&gt;EmpId       Yr          2005                  2006                  2007
----------- ----------- --------------------- --------------------- ---------------------
1           2005        12000.00              NULL                  NULL
1           2006        NULL                  18000.00              NULL
1           2007        NULL                  NULL                  25000.00
2           2005        15000.00              NULL                  NULL
2           2006        NULL                  6000.00               NULL
3           2006        NULL                  20000.00              NULL
3           2007        NULL                  NULL                  24000.00
2           2007        NULL                  NULL                  NULL&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;When computing the sums of each of the year columns, the query plan relies on the fact that aggregate functions discard NULLs; that is, the NULL values are not included in the results.&amp;nbsp; Although this point may seem intuitive for a SUM aggregate, the significance is clearer for a COUNT aggregate:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE T (A INT)&lt;BR&gt;INSERT T VALUES(NULL)&lt;/P&gt;
&lt;P mce_keep="true"&gt;-- Returns 1: the number of rows in table T&lt;BR&gt;SELECT COUNT(*) FROM T&lt;/P&gt;
&lt;P mce_keep="true"&gt;-- Returns 0: the number of non-NULL values of column A&lt;BR&gt;SELECT COUNT(A) FROM T&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
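&lt;P&gt;Putting the pieces together, the following Python sketch mimics what the scalar PIVOT plan does: one count and one sum per year over the CASE result, skipping NULLs, followed by the compute scalar that turns a zero count into NULL.&amp;nbsp; It is a rough model under stated assumptions rather than the server's code; pivot_sums and YEARS are hypothetical names and None stands in for NULL.&lt;/P&gt;
&lt;PRE&gt;
```python
# Sample rows (EmpId, Yr, Sales) copied from the Sales table used in this post.
SALES = [
    (1, 2005, 12000), (1, 2006, 18000), (1, 2007, 25000),
    (2, 2005, 15000), (2, 2006, 6000),
    (3, 2006, 20000), (3, 2007, 24000),
]
YEARS = (2005, 2006, 2007)


def pivot_sums(rows, years=YEARS):
    """Model SUM(CASE WHEN Yr = y THEN Sales END) for each year y."""
    counts = {y: 0 for y in years}   # plays the role of COUNT_BIG(...)
    sums = {y: 0 for y in years}     # plays the role of SUM(...)
    for emp, yr, sales in rows:
        for y in years:
            value = sales if yr == y else None   # the CASE expression
            if value is not None:                # aggregates discard NULLs
                counts[y] += 1
                sums[y] += value
    # Compute scalar: a zero count means no qualifying rows, so return NULL.
    return {y: (sums[y] if counts[y] != 0 else None) for y in years}


if __name__ == "__main__":
    print(pivot_sums(SALES))
    # Restricting the input to employee 2 leaves 2007 with no qualifying rows.
    print(pivot_sums([row for row in SALES if row[0] == 2]))
```
&lt;/PRE&gt;
&lt;P&gt;Tracking the count separately from the sum is what lets the sketch distinguish a year with no qualifying rows, which yields NULL, from a year whose sales happen to add up to zero.&lt;/P&gt;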
&lt;P mce_keep="true"&gt;Note that we could just as easily have written the original query as:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(CASE WHEN Yr = 2005 THEN Sales END) AS [2005],&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(CASE WHEN Yr = 2006 THEN Sales END) AS [2006],&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(CASE WHEN Yr = 2007 THEN Sales END) AS [2007]&lt;BR&gt;FROM Sales&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;This query gets a nearly identical query plan.&amp;nbsp; The only visible difference is the use of an extra compute scalar to evaluate the CASE statements.&lt;/P&gt;&lt;PRE&gt;  |--Compute Scalar(DEFINE:(
                 [Expr1004]=CASE WHEN [Expr1013]=(0) THEN NULL ELSE [Expr1014] END,
                 [Expr1005]=CASE WHEN [Expr1015]=(0) THEN NULL ELSE [Expr1016] END,
                 [Expr1006]=CASE WHEN [Expr1017]=(0) THEN NULL ELSE [Expr1018] END))
       |--Stream Aggregate(DEFINE:(
                      [Expr1013]=COUNT_BIG([Expr1007]), [Expr1014]=SUM([Expr1007]),
                      [Expr1015]=COUNT_BIG([Expr1008]), [Expr1016]=SUM([Expr1008]),
                      [Expr1017]=COUNT_BIG([Expr1009]), [Expr1018]=SUM([Expr1009])))
            |--Compute Scalar(DEFINE:(
                           [Expr1007]=CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END,
                           [Expr1008]=CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END,
                           [Expr1009]=CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END))
                 |--Table Scan(OBJECT:([Sales]))&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;The other big difference between the PIVOT syntax and query plan and the alternative syntax and query plan is that the PIVOT query suppresses the following warning about NULLs:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;Warning: Null value is eliminated by an aggregate or other SET operation.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;We could also suppress this warning by executing the following statement:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SET ANSI_WARNINGS OFF&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;At this point, the query plan for a multi-row PIVOT operation should not come as a surprise:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, [2005], [2006], [2007]&lt;BR&gt;FROM (SELECT EmpId, Yr, Sales FROM Sales) AS s&lt;BR&gt;PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;The query plan for this PIVOT operation uses CASE statements to compute the same intermediate result that we saw above.&amp;nbsp; Then, like any other GROUP BY query, it uses either &lt;A title="Stream Aggregate" href="http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx"&gt;stream&lt;/A&gt; or &lt;A title="Hash Aggregate" href="http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx" mce_href="http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx"&gt;hash aggregate&lt;/A&gt; to group by &lt;I&gt;EmpId&lt;/I&gt; and to compute the final result.&amp;nbsp; In this case, the optimizer chooses a stream aggregate.&amp;nbsp; Since we do not have an index to provide order, it must also introduce a sort.&lt;/P&gt;&lt;PRE&gt;  |--Compute Scalar(DEFINE:(
                 [Expr1007]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END,
                 [Expr1008]=CASE WHEN [Expr1027]=(0) THEN NULL ELSE [Expr1028] END,
                 [Expr1009]=CASE WHEN [Expr1029]=(0) THEN NULL ELSE [Expr1030] END))
       |--Stream Aggregate(GROUP BY:([Sales].[EmpId]) DEFINE:(
                      [Expr1025]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1026]=SUM(CASE WHEN [Sales].[Yr]=(2005) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1027]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1028]=SUM(CASE WHEN [Sales].[Yr]=(2006) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1029]=COUNT_BIG(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END),
                      [Expr1030]=SUM(CASE WHEN [Sales].[Yr]=(2007) THEN [Sales].[Sales] ELSE NULL END)))
            |--Compute Scalar(DEFINE:([Sales].[EmpId]=[Sales].[EmpId]))
                 |--Sort(ORDER BY:([Sales].[EmpId] ASC))
                      |--Table Scan(OBJECT:([Sales]))&lt;/PRE&gt;
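&lt;P mce_keep="true"&gt;For comparison, here is a sketch of a hand-written GROUP BY query that asks for the same result as the PIVOT query above.&amp;nbsp; It assumes the same &lt;I&gt;Sales&lt;/I&gt; table, and the column aliases are chosen only to match the PIVOT output.&amp;nbsp; A similar plan shape is likely (CASE expressions feeding an aggregate grouped on &lt;I&gt;EmpId&lt;/I&gt;), although the exact plan may vary, and unlike PIVOT this form can raise the NULL warning shown earlier.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Sketch only: a manual equivalent of the multi-row PIVOT&lt;BR&gt;SELECT EmpId,&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(CASE WHEN Yr = 2005 THEN Sales END) AS [2005],&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(CASE WHEN Yr = 2006 THEN Sales END) AS [2006],&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; SUM(CASE WHEN Yr = 2007 THEN Sales END) AS [2007]&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId&lt;/P&gt;&lt;/BLOCKQUOTE&gt;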
&lt;P mce_keep="true"&gt;To see how this query really is no different than any other GROUP BY query or to see some alternative query plans, try creating a clustered index on &lt;I&gt;EmpId&lt;/I&gt; to eliminate the sort or using an &lt;I&gt;OPTION(HASH GROUP)&lt;/I&gt; hint to force a hash aggregate.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3784411" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Pivot+and+Unpivot/default.aspx">Pivot and Unpivot</category></item><item><title>The PIVOT Operator</title><link>http://blogs.msdn.com/craigfr/archive/2007/07/03/the-pivot-operator.aspx</link><pubDate>Tue, 03 Jul 2007 21:31:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3674230</guid><dc:creator>craigfr</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/3674230.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=3674230</wfw:commentRss><description>&lt;P&gt;In my next few posts, I'm going to look at how SQL Server implements the PIVOT and UNPIVOT operators.&amp;nbsp; Let's begin with the PIVOT operator.&amp;nbsp; The PIVOT operator takes a normalized table and transforms it into a new table where the columns of the new table are derived from the values in the original table.&amp;nbsp; For example, suppose we want to store annual sales data by employee.&amp;nbsp; We might create a schema such as the following:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)&lt;BR&gt;INSERT Sales VALUES(1, 2005, 12000)&lt;BR&gt;INSERT Sales VALUES(1, 2006, 18000)&lt;BR&gt;INSERT Sales VALUES(1, 2007, 25000)&lt;BR&gt;INSERT Sales VALUES(2, 2005, 15000)&lt;BR&gt;INSERT Sales VALUES(2, 2006, 6000)&lt;BR&gt;INSERT Sales VALUES(3, 2006, 20000)&lt;BR&gt;INSERT Sales VALUES(3, 2007, 24000)&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Notice that this schema has one row per employee &lt;B&gt;per year&lt;/B&gt;.&amp;nbsp; Moreover, notice that in the sample data employees 2 and 3 only have sales data for two of the three years.&amp;nbsp; Now suppose that we'd like to transform this data into a table that has one row per employee &lt;B&gt;with all three years of sales data in each row&lt;/B&gt;.&amp;nbsp; We can achieve this conversion very easily using PIVOT:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT EmpId, [2005], [2006], [2007]&lt;BR&gt;FROM (SELECT EmpId, Yr, Sales FROM Sales) AS s&lt;BR&gt;PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;I'm not going to delve into the PIVOT syntax, which is already documented in &lt;A title="Using PIVOT and UNPIVOT" href="http://msdn2.microsoft.com/en-us/library/ms177410.aspx" mce_href="http://msdn2.microsoft.com/en-us/library/ms177410.aspx"&gt;Books Online&lt;/A&gt;.&amp;nbsp; Suffice it to say that this statement sums up the sales for each employee for each of the specified years and outputs one row per employee.&amp;nbsp; The resulting output is:&lt;/P&gt;&lt;PRE&gt;EmpId       2005                  2006                  2007
----------- --------------------- --------------------- ---------------------
1           12000.00              18000.00              25000.00
2           15000.00              6000.00               NULL
3           NULL                  20000.00              24000.00&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;Notice that SQL Server inserts NULLs for the missing sales data for employees 2 and 3.&lt;/P&gt;
&lt;P&gt;The SUM keyword (or some other aggregate) is required.&amp;nbsp; If the &lt;I&gt;Sales&lt;/I&gt; table includes multiple rows for a particular employee for a particular year, PIVOT aggregates them (in this case by summing them) into a single data point in the result.&amp;nbsp; Of course, in this example, since the entry in each "cell" of the output table is the result of summing a single input row, we could just as easily have used another aggregate such as MIN or MAX.&amp;nbsp; I've used SUM since it is more intuitive.&lt;/P&gt;
&lt;P&gt;This PIVOT example is reversible.&amp;nbsp; The information in the output table can be used to reconstruct the original input table using an UNPIVOT operation (which I will cover in a later post).&amp;nbsp; However, not all PIVOT operations are reversible.&amp;nbsp; To be reversible, a PIVOT operation must meet the following criteria:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;All of the input data must be transformed. If we include a filter of any kind, including one imposed by the IN clause, some data may be omitted from the PIVOT result. For example, if we altered the above example to output sales only for 2006 and 2007, clearly we could not reconstruct the 2005 sales data from the result.&lt;/LI&gt;
&lt;LI&gt;Each cell in the output table must derive from a single input row. If multiple input rows are aggregated into a single cell, there is no way to reconstruct the original input rows.&lt;/LI&gt;
&lt;LI&gt;The aggregate function must be an identity function (when used on a single input row). SUM, MIN, MAX, and AVG all return the single input value unchanged and, thus, can be reversed. COUNT does not return its input value unchanged and, thus, cannot be reversed.&lt;/LI&gt;&lt;/UL&gt;
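&lt;P&gt;The second criterion can be checked before pivoting.&amp;nbsp; As a sketch against the sample &lt;I&gt;Sales&lt;/I&gt; table (the &lt;I&gt;NumRows&lt;/I&gt; alias is just an illustrative name), the following query lists any employee and year combination that has more than one input row.&amp;nbsp; If it returns no rows, each output cell derives from at most one input row.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;-- Sketch only: find (EmpId, Yr) pairs that would be aggregated from several rows&lt;BR&gt;SELECT EmpId, Yr, COUNT(*) AS NumRows&lt;BR&gt;FROM Sales&lt;BR&gt;GROUP BY EmpId, Yr&lt;BR&gt;HAVING COUNT(*) &amp;gt; 1&lt;/P&gt;&lt;/BLOCKQUOTE&gt;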
&lt;P&gt;Here is an example of a non-reversible PIVOT operation.&amp;nbsp; This example calculates the total sales for all employees for all three years.&amp;nbsp; It does not itemize the output by employee.&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;SELECT [2005], [2006], [2007]&lt;BR&gt;FROM (SELECT Yr, Sales FROM Sales) AS s&lt;BR&gt;PIVOT (SUM(Sales) FOR Yr IN ([2005], [2006], [2007])) AS p&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P mce_keep="true"&gt;Here is the output.&amp;nbsp; Each cell represents the sum of two or three rows from the input table.&lt;/P&gt;&lt;PRE&gt;2005                  2006                  2007
--------------------- --------------------- ---------------------
27000.00              44000.00              49000.00&lt;/PRE&gt;
&lt;P mce_keep="true"&gt;In my next post, I'll look at some example PIVOT query plans.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=3674230" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category><category domain="http://blogs.msdn.com/craigfr/archive/tags/Pivot+and+Unpivot/default.aspx">Pivot and Unpivot</category></item><item><title>Hash Aggregate</title><link>http://blogs.msdn.com/craigfr/archive/2006/09/20/hash-aggregate.aspx</link><pubDate>Wed, 20 Sep 2006 21:09:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:763905</guid><dc:creator>craigfr</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/763905.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=763905</wfw:commentRss><description>&lt;P&gt;In my prior two posts, I wrote about the stream aggregate operator.&amp;nbsp; Stream aggregate is great for scalar aggregates and for aggregations where we have an index to provide a sort order on the group by column(s) or where we need to sort anyhow (e.g., due to an order by clause).&lt;/P&gt;
&lt;P&gt;The other aggregation operator, hash aggregate, is similar to hash join.&amp;nbsp; It does not require (or preserve) sort order, requires memory, and is blocking (i.e., it does not produce any results until it has consumed its entire input).&amp;nbsp; Hash aggregate excels at efficiently aggregating very large data sets.&lt;/P&gt;
&lt;P&gt;Here is pseudo-code for the hash aggregate algorithm:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;for each input row&lt;BR&gt;&amp;nbsp; begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; calculate hash value on group by column(s)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; check for a matching row in the hash table&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if we do not find a match&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; insert a new row into the hash table&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; update the matching row with the input row&lt;BR&gt;&amp;nbsp; end&lt;BR&gt;output all rows in the hash table&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;While stream aggregate computes just one group at a time, hash aggregate computes all of the groups simultaneously.&amp;nbsp; We use a hash table to store these groups.&amp;nbsp; With each new input row, we check the hash table to see whether the new row belongs to an existing group.&amp;nbsp; If it does, we simply update the existing group.&amp;nbsp; If not, we create a new group.&amp;nbsp; Since the input data is unsorted, any row can belong to any group.&amp;nbsp; Thus, we cannot output any results until we’ve finished processing every input row.&lt;/P&gt;
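&lt;P&gt;The pseudo-code can be mimicked in T-SQL as a rough sketch.&amp;nbsp; This is only an analogy and not how SQL Server implements the operator: the table variables &lt;EM&gt;@rows&lt;/EM&gt; and &lt;EM&gt;@groups&lt;/EM&gt; and the cursor &lt;EM&gt;c&lt;/EM&gt; are made-up names, the primary key on &lt;EM&gt;@groups&lt;/EM&gt; stands in for the hash table (it is really an index lookup, not a hash), and the sketch assumes no NULLs in either column.&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;-- Sketch only: row-at-a-time emulation of the loop above&lt;BR&gt;declare @rows table (a int, b int)&lt;BR&gt;insert @rows values (1, 10)&lt;BR&gt;insert @rows values (2, 20)&lt;BR&gt;insert @rows values (1, 30)&lt;BR&gt;&lt;BR&gt;-- one row per group; plays the role of the hash table&lt;BR&gt;declare @groups table (a int primary key, total int)&lt;BR&gt;declare @a int, @b int&lt;BR&gt;&lt;BR&gt;declare c cursor local fast_forward for select a, b from @rows&lt;BR&gt;open c&lt;BR&gt;fetch next from c into @a, @b&lt;BR&gt;while @@fetch_status = 0&lt;BR&gt;&amp;nbsp; begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if not exists (select * from @groups where a = @a)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; insert @groups values (@a, @b) -- no match: create a new group&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; update @groups set total = total + @b where a = @a -- match: update the existing group&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; fetch next from c into @a, @b&lt;BR&gt;&amp;nbsp; end&lt;BR&gt;close c&lt;BR&gt;deallocate c&lt;BR&gt;&lt;BR&gt;-- groups are output only after the entire input has been consumed&lt;BR&gt;select a, total from @groups&lt;/FONT&gt;&lt;/P&gt;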
&lt;P&gt;&lt;STRONG&gt;Memory and spilling&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;As with hash join, the hash aggregate requires memory.&amp;nbsp; Before executing a query with a hash aggregate, SQL Server uses cardinality estimates to estimate how much memory we need to execute the query.&amp;nbsp; With a hash join, we store each build row, so the total memory requirement is proportional to the number and size of the build rows.&amp;nbsp; The number of rows that join and the output cardinality of the join have no effect on the memory requirement of the join.&amp;nbsp; With a hash aggregate, we store one row for each group, so the total memory requirement is actually proportional to the number and size of the output groups or rows.&amp;nbsp; If we have fewer unique values of the group by column(s) and fewer groups, we need less memory.&amp;nbsp; If we have more unique values of the group by column(s) and more groups, we need more memory.&lt;/P&gt;
&lt;P&gt;So, what happens if we run out of memory?&amp;nbsp; Again, like hash join, if we run out of memory, we must begin spilling rows to tempdb.&amp;nbsp; We spill one or more buckets or partitions, including any partially aggregated results, along with any additional new rows that hash to the spilled buckets or partitions.&amp;nbsp; Although we do not attempt to aggregate the spilled new rows, we do hash them and divide them up into several buckets or partitions.&amp;nbsp; Once we’ve finished processing all input rows, we output the completed in-memory groups and repeat the algorithm by reading back and aggregating one spilled partition at a time.&amp;nbsp; By dividing the spilled rows into multiple partitions, we reduce the size of each partition and, thus, reduce the risk that the algorithm will need to repeat many times.&lt;/P&gt;
&lt;P&gt;Note that while duplicate rows are a big problem for hash join as they lead to skew in the size of the different hash buckets and make it difficult to divide the work into small uniform portions, duplicates are actually quite helpful for hash aggregate since they collapse into a single group.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Examples&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The optimizer tends to favor hash aggregation for tables with more rows and groups.&amp;nbsp; For example, with only 100 rows and 10 groups, we get a sort and a stream aggregate:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;create&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;table&lt;/SPAN&gt; t &lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; c &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;)&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: gray; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;set&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;nocount&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;on&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;declare&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: blue"&gt;int&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;set&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; 0&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;while&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: gray"&gt;&amp;lt;&lt;/SPAN&gt; 100&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;begin&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;insert&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;values&lt;/SPAN&gt; &lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;@i &lt;SPAN style="COLOR: gray"&gt;%&lt;/SPAN&gt; 10&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; @i&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;*&lt;/SPAN&gt; 3&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;set&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;+&lt;/SPAN&gt; 1&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;end&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([t].[a] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;But, with 1000 rows and 100 groups, we get a hash aggregate:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;truncate&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;table&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;declare&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: blue"&gt;int&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;set&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; 0&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;while&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: gray"&gt;&amp;lt;&lt;/SPAN&gt; 1000&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;begin&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;insert&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;values&lt;/SPAN&gt; &lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;@i &lt;SPAN style="COLOR: gray"&gt;%&lt;/SPAN&gt; 100&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; @i&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;*&lt;/SPAN&gt; 3&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;set&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;+&lt;/SPAN&gt; 1&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;end&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Notice that we hash on the group by column.&amp;nbsp; The residual predicate from the hash aggregate is used to compare rows in the hash table to input rows in case we have a hash value collision.&lt;/P&gt;
&lt;P&gt;Notice also&amp;nbsp;that with the hash aggregate we do not need a sort.&amp;nbsp; A sort requires more memory than the hash aggregate since we must sort 1000 rows but only need memory in the hash aggregate for 100 groups.&amp;nbsp; However, if we explicitly request a sort using an order by clause in the query, we get the stream aggregate again:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a &lt;SPAN style="COLOR: blue"&gt;order&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([t].[a] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;If the table gets big enough and the number of groups remains small enough, eventually the optimizer will decide that it is cheaper to use the hash aggregate and sort after we aggregate.&amp;nbsp; For example, with 10,000 rows and only 100 groups, the optimizer decides that it is better to hash and sort only 100 groups than to sort 10,000 rows:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;truncate&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;table&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;set&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;nocount&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;on&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;declare&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: blue"&gt;int&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;set&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; 0&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;while&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; @i &lt;SPAN style="COLOR: gray"&gt;&amp;lt;&lt;/SPAN&gt; 10000&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;begin&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;insert&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;values&lt;/SPAN&gt; &lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;@i &lt;SPAN style="COLOR: gray"&gt;%&lt;/SPAN&gt; 100&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; @i&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;*&lt;/SPAN&gt; 3&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;set&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; @i &lt;SPAN style="COLOR: gray"&gt;+&lt;/SPAN&gt; 1&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;end&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a &lt;SPAN style="COLOR: blue"&gt;order&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Sort(ORDER BY:([t].[a] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
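&lt;P&gt;To compare the two operators on the same data regardless of the optimizer's cost-based choice, query hints can be used.&amp;nbsp; As a sketch against the table &lt;EM&gt;t&lt;/EM&gt; above, the HASH GROUP hint requests a hash aggregate and the ORDER GROUP hint requests a stream aggregate.&amp;nbsp; The resulting plans are not shown here and may differ by version.&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;-- Sketch only: request each aggregation strategy explicitly&lt;BR&gt;select sum(b) from t group by a order by a option (hash group)&lt;BR&gt;select sum(b) from t group by a order by a option (order group)&lt;/FONT&gt;&lt;/P&gt;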
&lt;P&gt;&lt;STRONG&gt;Distinct&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Just like stream aggregate, hash aggregate can be used to implement distinct operations:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; a &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Or for something a little more interesting:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; b&lt;SPAN style="COLOR: gray"&gt;),&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; c&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Hash Match(Inner Join, HASH:([t].[a])=([t].[a]), RESIDUAL:([t].[a] = [t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1018]=COUNT_BIG([t].[b]), [Expr1019]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a], [t].[b]), RESIDUAL:([t].[a] = [t].[a] AND [t].[b] = [t].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1020]=COUNT_BIG([t].[c]), 
[Expr1021]=SUM([t].[c])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Hash Match(Aggregate, HASH:([t].[a], [t].[c]), RESIDUAL:([t].[a] = [t].[a] AND [t].[c] = [t].[c]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This plan is logically equivalent to the last plan from my stream aggregate post, but it uses hash aggregate and hash join instead of sorts, stream aggregates, and merge join.&amp;nbsp; We have two hash aggregates to eliminate duplicates (one for "distinct b" and one for "distinct c") and two more hash aggregates to compute the two sums.&amp;nbsp; The hash join "glues" the resulting rows together to form the final result.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Hints&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You can use the “order group” and “hash group” query hints to force stream aggregate and hash aggregate respectively.&amp;nbsp; These hints affect all aggregation operations in the entire query.&amp;nbsp; For example:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a &lt;SPAN style="COLOR: blue"&gt;option&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;order&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a &lt;SPAN style="COLOR: blue"&gt;option&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;hash &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
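&lt;P&gt;Because these hints apply to the whole query, they should also cover the duplicate-eliminating aggregates in a query with distinct aggregates.&amp;nbsp; As a sketch only (it assumes the table t used above, and the resulting plan is not shown because it can vary with server version and statistics), the following asks for hash aggregate everywhere in the multiple distinct query:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;select sum(distinct b), sum(distinct c) from t group by a option(hash group)&lt;/FONT&gt;&lt;/P&gt;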
&lt;P&gt;&lt;STRONG&gt;SQL Profiler&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;You can use the SQL Profiler “Hash Warning” event class (in the “Errors and Warnings” event category) to detect when a query with a hash join or hash aggregate spills to disk.&amp;nbsp; Spilling generates I/O and can adversely affect performance.&amp;nbsp; See Books Online for more information about this event.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=763905" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category></item><item><title>Stream Aggregate</title><link>http://blogs.msdn.com/craigfr/archive/2006/09/13/752728.aspx</link><pubDate>Wed, 13 Sep 2006 23:42:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:752728</guid><dc:creator>craigfr</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/752728.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=752728</wfw:commentRss><description>&lt;P&gt;There are two physical operators that SQL Server uses to compute general-purpose aggregates where we have a GROUP BY clause.&amp;nbsp; One of these operators is stream aggregate, which, as we saw last week, is also used for &lt;A href="http://blogs.msdn.com/craigfr/archive/2006/09/06/743116.aspx"&gt;scalar aggregates&lt;/A&gt;.&amp;nbsp; The other operator is hash aggregate.&amp;nbsp; In this post, I’ll take a closer look at how stream aggregate works.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The algorithm&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Stream aggregate relies on data arriving sorted by the group by column(s).&amp;nbsp; If we are grouping on more than one column, we can choose any sort order that includes all of the columns.&amp;nbsp; For example, if we are grouping on columns a and b, we can sort on “(a, b)” or on “(b, a)”.&amp;nbsp; As with merge join, the sort order may be delivered by an index or by an explicit sort operator.&amp;nbsp; The sort order ensures that sets of rows with the same value for the group by columns will be adjacent to one another.&lt;/P&gt;
&lt;P&gt;Here is pseudo-code for the stream aggregate algorithm:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;clear the current aggregate results&lt;BR&gt;clear the current group by columns&lt;BR&gt;for each input row&lt;BR&gt;&amp;nbsp; begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if the input row does not match the current group by columns&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; begin&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if this is not the first input row&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output the aggregate results&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; clear the current aggregate results&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set the current group by columns to the input row&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; update the aggregate results with the input row&lt;BR&gt;&amp;nbsp; end&lt;BR&gt;if there was at least one input row&lt;BR&gt;&amp;nbsp; output the aggregate results&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;For example, if we are computing a SUM, the stream aggregate considers each input row.&amp;nbsp; If the input row belongs to the current group (i.e., the group by columns of the input row match the group by columns of the previous row), we update the current SUM by adding the appropriate value from the input row to the running total.&amp;nbsp; If the input row belongs to a new group (i.e., the group by columns of the input row do not match the group by columns of the previous row), we output the current SUM, reset the SUM to zero, and start a new group.&lt;/P&gt;
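&lt;P&gt;To make this concrete, here is a small worked example.&amp;nbsp; The table variable @demo and its three rows are made up for illustration and are not part of the examples in this post.&amp;nbsp; The comments trace what a stream aggregate would do if the rows arrived sorted on column a; the plan that SQL Server actually chooses for such a tiny input may differ:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;declare @demo table (a int, c int)&lt;BR&gt;insert @demo values (1, 10)&lt;BR&gt;insert @demo values (1, 20)&lt;BR&gt;insert @demo values (2, 5)&lt;BR&gt;&lt;BR&gt;-- input sorted on a: (1, 10), (1, 20), (2, 5)&lt;BR&gt;-- (1, 10): first row, start group a = 1, SUM = 10&lt;BR&gt;-- (1, 20): same group, SUM = 30&lt;BR&gt;-- (2, 5): new group, output (1, 30), start group a = 2, SUM = 5&lt;BR&gt;-- end of input: output (2, 5)&lt;BR&gt;select a, sum(c) as sum_c from @demo group by a order by a&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;The query returns two rows, (1, 30) and (2, 5).&amp;nbsp; Note that the stream aggregate only ever holds the state for one group at a time.&lt;/P&gt;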
&lt;P&gt;&lt;STRONG&gt;Simple examples&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;create&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;table&lt;/SPAN&gt; t &lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; c &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;)&lt;?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" /&gt;&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;c&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[b], [t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[c]), [Expr1011]=SUM([t].[c])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([t].[b] ASC, [t].[a] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This is basically the same plan that we saw for the scalar aggregate SUM query except that we need to sort the data before we aggregate.&amp;nbsp; (You can think of the scalar aggregate as one big group containing all of the rows; thus, for a scalar aggregate there is no need to sort the rows into different groups.)&lt;/P&gt;
&lt;P&gt;Stream aggregate preserves the input sort order so if we request order on the group by columns or on a subset of the group by columns, we do not need to sort again:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;c&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b &lt;SPAN style="COLOR: blue"&gt;order&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]) DEFINE:([Expr1010]=COUNT_BIG([t].[c]), [Expr1011]=SUM([t].[c])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(ORDER BY:([t].[a] ASC, [t].[b] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Note that the sort columns are reversed from the first example.&amp;nbsp; Previously, we didn’t care whether we sorted on “(a, b)” or “(b, a)”.&amp;nbsp; Now, since the query includes an order by clause on column a, we choose to sort on column a first so that a single sort serves both the stream aggregate and the order by.&lt;/P&gt;
&lt;P&gt;If we have an appropriate index, we do not need to sort at all:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;create&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;clustered&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;index&lt;/SPAN&gt; tab &lt;SPAN style="COLOR: blue"&gt;on&lt;/SPAN&gt; t&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;c&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]) DEFINE:([Expr1010]=COUNT_BIG([t].[c]), [Expr1011]=SUM([t].[c])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Select distinct&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;If we have an index to provide order, stream aggregate can also be used to implement select distinct.&amp;nbsp; (If we need to sort to get the order, we can let the sort operator eliminate the duplicates directly; there is no reason to add a stream aggregate.)&amp;nbsp; Select distinct is essentially the same as group by on all selected columns with no aggregate functions.&amp;nbsp; For example:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;This query can also be written as:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Both queries use the same plan:&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Notice that the stream aggregate has a group by clause, but no defined columns.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Distinct aggregates&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Consider this query:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;We need to eliminate duplicate values of column b from each group.&amp;nbsp; In my last post, we saw one way to do this using a sort distinct.&amp;nbsp; However, if we have an appropriate index, we can also use the stream aggregate to eliminate duplicates:&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;The bottommost stream aggregate eliminates duplicates while the topmost one performs the aggregation.&lt;/P&gt;
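&lt;P&gt;One way to read this plan, offered as a sketch rather than as a description of what the optimizer literally does, is that it is logically the same as first removing duplicate (a, b) pairs in a derived table and then computing an ordinary sum.&amp;nbsp; The alias d is arbitrary:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;select sum(b) from (select distinct a, b from t) d group by a&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;The inner select distinct plays the role of the bottommost stream aggregate and the outer group by plays the role of the topmost one.&amp;nbsp; No plan is shown for this rewrite, since the optimizer is free to choose a different one.&lt;/P&gt;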
&lt;P&gt;&lt;STRONG&gt;Multiple distincts&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Finally, consider this query:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; b&lt;SPAN style="COLOR: gray"&gt;),&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; c&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([t].[a])=([t].[a]), RESIDUAL:([t].[a] = [t].[a]))&lt;BR&gt;&lt;FONT color=#ff0000&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1018]=COUNT_BIG([t].[c]), [Expr1019]=SUM([t].[c])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(DISTINCT ORDER BY:([t].[a] ASC, [t].[c] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]))&lt;BR&gt;&lt;/FONT&gt;&lt;FONT color=#0000ff&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1020]=COUNT_BIG([t].[b]), 
[Expr1021]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)&lt;/FONT&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;As in the multiple scalar distinct example from my prior post, we need to break this query into two parts, one for each distinct set.&amp;nbsp; Note that the computation of “&lt;FONT color=#ff0000&gt;sum(distinct c)&lt;/FONT&gt;” requires a distinct sort to eliminate duplicate values of column c within each group, while the computation of “&lt;FONT color=#0000ff&gt;sum(distinct b)&lt;/FONT&gt;” uses an ordered scan of the clustered index and a stream aggregate as shown above.&amp;nbsp; We join these two sets of sums on the group by column (in this case column a) to generate the final result.&amp;nbsp; Since the two input sets are already sorted on the group by column, we can use a merge join.&amp;nbsp; (The compute scalar operators that appear to define “[t].[a] = [t].[a]” are needed for internal purposes and can be disregarded.)&lt;/P&gt;
&lt;P&gt;The merge join ought to be one-to-many, not many-to-many, since the aggregates ensure uniqueness on the group by column (which is also the join column).&amp;nbsp; This is a minor performance issue, not a correctness issue.&amp;nbsp; If we rewrite the original query as an explicit join, we do get a one-to-many join:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; sum_b&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; sum_c&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;from&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: gray; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;as&lt;/SPAN&gt; sum_b &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; r&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: gray; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;join&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: gray; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; a&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; c&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;as&lt;/SPAN&gt; sum_c &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t &lt;SPAN style="COLOR: blue"&gt;group&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;by&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; s&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp;&lt;/SPAN&gt;on&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; r&lt;SPAN style="COLOR: gray"&gt;.&lt;/SPAN&gt;a &lt;SPAN style="COLOR: gray"&gt;=&lt;/SPAN&gt; s&lt;SPAN style="COLOR: gray"&gt;.&lt;/SPAN&gt;a&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Merge Join(Inner Join, MERGE:([t].[a])=([t].[a]), RESIDUAL:([t].[a]=[t].[a]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1020]=COUNT_BIG([t].[c]), [Expr1021]=SUM([t].[c])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(DISTINCT ORDER BY:([t].[a] ASC, [t].[c] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1022]=(0) THEN NULL ELSE [Expr1023] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1022]=COUNT_BIG([t].[b]), [Expr1023]=SUM([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Next …&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In my next post, I will write about the other aggregation operator: hash aggregate.&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=752728" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category></item><item><title>Aggregation</title><link>http://blogs.msdn.com/craigfr/archive/2006/09/06/743116.aspx</link><pubDate>Wed, 06 Sep 2006 22:12:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:743116</guid><dc:creator>craigfr</dc:creator><slash:comments>9</slash:comments><comments>http://blogs.msdn.com/craigfr/comments/743116.aspx</comments><wfw:commentRss>http://blogs.msdn.com/craigfr/commentrss.aspx?PostID=743116</wfw:commentRss><description>&lt;P&gt;Aggregation refers to the collapse of a larger set of rows into a smaller set of rows.&amp;nbsp; Typical aggregate functions are COUNT, MIN, MAX, SUM, and AVG.&amp;nbsp; SQL Server also supports other aggregates such as STDEV and VAR.&lt;/P&gt;
&lt;P&gt;I’m going to break this topic down into multiple posts.&amp;nbsp; In this post, I’m going to focus on “scalar aggregates.”&amp;nbsp; Scalar aggregates are queries with aggregate functions in the select list and no GROUP BY clause.&amp;nbsp; Scalar aggregates always return a single row.&lt;/P&gt;
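&lt;P&gt;For instance, a scalar aggregate returns one row even when its input is empty, whereas the same aggregates with a GROUP BY clause return no rows at all.&amp;nbsp; Here is a minimal sketch using a throwaway table variable, @empty, which is made up for illustration and is not part of the examples below:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=2&gt;declare @empty table (a int)&lt;BR&gt;&lt;BR&gt;-- scalar aggregate: returns one row containing 0 and NULL&lt;BR&gt;select count(*), sum(a) from @empty&lt;BR&gt;&lt;BR&gt;-- group by aggregate: returns no rows&lt;BR&gt;select count(*), sum(a) from @empty group by a&lt;/FONT&gt;&lt;/P&gt;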
&lt;P&gt;&lt;STRONG&gt;Scalar Aggregation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;There is only one operator for scalar aggregation: stream aggregate.&amp;nbsp; For example:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;create&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;table&lt;/SPAN&gt; t &lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; b &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;,&lt;/SPAN&gt; c &lt;SPAN style="COLOR: blue"&gt;int&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: gray; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;count&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(*)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1005],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1005]=Count(*)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This is the aggregation equivalent of “Hello World!”&amp;nbsp; The stream aggregate just counts the number of input rows and returns this result.&amp;nbsp; The stream aggregate actually computes the count ([Expr1005]) as a bigint.&amp;nbsp; The compute scalar is needed to convert this result to the expected output type of int.&amp;nbsp; Note that a scalar stream aggregate is one of the few examples (maybe the only example – I can’t think of any others right now) of a non-leaf operator that can produce an output row even with an empty input set.&lt;/P&gt;
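&lt;P&gt;To make the empty-input behavior concrete, here is a minimal Python sketch of a scalar COUNT(*) stream aggregate.&amp;nbsp; This is not SQL Server code; the function name scalar_count and the list-of-tuples row format are assumptions made purely for illustration.&lt;/P&gt;

```python
def scalar_count(rows):
    """Sketch of a scalar stream aggregate for COUNT(*) (hypothetical helper)."""
    count = 0  # the only state the aggregate keeps
    for _row in rows:
        count += 1
    # A scalar aggregate emits exactly one row, even if the loop never ran.
    return [(count,)]


print(scalar_count([(1, 2, 3), (4, 5, 6)]))  # [(2,)]
print(scalar_count([]))                      # [(0,)]
```

&lt;P&gt;The second call shows the point made above: an empty input still yields one output row, carrying a count of zero.&lt;/P&gt;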
&lt;P&gt;It is easy to see how to implement other simple scalar aggregate functions such as MIN, MAX, and SUM.&amp;nbsp; We can also calculate multiple scalar aggregates at the same time using a single stream aggregate operator:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;min&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a&lt;SPAN style="COLOR: gray"&gt;),&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;max&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1004]=MIN([t].[a]), [Expr1005]=MAX([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This plan just reads each row of table t and keeps track of the minimum value of column a and the maximum value of column b.&amp;nbsp; Note that we do not need to convert the result for the MIN and MAX aggregates since the types of these aggregates are computed based on the types of columns a and b.&lt;/P&gt;
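&lt;P&gt;The single pass described above can be sketched in Python as follows.&amp;nbsp; The helper name scalar_min_max is hypothetical, rows are assumed to be (a, b, c) tuples, and None stands in for NULL; the real operator is of course implemented differently inside the engine.&lt;/P&gt;

```python
def scalar_min_max(rows):
    """Sketch: compute MIN(a) and MAX(b) together in one pass (hypothetical helper)."""
    min_a = None  # None stands in for SQL NULL
    max_b = None
    for a, b, _c in rows:
        if a is not None:  # aggregates ignore NULL inputs
            min_a = a if min_a is None else min(min_a, a)
        if b is not None:
            max_b = b if max_b is None else max(max_b, b)
    # No type conversion is needed: the results have the types of a and b.
    return [(min_a, max_b)]


print(scalar_min_max([(3, 10, 0), (1, 40, 0), (2, None, 0)]))  # [(1, 40)]
print(scalar_min_max([]))                                      # [(None, None)]
```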
&lt;P&gt;Some aggregates such as AVG are actually calculated from two other aggregates such as SUM and COUNT:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;avg&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006]/CONVERT_IMPLICIT(int,[Expr1005],0) END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1005]=COUNT_BIG([t].[a]), [Expr1006]=SUM([t].[a])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This time the compute scalar also calculates the average from the sum and count.&amp;nbsp; The CASE expression is needed to make sure that we do not divide by zero.&lt;/P&gt;
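&lt;P&gt;As a rough illustration of this two-step plan, the Python sketch below separates the stream aggregate (count and sum) from the compute scalar (the CASE expression).&amp;nbsp; The function names are hypothetical, None stands in for NULL, and starting the running sum at zero is a simplifying assumption rather than a statement about the engine's internals.&lt;/P&gt;

```python
def stream_aggregate(values):
    """Sketch: COUNT_BIG(a) and SUM(a) in one pass, skipping NULLs (None)."""
    count, total = 0, 0  # starting the sum at 0 is a simplification
    for v in values:
        if v is not None:
            count += 1
            total += v
    return count, total


def compute_scalar_avg(count, total):
    """Sketch of: CASE WHEN count = 0 THEN NULL ELSE sum / count END."""
    if count == 0:
        return None  # return NULL instead of dividing by zero
    # int(...) truncates toward zero, like integer division in T-SQL.
    # Going through a float is fine for a sketch but not for very large sums.
    return int(total / count)


print(compute_scalar_avg(*stream_aggregate([1, 2, 4])))  # 2
print(compute_scalar_avg(*stream_aggregate([])))         # None
```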
&lt;P&gt;Although SUM, unlike AVG, is not computed from other aggregates, the plan for it still needs the count:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;sum&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1005]=COUNT_BIG([t].[a]), [Expr1006]=SUM([t].[a])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;The CASE expression uses the count to ensure that SUM returns NULL instead of zero if we have no rows (or, since COUNT_BIG([t].[a]) counts only non-NULL values, if column a is NULL in every row).&lt;/P&gt;
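&lt;P&gt;A small Python sketch shows why the count is needed: a running total alone cannot distinguish “no input” from “input that sums to zero.”&amp;nbsp; The name scalar_sum is hypothetical, None stands in for NULL, and the zero-initialized running total is an assumption made for the sketch.&lt;/P&gt;

```python
def scalar_sum(values):
    """Sketch of SUM(a) guarded by a count, with None standing in for NULL."""
    count, total = 0, 0  # assumed starting state, for illustration only
    for v in values:
        if v is not None:
            count += 1
            total += v
    # CASE WHEN count = 0 THEN NULL ELSE sum END
    return None if count == 0 else total


print(scalar_sum([5, -5]))       # 0     (rows exist and sum to zero)
print(scalar_sum([]))            # None  (no rows)
print(scalar_sum([None, None]))  # None  (only NULL values)
```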
&lt;P&gt;&lt;STRONG&gt;Scalar Distinct&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Now let’s take a look at what happens if we add a DISTINCT keyword to the aggregate:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;count&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1007]=COUNT([t].[a])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(DISTINCT ORDER BY:([t].[a] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;This query must count each distinct non-NULL value of column a only once.&amp;nbsp; We use the sort operator to eliminate rows with duplicate values of column a.&amp;nbsp; It is easy to remove duplicate rows once we sort the input set since the duplicates will be adjacent to one another.&lt;/P&gt;
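&lt;P&gt;Here is a minimal Python sketch of the idea: sort the values, drop any value equal to its predecessor, and then count what remains.&amp;nbsp; The helper names sort_distinct and count_distinct are hypothetical, None stands in for NULL, and the sort key that places NULLs first is just a convenience for the sketch.&lt;/P&gt;

```python
def sort_distinct(values):
    """Sketch of a distinct sort: after sorting, duplicates are adjacent."""
    # The key sorts None (NULL) first and avoids comparing None with ints.
    ordered = sorted(values, key=lambda v: (v is not None, 0 if v is None else v))
    out = []
    for v in ordered:
        if not out or out[-1] != v:  # keep v only if it differs from the last kept value
            out.append(v)
    return out


def count_distinct(values):
    """Sketch of COUNT(a) over the distinct sort; COUNT ignores NULL (None)."""
    count = 0
    for v in sort_distinct(values):
        if v is not None:
            count += 1
    return count


print(sort_distinct([3, 1, 3, None, 1, None]))   # [None, 1, 3]
print(count_distinct([3, 1, 3, None, 1, None]))  # 2
```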
&lt;P&gt;Not all distinct aggregates require duplicate elimination.&amp;nbsp; For example, MIN and MAX behave identically with and without the distinct keyword.&amp;nbsp; The minimum and maximum values of a set remain the same whether or not the set includes duplicate values.&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;min&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;),&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;max&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1004]=MIN([t].[a]), [Expr1005]=MAX([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;If we have a unique index, we can also skip the duplicate elimination since the index guarantees that there are no duplicates:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;create&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;unique&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;index&lt;/SPAN&gt; ta &lt;SPAN style="COLOR: blue"&gt;on&lt;/SPAN&gt; t&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;a&lt;SPAN style="COLOR: gray"&gt;)&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: gray; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;count&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;&lt;o:p&gt;&amp;nbsp;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;drop&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: blue"&gt;index&lt;/SPAN&gt; t&lt;SPAN style="COLOR: gray"&gt;.&lt;/SPAN&gt;ta&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1007]=COUNT([t].[a])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Index Scan(OBJECT:([t].[ta]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Multiple Distinct&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Consider this query:&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt; mso-layout-grid-align: none"&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: blue; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt;select&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: 'Courier New'; mso-no-proof: yes"&gt; &lt;SPAN style="COLOR: fuchsia"&gt;count&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; a&lt;SPAN style="COLOR: gray"&gt;),&lt;/SPAN&gt; &lt;SPAN style="COLOR: fuchsia"&gt;count&lt;/SPAN&gt;&lt;SPAN style="COLOR: gray"&gt;(&lt;/SPAN&gt;&lt;SPAN style="COLOR: blue"&gt;distinct&lt;/SPAN&gt; b&lt;SPAN style="COLOR: gray"&gt;)&lt;/SPAN&gt; &lt;SPAN style="COLOR: blue"&gt;from&lt;/SPAN&gt; t&lt;o:p&gt;&lt;/o:p&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;As we saw above, we can compute “count(distinct a)” by eliminating rows that have duplicate values for column a.&amp;nbsp; Similarly, we can compute “count(distinct b)” by eliminating rows that have duplicate values for column b.&amp;nbsp; But, given that these two sets of rows are different, how can we compute both at the same time?&amp;nbsp; The answer is we cannot.&amp;nbsp; We must first compute one aggregate result, then the other, and then we must combine the two results into a single output row:&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Tahoma size=1&gt;&amp;nbsp; |--Nested Loops(Inner Join)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1010]=COUNT([t].[a])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(DISTINCT ORDER BY:([t].[a] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Stream Aggregate(DEFINE:([Expr1011]=COUNT([t].[b])))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Sort(DISTINCT ORDER BY:([t].[b] ASC))&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; |--Table Scan(OBJECT:([t]))&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;The two inputs to the nested loops join compute the two counts from the original query.&amp;nbsp; One of the inputs removes duplicates for column a while the other input removes duplicates for column b.&amp;nbsp; The nested loops join has no join predicate; it is a cross join.&amp;nbsp; Since each input to the nested loops join produces a single row – they are both scalar aggregates – the result of the cross join is also a single row.&amp;nbsp; The cross join just serves to “glue” the two columns of the result into a single row.&lt;/P&gt;
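&lt;P&gt;The “glue” role of the cross join can be sketched in Python as follows.&amp;nbsp; The helper names are hypothetical, rows are assumed to be (a, b, c) tuples, and for brevity the sketch removes duplicates with a Python set rather than with the sort that the plan above uses.&lt;/P&gt;

```python
def count_distinct(values):
    """Sketch of one scalar aggregate input: returns a single one-column row."""
    # A set replaces the distinct sort here; None (NULL) is not counted.
    return [(len({v for v in values if v is not None}),)]


def cross_join(outer, inner):
    """Sketch of a nested loops join with no predicate: pair every outer row with every inner row."""
    return [o + i for o in outer for i in inner]


rows = [(1, 10, 0), (1, 20, 0), (2, 30, 0)]
count_a = count_distinct(a for a, _b, _c in rows)  # [(2,)]
count_b = count_distinct(b for _a, b, _c in rows)  # [(3,)]
print(cross_join(count_a, count_b))                # [(2, 3)]
```

&lt;P&gt;Because each input has exactly one row, the cross join returns exactly one row whose columns are the two counts.&lt;/P&gt;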
&lt;P&gt;If we have more than two distinct aggregates on different columns, we just use more than one cross join.&amp;nbsp; We can also use this type of plan if we have a mix of non-distinct and distinct aggregates.&amp;nbsp; In that case, one of the cross join inputs aggregates without the sort.&lt;/P&gt;
&lt;P&gt;In my next post, I’ll write about aggregates with a GROUP BY clause.&lt;/P&gt;</description><category domain="http://blogs.msdn.com/craigfr/archive/tags/Aggregation/default.aspx">Aggregation</category></item></channel></rss>