TSQL_language's WebLog

Query specification: but is it grouped?

Earlier, I explained what exactly a query specification is. It is important to note that query specifications come in two very different varieties: grouped and not. A query specification is grouped if it has any of the following: GROUP BY clause, HAVING clause, or aggregates. Roughly speaking, this would mean that the result of the FROM and WHERE clauses gets "compressed": the query may deliver only one, aggregated row for each group of several rows coming out of FROM. What is important to note is that even without an explicit GROUP BY clause, a query specification may still be grouped if HAVING or aggregates are present. If there is no GROUP BY clause, the entire rowset is considered a single group, and the query delivers exactly one row (if not filtered away completely by HAVING).

The SELECT clause is governed by quite different rules in grouped and non-grouped query specifications. The difference of course is that in the grouped query specifications, the SELECT clause is executed after a GROUP BY (and remember, there is *always* a GROUP BY in the grouped query specifications, even if not explicitly specified!). And after your grouping is done, you are in a very different rowset than before the grouping. Generally speaking, all your original columns are gone. An exception is made for columns that you grouped by, because their value within a group is guaranteed to be the same, so they "survive" a GROUP BY operator. Another exception is aggregates. Inside an aggregate, you can use any column freely.

SQL standard (of '92 vintage) allowed only GROUP BY individual columns, i.e. you could say "GROUP BY a, b", but not "GROUP BY a+b". As an implementor, I wish it remained that way :-), because the semantics is pretty clear, and you can still achieve "GROUP BY a+b" functionality by using a derived table. However, it was perceived so convenient, that the ability to group by an arbitrary expression was added as a language feature. Seemingly benign, it actually brings a couple of esoteric issues with it.

First, what would be a valid GROUP BY expression. The standard is pretty clear that the column you are grouping by should not be an outer reference. This makes sense, because an outer reference would be constant across the rowset, and the result of grouping by a constant is a single group - just as though the GROUP BY was not specified at all. While this is not so terrible, it's more likely that this is a coding bug than the real intent. So basically, the idea is to not allow grouping on a constant. So, how do you tell a constant from a non-constant when you have a complex expression? Here is the rule that Yukon Beta2 enforces: each individual GROUP BY expression has to contain at least one column reference, and that column has to be from the local scope (i.e. not an outer reference). Note that this is a little bit tighter than in SQL 2000, where any column reference (even from an outer scope) was enough to make a GROUP BY expression valid.

As an additional restriction, GROUP BY expressions do not allow scalar subqueries or aggregates.

Ok, that should be enough of esoterica for today, I'll leave the other GROUP BY issue for the next post :-)

This posting is provided "AS IS" with no warranties, and confers no rights.

Published Tuesday, September 28, 2004 11:20 AM by TSQL_language

Comments

No Comments
Anonymous comments are disabled

© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker