TSQL_language's WebLog

Basics: query specification

Let's start from basics, if it looks boring, feel free skip to the next post :-). I just feel I need to establish some terminology first.

The main building block of SQL is what the standard calls "query specification". It consists of "clauses": SELECT, FROM, WHERE, GROUP BY, HAVING (the last three are optional). They are executed in that order, with one exception: SELECT clause is executed last.

How is this different from a "SELECT statement"? First, UNION. Several query specifications may be combined using "set operators" (we only support UNION and UNION ALL at this point), and that will still be a single SELECT statement. Second, subqueries. Single SELECT statement (or "query") may contain several query specifications, some serving as subqueries (table or scalar). So, "query specification" is an all-encompassing term, meaning that it contains general-purpose stuff that would be good either in a top-level query or in a subquery. Top-level query, aka SELECT statement, also contains some specialized syntax that is not accepted in a subquery. I'll mention ORDER BY clause, COMPUTE clause, and FOR BROWSE as examples.

While I am on the topic of subqueries, table subqueries (aka "derived tables") deserve a special mention. This feature is what makes SQL "fully composable". This means that you can take any SQL query that delivers a relational stream and use it as a source for another query. It's amazing how many "hard to write" queries that don't really fit into a single query specification become possible to express if you use derived tables.

So, here is the test: we can only allow something in a query specification if it produces a regular relational stream, easily composable with other relational operators. So, the three example clauses I mentioned above as "top-level-only" features all have something that is not really relational. ORDER BY introduces ordering, which the relational model does not support. COMPUTE returns multiple result sets. FOR BROWSE adds "hidden columns" to the result set, something that does not completely fit into the relational data model.

This posting is provided "AS IS" with no warranties, and confers no rights.

Published Thursday, September 23, 2004 12:35 PM by TSQL_language

Comments

 

Martin said:

In my experience the union and union all are some of the most counterintuitive in all of SQL. For the company I work at we give an interview test to candidates. As part of the test the candidates are required to write a query using the union statement. Almost invariably the answers apply the select distinct on the a subquery containing the union to eliminate duplicates. Often the answers include comments talking about how important this is to eliminate duplicates. the UNION statement implicitly eliminates all duplicates. If you want duplicates then you must use the UNION ALL statement.
The reason for this is as I understand it is that the UNION statement is a set operator (remember your courses in statistics) borrowed from mathematics and really is outside of traditional programming language concepts which are based on coding algorithms.
September 24, 2004 1:12 PM
Anonymous comments are disabled

© 2008 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker