http://edge.technet.com/Media/Real-World-DBA-Episode-3--Does-Join-Order-Matter/ 

This Week

In the news, Microsoft releases a web-based database service. In this week’s feature I’m joined by Joe Webb, who answers the question “Does Join Order Matter”? The web link shows you how to architect Service Broker Applications, and the tip of the week is on scripting replication options. There’s a lot to cover, so let’s get started!

In the News

Microsoft has introduced another database for your company to store data. This time, however, there isn’t any installation for you to set up, or hardware to buy. It’s called the SQL Server Data Services (SSDS), and it’s a web-based, on-demand data storage offering along with query processing. You use the amount of data you want and pay as you grow. You can use a web-API to hit the data, which is secured behind an SSL layer. You just add attributes to your data model and the system generates indexes and other structures automatically. You can find out more at: microsoft.com/sql/dataservices.

This Week’s  Feature – Does JOIN order Matter?

This week, Joe Webb joins me to answer the question "does join order matter?" Joe is a database consultant and SQL Server MVP with more than 12 years experience with Microsoft SQL Server, and I’ve asked him to present for us today. See if you agree with Joe’s findings.

A very astute SQL Server professional and reader of this blog posed the following question in a comment to a prior posting entitled "Does the order of criteria the WHERE clause matter?" The comment, submitted through Plaxo Pulse, is not available outside the Pulse community so I've included it below since it's the genesis of today's post.

"What if you were using join statements? Does it matter what order your columns are used in the ON statement? ~ Ariel M."

There are several derivatives of the original question:

  1. Does the order of tables referenced in the ON clause of the JOIN matter?
  2. Does the order of tables referenced in the FROM clause matter?
  3. Does the order of columns referenced in the ON clause of the JOIN matter when multiple columns are used in the JOIN?

I'll address questions 2 and 3 in another post; for now let's consider the first question. To paraphrase and exemplify the question: Will the following two queries produce differing execution plans and thus varying performance? Or does the Query Optimizer take this into account as it analyzes a statement before choosing an execution plan?

--Query #1
SELECT
c.Customer_ID
,c.Last_Name
,c.First_Name
,s.Order_ID
,s.Order_Date
FROM
dbo.Customers AS c JOIN
dbo.Sales_Orders AS s
ON
s.Customer_ID = c.Customer_ID
--Query #2
SELECT
c.Customer_ID
,c.Last_Name
,c.First_Name
,s.Order_ID
,s.Order_Date
FROM
dbo.Customers AS c JOIN
dbo.Sales_Orders AS s
ON
c.Customer_ID = s.Customer_ID


Notice that the only difference in the two queries lies in the ON clause of the JOIN. The first query references Sales then Customers while the second reverses the order.

To answer this question, let's look at the execution plan as provided by SQL Server Management Studio. The first query produces the following execution plan.

JoinPerform1-2008-02-29

 

In this query, SQL Server has chosen a Clustered Index Scan of the Sales_Orders table and an Index Scan for the Customers table. The two tables are joined using a Hash Match Inner Join.

Now, let's look at the execution plan for the second query.

JoinPerform2-2008-02-29

 

Query #2 produced the exact same execution plan! So, we can conclude from this simple example that the order of tables referenced in the ON clause of a JOIN doesn't affect the performance of a query.

As an aside, though, both execution plans use a Hash Match Inner Join. Generally speaking this is not the most efficient join type for SQL Server; Loop Join is much more efficient. When you see a Hash Join in a query execution plan, consider your indexes. A Hash Join is frequently an indicator of inefficient indexing. I'll delve more deeply into this in another post.

Cheers!

Joe


kick it on DotNetKicks.com

posted @ Friday, February 29, 2008 9:15 AM | Feedback (9)

Friday, February 22, 2008

Does the order of criteria the WHERE clause matter?

Recently, I demonstrated that the order of columns in a composite index greatly determines its usefulness. Microsoft SQL Server can efficiently resolve queries using a composite index if the search criteria includes the first column in the index. If the search criteria omits the first column and includes only subsequent columns, the index is of no use to SQL Server for resolving that query. To see the proof, refer to this post.

The example I used in the prior post included the following query as a basis for the discussion.

SELECT         
*
FROM
Customers
WHERE
Last_Name = 'smith' AND
First_Name = 'Jake'

One reader noted the order of the search criteria in the WHERE clause and wondered if it has any impact on the applicability of a composite index. In other words, would the following query still make use of the composite index?

SELECT         
*
FROM
Customers
WHERE
First_Name = 'Jake' AND
Last_Name = 'smith'

Great observation! To answer that question, let's once again look at the query execution plan for the first query.

WhereOrder1-2008-02-22

From this execution plan, we can see that the query optimizer found the composite index named ix_Customer_Name to be useful in resolving the query. This is certainly to be expected since the order of search criteria in the WHERE clause exactly matched the order of columns in the composite index.

In the second query, however, the order of columns in the search criteria in the WHERE clause is reversed as compared to the composite index. Let's look at the execution plan so see if that makes any difference to SQL Server.

WhereOrder2-2008-02-22

This query produced the exact same execution plan as the prior query. Why?

Generally speaking, the order of criteria in the WHERE clause is evaluated and optimized by the query optimizer prior to creating an execution plan. This is good. However, I would encourage you to review the query execution plan for each query prior to putting it into production. You may have some unique expressions or circumstances in your scenario that may be the exception to this rule.

Thanks, Joe! As always you can provide feedback at my blog.

Web Links

Some of the more complicated features in SQL Server need a bit more explanation. This week the web link shows you how to architect service broker applications. You can find it at msdn2.microsoft.com/en-us/library/aa964144.aspx. As always, I’ll include this link in the show notes on my blog.

Tip of the Week

This week’s tip deals with replication. As part of your disaster recovery, you should always keep a script of your latest replication objects handy.  Here’s how. Using SQL Server Management Studio, Connect to the Distributor, Publisher, or Subscriber, and then expand the server node.

Next, Right-click the Replication folder, and then click Generate Scripts. From there, specify any options you want in the Generate SQL Script dialog box, and then click Script to File. Enter a file name in the Script File Location dialog box, and then click Save.

Links:

 Joe's Blog: http://weblogs.sqlteam.com/joew/