The SQL Server optimizer has many algorithms and operators to put together reusable execution plans for queries that you run against a system. One is the ability to perform index intersection and/or union, which basically means that it can use multiple indexes for a query against one table and perform intersect or union operations to return a result set. Being able to use multiple indexes against a table allows you to create a number of narrow indexes that the optimizer can put together in different ways for different query plans. This in turn minimizes the need for a multitude of specialized indexes, which minimizes index storage space and modification impact.

The following diagram shows an example of index intersection. This is a screenshot from a query against the AdventureWorks sample database. Note that there are two indexes on the Person.Contact table that are being used to return the result set: one is a single column index on the LastName column, and the other is a single column index on the FirstName column. This allows the optimizer to select the appropriate plan from the building blocks that you provide, without really needing to know what is in the query.

image

Indexes Are Not Bad

I often get push back from some DBAs when I recommend creating nonclustered indexes on all foreign keys and common predicate columns as a best practice. By default, a foreign key will be used in a join at some time, and a query with a join will generally need a supporting index. This is true of predicate columns as well. As they are often part of a search expression, you generally want an index to facilitate seek operations. The fear is the impact on inserts. This is a valid concern, but the impact is small. Except in specialized, insert-intensive applications, the impact doesn't remotely compare with the performance and scalability problems for queries that are not optimized, let alone the amount of time and effort that is needed to individually optimize all queries.

Joins Are Not Bad

This reminds me of the story of a woman who always cut off the ends of a ham before cooking it. When someone asks her why, she responded that her mother always did that. When she asks her mother why, the mother says that her mother always did that. When they ask the grandmother why, the grandmother replies that she did that because her oven was too small.

SQL Server is a relational database, and the relation model is based on set theory. Set theory is all about joins. So, why would anyone assume that joins are by default "bad"? Well, because of legacy issues and incorrect models.

In previous versions, there were not a lot of algorithms and operators to build good execution plans. A great example of this is with SQL Server 6.0 and associate tables. When you had a many-to-many relationship in SQL Server, you modeled that with an intermediate associate table. You had only one index per table that could be used. There was no way for one index to support queries against an associate table, so you tried to minimize those joins. There was also (and still is, to some degree) a degradation after a certain number of joins. (That number is getting better over time, but not enough to incorporate into your model and query design.) Generally, you run into I/O issues that affect performance before you would worry about the number of joins.

Another issue is the problem of nonoptimal execution plans because of incorrect data modeling. Additional joins seem to exacerbate this issue. But, they are a symptom, not the problem. Only solving the original model problem and giving the optimizer good relational model information through correct PK and FK constraints actually resolves the issue. The white paper "Best Practices for Semantic Data Modeling for Performance and Scalability" addresses some of these issues and offers some best practices for modeling for performance and scalability.

Summary

  • Let the optimizer do its job by giving it the building blocks. Your job should be to deal only with the anomalies.
  • Avoid covering indexes except as an outlier solution until you can find a more enterprise solution.
  • Don't design your application or write your query specifically to avoid a join. You may find that you have to rework some queries, but only if your performance testing shows that to be the case.

Sharon Bjeletich is a database architect and performance and scalability expert who has worked with SQL Server since version 4.2. She is a former senior program manager in the Microsoft SQL Server product group and a former technical officer for the World Health Organization. Currently, she is an independent consultant at SQL Scalability.