Conor vs. FOREIGN KEY join elimination

I received a question from a friend in Brazil related to foreign keys and join elimination in SQL Server.  Yes, SQL Server can detect that some joins are not needed to return results to the user and skip them.  What is this evil magic, you ask?  Well, let’s walk through it before we get out our torches and pitchforks…

 

What’s a Foreign Key?  To most users, it is something that throws errors and blocks you from inserting data where you want it.  However, to SQL Server, it is a glorious thing!  It tells the Query Optimizer that every row in one table has a matching row in another table.  (So, if I have a Customers table and an Orders table, a Foreign Key from Orders to Customers tells me that every Order has a Customer.)  Now, when you query the join of Orders and Customers, you might only really need columns from the Orders table.  Then we’re in business: the Optimizer can detect this, see that the result of the query is the same without the join, and skip all of the work of doing random IO lookups into the Primary Key index on Customers just to confirm that the matching rows are there.
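To make the Customers/Orders case concrete, here is a minimal sketch.  The table, column, and constraint names are made up for illustration, and the expected plan is something to verify on your own server rather than a guarantee.

```sql
-- Hypothetical schema for illustration only (names are assumptions).
CREATE TABLE dbo.Customers
(
    CustomerID   int           NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID    int      NOT NULL PRIMARY KEY,
    CustomerID int      NOT NULL,
    OrderDate  datetime NOT NULL,
    -- Single-column foreign key: every order must point at an existing customer.
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID)
);
GO

-- Only Orders columns are selected, so the join to Customers adds no
-- information: the foreign key already says the matching row exists.
SELECT o.OrderID, o.OrderDate
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID;
```

If elimination applies, the plan for the SELECT should touch only Orders.  As soon as the query asks for a Customers column such as CustomerName, the join has to stay.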

 

So, you should think of Foreign Keys as a great way to give the Optimizer hints to speed up your query, in most cases.  The only case where I recommend that customers NOT define Foreign Keys is very large data warehouses (which I will save for another blog post).
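One practical caveat: the Optimizer only relies on a Foreign Key it can trust, meaning one that is enabled and was validated against the existing data.  A quick way to check is the sys.foreign_keys catalog view.  The table and constraint names in the commented-out ALTER TABLE are placeholders, not objects from this post.

```sql
-- List foreign keys the optimizer cannot rely on in the current database.
SELECT OBJECT_NAME(fk.parent_object_id) AS referencing_table,
       fk.name                          AS constraint_name,
       fk.is_disabled,
       fk.is_not_trusted
FROM sys.foreign_keys AS fk
WHERE fk.is_disabled = 1
   OR fk.is_not_trusted = 1;

-- Re-validate existing rows so the constraint becomes trusted again.
-- dbo.SomeChildTable and FK_SomeConstraint are hypothetical names.
-- ALTER TABLE dbo.SomeChildTable WITH CHECK CHECK CONSTRAINT FK_SomeConstraint;
```

A constraint typically ends up untrusted after it was disabled for a bulk load and re-enabled without WITH CHECK.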

 

So, the question from my Brazilian friend is why we DO the elimination for single-column Foreign Keys and not for multi-column cases.

 

(I will post the mail I got from him here – so, his example + comments)

    IF OBJECT_ID('Tab2') IS NOT NULL
    BEGIN
      DROP TABLE Tab2
    END
    IF OBJECT_ID('Tab1') IS NOT NULL
    BEGIN
      DROP TABLE Tab1
    END
    GO
    CREATE TABLE Tab1 (Tab1_Col1 Integer NOT NULL PRIMARY KEY, Tab1_Col2 Char(200))
    CREATE TABLE Tab2 (Tab2_Col1 Integer NOT NULL PRIMARY KEY, Tab1_Col1 Integer NOT NULL, Tab2_Col2 Char(200))
    ALTER TABLE Tab2 ADD CONSTRAINT fk FOREIGN KEY (Tab1_Col1) REFERENCES Tab1(Tab1_Col1)
    GO

    -- Fine, the execution plan does not use Tab1
    SELECT Tab2.*
      FROM Tab2
     INNER JOIN Tab1
        ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1
    --|--Clustered Index Scan(OBJECT:([dbo].[Tab2].[PK__Tab2__993AF6027FB886E3]))

    GO

    IF OBJECT_ID('Tab2') IS NOT NULL
    BEGIN
      DROP TABLE Tab2
    END
    IF OBJECT_ID('Tab1') IS NOT NULL
    BEGIN
      DROP TABLE Tab1
    END
    GO
    CREATE TABLE Tab1 (Tab1_Col1 Integer NOT NULL, Tab1_Col2 Integer NOT NULL, Tab1_Col3 Char(200), PRIMARY KEY(Tab1_Col1, Tab1_Col2))
    CREATE TABLE Tab2 (Tab2_Col1 Integer NOT NULL PRIMARY KEY, Tab1_Col1 Integer NOT NULL, Tab1_Col2 Integer NOT NULL, Tab2_Col2 Char(200))
    ALTER TABLE Tab2 ADD CONSTRAINT fk FOREIGN KEY (Tab1_Col1, Tab1_Col2) REFERENCES Tab1(Tab1_Col1, Tab1_Col2)
    CREATE INDEX ix ON Tab2(Tab1_Col1, Tab1_Col2)
    GO

    -- Why not the same behavior as above? Just because of the multi-column foreign key?
    SELECT Tab2.*
      FROM Tab2
     INNER JOIN Tab1
        ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1
       AND Tab1.Tab1_Col2 = Tab2.Tab1_Col2
    --|--Nested Loops(Inner Join, OUTER REFERENCES:([dbo].[Tab2].[Tab1_Col1], [dbo].[Tab2].[Tab1_Col2]))
    --     |--Clustered Index Scan(OBJECT:([dbo].[Tab2].[PK__Tab2__993AF602084DCCE4]))
    --     |--Clustered Index Seek(OBJECT:([dbo].[Tab1].[PK__Tab1__6D1AC6A2047D3C00]), SEEK:([dbo].[Tab1].[Tab1_Col1]=[dbo].[Tab2].[Tab1_Col1] AND [dbo].[Tab1].[Tab1_Col2]=[dbo].[Tab2].[Tab1_Col2]) ORDERED FORWARD)

    -- Why not apply the predicate to Tab1.Tab1_Col2 too? We have the trusted foreign key, so we should be able to do that.
    -- If we did, we could use the PK of Tab1 and the index ix of Tab2, filtering by Tab1_Col1 (JOIN) and Tab1_Col2 (WHERE)
    SELECT Tab2.*
    FROM Tab2
    INNER JOIN Tab1
      ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1
    WHERE Tab2.Tab1_Col2 = 10
    --|--Nested Loops(Inner Join, OUTER REFERENCES:([dbo].[Tab2].[Tab1_Col1]))
    --     |--Clustered Index Scan(OBJECT:([dbo].[Tab2].[PK__Tab2__993AF602084DCCE4]), WHERE:([dbo].[Tab2].[Tab1_Col2]=(10)))
    --     |--Clustered Index Seek(OBJECT:([dbo].[Tab1].[PK__Tab1__6D1AC6A2047D3C00]), SEEK:([dbo].[Tab1].[Tab1_Col1]=[dbo].[Tab2].[Tab1_Col1]) ORDERED FORWARD)
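If you want to reproduce plan text like the commented fragments in the mail, one option is SET SHOWPLAN_TEXT.  This is an assumption about how the plans were captured (the mail does not say), and the generated primary key names will differ on your machine.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON;
GO

-- The query is compiled but not executed; the plan comes back as text rows.
SELECT Tab2.*
  FROM Tab2
 INNER JOIN Tab1
    ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1;
GO

SET SHOWPLAN_TEXT OFF;
GO
```

With the single-column schema, the output should show only the scan of Tab2.  With the two-column schema and a join on both columns, expect the Nested Loops join and the seek into Tab1.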

 

Before I answer, let me just say that I love getting questions like this – it shows that people really do like to deeply understand the engine and use what we put in it to its fullest. 

 

So, the answer is that we did the single-column case many years ago since most FKs are single-column.  We haven’t seen enough cases where the multi-column FK is used to make it more important than all of the other things that we’ve been requested to add to the product.  I wish the answer had more intrigue, but it’s just a case where we have lots and lots of things to code and we haven’t had time to do this one yet.  The technical side of me obviously wishes we had all of these things coded into the product.  The other way to look at it is that people are upset if query compilation takes too long, so we have to find a balance between what is mathematically possible and what is useful to the broadest set of customers.

If we do find that this kind of construct becomes more common, we would likely add this support in a future release of the product.
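In the meantime, the join can be removed by hand when you know it is redundant.  The sketch below uses the two-column Tab1/Tab2 schema from the mail and assumes the foreign key is trusted and both referencing columns are NOT NULL (as they are in that script).  Under those assumptions every Tab2 row matches exactly one Tab1 row, so the two queries return the same rows.

```sql
-- Join on both foreign key columns, as written in the mail.
SELECT Tab2.*
  FROM Tab2
 INNER JOIN Tab1
    ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1
   AND Tab1.Tab1_Col2 = Tab2.Tab1_Col2;

-- Hand-written equivalent: the optimizer does not do this rewrite for a
-- multi-column foreign key, so the query author does it instead.
SELECT Tab2.*
  FROM Tab2;
```

The manual rewrite is only safe while the constraint stays in place and trusted; if someone drops or disables it, the two queries can diverge.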

 

For the last example, predicate duplication across multi-column joins is another one of these cases: it is possible to do, but it does not show up as the dominant factor in queries often enough for us to enable it by default.
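You can also duplicate a predicate by hand.  The sketch below is a variation on the mail's last query, not the query itself: it joins on both key columns, which is what makes the extra predicate on Tab1 a logical consequence of the one on Tab2.

```sql
SELECT Tab2.*
  FROM Tab2
 INNER JOIN Tab1
    ON Tab1.Tab1_Col1 = Tab2.Tab1_Col1
   AND Tab1.Tab1_Col2 = Tab2.Tab1_Col2
 WHERE Tab2.Tab1_Col2 = 10
   -- Implied by the join condition plus the predicate above; stating it
   -- explicitly gives the optimizer a filter it can apply to Tab1 directly.
   AND Tab1.Tab1_Col2 = 10;
```

With a join on Tab1_Col1 alone, the second predicate is not implied by the first, so adding it there could change the result.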

 

One last note – we also do not do FK join elimination in tempdb, even if you have a FK defined.
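A small way to see this for yourself is to build a single-column case inside tempdb and look at the plan.  The object names below are hypothetical, and the expectation that the join stays in the plan comes from the note above, so treat it as something to confirm rather than a documented guarantee.

```sql
USE tempdb;
GO

-- Hypothetical demo tables created as regular tables inside tempdb.
CREATE TABLE dbo.FkDemoParent
(
    ParentID int NOT NULL PRIMARY KEY
);

CREATE TABLE dbo.FkDemoChild
(
    ChildID  int NOT NULL PRIMARY KEY,
    ParentID int NOT NULL,
    CONSTRAINT FK_FkDemoChild_FkDemoParent
        FOREIGN KEY (ParentID) REFERENCES dbo.FkDemoParent (ParentID)
);
GO

-- Same shape as the single-column query that is eliminated in a user
-- database; in tempdb, expect the plan to still access FkDemoParent.
SELECT c.ChildID
  FROM dbo.FkDemoChild AS c
 INNER JOIN dbo.FkDemoParent AS p
    ON p.ParentID = c.ParentID;
GO

-- Clean up the demo objects.
DROP TABLE dbo.FkDemoChild;
DROP TABLE dbo.FkDemoParent;
GO
```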

Happy Querying!

Conor Cunningham

Comments
  • "Before I answer, let me just say that I love getting questions like this – it shows that people really do like to deeply understand the engine and use what we put in it to its fullest."

    And that's why we love you so much. :)

  • Suppose I have a view for DimShippingDate that is based on DimDate.  In my fact table I have a foreign key on ShippingDateKey that references DateKey in DimDate.  If I do not use ShippingDate in my query, the join to DimDate is still not eliminated.  This seems to be the case for all my virtual dimensions.  Is there something else I need to do for the optimizer to know the join can be eliminated?
