10/05/2014 - Latest update on the AdaptiveIndexDefrag procedure v1.5.8. Look for it in the "Ezequiel shortcuts" section on the right or just click HERE.

10/05/2014 - Updates to Maintenance plan deployment scripts available HERE

On my previous post, we covered how the Query Optimizer handles row estimation when using Table variables under specific conditions.

Following up on that, I will demonstrate other scenarios where the Query Optimizer must try to optimize queries when no statistics and histograms are available.

In the 2nd example of the series I will use a larger table in the AdventureWorks2012 database (Sales.SalesOrderDetail), and load it into a table variable. Note that I could just as easily have used the table in the previous post, which would yield the exact same practical result:

INSERT INTO @tblSalesOrderDetail SELECT [SalesOrderID] ,[CarrierTrackingNumber] ,[OrderQty] ,[ProductID] ,[SpecialOfferID] ,[UnitPrice] ,[UnitPriceDiscount] FROM [AdventureWorks2012].[Sales].[SalesOrderDetail];

Then select from that table variable as follows:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [CarrierTrackingNumber] LIKE'5A1A-4E3D-B2'

OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

The Estimated Rowcount is 12131,7 while the Actual Rowcount is 21.

What does that mean in this context? The table variable has a 121317 cardinality, so 12131.7 rows are exactly 10 percent of that (12131.7 * 100 / 121317 = 10).

Notice that while I’m using a LIKE operator, I have no wildcards in the example. In this example, the LIKE is treated as an equality operator, and the cardinality of the filter is equal to the cardinality of its left child multiplied by the probability of the comparison being true, which is always 10 percent. If wildcards are used, then some more complex calculations are used.

Conversely, when using the same query but with NOT LIKE, we get an Estimated Rowcount of 109185.3 while the Actual Rowcount is 60898. The result is ignoring NULL values.

In this context, 109185,3 rows are exactly 90 percent of the table variable cardinality (109185.3 * 100 / 121317 = 90). As NOT is introduced here to negate the predicate, so to the initial probability is negated, hence 90 percent.

Still with the LIKE scenario, take the following query:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [ProductID] LIKE''

OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

While the Actual Rowcount is 0 rows of the table, the Estimated Rowcount is still 12131,7, equivalent to 10 percent of the table variable cardinality.

Moving on to the 3rd example, with the BETWEEN operator, take the following query:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [ProductID] BETWEEN 700 AND 800

OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

The above example has a single BETWEEN operator, which is translated into the expression “OBJECT:(@tblSalesOrderDetail), WHERE:([ProductID]>=(700) AND [ProductID]<=(800))”. The Estimated Rowcount is 10918.53 while the Actual Rowcount is 47560.

” In this context with a single BETWEEN operator, 10918.53 rows are exactly 9 percent of the table variable cardinality (10918.53 * 100 / 121317 = 9).

However, the negation behaves differently. Take the following query:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblSalesOrderDetail WHERE [ProductID] NOT BETWEEN 700 AND 800

OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

The NOT BETWEEN is translated into the expression “OBJECT:(@tblSalesOrderDetail), WHERE:([ProductID]<(700) OR [ProductID]>(800))”, and the Estimated Rowcount is 61871.67 while the Actual Rowcount is 73757.

The negation of the previous query type estimates exactly 51 percent of the table variable cardinality (61871,67 * 100 / 121317 = 51).

Another interesting estimation scenario shown in my 4th example involves BIT type search arguments. For this I will use the HumanResources.Employee table again, and load it into a table variable:

INSERT INTO @tblEmployee SELECT [NationalIDNumber] ,[LoginID] ,[OrganizationNode] ,[JobTitle] ,[BirthDate] ,[MaritalStatus] ,[Gender] ,[HireDate] ,[SalariedFlag] ,[VacationHours] ,[SickLeaveHours] FROM [AdventureWorks2012].[HumanResources].[Employee];

Then, search on the SalariedFlag column as follows:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblEmployee WHERE [SalariedFlag]=1 OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

For the above query, the Estimated Rowcount is 145 while the Actual Rowcount is 52:

Why was the Estimated Rowcount 145 rows? As we established in the previous post, this table’s cardinality is 290, so 145 rows are exactly 50 percent of that (145 * 100 / 290 = 50).

This is because by design, when a BIT column is searched on (remember that SalariedFlag does not allow NULLs), it can only have 2 known values (true or false). Therefore, because there are no histograms to support an accurate estimation, the Query Optimizer estimates 50% of rows for either value.

To further demonstrate this point, if we change the search argument in the same query to:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblEmployee WHERE [SalariedFlag]=0 OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

The Estimated Rowcount is still 145 while the Actual Rowcount is 238.

Notice I explicitly stated that the SalariedFlag does not allow NULLs. What if it did?

Let’s recreate the table variable allowing NULLs in the very same column, and load it into a table variable:

INSERT INTO @tblEmployee SELECT [NationalIDNumber] ,[LoginID] ,[OrganizationNode] ,[JobTitle] ,[BirthDate] ,[MaritalStatus] ,[Gender] ,[HireDate] ,[SalariedFlag] ,[VacationHours] ,[SickLeaveHours] FROM [AdventureWorks2012].[HumanResources].[Employee];

Then, search on the SalariedFlag column just as we did before:

SETSTATISTICS PROFILE ON;

SELECT * FROM @tblEmployee WHERE [SalariedFlag]=1 OPTION(RECOMPILE)

SETSTATISTICS PROFILE OFF;

For the above query, the Estimated Rowcount is 95.7 while the Actual Rowcount is of course 52.

For a table cardinality of 290, 95.7 rows are exactly 33 percent of that (95.7* 100 / 290 = 33).

Keeping in mind that SalariedFlag now allows NULLs, it can have 3 known values (true, false or NULL). Therefore, and in a scenario where there are no histograms, the Query Optimizer estimates 33% of rows for each possible value. Note that searching on NULL values in the same query would yield 0 as the Actual Rowcount, although the Estimated Rowcount is still 33%.

Disclaimer: I hope that the information on these pages is valuable to you. Your use of the information contained in these pages, however, is at your sole risk. All information on these pages is provided "as -is", without any warranty, whether express or implied, of its accuracy, completeness, fitness for a particular purpose, title or non-infringement, and none of the third-party products or information mentioned in the work are authored, recommended, supported or guaranteed by Ezequiel. Further, Ezequiel shall not be liable for any damages you may sustain by using this information, whether direct, indirect, special, incidental or consequential, even if it has been advised of the possibility of such damages.