Tips, Tricks, and Advice from the SQL Server Query Optimization Team

  • Why should I create an index?

    There are many cases where the database administrator does not control the queries being submitted of the system.  As a result, the physical database design is often not tuned as well as it could be.  In a number of actual customer cases where we investigated performance issues with them, we’ve found that this can often be the an effective way to improve performance.

     

    Some reasons indexes are helpful in a deployed system:

    1. To pre-materialize a sort over the data
    2. To enable singleton lookups (seeks) for particular values
    3. To reduce locking requirements caused by scans
    4. To reduce memory requirements caused by hash joins

     

    The first two are more generally understood benefits.  You can avoid a sort in your plan with (1) and you can avoid scans with (2).  As long as the update cost to maintain these indexes is acceptable, then you can improve select query performance with an index.

     

    The third issue, locking, is a problem in many applications about which there isn’t really a lot written.  Some applications are “bound” on how many locks they take.  One example could be a billing application with a number of small queries that access individual (or a small number) of rows but have many, many such queries concurrently.  If the query plans chosen are all seeks, then shared or exclusive locks are likely only taken on the rows being specifically accessed/changed in the query results.  However, if one plan changes to be a scan instead of a seek, the number of locks will increase dramatically.  These locks may also be held for longer (depending on the isolation level), since the query may be running longer as well.  An index can improve the performance of such an application because it improves overall query throughput for many concurrent users.

     

    Memory contention is another throughput-related issue that can cause problems when an application runs with many concurrent connections.  In SQL Server, queries reserve memory for the duration of their execution.  This prevents some queries from failing on a memory allocation after they have completed only a portion of the query.  If the system does not have any more memory to give, not-yet-started queries will be blocked and wait for previous queries to complete (and release their memory reservations).  In some applications, the total system throughput is limited by the available memory necessary to enable operations like hash joins. 

     

    Memory contention is most likely a problem for scaled applications running on 32-bit hardware that use a lot of memory at peak load.  While Microsoft Windows does have a mechanism called AWE that enables a process to use more than the normal maximum memory (usually 2 GB), that mechanism is only really useful to store pages from the database in memory.  Internal memory consumers, such as the Query Optimizer and Query Execution, are limited to the memory available in the virtual address space (usually 2 GB minus whatever is used for the database page buffer pool and other internal caches/memory consumers).  This can constrain systems that have few indexes but many queries requiring joins without backing indexes because a hash join is often a good choice for the Optimizer.  Additionally, the optimizer does not pick different plan shapes based on system load in SQL Server 2005.  So, creating an index can cause the system to pick loops joins or merge joins in cases where hash joins would be picked otherwise.  Creating an index or two on such a system can reduce memory contention and improve overall system throughput.

     

    SQL Server has long shipped a program to help with physical database design called the Index Tuning Wizard (and now the Database Tuning Advisor in SQL Server 2005).  This can help find a reasonably optimal index set for a set of queries.  It works by running the Optimizer with a series of “what if” scenarios and evaluating the cost of the plan the optimizer picked in each case.  It then reasons about the best set of indexes to match the workload.

     

    There is also a newer mechanism that you can use in SQL Server 2005 that compliments the existing Database Tuning Advisor Tool.  It does not require that you pre-identify a workload, and it is integrated into the engine and runs as part of the regular operation of the server (you do not need to run anything).  It does not do as much work as the DTA, but it can find the common problems that cause significant performance problems in deployed systems.  The development team used this to debug a number of customer applications and found that it did identify “better” indexes in a number of cases.  If you have to debug a live system where the physical database design is not known to be close to optimal, this may be a useful tool to help improve the system performance.

     

    The specific details of the DMVs are documented here:

    http://msdn2.microsoft.com/en-US/library/ms345434(SQL.90).aspx

    http://msdn2.microsoft.com/en-US/library/ms345407(SQL.90).aspx

    http://msdn2.microsoft.com/en-us/library/ms345421(SQL.90).aspx

     

    The following examples show an example about how they can be used to identify and improve the physical database design in a simple example:

     

    use tempdb

     

    drop table t

    create table t(col1 int, col2 int, col3 binary(500))

    declare @i int

    set @i = 0

    while @i < 10000

    begin

    insert into t(col1, col2) values (@i, rand()*1000)

    set @i = @i + 1

    end

     

    set showplan_text on

    select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

     

    StmtText                                                                                                                                                    

    ------------------------------------------------------------------------------------------------------------------------------------------------------------

      |--Hash Match(Inner Join, HASH:([t2].[col1])=([t1].[col1]), RESIDUAL:([t].[col1] as [t2].[col1]= [t].[col1] as [t1].[col1]))

           |--Table Scan(OBJECT:( [t] AS [t2]), WHERE:( [t].[col2] as [t2].[col2]=(500)))

           |--Table Scan(OBJECT:( [t] AS [t1]))

     

    set showplan_text off

     

    -- run the query

    select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

     

    -- look at our management views to see what recommendations exist for this query

    select * from sys.dm_db_missing_index_details

    select * from sys.dm_db_missing_index_groups

    select * from sys.dm_db_missing_index_group_stats

     

    (columns removed for space)

    index_handle database_id object_id   equality_columns      included_columns 

    ------------ ----------- ----------- ----------------------------------------

    11           2           741577680   [col2]                NULL             

    9            2           741577680   [col1]                [col2], [col3]    

     

    (2 row(s) affected)

     

    index_group_handle index_handle

    ------------------ ------------

    10                 9

    12                 11

     

    (2 row(s) affected)

     

    group_handle unique_compiles      user_seeks           user_scans           last_user_seek                                         last_user_scan                                         avg_total_user_cost                                   avg_user_impact                                       system_seeks         system_scans         last_system_seek                                       last_system_scan                                       avg_total_system_cost                                 avg_system_impact                                    

    ------------ -------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- ----------------------------------------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- -----------------------------------------------------

    10           1                    1                    0                    2006-04-06 11:14:55.130                                NULL                                                   1.2693552627256595                                    50.409999999999997                                    0                    0                    NULL                                                   NULL                                                   0.0                                                   0.0

    12           1                    1                    0                    2006-04-06 11:14:55.130                                NULL                                                   1.2693552627256595                                    46.479999999999997                                    0                    0                    NULL                                                   NULL                                                   0.0                                                   0.0

     

    (2 row(s) affected)

     

     

    create index i1 on t(col2)

     

    -- let's see if the plan changed

    set showplan_text on

    select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

     

     

    -- now we have an index seek, followed by a bookmark lookup, then followed by a hash join

    StmtText                                                                                                                                                    

    ------------------------------------------------------------------------------------------------------------------------------------------------------------

      |--Hash Match(Inner Join, HASH:([t2].[col1])=([t1].[col1]), RESIDUAL:([t].[col1] as [t2].[col1]= [t].[col1] as [t1].[col1]))

           |--Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1003]))

           |    |--Index Seek(OBJECT:([t].[i1] AS [t2]), SEEK:([t2].[col2]=(500)) ORDERED FORWARD)

           |    |--RID Lookup(OBJECT:([t] AS [t2]), SEEK:([Bmk1003]=[Bmk1003]) LOOKUP ORDERED FORWARD)

           |--Table Scan(OBJECT:([tempdb].[dbo].[t] AS [t1]))

     

    set showplan_text off

     

    -- run the query again (it should be a bit faster now)

    select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

     

    -- let's look at our management views again

     

    select * from sys.dm_db_missing_index_details

    select * from sys.dm_db_missing_index_groups

    select * from sys.dm_db_missing_index_group_stats

     

    (columns removed for space)

    index_handle database_id object_id   equality_columns included_columns 

    ------------ ----------- ----------- -----------------------------------

    13           2           741577680   [col1]           [col2], [col3]  

     

    (1 row(s) affected)

     

    index_group_handle index_handle

    ------------------ ------------

    14                 13

     

    (1 row(s) affected)

     

    group_handle unique_compiles      user_seeks           user_scans           last_user_seek                                         last_user_scan                                         avg_total_user_cost                                   avg_user_impact                                       system_seeks         system_scans         last_system_seek                                       last_system_scan                                       avg_total_system_cost                                 avg_system_impact                                    

    ------------ -------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- ----------------------------------------------------- -------------------- -------------------- ------------------------------------------------------ ------------------------------------------------------ ----------------------------------------------------- -----------------------------------------------------

    14           1                    1                    0                    2006-04-06 11:17:01.617                                NULL                                                   0.70575013268039988                                   90.670000000000002                                    0                    0                    NULL                                                   NULL                                                   0.0                                                   0.0

     

    create clustered index i2 on t(col1)

     

    -- let's see if the plan changed

    set showplan_text on

    select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

    StmtText                                                                                                                                                                                     

    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

      |--Nested Loops(Inner Join, OUTER REFERENCES:([t2].[col1]))

           |--Nested Loops(Inner Join, OUTER REFERENCES:([Uniq1005], [t2].[col1]))

           |    |--Index Seek(OBJECT:([t].[i1] AS [t2]), SEEK:([t2].[col2]=(500)) ORDERED FORWARD)

           |    |--Clustered Index Seek(OBJECT:([t].[i2] AS [t2]), SEEK:([t2].[col1]= [t].[col1] as [t2].[col1] AND [Uniq1005]=[Uniq1005]) LOOKUP ORDERED FORWARD)

           |--Clustered Index Seek(OBJECT:([t].[i2] AS [t1]), SEEK:([t1].[col1]=[t].[col1] as [t2].[col1]) ORDERED FORWARD)

     

    set showplan_text off

     

    -- now run the query again (it should be even faster)

    select * from t as t1 inner join t as t2 on t1.col1=t2.col1 where t2.col2 = 500

     

    I hope that this gives you enough information to experiment with this on your own applications.

     

    Thanks,

     

    Conor

  • I Smell a Parameter!

    Parameters are a useful way to improve overall system performance when there are many common queries with the same shape.  Instead of compiling each query, one plan can be used for all similar queries.  This can significantly reduce CPU overhead and improve throughput.  As long as the queries would have really returned the same plan, this is a big performance winner.  SQL Server internally tries to automatically turn simple non-parameterized user queries into parameterized queries to take advantage of this performance gain.

     

    Parameter use, especially in more complex scenarios, can also cause performance issues.  If the queries are complex and/or the data distribution on the columns against which the parameter is compared vary, the cost of different plan choices can change.  A plan that is optimal for one parameter value may perform poorly for another value.  The query optimizer still needs to estimate the selectivity and cardinality of predicates using a parameter value.  This section describes how this process works in more detail.

     

    The default implementation would be to use the average number of duplicates per value (average frequency) to guess how many rows that the parameter would match.  For an equality predicate, that’s exactly what will happen in the base case.  However, the optimizer tries to find out any specific value available at the time that the query is compiled and will use that value to estimate cardinality instead.  This process of using the parameter value to estimate selectivity and cardinality is called “parameter sniffing”.

     

    There are some non-obvious implementation details that can get in the way of giving the optimizer the chance to use that value.  One common issue is that SQL Server’s procedural language, T-SQL, is interpreted.  Additionally, the compilation model for batches of commands will compile all commands and then execute them.  If you set a parameter value and then run a query that uses that parameter within the same batch, the value isn’t available to the optimizer (and therefore it can’t be sniffed).

     

    Here’s some SQL examples to demonstrate the various behaviors in SQL 2005:

     

    use tempdb

     

    -- create a table with 2000 rows.  1000 of them have the values 1 to 1000 each once (no

    -- duplicates).  Then we have 1000 rows with the value 5000.

    drop table t

    create table t(col1 int)

    declare @i int

    set @i = 0

    while @i < 1000

    begin

    insert into t(col1) values (@i)

    set @i = @i + 1

    end

    set @i = 0

    while @i < 1000

    begin

    insert into t(col1) values (5000)

    set @i = @i + 1

    end

     

    -- Let's create some fullscan statistics on the column in our table

    create statistics t_col1 on t(col1) with fullscan

     

    -- note that the selectivity for the table in the statistics is 1/1001 = 9.9900097E-4;  There are 1001 distinct values in the table

    dbcc show_statistics ('dbo.t','t_col1')

     

    -- compile with no value to sniff.  We should use the "density" for the whole table to make our estimate

    -- which means that we'll take 2000 rows * 1/1001 = about 2 rows returned

    dbcc freeproccache

    declare @p int

    select * from t where col1 = @p

    -- (look at the output plan to see the estimate)

     

    -- same scenario but set a value.  The value 5000 has 1000 instances, but we estimate 2 rows. 

    -- Why? Well, we compile the batch before we execute it, so the optimizer in 2005 does not see

    -- the parameter value and we treat this the same as the previous case because it hasn’t been

    -- actually set when the query is compiled

    dbcc freeproccache

    declare @p int

    set @p = 5000

    select * from t where col1 = @p

     

    -- Let's use the option recompile as a workaround.

    -- The first optimization has the same problem as before - estimates 2 rows

    dbcc freeproccache

    declare @p int

    set @p = 1

    select * from t where col1 = @p

    option(recompile)

     

    -- now look at the compiled plan for this case - we've recompiled and correctly estimate 1 row

    select * from t where col1 = @p

    option(recompile)

     

    -- Another (better) workaround is to use the new optimize for hint - it avoids the recompile

    -- and we estimate 1 row

    dbcc freeproccache

    declare @p int

    set @p = 1

    select * from t where col1 = @p

    option (optimize for (@p = 1))

     

    -- last workaround - use a stored procedure.  This will create a new context in the

    -- server and lets the optimizer "see" the parameter and sniff it during compilation.

    create procedure foo (@p int)

    as

    return select * from t where col1 = @p

     

    -- compile and examine the plan for this - estimates 1 row instead of 2

    dbcc freeproccache

    execute foo @p=1

     

    -- how does one see if the plan used a sniffed parameter?  If sniffed, the information

    -- is visible in showplan_xml

    set showplan_xml on

    execute foo @p=1

     

    -- look for a node like this:

    - <ParameterList>

      <ColumnReference Column="@p" ParameterCompiledValue="(1)" />

      </ParameterList>

     

     

     

    The examples here demonstrate how the cardinality estimates change based on the patterns you use to invoke your queries.  In the examples, we expect to see an estimate of 1 row for each time we sniff a parameter with a value between 1-1000 since there is exactly one of each row.  If we sniff nothing, we’d expect to use the average frequency (which is 2 rows for this example).  If you pick the outlier (in this case we have 1000 rows with the value 5000), then you should see a higher estimate.

     

    For the examples I’ve given here, I purposely picked a very simple query plan so that you could easily see the differences in the cardinality estimates without having to explain any other oddities in the query plan.  While there is no plan choice impact for these examples, cardinality estimation errors such as the ones introduced here do cause changes in more complex queries. 

     

    The main points that you need to remember are:

    1. If you don’t see any problems due to parameters in your application, you don’t need to do anything.  Just keep this information in the back of your head, just in case.
    2. If you find that the optimizer is picking different plans over time that have varying performance characteristics, consider using a parameter hint with a representative “average” value to get a good, common query plan that will work reasonably for all values.
    3. If you are running really complex queries and need the plan choice to be exact in each case, you can consider using the “recompile” hint – just know that it will compile each time it runs, so this is likely more appropriate for longer-running queries to justify that compilation cost.
    4. Moving a query into a stored procedure can put it into a separate procedural context and can be a good way to get that value visible to the optimizer (Note: this works in SQL 2000 as well)
  • Row Goals in Action

    Today, we'll talk about row goals.  The optimizer in SQL Server has a feature that can bias plan choices to retrieve a certain number of rows quickly instead of the whole results.  This shows up in a few places, but the primary areas are in TOP N queries and in subqueries that need to check for the existance of "a" row - WHERE EXISTS, for example.  The query hint "option (fast N)" also translates into the same feature. 

     

    One of the problems that can arise with row goals is that the query plan can change when you add them into a query request.  Especially with the option(fast N) hint, you can find cases where the first row may come back quickly but the whole results come back more slowly.  So, if you send option(fast N) but retrieve the whole results, your system won't perform as well.

     

    Effectively, we bias the optimizer to favor plans that can return a few rows quickly compared to the minimum cost to return all rows.  In practice, this often means that joins will choose nested loops joins for row-goal limited plans and hash joins otherwise.


    The following example demonstrates the issue:

     

    use tempdb

     

    create table A(col1 int, col2 binary(100), col3 int)

    declare @i int

    set @i = 0

    while @i < 5000

    begin

    insert into A(col1, col2, col3) values (@i, 0x3333, rand()*1000)

    set @i = @i + 1

    end

     

    create clustered index i1 on A(col1)

     

    set statistics time on

    -- should pick a series of hash joins

    select A1.* from

    A as A1 inner join A as A2 on A1.col1 = A2.col1

    inner join A as A3 on A1.col1 = A3.col1

     

    -- if there is a row goal, we’ll pick the loop join plan that returns one (or a few) row(s) quickly

    set statistics time on

    select A1.* from

    A as A1 inner join A as A2 on A1.col1 = A2.col1

    inner join A as A3 on A1.col1 = A3.col1

    option (fast 1)

     

    You can run these on your installation to see the time difference for retrieving all rows with the hash join plan vs. the loop join plan (just keep adding rows until you see the difference). 

     

    This pattern is caused by the row goal logic in the optimizer.  When we have a very low row goal, the nested loops join is preferred because its initial cost (the cost for the first row) is comparatively low – it just involves a single seek for this example.  The hash join plan has a higher initial cost because it has to build a hash table before any rows can be returned.  Once built, however, the hash join plan is generally cheaper and would be picked if the estimated number of rows gets large.
  • Migrating Cardinality Eestimation Posts from Previous Blog

    Ian had a blog that contains a number of interestig tips/tricks, mostly about cardinality estimation during query optimization.

    I've chatted with him and will be migrating/expanding some content over to the team site.

    If you have specific requests from topics in his Blog that would interest you, post them up and I'll find someone on the team to work on them.

    Ian's Blog: http://blogs.msdn.com/ianjo/default.aspx

    Thanks,

    Conor

     

  • TOP 100 Percent ORDER BY Considered Harmful.

    (Updated 2006-27-03 9:00am Pacfiic Time - at the bottom)

    SQL is a declarative language.  That means that the language declares the form of the output but not the method used to generate those results.  There are cases, however, where the language is not quite rich enough to describe what customers want.  There are also cases where it is possible to over-infer guarantees that are not really in the language at all.

    ORDER BY is an example that I'd like to discuss in this post.  In ANSI SQL, the language has the ability to specify the output order of a query (the "presentation order").   This applies to query results and to cursor results, if a cursor is being used.  This is exposed via ORDER BY, and this is really only legal in the ANSI spec on the outer-most query block of a query.  Microsoft SQL Server allows this in more places than the spec indicates (which is something we can do and still be in-line with the specification).  Specifically, we allow the use of ORDER BY in sub-selects or in view definitions to help define the set of rows that qualify with a TOP operation (TOP is not in ANSI SQL, by the way).

    The default plan implementation for this code happens to sort the rows as part of performing the TOP operation.  Often this meant that the results happened to be returned in sorted order, and this led customers to believe that there was a guarantee that rows were sorted.  This is actually not the case.  If you want rows to be returned to the user in sorted order, you need to use an ORDER BY on the outermost query block (per ANSI) to guarantee the output presentation order. 

    In SQL Server 2005, you can see how the output order is *not* guaranteed through the following example:

    use tempdb

    create table t1 (col1 int, col2 int)
    declare @i int
    set @i = 0
    while @i < 20
    begin
    insert into t1(col1, col2) values (@i, rand()*1000)
    set @i = @i + 1
    end

    create view v as (select top 100 percent * from t1 order by col1 desc)

    set showplan_text on
    select * from v

    The output from this example is:

    StmtText                                      
    ----------------------------------------------
      |--Table Scan(OBJECT:([tempdb].[dbo].[t1]))

    col1        col2       
    ----------- -----------
    0           443
    1           418
    2           291
    3           726
    4           948
    5           315
    6           835
    7           247
    8           755
    9           78
    10          88
    11          906
    12          640
    13          876
    14          422
    15          746
    16          528
    17          909
    18          186
    19          868

    You'll notice that the original table was created as a heap (no clustered index) and that no secondary indexes are defined.  So, if there were an ordering guarantee for the query based on the order by in the view, the rows should be sorted in descending order and the query plan would need to have a sort in it to make that happen.  However, you'll notice that the query plan contains only a Table Scan, which will return the rows back in the order they happen to be read from disk.  In this particular case, the optimizer recognizes that TOP 100 PERCENT qualifies all rows and does not need to be computed at all.  It gets removed from the query plan, and there is no other reason to do an intermediate sorting operation.  As such, the output isn't returned in any particular order.

    So, please do not assume that nested sub-selects will guarantee order.  The optimizer will consider rewrites that invalidates this assumption.  If you need rows returned in a particular order, please add that to the outermost block of your SELECT statement.

    Thanks,

    Conor Cunningham

    PS: Update... One comment I received seeks a bit more detail on why the ORDER BY is "ignored" in this case.  I'll try to expand a bit more to see if that helps. 

    From the semantics of the query, the optimizer only really honors the ORDER BY as part of the evaluation of the TOP in that same scope.  The syntax is a bit unfortunate because it causes people to believe that things "will be ordered".  However, it really only says "I want this set of rows".  Presentation orders only apply to the output of the query, not intermediate nodes.  Since we can reorder operations, you can't actually view this as a procedural guarantee "first I sort, then I do whatever is 'above' the sort".  You don't need an optimizer if that were the case, as you aren't asking declarative questions anymore. 

    The bottom line is that even if we do the sort as part of the TOP operation in a sub-select, it does NOT guarantee anything about the output order of the query. 

    SELECT TOP 99 PERCENT * FROM T ORDER BY col1

    is not the same as:

    SELECT * FROM (SELECT TOP 99 PERCENT * FROM T ORDER BY col1) AS A

    The top query guarantees the output order of the query.  The bottom query does not (even if the rows happen to come back in sorted order)

     

     

  • Using Computed Columns to Fix Scalar Expression Estimation Errors

    (Ok, let's try something a bit more involved now.  Here's a tip on how you can use computed columns to improve query plan quality in SQL Server 2005.  The optimizer supports building statistics on the results of expressions, and this can avoid debugging query plans later if you make sure that the system is doing this for complex expressions, user-defined expressions, or other places where the cardinality estimate isn't matching what is actually happening at runtime.  Enjoy! Conor Cunningham):

     

    Computed columns can be used to correct some cardinality estimation errors.  If the optimizer is incorrectly estimating a predicate (as examples, due to correlations or because the optimizer is guessing on the selectivity of a user-defined function), a computed column can give the optimizer a hook on which statistics can be stored.  Note that computed columns in SQL 2005 can be created without increasing the on-disk row width if the scalar expression meets specific requirements documented in Books Online (for example, it must be deterministic).

     

    The following example demonstrates how computed columns can be used to affect cardinality estimates.  A few notes for the example:

     

    1. The plan is very simple in this example.  The cardinality estimate error does not impact the plan choice.  Please understand that this problem could negatively impact join order/algorithm in larger queries.
    2. User-defined functions need to be created WITH SCHEMABINDING or else the optimizer does not trust them at all (they could be changed without causing the plan to recompile, which is “not good”).

     

    use tempdb

     

    -- create a table with 20000 rows in it

    create table t1 (col1 int, col2 int)

    declare @i int

    set @i = 0

    while @i < 20000

    begin

    insert into t1(col1, col2) values (@i, rand()*1000)

    set @i = @i + 1

    end

     

    -- drop function dbo.foo

    -- create a function.  The optimizer does not understand it, so

    -- it will cause guessing logic in plan generation.

    create function foo(@a int) returns int

    with schemabinding

    as

    BEGIN

    set @a = 5;

    return 155

    END

     

    -- clean out the plan cache

    dbcc freeproccache

     

    set statistics profile on

    -- we under-guess on the UDF expression (assumes about 8% selectivity, actual is 100%)

    select * from t1

    where dbo.foo(t1.col1) = 155

     

    (Abbreviated statistics profile output)

     

    Rows                 Executes             StmtText                                                                   

    -------------------- -------------------- -----------------------------------------------

    20000                1                    select * from t1   where dbo.foo(t1.col1) = 155                            

    20000                1                      |--Filter(WHERE:( foo(col1)=155))

    20000                1                           |--Table Scan(OBJECT:(t1))                         

     

    EstimateRows     

    ------------------

    1681.7928        

    1681.7928        

    20000.0          

     

     

    -- we over-guess on the UDF expression (assumes about 8% selectivity, actual is 0%)

    dbcc freeproccache

    select   * from t1

    where dbo.foo(t1.col1) = 154

     

    Rows                 Executes             StmtText                                                                  

    -------------------- -------------------- -----------------------------------------------

    0                    1                    select * from t1   where dbo.foo(t1.col1) = 154                         

    0                    1                      |--Filter(WHERE:(foo( col1)=154))

    20000                1                           |--Table Scan(OBJECT:(t1))                        

     

    (estimates match previous example)

     

    set statistics profile off

     

    -- add (non-persisted) computed column over expression.

    -- Note that the function has WITHSCHEMABINDING enabled (which causes the optimizer to trust the expression will not change over recompiles)

    -- Note2 - this could be persisted if the function were expensive.  Non-deterministic expressions are not persistable, though imprecise expressions are.

    alter table t1 add c3 as dbo.foo(col1)

     

    set statistics profile on

    -- exact same queries

    -- we correctly estimate 100% selectivity for this query

    dbcc freeproccache

    select * from t1

    where dbo.foo(t1.col1) = 155

     

    Rows                 Executes             StmtText                                                                                                         

    -------------------- -------------------- -----------------------------------------------

    20000                1                    select * from t1   where dbo.foo(t1.col1) = 155                                                                  

    0                    0                      |--Compute Scalar(DEFINE:( c3=c3))                                  

    20000                1                           |--Filter(WHERE:( c3=(155)))                                                          

    0                    0                                |--CompSca(DEFINE:( c3=foo(col1)))

    20000                1                                     |--Table Scan(t1))                                                     

     

    EstimateRows           

    ------------------------

    20000.0                

    20000.0                

    20000.0                

    20000.0                

    20000.0                

     

     

     

    -- we estimate 0% selectivity for this query (actually we have a floor to estimate at least one row)

    dbcc freeproccache

    select * from t1

    where dbo.foo(t1.col1) = 154

     

    Rows                 Executes             StmtText                                                                                                         

    -------------------- -------------------- -----------------------------------------------

    0                    1                    select * from t1   where dbo.foo(t1.col1) = 154                                                                  

    0                    0                      |--Compute Scalar(DEFINE:( c3=c3))                                  

    0                    1                           |--Filter(WHERE:( c3=(154)))                                                          

    0                    0                                |--CompSca(DEFINE:( c3=foo(col1)))

    20000                1                                     |--Table Scan(t1))                                                     

     

    EstimateRows           

    ------------------------

    1.0                

    1.0                

    1.0                

    20000.0                

    20000.0                

     

    set statistics profile off

  • Query Recompilation Details

    SQL Server contains self-tuning functionality that will recompile your query as the source data changes to find more efficient query plans for your current data.  This process of recompiling queries is a great, general-purpose feature to keep your system running well.  We have a white paper describing some best practices for managing query recompilation in your server that is a good primer for learning more about how your system is working "under the covers".  Enjoy!

    http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx

    Thanks,

    Conor Cunningham

  • Statistics in SQL Server 2005 White Paper

    Statistics is one of the most challenging areas of our product.  While we have automated a number of the common cases so that the average installation does not need to worry about the details of statistics, it is still important for an administrator to have resources in case they are working on a complex installation.

    One of our Program Managers, Eric Hanson, wrote a detailed document outlining how statistics work in the SQL Server Query Processor.  I've attached a link below. 

    http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx

  • Greetings!

    On behalf of the Query Optimization Team for Microsoft's SQL Server product, welcome to our humble virtual abode.  We decided that we'd start a blog to help people better understand query plans, physical schema design in databases, making your application perform better, and anything else related to query optimization.  Furthermore, we'll be posting tips and tricks that we learn from our work with customers so that others can benefit from those exchanges.

    I'll be posting a few initial entries to prime the pump, and then I will be asking others from the team to post interesting ideas as they see them in their work. 

    Conor Cunningham

    SQL Server Query Optimization Development Lead

More Posts « Previous page

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker