CSS SQL Server Engineers

This is the official team Web Log for Microsoft Customer Service and Support (CSS) SQL Support. Posts are provided by the CSS SQL Escalation Services

Tracking Down Missing Indexes in SQL Azure

Tracking Down Missing Indexes in SQL Azure

Rate This
  • Comments 7

One of the challenges of SQL Azure is that not all of the TSQL that you are used to using is supported yet.  Since the underlying engine is plain ole’ SQL Server, the engine can understand the TSQL, but we just block its use because we haven’t yet made it work in the multi-tenant, multi-server environment that is SQL Azure. 

One of the classic missing index scripts can be seen in Bart Duncan’s classic post.  For simplicity, I have reposted the TSQL below:

   1:  SELECT 
   2:    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure, 
   3:    'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) 
   4:    + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
   5:    + ' ON ' + mid.statement 
   6:    + ' (' + ISNULL (mid.equality_columns,'') 
   7:      + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END 
   8:      + ISNULL (mid.inequality_columns, '')
   9:    + ')' 
  10:    + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement, 
  11:    migs.*, mid.database_id, mid.[object_id]
  12:  FROM sys.dm_db_missing_index_groups mig
  13:  INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
  14:  INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
  15:  WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
  16:  ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC

Unfortunately, if you try to use this TSQL, you immediately run into the problem that none of the DMVs are supported in SQL Azure.  So much for the easy way…

Since the DMVs are just ongoing collections of information that you can collect manually from dm_exec_query_stats, I decided to try to build this up manually.  This led me to generate the following query:

 
   1:  SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
   2:  qp.query_plan.value('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
   3:  FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp 
   4:  WHERE qp.query_plan.exist('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
   5:  ORDER BY TotalImpact DESC

This generates a list of ShowPlanXMLs ordered by “execution count * missing index group impact”.  Now that we have a list of ShowPlans ordered by overall impact, we need to parse the ShowPlanXML to pull out the missing indexes.  For those unfamiliar with the missing index information in ShowPlanXML data, here is an example:

<MissingIndexes><MissingIndexGroup Impact="98.6314"><MissingIndex Database="[BugCheck]" Schema="[dbo]" Table="[Watchlists]"><ColumnGroup Usage="EQUALITY"><Column Name="[ID]" ColumnId="1" /></ColumnGroup></MissingIndex></MissingIndexGroup></MissingIndexes>

As you can see, it contains all the information necessary to define the indexes the engine thinks are missing.

Now, for each ShowPlanXML row, we need to use XQuery to shred the MissingIndexes information into its key information.  In a classic case of copying good work already done instead of spending time doing it myself, I found that the Performance Dashboard Reports already do this shredding in one of their reports, so I could copy it:

   1:      SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
   2:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
   3:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
   4:      ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE"    return string($col/@Name)')), '] [', '],[') as included_columns 
   5:      from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)

By combining the above two queries with a cursor, I can stick each shredded missing index into a temporary table.  Then, I can use the equality, inequality, and included columns from the temporary table to generate CREATE INDEX statements as follows:

   1:  select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name + 
   2:  ' (' + IsNull(equality_columns,'') + 
   3:  CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' + 
   4:  CASE WHEN included_columns='' THEN
   5:  ';'
   6:  ELSE
   7:  ' Include (' + included_columns + ');'
   8:  END
   9:  from #results

DISCLAIMER:  As with all automated INDEX suggestion scripts, you need take a look at the CREATE INDEX statements suggested and decide if they make sense for you before you run out and apply them to your production instance!!

One important thing to point out is that even though I was designing this script for SQL Azure, it works just fine against an on-premise instance of SQL Server.  Since SQL Azure supports a subset of the overall SQL Server functionality, you will almost always find that a solution for SQL Azure works just fine against SQL Server.  Lastly, this functionality has been added to the CSS SQL Azure Diagnostics Tool (CSAD) so that you don’t have to worry about running this manually if you don’t want to.

For completeness, here is the TSQL statement in its entirety:

create table #results (target_object_name nvarchar(100), equality_columns nvarchar(100), inequality_columns nvarchar(100), included_columns nvarchar(100))
 
declare @query_plan as xml
declare @totalimpact as float
 
declare querycursor CURSOR FAST_FORWARD FOR
SELECT top (50) cast(replace(cast(qp.query_plan as nvarchar(max)),'xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan"','') as xml),
qp.query_plan.value('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/@Impact)[1]' , 'decimal(18,4)') * execution_count AS TotalImpact
FROM sys.dm_exec_query_stats qs cross apply sys.dm_exec_sql_text(sql_handle) st cross apply sys.dm_exec_query_plan(plan_handle) qp 
WHERE qp.query_plan.exist('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan/MissingIndexes/MissingIndexGroup/MissingIndex[@Database!="m"]') = 1
ORDER BY TotalImpact DESC
 
OPEN querycursor
FETCH NEXT FROM querycursor
INTO @query_plan, @totalimpact  --need to remove the namespace
 
WHILE @@FETCH_STATUS=0
BEGIN
 
    insert into #results (target_object_name, equality_columns, inequality_columns, included_columns)
    SELECT cast(index_node.query('concat(string((./@Database)[1]),".",string((./@Schema)[1]),".",string((./@Table)[1]))') as nvarchar(100)) as target_object_name
    ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "EQUALITY" return string($col/@Name)')), '] [', '],[') as equality_columns
    ,replace(convert(nvarchar(max), index_node.query('for $colgroup in ./ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INEQUALITY" return string($col/@Name)')), '] [', '],[') as inequality_columns
    ,replace(convert(nvarchar(max), index_node.query('for $colgroup in .//ColumnGroup,$col in $colgroup/Column where $colgroup/@Usage = "INCLUDE"    return string($col/@Name)')), '] [', '],[') as included_columns 
    from (select convert(xml, @query_plan) as xml_showplan) as t outer apply xml_showplan.nodes ('//MissingIndexes/MissingIndexGroup/MissingIndex') as missing_indexes(index_node)
    
    FETCH NEXT FROM querycursor
    INTO @query_plan, @totalimpact
    
END
 
CLOSE querycursor
DEALLOCATE querycursor
 
select distinct 'Create NonClustered Index IX_' + substring(replace(replace(target_object_name,'[',''),']',''), 0, charindex('.',replace(replace(target_object_name,'[',''),']',''))) +' On ' + target_object_name + 
' (' + IsNull(equality_columns,'') + 
CASE WHEN equality_columns IS Null And inequality_columns IS Null THEN ',' ELSE '' END + IsNull(inequality_columns, '') + ')' + 
CASE WHEN included_columns='' THEN
';'
ELSE
' Include (' + included_columns + ');'
END
from #results
 
drop table #results
Leave a Comment
  • Please add 7 and 4 and type the answer here:
  • Post
  • I was always wondering the low quality of stored procs MSSQL ships with. Now I know it's the copy/paste nature of building these up that is to blame.

    Why is `declare` lower-case but `WHILE` upper-case? How come `from` is lower and upper-case in different parts of the script?

    Anyway...

  • wqweto -

    I am not sure I follow your comment.  The lack of consistency in casing above is totally driven by the fact that I pulled this from multiple sources, plus handwrote parts of it.  Since the casing doesn't matter for anything by style point, I wasn't too focused on it.

    Evan

  • Evan,

    I have been looking for something like this for awhile!  I am getting a number of "String or binary data would be truncated.  The statement has been terminated." messages when I run the query though.  Some results are still returned though.

    Thanks so much for writing this!

    Brian P. Hamachek

  • In follow-up to my last comment, I was able to resolve the problems by creating the temporary #results table with larger column sizes.  If you are seeing the same messages that I was, try replacing the first line of the query with:

    "create table #results (target_object_name nvarchar(1000), equality_columns nvarchar(1000), inequality_columns nvarchar(1000), included_columns nvarchar(1000))"

    Thanks,

    Brian P. Hamachek

  • Thanks for the feedback, Brian.  I've incorporated your suggestion and the increased column sizes will be in the next release.

  • I am not sure I follow your comment -- 'as you dont understand plain english'.  

    The lack of consistency in casing above is totally driven by the fact that I pulled this from multiple sources -- ' Yes thats COPY'

    plus handwrote parts of it -- 'And thats PASTE'

    Since the casing doesn't matter for anything by style point -- 'because you are so useless'

    I wasn't too focused on it -- 'Yes we know you are Dyslexic'

    thanks -Brian

  • Thank you very much for this! Good stuff!

    wqweto: Please stop with the negative stuff.

Page 1 of 1 (7 items)