SQL Server Storage Engine

TempDB Monitoring and Troubleshooting: Allocation Bottleneck

This blog continues the discussion on the common issues in TempDB that you may need to troubleshoot. In the blog http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/05/tempdb-monitoring-and-troubleshooting-io-bottleneck.aspx, we discussed how to identify and troubleshoot IO bottleneck in TempDB. In this blog, I will describe how to indentify allocation bottleneck and to troubleshoot it.

As you may recall, the allocation bottleneck is caused when allocation structures are accessed by concurrent threads in conflicting modes. Please refer to http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/04/what-is-allocation-bottleneck.aspx for details. In this case, the concurrent thread(s) will need to wait to acquire the pagelatch thereby slowing the operation. The good thing is that the pages containing allocation structures (GAM, SGAM, PFS) are well known and have fixed page numbers in each file. For example, in file-id 1, the allocation pages IDs are

·         PFS – 1

·         GAM – 2

·         SGAM - 3

A PFS page will appear every 8088 pages in a file. The GAM and SGAM will appear every 511232 pages and similarly in other files.

Diagnosing:

You can run the following DMV query to find any latch waits that occur in allocation pages

select   session_id, wait_duration_ms,   resource_description

      from    sys.dm_os_waiting_tasks

      where   wait_type like 'PAGE%LATCH_%' and

              resource_description like '2:%'

Since the database id of TempDB is 2, the search argument ‘2.%’ represents any page in TempDB across any file. If this page happens to be GAM, SGAM or PFS, it will represent allocation bottleneck. Note, in a concurrent application, some blocking is expected so you will need to baseline the allocation waits when your application is performing normally. Only when the waits exceed the baseline significantly, it signals that you are incurring allocation bottleneck.

Troubleshooting:

SQL Server recommends the following

·         Create atleast as many files of equal size as there are COREs/CPUs for SQL Server process. The rationale is that at a given time, the number of concurrent threads is <= number of COREs/CPUs. Don’t confuse this with the number of active sessions/batches. 

·         Enable TF-1118 if you are encountering bottleneck in SGAM structures.

·         If you are still encountering allocation bottleneck, you will need to look at your application and see which query plans are creating/dropping objects in TempDB and if these objects are being cached and take corrective action when possible. Most IT shops have very limited choice here as they don’t own the application code.

Here I want to point out one seemingly harmless step to solve allocation bottleneck that can in fact make it worse. Say you are encountering allocation bottleneck and you decide to add one more file to the mix hoping that allocation will spread further. Well, if you recall proportional fill methodology; all new allocations will favor the newer file because it has the most free space. So suddenly, the allocation problem becomes even worse. It is a good practice to restart SQL Server when you add a new file, clearly it needs to be of the same size as other files, to TempDB.

 

Thanks

Sunil Agarwal

 

Published Sunday, January 11, 2009 5:36 AM by Sunil Agarwal
Filed under:

Comments

 

infoblog &raquo; TempDB Monitoring and Troubleshooting: Allocation Bottleneck said:

January 11, 2009 12:59 AM
 

PegB said:

I wondered if you'd ever seen contention in tempdb due to the SQL Agent?  I've got an issue where contention seems to increase over time (about 24 to 30 hours), to the point where the SQL server almost grinds to a halt.  At that point, restarting the SQL Agent fixes the issue and the server runs normally again.  Currently managing the issue through a proactive restart of the agent every 12 hours.

The issue isn't load-related - under heavier load, it's fine so long as the agent is restarted regularly.

It's SQL 2005 (build 3050) 64-bit, Enterprise Edition, with 6 mirrored databases.

Thanks

February 2, 2009 3:19 PM
 

Sunil Agarwal said:

No, I have not. Does your monitoring showing contention in tempdb allocaton structures? Suggest you compare your numbers with baseline

thanks

Sunil

February 2, 2009 5:09 PM
 

SQL Server Security, Performance & Tuning (SSQA.net) said:

How many times in a day you observe the TEMPDB in your SQL Server enviornment? How many times in a week

February 13, 2009 6:28 AM
Anonymous comments are disabled

About Sunil Agarwal

Sunil Agarwal is a Program Manager in the SQL Server Storage Engine Group at Microsoft. He is responsible for concurrency, indexes, tempdb, LOBS, supportability, and bulk import/export. Prior to joining Microsoft, Sunil worked at DEC, Sybase, BMC Software and DigitalThink, focusing primarily on core database engine technologies and related applications.

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker