TempDB Monitoring and Troubleshooting: Allocation Bottleneck

TempDB Monitoring and Troubleshooting: Allocation Bottleneck

Rate This
  • Comments 7

This blog continues the discussion on the common issues in TempDB that you may need to troubleshoot. In the blog http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/05/tempdb-monitoring-and-troubleshooting-io-bottleneck.aspx, we discussed how to identify and troubleshoot IO bottleneck in TempDB. In this blog, I will describe how to indentify allocation bottleneck and to troubleshoot it.

As you may recall, the allocation bottleneck is caused when allocation structures are accessed by concurrent threads in conflicting modes. Please refer to http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/04/what-is-allocation-bottleneck.aspx for details. In this case, the concurrent thread(s) will need to wait to acquire the pagelatch thereby slowing the operation. The good thing is that the pages containing allocation structures (GAM, SGAM, PFS) are well known and have fixed page numbers in each file. For example, in file-id 1, the allocation pages IDs are

·         PFS – 1

·         GAM – 2

·         SGAM - 3

A PFS page will appear every 8088 pages in a file. The GAM and SGAM will appear every 511232 pages and similarly in other files.

Diagnosing:

You can run the following DMV query to find any latch waits that occur in allocation pages

select   session_id, wait_duration_ms,   resource_description

      from    sys.dm_os_waiting_tasks

      where   wait_type like 'PAGE%LATCH_%' and

              resource_description like '2:%'

Since the database id of TempDB is 2, the search argument ‘2.%’ represents any page in TempDB across any file. If this page happens to be GAM, SGAM or PFS, it will represent allocation bottleneck. Note, in a concurrent application, some blocking is expected so you will need to baseline the allocation waits when your application is performing normally. Only when the waits exceed the baseline significantly, it signals that you are incurring allocation bottleneck.

Troubleshooting:

SQL Server recommends the following

·         Create atleast as many files of equal size as there are COREs/CPUs for SQL Server process. The rationale is that at a given time, the number of concurrent threads is <= number of COREs/CPUs. Don’t confuse this with the number of active sessions/batches. 

·         Enable TF-1118 if you are encountering bottleneck in SGAM structures.

·         If you are still encountering allocation bottleneck, you will need to look at your application and see which query plans are creating/dropping objects in TempDB and if these objects are being cached and take corrective action when possible. Most IT shops have very limited choice here as they don’t own the application code.

Here I want to point out one seemingly harmless step to solve allocation bottleneck that can in fact make it worse. Say you are encountering allocation bottleneck and you decide to add one more file to the mix hoping that allocation will spread further. Well, if you recall proportional fill methodology; all new allocations will favor the newer file because it has the most free space. So suddenly, the allocation problem becomes even worse. It is a good practice to restart SQL Server when you add a new file, clearly it needs to be of the same size as other files, to TempDB.

 

Thanks

Sunil Agarwal

 

Leave a Comment
  • Please add 7 and 3 and type the answer here:
  • Post
  • PingBack from http://blog.a-foton.ru/index.php/2009/01/11/tempdb-monitoring-and-troubleshooting-allocation-bottleneck/

  • I wondered if you'd ever seen contention in tempdb due to the SQL Agent?  I've got an issue where contention seems to increase over time (about 24 to 30 hours), to the point where the SQL server almost grinds to a halt.  At that point, restarting the SQL Agent fixes the issue and the server runs normally again.  Currently managing the issue through a proactive restart of the agent every 12 hours.

    The issue isn't load-related - under heavier load, it's fine so long as the agent is restarted regularly.

    It's SQL 2005 (build 3050) 64-bit, Enterprise Edition, with 6 mirrored databases.

    Thanks

  • No, I have not. Does your monitoring showing contention in tempdb allocaton structures? Suggest you compare your numbers with baseline

    thanks

    Sunil

  • How many times in a day you observe the TEMPDB in your SQL Server enviornment? How many times in a week

  • You say "Create atleast as many files of equal size as there are COREs/CPUs for SQL Server process. " but we now have servers with 16, 24, 36, and soon 64 processors.  In technet.microsoft.com/.../cc966545.aspx it says "Having too many files increases the cost of file switching, requires more IAM pages, and increases the manageability overhead." At what point do we hit too many files and performance degrades?

  • This is a great question. I will dig into this and post it here or in a separate blog. Some of it will depend on your workload as well.

    I will not worry about the overhead of IAM pages as the number of IAM pages is very low compared to regular pages but I agree that having more files makes it harder to manage.

  • Some comments on the question 'how many files for TempDB in > 64 processor/core configuration'

    Mike Ruthruff (SQLCAT team and storage expert) >>  I would tend to think that the benefit of scaling out the structures would outweigh any performance overhead (assuming you have a heavy rate of allocations).

    Pravin Mittal (Performance owner): We have tried 128 files in 100% CPU environment with no performance degradation.

Page 1 of 1 (7 items)