Welcome to MSDN Blogs Sign in | Join | Help

How It Works: Attention, Attention or Should I say Cancel the Query and Be Sure to Process Your Results

I ran into this issue the other day and decided it would make a good post.  Since I have seen this issue more times that I can count I will again attempt to provide details.

The attention event in SQL Server trace output indicates a query cancellation.   There are three common ways this occurs.

1.  Asynchronous cancellation (ODBC SQLCancel for example)

2.  Query Timeout Exceeded and the client driver submits the attention

3.  Results are not processed  (very dangerous)

The attention is submitted to the SQL Server anytime the query is canceled and the client detects there are still pending results.   If the results stream has been processed there is no need to tell SQL Server to cancel the query because SQL Server is already done with the query processing.  So if you see the attention the results are not all processed.

An attention is treated by the SQL Server as an immediate processing termination.   Stop right where you are and terminate the results.   

Open Transaction

begin tran

update ...........

select   (becomes blocked - attention arrives at the server)

At the time of the attention the transaction is not rolled back unless transaction abort (XACT_ABORT) has been enabled.   The client needs to submit a rollback.   If the client does not submit the rollback and continues other processing the transaction remains open and the behavior scope of the application becomes undefined.

Not Processing All Rows

I often see this by a pattern in a trace showing that the attention arrived right after the batch started.  Generally milli-seconds afterwards and not something like a 30 second query timeout.   This is often a clear indication that the client processed the first row or first few rows and discarded the rest of the result set unsafely.  Take the following procedure for example.

create procedure spMyProc

as

begin

   begin tran
     insert into tbl2 select ... with output ....inserted....
   commit tran

end

The procedure uses the OUTPUT clause of the insert to return the inserted rows to the client and let's say that the results from the output clause took more than a single network packet.    The client library gets the first TDS packet and the first set of rows.   If if client only processes the first result set the commit tran may never execute.   Let's look at this in more detail with the actual ODBC client calls.

   SQLExecute("spMyProc")   <-------------- Returns as soon as the first result set if available  (first set of rows in this case)

   SQLFetch()  <---------------- Retrieves the a row   (should be done in a loop)

   SQLFreeStmt()   <------------ Releases the statement and if results are pending issues a SQLCancel.

So if the application only processes the first result it thinks the procedure ran successfully but because it did not process all results the SQLFreeStmt will issue the attention and cancel the procedure execution.   Since we have not completed the insert (still streaming output rows) the commit tran is never executed (attention stops processing immediately).   Data and application behavior becomes inconsistent.    I have even seen customers increase the default packet size to allow the procedure to stream all rows in the first packet and complete until they get their application corrected.

The application should look like the following.

SQLExecute()

do

{

    while(SQLFetch())

    {

    }

} while(SQLMoreResults())

SQLFreeStmt()

ODBC Trace

Here is an example of a bad application pattern as shown in an ODBC trace.  Notice the SQLFetch is not called until it returns NO_MORE_ROWS and no call to SQLMoreResults is made.   The SQLFreeStmt will submit the attention to the server and may create unexpected behavior for the application.

lord003 16 1000 fe4-13e0                EXIT  SQLBindParameter  with return code 0 (SQL_SUCCESS)

                                HSTMT               00B55530

                                UWORD                        3

                                SWORD                        1 <SQL_PARAM_INPUT>

                                SWORD                        1 <SQL_C_CHAR>

                                SWORD                       12 <SQL_VARCHAR>

                                SQLULEN                    8

                                SWORD                        0

                                PTR                0x00AFB2FF

                                SQLLEN                     8

                                SQLLEN *            0x00000000

 

 

lord003 16 1000 fe4-13e0                EXIT  SQLExecDirect  with return code 0 (SQL_SUCCESS)

                                HSTMT               00B55530

                                UCHAR *             0x00B15DA8 [      -3] "SELECT * FROM xx00Online.dbo.loan (INDEX=rowno_loan) WHERE xx00Online.dbo.loan.rowno>? AND ((xx00Online.dbo.loan.balance < ?) AND (xx00Online.dbo.loan.paydue < ?)) ORDER BY xx00Online.dbo.loan.rowno\ 0"

                                SDWORD                    -3

 

lord003 16 1000 fe4-13e0                EXIT  SQLFetch  with return code 0 (SQL_SUCCESS)

                                HSTMT               00B55530

 

lord003 16 1000 fe4-13e0                ENTER SQLGetData

                                HSTMT               00B55530     <-------------------- STMT HANDLE

                                UWORD                        1           <------------------- COLUMN NUMBER

                                SWORD                       -2 <SQL_C_BINARY>

                                PTR                 <unknown type>

                                SQLLEN                     4

                                SQLLEN *            0x00B1B12C

 

lord003 16 1000 fe4-13e0                ENTER SQLFreeStmt

                                HSTMT               00B55530

                                UWORD                        0 <SQL_CLOSE>

 

lord003 16 1000 fe4-13e0                EXIT  SQLFreeStmt  with return code 0 (SQL_SUCCESS)

                                HSTMT               00B55530

                                UWORD                        0 <SQL_CLOSE>

 

Bob Dorr
SQL Server Senior Escalation Engineer

Posted by psssql | 1 Comments

SQL Server 2005 Encounters Exception During Install When System has Odd Number of Processors or Logical Processors Per Core

While attempting to install SQL Server 2005 you may encounter an error familiar to the following and a retry option presented by setup.

Faulting application sqlservr.exe, version 2005.90.1399.0, time stamp 0x434f82e9, faulting module sqlservr.exe, version 2005.90.1399.0, time stamp 0x434f82e9, exception code 0xc0000005, fault offset 0x01019027, process id 0xe70, application start time 0x01c8d28679b262eb.

There are a few known issues; all corrected if you install SQL Server 2005 SP2, but getting Service Pack 2 installed makes this a chicken and egg scenario.  Here a references knowledge base references.

KB 954835 - http://support.microsoft.com/kb/954835/en-us   (This article currently states that you should use a CPU value that is a power of 2.   This is only partially correct and comments to the KB have been submitted.)

KB 931666 - http://support.microsoft.com/kb/931666/en-us  Systems with 9, 17, 18, 19, 33, 34, 35, 36,37,38,39 encounter the issue

COMMON WORKAROUND STEPS

 

1.       KB 954835 describes how to use the BOOT.INI or MSCONFIG.exe to set the NumProc boot option.   Set this to 1 and not a power of 2 to avoid the vast majority of the scenarios.

2.       Install SQL Server and SQL Server Service Pack 2 with the computer configured to look like a single CPU system.

 

ALTERNATE WORKAROUND STEPS

 

                There are a few conditions where the common workaround is not sufficient.

 

1.       At the time you get the failure to start SQL Sevrver during setup you are presented with a retry option.  

 At this time replace the SQLSERVR.exe and SQLOS.dll into the BINN folder at your install location from an Service Pack 2 installation and then Retry.  This allows the SQL Server to start and complete the setup.
YOU ARE NOW IN A PRODUCTION UNSUPPORTTED SITUATION.


2.       Immediately apply SQL Server SP2 and you are back in a supported configuration.

 

 

Reference: Triple Core, 3-Core, Tri-Core, Intel,  AMD Phenom

 

Bob Dorr
SQL Server Senior Escalation Engineer

Supporting SQL Server 2008: The system_health session

In my first post on SQL Server 2008, I introduced a new feature called Extended Events (we like to call this Xevent internally). While we were working with the XEvent development team, it dawned on us that we could perhaps build an event session that was "always-on" (in fact internally we were calling this the "Always-On" session). In this session, we could add some events that were perhaps important for troubleshooting certain problems with the SQL Server Engine.

Thus was born the system_health Extended Events session. So if you install SQL Server 2008 RC0, an run the following query:

select * from sys.dm_xe_sessions

you will find a session that is already running on your server called system_health. The target for this session is a ring_buffer so when you want to see the results in XML you would run a query like this:

select CAST(xet.target_data as xml) from sys.dm_xe_session_targets xet
join sys.dm_xe_sessions xe
on (xe.address = xet.event_session_address)
where xe.name = 'system_health'

Since this is a ring_buffer based target, when the buffer fills up we will overwrite the oldest entries.

This session has events we at Microsoft defined (in fact, our CSS team were the ones that proposed to the development team what should go in this session). If you want to see the details of what events are in the session, you can see the actual script we use to build it after you install SQL Server 2008 in the INSTALL directory in a file called u_tables.sql (this is important because if you ever accidentally drop the session you can always create it again from this script).

To give you a feel for what we put in this session, here is a summary:

  • The sql_text and session_id for any sessions that encounter an error with severity >=20
  • the sql_text and session_id for any sessions that encounter a "memory" type of error such as 17803, 701, etc (we added this because not all memory errors are severity >=20)
  • A record of any "non-yielding" problems (you have sometimes seen these in the ERRORLOG as Msg 17883)
  • Any deadlocks that are detected
  • The callstack, sql_text, and session_id for any sessions who have waited on latches (or other interesting resources) for > 15 seconds
  • The callstack, sql_text, and session_id for any sessions who have waited on locks for > 30 seconds
  • The callstack, sql_text, and session_id for any session that have waited for an extended period of time for "external" waits or "pre-emptive waits".

So what this means to you is that at any time you could take a look at this system_health session data and find out if you have had any high severity errors (and see immediately what query was run that encountered the error), queries that encountered long waits, or other scenarios as described above.

The events we added to this session were ones we felt comfortable could be turned on always without compromising server engine performance.

So anytime you have are having engine issues with SQL Server 2008, take a peek at this session and see what is inside.

In my next post for SQL Server 2008, I'll explain more what "external" waits or "preemptive waits" are and how this new feature in SQL Server 2008 could help pinpoint some unusual performance issues.

 

Bob Ward
Microsoft

clip_image001_2

Posted by psssql | 1 Comments

SQL Server reports operating system error 1450 or 1452 or 665 (retries)

Overview

The error: 1450/1452 insufficient system resources exist to complete the requested service.

These are not a SQL Server based errors. The error cause is related to a depleted system resource (non-paged pool, paged pool ,...). These issues have to be tracked down at the operating system level. This involves collection of system level performance monitor counters and evaluating the basic health of the overall system.

/3GB
A commonly overlooked configuration option is the use of /3GB. The /3GB boot switch limits the operating system space to 1GB on a 32 a bit system. If your system is running with /3GB and you can't pinpoint the resource depletion consider removing the /3GB boot switch.

SQL Server Retries

When these errors occur they usually appear during SQL Server I/O operations (ReadFile, WriteFile, ReadFileScatter, WriteFileGather, DeviceIoControl, …) SQL Server identifies the situation, pauses briefly (a few milliseconds) to allow the system to respond and retries the I/O operation.

References: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx and http://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspx

Specialized Symptom

For SQL Server 2005 there is another 1450/665 condition that can arise when performing an online DBCC CHECKDB or if you have snapshot databases (Create Database For Snapshot) defined.

Reference: http://blogs.msdn.com/psssql/archive/2008/02/07/how-it-works-sql-server-2005-database-snapshots-replica.aspx

When a sparse file (used for snapshot database files) is populated Windows limits the amount of data that may reside in the file. Once the amount of data stored in the sparse file exceeds the limit further data storage in the file may be prevented.

· Windows 2003 - 64GB (Error 1450 returned)
· Windows 2008 and Vista - 16GB (Error 665 returned)

Windows 2003 Error: The operating system returned error 1450(Insufficient system resources exist to complete the requested service.) to SQL Server during a write at offset 0x000031abb4e000 in file with handle 0x00000F74. This is usually a temporary condition and the SQL Server will keep retrying the operation. If the condition persists then immediate action must be taken to correct it.

Windows 2008 Error: The operating system returned error 665(The requested operation could not be completed due to a file system limitation) to SQL Server during a write at offset 0x000005bd3dc000 in file 'Test.mdf:MSSQL_DBCC8'

Online DBCC uses an internal snapshot database (secondary file stream like H:\MSSQL\Data\MyDB_Data.MDF:MSSQL_DBCC8). If the total copy on write, page copy activity exceeds these limits DBCC may stall, hang or fail. Some of the messages you may see in the SQL Server error log are shown.

The operating system returned error 1450(Insufficient system resources exist to complete the requested service.) to SQL Server during a write at offset 0x000031abb4e000 in file with handle 0x00000F74. This is usually a temporary condition and the SQL Server will keep retrying the operation. If the condition persists then immediate action must be taken to correct it.

Timeout occurred while waiting for latch: class 'ACCESS_METHODS_HOBT_COUNT', id 6092AFA4, type 4, Task 0x00BD4208 : 0, waittime 600, flags 0x98, owning task 0x00000000. Continuing to wait.

*******************************************************************************
* BEGIN STACK DUMP:
* 06/26/08 12:08:00 spid 89
* Latch timeout

Other actions that are attempting to modify pages may become stalled attempting to write to the snapshot file and checkpoint can be a secondary victim.

SQL Server Mini-Dumps

The built in SQL Server health checks can detect several of the conditions, log errors and generate mini-dumps in the LOG directory. Using the debug techniques outlined in the following article http://www.microsoft.com/technet/prodtechnol/sql/2005/diagandcorrecterrs.mspx you too can look at the information.

Using the Windows Debugging Tools and the public symbols for Windows and SQL Server the following information can be obtained. You can then search for the Sleep call. There are some threads that this is valid for but for this issue you don’t want to see the SyncWritePreemptive calling Sleep.

0:000> .sympath
Symbol search path is: http://msdl.microsoft.com/download/symbols

0:000> ~*kb


27 Id: 804.b30 Suspend: 1 Teb: 7ff9a000 Unfrozen
ntdll!KiFastSystemCallRet
ntdll!NtDelayExecution+0xc
kernel32!SleepEx+0x68
kernel32!Sleep+0xf
sqlservr!FCB::SyncWritePreemptive+0x228
sqlservr!FCB::CopyPageToReplicas+0x327
sqlservr!BUF::CopyOnWrite+0x56

The previous stack shows the CopyOnWrite behavior attempting to secure the page in the replica (snapshot) before the page can be changed. Sleep is only called when 1450 or 1452 is encountered.

Resolution:

· Run DBCC CHECKDB in OFFLINE mode (WITH TABLOCK)

· Run DBCC CHECKDB at a time when less data modification is taking place

· Divide the database into a few more files. The limitations are per sparse file and each database snapshot creates matching sparse files for each database file. For example if the database files are each limited to 64GB in total size you can't exceed the limitation on a per file basis.

· Remove snapshot databases that have grown large.

Bob Dorr
SQL Server Senior Escalation Engineer

SQL Server I/O Bottleneck, I don't have one, YES YOU DO!

The mistake I see people make is when looking at the SQL Server PAGE I/O waits and stalled I/O warnings is when comparing it to the Avg. Disk Seconds/Transfer.    Everyone seems to forget that average means average and they look at the value and say I don't see any I/O taking longer than 15 seconds.

Take the following example over a 5 minute window.

2000 I/Os at 8ms   = 16000ms

1 I/O at 15000ms   = 15000 ms

TOTAL TIME: 31000ms

AVG: 15.49ms   (31000 / 2001 I/O Requests)

You are not going to see a spike to 15000ms in the Avg Disk Sec/Transfer but instead a small change from the norm that is very hard to detect from just the Avg Disk Sec/Transfer.

Remember to pay close attention to the following SQL Server data points to help determine your I/O response rates.

  • PAGE I/O* Waits   (select * from sys.dm_os_wait_stats where wait_type like '%PAGEIO%')
  • Virtual File Stats (select * from sys.dm_io_virtual_file_stats(-1, -1))
  • Stalled I/O Warnings
  • Additional disk based performance counters and available hardware utilities

Don't forget that SQL Server drives read ahead, checkpoint and other I/O patterns to depths beyond a disk queue length of 2 and this is expected.

Bob Dorr
SQL Server Senior Escalation Engineer

SQL Server, Windows Server 2008, and the Windows Firewall

Sounds like a movie title. No, it is actually a topic you need to know about if you plan to install SQL Server 2005 or SQL Server 2008 on Windows Server 2008.

Instead of me trying to explain the details I would rather you learn from the experts on this topic. A very good blog post has been created by Shawn Hernan from the security group of the SQL Server Development team to get you started. So,....if you plan to install SQL Server 2005 or SQL Server 2008 (when it releases) on Windows Server 2008, you must read Shawn's post at:

http://blogs.msdn.com/sqlsecurity/archive/2008/07/01/sql-server-and-the-windows-server-2008-firewall.aspx

Bob Ward
Microsoft

Posted by psssql | 1 Comments

Get Ready for SQL Server 2008

Someone who contributes to this blog posted back in April that after the Europe PASS conference they would start in May creating some posts about SQL Server 2008 New Features especially ones related to diagnostics. I don't know who posted that but it is now July and no posts have been seen (who was that guy?). So I guess I'll step in and start creating these posts given that we are marching towards a release of SQL Server 2008 in Q3 of this year.

The first post of this series is to tell you about a new feature for diagnostics in the SQL Server Engine called Extended Events. I've personally spent time working on this feature with the product team and just recently completed some training information for our CSS team on this topic, so this seems like a good time to talk about it.

What is Extended Events? It is a new diagnostic system for the engine that can be used to track interesting places in the code, collect data about them, and even take some action for that event. It is not a replacement for SQLTrace although there is certainly some similarities in how you can use both features.

First, to get primed, take a stroll through the Books Online on this topic at http://msdn.microsoft.com/en-us/library/bb630354(SQL.100).aspx. This gets you a feel for this feature and basics of how it can be used. But if that didn't sync in, let me show you can example T-SQL script to create a session for a particular purpose (NOTE: I recommend you install the latest release candidate to try any of this):

drop event session dumptrigger on server
go
create event session dumptrigger on server
add event sqlserver.error_reported
(
action
(
sqlserver.session_id,
sqlserver.create_dump_all_threads,
sqlserver.sql_text
)
where error = 602
and package0.counter <= 3
)
add target package0.ring_buffer
with (max_dispatch_latency=1seconds)
go
alter event session dumptrigger on server state = start
go

I wrote this script because I found a 602 error in my ERRORLOG file but I wasn't sure what query was running that encountered the error. Furthermore, when I called CSS they said they needed a userdump when this problem occurred to help me.

So if I run this script, I get a new Extended Events session that will only produce data and collect a dump file if the engine encounters another 602 error and only do this up to 3 times as long as this event is running.

If I want to look at the data from this session without using the dump file, I can do that in XML format by looking at  Dynamic Management View (DMV) called sys.dm_xe_session_targets. I can cast the target_data column to XML like this:

select CAST(xet.target_data as xml) from sys.dm_xe_session_targets xet
join sys.dm_xe_sessions xe
on (xe.address = xet.event_session_address)
where xe.name = 'dumptrigger'

The resulting XML will look something like this:

<RingBufferTarget eventsPerSec="125" processingTime="8" totalEventsProcessed="1" eventCount="1" droppedCount="0" memoryUsed="319">
  <event name="error_reported" package="sqlserver" id="100" version="1" timestamp="2008-04-30 16:11:18.602">
    <data name="error">
      <type name="int32" package="package0" />
      <value>602</value>
      <text />
    </data>
    <data name="severity">
      <type name="int32" package="package0" />
      <value>21</value>
      <text />
    </data>
    <data name="state">
      <type name="int32" package="package0" />
      <value>30</value>
      <text />
    </data>
    <data name="user_defined">
      <type name="boolean" package="package0" />
      <value>false</value>
      <text />
    </data>
    <data name="message">
      <type name="unicode_string" package="package0" />
      <value>Could not find an entry for table or index with
partition ID 491897996509184 in database 2. This error can occur if a stored
procedure references a dropped table, or metadata is corrupted. Drop and re-create
the stored procedure, or execute DBCC CHECKDB
</value>
      <text />
    </data>
    <action name="session_id" package="sqlserver">
      <type name="uint16" package="package0" />
      <value>54</value>
      <text />
    </action>
    <action name="sql_text" package="sqlserver">
      <type name="unicode_string" package="package0" />
      <value>select * from badtable</value>
      <text />
    </action>
  </event>
</RingBufferTarget>

This is a fairly advanced diagnostic scenario but the above script could be applied to any error (even ones you raise yourself). This is just an example and doesn't begin to touch on the power of Extended Events. So for my next post, I'll talk more about Extended Events including an event session we ship with the product that is always on helping you look at problems when they occur the first time (including the one I just showed you).

Bob Ward
Microsoft

clip_image001

Posted by psssql | 1 Comments

How It Works: Trace (.TRC) File Security

SQL Server 2000 shipped with different trace file security than later builds of SQL Server 2000, SQL Server 2005 and SQL Server 2008.

The SQL Server process (sqlservr.exe) writes the trace files to the specified destination.   The SQL Server 2000 release did not explicitly set the ACLs on the file when it was created.  The behavior was to allow the file to inherit the permissions of the destination directory. 

Closer inspection quickly indicates that for security reasons this is not the best design.  The trace files (.trc) can contain sensitive data (names, addresses, account numbers, ....).   These files need to be treated in a sensitive manor just like the database files (.mdf, .ldf).

SQL Server now ACLs the trace files with the same user permissions as that of the database files.   This is usually the service account SQL Server executes under and the Computer\Administrators group.   The permissions of the directory are no longer inherited.

The owner of the trace files must explicitly grant security permissions to others in accordance with security policies and company guidelines.

Bob Dorr
SQL Server Senior Escalation Engineer

Posted by psssql | 3 Comments

How It Works: More on DBCC Shrink* Activities

My peers are starting to tease me about becoming a dbcc shrink* expert. (Ha, Ha I said.)  Then, I uncovered another design facet about shrinkfile helping with a SQL Server 2000 case. 

The situation:    dbcc shrinkfile, with EMPTYFILE, against a file with only 128 pages was taking hours to complete.   In the end it is by design but it was not a design I anticipated.   

Here is what is taking place during the shrink of this file.

  • Shrink setups a GAM scan for the file to be shrunk.  Starting at the beginning of the file pages are read ... 1, 2, 3, ....  
  • As each page is processed the page type is used to determine the necessary data movement behavior.    For data pages this results in a delete and insert pairing to move the row to another file so EMPTYFILE can complete.  
  • When a BLOB page is encountered I thought that each fragment would be moved to a new page on another file.   This does happen but not as I envisioned.   Because the BLOB fragments are not doubly linked, a broader scan is required to locate the root of the chains that have fragments that need to be moved occurs.
    • When a BLOB page is located a new scan is established.  The new scan is an IAM scan over the TEXT/IMAGE (BLOB) chain for the object found on the original page to be moved.
    • Each page in the IAM chain is read and each fragment on the page is reviewed. 
    • Any fragment found that points to a fragment on the file to be EMPTIED is identified and the associated BLOB chain moved.

So to move a BLOB page may require the entire blob chain to be scanned.   In this customers case they have a 1TB database and this BLOB index was ~500GB.   So to move the BLOB page lots of I/O and scanning take place.

I have not studied the behavior of SQL Server 2005 in regards to this behavior.

You can read more about this subject from my previous posts about shrink*.

http://blogs.msdn.com/psssql/archive/2008/03/28/how-it-works-sql-server-2005-dbcc-shrink-may-take-longer-than-sql-server-2000.aspx

http://blogs.msdn.com/psssql/archive/2008/03/24/how-it-works-dbcc-shrinkdatabase-shrinkfile-tabular.aspx

References: shrinkfile, shrinkdb, shrinkdatabase

Bob Dorr
SQL Server Senior Escalation Engineer

Posted by psssql | 1 Comments
Filed under: ,

Helpful Hint: Making Review of a Query Plan Easier

I think we all have looked at the output of statistics profile and wished the Estimated Rows and Estimated Executions columns were placed beside the Rows and Executes columns.  Well, they can be.

In grid mode you simply drag the columns beside each other.  

If the output is in text mode the Visual Studio Editor provides the ATL+LEFT MOUSE column selection capability.   Since SQL Server Management Studio is hosted in this shell you can hold down the ALT key while using the mouse to select a column of data.  

Here is what the column selection look like.  Instead of a word or a row highlighted you can select the same column positions vertically in the text data.

image

After the data is highlighted you can copy and paste it next to the rows.

image

Bob Dorr
SQL Server Senior Escalation Engineer

Posted by psssql | 1 Comments
Filed under:

Query Performance issues associated with a large sized security cache

In the past couple of months, SQL Server support team has come across some customers running into performance issues attributed to TokenAndPermUserStore in SQL Server 2005. This blog post attempts to compile all the information we have so far regarding this problem.

 

TokenAndPermUserStore is one of the many caches present in the SQL Server 2005 memory architecture. As the name implies, this cache stores various security related information used by the SQL Server Engine. If you are curious to know the different types of tokens currently cached in this cache on your server, the following query will provide you the information:

SELECT COUNT(*) as TokenCount, *

FROM

(SELECT

       x.value('(//@name)[1]', 'varchar (100)') AS [Token Name],

       x.value('(//@class)[1]', 'bigint') AS [Class],

       x.value('(//@subclass)[1]', 'int') AS [SubClass]

FROM

       (SELECT CAST (entry_data as xml)

       FROM sys.dm_os_memory_cache_entries

       WHERE type = 'USERSTORE_TOKENPERM')

              AS R(x)

       ) a

GROUP BY [Token Name],[Class],[SubClass]

NOTE: Please do not run this command on a production server during peak load as it could take a while to finish depending upon the number of entries present in the cache.

 

The problems we have seen are specific to token type TokenAccessResult with a class of 65535. These tokens represent information about cumulative permission checks for queries. For more information on this, refer to the KB article: Queries take a longer time to finish running when the size of the TokenAndPermUserStore cache grows in SQL Server 2005.

 

There are several indicators you can monitor to determine if you are running into this class of problems.

1.        The amount of memory used by this security token cache

2.        The number of entries present in this security token cache

3.        The extent of contention on this security token cache

 

To find out the amount of memory consumed by this token cache, you can query the DMV’s as follows:

SELECT SUM(single_pages_kb + multi_pages_kb) AS "SecurityTokenCacheSize(kb)"

FROM sys.dm_os_memory_clerks

WHERE name = 'TokenAndPermUserStore'

 

There is no specific threshold for this size beyond which the problem starts to happen. The characteristic you need to monitor is the rate at which this cache size is growing. If you are encountering problems with this cache, then you will notice that as the size of the cache grows, the nature of the problems you experience becomes worse. On a sample server that experienced this problem, the cache grew at a rate approximately 1MB per min to reach close to 1.2 GB. We have seen the problem starting to show up even when the size of this cache reaches several hundred MB.

 

Next, to understand if there is contention while accessing this cache, you have to execute commands like the following:

 

Servers running builds < SQL 2005 SP2

SET NOCOUNT ON

CREATE TABLE #spins([Spinlock Name] varchar(50),Collisions numeric,Spins numeric,[Spins/Collision] float)

INSERT INTO #spins EXECUTE ('DBCC SQLPERF (''SPINLOCKSTATS'')')

SELECT TOP 20 * FROM #spins ORDER BY Collisions DESC

DROP TABLE #spins

Servers running builds >= SQL 2005 SP2

SET NOCOUNT ON

CREATE TABLE #spins([Spinlock Name] varchar(50),Collisions numeric,Spins numeric,[Spins/Collision] float,[Sleep Time (ms)] numeric,Backoffs numeric)

INSERT INTO #spins EXECUTE ('DBCC SQLPERF (''SPINLOCKSTATS'')')

SELECT TOP 20 * FROM #spins ORDER BY Collisions DESC

DROP TABLE #spins

 

In this output, the row you need to pay attention to is the following one:

Spinlock Name

Collisions

Spins

Spins/Collision

Sleep Time (ms)

Backoffs

MUTEX

7,714,984

1,020,994,305,589

132,339

55,000,343

101,024,820

MUTEX

8,143,896

1,055,469,534,588

129,603

61,055,906

104,387,627

This sample output was captured across 30 minute duration on a server experiencing this problem. Basically this output shows that various threads inside the SQL Server process is contending for a spinlock named MUTEX. The values above are various attributes that indicate the extent and nature of contention. Spinlock is a very lightweight synchronization mechanism used with the SQL Server engine. Depending upon the data structure that a particular spinlock protects, it is given a unique name within the SQL Engine. MUTEX is the name of the spinlock which protects the security token cache (among a few other things) that we are discussing here. Very similar to the size, what you want to monitor is the rate at which the various values here increase.

 

The symptoms that you want to correlate with the above data points include a combination of the following:

1.        Queries which normally finish faster take a long time

2.        CPU usage of SQL Server process is relatively higher. CPU usage could come down after remaining high for a period of time.

3.        Connections from your applications keep increasing (specifically in connection pool environments)

4.        You encounter connection or query timeouts

 

Once you have analyzed all this data and made a determination, then we recommend you to ensure that you have applied the following fixes on a SQL 2005 SP2 instance. Here is a short description on what fix each build is going to offer you:

 

SQL Server 2005 SP2 build [9.00.3042.00] 

                TokenAccessResult security cache entries for ad hoc queries are cached when a specific ad hoc query is executed for the second time

                KBA: 927396: Queries take a longer time to finish running when the size of the TokenAndPermUserStore cache grows in SQL Server 2005

SQL Server 2005 post SP2 hotfix build [9.00.3153.00]

                Fix to prevent duplication of TokenAccessResult entries due to change in the protection timestamp used.

                KBA: 933564: A gradual increase in memory consumption for the USERSTORE_TOKENPERM cache store occurs in SQL Server 2005

SQL Server 2005 post SP2 hotfix build [9.00.3171.00]

                Fix to prevent memory allocation contention for TokenAccessResult entries

                KBA: 937745: You may receive error messages when you try to log in to an instance of SQL Server 2005 and SQL Server handles many concurrent connections

SQL Server 2005 post SP2 hotfix build [09.00.3179.00]

Fix to prevent Memory consumption increase by the TokenandPermUserStore even if the number of entries does not increase

KBA: 939871: Not yet published to support.microsoft.com

 

In order to get all of these fixes, you can install the Cumulative update package 3 for SQL Server 2005 Service Pack 2. This will take you to build 9.00.3186.00. You might also install a later Cumulative Update package and that will include all of these fixes as well.

 

Microsoft SQL Server Best Practices Analyzer will generate the following warning:

Apply Cumulative Update to Improve Query Performance (SQL Server BPA)

On versions of SQL Server before 9.0.3171.0, a known issue can degrade query performance on 32-bit and 64-bit systems with 2 gigabytes (GB) or more of memory. When you execute queries under the context of a login that is not a member of the sysadmin fixed server role, you might encounter performance degradation symptoms that arise from a large and growing Security Token cache. These issues can include performance degradation of queries, high CPU usage for