Most of my work is supporting and troubleshooting production environments and very often developers I work with say

 

“We are having some performance issues in our website and we believe the bottleneck is in SQL Server. We have collected some profilers and some queries take a long time to finish. But if we execute them in SQL Management Studio or Query Analyzer they have small execution durations. I think there is a bug in Sql Server”

 

Now, whenever I hear this automatically a tiny voice inside my head keeps saying blocking, blocking and blocking.

 

When I need to troubleshoot this kind of issues I always take the same approach witch is to use SQL Profiler with what I believe to be one of the most valuable stored procedure but very often forgotten. I´m talking about sp_blocker_pss80.

 

This stored procedure is available at http://support.microsoft.com/kb/271509 and also there are instructions there on how to put it to run. What I’m going to talk about is how to read the output file that it generates and in conjunction with SQL Profiler we can get some clues on what´s blocking our statements.

 

The first step is to open this file with your favorite text editor (Notepad.exe J ). The structure of this file resembles something like

 

8.2 Start time: 2008-06-28 09:45:53.263 0

 

SYSPROCESSES BALEIXO2008  150996343 (2)

 spid   status                         blocked open_tran waitresource

(…)

 

SYSPROC FIRST PASS

 spid   ecid   waittype

 ------ ------ --------

(…)

 

SPIDs at the head of blocking chains (1)

 spid  

(…)

 

SYSLOCKINFO (4)

 spid   ecid   dbid   ObjId       IndId  Type Resource         Mode         Status TransID              TransUOW   

(…)

 

(3)

**************************************************

Print out DBCC Input buffer for all blocked or blocking spids.

**************************************************

(…)

 

The numbers above in yellow show the order I usually follow to find my problems. This doesn´t mean it´s the best way or the only one but it´s the one I’m used to and so far got me good results.

The first step is to do a search for SYSLOCKINFO. Above this, is the list of SPIDs at the head of blocking chains (1). If this list is empty I usually move forward to the next SYSLOCKINFO section (doesn´t mean I don´t come back).

 

In my example I found this

 

SPIDs at the head of blocking chains

 spid  

 ------

     52

 

So I have a spid 52 that is blocking something. After this I look at my list of SYSPROCESSES (2). It has a lot of columns (not all are shown below) but don´t be scared. A couple of them will give us most of the time the information we need.

 

SYSPROCESSES BALEIXO2008  150996343

 spid   status                         blocked open_tran waitresource           (…)              

 ------ ------------------------------ ------- --------- --------------------------

      2 background                           0         0                                      

      3 suspended                            0         0                                      

      4 background                           0         0                                      

      5 background                           0         0                                      

      7 background                           0         0                                       

      9 background                           0         0                                      

     10 suspended                            0         0                                      

     11 background                           0         0                                       

     12 background                           0         0                                      

     13 background                           0         0                                      

     52 runnable                             0         1                                      

     54 suspended                           52         0 KEY: 5:72057594038321152 (010086470766)

 

So it seems that spid 54 is blocked by spid 52. This means that whatever spid 54 is supposed to do isn´t doing because spid 52 does not allow it. The first columns I usually look are BLOCKED and SPID just to get the picture of what´s not running. The column open_tran is also valuable because will let me know if i´m inside a transaction. Finally the column waitresource will tell me the resource that spid 54 want´s to access but can´t. I will explain later on how to find the name of this resource.

 

Still in the SYSPROCESSES the next columns I look at for spids I find suspects (52,54) are WAITTIME, CMD and LASTWAITTYPE.

 

waittype waittime             cmd              lastwaittype                  

-------- -------------------- ---------------- -------------------------------

0x0060                    327 LAZY WRITER      LAZYWRITER_SLEEP               

0x007F                 101821 LOG WRITER       LOGMGR_QUEUE                  

0x0080                   3229 LOCK MONITOR     REQUEST_FOR_DEADLOCK_SEARCH   

0x009D               11119002 SIGNAL HANDLER   KSOURCE_WAKEUP                

0x00A2                   2230 TRACE QUEUE TASK SQLTRACE_BUFFER_FLUSH         

0x00A9               11134508 BRKR TASK        BROKER_TRANSMITTER            

0x0081                  60231 CHECKPOINT       CHECKPOINT_QUEUE              

0x007E               11135850 TASK MANAGER     ONDEMAND_TASK_QUEUE           

0x00AD               11118971 BRKR EVENT HNDLR BROKER_EVENTHANDLER           

0x00A9               11134524 BRKR TASK        BROKER_TRANSMITTER            

0x0000                      0 UPDATE           SOS_SCHEDULER_YIELD       (SPID=52)    

0x0003                  15412 SELECT           LCK_M_S                        (SPID=54)

 

The yellow lines above represent my spid 52 and 54. No wonder my spid 54 is a long runner. It´s waiting for more that 15 seconds. The CMD columns tells me that spid 54 wants to do a SELECT (using the object in WAITRESOURCE) but spid 52 is already performing an UPDATE (using the object in WAITRESOURCE) and does not allow spid 54 proceed. Finally the lastwaittype provides me more details on the last wait type.

 

You can see details on each type at http://support.microsoft.com/kb/822101 (Description of the waittype and lastwaittype columns in the master.dbo.sysprocesses table in SQL Server 2000 and SQL Server 2005)

 

Now that I have a picture of what is going on, the next step is to identify my spids. To do this I look at DBCC Input Buffer (3)

 

**************************************************

Print out DBCC Input buffer for all blocked or blocking spids.

**************************************************

(…)

DBCC INPUTBUFFER FOR SPID 52

 EventType      Parameters EventInfo                                                                                

 -------------- ---------- ----------------------------------------

 Language Event          0 begin transaction

while 1=1

begin

update Employees set FirstName='MyName'

end

commit

 

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

 

DBCC INPUTBUFFER FOR SPID 54

 EventType      Parameters EventInfo              

 -------------- ---------- -----------------------

Language Event          0 select * from Employees

 

 

In my example I see that spid 54 wants to fetch some rows from table Employees and spid 52 has a transaction and inside this transaction is performing some updates also on this table. This is the place where you will see your stored procedures or something else. Now just a small remark:

 

Not always we can trust the information in DBCC Input Buffer, and that is one of the reasons I always get a profiler so that if I have any doubt that the statements that show up don´t make any sense I can verify. To do this I look at start time and then look at SQL Profiler to a match to know what was running at the time.

 

8.2 Start time: 2008-06-28 09:45:53.263 0

 

Finally in (4) we can see exactly SYSLOCKINFO. This view has more details on what actually is going on and it´s a great source of information to see details. Probably, very often with the information you collected and looking at these statements you will be able to find why they are taking so long.

 

SYSLOCKINFO

 spid   ecid   dbid   ObjId       IndId  Type Resource         Mode         Status TransID      

 ------ ------ ------ ----------- ------ ---- ---------------- ------------ ------ --------------

     52      0      5           0      0 DB                    S            GRANT                

     54      0      5           0      0 DB                    S            GRANT               

     52      0      5  2073058421      1 PAG  1:198            IX           GRANT               

     54      0      5  2073058421      1 PAG  1:198            IS           GRANT               

     52      0      5  2073058421      3 PAG  1:202            IU           GRANT               

     52      0      5  2073058421      1 KEY  (06003f7fd0fb)   X            GRANT               

     52      0      5  2073058421      1 KEY  (0400b4b7d951)   X            GRANT               

     52      0      5  2073058421      1 KEY  (08000c080f1b)   X            GRANT               

     52      0      5  2073058421      1 KEY  (0900696fb3a3)   X            GRANT               

     52      0      5  2073058421      1 KEY  (0500d1d065e9)   X            GRANT               

     52      0      5  2073058421      0 TAB                   IX           GRANT               

     54      0      5  2073058421      0 TAB                   IS           GRANT               

     52      0      5  2073058421      1 KEY  (07005a186c43)   X            GRANT               

     52      0      5  2073058421      1 KEY  (020068e8b274)   X            GRANT                

     52      0      5  2073058421      1 KEY  (03000d8f0ecc)   X            GRANT               

     52      0      5  2073058421      3 KEY  (9f007d710bca)   U            GRANT               

     52      0      5  2073058421      1 KEY  (010086470766)   X            GRANT               

     54      0      5  2073058421      1 KEY  (010086470766)   S            WAIT                

 

Above we can verify the reason why spid 54 is waiting. First column shows spid and what kind of locks it has acquired.

The column I always look is STATUS. Whenever it says WAIT it rings a bell. Just to verify what we were already thinking spid 54 is waiting to acquire a shared lock (S). Spid 52 has acquired several locks but one is very interesting. It says that spid 52 has a X lock on the same resource that spid 54 is trying to acquire. No wonder spid 54 is suspended.

 

I´ve been talking about accessing objects but so far we only see this strange numbers (2073058421 ) or KEY: 5:72057594038321152 (010086470766). This is the moment I said earlier about discovering this objects. To find them we have to do some queries in SQL Server because our output file does not provide this information.

 

First of all don´t forget TRACE flag 3604, otherwise the statements using DBCC will execute but you won´t see details.

 

DBCC TRACEON (3604)

 

In our case the object has an id 2073058421  (columns objID) . The next steps are performed in SQL Server Studio. To discover the name of this object

 

select object_name(2073058421)

 

Employees

 

In our example the object involved is called Employees. My demo was very simple but in a big stored procedure the name of the objects are very important to help identifying the location of the problem. There are a couple of DBCC commands that can help you. The one I use the most is  DBCC PAGE. More information on this at http://support.microsoft.com/kb/q83065/

 

Just to show you the output of this I will use one of the rows in SYSLOCKINFO

 

 

SYSLOCKINFO

 spid   ecid   dbid   ObjId       IndId  Type Resource         Mod

 ------ ------ ------ ----------- ------ ---- ---------------- ---

     52      0      5           0      0 DB                    S 

     54      0      5           0      0 DB                    S 

     52      0      5  2073058421      1 PAG  1:198            IX

     54      0      5  2073058421      1 PAG  1:198            IS

     52      0      5  2073058421      3 PAG  1:202            IU

 

 

 

DBCC Page takes as mandatory arguments dbid and page number.

 

DBCC TRACEON (3604)

DBCC PAGE(5,1,198) -- (1:198->1,198)

 

PAGE: (1:198)

 

 

BUFFER:

 

 

BUF @0x0000000081FFE300

 

bpage = 0x0000000081F8C000           bhash = 0x0000000000000000           bpageno = (1:198)

bdbid = 5                            breferences = 0                      bUse1 = 17224

bstat = 0xc0000b                     blog = 0x3212159                     bnext = 0x0000000000000000

 

PAGE HEADER:

 

 

Page @0x0000000081F8C000

 

m_pageId = (1:198)                   m_headerVersion = 1                  m_type = 1

m_typeFlagBits = 0x4                 m_level = 0                          m_flagBits = 0x8000

m_objId (AllocUnitId.idObj) = 67     m_indexId (AllocUnitId.idInd) = 256 

Metadata: AllocUnitId = 72057594042318848                                

Metadata: PartitionId = 72057594038321152                                 Metadata: IndexId = 1

Metadata: ObjectId = 2073058421      m_prevPage = (0:0)                   m_nextPage = (0:0)

pminlen = 28                         m_slotCnt = 9                        m_freeCnt = 4941

m_freeData = 4998                    m_reservedCnt = 0                    m_lsn = (35:414:10)

m_xactReserved = 0                   m_xdesId = (0:0)                     m_ghostRecCnt = 0

m_tornBits = 1107450946             

 

Allocation Status

 

GAM (1:2) = ALLOCATED                SGAM (1:3) = NOT ALLOCATED          

PFS (1:1) = 0x60 MIXED_EXT ALLOCATED   0_PCT_FULL                         DIFF (1:6) = CHANGED

ML (1:7) = NOT MIN_LOGGED           

 

execution completed. If DBCC printed error messages, contact your system administrator.

 

In yellow you can see our object id.

 

Finally we just have to repeat this process until we identify all the possible bottlenecks in our statements and then optimize.

 

Just one quick note: usually in my SQL Profiler I collect  CPU, Reads, Writes, Text Data, Execution Plan, Transaction ID. This will allow me (after all the hard work identifying misbehaved queries) to begin the improvement process.

 

Bruno