Sporadic but periodic slow Inserts into a table - Why you should apply SQL Server 2005 SP2


Analyzing a SQL query trace with SAP's ST05 SQL Trace, one can observe a pattern where most inserts into a specific table complete, as expected, within milliseconds, but suddenly a single insert takes hundreds of milliseconds up to a few seconds. Afterwards the inserts are back to normal millisecond durations until the next long one occurs, resulting in a nice pattern of periodically slow inserts. Consolidated in Excel, an ST05 output of this situation could look like this:

| Duration in ms | Obj. Name | Records | Statement |
|---|---|---|---|
| 7246 | COEP | 12 | INSERT VALUES... |
| 209 | COEP | 12 | INSERT VALUES... |
| 397 | COEP | 12 | INSERT VALUES... |
| 287 | COEP | 12 | INSERT VALUES... |
| 8942 | COEP | 12 | INSERT VALUES... |
| 468 | COEP | 12 | INSERT VALUES... |
| 154 | COEP | 12 | INSERT VALUES... |
| 749 | COEP | 12 | INSERT VALUES... |
| 9724 | COEP | 12 | INSERT VALUES... |
| 387 | COEP | 12 | INSERT VALUES... |
| 637 | COEP | 12 | INSERT VALUES... |
| 481 | COEP | 12 | INSERT VALUES... |
| 8239 | COEP | 12 | INSERT VALUES... |
| 410 | COEP | 12 | INSERT VALUES... |
| 391 | COEP | 12 | INSERT VALUES... |
| 297 | COEP | 12 | INSERT VALUES... |

As one can see, every fourth insert batch of 12 rows was dramatically slower than the normal inserts. The table addressed has a clustered index plus multiple non-clustered indexes, and its rows have a length of more than 1 KB.

This kind of phenomenon had been reported, especially by SAP customers, for several years, but it took the new DMVs introduced with SQL Server 2005 before we really could analyze what happens behind the scenes and then improve SQL Server accordingly to avoid these situations. The statement we finally used to analyze the issue was:

Query1:

-- Currently executing requests (session_id >= 50 filters out system sessions) with
-- their wait information and the statement text that is being executed
select r.session_id, r.status, r.wait_time, r.wait_type, r.wait_resource,
       substring(qt.text, (r.statement_start_offset / 2) + 1,
                 ((case when r.statement_end_offset = -1
                        then len(convert(nvarchar(max), qt.text)) * 2
                        else r.statement_end_offset end) - r.statement_start_offset) / 2) as stmt_executing,
       r.blocking_session_id, r.total_elapsed_time, r.reads as phys_reads, r.writes, r.logical_reads
from sys.dm_exec_requests r
     cross apply sys.dm_exec_sql_text(r.sql_handle) as qt, sys.dm_exec_sessions s
where r.session_id >= 50 and r.session_id = s.session_id
order by r.wait_resource desc

Ordering the results of this query in Excel by statement and wait time, we ended up with this list:

| session_id | status | wait_time | wait_type | wait_resource | stmt_executing | blocking_session_id | total_elapsed_time | phys_reads | logical_reads |
|---|---|---|---|---|---|---|---|---|---|
| 325 | suspended | 7 | PAGEIOLATCH_SH | 6:9:12126234 | INSERT INTO COEP ( …. | 0 | 9593 | 925 | 995 |
| 474 | suspended | 9575 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 9575 | 0 | 17 |
| 1220 | suspended | 8778 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 8779 | 0 | 17 |
| 432 | suspended | 8613 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 8613 | 0 | 17 |
| 692 | suspended | 8310 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 8310 | 0 | 17 |
| 424 | suspended | 8177 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 8178 | 2 | 51 |
| 386 | suspended | 7734 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 7734 | 0 | 15 |
| 453 | suspended | 7490 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 7490 | 0 | 15 |
| 265 | suspended | 7477 | PAGELATCH_SH | 6:25:20930394 | INSERT INTO COEP ( …. | 325 | 7478 | 0 | 16 |
| … | | | | | | | | | |
What this list told us is:

·         We have session 325 waiting on I/O.

·         Session 325 seems to hold an exclusive latch on page 20930394 in data file 25. All other sessions trying to insert are blocked by session 325.

·         Surprisingly, the insert of session 325 had already read 925 pages physically, which, for inserting just 12 rows into the given table, indicates that more than pure index and data pages were read.

Checking the page the wait_resource of session 325 was pointing to, we realized that session 325 seemed to be reading IAM pages.
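
One way to verify what kind of page a wait_resource points to is the undocumented DBCC PAGE command. Here is a minimal sketch, assuming the page 6:9:12126234 (database 6, file 9, page 12126234) from the output above; trace flag 3604 routes the DBCC output to the client:

-- Route DBCC output to the client session instead of the error log
DBCC TRACEON (3604)
-- DBCC PAGE (database_id, file_id, page_id, print_option)
-- Print option 0 dumps only the page header; m_type = 10 in the header identifies an IAM page
DBCC PAGE (6, 9, 12126234, 0)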

This last discovery finally explained what we were seeing. The next few small sections explain what we analyzed and what steps were taken to resolve the issue.

In order to understand what is happening, we need to look at a small slice of how SQL Server administers space allocation in a database:

·         A data file within a database is divided into so-called GAM intervals of roughly 4 GB each. This means that a database of 4 TB, spread over multiple data files, has around 1000 GAM intervals.

·         A B-Tree (table or index) which has an extent (a collection of 8 pages and the unit of allocation) allocated in a certain GAM interval has a so-called IAM page which documents exactly which extents the specific B-Tree has allocated in that GAM interval. When a B-Tree has extents allocated in 500 out of the 1000 GAM intervals of our 4 TB database, it has, in addition to the data and index pages, 500 IAM pages, each covering one GAM interval.

·         The various IAM pages of a B-Tree are organized in a doubly linked list which can be traversed either from the first IAM page forward or from the last page backwards.

·         If the database files get extended or new database files get added and a B-Tree allocates new extents out of this newly added free space, new IAM pages are added to the end of the IAM page chain.

·         In case a new extent is allocated for a B-Tree, the IAM page covering the GAM interval the new extent is located in needs to be found in the chain of IAM pages of the specific B-Tree.

·         For this purpose the IAM page chain gets scanned, starting with the first IAM page. Each IAM page is checked for the GAM interval it covers.

·         Once the matching IAM page is found, the newly allocated extent is recorded in that IAM page.

More details on SQL Server’s Relational Engine Design can be found in Books Online under
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/udb9/html/33572c3b-343d-45aa-bb42-1d31423ae5cc.htm
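
To actually list the IAM pages a table has accumulated, the undocumented DBCC IND command can be used. A minimal sketch, assuming the COEP table from the trace above in database 6; rows with PageType = 10 in the result set are the IAM pages of the table's B-Trees:

-- DBCC IND (database_name_or_id, table_name, index_id); -1 returns the pages of all indexes of the table
-- The PageType column shows the page type; 10 = IAM page
DBCC IND (6, 'COEP', -1)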

This way of administering allocations has existed since SQL Server 7.0 and so far worked pretty reliably without causing any large problems. However, under certain conditions the disadvantages of the way newly allocated extents are administered become obvious. The conditions can be listed like this:

·         We look at a large database with many hundreds of GAM intervals.

·         Within the database we have large tables which have their data spread over a wide range of GAM intervals. This is typically found in databases which have been in production for a long time, like some of our SAP databases which have basically been productive for 10 years and grew from a meager 50 GB to sizes in the terabytes.

·         Larger tables hardly ever got reorganized and hence are spread widely over the many GAM intervals.

·         The database has a low percentage of free space, which results in most new extents being found towards the end of the data files (a quick way to check the free space per data file is sketched after this list).

·         The workload applied to the SQL Server instance is demanding, which has the effect that pages are kept in the SQL Server buffer pool for only a short time.
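
A rough sketch of how one could check the free space per data file of the current database; the size values are reported in 8 KB pages, hence the division by 128 to convert to MB:

-- Allocated vs. used space per data file of the current database
-- size and FILEPROPERTY(..., 'SpaceUsed') are given in 8 KB pages; dividing by 128 converts to MB
select name,
       size / 128.0 as size_mb,
       fileproperty(name, 'SpaceUsed') / 128.0 as used_mb,
       (size - fileproperty(name, 'SpaceUsed')) / 128.0 as free_mb
from sys.database_files
where type_desc = 'ROWS'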

Inserting new rows into a table now can trigger the following sequence:

·         No free pages are found in acceptable time within the already allocated extents of the B-Tree, therefore a new extent  allocation is triggered

·         A new extent is found for the B-Tree in a specific GAM interval

·         Now the IAM page of the B-Tree covering this specific GAM interval needs to be found in the IAM page chain

·         For this purpose we start to traverse the IAM page chain from the beginning. However, since the table is widespread throughout the database files, the IAM page chain might be several hundred pages long.

·         Unfortunately the pressure on SQL Server's buffer pool is very high and not all the IAM pages can be found in the buffer pool (a query to check how many IAM pages are currently cached is sketched after this list).

·         Since there is hardly any free space left, the newly allocated extent comes out of one of the last GAM intervals in one of the data files. Hence the associated IAM page is pretty much at the end of the IAM page chain.

·         For example: since the table has been in the database for a long time and steadily grows by inserts, the clustered index B-Tree has extents allocated in 500 GAM intervals of our 4 TB database. Because of the pressure on the buffer pool, only 100 of the 500 IAM pages can be found in the buffer pool. Since the IAM page we are looking for is one of the last ones in the IAM chain, we need to evaluate close to 500 IAM pages. At an average of 10 ms per I/O, reading the roughly 400 IAM pages which are not in the buffer pool from disk sums up to around 4 seconds. This means the step of associating a newly allocated extent of a B-Tree with the related IAM page can take seconds in extreme cases.

·         Looking at this sequence of events, the reason for the sporadic but periodic long-running inserts becomes obvious: reading the IAM chain to identify the IAM page covering a specific GAM interval eats up a lot of time.
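
A small sketch (SQL Server 2005 and later) of how one could gauge the buffer pool pressure on IAM pages, by counting how many IAM pages of the current database are cached at a given moment:

-- Number of IAM pages of the current database currently held in the buffer pool
select count(*) as cached_iam_pages
from sys.dm_os_buffer_descriptors
where database_id = db_id()
  and page_type = 'IAM_PAGE'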

In order to resolve this issue, an optimization was introduced in SQL Server 2005 Service Pack 2 as well as in SQL Server 2008. In almost all cases the optimization shortcuts the linear search through the IAM chain: both releases maintain an IAM cache in which the page IDs of IAM pages and the GAM intervals they cover are cached. Instead of traversing the complete IAM chain, a lookup in this cache reveals the fitting IAM page, which then can be looked up in the buffer pool or, if necessary, read from disk. So far all our customers reported that the observed pattern of periodic slow inserts disappeared after applying SQL Server 2005 Service Pack 2. A really good reason to apply Service Pack 2. There will be no fix for SQL Server 2000 to correct the behavior described; it is highly recommended to upgrade to SQL Server 2005 SP2 instead.

SAP OSS note #1133183, which covers this issue (though not to the level of detail described here), will also be released soon.

To answer one last question: how can one find out over how many GAM intervals a table is spread?

Here is a stored procedure which does it:

Query 2:

create procedure sp_gampart @tname sysname as
begin
    set nocount on

    -- Table variable matching the output columns of DBCC EXTENTINFO in SQL Server 2005
    declare @extentinfo9p table
        (file_id int, page_id int, pg_alloc int, ext_size int,
         obj_id int, index_id int, partition_number int, partition_id bigint,
         iam_chain_type varchar(20), pfs_bytes varbinary(8))

    declare @exec_statement varchar(512), @tname2 sysname, @dbid int, @page_count bigint

    set @dbid = db_id()
    set @tname2 = quotename(@tname)

    -- Collect the extent allocation information of the table
    set @exec_statement = 'dbcc extentinfo(' + convert(varchar, @dbid) + ',' + @tname2 + ') with no_infomsgs'
    insert into @extentinfo9p exec (@exec_statement)

    -- Total number of pages allocated to the table over all its indexes
    select @page_count = sum(pg_alloc) from @extentinfo9p

    -- One GAM interval covers 511232 pages (roughly 4 GB); the file_id * 10000 term keeps
    -- GAM intervals of different data files apart when counting distinct intervals
    select ei.index_id as 'Index ID', ei.partition_number as 'Partition Number',
           si.name as 'Index Name',
           count(distinct ((convert(bigint, ei.page_id)) / 511232) + (convert(bigint, (ei.file_id * 10000)))) as 'Number of GAM Intervals',
           round(((convert(decimal, sum(ei.pg_alloc)) / convert(decimal, @page_count)) * 100), 2) as 'Percentage'
    from @extentinfo9p ei, sys.indexes si
    where ei.index_id = si.index_id and si.object_id = ei.obj_id
    group by ei.index_id, ei.partition_number, si.name
end
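
The procedure is then called with the table name as its parameter, e.g. for the VBAK table shown in the sample output below:

exec sp_gampart 'VBAK'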

 

Executing the stored procedure can take quite a few minutes with larger tables. The result could look like:

| Index ID | Partition Number | Index Name | Number of GAM Intervals | Percentage |
|---|---|---|---|---|
| 1 | 1 | VBAK~0 | 851 | 72.14 |
| 2 | 1 | VBAK~AUD | 523 | 3.47 |
| 3 | 1 | VBAK~Z2 | 590 | 4.85 |
| 4 | 1 | VBAK~Z3 | 539 | 3.57 |
| 5 | 1 | VBAK~Z4 | 584 | 4.95 |
| 6 | 1 | VBAK~Z5 | 520 | 3.87 |
| 7 | 1 | VBAK~Z6 | 534 | 3.47 |
| 8 | 1 | VBAK~ZRI | 564 | 3.69 |

As one can see, the clustered index B-Tree of VBAK, which includes the data, is spread over 851 different GAM intervals, whereas the B-Trees of the additional non-clustered indexes are spread over 500-600 different GAM intervals each.

 

Thanks to Leslie Moser for working with me on this article

 

 

 
