|
|
-
I am publishing this Blog on behalf of Hermann Daeubler, our SAP expert
Scenario:
Let’s assume we have a table consisting of five partitions and the one in the middle needs to be dropped. In SQL Server we need the following steps to accomplish this task
a) Create a non-partitioned target table with the same structure and indexes as the partitioned table
b) Use the switch command to move the data from the middle partition to the new target table. Now the original partition is empty but still visible and the old boundary value isn’t removed
c) Complete the task by altering the partition function by merging the empty partition
Problem: The partition switch command is a metadata-only operation and is frequently used by customers to add (assuming staging table has already been populated) or remove partition data almost instantaneously. This is a significant improvement over non-partitioned tables where these operations can take time proportional to the size of the data.
However, there are issues with partition merging where depending on the hardware and the size of the partitions, the step “c)” can still take much longer than expected especially when merging partitions in the middle of a table.
For details on partitions, please refer to http://msdn.microsoft.com/en-us/library/dd578580.aspx http://msdn.microsoft.com/en-us/library/ms190787.aspx
Why partition merge is slow?
It is a common scenario to put partitions on different file groups. The decision was made to make the partition function merge a deterministic function. This means that there is a fixed rule which defines on which file group the merged partition will end up. As a result the merge will be always in the same direction corresponding to the type (right or left). This approach has the downside that there is no optimization related to the size of the partitions. In a scenario where the source partition has 10 million rows and the destination partition has only 1 million rows, SQL Server would therefore move the 10 million rows of the source partition rather than moving the 1 million rows from the destination partition. The same would happen if the destination partition is completely empty as in the sample above where one would like to drop a partition.
There is a workaround for this when the partitions to be merged are mapped to the same filegroup. In the example above where the intention is to drop a partition it is clear that the destination partition is always empty. The workaround is to
· Switch both partitions out – the source as well as the destination partition
· Merge the partition function
· Switch back in the source partition.
But this workaround has a side-effect. It temporarily removes data from the table and an application using this approach has to provide a proper cleanup or rollback in case something goes wrong
Solution with the changes in SQL Server 2008R2:
CU6 for SQL Server 2008 SP1 includes a fix for this potential issue if all partitions reside on the same file group. It checks which partition has less rows and moves the data from this one during a partition function merge. This also covers the special case where one partition is empty. In case the performance of dropping partitions in the middle of a table is essential for an application it’s necessary to place all partitions on one single file group to take advantage of the fix in CU6.
Results :
The screenshots below and the SQL Script further down use a simple example to show how the workaround basically could be implemented. It also shows the difference after installing CU6 for SQL Server 2008 SP1.

Figure 1 above shows that a test table was created with five partitions and five million rows in each partition.

Figure 2 above shows that the standard way to get rid of a partition took over 2 minutes on the test system. There were obviously millions of logical reads involved due to the move of the data during the partition function merge

Figure 3 above shows that afterwards boundary value 3 is missing as expected and four partitions are left

Figure 4 above shows the result of the workaround. Due to the additional switch more tables are involved. But overall we see just a few logical reads and the whole process is done within a second compared to over 2 minutes before

Figure 5 above shows the output of the standard method after installing CU6 for SQL Server 2008 SP1. There is only one switch necessary and it finishes like the workaround before in a second.
To run the SQL Script below, you need to follow the following steps in this sequence
· First create the four procedures
· Run the procedures one after the other as seen at the end
· Create the test table as it’s necessary to specify how many rows should be generated. It’s a very basic algorithm which simply doubles the number of records 10 times. A parameter of value 25000 results in 25 million rows – five million rows per partition.
-- Procedure to create a partitioned test table
CREATE PROCEDURE create_partition_function_merge_test_tables ( @nr_rows INT ) AS BEGIN
DECLARE @counter int
SET NOCOUNT ON
IF EXISTS( select * from sys.objects WHERE name ='partition_function_merge_test_table' and type = 'U' ) DROP TABLE partition_function_merge_test_table
IF EXISTS( select * from sys.partition_schemes WHERE name = 'PARTITION_FUNCTION_MERGE_TEST_SCHEME' ) DROP PARTITION SCHEME [PARTITION_FUNCTION_MERGE_TEST_SCHEME]
IF EXISTS( select * from sys.partition_functions WHERE name = 'PARTITION_FUNCTION_MERGE_TEST_FUNCTION' ) DROP PARTITION FUNCTION [PARTITION_FUNCTION_MERGE_TEST_FUNCTION]
CREATE PARTITION FUNCTION [PARTITION_FUNCTION_MERGE_TEST_FUNCTION] ( int ) AS RANGE RIGHT FOR VALUES ( 1,2,3,4,5 )
CREATE PARTITION SCHEME [PARTITION_FUNCTION_MERGE_TEST_SCHEME] AS PARTITION [PARTITION_FUNCTION_MERGE_TEST_FUNCTION] ALL TO ( [PRIMARY] )
CREATE TABLE partition_function_merge_test_table ( key_partkey int NOT NULL, key_col1 int, key_col2 int, data_col1 char(15), data_col2 char(15),
data_col3 char(15) ) ON [PARTITION_FUNCTION_MERGE_TEST_SCHEME] ( key_partkey )
CREATE CLUSTERED INDEX PFM_IDX1 on partition_function_merge_test_table ( key_partkey )
SET @counter = @nr_rows
WHILE ( @counter > 0 ) BEGIN
INSERT INTO partition_function_merge_test_table VALUES ( @counter%5,@counter,@counter, 'Testdata Col1','Testdata Col2','Testdata Col3' )
SET @counter = @counter – 1 END
SET @counter = 10
WHILE ( @counter > 0 )
BEGIN
INSERT INTO partition_function_merge_test_table SELECT * from partition_function_merge_test_table
SET @counter = @counter – 1 END
CREATE INDEX PFM_IDX2 on partition_function_merge_test_table ( key_col1 ) CREATE INDEX PFM_IDX3 on partition_function_merge_test_table ( key_col2 ) CHECKPOINT
END GO
-- Procedure to drop the partition in the middle via switch and merge command
CREATE PROCEDURE drop_partition_3_slow AS BEGIN
IF ( OBJECT_ID('temp_partition') IS NOT NULL ) BEGIN DROP TABLE temp_partition; END
CREATE TABLE temp_partition ( key_partkey int NOT NULL, key_col1 int, key_col2 int, data_col1 char(15), data_col2 char(15), data_col3 char(15) )
CREATE CLUSTERED INDEX PFM_IDX1 on temp_partition ( key_partkey ) CREATE INDEX PFM_IDX2 on temp_partition ( key_col1 ) CREATE INDEX PFM_IDX3 on temp_partition ( key_col2 )
SET STATISTICS IO ON
ALTER TABLE partition_function_merge_test_table switch PARTITION 3 TO temp_partition ALTER PARTITION FUNCTION PARTITION_FUNCTION_MERGE_TEST_FUNCTION () MERGE RANGE (3)
SET STATISTICS IO OFF
END GO
-- Procedure to drop the partition in the middle via switch and merge command. -- This time it will also switch out the "neighbor" before the merge and switch -- it back in after the merge. This will avoid an unnecessary copy of the rows -- in the neighbor partition
CREATE PROCEDURE drop_partition_3_fast AS BEGIN
IF ( OBJECT_ID('temp_partition') IS NOT NULL ) BEGIN DROP TABLE temp_partition; END
IF ( OBJECT_ID('temp_partition2') IS NOT NULL ) BEGIN DROP TABLE temp_partition2; END
CREATE TABLE temp_partition ( key_partkey int NOT NULL, key_col1 int, key_col2 int, data_col1 char(15), data_col2 char(15), data_col3 char(15) )
CREATE CLUSTERED INDEX PFM_IDX1 on temp_partition ( key_partkey ) CREATE INDEX PFM_IDX2 on temp_partition ( key_col1 ) CREATE INDEX PFM_IDX3 on temp_partition ( key_col2 )
CREATE TABLE temp_partition2 ( key_partkey int NOT NULL, key_col1 int, key_col2 int, data_col1 char(15), data_col2 char(15), data_col3 char(15) )
CREATE CLUSTERED INDEX PFM_IDX1 on temp_partition2 ( key_partkey ) CREATE INDEX PFM_IDX2 on temp_partition2 ( key_col1 ) CREATE INDEX PFM_IDX3 on temp_partition2 ( key_col2 )
SET STATISTICS IO ON
ALTER TABLE partition_function_merge_test_table switch PARTITION 3 TO temp_partition ALTER TABLE partition_function_merge_test_table switch PARTITION 4 TO temp_partition2
ALTER PARTITION FUNCTION PARTITION_FUNCTION_MERGE_TEST_FUNCTION () MERGE RANGE (3)
ALTER TABLE temp_partition2 ADD CONSTRAINT partCheck_constr CHECK ( key_partkey = 3 ) ALTER TABLE temp_partition2 switch TO partition_function_merge_test_table PARTITION 3
SET STATISTICS IO OFF
END GO
-- Procedure to list the partitions of the test table
CREATE PROCEDURE list_partitions AS BEGIN
DECLARE @nr_partitions int DECLARE @counter int DECLARE @partrows int DECLARE @boundary_value char(10)
SET @nr_partitions = ( SELECT COUNT(*) FROM sys.partition_range_values prv, sys.partition_functions pf WHERE pf.function_id = prv.function_id AND pf.name = 'PARTITION_FUNCTION_MERGE_TEST_FUNCTION' )
PRINT 'Partitions for table partition_function_merge_test_table :'
PRINT ''
PRINT 'nr | boundary value | # rows in partition'
SET @counter = 1
WHILE ( @counter <= @nr_partitions ) BEGIN
SET @boundary_value = CONVERT(CHAR(10),
(SELECT value FROM sys.partition_range_values prv, sys.partition_functions pf
WHERE pf.function_id = prv.function_id AND pf.name = 'PARTITION_FUNCTION_MERGE_TEST_FUNCTION' AND
boundary_id = @counter ))
SET @partrows=(SELECT p.rows FROM sys.partitions p, sys.objects o WHERE p.object_id = o.object_id AND o.name = 'partition_function_merge_test_table' AND p.partition_number = @counter AND index_id IN (0,1) )
PRINT ' ' + rtrim(convert(char(5),@counter)) + ' |' + ' ' + rtrim(@boundary_value) + ' |' + ' ' + convert(char(20), @partrows)
SET @counter = @counter + 1
END END GO
-- Repro to show the performance impact of different ways to drop a partition. -- The procedure to drop the partition will turn "statistics io" on to prove -- by the number of logical reads on the test table that in the first case -- data will be moved whereas in the second case it won't. -- This will change by installing CU6 for SQL Server 2008 SP1 which fixes this -- behavior. While the runtime of the workaround was always the same the standard -- approach to drop a partition in the middle of a table improved from over 2 minutes -- to 1 second after installing CU6 for SQL Server 2008 SP1 – as shown on the -- screenshots above. The difference is of course dependent on the hardware -- configuration and might be even more or also less on other systems.
execute create_partition_function_merge_test_tables 25000 execute list_partitions execute drop_partition_3_slow execute list_partitions
execute create_partition_function_merge_test_tables 25000 execute list_partitions execute drop_partition_3_fast execute list_partitions
|
-
Another improved functionality in SQL Server 2008 R2
I am posting this on behalf of Juergen Thomas who has been with SQL Server PM team from 12+ years and is an expert in SAP.
Juergen>> In this article I’d like to talk about another improvement we made to SQL Server 2008 R2. The improvement pretty much is not noticed since it is buried deep into the SQL Server Engine. There also is nothing to tune about it. It is about the hash key algorithm which is used by SQL Server in its Lock Manager. So let’s step and explain what SQL Server does and what impact this change in SQL Server 2008 R2 has.
How does the locking work in SQL Server?
Unlike some other database vendors, there is a logical component to SQL Server’s Lock Manager. SQL Server uses a lockhash value to represent a lock on the lock structure in the SQL Server Lock Manager instead of using the physical description to a row, page or a table. The lockhash value then is kept in memory. This design was driven by major considerations like:
· No locking information to be stored on the page containing the resource. This eliminates additional IO or any space penalty on the page due to locking.
· Since the key to a row could be as large as 900 bytes, using the real key values would have inflicted larger memory consumption. Especially with applications running long transactions and holding hundreds of thousands of locks. Therefore one needed to seek for a possibility to have a lock value which would not exceed a few bytes and fixed in size for better memory management
The solution to this problem was found when designing SQL server 7.0 in 1996 and 1997 by using the key of the row and apply a hash algorithm to it which then results in a 6 byte long lockhash value. This value is stored as resource description. Added to this is HoBT ID (B-Tree ID). If an another row in the same B-Tree needs to be locked, the hash value for the key of the row gets calculated and then compared to the hash values already stored as granted or waiting locks in order to see whether a lock on this row already exists. This mechanism worked sufficiently well for many years
Issues appearing on the horizon
Using a hash algorithm to calculate a value out of keys does have one small disadvantage:
· Depending upon the # of rows, structure of the primary key, the data distribution and the complexity of the hashing algorithm, one can get hash collisions. For example, one calculated lockhash value can lock more than one row within a B-Tree.
How can we see what the hash key value for a lock held on a key is?
Let’s demonstrate this with an example.
CREATE TABLE test
(a VARCHAR(3) NOT NULL, b VARCHAR(8) NOT NULL, c VARCHAR(5) NOT NULL,
d integer NOT NULL)
GO
CREATE UNIQUE CLUSTERED INDEX ucl ON test(a,b,c)
GO
begin transaction
INSERT test VALUES('150','00001082','00345',1)
Now let’s perform this query:
select resource_type,resource_database_id, resource_description, resource_associated_entity_id,request_mode, request_type, request_status from sys.dm_tran_locks where resource_type = 'KEY'
The result will look like:
|
resource_type |
resource_database_id |
resource_description |
resource_associated_entity_id |
request_mode |
request_type |
request_status |
|
KEY |
5 |
(5c017ccf0cbf) |
72057594038976512 |
X |
LOCK |
GRANT |
As resource_description you can see the value our hash key algorithm did calculate out of the input of the three key values ‘150’, ‘00001082’ and ‘00345’. The column ‘resource_associated_entity_id’ is the id of the B-Tree. It also finds usage in the system views sys.partitions and sys.allocation_units and can be used to get to the object_id or the name of the table the B-Tree belongs to.
Now let’s insert another row into the very same table with another Query Window. We’ll try to insert with this command:
begin transaction
INSERT test VALUES('150','00024855','00012',4)
To our surprise the insert of this row seems to be blocked and doesn’t come back (please note that the transaction of the first Query Window with the first row inserted still is open). Executing the same query against sys.dm_tran_locks gives us this result:
|
resource_type |
resource_database_id |
resource_description |
resource_associated_entity_id |
request_mode |
request_type |
request_status |
|
KEY |
5 |
(5c017ccf0cbf) |
72057594038845440 |
X |
LOCK |
GRANT |
|
KEY |
5 |
(5c017ccf0cbf) |
72057594038845440 |
X |
LOCK |
WAIT |
As this shows we are basically blocked with inserting the second row. The X-Lock could not be acquired because the hash algorithm used did calculate the very same hash value for the 3 key column values of the second row to be inserted.
This issue demonstrates an undesired side effect of using hash algorithms to calculate lock values. However fortunately in most customer workloads, we rarely encountered this issue with customers. One of the reasons certainly is that the row colliding under the same hash value usually are far apart. E.g. if we lock a row of a general ledger table out of the current month in the company code of the corporation and there is a collision with a row of last year in the company code of one of the small subsidiaries, there will be no practical effect of such a hash collision. However, where we encountered a negative side effect of this was in large data loads. There it is all about getting tables of sometimes Terabyte in size loaded as fast as possible. Therefore one often splits up the loads in the export phase already. Importing those packages now in parallel (sometimes up to two dozen packages against one table) we did encounter blocking locks as well as deadlocks between those load streams.
How does SQL Server 2008 R2 resolve this issue?
In SQL Server 2008 R2, the hashing algorithm calculating the lockhash value was rewritten. The goals to make it more complex on the one side to avoid collisions without compromising the performance. We analyzed a lot of ISV customer database plus some other other customer databases to come up with a hash algorithm which used comparable CPU cycles but produced a better distribution of hashed values. In all the cases that we tested, we found a dramatic reduction of hash key collisions. For example, in 3 billion row tables where we had more than hundred thousand collisions with the old algorithm. Also, in many cases, we eliminated the hash collisions completely. With all the test data we used, the probability to get hash collisions is further lowered by a factor of 15,000 using the new hashing algorithm. After more than extensive tests for months and months, the new hash algorithm got released with the November CTP of SQL Server 2008 R2. For the example above, we don’t see any hash collision anymore with SQL Server 2008 R2 as the table below proves. This is not to say that collision will not occur but it will be very unlikely.
Result of running of performing this query against scenario described above:
select resource_type,resource_database_id, resource_description, resource_associated_entity_id,request_mode, request_type, request_status from sys.dm_tran_locks where resource_type = 'KEY'
|
resource_type |
resource_database_id |
resource_description |
resource_associated_entity_id |
request_mode |
request_type |
request_status |
|
KEY |
5 |
(4b729e3b9a50) |
72057594038845440 |
X |
LOCK |
GRANT |
|
KEY |
5 |
(7bd75c434bae) |
72057594038845440 |
X |
LOCK |
GRANT |
As one can see, in the case of SQL Server 2008 R2 (November CTP) the second insert didn’t get blocked anymore since the lockhash value in the column resource_description differs.
|
-
When you compress an object (index or table or partition), there are two components of space savings that you get. First component is fragmentaton (i.e. the original object might have been fragmented). The object gets degragmented as part of doing compression so you get some space savings. Second component is actual data compression savings. Many customers have asked the question 'Is there a way to know how much savings do I get from each of the two components?".
Interestingly, starting with SQL2008R2, you indeed have this capability. Here is one example where I create a table t1 and then simulate fragmentation by updating a column. Once the table is fragemented, I can use the sp_estimate_data_compression stored procedure to determine how much space savings I can get by defragmenting it.
drop table t1
go
-- Create the table
create table t1 (c1 int, c2 char(100), c3 varchar (300))
go
-- create a clustered index
create clustered index t1_ci on t1(c1)
-- load 10000 rows. Note, that for column C3, I am storing full-values that
-- take all 300 characters
declare @i int
select @i = 0
while (@i < 10000)
begin
insert into t1 values (@i, 'hello', replicate ('1', 300))
select @i = @i + 1
end
-- Now let us measure the average length of the row. It comes to 417 bytes.
-- This means that there are 19 rows per page
select avg_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('TEMPDB'),
object_id('T1'),
null, null, 'SAMPLED')
where object_name (object_id) like 't1' AND index_id =1
-- check the space usage. This shows the space usage of 4296 KB
sp_spaceused t1
-- This update will cause fragmentation because I am changing the column C3 from 300
-- chars to 50 chars. With the result, the average row length
-- will now become 167 but the SQL Server will still keep 19 rows/page which
-- leaves more that 50% unused space on each page
update t1 set c3 = REPLICATE ('2', 50)
-- check the space usage. It shows that the allocated or reserved space
-- has not changed
sp_spaceused t1
-- estimate how much space you can save by defragmentation. Note
-- here I am executing the stored procedure on an 'uncompressed' table
-- with target compression setting is 'NONE',(i.e. no compression).
exec sp_estimate_data_compression_savings 'dbo',
'T1',1, NULL, 'none'
-- This stored proc returns
object_name schema_name index_id partition_number
------------ ----------- -------- ---------------
T1 dbo 1 1
size_with_current_compression_setting(KB) size_with_requested_compression_setting(KB)
---------------------------------------- ------------------------------------------
4480 1720
-- To verify this, I rebuild the index and see the space consumption
alter index t1_ci on t1 rebuild
-- check the space usage. It shows that the table now takes 1744KB
sp_spaceused t1
In SQL2008RTM, the stored procedure did not compute the space savings if the source and target compression settings were same. In otherwords, the stored proc would have returned the following instead
size_with_current_compression_setting(KB) size_with_requested_compression_setting(KB)
---------------------------------------- ------------------------------------------
4480 4480
Interestingly, you can use it to get tell-tale signs to know if ROW compressed table already takes advantage of unicode compression in SQL2008R2. Let us say customer was using ROW compression in SQL2008 and now upgrades to SQL10.5, s/he can use to the sp_estimate_data_compression_savings stored proc by specifying target compression to be ROW as well. Note, in this case, the compression savings will include the potential savings from data defragmentation as well so one can’t easily deduce that the savings are from Unicode compressions or due to both. However, customer can get some clues by referring to the sys.dm_db_index_physical_stats DMV to see if the object in question was fragmented or not.
Thanks
Sunil
|
-
Now that we have Unicode compression available in SQL Server 2008R2 as described in, let me take a simple example using AdventureWorksDW database to show you the additional compression that can be achieved on tables with one or more columns of type NCHAR or NVARCHAR.
use [AdventureWorksDW2008]
-- the table FactResellerSales is a FACT table with three columns of type
-- NVARCAHR types. Let us find the average length of each of these columns
select AVG( LEN(salesordernumber))
from FactResellerSales
This returns a value of 7. This represents number of character in the string and not the numnber of bytes.
select AVG( LEN(CarrierTrackingNUmber))
from FactResellerSales
This returns a value of 12. This represents number of character in the string and not the numnber of bytes.
select AVG( LEN(CustomerPONumber))
from FactResellerSales
This returns a value of 12. This represents number of character in the string and not the numnber of bytes.
Based on these computations, we have approximately 31 characters of NVARCHAR types. With Unicode compression, we will expect that the average length of the row will reduce further by 31 bytes (i.e. we will save 1 byte per UCS-2 representation). To measure this, we will attach the identical copy of AdventureWorks2008 database to two different instances of SQL Server, one instance of SQL 2008RTM and other instance of SQL 20088R2 and measure the average length of the row after applying the ROW compression. Before applying the ROW compression, let us measure the average length of the row in both instances using the following query
select avg_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorksDW2008'),
object_id('dbo.factresellersales'),
null, null, 'SAMPLED')
where object_name (object_id) like '%ResellerSales%' AND index_id =1
In both instances, the average length of the row comes out to be around 191 bytes. This should not surprise you as there are no changes to the uncompressed data. Now, let us execute the command to enable ROW compression in each of the instances using the following command
alter table dbo.factresellersales rebuild
with (
data_compression = ROW)
And, now let us find the average row length in each of the instances. Since Unicode compression is available as part of ROW compression, we will expect that the average length of the row will be around 31 bytes less in SQL2008R2. Let us run the following query again to find the average row length after the row compression
select avg_record_size_in_bytes
from sys.dm_db_index_physical_stats (db_id('AdventureWorksDW2008'),
object_id('dbo.factresellersales'),
null, null, 'SAMPLED')
where object_name (object_id) like '%ResellerSales%' AND index_id =1
Here are the vallues returned
· Average Row length in SQL2008 after row compression = 132 (around 30% compression)
· Average Row length in SQL2008R2 after row compression = 100 (around 45% compression). This extra savings of 32 bytes is what we had expected.
As you can see that in this example,the unicode comrpession gave us extra 15% compression with ROW compression. We tried the Unicode compression with some in-house customer workloads and we observed the compression savings as shown in the table below. Of course, your savings will depend on your data and the schema but the key point to know is that space savings with Unicode compression can be significant if you consider tables in tera-byte range. In our testing we found that the additional impact on CPU is minimal.
|
ROW Compression |
ROW with UNICODE |
|
9% |
43% |
|
30% |
53.2% |
|
45% |
64% |
|
30% |
45% |
I enccourage you to download CTP2 verision (http://www.microsoft.com/sqlserver/2008/en/us/r2.aspx) of SQL2008R2 and see for yourself the kind of savings you can achieve with your workload.
Thanks
Sunil Agarwal
|
-
In my previous blog, I had mentioned that unicode compression will be available as part of next SQL Server release named SQL 2008R2. You can down load the CTP2 version http://www.microsoft.com/sqlserver/2008/en/us/r2.aspx to play with it. This is a new exciting addition to the data compression offering.
Let me describe the problem and how it has been addressed in SQL Server.
Problem: As more and more businesses developing global customer base, applications are being developed/modified to use unicode based data types such as NCHAR and NVARCHAR. SQL Server uses UCS-2 encoding scheme that takes two bytes of storage regardless of the locale. For example, in ASCII character set when stored as NCHAR, each character only needs 1 byte of storage but it is stored using 2 bytes with the significant byte being 0. In fact most European languages need only 1 byte of storage. When an application is either converted or written to user unicode based data types, it can, depending on the size of strings, increase the storage requirements significantly.
Solution: SQL Server implements unicode compression using SCSU (Simple Compression Schecme for Unicode Data), a standard compression scheme for Unicode data (http://unicode.org/reports/tr6/). The following table shows the compression achieved with SCSU and compares it with UTF-8 as both DB2/Oracle have implemented UTF-8 encoding for unicode data. You will notice that for English (i.e. ASCII) we are compressing 50% or in other words, we will take 1 byte instead of 2. Same is true for German locale. On the other hand for Korean locale, SCSU scheme takes 2 bytes so there is no compression savings. The compression achieved using SCSU is very comparable to UTF-8.
|
Locale |
SCSU |
UTF-8 |
|
English |
0.5 |
0.5 |
|
Japanese |
.85 |
1.0 |
|
Korean |
1.0 |
1.0 |
|
Turkish |
.52 |
.53 |
|
German |
.5 |
.5 |
|
Vietnamese |
0.61 |
0.68 |
|
Hindi |
0.5 |
1.0 |
The Unicode compression is enabled or disabled as part of ROW or PAGE compression. The implications are
1. The customers who are already using ROW or PAGE compression will get the benefit of unicode compression but it will require them to rebuild the index or table after the upgrade to SQL 2008R2. The interesting thing is that the Unicode compression will kick in when a new row inserted or an existing Unicode value is modified even if the corresponding compressed index or the table is not immediately rebuilt after the upgrade.
2. Existing scripts used for data compression can be used without modifications
If NCHAR or NVARCHAR string does provide any compression savings with unicode compression, like the example with korean locale, unicode compression is not applied. For this case, the compression overhead is paid at the time of compression to know if we can get unicode compression savings but not when the value is read. Like before no application changes are needed to take advantage of unicode compression.
At this time, the SQL Server will not support Unicode compression for NVARCHAR (MAX) type, including in-row values or for NTEXT. This is something we will consider in the future. In my next blog, I will provide an example using AdventureWorksDW database to show you the compression benefits achieved using new improved ROW compression.
Thanks
Sunil Agarwal
|
-
As I described in my previous blog, the data compression feature has been very successful. We do appreciate all the feedback that we have received as this provides us a way to know how customers have been using the feature and the challenges they are facing. Based on the feedback, we plan to improve our data compression offering in future releases. Here is some of the feedack and the respective rationale. Please feel free to contact me for your suggestions on data compression.
(1) Provide Unicode compression: For the uninitiated, the SQL Server stores columns of type NCHAR, VARCHAR, NTEXT and NVARCHAR(MAX) in UCS-2 format which takes 2 bytes per character even when 1 byte will do (e.g. ASCII data), a rather common case for locale used with USA (en-USA) and other major european languages. This leads to a waste of 1 byte per Unicode character.
(2) Compress out of row LOB data: SQL Server compresses in-row LOB data but not if the LOB data is stored out of row. With the result many LOB heavy applications are unable to take full advantage of data compression. However, there are two workaround available. First, LOB data can be compressed at the application tier but this means (a) the application needs to be modified (b) the application cannot take advantage of search and partial update capabilities provided in SQL Engine. Second, use filestream feature to store LOB data on a compressed volume. This is our recommended solution where applicable. Please refer to Books-Online for details on filestream feature
(3) Improve column prefix compression: Currently PAGE compression provides column prefix compression but for some workloads the repeating words occur in the middle or towards the end of the column value which does not qualify for prefix compression. This reduces the data compression savings that can be achieved. It is not to say that these customers have not benefited from data compression but clearly they could have achieved even more compression savings.
(4) Enable estimate data compression savings computations in lower SKUs: Currently, you need to run sp_estimate_compression_savings stored procedure on EE or Dev edition. Many customers who are running on standard SKU would like to know what additional space savings they can get with data compression before they invest into a EE edition. Current work around is to use Dev edition to estimate the space savings which requires customers to install Dev Edition and restore the production database. It would be nice if we could just estimate the compression savings in the lower editions.
On this note, I am very pleased to announce that we have enhanced the data compression feature to include Unicode compression for NCHAR and NVARCHAR datatypes. The Unicode compression will be available as part of ROW compression in SQL Server 2008R2 release. You can down load the CTP2 version http://www.microsoft.com/sqlserver/2008/en/us/r2.aspx to play with it. In my next blogs, I will go into more details with Unicode compression.
Thanks
Sunil Agarwal
|
-
It has been a while since I blogged about data compression so I thought it will be good to provide an update on data compression usage within SQL Server community. I am happy to say that the Data compression feature has been a tremendous success in SQL Server 2008 with customers. Many customers have been able to reduce the size of their database significantly leading to reduction in the cost of hardware and storage management. The table below shows the actual space savings that some of the customers have achieved with ROW and PAGE compression in their production environment. Each row in the table below represents a different customer. Couple of key observations; First, the space savings can be huge. For example, one customer has been able to achieve 81% space savings with PAGE compression. Second, not surprisingly, the PAGE compression provides better compression at the cost of additional CPU cycles. As I had emphasized in my earlier blogs, the space savings achieved will depend on the schema and the data distribution.
|
Data Compression Space Savings |
Notes |
|
70% |
PAGE. Data Warehouse application. |
|
40% |
PAGE. OLTP Web application. |
|
62% |
PAGE. DW application. |
|
38%, 21% |
PAGE, ROW. |
|
80%, 50% |
PAGE, ROW. |
|
52% |
PAGE. |
|
50%, 15% |
PAGE, ROW. |
|
81% |
PAGE. ERP application. |
|
35% |
PAGE. |
Now, this may all sound good, but one question that keep coming back to me is 'Well space savings is alright, but how does data compression impact the performance of the workload?'. I agree we can all create best/worst cases to make our point but the real answer is know how data compression has impacted the real production applications. The table below shows the performance impact of data compression (the green color shows the improvement in the performance while the red color shows the performance degradation). Each row in the table below represents a different customer.
|
Performance impact |
Notes |
|
5% |
PAGE compression. OLTP Web application. Large volume of transactions. |
|
40%-60% |
PAGE compression. Large sequential range queries. A DW Application (IO intensive) |
|
1% |
PAGE compression. 500 users, 1500 Transactions / sec. OLTP with some reporting queries. |
|
11% |
PAGE compression. A lot of insert, update and delete activity which leads to increase in CPU usage. A better choice would have been to go with ROW compression. |
|
2% - 3% |
PAGE compression. OLTP Application. |
|
3% |
PAGE compression. ERP application – small transactions. |
I am hoping that this has provided some excitement to customers on the sideline. I would like to encourage trying out the feature and seeing for yourself if this is right for your environment. I will also encourage you to read some of the white papers listed below
http://www.unisys.com/eprise/main/admin/corporate/doc/41371394.pdf
http://h71028.www7.hp.com/ERC/downloads/4AA1-8766ENW.pdf?jumpid=reg_R1002_USEN
http://www.microsoft.com/casestudies/casestudy.aspx?casestudyid=4000002956
http://www.microsoft.com/downloads/details.aspx?FamilyID=b5bb47a4-5ece-4a2a-a9b5-5435264f627d&DisplayLang=en
http://sqlcat.com/whitepapers/archive/2009/05/29/data-compression-strategy-capacity-planning-and-best-practices.aspx
Thanks
Sunil Agarwal
|
-
One of the key challenges in TempDB is that it is a common resource for all applications running on an instance and any misbehaving application or rouge user command can take up all the space in TempDB bringing down other applications with it. In my discussions with customer during various conferences, I often hear of following suggestions
1. Provide a way to control how much TempDB space can be allocated by various applications on an instance of SQL Server. Clearly, this will provide a very good way to isolate applications from misbehaving ones. In this case, if an application exceeds its limit, it may come to a stop even if there was space on TempDB. To address this, the SQL Server can possibly provide some alternatives like to allow space allocation if the TempDB is not in-use by other aplications and then do force deallocations when pressure from other applications mount.
2. Provide multiple TempDBs and then assign different TempDBs to different applications. In my opinionm if SQL Server could do (1) well, then this may not be as use useful.
These suggestions are well taken but unfortunately SQL Server does not support this functionality today. So you wonder what you can do. Well, the SQL Server exposes a way using DMVs to identify TempDB space allocations by currently executing queries. If you identify that the TempDB space is running awfully low, you can use this new way to identify currently executing requests. May be some user ran an adhoc query that took significant space in TempDB. You, as an administrator, can then make the decision if you need to kill one or more of these queries to get back the space in TempDB.
Let me illustrate this with an example. I will use two large (actually not so large) tables and then join them using a hash join. You may recall that during hash join, one of the tables in hashed in memory and is backed by persistence in TempDB.
create table t1 (c1 int primary key, c2 int, c3 char(8000))
go
create table t2 (C4 int, c5 char(8000))
go
declare @i int
select @i = 0
while (@i < 6000)
begin
insert into t1 values (@i, @i + 1000, 'hello')
insert into t2 values (@i,'there')
set @i = @i + 1
end
-- now let us clean the buffer pool so that this
-- query takes some time to complete and allows us
-- to monitor the TempDB space usage
dbcc freeproccache
DBCC DROPCLEANBUFFERS
-- Now run the query. Note, I have used a hash-join hint
select c1, c5
from t1 INNER HASH JOIN t2 ON t1.c1 = t2.c4
order by c2
Now in another session, I will run the following DMV query
-- This DMV query shows currently executing tasks and
-- tempdb space usage
-- Once you have isolated the task(s) that are generating lots
-- of internal object allocations,
-- you can even find out which TSQL statement and its query plan
-- for detailed analysis
select top 10
t1.session_id,
t1.request_id,
t1.task_alloc,
t1.task_dealloc,
(SELECT SUBSTRING(text, t2.statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(max),text)) * 2
ELSE statement_end_offset
END - t2.statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text,
(SELECT query_plan from sys.dm_exec_query_plan(t2.plan_handle)) as query_plan
from (Select session_id, request_id,
sum(internal_objects_alloc_page_count + user_objects_alloc_page_count) as task_alloc,
sum (internal_objects_dealloc_page_count + user_objects_dealloc_page_count) as task_dealloc
from sys.dm_db_task_space_usage
group by session_id, request_id) as t1,
sys.dm_exec_requests as t2
where t1.session_id = t2.session_id and
(t1.request_id = t2.request_id) and
t1.session_id > 50
order by t1.task_alloc DESC
Here is the sample output of the query for my workload. I have simplified it by shortening the long DMV query and just put a symbolic name where XML show plan appears. This output shows that the query with hash-join is causing the most allocations in TempDB. Though in this case, we do know about the workload but you can run the above DMV query on any SQL Server without any knowlede of the workload and it can show you the top consumers (batches currently executing) of the space in TempDB. You can also take a look at the query plan to see what is causing the allocations in TempDB.
session_id request_id task_alloc task_dealloc
---------- ---------- --------- -------------
51 52 6016 1112
52 0 0 0
select c1, c5 from t1 INNER HASH JOIN t2 ON t1.c1 = t2.c4 order by c2
select top 10 t1.session_id, t1.request_id … (THE DMV query)
<XML-showplan fpr the first query>
<XML-showplan fpr the DMV query>
Thanks
Sunil Agarwal
|
-
This blog continues the discussion on the common issues in TempDB that you may need to troubleshoot. In previous blogs, I discussed how to identify and troubleshoot IO and allocation bottleneck in TempDB. In this blog, I will describe how to indentify DDL bottleneck in TempDB and to troubleshoot it.
It will be useful to understand why DDL bottleneck is most commonly related to TempDB and not in user databases. Well, if you think about user database(s), they are created as part of application design. Once the application is tested and deployed, the schema in user databases does not change except may be to add/drop indexes or when the application is upgraded to a newer version. TempDB on the other hand is quite different and is used as a scratch pad both by user applications to store intermediate data in # and ## tables as well as table variables and also by SQL Server as part of processing queries. Please refer to the http://blogs.msdn.com/sqlserverstorageengine/archive/2008/12/22/tempdb-basics-cont.aspx for some background. Now, let us take a simple case where an application creates a #table containing 10 columns in a stored procedure. When this stored procedure is invoked, the SQL Server creates this #table by inserting one row in system table that stores meta-information about tables and 10 rows, each representing 1 column the #table, in the system table storing meta-information about column. There may be other system tables affected, for example when you create index(s) on the #table, but it is clear that at minimum, it will require 11 rows to be inserted into system tables at create time and then these rows to be deleted when the stored procedure terminates. Now for moments assume that this table is not cached and this stored procedure is executed thousands of times by concurrent threads. It will cause a large number of inserts/deletes in system tables in TempDB. I hope that this provides you a good understanding why and how the DDL contention, i.e. blocking when accessing system tables, can happen inside TempDB. Note, it is possible that an application may create another ‘user’ database as a scratch pad. In that case, this database can potentially incur DDL bottleneck but then you may wonder why create a scratch database when TempDB provides optimized logging and easy manageability.
Now let us discuss how to detect DDL contention and what you can do to minimize it.
Diagnosing DDL Contention
As I indicated in the previous paragraph, the DDL contention refers to the contention in pages belonging to system tables. For this specific case, we need to look at PAGELATCH contention in system tables within TempDB. You can use the following query to identify it
select session_id, wait_duration_ms, resource_description
from sys.dm_os_waiting_tasks
where wait_type like 'PAGE%LATCH_%' and
resource_description like '2:%'
You may recall that this query is very similar, in fact it is identical, to the DMV query to detect allocation bottleneck. The difference however is that you will need to look at pages other than GAM, SGAM and PFS that are incurring PAGELATCH waits and see, using undocumented DBCC PAGE command, if these pages indeed belong to system table(s) in TempDB. This will show you that you are indeed incurring DDL bottleneck. Now for any performance metrics, you will need to compare this number with baseline when the workload was performing normally and see if you are incurring excessive DDL bottleneck.
You can also look at the following PERFMON counters to see if you are encountering significant change in these numbers. Note, internal objects in TempDB don’t cause DDL contention.
· Temp Tables Creation Rate The number of temporary table or variables created/sec.
· Temp Tables For Destruction The number of temporary tables or variables waiting to be destroyed by cleanup system thread.
Troubleshooting:
OK, now you have determined that you indeed are encountering DDL bottleneck, what can you do? Clearly, the goal is to minimize creation/destruction of user objects (i.e. #, ## and table variables). Unfortunately, there are no easy fixes. My suggestions are
(1) Check if the user objects are indeed getting cached. Please refer to http://blogs.msdn.com/sqlserverstorageengine/archive/2008/12/22/tempdb-basics-cont.aspx for details on detecting user object caching.
(2) Look at query plans for queries that are creating/destroying user objects in TempDB and see if you can do something to minimize it. For example, a #table may be created in a loop and maybe it is possible to create it outside the loop.
Thanks
Sunil Agarwal
|
-
This blog continues the discussion on the common issues in TempDB that you may need to troubleshoot. In the blog http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/05/tempdb-monitoring-and-troubleshooting-io-bottleneck.aspx, we discussed how to identify and troubleshoot IO bottleneck in TempDB. In this blog, I will describe how to indentify allocation bottleneck and to troubleshoot it.
As you may recall, the allocation bottleneck is caused when allocation structures are accessed by concurrent threads in conflicting modes. Please refer to http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/04/what-is-allocation-bottleneck.aspx for details. In this case, the concurrent thread(s) will need to wait to acquire the pagelatch thereby slowing the operation. The good thing is that the pages containing allocation structures (GAM, SGAM, PFS) are well known and have fixed page numbers in each file. For example, in file-id 1, the allocation pages IDs are
· PFS – 1
· GAM – 2
· SGAM - 3
A PFS page will appear every 8088 pages in a file. The GAM and SGAM will appear every 511232 pages and similarly in other files.
Diagnosing:
You can run the following DMV query to find any latch waits that occur in allocation pages
select session_id, wait_duration_ms, resource_description
from sys.dm_os_waiting_tasks
where wait_type like 'PAGE%LATCH_%' and
resource_description like '2:%'
Since the database id of TempDB is 2, the search argument ‘2.%’ represents any page in TempDB across any file. If this page happens to be GAM, SGAM or PFS, it will represent allocation bottleneck. Note, in a concurrent application, some blocking is expected so you will need to baseline the allocation waits when your application is performing normally. Only when the waits exceed the baseline significantly, it signals that you are incurring allocation bottleneck.
Troubleshooting:
SQL Server recommends the following
· Create atleast as many files of equal size as there are COREs/CPUs for SQL Server process. The rationale is that at a given time, the number of concurrent threads is <= number of COREs/CPUs. Don’t confuse this with the number of active sessions/batches.
· Enable TF-1118 if you are encountering bottleneck in SGAM structures.
· If you are still encountering allocation bottleneck, you will need to look at your application and see which query plans are creating/dropping objects in TempDB and if these objects are being cached and take corrective action when possible. Most IT shops have very limited choice here as they don’t own the application code.
Here I want to point out one seemingly harmless step to solve allocation bottleneck that can in fact make it worse. Say you are encountering allocation bottleneck and you decide to add one more file to the mix hoping that allocation will spread further. Well, if you recall proportional fill methodology; all new allocations will favor the newer file because it has the most free space. So suddenly, the allocation problem becomes even worse. It is a good practice to restart SQL Server when you add a new file, clearly it needs to be of the same size as other files, to TempDB.
Thanks
Sunil Agarwal
|
-
I hope my earlier blogs on TempDB (http://blogs.msdn.com/sqlserverstorageengine/archive/tags/TempDB/default.aspx) have given you a good background on the objects contained in TempDB and its configuration and have deployed your workload in production and it runs for ever without any glitches. Oh, really? Well, don’t we all wish it was so but as you may have already experienced, perhaps one time too many, that the performance/configuration problems do happen. In fact, most DBAs/Administrators spend 20-30% of the time in troubleshooting the issues. With that context, let us look into key issues with TempDB in production and steps to troubleshoot them.
TempDB is more vulnerable because it is a shared resource and any misbehaving application or adhoc query can probably cause extreme resource issues in TempDB thereby affecting other applications running on the same instance. Here is a list of 4 common issues that you may encounter with TempDB
· IO bottleneck
· Allocation Contention
· DDL Contention
· Running out of space
I will cover IO bottleneck in this blog and others in my next blogs with the emphasis on how to detect that this indeed is happening and the steps that you can take to address.
IO Bottleneck: Now, the IO bottleneck is nothing specific to TempDB. Any database can incur IO bottleneck if the IO bandwidth is not configured to meet the workload demands. It is possible, and in fact likely, that your workload has changed overtime such the initial configuration is not able to handle it.
When a customer hits an IO bottleneck, it is tempting to conclude that the IO subsystem needs to be upgrades. While this may in fact be the final solution, but it should not be the first and the only step. Here are the recommended actions
1. Check if you have memory pressure: If you don’t have sufficient physical memory allocated to the SQL Server process, you may incur additional IO that could be avoided. As a first step, you need to make sure you don’t have memory bottleneck by looking at the following counters
a. Buffer Cache Hit ratio
b. Page Life Expectancy
c. Checkpoint pages/sec
d. Lazywrites/sec
I will also advise you to look at DBCC Memorystatus command that gives details on how the memory is being used. You may want to refer to http://support.microsoft.com/kb/907877
2. Identify queries that are generating the most IOs: you can run the following DMV query to identify it. The output of the query shows the top 10 queries (the SQL text and the query plan handle) that are generating the most IOs across all its executions. Agreed that the IOs include IOs in TempDB and also in other databases, but it is a good place to start. You can investigate these queries to identify any bad query plan (perhaps an index was not chosen) and/or possibly look into rewriting the queries that minimize IOs.
SELECT TOP 10
(total_logical_reads/execution_count) AS
avg_logical_reads,
(total_logical_writes/execution_count) AS
avg_logical_writes,
(total_physical_reads/execution_count)
AS avg_phys_reads,
execution_count,
statement_start_offset as stmt_start_offset,
(SELECT SUBSTRING(text, statement_start_offset/2 + 1,
(CASE WHEN statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(MAX),text)) * 2
ELSE statement_end_offset
END - statement_start_offset)/2)
FROM sys.dm_exec_sql_text(sql_handle)) AS query_text,
plan_handle
FROM sys.dm_exec_query_stats
ORDER BY
(total_logical_reads + total_logical_writes) DESC
3. Once you feel satisfied that you have configured your memory correctly and the query plans are ok, then it makes sense to add more IO bandwidth for TempDB. Now, here is one catch. Say, you add 1 new file to TempDB on a faster disk. Guess what happens? Due to proportional fill algorithm, the access to new file will be skewed in its favor and this may in fact lead to more IO and allocation bottleneck. The recommended action is to add the new file(s) with the same size as existing files and then restart the SQL Server. This will guarantee that the IOs are distributed across all files. Other alternative will be to move all the files in the TempDB to a different disk subsystem.
Thanks
Sunil Agarwal
|
-
In my previous blogs, I described the types of objects in TempDB and how they are managed. I hope that it provided you with a good working knowledge of TempDB. Now the next question is how do I configure the TempDB for my production workload? In this context, there are three common questions as follows:
1. Where should I create TempDB?
2. What should be the size of TempDB?
3. Single file vs multiple file?
Let us tackle each of these questions in Order.
Where (i.e. what disks) should I create TempDB?
More often than not, customer may not realize that the cause of the slowdown in the workload is because the TempDB is on the slower device. This can happen as workload changes over time or the TempDB was configured on a slower device. The performance of your TempDB is critical to your workload as, I had indicated in my earlier blogs, the SQL Server uses TempDB to store intermediate results as part of executing a query, for example to create a hash table or to sort as a result of order by. So what should you do? Well, it is not any different than what you will need to for your user databases. You need to measure the IO bandwidth needed to meet the demands of your workload. Since the persistence requirements of TempDB are different (i.e. no REDO needed), you may want to consider creating TempDB on its own spindles. In fact, you may even consider using RAM Disk to achieve better performance. Please refer to the KB article http://support.microsoft.com/kb/917047
What should be the size of TempDB?
Unlike user databases where you can probably estimate the growth rate of tables over time based on the workload, estimating the size of TempDB is non-trivial. The size of TempDB can be affected for example by a change in query plans, version store when there is a long running transaction and so on. The best way to estimate the size of TempDB is by running your workload in a test environment. Once you have a good estimate of the size of TempDB, use Alter Database command to set its size with a safety factor that you feel is appropriate. Never, I mean never, let the TempDB grow to its steady state size through auto-grow. You should only use auto-grow as a last resort but not as a strategy. Also, remember that the TempDB is created every time you restart a SQL Server but its size is set to either default of Model database or the size you had set using Alter Database command (the recommended option)
· Don’t rely on auto-grow to manage file sizes. You must set the appropriate size of TempDB using Alter Database command. Here are the pitfalls of auto-grow
o Auto-grow causes a pause during processing when you can least afford it
o Less of an issue with instant file initialization
o Auto-grow leads to physical fragmentation
Should I use 1 file or multiple files for TempDB?
SQL Server recommends that you create as many files as there are COREs or CPUs to minimize allocation bottlenecks. Yes, it is true that starting with SQL Server 2005, there is better caching of TempDB objects but there are cases when the caching does not work and even with caching, the size of temp objects grows beyond 1 page (# tables) or 1 extent (internal objects), the SQL Server can potentially incur the allocation bottleneck. Please refer to the allocation-bottleneck for details on what causes allocation bottleneck. Before we proceed further, you may wonder why we talk about allocation contention in the context of TempDB and why not in the context of other databases. Clearly, the allocation contention can happen even in user database. The main reason is that the objects in TempDB are created/destroyed all the time to process customer workload which leads to order of magnitude more allocation contention.
To minimize allocation bottleneck, here is the recommendation
· Spread TempDB across atleast as many equal sized files as there are COREs or CPUs. Since allocation in SQL Server is done using proportional fill, the allocation will be evenly distributed and so is the access/manipulation of the allocation structures across all files. Note, you can always have more files than COREs but you may not see much improvement. One of the questions that people often ask is if they should create multiple files on the same physical spindle or multiple? The allocation bottleneck can be reduced by creating multiple files on single or multiple spindles. Creating files across multiple spindles will help increase the IO bandwidth but will have no additional impact on allocation bottleneck.
· If you are still encountering allocation bottleneck, you may want to consider enabling TF-1118. Under this trace flag SQL Server allocates full extents to each TempDB object, thereby eliminating the contention on SGAM page. This is done at the expense of some waste of disk space in TempDB. This trace flag has been available since SQL Server 2000. With improvements in TempDB object caching in SQL Server 2005, there should be significantly less contention in allocation structures. If you see contention in SGAM pages, you may want to use this trace flag. If your workload is not encountering SGAM contention, then enabling TF-1118 will not help. Also, this TF is instance wide and impacts the user databases as well.
Thanks
Sunil Agarwal
|
-
Allocation bottleneck refers to contention in the system pages that store allocation structures. There are three types of pages that store allocation structures as follows
· PFS (Page Free Space): it tracks the following key information about each page. It uses 1 byte/page and each PFS page stores status for 8088 pages.
o Free space available
o If it is allocated or not
o Has ghost records (when a row is deleted, it is marked as ghost)
· GAM (Global Allocation Map): Tracks if a uniform extent is free or not. A bit ‘1’ indicates that extent is available for allocation. Each GAM page tracks 64000 extents or 4GB
· SGAM (Shared Allocation Map): Tracks if an extent is mixed extent or not. A bit ‘1’ indicates that it is a mixed extent and has one or more free pages. Mixed extents are used to allocate first 7 pages of an object. If number of pages in an object is 8 or larges, its pages are allocated from uniform extents. Each SGAM covers 64000 extents or 4 GB
So for example, if you have a database file of size 16GB, it will have 4 GAM pages (each GAM covers 4GB), 4 SGAM pages and 256 PFS pages. It is worth mentioning that each file has its own PFS, GAM and SGAM pages.
Now, let us look at a high level what happens during allocation. A page allocation is done when an object is first created or as part of inserting the row that requires a new page to be allocated. Please note, my intent in the following description is not to describe the allocation algorithm of SQL Server (in fact, the allocation mechanism is much more complex and possibly widely different) but to give you an idea how you can get contention in allocation structures.
· SQL Server looks at the target extent if there are any pages available for allocation by examining the PFS page under SH latch. If a page is found, it is allocated and the PFS information is updated under UPDATE latch. For HEAPs, but not for Btree, the SQL Server will look into other allocated pages with enough free space to store the new row and then updates the new free space information by taking the UPDATE latch on PFS page.
· If no such page is found, the SQL Server looks at GAM page under SH latch to find a free uniform extent (assuming the object already has > 8 pages). If no such extent is found, then it looks at the next GAM page and so on. When a free extent is found, the SQL Server takes an UPDATE latch on the GAM page and updates the information.
· If the object has <= 8 pages, a page from mixed extent needs to be allocated. The SQL Server looks at SGAM page under SH latch and if a mixed extent with one or more free page is found, the page is allocated and the SGAM page is updated by taking the UPDATE latch
Similarly, when page is deallocated or an object is dropped or when a row is deleted (for HEAPs), the allocation structures need to be updated. As you can now imagine, if there is significant allocation/deallocation activity in a SQL Server database, multiple threads may need to wait to acquire X or SH latch in non-conflicting mode which can lead to allocation bottleneck.
Starting with SQL Server 2005, the caching mechanism for objects in TempDB has been improved significantly which will reduce the allocation contention incurred by your workload. you can address SGAM bottleneck by enabling TF-1118. Please refer to Managing TempDB in SQL Server: TempDB Configuration.
Thanks
Sunil Agarwal
|
-
In the previous blog http://blogs.msdn.com/sqlserverstorageengine/archive/2008/12/31/managing-tempdb-in-sql-server-tempdb-basics-version-store-logical-structure.aspx , I described the logical/physical structure of a row in version store. A natural question that comes up is how these pages are grouped together and how does SQL Server garbage collect these rows. You may recall that if you need to DELETE a large number of rows from a table, an efficient option is to truncate or even drop the table rather than deleting one row at a time. Similarly, the SQL Server has to delete 1 row at a time. It can be quite expensive. So a better option is to DELETE bunch of rows together. For this reason, the SQL Server groups the row versions in a physical unit called ‘Append-Only store’. All the versioned rows in an Append-Only store are garbage collected together. Alternatively, you can think of Append-Only store as an internal table, but it is not really a table like #table, and this table is dropped once SQL Server determines that it does not need the row versions in it. You can see that this operation is much more efficient compared to deleting 1 versioned row at a time. A new ‘Append-Only Store’ is created very minute, if needed (i.e. if there is no transactional activity that generates a row version, the Append-Only store will not be created). The version store represents the aggregation of all active ‘Append-Only’ stores.
Let us now work through an example to illustrate the functioning of the version store. For simplicity, let us assume that the transactional activity is such that a new Append-Only store is being created every minute. Let us say that Append-Only store AS1 was created at time T and the two subsequent Append-Only stores AS2 and AS3 were created at time (T+1) and (T+2) respectively. You may now wonder how SQL Server chooses a specific Append-Only store to “insert” the row version. Well, the SQL Server associates an Append-Only store to a transaction at its start. So for example, for transactions that start between T and (T+1), the version store is stored in AS1. Similarly, for transactions that start between (T+1) and (T+2), the row versions are stored in AS2. Note, a transaction may be long running say 5 minutes and can be generating row versions throughout its life time, but still all the row versions it creates go to the Append-Only store that was assigned to it in the begin of the transaction. You can also imagine that in its life-time, each Append-Only store will have different sizes depending upon the row versions generated by associated transactions. One important point that I did not mention earlier is the “insert” into the Append-Only store are not logged. Why? Well if SQL Server has to fail, all active transactions will be rolled back anyways so the SQL Server does not need the “state” of version store as of the time when SQL Server stopped.
Let us now look at when it is safe to remove an Append-Only store. To remove an Append-Only store, the SQL Server must guarantee that no transaction will need to look at the row versions it contains. For the above example, to remove Append-Only store AS1, the following conditions must be met
1. All transactions that started between T and (T + 1) have completed. Let us say that these transactions completed at time T2.
2. All transactions that started between (T + 1) and T2 and needed row versions (i.e. the transactions running with RCSI or SI) from AS1 have completed.
3. All Append-Only stores created before AS1 have been removed
Here are some key observations (a) typically, the transactions commit in the order they were started so this scheme of deleting Append-Only store works well. (b) if you have a long running transaction, it will prevent deleting Append-Only stores created since the start of the transaction. This can cause version store to grow and subsequently the TempDB may go out of space. While this may sound alarming but it is similar to when a long running transaction can cause the transaction log to fill up. The key difference here is that the impact of versioning is wider as it can impact all databases that need to create row versions. Typically a long running transaction indicates a poor application design. SQL Server provides tools to detect the version store growth. I will cover this in troubleshooting section in later blogs.
On another note, you may wonder how the version store is managed for ONLINE index which can potentially take an order of magnitude more time than a transaction. The SQL Server gets around this issue by ‘internally’ creating another version store, distinct from the one I just described, for ONLINE index build containing the row versions of the table on which the index is being built.
This concludes my discussion on the version store. I will cover the troubleshooting of version store as part of overall troubleshooting of TempDB. Hope this provided you with some deeper understanding and I look forward to your comments. I also want to take this opportunity to wish you all a very happy new year.
Thanks
Sunil Agarwal
|
-
Now that we know a few things about Version Store, let us now look into its structure to understand how it stores rows from different tables/indexes with different schema. Interestingly, you don’t need to look far and the answer is available when you examine the DMV sys.dm_tran_version_store.
This DMV shows the full logical structure of the version store. There are two key points to note. First, the version store consists of 8K pages just like data or index pages. These pages exist in the buffer pool and can be flushed to the disk, in this case to the TempDB, under memory pressure. The rows on the version store follow the same rules as any data/index row. Second, the row versions are stored as their full ‘binary’ image, like the way it is stored in the data page’, in the version store. The binary image is broken into ‘first-part’ and ‘second-part’ which is then combined internally by the SQL Server to interpret it just like it would interpret the actual data/index row according to the schema. This makes the row version storage independent of the owning schema of the object. So a given version store page can have rows from multiple tables and indexes and in fact they can be from any database under SQL Server instance. In other words, the version store is shared among all databases in the SQL Server instance. Just like the pages of a table or an index may need to be kicked out of buffer pool under memory pressure so are the pages of version store. For this reason the version store is backed by persistence in TempDB.
If you look at the fields in the sys.dm_tran_version_store DMV, you will note that the version row stores many other attributes like database-id, row-length etc which is not part of the original data/index row. So you may wonder how does the version of a row with a size of 8060 (max allowed length of a row in SQL Server) stored in the version store as the version store ‘row’ stores many other attributes as described below? Well, the answer is that the data row is broken into 2 rows physically on the version store page but it does not show as such in the DMV because it is ‘virtual’ and shows it as 1 big row.
Here is an example of the contents of version store where transaction with XSN 57 has updated three different rows while a transaction with XSN 58 has updated only 1 row. Note, if a transaction updates the same row multiple times, there is only 1 version created because for other transactions it looks like the first transaction is holding a X lock.
transaction_sequence_num version_sequence_num database_id
------------------------ -------------------- -----------
57 1 9
57 2 9
57 3 9
58 1 9
rowset_id status min_length_in_bytes
-------------------- ------ -------------------
72057594038321152 0 12
72057594038321152 0 12
72057594038321152 0 12
72057594038386688 0 16
record_length_first_part_in_bytes
---------------------------------
29
29
29
33
record_image_first_part
--------------------------------------------------------------------
0x50000C0073000000010000000200FCB000000001000000270000000000
0x50000C0073000000020000000200FCB000000001000100270000000000
0x50000C0073000000030000000200FCB000000001000200270000000000
0x500010000100000002000000030000000300F800000000000000002E0000000000
record_length_second_part_in_bytes record_image_second_part
---------------------------------- ----------------------
0 NULL
0 NULL
0 NULL
0 NULL
In the next blog, I will describe how the pages in the version store are grouped together and the subsequent garbage collection and gotchas
Thanks
Sunil Agarwal
|
|
|
|