Microsoft SQL Server 2008 introduces two new compression features (available in Enterprise Edition) that are of interest to users with large databases. Data compression applies to individual tables and indexes within a database and comes in two options: ROW and PAGE. Backup compression is a new option when backing up a database to external storage. Both compression techniques can be used to reduce the amount of online storage required for databases. Data compression can be configured for an entire heap (table with no clustered index), clustered index, non-clustered index, or indexed view. For partitioned tables and indexes, the compression option can be configured separately (and differently) for each partition. Compression can also reduce disk utilization and sql memory utilization as dead is stores on disk in a compressed state and also reads in the SQL cache in a compressed state. Compression can add 10 - 30% percent increased CPU utilization depending on what tables and indexes are compressed and what level of compression is used.
For Dynamics NAV we recommend only compressing tables and indexes that have a read to write ration of 80%/20% (This is a conservative threshold) or higher as compressing tables with a higher write ratio can actually decrease performance. We also recommend using ROW compression if the space saving between ROW and PAGE level compression is less that 10%; if the difference is over 10% then we recommend PAGE compression. This is because if the space savings from PAGE compression is close to or similar to ROW compression then it is not recommended to incur the additional overhead associated with PAGE compression. An example of NAV tables that would benefit greatly from compression are the large "Entry" tables such as G/L Entry, Value Entry, and Item Ledger Entry. An example for NAV tables that would not benefit from compression and where performance may actually decrease due to compression are "Line" tables where the data is temporary in nature such as Sales Line, Purchase Line, and Warehouse Activity Line. SQL Server compression is completely transparent to the Dynamics NAV application.
Compression is done on Table or Index basis. On tables with clustered indexes the clustered index is the table so compressing the clustered index is equal to compressing the table. Running the ALTER TABLE.. WITH COMPRESSION is only necessary on HEAPS for all tables with Clustered Indexes you can use the ALTER INDEX.. WITH COMPRESSION.
How to determine the read/write ratio of an index? Luckily SQL keeps track of this for us and all we need to do is extract this data. SQL Server stores this information in the sys.dm_db_index_operational_stats DMV. Remember DMV's are "recycled" each time the SQL Server service is restarted so if SQL has only been up and running for a day or a week this information will be of minimal use. Ideally you would want the server to be up any running for several weeks and through a month end close to get a true idea of actual data access patterns of your database.
The following query will tell you how long the SQL Server instance has been up and running:
select 'Sql Server Service has been running for about '
+ cast((datediff(hh, create_date, getdate()))/24 as varchar(3)) + ' days and '
+ cast((datediff(hh, create_date, getdate())) % 24 as varchar(2)) + ' hours'
from sys.databases where name = 'tempdb'
The following query will give the approximate read write balance of all the used indexes in the database.
i.leaf_update_count * 100.0 /
(i.range_scan_count + i.leaf_insert_count
+ i.leaf_delete_count + i.leaf_update_count
+ i.leaf_page_merge_count + i.singleton_lookup_count
) as Writes,
i.range_scan_count * 100.0 /
) as Reads
FROM sys.dm_db_index_operational_stats (db_id(), NULL, NULL, NULL) i
JOIN sys.sysobjects o ON o.id = i.object_id
JOIN sys.indexes x ON x.object_id = i.object_id AND x.index_id =
JOIN sys.sysindexes s ON s.id = x.object_id and s.indid = x.index_id
WHERE (i.range_scan_count + i.leaf_insert_count
+ i.leaf_delete_count + leaf_update_count
+ i.leaf_page_merge_count + i.singleton_lookup_count) <> 0
AND objectproperty(i.object_id,'IsUserTable') = 1
You can also run a simulation with the sp_estimate_data_compression_savings stored procedure in SQL with will run a sample of the data through the chose compression level and give you the estimate of the space saved by enabling compression.
I strongly recommend reading the following MSDN article before engaging in database compression activities.
Note: You can use the following query to check to see if there are currently any compressed indexes in the database.
SELECT o.name Table_Name, p.index_id as Index_ID, i.name as Index_Name,
WHEN p.data_compression = 1 THEN 'ROW Compression'
WHEN p.data_compression = 2 THEN 'PAGE Compression'
FROM sys.partitions p
JOIN sys.objects o ON p.object_id = o.object_id
JOIN sys.sysindexes i ON o.object_id = i.id AND p.index_id = i.indid
AND p.data_compression in (1,2)
ORDER BY o.name, p.data_compression, p.index_id
Michael De Voe
Senior Premier Field Engineer
Microsoft Certified Master - SQL Server 2008
great article, and the script to determine the read/write ratio is really helpful! To add my two cents:
as you mentioned, all kinds of "Ledger Entry" tables are perfect candidates for PAGE compression. According to this, all kinds of posted documents (Sales Shipmnet Line, Sales Invoice Line, etc.) and posted Dimensions (Posted Document Dimension, Ledger Entry Dimension, etc.) are also "classic" candidates.
In my experience I have never seen an increased CPU load about 30%, usually it's something between +5 and +10 percent, +20 at worst if the Server sizing is not really sufficient.
In my opinion the DATA COMPRESSION is the coolest feature with SQL 2008 - some real value you get for the Enterprise license fee.
I agree that 30% is a worst case scenario if you were to compress every index in the database with PAGE compression. The tests I have run in the lab and at customer sites was between 10-15%.
Cool feature, cool post, and cool scripts, thank you!
P.S.: there's an extra comma here: ") as Reads," that makes a syntax error
Thanks, Mirko! The query is now updated.
"Remember DMV's are "recycled" each time the SQL Server service is restarted ..."
For sys.dm_db_index_operational_stats the "recycling" is done each time the object leaves the metadata cache.
According to books online
The data returned by sys.dm_db_index_operational_stats exists only as long as the metadata cache object that represents the heap or index is available. This data is neither persistent nor transactionally consistent. This means you cannot use these counters to determine whether an index has been used or not, or when the index was last used. For information about this, see sys.dm_db_index_usage_stats (Transact-SQL).
The values for each column are set to zero whenever the metadata for the heap or index is brought into the metadata cache and statistics are accumulated until the cache object is removed from the metadata cache. Therefore, an active heap or index will likely always have its metadata in the cache, and the cumulative counts may reflect activity since the instance of SQL Server was last started. The metadata for a less active heap or index will move in and out of the cache as it is used. As a result, it may or may not have values available. Dropping an index will cause the corresponding statistics to be removed from memory and no longer be reported by the function. Other DDL operations against the index may cause the value of the statistics to be reset to zero.
Wouldn't it be better to use sys.dm_db_index_usage_stats?
Can you explain the calculation for the 'Writes' and 'Reads'?
You are correct the DMV's are recycled when the SQL Server instance in restrart. That is why I included the query to determine how long the SQL Server instance has been up and running. As to using sys.dm_db_index_usage_stats as a replacement for sys.dm_db_index_operational_stats it is funny you should ask as I have built a whole tool around analyzing and implementing compression and I am actually thinking about making this replacement in the tool. I am currently testing to see which one is more accurate. I will also be doing a session at Convergencein the US on this topic.
What I was trying to say is that the data returned from sys.dm_db_index_operational_stats does not represent the period from the startup of SQL Server. It represents the period starting from the point where an object enters the cache.
If you have an index that is used only (for example) once a day and a limited amount of memory the object might very well leave the buffer pool completely before being used the next time. As from what I understand the counters will then be reset to zero.
The DMV sys.dm_db_index_usage_stats only resets when the service is restarted. This guarantees results that represent the period from the start of the SQL Server.
IJeb Reitsma - Point taken, and I recently went through the compression process with a customer with a 1.8 TB database and we ended up tossing out the data from sys.dm_db_index_operational_stats and using sys.dm_db_index_usage_stats for a varity of reason and I have rewritten my compression tool to now use usage stats DMV. Below is the new query that I am using now.
O.ID AS OBJECT_ID,
O.name AS TABLE_NAME,
S.NAME AS INDEX_NAME,
S.INDID AS INDEX_ID,
X.TYPE_DESC AS INDEX_TYPE,
S.ROWCNT AS ROW_COUNT,
WHEN ( SUM(user_updates + user_seeks + user_scans + user_lookups) = 0 ) THEN NULL
ELSE ( CAST(SUM(user_updates) AS DECIMAL) / CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL) )
END AS PERCENT_UPDATE,
ELSE ( CAST(SUM(user_seeks + user_scans + user_lookups) AS DECIMAL) / CAST(SUM(user_updates + user_seeks + user_scans + user_lookups) AS DECIMAL) )
END AS PERCENT_SCAN
FROM SYS.DM_DB_INDEX_USAGE_STATS I
JOIN SYS.SYSOBJECTS O ON O.ID = I.OBJECT_ID
JOIN SYS.SYSINDEXES S ON S.ID = I.OBJECT_ID AND S.INDID = I.INDEX_ID
JOIN sys.indexes x ON x.object_id = i.object_id AND x.index_id = i.index_id
WHERE O.NAME NOT LIKE ('SYS%')
AND S.ROWCNT > 0
AND S.NAME IS NOT NULL
AND I.DATABASE_ID = DB_ID()
O.ID, O.name, S.NAME, S.INDID, X.TYPE_DESC,S.ROWCNT
Any discussion yet on making vardecimal available in NAV? To go with that it would be really nice to have that feature included in the standard edition.