Estimating the space savings with data compression


As I indicated in my previous blog, it is recommended that you estimate the compression savings on the object of interest before actually enabling compression, because enabling compression is an expensive operation. To show how to estimate data compression savings, I have created a very simplistic, and admittedly unrealistic, example using the following table.

 

 

create table t1_big (c1 int, c2 int, c3 char(8000))

go

 

I will show you the compression estimates both for ROW and PAGE compression. You may recall that

·         ROW compression removes unneeded bytes from column values by storing them in a variable-length format.

·         PAGE compression stores a repeating value only once per page and replaces the occurrences in the respective columns with pointers to that single value.

 

 

Now load 6000 rows into this table. Since there is one row per page, the size of this table will be 48 MB (6000 * 8 KB = 48 MB). Note that we are inserting the same value for the third column in every row, and that we are inserting only 60 characters even though the size of the column is 8000.

 

declare @i int

select @i = 0

while (@i < 6000)

begin

    insert into t1_big values (@i, @i + 6000, replicate ('a', 60))

    set @i = @i + 1

end

 

-- find the current size of the uncompressed table

EXEC sp_spaceused N't1_big'

 

The output of this command is as follows, showing that the size of the table is 48 MB, as we expect.

 

Name     Rows    Reserved    Data        Index_size    Unused

t1_big   6000    48008 KB    48000 KB    8 KB          0 KB

 

 

Now, estimate the compressed size by executing the following stored procedure. When you execute it, you are instructing SQL Server to find the compression savings with ROW compression on the table t1_big. Parameters 3 and 4 are @index_id and @partition_number; you can use this stored procedure to estimate compression savings for a specific index or for specific partitions as well.

 

exec sp_estimate_data_compression_savings

            'dbo', 't1_big', NULL, NULL, 'ROW'
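For example, to restrict the estimate to a particular index or partition, you can pass explicit values for those parameters (a sketch; the ids below are illustrative, so look up the real ones in sys.indexes and sys.partitions for your table):

```sql
-- estimate ROW compression savings for one index and one partition only;
-- index_id = 1 is typically the clustered index, and the ids here are illustrative
exec sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 't1_big',
    @index_id         = 1,
    @partition_number = 1,
    @data_compression = 'ROW'
```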

 

 

Here is the “re-arranged” output of the above command, for better readability. There are a few key points to note. First, the estimated space savings from ROW compression is around 99% for this example (the estimated size goes from 48 MB to about ½ MB). This kind of space savings is not realistic, but it is easy to see why we got such huge savings for the current schema: the majority of the savings come from removing the trailing blanks from column C3. Second, when estimating compression savings, SQL Server creates a sample table/index in tempdb and then compresses it with the requested compression mode to find the space savings. The stored procedure extrapolates the savings achieved on the sampled table to the original table.

 

object_name   schema_name  index_id   partition_number 

-----------           -----------             -------      ----------------- 

t1_big              dbo                      0             1               

 

size_with_current_compression_setting(KB) 

-----------------------------------------

48008                                                                               

 

size_with_requested_compression_setting(KB) 

-------------------------------------------

472

 

sample_size_with_current_compression_setting(KB)

----------------------------------------------- 

39648                                           

 

sample_size_with_requested_compression_setting(KB)

--------------------------------------------------

392
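As a rough sanity check of that extrapolation (the procedure's internal calculation may differ slightly), you can scale the sampled compressed size up to the full table size yourself:

```sql
-- scale the sampled compressed size (392 KB of a 39648 KB sample)
-- up to the full 48008 KB table: this gives roughly 475 KB,
-- close to the reported estimate of 472 KB
select 48008 * 392.0 / 39648 as extrapolated_size_kb
```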

 

 

Note that in the case of ROW compression, all SQL Server does is remove the trailing blanks from the CHAR(8000) column and the unneeded bytes from the integer values. Since column C3 has the same value (60 characters) in all the rows, with PAGE compression this value needs to be stored only once per page, so it is easy to see why PAGE compression leads to even larger space savings. Here is the estimate with PAGE compression.

 

-- estimate the PAGE compression

exec sp_estimate_data_compression_savings

            'dbo', 't1_big', NULL, NULL, 'PAGE'

 

Here is the output, showing that the estimated size with PAGE compression is just 80 KB:

object_name   schema_name  index_id   partition_number 

-----------          ----------------       ----------   ----------------- 

t1_big              dbo                    0              1               

 

size_with_current_compression_setting(KB) 

-----------------------------------------

48008                                                                               

 

size_with_requested_compression_setting(KB) 

-------------------------------------------

80

 

sample_size_with_current_compression_setting(KB)

----------------------------------------------- 

39960                                           

 

sample_size_with_requested_compression_setting(KB)

--------------------------------------------------

72
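Once you are satisfied with the estimate, you can enable compression with a table rebuild (a sketch against the example table; this rebuild is the expensive operation mentioned at the start, so plan it for a maintenance window):

```sql
-- enable PAGE compression on the table; this rebuilds the heap
alter table t1_big rebuild with (data_compression = PAGE)
go

-- verify the new size
EXEC sp_spaceused N't1_big'
```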

 

In my next blog, I will show you the estimates for a more realistic example.

Comments
  • Very nice feature, indeed. One question, though. Just in case that particular page which actually holds the value (60 characters in this case) gets corrupted or somehow damaged, what would happen to the pointers?

  • For your case, the value pointed to by the pointers is corrupt. There are two cases. First, the value got corrupted but it is still a valid value (e.g. corruption caused the value 'a' to become 'c'); SQL Server can't detect this. This kind of issue is typically due to corruption in the IO path, so we recommend that you enable CHECKSUM on the database; SQL Server will then flag the issue when the page is read into the buffer pool. Second, the value is actually invalid. This will be treated like any other corruption in the page. Note that in this case, corruption in one value can potentially impact one or more rows.

  • I attended your session at SQL Connections in Las Vegas and was excited by your presentation. As soon as I received the email with the November CTP, I downloaded it quickly so I could see how compression would work with my databases. But… did data compression not make this build? If not, when do you think it will be released?

  • Thanks for your interest. Data compression will be available starting from CTP-6.
