We just released CTP2 of SQL Server 2008 R2. What is this CTP? Think about it as Beta2 of SQL Server 2008 R2. There will be another CTP which should be released in the last quarter of this calendar year. Final release time frame should be very early in the second half of next calendar year. As a background, the R2 version of SQL Server 2008 only has minimal changes in the SQL Server Relational Engine. Most of the changes have to do with new components in the overall SQL Server package. However there is one change in the Relational Engine which could have significant impact on TCO. As most of you know, SQL Server stores Unicode strings in columns of the datatype nvarchar. As encoding we use UCS2. This encoding usually stores every character in 2 bytes. This is different to other encodings like UTF8 where characters out of single byte code pages get stored as one byte on disk. (However on the other side UTF8 can use up to 4bytes in typical double byte collations like Kanji.) – The current solution had room for improvement: Save space for characters out of single byte code pages.

The goal we set ourselves for SQL Server 2008 R2 looked as follows:

·         Spend one byte of disk storage for single byte collations like the typical Latin1 collation

·         Improve storage efficiency even for double byte character collations

·         Make the usage of the new way of storing nvarchar content completely transparent. Means beyond the usage of either ROW or PAGE compression, no further steps need to be applied.

·         For already existing ROW or PAGE compressed tables in databases which were attached or restored from a SQL Server 2008 source newly inserted or modified tables will be stored in the new format even in combination with row in the old format on the same page

·         Performance impact shouldn’t be measurable in productive scenarios

In order to achieve our goal, we implemented a new way how we actually store our UCS2 encoded nvarchar datatypes on disk. The algorithm applied is widely known as SCSU (Simple Compression Scheme for Unicode). It is applied at the moment a row is placed on a page and it is absolutely transparent to the application and other layers of SQL Server. We achieved the goal of making the usage transparent with coupling our UCS2 compression with ROW and PAGE compression. Means if one chooses to compress a table in SQL Server 2008 R2 with ROW or PAGE compression the new UCS2 compression comes with it automatically. Not using any compression against a table also doesn’t give any Unicode compression. So we implemented UCS2 compression as an extension of our database compression features we implemented already in the SQL Server 2008 release.

In terms of performance impact I spent a lot of time over last Christmas testing based on the SAP SD benchmark and some examples of typical BW databases. One really can say that the impact of the UCS2 compression on top of our existing compression methods is in the extreme low single digits. This is basically a change that in a productive system isn’t really recognizable. These tests and measurements conducted with other workloads, including standard TPC based benchmark workload, gave us enough confidence to simply couple the new UCS2 compression with our existing methods because the increase on CPU consumption proofed negligible.

So what is the benefit we measured in some of the comparisons? See this table:

Locale   

Pure UCS2 in SQL Server 2008

UTF-8   

UCS2 compression in SQL Server 2008 R2   

English   

1

0.5

0.5

Kanji   

1

1+

0.85

Korean   

1

1

1

Turkish   

1

0.53

0.52

German   

1

0.5

0.5

Vietnamese   

1

0.68

0.61

Hindi   

1

1

0.5

 

What does this really mean for a SAP ERP system? In order to answer this question I took a copy of our own 5+TB SAP ERP database. I completely defragmented the database and did a series of exercises which ended up with the results presented in the following table:

Compression Type

Percentage of original

No compression

100%

ROW Compression SQL Server 2008

83.35048047

ROW Compression SQL Server 2008 R2

64.28181983

Page Compression SQL Server 2008

46.35964575

PAGE Compression SQL Server 2008 R2

44.05743641

 

Please keep in mind that the percentages are based on a completely reorganized database. In real productive systems we usually see a larger gain due to the compression plus additional effects of data reorganization when enabling database compression. A many of our customers already confirmed, ROW compression didn’t introduce a measurable more on resource consumption and already reduced the volume by a nice portion. Using SQL Server 2008 R2, the effect will even be larger: The used space within the database will basically shrink to 2/3 of its original size; without measurable increase on CPU resource consumption. As expected with Page Level compression the benefit of UCS2 compression is not as significant since Pre-Fix and duplicate optimizations already do a great job. These are measurements conducted with compressing the clustered indexes (which include the data layer) only, as supported by SAP at this point in time. We will work in the next 12 months to get the non-clustered indexes compressed as well. Especially with Page Level compression we will see another dramatic impact on database size.

Please see my next article explaining more about how one can use UCS2 compression and apply it to already existing ROW/PAGE compressed tables after running the database on SQL Server 2008 R2.