Recently I worked with a partner who was seeing some interesting behavior.  Upon rebuilding their indexes they noticed that the total space used by all indexes increased significantly.   The table has no clustered index but does have a total of nine non-clustered indexes. 

The sequence of events is as follows:

·      Step 1: Approximately 12 million rows are inserted into an existing table via some batch loading of the data. 

 

·       Step 2: All the indexes on the table are rebuilt using:

 

ALTER INDEX MyIndex ON MyTable WITH (SORT_IN_TEMPDB=ON, ONLINE=ON)

The sp_spaceused procedure was used before and after each of the steps above to measure the amount of space used by the table and indexes.  Here are the results:

BEFORE STEP 1:

 

sp_spaceused MyTable

 

Name          Rows         Reserved       Data          Index Size      Unused

------        ------       ----------     -------       ----------      -----------

MyTable       1156563588   324009704 KB   88318384 KB   235511080 KB    180240 KB

 

AFTER STEP 1, BEFORE STEP 2 (Data added, nothing done to indexes):

 

sp_spaceused MyTable

 

Name          Rows      Reserved       Data         Index Size   Unused

------        ------    ----------     -------      ----------   -----------

MyTable       1169556034 329729960 KB   89645944 KB 240051312 KB 32704 KB

 

 

AFTER STEP 2 (All indexes rebuilt):

 

sp_spaceused MyTable

 

Name          Rows      Reserved       Data         Index Size   Unused

------        ------    ----------     -------      ----------   -----------

MyTable       1169595370 459848840 KB   89649160 KB 363548216 KB 6651464 KB

 

Notice the size of the index after adding the rows to the table and then after rebuilding the indexes (highlighted in red above).  Before the rebuild the index size was approximately 240GB but after the rebuild it was nearly 365GB, an increase of nearly 50%.

 

It is also worth noting that in the above sample there were concurrent inserts against this table while the 12 million rows were added, as well as during the index rebuilds. This is why the indexes were rebuilt online and explains the differences in rowcount between each step.

 

Why did my index sizes increase so much?

The answer is related to the fact that:

  • Read Commited Snapshot Isolation (RCSI) was enabled on the database
  • the index was rebuilt ONLINE 

When RCSI is enabled on a database there is an additional 14 bytes appended to each row as it is inserted, updated or deleted.  This applies to the table as well as any index modified by the action. The purpose of this extra space is to maintain information about row versions that is needed for the RCSI functionality.  This is described in greater depth in the following blog: http://blogs.msdn.com/sqlserverstorageengine/archive/2008/03/30/overhead-of-row-versioning.aspx.

When an index is rebuilt using the option ONLINE=ON SQL Server will append these 14 bytes to every row during the rebuild.  However, when the index is rebuilt with the option ONLINE=OFF the 14 bytes are not appended but instead removed from any existing rows.   The difference in behavior is by design.  

Please note: When RCSI is not enabled, the above does not apply since there is no need to maintain row version information.

An existing index can be expected to increase in size after an online rebuild when either of the following is true.

1.    An index with existing data has been rebuilt offline prior to the online rebuild. In this case the size of the index will increase because the 14 bytes removed during the offline rebuild are added to each row during the online rebuild. 

2.    The database had existing data prior to RCSI being enabled.  In this case any online rebuild will add the additional 14 bytes to each row.

 

For this specific scenario the database had existing data prior to RCSI being enabled and these indexes were also rebuilt offline at some point in time before the online index rebuilds were performed.  

How much can I expect my indexes to grow in size?

The size increase as a percentage of the original index sizes will be dependent on the size of the index keys for the indexes.  This can be much higher in cases when the size of the key columns in bytes is relatively small. 

The below illustrates the approximate size of the index keys and the increase in size as a percentage of the key size for two of the nine indexes.  The Row Identifier below (RID) is added to each row since this table is a heap (no clustered index) and is used to identify the FILEID:PAGEID:ROWID for each index row.

CREATE UNIQUE NONCLUSTERED INDEX [NC_Idx_1] ON [MyTable]

(

      [Column1] ASC, --[PersonID] [numeric](16, 0)  (9 bytes)

      [Column2] ASC, --[int]                   (4 bytes)

      [Column3] ASC, --[varchar](20)                (20 bytes max)

      [Column4] ASC, --[varchar](20)                (20 bytes max)

      [Column5] ASC  --[char](10)                   (10 bytes)

 

--9+4+20+20+10+8(RID)=71 (Maximum original key size)

--                    +14 (RCSI – versioning information)

--                    =85

-- (~20% increase in size, could be more if varchar columns have <20 bytes)

)

 

CREATE UNIQUE NONCLUSTERED INDEX [NC_Idx_2] ON [MyTable]

(

      [Column2] ASC, --[int]                 (4 bytes)

      [Column5] ASC  --[char](10)           (10 bytes)

 

--4+10+8(RID)=22 (Approx. original key size)

--           +14 (RCSI – versioning information)

--            =36

-- (~60% increase in size)

)

 

As shown above, the additional 14 bytes introduced as part of the rebuild is significantly more as a percentage of the total row size for NC_Idx_2 than for NC_Idx_1.  For this particular example there were nine indexes on the existing table, five of which had relative small key sizes (similar to NC_Idx_2 above). This explains way there was such a large increase in size as a percentage of the original size.  In addition to the sp_spaceused procedure, the DMV sys.dm_index_physical_stats exposes a column avg_record_size_in_bytes which can be used to measure the average row size within an index before and after index rebuilds.  The avg_record_size_in_bytes includes the 14 bytes added by RCSI, if present, as part of the calculation.

Related to this behavior, there are some other interesting considerations.

1.     After an index is rebuilt offline, workloads that do many updates or deletes may introduce fragmentation to the index.  When an index is rebuilt offline any existing versioning information is removed from the row however any update or delete will add these 14 bytes back into to the row.  When data pages are nearly full, as is the case after index rebuilds, the increase in row size as a result of the addition 14 bytes may result in page splits.  This problem can be avoided by explicitly specifying a FILLFACTOR less than 100% when rebuilding an index.  This will leave free space on the data/index pages and reduce the likelihood of splits. This is not a consideration if indexes are rebuilt online since the row versioning information will exist on each row already.

 

2.    Data compression (in SQL Server 2008) performed on an index or table (either ROW or PAGE compression) using the ONLINE=ON option may result in less space saving than expected when RCSI is enabled due to the addition of the 14 bytes.  These 14 bytes are initialized with a timestamp and place holder for the version record pointer and the information is not compressed by either ROW or PAGE compression.  The stored procedure sp_estimate_data_compression_savings can be used as a method to measure the expected impact of data compression on the index.

 

It is possible that this behavior may change in a future release of SQL Server.