SQL Server Storage Engine

Why is the NULL bitmap in a record an optimization?

I've had some questions sent in comments and I wanted to reply to some of them using a post so others who aren't subscribed to the comment sections can see the answers.

 

Wesley Backelant said:

I was wondering what exactly you mean by "this allows an optimization when reading columns that are NULL".  Could you elaborate on that?
Firstly, having a null bitmap removes the need for storing special 'NULL' values for fixed-length datatypes. Without the null bitmap, how can you tell whether a column is NULL?
 
That's easy for a variable-length column - just check the length. If it's zero, then the column is NULL.
 
[Edit] Ryan Stonecipher (the dev responsible for DBCC) pointed out that I'd forgotten the case of empty strings - thanks Ryan. In this case, an empty string also has a zero length so for varchar columns you'd need to use the fixed-length solution described below.
 
It's not so easy for fixed-length columns, which, as their name suggests, have a fixed-length so that trick doesn't work. The only solution is to define a special 'NULL' value, which limits the effective range of the datatype being stored.
 
Secondly, it saves CPU cycles. If there was no NULL bitmap, then there are extra instructions executed for fixed- and variable-length columns.
 
For fixed-length:
  1. read in the stored column value (possibly taking a cpu data cache miss)
  2. load the pre-defined NULL value for that datatype (possibly taking a cpu data cache miss, but only for the first read in the case of a multiple row select)
  3. do a comparison between the two values

For variable-length:

  1. calculate the offset of the variable length array
  2. read the number of variable length columns (possibly taking a cpu data cache miss)
  3. calculate the position in the variable length offset array to read
  4. read the column offset from it (possibly taking a cpu data cache miss)
  5. read the next one too (possibly taking another cpu data cache miss, if the offset in step 4 was on the boundary of a cache line size)
  6. compare them to see if they're the same

But with a NULL bitmap, all you have to do is:

  1. read the NULL bitmap offset (possibly taking a cpu data cache miss)
  2. calculate the additional offset of the NULL bit you want to read
  3. read it (possibly taking a cpu data cache miss)

So, its about even for a lookup of a single fixed-length column, but for variable-length columns, and for multiple row selects, there's a clear advantage to having the NULL bitmap.

Hope this makes my comment a bit clearer.

Published Thursday, June 29, 2006 2:42 AM by Paul Randal - MSFT
Filed under:

Comments

 

Wesley Backelant said:

Oh if you put it that way ^^ :-)

Thanks for the great elaboration!
June 29, 2006 3:05 AM
 

Glorf IT - Bedenkliches aus dem IT-Alltag » SQL Server Storage Engine said:

February 18, 2009 1:27 PM
Anonymous comments are disabled

About Paul Randal - MSFT

Paul started in the industry in 1994 working for DEC on the VMS file system and check/repair tools. In 1999 he moved to Microsoft to work on SQL Server, specifically on DBCC. For SQL Server 2000, he concentrated on index fragmentation (writing DBCC INDEXDEFRAG and DBCC SHOWCONTIG) plus various algorithms in DBCC CHECKDB. During SQL Server 2005 development Paul was the lead developer/manager of one the core dev teams in the Storage Engine, responsible for data access and storage (DBCC, allocation, indexes & heaps, pages/records, text/LOB storage, snapshot isolation, etc). He also spent several years rewriting DBCC CHECKDB and repair. For SQL Server 2008, Paul managed the Program Management team for the core Storage Engine to become more focused on customer/partner engagement and feature set definition. In 2007, after 8.5 years on the SQL Server team, Paul left Microsoft to join his wife, Kimberly Tripp, running SQLskills.com and pursuing his passion for presenting and consulting. Paul regularly presents at conferences and user groups around the world on high-availability, disaster recovery and Storage Engine internals. His popular blog is at http://www.sqlskills.com/blogs/paul/.

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker