SQL Server Storage Engine

CHECKDB (Part 2): What does CHECKDB really do? (1 of 4)

Hmmm - I sat for 5 minutes thinking of something amusing to say to start this one off and nothing came to mind, so I'm afraid this will be a humor-free post. Maybe I'm jet-lagged from being on the East coast all last week.

As with all things related to DBCC, this topic has its share of misinformation. In this post I'll set the record straight by running through all the stages of CHECKDB in SQL Server 2000 and 2005. I'll need to split this up into seperate posts otherwise I'll be writing a book. I also introduce a whole raft of new terms which will also be subjects for future posts (my list is already getting pretty long!)

So the very first thing it does is work out how to get the transactionally consistent view it requires (see CHECKDB Part 1) and then, if needed, either record the relevant LSN and switch to full-logging (for SQL Server 2000) or create a database snapshot (for SQL Server 2005).

Then it runs through the checks in the order shown below:

1. Primitive checks of critical system tables

This stage is in SQL Server 2000 and 2005. First of all, what are critical system tables? These are the system tables that hold Storage Engine metadata. Without these we'd have no idea where any data was stored in the database files or how to interpret records.

In SQL Server 2000, the critical system tables are:

  • sysindexes
  • sysobjects
  • syscolumns

These tables have to be checked first because we use the metadata they contain to access all the other tables and indexes in the database. These tables are freely queryable so poke about and see what's stored in there.

In SQL Server 2005, the metadata layer has been rewritten and the critical system tables are:

  • sysallocunits
  • syshobts
  • syshobtcolumns
  • sysrowsets
  • sysrowsetcolumns

More on allocation units, hobts, and rowsets later in the week - for now you can assume they serve the same function as the three critical system tables in SQL Server 2000. You can't see these system tables because they're 'hidden' - the parser won't allow them to be bound to in a query. Try running 'select * from sysallocunits' to see what I mean.

The primitive checks are designed to check that internal queries on the metadata tables won't throw errors. Each of the critical system tables has a clustered index. The primitive checks just check the leaf-level data pages of the clustered indexes. For every one of these pages, the following is done:

  • Read and latch the page (a latch is a lightweight internal version of a lock).  This makes sure that there aren't any IO problems with the page such as a torn-page or bad page checksum and ensures that we can put the page in the buffer pool correctly. This is the most common cause of failure of the primitive system table checks and results in error 8966, which in SQL Server 2000 could look something like:

          Server: Msg 8966, Level 16, State 1, Line 1
    Could not read and latch page (1:33245) with latch type SH. sysobjects failed.

  • Audit the page. This is a series of checks of the page structures which I'll cover in a seperate post. If these pass, the page looks like a SQL Server page of type its supposed to be.
  • Check the basic page linkage. Pages in each level of a clustered index are linked together in a doubly-linked list to allow range scans to work. At this stage we only check the left-to-right linkage to ensure the linked-to page actually exists.
  • Check the page linkage for loops. This is simple to do - have two pointers into the page linked-list with one advancing at every step and one advancing at every second step. If they ever point to the same thing before the faster-advancing pointer reaches the right-hand side of the leaf level then there's a loop. Its important that there are no linkage loops otherwise a range scan may turn into an infinite loop. I've never seen this occur in the field.

Any error found at this stage cannot be repaired so you must restore from a backup. This is because the repair would have to deallocate the page, effectively deleting metadata for a whole lot of tables and indexes. As people's databases get larger and more complex (thousands of tables and indexes), the percentage of pages that comprise these critical system tables rises and so the chance of a hardware problem corrupting one of these pages also rises - I see several of these a month on the forums. Without a backup, the only alternative is to try to export as much data as you can - not good.

If all the pages are ok then we know we've got solid enough metadata on which to base the next set of checks.

2. Allocation checks
(Part 2...)

3. Logical checks of critical system tables
4. Logical checks of all tables
(Part 3...)

5. Service Broker checks
6. Metadata cross-checks
7. Indexed view and XML index checks
(Part 4...)

Published Monday, June 19, 2006 2:09 AM by Paul Randal - MSFT

Comments

 

Jetstar said:

Why does CHECKDB not check the statistics?

Not long ago we had a table in a database with an index on it. If you read the table by opening a cursor (sp_cursoropen) you got an sqldump. CHECKDB said everything is OK. Microsoft Support explaint that there is an corrupted statistic on the accesed index. A DBCC DBReindex or Update Statistics will solve the problem. Wouldn't it be nice if CHECKDB or any other check syntax, like DBCC CheckStatistcs, would recognize the error?
July 5, 2006 9:06 AM
 

Paul Randal - MSFT said:

Hi Jetstar,

This has come up several times and to be honest, because the stats blobs aren't vital structures for the correct operation of the database it's never been high enough priority to make the list of additions to CHECKDB.

I do understand the issue and the frustration it causes when a situation such as yours occurs. Please take the time to file a bug at connect.microsoft.com and I'll see if I can bump its priority for Katmai.

Thanks
July 7, 2006 1:34 AM
 

Julian Kuiters - Inside DBCC CHECKDB... How does it work? said:

October 30, 2006 8:38 PM
 

SQL Server Storage Engine : CHECKDB (Part 8): Can repair fix everything? said:

February 7, 2007 4:54 PM
 

SQL Server Storage Engine & Tools (SSQA.net) said:

Long ago, not long ago.... No doubt that many of you might have gone through the error above within your

August 14, 2007 5:37 AM
 

Glorf IT - Bedenkliches aus dem IT-Alltag » SQL Server Storage Engine said:

February 18, 2009 1:27 PM
 

Blogs from Suhas said:

In this post, we would have a look at the differences between DBCC CHECKTABLE and DBCC CHECKDB. We would

May 14, 2009 8:56 AM
 

SQL Server Storage Engine CHECKDB Part 2 What does CHECKDB really | Toe Nail Fungus said:

June 9, 2009 10:01 PM
 

SQL Server Storage Engine CHECKDB Part 2 What does CHECKDB really | storage bench said:

June 14, 2009 6:18 AM
 

SQL Server Storage Engine CHECKDB Part 2 What does CHECKDB really | patio set said:

June 17, 2009 10:03 PM
 

SQL Server Storage Engine CHECKDB Part 2 What does CHECKDB really | adirondack chairs said:

June 19, 2009 5:31 AM
Anonymous comments are disabled

About Paul Randal - MSFT

Paul started in the industry in 1994 working for DEC on the VMS file system and check/repair tools. In 1999 he moved to Microsoft to work on SQL Server, specifically on DBCC. For SQL Server 2000, he concentrated on index fragmentation (writing DBCC INDEXDEFRAG and DBCC SHOWCONTIG) plus various algorithms in DBCC CHECKDB. During SQL Server 2005 development Paul was the lead developer/manager of one the core dev teams in the Storage Engine, responsible for data access and storage (DBCC, allocation, indexes & heaps, pages/records, text/LOB storage, snapshot isolation, etc). He also spent several years rewriting DBCC CHECKDB and repair. For SQL Server 2008, Paul managed the Program Management team for the core Storage Engine to become more focused on customer/partner engagement and feature set definition. In 2007, after 8.5 years on the SQL Server team, Paul left Microsoft to join his wife, Kimberly Tripp, running SQLskills.com and pursuing his passion for presenting and consulting. Paul regularly presents at conferences and user groups around the world on high-availability, disaster recovery and Storage Engine internals. His popular blog is at http://www.sqlskills.com/blogs/paul/.

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement
Microsoft
Page view tracker