Question: What are best practices for managing schema changes in SQL Server 2005? What are the fastest, least intrusive techniques for high availability?
Answer: A common requirement as business requirements evolve over time is managing schema changes. While SQL Server 2005 supports the changing of schema (e.g. adding columns and constraints, changing column data types and properties, and dropping columns and constraints), in Managing Schema Changes (Part 1) we will focus on performance and concurrency issues that should be considered for large tables in typical scenarios. In Managing Schema Changes (Part 2) we will cover additional best practice techniques to maximize performance and concurrency.
Let’s assume we have a Billion row table which is quite common these days. Schema changes have to be completed in the fastest, least intrusive manner with high availability.
Some schema changes include implicit reads and writes. These operations, extending the duration of the schema change transaction, can reduce concurrency on large tables by taking shared locks for reads and exclusive locks for writes. For example, adding a CHECK constraint requires a scan to enforce the CHECK constraint. Implicit updates are performed for schema changes involving NOT NULLS, base data type changes or column length changes. These implicit reads and writes can be seen either via Profiler statement level tracing or by using SET STATISTICS IO ON.
To trace these extra operations in Profiler, select the following
Adding columns – alter table MyTable add NewColumn……
#
ColumnProperty
Column Constraints
Profiler Observations (StmtStarting & StmtCompleted)
Performance & Concurrency impact
1
NULL
N/A
alter table MyTable
add MySSN numeric(9,0) NULL
Fastest.
Best performance and concurrency.
Metadata only change.
2
DEFAULT
add MyDate2 datetime NULL
default getdate()
Best performance and concurrency
3
CHECK
add MyPct3 numeric(4,1) NULL constraint MyChk3
check (MyPct3 between 0 and 100)
Samples stats, then reads entire table to enforce check constraint.
Faster than NOT NULL, but does do reads for check constraint.
In large tables, the read activity for the check constraint could cause blocking for writes.
4
CHECK, DEFAULT
add MyInt4 int NULL default 0
constraint myChk4
check (MyInt4 between 0 and 100)
In large tables, the read activity for the check constraint could cause block writes.
5
WITH VALUES
add MyInt5 int NULL default 0
with values
Updates entire table with default value
UPDATE [Northwind].[dbo].[MyTable]
SET [MyInt5] = DEFAULT
Slower than NULL & DEFAULT (see #2) because the WITH VALUES clause does an update of all rows in existing table.
In large tables, the batch update could cause concurrency issues.
6
add MyInt6 int NULL default 0
constraint myChk6
check (MyInt6 between 0 and 100)
Updates entire table for default value, samples stats, then reads entire table to enforce check constraint.
SET [MyInt6] = DEFAULT
Slowest because WITH VALUES clause requires updates with default values & reads for check constraint.
There could be significant concurrency issues in large tables due to the batch update for the default value and read activity for check constraint.
7
NOT NULL
add MyDate7 datetime NOT NULL
constraint myDft7
NOT NULL requires update of entire table for default value
SET [MyDate7] = DEFAULT
Slower than NULL (see #2) because NOT NULL requires updates for default values. Possible concurrency issues with large tables since batch update holds exclusive locks.
8
add MyInt8 int
constraint myDft8
default 0
constraint myChk8
check (MyInt8 between 0 and 100)
SET [MyInt8] = DEFAULT
Slowest due to updates for default values & reads for check constraint
The fastest way to add a new column to a large table (with or without the DEFAULT constraint) is to use the NULL property (see example #1 and #2 above). Although there is a DEFAULT constraint in example #2, MyDate will contain NULLs for all rows.
Including a CHECK constraint on a large table will cause a read of the entire table to enforce the CHECK constraint (see examples #3, #4, #6 and #8). Note that while examples #3 (MyPct3) and #4 (MyInt4) contain NULL values for the newly added columns, a NULL value will not fail the CHECK constraint. NULL just simply means that we don’t know the value. However if the value IS known, the CHECK constraint forces the known value to comply with the check requirement.
Batch updates occur with examples #5 through #8. If the WITH VALUES clause is added to NULL and DEFAULT (see example #5 and #6), all rows will be updated with the DEFAULT value. Adding a column with NOT NULL property (see examples #7 and #8) to an existing table requires a DEFAULT constraint. The NOT NULL / DEFAULT combination causes a batch update to occur. Net, batch updates hold exclusive locks and reduce concurrency, falling short of our performance and availability goals.
In addition to the writes, NULL example #6 and NOT NULL #8 add CHECK constraints that require reads to enforce the CHECK, making these the worst of the bunch from a concurrency, performance and availability point of view.
Changing column datatypes, default and check constraints, NULL properties
You can change a column’s base datatype, or change its length (for numeric includes scale and precision) using the alter table alter column statement. In these cases, performance and concurrency is affected by the accompanying batch update. If a NULL property is changed to NOT NULL, a batch update also occurs. The exception is when changing a NOT NULL property to NULL. This is a fast metadata only operation.
Changing column property, type & length, adding column constraint – alter table MyTable alter column
Change
Profiler Observations
Property
NOT NULL to NULL
alter column MyDate7 datetime NULL
Fast, metadata only change
NULL to NOT NULL
alter column MyDate7 datetime NOT NULL
Performs batch update
UPDATE [Northwind].[dbo].[MyTable] SET [MyDate7] = [MyDate7]
In a large table, concurrency issues could occur due to the accompanying batch update.
Add Default
ALTER TABLE MyTable
ADD CONSTRAINT MyDft
DEFAULT 0 FOR MySSN
Drop Default
DROP CONSTRAINT MyDft
Add Check
ADD CONSTRAINT MyDtChk
Check (MyDate2 <= getdate())
In a large table, concurrency issues could occur due to the accompanying reads for the check constraint.
Drop Check
DROP CONSTRAINT MyDtChk
Fast, schema only change
Increase
Length
NUMERIC(9) to (11)
ALTER COLUMN MySSN numeric(11,0)
Performs update
UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]
In a large table, concurrency issues could occur due to the accompanying batch update
Data type
NUMERIC to INT
ALTER COLUMN MySSN int
INT to VARCHAR(9)
ALTER TABLE MyTable ALTER COLUMN MySSN varchar(9)
Increase Length
Varchar(9) to Varchar(15)
ALTER TABLE MyTable ALTER COLUMN MySSN varchar(15)
Decrease Length
Varchar(15) to Varchar(10)
ALTER TABLE MyTable ALTER COLUMN MySSN varchar(10)
Data type change
Varchar(15) to Char(15)
ALTER TABLE MyTable ALTER COLUMN MySSN char(15)
Char(15) to char(20)
ALTER COLUMN MySSN char(20)
Dropping columns
Dropping columns is a schema only operation. There are no concurrency issues unless you move the object to reclaim space from the dropped column.
Unique and Primary Key Constraints
Adding Unique and Primary Key constraints involves reads, sorts, and writes. By default UNIQUE constraints add nonclustered indexes while PRIMARY KEY adds a clustered index. Obvious performance and concurrency issues would occur when you add a PRIMARY KEY (and default CLUSTERED INDEX) to a large table that already has UNIQUE (and nonclustered index) constraints. This is because adding a Primary Key (and clustered index) would rebuild the nonclustered indexes replacing RIDs (row IDs) with Primary Keys.
Alter table, create and alter index includes ONLINE options for managing unique and primary key constraints and indexes. Where possible, for best performance, highest concurrency and availability, ONLINE options should be used for managing indexes. Otherwise, it would wise to postpone these operations to batch maintenance windows.
Conclusions and best practices:
We can effectively manage schema changes in SQL Server 2005 by understanding its behavior. This insight helps us avoid the undesirable associated activities occurring with some schema changes such as unintended reads and writes, enabling us to utilize the fastest, least intrusive techniques for high concurrency and availability.
The fastest schema changes occur when:
The slowest schema changes have accompanying read and write operations such as:
In conclusion, a schema change best practice for performance and concurrency of large tables includes adding columns with NULL properties. If it’s important to disallow NULL values, the application would have to do this. In “Managing Schema Changes (Part 2)”, we’ll see how concurrency and performance can be maximized by de-coupling the batch update of a newly added column using the NULL property. The batch update is then replaced with a piecemeal update for high concurrency and performance. This technique can also be used to change the type or length of columns without incurring the concurrency crippling batch update.
Tom Davidson
SQL Server Customer Advisory Team
マイクロソフトの植田です。 今回はテーブル・スキーマの変更に関する話題をご紹介したいと思います。 http://blogs.msdn.com/sqlcat/archive/2006/03/01/541550.aspx
PingBack from http://paidsurveyshub.info/story.php?id=75074