Managing Schema Changes (Part 1)


Question: What are best practices for managing schema changes in SQL Server 2005?  What are the fastest, least intrusive techniques for high availability?

 

Answer: As business requirements evolve over time, managing schema changes becomes a common task.  While SQL Server 2005 supports a wide range of schema changes (e.g. adding columns and constraints, changing column data types and properties, and dropping columns and constraints), in Managing Schema Changes (Part 1) we will focus on the performance and concurrency issues that should be considered for large tables in typical scenarios.  In Managing Schema Changes (Part 2) we will cover additional best-practice techniques to maximize performance and concurrency.

 

Let’s assume we have a billion-row table, which is quite common these days.  Schema changes have to be completed in the fastest, least intrusive manner with high availability.

Some schema changes include implicit reads and writes.  These operations extend the duration of the schema change transaction and can reduce concurrency on large tables by taking shared locks for reads and exclusive locks for writes.  For example, adding a CHECK constraint requires a scan of the table to enforce the CHECK constraint.  Implicit updates are performed for schema changes involving NOT NULL columns, base data type changes, or column length changes.  These implicit reads and writes can be observed either via Profiler statement-level tracing or by using SET STATISTICS IO ON.

 

To trace these extra operations in Profiler, select the following events:

  1. Under the TSQL event category, select SQL:StmtStarting and SQL:StmtCompleted.
  2. Under the Stored Procedures event category, select SP:StmtStarting and SP:StmtCompleted.
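Alternatively, the implicit I/O can be observed from a query window with SET STATISTICS IO ON.  A minimal sketch (the column and constraint names here are illustrative, not from the examples below):

```sql
-- Observe implicit reads caused by a schema change.
SET STATISTICS IO ON;

-- Adding a CHECK constraint scans the table; the Messages pane
-- should report logical reads against MyTable.
ALTER TABLE MyTable
    ADD MyPct numeric(4,1) NULL
    CONSTRAINT MyPctChk CHECK (MyPct BETWEEN 0 AND 100);

SET STATISTICS IO OFF;
```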

 

Adding columns – ALTER TABLE MyTable ADD NewColumn …

1. NULL, no constraint

alter table MyTable
add MySSN numeric(9,0) NULL

Profiler observations (StmtStarting & StmtCompleted): no extra statements.

Performance & concurrency impact: Fastest.  Metadata-only change; best performance and concurrency.

2. NULL, DEFAULT

alter table MyTable
add MyDate2 datetime NULL
default getdate()

Profiler observations: no extra statements.

Performance & concurrency impact: Fastest.  Metadata-only change; best performance and concurrency.

3. NULL, CHECK

alter table MyTable
add MyPct3 numeric(4,1) NULL constraint MyChk3
check (MyPct3 between 0 and 100)

Profiler observations: samples statistics, then reads the entire table to enforce the CHECK constraint.

Performance & concurrency impact: Faster than NOT NULL, but performs reads for the CHECK constraint.  In large tables, the read activity for the CHECK constraint could block writes.

4. NULL, CHECK, DEFAULT

alter table MyTable
add MyInt4 int NULL default 0
constraint myChk4
check (MyInt4 between 0 and 100)

Profiler observations: samples statistics, then reads the entire table to enforce the CHECK constraint.

Performance & concurrency impact: Faster than NOT NULL, but performs reads for the CHECK constraint.  In large tables, the read activity for the CHECK constraint could block writes.

5. NULL, DEFAULT, WITH VALUES

alter table MyTable
add MyInt5 int NULL default 0
with values

Profiler observations: updates the entire table with the default value:

UPDATE [Northwind].[dbo].[MyTable]
SET [MyInt5] = DEFAULT

Performance & concurrency impact: Slower than NULL & DEFAULT (see #2) because the WITH VALUES clause updates every row in the existing table.  In large tables, the batch update could cause concurrency issues.

6. NULL, CHECK, DEFAULT, WITH VALUES

alter table MyTable
add MyInt6 int NULL default 0
with values
constraint myChk6
check (MyInt6 between 0 and 100)

Profiler observations: updates the entire table with the default value, samples statistics, then reads the entire table to enforce the CHECK constraint:

UPDATE [Northwind].[dbo].[MyTable]
SET [MyInt6] = DEFAULT

Performance & concurrency impact: Slowest, because the WITH VALUES clause requires updates with default values plus reads for the CHECK constraint.  There could be significant concurrency issues in large tables due to the batch update for the default value and the read activity for the CHECK constraint.

7. NOT NULL, DEFAULT

alter table MyTable
add MyDate7 datetime NOT NULL
constraint myDft7
default getdate()

Profiler observations: NOT NULL requires an update of the entire table with the default value:

UPDATE [Northwind].[dbo].[MyTable]
SET [MyDate7] = DEFAULT

Performance & concurrency impact: Slower than NULL (see #2) because NOT NULL requires updates for default values.  Possible concurrency issues with large tables, since the batch update holds exclusive locks.

8. NOT NULL, CHECK, DEFAULT

alter table MyTable
add MyInt8 int
NOT NULL
constraint myDft8
default 0
constraint myChk8
check (MyInt8 between 0 and 100)

Profiler observations: updates the entire table with the default value, samples statistics, then reads the entire table to enforce the CHECK constraint:

UPDATE [Northwind].[dbo].[MyTable]
SET [MyInt8] = DEFAULT

Performance & concurrency impact: Slowest, due to updates for default values plus reads for the CHECK constraint.  There could be significant concurrency issues in large tables due to the batch update for the default value and the read activity for the CHECK constraint.

 

The fastest way to add a new column to a large table (with or without a DEFAULT constraint) is to use the NULL property (see examples #1 and #2 above).  Although there is a DEFAULT constraint in example #2, MyDate2 will contain NULLs for all existing rows.

 

Including a CHECK constraint on a large table will cause a read of the entire table to enforce the CHECK constraint (see examples #3, #4, #6 and #8).  Note that while examples #3 (MyPct3) and #4 (MyInt4) contain NULL values for the newly added columns, a NULL value will not fail the CHECK constraint.  NULL simply means that we don’t know the value.  However, if the value IS known, the CHECK constraint forces the known value to comply with the check requirement.
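A quick sketch illustrates this behavior with the MyPct3 column from example #3 (assuming the table’s other columns accept NULLs or have defaults):

```sql
-- NULL does not violate the CHECK constraint: the value is unknown.
INSERT INTO MyTable (MyPct3) VALUES (NULL);   -- succeeds

-- A known value must satisfy the constraint.
INSERT INTO MyTable (MyPct3) VALUES (150.0);  -- fails: CHECK violation
```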

 

Batch updates occur with examples #5 through #8.  If the WITH VALUES clause is added to NULL and DEFAULT (see examples #5 and #6), all rows will be updated with the DEFAULT value.  Adding a column with the NOT NULL property (see examples #7 and #8) to an existing table requires a DEFAULT constraint.  The NOT NULL / DEFAULT combination causes a batch update to occur.  The net effect: batch updates hold exclusive locks and reduce concurrency, falling short of our performance and availability goals.

 

In addition to the writes, NULL example #6 and NOT NULL example #8 add CHECK constraints that require reads to enforce the CHECK, making these the worst of the bunch from a concurrency, performance, and availability point of view.

 

Changing column datatypes, default and check constraints, NULL properties

You can change a column’s base data type, or change its length (for numeric types this includes precision and scale), using the ALTER TABLE … ALTER COLUMN statement.  In these cases, performance and concurrency are affected by the accompanying batch update.  If a NULL property is changed to NOT NULL, a batch update also occurs.  The exception is changing a NOT NULL property to NULL, which is a fast, metadata-only operation.

 

Changing column property, type & length, adding column constraints – ALTER TABLE MyTable ALTER COLUMN …

Property change: NOT NULL to NULL

alter table MyTable
alter column MyDate7 datetime NULL

Profiler observations: none.  Fast, metadata-only change.

Property change: NULL to NOT NULL

alter table MyTable
alter column MyDate7 datetime NOT NULL

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MyDate7] = [MyDate7]

In a large table, concurrency issues could occur due to the accompanying batch update.

Add DEFAULT

ALTER TABLE MyTable
ADD CONSTRAINT MyDft
DEFAULT 0 FOR MySSN

Profiler observations: none.  Fast, metadata-only change.

Drop DEFAULT

ALTER TABLE MyTable
DROP CONSTRAINT MyDft

Profiler observations: none.  Fast, metadata-only change.

Add CHECK

ALTER TABLE MyTable
ADD CONSTRAINT MyDtChk
CHECK (MyDate2 <= getdate())

Profiler observations: samples statistics, then reads the entire table to enforce the CHECK constraint.

In a large table, concurrency issues could occur due to the accompanying reads for the CHECK constraint.

Drop CHECK

ALTER TABLE MyTable
DROP CONSTRAINT MyDtChk

Profiler observations: none.  Fast, metadata-only change.

Increase length: NUMERIC(9) to NUMERIC(11)

ALTER TABLE MyTable
ALTER COLUMN MySSN numeric(11,0)

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]

In a large table, concurrency issues could occur due to the accompanying batch update.

Data type change: NUMERIC to INT

ALTER TABLE MyTable
ALTER COLUMN MySSN int

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]

In a large table, concurrency issues could occur due to the accompanying batch update.

Data type change: INT to VARCHAR(9)

ALTER TABLE MyTable ALTER COLUMN MySSN varchar(9)

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]

In a large table, concurrency issues could occur due to the accompanying batch update.

Increase length: varchar(9) to varchar(15)

ALTER TABLE MyTable ALTER COLUMN MySSN varchar(15)

Profiler observations: none.  Fast, metadata-only change.

Decrease length: varchar(15) to varchar(10)

ALTER TABLE MyTable ALTER COLUMN MySSN varchar(10)

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]

In a large table, concurrency issues could occur due to the accompanying batch update.

Data type change: varchar(15) to char(15)

ALTER TABLE MyTable ALTER COLUMN MySSN char(15)

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]

In a large table, concurrency issues could occur due to the accompanying batch update.

Increase length: char(15) to char(20)

ALTER TABLE MyTable
ALTER COLUMN MySSN char(20)

Profiler observations: performs a batch update:

UPDATE [Northwind].[dbo].[MyTable] SET [MySSN] = [MySSN]

In a large table, concurrency issues could occur due to the accompanying batch update.

 

Dropping columns

Dropping columns is a metadata-only operation.  There are no concurrency issues unless you rebuild the object to reclaim the space from the dropped column.
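As a sketch, the drop itself is instantaneous; only the optional space reclamation touches data pages.  (DBCC CLEANTABLE reclaims space only from dropped variable-length columns; it reads and rewrites pages, so it reintroduces the very concurrency impact the drop avoids.)

```sql
-- The drop itself is a metadata-only change.
ALTER TABLE MyTable DROP COLUMN MyInt5;

-- Optional: reclaim space left by dropped variable-length columns.
-- This touches data pages, so schedule it for a maintenance window.
DBCC CLEANTABLE ('Northwind', 'MyTable');
```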

 

Unique and Primary Key Constraints

Adding UNIQUE and PRIMARY KEY constraints involves reads, sorts, and writes.  By default, a UNIQUE constraint adds a nonclustered index, while a PRIMARY KEY adds a clustered index.  Obvious performance and concurrency issues occur when you add a PRIMARY KEY (and its default clustered index) to a large table that already has UNIQUE constraints (and their nonclustered indexes).  This is because building the clustered index rebuilds the nonclustered indexes, replacing RIDs (row IDs) with the clustering key.

 

ALTER TABLE, CREATE INDEX, and ALTER INDEX include ONLINE options for managing unique and primary key constraints and indexes.  Where possible, for best performance and the highest concurrency and availability, the ONLINE options should be used when managing indexes.  Otherwise, it would be wise to postpone these operations to batch maintenance windows.
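For example, a sketch of the ONLINE option (which in SQL Server 2005 requires Enterprise Edition; MyID and the index names here are illustrative):

```sql
-- Add a primary key (and its clustered index) while keeping the
-- table available to concurrent readers and writers.
ALTER TABLE MyTable
    ADD CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (MyID)
    WITH (ONLINE = ON);

-- The same option applies to standalone index builds and rebuilds.
CREATE UNIQUE NONCLUSTERED INDEX UQ_MyTable_MySSN
    ON MyTable (MySSN)
    WITH (ONLINE = ON);
```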

 

Conclusions and best practices:

We can effectively manage schema changes in SQL Server 2005 by understanding its behavior.  This insight helps us avoid the undesirable activities that accompany some schema changes, such as unintended reads and writes, enabling us to use the fastest, least intrusive techniques for high concurrency and availability.

 

The fastest schema changes occur when:

  1. Adding a column with NULL property
  2. Adding a column with NULL property and DEFAULT
  3. Changing NOT NULL to NULL property
  4. Adding DEFAULT constraint
  5. Dropping CHECK or DEFAULT constraint

 

The slowest schema changes have accompanying read and write operations such as:

  1. CHECK constraints require a read of the entire table to enforce the CHECK.
  2. Batch updates occur when adding a NULL column in combination with DEFAULT and WITH VALUES clauses
  3. Batch updates occur when adding NOT NULL column (to provide Default value)
  4. Batch updates occur when changing NULL to NOT NULL property
  5. Batch updates occur when column datatypes are changed or when lengths change.  The only exception is increasing the length of varchar.

 

In conclusion, a schema change best practice for performance and concurrency on large tables is to add columns with the NULL property.  If it’s important to disallow NULL values, the application would have to enforce this.  In “Managing Schema Changes (Part 2)”, we’ll see how concurrency and performance can be maximized by de-coupling the batch update from a newly added column using the NULL property.  The batch update is then replaced with a piecemeal update for high concurrency and performance.  This technique can also be used to change the type or length of columns without incurring the concurrency-crippling batch update.

 

Tom Davidson

SQL Server Customer Advisory Team
