Every once in a while, a process gets too complex for humans to manage consistantly. For example, when Boeing’s B-17 (the “flying fortress”) was first being evaluated, it crashed during initial tests due to pilot error. The consensus became that it “was too much airplane to fly.” Fortunately (for Boeing), someone came up with the idea of using checklists for each stage of flight. This ensured that the correct steps were taken at the appropriate time. Ultimately, this saved the program and subsequently the B-17 flew 1.8 million miles without an accident.
The SQL Server checklist is a work in process. If you have any additions or suggestions, send it to me and I'll trying to include it. For example, there’s not much here about setting up a cluster because I don’t have a lot of experience with clustering. Most of these items should be considered suggestions, rather than mandates. There may be valid reasons to skip an item or implement alternatives; the point is that you should at least consider each entry.
Initial SAN Configuration
- Configure Storage System to RAID 10.
- Use DISKPART to specify disk offset to ensure volume alignment. (You need Windows 2003 SP1 or later). You can find alignment instructions here. This step should not be necessary with volumns created by Windows Server 2008.
- Change Disk Allocation Unit to 64k when formatting drive.
- If using EMC Clariion (or such): Balance Disks between service controllers.
SQL Server Setup
- Before creating other databases, create 1 tempdb file for each CPU (or core) and allocate these files across all data LUNs. Size tempdb to largest table, plus 10% or better, then turn Autogrow off.
- Run Surface Area Configuration Tool to turn off stuff not needed.
- Setup SQL Server’s service account as domain account, ideally with different accounts for different services.
- Get a CA signed certificate for SQL Server, rather than a self-signed certificate.
- Change and block default ports (1433, 1434).
- Isolate a minimum set of data onto Primary filegroup to get database available ASAP on recovery; everything else goes to one or more other filegroups.
- Create partitions with first and last partition always empty (each in distinct filegroup). Always create a new, empty partition before swapping in/out data or populating the table.
- Put static or VERY slowly changing tables into their own filegroup; mark it read-only. If it needs to be updated, you can flip the state back to read-write, change it, then flip it back. This filegroup does not need to be backed up regularly. If this read-only data can be relegated to a single database, then it can be accessed by multiple database servers.
Post Installation
- Ensure multiple HBAs are teamed, not failover
- Enable SAN Write Caching (if battery-backed controller). That is, ensure that database is not waiting for disk commits, but only cache commits. If possible allocate almost all cache for writes (say, 80% writes, 20% reads).
- If using NUMA hardware with dual core CPUs: Set Max Degree of Parallelism to 4.
- If using NUMA hardware with quad core CPUs: Set Max Degree of Parallelism to 8.
- Enable jumbo frames on the network. Change Network Packet Size in SQL Server to 8k to support jumbo frames.
- Enable teamed NICs on the client network.
- Set “Queue Depth” (or “Execution Throttle”) on the HBA to 64 or 128, or so. Ensure that other machines on the same SAN are also adjusted or this machine may dominate the SAN.
- Turn on CheckSums in all databases.
- Enable Read Committed Snapshots.
- Enable Snapshot Isolation.
- EXEC sp_configure ‘SMO and DMO XPs’, 0 to turn off these object libraries (this is the default on new 2005 installs).
- Give engine service account "Lock Pages in Memory" permission in group policy.
- Give engine service account "Manage Volume Name" permission in group policy.
- If this is a very large database, consider turning on TF 1224 to control lock escalation.
- Manually set speed setting for each NIC