SharePoint Disk Allocation and Disk I/O
Had a good conversation with a large customer this morning at TechED SEA. They said we have questions about capacity planning, not about users and server and server performance, but about disks, sizing and that kind of thing.
<updated Dec 10>
Great performance and storage paper was recently posted with some recent best practices and recommendations: Performance recommendations for storage planning and monitoring.
</updated>
OS - The OS drive is under rated. A lot of people start out thinking that this drive doesn't matter. If you start with a 4GB partition for the OS drive you will regret it. Even if you start with 1GB or 2GB of RAM, you will quickly run out of disk. The IIS logs are stored here by default, the .NET cache and other cache is stored here as well as your page file not considering other OS and system files. These days I say 20GB is a good start. You want to have room to grow, you should also consider putting your IIS logs on another drive even at this size. The other thing is your index file (the large edb file) is by default going to sit on this drive, so unless you've got some experience it's better to find out you've been ok. Yes, you'll want to mirror this drive and speed is a factor. I wouldn't consider it crazy to have a 40GB drive even. These days more and more I'm seeing 8GB or 16GB of RAM on WFEs and Query servers, they are putting as much of the index in memory as possible and with caching, page file, etc... this disk and space is valuable. Know what happens when this disk runs out of space? Well you're down. It's a funny kind of down, pages might render partially and uploads will act wierd until you start getting 500 errors. You should monitor this drive and start on the bigger side. Disks are cheap, get 2 pretty FAST, quality disks and mirror them. These disks should almost always be local DAS disks. I can't think of any reason why not to.
APP - for WFE you may simply just put your IIS logs on this drive, but if it's also the query server you should put your Index on this drive. Better to do it before you start indexing. Performance on this disk is important. You can SAN attach this drive if indexing and query are not an issue for you. I know MMS has gone from SAN attached back to DAS to get better disk performance. On the index server speed is important and size is important. 30% of indexable content (which in reality may be close to 5-10%) is a conservative estimate for disk on the Index server. Query servers do have the possibility for master merge when more than X% of content has changed. I've put out some numbers on what MS IT has had in this space a couple of times. Having double the space on the query servers (reality check of 10-30%) for what the index server has is what is documented. I rarely find someone who hasn't over architected the disk for the index drive. Most people are conservative here. If you're indexing TB of content, having the ability to grow or add disk becomes very attractive.
Databases - first off make sure none of the database drives or transaction log drives are being scanned by antirvirus. That's one of the first things that will drive you nuts when trying to figure out performance issues. Those files are active, and antivirus should leave those files alone. There are 3 categories in my mind of databases, those that have HIGH disk I/O needs and those that you can be more liberal with and save money with.
Speed - Disk I/O is super important on your Search db. That one first. Next the Temp database, SSP database and the Config db.
Safety - Your master, Search, SSP, and config databases should be safe. Meaning you want these things to be very tollerant. If one of these becomes unavailable you're having issues across the farm. You may consider backing these databases up more often than the others.
Storage - there are simply some databases that are large and Disk I/O isn't so high. These are your classic content databases. This is where the BULK of your storage is. More and more I'm hearing people put the content databases on RAID 5 volumes. Classic SQL tells people to put all these databases on 0+1 which is conservative and fine.
What about storage? (These are estimates and will vary, tell me when I'm wrong.)
Temp db/TLogs - 10% of planned content (Tran logs will vary based on backup frequency, and logging type, so this will vary)
Master, Model, Config - fixed disk space of 10-20GB
Content - I go with 200 or 300GB Volumes on the high side (RAID 0+1)
Dump/Backup - 300 or 400 GB Volumes matched up with the content volumes (RAID 5 or less), You could go 1 to 1 or 2 to 1 depending on backup dependencies, speeds, etc... You will get faster backup with more threads to more volumes on more spindles.
If you're doing snapshots, clones or DPM this will obviously be different. In a tape backup you want to make sure you have the same or more disk for backup unless you're using something like SQL litespeed or in the new world of SQL 2008 which no one should be using, yet as I write this. In the future this will make more sense.
I do recommend putting the config on a different disk than the Search DB, I'd even put the search DB on it's own spindles and volume. That db really has high disk IO... have I said that :)
Putting your transaction logs (LDFs) separate from any of the databases (MDFs) is a best practice. You do want to optimize for your transaction logs. Your rate of change will show through in the size of your T logs.
Don't forget across all these disks and databases you *will* see fragmentation over time. Don't forget that this is a regular part of maintance. Fragmentation can hurt your performance, and defragmenting your disks can optimize your performance and make things run a ton more smooth.
MS IT fact: In order of impact these are the 3 things that impact disk I/O:
1. Backup (DPM in their pilot is reducing this hit and the length of the perf hit)
2. Indexing (This can be throttled by number of threads and you can assign times for this to run) MS IT actually sets up "target" servers to reduce the hit on the WFE perf used by users, it's basically servers that sit out of load balancing. The crawler can easily generate 50% or more load than an actively used portal or collaboration site.
3. User Load
Other References:
I've previously posted some SharePoint relevant performance counters. You should look at Disk Queue length to see if your disk speeds are working for you. Anything past 1 is bad. During backups and indexing you'll likely see your highest queue lengths on your disks. If you're getting close 1 you're likely going to soon see queueing. 0 is the best :)
The capacity planning articles on TechNet have some guidance on planning your disks.
I recommend you read Bill Baer's post on stress testing, he's got some great references on Disk I/O including tools and some reference data. For example here are 2 Disk IO tools and some sample data from MS IT.
SQLIO.EXE Disk Subsystem Benchmark Tool
The I/O system is important to the performance of SQL Server. When configuring a new server for SQL Server or when adding or modifying the disk configuration of an existing system, it is good practice to determine the capacity of the I/O subsystem prior to deploying SQL Server. SQLIO.EXE can be used to determine the I/O capacity of a given configuration. An example of what we may see on a typical day during peak usage on a x64 A/P cluster hosting 100+ databases and supports 8 unique server farms, each with a local Shared Service Provider is:
IOs/sec: 7808.43
MBs/sec: 15.25
sqlio v1.5.SG
using system counter for latency timings, 3579545 counts per second
parameter file used: param.txt
file c:\testfile.dat with 2 threads (0-1) using mask 0x0 (0)
2 threads writing for 360 secs to file c:\testfile.dat
using 8KB sequential IOs
enabling multiple I/Os per thread with 8 outstanding
using specified size: 100 MB for file: c:\testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 3434.19
MBs/sec: 26.82
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 4
Max_Latency(ms): 282
SQLIOsim
SQLIOsim simulates the I/O patterns of Microsoft SQL Server 2005, of SQL Server 2000, and of SQL Server 7.0. The I/O patterns of these versions of SQL Server resemble one another. You can use SQLIOsim to simulate read, write, checkpoint, backup, sort, and read-ahead activities for Microsoft SQL Server 2005 (see also SQLIO.EXE Disk Subsystem Benchmark Tool).