Hi All,

Our Premier Field Engineering team has provided us with some detailed technical information related to SQL Server Upgrades and SQL 2008 Performance Tuning.  I thought that this information may be helpful, so decided to make this information more accessible. 

Here is a link to several scripts, intended for SQL 2005/2008- These are provided as is, where as, with no warranty given or implied. Please be sure to test these in your test environments before using on a production system.

Upgrade related information:

SQL 2008 Upgrade Technical Reference

http://www.microsoft.com/downloads/details.aspx?familyid=66D3E6F5-6902-4FDD-AF75-9975AEA5BEA7&displaylang=en

SQL Server 2005 Upgrade Technical Reference Guide

http://www.microsoft.com/downloads/details.aspx?FamilyID=3d5e96d9-0074-46c4-bd4f-c3eb2abf4b66&DisplayLang=en

SQL 2008 Technet Virtual Labs

http://technet.microsoft.com/en-us/cc164207.aspx

All Technet Virtual Labs

http://technet.microsoft.com/en-us/bb467605.aspx

SQL 2008 Failover Clustering White Paper.

http://download.microsoft.com/download/6/9/D/69D1FEA7-5B42-437A-B3BA-A4AD13E34EF6/SQLServer2008FailoverCluster.docx

SQL Server 2005 Failover Clustering White Paper

http://www.microsoft.com/downloads/details.aspx?FamilyID=818234dc-a17b-4f09-b282-c6830fead499&DisplayLang=en

SQL 2008 Demos

http://www.microsoft.com/sqlserver/2008/en/us/demos.aspx

SQL 2005 Demos

http://www.microsoft.com/sqlserver/2005/en/us/demos.aspx

SQL 2005 Hosted trial (free time to play with SQL 2005)

http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032293449&EventCategory=3&culture=en-US&CountryCode=US

Performance tools and related information:

SQL Server 2005 Performance Dashboard Reports ( http://www.microsoft.com/downloads/details.aspx?familyid=1d3a4a0d-7e0c-4730-8204-e419218c1efc&displaylang=en).   A quick synopsis of the Performance Dashboard can be found here: (http://blogs.msdn.com/psssql/archive/2007/03/30/sql-server-2005-performance-dashboard-reports.aspx).

A SQL Server 2005 Dynamic Management View Performance Data Warehouse - SQL DMVStats (http://www.codeplex.com/sqldmvstats)

SQLNexus - SQL trace analysis tool.  The tool has been officially moved to our CodePlex site as well ( http://www.codeplex.com/sqlnexus).

SQL Server 2005 Performance Statistics Script ( http://blogs.msdn.com/psssql/archive/2007/02/21/sql-server-2005-performance-statistics-script.aspx)

PAL (Performance Analysis of Logs) – Analyze perfmon logs captured with system monitor (perfmon)

http://www.codeplex.com/PAL

RML Utilities for Microsoft SQL Server information RML Utilities for SQL Server

(x86) -  http://www.microsoft.com/downloads/details.aspx?FamilyId=7EDFA95A-A32F-440F-A3A8-5160C8DBE926&displaylang=en

(x64) -  http://www.microsoft.com/downloads/details.aspx?FamilyId=B60CDFA3-732E-4347-9C06-2D1F1F84C342&displaylang=en

Troubleshooting Performance Problems in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx)

SQL Server 2005 Script Repository (http://www.microsoft.com/technet/scriptcenter/scripts/sql/sql2005/default.mspx)

Performance Tuning Using Waits and Queues ( http://download.microsoft.com/download/4/7/a/47a548b9-249e-484c-abd7-29f31282b04d/Performance_Tuning_Waits_Queues.doc).

I would also highly recommend that you check out the following book:  SQL Server 2005 Practical Troubleshooting: The Database Engine (http://www.amazon.com/SQL-Server-2005-Practical-Troubleshooting/dp/0321447743/ref=sr_1_1/102-6286043-0154537?ie=UTF8&s=books&qid=1185328220&sr=8-1).

TEMPDB information:

TEMPDB Capacity Planning and Concurrency Considerations for Index Create and Rebuild (http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/tempdb_capacity_planning_index.mspx)

Working with Tempdb in SQL Server 2005 ( http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx)

Statistics and Query Compilation:

Batch Compilation, Recompilation, and Plan Caching Issues in SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx)

Statistics Used by the Query Optimizer in Microsoft SQL Server 2005 (http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx)

Query Recompilation in SQL Server 2000 (http://msdn2.microsoft.com/en-us/library/aa902682.aspx)

Statistics Used by the Query Optimizer in Microsoft SQL Server 2000 (http://msdn2.microsoft.com/en-us/library/aa902688(SQL.80).aspx)

Miscellaneous information and Blogs:

SQL Server 2005 SP2 will introduce new messages to the error log related to timing activities (http://blogs.msdn.com/psssql/archive/2006/11/27/sql-server-2005-sp2-will-introduce-new-messages-to-the-error-log-related-to-timing-activities.aspx).

SQL and the Working Set Blog regarding OS memory trimming ( http://blogs.technet.com/askperf/archive/2007/05/18/sql-and-the-working-set.aspx)

Fragmentation Station Series (http://chadhoc.net/?tag=/fragmentation)

SQL Server 2005 Partitioned Tables and Indexes

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql90/html/sql2k5partition.asp

Online Indexing Operations in SQL Server 2005

http://www.microsoft.com/technet/prodtechnol/sql/2005/onlineindex.mspx

SQL Server 2005 NUMA Basics (http://blogs.msdn.com/psssql/archive/2008/01/24/how-it-works-sql-server-2005-numa-basics.aspx)

Below are some "rules of thumb" Performance Monitor counters that I typically use to troubleshoot issues from a Windows and SQL Server perspective.  These numbers are general best practice numbers and should not be construed as official thresholds from Microsoft.  There is also a link below to a webcast that you may find useful in determining which counters to pay attention, what those counters actually mean, and what values should be considered acceptable.

TechNet Webcast: SQL Server Performance Counter Guidance (Level 300) ( http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?EventID=1032357639&EventCategory=4&culture=en-US&CountryCode=US)

Memory\Available Mbytes <= rule of thumb is > 100 MB

Memory\Free Page Table Entries <= rule of thumb is > 5000

Memory\Pages Input/Sec <= rule of thumb is > 10

Paging File\%Usage <= rule of thumb < 70%

Paging File\%Usage Peak <= rule of thumb < 70%

Processor\%Privileged Time <= rule of thumb < 30% of Total %Processor Time

Processor\%Processor Time <= rule of thumb < 80%

System\Context Switches\Sec <= rule of thumb 1500 - 3000 is excellent, 3001- 5,999 good, > 6000 poor.  You may have to divide this number by the number of CPU's in the system.

System\Processor Queue Length <= rule of thumb < 4 per CPU excellent, 5 -11 good, > 12 poor

*Physical Disk\ Avg. Disk Queue Length

*Physical Disk\Avg. Disk sec\Read

*Physical Disk\Avg. Disk sec\Write

*Physical Disk\% Idle Time

SQLServer:Access Methods\Forwarded Records/sec <= rule of thumb < 10 for every 100 batches (or batch requests/sec)

SQLServer:Access Methods\Full Scans/sec <= rule of thumb (Index Searches/sec)/(Full Scans/sec) > 1000

SQLServer:Access Methods\Page Splits/sec <= rule of thumb < 20 for every 100 batches (or batch requests/sec)

SQLServer:Buffer Manager\Buffer cache hit ratio <= rule of thumb > 97%

SQLServer:Buffer Manager\Page life expectancy <= rule of thumb >= 300

SQLServer:Buffer Manager\Page reads/sec <= rule of thumb  < 90

SQLServer:Buffer Manager\Page writes/sec <= rule of thumb  < 90

SQLServer:Memory Manager\Memory Grants Pending <= rule of thumb 0

SQLServer:SQL Statistics\Batch Requests/sec

SQLServer:SQL Statistics\SQL Compilations/sec <= rule of thumb 10 - 20% of the number of batch requests\sec is a good number (or 0)

SQLServer:SQL Statistics\SQL Re-compilations/sec <= rule of thumb 10 - 20% of the number of batch requests\sec is a good number (or 0)

************************************

Logical Disk/Physical Disk

Both are to be captured and monitored due to virtualized disk environments.

\%idle Time

*             100% idle to 50% idle = Healthy

*             49% idle to 20% idle = Monitor

*             19% idle to 0% idle = Out of Spec

Note:  %idle Time can be a reasonable indicator of disk interface pressure.

On very large SAN's offering a LUN totaling (100's of drives) this counter can be 0% and still be okay, but if we see normal levels and then 0% at the precise time of reported problems, then treat it as a valid issue.  It is important to understand how many disks back up a particular array and how many of those are "bundled together" and presented as a LUN to Windows to make this counter more meaningful.

\%Avg. Disk Sec Read or Write

*             .001ms to .012ms  = Healthy

*             .013ms to .025 = Monitor

*             .026ms or greater = Out of Spec

\Avg. Disk Queue Length

*             2.5 per spindle with infrequent spikes up to around 20 (but please keep in mind that this value can become unreliable because of various SAN virtualization techniques).

Note:  If %idle is low and Avg Disk/Second Read or Write is in the "Healthy Range" => Suggestive of Interface or Infrastructure issues such as Queue Depth, Fiber switch oversubscription, etc. Something may be gating I/O in the fiber path, but the JBOD is responsive once the I/O request reaches it JBOD.

If %idle is high and Avg Disk/Second Read or Write is in the "Monitor or Out of Spec Range" =>  Suggestive of JBOD issues.  The storage I/O requests are quickly getting to the JBOD, but LUN/Cache performance or configuration is the most likely culprit.

\Disk Transfers/sec

Note:  It is necessary to know the disk speed and the type of I/O in order to determine the maximum throughput.  As a Rule of thumb measurement for database I/O, you can use something like 80 Transfers/sec (per 10,000 of RPM).  Using this as a baseline, a SAN LUN offering (for example) 800 disk Transfers\sec, could be said to be the rough performance equivalent of having ten 10,000 RPM hard drives of direct attached storage ... relatively speaking.  Caching on the SAN controllers effectively nullifies this for writes and can yield a significant performance gain in write transfers; but only as long as the cache is not overrun.

The above listed limits are per spindle, not an overall limit for a RAID set. Due to RAID set design, the limit or RAID set throughput difficult to extrapolate.  Below is a summary of the Disk I/O per second generated for each type of RAID configuration based on a given number of reads and writes per second.

  RAID 0: READS + WRITES = I/Os / sec

  RAID 1: READS + (2*WRITES) = I/Os / sec

  RAID 5: READS + (4*WRITES) = I/Os / sec

  RAID 0+1: READS + (2*WRITES) = I/Os per second.

Also please check out the following article NT Server and Disk Subsystem Performance (http://www.microsoft.com/technet/archive/winntas/maintain/ntserv.mspx?mfr=true).  The above information regarding disk counters has its basis in the information found in that article (the Detecting Disk I/O Bottlenecks section).  The article is slightly dated, but the concepts still apply for the most part today.

Another whitepaper that was used reference to the above information Disk Subsystem Performance Analysis for Windows (http://www.microsoft.com/whdc/device/storage/subsys_perf.mspx).  There is a section entitled: Tools for Storage Performance Analysis that has some additional Performance Monitor counters that may be good to watch as well.

And finally some SQL Server I/O related whitepapers:

Predeployment I/O Best Practices (http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/pdpliobp.mspx)

SQL Server I/O Basics (SQL Server 2000) (http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx)

SQL Server I/O Basics, Chapter 2 (SQL Server 2005) (http://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspx)

Nick.