You'll often hear that you should monitor the performance of SQL Server. You may read a little about performance monitoring, and you may turn on a few counters or perform a query against a dynamic management view that you know about. But, you may still wonder "Are these numbers good or bad?"

To determine if something is bad, you need to know what it looks like when it is good. Sounds obvious doesn't it? By creating a performance baseline, you can learn what your numbers are when your system is performing well. A performance baseline includes a single performance chart that is accompanied by an interpretation of the results, based on your environment.

To establish your performance baseline against Workforce Central, you'll need to find a time when the performance of your SQL Server environment is considered normal. For example, no users are complaining about slow responses, no backups or large jobs are running, and no "special" processing is taking place. Once you find that time, you'll need to collect a range of Windows Performance Monitor (perfmon) counters, information from dynamic management views, and maybe even a small SQL Server Profiler trace. Then, you can use the results of your collection as the starting point for subsequent performance collections. How do the new numbers compare to the baseline numbers, when everything was fine? Did one counter go up or down? Did several numbers change? Having something to compare the current numbers with can help you identify the source of new performance bottlenecks.

What Should You Monitor?

The actual counters, dynamic management views, or SQL Server Profiler trace events that you should collect are based on your system setup. But, the counters that we list below are a good place to start. If you capture these counters, you should have enough information to determine if you are having a performance issue—and if you are having an issue, which area is the source.

Note: Many of the counters that we list below list a threshold. These threshold numbers are not written in stone, and your actual values may be different. It is important to note that a standard threshold number is a starting point—if your value is a little higher or a little lower, the values that you see during your performance baseline collection become your new thresholds.

Monitoring the Disk Subsystem

There are several methods to monitor the disk subsystem. Since the disk subsystem is getting more and more complex each year, we recommend that database administrators monitor the following Performance Monitor counters to understand the latency of their disk I/O requests.

These two counters should provide you with enough information to determine if you have disk I/O bottlenecks. You should monitor these counters at both the host layer and the virtual server layer.

As with all performance monitoring, you should monitor on a continuous basis and then report daily or hourly, as your situation demands. This prevents nonpeak hours from watering down the peak-hour readings and distorting the performance measurements.

LogicalDisk(*): Avg. Disk Sec/Read.

This counter measures the average time, in seconds, of a read of data from the disk.

Thresholds:

· Less than 10 milliseconds (ms) = very good

· Between 10 and 20 ms = okay

· Between 20 and 50 ms = slow, needs attention

· Greater than 50 ms = serious I/O bottleneck

Note that Avg. Disk Sec/Read is a server-wide counter and therefore cannot be obtained through SQL Server 2005 dynamic management views.

LogicalDisk(*): Avg. Disk Sec/Write.

This counter measures the average time, in seconds, of a write of data to the disk.

Thresholds:

· Less than 10 ms = very good

· Between 10 and 20 ms = okay

· Between 20 and 50 ms = slow, needs attention

· Greater than 50 ms = serious I/O bottleneck

Note that Avg. Disk Sec/Write is a server-wide counter and therefore cannot be obtained through SQL Server 2005 dynamic management views.

Monitoring the CPU

We also recommend that database administrators monitor the following counters to understand the utilization of the host and virtual server CPU resources.

These three counters should provide you with enough information to determine if you have CPU bottlenecks. You should monitor these counters at both the host layer and the virtual server layer.

As with all performance monitoring, you should monitor on a continuous basis and then report less frequently—either daily or hourly. This gives you a more accurate picture of your performance.

System: Context Switching.

A high rate of context switching indicates resource queuing. In environments that have high rates of context switching, you should be careful to limit the applications and services that are placed on the boxes that are displaying the high context switch rates. The average value for this counter should remain below 1,000 per processor. Higher rates indicate that there may be CPU pressure, but this value should not be used as a single indicator of CPU pressure.

To use SQL Server dynamic management views instead of Performance Monitor for this counter, run the following query.

--Will work against databases in 80 compatibility mode
SELECT 'Context Switching by Scheduler'
SELECT cpu_id --ID of CPU if affinity mask in use (255 otherwise)
,is_online --Whether being used by SQL Server
,context_switches_count
,current_tasks_count
,current_workers_count
,active_workers_count
,pending_disk_io_count
,* 
FROM Sys.dm_os_schedulers

Processor(_Total): Privileged Time.

This counter measures the percentage of elapsed time that the process threads spent executing code in privileged mode. Average values above 10 percent indicate possible CPU pressure.

You should capture this counter for each individual processor.

Note that Privileged Time is a server-wide counter and therefore cannot be obtained through SQL Server 2005 dynamic management views.

Processor(_Total): Processor Time.

This counter measures the percentage of elapsed time that all process threads used the processor to execute instructions. Average values above 80 percent indicate possible CPU pressure.

You should capture this counter for each individual processor.

Note that Processor Time is a server-wide counter and therefore cannot be obtained through SQL Server 2005 dynamic management views.

To Be Continued

We will continue to talk about performance baseline and tracking memory usage tomorrow. Sign up for our RSS feed so you know when the blog post is published.