How do you know if you have something out of the ordinary occurring on your database application system if you don't have a good feel for what "ordinary" looks like?

Finding the Norm

It is difficult to determine if there is a performance bottleneck or other database problem when you don't have a good picture of what the performance of the system as a whole looks like. This can also lead to the common practice of spending time optimizing perceived bottlenecks that may or may not have a significant impact. You want to focus on the areas that will bring you the largest performance return. In this series of posts, I will cover how to monitor and evaluate a running system to get a better feel for actual performance characteristics.

The goal is to understand the following things:

  • The types of workloads that the system carries
  • What CPU usage usually looks like and why
  • If there are any errors or deadlocks
  • If specific queries are consuming too many resources, and how to go about fixing that
  • What data access patterns might look like
  • What scalability might look like (Will the system get slower with more data or users?)
  • What performance typically looks like so that you can identify out of the ordinary events

SQL Server provides all the tools that you need to answer these questions and to better understand how the queries being run and the results being returned affect the hardware. This series of posts will introduce you to one methodology for using these tools in combination to look at a current running system, with the goal of better understanding what the overall performance characteristics are.

Load Test Suites vs. Live System Monitoring

A quick aside: it is always best to have a load test suite to emulate actual system activity. This lets you monitor and investigate without affecting a running system. This is not always possible in the real world, so I will be applying a methodology that affects a running system in a minimal way.

Sanity Checks

Before you start looking at your database application in-depth, you want to verify that there are no external errors that might be affecting the system. First, take a look at the SQL Server and Windows logs. Ensure that there are no critical errors and that there have not been any errors over a significant amount of time. Verify that there are no hardware errors as well. You want to focus on the database layer only, during normal runs.

Deadlock Detection

You should also ensure that no deadlocks are occurring on the system, because they will have unpredictable effects and interfere with your picture of "ordinary." A best practice for database applications is to always trap, retry a number of times, and then throw an error if that doesn't work. This means that you should be getting errors when the application is running.

There are many different ways to determine if deadlocks are occurring. I personally do not like to have SQL Server Profiler traces running for a long time, even with a small number of events being captured. So, I am going to suggest that you turn on deadlock detection by using a trace flag. The following command will turn on the deadlock trace, which will then write the deadlock information to the error logs:

DBCC TRACEON (1204,-1)

(The -1 parameter indicates that the flag should be applied globally. See SQL Server Books Online for detailed trace flag information.) This will affect your system only when there is a deadlock and only by writing the data to the log. However, deadlocks are potentially serious enough that it is worth this impact to find them and fix them. Check your error logs for these errors during the day, for a day or two if needed. You do not want to go on to the next step without ensuring that there are no deadlocks or other errors in the system. If there are deadlocks, you need to resolve these before you go on (I'll provide more details in an upcoming post).

If you do not see any deadlocks at the current time, turn off the deadlock trace flag to set your system back to the norm:

DBCC TRACEOFF (1204,-1)

Peeling the Onion

Now that you looked at the anomalies that can affect your system, you can consider the system to be running at the norm, and you can start to understand the performance characteristics.

The best way to watch a system without affecting it (thus only watching yourself) is to "peel the onion," by looking at a thin slice and then drilling down to the next slice based on the previous one. Note that there are many ways to monitor database systems; in this exercise, I will look at one methodology.

Next time

The first part of the onion that you will peel will be system CPU. I'll discuss this in the next post in this series.


Sharon Bjeletich is a database architect and performance and scalability expert who has worked with SQL Server since version 4.2. She is a former senior program manager in the Microsoft SQL Server product group and a former technical officer for the World Health Organization. Currently, she is an independent consultant at SQL Scalability.