You can learn a lot about performance and potential problem areas by simply watching the CPU usage of your running system over time. As I outlined in the previous posts in this series, after you have validated that there are no errors or deadlocks occurring on the system (see Understanding the Performance Characteristics of Your System: Part 1) and that the server and database properties are set as expected for Teamcenter (see Understanding the Performance Characteristics of Your System: Part 2), you can start the next step of CPU usage monitoring.

Start by running Windows Performance Monitor. Capture only the % Processor Time counters, and gather data over your company's business periods. You don't want to be inundated with data, so select business-specific slices. For example, log over a full day, then over periods of time when you know extraordinary processes occur, such as end-of-month processes, or when you know a large model will be checked into the system. You are building a portfolio of the norm of your system, so that you can recognize out-of-the-norm occurrences. You want to end up with a panoramic view, without having to work through huge Performance Monitor logs. So, be surgical in your monitoring slices, and monitor only % Processor Time. You can dig into particular areas later.

To minimize the impact on the computer that is running SQL Server and Teamcenter, run and log Performance Monitor on a remote computer. See Windows Performance Monitor for more information about how to do this.

Next, review your Performance Monitor logs for patterns. If nothing other than a fairly smooth CPU usage line appears, as illustrated in the first image below, you probably have no outliers CPU-wise. This does not mean that there is no performance to be found or CPU usage to be improved: there is always some level of performance to be found. You are trying to understand the overall picture and its health.

Common CPU Patterns

Ideally, your CPU usage in Performance Monitor should be fairly smooth and even, and this usually indicates that there are no major performance outliers. Smooth patterns also make for a more predictable system, and it is much easier to do capacity planning with a smooth pattern. If you are running at about 50 percent average CPU usage and you have a consistent and predictable smooth pattern, you know you have some breathing room. You generally need to start looking at upgrading your system when you are at about 80 percent CPU usage.

Enterprise systems need to be predictable as well, and smooth CPU usage patterns reflect that predictability. Although it can be very rewarding and impressive to tune a number of individual queries to achieve factors of performance gains, predictable systems are what allow you to sleep at night.

The following image is an example of a smooth and even CPU pattern (please note that all of the examples are Performance Monitor screenshots, but the red CPU lines have been hand-drawn to illustrate the pattern):

image

 

CPU Spikes.

Spikes are indicators of potential resource contention, and you need to investigate each one if possible. Some common reasons for CPU spikes can be one (or both) of the following.

Timed events. Some spikes might be the result of batch or maintenance jobs that run on a schedule. These are good candidates for optimization efforts as they tend to be contained processes (for example, a nightly batch job), making it easier to find and tune them effectively. They also tend to produce spiky CPU usage even after optimization, as batch jobs by nature tend to be complex processes. If the CPU pattern is spiky during those jobs, but not pegged or plateaued for long periods of time, you can accept these as part of your system norm for now.

The following image is an example of a timed events CPU pattern:

image

Hash joins. The SQL Server Query Optimizer currently supports three types of join operators (nested loop, merge join, and hash join), which, along with other operators, are used to produce reusable query execution plans. When there are no supporting indexes for a specific query, the optimizer will often have to use a hash join to return the query result set. This results in increased CPU usage, because it needs to build hash tables and perform operations in memory to correctly limit the result set.

Spiky CPU usage as a result of hash joins is a scenario I see often in my performance optimization work. Your efforts to investigate and eliminate it tend to be well-spent because this can result in much lower overall CPU usage. This takes you closer to the desired smooth pattern and results in more predictable usage as well. How to find and eliminate unneeded hash joins will be covered in an ancillary post.

The following image is an example of a hash join CPU pattern:

image

 

CPU Valleys.

Valleys are usually not an issue; they reflect the state when there is either low CPU usage or low general system load (fewer users or processes running on the server).

The following image is an example of a CPU valleys pattern:

image

 

Pegged CPU.

The term "pegged CPU" refers to the % CPU Processor time being at or near 100 percent of available CPU time. This type of bottleneck cannot be tolerated in an enterprise environment for long because it will impede all other processes, time out other processes, and start to cause application errors. It is the equivalent of the completely blocked morning commute: Ferraris and Yugos are all moving at the same speed. So, don't spend any more effort evaluating the system as a whole. Immediately determine what is causing the CPU pegging, and solve that problem.

One of the best and least impactful ways to narrow down what queries are causing this problem is to use Performance Monitor and SQL Server Profiler integrated together. Be aware that running and logging with both SQL Server Profiler and Performance Monitor at the same time—and gathering all the counters that are required to not only correlate the two tools but identify the offending code—will impact a system. So, don't use it for gathering the general usage picture described earlier. Although you should avoid use of these tools in this manner on a running system, a pegged CPU is a serious enough problem that it warrants the impact, because it can be one of the fastest ways to get to the root of the problem. See How to Correlate a Trace with Windows Performance Log Data (SQL Server Profiler) for more information.

The following image is an example of a pegged CPU pattern:

image

Next Time

Next, I'll talk about how to get a feel for the types of workloads that Teamcenter might produce on SQL Server and the impact of each type.


Sharon Bjeletich is a database architect and performance and scalability expert who has worked with SQL Server since version 4.2. She is a former senior program manager in the Microsoft SQL Server product group and a former technical officer for the World Health Organization. Currently, she is an independent consultant at SQL Scalability.