It always fascinates me how differently people view Capacity Planning and Performance Analysis depending on their perspective. For instance, I could not count the number of times a developer or PM for a project has stated “we need a new server for our database”. My response, of course, is to question how they know that they need a dedicated server for their lone database, and invariably the answer comes down to the fact that they don’t really know. They’re making an assumption. Sometimes that assumption is based on a smattering of actual data, but most often it is just a raw assumption.
Engineers, on the other hand, look at things a little differently, though often with the same limited perspective. Many times an engineer will ask for a new server because “the CPU said 100%”. 100% for how long? How many times did it reach 100% in a 24-hour period?
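Both of those questions can be answered from raw samples. Here is a minimal sketch in Python, assuming CPU readings collected at a fixed interval; the function name, the sample values and the one-minute interval are all made up for illustration and are not taken from any real collector.

```python
def full_load_episodes(samples, threshold=100.0):
    """Return the length (in samples) of each consecutive run at or above threshold."""
    runs = []
    current = 0
    for value in samples:
        if value >= threshold:
            current += 1          # still inside a full-load episode
        elif current:
            runs.append(current)  # episode just ended
            current = 0
    if current:
        runs.append(current)      # episode was still running at the last sample
    return runs


# Hypothetical CPU readings, one per minute (made-up values).
cpu = [35, 100, 100, 100, 42, 60, 100, 55, 100, 100]
INTERVAL_MIN = 1  # assumed sampling interval

runs = full_load_episodes(cpu)
print(len(runs), "episodes; longest lasted", max(runs) * INTERVAL_MIN, "minute(s)")
```

A single one-minute spike and a sustained multi-hour plateau both "said 100%", but they lead to very different conclusions.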
So let’s talk a minute about Performance Analysis and Capacity Planning. The two are distinctly different, but often based on similar data. The primary difference between Performance Analysis and Capacity Planning is duration.
The purpose of a Performance Analysis, whether for a specific application or for a server, is to establish a pattern of use and to determine when and to what severity a detrimental pattern might be occurring.
The purpose of Capacity Planning, whether for a specific application or for a server, is to establish a pattern of use and to determine the short and long term operating requirements.
The difference between the two is somewhat subtle. Capacity Planning is simply a view of performance data over a duration, whereas Performance Analysis is an examination of a specific time slice. You might want to know what the performance characteristics of a server were for a five-minute period, or the last twenty-four hours, or the last seven days. Examining the data in this manner will allow you to detect possible problem areas in performance. For instance, an application may generate significant time-out errors starting at 3 AM PST every morning and continuing until 9 AM PST, but only on weekdays. By examining the performance data for that time period you could correlate the errors with a large ramp-up of connections to an IIS application, which in turn sucks up the available memory on that server (or servers) until timeouts occur. The time slice of 3 to 9 AM PST means your East coast customers are coming onto your site as they start their work day, and the ramp-up continues until the West coast customers have followed suit. All of this data could mean that the login page for your application is too memory intensive or that the initial page load is too intensive. Obviously this is a lame example, but you get the idea.
Now take that same data and look at the performance numbers for the entire month. If the number of IIS connections was averaging 500 per day for week 1, 550 per day for week 2, 625 per day for week 3 and 700 per day for week 4, you can see a usage trend begin to develop over a duration. If tests have shown your application to peak at 1000 connections, then you know from your examination of the data that, at the recent growth of about 75 more connections per day each week, you will reach your peak capacity in roughly four weeks.
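To put a number on that trend rather than eyeballing it, here is a minimal sketch in Python that fits a least-squares straight line to the four weekly averages above and projects when it crosses the 1000-connection peak. The function names are my own, and a straight line is an assumption; real growth may well curve.

```python
def fit_line(ys):
    """Least-squares line through (1, ys[0]), (2, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def weeks_until(ys, capacity):
    """Weeks after the last observed week until the fitted line reaches capacity."""
    slope, intercept = fit_line(ys)
    if slope <= 0:
        return None  # flat or shrinking usage never reaches capacity
    week_at_capacity = (capacity - intercept) / slope
    return week_at_capacity - len(ys)


weekly_avg = [500, 550, 625, 700]  # average connections per day, weeks 1-4
slope, intercept = fit_line(weekly_avg)
remaining = weeks_until(weekly_avg, 1000)
print(f"growth: {slope:.1f} per week, capacity in about {remaining:.1f} weeks")
```

With these numbers the fitted growth is 67.5 connections per week and the line reaches 1000 about four and a half weeks out, which is in the same range as the rough four-week estimate.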
Unless it is a ridiculously obvious case, there is no way you can adequately plan for capacity with less than seven days of data. I personally prefer thirty days, though it may depend on the application.
I’m sure my man Chris Ball has a plethora of algorithms to twist all of that data into massive reports with standard deviation and standard error lines. In fact, Chris has built many of these mathematical functions into our standard performance collection routine. This is a striking benefit for us: we can now examine immediate, detailed data for a variety of performance counters, and we can also establish long-duration trends that assist in planning our hardware purchases and re-purposing. I find the standard deviation line to be particularly useful. Kudos, Chris!
Will