Today's article is about how to determine whether you've got a scalability problem.
First, a general principle: All non-trivial, long-lived applications have scalability problems. It's possible that the scalability issues don't matter to your application. For example, if your application is Microsoft Word (or mIRC, or Firefox, or just about any other application that interacts with the user), then scalability isn't likely to be an issue for your application - the reality is that the user isn't going to try to make your application faster by throwing more resources at it.
As I wrote the previous paragraph, I realized that it describes the heart of scalability issues - if the user of your application feels it's necessary to throw more resources at your application, then you need to worry about scalability. It doesn't matter if the resources being thrown at your application are disk drives, memory, CPUs, GPUs, blades, or entire computers; if the user decides that your system is bottlenecked on a resource, they're going to try to throw more of that resource at your application to make it run faster. And that means that your application needs to be prepared to handle it.
Normally, these issues only come up for server applications living in data farms, but we're starting to see the "throw more hardware at it" idea trickle down into the home space. As usual, the gaming community is leading the way - the Alienware SLI machines are a great example of this - to improve your 3D graphics performance, simply throw more GPUs at the problem.
I'm not going to go into diagnosing bottlenecks in general; there are loads of resources available on the web for it (my first Google hit on Microsoft.com was this webcast from 2003).
But for diagnosing CPU bottlenecks related to concurrency issues, there's actually a relatively straightforward way of determining if you've got a scalability issue associated with your locks. And that's to look at the "Context Switches/sec" perfmon counter. There's an article on how to measure this in the Windows 2000 resource kit here, so I won't go into the details, but in a nutshell, you start the perfmon application, select all the threads in your application, and look at the context switches/sec for each thread.
You've got a scalability problem related to your locks if the context switches/second is somewhere above 2000 or so.
And that means you need to dig into your code to find the "hot" critical sections. The good news is that it's usually not too hard to detect which critical section is "hot" - hook a debugger up to your application, start your stress test, and put a breakpoint in the ntdll!RtlEnterCriticalSection routine. You'll get a crazy number of hits, but if you look at your call stacks, the "hot" critical sections will start to show up. It sounds tedious (and it is, somewhat) but it's surprisingly effective. There are other techniques for detecting the "hot" critical sections in your process, but they aren't guaranteed to work on all releases of Windows (and will make Raymond Chen very, very upset if you use them).
Sometimes your CPU bottleneck is simply that you're doing too much work on a single thread - if it simply takes too much time to calculate something, then you need to see if it's possible to parallelize your code - you're back in the realm of making your code go faster and out of the realm of concurrent programming. Another option is the OpenMP language extensions for C and C++, which allow the compiler to parallelize your code for you.
But even if you do all that and ensure that your code is bottleneck-free, you can still have scalability issues. That's for tomorrow.
Edit: Fixed spelling mistakes.