Larry Osterman's WebLog

Confessions of an Old Fogey

Concurrency, Part 10 - How do you know if you've got a scalability issue?

Well, the concurrency series is finally running down (phew, it's a lot longer than I expected it to be)...

Today's article is about how to tell whether you've got a scalability problem.

First, a general principle: all nontrivial, long-lived applications have scalability problems.  It's possible that the scalability issues don't matter to your application.  For example, if your application is Microsoft Word (or mIRC, or Firefox, or just about any other application that interacts with the user) then scalability isn't likely to be an issue for your application - the reality is that the user isn't going to try to make your application faster by throwing more resources at it.

As I wrote the previous paragraph, I realized that it describes the heart of scalability issues - if the user of your application feels it's necessary to throw more resources at your application, then your application needs to worry about scalability.  It doesn't matter if the resources being thrown at your application are disk drives, memory, CPUs, GPUs, blades, or entire computers: if the user decides that your system is bottlenecked on a resource, they're going to try to throw more of that resource at your application to make it run faster.  And that means that your application needs to be prepared to handle it.

Normally, these issues are only for server applications living in data farms, but we're starting to see the "throw more hardware at it" idea trickle down into the home space.  As usual, the gaming community is leading the way - the AlienWare SLI machines are a great example of this - to improve your 3d graphics performance, simply throw more GPUs at the problem.

I'm not going to go into diagnosing bottlenecks in general; there are loads of resources available on the web for it (my first Google hit on Microsoft.com was this web cast from 2003).

But for diagnosing CPU bottlenecks related to concurrency issues, there's actually a relatively straightforward way of determining if you've got a scalability issue associated with your locks.  And that's to look at the "Context Switches/sec" perfmon counter.  There's an article on how to measure this in the Windows 2000 resource kit here, so I won't go into the details, but in a nutshell, you start the perfmon application, select all the threads in your application, and look at the context switches/sec for each thread.

You've got a scalability problem related to your locks if the context switches/second is somewhere above 2000 or so.
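The rule of thumb above is just a rate check, and the arithmetic can be sketched in a few lines of C++. This is only an illustration: the `SwitchSample` struct and both function names are hypothetical, and the sketch assumes you have already captured two readings of a thread's cumulative context-switch count by some other means (perfmon reports the per-second rate directly).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of one reading of a thread's cumulative
// context-switch count; it is not part of any Windows API.
struct SwitchSample {
    std::uint64_t switches;  // cumulative context switches so far
    double seconds;          // time the reading was taken, in seconds
};

// The article's rule of thumb: "somewhere above 2000 or so".
constexpr double kSuspectSwitchesPerSec = 2000.0;

// Average context switches per second between two readings.
inline double switches_per_second(const SwitchSample& earlier,
                                  const SwitchSample& later) {
    assert(later.seconds > earlier.seconds);
    assert(later.switches >= earlier.switches);
    return static_cast<double>(later.switches - earlier.switches) /
           (later.seconds - earlier.seconds);
}

// True if the rate is high enough that lock contention is worth a look.
inline bool looks_lock_bound(const SwitchSample& earlier,
                             const SwitchSample& later) {
    return switches_per_second(earlier, later) > kSuspectSwitchesPerSec;
}
```

For example, a thread that goes from 1,000 to 26,000 switches over ten seconds averages 2,500 per second, which is over the line; one that adds only 5,000 switches in the same ten seconds averages 500 and is not.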

And that means you need to dig into your code to find the "hot" critical sections.  The good news is that it's not usually too hard to detect which critical section is "hot" - hook a debugger up to your application, start your stress and put a breakpoint in the ntdll!RtlEnterCriticalSection routine.  You'll get a crazy number of hits, but if you look at your call stacks, then the "hot" critical section will start to show up.  It sounds tedious (and it is somewhat) but it is surprisingly effective.   There are other techniques for detecting the "hot" critical sections in your process but they are not guaranteed to work on all releases of Windows (and will make Raymond Chen very, very upset if you use them).
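If attaching a debugger isn't convenient, a different approach is to instrument your own locks. The sketch below is not the breakpoint technique described above and doesn't touch Win32 critical sections: it's a hypothetical `CountingMutex` wrapper around the standard `std::mutex` that counts how often a caller found the lock already held. Treat the numbers as a rough hint rather than a measurement.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical instrumented lock (not a Windows API): wraps std::mutex and
// counts how many acquisitions found the lock already taken.
class CountingMutex {
public:
    // The name only exists so a report can say which lock is "hot".
    explicit CountingMutex(const char* name) : name_(name) {
        assert(name != nullptr);
    }

    void lock() {
        acquisitions_.fetch_add(1, std::memory_order_relaxed);
        // Fast path: the lock was free. std::mutex::try_lock is allowed to
        // fail spuriously, so the contended count may slightly overcount.
        if (!mutex_.try_lock()) {
            contended_.fetch_add(1, std::memory_order_relaxed);
            mutex_.lock();  // Slow path: actually wait for the owner.
        }
    }

    void unlock() { mutex_.unlock(); }

    const char* name() const { return name_; }
    std::uint64_t acquisitions() const {
        return acquisitions_.load(std::memory_order_relaxed);
    }
    std::uint64_t contended() const {
        return contended_.load(std::memory_order_relaxed);
    }

private:
    const char* name_;
    std::mutex mutex_;
    std::atomic<std::uint64_t> acquisitions_{0};
    std::atomic<std::uint64_t> contended_{0};
};
```

Because it has `lock()` and `unlock()`, it drops into `std::lock_guard<CountingMutex>` unchanged. After a stress run, the locks whose `contended()` count is a large fraction of `acquisitions()` are the ones to look at first.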

Sometimes, your CPU bottleneck is simply that you're doing too much work on a single thread - if it simply takes too much time to calculate something, then you need to start seeing if it's possible to parallelize your code - you're back in the realm of making your code go faster and out of the realm of concurrent programming.  Another option that you might have is the OpenMP language extensions for C and C++ that allow the compiler to start parallelizing your code for you.
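To give a flavor of what the OpenMP extensions look like, here is a minimal sketch; the function and its sum-of-squares workload are made up for illustration. The pragma asks the compiler to split the loop across threads and combine the per-thread partial sums. It only takes effect when OpenMP is enabled (`/openmp` for Visual C++, `-fopenmp` for GCC and Clang); otherwise the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>

// Hypothetical example workload: sum of the squares of `count` values.
double sum_of_squares(const double* values, int count) {
    assert(count >= 0);
    assert(values != nullptr || count == 0);

    double total = 0.0;
    // Each thread accumulates a private copy of `total`; the reduction
    // clause adds the private copies together when the loop finishes.
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < count; ++i) {
        total += values[i] * values[i];
    }
    return total;
}
```

Note that the loop body takes no locks at all, which is the point: the reduction keeps the threads out of each other's way until the very end. With floating point data, the parallel version may add the terms in a different order than the serial one, so the last few bits of the result can differ.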

But even if you do all that and ensure that your code is bottleneck free, you still can have scalability issues.  That's for tomorrow.

Edit: Fixed spelling mistakes.

 

  • Andrew, NT tries, but it's not totally successful. You can see that if you have an MP machine - start up a single threaded CPU bound task and look at the task manager - both CPUs will be 50% utilized.
  • Sysinternals/Mark Russinovich gives some details about how the NT scheduler works with thread affinity here [1]. So, even if you don't explicitly set the thread affinity (hard affinity) the scheduler will set one (soft).

    For more details check out the "Inside NT Scheduler" article series there :)

    [1] http://www.sysinternals.com/publ.shtml#scheduler
  • Concurency is still spelled wrong, it should be concurrency. (two 'r's).

    Please delete this message unless you feel it adds some value to your comments.

    But please use a spell checker!
    http://spellbound.sourceforge.net/ (for Firefox).
    http://www.iespell.com (for IE)
    both are free.

    We non-native speakers spelling mistakes can cause bigger problems than for normal. Kindly take the time to check.

    But fantastic writings, please keep blogging regularly.
  • > put a breakpoint in the ntdll!RtlEnterCriticalSection routine

    If you want to catch contention in debugger (perhaps to look at the call stack) then you should use ntdll!RtlpWaitForCriticalSection instead. Typically, the vast majority of EnterCriticalSection calls don't block and thus don't cause a context switch. It's only the blocking calls that you should be concerned with.

    But if you just want to find out which critical sections are causing context switches, the easiest way is to use the "!locks -v" command in windbg/cdb and look at the ContentionCount values.
  • On a real SMP system, even if the looping thread bounces from one CPU to another, it can't completely hose CPUs that it's not executing on. If other threads are executing poor code that tries spinlocks for ages then those other threads can hose other CPUs. If the looping thread eats a lot of bandwidth to memory or I/O then other CPUs will be slowed down by it.

    3/4/2005 8:53 AM Larry Osterman

    > start up a single threaded CPU bound task
    > and look at the task manager - both CPUs
    > will be 50% utilized.

    I guess that's because of the number of other threads that also get scheduled for short times? When it's time to resume the hogging thread, the CPU previously used by the hogging thread might be executing something else at that instant?

    On an HT machine (not SMP I know), doing a VB6 compile, one pseudo-CPU is around 70% used and the other is around 30% used, for an average of 51%. The 51% is pretty constant because other threads are nearly idle.