I'm just giddy today.  We've been running performance tests for a while on Orcas and I've reported some of the great results on my blog.  We've also been running stress tests (running a server under heavy load for an extended period of time to measure reliability).  We haven't started doing load testing yet (similar to stress testing but designed to simulate more realistic workloads and to measure the load capacity of a server).

I've been dying to see how all of the performance improvements we've made for Orcas net out on a real system under load.  Last week I asked the stress team to compare the throughput they are getting today on Orcas with what they were getting on TFS 2005 at the time we released it.  We got the initial results today and, what can I say?  I'm giddy.  The run was done on a relatively modest single-server TFS installation.  The results are (the first column is TFS 2005 and the second is Orcas):

               TFS 2005        Orcas    Improvement
Total Tests     873,110    1,980,221         126.8%
Tests/Sec          30.3         68.8         127.1%

As you can see, the Orcas build ran more than twice as many tests as the TFS 2005 build did in roughly the same amount of time.  This is exciting stuff.  It means better performance with less hardware - always a good thing.
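
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the percentage gains and the run length implied by the figures above.  The function and variable names are made up for illustration, and the implied run length assumes Tests/Sec is an average over the whole run, which the results don't state explicitly.

```python
# Figures copied from the stress results above.
tfs2005 = {"total_tests": 873_110, "tests_per_sec": 30.3}
orcas = {"total_tests": 1_980_221, "tests_per_sec": 68.8}


def percent_gain(old, new):
    """Percentage increase of `new` over `old`."""
    return (new - old) / old * 100


def implied_hours(run):
    """Run length in hours, assuming tests_per_sec is a whole-run average."""
    return run["total_tests"] / run["tests_per_sec"] / 3600


for metric in ("total_tests", "tests_per_sec"):
    gain = percent_gain(tfs2005[metric], orcas[metric])
    print(f"{metric}: {gain:.1f}% improvement")

for name, run in (("TFS 2005", tfs2005), ("Orcas", orcas)):
    print(f"{name}: about {implied_hours(run):.1f} hours")
```

Both runs work out to about 8 hours, which is where "roughly the same amount of time" comes from, and a gain of roughly 127% is the same thing as about 2.27 times the throughput.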

Now, caveats abound.  It's still early and we're not done developing yet - things will change.  This is stress testing and not load testing, so the load profile is not representative (although not way off) and, most importantly, the data set is not representative.  In stress testing we generally use fairly small databases, whereas in load testing we use a database sized for the team we are simulating.  The good news is that when we increase the database size, I think the results are going to get even better, because much of the Version Control perf work we did doesn't really start to show big benefits until the data set gets large.

I'm expecting to start seeing load test results in the coming months.  From that we will derive new guidance on server sizing.  I'm really eager to get there.  There's nothing I'd love more than the ability to double the supported number of users per hardware configuration.  I can just see it now - 4,000 users on a 4-way.  Wouldn't that be nice?  We'll see, but I'm keeping my fingers crossed.

Oh, and one last thing... This week we had 2 stress runs that made it 8 hours (well over 2 million tests) without a single test failure.  Historically, that's unheard of at this phase of our development process.  No deadlocks, no race conditions, no bad combinations - everything worked.  OK, we had several runs that hit some errors too, but to hit two flawless runs now is really terrific.

Thanks for listening,

Brian