As I said in yesterday’s introduction, my job as an engineer on the Windows Vista team is to improve performance. Today I want to look at a study that measures a key area we focused on for Windows Vista: consistent responsiveness during the times that matter most to users (when starting up the machine, when coming back after it has been idle, and when under the gun running tons of apps).
To objectively measure how we did, I’ve been working with a company named Principled Technologies. If you’ve been involved in the (admittedly somewhat niche) specialty of perf testing over the last decade, you likely know the people if not the company. We commissioned Principled Technologies to develop, run, and document the results of a set of tests that compare the performance of Windows Vista RTM and Windows XP on common business tasks. Today they published their findings here. I am, of course, really excited by the results.
Now you should read the whole report, but I wanted to talk a bit about their key findings:
• “Windows Vista was noticeably more responsive after rebooting than Windows XP on several common business operations.”
As I alluded to yesterday, SuperFetch is the key driver behind this. I want to point out, though, that 'after rebooting' can be seen as a proxy for lots of cold operations; rebooting is just the easiest one to measure reliably. The second bullet is:
• “Overall, Windows Vista and Windows XP were roughly equally responsive on most test operations. Windows Vista was more responsive on some operations, and on those operations on which it was more responsive, Windows XP typically responded only a half a second or so faster.”
This is great, especially since Windows Vista is doing considerably more out of the box (UAC, Defender, search indexing, etc.). One of the most interesting bits for me was their third highlight: you can run Aero without guilt!
• “Windows Vista Aero had little effect on the responsiveness of Windows Vista. Over 95 percent of the response-time differences between tests we ran with and without Vista Aero were under a tenth of a second, and all of the differences were under one second.”
We put quite a bit of effort into making sure that the new visuals were as efficient as possible, and it really paid off.
For the truly technical, as you would expect, the report lists exactly how PT developed and ran these tests. The short answer is that they used a range of machines (laptops and desktops, 512 MB to 2 GB of RAM, a mix of graphics cards and processors, from high-end to bare minimum). I encourage you to dig into the report to learn more about the perf of Windows Vista compared to Windows XP SP2. All in all, we were more consistent: better on cold and still doing well on warm. Users can come to their machine and begin working, regardless of what state the box is in.
Anyhow, in the next couple of posts I think that I'll focus on some of the things that you will run into when designing and running a performance evaluation like this one. For example, one thing PT did that I think is important is that they ran the same workload three times on each system before beginning their timed runs. This put the system into a quiescent state by allowing SuperFetch to learn and tune itself for the work it would be facing, similar to what it does in the wild for real users. This matters because otherwise your first timed runs are taken while SuperFetch is still adapting, and you get weird data that is much less repeatable.
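To make the warm-up idea concrete, here is a minimal sketch of a harness that runs a workload a few times untimed and only then starts the clock. This is not PT's actual harness; the names `benchmark`, `summarize`, and `sample_workload` are hypothetical, and the five timed runs are an arbitrary choice of mine. Only the three warm-up runs come from the report.

```python
import statistics
import time


def benchmark(workload, warmup_runs=3, timed_runs=5):
    """Run `workload` untimed `warmup_runs` times, then time `timed_runs` runs.

    Returns the list of timed durations in seconds.
    """
    for _ in range(warmup_runs):
        workload()  # untimed: gives caches and prefetchers a chance to settle

    durations = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Report the median and the spread (max - min) of the timed runs."""
    return {
        "median_s": statistics.median(durations),
        "spread_s": max(durations) - min(durations),
    }


def sample_workload():
    """Hypothetical stand-in for a real scripted task (e.g. opening a document)."""
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    results = benchmark(sample_workload)
    print(summarize(results))
```

If the spread across timed runs is large relative to the median, that is usually a hint that the system had not settled yet and more warm-up is needed.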
Let me also take a second to note that a lot of the advice I am going to be giving in these next posts will be targeted specifically at performance benchmarking. I am not talking about how to maximize the performance of the everyday system (we'll do that in later posts). Perf tests are almost always automated to ensure consistency and repeatability, so I’ll focus on the benchmarking impact of some key features in Windows Vista (such as SuperFetch and UAC). That advice may not generalize to all situations.
Well, now that we've gotten all that straight, go read the report, and come back in a couple of days. In my next post I'll start talking about preparing a system for accurate, repeatable benchmarking.