Engineering Windows 7

Welcome to our blog dedicated to the engineering of Microsoft Windows 7

Engineering the Windows 7 “Windows Experience Index”

We’re busy going through tons of telemetry from the many people around the world who have downloaded and installed the Windows 7 beta, and we’re thrilled to see the enthusiasm for kicking the tires. Since most folks on the beta are well-versed in the hardware they use and very tuned into the choices they make, we’ve received a few questions about the Windows Experience Index (WEI) and how it has been changed and improved in Windows 7 to take into account the new hardware available in each of the major classes the metric covers. In this post Michael Fortin returns to dive into the engineering details of the WEI.

The WEI was introduced in Windows Vista to provide one means across PCs to measure the relative performance of key hardware components. Like any index or benchmark, it is best used as a relative measure and should not be used to compare one measure to another. Unlike many other measures, the WEI merely measures the relative capability of components. The WEI only runs for a short time and does not measure the interactions of components under a software load, but rather characteristics of your hardware. As such it does not (and cannot) measure how a system will perform under your own usage scenarios. Thus the WEI does not measure performance of a system, but merely the relative hardware capabilities when running Windows 7.

We do want to caution folks against trying to generalize an “absolute” WEI as necessary for a given individual. We each have different tolerances or, more importantly, expectations for how a PC should perform, and the same WEI might mean very different things to different individuals. To personalize this, I do about 90% of my work on a PC with a WEI of 2.0, primarily driven by the relatively low score for the gaming graphics component on my very low-cost laptop. I run Outlook (with ~2GB of email), Internet Explorer (with a dozen tabs), Excel (with long lists of people on the development team), PowerPoint, Messenger (with video), and often I am running one of several LOB applications written in .NET. I feel that with this type of workload, on a PC with Windows 7 and that WEI, my own brain and fingers continue to be my “bottleneck”. At the other end of the spectrum is my holiday gift machine, which is a 25” all-in-one with a WEI of 5.1 (though still limited by gaming graphics, with subscores of 7.2, 7.2, 6.2, 5.1, 5.9). This machine runs Windows 7 64-bit and I definitely don’t keep it very busy, even though I run MediaCenter in a window all the time, have a bunch of desktop gadgets, and run the PC as our print server (I use about 25% of available RAM and the CPU almost never gets above 10%).

–Steven

The overall Windows Experience Index (WEI) is defined to be the lowest of the five top-level WEI subscores, where each subscore is computed using a set of rules and a suite of system assessment tests. The five areas scored in Windows 7 are the same as they were in Vista and include:

  • Processor
  • Memory (RAM)
  • Graphics (general desktop work)
  • Gaming Graphics (typically 3D)
  • Primary Hard Disk
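The "lowest subscore wins" rule above can be sketched in a few lines of Python. This is only an illustration, not the actual WEI implementation: the function names are hypothetical, and the example values are the subscores Steven quotes for his all-in-one, where the text confirms only that gaming graphics is the 5.1; assigning the remaining values to components in list order is an assumption.

```python
from typing import Dict


def overall_wei(subscores: Dict[str, float]) -> float:
    """Return the overall score: the lowest of the top-level subscores."""
    if not subscores:
        raise ValueError("at least one subscore is required")
    return min(subscores.values())


def limiting_component(subscores: Dict[str, float]) -> str:
    """Return the name of the component that determines the overall score."""
    return min(subscores, key=subscores.get)


# Subscores quoted in the introduction (component order is assumed).
scores = {
    "Processor": 7.2,
    "Memory (RAM)": 7.2,
    "Graphics": 6.2,
    "Gaming Graphics": 5.1,
    "Primary Hard Disk": 5.9,
}

print(overall_wei(scores), limiting_component(scores))
```

Because the overall score is a minimum rather than an average, a single weak component sets the score no matter how strong the others are.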

Though the scoring areas are the same, the ranges have changed. In Vista, the WEI scores ranged from 1.0 to 5.9. In Windows 7, the range has been extended upward to 7.9. The scoring rules for devices have also changed from Vista to reflect experience and feedback comparing closely rated devices with differing quality of actual use (i.e., to make the rating more indicative of actual use). We know that during the beta some folks have noticed that the score changed (relative to Vista) for one or more components in their system. This tuning, which we will describe here, is responsible for the change.

For a given score range, we hope our customers will be able to utilize some general guidelines to help understand the experiences a particular PC can be expected to deliver well, relatively speaking. These Vista-era general guidelines for systems in the 1.0, 2.0, 3.0, 4.0 and 5.0 ranges still apply to Windows 7. But, as noted above, Windows 7 has added levels 6.0 and 7.0; meaning 7.9 is the maximum score possible. These new levels were designed to capture the rather substantial improvements we are seeing in key technologies as they enter the mainstream, such as solid state disks, multi-core processors, and higher end graphics adapters. Additionally, the amount of memory in a system is a determining factor.

For these new levels, we’re working to add guidelines for each level. As an example for gaming users, we expect systems with gaming graphics scores in the 6.0 to 6.9 range to support DX10 graphics and deliver good frame rates at typical screen resolutions (like 40-50 frames per second at 1280x1024). In the range of 7.0 to 7.9, we would expect higher frame rates at even higher screen resolutions. Obviously, the specifics of each game have much to do with this, and the WEI scores are also meant to help game developers decide how best to scale their experience on a given system. Graphics is an area where there is both the widest variety of scores readily available in hardware and also the widest breadth of expectations. The extremes to which CAD, HD video, photography, and gaming push graphics, compared to the average business user or a consumer (doing many of these same things as an avocation rather than vocation), are significant.

Of course, adding new levels doesn’t explain why a Vista system or component that used to score 4.0 or higher is now obtaining a score of 2.9. In most cases, large score drops will be due to the addition of some new disk tests in Windows 7 as that is where we’ve seen both interesting real world learning and substantial changes in the hardware landscape.

With respect to disk scores, as discussed in our recent post on Windows Performance, we’ve been developing a comprehensive performance feedback loop for quite some time. With that loop, we’ve been able to capture thousands of detailed traces covering periods of time where the computer’s current user indicated an application, or Windows, was experiencing severe responsiveness problems. In analyzing these traces we saw a connection to disk I/O and we often found typical 4KB disk reads to take longer than expected, much, much longer in fact (10x to 30x). Instead of taking 10s of milliseconds to complete, we’d often find sequences where individual disk reads took many hundreds of milliseconds to finish. When sequences of these accumulate, higher level application responsiveness can suffer dramatically.

With the problem recognized, we synthesized many of the I/O sequences and undertook a large study on many, many disk drives, including solid state drives. While we did find a good number of drives to be excellent, we unfortunately also found many to have significant challenges under this type of load, which based on telemetry is rather common. In particular, we found the first generation of solid state drives to be broadly challenged when confronted with these commonly seen client I/O sequences.

An example problematic sequence consists of a series of sequential and random I/Os intermixed with one or more flushes. During these sequences, many of the random writes complete in unrealistically short periods of time (say 500 microseconds). Very short I/O completion times indicate caching; the actual work of moving the bits to spinning media, or to flash cells, is postponed. After a period of returning success very quickly, a backlog of deferred work is built up. What happens next is different from drive to drive. Some drives continue to consistently respond to reads as expected, no matter the earlier issued and postponed writes/flushes, which yields good performance and no perceived problems for the person using the PC. On some drives, however, reads are often held off for very lengthy periods as the drives apparently attempt to clear their backlog of work, and this results in a perceived “blocking” state or almost a “locked system”. To validate this, on some systems, we replaced poor performing disks with known good disks and observed dramatically improved performance. In a few cases, updating the drive’s firmware was sufficient to very noticeably improve responsiveness.
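To make the pattern concrete, here is a minimal Python sketch that scans a recorded I/O trace for the two symptoms described above: writes that complete suspiciously quickly (suggesting they were only cached) and reads that are held off for a long time. Everything in it is an assumption for illustration: the `IoEvent` record, the function names, the sample trace, and the thresholds (0.5 ms echoes the "say 500 microseconds" figure; 100 ms stands in for "many hundreds of milliseconds"). It is not how the Windows assessment itself works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IoEvent:
    op: str            # "read", "write", or "flush"
    latency_ms: float  # time from issue to completion


def cached_writes(trace: List[IoEvent], threshold_ms: float = 0.5) -> List[int]:
    """Indices of writes that completed so fast they were likely only cached."""
    return [i for i, ev in enumerate(trace)
            if ev.op == "write" and ev.latency_ms <= threshold_ms]


def stalled_reads(trace: List[IoEvent], threshold_ms: float = 100.0) -> List[int]:
    """Indices of reads that were held off longer than the threshold."""
    return [i for i, ev in enumerate(trace)
            if ev.op == "read" and ev.latency_ms >= threshold_ms]


# Hypothetical trace: quick writes, a flush, then reads that stall
# while the drive works through its backlog.
trace = [
    IoEvent("write", 0.4),
    IoEvent("write", 0.5),
    IoEvent("flush", 2.0),
    IoEvent("read", 12.0),
    IoEvent("read", 450.0),
    IoEvent("read", 380.0),
]

print("likely cached writes:", cached_writes(trace))
print("stalled reads:", stalled_reads(trace))
```

A drive that behaves well under this kind of load would show the fast writes without the stalled reads that follow.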

To reflect this real world learning, in the Windows 7 Beta code, we have capped scores for drives which appear to exhibit the problematic behavior (during the scoring) and are using our feedback system to send back information to us to further evaluate these results. Scores of 1.9, 2.0, 2.9 and 3.0 for the system disk are possible because of our current capping rules. Internally, we feel confident in the beta disk assessment and these caps based on the data we have observed so far. Of course, we expect to learn from data coming from the broader beta population and from feedback and conversations we have with drive manufacturers.

For those who obtain low disk scores but are otherwise satisfied with the performance, we aren’t recommending any action (of course, the WEI is not a tool to recommend hardware changes of any kind). It is entirely possible that the sequence of I/Os being issued for your common workload and applications isn’t encountering the issues we are noting. As we’ve said, the WEI is a metric, but only you can apply that metric to your computing needs.

Earlier, I made note of the fact that our new levels, 6 and 7, were added to recognize the improved experiences one might have with newer hardware, particularly SSDs, graphics adapters, and multi-core processors. With respect to SSDs, the focus of the newer tests is on random I/O rates and their avoidance of the long latency issues noted above. As a note, the tests don’t specifically check to see if the underlying storage device is an SSD or not. We run them no matter the device type and any device capable of sustaining very high random I/O rates will score well.

For graphics adapters, both DX9 and DX10 assessments can be run now. In Vista, the tests were specific to DX9. To obtain scores in the 6 or 7 ranges, a graphics adapter must obtain very good performance scores and support DX10, and its driver must be a WDDM 1.1 driver (which you might have noticed being downloaded during the Windows 7 beta). For WDDM 1.0 drivers, only the DX9 assessments will be run, thus capping the overall score at 5.9.
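The capping rule for graphics can be sketched as follows. This is a simplified illustration under stated assumptions, not the real scoring code: the function name and parameters are hypothetical, `measured` stands for whatever score the performance assessments alone would produce, and only the 5.9 cap, the DX10 and WDDM 1.1 requirements, and the 1.0 to 7.9 range come from the post.

```python
from typing import Tuple

WEI_MIN = 1.0      # lowest possible score
WEI_MAX = 7.9      # highest possible score in Windows 7
LEGACY_CAP = 5.9   # ceiling without DX10 support and a WDDM 1.1 driver


def cap_gaming_graphics_score(measured: float,
                              supports_dx10: bool,
                              wddm_version: Tuple[int, int]) -> float:
    """Clamp a measured score to the WEI range and apply the 5.9 cap."""
    score = max(WEI_MIN, min(measured, WEI_MAX))
    # Levels 6 and 7 require both DX10 support and a WDDM 1.1 (or later) driver.
    if not supports_dx10 or wddm_version < (1, 1):
        score = min(score, LEGACY_CAP)
    return score


print(cap_gaming_graphics_score(6.8, True, (1, 1)))   # eligible for level 6
print(cap_gaming_graphics_score(6.8, True, (1, 0)))   # WDDM 1.0: capped
```

The cap applies only to the top of the range: an adapter whose measured score is already below 5.9 is unaffected by its driver model.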

For multi-core processors, both single-threaded and multi-threaded scenarios are run. With levels 6 and 7, we aim to indicate that these systems will rarely be CPU bound for typical use and are quite suitable for demanding processing tasks and multi-tasking. As examples, we anticipate many quad-core processors will be able to score in the high 6 to low 7 ranges, and 8-core systems to be able to approach 7.9. The scoring has taken into account the very latest microprocessors available.

For many key hardware partners, we’ve of course made available additional details on the changes and why they were made. We continue to actively work with them to incorporate appropriate feedback.

--Michael Fortin

  • My Scores:

    CPU: 7.2

    RAM: 7.2

    Graphics: 7.4

    Gaming Graphics: 6.0

    Hard Disk: 6.0

    For info, if anyone cares, the disk is a previous model Hitachi Deskstar which cost £30. :-)

  • Computer:

    MB: Asus Striker II NSE

    Intel Core 2 Quad 9550 2,83GHz

    ATI Radeon HD 4870X2 WDDM 1.1 driver

    8GB DDR3 1333MHz

    Samsung 1TB 32MB

    Windows 7 64Bit

    Score

    Processor 7,3

    Memory 7,3

    Graphics 7,9

    Gaming graphics 6,3

    Primary Hard disk 6,0

    Question: Gaming graphics 6.3 ????? Vista 5.9

  • In all the feedback on the WEI, I don't get why on certain standard hard drives, write caching is turned on. On SSDs and the HDDs where the chipset actually benefits from write caching, I could see the benefit. However, if your hardware suffers because of such a feature, why doesn't Windows automatically turn it off?

    I get it that you want the best experience for everyone. Write caching probably has its advantages. My point is that most folks who aren't investing in higher-end HDDs and SSDs are going to see this as a negative. If I can get a good Windows Experience without write caching, then I think this feature needs to be intuitive.

  • First of all, thanks for trying the beta! It was an awesome experience, and I will give you A LOT of feedback ;)

    Anyway, my WEI is from 2.8 to 5.1 (CPU 3.9, RAM 5.1, Graphics 2.8, Gaming graphics 3.0, primary HDD 2.9)

    But why was my HDD score lower than in Vista?

    Martin

  • Hi Ronny49,

    I'm the development manager for WinEI.  Your gaming score went up because you have a very well-performing adapter, DX10 and a WDDM 1.1 driver.  Of course, WinEI is a benchmark, but it does take other things into account than simple measured performance.  Gaming graphics are a good example: to get to a 6 requires both DX10 and a WDDM 1.1 driver.  Without these, the gaming and graphics scores both max out at 5.9.

  • Thanks to everyone who replied to my question, especially to rgr.

    Although, I think I didn't ask my question quite right.

    I did expect my score to be higher than 5.9, but a lot higher than 6.3 (in Windows 7), because I have one of the market's best graphics adapters.

    What can I do to increase the 6.3 score?

  • strange...

    my WEI CPU score grew from 5.0 in vista to 5.3 in 7, HDD score grew from 5.7 in vista to 5.9 in 7, memory remained the same at 5.9 in both vista and 7, and the aero and graphics grew much higher in 7 than vista. i guess it's about your driver config., try a vista driver if 7 ones are not available

  • I hope the RAM score is better clamped to sizes instead of bandwidth. Bandwidth is not that important and makes comparatively little difference in real performance, however size is vital. IIRC, in Vista with just 1.5GB of RAM you could have a score of 5.9. That's ridiculous, there were games coming out that required 2GB already, and it wasn't that much. I'd expect, say, 4GB to be the minimum for a 5.0 and 8GB for a 6.0. Maybe with 2GB it should be clamped to 4.0. Now, there's the issue of needing 64-bit to address it all...

  • I think it is great that the weakest link in my computer has been identified. I look forward to seeing tests on hard drives that have a meaningful, standardized and understandable metric. This should hold the feet of some of these slackers to the fire.  I would like to see a single hard drive with 4 platters have 4 SATA controllers so that I can have RAID with one drive. Oh and make it a hybrid please.

  • Is there a link between calculating my WEI and the use of thumbnails by the taskbar? On 3 laptops, 1 with ATI and 2 with Intel graphics, I was getting small black squares as pop-up thumbnails on the taskbar, but after running the WEI, they started working properly.

  • Dump the "Windows Experience Index".

    It's the most idiotic thing I've seen in Windows. It was a bad idea in Vista and it's even worse in Win7. The number is meaningless. It has little to no relationship to reality.

    The effort spent working on it is better spent making the rest of the OS better.

    The fact that you can "game" it by turning on and off disk caching, and video performance settings and so forth invalidates it as a serious tool for any real use.

    The fact that most of the people on this blog seem to think it can be affected by whether or not thumbnails and the like are turned on or off, and that measurements in Vista don't match in Win7 etc., means that it's not a tool, it's a marketing "feature".

  • I have just scored 2.9 on a 300Gb Hitachi Ultrastar 15k rpm SAS drive running from a 3ware 9690SA controller. In my experience this kit gives very good real-world performance. So IMHO 2.9 is a positively misleading number. I get 5.9 if I turn off write caching, which is simply perverse.

    Tim

  • Further to my last post, if I turn off the command queuing function I can leave write caching enabled and get a score of 5.9.

    But in this situation Everest reveals a lower linear and random read score (c.18MB/s) than with command queuing enabled (c.22MB/s).

    So, again, I think something's awry.

    Tim

  • PS: those low scores are for 4KB block size, obviously: the 64KB block size gives read rates of above 80MB/s.

    Tim
