Welcome to our blog dedicated to the engineering of Microsoft Windows 7
We’re busy going through tons of telemetry from the many people that have downloaded and installed the Windows 7 beta around the world. We’re super excited to see the excitement around kicking the tires. Since most folks on the beta are well-versed in the hardware they use and very tuned into the choices they make, we’ve received a few questions about the Windows Experience Index (WEI) in Windows 7 and how that has been changed and improved in Windows 7 to take into account new hardware available for each of the major classes in the metric. In this post Michael Fortin returns to dive into the engineering details of the WEI.
The WEI was introduced in Windows Vista to provide one means across PCs to measure the relative performance of key hardware components. Like any index or benchmark, it is best used as a relative measure and should not be used to compare one measure to another. Unlike many other measures, the WEI merely measures the relative capability of components. The WEI only runs for a short time and does not measure the interactions of components under a software load, but rather characteristics of your hardware. As such it does not (and cannot) measure how a system will perform under your own usage scenarios. Thus the WEI does not measure performance of a system, but merely the relative hardware capabilities when running Windows 7.
We do want to caution folks in trying to generalize an “absolute” WEI as necessary for a given individual. We each have different tolerances or more importantly expectations for how a PC should perform and the same WEI might mean very different things to different individuals. To personalize this, I do about 90% of my work on a PC with a WEI of 2.0, primarily driven by the relatively low score for the gaming graphics component on my very low cost laptop. I run Outlook (with ~2GB of email), Internet Explorer (with a dozen tabs), Excel (with long lists of people on the development team), PowerPoint, Messenger (with video), and often I am running one of several LOB applications written in .NET. I feel with this type of workload and a PC with Windows 7 and that WEI my own brain and fingers continue to be my “bottleneck”. At the other end of the spectrum is my holiday gift machine which is a 25” all-in-one with a WEI of 5.1 (though still limited by gaming graphics, with subscores of 7.2, 7.2, 6.2, 5.1, 5.9). This machine runs Windows 7 64-bit and I definitely don’t keep it very busy even though I run MediaCenter in a window all the time, have a bunch of desktop gadgets, and run the PC as our print server (I use about 25% of available RAM and the CPU almost never gets above 10%).
The overall Windows Experience Index (WEI) is defined to be the lowest of the five top-level WEI subscores, where each subscore is computed using a set of rules and a suite of system assessment tests. The five areas scored in Windows 7 are the same as they were in Vista and include processor, memory (RAM), graphics, gaming graphics, and primary hard disk.
Though the scoring areas are the same, the ranges have changed. In Vista, the WEI scores ranged from 1.0 to 5.9. In Windows 7, the range has been extended upward to 7.9. The scoring rules for devices have also changed from Vista to reflect experience and feedback comparing closely rated devices with differing quality of actual use (i.e. to make the rating more indicative of actual use.) We know during the beta some folks have noticed that the score changed (relative to Vista) for one or more components in their system and this tuning, which we will describe here, is responsible for the change.
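The "lowest subscore wins" rule and the Windows 7 range described above can be sketched in a few lines of Python. This is only an illustration: the function name, the dictionary keys, and the range check are assumptions made for the example, not the actual assessment code. The sample values are the subscores quoted earlier for the all-in-one machine.

```python
# Illustrative sketch only; names and validation are assumptions,
# not the real Windows assessment implementation.
WEI_MIN = 1.0
WEI_MAX_WINDOWS_7 = 7.9  # Vista topped out at 5.9


def overall_wei(subscores):
    """Return (overall score, limiting component) for a dict of subscores."""
    for name, score in subscores.items():
        if not WEI_MIN <= score <= WEI_MAX_WINDOWS_7:
            raise ValueError(f"{name} subscore {score} is outside the Windows 7 range")
    # The overall WEI is simply the lowest of the five subscores.
    limiting = min(subscores, key=subscores.get)
    return subscores[limiting], limiting


example = {
    "processor": 7.2,
    "memory": 7.2,
    "graphics": 6.2,
    "gaming_graphics": 5.1,
    "primary_hard_disk": 5.9,
}
score, component = overall_wei(example)
print(score, component)  # prints: 5.1 gaming_graphics
```

Returning the limiting component alongside the score makes it easy to see which subscore is holding the overall number down.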
For a given score range, we hope our customers will be able to utilize some general guidelines to help understand the experiences a particular PC can be expected to deliver well, relatively speaking. These Vista-era general guidelines for systems in the 1.0, 2.0, 3.0, 4.0 and 5.0 ranges still apply to Windows 7. But, as noted above, Windows 7 has added levels 6.0 and 7.0; meaning 7.9 is the maximum score possible. These new levels were designed to capture the rather substantial improvements we are seeing in key technologies as they enter the mainstream, such as solid state disks, multi-core processors, and higher end graphics adapters. Additionally, the amount of memory in a system is a determining factor.
For these new levels, we’re working to add guidelines for each level. As an example for gaming users, we expect systems with gaming graphics scores in the 6.0 to 6.9 range to support DX10 graphics and deliver good frame rates at typical screen resolutions (like 40-50 frames per second at 1280x1024). In the range of 7.0 to 7.9, we would expect higher frame rates at even higher screen resolutions. Obviously, the specifics of each game have much to do with this and the WEI scores are also meant to help game developers decide how best to scale their experience on a given system. Graphics is an area where there is both the widest variety of scores readily available in hardware and also the widest breadth of expectations. The extremes to which CAD, HD video, photography, and gaming users push graphics compared to the average business user or a consumer (doing many of these same things as an avocation rather than vocation) are significant.
Of course, adding new levels doesn’t explain why a Vista system or component that used to score 4.0 or higher is now obtaining a score of 2.9. In most cases, large score drops will be due to the addition of some new disk tests in Windows 7 as that is where we’ve seen both interesting real world learning and substantial changes in the hardware landscape.
With respect to disk scores, as discussed in our recent post on Windows Performance, we’ve been developing a comprehensive performance feedback loop for quite some time. With that loop, we’ve been able to capture thousands of detailed traces covering periods of time where the computer’s current user indicated an application, or Windows, was experiencing severe responsiveness problems. In analyzing these traces we saw a connection to disk I/O and we often found typical 4KB disk reads to take longer than expected, much, much longer in fact (10x to 30x). Instead of taking 10s of milliseconds to complete, we’d often find sequences where individual disk reads took many hundreds of milliseconds to finish. When sequences of these accumulate, higher level application responsiveness can suffer dramatically.
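To make the kind of trace analysis described above a little more concrete, here is a minimal Python sketch that flags individual reads taking far longer than expected. The function name, the 20 ms "expected" latency, the 10x threshold, and the sample trace are all assumptions picked for illustration (loosely following the "10s of milliseconds" and "10x to 30x" figures above); it is not the analysis tooling we actually used.

```python
# Illustrative sketch only; thresholds and data are made up for the example.
def find_slow_reads(latencies_ms, expected_ms=20.0, factor=10.0):
    """Return (index, latency) pairs for reads at least `factor` times slower than expected."""
    threshold = expected_ms * factor
    return [(i, t) for i, t in enumerate(latencies_ms) if t >= threshold]


# Hypothetical completion times (in milliseconds) for a run of 4KB reads.
trace = [12.0, 18.0, 450.0, 15.0, 700.0, 22.0]

slow = find_slow_reads(trace)
total_stall_ms = sum(t for _, t in slow)
print(slow)            # prints: [(2, 450.0), (4, 700.0)]
print(total_stall_ms)  # prints: 1150.0
```

Even in this tiny made-up trace, two slow reads account for over a second of waiting, which is how sequences of them can add up to a visibly unresponsive application.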
With the problem recognized, we synthesized many of the I/O sequences and undertook a large study on many, many disk drives, including solid state drives. While we did find a good number of drives to be excellent, we unfortunately also found many to have significant challenges under this type of load, which based on telemetry is rather common. In particular, we found the first generation of solid state drives to be broadly challenged when confronted with these commonly seen client I/O sequences.
An example problematic sequence consists of a series of sequential and random I/Os intermixed with one or more flushes. During these sequences, many of the random writes complete in unrealistically short periods of time (say 500 microseconds). Very short I/O completion times indicate caching; the actual work of moving the bits to spinning media, or to flash cells, is postponed. After a period of returning success very quickly, a backlog of deferred work is built up. What happens next is different from drive to drive. Some drives continue to consistently respond to reads as expected, no matter the earlier issued and postponed writes/flushes, which yields good performance and no perceived problems for the person using the PC. On some drives, however, reads are often held off for very lengthy periods as the drives apparently attempt to clear their backlog of work, and this results in a perceived “blocking” state or almost a “locked system”. To validate this, on some systems, we replaced poor performing disks with known good disks and observed dramatically improved performance. In a few cases, updating the drive’s firmware was sufficient to very noticeably improve responsiveness.
To reflect this real world learning, in the Windows 7 Beta code, we have capped scores for drives which appear to exhibit the problematic behavior (during the scoring) and are using our feedback system to send back information to us to further evaluate these results. Scores of 1.9, 2.0, 2.9 and 3.0 for the system disk are possible because of our current capping rules. Internally, we feel confident in the beta disk assessment and these caps based on the data we have observed so far. Of course, we expect to learn from data coming from the broader beta population and from feedback and conversations we have with drive manufacturers.
For those who obtain low disk scores but are otherwise satisfied with the performance, we aren’t recommending any action (of course, the WEI is not a tool to recommend hardware changes of any kind). It is entirely possible that the sequence of I/Os being issued for your common workload and applications isn’t encountering the issues we are noting. As we’ve said, the WEI is a metric but only you can apply that metric to your computing needs.
Earlier, I made note of the fact that our new levels, 6 and 7, were added to recognize the improved experiences one might have with newer hardware, particularly SSDs, graphics adapters, and multi-core processors. With respect to SSDs, the focus of the newer tests is on random I/O rates and their avoidance of the long latency issues noted above. As a note, the tests don’t specifically check to see if the underlying storage device is an SSD or not. We run them no matter the device type and any device capable of sustaining very high random I/O rates will score well.
For graphics adapters, both DX9 and DX10 assessments can be run now. In Vista, the tests were specific to DX9. To obtain scores in the 6 or 7 ranges, a graphics adapter must obtain very good performance scores, support DX10 and the driver must be a WDDM 1.1 driver (which you might have noticed are being downloaded during the Windows 7 beta). For WDDM 1.0 drivers, only the DX9 assessments will be run, thus capping the overall score at 5.9.
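The graphics capping rule just described can be written down as a short Python sketch. Everything here is hypothetical scaffolding for illustration: the `GraphicsAdapter` type, the `performance_score` field (a stand-in for whatever the assessments measure, already on the 1.0 to 7.9 scale), and the function name are assumptions. Only the rule itself (DX10 plus a WDDM 1.1 driver is required to go above 5.9) comes from the text.

```python
# Illustrative sketch only; types and field names are hypothetical.
from dataclasses import dataclass


@dataclass
class GraphicsAdapter:
    performance_score: float  # hypothetical raw result on the 1.0-7.9 scale
    supports_dx10: bool
    wddm_version: tuple       # for example (1, 0) or (1, 1)


def graphics_subscore(adapter):
    """Apply the 5.9 ceiling unless the adapter has DX10 and a WDDM 1.1 driver."""
    eligible = adapter.supports_dx10 and adapter.wddm_version >= (1, 1)
    ceiling = 7.9 if eligible else 5.9
    return min(adapter.performance_score, ceiling)


# Same hardware, different driver model: only the WDDM 1.1 case can exceed 5.9.
print(graphics_subscore(GraphicsAdapter(6.8, True, (1, 0))))  # prints: 5.9
print(graphics_subscore(GraphicsAdapter(6.8, True, (1, 1))))  # prints: 6.8
```

Note that the ceiling only ever lowers a score; an adapter that measures below 5.9 keeps its measured value regardless of the driver model.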
For multi-core processors, both single threaded and multi-threaded scenarios are run. With levels 6 and 7, we aim to indicate that these systems will rarely be CPU bound for typical use and are quite suitable for demanding processing tasks and multi-tasking. As examples, we anticipate many quad core processors will be able to score in the high 6 to low 7 ranges, and 8 core systems to be able to approach 7.9. The scoring has taken into account the very latest micro-processors available.
For many key hardware partners, we’ve of course made available additional details on the changes and why they were made. We continue to actively work with them to incorporate appropriate feedback.
Windows 7 RC rated my graphics card (ATI X1700 built into the laptop) as 4.4 for “Graphics” which is pretty good. Then I was alerted to some Windows Updates including one for ATI graphics which was unexpected. After the update and a reboot I re-ran the performance test and was astonished that it downgraded my respectable 4.4 rating to a measly 2.1. I ran it again to check and it was the same! How can a Windows Update for 7 RC more than halve the score?
The “Gaming Graphics” rating remained at 3.3. So Windows 7 after the update reckons my card is more capable of 3D gaming than it is of moving pretty windows around the screen. This does not make sense either.
I have 3 x western digital 500GB re2 harddrives in raid0 and i cant get the disk rating over 5.9
they need to fix that faulty test
i can read at over 200MB/s and write at over 200MB/s, and while reading and writing at the same time i get about 100MB/s R+W
I can't get the WEI tool to complete. It will start, but then it either makes no progress at all, or it dies trying to do the Direct3D 9 Aero assessment. As a result, I'm not seeing Aero. My computer has an Intel dual-core CPU and 8G of memory, and my graphics card is an ATI HD 2600 XT. How would I assess where my problem lies?
That's ridiculous, there were games coming out that required 2GB already, and it wasn't that much. I'd expect, say, 4GB to be the minimum for a 5.0 and 8GB for a 6.0. Maybe with 2GB it should be clamped to 4.0. Now, there's the issue of needing 64-bit to address it all...
I would like to see a single hard drive with 4 platters have 4 sata controllers so that I can have raid with one drive. Oh and make it a hybrid please.
Gaming graphics are a good example: to get to a 6 requires both DX10 and a WDDM driver. Without these, the gaming and graphics scores both max out at 5.9.
In support of this decision, I'd like to point out we had a great deal of data in our hands highlighting some common performance issues with disks, including almost all of the early solid state disks as they hit the market. Given the WinEI tests were not sophisticated enough to catch the problem, it seemed wrong for us to continue to highlight the drives as being good, or very good, when in fact they were the root of many responsiveness issues.
First of all, Great job on Win7! Also, don't cave to the mass nitwits that want a 10.0 scale!
So, I have a question about the Disk test - I'm using RAID 10 with an intel onboard raid controller (SATA). I originally had 4 Raptor 10k rpm drives in Vista x64 - achieved a 5.9. The heat got too much, so I changed to (4) 7200 rpm seagate drives, also Raid10. Score stayed the same 5.9. Now in Win7, the score is still 5.9. I have a feeling if I went to a single drive that the score would still be a 5.9. :) On my little lenovo X301 with SSD I get a 6.2 so I know Win7x64 can do better! I'm curious about how RAID systems are taken into account. Thanks!
I agree that the 10.0 makes no sense. If hardware becomes more powerful in upcoming years, Windows might decide I am a 10.0. Then there will be no way to compare my performance to anything better that comes out.
I think the current scheme already suffers from this. I am also stuck at 5.9 for RAID. It's not that I think I deserve more. It's that I had a problem with a drive and changed it to another with lesser performance. I still got a 5.9.
Capping makes it impossible to compare, since by definition, devices with more capability than others will have the same scores. 5.9 might make sense when computing the overall score, but I might have a 6.1 level instead of my previous 6.3 level of a 5.9 class. Microsoft is showing you a tree's branches but hiding the leaves. The leaves are the measurable sub-factors and hard criteria.
If MS rightly decides that devices that don't meet future spec XYZ should get a 5.9 sub-sub-score, then as long as the other sub-sub-scores (leaves) are at least 5.9 then the sub-score (branch score) will be 5.9. Anybody with a leaf less than 5.9 will get a meaningful sub-score. We want to see the tree. MS is worrying about showing the forest.
Also, the overall score will tell a user that a machine with a 7.1 for calc per sec and a 4.9 for RAM will not benefit (i.e. get a better score) by getting more/faster RAM if gaming graphics keeps the overall score at 4.5.
The caps and lack of score improvement for hardware improvements is something that should be addressed. Or a white paper could explain the details.
However I agree with the overall score being the lowest of all scores. There is still room to consider sub-scores and compare them to what a software vendor prints on the box or lists on a website.
The advantage of the low score is that you won't get ambushed by a new application that is constrained by your lowest sub-score. An average might boost things based on a high score for something that is not even a constraint for a given user.
Somebody who has a killer media center PC might never need the CPU power in the box. A gaming graphics bottleneck might not result in a perceivable difference in performance. But when that user decides that this super fast PC should be perfect for gaming, the user might be in for a shock.
Back in the mainframe days, we could set the priority for individual users as well as for applications. I met with the senior VP and told him that I would give him the lowest priority. The rationale was that it would have no effect when there were no constraints caused by contention, but he would be the first to know when they caused performance to degrade. Many administrators did the opposite, making top management oblivious to performance problems until end users were crawling in mud.
He knew that users were getting at least the performance he was, and probably more. Windows users will know that a computer will give them at least what the score dictates, and probably more.
On my Seagate 500GB (7200.10) I would get a Vista WEI score of 5.3... However I knew from daily use that my HDD was rather sluggish and I couldn't understand why it had such a high WEI score.
I was pleasantly disappointed that Windows 7 rated my HDD as a measly 2.9.
Processor 6.9 E5200 @ 4.125GHZ
Memory 6.9 1320MHZ FSB - Crucial DDR2
Gaming graphics 6.9 9600GT @ 760MHZ
hard disk 6.9 OCZ Vertex SSD
Hilarious results, if you ask me.
Processor Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz score 7.5
Memory (RAM) 4.00 GB score 7.9
Graphics ATI Radeon HD 5800 Series score 6.0
Gaming graphics 2559 MB Total available graphics memory score 6.0
Primary hard disk 90GB Free (119GB Total) score 5.9
Windows 7 Ultimate
5.9 Determined by lowest subscore
where is point here
i have ssd kingston 128 GB hdd look my subscore 5.9
i have gigabyte ati radeon HD 5850 DDR5 1GB
look score 6.0
ram mem is ok
i have kingston hyper x pc2000 and subscore is good 7.9
my cpu is I7 950 look score just 7.5
PSU is 1000 watts Coolermaster
liquid cooling for cpu
extra ram fans original for kingston hyperx ram mem
temp in my comp is max 33 C
and my subscore is just 5.9
you need to fix that