Notes on comments.
Welcome to our blog dedicated to the engineering of Microsoft Windows 7
One of the areas of any release of Windows that receives a significant amount of testing and scrutiny is the performance of graphics—desktop graphics all the way to the most extreme CAD and game graphics. The amazing breadth of hardware supported for Windows and the broad spectrum of usage scenarios contributes to a vibrant ecosystem with many different goals—from just the basics to the highest frame rates on multiple monitors possible. In engineering Windows 7 we set out to improve the “real world” performance of graphics as well as continue to improve the most extreme elements of graphics. This is work we do in Windows 7 and work our partners do as they work to improve the underlying hardware/software combination through drivers (note: Windows Vista drivers continue to work as they did in Windows Vista, but we've also been working with partners on updated drivers for Windows 7 which many of you have been testing through Windows Update downloads). This post looks at this spectrum of engineering as well as the different ways performance is measured. Ultimately we want to inform you about what we have done in engineering Windows 7, while we leave room for the many forums that will compare and contrast Windows 7 on different hardware and in different scenarios. This post is written by Ameet Chitre, a program manager on our Desktop Graphics feature team. --Steven
If you have gone online to check out or purchase a new PC, you must have noticed that “faster graphics” and “great performance” are always some of the key selling points. People have come to expect faster systems which enable them to edit photos, do email, watch high-definition videos and play the latest 3D games all with greater ease, often shuffling between these tasks seamlessly. Quite a few of these users refer to the enthusiast community blogs and various review sites which run graphics benchmarks and report results evaluating how fast the graphics of new hardware or software performs. Traditionally graphics performance has been measured and analyzed through 3D games but it also impacts what we call “desktop scenarios” - such as when you are using the Windows UI, moving or maximizing windows, or scrolling within Word or IE etc. The performance requirements for these desktop scenarios are quite different from3D games. In fact, this is the reason in Windows Vista Experience Index (WinEI) we give you rating for these two scenarios separately, highlighted in the image below:
Figure 1. WEI sample with Graphics capabilities highlighted.
Graphics performance is usually assessed through many benchmarks. These can be classified into 2 broad categories:
However, there are plenty of things that we all do on our PCs that don’t have benchmarks tracking them that are still quite critical to make fast. In these cases we use the instrumentation within Windows to obtain timing information and then analyze the performance.
This blog entry discusses various aspects of graphics performance - both gaming and desktop graphics performance. It calls out the changes we made in Windows 7 to address user feedback as well as to take advantage of modern hardware to improve graphics performance.
Many have experienced scenarios where an application, or Windows itself, stops responding momentarily. This is type of a performance issue that can be impacted significantly by the performance of graphics in the PC. We categorize these as desktop responsiveness issues. Improving responsiveness, both in real terms and by avoiding non-responsive moments, is one of the key ways that performance is improved in the system. It is also hard to measure.
Measuring desktop responsiveness is a hard problem since a number of issues which affect responsiveness aren’t easily reproducible and there is a great variety of them. They are rarely caught by either kind of benchmark as these issues are dependent on real-world combinations of factors. For Windows 7 we spent a great deal of time looking at these performance glitches using a mechanism in test versions of Windows 7 which has the ability to record key OS events and when they occurred. During real-world testing when we encounter a responsiveness problem, the tester can hit a record key and enter a small description of the issue encountered. The event history with diagnostic information called a “performance trace” is written out to a file and uploaded to a server where a team of performance analysts parse the data to figure out the cause of the responsiveness issue. This process has been successful to the extent that today most responsiveness issues can be quickly tracked down and root-caused.
Using this methodology, we analyzed thousands of desktop responsiveness traces where the tester experienced a frozen desktop anywhere from 100msec to several seconds. The type of issues ranged from an antivirus blocking disk access for all applications while updating itself on the vendor’s website to an application doing network access from a UI thread. In a significant portion of these traces, we found that a GDI application is waiting on another GDI application which is experiencing slowdowns due to excessive paging activity. This was the single-most frequently occurring cause of all desktop responsiveness issues, which without this data we probably would not have assumed. Based on these investigations, we worked to improve the architecture in these two key areas:
These are described in more detail below.
A number of performance traces we investigated in the context of desktop responsiveness pointed us to the design of a key synchronization mechanism in GDI. The performance challenge happens because the design of GDI in Windows Vista allows only a single application to hold a system-wide exclusive global lock. While this seems obvious in hindsight, when this decision was originally made the performance characteristics of different parts of the system made this optimistic implementation perfectly reasonable.
Figure 2. Existing architecture of GDI concurrency.
GDI applications running simultaneously vie for this global lock in order to render on the screen. The application that accesses the global lock prevents other applications from rendering till it releases the global lock. The situation often gets exacerbated when the application that is holding the lock needs to page in a large amount of memory from the disk since moving the memory from the disk to RAM takes a relatively long time. The above picture shows two GDI applications running simultaneously, contending for the global lock. If App X gets hold of the lock, it can render to the screen while App Y is unable to do so and waits for App X to finish.
Figure 3. Windows 7 architecture of GDI concurrency.
The solution to the problem was therefore to reduce the lock contention and improve concurrency by re-architecting the internal synchronization mechanism through which multiple applications can reliably render at the same time. Contention due to the global exclusive lock is avoided by implementing a number of fine-grained locks which are not exclusive but aid parallelism. The increased number of fine-grained locks adds a small overhead for scenarios where only a single application is rendering at a time.
Special attention was paid to GDI application compatibility as changing internal synchronization mechanism in the most widely used API stack could potentially give rise to timing issues such as deadlocks and rendering corruption.
This work also resulted in better rendering performance of concurrent GDI applications on multi-core CPUs. Multi-core Windows PCs benefit from these changes as more than one application can now be rendering at the same time.
After the GDI concurrency work was implemented in the Windows 7 builds leading to the Beta, we saw a large reduction in the number of desktop responsiveness issues reported by our testers which were caused by one application blocking another one due to GDI. To further validate the scalability of our new implementation, we wrote tests that draw 2D GDI primitives and measured the rendering throughput by launching simultaneously multiple such applications. The throughput is measured by adding together the frame rate (FPS) of each application window. Below is a sample of these results on a quad-core CPU system.
Figure 4. GDI Concurrency and Scalability.
Without the Windows 7 GDI concurrency, the rendering throughput of these applications is effectively limited to the performance of a single CPU core. Since only a single application can acquire the global exclusive lock while the others are waiting, this scenario doesn’t benefit from multiple CPU cores. This demonstrates that GDI applications in Windows 7 are now much less dependent on one another. This benefit will not need any new display drivers; it will work on any Vista (WDDM 1.0) and newer display drivers.
Another area which affects system responsiveness is memory usage. Simply put, increased system memory (RAM) usage leads to an increased paging activity which directly leads to reduced system responsiveness. Thus, for the best responsiveness, all applications, processes and OS components need to use as little system memory as possible.
In Windows Vista, the amount of memory required to run multiple windows scales linearly with the number of windows opened on the system. This results in more memory pressure when there are more windows or if the monitors have higher resolution. It gets worse if you have more than one monitor. As part of investigating various means to improve system responsiveness, we saw a great opportunity in reducing the usage of system memory by DWM. In Windows Vista, every GDI application window accounts for two memory allocations which hold identical content – one in video memory and one in system memory. DWM is responsible for composition of the desktop through the graphics hardware. Hence, it requires a copy of the same allocation in video memory, which is easily accessible by the graphics hardware. The duplicate copy present in system memory is required because GDI is being rendered utilizing the CPU completely in the operating system without any assistance or “acceleration” by the graphics hardware. As the CPU performs all the tasks for rendering GDI applications, it requires an easily accessible cacheable copy of memory.
Figure 5. Existing memory allocations.
Windows 7 saves one copy of the memory allocation per application window by getting rid of the system memory copy entirely. Thus, for a GDI application window visible on the desktop, the memory consumed is cut in half.
Figure 6. Windows 7 memory allocations.
We achieved the reduction in system memory by accelerating the common GDI operations through the graphics hardware - the WDDM drivers accelerate these to minimize the performance impact of the CPU read-back of video memory. This was necessary as performing these operations otherwise on the CPU would incur a heavy performance penalty. In order to decide which GDI operations to accelerate, it was important to understand the usage pattern of various GDI applications. We profiled the top 100 GDI applications to learn more about their calling patterns and frequency and nature of the GDI operations.
Figure 7. Calling patterns and frequency of GDI operations for 100 GDI-based applications.
Based on real-world application statistics, a tiny snapshot of which is seen above, we worked with our graphics IHV partners to provide support in their drivers to accelerate the most commonly used GDI operations. Windows 7 systems with these updated drivers, called “WDDM v1.1” will thus benefit from this memory savings work. Please note that WDDM 1.0 drivers continue to function and are fully supported on Windows 7. You might have seen Windows Update providing these 1.1 drivers during the Beta—these drivers are themselves in Beta.
Figure 8. Desktop Window manager memory consumption comparison using WDDM 1.1 v. WDDM 1.0.
The above data shows that the memory savings become more and more pronounced when you have multiple application windows visible on the desktop. Since you save a lot of system memory, the paging activity gets reduced – as a result, your system responsiveness improves for the same workload.
Certain trade-offs had to be made for the desktop responsiveness improvements which benefit a wide range of systems. For example – the elimination of the duplicate system memory copies which “speed up” certain operations introduced slightly reduced performance as the CPU now has to read data back from the video memory. An analysis of real-world application statistics showed that these operations were rare. However, certain GDI micro-benchmarks which issue these operations show some performance degradation. This is something to note if you are running existing benchmarks that stress specific GDI operations repeatedly, which isn’t necessarily a reflection of real-world performance. Our observation has been that these slow-downs do not impact the end-user functionality directly and that the memory savings directly result in Windows 7 being much responsive overall. The improvements overall are definitely noticeable on memory constrained PCs with shared memory graphics.
No article on graphics performance is complete without talking about gaming, which is still the most widely analyzed and discussed aspect of graphics performance. There are a number of popular benchmarks such as 3D Mark as well as in-game benchmarks which are really a mode in which you can run your game where it renders the game scenes and animations without any user interaction. This area has thus been well tracked by the gaming industry through various industry benchmarks, which are pretty realistic and representative of actual games. The different benchmarks and tests are widely discussed and gamers all well-versed in the subtleties of these measurements and translating them into recommendations depending on their hardware, drivers, and gaming expectations.
For Windows 7, we have worked closely with our Graphics IHV partners, helping them improve the WDDM drivers’ gaming performance with specific changes to how Windows 7 works under the hood, while maintaining the same driver model and compatibility. Our continued investments in performance tools has helped us and our IHV partners track down and analyze various gaming performance bottlenecks and fix them in subsequent driver releases. The fundamentals of the Windows Display Driver Model remain unchanged in Windows 7. Some policies around GPU scheduling and memory management were changed to enable better performance in certain scenarios.
Because these benchmarks are very sensitive to the specific hardware, firmware, drivers, and overall system and because these are so widely measured and discussed elsewhere we are going to leave these comparisons to third parties. Like many areas in Windows 7, our commitment is to engineer even better performance across many dimensions. We believe it is better for you to experience these efforts directly. In comparing Windows 7, we would encourage the comparison using Windows Vista SP1 and keep in mind the difference you might see in WDDM 1.0 v. 1.1 and that the 1.1 drivers are still under development.
As you can see, in engineering Windows 7 we have worked hard to improve the architecture for graphics for real-world performance. It benefits both ends of the hardware spectrum – by enabling low physical memory systems to run a leaner and faster Windows and at the same time enabling multi-core PCs render multiple graphics applications much more efficiently.
Having no more send feedback button, I have to write the annoyances I experience here. I like the new backup. I have a weekly schedule to be done during the night and I hardly ever notice it. However, last night I turned my PC off and the backup was done this morning when I turned it on. It took half an hour during which I couldn't use my computer. CPU utilization was only 40%, memory was maxed out (2GB, not cached), don't ask why, 'cause the first 5 processes with the most memory summed up 250MB only. Disk utilization was constant, however, with Backup having Normal I/O priority. I don't see any reason why cannot scheduled backup have a low or background priority.
Some questions concerning gaming performance:
I was wondering if the Desktop (or parts of it) are shutdown when using DirectX in fullscreen mode. How much would this improve gaming performance? Does any part of the OS get moved to the page file or something when DirectX is in fullscreen mode?
My Aero performance with a Nvidia Geforce 8200M on a laptop was slower then Vista, but my gaming performance was faster.
I was very disappointed to see my Aero performance slower so point score is slower then Vista.
Sorry Microsoft there is Aero performance loss going on here that needs to be addressed!
Wonderful that the XPDM troubles from the beta are fixed in RC1. There is one thing left though. When my intel 82830 based Tablet PC resumes from hibernate (suspend to disk) it opens with 800×600 resolution but with the original 1024x786 oddly mapped on top of it. Vertically it is fully mapped, but horizontally stuff disappears in thin air at the right outside of the screen.
When I logon this problem remains. But if I then drop back to the Welcome screen the resolution returns to a sane state. This makes me believe the problem can be solved with an extra display 'reset' after resume. A simple fix for a widely used chipset in older laptops.
Would it be possible to include such a fix? Should I send driver information to somebody at microsoft?
btw, writing this on my tablet worked pretty well :D
I have 3*24" monitor and 2 nvidia GTX260 in SLI + 1 nvidia 8800GT, is the span mode of xp would be in the RTM version?
I mean having a 'virtual' resolution of 5760x1200 instead of 3 time 1920x1200 ...
Many game, application, will benefit from that, it's the only reason i will stay with xp.
Hello. i have ATI MOBILITY FIREGL 9000 on my IBM T40p. and on W7 i cannot run any game.. even couter strike or warcraft, and there is a problem in playing videos.. its horrible. when i tried to install drivers which are perfect on XP it gives me an error. even Fallout is not running. is there any chance to fix this problem??? if i need to send drivers to a developer, i will do it. please help me =(
Is there any way to improve processor performances without changing it?
The perform seems nice-to-user interface.
TANHKS for your guide.
You guys done the best job to improve Windows 7!
Reduced memory consumption is number one reason of Windows 7 success.
The only party-pooper is Intel with 945GM WDDM 1.0 chipset in millions of notebooks...
One question not covered here: what about GDI acceleration which i heard was brought back in WDDM 1.1? Is that true or optional, or not related with 1.1 at all? Why it's even 1.1 with such major changes =)
Anyway, seems like Microsoft are finally making their user interface more multi-threaded. I remember how BeOS was very responsive, even in heavy cpu using scenarioes. It was because it was SMP programmed from the bottom. But windows XP is still instant responsive, so why all the extra work with Windows 7? Is it a workaround to all the bloat?
I'm getting this parallelism. My timing stats are still showing something is being serialized, has anyone actually confirmed their claims? I'm trying to use multiple threads, from a single process? Is there still a single lock per-process? So much talk of 'applications' very little about threads.
I'm not getting this parallelism. My timing stats are still showing something is being serialized, has anyone actually confirmed their claims? I'm trying to use multiple threads, from a single process. Is there still a single lock per-process? So much talk of 'applications' very little about threads.
Hi like your blog. thought you and your viewers might find my site interesting which provides engineering questions in all subjects and specialization
<a href="www.k2questions.com/Engineering-Degree-Important-Questions.aspx">engineering questions </a>
What an excellent blog it is about performance of windows 7 . I also have found valuable information from the site below: