Welcome to our blog dedicated to the engineering of Microsoft Windows 7
One of the areas of any release of Windows that receives a significant amount of testing and scrutiny is the performance of graphics, from desktop graphics all the way to the most extreme CAD and game graphics. The amazing breadth of hardware supported for Windows and the broad spectrum of usage scenarios contribute to a vibrant ecosystem with many different goals, from just the basics to the highest possible frame rates on multiple monitors. In engineering Windows 7 we set out to improve the “real world” performance of graphics while continuing to improve the most extreme elements of graphics. Some of this work happens in Windows 7 itself, and some is done by our partners as they improve the underlying hardware/software combination through drivers (note: Windows Vista drivers continue to work as they did in Windows Vista, but we've also been working with partners on updated drivers for Windows 7, which many of you have been testing through Windows Update downloads). This post looks at this spectrum of engineering as well as the different ways performance is measured. Ultimately we want to inform you about what we have done in engineering Windows 7, while leaving room for the many forums that will compare and contrast Windows 7 on different hardware and in different scenarios. This post is written by Ameet Chitre, a program manager on our Desktop Graphics feature team. --Steven
If you have gone online to check out or purchase a new PC, you have probably noticed that “faster graphics” and “great performance” are always among the key selling points. People have come to expect faster systems that let them edit photos, do email, watch high-definition videos and play the latest 3D games with greater ease, often shuffling between these tasks seamlessly. Quite a few of these users turn to enthusiast community blogs and review sites that run graphics benchmarks and report results evaluating how fast the graphics of new hardware or software perform. Traditionally, graphics performance has been measured and analyzed through 3D games, but it also affects what we call “desktop scenarios”, such as using the Windows UI, moving or maximizing windows, or scrolling within Word or IE. The performance requirements for these desktop scenarios are quite different from those of 3D games. In fact, this is why the Windows Experience Index (WEI) in Windows Vista gives you separate ratings for these two scenarios, highlighted in the image below:
Figure 1. WEI sample with Graphics capabilities highlighted.
Graphics performance is usually assessed through many benchmarks. These can be classified into 2 broad categories:
However, there are plenty of things we all do on our PCs that no benchmark tracks but that are still quite critical to make fast. In these cases we use the instrumentation within Windows to obtain timing information and then analyze the performance.
This blog entry discusses various aspects of graphics performance, covering both gaming and desktop graphics. It calls out the changes we made in Windows 7 to address user feedback and to take advantage of modern hardware to improve graphics performance.
Many have experienced scenarios where an application, or Windows itself, stops responding momentarily. This is a type of performance issue that can be affected significantly by the performance of graphics in the PC. We categorize these as desktop responsiveness issues. Improving responsiveness, both in real terms and by avoiding non-responsive moments, is one of the key ways that performance is improved in the system. It is also hard to measure.
Measuring desktop responsiveness is a hard problem, since many of the issues that affect responsiveness aren't easily reproducible and they come in great variety. They are rarely caught by either kind of benchmark, as these issues depend on real-world combinations of factors. For Windows 7 we spent a great deal of time looking at these performance glitches using a mechanism in test versions of Windows 7 that can record key OS events and when they occurred. During real-world testing, when testers encounter a responsiveness problem, they can hit a record key and enter a short description of the issue. The event history with diagnostic information, called a “performance trace”, is written out to a file and uploaded to a server, where a team of performance analysts parses the data to figure out the cause of the responsiveness issue. This process has been successful to the extent that today most responsiveness issues can be quickly tracked down and root-caused.
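The record-on-demand mechanism described above can be pictured as an always-running, bounded event history that is only written out when the tester asks for it. The following is a minimal sketch under stated assumptions, not the Windows tracing implementation: `TraceRecorder`, `log`, `to_trace` and `save` are hypothetical names, events are plain dictionaries, the event names in the example are made up, and the trace file format (JSON) is an assumption.

```python
import json
import time
from collections import deque


class TraceRecorder:
    """Hypothetical sketch of an always-on event history.

    Keeps only the most recent `capacity` events; when the tester hits the
    record key, the history plus a short description becomes one trace.
    """

    def __init__(self, capacity=10_000):
        # A deque with maxlen discards the oldest event once full, so memory
        # stays bounded no matter how long the recorder runs.
        self._events = deque(maxlen=capacity)

    def log(self, source, name, **details):
        # Monotonic timestamps keep ordering and durations meaningful even
        # if the wall clock is adjusted.
        self._events.append({
            "t": time.monotonic(),
            "source": source,
            "event": name,
            "details": details,
        })

    def to_trace(self, description):
        # Snapshot of the current history plus the tester's description.
        return {"description": description, "events": list(self._events)}

    def save(self, description, path):
        # Write the trace to a file, ready to be uploaded for analysis.
        trace = self.to_trace(description)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(trace, f, indent=2)
        return trace


if __name__ == "__main__":
    recorder = TraceRecorder(capacity=1000)
    # Hypothetical event names, for illustration only.
    recorder.log("gdi", "lock_wait_begin", pid=1234)
    recorder.log("mm", "hard_fault", pid=5678, pages=512)
    recorder.log("gdi", "lock_wait_end", pid=1234)
    trace = recorder.to_trace("Desktop froze for about a second")
    print(len(trace["events"]), "events captured")
```

Bounding the history is the key design choice here: the recorder can run for the whole test session at a fixed memory cost, and a trace always contains the events that led up to the glitch the tester noticed.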
Using this methodology, we analyzed thousands of desktop responsiveness traces in which the tester experienced a frozen desktop for anywhere from 100 ms to several seconds. The issues ranged from an antivirus program blocking disk access for all applications while it updated itself from the vendor's website, to an application doing network access from a UI thread. In a significant portion of these traces, we found that a GDI application was waiting on another GDI application that was slowed down by excessive paging activity. This was the single most frequent cause of all desktop responsiveness issues, something we probably would not have guessed without this data. Based on these investigations, we worked to improve the architecture in two key areas: GDI concurrency and system memory usage.
These are described in more detail below.
A number of performance traces we investigated in the context of desktop responsiveness pointed us to the design of a key synchronization mechanism in GDI. The performance challenge arises because GDI in Windows Vista uses a system-wide exclusive global lock that only a single application can hold at a time. While the drawback seems obvious in hindsight, when this decision was originally made, the performance characteristics of different parts of the system made this optimistic implementation perfectly reasonable.
Figure 2. Existing architecture of GDI concurrency.
GDI applications running simultaneously vie for this global lock in order to render on the screen. The application that acquires the global lock prevents other applications from rendering until it releases the lock. The situation often gets worse when the application holding the lock needs to page in a large amount of memory from disk, since moving memory from disk to RAM takes a relatively long time. The picture above shows two GDI applications running simultaneously, contending for the global lock. If App X gets hold of the lock, it can render to the screen while App Y is unable to do so and waits for App X to finish.
Figure 3. Windows 7 architecture of GDI concurrency.
The solution was therefore to reduce lock contention and improve concurrency by re-architecting the internal synchronization mechanism so that multiple applications can reliably render at the same time. Contention on the global exclusive lock is avoided by implementing a number of fine-grained locks which, unlike the single global lock, do not serialize all applications and therefore allow parallelism. The larger number of fine-grained locks adds a small overhead in scenarios where only a single application is rendering at a time.
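To make the contrast concrete, here is a minimal Python model of the two designs. The post does not describe how the Windows 7 fine-grained locks are partitioned, so one lock per window is purely an assumption for illustration, and `GlobalLockRenderer`, `FineGrainedRenderer` and `apps_render_concurrently` are hypothetical names. Two threads stand in for App X and App Y, and a barrier inside the critical section detects whether both can be rendering at the same moment.

```python
import threading


class GlobalLockRenderer:
    """Model of the Windows Vista design: one system-wide exclusive lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def lock_for(self, window_id):
        # Every window maps to the same lock, so all renderers serialize.
        return self._lock


class FineGrainedRenderer:
    """Hypothetical finer-grained design: one lock per window."""

    def __init__(self):
        self._table_lock = threading.Lock()  # protects the dict below
        self._locks = {}

    def lock_for(self, window_id):
        with self._table_lock:
            lock = self._locks.get(window_id)
            if lock is None:
                lock = self._locks[window_id] = threading.Lock()
            return lock


def apps_render_concurrently(renderer, timeout=0.5):
    """Run two 'apps' drawing into different windows and report whether
    both were ever inside their rendering critical sections at once."""
    barrier = threading.Barrier(2)
    results = []

    def app(window_id):
        with renderer.lock_for(window_id):
            try:
                # Both threads pass the barrier only if each one reaches it
                # while holding its rendering lock, i.e. they overlap.
                barrier.wait(timeout=timeout)
                results.append(True)
            except threading.BrokenBarrierError:
                # Timed out (or barrier already broken): no overlap.
                results.append(False)

    threads = [threading.Thread(target=app, args=(w,)) for w in ("App X", "App Y")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)


if __name__ == "__main__":
    print("global lock:", apps_render_concurrently(GlobalLockRenderer()))
    print("fine-grained:", apps_render_concurrently(FineGrainedRenderer()))
```

With the global lock, App Y cannot even reach the barrier until App X gives up the lock, so the check reports no overlap; with per-window locks both apps proceed together. The extra lock-table lookup in this model loosely illustrates the small overhead the post mentions for the single-application case.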
Special attention was paid to GDI application compatibility, as changing the internal synchronization mechanism in the most widely used API stack could potentially give rise to timing issues such as deadlocks and rendering corruption.
This work also resulted in better rendering performance of concurrent GDI applications on multi-core CPUs. Multi-core Windows PCs benefit from these changes as more than one application can now be rendering at the same time.
After the GDI concurrency work was implemented in the Windows 7 builds leading up to the Beta, we saw a large reduction in the number of desktop responsiveness issues reported by our testers that were caused by one application blocking another due to GDI. To further validate the scalability of our new implementation, we wrote tests that draw 2D GDI primitives and measured the rendering throughput while running multiple such applications simultaneously. The throughput is measured by adding together the frame rate (FPS) of each application window. Below is a sample of these results on a quad-core CPU system.
Figure 4. GDI Concurrency and Scalability.
Without Windows 7 GDI concurrency, the rendering throughput of these applications is effectively limited to the performance of a single CPU core. Since only a single application can acquire the global exclusive lock while the others wait, this scenario doesn't benefit from multiple CPU cores. The results demonstrate that GDI applications in Windows 7 are now much less dependent on one another. This benefit does not require new display drivers; it works with any Windows Vista (WDDM 1.0) or newer display driver.
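The throughput metric and the single-core ceiling described above can be written as a small idealized model. The sketch assumes each test application is CPU-bound and can keep at most one core busy, and it ignores lock overhead and other contention; the function names and the example frame rates are hypothetical, not the measured results behind Figure 4.

```python
def aggregate_fps(per_window_fps):
    """Rendering throughput as used in the post: the sum of the frame
    rates (FPS) of every concurrently running application window."""
    return sum(per_window_fps)


def ideal_throughput(single_app_fps, num_apps, num_cores, global_lock):
    """Idealized upper bound on aggregate FPS (assumption: each app is
    CPU-bound and can use at most one core; overheads are ignored)."""
    if global_lock:
        # Apps take turns holding the lock, so their combined output is
        # capped at what one core can render.
        return single_app_fps
    # Without the global lock, apps render in parallel up to the core count.
    return single_app_fps * min(num_apps, num_cores)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    for n in (1, 2, 4, 8):
        print(n, "apps:",
              ideal_throughput(100.0, n, num_cores=4, global_lock=True),
              ideal_throughput(100.0, n, num_cores=4, global_lock=False))
```

In this model the global-lock curve stays flat as applications are added, while the lock-free curve grows until it reaches the core count, which matches the shape of the argument in the paragraph above rather than any specific measured number.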
Another area which affects system responsiveness is memory usage. Simply put, increased system memory (RAM) usage leads to an increased paging activity which directly leads to reduced system responsiveness. Thus, for the best responsiveness, all applications, processes and OS components need to use as little system memory as possible.
In Windows Vista, the amount of memory required to run multiple windows scales linearly with the number of windows open on the system. This results in more memory pressure when there are more windows or when the monitors have higher resolution, and it gets worse if you have more than one monitor. As part of investigating various ways to improve system responsiveness, we saw a great opportunity in reducing DWM's use of system memory. In Windows Vista, every GDI application window accounts for two memory allocations that hold identical content: one in video memory and one in system memory. DWM is responsible for composing the desktop through the graphics hardware, so it requires a copy of the allocation in video memory, which is easily accessible to the graphics hardware. The duplicate copy in system memory is required because GDI rendering is performed entirely by the CPU in the operating system, without any assistance or “acceleration” from the graphics hardware. Since the CPU performs all the work of rendering GDI applications, it requires an easily accessible, cacheable copy of the memory.
Figure 5. Existing memory allocations.
Windows 7 eliminates one of the two memory allocations per application window by getting rid of the system memory copy entirely. Thus, for a GDI application window visible on the desktop, the memory consumed is cut in half.
Figure 6. Windows 7 memory allocations.
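The arithmetic behind this halving can be sketched directly. The model below assumes 32-bit window surfaces (4 bytes per pixel) with no row padding or alignment, which real allocations may add; the helper names and the example window sizes are hypothetical. It shows why the cost grows linearly with the number of windows and with their resolution, and why dropping the system memory copy halves it.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit surfaces, no row padding


def surface_bytes(width, height):
    """Size in bytes of one copy of a window's content."""
    return width * height * BYTES_PER_PIXEL


def gdi_window_memory(windows, copies):
    """Total bytes held for GDI window content.

    windows: iterable of (width, height) for visible GDI windows.
    copies:  2 for the Windows Vista model (video memory + system memory
             copy), 1 for Windows 7 with WDDM 1.1 (video memory only).
    """
    return copies * sum(surface_bytes(w, h) for w, h in windows)


if __name__ == "__main__":
    # Hypothetical desktop: ten maximized windows on a 1920x1200 monitor.
    windows = [(1920, 1200)] * 10
    vista = gdi_window_memory(windows, copies=2)
    win7 = gdi_window_memory(windows, copies=1)
    print(f"Vista model: {vista:,} bytes")      # 184,320,000 bytes
    print(f"Windows 7 model: {win7:,} bytes")   # 92,160,000 bytes
```

Each additional visible window, or each step up in resolution, adds to the sum, so the absolute saving from removing one copy grows with the workload.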
We achieved the reduction in system memory by accelerating the common GDI operations through the graphics hardware: the WDDM drivers accelerate these operations to minimize the performance impact of CPU read-back from video memory. This was necessary because performing these operations on the CPU instead would incur a heavy performance penalty. To decide which GDI operations to accelerate, it was important to understand the usage patterns of various GDI applications. We profiled the top 100 GDI applications to learn more about their calling patterns and the frequency and nature of the GDI operations they use.
Figure 7. Calling patterns and frequency of GDI operations for 100 GDI-based applications.
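The profiling step can be sketched as a simple tally over call traces. The record format (one `(application, GDI function)` pair per call), the helper name and the sample records are hypothetical; `BitBlt`, `TextOutW` and `AlphaBlend` are real GDI functions used here only as labels. Ranking by both call count and the number of applications using a function is one plausible way to shortlist candidates for hardware acceleration, not necessarily the criteria the team used.

```python
from collections import Counter


def rank_gdi_operations(trace):
    """Rank GDI functions by how often they are called.

    trace: iterable of (app_name, gdi_function) records, one per call.
    Returns (function, call_count, app_count) tuples, most-called first.
    """
    calls = Counter()
    apps_using = {}
    for app, func in trace:
        calls[func] += 1
        apps_using.setdefault(func, set()).add(app)
    # most_common() sorts by count, highest first.
    return [(func, count, len(apps_using[func]))
            for func, count in calls.most_common()]


if __name__ == "__main__":
    # Made-up records for illustration; not data from the post.
    sample = [
        ("notepad", "TextOutW"), ("notepad", "BitBlt"),
        ("paint", "BitBlt"), ("paint", "BitBlt"),
        ("mail", "TextOutW"), ("mail", "AlphaBlend"),
    ]
    for func, count, apps in rank_gdi_operations(sample):
        print(f"{func:<12} calls={count} apps={apps}")
```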
Based on real-world application statistics, a tiny snapshot of which is seen above, we worked with our graphics IHV partners to provide support in their drivers for accelerating the most commonly used GDI operations. Windows 7 systems with these updated drivers, called “WDDM 1.1”, will thus benefit from this memory savings work. Please note that WDDM 1.0 drivers continue to function and are fully supported on Windows 7. You might have seen Windows Update providing these 1.1 drivers during the Beta; these drivers are themselves in Beta.
Figure 8. Desktop Window Manager memory consumption comparison using WDDM 1.1 v. WDDM 1.0.
The above data shows that the memory savings become more and more pronounced when you have multiple application windows visible on the desktop. Since you save a lot of system memory, the paging activity gets reduced – as a result, your system responsiveness improves for the same workload.
Certain trade-offs had to be made for the desktop responsiveness improvements, which benefit a wide range of systems. For example, eliminating the duplicate system memory copies, which “sped up” certain operations, slightly reduced the performance of those operations, as the CPU now has to read data back from video memory. An analysis of real-world application statistics showed that these operations were rare. However, certain GDI micro-benchmarks that issue these operations show some performance degradation. This is something to note if you are running existing benchmarks that stress specific GDI operations repeatedly, which isn't necessarily a reflection of real-world performance. Our observation has been that these slowdowns do not directly impact end-user functionality and that the memory savings result in Windows 7 being much more responsive overall. The overall improvements are definitely noticeable on memory-constrained PCs with shared-memory graphics.
No article on graphics performance is complete without talking about gaming, which is still the most widely analyzed and discussed aspect of graphics performance. There are a number of popular benchmarks, such as 3DMark, as well as in-game benchmarks, which are really a mode of the game that renders game scenes and animations without any user interaction. This area has thus been well tracked by the gaming industry through various industry benchmarks, which are fairly realistic and representative of actual games. The different benchmarks and tests are widely discussed, and gamers are well-versed in the subtleties of these measurements and in translating them into recommendations depending on their hardware, drivers, and gaming expectations.
For Windows 7, we have worked closely with our graphics IHV partners, helping them improve the gaming performance of their WDDM drivers with specific changes to how Windows 7 works under the hood, while maintaining the same driver model and compatibility. Our continued investments in performance tools have helped us and our IHV partners track down and analyze various gaming performance bottlenecks and fix them in subsequent driver releases. The fundamentals of the Windows Display Driver Model remain unchanged in Windows 7. Some policies around GPU scheduling and memory management were changed to enable better performance in certain scenarios.
Because these benchmarks are very sensitive to the specific hardware, firmware, drivers, and overall system, and because they are so widely measured and discussed elsewhere, we are going to leave these comparisons to third parties. Like many areas in Windows 7, our commitment is to engineer even better performance across many dimensions. We believe it is better for you to experience these efforts directly. When comparing Windows 7, we encourage using Windows Vista SP1 as the baseline, keeping in mind the differences you might see between WDDM 1.0 and 1.1, and that the 1.1 drivers are still under development.
As you can see, in engineering Windows 7 we have worked hard to improve the graphics architecture for real-world performance. It benefits both ends of the hardware spectrum, by enabling systems with little physical memory to run a leaner and faster Windows and, at the same time, enabling multi-core PCs to render multiple graphics applications much more efficiently.
If you're looking for more information on DirectX, see the blog http://blogs.msdn.com/directx/.
I'm glad to see this GDI bottleneck has been solved. I was always wondering why when application A was slow or not responding, switching to application B was slow as well. As application B was almost idle, not consuming much CPU power, this didn't make sense to me, other than thinking the whole graphic subsystem was single threaded. Because of the system wide lock this was in fact more or less the case. Fortunately this has been solved.
Keep up the good work!
As I attempt to write drivers for the 852/855/915GME, this commentary is timely.
With the results I have produced in my initial driver, following the guides included in the WDK Beta for Windows 7, I have started to appreciate what the Windows 7 developers have gone through to make the graphics driver have a less significant impact on system performance.
I will go through some of the suggested sites to understand this a little more deeply.
Thanks for the Great Work and this helpful Blog.
This is one of the best topics covered on this blog and an excellent post too. Besides improved performance in WDDM 1.1, what are the other benefits? I know for one that it uses DirectX 10/SM 4.0 instead of DX9.0/SM2.0 and enables DXVA-HD. What are the requirements for Direct2D/DirectWrite: do they require WDDM 1.1, or D3D11-capable hardware? Besides a handful of 3D games, is DX10 being used in any other OS component or third-party app?
Also, when will Direct3D 11, Direct2D and DirectWrite be available on Vista? Are they included in Vista SP2?
Lastly, I've been seeing some display corruption using WDDM 1.1 (Intel GMA X3100) near the taskbar when the desktop is covered by an app window. Since WDDM 1.1 drivers are in beta, I hope it is fixed by RTM. My only graphics wish left now is if WDDM was updated to make the Win32 console-based apps work in full-screen mode.
I tried several games with:
Nvidia 8800 GTX
Nvidia 8600 GT (laptop)
Nvidia 7600 GS (old PC)
and they work great!
Next week I'll try an Nvidia GTX 275 in Windows 7.
Very impressive and much needed optimizations. Can't wait for the official release :)
Great work in this area! I have been looking for this information to help understand what is going on in the background.
I'm extremely impressed. Really interesting post and a really interesting topic in general.
I'm glad to see that Microsoft is spending time considering desktop responsiveness issues.
That said, I didn't need to read this post to know that something was radically better on Windows 7, I just had to use it...
Looks like this is the right place to ask this: does Windows 7 have a WDDM display driver for the Intel 915 desktop board? I have seen a lot of blogging on the net about this and just wanted to confirm.
The 852/855/915GME will never support a WDDM driver. I am only writing a driver so I can use the full display resolutions available on the card. The trade-off is that you cannot play Chess as soon as you invoke DirectX 3D and OpenGL... It is a legacy card, and I now know why Intel and Microsoft do not produce drivers for it.
I am wondering: when Windows 7 is RTM'd, will early implementers be able to purchase a licence before the official release date?
And if this has not been considered, could you implement it as a requested feature?
All I can say is Wow, Great OS, even better work!
Improve the color layout for the currently active task icon
The taskbar color layout is too poor.
If you use a light-colored desktop background, you can't tell which task icon is active.
Its brightness is too high, so it looks too blurred against a light-colored desktop background.
The currently active task icon is hard to recognize with a light-colored desktop background.
Keep the menu color layout consistent in Windows 7
Please keep the menu color layout consistent.
Why does 'Include in library' use a different color layout?
Improve the Start Menu and application menu color style (suggestion: consistency)
(distinguishing the application menu colors is very difficult)
There is no consistency between the Start Menu and the application menus.
Keep the Start Menu and application menu color styles consistent, like Windows XP:
all use a white background, with light blue for the selected item.
Keep the theme style consistent in Windows 7
Keep the theme style consistent for the menu bar and toolbar, like the consistent Windows XP theme style.
Pin the Recycle Bin to the taskbar as a standalone item
Currently, the Recycle Bin is pinned to the Windows Explorer Jump List.
This is very inconvenient, because the Recycle Bin is a frequently used item.
But when it sits on the desktop, it is often covered by other windows...
Pin the Recycle Bin to the taskbar as a standalone item (by default),
and don't show the Recycle Bin icon on the desktop (by default).
Of course, you could still unpin the Recycle Bin from the taskbar.
Update the Windows Fax and Scan toolbar to the Windows 7 style
Update the Windows Fax and Scan toolbar to the Windows 7 style,
like Windows Photo Viewer or Windows DVD Maker.
Completely remove Windows Mail from Windows 7
Windows Mail already can't start; it is unsupported in Windows 7.
But it still remains in Windows 7 (\Program Files\Windows Mail\).
If you go to the \Program Files\Windows Mail\ folder, you'll still find the Windows Mail folder and the files in it, but you can't run WinMail.exe.
Please delete WinMail.exe,
move the other files to the system32 folder,
and don't put them in \Program Files\Windows Mail\ (the folder name is wrong).
You guys have done the best job improving Windows 7!
Reduced memory consumption is the number one reason for Windows 7's success.
The only party-pooper is Intel with 945GM WDDM 1.0 chipset in millions of notebooks...
One question not covered here: what about the GDI acceleration that I heard was brought back in WDDM 1.1? Is that true, or optional, or not related to 1.1 at all? Why is it only 1.1 with such major changes? =)
I am not sure whether this is a graphics issue or what, but the main window of WinSCP 4.1.9 doesn't appear in Flip 3D, although it is present on the taskbar and in simple Flip. When this window is active and Flip 3D is started another window will appear on top of the 3D stack.