Notes on comments.
Welcome to our blog dedicated to the engineering of Microsoft Windows 7
One of the areas of any release of Windows that receives a significant amount of testing and scrutiny is the performance of graphics—desktop graphics all the way to the most extreme CAD and game graphics. The amazing breadth of hardware supported for Windows and the broad spectrum of usage scenarios contributes to a vibrant ecosystem with many different goals—from just the basics to the highest frame rates on multiple monitors possible. In engineering Windows 7 we set out to improve the “real world” performance of graphics as well as continue to improve the most extreme elements of graphics. This is work we do in Windows 7 and work our partners do as they work to improve the underlying hardware/software combination through drivers (note: Windows Vista drivers continue to work as they did in Windows Vista, but we've also been working with partners on updated drivers for Windows 7 which many of you have been testing through Windows Update downloads). This post looks at this spectrum of engineering as well as the different ways performance is measured. Ultimately we want to inform you about what we have done in engineering Windows 7, while we leave room for the many forums that will compare and contrast Windows 7 on different hardware and in different scenarios. This post is written by Ameet Chitre, a program manager on our Desktop Graphics feature team. --Steven
If you have gone online to check out or purchase a new PC, you must have noticed that “faster graphics” and “great performance” are always some of the key selling points. People have come to expect faster systems which enable them to edit photos, do email, watch high-definition videos and play the latest 3D games all with greater ease, often shuffling between these tasks seamlessly. Quite a few of these users refer to the enthusiast community blogs and various review sites which run graphics benchmarks and report results evaluating how fast the graphics of new hardware or software performs. Traditionally graphics performance has been measured and analyzed through 3D games but it also impacts what we call “desktop scenarios” - such as when you are using the Windows UI, moving or maximizing windows, or scrolling within Word or IE etc. The performance requirements for these desktop scenarios are quite different from3D games. In fact, this is the reason in Windows Vista Experience Index (WinEI) we give you rating for these two scenarios separately, highlighted in the image below:
Figure 1. WEI sample with Graphics capabilities highlighted.
Graphics performance is usually assessed through many benchmarks. These can be classified into 2 broad categories:
However, there are plenty of things that we all do on our PCs that don’t have benchmarks tracking them that are still quite critical to make fast. In these cases we use the instrumentation within Windows to obtain timing information and then analyze the performance.
This blog entry discusses various aspects of graphics performance - both gaming and desktop graphics performance. It calls out the changes we made in Windows 7 to address user feedback as well as to take advantage of modern hardware to improve graphics performance.
Many have experienced scenarios where an application, or Windows itself, stops responding momentarily. This is type of a performance issue that can be impacted significantly by the performance of graphics in the PC. We categorize these as desktop responsiveness issues. Improving responsiveness, both in real terms and by avoiding non-responsive moments, is one of the key ways that performance is improved in the system. It is also hard to measure.
Measuring desktop responsiveness is a hard problem since a number of issues which affect responsiveness aren’t easily reproducible and there is a great variety of them. They are rarely caught by either kind of benchmark as these issues are dependent on real-world combinations of factors. For Windows 7 we spent a great deal of time looking at these performance glitches using a mechanism in test versions of Windows 7 which has the ability to record key OS events and when they occurred. During real-world testing when we encounter a responsiveness problem, the tester can hit a record key and enter a small description of the issue encountered. The event history with diagnostic information called a “performance trace” is written out to a file and uploaded to a server where a team of performance analysts parse the data to figure out the cause of the responsiveness issue. This process has been successful to the extent that today most responsiveness issues can be quickly tracked down and root-caused.
Using this methodology, we analyzed thousands of desktop responsiveness traces where the tester experienced a frozen desktop anywhere from 100msec to several seconds. The type of issues ranged from an antivirus blocking disk access for all applications while updating itself on the vendor’s website to an application doing network access from a UI thread. In a significant portion of these traces, we found that a GDI application is waiting on another GDI application which is experiencing slowdowns due to excessive paging activity. This was the single-most frequently occurring cause of all desktop responsiveness issues, which without this data we probably would not have assumed. Based on these investigations, we worked to improve the architecture in these two key areas:
These are described in more detail below.
A number of performance traces we investigated in the context of desktop responsiveness pointed us to the design of a key synchronization mechanism in GDI. The performance challenge happens because the design of GDI in Windows Vista allows only a single application to hold a system-wide exclusive global lock. While this seems obvious in hindsight, when this decision was originally made the performance characteristics of different parts of the system made this optimistic implementation perfectly reasonable.
Figure 2. Existing architecture of GDI concurrency.
GDI applications running simultaneously vie for this global lock in order to render on the screen. The application that accesses the global lock prevents other applications from rendering till it releases the global lock. The situation often gets exacerbated when the application that is holding the lock needs to page in a large amount of memory from the disk since moving the memory from the disk to RAM takes a relatively long time. The above picture shows two GDI applications running simultaneously, contending for the global lock. If App X gets hold of the lock, it can render to the screen while App Y is unable to do so and waits for App X to finish.
Figure 3. Windows 7 architecture of GDI concurrency.
The solution to the problem was therefore to reduce the lock contention and improve concurrency by re-architecting the internal synchronization mechanism through which multiple applications can reliably render at the same time. Contention due to the global exclusive lock is avoided by implementing a number of fine-grained locks which are not exclusive but aid parallelism. The increased number of fine-grained locks adds a small overhead for scenarios where only a single application is rendering at a time.
Special attention was paid to GDI application compatibility as changing internal synchronization mechanism in the most widely used API stack could potentially give rise to timing issues such as deadlocks and rendering corruption.
This work also resulted in better rendering performance of concurrent GDI applications on multi-core CPUs. Multi-core Windows PCs benefit from these changes as more than one application can now be rendering at the same time.
After the GDI concurrency work was implemented in the Windows 7 builds leading to the Beta, we saw a large reduction in the number of desktop responsiveness issues reported by our testers which were caused by one application blocking another one due to GDI. To further validate the scalability of our new implementation, we wrote tests that draw 2D GDI primitives and measured the rendering throughput by launching simultaneously multiple such applications. The throughput is measured by adding together the frame rate (FPS) of each application window. Below is a sample of these results on a quad-core CPU system.
Figure 4. GDI Concurrency and Scalability.
Without the Windows 7 GDI concurrency, the rendering throughput of these applications is effectively limited to the performance of a single CPU core. Since only a single application can acquire the global exclusive lock while the others are waiting, this scenario doesn’t benefit from multiple CPU cores. This demonstrates that GDI applications in Windows 7 are now much less dependent on one another. This benefit will not need any new display drivers; it will work on any Vista (WDDM 1.0) and newer display drivers.
Another area which affects system responsiveness is memory usage. Simply put, increased system memory (RAM) usage leads to an increased paging activity which directly leads to reduced system responsiveness. Thus, for the best responsiveness, all applications, processes and OS components need to use as little system memory as possible.
In Windows Vista, the amount of memory required to run multiple windows scales linearly with the number of windows opened on the system. This results in more memory pressure when there are more windows or if the monitors have higher resolution. It gets worse if you have more than one monitor. As part of investigating various means to improve system responsiveness, we saw a great opportunity in reducing the usage of system memory by DWM. In Windows Vista, every GDI application window accounts for two memory allocations which hold identical content – one in video memory and one in system memory. DWM is responsible for composition of the desktop through the graphics hardware. Hence, it requires a copy of the same allocation in video memory, which is easily accessible by the graphics hardware. The duplicate copy present in system memory is required because GDI is being rendered utilizing the CPU completely in the operating system without any assistance or “acceleration” by the graphics hardware. As the CPU performs all the tasks for rendering GDI applications, it requires an easily accessible cacheable copy of memory.
Figure 5. Existing memory allocations.
Windows 7 saves one copy of the memory allocation per application window by getting rid of the system memory copy entirely. Thus, for a GDI application window visible on the desktop, the memory consumed is cut in half.
Figure 6. Windows 7 memory allocations.
We achieved the reduction in system memory by accelerating the common GDI operations through the graphics hardware - the WDDM drivers accelerate these to minimize the performance impact of the CPU read-back of video memory. This was necessary as performing these operations otherwise on the CPU would incur a heavy performance penalty. In order to decide which GDI operations to accelerate, it was important to understand the usage pattern of various GDI applications. We profiled the top 100 GDI applications to learn more about their calling patterns and frequency and nature of the GDI operations.
Figure 7. Calling patterns and frequency of GDI operations for 100 GDI-based applications.
Based on real-world application statistics, a tiny snapshot of which is seen above, we worked with our graphics IHV partners to provide support in their drivers to accelerate the most commonly used GDI operations. Windows 7 systems with these updated drivers, called “WDDM v1.1” will thus benefit from this memory savings work. Please note that WDDM 1.0 drivers continue to function and are fully supported on Windows 7. You might have seen Windows Update providing these 1.1 drivers during the Beta—these drivers are themselves in Beta.
Figure 8. Desktop Window manager memory consumption comparison using WDDM 1.1 v. WDDM 1.0.
The above data shows that the memory savings become more and more pronounced when you have multiple application windows visible on the desktop. Since you save a lot of system memory, the paging activity gets reduced – as a result, your system responsiveness improves for the same workload.
Certain trade-offs had to be made for the desktop responsiveness improvements which benefit a wide range of systems. For example – the elimination of the duplicate system memory copies which “speed up” certain operations introduced slightly reduced performance as the CPU now has to read data back from the video memory. An analysis of real-world application statistics showed that these operations were rare. However, certain GDI micro-benchmarks which issue these operations show some performance degradation. This is something to note if you are running existing benchmarks that stress specific GDI operations repeatedly, which isn’t necessarily a reflection of real-world performance. Our observation has been that these slow-downs do not impact the end-user functionality directly and that the memory savings directly result in Windows 7 being much responsive overall. The improvements overall are definitely noticeable on memory constrained PCs with shared memory graphics.
No article on graphics performance is complete without talking about gaming, which is still the most widely analyzed and discussed aspect of graphics performance. There are a number of popular benchmarks such as 3D Mark as well as in-game benchmarks which are really a mode in which you can run your game where it renders the game scenes and animations without any user interaction. This area has thus been well tracked by the gaming industry through various industry benchmarks, which are pretty realistic and representative of actual games. The different benchmarks and tests are widely discussed and gamers all well-versed in the subtleties of these measurements and translating them into recommendations depending on their hardware, drivers, and gaming expectations.
For Windows 7, we have worked closely with our Graphics IHV partners, helping them improve the WDDM drivers’ gaming performance with specific changes to how Windows 7 works under the hood, while maintaining the same driver model and compatibility. Our continued investments in performance tools has helped us and our IHV partners track down and analyze various gaming performance bottlenecks and fix them in subsequent driver releases. The fundamentals of the Windows Display Driver Model remain unchanged in Windows 7. Some policies around GPU scheduling and memory management were changed to enable better performance in certain scenarios.
Because these benchmarks are very sensitive to the specific hardware, firmware, drivers, and overall system and because these are so widely measured and discussed elsewhere we are going to leave these comparisons to third parties. Like many areas in Windows 7, our commitment is to engineer even better performance across many dimensions. We believe it is better for you to experience these efforts directly. In comparing Windows 7, we would encourage the comparison using Windows Vista SP1 and keep in mind the difference you might see in WDDM 1.0 v. 1.1 and that the 1.1 drivers are still under development.
As you can see, in engineering Windows 7 we have worked hard to improve the architecture for graphics for real-world performance. It benefits both ends of the hardware spectrum – by enabling low physical memory systems to run a leaner and faster Windows and at the same time enabling multi-core PCs render multiple graphics applications much more efficiently.
Yeah, I've been noticing some data corruption or some sort of drawing issue with the taskbar open as well. This occurs mostly when I'm using IE. If you were to split the taskbar into 3 horizontal pieces the middle piece and the title bar of IE will flicker. I just installed a pre 1.1 driver and the issue disappeared (which I'm sure also made me lose the gain of the removal of the duplicate memory)
Will Vista users be offerred WDDM 1.1 drivers?
I should have mentioned It's with the mobile intel 965 express chipset. I've also noticed a weird issue with the autocomplete from the address bar is being drawn behind the IE window. I've noticed this on the 965 graphics and other computers with different types of graphic cards.
Great read, interesting work!
Windows 7 still hangs quite often when you're using applications to access data on an unreliable network. But it's getting better with every release. Can I (as a regular user) report those things as well? Or is Microsoft already collecting all "This application is not responding" events?
You could make my life as developer a little bit easier if the drivers offered by Windows Update (or those on the Windows DVD) supported OpenGL 1.5 already. Or if there was a working, 1.5-compliant translation layer (like MSOGL) which let's OpenGL applications use hardware acceleration without having users to download new drivers. Vista's solution (MSOGL) never worked for my OpenGL 1.1 apps...
Will this also affect CAD application performance?
I watched the PDC talk on Direct2D / DirectWrite last night: Interesting stuff and it looks really easy to slot into existing code and take advantage of.
I haven't used it yet, and I'll need to support GDI for a good while longer (might go for two code paths to take advantage of the speed/quality of Direct2D where available), but it looks well designed. Nice work, guys!
I am wondering if the Taskforce retoric could be removed from this blog.
Children should be supervised more closely, because I as a legitimate user of this site get tired of seeing this sort of childish demanding behaviour>
Thanks for the great news. So I am thinking this is going to be included in the builds after 7100.
Very cool. Can we get this trace tool to analyze repsoniveness issues also for Windows Vista/Server 2008? This would help us a lot to troubleshoot performance issues in todays systems.
"... The performance requirements for these desktop scenarios are quite different from 3D games. In fact, this is the reason in Windows Vista Experience Index (WinEI) we give you rating for these two scenarios separately. ..."
So Ameet (6.3 & 5.5) does better at desktop and I (3.6 & 5.2) do better at 3D business (Oh, no, he's 5.5, damn)... I have always been confused by this - as a lay person. If I scored = 1 = would I have no desktop - or - no games or business? And if I scored 7.9 I would have a Windows experience that the other 999,999,999 users do not enjoy?
Don't see the point of WEI except if it printed OK - or NO, not good enough :-) Does the Windows desktop come with a rating? Hey, you need 4.7 to run Windows desktop.. I don't game, so I suppose they have a rating on them??
What about the other stuff? processor, RAM, etc.
I just don't understand what you can do about it???? Got 512 MB of video ram and card O/C'ed. What do I need?
You see what I'm getting at? I'm not sure the ordinary retail buyer understands it properly....
I've noticed some performance issues with the DWM if you don't have any heavy graphic program running. Both in Vista and Win7 on more than one pc that are powerful enough. It can be seen when using flip3d and is even more clear in expose-clones using the DWMThumb API.
If you start the animation the first second or two will have really bad frame rate but after that it is really smooth.
But if you have a graphic intensive program running, like a movie in hd-format, then the animation will be smooth from the beginning.
This is really annoying because when switching window you don't want an animation longer than 2seconds anyway.
New graphic (vista model graphic engine) is some dark. also black shadow of windows explorer has perfected it!
What're the boundaries the application holds this global GDI lock ? If the application holds this lock from within each GDI call, then the a swapping application should not affect other applications's graphics performance. Every GDI call will acquire the lock and then release it before returning. In this scenarios, holding this global lock should not have any affect on other apps performance. (other than acquiring and releasing a global lock for each and every GDI call would be very costly IMHO).
On the other hand, if the global lock is held between multiple GDI calls, then what happens if the application forgets the call to release the lock ? Then all other apps will appear to be stopping. This certainly should not be the case.
How is the global lock used in this respect ? If I'm not missing something the global lock should not be a problem even in Windows Vista.
Obviously, I'm mistaken some way. Would you please explain how ?
I have a better idea! Install XP! I'm serious, Microsoft is actually competing with themselves. Even simple file browsing is noticably sluggier in Vista, and 7, then in XP: Do we really need the extra thumbnail in the details pane and folder thumbnails?
Anyway, seems like Microsoft are finally making their user interface more multi-threaded. I remember how BeOS was very responsive, even in heavy cpu using scenarioes. It was because it was SMP programmed from the bottom. But windows XP is still instant responsive, so why all the extra work with Windows 7? Is it a workaround to all the bloat?
I love Windows 7, mainly for its Media Center. However as a New Zealander we have DVB-T transmission of freeview TV in h.264 codec. Windows 7 decodes this using software only, when will we be able to use the hardware decoders in ATI and Nvidia cards?