Direct2D and GDI are both immediate mode 2D rendering APIs and both offer some degree of hardware acceleration. This tends to cause some confusion – Why are these APIs different? How are they similar? Does Direct2D make GDI faster? How is Direct2D hardware acceleration different to GDI hardware acceleration? The goal of this post is to provide some insights into the differences between Direct2D and GDI.
Figure 1 - Direct2D compared to GDI
Direct2D and GDI are different in the details of what each API renders and how it goes about driving the display hardware. They are also somewhat different in their underlying design principles.
GDI renders opaque, aliased geometry (paths, lines, etc). It renders aliased and ClearType text, and can support transparency blending through the AlphaBlend API. However, its handling of transparency is inconsistent. Most APIs simply ignore the alpha channel, some use it. Very few APIs guarantee what the alpha channel will contain after an operation. Perhaps more importantly, GDI’s rendering doesn’t map easily to what a modern GPU renders most efficiently on the 3D portion of its rendering engine. For example, ROPs can’t be implemented by the blend stage of a GPU. However, alpha can. (ROPs can be implemented by copying between render targets and using look up tables or bitwise operations in the pixel shader. However, this is still slower than using the blender stage of the hardware). As another example, Direct2D’s aliased lines are designed to be implemented simply as two triangles rendered on the GPU. GDI guarantees that Bresenham’s line drawing algorithm is used.
Direct2D renders opaque, transparent, aliased and anti-aliased primitives. It has strict guarantees on how it both accepts and renders transparent content. This makes designing a modern UI with Direct2D easier and, it also allows Direct2D to render all of its primitives using hardware acceleration. Direct2D is not a pure superset of GDI – primitives that would have been unreasonably slow to implement on a GPU aren’t present in Direct2D. This should at least answer one question - Direct2D doesn’t help improve the performance of GDI, since GDI doesn’t use Direct2D. Neither does Direct2D use GDI.
How Direct2D and GDI differ in terms of architectural layering is shown in Figure 1. Direct2D uses Direct3D to obtain hardware acceleration. Direct3D in turn uses a user-mode display driver which packages up command streams. Direct3D sends these command streams and resources down into kernel mode for the hardware to process.
One large advantage of this architecture is the ease with which an application can use both Direct2D and Direct3D together.
Since Windows NT 4, GDI has run in kernel mode. The application calls GDI which then calls its kernel mode counterpart. There, GDI passes the primitives to its own driver model. This driver then sends the results to the global kernel mode display driver. Since around Windows 2000, GDI and the GDI drivers have run in an independent space in the kernel called “session space.” A session address space is created for each logon session and each instance of GDI runs independently in this distinct kernel mode address space.
The most important difference between Direct2D and GDI hardware acceleration is that Direct2D is layered on top Direct3D, while GDI has its own independent notion of a driver model. This driver model (the GDI Device Driver Interface or DDI) corresponds to the GDI primitives, while the Direct3D driver model corresponds to what the 3D rendering hardware in a GPU renders. When the GDI DDI was first defined in the early 90’s most display acceleration hardware targeted these GDI primitives. Over time, however, more and more emphasis was placed on 3D game acceleration and less on application acceleration. As a consequence more of the GDI DDI wasn’t implemented by the display driver and in the end, BitBlt was accelerated and most other operations were not.
This set the stage for a sequence of changes to how GDI renders to the display. The sequence is shown in Figure 2. There were also a number of additional factors that caused changes to the GDI driver model.
Increasing complexity and size of display drivers
Over time display drivers became more complex due largely to the increasing complexity of the 3D portion of the driver. Higher complexity code tends to exhibit more defects; hence it was desirable to move the driver to user-mode where a driver bug wouldn’t cause a reboot. As can be seen in the figure, the display driver has been divided into a complex user mode component and a simpler kernel mode component.
Figure 2 - Evolution of GDI display rendering
Difficulty in synchronizing session and global kernel address spaces
In Windows XP a display driver exists in two different address spaces, session space and kernel space. Some parts of the driver need to respond to plug ‘n play and power management events. This state needs to be synchronized with the driver state in the session address space. This is a difficult task and display drivers tended to exhibit defects when attempting to deal with these distinct address spaces.
The composited desktop
The composited desktop required GDI to be able to render to a surface which was then rendered by Direct3D to display. This couldn’t be done easily in the XP driver model since GDI and Direct3D were parallel driver stacks.
As a result, in Windows Vista, the GDI DDI display driver was changed to be only implemented by a Microsoft supplied driver, the Canonical Display Driver (CDD). GDI rendered to a system memory bitmap. Dirty regions were used to update the video memory texture which the window manager uses to composite the desktop.
The chief disadvantage of the windows vista driver model was that every GDI window must be backed by both a video memory surface and a system memory surface. This results in system memory being used for each open GDI window.
For this reason GDI was changed again in Windows 7 to ensure that it does not require a system memory surface per window. Since GDI rendering requires it to read, modify and write the system memory surface, and since the CPU cannot directly access video memory, GDI was modified to render to and from a piece of aperture memory. The aperture memory can be updated from the video memory surface holding the window contents. GDI can render back to the aperture memory, and the result can then be sent back to the window surface. Since the aperture memory segment is addressable by the GPU, the GPU can accelerate these updates to the video memory surface. For example: text rendering, BitBlt with common ROPs, AlphaBlend, TransparentBlt, StretchBlt are all accelerated in these cases. In addition, some operations can bypass the aperture memory segment entirely, such as window BitBlt and ColorFill.
Direct2D and GDI are both 2D Immediate-mode rendering APIs and both can be described as hardware accelerated. However, there are a number of differences that remain in both APIs.
Location of resources
GDI maintains its resources, in particular bitmaps, in system memory by default. Direct2D maintains its resources in video memory on the display adapter. As a result, when GDI needs to update video memory, this must always be done over the bus, unless the resource happens to already be copied into the aperture memory segment, or if the operation can be expressed directly.In contract, Direct2D can simply translate its primitives to Direct3D primitives because the resources are already in video memory.
Method of rendering
In order to maintain compatibility GDI performs a large part of its rendering to aperture memory using the CPU. In contrast, Direct2D is a translator that translates its APIs calls into Direct3D primitives. The result is then rendered on the GPU. Some of GDI’s rendering is performed on the GPU when the aperture memory is copied to the video memory surface representing the GDI window, or when it is otherwise possible.
Scalability
Direct2D’s rendering calls are all independent command streams to the GPU. Each Direct2D factory represents a different Direct3D device. In contract, GDI uses one command stream for all of the applications on the system. GDI’s method can result in some amortization of GPU and CPU rendering context overhead. Direct2D’s approach has little unnecessary serialization between independent command streams.
Location
Of course, Direct2D is entirely in user mode, including the Direct3D run time and the user mode Direct3D driver. GDI, in contrast, has most of its functionality in session space in kernel mode, with its API surface in user mode.
Availability of Hardware Acceleration
GDI is hardware accelerated on Windows XP, and on Windows 7 when the DWM is running and when a WDDM 1.1 driver is in use. Direct2D is hardware accelerated on any almost any WDDM driver and regardless of whether DWM is in use. There are announced plans to port Direct2D to Windows Vista. Here it will also be hardware accelerated on almost any WDDM driver. On Vista, GDI will always render on the CPU.
Presentation Model
When Windows was first designed, there was insufficient memory to allow every window to be stored in its own bitmap. As a result, GDI always rendered logically directly to the screen, with various clipping regions applied to ensure that it did not render outside of its window. In contract, Direct2D follows a model where the application renders to a back-buffer and the result is atomically “flipped” when the application is done drawing. This allows Direct2D to handle animation scenarios much more fluidly that GDI can.
We hope this post gives you some insights on how GDI and Direct2D work. If you have existing GDI code, that will continue to work well under Windows 7. But if you are writing new graphics rendering code, you should consider using Direct2D to leverage modern GPUs better.
For more information on interoperability between Direct2D and GDI see the following:
· A previous blog post on Direct2D interoperation
· The SDK article on Direct2D and GDI interoperation