- The DirectX blog is up
-
The DirectX blog can be found at http://blogs.msdn.com/DirectX. Any Direct2D posts I write will go on the DirectX blog instead of this blog, but I may still post occasionally about non-Direct2D topics. Please subscribe to the DirectX blog.
- Windows And Video Memory
-
The following explanations are overviews. In the interests of brevity, they neglect some corner cases.
Definitions:
Video memory: For our purposes, memory accessible by the GPU. For details, please see the “Graphics Memory Reporting through WDDM” whitepaper (http://www.microsoft.com/whdc/device/display/graphicsmemory.mspx).
Hardware acceleration: The process of performing computations on the GPU instead of the CPU.
GDI: Graphics Device Interface – a 2D rendering API.
Surface: A chunk of memory representing a 2D array of pixels.
Present (verb): A function used to request that the contents of a backbuffer be displayed on screen. E.g. In DXGI, IDXGISwapChain::Present.
GPU acceleration: Also known as hardware acceleration. This means performing work on the Graphics Processing Unit (GPU) instead of on the Central Processing Unit (CPU). When the GPU can be used, it is typically much faster than the CPU, hence the term “acceleration”.
Overview:
Windows XP uses the XP driver model (XPDM), in which video memory is a fixed, finite resource. Windows Vista and Windows 7 use the Windows Display Driver Model (WDDM), in which graphics memory is virtualized. This means that in XPDM you are limited by the amount of physical memory on the graphics card, but in WDDM you are not. Therefore WDDM allows running either more applications or more resource-intensive applications than was possible with XPDM. You can read more about WDDM at http://msdn.microsoft.com/en-us/library/aa480220.aspx. There is no hard limit on video memory in either WDDM or XPDM, but depending on the driver, there are other limits. For example, some drivers have limits on the number of surfaces that can be created.
In Windows Vista and Windows 7 the Desktop Window Manager (DWM) is used. Instead of presenting directly to the front buffer as in Windows XP, apps present to a DWM buffer. The DWM then composes the buffers together to create the image you see on your screen.
Taking advantage of GPU acceleration:
In XP, GDI is GPU accelerated to various degrees depending on how the OS is configured and on the device driver (for details see Hooking Versus Punting: http://msdn.microsoft.com/en-us/library/ms799743.aspx). In Vista, GDI is not GPU accelerated; however, the performance difference is usually not perceptible to the user. In Windows 7, some limited GPU acceleration for GDI was added to enable some video memory optimizations. Direct3D and WPF are GPU accelerated on all three OSes. Direct2D is GPU accelerated too, but it is currently available on Windows 7 only. Microsoft has announced that it will be releasing Direct2D on Windows Vista, and it will be GPU accelerated there too.
The Desktop Window Manager uses GPU acceleration, so apps on Windows Vista and Windows 7 benefit automatically. For example, when you drag a window in XP the app receives a request to redraw the window. In Windows Vista and Windows 7, the DWM maintains a copy of the window contents in graphics memory, so the app doesn’t need to redraw the window.
If you notice an app performing worse in Windows Vista as compared to Windows XP, please don’t assume that there is nothing you can do about it. For example there used to be a performance problem with windbg.exe on Vista that was caused by an interaction with the DWM. Turning off the DWM did make the problem go away, but a change to windbg.exe fixed the problem for good. If you have performance problems, as a first step I’d recommend profiling your app to see what is hogging the CPU (especially compared to XP).
Virtual memory considerations:
Wikipedia has a great description of virtual memory, but here’s a quick overview: Processes store and retrieve information through addresses, similar to the way people use phone numbers when making phone calls. Virtual memory addresses are to physical memory addresses as speed dial numbers are to full phone numbers. Just as different people have the speed dial number “4” mapped to different full phone numbers, different processes have the virtual address “0x00A4B3C0” mapped to different physical memory addresses. Speed dial numbers are used as shortcuts, but virtual memory addresses are often longer than needed to address all of physical memory. So why are virtual memory addresses used?
1. Process isolation: Even assuming that you don’t have any malware on your computer, the computer equivalent of accidentally dialing the wrong phone number has devastating consequences. We restrict processes to using virtual memory addresses so they can’t interfere with memory belonging to other processes.
2. Contiguous memory: Processes often depend on having large sections of sequential addresses available.
3. Resource sharing: The amount of RAM installed on a typical PC is small enough that it’s possible to exhaust it by running many processes at the same time. If the processes used physical memory addresses, they would just fail when they exhausted physical memory. With virtual addresses, the OS can use both physical memory and hard disk space to hold the information to which a virtual address refers.
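The speed dial analogy can be sketched in a few lines of C++. This is a toy model only: ToyAddressSpace, Map, and Translate are hypothetical names invented for illustration, and a real OS maps whole pages through hardware page tables rather than individual addresses through a hash map.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical, greatly simplified per-process "page table".
// Real systems map whole pages (typically 4 KB), not single addresses.
class ToyAddressSpace {
public:
    // Record that this process's virtual address refers to a physical address.
    void Map(std::uint32_t virtualAddress, std::uint64_t physicalAddress) {
        table_[virtualAddress] = physicalAddress;
    }

    // Look up a virtual address. Returns false for an unmapped address,
    // which is the toy equivalent of an access violation.
    bool Translate(std::uint32_t virtualAddress, std::uint64_t* physicalAddress) const {
        assert(physicalAddress != nullptr);
        auto it = table_.find(virtualAddress);
        if (it == table_.end()) {
            return false;
        }
        *physicalAddress = it->second;
        return true;
    }

private:
    std::unordered_map<std::uint32_t, std::uint64_t> table_;
};
```

Two ToyAddressSpace objects can each map 0x00A4B3C0 to a different physical address, and neither can name memory that was never mapped into it, which is the isolation property from point 1.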
Virtual memory problems start when an app is compiled as 32-bit and it needs more than 2GB of virtual address space. You may wonder, why 2GB? After all, 2 raised to the 32nd power is 4GB. The answer is that the other 2GB of virtual address space is reserved for the kernel. There are ways to give your app slightly more user mode virtual address space at the cost of less kernel address space, but the best solution is to use a 64-bit OS and compile your app as 64-bit. See 64-bit programming for Game Developers (http://msdn.microsoft.com/en-us/library/bb147385.aspx) for more information (the material was written for game developers but it’s applicable to any app).
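The arithmetic behind the 2GB figure is short enough to write down. The helper names below are made up for illustration, and the even split is the default described above, not the only possible configuration.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerGB = 1024ull * 1024ull * 1024ull;

// Total bytes addressable with a pointer of the given width in bits.
// Only valid for widths below 64 because of the shift.
constexpr std::uint64_t AddressableBytes(unsigned pointerBits) {
    return 1ull << pointerBits;
}

// User-mode share under the default 32-bit split:
// half for the process, half reserved for the kernel.
constexpr std::uint64_t DefaultUserModeBytes32() {
    return AddressableBytes(32) / 2;
}
```

AddressableBytes(32) is 4GB and DefaultUserModeBytes32() is 2GB, so a single large allocation can take a noticeable fraction of what a 32-bit process has to work with.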
If you really, really can’t use a 64-bit OS, then you can try to look at ways to reduce virtual memory usage - and I say “try” because there’s no easy way to do that. So what does this have to do with video memory?
1. Large mappable surfaces consume large amounts of virtual address space. In order to use video memory, the CPU has to populate it. One of the mechanisms for populating a surface is to “map”* it to a range of virtual addresses. When you create a mappable surface, Direct3D allocates a range of virtual addresses for this purpose. It does not do it at the time you map the surface because there might not be a large enough contiguous chunk of virtual memory available.
2. Even non-mappable surfaces may consume virtual address space under XPDM. It depends on the driver. Non-mappable surfaces consumed virtual address space in Windows Vista RTM, but this was changed in Vista SP1 (and Windows 7) because it was causing problems for games.
3. WDDM drivers have a user mode component, so I’d expect that apps would use more user mode address space and less kernel address space on Windows Vista and Windows 7 as compared to Windows XP.
4. WPF is GPU accelerated, but it keeps around a system memory copy of resources, so virtual address space is used.
5. Direct2D does not keep a system memory copy of resources and uses non-mappable Direct3D surfaces internally, so virtual address space is conserved. (The tradeoff is that Direct2D apps must handle device removed errors.)
6. As mentioned earlier, on Windows XP GDI calls may or may not be handled by the graphics card driver. For device-managed surfaces (see http://msdn.microsoft.com/en-us/library/ms799615.aspx for information about GDI surface types) typically no virtual memory is used. On Windows Vista, GDI calls are never handled by the graphics card, so GDI objects do consume virtual memory. On Windows 7 there is some GDI acceleration, so some apps will consume less virtual memory as compared to Windows Vista.
Also good to know:
1. There are tools that let you view virtual address space - for example, Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx). If you can figure out what is eating up your address space, you might be able to take action.
2. Remember that virtual memory and physical memory are two separate things. You can use virtual memory address space without using physical memory (e.g. by calling VirtualAlloc with the MEM_RESERVE flag) and you can use physical memory without using virtual address space (e.g. by allocating non-lockable surfaces in Direct3D).
3. This white paper might be helpful: Virtual Address Space Usage in Windows Game Development (http://www.microsoft.com/whdc/device/display/WDDM_VA.mspx)
I hope this information helps you to not exceed your virtual memory limits, but if you still have problems you’ll have to switch to 64-bit Windows and compile your app as 64-bit.
* Map is the D3D10 API. The D3D9 API was LockRect.
- Optimizing Animations Using Direct2D
-

In my previous post I demonstrated rendering text in the style of the Star Wars opening credits. The demo works and the animation is smooth (on my machine at least), but the CPU consumption is much higher than necessary. Even though Direct2D's text rendering is hardware accelerated, it still requires a lot of chatter between the CPU and the GPU. What's really wasteful in the sample is that we are rendering the same content to the same surface once per frame. The optimization is as simple as adding a boolean that keeps track of whether or not the intermediate surface has been populated. I leave this as an exercise to the reader.
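One way the exercise could go is sketched below. CachedTextRenderer and DrawTextToSurface are hypothetical stand-ins, not names from the sample; the placeholder only counts calls so the effect of the flag is visible without any Direct2D dependencies.

```cpp
#include <cassert>

// Hypothetical stand-in for the sample's renderer. DrawTextToSurface() is a
// placeholder for the expensive Direct2D text drawing; here it only counts calls.
class CachedTextRenderer {
public:
    // Called once per frame.
    void RenderFrame() {
        if (!m_surfacePopulated) {
            DrawTextToSurface();
            m_surfacePopulated = true;
        }
        // Per-frame 3D drawing that samples the intermediate surface would go here.
    }

    // Call whenever the surface is recreated (for example on resize or device loss),
    // because the cached content is gone at that point.
    void InvalidateSurface() { m_surfacePopulated = false; }

    int TextDrawCount() const { return m_textDrawCount; }

private:
    void DrawTextToSurface() { ++m_textDrawCount; }

    bool m_surfacePopulated = false;
    int m_textDrawCount = 0;
};
```

The part that is easy to forget is the invalidation: the flag has to be reset anywhere the intermediate surface is thrown away and rebuilt.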
Things get more complicated if you need the intermediate surface content to change; however, optimizations are still possible. Suppose you want the text to fade into black as it gets far away. One way you could do this is by modifying the pixel shader, but I'm going to show how you can do this using the Direct2D API.
The first step is to change the brush used for text from solid color to linear gradient:
D2D1_GRADIENT_STOP gradientStops[] =
{
    { 0.0f, D2D1::ColorF(D2D1::ColorF::Yellow) },
    { 1.0f, D2D1::ColorF(D2D1::ColorF::Black) }
};
ID2D1GradientStopCollectionPtr spGradientStopCollection;
IFR(m_spRT->CreateGradientStopCollection(
    gradientStops,
    ARRAYSIZE(gradientStops),
    D2D1_GAMMA_2_2,
    D2D1_EXTEND_MODE_CLAMP,
    &spGradientStopCollection));
IFR(m_spRT->CreateLinearGradientBrush(
    D2D1::LinearGradientBrushProperties(D2D1::Point2F(0, 0), D2D1::Point2F(0, -2048)),
    D2D1::BrushProperties(),
    spGradientStopCollection,
    &m_spTextBrush
    ));
This creates a linear gradient brush with two gradient stops, one yellow and one black. Since I want the text to be completely yellow when you first see it, I've defined y=0 to be yellow and gradually fade to black as y gets more and more negative. D2D1_EXTEND_MODE_CLAMP means that I want the color beyond the gradient stop end points to be equal to the color at the gradient stop end points. For a funky effect you could change this to be D2D1_EXTEND_MODE_MIRROR.
D2D1_GAMMA_2_2 refers to the gamma at which colors are interpolated. If you haven't heard of gamma before I recommend you check out Charles Poynton's Gamma FAQ. For the purpose of this article it is sufficient to know that using D2D1_GAMMA_2_2 results in a non-linear interpolation of colors. The other option, D2D1_GAMMA_1_0, results in linear interpolation of colors. Since light intensity falls off in proportion to the square of the distance from the light source, we want the interpolation to be non-linear. Strictly speaking D2D1_GAMMA_2_2 is not the correct non-linear curve, but it is much better than D2D1_GAMMA_1_0, and it is good enough for our example here. For more exact results you can approximate quadratic interpolation using more gradient stops.
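To see why the two settings differ, here is a small CPU-side sketch of interpolating one color channel at each gamma. It assumes a pure 2.2 power curve (the real sRGB curve is piecewise and only roughly a 2.2 power law), and all function names are invented for illustration; this is not how Direct2D implements gradients internally.

```cpp
#include <cassert>
#include <cmath>

// Assumption: a pure 2.2 power curve stands in for the display transfer function.
inline double EncodedToLinear(double encoded) { return std::pow(encoded, 2.2); }
inline double LinearToEncoded(double linear) { return std::pow(linear, 1.0 / 2.2); }

// Interpolate two gamma-encoded channel values directly in 2.2 space.
// Returns the gamma-encoded result.
inline double LerpInGamma22(double from, double to, double t) {
    return from + (to - from) * t;
}

// Convert to linear light, interpolate there, then re-encode (1.0 gamma).
// Returns the gamma-encoded result.
inline double LerpInGamma10(double from, double to, double t) {
    double a = EncodedToLinear(from);
    double b = EncodedToLinear(to);
    return LinearToEncoded(a + (b - a) * t);
}
```

Halfway between full intensity (1.0) and black (0.0), interpolating in 2.2 space gives an encoded value of 0.5, which under this model is only about 22% of the original light. Interpolating at 1.0 gamma gives exactly 50% of the light. The 2.2 curve therefore darkens faster at first, which is the non-linear falloff the fade is after.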
Perf Tip: When rendering using hardware acceleration, each call to CreateGradientStopCollection will result in a new D3D texture being created. Since texture creation/destruction is expensive, you should try to avoid calling CreateGradientStopCollection often. For example, my sample does not call CreateGradientStopCollection every frame, but only when the render target is resized.
Next we adjust the transform on the linear gradient brush by calling SetTransform each frame:
m_spTextBrush->SetTransform(D2D1::Matrix3x2F::Translation(0, (4096 / 16) * t));
Think of the brush as a large piece of paper and the different rendering primitives (text, geometry, rectangle, etc.) as being stencils. When we adjust the brush transform, we are sliding the large piece of paper underneath the stencil, but we are not changing the position of the stencil itself. So if we create a linear gradient brush that gradually changes from yellow to black, we can make the text fade from yellow to black by changing the brush transform. If you recall from the previous post, t is the variable we adjust to slide the surface away towards infinity in the z direction. Here we are adjusting the transform by t multiplied by a factor of (4096 / 16), which is the ratio of the surface height to the length of its projection in 3D space. The net result is the illusion that the brush maintains a constant position as we slide the surface away.
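The paper-and-stencil picture can be checked with a little arithmetic. The sketch below computes where a point on the surface lands along the gradient for a given t, using the gradient end points and the (4096 / 16) factor from the code above. GradientPosition is a hypothetical helper written for this explanation, not a Direct2D call.

```cpp
#include <algorithm>
#include <cassert>

// Values taken from the sample: the gradient runs from y = 0 (yellow)
// to y = -2048 (black), and the brush slides (4096 / 16) units per unit of t.
const float kGradientStartY = 0.0f;
const float kGradientEndY = -2048.0f;
const float kBrushUnitsPerT = 4096.0f / 16.0f;

// Returns 0 where the brush paints pure yellow and 1 where it paints pure black,
// for a point at height y on the surface at animation time t.
inline float GradientPosition(float y, float t) {
    // Undo the brush translation to find the point in brush space.
    float brushSpaceY = y - kBrushUnitsPerT * t;
    float u = (brushSpaceY - kGradientStartY) / (kGradientEndY - kGradientStartY);
    // Clamp extend mode: hold the end colors beyond the gradient stops.
    return std::min(1.0f, std::max(0.0f, u));
}
```

A glyph at y = 0 starts fully yellow at t = 0, is halfway along the gradient at t = 4, and is fully black from t = 8 onward. The glyph (the stencil) never moved; only the paper underneath it did.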
So now we've made the text fade but we had to disable our optimization in order to do so. We have to change the contents of the Direct2D render target once per frame so we can't just avoid drawing to it once it's populated. So what can we do to gain back performance? Remember that we are drawing the same text over and over again - the only thing that changes is the brush. We'd like to cache the results of text rendering and fill it with a different brush each frame. You can do this in Direct2D using an alpha-only compatible render target and the FillOpacityMask API:
D2D1_PIXEL_FORMAT alphaOnlyFormat = D2D1::PixelFormat(DXGI_FORMAT_A8_UNORM, D2D1_ALPHA_MODE_PREMULTIPLIED);
IFR(m_spRT->CreateCompatibleRenderTarget(
    NULL,
    NULL,
    &alphaOnlyFormat,
    D2D1_COMPATIBLE_RENDER_TARGET_OPTIONS_NONE,
    &m_spOpacityRT));
...
m_spRT->SetAntialiasMode(D2D1_ANTIALIAS_MODE_ALIASED);
ID2D1BitmapPtr spBitmap;
m_spOpacityRT->GetBitmap(&spBitmap);
m_spRT->FillOpacityMask(
    spBitmap,
    m_spTextBrush,
    D2D1::RectF(0, 0, rtSize.width, rtSize.height),
    D2D1::RectF(0, 0, rtSize.width, rtSize.height),
    D2D1_GAMMA_1_0);
This creates a render target that is compatible with m_spRT in the sense that all resources (bitmaps, brushes, etc.) that can be used with m_spRT can also be used with our new render target. We are only using this new render target for alpha content, so we create it using DXGI_FORMAT_A8_UNORM.
Perf Tip: Use DXGI_FORMAT_A8_UNORM when you are only using the alpha channel. Non-alpha-only formats will work, but they use up to 4 times the memory.
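As a rough check on the "4 times" figure, the sketch below compares a 1-byte-per-pixel alpha-only surface with a 4-byte-per-pixel color surface. SurfaceBytes is a made-up helper, and it ignores any row padding or alignment the driver may add, so treat the numbers as lower bounds.

```cpp
#include <cassert>
#include <cstdint>

// Approximate size of a surface in bytes, ignoring driver padding and alignment.
constexpr std::uint64_t SurfaceBytes(std::uint32_t width,
                                     std::uint32_t height,
                                     std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}
```

For a hypothetical 1024 x 4096 surface, SurfaceBytes(1024, 4096, 1) is 4 MB for the alpha-only format, while SurfaceBytes(1024, 4096, 4) is 16 MB for a 32-bit color format.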
Instead of drawing text directly into m_spRT as we did previously, this time we'll draw the text into the compatible render target. Then we apply our linear gradient brush using the contents of the compatible render target as an opacity mask. Since we are always drawing the same text into the compatible render target we only need to do it once (just like when we were using an ID2D1SolidColorBrush). FillOpacityMask does not support antialiased rendering, so we have to set the render target to use aliased rendering.
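Conceptually, filling through an opacity mask multiplies whatever the brush would paint by the cached coverage at each pixel. The following is a CPU analogy of that idea only, assuming a cleared destination and premultiplied alpha. Color and FillThroughMask are hypothetical names, and none of this reflects how Direct2D does the work on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Color { float r, g, b, a; };

// CPU analogy for filling through an alpha-only mask: each output pixel is the
// brush color scaled by the cached 8-bit coverage (premultiplied alpha).
// brushColorAt is any callable that takes a pixel index and returns a Color.
template <typename BrushFn>
std::vector<Color> FillThroughMask(const std::vector<std::uint8_t>& mask,
                                   BrushFn brushColorAt) {
    std::vector<Color> out(mask.size());
    for (std::size_t i = 0; i < mask.size(); ++i) {
        float coverage = mask[i] / 255.0f;
        Color c = brushColorAt(i);
        out[i] = Color{ c.r * coverage, c.g * coverage, c.b * coverage, c.a * coverage };
    }
    return out;
}
```

The mask is computed once (the expensive text rendering), and only the cheap multiply runs each frame with whatever the brush currently produces. That split is what lets the gradient change every frame without redrawing the text.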
Sanity Tip: If you are getting D2DERR_WRONG_STATE after calling FillOpacityMask, check that the render target is in aliased rendering mode.
You'll see that D2D1_GAMMA has appeared again. This time we are using D2D1_GAMMA_1_0. Direct2D resources all use 2.2 gamma internally. For maximum correctness, colors should be converted to 1.0 gamma before blending, but blending in 2.2 gamma is usually acceptable. Since color conversion is expensive, Direct2D normally skips the color conversion step, except when rendering text (text quality is both very important and very sensitive to gamma correctness). Since opacity masks may be used for either text or non-text content (or both!), Direct2D lets you choose whether or not to do the color conversion. So to make a long story short, D2D1_GAMMA_1_0 is slower but higher quality. D2D1_GAMMA_2_2 is faster but lower quality. Since I'm using the opacity mask for text, I chose to use D2D1_GAMMA_1_0.
I've attached the Visual Studio project. See my previous post for tips if you have problems compiling.
- Rendering Text In The Style Of The Star Wars Opening Credits Using Direct2D with Direct3D interop
-

This is a fun little sample I created. I based it off of a Hands On Lab that was presented at PDC. If you remember two spinning cubes, one rotating around the other, that's the one. My version is very similar. The main differences are:
- removed code related to cubes, geometry, or bitmaps
- modified the matrices and vectors (The "eye" in the scene is slightly above the xz-plane and it is looking towards positive-z and slightly down. The text is in a surface parallel to the xz-plane but slightly below it, sliding off towards infinity in the z direction.)
- changed the color of the text to yellow
- changed the text being displayed
- changed the filtering mode from linear to anisotropic (it looked choppy with linear interpolation)
The zip file contains the source code for this project. Compilation tips:
- You will need the PDC version of the Windows SDK and the DirectX August 2008 SDK.
- Your include path must have the header directories from each SDK.
- Your executable path must include the DirectX utility path in order to compile the HLSL code.
- Find the Star Wars theme music on Youtube and play it in the background when you run the app for the first time. :)
I hope this post has piqued your interest in Direct2D. In future posts I will further refine the sample, so stay tuned.
- Blogging about Direct2D
-
Hello, my name is Tom Mulcahy. I'm a developer on the Direct2D team and this is my first blog post.
To begin with, I will explain the name of this blog. Zemblanity is the opposite of serendipity (as coined by William Boyd in Armadillo). It means the unexpected discovery of bad things. I think the process of software development is much more about zemblanity than it is about serendipity. Every time you find a bug, that's zemblanity. Every time you discover a limitation in an API, that's zemblanity. I don't think this has to be taken as pessimism. Even though you'd rather never discover any bugs in your code, it is better for bugs to be discovered sooner as opposed to later.
I hope that developers find Direct2D a pleasant API to work with. I hope my posts are full of enough serendipity to be interesting and enough zemblanity to be useful.