DirectX 12 introduces the next version of Direct3D, the graphics API at the heart of DirectX. Direct3D is one of the most critical pieces of a game or game engine, and we’ve redesigned it to be faster and more efficient than ever before. Direct3D 12 enables richer scenes, more objects, and full utilization of modern GPU hardware. And it isn’t just for high-end gaming PCs either – Direct3D 12 works across all the Microsoft devices you care about. From phones and tablets, to laptops and desktops, and, of course, Xbox One, Direct3D 12 is the API you’ve been waiting for.
What makes Direct3D 12 better? First and foremost, it provides a lower level of hardware abstraction than ever before, allowing games to significantly improve multithread scaling and CPU utilization. In addition, games will benefit from reduced GPU overhead via features such as descriptor tables and concise pipeline state objects. And that’s not all – Direct3D 12 also introduces a set of new rendering pipeline features that will dramatically improve the efficiency of algorithms such as order-independent transparency, collision detection, and geometry culling.
Of course, an API is only as good as the tools that help you use it. DirectX 12 will contain great tools for Direct3D, available immediately when Direct3D 12 is released.
We think you’ll like this part: DirectX 12 will run on many of the cards gamers already have. More on that in our FAQ.
We (the product team) read the comments on Twitter and on game development and gamer forums, and many of you have asked if this is real or if our marketing department suddenly received a budget infusion. Everything you are reading is coming directly from the team who has brought you almost 20 years of DirectX.
It’s our job to create great APIs and we have worked closely with our hardware and software partners to prove the significant performance wins of Direct3D 12. And these aren’t just micro-benchmarks that we hacked up ourselves – these numbers are for commercially released game engines or benchmarks, running on our alpha implementation. The screenshots below are from real Direct3D 12 app code running on a real Direct3D 12 runtime running on a real Direct3D 12 driver.
If you’re a gamer, you know what 3DMark is: a great way to benchmark game performance on all your hardware and devices. This makes it an excellent choice for verifying the performance improvements that Direct3D 12 will bring to games. 3DMark on Direct3D 11 uses multi-threading extensively; however, due to a combination of runtime and driver overhead, there is still significant idle time on each core. After porting the benchmark to use Direct3D 12, we see two major improvements: a 50% improvement in CPU utilization and better distribution of work among threads.
Tested on GIGABYTE BRIX Pro (Intel Core i7-4770R + Iris Pro Graphics 5200)
Forza Motorsport 5 is an example of a game that pushes the Xbox One to the limit with its fast-paced photorealistic racing experience. Under the hood, Forza achieves this by using the efficient low-level APIs already available on Xbox One today. Traditionally this level of efficiency was only available on console – now, Direct3D 12, even in an alpha state, brings this efficiency to PC and Phone as well. By porting their Xbox One Direct3D 11.X core rendering engine to use Direct3D 12 on PC, Turn 10 was able to bring that console-level efficiency to their PC tech demo.
Direct3D 12 represents a significant departure from the Direct3D 11 programming model, allowing apps to go closer to the metal than ever before. We accomplished this by overhauling numerous areas of the API. We will provide an overview of three key areas: pipeline state representation, work submission, and resource access.
Direct3D 11 allows pipeline state manipulation through a large set of orthogonal objects. For example, input assembler state, pixel shader state, rasterizer state, and output merger state are all independently modifiable. This provides a convenient, relatively high-level representation of the graphics pipeline; however, it doesn’t map very well to modern hardware, primarily because there are often interdependencies between the various states. For example, many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time. This delays hardware state setup, which means extra overhead and a lower maximum number of draw calls per frame.
Direct3D 12 addresses this issue by unifying much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.
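To make the contrast concrete, here is a toy model in Python. It is only a sketch of the idea, not Direct3D code: `ToyDriver`, `MutableContext`, `PipelineState`, and the state names are all hypothetical, and `translate` merely stands in for whatever work a real driver does to build hardware-native state. The point is where the translation cost lands: once per draw with independently settable state, once per object with an immutable PSO.

```python
from dataclasses import dataclass


class ToyDriver:
    """Hypothetical driver that counts expensive state translations."""

    def __init__(self):
        self.translations = 0

    def translate(self, pixel_shader, blend_mode):
        # Stand-in for building hardware-native state from API state.
        self.translations += 1
        return (f"hw_ps:{pixel_shader}", f"hw_blend:{blend_mode}")


class MutableContext:
    """Direct3D 11 style: states set independently, resolved at draw time."""

    def __init__(self, driver):
        self.driver = driver
        self.pixel_shader = None
        self.blend_mode = None

    def draw(self):
        # Only here is the state known to be final, so it is resolved per draw.
        return self.driver.translate(self.pixel_shader, self.blend_mode)


@dataclass(frozen=True)
class PipelineState:
    """Direct3D 12 style: immutable object carrying pre-computed state."""

    pixel_shader: str
    blend_mode: str
    hardware_state: tuple


def create_pipeline_state(driver, pixel_shader, blend_mode):
    # Translation happens once, at creation time.
    return PipelineState(pixel_shader, blend_mode,
                         driver.translate(pixel_shader, blend_mode))


def draw_with_pso(pso):
    # Drawing only copies the pre-computed state; nothing is translated.
    return pso.hardware_state


old_driver = ToyDriver()
ctx = MutableContext(old_driver)
ctx.pixel_shader, ctx.blend_mode = "lit", "opaque"
old_results = [ctx.draw() for _ in range(1000)]

new_driver = ToyDriver()
pso = create_pipeline_state(new_driver, "lit", "opaque")
new_results = [draw_with_pso(pso) for _ in range(1000)]

print(old_driver.translations, new_driver.translations)  # prints "1000 1"
```

Both paths produce the same hardware state for every draw; only the number of translations differs. A real driver may cache some of this work, so the 1000-to-1 ratio is a property of the toy model rather than a measured result.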
In Direct3D 11, all work submission is done via the immediate context, which represents a single stream of commands that go to the GPU. To achieve multithreaded scaling, games also have deferred contexts available to them, but like Direct3D 11's pipeline state representation, deferred contexts do not map perfectly to hardware, and so relatively little work can be done in them.
Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
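The shape of this model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Direct3D 12 API: `CommandList`, `CommandQueue`, `record_chunk`, and the mesh and PSO names are hypothetical. Python threads are used here only to show the structure (independent recording, one serial submission); they do not demonstrate a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


class CommandList:
    """Self-contained recording: inherits no state from other lists."""

    def __init__(self, name):
        self.name = name
        self.commands = []

    def set_pipeline_state(self, pso):
        self.commands.append(("set_pso", pso))

    def draw(self, mesh):
        self.commands.append(("draw", mesh))


class CommandQueue:
    """The single, serial submission point."""

    def __init__(self):
        self.executed = []

    def execute(self, command_lists):
        for command_list in command_lists:
            self.executed.extend(command_list.commands)


def record_chunk(index, meshes):
    # Each worker records into its own list, so no locking is needed.
    command_list = CommandList(f"chunk{index}")
    command_list.set_pipeline_state("opaque_pso")  # state is set per list
    for mesh in meshes:
        command_list.draw(mesh)
    return command_list


chunks = [["rock", "tree"], ["car", "road"], ["sky"]]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in input order, whatever order threads finish in.
    lists = list(pool.map(record_chunk, range(len(chunks)), chunks))

queue = CommandQueue()
queue.execute(lists)  # the only serial step
```

Because every list sets its own pipeline state rather than relying on whatever a previous list left behind, the recording step needs no coordination between threads.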
In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation: bundles. Unlike command lists, which are completely self-contained and are typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to “record” one bundle that draws a single character model, then “play back” the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
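The two-character example can be mimicked with a small Python model. Everything here is hypothetical (`ToyDriver`, `Bundle`, `CommandList`, the draw and texture names), and `compile` is just a counter standing in for the driver's instruction generation. The sketch assumes the bundle inherits the texture currently set on the command list, which is the kind of state inheritance described above.

```python
class ToyDriver:
    """Hypothetical driver that counts how often draw calls are compiled."""

    def __init__(self):
        self.compiles = 0

    def compile(self, draw_calls):
        self.compiles += 1
        return tuple(f"hw:{call}" for call in draw_calls)


class Bundle:
    """Recorded once; its compiled instructions are reused on playback."""

    def __init__(self, driver, draw_calls):
        self.instructions = driver.compile(draw_calls)


class CommandList:
    def __init__(self, driver):
        self.driver = driver
        self.texture = None
        self.stream = []

    def set_texture(self, texture):
        self.texture = texture

    def draw(self, draw_calls):
        # Direct recording: the driver compiles these calls every time.
        for instruction in self.driver.compile(draw_calls):
            self.stream.append((instruction, self.texture))

    def execute_bundle(self, bundle):
        # Playback: reuse compiled instructions with the inherited texture.
        for instruction in bundle.instructions:
            self.stream.append((instruction, self.texture))


character = ["body", "head", "cape"]

# Approach 1: record the same draw calls twice.
driver_a = ToyDriver()
direct = CommandList(driver_a)
for texture in ("knight_skin", "mage_skin"):
    direct.set_texture(texture)
    direct.draw(character)

# Approach 2: record one bundle, play it back twice.
driver_b = ToyDriver()
bundle = Bundle(driver_b, character)
replay = CommandList(driver_b)
for texture in ("knight_skin", "mage_skin"):
    replay.set_texture(texture)
    replay.execute_bundle(bundle)

print(driver_a.compiles, driver_b.compiles)  # prints "2 1"
```

Both command lists end up with the same instruction stream; the bundle version simply reaches it with one compile instead of two.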
Resource binding in Direct3D 11 is highly abstracted and convenient, but leaves many modern hardware capabilities underutilized. In Direct3D 11, games create “view” objects of resources, then bind those views to several “slots” at various shader stages in the pipeline. Shaders in turn read data from those explicit bind slots which are fixed at draw time. This model means that whenever a game wants to draw using different resources, it must re-bind different views to different slots, and call draw again. This is yet another case of overhead that can be eliminated by fully utilizing modern hardware capabilities.
Direct3D 12 changes the binding model to match modern hardware and significantly improve performance. Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the hardware-native resource description (descriptor) to be written directly to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.
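As a rough illustration, a descriptor heap can be modeled as a pre-filled array and a descriptor table as an (offset, count) window into it. The Python below is a sketch under that assumption only; `DescriptorHeap`, `DescriptorTable`, `draw`, and the resource names are hypothetical and do not correspond to real API types or signatures.

```python
from dataclasses import dataclass


class DescriptorHeap:
    """Array of hardware-native descriptors, populated up-front."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.writes = 0

    def create_view(self, index, resource):
        # The (pretend) costly part: writing a descriptor into the heap.
        self.slots[index] = f"descriptor:{resource}"
        self.writes += 1


@dataclass(frozen=True)
class DescriptorTable:
    """A sub-range of a heap: just an offset and a count."""

    heap: DescriptorHeap
    offset: int
    count: int

    def descriptors(self):
        return self.heap.slots[self.offset:self.offset + self.count]


def draw(mesh, table):
    # Switching tables only changes which sub-range is read.
    return (mesh, tuple(table.descriptors()))


heap = DescriptorHeap(capacity=4)
resources = ["rock_albedo", "rock_normal", "tree_albedo", "tree_normal"]
for index, resource in enumerate(resources):
    heap.create_view(index, resource)

rock_table = DescriptorTable(heap, offset=0, count=2)
tree_table = DescriptorTable(heap, offset=2, count=2)

frame = [draw("rock", rock_table),
         draw("tree", tree_table),
         draw("rock", rock_table)]

print(heap.writes)  # prints "4": no descriptor is rewritten between draws
```

In this model, going back and forth between the rock and tree resources costs nothing beyond picking a different offset, which is the property the paragraph above attributes to descriptor tables.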
In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing unprecedented flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
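One way to picture the difference is the toy Python model below. It assumes, purely for illustration, that without dynamic indexing the final pass has to handle one material at a time, while with dynamic indexing each pixel looks up its own material from a table by the identifier stored in the g-buffer. The function names and data are hypothetical, and real engines structure this work differently (for example, with shader branching), so treat the pass counts as a cartoon of the cost rather than a measurement.

```python
def shade_per_material(gbuffer, materials):
    """Without dynamic indexing: one bind-and-shade pass per material."""
    shaded = [None] * len(gbuffer)
    passes = 0
    for material_id in sorted(set(gbuffer)):
        bound = materials[material_id]  # "bind" a single material
        passes += 1
        for pixel, pixel_material in enumerate(gbuffer):
            if pixel_material == material_id:
                shaded[pixel] = bound
    return shaded, passes


def shade_indexed(gbuffer, materials):
    """With dynamic indexing: one pass, each pixel indexes the table."""
    return [materials[material_id] for material_id in gbuffer], 1


materials = [f"material{i}" for i in range(1000)]
gbuffer = [pixel % 1000 for pixel in range(2000)]  # material id per pixel

slow_image, slow_passes = shade_per_material(gbuffer, materials)
fast_image, fast_passes = shade_indexed(gbuffer, materials)

print(slow_passes, fast_passes)  # prints "1000 1"
```

The two functions produce identical output; what changes is that the indexed version's pass count no longer grows with the number of materials in the scene.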
Subscribe to this blog
Follow us @DirectX12
Come see us at //build
AMD Press Release - DirectX 12
NVIDIA Blog - DirectX 12
Are you a professional game developer? Do you think Direct3D 12 would ignite your game’s performance? Click here to apply for the DirectX 12 early access program.
Q: Should I wait to buy a new PC or GPU?
A: No – if you buy a PC with supported graphics hardware (over 80% of gamer PCs currently being sold), you’ll be able to enjoy all the power of DirectX 12 games as soon as they are available.
Q: Does DirectX 12 include anything besides Direct3D 12?
A: Also new is a set of cutting-edge graphics tools for developers. Since this is a preview of DirectX 12 focused on Direct3D 12, other technologies may be previewed at a later date.
Q: When will I be able to get my hands on DirectX 12?
A: We are targeting Holiday 2015 games.
Q: What hardware will support Direct3D 12 / will my existing hardware support Direct3D 12?
A: We will link to our hardware partners’ websites as they announce their hardware support for Direct3D 12.