Shawn Hargreaves Blog
We made several changes to the rendertarget API in Game Studio 4.0, all with the goal of increasing usability and reducing errors.
The most common cause of confusion is probably the RenderTargetUsage.DiscardContents behavior, but this is one thing we did not change. PreserveContents mode is just too slow on Xbox, and even slower on phone hardware, which typically uses some variant of tiled or binned rendering: it therefore shares the Xbox preference for discard behavior, but has even less memory bandwidth to spend on the extra buffer copies if you do request preserve mode.
Making our API simple is well and good, but not if that is going to cost enormous amounts of performance! So discard mode rendertarget semantics are here to stay. Learn em, love em, live with em :-)
Here are the things we did change:
I often see people attempt something like:
RenderTarget2D rt = new RenderTarget2D(...);
List<Texture2D> textures = new List<Texture2D>();

// Prerender animation frames
for (int i = 0; i < 100; i++)
{
    DrawAnimationFrameInto(rt, i);   // hypothetical helper: draws frame i into rt
    textures.Add(rt.GetTexture());   // bug: every entry aliases the same surface
}
This doesn’t work, because GetTexture returns an alias for the same surface memory as the rendertarget itself, rather than a separate copy of the data, so each drawing operation replaces the contents of all previously created textures. But these semantics are not at all obvious from the API! GetTexture returns a reference to shared data, but the API makes it look like this could return a copy.
This is the classic has-a versus is-a distinction. Rendertargets are a special kind of texture, but our API made it look like they just had associated textures, or perhaps could be converted into textures.
We fixed this by removing the GetTexture method, and instead having RenderTarget2D inherit directly from Texture2D (and RenderTargetCube from TextureCube). It is harder to get these semantics wrong with the 4.0 API:
List<Texture2D> textures = new List<Texture2D>();
for (int i = 0; i < 100; i++)
{
    RenderTarget2D rt = new RenderTarget2D(...);
    DrawAnimationFrameInto(rt, i);   // same hypothetical helper as above
    textures.Add(rt);                // a RenderTarget2D is-a Texture2D
}
How do you un-set a rendertarget? In previous versions of Game Studio we would often write:
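GraphicsDevice.SetRenderTarget(0, null);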
That mostly works, but after using multiple rendertargets we must use this more complex version:
for (int i = 0; i < HoweverManyRenderTargetsIJustUsed; i++)
    GraphicsDevice.SetRenderTarget(i, null);
Ugly, not to mention error prone if the un-set code does not loop enough times.
In Game Studio 4.0, we made SetRenderTarget an atomic method, so it always sets all the possible rendertargets at the same time. This call will always un-set all rendertargets, no matter how many were previously bound:
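GraphicsDevice.SetRenderTarget(null);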
To set a single rendertarget, you no longer need to specify an index:
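GraphicsDevice.SetRenderTarget(rt);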
If multiple rendertargets were previously bound, this will change the first one to the specified value, then un-set the others.
To set multiple rendertargets (which is a HiDef feature, so not supported in the CTP), specify them all at the same time:
GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);
That is a shortcut for this more flexible but verbose equivalent:
RenderTargetBinding[] bindings =
{
    new RenderTargetBinding(diffuseRt),
    new RenderTargetBinding(normalRt),
    new RenderTargetBinding(depthRt),
};

GraphicsDevice.SetRenderTargets(bindings);
Making the set call atomic has two main benefits:
- It is impossible to forget to un-set one of the previously bound rendertargets, because every set call replaces all of them.
- The device no longer has to re-validate many combinations of rendering states every time an individual rendertarget changes, which makes setting rendertargets slightly faster.
Our bloom sample contains a subtle bug in this line:
renderTarget1 = new RenderTarget2D(GraphicsDevice, width, height, 1, format);
The problem is that when we later draw to this rendertarget, we do not explicitly un-set the depth buffer. Even though we are not using depth while rendering the bloom postprocess, the default depth buffer is still bound to the device, so must be compatible with the rendertarget we are using.
If you change the bloom sample by turning on multisampling, the default depth buffer will be multisampled, but the bloom rendertarget will not, so the two are no longer compatible and rendering will fail.
We could fix this by changing the bloom rendertarget to use the same multisample format as the backbuffer, or we could explicitly un-set the depth buffer before drawing bloom:
DepthStencilBuffer previousDepth = GraphicsDevice.DepthStencilBuffer;
GraphicsDevice.DepthStencilBuffer = null;

// ... draw the bloom postprocess here ...

GraphicsDevice.DepthStencilBuffer = previousDepth;
This is ugly and far from obvious. We forgot to put this code in our sample, and I see other people making the same mistake all the time!
The more we thought about this, the more we realized that a depth buffer is really only meaningful in combination with a specific rendertarget: it must match that rendertarget in size and multisample format, and it is only ever useful while that rendertarget is bound.
We decided the DepthStencilBuffer class was so useless, we should get rid of it entirely! Instead, the depth format is now specified as part of each rendertarget. If I call:
new RenderTarget2D(device, width, height);
I get a rendertarget with no associated depth buffer. If I want to use a depth buffer while drawing into my rendertarget, I use this constructor overload:
new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);
Note: I could specify DepthFormat.None to use the full overload but get no depth buffer.
Note: when using MRT, the depth format is controlled by the first rendertarget.
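For instance, here is a sketch (reusing the hypothetical MRT targets from the SetRenderTargets example above): since the depth format comes from the first rendertarget, the others can simply specify DepthFormat.None:

RenderTarget2D diffuseRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);
RenderTarget2D normalRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.None);
RenderTarget2D depthRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Single, DepthFormat.None);

// Depth/stencil while drawing comes from diffuseRt, the first bound target
GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);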
With this design, many previously common errors become impossible: you can no longer forget to un-set a depth buffer before drawing into an incompatible rendertarget, bind one that is too small, or mismatch depth and rendertarget multisample formats.
Several of you expressed concern that this design could lead to wasted memory, as you can no longer share a single depth buffer between many rendertargets.
Not at all! The key shift here is from an imperative API, where you explicitly create depth buffer objects, manage their lifespan, and tell us which one to use at what times, to a declarative API, where you tell us what depth format you want to use, and we figure out how best to make that happen.
The two important pieces of information you need to provide are what depth format you need, and your RenderTargetUsage setting.
Armed with this data, we can choose the appropriate implementation strategy for each situation: for instance, where the platform rules allow it, we can lazily create a single depth buffer behind the scenes and share it between many rendertargets that requested compatible formats.
Honesty compels me to admit that we haven’t actually implemented this sharing optimization yet. It’s currently on the schedule for 4.0 RTM, but things can always change, so please don’t beat me up too hard if we for some reason fail to get that part done in time :-)
In an old article, Sean neatly described all the different depth/stencil tricks necessary to get light volumes to work in the context of deferred rendering:
It'd be really awesome if someone could come up with a simple description on how to get this to work with the new rendertarget framework of XNA 4.0 (assuming it's still feasible).
Shawn, if sharing a depth buffer is not possible across rendertarget switches, maybe it would help to overload the operation I suggested to:
I'm also getting my hands into Deferred Lighting & Light Pre-pass methods, so I'd love not to lose this info.
> You speak of performance improvements because it no longer has to validate many rendering states etc... what kind of performance boost did you see on the 360 with this? Significant?
Not particularly significant (the validation logic was already pretty well optimized so not usually a big overhead) but it's still slightly faster to do nothing at all, even compared to something that was already pretty quick.
> In current XNA, the requirement is that the depth buffer has to be equal or bigger (besides the MSAA type matching), so a depth buffer for a fullscreen rendertarget will work for half and quarter size rendertargets. Do we get the same behaviour?
This is actually a great example of where a declarative API can beat a more traditional imperative design.
The detailed rules about what depth buffers are compatible with what surfaces are somewhat complex, and not at all consistent. Some platforms have this >= size behavior, while others require an exact size match. Some have bit depth or format restrictions where depth has to match color, while on other hardware these things are totally orthogonal.
The nice thing about a declarative API is it gives the framework room to do the right thing for each platform, applying the appropriate rules to share as much as possible, without requiring you to understand or care exactly how these rules vary across platforms.
> For depth buffers, what about situation such as:
> 1) Depth pre pass + normals to colour channel
> 2) other stuff... (eg AO)
> 3) render solid objects using depth buffer from before and a new render target. (also reads things like AO)
> 4) use depth buffer again for another pass(eg volume fog)
The biggest restriction of the new API is that you cannot share a single depth buffer across multiple different rendertargets. However, it is usually possible to achieve this kind of rendering architecture (on Windows, anyway) just by arranging things so that any time you want to reuse the depth buffer, you are drawing to the same rendertarget.
For instance, the classic deferred shading optimization of using depth/stencil to cull light volumes is totally doable: you just need to arrange your buffer operations so that these light accumulation passes are done into the same rendertarget that was bound on index #0 during the initial scene rendering.
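In sketch form (hypothetical names throughout; DrawScene and DrawLightVolumes stand in for your own rendering code, and PreserveContents usage keeps the depth contents alive across the rendertarget switch, at the EDRAM cost discussed further down):

RenderTarget2D lightRt = new RenderTarget2D(device, width, height, false,
    SurfaceFormat.Color, DepthFormat.Depth24Stencil8,
    0, RenderTargetUsage.PreserveContents);

// Scene pass: lightRt sits at index #0, so the depth laid down here belongs to it
GraphicsDevice.SetRenderTargets(lightRt, normalRt, depthRt);
DrawScene();

// Light accumulation: lightRt is back at index #0, so the same
// depth/stencil contents are available for culling light volumes
GraphicsDevice.SetRenderTarget(lightRt);
DrawLightVolumes();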
> Should we expect the same focus on the "least common denominator" in future XNA versions?
I don't think "lowest common denominator" is entirely fair (for instance we introduced the concept of Reach vs. HiDef profiles specifically to give us a way of formalizing capability differences, in order to avoid forcing all platforms to match the least capable), but yes, we place a high value on cross platform consistency.
Consistency is actually a multi-dimensional problem:
- There is the obvious cross platform (Windows, Xbox, Phone)
- There is cross devices within what a consumer would see as a single 'platform' (NVidia, AMD, Intel)
- There is the time axis (DX9, DX10, DX11)
Like most things in software engineering (not to mention life as a whole :-) this is something of a balancing act. We see a great deal of benefit in maximizing all three axes of consistency, but at the same time, there is also benefit in exposing the richness of specific platforms (for instance, we didn't cut programmable shaders from Windows and Xbox just because we didn't get them in this first Windows Phone release).
Rendertargets are one of the areas that varies most across platforms, so this was definitely a tough area to rationalize. I think the 4.0 API does a good job of providing this consistency while still exposing enough richness (MRT, floating point formats, etc) to implement advanced rendering techniques (for instance it is totally possible to do deferred shading, including the various depth/stencil volume optimizations, with this API).
> And instead, when you want to unset rendertargets you call something like:
> IMO this is less error-prone for the cases when a rendertarget is disposed by a bug in our code and we then have to chase down why that is happening.
We thought about that. The problem with separating "set" and "unset" into separate APIs, and disallowing null as a parameter to the "set" API, is this makes a couple of common usage patterns more awkward.
For instance, it's harder to implement a "save whatever was previously set, then restore it later" operation if the restore might have to call either of two different APIs.
This can also be a pain for engine/middleware type APIs that want to implement operations which take in a rendertarget selection, use that if provided, or use the backbuffer if null. It's handy if these things can just pass whatever parameter they are given straight through to SetRenderTarget, and have this do the appropriate thing regardless of whether or not that value is null.
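For example, a minimal sketch of that pattern (DrawEffect, someRt, and the drawing code are all hypothetical):

// Draws into the given rendertarget, or into the backbuffer when target is null
void DrawEffect(RenderTarget2D target)
{
    GraphicsDevice.SetRenderTarget(target);   // null simply means "the backbuffer"
    // ... draw the effect here ...
}

// Save/restore is similarly uniform: SetRenderTargets accepts whatever
// GetRenderTargets returned, even when that was the backbuffer
RenderTargetBinding[] previous = GraphicsDevice.GetRenderTargets();
DrawEffect(someRt);
GraphicsDevice.SetRenderTargets(previous);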
> Shawn, I know you've stated many times that don't believe in deferred rendering
Huh? That's not accurate at all.
I think that deferred shading is, like most things, a tool that is appropriate for some situations but not others.
Some people assume that because I wrote one of the earlier papers about this technique, I must be a dyed-in-the-wool fan, which is definitely not true. I wrote that paper based on some tech research, which never made it into an actual game because deferred shading turned out to be a sub-optimal approach for the design and hardware we were working with at the time.
But this doesn't mean I don't think deferred shading can be a great fit for different situations!
I do think that the "classic" style of deferred shading, like I described in that early article, is a somewhat awkward fit for Xbox 360, thanks to the EDRAM hardware architecture. I've seen many people get awesome results from deferred-ish architectures on Xbox, but the most successful tend to be hybrid designs, and often rely more on CPU visibility culling rather than stencil for optimizing the light volumes.
You can certainly do the classic depth volume optimization (like I described in my paper) on Xbox, including with Game Studio 4.0, if you specify PreserveContents rendertarget usage, but there is a significant cost to treating EDRAM in that way. But of course, PreserveContents mode is cheap on most Windows hardware...
>>The biggest restriction of the new API is that you cannot share a single depth buffer across multiple different rendertargets. However, it is usually possible to achieve this kind of rendering architecture (on Windows, anyway) just by arranging things so that any time you want to reuse the depth buffer, you are drawing to the same rendertarget.
I am not actually doing full deferred rendering... (reducing bandwidth is king :-)
But the initial occlusion/z pass is a big gain, since I have some rather heavy shaders. Plus, most importantly, it can cull lights quite well and avoid the need to render shadow maps.
What concerns me is the cost of ensuring the target with the right depth buffer is always #0. So I have to copy colour data (twice the size of the depth stencil when using HDR). Bandwidth is one of the largest bottlenecks, now and increasingly so in the future :-(
Or I can always set target #0 to the same thing and disable colour writes. Apart from being a pain to adapt shaders for, I am concerned that hardware isn't smart enough to avoid either consuming EDRAM or consuming other resources (e.g. ROP units).
Also, this assumes that MRT is present, which could be a big problem for apps targeting Reach...
What would be really cool, assuming we are stuck with this design, would be a method to blit depth buffers. Perhaps not possible on Windows DX9 though.
All this is nice, but as I understand it, if we share a single depth buffer across all the rendertargets in our render pipeline, that depth rendertarget has to be stored in the special RAM on the Xbox 360, so you don't have the full 10 MB for all your other rendertargets.
And if we cannot share the depth buffer: no lights, AO, and so on... unless we render a fresh copy of the depth buffer every time.
Hey, we can only push 200 draw calls on the Xbox 360. Let us divide that up:
- say 40 3D models: we are now at 40 draw calls
- now some shadows: we are now at 80 draw calls
- some light, AO and other postprocessing: say 7 draw calls
- now we need some explosions and some bullets: we are now at 130 draw calls
- we need a sky, and perhaps some water and a nice terrain: we are now at 133 draw calls
- we also need some spotlights and point lights, so we can have 70 lights before we hit the limit
This is bad; we simply cannot do a real 3D game with these limitations. It is a bad idea if we have to render a fresh copy of the depth buffer whenever we want to do AO as a postprocess...
> "You can certainly do the classic depth volume optimization ... including with Game Studio 4.0, if you specify PreserveContents rendertarget usage ... PreserveContents mode is cheap on most Windows hardware..."
But only by keeping the same target in slot 0 for the buffer and lighting passes?
This consumes fill rate even if the target is not referenced in the shader, and even if the ColorWrite mode is disabled.
Never mind losing the ability to share the depth with the back buffer (for pre-pass, or just to reduce memory).
> "Some people assume that because I wrote one of the earlier papers about this technique, I must be a died-in-the-wool fan, which is definitely not true."
All we're asking is for the XNA team to consider those people who DO use these techniques.
> "I don't think "lowest common denominator" is entirely fair"
Can you elaborate on the XNA team's goals with the platform?
Every release further limits how the system can be used, so obviously serious games are not being considered when decisions are made.
XNA is catering more and more to the 2D bubble-popper / 3D BasicEffect shovelware crowd.
The team has to be aware of this, so there must be some drive behind it - why continue to limit the platform?
Hi Shawn, first of all I have to say thanks for the information, been following your posts lately.
I also wanted to add that GS 4.0 seems to bring back a bug from GS 2.0, which I believe was fixed in GS 3.0.
You guys probably want to fix this as soon as possible, hopefully before the official release.
> What would be really cool, assuming we are stuck with this design, would be a method to blit depth buffers. Perhaps not possible on Windows DX9 though.
To be clear: for Game Studio 4.0, this design will ship exactly as described above.
For future versions, we are definitely interested in hearing what additional scenarios you guys are interested in having supported, to make sure we are focusing our thinking and design efforts in the right areas!
> Can you elaborate on the XNA team's goals with the platform?
I thought Michael summed this up well in his MIX talk: this is a little corny but actually very accurate :-)
"Our goal is to create a game development platform which is powerful, productive, and portable".
There is obviously some tension between those three goals, so balancing is required. If you only care about one or two out of the three, you will inevitably disagree with some of our decisions, in areas where we chose to prioritize a different goal to the one you care most about.
It is absolutely not (and has never been) our goal to replicate the entire native DirectX API. We believe there is a vast and extremely interesting space at a somewhat higher level than typical native APIs, but still much lower level and closer to the metal than a game engine.
> XNA is catering more and more to the 2D bubble-popper / 3D BasicEffect shovelware crowd.
If that was true, we could have stopped at SpriteBatch, and skipped pretty much everything I've been working on the last couple of years!
I don't believe that simplified, more productive development and high quality games are in any way mutually exclusive, and I can assure you that high quality games are absolutely an important goal for our platform.
> I also wanted to add that GS 4.0 seem to bring back a bug from GS 2.0 which I believe was fixed in GS 3.0.
This behavior should not have changed between GS3 and GS4. In both versions:
- You can GetData from a texture at any time
- You cannot SetData while the texture is set on the device
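To illustrate those two rules concretely (rt here is a hypothetical rendertarget):

Color[] pixels = new Color[rt.Width * rt.Height];
rt.GetData(pixels);                  // fine: GetData works at any time

GraphicsDevice.SetRenderTarget(rt);
// rt.SetData(pixels);               // not allowed while rt is set on the device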
If you are seeing something different to this with GS4, please file a bug on the Connect site (along with a repro app) so we can take a look at it.
Is this behavior you are seeing with the Windows framework, or on Windows Phone, btw?