Rendertarget changes in XNA Game Studio 4.0

We made several changes to the rendertarget API in Game Studio 4.0, all with the goal of increasing usability and reducing error.

The most common cause of confusion is probably the RenderTargetUsage.DiscardContents behavior, but this is one thing we did not change. PreserveContents mode is just too slow on Xbox, and even slower on phone hardware, which typically uses some variant of tiled or binned rendering. Phone GPUs therefore share the Xbox preference for discard behavior, but have even less memory bandwidth to spend on the extra buffer copies if you do request preserve mode.

Making our API simple is all well and good, but not if that would cost enormous amounts of performance! So discard mode rendertarget semantics are here to stay. Learn 'em, love 'em, live with 'em :-)
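
For example, here is a minimal sketch of code written against discard semantics (sceneTarget and DrawEntireScene are hypothetical names): treat a freshly set rendertarget as containing garbage, and redraw it in full every time.

    GraphicsDevice.SetRenderTarget(sceneTarget);

    // Under DiscardContents, the previous contents are undefined after a
    // set, so clear and redraw everything rather than relying on pixels
    // left over from an earlier frame.
    GraphicsDevice.Clear(Color.Black);
    DrawEntireScene();

    GraphicsDevice.SetRenderTarget(null);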

Here are the things we did change:

 

Has-a versus Is-a

I often see people attempt something like:

    RenderTarget2D rt = new RenderTarget2D(...);
    List<Texture2D> textures = new List<Texture2D>();

    // Prerender animation frames
    for (int i = 0; i < 100; i++)
    {
        GraphicsDevice.SetRenderTarget(0, rt);
        DrawCharacterAnimationFrame(i);
        GraphicsDevice.SetRenderTarget(0, null);

        // Bug: GetTexture returns an alias of rt, not a copy
        textures.Add(rt.GetTexture());
    }

This doesn’t work, because GetTexture returns an alias for the same surface memory as the rendertarget itself, rather than a separate copy of the data, so each drawing operation replaces the contents of all the previously created textures. But these semantics are not at all obvious from the API! GetTexture returns a reference to shared data, yet the method name and signature make it look like it could be returning a copy.

This is the classic has-a versus is-a distinction. Rendertargets are a special kind of texture, but our API made it look like they just had associated textures, or perhaps could be converted into textures.

We fixed this by removing the GetTexture method, and instead having RenderTarget2D inherit directly from Texture2D (and RenderTargetCube from TextureCube). It is harder to get these semantics wrong with the 4.0 API:

    List<Texture2D> textures = new List<Texture2D>();

    for (int i = 0; i < 100; i++)
    {
        // A new rendertarget for each frame: the rendertarget is the texture
        RenderTarget2D rt = new RenderTarget2D(...);

        GraphicsDevice.SetRenderTarget(rt);
        DrawCharacterAnimationFrame(i);
        GraphicsDevice.SetRenderTarget(null);

        textures.Add(rt);
    }

 

Atomicity

How do you un-set a rendertarget? In previous versions of Game Studio we would often write:

    GraphicsDevice.SetRenderTarget(0, null);

That mostly works, but after using multiple rendertargets we must use this more complex version:

    for (int i = 0; i < HoweverManyRenderTargetsIJustUsed; i++)
    {
        GraphicsDevice.SetRenderTarget(i, null);
    }

Ugly, not to mention error prone if the un-set code does not loop enough times.

In Game Studio 4.0, we made SetRenderTarget an atomic method, so it always sets all the possible rendertargets at the same time. This call will always un-set all rendertargets, no matter how many were previously bound:

    GraphicsDevice.SetRenderTarget(null);

To set a single rendertarget, you no longer need to specify an index:

    GraphicsDevice.SetRenderTarget(renderTarget);

If multiple rendertargets were previously bound, this will change the first one to the specified value, then un-set the others.

To set multiple rendertargets (which is a HiDef feature, so not supported in the CTP), specify them all at the same time:

    GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);

That is a shortcut for this more flexible but verbose equivalent:

    RenderTargetBinding[] bindings =
    {
        new RenderTargetBinding(diffuseRt),
        new RenderTargetBinding(normalRt),
        new RenderTargetBinding(depthRt),
    };

    GraphicsDevice.SetRenderTargets(bindings);

Making the set call atomic has two main benefits:

  • It reduces the chance of forgetting to un-set multiple rendertargets

  • It makes our validation code more efficient, as we now have a single place to validate MRT rules such as all surfaces being the same size and bit depth. Previously, SetRenderTarget had no way to know when it had the final state, or whether other calls were about to change rendertargets on different indices, so it had to set a dirty flag, which cued the next draw operation to validate and commit the new surfaces. That added a small but measurable overhead to every draw operation (even when MRT was not in use), and is no longer necessary now that these operations are atomic.
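
To illustrate the second point, here is a sketch of how the atomic validation surfaces in practice (the sizes are arbitrary, and the exact failure point is my reading of the design rather than documented behavior): a broken MRT combination fails at the call site, not at some later draw.

    RenderTarget2D a = new RenderTarget2D(GraphicsDevice, 1024, 768);
    RenderTarget2D b = new RenderTarget2D(GraphicsDevice, 512, 384);

    // Mismatched sizes violate the MRT rules, so the atomic set call can
    // reject this combination immediately; no dirty flag needs checking
    // on every subsequent draw.
    GraphicsDevice.SetRenderTargets(a, b);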

 

Declarative depth

Our bloom sample contains a subtle bug in this line:

    renderTarget1 = new RenderTarget2D(GraphicsDevice, width, height, 1, format);

The problem is that when we later draw into this rendertarget, we do not explicitly un-set the depth buffer. Even though we are not using depth while rendering the bloom postprocess, the default depth buffer is still bound to the device, and so must be compatible with the rendertarget we are using.

If you change the bloom sample by turning on multisampling, the default depth buffer will be multisampled, but the bloom rendertarget will not, so the two are no longer compatible and rendering will fail.

We could fix this by changing the bloom rendertarget to use the same multisample format as the backbuffer, or we could explicitly un-set the depth buffer before drawing bloom:

    // XNA 3.1: explicitly save, un-set, and restore the depth buffer
    DepthStencilBuffer previousDepth = GraphicsDevice.DepthStencilBuffer;
    GraphicsDevice.DepthStencilBuffer = null;

    DrawBloom();

    GraphicsDevice.DepthStencilBuffer = previousDepth;

This is ugly and far from obvious. We forgot to put this code in our sample, and I see other people making the same mistake all the time!

The more we thought about this, the more we realized:

  • Any time you change rendertarget without also changing depth buffer, that is almost certainly a bug waiting to bite.

  • Any time you un-set from a rendertarget to the backbuffer without also resetting the depth buffer, that’s another bug.

  • Many rendertargets are used in ways that do not actually require a depth buffer. The correct thing to do here is set the depth buffer to null. If you forget to do that, things will often still work, but can fail in subtle and confusing ways.

  • For rendertargets that do require a depth buffer, it can be a pain making sure your depth buffer has the same size and multisample format as the rendertarget. The XNA framework had much code dedicated to validating these rules, and I often see people getting this wrong.

  • Our DepthStencilBuffer class had no interesting methods or properties. In fact, the only thing you could do with it was to set it onto the graphics device, which always happened at the same time as setting a rendertarget.

We decided the DepthStencilBuffer class was so useless, we should get rid of it entirely! Instead, the depth format is now specified as part of each rendertarget. If I call:

    new RenderTarget2D(device, width, height);

I get a rendertarget with no associated depth buffer. If I want to use a depth buffer while drawing into my rendertarget, I use this constructor overload:

    new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);

Note: I could specify DepthFormat.None to use the full overload but get no depth buffer.

Note: when using MRT, the depth format is controlled by the first rendertarget.
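
For instance, a sketch of a deferred-style MRT set, reusing the rendertarget names from the earlier example: only the first target needs to declare a depth format, and the others can use DepthFormat.None.

    RenderTarget2D diffuseRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);
    RenderTarget2D normalRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.None);
    RenderTarget2D depthRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Single, DepthFormat.None);

    // While these are bound, drawing uses Depth24Stencil8, from diffuseRt.
    GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);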

With this design, many previously common errors become impossible:

  • The depth buffer cannot fail to match the rendertarget size and multisample format, so we don’t even need to bother validating this (which speeds up all rendering).

  • When you are doing 2D work like the bloom sample, and not thinking about depth buffers at all, the device is automatically set to use null depth.

  • Whenever you un-set a rendertarget, the default depth buffer is automatically restored. No need to explicitly save and restore means no chance of getting that wrong!

Several of you expressed concern that this design could lead to wasted memory, as you can no longer share a single depth buffer between many rendertargets.

Not at all! The key shift here is from an imperative API, where you explicitly create depth buffer objects, manage their lifespan, and tell us which one to use at what times, to a declarative API, where you tell us what depth format you want to use, and we figure out how best to make that happen.

The two important pieces of information you need to provide are:

  • Do I want a depth buffer when using this rendertarget? If so, what format?

  • Do I want to be able to go back to this buffer later and continue drawing over it (RenderTargetUsage.PreserveContents), or am I only interested in the final texture image (RenderTargetUsage.DiscardContents)?
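
Both pieces of information are declared up front when the rendertarget is created. A sketch using the full constructor overload (the parameter values here are only examples):

    RenderTarget2D rt = new RenderTarget2D(
        device, width, height,
        false,                              // mipMap
        SurfaceFormat.Color,                // color format
        DepthFormat.Depth24,                // yes, I want a depth buffer, in this format
        0,                                  // preferredMultiSampleCount
        RenderTargetUsage.DiscardContents); // I only care about the final image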

Armed with this data, we can choose the appropriate implementation strategy for different situations:

  • On Xbox, there isn’t really such a thing as a depth buffer in the first place: it’s actually all just a small piece of shared EDRAM, plus backing store if you request PreserveContents mode. So this design takes less memory than what we had before, as we no longer need to jump through hoops to give the illusion that these are real objects with actual memory of their own.

  • On Windows, if you request PreserveContents mode, we allocate a separate depth buffer per rendertarget.

  • On Windows, if you use the default DiscardContents mode, we can be smarter, and can do things like automatically sharing a single native depth buffer between many rendertargets (as long as they all have the same size and multisample format).

Honesty compels me to admit that we haven’t actually implemented this sharing optimization yet. It’s currently on the schedule for 4.0 RTM, but things can always change, so please don’t beat me up too hard if we for some reason fail to get that part done in time :-)

  • I like it a lot that you no longer need to manually set an index for each renderTarget, and that setting a render target un-sets all the others.

    Also, the bindings will help organize the code a lot more; you can create bindings according to your options: going to do motionBlur? Going to do lighting?

    Then create the bindings with the linearDepth, normals, and velocity renderTargets.

    SetRenderTargets may always be:

    SetRenderTargets(prePassBindings);

    I was concerned regarding the depthBuffers too... From what I understand, it would work like this:

    diff = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8); --> A common depth.

    linearDepth = new RenderTarget2D(device, width, height, false, SurfaceFormat.Single, DepthFormat.None); ---> None depth

    velocity = new RenderTarget2D(device, width, height, false, SurfaceFormat.HalfVector2, DepthFormat.None); ----> None again

    And then:

    SetRenderTargets(diff, linearDepth, velocity);

    It would use the diff depthBuffer. However, if I swap the order:

    SetRenderTargets(linearDepth, diff, velocity);

    It will be rendering without a depth, am I right?

    It seems a little bit counter-intuitive depending on your coding practices; I feel like I'm setting redundant depthFormats, i.e. all render targets would have Depth24Stencil8, instead of SetRenderTargets(a, b, c, commonDepth) or even declaring it in the bindings. However, that is not an issue.

    The main question I have is: can I use the depthBuffer the API declared for a rendertarget of one size with another, smaller rendertarget? In current XNA, the requirement is that the depth buffer has to be equal-sized or bigger (besides matching the MSAA type), so a depthBuffer for a fullscreen renderTarget will work for half- and quarter-size render targets. Do we get the same behaviour?

    An example: using only a full screen depthBuffer not MSAA'ed. RT1 = fullScreen, RT2 = halfScreen.

    1. render full-screen color --> to RT1;

    2. downsample both color and depth to half size, rewriting the depth via the DEPTH0 semantic in the pixel shader --> to RT2

    3. render particles on the same RT2.

    4. Combine with RT1.

    A doubt I always had: is it wasteful to write the depth with the DEPTH0 semantic? How does that compare with the PreserveContents option?

    In this case it wouldn't work unless PreserveContents automatically downscales the data, but supposing there is no downscale, what do you do? A direct-to-the-metal byte-to-byte copy? (That would obviously be way faster.)

    Anyways, I'm really happy with how everything is coming out and the over-the-top amazing things you guys do, not scared anymore :). Several platforms, new gadgets, all within one framework... that's just amazing; can't wait to see what we will get in the future.

    Thanks in advance! Have a nice day.

  • Very interesting. I was afraid that 4.0 was going to be more limiting, but some of these render target issues are ones I had when I was just starting a year ago (not much of a problem now, though). I can see that this will make it much easier for new people to learn, and it should make my code easier to read too :)

    You speak of performance improvements because it no longer has to validate many rendering states, etc. What kind of performance boost did you see on the 360 with this? Significant?

    Anyway XNA 4.0 is looking good.

  • So how do you share depth buffer *data* between targets?

    As you know (and have written in many papers) using depth testing during deferred lighting improves performance.

    However in your new design this appears impossible.

    Forcing us to redraw the scene every target switch defeats the purpose of deferred rendering and is overly redundant.

    Why not provide the option to share (actually share) depth buffers on Windows?

    There are people legitimately using XNA on Windows who need the performance and have very little or no interest in Xbox or mobile.

    Should we expect the same focus on the "least common denominator" in future XNA versions?

  • Re: depth buffers and render targets.

    Hmmm, sounds suspiciously similar to what I've been doing for a while...

    You didn't happen to break into my computer and steal my codes did you?

    Seriously though, it sounds like it will make the whole process much easier and less confusing, with fewer gotchas for newcomers.

  • Hi Shawn,

    Thanks for another great post. I wanted to ask something. I'm currently doing as suggested (by you, I think) while working on a landscape 2D game in the Windows Phone SDK (XNA). I'm rendering to a render target, then rotating it to landscape when drawing. Everything works fine. However, my coordinates for player position are based on X/Y as if the phone were in portrait, which is fine at the moment. When the SDK is updated to support an explicit 'landscape' mode (I assume there will be a device call to set it so the resolution is 800x480 rather than 480x800), will this automatically take the X/Y switch into account and give you the full 800 pixels in the *X* direction? I assumed it would, or it wouldn't really be a supported landscape mode, but I wanted to check before I coded everything as X/Y, only to find a real landscape mode would need to switch Y to X and X to Y.

    I could of course just code it now as X to Y and Y to X and draw it 'on its side', so to speak (i.e. not using a render target), but I like to think of it properly as a horizontal 800x480 viewport to code in.

    Hope I've explained my question ok.

    thanks

  • Sorry if my earlier post sounded a bit harsh, but we put a lot of work into our game. :/ And I'm sleep deprived.

    Would still like to know about my questions though. ;)

  • That all sounds good.

    I have to ask though, will XNA4 fix the whole-application 5-second freeze issue when trying to do anything with MediaPlayer on Windows 7? Even though I'm coding for the X360, it's really frustrating the alpha testers.

  • Thanks for the insight Shawn, this is exactly how I was hoping depth buffers would be treated in 4.0 :)

  • Great post. Thanks for clearing a lot of this stuff up :)

  • For depth buffers, what about a situation such as:

    1) Depth pre-pass + normals to colour channel

    2) other stuff... (eg AO)

    3) render solid objects using depth buffer from before and a new render target. (also reads things like AO)

    4) use depth buffer again for another pass(eg volume fog)

    With the new system, I don't see a good way to do this on Windows without copying depth between render targets. Perhaps using MRT, if all writes to colour are disabled on the first pass?

  • Shawn, this is probably the best change to the whole XNA framework! I can't describe how many times I fought with getting render targets to work only to not get the right output or throw an exception because I still had some other render target set! I've already used the texture inheritance and it's genius. You guys really did break it good!

  • > "Honesty compels me to admit that we haven’t actually implemented this sharing optimization yet. It’s currently on the schedule for 4.0 RTM, ... "

    Then one suggestion: don't allow passing a null rendertarget to the set operation. Instead, when you want to un-set rendertargets, call something like:

    GraphicsDevice.ResetRenderTargets();

    Imo this is less error-prone for the cases when a rendertarget is disposed by a bug in our code and then we have to chase down why that is happening.

    Then when you call SetRenderTarget or SetRenderTargets you don't have to verify whether the render target is null; you move that check to the internal operation you call last, and if a rendertarget is null, you throw an exception.

    What would happen if I call "SetRenderTargets(rt1, null, rt3)"? Or "SetRenderTarget(rtNullByMistakeForWhateverReason)"?

    By not allowing null rendertargets to be passed to those operations, we could rapidly identify that a rendertarget we pass is becoming null somewhere in our code, and fix it.

    Thoughts?

  • Thank you for the post!

    I've also tried to use one render target to draw to an array of textures before. I was very surprised to find that, strangely, the contents of all of the textures were the same...

    Thank you for clearing that up. I'm looking forward to the stable release.

    These changes are getting me excited :D

  • hmm, like Still Alarmed mentions, my current implementation of deferred rendering assumes the depth buffer can be shared between render targets: first I render all the objects and create a depth buffer, then later passes (like light volume effects and analytical occlusion proxy boxes) test against the previously generated d-buffer.

    It's already a giant pain that a stencil buffer can't be shared between render targets (prevents me from culling a significant amount of pixels when processing my light volumes and occlusion proxies).

    Shawn, I know you've stated many times that you don't believe in deferred rendering, but now it seems that you really don't want it to work for anyone... (rather than fixing the stencil buffer problem, you're breaking the depth buffer now?!).

    Maybe I'm missing something, but I'm really disappointed about this.

  • Just like StillAlarmed and FlyingWaffle I still have my concerns about the depth-stencil buffer.

    When I first started to learn deferred rendering I would set my geometry buffer, render into it, then set null. A separate component would set the light buffer, render lights, and then set null again. However, this proved counter-productive for lighting, because I could not use any depth or stencil test: they were invalidated by the double set render target operation (null and then the light buffer).

    Lately I found that when compositing the image into the back buffer, if I set null to all render targets the depth-stencil buffer is invalidated. To get around this I simply employed yet another render target, set it in slot 1 (the second render target), and rendered the same image into both the back buffer and that target. This actually became helpful, as now I don't need to resolve the back buffer to perform post-processing effects.

    Mostly it worries me between the geometry and lighting phases, though, as putting the depth and stencil buffer to work in lighting moved my lighting phase in a normal scene from a couple of milliseconds to hundredths of a millisecond. Having to go back to the old naive approach would be a significant performance hit.
