Shawn Hargreaves Blog
Once upon a time there lived a young graphics coder named Johnny. One day Johnny wrote a program that displayed a 3D model. As is frequently the case with people his age, Johnny was somewhat careless and did not bother to set all the renderstates which an older or wiser programmer would undoubtedly have taken care of, but luckily for him these states had sensible defaults, so his omission did not cause any problems. The model was shiny. All was well with the world.
The next morning Johnny decided to add a framerate counter, so he inserted a simple call to SpriteBatch.DrawString. Woe! The framerate showed up ok, but the model no longer rendered correctly. Disaster!
Johnny has been bitten by the dreaded state management monster.
Keeping track of which states are set when, and by whom, is not particularly glamorous, but it is an important part of creating a robust 3D program. If you get it wrong, your graphics will display incorrectly. If you get it right but change states too often, your program will run slowly. If you get it right but fail to make each drawing method sufficiently self-contained, your program will be fragile, and you can find one piece of code affecting the behavior of other, unrelated modules.
There are many pieces of rendering state associated with the graphics device:
All must be set correctly for drawing to behave as expected. And yet most programs do not actually bother to set every state! This works because states have sensible default values, so for instance if your program never modifies the ScissorRectangle, you can assume it will be set to cover the entire screen, thus no scissoring will be performed.
But here's the thing. What if your program contains more than one module, perhaps even written by different people? Class A does not use the ScissorRectangle, and assumes it will always be set to a good default, but class B changes the ScissorRectangle. Although A and B both work perfectly in isolation, when you combine them in a single program the actions of B will cause A to render incorrectly.
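To make the A and B problem concrete, here is a minimal sketch in Python. The `MockDevice` class, the `draw_a` and `draw_b` functions, and the rectangle sizes are all hypothetical stand-ins, not XNA API; the point is only that A's result depends on whether B ran first.

```python
# Hypothetical stand-in for a graphics device: just a holder for one state.
FULL_SCREEN = (0, 0, 1280, 720)

class MockDevice:
    def __init__(self):
        # Sensible default: the scissor rectangle covers the whole screen.
        self.scissor_rectangle = FULL_SCREEN

def draw_a(device):
    # Class A never touches the scissor rectangle; it assumes the default.
    return f"A drawn with scissor {device.scissor_rectangle}"

def draw_b(device):
    # Class B narrows the scissor rectangle and never puts it back.
    device.scissor_rectangle = (0, 0, 200, 50)
    return f"B drawn with scissor {device.scissor_rectangle}"

device = MockDevice()
first = draw_a(device)    # A sees the full-screen default
draw_b(device)
second = draw_a(device)   # A now inherits B's small rectangle
```

Neither function is wrong on its own, which is exactly why this kind of bug is hard to spot.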
The solution is to standardize how graphics device state is managed, and do this consistently across your entire program. There are several possible approaches:
The obvious solution (approach #1) is to say that every piece of drawing code must always set every possible state to whatever values it needs. This is simple and robust, producing self-contained drawing methods that make no assumptions about how things are set up before they are called. It is particularly suitable for reusable framework code: for instance, the XNA Framework SpriteBatch class works this way.
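A rough Python sketch of this style follows. `MockDevice`, `set_state`, and the state names are assumptions made up for illustration, and a real device has far more than five states.

```python
class MockDevice:
    """Hypothetical device that records every state write."""
    def __init__(self):
        self.states = {}
        self.writes = 0

    def set_state(self, name, value):
        self.states[name] = value
        self.writes += 1

def draw_model(device):
    # Set everything this method depends on, assuming nothing about the caller.
    device.set_state("scissor_rectangle", (0, 0, 1280, 720))
    device.set_state("stencil_enable", False)
    device.set_state("depth_test", "less")
    device.set_state("alpha_blend", False)
    device.set_state("texture", "model_diffuse")
    return dict(device.states)   # snapshot standing in for the draw call

def draw_overlay(device):
    # Same rule here: every state is set, even those that match draw_model.
    device.set_state("scissor_rectangle", (0, 0, 200, 50))
    device.set_state("stencil_enable", False)
    device.set_state("depth_test", "always")
    device.set_state("alpha_blend", True)
    device.set_state("texture", "font_atlas")
    return dict(device.states)

device = MockDevice()
before = draw_model(device)
draw_overlay(device)
after = draw_model(device)   # unaffected by whatever the overlay did
```

The model sees identical state on both calls, but the device has absorbed fifteen writes for three draws.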
The downside is that the graphics device has a lot of state, so setting everything can be expensive.
You might think you can optimize this by checking what states are currently set, and only setting your new values if they are different to the current ones. This is not a good idea, because reading states back from the graphics device can actually be much slower (especially on Xbox) than just setting them regardless.
People are often tempted by a second approach (approach #2), with logic along the lines of "in order to avoid class B messing up the rendering of class A, I will just have B put back whatever states were previously defined". The SaveStateMode.SaveState parameter to SpriteBatch.Begin and ModelMesh.Draw implements this behavior.
Trouble is, this is not a smart way to do things.
For one thing, it is slow. Really slow.
For another, it leads to a logical mess. Who says that B must restore state in order to avoid messing up A? Why isn't it A that should avoid messing up B? What about if you introduce a third class C? It quickly gets confusing trying to keep track of who does what when.
To avoid having to set the same state values over and over again, it can be useful to define some standardized values (approach #3). For instance, you might decide the scissor rectangle will always be set to the full screen, clip planes will be off, stencil will be disabled, and z-test will be enabled using a less-than compare mode.
As long as you make sure these standard values are always set, drawing methods only need to change whatever things they want different to the defaults. Then when they have finished drawing, they must set these few things back to the defaults. In pseudocode:
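The pattern can be sketched in Python along these lines. The `MockDevice` class, the `set_state` method, and the state names are hypothetical; the default values follow the examples given above.

```python
# Agreed program-wide defaults (names and values are illustrative assumptions).
DEFAULTS = {
    "scissor_rectangle": (0, 0, 1280, 720),
    "clip_planes": False,
    "stencil_enable": False,
    "depth_test": "less",
}

class MockDevice:
    """Hypothetical device that starts out in the standard state."""
    def __init__(self):
        self.states = dict(DEFAULTS)
        self.writes = 0

    def set_state(self, name, value):
        self.states[name] = value
        self.writes += 1

def draw_overlay(device):
    # Change only what differs from the agreed defaults...
    device.set_state("scissor_rectangle", (0, 0, 200, 50))
    device.set_state("depth_test", "always")
    seen = dict(device.states)   # stands in for the actual draw call
    # ...then put exactly those states back.
    device.set_state("scissor_rectangle", DEFAULTS["scissor_rectangle"])
    device.set_state("depth_test", DEFAULTS["depth_test"])
    return seen

def draw_model(device):
    # Relies entirely on the defaults, so it sets nothing at all.
    return dict(device.states)

device = MockDevice()
overlay_saw = draw_overlay(device)
model_saw = draw_model(device)
```

Here the overlay costs four writes and the model costs none, yet the model still sees the standard values.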
This can be faster than approach #1, because even though it has the extra overhead of cleaning up after itself, the number of states needing to be modified by any given call is typically very small compared to the total number of available states.
The downside is that some states just don't have sensible default values. What would you pick as the default vertex buffer, or default texture? Every drawing method is going to want different ones, so trying to set these things back to standardized defaults doesn't really make a lot of sense.
It can be useful to divide renderstates into two categories.
Transient states, which are likely to be different for every drawing method, might include:
Persistent states, which are likely to be the same for most drawing methods, could include:
Once you have decided which states are which, you can use approach #1 for the transient states and approach #3 for the persistent states. In pseudocode:
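A minimal Python sketch of the combination, assuming the vertex buffer and texture are treated as transient and the scissor rectangle, stencil, and z-test as persistent. That split, `MockDevice`, and all the names are illustrative assumptions.

```python
# Persistent states have agreed defaults; transient states have none.
PERSISTENT_DEFAULTS = {
    "scissor_rectangle": (0, 0, 1280, 720),
    "stencil_enable": False,
    "depth_test": "less",
}

class MockDevice:
    """Hypothetical device that starts with the persistent defaults applied."""
    def __init__(self):
        self.states = dict(PERSISTENT_DEFAULTS)
        self.writes = 0

    def set_state(self, name, value):
        self.states[name] = value
        self.writes += 1

def draw_model(device):
    # Transient states: always set, never restored.
    device.set_state("vertex_buffer", "model_vb")
    device.set_state("texture", "model_diffuse")
    return dict(device.states)   # stands in for the actual draw call

def draw_overlay(device):
    # Transient states: always set, never restored.
    device.set_state("vertex_buffer", "quad_vb")
    device.set_state("texture", "font_atlas")
    # Persistent state: change it, draw, then put the default back.
    device.set_state("depth_test", "always")
    seen = dict(device.states)
    device.set_state("depth_test", PERSISTENT_DEFAULTS["depth_test"])
    return seen

device = MockDevice()
before = draw_model(device)
draw_overlay(device)
after = draw_model(device)
```

The overlay leaves its own vertex buffer and texture behind, which is harmless because the model sets its own before drawing.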
This gives the best of both worlds. You don't waste any time setting persistent states which are likely to already have been set by the previous guy, and you also don't waste any time restoring transient states where the next guy in line is likely to want something else in any case.
The trick is to decide up front which states are which, then make sure you follow the rules consistently across your entire codebase.
What I've never understood is why somewhere in between $3D_Application and the GPU there isn't someone else taking responsibility for ensuring redundant state changes are caught and can also be read back quickly. This could (conceivably) be either the graphics card driver or DirectX itself. Although I've not given this plan much thought, the renderstates that easily map to enums (like FillMode, the Alpha ones etc) could be stored in RAM and just get updated every time the GPU gets updated. Meaning that changing from FillMode.Line to FillMode.Line never gets passed on to the GPU, and querying any of those states doesn't require talking to the GPU.
I've thought about trying something similar just in my XNA apps, a "RenderStateManager" of sorts, but using things like SpriteBatch means that I can't run *all* state changes through my manager, and it just ends up being a pain to keep track of.
This can also be combined with a sorting helper so as to minimize switching states. You just define when to call each object's draw method based on, say, material, texture, type of object/entity (good occluder, transparent one), or categories (3D models first then 2D sprites) and so on.
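As a rough illustration of the sorting idea, the Python sketch below orders a draw queue by category, then material, then texture, and counts how often that combined state changes between consecutive draws. `DrawItem`, `state_key`, `count_switches`, and the sample queue are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DrawItem:
    name: str
    category: int   # 0 = opaque 3D, 1 = transparent 3D, 2 = 2D sprites
    material: str
    texture: str

def state_key(item):
    # Items sharing a key can be drawn back to back without a state change.
    return (item.category, item.material, item.texture)

def count_switches(items):
    """Count how often the state key changes between consecutive draws."""
    switches = 0
    previous = None
    for item in items:
        key = state_key(item)
        if key != previous:
            switches += 1
            previous = key
    return switches

queue = [
    DrawItem("crate_1", 0, "wood", "crate"),
    DrawItem("hud_text", 2, "sprite", "font"),
    DrawItem("window", 1, "glass", "pane"),
    DrawItem("crate_2", 0, "wood", "crate"),
    DrawItem("hud_icon", 2, "sprite", "font"),
]

# sorted() is stable, so items with equal keys keep their submission order.
sorted_queue = sorted(queue, key=state_key)
```

In this sample, submission order needs five state setups and the sorted order needs three.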
> What I've never understood is why somewhere in between $3D_Application and the GPU there isn't someone else taking responsibility for ensuring redundant state changes are caught and can also be read back quickly.
There is! At least for setting, and on Windows for reading back as well: reading back is especially slow on Xbox due to the user/supervisor memory split.
The trouble is not that everybody isn't doing their best to optimize state setting, but that there are simply so many states. A DX9 GPU has kilobytes worth of state data, and it takes hundreds of function calls to set all this state. Even if every one of those function calls could just trivially compare the state it is being asked to set against the current value and return in just a couple of clock cycles, that is still hundreds of calls, comparing hundreds of values, for no good reason.
Even if T is quick, it is still faster to not call it in the first place, and in this case we are talking about a large number of T's!
The key thing here is that games are able to use higher level architectural decisions to reduce work, where the driver is only able to apply dumb "this state is the same value I have already" checks. The game knows more, so it can make assumptions like "I don't even need to bother thinking about setting states X, Y, or Z, because I know the previous guy already set them for me".
There was an interesting article about this in the ShaderX4 book. It even considered setting some states back and forth within each shader itself.
For our material system we've tried a few different approaches: EMFH and Consistency.
Right now we are using a kind of hybrid. When we change materials, we set the shader params to the base "known" state and then apply a set of deltas that are specific to the particular material. In our testing this seemed to do better than the Consistency approach.
But I have a question about shader state optimization. Right now we are making heavy use of Effect Parameter Blocks (for static state), since individual calls to set each parameter seem like a really bad idea.
I'm curious, though, how fast EPBs are in XNA. What is the underlying mechanism for committing the parameter block to the GPU? Since I assume the XNA EPBs are simply handles to the underlying native effect blocks in DX9, I guess they are pretty efficient.
Can it be almost as quick as a native mode block copy? If they are really this quick maybe Approach 1 (EMFH) might actually be the quickest.