BasicEffect optimizations in XNA Game Studio 4.0

The BasicEffect API and feature set did not change in Game Studio 4.0, but the implementation saw some aggressive optimizations.

In previous versions, BasicEffect was intended as a starting point for beginners. We expected expert programmers to soon move on to writing their own shaders, so as long as BasicEffect performed adequately on both Windows and Xbox, it wasn't worth spending time on further optimization.

Windows Phone changed this priority for two reasons:

  • Mobile GPUs are slower than those in Windows PCs or the Xbox, so every shader instruction makes a significant difference.

  • Because we do not support custom shaders on Windows Phone, BasicEffect needs to be as fast as humanly possible.

I thought it would be interesting to describe the details of how we sped things up.

 

Shader permutations

In spite of its name, BasicEffect is actually quite complex! If you look at the HLSL source code, you will see that the previous versions included 12 different vertex shaders to support all permutations of these options:

  • Lighting: none, per vertex, per pixel
  • Vertex color: off, on
  • Texture: off, on

Plus 4 different pixel shaders:

  • Per pixel lighting: off, on
  • Texture: off, on

BasicEffect has many other adjustable knobs, but we did not include specialized shaders for every possible combination. For instance we always evaluated the fog equation, and implemented the FogEnable property by setting parameter values to make the result come out zero if we did not want fog.

There is a balance between providing many shader permutations (which minimizes GPU instruction counts) and providing fewer shaders (which minimizes memory overhead and development/test cost). When we reevaluated this balance in light of Windows Phone, we decided to add more specialized shaders. As of Game Studio 4.0, BasicEffect has a total of 32 permutations, implemented by 20 vertex shaders:

  • Lighting: none, one vertex light, three vertex lights, three pixel lights
  • Vertex color: off, on
  • Texture: off, on
  • Fog: off, on (a separate fog=off version exists only for the shaders that do not include lighting)

Plus 10 pixel shaders:

  • Lighting: none, per vertex, per pixel
  • Texture: off, on
  • Fog: off, on (a separate fog=off version exists only for the shaders that do not include per pixel lighting)
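To check that these lists add up to 32, here is a short Python sketch. It enumerates every combination of the four options and counts how many distinct vertex and pixel shaders the rules above call for. The lighting labels are descriptive strings chosen for illustration, not BasicEffect technique names.

```python
from itertools import product

# Descriptive labels for illustration only; these are not BasicEffect API names.
LIGHTING = ["none", "one vertex light", "three vertex lights", "three pixel lights"]
PIXEL_LIGHTING = {
    "none": "none",
    "one vertex light": "per vertex",
    "three vertex lights": "per vertex",
    "three pixel lights": "per pixel",
}


def vertex_shader_key(lighting, vertex_color, texture, fog):
    # Lit vertex shaders always evaluate fog (presumably switched off through
    # parameter values), so only unlit ones have separate fog-off versions.
    return (lighting, vertex_color, texture, fog if lighting == "none" else True)


def pixel_shader_key(lighting, vertex_color, texture, fog):
    # Pixel shaders vary by lighting style and texture, not by vertex color.
    # Only the ones without per-pixel lighting have separate fog-off versions.
    mode = PIXEL_LIGHTING[lighting]
    return (mode, texture, fog if mode != "per pixel" else True)


# Every combination of lighting, vertex color, texture, and fog.
permutations = list(product(LIGHTING, [False, True], [False, True], [False, True]))
vertex_shaders = {vertex_shader_key(*p) for p in permutations}
pixel_shaders = {pixel_shader_key(*p) for p in permutations}

print(len(permutations), len(vertex_shaders), len(pixel_shaders))  # 32 20 10
```

By the same counting, the older design had 3 × 2 × 2 = 12 vertex shaders and 2 × 2 = 4 pixel shaders.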

Implications:

  • Changing FogEnable used to have no performance impact, but turning off fog is now a little faster.

  • Using just one vertex light (PreferPerPixelLighting=false, DirectionalLight0.Enabled=true, DirectionalLight1.Enabled=false, DirectionalLight2.Enabled=false) is faster than if you use all three lights.

 

Preshaders

A common tension in shader programming is that when you design effect parameters to provide a nice clean API, the resulting parameter formats are not always the most efficient for HLSL optimization.

D3D tries to correct any such mismatches through a feature called "preshaders". The HLSL compiler looks for computations that are the same for all vertices or all pixels, and moves these out of the main shader into a special setup pass which runs on the CPU before drawing begins. This is a great feature, but it has a few fatal flaws:

  • The HLSL compiler does not always spot every optimization possibility
  • The virtual machine that evaluates preshaders is not especially efficient
  • Preshaders are not supported on Xbox or Windows Phone

Game Studio 4.0 adds the ability to implement preshader computations in C#, by overriding this new method, which is called immediately before EffectPass.Apply sets parameter values onto the graphics device:

    protected virtual void Effect.OnApply();

This allows BasicEffect to expose whatever properties the API requires, without needing these to match the underlying HLSL shader parameters. When the programmer changes a managed property, we just set a dirty flag, then recompute derived HLSL parameter values during OnApply. We used this new ability to precompute many things:

  • Collapse the World, View, and Projection matrices into a single WorldViewProj matrix.

  • When lighting is enabled, compute the WorldInverseTranspose matrix. This is necessary for correct normal transforms when using non-uniform scales, but it is something we never bothered to get right in previous versions.

  • Extract the EyePosition vector from the View matrix.

  • Combine FogStart, FogEnd, World, and View into a single vector that computes the fog amount with one dot product.

  • Merge the DiffuseColor, EmissiveColor, AmbientLightColor, and Alpha properties into a more efficient set of combined parameters.
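The geometric precomputations in this list are small amounts of CPU-side math. Below is a minimal Python sketch of them, not the shipping BasicEffect code. It assumes XNA's conventions: row vectors multiplied on the left (v * M), translation stored in the fourth row, and a right-handed view space in which the camera looks down -Z.

All function names are hypothetical. The eye-position shortcut also assumes a view matrix with no scale. The fog vector assumes linear depth fog, where depth is -z in view space.

```python
import math


def mat_mul(a, b):
    """4x4 product a * b; with row vectors this applies a first, then b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(p, m):
    """Transform the point (x, y, z, 1) by m; returns [x, y, z, w]."""
    v = list(p) + [1.0]
    return [sum(v[k] * m[k][j] for k in range(4)) for j in range(4)]


def translation(x, y, z):
    return [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [x, y, z, 1.0]]


def scaling(x, y, z):
    return [[x, 0.0, 0.0, 0.0], [0.0, y, 0.0, 0.0],
            [0.0, 0.0, z, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, -s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def world_view_proj(world, view, projection):
    # One combined matrix, so the vertex shader does a single transform.
    return mat_mul(mat_mul(world, view), projection)


def world_inverse_transpose(world):
    # Normals need the inverse transpose of World's upper 3x3. With signed
    # cofactors c, inverse = transpose(c) / det, so the answer is just c / det.
    m = [row[:3] for row in world[:3]]
    c = [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
          - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
          for j in range(3)] for i in range(3)]
    det = sum(m[0][j] * c[0][j] for j in range(3))
    return [[c[i][j] / det for j in range(3)] for i in range(3)]


def eye_position(view):
    # Assumes View is a rotation R plus a translation t (no scale). The camera
    # is the point e with e * R + t = 0, so e = -t * transpose(R).
    t = view[3][:3]
    return [-sum(t[k] * view[j][k] for k in range(3)) for j in range(3)]


def fog_vector(world, view, fog_start, fog_end):
    # Depth fog = (depth - start) / (end - start), with depth = -z in view
    # space. z is linear in the object-space point (x, y, z, 1), so the whole
    # expression folds into one 4-vector; the shader would then clamp
    # dot(position, vector) to [0, 1].
    if fog_start == fog_end:
        return [0.0, 0.0, 0.0, 1.0]  # this sketch treats it as fully fogged
    wv = mat_mul(world, view)
    inv_range = 1.0 / (fog_end - fog_start)
    return [-wv[0][2] * inv_range, -wv[1][2] * inv_range,
            -wv[2][2] * inv_range, (-wv[3][2] - fog_start) * inv_range]


# Example: camera at (5, 0, 0) turned about Y; non-uniformly scaled object.
view = mat_mul(translation(-5.0, 0.0, 0.0), rotation_y(0.7))
world = scaling(2.0, 1.0, 1.0)
print(eye_position(view))              # approximately [5, 0, 0]
print(world_inverse_transpose(world))  # [[0.5, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

All of this runs once per draw call at most, rather than once per vertex or pixel.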

 

Do less work

We also applied some good ol' algebraic optimizations, using math to find cheaper ways of getting the results we wanted.

We got a nice win from vectorizing the lighting computations, using matrix operations to evaluate all three lights at the same time. The new code is harder to read, but a couple of instructions shorter.
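The idea behind the vectorized lighting can be sketched in Python. This is not HLSL, so it says nothing about instruction counts; it only shows that the two forms compute the same thing. Packing the three light directions into the rows of one 3x3 matrix turns three separate N·L dot products into a single matrix-vector product. Packing the three light colors into another matrix turns the weighted sum into one vector-matrix product.

The simple Lambert model, the function names, and the convention that light directions point from the light toward the surface are assumptions of this sketch.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def diffuse_per_light(normal, light_dirs, light_colors):
    # Straightforward version: a separate N.L and multiply-add for each light.
    total = [0.0, 0.0, 0.0]
    for direction, color in zip(light_dirs, light_colors):
        n_dot_l = max(0.0, -dot(normal, direction))
        total = [t + n_dot_l * c for t, c in zip(total, color)]
    return total


def diffuse_vectorized(normal, light_dirs, light_colors):
    # light_dirs is a 3x3 matrix with one direction per row, so one
    # matrix-vector product yields all three N.L terms.
    n_dot_l = [max(0.0, -dot(row, normal)) for row in light_dirs]
    # light_colors has one RGB color per row; the row vector n_dot_l times
    # that matrix sums the three weighted colors in one product.
    return [dot(n_dot_l, [color[ch] for color in light_colors]) for ch in range(3)]


normal = [0.0, 1.0, 0.0]
light_dirs = [[0.0, -1.0, 0.0], [0.6, -0.8, 0.0], [0.0, 1.0, 0.0]]
light_colors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(diffuse_per_light(normal, light_dirs, light_colors))   # [1.0, 0.8, 0.0]
print(diffuse_vectorized(normal, light_dirs, light_colors))  # [1.0, 0.8, 0.0]
```

On a GPU the packed form maps onto a few wide vector operations instead of a chain of per-light steps. That restructuring is where the savings come from, at the cost of readability.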

One place we slightly changed the final output is the fog equation. Previous versions used distance fog, which is computed from the distance between camera and vertex. We now use depth fog, which only considers how far in front of the camera each vertex is, ignoring any sideways offset. The visual difference is subtle, but depth fog is much cheaper to evaluate.
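A minimal Python comparison of the two fog models follows. It assumes view-space positions in a right-handed space where the camera sits at the origin looking down -Z. It also assumes a linear ramp between FogStart and FogEnd, with FogEnd greater than FogStart.

Depth fog depends only on -z, which is linear in the vertex position, so it can be folded into a precomputed vector and a dot product. Distance fog needs a square root of a sum of squares for every vertex.

```python
import math


def fog_ramp(d, fog_start, fog_end):
    # Linear fog: 0 at fog_start, 1 at fog_end, clamped to [0, 1].
    return min(1.0, max(0.0, (d - fog_start) / (fog_end - fog_start)))


def distance_fog(view_pos, fog_start, fog_end):
    # Euclidean distance from the camera (the view-space origin).
    return fog_ramp(math.sqrt(sum(c * c for c in view_pos)), fog_start, fog_end)


def depth_fog(view_pos, fog_start, fog_end):
    # Only how far in front of the camera the point is; sideways offset is ignored.
    return fog_ramp(-view_pos[2], fog_start, fog_end)


ahead = (0.0, 0.0, -10.0)      # straight in front of the camera
off_axis = (10.0, 0.0, -10.0)  # same depth, but off to the side
print(distance_fog(ahead, 5.0, 15.0), depth_fog(ahead, 5.0, 15.0))        # 0.5 0.5
print(distance_fog(off_axis, 5.0, 15.0), depth_fog(off_axis, 5.0, 15.0))  # about 0.914 vs 0.5
```

The two models agree along the view direction. Because distance is never less than depth, distance fog is slightly thicker toward the edges of the view.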

 

Results

To take one example, here are the instruction counts for BasicEffect using vertex color and texture, but no lighting or fog:

  • Game Studio 3.1: 30 vertex shader instructions, 6 pixel shader instructions
  • Game Studio 4.0: 6 vertex shader instructions, 3 pixel shader instructions

 

Preemptive question: "can we get the source for these optimized shaders?"

I would certainly love to release this, but we haven't worked out the details yet. Stay tuned!

  • >>> is it better to have a single BasicEffect instance and change its properties dynamically, or would we get better performance with multiple instances, each with a unique set of properties?

    <<<

    In my engine I find the latter to be a rather sizeable gain.

    But this might be partly due to having large preshaders or overly expensive parameter computations (I'm not sure whether the effect system is smart enough to skip preshader evaluation when dependent parameters are not being set).

    Perhaps time for dual quaternion support...

  • > Preshaders are not supported on Xbox or Windows Phone.

    Oh my, well that's something I didn't know about...

    I'm tempted to ask what else is not supported, because some things have to be done manually (e.g. unrolling loops, inlining, or using attributes).

    Does the compiler perform the same aggressive optimizations for Xbox-targeted files that it does for Windows ones? Compiling a shader on Windows with the no-optimizations flag sometimes doubles the instruction count compared to the optimized build. Even after all the matrix packing, vector4 tests, branching, dot products, etc., the optimizer goes several steps further and almost always hands you a slick, high-performance compiled shader that would not be humanly possible to produce by hand for every shader.

    I have noticed that, at least on Windows, shaders are compiled without optimizations in Debug mode and with optimizations in Release mode.

    On a side note: amazing work crunching those instruction counts down to dust!

    I really want to try that OnApply() method, since I didn't quite get it... if I have separate World, View, and Projection matrices, how would I make my shader see them as a single WVP matrix automatically, as with preshaders? (In my case, I opted to rewrite all my shaders to use single WVP and WV matrices, plus some other similar things, but I'd like to understand the logic behind it a little better.)

    Thanks Shawn!

    Congrats on your wedding.

  • > if I have separate World View Projection Matrices, how would I make my shader see that as a single WVP matrix automatically as in pre-shaders?

    You write your shader to expose a single WVP matrix, but then you write your C# wrapper class to expose separate W, V, and P properties. The setters for these properties just store the new value, and set a dirty flag, but don't directly call EffectParameter.SetValue (since there is no single EffectParameter corresponding to these properties). Then in your OnApply method, you check if this dirty flag is set, and if so, combine the three matrices to compute the combined WVP, then pass this derived value through to the shader using EffectParameter.SetValue.

    To see how we used this, you could check out the 4.0 CTP implementation of BasicEffect using .NET Reflector...
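The wrapper pattern described in the reply above can be sketched in Python rather than C#. FakeParameter is a hypothetical stand-in for an EffectParameter that only records SetValue calls. WvpEffectWrapper and on_apply are also hypothetical names; on_apply plays the role of the OnApply override.

The property setters only store the new value and set the dirty flag. The combined matrix is therefore computed at most once per apply, no matter how many times the individual matrices change in between.

```python
def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """4x4 product a * b; with row vectors this applies a first, then b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


class FakeParameter:
    """Hypothetical stand-in for an EffectParameter; records SetValue calls."""

    def __init__(self):
        self.value = None
        self.set_count = 0

    def set_value(self, value):
        self.value = value
        self.set_count += 1


class WvpEffectWrapper:
    """Exposes World, View, and Projection; the shader only sees one WVP."""

    def __init__(self):
        self._world = identity()
        self._view = identity()
        self._projection = identity()
        self._dirty = True
        self.wvp_param = FakeParameter()

    # Each setter stores the value and sets the dirty flag; no SetValue call.
    @property
    def world(self):
        return self._world

    @world.setter
    def world(self, m):
        self._world, self._dirty = m, True

    @property
    def view(self):
        return self._view

    @view.setter
    def view(self, m):
        self._view, self._dirty = m, True

    @property
    def projection(self):
        return self._projection

    @projection.setter
    def projection(self, m):
        self._projection, self._dirty = m, True

    def on_apply(self):
        # Plays the role of the OnApply override: called just before drawing,
        # it recomputes the derived WVP only when an input has changed.
        if self._dirty:
            wvp = mat_mul(mat_mul(self._world, self._view), self._projection)
            self.wvp_param.set_value(wvp)
            self._dirty = False


effect = WvpEffectWrapper()
effect.world = identity()
effect.view = identity()   # two property changes...
effect.on_apply()          # ...but only one WVP computation and SetValue
effect.on_apply()          # nothing changed, so no SetValue at all
print(effect.wvp_param.set_count)  # 1
```

The same structure extends to any other derived value. Each value the shader needs gets its own computation inside the dirty check, fed by whichever public properties it depends on.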
