Shawn Hargreaves Blog
The BasicEffect API and feature set did not change in Game Studio 4.0, but the implementation saw some aggressive optimizations.
In previous versions, BasicEffect was intended as a starting point for beginners. We expected expert programmers to soon move on to writing their own shaders, so as long as BasicEffect performed adequately on both Windows and Xbox, it wasn't worth spending time on further optimization.
Windows Phone changed this priority for two reasons:
I thought it would be interesting to describe the details of how we sped things up.
In spite of its name, BasicEffect is actually quite complex! If you look at the HLSL source code, you will see that the previous versions included 12 different vertex shaders to support all permutations of these options:
Plus 4 different pixel shaders:
BasicEffect has many other adjustable knobs, but we did not include specialized shaders for every possible combination. For instance we always evaluated the fog equation, and implemented the FogEnable property by setting parameter values to make the result come out zero if we did not want fog.
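To make the FogEnable trick concrete, here is a minimal CPU-side sketch of a shader that always evaluates a linear fog equation and is switched off purely through parameter values. The class name, the `fogEnabled` multiplier, and the exact form of the equation are assumptions for illustration, not the actual BasicEffect source.

```csharp
using System;

// Hypothetical CPU-side model of a shader that always evaluates linear fog.
// Names and the exact equation are illustrative assumptions.
public static class FogToggleSketch
{
    // Returns a fog factor in [0, 1]: 0 means no fog, 1 means fully fogged.
    // fogEnabled is 1 or 0 and simply scales the result, so turning fog off
    // needs no branch and no separate shader permutation.
    public static float FogFactor(float distance, float fogStart, float fogEnd, float fogEnabled)
    {
        float t = (distance - fogStart) / (fogEnd - fogStart);
        float clamped = Math.Min(Math.Max(t, 0f), 1f);
        return clamped * fogEnabled;
    }
}
```

The cost of this approach is that the fog instructions still run when fog is disabled, which is exactly the trade-off that specialized shader permutations avoid.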
There is a balance between providing many shader permutations (which minimizes GPU instruction counts), versus fewer shaders (which minimizes memory overhead and development/test cost). When we reevaluated this balance in the light of Windows Phone, we decided to add more specialized shaders. As of Game Studio 4.0, BasicEffect now has a total of 32 permutations. There are 20 vertex shaders:
Plus 10 pixel shaders:
A common tension in shader programming is that when you design effect parameters to provide a nice clean API, the resulting parameter formats are not always the most efficient for HLSL optimization.
D3D tries to correct any such mismatches through a feature called "preshaders". The HLSL compiler looks for computations that are the same for all vertices or all pixels, and moves these out of the main shader into a special setup pass which runs on the CPU before drawing begins. This is a great feature, but has a couple of fatal flaws:
Game Studio 4.0 adds the ability to implement preshader computations in C#, by overriding this new virtual method, which is called immediately before EffectPass.Apply sets parameter values onto the graphics device:
protected virtual void Effect.OnApply();
This allows BasicEffect to expose whatever properties the API requires, without needing these to match the underlying HLSL shader parameters. When the programmer changes a managed property, we just set a dirty flag, then recompute derived HLSL parameter values during OnApply. We used this new ability to precompute many things:
We also applied some good ol' algebraic optimizations, using math to find cheaper ways of getting the results we wanted.
We got a nice win from vectorizing the lighting computations, using matrix operations to evaluate all three lights at the same time. The new code is harder to read, but a couple of instructions shorter.
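As a rough illustration of the idea, the sketch below models the diffuse term for three directional lights on the CPU, first one light at a time and then with two matrix products. It uses System.Numerics rather than HLSL, and the class, method, and parameter names are hypothetical. It also assumes the light vectors point toward the light, and it is not the shipped shader code.

```csharp
using System;
using System.Numerics;

// Illustrative only: a CPU-side model of evaluating three directional lights at once.
public static class VectorizedLightingSketch
{
    // Reference version: one dot product and one multiply-add per light.
    public static Vector3 DiffusePerLight(Vector3 normal, Vector3[] toLight, Vector3[] colors)
    {
        Vector3 sum = Vector3.Zero;
        for (int i = 0; i < 3; i++)
        {
            float nDotL = Math.Max(Vector3.Dot(normal, toLight[i]), 0f);
            sum += colors[i] * nDotL;
        }
        return sum;
    }

    // Vectorized version: two matrix products replace the per-light loop.
    public static Vector3 DiffuseVectorized(Vector3 normal, Vector3[] toLight, Vector3[] colors)
    {
        // Light directions go in the columns, so normal * matrix yields all three N.L terms.
        var directions = new Matrix4x4(
            toLight[0].X, toLight[1].X, toLight[2].X, 0f,
            toLight[0].Y, toLight[1].Y, toLight[2].Y, 0f,
            toLight[0].Z, toLight[1].Z, toLight[2].Z, 0f,
            0f, 0f, 0f, 1f);
        Vector3 nDotL = Vector3.Max(Vector3.TransformNormal(normal, directions), Vector3.Zero);

        // Light colors go in the rows, so nDotL * matrix sums the weighted colors.
        var lightColors = new Matrix4x4(
            colors[0].X, colors[0].Y, colors[0].Z, 0f,
            colors[1].X, colors[1].Y, colors[1].Z, 0f,
            colors[2].X, colors[2].Y, colors[2].Z, 0f,
            0f, 0f, 0f, 1f);
        return Vector3.TransformNormal(nDotL, lightColors);
    }
}
```

Both methods should agree up to floating-point rounding. The second form is the one that maps well onto GPU vector instructions, at the cost of being less obvious to read.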
One place we slightly changed the final output is the fog equation. Previous versions used distance fog, which is computed from the distance between camera and vertex. We now use depth fog, which only considers how far in front of the camera each vertex is, ignoring any sideways offset. The visual difference is subtle, but depth fog is much cheaper to evaluate.
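The difference between the two fog models is easy to see in a small CPU-side sketch. The code below assumes a view-space position with the camera at the origin looking down negative Z, plus a simple linear fog ramp. All names are hypothetical, and folding the depth fog into a single precomputed vector is just one possible way to exploit its simplicity, not a statement of how BasicEffect does it.

```csharp
using System;
using System.Numerics;

// Illustrative comparison of distance fog and depth fog in view space.
// Assumption: the camera sits at the origin and looks down the negative Z axis.
public static class FogModelSketch
{
    static float Saturate(float x)
    {
        return Math.Min(Math.Max(x, 0f), 1f);
    }

    // Distance fog: uses the full camera-to-vertex distance, which needs a square root.
    public static float DistanceFog(Vector3 viewPos, float start, float end)
    {
        return Saturate((viewPos.Length() - start) / (end - start));
    }

    // Depth fog: uses only how far in front of the camera the vertex is.
    public static float DepthFog(Vector3 viewPos, float start, float end)
    {
        return Saturate((-viewPos.Z - start) / (end - start));
    }

    // Depth fog is linear in position, so start and end can be folded into one vector on the CPU...
    public static Vector4 MakeFogVector(float start, float end)
    {
        float scale = 1f / (end - start);
        return new Vector4(0f, 0f, -scale, -start * scale);
    }

    // ...leaving a single dot product plus a clamp per vertex.
    public static float DepthFogFromVector(Vector3 viewPos, Vector4 fogVector)
    {
        return Saturate(Vector4.Dot(new Vector4(viewPos, 1f), fogVector));
    }
}
```

For a vertex at (30, 0, -40) with fog running from 0 to 100, the distance model sees 50 units while the depth model sees only 40, so objects toward the edges of the screen pick up slightly less fog under the depth model.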
To take one example, here are the instruction counts for BasicEffect using vertex color and texture, but no lighting or fog:
Game Studio 3.1
Game Studio 4.0
Preemptive question: "can we get the source for these optimized shaders?"
I would certainly love to release this, but we haven't worked out the details yet. Stay tuned!
>>> is it better to have a single BasicEffect instance and change its properties dynamically or would we have better performance with multiple instances, each with a unique set of properties?
In my engine I find the latter to be a rather sizeable gain.
But this might be to some extent due to having large pre-shaders or overly expensive parameter computations (not sure if the effects system is smart enough to avoid pre-shader evaluation when not setting dependent parameters though?).
Perhaps time for dual quaternion support...
> Preshaders are not supported on Xbox or Windows Phone.
Oh my, well that's something I didn't know about...
I'm tempted to ask what else is not supported, because some things have to be done manually (e.g. unrolling loops, inlining, or using attributes).
Will the compiler for Xbox-targeted files perform the same overly aggressive optimizations that it does for the Windows ones? Compiling a shader on Windows without the optimization flag can sometimes double the instruction count compared to the optimized build. Even after all the matrix packing, vector4 testing, branching, dot products, etc., the optimizer goes several steps further and almost always hands you a slick, high-performance compiled file that is "not humanly possible to do for every shader".
I have noticed that, at least on Windows, shaders are compiled without optimizations in Debug mode and with optimizations in Release mode.
On a side note: amazing work crunching those instruction numbers down to dust!
Really want to test that OnApply() method, since I really didn't get it... if I have separate World View Projection Matrices, how would I make my shader see that as a single WVP matrix automatically as in pre-shaders? (In my case, I opted to rewrite all shaders with a single WVP and WV matrices, plus some other similar things, but just to understand the logic behind it a little bit)
Congrats on your wedding.
> if I have separate World View Projection Matrices, how would I make my shader see that as a single WVP matrix automatically as in pre-shaders?
You write your shader to expose a single WVP matrix, but then you write your C# wrapper class to expose separate W, V, and P properties. The setters for these properties just store the new value, and set a dirty flag, but don't directly call EffectParameter.SetValue (since there is no single EffectParameter corresponding to these properties). Then in your OnApply method, you check if this dirty flag is set, and if so, combine the three matrices to compute the combined WVP, then pass this derived value through to the shader using EffectParameter.SetValue.
To see how we used this, you could check out the 4.0 CTP implementation of BasicEffect using .NET Reflector...
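For readers who want to see the shape of the dirty-flag pattern described in this reply, here is a self-contained sketch. It does not derive from the real XNA Effect class: `WvpEffectSketch`, `WorldViewProjParameter`, and `RecomputeCount` are hypothetical stand-ins, and System.Numerics.Matrix4x4 takes the place of the XNA Matrix type. In a real wrapper, the body of `OnApply` would live in the overridden method and finish with a call to EffectParameter.SetValue.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for an effect wrapper; it does not use the real XNA types.
public class WvpEffectSketch
{
    Matrix4x4 world = Matrix4x4.Identity;
    Matrix4x4 view = Matrix4x4.Identity;
    Matrix4x4 projection = Matrix4x4.Identity;
    bool wvpDirty = true;

    // Stand-in for the value a real wrapper would pass to EffectParameter.SetValue.
    public Matrix4x4 WorldViewProjParameter { get; private set; }

    // Counts recomputations so the effect of the dirty flag is observable.
    public int RecomputeCount { get; private set; }

    // The setters only store the value and mark the derived matrix as stale.
    public Matrix4x4 World
    {
        get { return world; }
        set { world = value; wvpDirty = true; }
    }

    public Matrix4x4 View
    {
        get { return view; }
        set { view = value; wvpDirty = true; }
    }

    public Matrix4x4 Projection
    {
        get { return projection; }
        set { projection = value; wvpDirty = true; }
    }

    // Stand-in for the OnApply override: recompute the combined matrix only when needed.
    public void OnApply()
    {
        if (!wvpDirty)
            return;

        WorldViewProjParameter = world * view * projection;
        RecomputeCount++;
        wvpDirty = false;
    }
}
```

Setting World, View, and Projection in any order and any number of times between draws costs only the stores. The matrix multiplies happen at most once per apply, and not at all if nothing changed.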