Shawn Hargreaves Blog
The BasicEffect API and feature set did not change in Game Studio 4.0, but the implementation saw some aggressive optimizations.
In previous versions, BasicEffect was intended as a starting point for beginners. We expected expert programmers to soon move on to writing their own shaders, so as long as BasicEffect performed adequately on both Windows and Xbox, it wasn't worth spending time on further optimization.
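Windows Phone changed this priority for two reasons: mobile GPUs are much less powerful than Xbox, so BasicEffect performance matters more, and Windows Phone does not support custom shaders, so developers cannot write their own when BasicEffect is not fast enough.
I thought it would be interesting to describe the details of how we sped things up.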
In spite of its name, BasicEffect is actually quite complex! If you look at the HLSL source code, you will see that the previous versions included 12 different vertex shaders to support all permutations of these options:
Plus 4 different pixel shaders:
BasicEffect has many other adjustable knobs, but we did not include specialized shaders for every possible combination. For instance, we always evaluated the fog equation, and implemented the FogEnable property by setting parameter values that make the result come out zero if we did not want fog.
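The approach of always evaluating fog and switching it off through parameter values can be sketched in a few lines of Python. This is a minimal illustration, assuming a linear fog amount of the form clamp(d * scale + offset, 0, 1), where d is some per-vertex fog distance; `fog_params` and `fog_factor` are hypothetical names, not BasicEffect's actual parameters.

```python
def fog_params(fog_enabled, fog_start, fog_end):
    """Hypothetical helper: return (scale, offset) so that
    d * scale + offset is 0 at fog_start and 1 at fog_end
    (assumes fog_end != fog_start). When fog is disabled both are
    zero, so the shader math still runs but always yields 0."""
    if not fog_enabled:
        return 0.0, 0.0
    scale = 1.0 / (fog_end - fog_start)
    return scale, -fog_start * scale


def fog_factor(d, scale, offset):
    """Per-vertex work: one multiply-add followed by a clamp to [0, 1]."""
    return min(max(d * scale + offset, 0.0), 1.0)


scale, offset = fog_params(True, 10.0, 50.0)
print(fog_factor(30.0, scale, offset))  # halfway between start and end

scale, offset = fog_params(False, 10.0, 50.0)
print(fog_factor(30.0, scale, offset))  # fog disabled: no fog at any distance
```

The shader never branches on whether fog is enabled; the cost is a few wasted instructions per vertex whenever fog is off.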
There is a balance between providing many shader permutations (which minimizes GPU instruction counts) versus fewer shaders (which minimizes memory overhead and development/test cost). When we reevaluated this balance in light of Windows Phone, we decided to add more specialized shaders. As of Game Studio 4.0, BasicEffect now has a total of 32 permutations. There are 20 vertex shaders:
Plus 10 pixel shaders:
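One common way to organize a fixed set of permutations is to treat each option as a digit in an index and pick the matching precompiled shader at draw time. The Python sketch below is purely hypothetical: the option names (fog, vertex color, texture, and a four-way lighting mode) are assumptions chosen because 2 × 2 × 2 × 4 = 32, not a statement of BasicEffect's actual permutation list.

```python
from itertools import product

# Hypothetical lighting modes; the real option set may differ.
LIGHTING_MODES = ("none", "vertex_3_lights", "vertex_1_light", "per_pixel")


def permutation_index(fog, vertex_color, texture, lighting):
    """Hypothetical mapping from an option combination to [0, 32).

    Each boolean contributes one bit; the lighting mode occupies
    bits 3 and 4, giving 2 * 2 * 2 * 4 = 32 distinct indices."""
    index = 0
    if fog:
        index += 1
    if vertex_color:
        index += 2
    if texture:
        index += 4
    index += 8 * LIGHTING_MODES.index(lighting)
    return index


# Every combination maps to a unique slot, so the selected permutation
# can be looked up directly in a 32-entry table.
indices = {
    permutation_index(f, c, t, l)
    for f, c, t, l in product((False, True), (False, True),
                              (False, True), LIGHTING_MODES)
}
print(len(indices), min(indices), max(indices))
```

Because several permutations can share the same vertex or pixel shader, a table of 32 entries does not require 32 distinct shaders of each kind.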
A common tension in shader programming is that when you design effect parameters to provide a nice clean API, the resulting parameter formats are not always the most efficient for HLSL optimization.
D3D tries to correct any such mismatches through a feature called "preshaders". The HLSL compiler looks for computations that are the same for all vertices or all pixels, and moves these out of the main shader into a special setup pass which runs on the CPU before drawing begins. This is a great feature, but it has a couple of fatal flaws:
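The idea behind a preshader can be shown with a toy example: any subexpression that depends only on effect parameters, never on per-vertex data, gives the same result for every vertex, so it can be computed once per draw instead. In this hypothetical Python sketch, the product of a material's diffuse color and a light's color is such a subexpression; the function names are illustrative and do not correspond to any D3D API.

```python
# Hypothetical helpers illustrating preshader-style hoisting.

def mul3(a, b):
    """Component-wise product of two RGB triples."""
    return tuple(x * y for x, y in zip(a, b))


def shade_naive(ndotl_per_vertex, diffuse, light_color):
    # diffuse * light_color is recomputed for every vertex, even though
    # it never changes during the draw call.
    return [tuple(c * max(0.0, n) for c in mul3(diffuse, light_color))
            for n in ndotl_per_vertex]


def shade_hoisted(ndotl_per_vertex, diffuse, light_color):
    # "Preshader" step: evaluate the parameter-only part once, up front.
    combined = mul3(diffuse, light_color)
    # Per-vertex work now only scales the precomputed color.
    return [tuple(c * max(0.0, n) for c in combined)
            for n in ndotl_per_vertex]


ndotl = [1.0, 0.5, -0.2]
diffuse = (0.8, 0.6, 0.4)
light = (1.0, 0.9, 0.5)
print(shade_naive(ndotl, diffuse, light) == shade_hoisted(ndotl, diffuse, light))
```

Both versions produce identical colors; the hoisted one simply does three fewer multiplies per vertex.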
Game Studio 4.0 adds the ability to implement preshader computations in C#, by overriding this new method, which is called immediately before EffectPass.Apply sets parameter values onto the graphics device:
protected virtual void Effect.OnApply();
This allows BasicEffect to expose whatever properties the API requires, without needing these to match the underlying HLSL shader parameters. When the programmer changes a managed property, we just set a dirty flag, then recompute derived HLSL parameter values during OnApply. We used this new ability to precompute many things:
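The dirty-flag scheme can be modeled in a few lines. This is a hypothetical Python sketch, not XNA code: `SketchEffect`, its setters, and `on_apply` only mimic the shape of the C# pattern described above, and matrices are plain nested lists using a row-vector convention. Setting World, View, or Projection just marks the combined matrix as stale; `on_apply` rebuilds it at most once per draw, however many of its inputs changed.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


class SketchEffect:
    """Hypothetical stand-in for an effect with a dirty-flag cache."""

    def __init__(self):
        self._world = self._view = self._projection = IDENTITY
        self._wvp_dirty = True
        self.wvp = None            # value a shader parameter would receive
        self.recompute_count = 0   # for illustration only

    # Managed properties: setters only store the value and set a flag.
    def set_world(self, m):
        self._world, self._wvp_dirty = m, True

    def set_view(self, m):
        self._view, self._wvp_dirty = m, True

    def set_projection(self, m):
        self._projection, self._wvp_dirty = m, True

    def on_apply(self):
        """Called just before drawing: derive shader parameters lazily."""
        if self._wvp_dirty:
            # Collapse World * View * Projection into one matrix so the
            # vertex shader needs only a single matrix transform.
            self.wvp = matmul(matmul(self._world, self._view), self._projection)
            self._wvp_dirty = False
            self.recompute_count += 1


effect = SketchEffect()
effect.set_world(IDENTITY)
effect.set_view(IDENTITY)
effect.set_projection(IDENTITY)
effect.on_apply()   # recomputes once for three property changes
effect.on_apply()   # nothing changed, so no work is done
print(effect.recompute_count)
```

The public properties stay in the shape that is convenient for the API, while the shader only ever sees the derived, cheaper-to-use value.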
We also applied some good 'ole algebraic optimizations, using math to find cheaper ways of getting the results we wanted.
We got a nice win from vectorizing the lighting computations, using matrix operations to evaluate all three lights at the same time. The new code is harder to read, but a couple of instructions shorter.
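The vectorized lighting idea can be sketched as follows. Instead of three separate dot products, the three light directions are packed as the rows of a 3x3 matrix, so one matrix-vector multiply yields all three N·L terms at once, which a GPU can evaluate as a short sequence of vector instructions. This is a hypothetical Python illustration of the math only; the names, and the assumption that light directions point from the light toward the scene (hence the negation), are not taken from the BasicEffect source.

```python
# Hypothetical illustration. Assumed convention: light directions point
# from the light toward the scene, so N.L uses the negated direction.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ndotl_separate(normal, light_dirs):
    """Scalar version: one dot product per light, written out by hand."""
    return [max(0.0, -dot(normal, d)) for d in light_dirs]


def ndotl_vectorized(normal, light_matrix):
    """Matrix version: rows of light_matrix are the negated light
    directions, so M @ normal gives all three N.L values in one
    operation, and the clamp is applied to the whole vector."""
    raw = [dot(row, normal) for row in light_matrix]   # M @ normal
    return [max(0.0, v) for v in raw]


lights = [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, -1.0)]
light_matrix = [tuple(-c for c in d) for d in lights]  # precomputed once per draw

normal = (0.0, 1.0, 0.0)
print(ndotl_separate(normal, lights), ndotl_vectorized(normal, light_matrix))
```

Negating the directions once on the CPU, rather than per vertex, is itself a small example of the precomputation trick described earlier.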
One place we slightly changed the final output is the fog equation. Previous versions used distance fog, which is computed from the distance between camera and vertex. We now use depth fog, which only considers how far in front of the camera each vertex is, ignoring any sideways offset. The visual difference is subtle, but depth fog is much cheaper to evaluate.
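The cost difference follows from the math: view-space depth is a linear function of the vertex position, so the whole fog equation can be folded into a single 4-component dot product with a vector precomputed on the CPU, whereas distance fog needs a vector length (a square root) per vertex. The Python sketch below assumes a right-handed view space where the camera looks down -Z, and a row-vector world-view matrix as used in XNA; the function names are hypothetical.

```python
import math

# Hypothetical helpers; assumes right-handed view space (camera looks
# down -Z) and row vectors, so view_z = x*M[0][2] + y*M[1][2] + z*M[2][2] + M[3][2].


def depth_fog_vector(world_view, fog_start, fog_end):
    """Precompute (once per draw) a 4-vector F such that
    dot((x, y, z, 1), F) == (depth - fog_start) / (fog_end - fog_start),
    where depth = -view_z."""
    scale = 1.0 / (fog_start - fog_end)
    return (world_view[0][2] * scale,
            world_view[1][2] * scale,
            world_view[2][2] * scale,
            (world_view[3][2] + fog_start) * scale)


def saturate(v):
    return min(max(v, 0.0), 1.0)


def depth_fog(position, fog_vector):
    """Per-vertex cost: one dot product plus a clamp."""
    x, y, z = position
    return saturate(x * fog_vector[0] + y * fog_vector[1]
                    + z * fog_vector[2] + fog_vector[3])


def distance_fog(view_position, fog_start, fog_end):
    """Per-vertex cost includes a square root for the length."""
    dist = math.sqrt(sum(c * c for c in view_position))
    return saturate((dist - fog_start) / (fog_end - fog_start))


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
fog = depth_fog_vector(identity, 10.0, 50.0)

ahead = (0.0, 0.0, -30.0)     # straight in front of the camera
sideways = (30.0, 0.0, -30.0)  # same depth, shifted sideways
print(depth_fog(ahead, fog), distance_fog(ahead, 10.0, 50.0))
print(depth_fog(sideways, fog), distance_fog(sideways, 10.0, 50.0))
```

For a point straight ahead the two modes agree; for the sideways point, depth fog applies less fog than distance fog. This is the trade-off raised in the comments: with depth fog, the fog amount on an object can change as the camera rotates even when its distance does not.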
To take one example, here are the instruction counts for BasicEffect using vertex color and texture, but no lighting or fog:
Game Studio 3.1
Game Studio 4.0
Preemptive question: "can we get the source for these optimized shaders?"
I would certainly love to release this, but we haven't worked out the details yet. Stay tuned!
OnApply sounds very interesting! I would love to hear more about how this works.
Is it only for effect parameters, or could this be used to build an inheritance-based effect system?
Damn, 30 to 6. You guys are raising the bar to production level.
>>>Damn, 30 to 6. You guys are raising the bar to production level.<<<
Just how slow is the phone GPU compared to the Xbox GPU (a good benchmark for performance in general)? E.g. 5x, 10x, 100x?
On Windows/360, a shader of mostly ALU instructions doesn't even register on the GPU... In a realistic situation (i.e. not 1 million cat particles :-) the only things I have found that even tax the GPU shader-wise are post-processing effects.
However, the effects changes look very interesting: most of my draw loop is spent calling into the effect object, beginning the effect, committing changes, setting the technique, etc.
How does the change in effect code format improve things (I noticed this mentioned before)? Faster validation in those calls, perhaps?
>Preemptive question: "can we get the source for these optimized shaders?"
>Collapse the World, View, and Projection matrices into a single WorldViewProj matrix.
Thaaaank you. This annoyed me when I was looking at it in Reflector.
Also, thank you for OnApply. I did something similar on my own effects.
>We expected expert programmers to soon move on to writing their own shaders
Out of curiosity, how close was this expectation to reality? Do you know?
As usual, thanks for the update. It sounds like the optimizations were well thought out, and should make for a nice improvement in performance.
The one area I am concerned about is the switch from distance fog to depth-based fog. It seems to me that if fog is now in its own specialized set of shaders anyways, you might as well leave it as distance based. Otherwise, in exchange for a bit of performance, people will experience objects popping in and out near the FogStart depth as they rotate their cameras.
>>We expected expert programmers to soon move on to writing their own shaders
>Out of curiosity, how close was this expectation to reality? Do you know?
I don't have the numbers, but I'd guess that this expectation didn't really hold. I'm pretty comfortable with writing shaders, but when BasicEffect just fits the bill, why bother?
What prevents the inclusion of custom shaders on Windows Phone? Are you using mobile GPU compilers that aren't authorised for public release?
Unrelated question: do the new XNA4 render states mean that setting render state from C# is effectively as fast as doing it in the HLSL technique definition? Or is the latter still faster due to fewer driver calls?
Very nice, but it would still be nice to have custom effects or something similar for the phone to do post processing.
Thank you for this information. Very nice work, I'm having a great time working with the Windows Phone 7 developer tools.
Since I'm still new to XNA, I have a question: is it better to have a single BasicEffect instance and change its properties dynamically, or would we get better performance with multiple instances, each with a unique set of properties?
Just as an example, I could have one BasicEffect with lighting on, texture off, and fog off, and another with lighting off, texture on, and fog on. Before rendering an object, I would only need to set the World, View, and Projection matrices on each of these effects. Is this a better approach?
> Just how slow is the phone GPU compared to the xbox gpu
Mobile GPUs are an order of magnitude slower than game consoles. I'm hesitant to quote exact numbers because performance is too multi-dimensional a problem to be meaningfully boiled down to just a single ratio (the only way to get really accurate data is to measure your specific code once hardware is available) but you definitely shouldn't be expecting anywhere near Xbox console level perf. Also, the CPU/GPU balance is significantly different from Xbox (the ARM CPU in Windows Phone is really rather fast).
> > We expected expert programmers to soon move on to writing their own shaders
> Out of curiosity, how close was this expectation to reality? Do you know?
It depends on which specific expert programmer you look at: many write all their own shaders, while others get great mileage and never see the need to move beyond the built-in stuff. I've been surprised how many awesome games made by awesomely skilled teams (especially on the XBLA side) achieve amazing graphical quality just from having great game design and great artists, and do all their drawing with SpriteBatch!
However, even when we saw people making polished games with BasicEffect, I never saw a game that was bottlenecked by BasicEffect GPU performance on Windows or Xbox, so this was never a sensible place to spend optimization effort (until Windows Phone came along, anyway).
> The one area I am concerned about is the switch from distance fog to depth-based fog. It seems to me that if fog is now in its own specialized set of shaders anyways, you might as well leave it as distance based. Otherwise, in exchange for a bit of performance, people will experience objects popping in and out near the FogStart depth as they rotate their cameras.
There isn't a really clear right answer here. From our tests, the two fog modes are visually indistinguishable for the vast majority of games, although of course there are a handful where this can make a difference. The performance difference is pretty significant, though, which matters a lot for Windows Phone. In the end we came down on the side of the more efficient version, reasoning that Windows or Xbox developers who needed different visual results can always write their own shader to implement that, while Windows Phone developers who needed more perf would have no other options, so the Windows Phone requirement should take priority here (it didn't seem like an important enough issue to warrant supporting both options).
> What prevents the inclusion of custom shaders on Windows phone?
> do the new XNA4 render states mean that setting render state from C# is effectively as fast as doing it in the HLSL technique definition?
C# vs. FX state setting performs roughly the same in all Game Studio versions. Even prior to state objects, the cost of changing states is almost entirely in the driver, so it makes little difference whether you trigger this work from jitted C# code vs. interpreted FX data tables.
> is it better to have a single BasicEffect instance and change its properties dynamically or would we have better performance with multiple instances, each with unique set of properties?
Either way is fine. I usually tend toward the latter, but it's unlikely to make a huge performance difference either way.