Shawn Hargreaves Blog
I'm cross-posting this discussion from an internal Microsoft mailing list, because I'm so awesomely cool that I just can't bear the thought of everything I ever wrote not being indexed and archived for posterity :-)
My reply to a question about hardware instancing on Xbox 360:
The 360 doesn’t support vertex stream frequency the way DX9 SM 3.0 does. It just provides the vfetch instruction, which you can use to implement all kinds of crazy addressing schemes.
I’m familiar with several good ways to implement instancing using vfetch:
Chris Tector suggested a cunning fifth option:
There’s also a 2a: indirect your transform indices. Store a transform index vertex buffer that holds one DWORD per instance, giving the index of the transform that instance uses. Then you can avoid the lock stalls by playing dirty and never locking: write a modified transform to a location in the transform vertex buffer that isn’t in use, then rewrite the index to point to the newly written transform. You’re relying on the write of that single DWORD index being atomic. So:
Since I haven’t tried it in GS, my question is a more general “loose” multi-threading one. Is this possible? Can I play dirty like that in safe-only managed land? I’m guessing no, since I never get a pointer to the VB memory.
To which I replied:
That should work in GS. You don’t get a raw pointer to the VB memory, but you can use SetData with the NoOverwrite semantic to update pieces of a dynamic VB without a stall.
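A minimal Python model of the 2a scheme, to make the ordering of the two writes and the recycling of old slots explicit. Everything here is hypothetical: the class and method names, the `capacity` of the transform buffer, and especially `frames_in_flight`, the number of frames the GPU is assumed to lag behind the CPU. Python list assignments say nothing about real atomicity; the model only shows why an old slot must not be reused until the GPU can no longer be reading it.

```python
# Hypothetical CPU-side model of the "2a" indirection scheme. The GPU is
# assumed to read transforms[indices[instance]]; the CPU never overwrites
# a transform slot the GPU might still be reading.

class IndirectTransforms:
    def __init__(self, initial, capacity, frames_in_flight):
        # Slots 0..n-1 hold the starting transforms; the rest are free.
        self.transforms = list(initial) + [None] * (capacity - len(initial))
        self.indices = list(range(len(initial)))  # one "DWORD" per instance
        self.free = list(range(len(initial), capacity))
        self.retired = []  # (frame when retired, slot)
        self.frames_in_flight = frames_in_flight
        self.frame = 0

    def update(self, instance, transform):
        if not self.free:
            raise RuntimeError("no free transform slots; increase capacity")
        slot = self.free.pop()
        # Step 1: write the new transform into a slot nothing references
        # (the no-stall, NoOverwrite-style write described in the reply).
        self.transforms[slot] = transform
        # Step 2: repoint the instance's single index. On real hardware this
        # is the one-DWORD write the scheme relies on being atomic; here it
        # is just a list assignment.
        old = self.indices[instance]
        self.indices[instance] = slot
        self.retired.append((self.frame, old))

    def end_frame(self):
        self.frame += 1
        # Assumes the CPU never runs more than frames_in_flight frames ahead
        # of the GPU, so a slot retired in frame f is unused once the CPU
        # reaches frame f + frames_in_flight + 1.
        still_busy = []
        for retired_frame, slot in self.retired:
            if self.frame - retired_frame > self.frames_in_flight:
                self.free.append(slot)
            else:
                still_busy.append((retired_frame, slot))
        self.retired = still_busy

    def gpu_read(self, instance):
        return self.transforms[self.indices[instance]]


t = IndirectTransforms(["A0", "B0"], capacity=8, frames_in_flight=2)
t.update(0, "A1")
print(t.gpu_read(0), t.gpu_read(1))  # A1 B0
```

The extra slots are the price of never locking: with every instance updated every frame, the buffer needs room for roughly `frames_in_flight + 1` generations of transforms before old slots come back to the free list.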
I seem to remember NoOverwrite also isn't supported on the 360. Perhaps I am wrong.
How many frames of buffering would be needed on the 360 (with and without vsync)? I'd imagine it would be fewer than the 2-5 frames on the PC.
I'm currently implementing vfetch instancing myself, and I'm on the fence between #2 and a variation of your '2a' technique.
And I'll just say that when you're not using Effects, dealing with Xbox shaders is a royal pain.
:-)
Some more feedback:
I've found that using fmod(index,freq) is much more reliable than using index%freq.
@StatusUnkown:
When using index % freq, you can add 0.5 to your index first to avoid rounding errors in the modulus calculation. E.g.:
int vertexIndex = (index + 0.5) % vertexCount;
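The two tips above can be illustrated with a small Python sketch, where `math.fmod` stands in for the shader's `fmod` and ordinary floats stand in for shader registers. It assumes the common vfetch instancing layout in which one flat index covers every vertex of every instance, so the vertex is `index mod vertexCount` and the instance is `index / vertexCount` truncated. The function names and the injected `1e-5` error (modelling an index that arrives slightly below its intended integer) are hypothetical.

```python
import math

# Hypothetical helpers splitting a flat vfetch-style index into
#   vertex   = index mod vertex_count
#   instance = index / vertex_count (truncated)


def split_index_naive(index: float, vertex_count: int) -> tuple[int, int]:
    vertex = int(math.fmod(index, vertex_count))
    instance = int(index / vertex_count)
    return vertex, instance


def split_index_biased(index: float, vertex_count: int) -> tuple[int, int]:
    # Adding 0.5 first moves the value to the middle of its integer range,
    # so any error smaller than 0.5 no longer changes the truncated result.
    biased = index + 0.5
    vertex = int(math.fmod(biased, vertex_count))
    instance = int(biased / vertex_count)
    return vertex, instance


# An index meant to be 6 (vertex 0 of instance 2, with 3 vertices per
# instance) that arrives slightly low because of float imprecision.
noisy = 6.0 - 1e-5
print(split_index_naive(noisy, 3))   # (2, 1): wrong vertex and instance
print(split_index_biased(noisy, 3))  # (0, 2): correct
```

The bias only helps while the accumulated error stays below 0.5, which is comfortably true for exact small integers held in floats.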