Shawn Hargreaves Blog
Normally, the CPU and GPU run in parallel, so frame time = max(CPU time, GPU time).
If your code causes a pipeline stall, however, the processors must take turns, each sitting idle while the other runs. Yikes! Now frame time = CPU time + GPU time. In other words, programs that stall can be both CPU and GPU bound at the same time.
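To put numbers on those two formulas, here is a toy Python model. It is not how real drivers measure or schedule anything. It assumes every frame costs a fixed `cpu_ms` of CPU work and `gpu_ms` of GPU work, that the GPU starts a frame only once the CPU has submitted all of it, and that a stall means the CPU waits for the GPU once per frame (as a per-frame GetData would cause). All function names are made up for illustration.

```python
# Toy model (an assumption, not measured GPU behavior): every frame the CPU spends
# cpu_ms building commands and the GPU spends gpu_ms executing them.

def pipelined_frame_time(cpu_ms: float, gpu_ms: float) -> float:
    """Steady-state frame time when the CPU and GPU overlap."""
    return max(cpu_ms, gpu_ms)


def stalled_frame_time(cpu_ms: float, gpu_ms: float) -> float:
    """Frame time when the CPU waits for the GPU once per frame."""
    return cpu_ms + gpu_ms


def simulate(frames: int, cpu_ms: float, gpu_ms: float, stall: bool) -> float:
    """Return the total time to render `frames` frames.

    Assumes an unbounded command queue; real drivers limit how far the CPU may
    run ahead, which does not change the steady-state rate modeled here.
    """
    cpu_clock = 0.0  # when the CPU has finished submitting the current frame
    gpu_clock = 0.0  # when the GPU has finished executing the current frame
    for _ in range(frames):
        cpu_clock += cpu_ms                # CPU builds this frame's commands
        start = max(cpu_clock, gpu_clock)  # GPU needs the commands and must be free
        gpu_clock = start + gpu_ms
        if stall:
            cpu_clock = gpu_clock          # e.g. GetData: CPU blocks until the GPU is done
    return max(cpu_clock, gpu_clock)


print(simulate(100, 10, 6, stall=False))  # 1006.0: about 100 * max(10, 6)
print(simulate(100, 10, 6, stall=True))   # 1600.0: exactly 100 * (10 + 6)
```

With 10 ms of CPU work and 6 ms of GPU work, the pipelined run is limited only by the CPU, while the stalled run pays for both processors every frame.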
The easiest way to cause a stall is to draw some graphics into a rendertarget, then call GetData on the rendertarget texture to read the results back to the CPU. Think about what happens if you do this:
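A rough sketch of that sequence, as a toy Python timeline rather than real XNA code: the CPU queues drawing into a rendertarget, reads it back, does some more work, then queues more drawing. The `ToyPipeline` class and its millisecond costs are hypothetical, chosen only to show who waits for whom.

```python
class ToyPipeline:
    """Toy CPU/GPU timeline (an assumption, not a real driver). Times in ms."""

    def __init__(self) -> None:
        self.cpu_clock = 0.0  # current CPU time
        self.gpu_clock = 0.0  # when the GPU finishes everything queued so far
        self.cpu_idle = 0.0   # time the CPU spent blocked waiting for the GPU
        self.gpu_idle = 0.0   # time the GPU spent with no commands to run

    def cpu_work(self, ms: float) -> None:
        """Ordinary CPU work: game logic, building drawing commands, etc."""
        self.cpu_clock += ms

    def submit_draw(self, gpu_ms: float) -> None:
        """Queue a draw; the GPU starts it once it is submitted and the GPU is free."""
        start = max(self.cpu_clock, self.gpu_clock)
        self.gpu_idle += start - self.gpu_clock
        self.gpu_clock = start + gpu_ms

    def get_data(self) -> None:
        """Read results back to the CPU, which cannot continue until the GPU is done."""
        if self.gpu_clock > self.cpu_clock:
            self.cpu_idle += self.gpu_clock - self.cpu_clock
            self.cpu_clock = self.gpu_clock


def frame(read_back: bool) -> ToyPipeline:
    p = ToyPipeline()
    p.cpu_work(2)      # build the commands that draw into the rendertarget
    p.submit_draw(5)   # GPU draws into the rendertarget
    if read_back:
        p.get_data()   # GetData on the rendertarget texture: CPU waits
    p.cpu_work(3)      # more CPU work (using the data, if it was read back)
    p.submit_draw(4)   # next batch of drawing
    return p


stalled = frame(read_back=True)
print(stalled.cpu_idle, stalled.gpu_idle, stalled.gpu_clock)  # 5.0 5.0 14.0
smooth = frame(read_back=False)
print(smooth.cpu_idle, smooth.gpu_idle, smooth.gpu_clock)     # 0.0 2.0 11.0
```

In both runs the GPU spends 2 ms waiting for its very first command. With the readback, the CPU then sits blocked for 5 ms, and because it queued nothing during that time, the GPU sits idle for another 3 ms while the CPU catches up.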
One of the great successes of the Direct3D API is how it hides the asynchronous nature of GPU hardware. Many graphics programmers are writing parallel code without even realizing it! But as soon as you try to read data back from GPU to CPU, all this parallelism is lost (one reason it is hard to accelerate things like physics or AI on the GPU).
A similar problem occurs with occlusion queries. To avoid a stall, the query returns immediately, but with its IsComplete property set to false. The query completes whenever George (the GPU) gets around to processing the relevant drawing instructions, so games must cope with this data not being available straight away. For instance, our Lens Flare sample falls back on occlusion data from the previous frame if the latest information is not yet available.
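The fallback pattern can be sketched in Python with a fake query object. This is not the XNA OcclusionQuery class or the Lens Flare sample's code: `ToyOcclusionQuery`, its `latency` parameter and `FlareVisibility` are hypothetical, and `poll()` merely plays the role of checking IsComplete without blocking.

```python
class ToyOcclusionQuery:
    """Hypothetical stand-in for a GPU occlusion query. The answer becomes
    available only after `latency` non-blocking polls, mimicking the GPU
    processing the drawing instructions some time later."""

    def __init__(self, latency: int) -> None:
        self.latency = latency
        self.pixel_count = 0  # valid only once poll() has returned True
        self._pending = 0
        self._polls_left = 0
        self._active = False

    def end(self, visible_pixels: int) -> None:
        # A real GPU counts the pixels later; the toy just stores the answer.
        self._pending = visible_pixels
        self._polls_left = self.latency
        self._active = True

    def poll(self) -> bool:
        """Non-blocking completion check, in the spirit of IsComplete."""
        if self._active and self._polls_left > 0:
            self._polls_left -= 1
            return False
        if self._active:
            self.pixel_count = self._pending
            self._active = False
        return True


class FlareVisibility:
    """Reuses the last known occlusion result instead of waiting (no stall)."""

    def __init__(self, query: ToyOcclusionQuery, total_pixels: int) -> None:
        self.query = query
        self.total_pixels = total_pixels
        self.visibility = 0.0   # last known visible fraction of the light source
        self.in_flight = False  # is a query still waiting for the GPU?

    def update(self, visible_pixels_this_frame: int) -> float:
        if self.in_flight:
            if not self.query.poll():
                return self.visibility  # not ready: fall back on older data
            self.visibility = self.query.pixel_count / self.total_pixels
            self.in_flight = False
        self.query.end(visible_pixels_this_frame)  # start a new query
        self.in_flight = True
        return self.visibility


flare = FlareVisibility(ToyOcclusionQuery(latency=1), total_pixels=100)
print([flare.update(p) for p in (80, 60, 40, 20, 10)])  # [0.0, 0.0, 0.8, 0.8, 0.4]
```

This sketch keeps at most one query in flight, so frames that arrive while a query is pending are simply not measured; the flare keeps using the most recent finished result, which may be one or more frames old.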
There is one situation where you can cause pipeline stalls purely by writing data to the GPU, rather than reading back from it. Can anyone figure out what that is?
Could you cause a stall by modifying a texture that is due to be used by a draw call in the buffer that has yet to be executed?
A shot in the semi-dark:
Would it be setting the device.Texture = ... in the draw call just before drawing the mesh that needs the texture?
a) trying to write a huge polygon?
b) Or a huge mesh with a great deal of small polygons?
c) lots of state changes?
d) Pushing the GPU limit using large texture sizes?
e) locking a source of data like the vertex buffer when the CPU and GPU need to sync? Or if we forget to unlock it?
What about calling Present? Or am I thinking too simply now?
This stall doesn't happen with RenderTarget2D.GetTexture(), does it? I thought that call made a copy of the RT2D texture to a new texture in video memory, so the return value is just a reference to a new chunk of VRAM. Should be nice and fast, since the op is not working within a SpriteBatch.Begin/End block and is pulling the data from an RT that is not the current RT. Right?
Zyxil: don't worry, that won't stall.
RenderTarget2D.GetTexture just returns a reference to a texture object, so it doesn't care whether the GPU has actually finished drawing data into that area of video memory yet. It can give you back an object straight away, while the driver stores some internal state to remember that this texture is not actually filled in yet.
What happens next depends on what you do with that texture.
If you draw it (using a SpriteBatch, or a custom shader, or whatever), there is no problem. The driver can send another command to the GPU saying "ok, now draw using this texture". It doesn't matter that the texture is not filled in yet, because it will be by the time the GPU gets around to processing this drawing call which uses it.
The problem is if you call GetData on the texture. At that point, the contents of the texture are read back to the CPU, which cannot proceed until the data is actually filled in, so you have a stall.
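The difference between drawing with the texture and calling GetData on it can be modeled with a tiny in-order command queue. Everything below is a hypothetical Python sketch, not the XNA API: `render_to_target` stands in for drawing into a rendertarget and then getting its texture, and `ToyTexture.ready_at` plays the part of the driver's internal note that the texture is not filled in yet.

```python
class ToyGpu:
    """Toy in-order command queue (an assumption, not the XNA API). Times in ms."""

    def __init__(self) -> None:
        self.cpu_clock = 0.0  # current CPU time
        self.gpu_clock = 0.0  # when the GPU will have finished all queued commands

    def queue(self, gpu_ms: float) -> float:
        """Queue a command without waiting; return when the GPU will finish it."""
        self.gpu_clock = max(self.cpu_clock, self.gpu_clock) + gpu_ms
        return self.gpu_clock


class ToyTexture:
    """A handle to texture contents that may not have been drawn yet."""

    def __init__(self, gpu: ToyGpu, ready_at: float) -> None:
        self.gpu = gpu
        self.ready_at = ready_at  # driver-side note: "filled in at this time"

    def draw(self, gpu_ms: float) -> None:
        # Just another queued command. Commands run in order, so the texture is
        # filled in before this draw executes; the CPU never waits.
        self.gpu.queue(gpu_ms)

    def get_data(self) -> None:
        # Readback: the CPU cannot proceed until the contents actually exist.
        self.gpu.cpu_clock = max(self.gpu.cpu_clock, self.ready_at)


def render_to_target(gpu: ToyGpu, render_ms: float) -> ToyTexture:
    """Queue drawing into a rendertarget, then hand back a texture handle
    straight away, even though the GPU has not drawn anything yet."""
    return ToyTexture(gpu, gpu.queue(render_ms))


gpu = ToyGpu()
tex = render_to_target(gpu, 8)  # GPU will finish the rendertarget at t = 8
tex.draw(2)                     # e.g. via SpriteBatch; queued behind the render
print(gpu.cpu_clock)            # 0.0: nothing has blocked yet
tex.get_data()
print(gpu.cpu_clock)            # 8.0: the CPU waited for the GPU
```

Getting the handle and drawing with it only append to the queue, so the CPU clock never moves; only `get_data` forces the CPU forward to the moment the GPU actually finishes the rendertarget.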
Thanks for the clarification, Shawn. I was pretty sure that's how it worked, but I just wrote a bunch of RT2D code for drawing GUI Dialogs and I didn't want to have to go back and rewrite it all three months from now when I start stressing the app.
Love the blog! Keep it up!