Shawn Hargreaves Blog
We have a game. We want to know whether it is limited by CPU or GPU performance. There are three possibilities:
- We are CPU bound
- We are GPU bound
- We are evenly balanced, with both processors equally loaded
There is no direct way to measure this, but we can work it out by applying some detective skills.
First off, we must get the game into a state where we will be able to notice even small changes in performance. This means adding a framerate display, and using a special profiling project configuration that disables vsync and selects variable timestep mode (you don't want those settings while actually playing the game, but they give more precise results while you are profiling it).
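In XNA, both of those settings live in the Game constructor. A minimal sketch of a profiling configuration (the PROFILE conditional compilation symbol is my assumption; the two property names are the real XNA ones):

```csharp
public MyGame()
{
    graphics = new GraphicsDeviceManager(this);

#if PROFILE
    // Variable timestep: Update runs as often as possible,
    // rather than being locked to a fixed 60 Hz tick
    IsFixedTimeStep = false;

    // Disable vsync so the framerate is not capped at the monitor refresh rate
    graphics.SynchronizeWithVerticalRetrace = false;
#endif
}
```

You would define PROFILE only in the special profiling project configuration, so the shipping build keeps vsync and fixed timestep.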
Next, run your game under a CPU profiler. Using the inclusive view (the one that includes all the functions called by any given function, not just the function itself), note how much time was spent inside each of the Big Three methods: Update, Draw, and Present. If you see much time in Draw or Present, that tells us nothing, because this time could be the result of either CPU or GPU overhead, but if all your CPU is inside Update, that proves you must be CPU bound. Congratulations, you reached the end of the investigation!
Most often, the profiler will show some combination of Update, Draw, and Present. If there is a lot of time in Present, this is a hint you might be GPU bound (in which case the CPU is having to wait for the GPU to catch up), but it isn't absolute proof: it is also possible your game might just be causing a lot of CPU translation work for the driver.
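If you want a rough number for how long the game sits inside Present, one option (my sketch, not something from the original post) is to time the Game.EndDraw override, which is where XNA calls GraphicsDevice.Present:

```csharp
readonly Stopwatch presentTimer = new Stopwatch();

protected override void EndDraw()
{
    presentTimer.Reset();
    presentTimer.Start();

    base.EndDraw();   // this is what calls GraphicsDevice.Present

    // A consistently large value here suggests the CPU is waiting for the GPU,
    // but remember it could also include CPU-side driver work
    Debug.WriteLine("Present took {0} ms", presentTimer.Elapsed.TotalMilliseconds);
}
```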
To gain more information, we can change something about the game, then observe whether the framerate changes in response.
Paradoxically, even though we are interested in measuring graphics performance, changing our graphics code is not a good way to do that, because drawing calls require work on both the CPU and the GPU. Sure, if we temporarily commented out half our drawing code, the framerate would probably go up, but that would tell us nothing about which processor was the bottleneck!
To get a useful measurement we must change something that affects one processor but cannot possibly alter the other. The Update method is a good place to start, since that only involves the CPU.
What happens if you add this at the top of your Update?
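The idea is to burn a small, known amount of CPU time without issuing any graphics work. A one-millisecond sleep is the natural candidate:

```csharp
// Waste some CPU time without touching the GPU at all
System.Threading.Thread.Sleep(1);
```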
Does your framerate change? If not, you are GPU bound. Even better, you can gradually increase the time delay until the framerate does change, which will tell you exactly how long the CPU was idle.
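One way to gradually increase the delay is to make it adjustable at runtime. A sketch (the field name and key bindings are my own, not from the post):

```csharp
int sleepMilliseconds = 1;   // hypothetical field on the Game class

protected override void Update(GameTime gameTime)
{
    KeyboardState keyboard = Keyboard.GetState();

    // Tune the artificial CPU load up and down while the game is running
    if (keyboard.IsKeyDown(Keys.Add))
        sleepMilliseconds++;

    if (keyboard.IsKeyDown(Keys.Subtract) && sleepMilliseconds > 0)
        sleepMilliseconds--;

    System.Threading.Thread.Sleep(sleepMilliseconds);

    // ... normal update logic ...

    base.Update(gameTime);
}
```

Increase the delay until the framerate starts dropping; the largest delay that has no effect tells you how long the CPU was sitting idle each frame.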
If the sleep call does affect the framerate, you must be either CPU bound or evenly balanced. To distinguish between those possibilities, go the other way and try reducing the amount of CPU work.
An easy way to reduce the CPU load is to skip the entire Update method (caveat: this only works if your Update takes a non-negligible amount of time, which you can check using the CPU profiler). We can't permanently remove Update, because that would also affect our Draw performance (there won't be anything to draw if Update never gets around to spawning any monsters!), but if we run the game normally for a while first, we can then skip some Update calls to get a clean measurement.
I like to bind this to a spare key in my profiling builds:
protected override void Update(GameTime gameTime)
{
    // Hold a spare key (F1 here; pick whatever you like) to skip the entire update
    if (Keyboard.GetState().IsKeyDown(Keys.F1))
        return;

    // ... normal update logic ...

    base.Update(gameTime);
}
This lets me play the game until I reach a "typical" point, then press the key to see how the framerate changes.
If the sleep has no effect, I must be GPU bound.
If skipping Update speeds things up, I must be CPU bound.
If skipping Update has no effect but sleeping does slow things down, the two must be evenly balanced.
What if skipping Update speeds things up (33fps -> 38fps), but sleeping also slows things down (33fps -> 31fps)? Or should I look for bigger changes than 2 fps?
Dennis: that means you are CPU bound. Skipping CPU work by taking out the Update is boosting framerate, while slowing down the CPU by adding a sleep is hurting framerate. Your framerate is directly affected by altering the CPU load, so it must be the CPU rather than the GPU that is limiting how fast your game can run.
This series of posts on GPU- and CPU-bound processing is fascinating. I wonder if the framework might benefit from debuggable performance counters here, similar to what the Garbage Collector subsystem does? I would assume that an XnaFramework.TimeWaitingToPresent metric that totals the amount of time XNA has spent blocked while waiting to write to the GPU would be particularly useful, especially with vsync disabled, and wouldn't add significant performance overhead to write (after all, the processor is idling during this time anyway). Another useful metric would be XnaFramework.InstructionBufferUsage, although I suspect this one would be more expensive to track...
I have Thread.Sleep(1) in the main loops of both game threads :o. Without it, the game would consume 100% of the CPU. But removing it doesn't seem to change fps!
aaa: that's a textbook example of a game that is GPU bound (or possibly vsync bound, if you haven't turned off vsync before taking these measurements).
I am having weird behavior in XNA 4. My framerate was getting capped at 60fps, even using the Profile configuration. While debugging the game at a breakpoint and adding the Thread.Sleep method, my game's framerate went above the 60fps threshold; it went to over 500fps. I don't really get why I'm seeing this after adding Thread.Sleep calls.
Here's a weird one - what's it mean when my game has all the signs of CPU-boundedness (slowdown on sleep, speedup skipping update) but framerate improves when I simplify my pixel shader? Don't have pipeline stalls I can think of (no reads of GPU resources, all buffers use SetDataOptions.Discard).
Tricksy - that sure sounds like you are bottlenecked by BOTH at the same time! Which is only possible if CPU and GPU are for some reason running in series rather than parallel. That most likely means a pipeline stall is occurring somewhere along the way, but could also be a driver issue. What hardware is this on, and what framerate are you at?