I was recently involved in analyzing a WPF application that used 3D heavily and wasn't hitting its performance goals. The application was pegging the CPU and felt incredibly sluggish.  What didn't make sense was that the CPU was pegged even though the machine had a video card with 256 MB of memory.  Watching Tim Cahill and company troubleshoot the application proved illuminating, and I want to share some of what I learned.

The first test was to make the application's window smaller and see how performance changed.  When the window was shrunk to 150 x 150 pixels, performance improved dramatically, suggesting that the application was "fill rate bound."  What does "fill rate bound" mean?  Fill rate is the speed with which the video card can "fill" pixels on the screen; an application is fill rate bound when that speed is what limits its performance.  I found this article quite useful in its discussion of fill rates and of performance in general.
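
To get a feel for why shrinking the window matters so much, here is a rough back-of-the-envelope sketch in Python (not WPF code). The 1280 x 1024 full-size window, the 60 frames per second target, and the function name are my own assumptions for illustration; only the 150 x 150 figure comes from the test above.

```python
def pixels_per_second(width, height, overdraw=1.0, fps=60):
    """Rough number of pixel writes per second for a window of this size.

    overdraw is the average number of times each pixel is written per frame.
    """
    return int(width * height * overdraw * fps)


# Assumed full-size window (hypothetical) versus the 150 x 150 test window.
full = pixels_per_second(1280, 1024)
small = pixels_per_second(150, 150)

print(full)                     # 78643200
print(small)                    # 1350000
print(round(full / small, 1))   # 58.3
```

Under these assumptions the small window asks the video card for roughly 58 times fewer pixel writes, which is why a large jump in performance after shrinking the window points toward fill rate.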

The big question was why the CPU was getting pegged if the application was fill rate bound.  Shouldn't the GPU be handling this?  The first suspicion in this case would be that something was rendering in software, and the logical fix would be to remove the offending code that was causing WPF to render in software.  However, profiling showed that the application was spending most of its CPU time in kernel mode.  This might seem like a dead end -- it is, um, unreasonable to remove offending code in the kernel -- but it actually is revealing.

Kernel mode CPU usage is usually a strong indicator that the GPU is overloaded.  With hardware acceleration, the GPU asynchronously processes the instructions it receives from the CPU.  Because it’s asynchronous, the GPU can fall behind the CPU, and there’s a fixed-size GPU instruction queue.  When the CPU issues instructions too quickly and the queue fills up, the video driver has to prevent the CPU from issuing any more instructions.  This manifests itself as kernel-mode CPU time.  WPF tries to prevent this condition from happening in the general case, though edge cases still exist.
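
The queue behavior described above can be illustrated with a toy simulation. This is only a sketch in Python under made-up numbers: the tick-based model, the rates, the queue capacity, and the name `simulate_queue` are all assumptions of mine, not how any real driver or WPF is implemented. It simply counts the ticks in which the "CPU" could not submit everything it wanted because the queue was full.

```python
from collections import deque


def simulate_queue(ticks, cpu_rate, gpu_rate, capacity):
    """Count ticks in which the CPU is held up by a full instruction queue.

    cpu_rate: instructions the CPU tries to submit per tick (assumed).
    gpu_rate: instructions the GPU retires per tick (assumed).
    capacity: fixed size of the instruction queue (assumed).
    """
    queue = deque()
    stalled_ticks = 0
    for _tick in range(ticks):
        # The GPU retires up to gpu_rate instructions from the front.
        for _ in range(min(gpu_rate, len(queue))):
            queue.popleft()
        # The CPU submits until it is done or the queue is full.
        submitted = 0
        while submitted < cpu_rate and len(queue) < capacity:
            queue.append("draw")
            submitted += 1
        # Anything left unsubmitted means the CPU had to wait this tick.
        if submitted < cpu_rate:
            stalled_ticks += 1
    return stalled_ticks


print(simulate_queue(100, cpu_rate=4, gpu_rate=4, capacity=16))  # 0
print(simulate_queue(100, cpu_rate=4, gpu_rate=2, capacity=16))  # 93
```

When the GPU keeps up, the CPU never waits. When the GPU retires instructions at half the rate, the queue fills after a handful of ticks and the CPU is held up on nearly every tick after that, which is the situation that would show up as kernel-mode time in a profile.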

(Sidenote: to get profiling information that reveals more granular CPU usage, one could use AMD's CodeAnalyst or Intel's VTune.)

The moral of the story?  Just because you are hardware accelerated doesn't mean that you are guaranteed great performance.  You can still max out the GPU.  Game developers, who are used to squeezing every ounce of performance out of their applications, know this.  However, it is easy to get yourself into trouble, especially if you are new to 3D programming and are doing a lot with complex 3D meshes, complex materials (visual brushes, video) and animation.

What are some possible causes of being fill rate bound? 

  • Too many visual / drawing brushes in your scene.  These brushes are much slower for the video card to work with than image / solid color brushes, so you should use as few of them as possible and keep them as small as possible.
  • Non-front meshes with large idle visual brushes on them.  Even though they’re not visible, they’re still going to be rendered if they’re in the scene.
  • Too much overdraw.  Writing to the same pixel 5 times is 5 times as slow as writing to it once.  This can be especially problematic with z-order issues in 2D and along the z-axis in 3D.
  • Too many materials on your 3D models.
  • Too many lights in your scene.
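
The overdraw point in the list above can be made concrete with a small counting sketch. This is plain Python on a tiny pretend screen, not anything WPF does internally; the rectangle format `(x, y, w, h)` and the name `overdraw_factor` are assumptions for illustration. It tallies how many times each pixel gets written when every layer is drawn in full, then reports the average.

```python
def overdraw_factor(width, height, rects):
    """Average writes per screen pixel when every rect is drawn in full.

    rects: iterable of (x, y, w, h) tuples in pixel coordinates (assumed
    format); parts of a rect that fall off screen are ignored.
    """
    writes = [[0] * width for _ in range(height)]
    for x, y, w, h in rects:
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                writes[row][col] += 1
    total_writes = sum(sum(row) for row in writes)
    return total_writes / (width * height)


# Five full-screen layers stacked on a 10 x 10 screen.
print(overdraw_factor(10, 10, [(0, 0, 10, 10)] * 5))                # 5.0
# One background plus a 5 x 5 element on top.
print(overdraw_factor(10, 10, [(0, 0, 10, 10), (0, 0, 5, 5)]))      # 1.25
```

Five stacked full-screen layers cost five times the pixel writes of one, even though only the top layer ends up visible.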

Generally, the best way to fix this problem is to strip out the features of the application that are guilty of the above and, if possible, add them back one at a time to determine the culprit.
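
The add-them-back-one-at-a-time approach can be sketched as a tiny harness. Everything here is hypothetical: the feature names, the fake millisecond costs, and the `measure_frame_ms` callback stand in for whatever toggles and frame-time measurements your own application and profiler provide.

```python
def incremental_costs(features, measure_frame_ms):
    """Enable features one at a time and record how much each adds.

    measure_frame_ms: callable taking the list of enabled features and
    returning a frame time in milliseconds (supplied by the caller).
    """
    enabled = []
    previous = measure_frame_ms(enabled)  # stripped-down baseline
    costs = {}
    for feature in features:
        enabled.append(feature)
        current = measure_frame_ms(enabled)
        costs[feature] = current - previous
        previous = current
    return costs


# Made-up numbers standing in for real measurements.
FAKE_COST_MS = {"lights": 2, "video_material": 5, "visual_brushes": 30}


def fake_measure(enabled):
    # 4 ms baseline plus the assumed cost of each enabled feature.
    return 4 + sum(FAKE_COST_MS[name] for name in enabled)


costs = incremental_costs(["lights", "video_material", "visual_brushes"], fake_measure)
worst = max(costs, key=costs.get)
print(worst, costs[worst])  # visual_brushes 30
```

In a real application the costs will not add up this neatly, since features interact, so it is worth repeating the measurement with the features added back in a different order.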

Thanks to the WPF performance team for their insights on this topic.