Profiling with Stopwatch

Profiling with Stopwatch

  • Comments 10

A while back I was doing some performance investigation on a Zune game. It was CPU bound, and Remote Performance Monitor said garbage collection was not a significant problem, so I knew this must be a good ole "too much code taking too long to run" CPU performance issue. That meant a sampling profiler was my tool of choice.

Luckily, I had a Windows version of the game, so I fired up the VSTS profiler and recorded a couple of minutes of gameplay. Because I was running a Zune game on a fast PC, I saw 90% of CPU spent in the Game.Tick method, which was just sitting there waiting until it was time for the next frame. That wasn't very interesting, but when I dug into the remaining 10% (which would account for 100% of CPU on the slower Zune device) I easily identified the slowest couple of methods. Task complete.

And then my conscience caught up with me. How can I be sure the things that are slow on Windows are also the reason this game runs too slowly on Zune? They are highly likely to be the same, but I didn't want to risk giving a report back to the game developer, only for it to turn out the Zune bottleneck was somewhere else entirely!

There are no CPU profilers for Xbox or Zune, so the only way to find out was to instrument my code. I started by adding this timer class:

    class Profiler
    {
        public static List<Profiler> AllProfilers = new List<Profiler>();

        string name;
        double elapsedTime;
        Stopwatch stopwatch;
        
        public Profiler(string name)
        {
            this.name = name;
            AllProfilers.Add(this);
        }

        public void Start()
        {
            stopwatch = Stopwatch.StartNew();
        }

        public void Stop()
        {
            elapsedTime += stopwatch.Elapsed.TotalSeconds;
        }

        public void Print(double totalTime)
        {
            Trace.WriteLine(string.Format("{0}: {1:F2}%", name, elapsedTime * 100 / totalTime));
            elapsedTime = 0;
        }
    }

I also added this component, which prints profiler data to the Visual Studio output window every 5 seconds:

    class ProfilerComponent : GameComponent
    {
        double totalTime;

        public ProfilerComponent(Game game)
            : base(game)
        { }

        public override void Update(GameTime gameTime)
        {
            totalTime += gameTime.ElapsedGameTime.TotalSeconds;

            if (totalTime >= 5)
            {
                foreach (Profiler profiler in Profiler.AllProfilers)
                    profiler.Print(totalTime);

                Trace.WriteLine(string.Empty);
                totalTime = 0;
            }
        }
    }

To use this system, add a ProfilerComponent in your Game constructor:

    Components.Add(new ProfilerComponent(this));

You can then create a separate Profiler instance for each thing you are interested in timing. For instance I added these fields to my Game class:

    static Profiler updateProfiler = new Profiler("Update");
    static Profiler drawProfiler = new Profiler("Draw");

Call Profiler.Start and Profiler.End before and after the code you want to measure, and Bob's your uncle.

It was a pain having to remember to call the End method, though, especially when there were multiple return statements or exceptions that could be thrown from the code I was measuring. So I added this helper:

    struct ProfileMarker : IDisposable
    {
        public ProfileMarker(Profiler profiler)
        {
            this.profiler = profiler;
            profiler.Start();
        }

        public void Dispose()
        {
            profiler.Stop();
        }

        Profiler profiler;
    }

It is used like so:

    protected override void Draw(GameTime gameTime)
    {
        using (new ProfileMarker(drawProfiler))
        {
            GraphicsDevice.Clear(Color.CornflowerBlue);

            // TODO: Add your drawing code here

            base.Draw(gameTime);
        }
    }

Armed with this infrastructure, it was but the work of a moment to confirm that yes indeed, the Zune game was spending 60% of CPU in the same place indicated by my earlier Windows profiling.

So why bother profiling on Windows at all, if I was going to check the results using Stopwatch no matter what?

To save time.

Stopwatch profiling is a blunt tool. It tells how long the code inside each ProfileMarker block takes to run, but this is only useful if you somehow know or guess which pieces of code to wrap these markers around. Identifying performance hotspots by trial and error or binary search is laborious and not my idea of fun. It is much quicker if I can go in already armed with ideas about where to look.

You can think of the Windows profiler as circumstantial evidence ("I saw someone who looked like him hanging around before the robbery"), while using Stopwatch on Xbox or Zune is the equivalent of a finding the stolen goods stashed in his basement. You need the latter to secure a conviction, but you also need the former to know which basement to look in!

  • What about the perf impact of calling new ProfileMarker() every time? For smaller methods that are executed often in the code this can IMO significantly skew the results (especially on Zune).

  • > What about the perf impact of calling new ProfileMarker() every time?

    Allocating a 4 byte value type is pretty much insignificant unless the code you are trying profile is incredibly quick (in which case it most likely isn't your hotspot in any case :-)

  • hi, correct me if I am wrong.

    Since ProfileMarker implement IDisposable,

    there will be a boxing of ProfileMarker when

    Dispose is called?

  • > Since ProfileMarker implement IDisposable, there will be a boxing of ProfileMarker when Dispose is called?

    No. ProfileMarker is a value type, and is never used polymorphically, so there is no heap allocation or boxing.

  • Most .NET profilers I've used tend to cause the .NET compiler/JITer (or both) to cause issues with optimizations.

    The main issue is structs - given that a game in a tight loop is sensitive to objects created on the heap (later GCed), the use of structs is fairly important.

    I noticed that the XNA framework uses "ref structs" in quite a few places for input only parameters - going against the "spirit" and design guidelines of .NET. Why? .NET doesn't optimize structs properly in the debugger. The same goes with profilers.

    I had "ref structs" all over my game and fairly unreadable code littered everywhere as a result. Ripped them all out and replaced them with the "slower" normal structs. No difference in release mode, but the second I attach a profiler the game slows down by 50% and it shows I'm spending all my time copying structs around on every function call.

    In other words, 99% of the time I end up chasing "Heisenbugs" when profiling XNA games. I really wish the struct issue would be fixed in debug mode, but it's probably for some arcane debugger reason deep in the CLR :(

    The best thing I ever did was follow my guts, use Stopwatch, stop profiling in all but specific cases, and use structs *normally* :) The CLR figures out the best thing to do during Compile time/JIT anyways.

  • @ShadowChaser: that sounds like you were using an instrumented profiler (which hooks into the code generation and can alter performance characteristics in a similar way to when a debugger is attached) rather than a sampling profiler (which does slightly slow execution due to the overhead of taking the samples, but is a fixed cost that will not slow one part of your program more than others).

    There is a time and place when instrumented profiling makes sense, but I most often find sampling profilers to be a better starting point.

    Passing large value types by reference can be a very worthwhile optimization when applied in the right places, but it's only worth doing this in critical hotspot code.

  • The only problem I have with this is that the profilers are auto listed on construction but never removed from the list.

    If this were C++ I would just remove it in the destructor, simple and consistent. But since this is managed code and people seem to shun destructors here, how could this be resolved in a good way?

    Regards

    Pontus Birgersson

  • @HerrUppoHoppa: the standard pattern in .NET is for things that require deterministic cleanup to implememt IDisposable and do whatever cleanup they need in their Dispose method.

    I didn't bother with that since I never needed to remove profiler instances.

  • @ShawnHargreaves: that makes a lot of sense - all of my profiling had been done using instrumented profilers. Thanks for the great tip - I'll definitely look into sampling profilers a bit more.

  • This Profile class is nice, just helped me find some issues with my sound, Thanks!

Page 1 of 1 (10 items)
Leave a Comment
  • Please add 5 and 2 and type the answer here:
  • Post