How often should we be measuring?

Testing performance is tough. As a product is being developed, you need to track the performance of what you're building. The question is: how often?

It turns out to be very difficult and time-consuming to identify the cause of failures in a performance suite when the amount of change in the product is large. You spend a lot of time profiling something, fixing a bug, and re-running your profiles. Say, for example, that there are 15 changes in the build you're testing, baselined against the last one, and you end up with 43 performance degrades. What caused what? Finding out is going to be a long process. Your options, essentially, are to profile each individual degrade and find the hotspots in the new code, or to profile a sample of them and hope the fixes for those problems fix the rest. The first option is time-intensive to the max: running the tests again under the appropriate profiling tools, finding the bugs, fixing them, and validating the work. The second approach is a crapshoot. What do you do the next day when your tests haven't come back clean? Now you have to ask yourself: “Did my fix really work, and this degrade is caused by something new? Or did I not fix it at all?”
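For a sense of what a “degrade” is here, it's nothing more exotic than a scenario whose number got worse from one build to the next by more than whatever noise threshold you trust. Here's a hypothetical sketch of that comparison; the result format and the 5% threshold are made up for illustration:

```python
# Hypothetical result format: test name -> elapsed seconds for one build.
baseline = {"boot": 41.2, "open_document": 3.1, "save": 1.8}
current = {"boot": 44.0, "open_document": 3.1, "save": 2.4}

THRESHOLD = 0.05  # treat anything worse than 5% as a degrade, not noise

# Collect every test that got slower than the baseline by more than the threshold.
degrades = {
    test: (baseline[test], current[test])
    for test in baseline
    if test in current and current[test] > baseline[test] * (1 + THRESHOLD)
}

for test, (old, new) in sorted(degrades.items()):
    print(f"{test}: {old:.1f}s -> {new:.1f}s ({(new - old) / old:.0%} worse)")
```

The hard part isn't producing that list; it's that with 15 changes in the build, the list doesn't tell you which change is responsible for which line.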

A good solution to the problem is serious performance testing on each and every change to the product. This is not to say that you must run days of testing for each change, but some testing is prudent. The problem with this approach is the cost in infrastructure and machine time. As we discussed earlier, performance testing can't be done on any old machine sitting around the office. It needs dedicated hardware and as static an environment as possible, to avoid introducing noise into the process.

On my team, there's a system that produces a build of the product for nearly every change, which exists to facilitate running automated check-in BVTs. As a nice side effect, we on the performance test team can grab those builds and test them. We have a bank of machines set up and some automation in place that monitors the drop shares for new builds and starts tests automatically. The results are then pumped into a database and are viewable through a webpage.
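To make that concrete, here's a minimal sketch of what that kind of automation might look like. The share path, the test harness (`perf_scenario.cmd`), and the database schema are all hypothetical stand-ins; the point is the shape of a watcher that notices new builds, runs a performance suite against them, and records the results for later reporting.

```python
import os
import sqlite3
import subprocess
import time

# Hypothetical locations; the real drop share and harness will differ.
DROP_SHARE = r"\\buildserver\drops"
RESULTS_DB = "perf_results.db"
POLL_SECONDS = 300


def ensure_schema(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               build TEXT,
               test TEXT,
               elapsed_seconds REAL,
               recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()


def already_tested(conn, build):
    row = conn.execute(
        "SELECT 1 FROM results WHERE build = ? LIMIT 1", (build,)
    ).fetchone()
    return row is not None


def run_perf_suite(build_path):
    """Run the suite against one build and return (test, seconds) pairs."""
    # Stand-in for the real harness: time a single scenario script.
    start = time.time()
    subprocess.run(["perf_scenario.cmd", build_path], check=True)
    return [("startup_scenario", time.time() - start)]


def main():
    conn = sqlite3.connect(RESULTS_DB)
    ensure_schema(conn)
    while True:
        # Pick up any build on the share we haven't measured yet.
        for build in sorted(os.listdir(DROP_SHARE)):
            if already_tested(conn, build):
                continue
            results = run_perf_suite(os.path.join(DROP_SHARE, build))
            conn.executemany(
                "INSERT INTO results (build, test, elapsed_seconds) VALUES (?, ?, ?)",
                [(build, test, secs) for test, secs in results],
            )
            conn.commit()
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

From there, the webpage is just a thin report over the results table, comparing each build's numbers to the previous build's and flagging anything that degraded beyond the noise threshold.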

This system is new in the last six months. The old process was to test a build or so a day and write bugs in the hope that something could be done. Today, we're able to write much more targeted, actionable bugs. The result is that problems are fixed much faster, before they turn into “Death by paper cuts,” which we'll explore soon.