My last update was about indirect performance testing of OneNote when I was creating the Gutenberg add-in. That was an example of bare-minimum performance testing: using a stopwatch and built-in memory tracking tools (Task Manager) to measure performance. For the add-in I was writing, this was adequate. "Stopwatch" testing (more generally, measuring the responsiveness of an application) is an obvious test to run. Personally, sluggish responsiveness in an application I am trying to use is my single biggest point of frustration, so I keep an eye on it.
Using a stopwatch doesn't always give us enough data. Sluggishness can come and go too quickly to time with a watch, yet still be easily noticeable. To really understand what we may be seeing, one technique we use is to automate testing with markers placed in our code that measure timing and other metrics: memory usage, CPU utilization, handle counts, GDI objects, file IO, disk IO, network bandwidth, and so on. Opening a page is a key test in which we want to minimize the time spent before the UI is rendered. It's also a pretty well understood test, almost as simple as "start a timer, open a page in a known state, stop the timer when done." We design some fixed test pages and track how long they take to open from build to build to make sure we don't slow performance down from one version to another. It also makes it very easy as a tester to create automation. If we were to add a new feature to OneNote, like some new data type, we could create a new "baseline" page or series of pages with that data type on it and just plug it into the existing framework.
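The "start a timer, open a page, stop the timer" pattern can be sketched as a tiny marker harness. This is a minimal illustration, not OneNote's actual framework: the marker class, the page names, and the stubbed-out `open_page` callback are all hypothetical stand-ins for real application hooks.

```python
import time

class PerfMarker:
    """Collects named timing markers during a test run (illustrative sketch)."""

    def __init__(self):
        self.markers = {}

    def start(self, name):
        # Record the high-resolution start time for this marker.
        self.markers[name] = {"start": time.perf_counter()}

    def stop(self, name):
        # Compute and store the elapsed time since start() was called.
        m = self.markers[name]
        m["elapsed"] = time.perf_counter() - m["start"]
        return m["elapsed"]

def open_page_test(marker, open_page, page_name):
    """Time a single (stubbed) page-open operation in a known state."""
    marker.start(page_name)
    open_page(page_name)  # stand-in for the real "open a page" call
    return marker.stop(page_name)

# Usage: stub the application call so the harness runs anywhere.
timings = PerfMarker()
elapsed = open_page_test(timings, lambda name: time.sleep(0.01), "baseline_page_1")
print(f"baseline_page_1 opened in {elapsed:.3f}s")
```

In a real framework the lambda would be replaced by a call into the application under test, and the marker data would be logged alongside the other metrics (memory, handles, and so on) rather than printed.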
We can reuse that same testing framework to test scalability. As I mentioned before, the more Outline Elements you have, the slower OneNote runs. We can create a series of tests with 1, 10, 100, 1000 and so on elements on a page and compare performance. This is where we want linear performance, and where we can see results that let us talk about "Big O" notation (which computer geeks always want to talk about :) ). Defining the "and so on" is also a fun question to answer. When designing a product for the first time, you really don't have an idea of how many outline elements users will put on a single page. We can debate statements like "No one will have a book with 10 billion sentences in it." That may be true for now, but may not be true in the future. When we design our tests, we analyze current data to get a feel for the distribution of what exists and then apply that to our testing. For this add-in, an easy task would be to examine the stock of eTexts at Project Gutenberg, get an idea of the maximum size, and use that as the upper limit of the range we optimize for. There's much more to be said about designing these performance tests - this is only a vast oversimplification.
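A scalability sweep like the one above can be sketched in a few lines. The workload below is simulated (a deliberately quadratic loop, so the sweep has something to measure); a real test would drive the application with pages containing 1, 10, 100, 1000... elements instead.

```python
import time

def time_operation(n_elements):
    """Stand-in for opening a page with n outline elements.

    Simulated here with a quadratic loop; the real test would
    call into the application under test.
    """
    start = time.perf_counter()
    total = 0
    for i in range(n_elements):
        for j in range(n_elements):
            total += i * j
    return time.perf_counter() - start

# Sweep element counts by powers of ten. If timings grow roughly 10x
# per 10x input, behavior is linear; ~100x growth suggests quadratic.
sweep = {n: time_operation(n) for n in [1, 10, 100, 1000]}
for n, t in sweep.items():
    print(f"{n:>5} elements: {t:.6f}s")
```

Comparing the ratio between successive timings is a quick way to estimate the "Big O" growth without any formal analysis: a constant ratio per decade points at linear behavior, a growing one at something worse.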
An unseen task that falls to test is building the framework to store and analyze the results between runs. A comment I've gotten a few times is "I never realized testers write that much code!" - and believe me, when it comes to the framework surrounding automation, SDETs really earn the "D" for Developer in the title. The half joke/half reality comment I tend to make is that writing the code for an automation script is the easiest part of the process; integrating it into our automation framework is where the hurdles lie. Office has the concept of "shared teams" that provide services to everyone. They create, maintain and upgrade the systems we all use, and we usually have some tweaking to do to make each system work with each individual application.
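To make the "store and analyze results between runs" idea concrete, here is a minimal sketch of that kind of tracking. Everything in it is an assumption for illustration: the JSON file name, the record layout, and the 1.2x-of-historical-mean regression threshold are hypothetical choices, not anything a real shared team's infrastructure prescribes.

```python
import json
import statistics
from pathlib import Path

# Hypothetical storage location; a real framework would use a shared database.
RESULTS_FILE = Path("perf_results.json")

def record_run(test_name, elapsed, build="local"):
    """Append one timing result so later runs can be compared against it."""
    history = json.loads(RESULTS_FILE.read_text()) if RESULTS_FILE.exists() else []
    history.append({"test": test_name, "build": build, "elapsed": elapsed})
    RESULTS_FILE.write_text(json.dumps(history, indent=2))

def check_regression(test_name, elapsed, threshold=1.2):
    """Flag a run that is more than 20% slower than the historical mean."""
    if not RESULTS_FILE.exists():
        return False
    history = [r["elapsed"] for r in json.loads(RESULTS_FILE.read_text())
               if r["test"] == test_name]
    return bool(history) and elapsed > threshold * statistics.mean(history)

# Demo: start from a clean slate, record two baseline runs, then check a
# suspiciously slow result against them.
RESULTS_FILE.unlink(missing_ok=True)
record_run("open_baseline_page", 0.12)
record_run("open_baseline_page", 0.11)
print(check_regression("open_baseline_page", 0.30))  # well past 1.2x the mean
```

Even a toy version like this shows why the surrounding framework is the hard part: deciding what to store, how to baseline it, and what counts as a regression takes far more design than the timing script itself.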
As an example, each application has its own performance testing criteria. For OneNote, the performance of our "napkin math" is not nearly as critical as the performance of math operations in Excel. Switching folders in Outlook is a key task for that team, and it seemingly parallels navigating notebooks in OneNote nicely. The structure needed to measure performance is vastly different, though: Outlook is a MAPI client, while OneNote doesn't use MAPI and is far more dependent on the file system for navigation. And so on... Each team needs to define its most critical performance tests, decide which metrics are most interesting to log, and develop a system to track any changes over the course of a development cycle. The shared teams provide a basic framework for this, but if you have a very unique test, you may need to develop some tracking infrastructure on your own. OneNote will design the test to measure the performance of navigating around in notebooks, and the shared team will log and store our results for us.
Questions, comments, concerns and criticisms always welcome,