One of the benefits of Watson and our internal usage of OneNote is that we get a different set of feedback from internal users whenever a new build of OneNote goes out for everyone to use. It works like this:
Once a build is out, we look at the Watson data for any new spikes or hits. This can be a little tricky since not everyone updates at the same time; some folks will delay updating for weeks. The end result is that we may only have a small number of users on any given build, so we have to take that into account when reading the Watson reports.
For example, suppose we look at the data for the first 50 people that upgrade. If even one person hits a crash, that represents a 2% failure rate, so we have to take it very seriously and start an investigation. As more people upgrade, the reports accumulate, and we can start prioritizing according to how many people are hitting each error.
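To see why a single crash among 50 users deserves attention, it helps to look at the uncertainty around that 2% figure. Here is a small sketch (my own illustration, not anything from the actual Watson tooling) using a Wilson score interval, a standard way to put a confidence interval on a proportion when the sample is small:

```python
import math

def wilson_interval(hits, n, z=1.96):
    """95% Wilson score confidence interval for a proportion.
    Illustrative helper only; not part of any real Watson pipeline."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (max(0.0, center - half), min(1.0, center + half))

# 1 crash among the first 50 upgraders: the point estimate is 2%,
# but the interval around it is wide, so the true rate could be
# much higher (or lower) than the single number suggests.
low, high = wilson_interval(1, 50)
print(f"crash rate: {1/50:.1%}, 95% CI: {low:.1%} to {high:.1%}")
```

With 1 hit in 50, the interval runs from well under 1% to around 10%, which is exactly why a lone early report can't simply be dismissed as noise.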
Having fewer users makes this much more critical to get right, since any one user can represent a large percentage of the reports. But we also have to examine each report thoroughly so that we don't miss a bug that a whole lot of people will eventually see as well - we can't afford to let these initial reports slip by.
It's a challenge we face each day on the test team, and I thought it would be interesting for you to think about. There's more I could say here from a statistical viewpoint if you like; just let me know.
Questions, comments, concerns and criticisms always welcome, John