Elementary, my dear Watson


Hi! I’m Luis Dieguez and I am a tester on the Expression Web team.

One of my tasks on the team is to monitor Watson data for our product. Watson is probably one of Microsoft’s most useful customer feedback technologies, but it is also surely the one our customers hate the most. Why? Because Watson is a tool that appears only when Expression Web has crashed.

clip_image002

This dialog, my dear friends, is one of the main components of Watson.

In all seriousness, it seems inevitable that a software program will ship with bugs. Our testing team and our Beta efforts aim to minimize the number and severity of the bugs we ship with, in order to make the program as stable as possible. However, there is always a chance that we miss a critical crashing bug, and when that happens we want to know about it and fix it as soon as possible.

Watson allows us to comprehensively track the crashes and hangs that users of Expression Web hit in the real world (and that we hit during development of the program). Every time a user hits a crash, the dialog above gives them the option to transfer the crash data to Microsoft automatically, providing us with the system information we need to investigate the root cause of the crash.

Basic Watson Flow

In a nutshell, this is what happens the instant the Watson dialog hits your screen:

  • When a crash, hang or other tracked failure event occurs, the Watson program is activated and the dialog appears.

  • If you opt to send the information to us, Watson prepares an error report containing multiple files including a minidump. The minidump contains a subset of memory with the data that is most useful for developers, such as stack details (a list of the program code functions that were last hit), some system information, a list of loaded program modules, the type of error, and global and local variables. Minidumps are packaged into CAB files and sent via a secure HTTPS connection to Microsoft.
  • Error reports that indicate the same code defect are grouped into buckets.

Bucketing lets us prioritize the debugging effort and focus on the buckets that get the most hits. (Approximately 80% of failures in the real world are caused by 20% of the bugs.)
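To make the bucketing idea concrete, here is a minimal sketch in Python. It is not how Watson actually works internally: the report fields (module, function, error type), the sample values and the way the bucket key is built are all assumptions made up for illustration. It only shows the general idea of grouping reports that look like the same defect and ranking the groups by hits.

```python
from collections import Counter

# Hypothetical crash reports: (module, function, error type).
# These names are invented for this example, not real Watson data.
reports = [
    ("app.exe", "RenderPage", "access_violation"),
    ("app.exe", "SaveFile", "null_pointer"),
    ("app.exe", "RenderPage", "access_violation"),
    ("plugin.dll", "LoadTheme", "stack_overflow"),
    ("app.exe", "RenderPage", "access_violation"),
    ("app.exe", "SaveFile", "null_pointer"),
]


def bucket_key(report):
    """Build an illustrative bucket key; reports with the same key share a bucket."""
    module, function, error = report
    return f"{module}!{function}:{error}"


def rank_buckets(all_reports):
    """Return (bucket key, hit count) pairs, most-hit bucket first."""
    counts = Counter(bucket_key(r) for r in all_reports)
    return counts.most_common()


def top_share(ranked, n):
    """Fraction of all hits that fall into the n most-hit buckets."""
    total = sum(count for _, count in ranked)
    return sum(count for _, count in ranked[:n]) / total


ranked = rank_buckets(reports)
for key, hits in ranked:
    print(f"{hits:3d}  {key}")
print(f"Top bucket covers {top_share(ranked, 1):.0%} of all hits")
```

In this toy data set a single bucket accounts for half of the reports, which is the kind of skew that makes it worthwhile to look at the biggest buckets first.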

It is important to note that we only receive the data after you explicitly give your permission and the data is strictly used to improve the Expression Web program.

Watson is particularly useful for showing crash trends in our product. Every time I look at the Watson database, if I notice that many users are hitting a particular bucket, I need to escalate it immediately and trigger the investigation process. Once we know the cause of the crash and can reproduce it locally, we implement and test the fix. Depending on the severity of the bug, we will ship it in the form of a security patch, a service pack or the next release.

What can you do to help?

While Watson provides us with very good information on the system and the place in code where the crash might have occurred, one of the hardest challenges with Watson data is coming up with the right set of steps that trigger the crash. For example, Watson might tell us that we are trying to use a null pointer, but it doesn’t tell us where or how in code that pointer became null.

Moreover, if we cannot reproduce the crash locally, we can’t fix the bug. Here is where you can become a key element in helping us fix a bug.

When you encounter a crashing bug:

1. Always, always submit the Watson data. It is easy: just click the “Send information” button, and the data will take only a short time to submit.

2. Log a bug at connect.microsoft.com/expression with the steps you took to get the crash. Attaching the file or set of files you reproduced the bug with also helps.

3. Add a comment to the bug with the Watson bucket ID that was created for your crash. This will help us match your repro steps to Watson buckets we don’t have repro steps for. To get the Watson bucket ID, just follow these steps:

· Right-click “My Computer”

· Click “Manage”

· Go to “System Tools | Event Viewer | Application”

· Sort by Type

· All Watson events will have ID 1000 or 1001.

Event 1000 is fired when Watson first encounters a crash.

Event 1001 is fired when information (including the bucketID) comes back from the server.

clip_image004

If the report was sent successfully, double-click the most recent 1001 event and you will see the bucket ID:

clip_image006

Just add this bucket ID to the bug you log and that's it! We will then link your bug to our Watson buckets and we'll have a clear set of steps to reproduce the bug.
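If you prefer to think of the lookup above as code, here is a small Python sketch of the same logic. It does not read the real Windows event log: the record layout (`event_id`, `time`, `message`) and the sample entries are assumptions for illustration, and the message text is a placeholder rather than the real wording of a Watson event.

```python
from datetime import datetime

WATSON_CRASH = 1000     # fired when Watson first encounters a crash
WATSON_RESPONSE = 1001  # fired when information comes back from the server


def latest_response_event(events):
    """Return the most recent 1001 event, or None if the report never came back."""
    responses = [e for e in events if e["event_id"] == WATSON_RESPONSE]
    if not responses:
        return None
    return max(responses, key=lambda e: e["time"])


# Hypothetical Application log entries, invented for this example.
events = [
    {"event_id": 1000, "time": datetime(2008, 5, 1, 9, 30), "message": "(crash details)"},
    {"event_id": 1001, "time": datetime(2008, 5, 1, 9, 31), "message": "(older bucket info)"},
    {"event_id": 1000, "time": datetime(2008, 5, 2, 14, 0), "message": "(crash details)"},
    {"event_id": 1001, "time": datetime(2008, 5, 2, 14, 2), "message": "(newest bucket info)"},
]

latest = latest_response_event(events)
if latest is None:
    print("No 1001 event found; the report may not have been sent.")
else:
    print(f"Look for the bucket ID in the event from {latest['time']}: {latest['message']}")
```

A crash with a 1000 event but no matching 1001 event means no bucket information came back, so there is no bucket ID to copy into the bug yet.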

Cheers!  See you next time.
