Previously, we described our Adoption reports. These reports provide you with information on downloads, adoption rate, user ratings and usage which, together, can help you determine the popularity of your app. Adoption reports are useful, but they're just a part of the reporting tools we provide in the Windows Store. We also provide reports related to app quality. These reports help you measure and improve the quality of your app. In this post, program manager Kalyan Venkatasubramanian describes the Store's Quality reports and how you can use them to improve your apps.

-- Antoine Leblond


Before we get started, we need to define what we mean by "quality." After all, there are many facets to determining quality—like usability, reliability, security, and so on. For the Windows Store, we chose to focus our Quality reports on providing you analytics data based on your app’s reliability as experienced by your customers.

You access the Quality reports through your app's app summary page. From that page, click the Quality link, which takes you to the Quality reports screen.

Quality Reports

An example of what the quality reports screen looks like in the dashboard.

These reports help you measure the quality of your app by tracking the app's failure rate—the number of failures customers experienced. A failure is defined as the unexpected closure of an app due to one of the following reasons:

  1. A crash
  2. An unresponsive app (hang)
  3. An unhandled JavaScript exception

(Note: Unhandled JavaScript exceptions, which we refer to simply as JavaScript exceptions, are of course only applicable to apps written using JavaScript.)

With these reports, you can:

  1. Understand the quality of your app over the different versions that were published to the Store. This tells you if customers of your app are having a better experience in successive versions.
  2. Improve the quality of your app. Knowing and understanding the top failures (as seen by your customers) in the latest version published to the Store enables you to fix them and publish updates to your app in the Store.

Understanding the quality of your app

As we defined in the last section, you can determine the quality of an app by its failure rates. Failure rates are computed for each of the failure types: crashes, hangs, or unhandled JavaScript exceptions (in the case of JavaScript apps). The data for calculating the failure rate is collected from a random sample of machines (called a quality panel) on which your app is used. We consider a quality panel of at least 500 machines to be an adequate sample size for calculating the failure rates. If the quality panel for your app has fewer than 500 machines, the reports contain a warning that says “The data displayed is from a sample set that is not statistically significant.” We'll continue to evaluate this threshold to ensure we provide you with the most accurate information as quickly as possible.

We compute these failure rates as the average number of failures encountered on a machine during the first 15 minutes of active usage. Looking at data from all apps in the PC ecosystem, we saw that the measured reliability of an app tends to stabilize over time: after a certain amount of usage we see very little variation in the rate of failure. For Metro style apps in the Consumer Preview, this stabilization occurs after about 15 minutes of usage. This timeframe ensures that the data we give you is both accurate and timely. (Setting the panel to include a longer period of time would increase the amount of time we need to wait before reporting back to you.) As with the quality panel size, we’ll continue to monitor this threshold as the Metro style app market evolves. Also, as we calculate failure rates, we remove any outliers to ensure they don't skew the results.
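To make the calculation concrete, here is a minimal sketch of the averaging and the 500-machine check described above. The record shape and function names are hypothetical, and the outlier removal step is left out because the post does not say how it is done.

```python
from dataclasses import dataclass

MIN_PANEL_SIZE = 500  # adequate sample size, per the post


@dataclass
class PanelMachine:
    """One machine in the quality panel (hypothetical record shape)."""
    machine_id: str
    # Failures of one type (crash, hang, or JavaScript exception)
    # seen during the first 15 minutes of active usage.
    failures_in_first_15_min: int


def failure_rate(panel):
    """Average number of failures per panel machine (0.0 for an empty panel)."""
    if not panel:
        return 0.0
    return sum(m.failures_in_first_15_min for m in panel) / len(panel)


def is_statistically_significant(panel):
    """True when the panel reaches the 500-machine threshold."""
    return len(panel) >= MIN_PANEL_SIZE


# Made-up panel: 600 machines, one in ten saw a single failure.
panel = [PanelMachine(f"m{i}", 1 if i % 10 == 0 else 0) for i in range(600)]
rate = failure_rate(panel)
significant = is_statistically_significant(panel)
print(f"failure rate: {rate:.2f}, significant: {significant}")
```

A panel below the threshold would still produce a number here; the reports flag it with the warning instead of hiding it.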

Here's an example of a failure rate graph.

An example of a failure rate graph

Improving the quality of your app

In the previous section, we discussed how you can understand the change in quality of your app over different versions. We also realized that you would be interested in knowing the top failures your customers faced in the latest version of your app. So we also provide a list of common failures for the latest version of your app, ordered by prevalence. We determine the prevalence of a failure by counting the total number of occurrences your customers experience.
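Ordering by prevalence amounts to counting occurrences per failure name and sorting by count. A minimal sketch, assuming a flat list with one entry per occurrence (the input shape, the function name, and the second failure name are made up for illustration):

```python
from collections import Counter


def most_common_failures(failure_names, limit=5):
    """Return (failure name, count) pairs, most prevalent first."""
    return Counter(failure_names).most_common(limit)


# Hypothetical occurrence log: one entry per failure a customer experienced.
occurrences = [
    "NULL_CLASS_PTR_READ_c0000005_mydll.dll!myfunc::DoOp",
    "STATUS_INTEGER_DIVIDE_BY_ZERO_c0000094_mydll.dll!myfunc::Divide",
    "NULL_CLASS_PTR_READ_c0000005_mydll.dll!myfunc::DoOp",
]
for name, count in most_common_failures(occurrences):
    print(count, name)
```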

Remember, failure rates are calculated from machines in the quality panel, meeting stringent criteria around initial active usage of your app. The data for the most common failure list comes from all customers of your app. But what if a majority of the customers for your app have not been able to meet the usage requirements because of the failures they are experiencing? In such a case the failure rate will be 0, but you will still see the top failures for the app, as shown here:

An example of a report that shows the most common failures.

By giving you the list of most common failures seen by your customers independent of the calculation of the failure rates (for example, for crashes, as shown in the picture above) and broadening the reach for collection, we enable you to be aware of and fix failures seen by all of your customers. This also enables you to know about and react to the failures in your app early in the release.

Crashes and hangs

For crashes and hangs, we show you the 5 most common failures in the latest version of your app. The count is the total occurrences of the failure among all customers of your app. The Download link provides you with a .cab file containing the process dump for that failure.

An example showing the most common crashes for an app

A failure is uniquely identified by a failure name. For hangs and crashes, an example of a failure name is:

NULL_CLASS_PTR_READ_c0000005_mydll.dll!myfunc::DoOp

The failure name is broken down into the following elements:

  • Problem class (NULL_CLASS_PTR_READ)
  • Error Code (c0000005)
  • Symbol (mydll.dll!myfunc::DoOp)
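If you want to group or filter failures in your own tooling, the name can be split into these three elements. This is a sketch only: it assumes the error code is always eight hexadecimal digits, which is inferred from the examples in this post rather than from a documented format, and the function name is hypothetical.

```python
import re

# Problem class, then an 8-digit hexadecimal error code, then the symbol.
# The 8-digit assumption comes from the examples in this post.
FAILURE_NAME = re.compile(
    r"^(?P<problem_class>.+?)_(?P<error_code>[0-9a-fA-F]{8})_(?P<symbol>.+)$"
)


def parse_failure_name(name):
    """Split a crash or hang failure name into its three elements."""
    match = FAILURE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized failure name: {name!r}")
    return match.groupdict()


parts = parse_failure_name("NULL_CLASS_PTR_READ_c0000005_mydll.dll!myfunc::DoOp")
# The symbol is the module and the function, separated by "!".
module, _, function = parts["symbol"].partition("!")
print(parts["problem_class"], parts["error_code"], module, function)
```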

Note: How we determine the root cause of failure can be found here. Even though the blog post is not tailored specifically for Metro style apps, it is a great read to understand the details on collection and processing of failures.

You can determine the reason for the crash or hang in your app by downloading the associated .cab file. The .cab file contains a process dump associated with the failure in your app. You can get the stack traces and other details for the failure from the process dump.

The prerequisites for processing the .cab file and extracting the stack traces are:

  1. Install WinDbg.exe on your machine.
    WinDbg.exe is the recommended debugging tool to get stack traces from the process dump. If you do not have WinDbg.exe on your machine, you can get it here.
  2. Symbols for the application.
    To get the stack traces from the process dump, you should have the symbols corresponding to the current version of your app in the Store.

Getting stack traces for crashes and hangs

These steps are not intended to be a thorough debugger tutorial. However, they will enable you to get the stack traces for failures in your app.

  1. Click the Download link next to the failure name for any failure (crash or hang) associated with your app. Let us assume that the failure name is:
    STATUS_INTEGER_DIVIDE_BY_ZERO_c0000094_FaultoidEx.Engine.dll!?__abi_FaultoidEx_Engine___IEngineServerPublicNonVirtuals____abi_DivideByZero
  2. Save the .cab file to a location of your choice.
  3. Launch WinDbg.exe.
  4. Click File > Open Crash Dump.

    The main WinDbg screen.
  5. In the Open Crash Dump dialog box, point to the location of the file saved in step 2 and open it.

    An example of opening a file in WinDbg.
  6. Click on File > Symbol File Path and type in the path for the symbols corresponding to the version available in the Store. Check the Reload check box and click OK.

    An example of specifying the symbol path in WinDbg.

    If you want to point to the publicly available symbols from Microsoft (for binaries other than that of your app), use the following format for the symbols path:
    Srv*;<<your symbols path here>>
    If your symbols path is c:\symbols, the equivalent path per the above guidance would be
    Srv*;c:\symbols
  7. At the prompt in the command window, type !analyze -v and press Enter.

    An example of getting a stack trace in WinDbg.

    The errors in the previous screenshot appear because the symbols for some of the DLLs are not matched. Setting the symbols path as described in step 6 reduces the number of errors you see, but you should be concerned if an error corresponds to the DLLs and EXEs in your app. If the errors and warnings are about binary files in your app, it means that the debugger was not able to find the correct symbols for your app. You should identify the correct path where the symbols are stored and add it as described in step 6.
  8. The stack trace is displayed in the command window as follows:

    An example of a stack trace in WinDbg.


You can see from the call stack that the failure was a “divide by zero” exception in a function called DivideByZero in FaultoidEx.Engine.dll. This corresponds to the failure name we saw in step 1, helping you to understand the failure and what you can do to fix it.
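If you process many downloads, the same analysis can be scripted with cdb.exe, the console debugger that ships with WinDbg in the Debugging Tools for Windows. The sketch below only builds the command line; it assumes cdb.exe is on your PATH and that it can open the file you downloaded. The helper names and the example paths are hypothetical.

```python
import subprocess


def build_analyze_command(dump_path, app_symbols_path):
    """Build a cdb.exe command line that mirrors the WinDbg steps above."""
    symbol_path = f"Srv*;{app_symbols_path}"  # symbol path format from step 6
    return [
        "cdb.exe",
        "-z", dump_path,          # open the downloaded dump
        "-y", symbol_path,        # set the symbol path
        "-c", "!analyze -v; q",   # run the analysis, then quit
    ]


def run_analysis(command):
    """Run the debugger and return its text output (requires cdb.exe)."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


command = build_analyze_command(r"c:\dumps\failure.cab", r"c:\symbols")
print(" ".join(command))
```

Calling run_analysis(command) on a machine with the debugging tools installed should return the same text that WinDbg shows in its command window.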

JavaScript exceptions

The failure rate and the list of most common JavaScript exceptions are applicable only to apps that use JavaScript. For JavaScript exceptions, you see the 15 most prevalent failures in the latest version of your app. We chose to present more JavaScript failures in the list because our telemetry data tells us that JavaScript exceptions occur more frequently than other crashes. The higher number of failures presented for JavaScript exceptions will help you improve the quality of your app in the same consistent fashion as crashes and hangs.

An example of a report that shows common JavaScript exceptions

By default, we first list the top 5 failures for JavaScript exceptions. Clicking the Show All button expands this list to show up to 15 failures.

An example of a failure name for JavaScript exceptions is as follows:

WinRT error_8007007E_msappx://Contoso.ContosoApp8wekyb3d8bbwe/ContosoApp/program.js!scenario1Run

For JavaScript exceptions, a failure name is defined as follows:

  • ErrorTypeText (WinRT error)
  • ErrorNumber (8007007E)
  • Filename_FunctionName (program.js!scenario1Run)
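A JavaScript failure name can be split the same way. This sketch assumes the error number is eight hexadecimal digits and that the file name is the last path segment of the URI, both inferred from the single example above; the function name is hypothetical.

```python
import re

# Error type text, then an 8-digit hexadecimal error number, then the
# script location. The layout is inferred from the example in this post.
JS_FAILURE_NAME = re.compile(
    r"^(?P<error_type>.+?)_(?P<error_number>[0-9a-fA-F]{8})_(?P<location>.+)$"
)


def parse_js_failure_name(name):
    """Split a JavaScript failure name into type, number, file, and function."""
    match = JS_FAILURE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized failure name: {name!r}")
    parts = match.groupdict()
    # Keep only the last path segment, e.g. "program.js!scenario1Run".
    file_and_function = parts.pop("location").rsplit("/", 1)[-1]
    parts["filename"], _, parts["function"] = file_and_function.partition("!")
    return parts


js_parts = parse_js_failure_name(
    "WinRT error_8007007E_msappx://Contoso.ContosoApp8wekyb3d8bbwe"
    "/ContosoApp/program.js!scenario1Run"
)
print(js_parts)
```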

Getting stack traces for JavaScript exceptions

You can understand the reason for the JavaScript exception associated with a failure by executing the following steps:

  1. Click the Download link next to the failure name for any failure (JavaScript exception) associated with your app.
  2. Save the .cab file to a location of your choice.
  3. The .cab file contains a file with a name starting with ErrorInfo (the ErrorInfo file). Extract the file and save it to a location of your choice.
  4. Open the ErrorInfo file from the location chosen in step 3 using Notepad.
  5. The ErrorInfo text file has the stack traces associated with the failure. Here's an example:
    An example of an ErrorInfo file.
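If you have already extracted the .cab contents to a folder, locating and reading the ErrorInfo file can be scripted. This is a minimal sketch: the sample file name and contents are placeholders, and reading the file as UTF-8 is an assumption, since the post does not state the encoding.

```python
import tempfile
from pathlib import Path


def find_error_info(extracted_dir):
    """Return the first file whose name starts with 'ErrorInfo', or None."""
    for path in sorted(Path(extracted_dir).iterdir()):
        if path.is_file() and path.name.startswith("ErrorInfo"):
            return path
    return None


# Stand-in for the folder where the .cab contents were extracted.
with tempfile.TemporaryDirectory() as extracted_dir:
    sample = Path(extracted_dir) / "ErrorInfo_sample.txt"  # hypothetical name
    sample.write_text("placeholder stack trace text", encoding="utf-8")

    error_info = find_error_info(extracted_dir)
    # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
    text = error_info.read_text(encoding="utf-8", errors="replace")
    print(error_info.name)
    print(text)
```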

In this example, the error was due to an undefined function. The call stack leading up to the failure is also in the ErrorInfo file.

Conclusion

We believe that understanding and improving quality is critical to building a successful app. We have designed the quality reports to provide you with useful and actionable data to improve your app. We are confident that these reports will help you prioritize improvements and deliver quick updates to your apps in the Store.

We look forward to hearing from you about your experiences using the quality and other reports in the analytics portal.

--Venkat