Crashes Suck

In the beginning, we needed a way to close the loop with our customers in order to ease the pain felt from software defects (bugs) that caused crashes.  A simple client service was produced that collected crash dumps from Windows desktops and sent them back to Microsoft for analysis. The crash dumps were debugged, bugs exposed, fixes were made, problems went away. Lather, rinse, repeat.  Note: This is how user mode error reporting began.  Interestingly, kernel mode error reporting (bluescreens) had its origin in a roughly parallel timeframe (actually a little earlier) but it began in a completely different group at Microsoft. For the rest of this series of posts I’m talking about the user mode side of WER.

Buckets

An early design assumption was that this client had zero analysis ability and so we needed a simple and deterministic way to organize the data that would allow both the client and the backend servers at Microsoft to speak to each other in a common language.  Oh, and the solution needed to be (*ahem*) scalable. That common language is the unique combination of event parameters which give us the Event ID. Internally at Microsoft these are actually known as bucket IDs – or just buckets.  It’s more natural for me to talk about buckets and I’ll try to be consistent, but just in case, know that “bucket” and “event” are mostly interchangeable in the context of error reporting.

Originally, plain old crashes were bucketed using 5 parameters:

  1. The application’s name
  2. The application’s version
  3. The name of the module containing the instruction that causes the crash exception
  4. The version of that module
  5. The byte offset into that module where that instruction resides

[Image: crash event with hits]

The idea was that crashes with the same set of parameters were caused by the same bug. It turns out that this isn’t always true (we’ll save that for a later post), but in general it works pretty well. Today, this “client” is built as a core component of Windows and has become generally referred to as Windows Error Reporting (WER) Services. In Vista we added 3 more parameters to plain old crash reporting: file link timestamps for both the application and the module (to try and avoid some issues of naming collisions) and the exception code. And by the way, the client itself has become smarter and in fact does some cursory analysis when deriving bucketing parameters.
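To make the bucketing idea concrete, here is a minimal Python sketch. It is an illustration only, not how WER is implemented: the class name, field names, and sample values are all hypothetical. It treats the parameter set as a key, so two reports with identical parameters land in the same bucket and count as two hits.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrashBucket:
    """Hypothetical bucket key, not the real WER data structure.

    The first five fields are the original crash parameters. The last
    three were added in Vista and stay None when a client omits them.
    """
    app_name: str
    app_version: str
    module_name: str
    module_version: str
    offset: int
    app_timestamp: Optional[int] = None
    module_timestamp: Optional[int] = None
    exception_code: Optional[int] = None


def count_hits(reports):
    """Group reports by their parameter set and count hits per bucket."""
    return Counter(reports)


# Made-up sample reports: two share every parameter, one differs by offset.
reports = [
    CrashBucket("explorer.exe", "6.0.2900.2180", "shell32.dll", "6.0.2900.2180", 0x1A2B),
    CrashBucket("explorer.exe", "6.0.2900.2180", "shell32.dll", "6.0.2900.2180", 0x1A2B),
    CrashBucket("explorer.exe", "6.0.2900.2180", "shell32.dll", "6.0.2900.2180", 0x3C4D),
]
hits = count_hits(reports)

for bucket, count in hits.most_common():
    print(f"{bucket.module_name}+{bucket.offset:#x}: {count} hit(s)")
```

Because the key is nothing more than the tuple of parameters, the client and the backend can agree on a bucket without either side doing any analysis, which is the property the original design was after.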

Notice that you can tell the explorer crash to the right is from Windows XP because those extra 3 parameters don’t have data.
