Hang Bucketing, v1

On Windows XP, hangs have it rough.  Like a younger sibling, error reporting for hangs has to wear the hand-me-down clothes of crash reporting: it piggybacks on the same 5 fixed bucketing parameters used by crash reporting.  With a hang, however, there is no exception context and so no faulting instruction, which means there is no module name, module version, or module offset.  So on XP, hangs really have only 2 effective bucketing parameters (application name and application version).  The other three are filled with placeholders: the module name is "hungapp", the module version is "0.0.0.0", and the offset is 0.
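To make the effect concrete, here is a minimal sketch of the five-parameter bucket key described above.  The `Bucket` type, its field names, the `hang_bucket` helper, and the sample application name and version are all hypothetical illustrations, not the actual Windows Error Reporting data structures.

```python
from typing import NamedTuple


class Bucket(NamedTuple):
    # Hypothetical model of the five fixed XP bucketing parameters.
    app_name: str
    app_version: str
    module_name: str
    module_version: str
    offset: int


def hang_bucket(app_name: str, app_version: str) -> Bucket:
    # A hang has no faulting instruction, so the three module fields
    # carry the fixed placeholder values instead of real data.
    return Bucket(app_name, app_version, "hungapp", "0.0.0.0", 0)


# Two unrelated hangs in the same application version (sample values
# are made up) produce an identical key, so they share one bucket.
first = hang_bucket("explorer.exe", "6.0.0.0")
second = hang_bucket("explorer.exe", "6.0.0.0")
print(first == second)
```

Only the first two fields can ever differ between hang reports, which is why the key cannot tell one hang bug from another.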

[Image: hungapp box, explorer]

This means all hangs for a particular version of an application end up in a single bucket.  Knowing this, you might correctly guess that these buckets have very high hit counts when compared with crash buckets for the same app name/version (though some corruption buckets are not far behind).  When someone runs a generalized query such as "what are my highest hitting XP buckets this week?", the top of the results is often littered with hang buckets.  A naive conclusion would be "Oh my! Hangs are a huge problem and we must focus all our effort to eradicate them!"  While hangs are indeed a big problem, arriving at this conclusion based on bucket hit volume is incorrect, because a hang bucket aggregates every hang for that app version while crashes are spread across many buckets.  Also, when the actual failures contained in these hang buckets are studied, it quickly becomes apparent that each bucket comprises many bugs, not the ideal 1 (or few).  This is a big reason development teams historically struggled to make progress with hang bugs: the data was difficult to sort and measure.
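The skew can be seen in a small sketch of a "highest hitting buckets" query.  The events, module names, versions, offsets, and bug labels below are invented purely for illustration; real reports do not come labeled with their underlying bug.

```python
from collections import Counter

# Made-up events: (bucket key, underlying bug).  The bug label is an
# assumption added for illustration only.
events = [
    (("app.exe", "1.0", "hungapp", "0.0.0.0", 0), "bug A"),
    (("app.exe", "1.0", "hungapp", "0.0.0.0", 0), "bug B"),
    (("app.exe", "1.0", "hungapp", "0.0.0.0", 0), "bug C"),
    (("app.exe", "1.0", "foo.dll", "1.2.0.0", 0x1A2B), "bug D"),
    (("app.exe", "1.0", "bar.dll", "3.1.0.0", 0x0042), "bug E"),
]

# Hit count per bucket, as a generalized "top buckets" query would see it.
hits = Counter(bucket for bucket, _ in events)

# Distinct underlying bugs hiding inside each bucket.
bugs_per_bucket = {}
for bucket, bug in events:
    bugs_per_bucket.setdefault(bucket, set()).add(bug)

top_bucket, top_hits = hits.most_common(1)[0]
print(top_bucket[2], top_hits, len(bugs_per_bucket[top_bucket]))
```

In this toy data the hang bucket tops the list on hit count alone, yet it holds three separate bugs, while each crash bucket maps to one.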
