The most interesting recent bug was a Vista upgrade error that was initially assigned to my component. We had a couple of Watson reports whose symptom was an error logged during upgrade, which caused a fatal error and a rollback. Unfortunately, the reports contained no further information, and none of the machines hitting the problem were attached to a kernel debugger, so we had no way to investigate further. This was happening just as Vista SP1 was getting ready to ship, so it caused our team a great deal of consternation, even though we were convinced the issue was not in our code!

Over the holidays I borrowed a computer from someone in another part of campus who had a repro, but the problem would not reproduce in my office under a debugger. In the end it came down to me doing upgrade after upgrade of Vista to SP1 in my office, desperately seeking a repro. That is the unglamorous side of testing, to say the least. Eventually I hit the issue, and once our team could really start the investigation, the bug unraveled quite quickly. It turned out to be a bug in setup. Coincidentally, right after we "root caused" the bug, a timing change made it more likely to occur; it turned out it would have caused something like 3-4% of Vista upgrades to fail. When you're talking about Windows, that would have been a huge number of affected customers! (Of course, we would never have shipped without a fix.)

On the lighter side, we got a "good job" email from a partner who hit the bug. The mail thanked the teams involved for providing a private fix to "turn around" the issue less than 24 hours after he hit it. It was nice to hear, but he was not aware of the weeks of work we had put into tracking down the repro; from his side, it just looked really fast. :)

-- Tom Whalen


Do you have a bug whose story you love to tell? Let me know!