We were seeing weird issues with the Microsoft Systems Management Server (SMS) Software Update reports: sometimes a report would show a negative number of compliant machines, or a negative number of machines that had downloaded updates. The customer requirement was that the number of computers in the different compliance states should add up to the total number of computers targeted by a particular deployment, with no negative numbers :) The Software Update data for the reports came from complex SQL triggers that did things like add 1 to the count of machines in a machine's new state and subtract 1 from the count for its previous state. The actual generation and processing of state messages was all magic to us; another team owned that testing.
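To see how "+1 to the new state, -1 to the previous state" bookkeeping can drift negative, here is a minimal Python sketch. It assumes, purely for illustration, that the trigger decides which bucket to decrement from the message type alone (the hypothetical `TRANSITIONS` table) rather than from each machine's recorded state. The state names and message names are also hypothetical, not the actual SMS schema.

```python
from collections import Counter

# Hypothetical mapping: message type -> (state it is ASSUMED to leave, state it enters).
# The real SMS triggers are not shown in this post; this is an illustrative stand-in.
TRANSITIONS = {
    "DownloadStarted":   ("Pending", "Downloading"),
    "DownloadCompleted": ("Downloading", "Downloaded"),
    "Installing":        ("Downloaded", "Installing"),
    "Success":           ("Installing", "Compliant"),
}

def apply_messages(machine_count, messages):
    """Replay messages the way the hypothetical trigger would:
    subtract 1 from the assumed previous state, add 1 to the new state."""
    counts = Counter({"Pending": machine_count})
    for msg in messages:
        prev_state, new_state = TRANSITIONS[msg]
        counts[prev_state] -= 1   # Counter allows values to go below zero
        counts[new_state] += 1
    return counts

messages = [
    # machine A: every message arrives, in order
    "DownloadStarted", "DownloadCompleted", "Installing", "Success",
    # machine B: its DownloadStarted message never arrived
    "DownloadCompleted", "Installing", "Success",
]
counts = apply_messages(2, messages)
print(dict(counts))
# {'Pending': 1, 'Downloading': -1, 'Downloaded': 0, 'Installing': 0, 'Compliant': 2}
```

In this sketch a single lost message leaves one bucket at -1 and another overstated at 1, even though both machines ended up compliant.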
The challenge was that this issue showed up very infrequently, and sometimes the negative numbers had gone back to 0 or become positive by the time we got a dev to look at my test machines; the totals-not-adding-up problem was comparatively easier to reproduce. To dig deeper, we wrote SQL queries to find anomalies in the state messages arriving at the SMS server. There was our answer: many cases where the server received a Download Completed message but the Download Started message had been dropped, as well as cases where a Success message arrived before the Installing Update message. In the end, we were able to write a query that returned 0 rows when life was good; anything else meant calling the dev to investigate.
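A "returns 0 rows when life is good" check for the two anomalies above could look roughly like the following sketch, run here against an in-memory SQLite database so it is self-contained. The `state_messages` table, its columns (`machine_id`, `update_id`, `state`, `seq`), and the state strings are all hypothetical; the real SMS 2003 schema and our original query are not reproduced here.

```python
import sqlite3

# Hypothetical schema: one row per state message; seq = order the server received it.
SCHEMA = """
CREATE TABLE state_messages (
    machine_id TEXT NOT NULL,
    update_id  TEXT NOT NULL,
    state      TEXT NOT NULL,
    seq        INTEGER NOT NULL
);
"""

# One row per anomaly; an empty result means the message stream looks sane.
ANOMALY_QUERY = """
SELECT c.machine_id, c.update_id, 'missing DownloadStarted' AS problem
FROM state_messages AS c
WHERE c.state = 'DownloadCompleted'
  AND NOT EXISTS (
      SELECT 1 FROM state_messages AS s
      WHERE s.machine_id = c.machine_id
        AND s.update_id = c.update_id
        AND s.state = 'DownloadStarted')
UNION ALL
-- Success received before the matching Installing message
SELECT ok.machine_id, ok.update_id, 'Success before Installing'
FROM state_messages AS ok
JOIN state_messages AS inst
  ON inst.machine_id = ok.machine_id
 AND inst.update_id = ok.update_id
 AND inst.state = 'Installing'
WHERE ok.state = 'Success'
  AND ok.seq < inst.seq
ORDER BY 1, 2;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO state_messages VALUES (?, ?, ?, ?)",
    [
        # machine A: clean sequence
        ("A", "KB1", "DownloadStarted", 1),
        ("A", "KB1", "DownloadCompleted", 2),
        ("A", "KB1", "Installing", 3),
        ("A", "KB1", "Success", 4),
        # machine B: DownloadStarted dropped, Success arrived before Installing
        ("B", "KB1", "DownloadCompleted", 5),
        ("B", "KB1", "Success", 6),
        ("B", "KB1", "Installing", 7),
    ],
)
anomalies = conn.execute(ANOMALY_QUERY).fetchall()
print(sorted(anomalies))
# [('B', 'KB1', 'Success before Installing'), ('B', 'KB1', 'missing DownloadStarted')]
```

Each half of the `UNION ALL` encodes one ordering rule, so adding a new check for another message pair is just one more `SELECT` appended to the query.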
As it turned out, the problem was not specific to our feature. The underlying infrastructure bugs were found and fixed, and that made the SMS 2003 reports ready for end users.
Do you have a bug whose story you love to tell? Let me know!