Rather than following up on the other issues I’ve encountered while rolling out Scrum, I’m going to focus on something I think we do right: finding our mistakes, fixing them, and fixing what caused them.

On our team, the biggest thing that we do that isn’t Scrum-like revolves around the handling of high-priority defects. We categorize all code defects into severity 1 (crashing or data loss), severity 2 (major loss of functionality), or severity 3 (persnickety details). A Scrum Master is supposed to deflect all work that wasn’t committed to by the team during the Sprint Planning Meeting. In this case, though, I make an exception. If we find a severity 1 bug, all other work for the owner is put on hold. We stop, identify the problem, hold a 15-minute brainstorm to find out if there’s a root cause hiding other bugs or indicating other missing work, and then get a fix in. If other work was identified, it’s put onto the Product Backlog and usually prioritized high for the next Sprint. Severity 2 and 3 defects are prioritized at the top of the work to do for the next Sprint, above any new feature work.
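For readers who think in code, the triage rule above can be sketched roughly as follows. This is only an illustration of the policy, not a tool we actually run: the names `Severity`, `Defect`, `triage`, and `plan_next_sprint` are all made up for the example, and ordering severity 2 ahead of severity 3 within the Sprint is an assumption the policy itself doesn't state.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    CRASH_OR_DATA_LOSS = 1        # severity 1
    MAJOR_FUNCTIONALITY_LOSS = 2  # severity 2
    MINOR_DETAIL = 3              # severity 3


@dataclass
class Defect:
    title: str
    severity: Severity


def triage(defect: Defect) -> str:
    """Decide what happens to a newly found defect (hypothetical helper)."""
    if defect.severity == Severity.CRASH_OR_DATA_LOSS:
        # The owner's other work is put on hold until a fix is in.
        return "stop_and_fix_now"
    # Severity 2 and 3 wait for the next Sprint, ahead of feature work.
    return "top_of_next_sprint"


def plan_next_sprint(open_defects, feature_work):
    """Order next Sprint's work: deferred defects first, then new features."""
    deferred = [d for d in open_defects
                if d.severity != Severity.CRASH_OR_DATA_LOSS]
    # Assumption: more severe defects come first among the deferred ones.
    deferred.sort(key=lambda d: d.severity)
    return [d.title for d in deferred] + list(feature_work)
```

Any follow-up work uncovered during the 15-minute brainstorm would simply be added to the backlog as its own item; the sketch leaves that part out.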

The problem with not fixing bugs early
Everyone likes to quote statistics about the cost savings from fixing defects in earlier stages rather than later ones, but that’s just one part. Not fixing bugs early has two bigger issues: bugs represent work that must be completed before you can ship, and bugs often hide larger issues. Since bugs represent work that you have to do but whose completion doesn’t demo well to upper management, the urge is usually to postpone them until a later stabilization phase. Doing that unfortunately increases the amount of “work in progress”. If a feature isn’t done until it’s at ship quality, and you’re not fixing bugs in any of your feature areas, you won’t be able to handle requests to ship part of your functionality early or to redirect investments. Even though you haven’t officially planned work for people multiple sprints out (that would be un-Agile!), you actually have committed to finishing up those features, which will take future investment.

Hiding larger issues frequently happens with severity 1 and severity 2 defects. First, since they both break large pieces of functionality, they tend to take entire sets of testable scenarios offline, rather than just a single piece of narrow functionality. If there are other issues downstream that are obscured by that defect, they won’t be identifiable until after the defect is fixed, prolonging the amount of time until a feature returns to ship quality. Additionally, high-severity defects are usually good indicators of a meta-defect: an issue with the way you’re going about work. Did you forget to handle an entire set of test cases? Is the feature’s implementation overly complex and going to be a bug farm? Is the area not architected well for testability, so there’s a drag affecting development of automated suites to catch regressions? Are we just getting sloppy in our code reviews? They’re all great questions to ask, and I’d heartily encourage spending 15 minutes with representatives of development, test, and management to chat about it – not to finger-point, but to find out possible places where we’re failing to excel.

Why we also try to find them early – or why “no bugs found!” isn’t good enough
You’re running an agile process, you’ve got a customer involved, they’re writing acceptance tests (or helping define them…), you’re doing TDD, and there are no bugs in the bug tracking system. Things are great – ship it! Or should you be skeptical?

Always be skeptical when the metrics read zero or low numbers for bug counts. Have you been skimping on test resources, and “agilely” repurposing your generalizing specialists to do more development work? Or maybe your customer is completely narrow-scenario focused, and you’re not doing any of the broad testing that will accurately represent the way your product will be used in the wild. Most likely – especially if you’ve put together a new team – there’s an area like accessibility, security, globalization, long-haul stress, performance, or heavy usage in real-world scenarios that isn’t being looked at. As a project manager, push hard, and even look yourself – get people to do ad-hoc manual testing in the form of bug bashes with meaningful prizes, dogfood the product to do real work, or just shuffle around some of your people so that there’s somebody with dedicated time to look at the product from the customer’s perspective.

If it’s early in the product cycle and you’re finding absolutely zero defects, you aren’t looking hard enough. Take the time to look because defects have huge schedule impact, and solving meta-defects earlier will prevent them from applying to your whole codebase.

The assessment of how it’s working out…
Currently, we’re optimistic. We’ve been very happy with our deliveries, and so have our partners. The bugs we do find are harder to discover than bugs typically are at this early stage of product development, and they’re also easier to address. Of course, the real test is still to come - we’ll be shipping the first beta of our work to external customers at the Professional Developers Conference, and even there it’ll be hiding within the shell of a larger outer application. Customers are the ultimate judge of quality, and we’re looking forward to finally getting real feedback!

The lessons to take away: jidoka (stopping the process to build in quality) isn’t just for Toyota manufacturing plants. Empower your people to find and fix defects that meet your criticality bar. And if they aren’t finding defects, look harder.

Shameless plug
Want to work on the coolest team in Microsoft for a manager who isn’t afraid to admit his mistakes and make changes? My team’s continuing to grow, and I now have two openings for user interface framework developers, one for a core systems developer, and one for a developer in test. Contact me if you’re interested in more information!