We’ve been driving our bug counts down over the past few months, with an eye on what we call ZBB (Zero Bug Bounce). We "went dark" a few weeks back, which had a significant impact on the overall bug count and really improved the quality of the product. We’ve fixed a tremendous number of bugs and have only postponed a small percentage. It is typical to postpone some set of bugs to the next milestone or product release. At this point, the postponed bugs are primarily suggestions, along the lines of "It would be great if, when I type [xyz], the product would …"
"Going dark" is an interesting concept. We encourage teams to ignore the daily distractions that we sometimes simply take for granted. We asked people to stop holding recurring weekly status meetings. We cancelled our centralized ship room. We brought in dinners and coffee carts. People only checked email once or twice a day. The result is that we made great progress. So you have to ask yourself, why don’t we do this all the time? I think there is a balance between ignoring everything that goes on around you and being an active participant in work life. Eventually, you need to stop and look around to see what is going on in the group, the division, and the company. We can sustain these pushes for periods of time, but eventually we need to come up and see where we are.
After the bug push, I held reviews with all of the product teams within the Division. There are 20 teams, and I met with each for one hour, packed into two straight days. My overall goal was to see if we were on track for hitting ZBB and then executing on the Security Push. I make the review topics fairly open ended, so that I can get an idea of what teams are really thinking, rather than asking them to fill out a table of data. It is a balance among the efficiency of reporting, the efficiency of having the whole organization follow the same types of metrics, and getting an understanding of the real analysis that goes on when teams review their status. In the world we are driving for, every team tracks the same metrics and interprets them the same way. Centralized reporting of these metrics would give us an accurate picture of our state and projected dates, and we would not have to bring people through reviews. We’re not there yet. There is still a lot of understanding that needs to happen on a team-by-team basis, and you have to dig into the "why" behind the projections to really get a sense of where you stand.
Here are the questions I ask each team about their progress towards ZBB (this is from the PowerPoint slide):
I look at the projections and ask a variety of questions depending on what the data looks like. Why did you have a spike in incoming rates 2 weeks ago, and do we expect to see that again? If you have a weekly net reduction rate which does not track to the ZBB date, what are you going to do differently to get there? The data and answers to these questions paint the overall picture. They let me identify major issues and clue us in on potential efficiencies.
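To make the "does the net reduction rate track to the ZBB date" question concrete, here is a minimal sketch of the arithmetic. It assumes a simple linear model in which the incoming and fix rates stay constant from week to week, which real data rarely does. The function and parameter names are mine, for illustration only, and are not taken from any tool we use.

```python
import math
from typing import Optional


def weeks_to_zero(open_bugs: int, weekly_incoming: float, weekly_fixed: float) -> Optional[int]:
    """Whole weeks until the open bug count reaches zero.

    Assumes constant weekly rates (a simplification).
    Returns None when the count is flat or growing.
    """
    if open_bugs <= 0:
        return 0
    net_reduction = weekly_fixed - weekly_incoming  # bugs removed per week, net
    if net_reduction <= 0:
        return None
    return math.ceil(open_bugs / net_reduction)


def on_track_for_zbb(open_bugs: int, weekly_incoming: float,
                     weekly_fixed: float, weeks_until_zbb: int) -> bool:
    """True if the projected zero date falls on or before the ZBB date."""
    weeks = weeks_to_zero(open_bugs, weekly_incoming, weekly_fixed)
    return weeks is not None and weeks <= weeks_until_zbb


# Illustrative numbers only: 120 open bugs, 30 incoming and 50 fixed per week.
print(weeks_to_zero(120, 30, 50))
print(on_track_for_zbb(120, 30, 50, weeks_until_zbb=5))
```

A team that comes back "not on track" here has the same choices discussed in the reviews: raise the fix rate, understand and reduce the incoming rate, or postpone bugs.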
Once I knew where teams expected to end up on bugs, I wanted to understand how they were doing on their security work. I first asked teams to cost the work that needed to be done before our Security Push. The largest effort on the code is to run a set of automated tools to scan sources and flag potential issues. The tools can produce false positives, and there are many cases where one fix clears up many issues. So, a team may get 1,000 issues but only end up fixing 100 actual problems. The trick then is to look at the issues flagged and get an idea of what the actual work is to "get clean," as I like to call it. I then asked teams to tell me how long it was going to take them to execute successfully during the Security Push. How many weeks is it going to take them to hit the exit criteria, and what do those weeks look like?
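The gap between issues flagged and actual fixes can be sketched as a simple triage count. This is only an illustration: it assumes each flagged issue has already been triaged by hand and given a root-cause label and a false-positive flag. The record type and field names are hypothetical and do not come from our scanning tools.

```python
from typing import Iterable, NamedTuple


class FlaggedIssue(NamedTuple):
    issue_id: int
    root_cause: str       # hypothetical label assigned during triage
    false_positive: bool  # True if triage decided the tool was wrong


def count_actual_fixes(issues: Iterable[FlaggedIssue]) -> int:
    """Count distinct root causes among issues that are not false positives.

    One fix per root cause is assumed to clear every issue sharing that cause.
    """
    return len({issue.root_cause for issue in issues if not issue.false_positive})


# Illustrative data only: five flagged issues that need two real fixes.
flagged = [
    FlaggedIssue(1, "unchecked-buffer-in-parser", False),
    FlaggedIssue(2, "unchecked-buffer-in-parser", False),
    FlaggedIssue(3, "unvalidated-path", False),
    FlaggedIssue(4, "tool-noise", True),
    FlaggedIssue(5, "tool-noise", True),
]
print(count_actual_fixes(flagged))
```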
The next piece of the puzzle is testing. While the development teams are trying to drive the bug counts down to zero, the test teams are trying to drive the bug counts up. At various times during the milestone and product cycle, the teams will do a full test pass on the product. Understanding how far along teams are during the test pass gives us an idea of what the incoming rates will do. It also gives us an idea of what we call the automation coverage: how many of the tests are fully automated, such that we can essentially press a button and run them. The more automated we are, the shorter the test passes, and the better our defense against regressions.
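Automation coverage as described here is just a ratio, and a minimal sketch of it looks like the following. The function name is mine, and the numbers are made up for illustration. It counts a test as automated only if it runs with no manual steps.

```python
def automation_coverage(automated_tests: int, total_tests: int) -> float:
    """Fraction of tests in the pass that are fully automated (0.0 to 1.0)."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= automated_tests <= total_tests:
        raise ValueError("automated_tests must be between 0 and total_tests")
    return automated_tests / total_tests


# Illustrative numbers only: 150 of 200 tests run at the press of a button.
print(f"{automation_coverage(150, 200):.0%}")
```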
Here are some of the questions I asked about the test pass:
Fast forward two days. I had a picture of what the work was to hit ZBB and then execute on our security efforts. Overall, we were on track, but we had two problems that needed to be solved. First, most teams said they could hit ZBB on time, but were going to do the work to "get clean" on the issues the automated tools flagged after ZBB, during the first week of the Security Push. Second, some teams needed to push more bugs to the next milestone than I was comfortable with. I guess you could say those are the teams that were not on track to hit ZBB with the right level of quality for Beta 2.
In the end, we decided that we could not declare ZBB until we resolved the issues driven from the static analysis tools and that we needed to fix a few more bugs before shipping Beta 2. That sounds like a slip, but since teams were doing the work to get clean during a buffer period anyway, the adjustment in ZBB simply changes the date on which we said we were done. The teams that needed to fix more bugs are able to take on those bugs during that buffer as well. As a result, our internal target for shipping Beta 2 didn’t have to change. It is still in the first quarter of CY05, and I’ll get more specific on that date when it gets closer.