Fix vs Change
It has been a painful week for checkins.
We have a complicated system for checking in our code. There are hundreds of devs in our division and tens of thousands of source files in our product. Different groups have different cultures: one may have a "death before dishonor" ethic and would not tolerate a member checking in a change that breaks the daily build, while another may have more of a "let's try this..." attitude. Thus, we have policies for code reviews, verification, and checkin.
This week we lost some hardware used for buddy testing, and a couple of obscure bugs got checked in that didn't break the suites on one machine but did on another. Together these made the normally tedious task of checking in unusually painful. And so the suggestion arose to change our process to improve efficiency. A noble ambition, but one with drawbacks.
How do you go about improving a system's efficiency when you don't have any baseline data, or any tests to determine the effects of your changes? We have several anecdotes in circulation this week, but what use are anecdotes for evaluation? Worse, it seems to me that the metrics we'd need would be painful to collect. Comparing time spent coding to time spent checking in would require substantial effort on each dev's part. I keep three computers busy most of the time. Seven, occasionally. What part of that is actual checkin time? And who wants management scrutinizing their hours spent coding, if they're not already so blessed? (This is a valuable metric by itself, but streamlining the checkin process does not justify the cultural changes it takes to acquire this kind of info.)
It occurred to me that sampling might be helpful, but it still involves near spyware on each dev's computer. While trust runs pretty high here, these kinds of systems are sensitive to the slightest abuse, or rumor of an abuse. Absolute figures (checkin time on its own, rather than relative to coding time) would be simpler to collect, but it's still difficult to measure the actual amount of human engagement in a process which might consist of a significant period of syncing and building, then a script running a batch of suites, and then investigating and correcting build or suite failures. As the build and run phases are time consuming, devs tend to kick them off, go do something else, and check back before and after lunch, so raw elapsed-time numbers are useless. True, there could be buttons to press indicating "I've begun dealing with non-routine problems" and "I'm returning to doing routine tasks," but I'd imagine such buttons would get pressed very little. The benefits and burdens don't match up, and I've found that such systems are no more reliable than self-selected survey samples.
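To make the "raw numbers are useless" point concrete, here is a toy sketch in Python. Everything in it is invented for illustration: the phase names, the timestamps, and especially the `unattended` flag, which is exactly the piece of information I don't know how to collect honestly. It is not a description of our actual tooling.

```python
from datetime import datetime, timedelta


def t(hhmm):
    """Parse an 'HH:MM' string into a datetime (date part is irrelevant here)."""
    return datetime.strptime(hhmm, "%H:%M")


# Hypothetical log of one checkin attempt: (phase, start, end, unattended).
# The 'unattended' flag is an assumption; in reality nobody records it.
PHASES = [
    ("sync_and_build",      t("09:00"), t("10:30"), True),
    ("run_suites",          t("10:30"), t("13:15"), True),   # spans lunch
    ("investigate_failure", t("13:15"), t("13:55"), False),
]


def wall_clock(phases):
    """Elapsed time from the first phase starting to the last phase ending."""
    first_start = min(start for _, start, _, _ in phases)
    last_end = max(end for _, _, end, _ in phases)
    return last_end - first_start


def attended(phases):
    """Total time in phases where a human was actually engaged."""
    return sum(
        (end - start for _, start, end, unattended in phases if not unattended),
        timedelta(),
    )


if __name__ == "__main__":
    print("wall clock:", wall_clock(PHASES))
    print("attended:  ", attended(PHASES))
```

With these made-up numbers the wall-clock figure is nearly five hours while the attended figure is forty minutes. The arithmetic is trivial; the hard part is that the second number depends entirely on a flag somebody would have to set by hand.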
It's important to keep the infrastructure running efficiently, and increasing the team's productivity while reducing its level of frustration is a wonderful goal. At the moment, however, I'm stumped when it comes to collecting useful metrics.