The third and fourth posts in my series on “Principles” outline a system for shipping software that takes into account the lessons I have learned during my years as a manager at Microsoft, dating back to the first product I worked on – Windows XP. This post lists a set of guiding engineering principles that result in stable, high-quality releases delivered on time.
Before we jump into the principles, I want to call out a few things:
For a team to be successful, everybody on the team needs to (at all times) understand “the plan” for the release and his/her role in its various cycles and milestones. The litmus test here is asking a random person on the team what “the plan” (schedule, owners, feature set) is.
A common theme during software releases is a fire-drill towards the end of the release, caused by “biting off more than we can chew” and by managing the deliverables in an open-ended, “agile” fashion. One way to overcome this is to deliver features in smaller “sets”, on a more frequent cadence. Every “feature set” represents a meaningful collection, which gets developed and stabilized together before proceeding to the next set.
Thus, we end up with a number of “feature set development cycles”, producing stable sets that, integrated together, make up our release. I like to use the metaphor of “putting features in the freezer”, i.e. we start working on a new feature set only when we have gotten the current feature set to high, shippable quality.
This sounds like a common sense “must do”, but it is regularly not done sufficiently well. The release criteria generally fall into two groups: (a) RI release criteria, necessary to RI (reverse integrate) a feature into the main product branch and (b) shipping release criteria, necessary to release a feature or a set of features. We need to have the complete list of both before we check in a single line of code. That helps us understand the true cost of every feature.
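As a rough illustration of the rule above, here is a minimal Python sketch. The `Feature` class, its field names, and the sample criteria are all hypothetical, made up for this example. The check can only confirm that both lists have been written down, not that they are complete; completeness still takes human review.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical record of a feature and its two groups of release criteria."""
    name: str
    # (a) Criteria needed to reverse integrate (RI) into the main product branch.
    ri_criteria: List[str] = field(default_factory=list)
    # (b) Criteria needed to actually ship the feature.
    shipping_criteria: List[str] = field(default_factory=list)

    def ready_for_first_checkin(self) -> bool:
        # Both lists must exist before a single line of code is checked in.
        return bool(self.ri_criteria) and bool(self.shipping_criteria)


if __name__ == "__main__":
    feature = Feature(
        name="example-feature",
        ri_criteria=["unit tests pass", "static analysis clean"],
        shipping_criteria=[],  # not defined yet
    )
    print(feature.ready_for_first_checkin())
```

Keeping the two groups as separate lists makes the cost of a feature visible up front: every entry is work that has to be done before the feature can RI or ship.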
A lot of our effort is spent on generating and fixing defects that could have been prevented in the first place. We need to make heavy use of all available tools to keep as many defects as possible from entering the product.
One example of such defect prevention is enabling tools such as FxCop in the build environment by default.
Another common challenge in software releases is the lack of a stable, transparent, predictable schedule. Our product schedule mimics our features. It starts as a very “low-quality” feature, progressively solidifying throughout the release, as a result of “we are not on track” realizations.
I have heard claims that the schedule of the parent organization (division, company, etc.) is too “fluxy” which in turn affects the ability of the team to build a predictable, stable and transparent schedule. Even if we assume that this claim is true (it probably is, but it should not be – large companies and divisions prove that they can ship with a stable schedule), the local team still has the ability to be in control of its own fate by defining stable sub-schedules to combat randomization from the larger organization.
Having a stable, transparent, fully-internalized and fully-embraced schedule is a key ingredient in the success of any product, big or small.
PM spends the first milestone in the product cycle (MQ – the quality milestone) on defining the “what” for the release. Dev and Test complete engineering debt work, preparing the engineering system for execution. While there may be limited prototype code generated during this milestone, none of this code is allowed to be reverse-integrated into any official branch. Dev and Test should spend the majority of their time stabilizing the engineering system (builds, static analysis tools, unit tests, functional tests and infra, stress tests and infra, performance tests and infra, meaningful reverse-integration tests to prevent other teams from breaking us, etc.). Having a stable factory ensures a stable product.
Dev and Test start working together on the features in M0 (not earlier) once the “what” has been defined, scrapping any previously generated code.
Another common problem teams face is Setup instability that typically carries on well into the release. Not having a stable Setup significantly complicates lab operations and prevents the team from self-hosting the build and discovering issues early, so it is something that needs to be addressed.
Setup must always be stable and must visibly display the important metrics that we are optimizing against. Metrics such as speed (of setup), number of custom setup actions, etc. should be visible to the whole team from the start.
If we believe that reverse- and forward-integration (RI and FI) gates have a valuable role in ensuring the health of the team branch, then we must ensure that the gates that we have in place are reasonable and that they are strictly adhered to. This immensely simplifies feature development, by defining a simple “rule” for RI-ing and FI-ing code that can be easily understood and embraced by the team.
No code changes will be allowed in the team branch without fully operational test and validation systems (unit tests passing, test infra running, functional tests at 99.5% pass rate, Perf lab running, Stress lab running, Code Coverage generating results).
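To make the gate concrete, here is a minimal Python sketch of the rule as a single check. It assumes the six conditions listed above can each be reported as a simple value. The names `BranchHealth`, `gate_failures` and `checkins_allowed` are hypothetical, and reading “99.5% pass rate” as a minimum threshold is my assumption.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "functional tests at 99.5% pass rate" is read as a minimum.
MIN_FUNCTIONAL_PASS_RATE = 0.995


@dataclass(frozen=True)
class BranchHealth:
    """Hypothetical snapshot of the team branch's test and validation systems."""
    unit_tests_passing: bool
    test_infra_running: bool
    functional_pass_rate: float  # fraction between 0.0 and 1.0
    perf_lab_running: bool
    stress_lab_running: bool
    code_coverage_reporting: bool


def gate_failures(health: BranchHealth) -> List[str]:
    """Return the reasons the branch is closed; an empty list means it is open."""
    checks = [
        (health.unit_tests_passing, "unit tests are not passing"),
        (health.test_infra_running, "test infra is not running"),
        (health.functional_pass_rate >= MIN_FUNCTIONAL_PASS_RATE,
         "functional pass rate is below 99.5%"),
        (health.perf_lab_running, "perf lab is not running"),
        (health.stress_lab_running, "stress lab is not running"),
        (health.code_coverage_reporting, "code coverage is not generating results"),
    ]
    return [reason for ok, reason in checks if not ok]


def checkins_allowed(health: BranchHealth) -> bool:
    """The simple rule: no code changes unless every system is operational."""
    return not gate_failures(health)


if __name__ == "__main__":
    today = BranchHealth(True, True, 0.997, True, False, True)
    print(checkins_allowed(today))
    print(gate_failures(today))
```

Returning the list of reasons, rather than a bare yes or no, keeps the rule transparent: anyone blocked by the gate can see exactly which system has to be fixed first.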
A company or a division ships a successful product by getting all teams to adhere to the same fundamental schedule and gates. A development organization can and should do the same. Having a single schedule with a single set of expectations for all teams simplifies management of the release and creates synergy between the feature owners.
Some claim that “agility” is achieved by having different feature teams work against different schedules. My experience is that such pseudo-agility results in a fragmented organization that is confused about its own deliverables and that delivers low-quality products, because the products are not built with the same rhythm. Management of the product deliverables in such situations often degrades to “managing exceptions” as opposed to “managing by the rules”.
We can and will achieve agility through the power of contained, incremental milestones, not through randomization.
All teams in the development organization should start together, work together and finish together, adhering to a common schedule, RI gates and exit criteria. All disciplines (Dev, Test, PM) should do that too, working side by side as opposed to serializing work.
“Out-of-band” releases and “off-the-books” investments are good and necessary, but they must be planned like all other work and managed within the constraints of the main development cycle.
This means that they never appear randomly in the middle of a development cycle (they can appear on the backlog to be considered for the next dev cycle). This includes dev work, test work, prototypes, forum engagement, conference prep work, etc.
Every major system on the team (“system” being a generic name for anything that needs an owner) has clearly defined and documented ownership. This includes, but is not limited to, fundamentals, lab systems, build systems, test systems, infrastructures, etc.
AGILE is a magic word in our industry these days, and in my opinion it gets thrown around somewhat lightly. Some people hide behind the term AGILITY to justify the lack of basic structure, basic process, and basic planning adequacy in their organization. Invariably such pseudo-agility leads to constant fire-drills, regular death marches, and the eventual burnout of the team.
It is my strong belief that every project and team needs basic “scaffolding”. Larger projects and teams tend to need a bit more of that, at least until the scaffolding gets transformed into “team culture”.
Clearly, the team leaders have to be accountable for providing that basic scaffolding. Success is easy to define – if you have a happy, productive organization that delivers high-quality products on time, then you have provided the necessary basic structure.
I am of course aware of the goals of the agile movement, which I very much support. The best way to achieve agility, in my opinion, is by sizing your development cycle appropriately – a technique that I will demonstrate in my next post.