Everything you want to know about Visual Studio ALM and Farming
Brian Harry is a Microsoft Technical Fellow working as the Product Unit Manager for Team Foundation Server. Learn more about Brian.
I've seen a lot of people looking for more information on Team Project organization and branching. The Version Control PMs are putting together some good whitepapers on it. Recently, however, there have been some questions about how we do branching at Microsoft so I figured I'd wade into that in my blog. I'm most familiar with what the Developer Division does so I'll stick primarily to that - it's pretty close to what Windows does but Office is pretty different and I don't know much about what SQL or MSN do.
To understand why we do what we do, you need to understand the problem. We have a lot of developers collaborating on Visual Studio. I'd estimate that we have on the order of 800 developers (I'm sure that's not right but it's within a factor of 2 and probably a good deal closer than that). Note, I'm just talking about developers, not testers, program managers, User education, localization, etc.
This has some serious ramifications. Let's imagine for a minute that any given developer has a brain fart and checks in a build break, serious bug, etc. once every 6 months (which in my opinion would be a team of 800 of the best developers I've ever met). There are about 250 working days in a year and simple math says 800 * 2 / 250 > 6 serious bugs checked in every single working day. The fact is it's higher than that. Of course we do unit testing, code reviews and the like to reduce bugs that make it into the system but it's simply not possible to eliminate them all. The ramification of this is that (if everyone is checking into the same tree) the build is broken frequently and the daily tests rarely pass. In fact the product won’t even start half (an exaggeration) the time.
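For anyone who wants to plug in their own team size, here is that back-of-envelope math as a few lines of Python. The numbers are the rough estimates from above, not measured data, and the variable names are just for illustration.

```python
# Back-of-envelope estimate of serious bad checkins per working day.
# All inputs are the rough guesses from the post, not measurements.
developers = 800               # estimated developer headcount
breaks_per_dev_per_year = 2    # one serious mistake every 6 months
working_days_per_year = 250

breaks_per_day = developers * breaks_per_dev_per_year / working_days_per_year
print(f"{breaks_per_day:.1f} serious breaks per working day")  # prints 6.4
```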
This is where we were when we shipped VS.NET. The org grew a lot in the late 90's as we got more and more ambitious with what we were trying to accomplish and it took a while for our processes to catch up. This led to the system that I will now describe - the one we used to ship VS2005. It's worth noting we are in the process of switching to a slightly different system for the next version. I'll describe it briefly and some of the motivations for it but as it is unimplemented and unproven it's hard to see how we will tune it so I won't spend too much time on it.
Virtual Build Labs
To make the division more agile and to isolate developers from each other's churn we went to a heavy branching model. The heart of the branching model is what we call a Virtual Build Lab (VBL). A VBL is a branch of the entire division's code along with the entire infrastructure (scripts, hardware, drop sites, etc) to build the entire product. VBLs are run by a central build lab. Builds are produced in varying frequencies based on the needs of the occupants of the VBL (although most of them are built every night Sun - Thurs with occasional on-demand builds on Fri or Sat nights).
The root of the whole system is a VBL we call "Main". It lives at the path /Main in the tree. The division is then broken up into a series of "child" VBLs - Lab21, Lab22, Lab23, ... (I think it goes up to 27 but that info may be out of date).
BTW, they start at 21 because Windows invented the VBL system and used numbers 1 - 6 and although our VBLs are completely independent, we decided to start at 20 (which is actually the number for Main) just to avoid any confusion in casual conversation.
The division is broken up into product units - for example VB, C++, C#, CLR, ... Each product unit is assigned to a VBL. Lab21 is the VBL for the .NET Framework product units to a first approximation. Lab22 is the core "VS" VBL. Lab23 has setup, data access, Team System (after we moved out of Lab26) and others. I forget some of the others - Lab26 was the Team System VBL for a long time, Lab24 was the "breaking" change VBL. Localization has one - Lab27, I think.
The number of VBLs is important because it defines how long it takes code to move through the system. Each VBL (21, 22, ...) in turn gets its code ready (there is a set of test suites that have to pass) and Reverse Integrates (RI) its changes into Main. After this each of the VBLs Forward Integrates (FI) those changes into their own VBL and the next VBL in sequence gets a turn. The more VBLs you have the greater the latency to pick up changes (new features, bug fixes, etc) from teams in another VBL. This caused us to keep the number of VBLs reasonably small. To a first approximation it takes a week to FI the last set, pass all the tests and RI a branch into Main. With an average of 4 VBLs in weekly rotation, this gives about a 1 month latency.
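To make the latency estimate concrete, here is a minimal sketch of the rotation in Python. It assumes a strict round-robin with one turn per week, and it assumes a change checked in at the start of its own lab's turn rides along with that RI. The real schedule was looser than this, and the function name and the four-lab list are hypothetical.

```python
def weeks_to_reach_main(labs, lab, checkin_turn, weeks_per_turn=1):
    """Weeks until a change checked into `lab` is RI'd into Main.

    Assumes a strict round-robin: labs[i] owns every turn t where
    t % len(labs) == i, and a change checked in at the start of a turn
    is included if that turn belongs to its lab.
    """
    n = len(labs)
    slot = labs.index(lab)
    turns_waited = (slot - checkin_turn) % n  # other labs' turns that come first
    return (turns_waited + 1) * weeks_per_turn


# Four labs in weekly rotation (the "average of 4" from the text).
labs = ["Lab21", "Lab22", "Lab23", "Lab24"]
latencies = [weeks_to_reach_main(labs, "Lab22", t) for t in range(len(labs))]
print(latencies)                                 # [2, 1, 4, 3]
print("worst case:", max(latencies), "weeks")    # worst case: 4 weeks
```

The worst case is a change that just misses its lab's turn and has to wait out the whole rotation, which is where the roughly one month figure comes from. Add a lab to the list and the worst case grows by a week.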
In the source tree the primary VBLs all lived at the same level as Main: /Main, /Lab21, /Lab22, and so on.
Some of the VBLs were still pretty big (Lab21 and Lab22 particularly). As a result most product units have a "private branch". The product unit dev teams check directly into their private branch. Just as the primary VBLs have a rotation schedule for RIing into Main, each Lab has a schedule for RIing private branches. Private branches are managed in a separate folder because there were so many and we didn't want to clutter the main part of the tree.
/Main
/Lab21
/Lab22
/Private
    TeamFoundation
    ...
...
Unlike VBLs, private branches are managed by the product units directly. The build lab does not build them or support them. Basically they are a buffer to allow the dev team to isolate itself from churn in the VBL and to do more rigorous testing before they share their changes with the other members of their VBL. Also unlike VBLs, private branches don't contain all the code for the whole division. They generally only contain the code for the people who are using it - usually the code for one product unit and some stuff that is shared by everyone. The Development manager for the Product Unit is responsible for defining the checkin policies in their branch but is subject to the VBL policies with RIing their private branch.
A product unit’s test team generally does their testing out of the VBL, not out of the team's private branch. This is because they want to test the bits that the build lab builds and it gives them a way to control instability in daily testing.
In addition to team-based private branches, individuals or small groups of developers create private branches whenever they need an isolated environment in which to work. The best example is when they are making a big breaking change that involves many people and they want to work together without disrupting other developers.
When we do a release where we want to stabilize, we create a new branch off of Main for the release. Generally there will be a period where we FI all changes from Main into the release branch as we lock down. When we get to the point where we want to open up the mainline branches for post-release work we stop FIing from Main into the release branch. At that point it becomes the developer’s responsibility to make sure any fix that is approved for the release gets both into the mainline branches and into the release branch. Frequently this is done by checking the change into one of the private branches and merging it into the release branch. Sometimes this doesn’t work because you want to make a different fix in the mainline vs the release – either because the “right” fix is too risky for the release or due to conflicting changes that have been made in the main branch for post-release work.
We never RI from the release branch back into Main. As I said it’s the developer’s responsibility to make sure the fix gets in both places. For “small” releases (Tech Previews, etc.) we frequently reuse the same release branch over and over. When the next release comes around, we just push all of the code from Main down into the release branch and repeat the whole process.
Branch Structure vs Folder Structure
As a detail, the branch structure is a bit more complicated than I have described so far. For each of the primary VBLs there is a “staging” branch and a “dev” branch. What I have described above is basically the “dev” branches. The staging branches are used in the FI/RI process to verify that the combined bits pass all the appropriate tests before moving up or down. You’ll also notice that the folder structure I’ve described doesn’t match the branch structure I’ve described. They don’t have to. However, I generally recommend that you keep them close or it gets confusing. Here are both in their (relatively) full glory:
Folder structure:
Main
Lab21
Lab21dev
Lab22
Lab22dev
Private
    CLR
    TeamFoundation
    VB
    …
RTM
Servicing
VSTFRTM
…
Branch structure:
Main
    Lab21
        Lab21dev
            Clr
            …
    Lab22
        Lab22dev
            VB
            …
    Lab23
        Lab23dev
            TeamFoundation
            …
    RTM
    Servicing
    VSTFRTM
    …
As I said, we are making some changes going into the next product cycle. The primary issue we hope to fix is getting even more stable daily builds so that we can have a more predictable schedule with less stabilization time at the end of the product cycle. I’m not the expert on the plans but I’ll tell you what I believe to be the plan. The primary “real” change is that we will be developing every feature in a separate “feature branch”. You can think of these branches as children of the Product Unit branches. Code can’t be promoted from the feature branch until it is really done. This includes the majority of automated testing, an appropriate level of code coverage, usability testing, performance testing, etc. The idea is that this will give us finer grained control of what makes it into Main and therefore allow us to keep Main more stable. As part of this we will be removing the primary VBLs and their staging cousins (e.g. Lab23 and Lab23dev) and the Product Unit branches will come directly off Main. This will reduce the overall number of hops from one Product Unit branch to another.
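To see what "fewer hops" means, here is a small Python sketch that counts the merges needed to carry a change from one Product Unit branch to another, before and after the planned change. The parent/child relationships are my reading of the branch structure listed earlier (Product Unit branch under LabNNdev under LabNN under Main), so treat them as an assumption. The helper names are made up for the example.

```python
# Each branch maps to its parent. "old" is an assumed reading of the
# VS2005-era branch structure; "new" is the plan with the primary VBLs
# and their staging cousins removed.
old_parents = {
    "Lab21": "Main", "Lab21dev": "Lab21", "Clr": "Lab21dev",
    "Lab22": "Main", "Lab22dev": "Lab22", "VB": "Lab22dev",
    "Lab23": "Main", "Lab23dev": "Lab23", "TeamFoundation": "Lab23dev",
}
new_parents = {"Clr": "Main", "VB": "Main", "TeamFoundation": "Main"}


def path_to_root(branch, parents):
    """List of branches from `branch` up to the root, inclusive."""
    path = [branch]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def hops(source, target, parents):
    """Number of RI/FI merges to move a change from source to target."""
    up_source = path_to_root(source, parents)
    depth_in_target = {name: i for i, name in enumerate(path_to_root(target, parents))}
    for i, name in enumerate(up_source):
        if name in depth_in_target:  # first common ancestor
            return i + depth_in_target[name]
    raise ValueError("branches share no common ancestor")


print(hops("TeamFoundation", "VB", old_parents))  # 6
print(hops("TeamFoundation", "VB", new_parents))  # 2
```

Under these assumptions a change goes from three RIs up plus three FIs down to one of each.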
That’s it – it’s another long post and probably doesn’t cover half the information people want. Oh well I’ll keep trying to find short topics :) Please feel free to ask questions and I'll fill in any missing info that I can.