Everything you want to know about Visual Studio ALM and Farming
Brian Harry is a Microsoft Technical Fellow working as the Product Unit Manager for Team Foundation Server. Learn more about Brian.
Hmm, OK I guess I jumped too quickly into using unfamiliar terminology. Let me step back and define some of the concepts/terms a little more and then hopefully that last post will make more sense.
The Source Tree
Let’s start with what the source tree in the Developer Division looks like. It has the following top level folders (not a complete list but a relevant subset). Each of these folders has its own subtree of subfolders and files.
CSharp - A tree that contains all of the code for the CSharp compiler, project system and related components.
DDSuites - A very large tree of tests for all of the components in the system. Any developer can get this folder and run the tests. This is where we put our unit tests.
Public - All of the .h files, import libraries, .NET Framework assemblies, etc. that are needed to build. For example we check in the .NET Framework assemblies, Windows SDK, etc. Source code elsewhere in the tree references standard assemblies and header files from this directory.
Tools - A big tree that contains all of our compilers, linkers, source control tools, build configuration files, etc. Basically everything needed to build the system. It does not include the IDE – just command line tools.
VB - All the source code that the VB team has written.
VC - All of the source code that the VC team has written.
VSCommon - A set of shared utility library source code, midl files and the like that are shared across many components in VS.
VSET - The source code for the Team System tools.
And of course there’s more – probably 30 or 40 top level folders in our tree but this is a good representative sample.
Why do we check in all of our tools and includes? Developers already have VS installed on their machine, right? Well, yes but there are several advantages. First by versioning them with the source code we can ensure that we always have a consistent set. If we need to go back and reconstruct a build from 6 months ago we also have the tools we used to build it at that time. One thing to keep in mind is that we are building the tools too so every few months we check in a new version of the compilers and libraries, etc. Using the version control system is a great way to distribute the tools to everyone. Another benefit of doing it this way is that the system is self contained. You can walk up to a newly installed machine (just the OS), create a Team Foundation workspace, do a get, build and everything works.
How a Developer Uses the Tree
I create a workspace on my machine (workspace is a mapping construct that describes what folders to get and where to put them). Very few developers put the entire tree in their workspace because it is so big. Pretty much everyone includes VSCommon, Public and Tools. Beyond that developers include the folders they need.
To give an example (from Team Foundation – which I enlist in), my workspace looks something like this.
$/Main/tools -> d:\dd\tools
$/Main/public -> d:\dd\public
$/Main/vscommon -> d:\dd\vscommon
$/Main/ddsuites -> d:\dd\ddsuites
$/Main/vset -> d:\dd\vset
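To make the idea of a workspace mapping concrete, here is a minimal Python sketch of how a server path could be translated to a local path using the mappings above. It is only an illustration, not how Team Foundation implements workspaces. The names `WORKSPACE` and `to_local` are made up for this example, and the case-insensitive matching is an assumption.

```python
# Hypothetical model of a workspace: server folder -> local folder.
# The entries mirror the example mappings in the post.
WORKSPACE = {
    "$/Main/tools": r"d:\dd\tools",
    "$/Main/public": r"d:\dd\public",
    "$/Main/vscommon": r"d:\dd\vscommon",
    "$/Main/ddsuites": r"d:\dd\ddsuites",
    "$/Main/vset": r"d:\dd\vset",
}


def to_local(server_path, workspace=WORKSPACE):
    """Translate a server path to a local path, or return None if unmapped."""
    lowered = server_path.lower()  # assumption: paths compare case-insensitively
    # Try the longest mapping first so the most specific one wins.
    for root in sorted(workspace, key=len, reverse=True):
        root_lower = root.lower()
        if lowered == root_lower or lowered.startswith(root_lower + "/"):
            rest = server_path[len(root):].replace("/", "\\")
            return workspace[root] + rest
    return None


print(to_local("$/Main/VSET/SCM/SourceControl"))
print(to_local("$/Main/VB"))
```

A folder that is not in the workspace, such as `$/Main/VB` here, simply has no local location, which is why a get only brings down the mapped parts of the tree.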
One of the nice things about our build system is that it allows me to build at any level in the tree. I can build everything in my workspace by going to d:\dd and typing "build". Or I can build just the Team System components by going to d:\dd\vset and typing build – or just Version control from d:\dd\vset\scm\SourceControl, etc.
After I build, all my built binaries end up in d:\binaries.<cpu><build type>. For example if I build x86 debug then they end up in d:\binaries.x86dbg. If I build retail they end up in d:\binaries.x86fre (don’t ask me why retail is called fre :)).
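The naming rule for the output folder can be written down as a tiny Python sketch. The function name `binaries_dir` and the `BUILD_SUFFIX` table are hypothetical; only the two suffixes mentioned above (`dbg` and `fre`) are included, since those are the only ones the post names.

```python
# Suffixes taken from the post: debug -> "dbg", retail -> "fre".
BUILD_SUFFIX = {"debug": "dbg", "retail": "fre"}


def binaries_dir(cpu, build_type, drive="d:"):
    """Return the output folder for a given CPU and build type."""
    return f"{drive}\\binaries.{cpu}{BUILD_SUFFIX[build_type]}"


print(binaries_dir("x86", "debug"))
print(binaries_dir("x86", "retail"))
```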
On to Branching and Merging
OK, hopefully with a little background on what the "tree" is, the branching part will be a little easier to understand. What I’ve described above is how it would all work if all of the developers worked together on the same source at the same time. As I described in my last blog post, this is impractical – too many developers changing things.
So we created branches off of Main and, in fact, no developer actually works in Main – they all work in some branch. So let’s start at the beginning. We checked all of our source into the tree under $/Main. We then created branches of Main. Using the Source Control Explorer in Team Foundation you can do this by selecting $/Main and choosing File -> Source Control -> Branch. When the dialog comes up we’d choose $/Lab21.
This would create a whole new copy of the source tree that would look something like:
$/Main
    CSharp
    DDSuites
    Public
    Tools
    VB
    VC
    VSCommon
    VSET
$/Lab21
    CSharp
    DDSuites
    Public
    Tools
    VB
    VC
    VSCommon
    VSET
Right after the branch, $/Main and $/Lab21 are basically exact copies. Fortunately, however, it doesn’t double your disk space usage. The new branch ($/Lab21) references the same copies of the files that the original, or Parent branch ($/Main) contains. Only when you actually modify (checkin) a file in one of the branches is another copy of the modified file made.
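The "no extra disk space until you change something" behavior can be illustrated with a toy model in Python. This is a sketch of the general idea only and says nothing about how Team Foundation actually stores files. The `Repo` class and its methods are invented for the example.

```python
class Repo:
    """Toy store: paths point at content ids, and contents are stored once."""

    def __init__(self):
        self.blobs = {}    # content id -> file contents
        self.paths = {}    # server path -> content id
        self._next_id = 0

    def checkin(self, path, contents):
        """Store new contents and point the path at them."""
        self.blobs[self._next_id] = contents
        self.paths[path] = self._next_id
        self._next_id += 1

    def branch(self, source, target):
        """Create target paths that reference the same contents as source."""
        for path, blob_id in list(self.paths.items()):
            if path.startswith(source + "/"):
                self.paths[target + path[len(source):]] = blob_id


repo = Repo()
repo.checkin("$/Main/VB/a.vb", "version 1")
repo.checkin("$/Main/VC/b.cpp", "version 1")

repo.branch("$/Main", "$/Lab21")
print(len(repo.paths), len(repo.blobs))   # four paths, still two stored contents

repo.checkin("$/Lab21/VB/a.vb", "version 2")
print(len(repo.blobs))                    # only the modified file got a new copy
```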
Let’s talk for a second about Branch lineage or Parenting. It’s very complicated and takes a while to internalize. Let’s take a file in the system as an example.
There is a file (you can probably guess what it is :)) at:
$/Main/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs
When we branched the $/Main folder above a whole copy of the tree was created. So there is now another "copy" (remember we don’t actually duplicate the contents until you change it but looking at the tree you can’t tell the difference) of CommandCheckin.cs at:
$/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs
From the perspective of the “folder hierarchy” the two files are in very different parts of the tree – one is deep down under $/Main and the other is deep down under $/Lab21.
However when speaking from a "branch hierarchy" perspective rather than a "folder hierarchy" perspective $/Main/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs is the parent of $/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs. When the Lab21 tree was created each file was branched from the corresponding file in the Main tree and this relationship is maintained. Having this relationship allows changes in CommandCheckin.cs to be easily merged back and forth between the two branches. So, imagine I had a change to $/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs that I want to move to the main branch. I can do this by (again using the Source Control Explorer) selecting CommandCheckin.cs under $/Lab21 and choosing File -> Source Control -> Merge. This will give me a choice of files to merge with and one of them will be the CommandCheckin.cs under $/Main. After you hit OK, the changes to CommandCheckin.cs under $/Lab21 will be incorporated into the CommandCheckin.cs under $/Main.
To make managing branches easier, you can do this merging at a higher level too. For example, using the Source Control Explorer as above, I can select the $/Lab21 folder itself rather than the CommandCheckin.cs file way down below it. When picking the merge target, I pick the $/Main folder. Because it tracks all of the relationships, it looks down the tree and finds that CommandCheckin.cs has been changed in the $/Lab21 tree and merges it with the corresponding CommandCheckin.cs in the $/Main tree. Being able to do this makes managing merging changes between branches dramatically easier.
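As a rough sketch of what a folder-level merge has to work out, the Python below pairs each changed file under the source branch with its counterpart under the target branch. It assumes the counterpart lives at the same relative path, which is a simplification: the real system tracks the branch relationship per file rather than inferring it from the folder layout. The function name `merge_targets` is hypothetical.

```python
def merge_targets(changed_paths, source_root, target_root):
    """Pair each changed file under source_root with its counterpart
    under target_root (assumes identical relative paths)."""
    pairs = []
    for path in changed_paths:
        if path.startswith(source_root + "/"):
            pairs.append((path, target_root + path[len(source_root):]))
    return pairs


changed = ["$/Lab21/VSET/SCM/SourceControl/CommandLine/CommandCheckin.cs"]
for source, target in merge_targets(changed, "$/Lab21", "$/Main"):
    print(source, "->", target)
```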
Because changes can be merged in either direction, it’s confusing which one you mean. If I say I’m merging Lab21 and Main, what do I mean? To remove the ambiguity we coined some terminology to indicate the direction of the merge. A "Reverse Integration" (abbreviated RI) is a merge from the "Branch child" to the "Branch parent". And a "Forward Integration" (abbreviated FI) is a merge from the "Branch parent" into a "Branch child". So using the example from above, if I said I’m going to RI Lab21, we know that means we are going to merge changes that have been made to files under $/Lab21 into the corresponding files under $/Main.
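The two terms boil down to a lookup in the branch parent relationship. A minimal Python sketch, with a hypothetical `PARENT` table and `merge_direction` helper:

```python
# Branch child -> branch parent, for the example in the post.
PARENT = {"Lab21": "Main"}


def merge_direction(source, target, parent=PARENT):
    """Classify a merge between a branch and its direct parent or child."""
    if parent.get(source) == target:
        return "RI"   # child into parent
    if parent.get(target) == source:
        return "FI"   # parent into child
    raise ValueError(f"{source} and {target} are not parent and child")


print(merge_direction("Lab21", "Main"))
print(merge_direction("Main", "Lab21"))
```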
Hopefully that helps understand the difference between the folder hierarchy and the branch hierarchy. We can talk meaningfully about both. When we do this I represent them as follows:
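Repeating the folder hierarchy from above:
$/Main
    CSharp
    DDSuites
    Public
    Tools
    VB
    VC
    VSCommon
    VSET
$/Lab21
    CSharp
    DDSuites
    Public
    Tools
    VB
    VC
    VSCommon
    VSET
However, I’d represent the Branch Hierarchy as follows:
Main
    Lab21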
What this means is that Lab21 was created by branching from Main. Even though in the folder hierarchy they are peers, in the branch hierarchy Lab21 is a "child" of Main. All of the files in Lab21 were branched from the corresponding files in Main. So looking at a more complex example from my incomprehensible blog post:
Main
    Lab21
        Lab21dev
            Clr
            …
    Lab22
        Lab22dev
            VB
            …
    Lab23
        Lab23dev
            TeamFoundation
            …
    RTM
        Servicing
            VSTFRTM
            …
This says that Lab21, Lab22, Lab23 and RTM were created by branching from Main. Lab21dev was created by branching from Lab21. Clr was created by branching from Lab21dev and so forth. When it gets this complicated it becomes even more useful to be able to talk about RIs and FIs. For example, I RI changes from Clr into Main (done by merging Clr into Lab21dev, then merging Lab21dev into Lab21, then merging Lab21 into Main). And I FI Main into TeamFoundation (done by merging Main into Lab23, then Lab23 into Lab23dev and finally Lab23dev into TeamFoundation).
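The chain of merges behind a multi-level RI or FI can be derived mechanically from the parent relationships. Here is a small Python sketch of that. The `PARENT` table includes only the relationships spelled out in the paragraph above, and the helper names `ri_steps` and `fi_steps` are invented for the example.

```python
# Branch child -> branch parent, limited to the relationships named in the text.
PARENT = {
    "Lab21": "Main", "Lab22": "Main", "Lab23": "Main", "RTM": "Main",
    "Lab21dev": "Lab21", "Clr": "Lab21dev",
    "Lab23dev": "Lab23", "TeamFoundation": "Lab23dev",
}


def ri_steps(branch, ancestor):
    """List the (source, target) merges needed to RI branch up into ancestor."""
    steps = []
    while branch != ancestor:
        parent = PARENT[branch]   # KeyError if ancestor is not above branch
        steps.append((branch, parent))
        branch = parent
    return steps


def fi_steps(ancestor, branch):
    """List the (source, target) merges needed to FI ancestor down into branch."""
    return [(parent, child) for child, parent in reversed(ri_steps(branch, ancestor))]


print(ri_steps("Clr", "Main"))
print(fi_steps("Main", "TeamFoundation"))
```

The RI walks up the branch hierarchy one parent at a time, and the FI is the same walk reversed, which matches the two merge sequences described above.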
I hope that’s enough background to understand the previous blog post. That post is more about the end result and the rationale behind it than the mechanics and concepts behind it. I hope this has enough of those to make the former comprehensible. If not, let me know and I’ll try again.