TFS2010: Large & Resumable Check-in Support

Grant Holliday’s blog

Senior Service Engineer, Microsoft Visual Studio Team Foundation Service


Problem

The DevDiv mainline contains over 200GB of content and more than 1.8 million files. Any version control operation that had to deal with this much content would occasionally run into one of two problems:

  1. If there were more than ~300,000 pending changes, the client might not be able to load them all into the ListView and would hit an OutOfMemoryException
  2. If it took longer than 60 minutes to upload all of the content and commit the check-in in the database, the operation would hit the 60-minute timeout and roll back

Workarounds

People could sometimes work around the issue by using “tf checkin /noprompt”, which would cut the client memory usage enough to allow the check-in to complete. We also have an undocumented switch, “tf checkin /all”, which allows you to check in all pending changes in a workspace without having to load them all. This switch has a limitation: it does not work with “edits” and “adds” and should only be used for branches (there is also a “tf branch /checkin” option, which does the branch and check-in in one step). Another workaround was to break the check-in up into multiple check-ins, but this was less than ideal.

New in TFS2010

We made an enhancement in Team Foundation Server 2010 to support very large check-ins by changing the client and server to page pending changes to and from the server.  It includes three core changes:

  1. Add paging capability for querying pending changes
  2. Change the list view in the Pending Changes dialog to a virtual list view and utilize the paging capability to populate the list on demand
  3. Add the ability to ‘enqueue’ pending changes for a check-in on the server

The default page size is 150,000, which means that if you try to check in more than this, your pending changes are written to a temporary table in batches of 150,000. Every batch therefore has a full 60 minutes to upload before hitting the client-side timeout. Once the client reaches the last page of pending changes, it sends an extra flag which tells the server to commit the changes. Since the server already has all the data in a temporary table, it’s able to quickly move it into the real content table and avoid the 60-minute timeout.
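To make the paging flow concrete, here is a minimal Python sketch of the idea. It is not the TFS client or server code: `FakeServer`, `enqueue` and `paged_checkin` are hypothetical names invented for illustration, and the sketch pages every check-in, even ones smaller than a single page.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 150_000  # default page size quoted above


@dataclass
class FakeServer:
    """Hypothetical in-memory stand-in for the server; not the TFS API."""
    staged: list = field(default_factory=list)     # plays the role of the temporary table
    committed: list = field(default_factory=list)  # plays the role of the real content table

    def enqueue(self, page, last_page):
        # Each call models a separate request, so each page gets its own timeout window.
        self.staged.extend(page)
        if last_page:
            # The extra flag on the final page tells the server to commit everything.
            self.committed.extend(self.staged)
            self.staged.clear()


def paged_checkin(server, changes, page_size=PAGE_SIZE):
    """Send changes one page at a time; return the number of pages sent."""
    pages = 0
    for start in range(0, len(changes), page_size):
        page = changes[start:start + page_size]
        last_page = start + page_size >= len(changes)
        server.enqueue(page, last_page)
        pages += 1
    return pages
```

In this sketch, 1.8 million pending changes at the default page size work out to 1,800,000 / 150,000 = 12 pages, and only the twelfth call carries the commit flag.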

If you cancel your check-in command before it’s complete, you can run it again and it will reuse the pending changes that are already in the temporary table. A job cleans up this temporary table every 24 hours so that paged changes don’t sit in the database indefinitely if they never get checked in.
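The resume and cleanup behaviour can be sketched the same way. The following is an assumption-laden illustration, not the actual TFS schema or job: `StagingTable`, `stage` and `cleanup` are hypothetical names, and tracking staged rows by item path with a timestamp is a guess made only to show the idea.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)  # cleanup interval quoted above


class StagingTable:
    """Hypothetical stand-in for the temporary table; not the TFS schema."""

    def __init__(self):
        self.rows = {}  # item path -> time it was staged (assumed layout)

    def stage(self, items, now):
        """Stage items not already present; return how many were newly added."""
        added = 0
        for item in items:
            if item not in self.rows:
                self.rows[item] = now
                added += 1
        return added

    def cleanup(self, now):
        """Drop rows staged more than RETENTION ago, as a periodic job might."""
        stale = [item for item, staged_at in self.rows.items()
                 if now - staged_at > RETENTION]
        for item in stale:
            del self.rows[item]
        return len(stale)
```

Re-running a cancelled check-in then amounts to calling `stage` again with the full set of changes: anything already in the table is skipped, so only the remainder is added.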

Proof

Yesterday I was able to check in an add of 1.8 million items (200GB of content) in a single action without restarting or running out of memory. It took about 7 hours to upload all the content and then another 25 minutes to perform the final commit in the database. The end result is a single changeset, which is exactly what I was after.