Buck Hodges

Visual Studio Online, Team Foundation Server, MSDN

  • Buck Hodges

    Caching: What could go wrong?


    There’s an old saying about regular expressions that I’ve always liked. I think it applies equally to caching. Here’s my version.

    Some people, when confronted with a performance problem, think “I know, I'll add a cache.” Now they have two problems.

    It’s so tempting to address a performance problem by adding a cache. After all, classes like Dictionary are very easy to use. But caching as a solution to performance, latency, or throughput problems adds complexity, and that complexity leads to more bugs. Bugs with caches can be subtle and difficult to debug, and they can also cause live site outages.

    We’ve had a number of live site incidents (LSIs) over the last couple of years for which the root cause was a bug in caching. I asked one of our engineers to write a document with examples of the caching bugs that we’ve found in our code. I wanted to share it in the hope that it will help folks think twice before solving a problem with a “simple” cache.

    The case of the fixed size identity cache

    When we wrote the Identity Management System (IMS) cache, our most recently released product was TFS 2008, and we were all trying to get through the multi-collection support (also known as Enterprise TFS Management (ETM)). We had some issues with IMS performance, so we decided to cache the most commonly used identities in memory to avoid round trips to expensive stored procedures, using MemoryCacheList<T>, which is basically a dictionary with a clock replacement algorithm. Of course, one important thing to do when designing a cache is to figure out how big it may grow – unbounded caches are one source of out-of-memory errors (see the case of the registry service). When we created the service, we picked a number that sounded good at the time for something where we expected to have a lot of hosts (accounts) in memory at once, each of which wasn’t likely to have a huge number of group memberships: 1024.

               // Set the default cache size: 1024 for hosted, 10000 for
               // on-premises, and 100000 for deployment-level hosts.
               Int32 defaultValue = 1024;
               if (systemRequestContext.ExecutionEnvironment.IsOnPremisesDeployment)
                   defaultValue = 10000;
               else if (systemRequestContext.ServiceHost.HostType.HasFlag(TeamFoundationHostType.Deployment))
                   defaultValue = 100000;

    The code was simple – here’s the Read case:

               lock (m_getParentsLock)
               {
                   isMember = identityCache.IsMember(groupDescriptor, descriptor);
                   if (forceCacheReload || isMember == null)
                   {
                       Identity memberWithParents = ReadIdentityFromDatabase(
                           QueryMembership.Expanded, [..]

    That looks quite reasonable. We’re locking the cache (because we don’t want it to change while we’re reading it). If we find the data, we return it, and if we don’t, we go to the database. Cache misses should be rare in steady state so we should quickly build up the cache and everything will be fine.

    The issue is that if the host in question has more than 1024 group memberships active at any given time (which isn’t hard to hit, since the number grows multiplicatively with active identities and active groups), the cache will start thrashing (i.e., constantly adding and removing entries). When that happens, the result is frequent database calls, which can be fast or slow depending on the health of the database. This means that the maximum throughput of this cache in that state is:

    1/(database call duration) queries per second = 1 / (10ms ) = 100 queries/second

    That assumes a 10ms average database call – any variation in the performance of that call will cause the throughput to drop, and if throughput falls below the average demand on the cache, requests will start queuing. That happened with an account that had a lot of groups, and when it did, it took that account down.
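    To see how sharp the cliff is, here is a small Python sketch (not the actual MemoryCacheList code, and using FIFO eviction instead of clock replacement for brevity) of a fixed-size cache whose working set exceeds its capacity: once the access cycle is wider than the cache, every read misses and pays the database cost.

```python
# Illustrative sketch: a fixed-size FIFO cache whose working set exceeds
# its capacity thrashes, so nearly every read pays the database-call cost.
from collections import OrderedDict

DB_CALL_MS = 10  # assumed average database call latency

class FixedSizeCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        # Miss: "call the database", then insert, evicting the oldest entry.
        self.misses += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[key] = f"identity-{key}"
        return self.entries[key]

cache = FixedSizeCache(capacity=1024)
# A working set of 2000 active memberships cycled round-robin: every
# access evicts an entry that will be needed again before it recurs.
for _ in range(5):
    for key in range(2000):
        cache.get(key)

hit_ratio = cache.hits / (cache.hits + cache.misses)
max_qps = 1000 / DB_CALL_MS  # every miss serializes behind a ~10ms DB call
print(f"hit ratio: {hit_ratio:.2f}, throughput ceiling ~ {max_qps:.0f} qps")
```

    With the working set nearly twice the capacity, the hit ratio drops to zero, and the cache's throughput ceiling is exactly the 100 queries/second computed above.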

    Lessons Learned

    • Fixed-size caches should have telemetry to show:
      • Cache hit ratio
      • Eviction rate
      • Average size
    • Cache misses should not block readers (as they do above) for any longer than the actual in-memory update of the cache (do not call the database while holding a lock)
    • Locking the entire cache isn’t usually desirable – if the underlying data is partitionable, the cache should be partitioned and the lock protecting it as well.
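    A common way to satisfy the second lesson is a double-checked, read-through pattern: hold the lock only for in-memory lookups and inserts, and perform the expensive fetch with no lock held. A minimal Python sketch (illustrative, not the IMS code):

```python
# Illustrative sketch: a read-through cache that never holds its lock
# across the expensive fetch. Only in-memory lookups and the final
# insert happen under the lock; the "database call" runs unlocked.
import threading

class ReadThroughCache:
    def __init__(self, load_from_db):
        self._lock = threading.Lock()
        self._entries = {}
        self._load_from_db = load_from_db  # stand-in for the expensive call

    def get(self, key):
        with self._lock:                 # fast path: in-memory lookup only
            if key in self._entries:
                return self._entries[key]
        value = self._load_from_db(key)  # slow path: no lock held here
        with self._lock:
            # Another thread may have raced us; keep whichever arrived first.
            return self._entries.setdefault(key, value)

calls = []
cache = ReadThroughCache(lambda k: calls.append(k) or f"row-{k}")
print(cache.get("joe"), cache.get("joe"), len(calls))
```

    The trade-off is that two racing threads may both fetch the same key; `setdefault` keeps one winner, which is usually far cheaper than serializing every miss behind a lock.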

    The case of the Registry Service Cache

    The registry service also has its roots in the ETM work – we needed to support multiple machines serving as web front ends, and the Windows Registry, being local to a single machine, couldn’t satisfy that requirement. We decided to add a Windows Registry-like service whose data would be stored in SQL Server. We also added a notification system so that individual machines would get notified when a registry key/value changed. When we ported TFS to Azure, we realized that a lot of our SQL chatter was just reading registry settings that typically don’t change often – prime candidates for caching.

    There are a few difficulties associated with caching registry keys:

    • The registry is hierarchical – the API supports reading sub-trees like /Services/*
    • There can be a lot of keys under certain hives (like the user hive which stores per-user settings) that aren’t read often.

    Version 1.0 of the Registry caching solved those issues by:

    • Using a SparseTree<T> which supports tree operations natively
    • Caching only certain roots

    That caused two problems:

    • SparseTree<T> doesn't perform well for some common registry access patterns because it does a lot of string operations, like splitting and reassembling strings, allocating a lot of memory in the process
    • We kept having to add more hives to the cached list as we frequently found cache misses

    The Feature Availability service (aka feature flags), which makes extensive use of the Registry service, was causing a lot of high-CPU incidents, and we were under pressure to make a quick fix. So a few people got together one winter night and observed the following:

    • The SparseTree implementation was hard to fix (and it has a lot of other services dependent upon it) and hard to replicate (the tree operations)
    • The Feature Availability Service does only full key reads (i.e. no recursive reads)
    • It’s much better to let the cache figure out what to cache based on needs rather than arbitrarily deciding at compile time

    So we decided the following:

    1. Add a ConcurrentDictionary<String, String> on top of the existing SparseTree to cache single key reads (no locking, no string operations!)
    2. Cache “misses” so we don’t have to go to the SparseTree for non-existing keys
    3. Update the ConcurrentDictionary with the same SQL notifications used to update the SparseTree
    4. Keep the same eviction policy as the current registry cache (i.e., none)
    5. Go to the SparseTree for all recursive operations

    Of course, 2 & 4 aren’t compatible. So after a few days, the ConcurrentDictionary got bloated with all sorts of hypothetical keys that could have existed but didn’t.

    Lessons Learned

    • Don’t add a broken cache on top of another broken cache – fix the first one instead.
    • Caching cache misses is a very dangerous business.
    • Always make sure you have a limit on how big your cache can get
    • Make sure you understand the access patterns your cache will need to support efficiently (we thought we did...)

    Cases of bad hash algorithms

    These are classic problems with hash tables in general and Dictionary<TKey, TValue> in particular – they are very clever data structures, but they rely on a good hash function.

    The case of the Registry Service Callback

    The registry service supports a notification mechanism that lets callers provide a callback for when a certain registry key (or hive) changes. It uses a Dictionary<RegistrySettingsChangedCallback, RegistryCallbackEntry> to keep track of the interested parties – that lets us unregister them quickly and efficiently. Or so we thought…

    But how exactly does the CLR determine the hash code of a callback?

       COUNT_T hash = HashStringA(moduleName);                 // Start the hash with the Module name           
       hash = HashCOUNT_T(hash, HashStringA(className));       // Hash in the name of the Class name
       hash = HashCOUNT_T(hash, HashStringA(methodName));      // Hash in the name of the Method name

    The hashcode is basically a combination of the module, the class and the method. What could possibly go wrong?

    One thing that’s missing from the above computation is the object itself – a callback is a combination of a method, identified by name/class/module, and a target (the instance of the object). In our case, our callers mostly registered the same method, but with different target objects. The result was that our beautiful dictionary became a small number of giant linked lists, since every hash value came from a small set (e.g., only about 10 unique hash values with thousands of entries each).
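    The same failure mode is easy to demonstrate in any language. In this Python analog (the CLR code above is native code; the class here is invented for illustration), the hash covers only the method identity and ignores the target object, so thousands of distinct callbacks collapse onto one hash value:

```python
# Python analog of the delegate-hash problem: a hash that ignores the
# target object maps thousands of distinct callbacks to one bucket,
# turning every dictionary operation into a linked-list scan.
class Callback:
    def __init__(self, target, method_name):
        self.target = target
        self.method_name = method_name

    def __hash__(self):
        return hash(self.method_name)        # target is (wrongly) left out

    def __eq__(self, other):
        return (self.target, self.method_name) == (other.target, other.method_name)

callbacks = [Callback(target=i, method_name="OnRegistryChanged") for i in range(5000)]
unique_hashes = len({hash(cb) for cb in callbacks})
print(unique_hashes)  # 1: all 5000 distinct callbacks share one hash value
```

    Equality still distinguishes the callbacks, so the dictionary stays correct – it just degrades from O(1) to O(n) per operation.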

    The case of the WIT Node Id Cache

    Work Item Tracking had a cache of classification nodes (indexed by the node ID, which was an Int32), and everything worked like a charm and everyone was happy. A little while later, during the project model conversion work for team project rename, it became necessary to add a Guid to fully qualify the node ID. The node ID was changed from an Int32 to a struct with both the Guid of the project and the node ID, like so:

               struct ClassificationNodeId
               {
                   public Guid DataspaceId;
                   public int NodeId;
               }

    That seemed quite efficient and everything worked properly during testing, and even after the code was in production on nearly all of the TFS Scale Units it seemed to work well. When the scale unit with our largest account was upgraded, the CPU was pegged on all of the application tiers.


    Note that the deployment was rolled back about 10 minutes after the first attempt. After getting the right instrumentation, we noticed that the CPU came from adding to a dictionary.

    Question: What’s the default hash code implementation on a struct? Answer after the break!

    The answer is actually quite complex. It depends on whether the struct has reference types embedded in it (like a String); in our case, the hash code is the hash code of the struct type combined with the hash code of the first non-static field. Here, that first field is the project GUID. In an account with a small number of projects but a large number of area paths, the result was a few unique hash values, with thousands of entries hashing to each of those few values. Once again, a Dictionary-based hash was turned into a list.

    From the implementation of struct:

           **Action: Our algorithm for returning the hashcode is a little bit complex.  We look
           **        for the first non-static field and get it's hashcode.  If the type has no
           **        non-static fields, we return the hashcode of the type.  We can't take the
           **        hashcode of a static member because if that member is of the same type as
           **        the original type, we'll end up in an infinite loop.
           **Returns: The hashcode for the type.
           **Arguments: None.
           **Exceptions: None.
           public extern override int GetHashCode();

    The other problem is that the default Equals implementation uses reflection to do the comparison (a memcmp isn’t enough if you have reference types like strings).
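    Here is a Python analog of that behavior (the real mechanism lives in the CLR; the class below is illustrative): if the hash reflects only the first field, the number of distinct hash values is capped by the number of projects, no matter how many nodes exist.

```python
# Python analog of the struct default-hash behavior: hashing only the
# first field of a (project guid, node id) pair yields as many distinct
# hash values as there are projects, not as there are nodes.
import uuid

class ClassificationNodeId:
    def __init__(self, dataspace_id, node_id):
        self.dataspace_id = dataspace_id
        self.node_id = node_id

    def __hash__(self):
        return hash(self.dataspace_id)       # mimics "first non-static field only"

    def __eq__(self, other):
        return (self.dataspace_id, self.node_id) == (other.dataspace_id, other.node_id)

projects = [uuid.uuid4() for _ in range(3)]  # a few projects...
nodes = [ClassificationNodeId(p, n) for p in projects for n in range(10000)]
print(len({hash(n) for n in nodes}))         # ...so only 3 hash values for 30000 nodes
```

    Thirty thousand distinct keys land in just three buckets – exactly the degradation that pegged the CPU on the application tiers.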

    The case of the mostly case-sensitive Identity Descriptor

    A lot of the APIs in IMS use IdentityDescriptors which are basically a Tuple<String,String>, and we want those strings to be case insensitive (because Joe Bob is the same as joe bob). This time, we actually wrote a Comparer to get the right behavior:

       public int Compare(IdentityDescriptor x, IdentityDescriptor y)
       {
           Int32 retValue = 0;
           if ((retValue = VssStringComparer.IdentityDescriptor.Compare(x.Identifier, y.Identifier)) != 0)
               return retValue;

           return VssStringComparer.IdentityDescriptor.Compare(x.IdentityType, y.IdentityType);
       }

       public bool Equals(IdentityDescriptor x, IdentityDescriptor y)
       {
           return Compare(x, y) == 0;
       }

       public int GetHashCode(IdentityDescriptor obj)
       {
           return obj.IdentityType.GetHashCode() + obj.Identifier.GetHashCode();
       }

    What’s wrong with this code? Well, MSDN clearly states: “If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two objects do not have to return different values.”

    Now we can plainly see that Compare(“Joe Bob”, “joe bob”) will report the two as equal (returning 0), but what about their hash codes? That’s easy! It’s the sum of the hash codes of the two strings. Looking at the definition of String.GetHashCode(), we see:

       int hash1 = (5381 << 16) + 5381;
       int hash2 = hash1;
       fixed (char* src = this)
       {
           // 32-bit machines.
           int* pint = (int*)src;
           int len = this.Length;
           while (len > 2)
           {
               hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
               hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
               pint += 2;
               len -= 4;
           }
           if (len > 0)
               hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
       }

    Clearly, that hash code is case sensitive. Now our dictionary is a giant cache miss machine, because we’ll have multiple entries for the same value in different buckets.
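    The fix is to make the hash follow the equality rule: normalize case before hashing, so descriptors that compare equal always land in the same bucket. A Python sketch of the corrected comparer (class and field names are illustrative):

```python
# Sketch of the fix: if equality is case-insensitive, the hash must be
# computed on a case-normalized form, so equal descriptors always hash
# to the same bucket.
class IdentityDescriptor:
    def __init__(self, identity_type, identifier):
        self.identity_type = identity_type
        self.identifier = identifier

    def __eq__(self, other):
        return (self.identity_type.lower(), self.identifier.lower()) == \
               (other.identity_type.lower(), other.identifier.lower())

    def __hash__(self):
        # Normalize case BEFORE hashing, matching the equality rule.
        return hash((self.identity_type.lower(), self.identifier.lower()))

cache = {IdentityDescriptor("Windows", "Joe Bob"): "cached-identity"}
print(IdentityDescriptor("windows", "joe bob") in cache)  # True: same bucket
```

    With the hash and equality agreeing, a lookup for “joe bob” finds the entry stored under “Joe Bob” instead of creating a duplicate in another bucket.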

    Lessons Learned

    • If you use a Dictionary<K,V>, make sure K has a good hash code and that your data is well distributed: bad hashing doesn’t affect correctness, so don’t assume everything is fine just because nothing throws.
    • When in doubt, pass a comparer to the Dictionary
    • Check your performance with the data you expect
    • Make sure you override GetHashCode and Equals properly

    Caching Check List

    • Why is the data being requested so frequently and is that behavior necessary?
    • Do you really need to cache this data? Could you change your code not to require it?
    • How often does the underlying data change? Are you fine if your cache is out of date?
    • How much data will you need to store in memory?
    • What eviction policy will you have?
    • Have you fully understood the access patterns the cache will need to support?
    • What hit rate do you expect?
    • Do you have telemetry to know whether your cache is operating correctly in production?
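    As a sketch of the telemetry item on the checklist, here is a minimal Python wrapper (illustrative only, with a deliberately naive eviction policy) that keeps hit ratio, eviction rate, and size observable at all times:

```python
# Illustrative sketch: wrap a cache so hit ratio, eviction rate, and
# size are always observable in production, rather than reconstructed
# after an incident.
class InstrumentedCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = {}
        self.hits = self.misses = self.evictions = 0

    def get(self, key, load):
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        if len(self._entries) >= self.capacity:
            # Naive eviction for brevity: drop the oldest-inserted entry.
            self._entries.pop(next(iter(self._entries)))
            self.evictions += 1
        value = self._entries[key] = load(key)
        return value

    def stats(self):
        total = self.hits + self.misses
        return {
            "hit_ratio": self.hits / total if total else 0.0,
            "eviction_rate": self.evictions / total if total else 0.0,
            "size": len(self._entries),
        }

cache = InstrumentedCache(capacity=2)
for key in ["a", "b", "a", "c"]:
    cache.get(key, load=lambda k: k.upper())
print(cache.stats())
```

    Counters like these would have exposed both the IMS thrashing (hit ratio collapsing, eviction rate climbing) and the registry cache’s unbounded growth (size climbing) long before they became live site incidents.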

    Follow me at twitter.com/tfsbuck

  • Buck Hodges

    NuGet packages for TFS and Visual Studio Online .NET client object model


    For the past couple of releases we’ve shipped the Team Foundation Server/Visual Studio Online client object model as a downloadable installer (aka MSI). Additionally, the license for it did not include the right to package it in an application for redistribution. As a result, it was inconvenient for builds and added friction for installing an app that depended on it (at least on machines without Visual Studio or Team Explorer). On top of that, in previous versions the client libraries were installed into the .NET GAC, which meant an application couldn’t have its own copy of them – installing a newer client object model would affect Visual Studio and TFS if they were installed on the same machine and used the same major version but a different update level.

    We’ve addressed these problems. By packaging the object model/client libraries as a NuGet package, they are much easier to consume in a build: you don’t need to install them on a build machine, which would require administrative privileges. The license now allows you to bundle these libraries with your application for redistribution. Please note that this does not change the requirement that the end user must have a TFS CAL in the case of Team Foundation Server. In the case of VSO, simply having access to an account is sufficient (i.e., each user on VSO is assigned a license in order to gain access, which is different from TFS). As an aside, you can read more about the changes we are making to TFS CALs and VSO pricing in Brian’s post.

    As you can see here, we’ve released four packages. The descriptions aren’t as helpful as I’d like, and we are getting that fixed.

    First, I need to give you some context. We started building Team Foundation Server more than 12 years ago. Back then, SOAP was the primary protocol for building web services, so everything in TFS used SOAP, including version control, work item tracking, build, test case management, etc. Then a few years ago REST started taking over. Very few web services are being built using SOAP today (I’d say none, but I’m sure someone would point out one I missed!). Also, REST web services are easier to consume in everything from JavaScript to iOS apps. So it made sense for us to start moving our web services to REST.

    When we built SOAP web services, we declared them to be an internal implementation detail and that consumers would need to use our .NET client libraries or our Java SDK. If you’ve ever looked at the SOAP web services, you quickly see that they can be cryptic, and they were clearly designed by different teams given the dramatic differences between some of the different feature areas.

    When we made the decision to start building REST APIs, we decided we wanted to follow a different process to produce REST APIs that are consistent and easy to consume. We decided we’d have a set of guidelines and a review process. We didn’t want to create guidelines from scratch, so we adopted the guidelines the Azure team uses and gave them some feedback, which they incorporated. There’s now a broader effort across the company to standardize REST guidelines.

    With an existing product like TFS, changing protocols is not a simple thing. The product contains lots of SOAP endpoints. All of the existing clients have to continue to work for years, as breaking changes become blockers for customers moving to newer versions of TFS, or potential work-stoppage issues with VSO since we update VSO constantly. REST is also very different from SOAP, so you can’t just translate SOAP web services into REST (or you’ll end up with something terrible). As a result, moving from SOAP to REST is a gradual transition.

    With that context, here’s a description of each package of .NET client libraries. Applications that use a wide range of features are likely to need more than one of these packages.

    • Microsoft.TeamFoundationServer.ExtendedClient This package contains the traditional TFS/VSO client object model that uses the SOAP API. You will need it to get full access to the work item tracking object model, version control, test management, and more. If you’ve used the TFS/VSO client object model in applications before, this is the one you’ve used. Because not every API is available in TFS 2015 or VSO currently as a REST API, there are going to be cases where you must use this package. Similarly, there are new features that have been built with only REST APIs for which you will need the Client package.
    • Microsoft.TeamFoundationServer.Client Here you’ll find convenient .NET wrappers for our REST APIs for build, team rooms, version control, test case management, and work item tracking. You can of course call the REST APIs directly without using this library. You will encounter cases where an API is not available in this package and have to also use the ExtendedClient package.
    • Microsoft.VisualStudio.Services.Client If you need to access account, profile, identity, security, or other core platform services, you’ll need this package.
    • Microsoft.VisualStudio.Services.InteractiveClient This package provides the library necessary to show a user an interactive prompt for credentials to sign in. If you are using basic authentication, personal access tokens, or OAuth, you won’t need this.

    Here’s a diagram showing the dependencies among the packages. NuGet will automatically handle the dependencies for you. For example, if you choose to use TeamFoundationServer.ExtendedClient, NuGet will pull in the other three automatically.




    NOTE: If you only need the REST APIs and choose to use TeamFoundationServer.Client, you’ll want to add a reference to Services.InteractiveClient as well if you need to allow users to log in interactively (NuGet will then automatically get Services.Client for you).

    We have a few samples to get you going, and a REST API reference in addition to the traditional TFS/VSO client object model documentation.

    One thing we’ve never had is support for the Portable Class Library (PCL). I’m happy to say that we are working on that and will start adding PCL-compatible libraries as they become available.

    We are also working on adding a package for Team Explorer extensibility.

    [Update 8/18/15] For using Git via an API, you’ll need the LibGit2Sharp package.

    Follow me at twitter.com/tfsbuck

  • Buck Hodges

    Fix: Windows 10 upgrade couldn’t update the system reserved partition


    Disclaimer: This is what worked for me, and it’s not guidance from Microsoft. It may not work for you. Since this involves resizing partitions, it could wipe out all of your data. You may want to create a backup first. Proceed at your own risk.

    Over the weekend I upgraded my machines at home to Windows 10. I had two desktops and one Surface Pro (the first one) running Windows 8.1. Since I had multiple machines to upgrade, I downloaded the Windows 10 installer to a USB flash drive using the media creation tool mentioned on Download Windows 10. For me, the media creation tool wouldn’t recognize the USB drive, so I chose the ISO and copied the contents to my flash drive. I used the flash drive to upgrade my Surface Pro with no issues.

    Then I tried to upgrade my desktops. The first one failed with the message saying, “We couldn’t update the system reserved partition.” That happened after it downloaded updates (that takes a while). I tried my other desktop and got the same message.

    I pulled up diskmgmt.msc and saw that my system partition had a size of 100 MB and was essentially completely full – 3% free. Both machines were in that state. So I started searching for a solution. I ran across a couple of places, such as this one on reddit, that had a set of instructions to free up space. It included commands that I didn’t actually know existed (I’ve never needed takeown).

    I followed the instructions and ended up with more than 50 MB free on the system partition. I ran the Windows 10 upgrade again, and this time it got further before failing with a different message (I don’t remember exactly what it said).

    At this point I decided to expand the system partition. The thread on reddit mentioned a tool called MiniTool Partition Wizard Free. I did a search and found a review on PCMag. They were complimentary of the pro version, so I decided to give it a try.

    The UI makes it really easy to drag the OS partition to resize it a bit smaller and then expand the system reserved partition to make it bigger. I shrunk my OS partition by 200 MB and increased the system reserved partition to 300 MB. After hitting Apply, Windows has to be rebooted for the tool to make the change. I did that and let it do its work, and then all was good. Windows 8.1 booted up just fine with the newly resized partitions.

    I ran the Windows 10 upgrade again, and the upgrade proceeded smoothly. Thanks to the folks at MiniTool for a great tool!

    Follow me at twitter.com/tfsbuck

  • Buck Hodges

    How we deploy Visual Studio Online using Release Management


    We use VS Release Management (RM) to deploy Visual Studio Online (VSO), and this post will describe the process. Credit for this work goes to Justin Pinnix, who’s been the driving force in orchestrating our deployments with RM. This documentation will help you get familiar with how RM works and how you can use it to deploy your own services, and you can find even more details in the user guide.


    First, let’s briefly cover some terminology. RM has the notion of stages, which are the sequential steps that make up a release. In our case, each stage deploys one VSO scale unit.

    VSO consists of a set of scale units that provide services like version control, work item tracking, and load testing. There are scale units in multiple data centers. Each scale unit consists of a set of Azure SQL Databases with customer data, virtual machines running the application tiers that serve the web UI and provide web services, and job agents running background tasks. We limit how many customers are located in a given scale unit and create more scale units as the service grows – we currently have seven. We also have a central set of services that we call Shared Platform Services (SPS) – including identity, account, profile, client notification, and more – that nearly every service in VSO uses.

    One of our scale units (SU0) is special in that it is the scale unit used by our team for our day-to-day work, and changes are rolled out first on this scale unit. SU0 is called our “dogfood” scale unit – something that others have called a “canary.” Whether you want to think of it as us eating our own dogfood first or as the canary in the coal mine, the goal is that we find problems with our team before they become problems for our customers. This has proved to be invaluable in catching issues before they affect customers.

    We currently use what’s called a “VIP swap” to deploy new VMs. This means that we create a new set of VMs with the new release in a “staging slot” and then swap the VMs in production with the ones in the staging slot. The VIP, which is the virtual IP address that every client uses to talk to VSO, never changes while the VMs behind it are swapped out en masse. This is not the best approach: because of the way the software load balancers in Azure work, some connections get severed in the process, resulting in a small percentage of user requests failing and generating monitoring alerts. Later this year we plan to change the service to support a rolling upgrade, where we’ll upgrade one VM at a time and then tell the service to begin serving the updated experiences once all VMs in a scale unit are updated. This rolling-upgrade approach is the one recommended by Azure.


    For a given update, there are a set of people involved who play particular roles. Here are the roles that we use.

    Engineer - An individual on the product team who has built a hotfix or configuration change to be deployed. This person will be responsible for driving the process of getting it deployed.

    Release Manager - An individual on the product team who will be responsible for driving a sprint deployment. Duties are similar to that of "engineer" except they are working with a larger payload that represents many teams' work over a sprint or more.

    Release Approver - Someone who is charged with reviewing and approving hotfixes and configuration changes. This person is usually a group engineering manager (GEM) but may be someone else designated by a GEM. Approvers should be well versed in product technology and release practices. They are responsible for protecting the health of the service from errant changes. For compliance reasons, this may NOT be the same person as the Engineer for a particular release.

    Overview of the release process

    We use a stage per scale unit, and the stages run in sequence. This acts as a promotion model, starting with pre-production, then internal customers, followed by external customers.  Each scale unit (stage) executes an identical set of approximately 10 steps including the binary update, several database update steps, and an automated health check that rolls back the deployment if it’s not healthy.  Most of the DB steps run synchronously, except for the part that upgrades each customer account – those run asynchronously and in parallel over a period of days for sprint updates (hotfixes are much quicker).

    We currently have two kinds of deployment execution environments: one uses agents, and the other doesn’t. We already had a significant investment in internal deployment tools before Release Management became available. These tools are PowerShell cmdlets that run on dedicated, on-premises deployment VMs. Our VS RM release templates simply connect to agents on these deployment VMs and drive the existing deployment cmdlets. It works great because VS RM fills in the gaps those tools didn’t cover well – delegating execution of the scripts, approval workflows, sequencing of the scale units, and storing logs for auditing and debugging purposes.

    The services that use agentless templates work the same way. They just use remote PowerShell to execute the PowerShell cmdlets. Eventually, we will do away with the agents and use remote PowerShell for all deployment executions.

    We also use RM to manage configuration changes to the system, including their auditing and approval. For example, if someone wants to make a change to a setting in a service or make a database change, it’s done with an RM release.

    Here’s an example of what the stages look like.


    Below you can see part of the workflow for a given stage (each stage deploys a scale unit, starting with pre-production). The workflow consists of the following sequence (the screen shot shows only the first two).

    • Verify prerequisites
    • Send email notifications
    • Pre-binary database update
    • Update service binaries
    • Verify service health
      • At this point roll back if there is a problem
    • Clean up the staging slot
    • Update configuration database
    • Update partition databases
    • Post-partition database update



    The Release Manager queues a release in RM using the appropriate build and “Sprint Deployment” template after making sure no other releases are in progress for this service. If it’s a binary hotfix or a configuration change, the Engineer queues the release.

    Next the release will enter the “acceptance” portion of the pre-production stage. The Release Approver must enter the acceptance approval.

    Upon acceptance approval, the pre-production stage will execute. After the VIP swap, the RM template calls the Verify-ServiceDeployment script to ensure the new binaries are healthy. If a problem is encountered, the deployment is automatically rolled back. Upon success (approximately 6 minutes), the staging slot is deleted and the release progresses. Once the pre-production stage completes, it will automatically be marked as validated, and the release will progress to the “acceptance” phase of the SU0 stage.

    For sprint deployments, each scale unit requires a manual acceptance approval. This approval should be done by the Engineer and is simply there to control timing.

    For binary hotfix deployments, the release rolls out automatically, checking service health at each stage.

    For other hotfix deployments, wait 30 minutes after SU0 is deployed before moving to other SUs.


    After selecting a build based on the test results, the Release Manager or Engineer goes to the "Configure Apps" tab of the RM client and chooses “Agent-based Release Templates” and selects the appropriate template (a sprint deployment in this case).


    After clicking “New Release,” enter the build path and choose a meaningful name for the release ("ServiceName Sprint Deployment (Prod) MSprintNumber BuildNumber").


    The client will transition to the "Releases" screen showing the status of the release.


    Once the "Deploy" step begins, clicking the "..." button will show a more detailed view of the deployment's progress.


    The process will be repeated for the remaining production SUs. Once the last stage is signed off, the release will move into the "Released" stage.


    When we started deploying VSO, it was a manually orchestrated process: someone logged into a designated machine running a special environment, often copying and pasting commands for the custom parts of each deployment. It was a tedious and error-prone process! We’ve long since put an end to that, and RM has helped us run fully automated deployments and configuration changes that anyone on the team can watch and for which we have history/audit logs.

    Now you have some insight into how we deploy VSO using the VS RM product. Hopefully this gives you some ideas that will help you define your own releases.

    [Update April 9, 2015] I’ve added a few more details in the Overview section.

    Follow me at twitter.com/tfsbuck

  • Buck Hodges

    Add, edit, rename, and delete files in the web browser with Visual Studio Online and TFS 2015


    Back in the December 17th update, we added the ability to make changes to files in the web UI, and it works for both TFVC and Git. Edit is easy to find, since it’s right above the content on the screen.

    Add, rename, and delete are available through drop down menus. Let’s take a look at where those are.

    To add files, you need to click on the triangle beside the parent folder. In that same menu, you can also rename or delete the folder as well as download the folder as a zip file.


    When adding files, I can either choose to create a new file, which will then take me to the editor in the browser, or I can upload existing files by dragging and dropping or by browsing.


    When you add or edit a file, don’t forget to update the comment to something meaningful – it’s the text beside your avatar.


    To rename or delete a file, you need to click the triangle beside the file’s name to get the drop down menu to appear.


    While all of my screen shots are with TFVC, this all works with Git repos as well.


    For TFVC, saving an edit checks in the change unless you have gated checkin configured. If you are using gated checkin, your change will be submitted if the build passes. If you have continuous integration enabled, saving the file will trigger a build.

    For Git, saving will by default commit your change. You have the option of committing to a new branch and issuing a pull request. To do that, click the triangle beside the save icon.


    Then you’ll have the opportunity to name the branch.


    If you go with the default option to create a pull request, you’ll be taken to the pull request experience automatically.



    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Original Surface RT is a great digital photo frame


    I have four Surfaces: the original RT, Surface 2 RT, Surface Pro, and Surface 3. When I got the original Surface RT, which Microsoft gave to every full time employee shortly after it came out, I bought the Type keyboard and stopped using my iPad. The device was sluggish, though, particularly with Outlook. I bought a Surface 2 RT when it came out because it’s a much better machine (faster and even better display), and I carry it as a backup machine when I travel (I use my Surface 2 frequently).

    A couple of months ago, I bought a new digital photo frame to replace one that had died. The display on it wasn’t very good even though it was a higher-end model. Then it hit me that the original Surface that I was no longer using would make a great photo frame. The display is good, and the kickstand holds it at a good angle for a photo frame. You can also find them on eBay for about the same price as a high-end dedicated photo frame.

    I tried a few apps and settled on Picture Frame Slideshow, which is free. I loaded all of our pictures into a OneDrive folder, and let the app cycle through them randomly. It works really well and is the best digital picture “frame” I’ve had. I’ve set it up in the kitchen, and I wish I had done this sooner.

  • Buck Hodges

    Mainstream support for TFS 2010 ends in July


    Time flies, and the end of mainstream support for Team Foundation Server 2010 is July 14th. Yep, we’re celebrating it on Bastille Day. If you are still using TFS 2010, now is a great time to upgrade to TFS 2013 Update 4. Also, our next public pre-release of TFS 2015 will be “go-live” (the current CTP is not), meaning you can use it in production.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Moving TFS to cloud cadence and Visual Studio Online


    We get quite a few questions from customers on how we made the transition to shipping both an on-premises product and a cloud service. We moved from shipping every 2-3 years to shipping Visual Studio Online every three weeks and TFS every 3-4 months. You’ve probably seen the great set of vignettes at Scaling Agile across the Enterprise. It was also recently covered in a report from McKinsey and in an interview on Forbes.com. What’s missing is a deeper description of what changes we’ve made to how we work.

    A couple of years ago, we wrote a document on how the team changed to meet the new demands of building a service. For anyone who wants to go deeper, it gives a lot more information on what we did and how. I’ve cleaned up the document by converting our internal terminology, but it’s essentially unchanged from when it was written. Here is the summary from the document, and you will find the entire document attached to this blog post as a PDF.

    The adoption of Scrum for TFS 2012 was driven by our desire to deliver experiences incrementally, incorporate customer feedback on completed work before starting new experiences, and to work like our customers in order to build a great experience for teams using Scrum. We used team training, wikis, and a couple of pilot teams to start the adoption process.

    We organized our work in four pillars of cloud, raving fans, agile, and feedback with each having a prioritized backlog of experiences. Teams progress through the backlog in priority order, working on a small number of experiences at any point in time. When starting an experience, teams break down the experience into user stories and meet with leadership for an experience review. Each three-week sprint starts with a kickoff email from each team, describing what will be built. At the end of the sprint, each team sends a completion email describing what was completed and produces a demo video of what they built. We hold feature team chats after every other sprint to understand each team’s challenges and plans, identify gaps, and ensure a time for an interactive discussion. On a larger scale, we do ALM pillar reviews to ensure end to end cohesive scenarios.

    With the first deployment of tfspreview.com in April 2011, we began our journey to cloud cadence. After starting with major and minor releases, we quickly realized that shipping frequently would reduce risk and increase agility. Our high-level planning for the service follows an 18-month road map and a six-month fall/spring release plan in alignment with Azure. To control disclosure, we use feature flags to determine which customers can access new features.
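A feature flag check of the kind mentioned above can be sketched like this. The storage and API here are hypothetical; the post doesn't describe how our flags are implemented:

```csharp
using System;
using System.Collections.Generic;

// Sketch of per-account feature flags: a feature is visible to an account
// only if the flag has been enabled for it, so new work can be deployed
// dark and disclosed selectively.
class FeatureFlags
{
    private readonly Dictionary<string, HashSet<string>> _enabledAccounts =
        new Dictionary<string, HashSet<string>>();

    public void Enable(string feature, string account)
    {
        HashSet<string> accounts;
        if (!_enabledAccounts.TryGetValue(feature, out accounts))
        {
            accounts = new HashSet<string>();
            _enabledAccounts[feature] = accounts;
        }
        accounts.Add(account);
    }

    public bool IsEnabled(string feature, string account)
    {
        HashSet<string> accounts;
        return _enabledAccounts.TryGetValue(feature, out accounts)
            && accounts.Contains(account);
    }
}
```

The point of the pattern is that deployment and disclosure become independent decisions: code ships every sprint, but a feature only lights up when its flag is turned on.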

    Our engineering process emphasizes frequent checkins with as much of the team as possible working in one branch and using feature branches for disruptive work. We optimize for high quality checkins with a gated checkin build-only system and a rolling self-test system that includes upgrade tests. During verification week, we deploy the sprint update to a “golden instance” that is similar to production. Finally, we ensure continuous investment in engineering initiatives through an engineering backlog.

    We’ve made a number of changes in the last couple of years, and I’ll write about those in upcoming posts. The biggest changes have been in our engineering system and in our organizational structure. We now have the TFS/VSO team working in Visual Studio Online. When we moved into VSO, we also moved from TF version control to Git. Just as we moved to Scrum in part to be able to build and use the experiences needed for Scrum teams (and now Kanban as well), we wanted to ensure we build a great experience for Git and that we also benefit from the workflows it enables, including being able to easily branch, do work, and merge the change back into master. TFS 2015 CTP1 and Team Explorer in VS 2015 CTP6 are the first releases we’ve made to the on-premises products from this new engineering system.

    We’ve also changed the organizational structure by combining development and testing into a single engineering discipline. We made the change four months ago, and we are still learning.

    Where we are now is also not where we want to be. For example, our architecture clearly shows our on-premises software origins, and we still have work to do including splitting out version control, build, work item tracking, and test case management into separate, independent services. At the same time, we need to collapse them back into a product that’s easy to run on-premises (service hooks, for example, is a separate service in VSO that is going to ship in TFS 2015). It’s an evolution – making changes to the product while continuing to ship valuable features for both the cloud and on-premises customers, all from a common code base owned by a single team (rather than separate cloud and on-premises teams).

    I’m sure this post will create many questions, which will build a backlog of posts for me.

    [Update March 12, 2015] Brian has written a new post about the future of Team Foundation Version Control.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Visual Studio Online reliability improvements


    We’ve had a number of outages and other serious incidents in recent months. It’s clear we haven’t done enough to invest in reliability of the service, and I want to give you some insight into what we are working on that will be coming in January and beyond.

    First I want to give you a very brief description of the Visual Studio Online (VSO) service topology. VSO consists of a set of scale units that provide services like version control, work item tracking, and load testing. Each scale unit consists of a set of Azure SQL Databases holding customer data, virtual machines running the application tiers that serve the web UI and web services, and job agents running background tasks. We limit how many customers are located in a given scale unit and create more scale units as the service grows; we currently have six. We also have a central set of services that we call Shared Platform Services (SPS), which includes identity, account, profile, client notification, and more, and which nearly every service in VSO uses. When SPS is not healthy, the entire VSO ecosystem suffers.

    We have been working on making VSO and especially SPS more resilient, as many of the recent outages have stemmed from problems affecting SPS. While the work will take months to complete, every three-week sprint deployment will include improvements. Here’s an overview of the work.

    Breaking apart SPS

    One of the lessons we learned from the outages has been to ensure that the less critical services cannot take down the critical services. We’ve had cases where a dependency, such as SQL or Service Bus, becomes slow, and our code has consumed precious resources like the .NET thread pool. While we have fixed the particular issues we’ve hit, the best approach is isolation. As a result, work is currently under way to split out profile and client notification services from SPS and have them be separate, independent services. Profile is responsible for your avatar and roaming settings in Visual Studio. Client notification is responsible for notifying Visual Studio of updates, such as notifying other instances of VS when you change your roaming settings. They are both important services, but most users will not notice if either of those is not working for a short period of time. This is part of an overall move to a smaller independent services model in VSO.

    Another lesson has been that we have too much depending on a single configuration database for SPS. We are going to be partitioning that database so that if there is an issue with a database it takes down only a subset of VSO accounts and not all. Part of this work will be determining how many partitions we’ll use. This will also provide us more headroom, allowing us to absorb more spikes in load, such as those occurring during upgrades.
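One way to picture the partitioning is a deterministic mapping from account to partition database, so that an issue with one database affects only the accounts routed to it. This is an illustration of the idea, not our actual scheme (the post doesn't describe one); it assumes a stable hash of the account ID modulo the partition count:

```csharp
using System;

static class PartitionMap
{
    // Map an account ID to one of N partition databases. Hashing the GUID's
    // bytes directly (FNV-1a here) gives a stable assignment; GetHashCode()
    // is not stable across processes, so it should not be used for routing
    // decisions that are persisted or shared.
    public static int PartitionFor(Guid accountId, int partitionCount)
    {
        if (partitionCount <= 0) throw new ArgumentOutOfRangeException(nameof(partitionCount));
        byte[] bytes = accountId.ToByteArray();
        uint hash = 2166136261;                 // FNV-1a offset basis
        foreach (byte b in bytes)
        {
            hash = (hash ^ b) * 16777619;       // FNV-1a prime
        }
        return (int)(hash % (uint)partitionCount);
    }
}
```

A real scheme also has to handle changing the partition count without reshuffling every account, which is part of why choosing the number of partitions up front matters.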


    Redundancy

    Today we have redundancy provided by our dependencies. For example, SQL Azure uses clusters with three nodes per cluster. Azure Storage maintains three copies of each blob. We use multiple PaaS VMs for each role in a scale unit.

    However, we don’t have redundancy at the service level. For SPS in particular, we need to provide redundancy to be able to fail over to a secondary instance of SPS if there is a problem in the primary. We will begin this work in the first quarter of this year.


    Resilience

    Graceful degradation is a key principle of resilient services. We are actively working on making VSO services more resilient to their underlying dependencies, whether those are Azure services like SQL and Storage, or our own VSO services, such as SPS. Our first initiative is to contain failures by implementing circuit breakers. Circuit breakers work by detecting an overload condition, such as a backup caused by a slow dependency, and then causing subsequent calls to fail, return a default value, or take some other appropriate action to reduce load and prevent the exhaustion of precious resources, such as the databases and thread pool. We now have them implemented in several places in the code. To have them work effectively, each needs to be tuned so that it trips only when needed. Based on the telemetry, we’ll configure them and let them operate automatically. We have quite a few places that need circuit breakers, and our primary focus is on SPS and the underlying VSO framework.
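As a rough illustration of the pattern (not our actual implementation), a minimal circuit breaker might look like the sketch below. The threshold and cool-off values are placeholders for the telemetry-driven tuning described above:

```csharp
using System;

// Minimal circuit breaker sketch: after too many consecutive failures the
// breaker "trips" and subsequent calls fail fast with a fallback value until
// a cool-off period passes, protecting both the caller's thread pool and the
// struggling dependency.
class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _coolOff;
    private int _consecutiveFailures;
    private DateTime _trippedUntil = DateTime.MinValue;

    public CircuitBreaker(int failureThreshold, TimeSpan coolOff)
    {
        _failureThreshold = failureThreshold;
        _coolOff = coolOff;
    }

    public T Execute<T>(Func<T> call, Func<T> fallback)
    {
        if (DateTime.UtcNow < _trippedUntil)
        {
            return fallback();            // open circuit: fail fast
        }
        try
        {
            T result = call();
            _consecutiveFailures = 0;     // a healthy call closes the circuit
            return result;
        }
        catch
        {
            if (++_consecutiveFailures >= _failureThreshold)
            {
                _trippedUntil = DateTime.UtcNow + _coolOff;   // trip the breaker
            }
            return fallback();
        }
    }
}
```

A production version would also need thread safety and a half-open state that lets a trial request through after the cool-off, but the core idea is the same: stop hammering a dependency that is already failing.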

    We will also build additional throttling into the system, both to prevent abuse and to limit tools that run operations too frequently or very inefficiently.

    Also, we are going to be introducing back pressure into the system – indications in API responses that there is a problem and the need to back off. Azure services such as SQL and Service Bus already provide this today by using particular error codes for failed requests that tell the callers whether to retry and if so when. We make use of that information in calling those services, and we need to introduce the same for the services we build. This work will be starting within the next couple of months.
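The idea of honoring back-pressure hints can be sketched as follows. The Response type here is hypothetical, standing in for the error codes and retry hints that services like SQL and Service Bus return on failed requests:

```csharp
using System;
using System.Threading;

// Hypothetical service reply carrying a back-pressure hint: whether the
// request failed in a retryable way, and how long the caller should back off.
class Response
{
    public bool Succeeded;
    public bool Retryable;
    public TimeSpan RetryAfter;
}

static class BackPressureClient
{
    // Retry a call, but only when the service says the failure is retryable,
    // and only after waiting the delay the service suggested.
    public static Response CallWithRetry(Func<Response> call, int maxAttempts)
    {
        Response response = call();
        for (int attempt = 1;
             attempt < maxAttempts && !response.Succeeded && response.Retryable;
             attempt++)
        {
            Thread.Sleep(response.RetryAfter);   // back off for the hinted delay
            response = call();
        }
        return response;
    }
}
```

The important property is that the server, not the client, decides when retrying is safe; blind client-side retries are exactly what turns an overloaded service into an outage.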

    In order to systematically analyze key components like SPS, we are adapting resiliency modeling and analysis (RMA) to our needs. This is a process we are just starting. In addition to the follow up items that it will generate, we want to build a culture of reliability.

    In order to verify our improvements as well as to continue to discover new issues, we’ve started investing in chaos monkey testing (a term that Netflix coined). We’re still in the early stages, but it’s something we will do a lot more with over the next few months.

    Reduce load

    SPS handles a very large volume of calls. We’ve been spending time understanding the sources of those calls and what we can do to reduce the call volume, either by eliminating calls altogether or by caching frequently requested data, whether in SPS or in the services making the requests. That’s resulted in a significant reduction in the number of calls made. Of course, the service continues to grow, so this work continues.
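As a sketch of the caching approach, here is a simple bounded cache that evicts the least recently used entry when full. It is an illustration of the idea, not the cache we use; the size bound is the important part, since unbounded caches are a classic source of out-of-memory errors:

```csharp
using System;
using System.Collections.Generic;

// Bounded LRU cache: a hit is served locally, a miss costs one round-trip to
// the backing service, and the capacity limit keeps memory use predictable.
class BoundedCache<TKey, TValue>
{
    private readonly int _capacity;
    private readonly Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>> _map =
        new Dictionary<TKey, LinkedListNode<KeyValuePair<TKey, TValue>>>();
    private readonly LinkedList<KeyValuePair<TKey, TValue>> _order =
        new LinkedList<KeyValuePair<TKey, TValue>>();

    public BoundedCache(int capacity) { _capacity = capacity; }

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> load)
    {
        LinkedListNode<KeyValuePair<TKey, TValue>> node;
        if (_map.TryGetValue(key, out node))
        {
            _order.Remove(node);          // cache hit: mark most recently used
            _order.AddFirst(node);
            return node.Value.Value;
        }
        if (_map.Count >= _capacity)      // full: evict least recently used
        {
            var lru = _order.Last;
            _order.RemoveLast();
            _map.Remove(lru.Value.Key);
        }
        TValue value = load(key);         // miss: one round-trip to the service
        node = new LinkedListNode<KeyValuePair<TKey, TValue>>(
            new KeyValuePair<TKey, TValue>(key, value));
        _order.AddFirst(node);
        _map[key] = node;
        return value;
    }
}
```

A real cache for identity data would also need thread safety and expiration so that stale entries don't linger, which is exactly the kind of subtlety that makes caching bugs hard to find.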

    We are also looking at the query plans and I/O performance data of our top stored procedures by using dynamic management views (DMVs) in SQL. We’ve done exercises to examine these in the past, and we are looking at how we can automate this. In January we will complete the process of moving our SQL Azure databases to the new version of SQL Azure that will provide us with XEvents for more insight into what’s happening at runtime. The result of this analysis will be tuning or in some cases rewriting stored procedures for optimal performance. The new version of SQL Azure also provides us with benefits such as improved tempDB performance.

    When there is a problem, such as a service upgrade that puts too much load on SPS, we need to be able to quickly reduce the traffic to SPS in order to recover. We’ve seen several incidents lately where getting SPS back to a healthy state has required effectively draining the queue of requests that have resulted from the system being overwhelmed. Without doing this, the system doesn’t make sufficient progress to recover. Along with circuit breakers, we’ll add switches to be able to fail requests quickly to clear the queue and get back to a healthy state quickly.
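A fail-fast switch of the kind described can be sketched like this. The class is hypothetical; in practice such switches would be flipped by operators through configuration:

```csharp
// Sketch of a fail-fast switch used to drain a backed-up request queue:
// while the switch is tripped, incoming requests are rejected immediately
// instead of queuing, letting the overwhelmed service catch up and recover.
class FailFastSwitch
{
    private volatile bool _failFast;

    public void Trip() { _failFast = true; }
    public void Reset() { _failFast = false; }

    // Returns true if the request was processed, false if shed quickly.
    public bool TryProcess(Action request)
    {
        if (_failFast) return false;   // shed load instead of queuing
        request();
        return true;
    }
}
```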

    Monitoring and diagnosis

    When there is an incident, having the right telemetry available saves valuable time. We already have extensive telemetry on VSO that generates gigabytes of data per day, which we analyze through alerting systems, dashboards, a SQL data warehouse, and reports. We also use Application Insights Global System Monitoring for an “outside in” view of VSO availability. We are adding more telemetry around the effectiveness of caches, identity calls, and other areas where we’ve found we haven’t had enough insight into the system during live site investigations.


    We continue to learn from every live site incident. We are investing heavily in making VSO reliable and more resilient to failures.

    Just like our customers, we are entirely dependent upon the stability of VSO for the work we do. Since May, we have had the entire team that builds VSO using VSO for source control, work item tracking, builds, etc. As part of that migration we set up a scale unit that just has our team on it and where we deploy changes first before rolling changes out to the scale units with customers. This has proven very valuable for finding issues that happen under load at scale that are very difficult to find in testing. There is no place like production.

    While the focus of this post has been to describe what we are working on, I want to share some highlights of improvements we’ve made recently.

    • Decreased total calls to SPS by 25%
    • Client notification and profile are throttled to limit resource usage
    • Decreased CPU pressure on the web service VMs by 10%
    • Reduced peak average SQL sessions on configuration database by 10%

    We apologize for the outages we’ve had recently, and I wanted to let you know we are working hard at making the service more reliable.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    How to add licensed users to VS Online via the API


    A while back someone asked a question about how to use the API to add licensed users to a VSO account and change licenses for existing users (instead of the web UI to invite and assign licenses to users). Abhijeet, a dev on my team, gave me some code to get me started, as the identity APIs are not easy to figure out.

    Licensing is different on VSO than it is with Team Foundation Server. With VSO, every user who is added to an account must have a license assigned. We need to be able to add a new user and also to change the license for an existing user. Let’s take a look at how the code to do this works.

    Shared Platform Services

    The first thing to notice is that the code is operating on the account that is stored in what we call Shared Platform Services (SPS). SPS is a core set of services, such as identity, account, and profile, that are used by every service in VSO. The notion of a collection, which is a concept that was introduced in TFS 2010, doesn’t exist in SPS. That’s because not every service in VSO has to have the notion of a collection. Collection is a TFS concept that is used by version control, work item tracking, etc.

    Since we need to make a change to the account, we are going to be talking to SPS. Rather than making calls to https://MyAccount.visualstudio.com as you are used to seeing, we are going to be calling https://MyAccount.vssps.visualstudio.com. The “vssps” part of the URL takes us to SPS, and the account name in the URL gives SPS the context.

    You can see SPS in some scenarios if you watch the address bar in your browser. For example, go to http://visualstudio.com and sign in. Then click on your name to go to your profile page. You will see that the URL is https://app.vssps.visualstudio.com/profile/view?mkt=en-us (or you can directly click that URL). In that context, you are asking SPS for your profile and the list of accounts that you have across the system. SPS is responsible for that information.

    Once we’ve connected to SPS using the same credentials that you normally use as administrator of your account, we get the client representations of the identity and account services in SPS that we’ll use. The next thing we do is to determine if the user is new or an existing user.

    Adding a New User with a License

    New users have to be added to the system. We call that process “bind pending.” What we mean by that is that we will add a reference to a user – nothing but the email address. That email address is not bound to anything. Once a user logs into VSO with an email address that matches a bind-pending entry, we’ll create a full Identity object that references the unique identifier of that user. We’ll also have the user enter basic info for a profile (e.g., display name) and link that to the identity.

    The Identity API calls for this operation are not at all obvious, and this is something we need to improve. I’ve added a lot of comments to the code to help explain what’s going on. At a high level, we first have to construct a descriptor for this bind pending identity. Each account in VSO is either an MSA-backed account (i.e., it only uses MSAs, also known as LiveIDs) or AAD-backed (i.e., all identities reside in an Azure Active Directory tenant). We make that determination using the properties of the administrative user. Then we add the new user to a group in order for the user to be part of the system – for there to be a reference to the identity. Since every user must have a license, we add the user to the licensed users group.

    Changing the License for an Existing User

    This case is much simpler. If the user is already in the account, the code just makes a call to set the license with licensingClient.AssignEntitlementAsync.

    Notes on Licensing

    The names for the licenses match what you’ll find in our documentation for pricing levels except AccountLicense.Express. That’s the name that the code uses for Basic (at one time it was going to be called Express, and the code didn’t get updated). If you look at the AccountLicense enum, you’ll also find AccountLicense.EarlyAdopter. That was only valid until VSO became GA, so it can no longer be used.

    The MSDN benefits license is different from the other licenses because it is dependent upon a user’s MSDN license level. While you could explicitly set the license to a particular MSDN benefits level, you’d only cause yourself problems. Setting it to MsdnLicense.Eligible as the code does below means that the service will handle setting the user’s MSDN benefits level properly upon logging in.

    The licensing API right now uses an enum rather than a string for the licenses. The result is that AccountLicense.Stakeholder doesn’t exist in the client API prior to Update 4. You’ll see in the code that I commented out Stakeholder so that it builds with Update 3, which is the latest RTM update at the time this post is being written. In the future the API will allow for a string so that as new license types are added the API will still be able to use them.

    There are limits for licenses based on what you have purchased. For example, if you try to add a sixth user licensed for Basic, you will get an error message. Here’s how to add licenses to your VSO account.

    One other thing that I’ll mention is that you may have to search around for the credential dialog prompt. That dialog ends up parented to the desktop, so it’s easy for it to get hidden by another window.

    I’ve attached the VS solution to this blog post, in addition to including the code in the post. I highly recommend downloading the solution, which will build with VS 2013 Update 3 or newer (it may work with earlier versions of 2013, but I didn’t try it).


    Follow me on Twitter at twitter.com/tfsbuck

    using System;
    using System.Linq;
    using Microsoft.TeamFoundation;
    using Microsoft.TeamFoundation.Framework.Common;
    using Microsoft.VisualStudio.Services.Client;
    using Microsoft.VisualStudio.Services.Identity;
    using Microsoft.VisualStudio.Services.Identity.Client;
    using Microsoft.VisualStudio.Services.Licensing;
    using Microsoft.VisualStudio.Services.Licensing.Client;

    namespace AddUserToAccount
    {
        public class Program
        {
            public static void Main(string[] args)
            {
                if (args.Length == 0)
                {
                    // For ease of running from the debugger, hard-code the account and the email address if not supplied.
                    // The account name here is just the name, not the URL.
                    //args = new[] { "Awesome", "example@outlook.com", "basic" };
                }

                if (!Init(args))
                {
                    Console.WriteLine("Add a licensed user to a Visual Studio Online account");
                    Console.WriteLine("Usage: accountName userEmailAddress [license]");
                    Console.WriteLine("  accountName - just the name of the account, not the URL");
                    Console.WriteLine("  userEmailAddress - email address of the user to be added");
                    Console.WriteLine("  license - optional license (default is Basic): Basic, Professional, or Advanced");
                    return;
                }

                AddUserToAccount();
            }

            private static void AddUserToAccount()
            {
                try
                {
                    // Create a connection to the specified account.
                    // If you change the false to true, your credentials will be saved.
                    var creds = new VssClientCredentials(false);
                    var vssConnection = new VssConnection(new Uri(VssAccountUrl), creds);

                    // We need the clients for two services: Licensing and Identity.
                    var licensingClient = vssConnection.GetClient<LicensingHttpClient>();
                    var identityClient = vssConnection.GetClient<IdentityHttpClient>();

                    // The first call is to see if the user already exists in the account.
                    // Since this is the first call to the service, this will trigger the sign-in window to pop up.
                    Console.WriteLine("Sign in as the admin of account {0}. You will see a sign-in window on the desktop.",
                        VssAccountName);
                    var userIdentity = identityClient.ReadIdentitiesAsync(IdentitySearchFilter.AccountName,
                        VssUserToAddMailAddress).Result.FirstOrDefault();

                    // If the identity is null, this is a user that has not yet been added to the account.
                    // We'll need to add the user as a "bind pending" - meaning that the email address of the identity is
                    // recorded so that the user can log into the account, but the rest of the details of the identity
                    // won't be filled in until first login.
                    if (userIdentity == null)
                    {
                        Console.WriteLine("Creating a new identity and adding it to the collection's licensed users group.");

                        // We are adding the user to a collection, and at the moment only one collection is supported per
                        // account in VSO.
                        var collectionScope = identityClient.GetScopeAsync("DefaultCollection").Result;

                        // First get the descriptor for the licensed users group, which is a well known (built in) group.
                        var licensedUsersGroupDescriptor = new IdentityDescriptor(IdentityConstants.TeamFoundationType,

                        // Now convert the licensed users group descriptor into a collection-scoped identifier.
                        var identifier = String.Concat(SidIdentityHelper.GetDomainSid(collectionScope.Id),

                        // Here we take the string representation and create the strongly-typed descriptor.
                        var collectionLicensedUsersGroupDescriptor = new IdentityDescriptor(IdentityConstants.TeamFoundationType,
                            identifier);

                        // Get the domain from the user that runs this code. This domain will then be used to construct
                        // the bind-pending identity. The domain is either going to be "Windows Live ID" or the Azure
                        // Active Directory (AAD) unique identifier, depending on whether the account is connected to
                        // an AAD tenant. Then we'll format this as a UPN string.
                        var currUserIdentity = vssConnection.AuthorizedIdentity.Descriptor;
                        var directory = "Windows Live ID"; // default to an MSA (fka Live ID)
                        if (currUserIdentity.Identifier.Contains('\\'))
                        {
                            // The identifier is domain\userEmailAddress, which is used by AAD-backed accounts.
                            // We'll extract the domain from the admin user.
                            directory = currUserIdentity.Identifier.Split(new char[] { '\\' })[0];
                        }
                        var upnIdentity = string.Format("upn:{0}\\{1}", directory, VssUserToAddMailAddress);

                        // Next we'll create the identity descriptor for a new "bind pending" user identity.
                        var newUserDescriptor = new IdentityDescriptor(IdentityConstants.BindPendingIdentityType,
                            upnIdentity);

                        // We are ready to actually create the "bind pending" identity entry. First we have to add the
                        // identity to the collection's licensed users group. Then we'll retrieve the Identity object
                        // for this newly-added user. Without being added to the licensed users group, the identity
                        // can't exist in the account.
                        bool result = identityClient.AddMemberToGroupAsync(collectionLicensedUsersGroupDescriptor,
                            newUserDescriptor).Result;
                        userIdentity = identityClient.ReadIdentitiesAsync(IdentitySearchFilter.AccountName,
                            VssUserToAddMailAddress).Result.FirstOrDefault();
                    }

                    Console.WriteLine("Assigning license to user.");
                    var entitlement = licensingClient.AssignEntitlementAsync(userIdentity.Id, VssLicense).Result;
                }
                catch (Exception e)
                {
                    Console.WriteLine("\r\nSomething went wrong...");
                    Console.WriteLine(e.Message);
                    if (e.InnerException != null)
                    {
                        Console.WriteLine(e.InnerException.Message);
                    }
                }
            }

            private static bool Init(string[] args)
            {
                if (args == null || args.Length < 2)
                {
                    return false;
                }

                if (string.IsNullOrWhiteSpace(args[0]))
                {
                    Console.WriteLine("Error: Invalid accountName");
                    return false;
                }
                VssAccountName = args[0];

                // We need to talk to SPS in order to add a user and assign a license.
                VssAccountUrl = "https://" + VssAccountName + ".vssps.visualstudio.com/";

                if (string.IsNullOrWhiteSpace(args[1]))
                {
                    Console.WriteLine("Error: Invalid userEmailAddress");
                    return false;
                }
                VssUserToAddMailAddress = args[1];

                VssLicense = AccountLicense.Express; // default to Basic license

                if (args.Length == 3)
                {
                    string license = args[2].ToLowerInvariant();
                    switch (license)
                    {
                        case "basic":
                            VssLicense = AccountLicense.Express;
                            break;
                        case "professional":
                            VssLicense = AccountLicense.Professional;
                            break;
                        case "advanced":
                            VssLicense = AccountLicense.Advanced;
                            break;
                        case "msdn":
                            // When the user logs in, the system will determine the actual MSDN benefits for the user.
                            VssLicense = MsdnLicense.Eligible;
                            break;
                        // Uncomment the code for Stakeholder if you are using VS 2013 Update 4 or newer.
                        //case "stakeholder":
                        //    VssLicense = AccountLicense.Stakeholder;
                        //    break;
                        default:
                            Console.WriteLine("Error: License must be Basic, Professional, Advanced, or MSDN");
                            //Console.WriteLine("Error: License must be Stakeholder, Basic, Professional, Advanced, or MSDN");
                            return false;
                    }
                }

                return true;
            }

            public static string VssAccountUrl { get; set; }
            public static string VssAccountName { get; set; }
            public static string VssUserToAddMailAddress { get; set; }
            public static License VssLicense { get; set; }
        }
    }
  • Buck Hodges

    How to provide non-admins access to activity and job pages


    When we shipped TFS 2012, we introduced a feature in the web UI that makes it easy to look at the activity and job history. In that post, I mentioned that you had to be an admin to be able to see the information. A question about this came up last week, and Sean Lumley, one of the folks who built the feature, pointed out that there is a specific permission for this.

    The permission is called Troubleshoot, and it is in the Diagnostic security namespace. It is not exposed in the web UI for setting permissions, so you have to use the tfssecurity.exe tool.

    Here’s an example command line that gives a TFS group called “diag test” the permission to see this info. Anyone added to the “diag test” group would then have access to these pages.

    C:\Program Files\Microsoft Team Foundation Server 12.0\Tools>TFSSecurity.exe /a+ Diagnostic Diagnostic Troubleshoot n:"diag test" ALLOW /server:http://server:8080/tfs


    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    The ALS Ice Bucket Challenge


    Yesterday, Brian took the ALS Ice Bucket Challenge after being challenged by both Scott Guthrie and Adam Cogan. Brian then challenged me, James Phillips, and David Treadwell. I didn’t want to turn down a challenge from Brian. I happen to be in Redmond this week, so I thought why not do it with my team here.

    I mentioned it to Jessica, who is my great admin, and she then got a bunch of the DevDiv admins in on it (the level of excitement among the admins was through the roof). My whole day was booked, so I had no idea when I would do this. Then my 2 PM meeting got canceled. It was on!

    Then the admin team sent email to my team to solicit a volunteer to dump the ice bucket on me. That honor went to Sam Nuziale, a dev lead working on our integration into the new Azure portal.

    We did it out behind building 18 with a bunch of my team there to take pictures and video. They had the ice and water sitting out for about 15 minutes ahead of time, so it was plenty cold.

    Of course, I named three more people. I’ve challenged Martin Hinshelwood (update: Martin’s video), Munil Shah, and John Cunningham. Let’s see if they’ll take the challenge in the next 24 hours. Good luck!

    Here’s the video: http://vimeo.com/103961763

    And pictures…
    buck_als_ice_bucket1 buck_als_ice_bucket2  
  • Buck Hodges

    Ten years of blogging


    This past week was the tenth anniversary of blogging here. Over that time I’ve written 560 blog posts. Unfortunately, there’s been a clear drop in my blogging frequency in recent years. Because of the Visual Studio Online service, I’ve spent more time over the last couple of years interacting with folks on Twitter than blogging. I started using Twitter to keep an eye on feedback and to look for problems that customers hit that we don’t know about (someday, I’d love that to be zero, but the reality is that you can never have enough telemetry).

    I started at Microsoft in 2003, and when I discovered the very open blog policy at Microsoft (basically, use common sense) and that anyone could do it, I decided to give it a try.

    My first couple of posts were random, one on VC++ releasing a toolkit and another about Bill Gates visiting the University of Illinois. Then I started writing posts about some of the issues I was working on, and later about how TFS version control worked, how to use the API, etc. It was a great way to get deeper technical information about the product out.

    I quickly found that the more thorough blog posts on technical details generated the most readership and the most responses from people. That feedback was great encouragement to continue to blog. As those of you who maintain blogs know, a good blog post with depth takes some time to write.

    As my role changed over the years, so did my blogging. I spent my first three years at Microsoft as a developer building TFS version control, then two years as dev lead of team build along with web access for part of that time. When I moved to dev manager, that was the start of writing far less code, but I was still contributing code to the product. I wrote at least one feature in every major release of TFS except for the TFS 2013 release. Unfortunately, I haven’t contributed more than fixing a few comments in the last 18 months or so.

    Being the tenth anniversary, I can’t help but look back on my top blog posts as measured by page views. Here are some of the top blog posts.

    • With 8.5% of total page views, my number one blog post was about removing Visual Studio setup projects. That’s certainly not what I would select as my all-time best blog post, but it clearly shows that if you blog about something controversial, it’ll get a lot of traffic. Since we’re on this subject, I’ll point out the new Visual Studio Installer Projects Extension that was released last month that adds support for Visual Studio setup projects in VS 2013.
    • My second most popular post was about the licensing changes we introduced with the TFS 2010 release. That post taught me to be very careful about posting on licensing. The reason is that our licensing is very complicated. It’s why there’s a sizable white paper on licensing that still doesn’t cover everything. What happened was that people started posting licensing questions (because it’s very confusing, of course). Well, I couldn’t answer many of the questions, but I wanted to help people. That meant I was constantly forwarding questions posted on my blog to the licensing folks. I haven’t posted about licensing since then (i.e., blog about what you know).
    • The next couple are about a path already being mapped in a workspace and authentication in web services with HttpWebRequest.
    • The first one from 2012 (all of the others are older) is Visual Studio 2012 features enabled by using a TFS 2012 server. Given that this post has less relevance over time than the others, it’s interesting how popular it was (I don’t have a graph, but I imagine it’s tailed off quite a bit). This is an example of a post that we should have with every release.
    • Skipping to the end of the top ten is how to delete a team project in Visual Studio Online. When we originally released the feature, it was pretty hard to find. Now it is more apparent in the admin experience.

    Thanks for reading and for your comments over the years.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Azure Portal Preview and Visual Studio Online: Adding a user


    Today at the Build Conference, we showed a preview of the new Azure Portal, and that includes a preview of deep integration with Visual Studio Online. As with any preview, it has some rough edges and limitations. One of the limitations is that you have to create a new Visual Studio Online account through the new Azure Portal Preview. Your existing account will not work right now. All new accounts created through the new portal will be backed by Azure Active Directory. In the near future we will provide an experience to convert your existing Visual Studio Online account.

    This also means that adding a user is different. I’ll walk you through the current process to get you off and running.

    First, go to http://portal.azure.com to get started.


    Once you sign in, you will see the dashboard.


    In the lower left corner, click New and then click on Team Project.


    Next you will need an Azure subscription.


    Clicking Sign up will take you through the standard Azure subscription sign up process. If you have an MSDN subscription associated with the identity you signed in with, you will be able to choose that.

    Once you have your subscription set up, click on New, Team Project, and fill out the name of the Team Project. Then click on Account, Create New, and enter the name you want to use for your new Visual Studio Online account (ignore skyrise in the screenshot – I created that one earlier). Click OK and then the Create button.


    After a bit, you will have your new team project. At this point, you can start using it.


    Let’s say you want to add another user. Click on your team project’s tile, scroll down, and click on Users.


    Remember how I said that all new Visual Studio Online accounts created through the new portal are backed by Azure Active Directory (AAD)? Now we need to set up AAD, as everyone added to your account needs to be in your directory. Click on Use Active Directory, shown under 2 in the screenshot above.

    You can just create a default directory.


    Click on Default Directory. Then click on the Users tab.


    I chose to add someone with a Microsoft Account (MSA, formerly known as a Windows Live ID).


    I chose to add the person as a User (not an admin).


    This MSA identity is now added to my Azure Active Directory. You can now add this identity as you would before. Let’s try it out by creating a new identity that exists only in my Azure AD tenant – an Organizational Identity.

    This is where it gets interesting. This person doesn’t need to have or use a personal Microsoft Account. I can create their identity and control its lifetime. As an admin, I can choose to delete that identity – just like I could with Windows Active Directory. Let’s do that.




    I chose to create the temporary password. Remember that – you will need it.

    Now that I have created a new user in my directory, I need to go to my new Visual Studio Account and add the person there. Once you go to <youraccount>.visualstudio.com, click on Users.


    I enter the identity that I created and click “Send Invitation.” Of course, sending an invitation doesn’t make much sense yet for this type of identity, but we’ll get that limitation addressed soon.


    Finally, I need to add my new user to the contributors group for my “Demo” team project. That follows the standard flow to add a team member.

    Now you are ready for that user to log in.


    Now I’m asked to add a real password – follow the standard Azure Active Directory flow for a newly created identity. Once I’m done with that, I have to create a profile on Visual Studio Online (same as all new identities), and then I’m logged in with my newly created identity that exists only in Azure Active Directory.


    This process is more difficult than it should be, and we are working to smooth it out.

    At this point, you now know how to add a user to your new Visual Studio Online account that is using Azure Active Directory.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Patch for issue with Visual Studio 2013 Queue Build dialog


    In Visual Studio 2013 and Team Explorer 2013 there is a bug that will cause the “What do you want to build?” combo box to be disabled and the entire Parameters tab to be blank. We have now released a patch to fix this bug: KB 2898341. This bug occurs when using VS or TE 2013 with a TFS 2013 server. We introduced the bug at the very end of the release when we made a performance optimization for builds on the server used by the Developer Division.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Visual Studio and Team Explorer 2013 no longer require IE 10 for installation


    When Visual Studio 2013 and Team Explorer 2013 were originally released, the installation process required that Internet Explorer 10 or newer was installed. Today we released updated installers that no longer require IE 10.

    You will get a warning at the beginning of your installation that looks like the screen shot below. For VS 2013 there is a KB article titled Visual Studio 2013: Known issues when IE10 is not installed that describes what the limitations are if you don’t have IE 10 or newer installed (the “some features” link in the dialog takes you to that KB article). The good news is that there aren’t many things that require IE 10.

    TE 2013 will work as you expect without IE 10. There are no limitations for Team Explorer when IE 10 is not installed.

    The updated installers are available from the Visual Studio Download page and from the MSDN subscriber downloads. [Update Nov. 13th: Due to a problem with the update to subscriber downloads, the new bits are only available from the Visual Studio Download page. This will be fixed in the next few days.]


    Follow me on Twitter at http://twitter.com/tfsbuck

  • Buck Hodges

    Updated Team Foundation Server 2013 download addressing web and installation path issues


    Today we have updated the TFS 2013 installation packages both on the web download page and in MSDN subscriber downloads. The reason is that we found two bugs that we wanted to address broadly. We’ve made changes to be able to catch these types of issues in the future.

    Here are details on the two fixes.

    Fixed: Red error box when using Code (version control) in the web UI

    If you upgraded from a prior version of TFS to TFS 2013 RTM, you will see a big error box that says “Error in the application” when using the Code section in the web UI (for example, viewing History). The reason is a bug introduced shortly before RTM: the version number for the discussion service, which is the back end for the code review feature, was not set correctly (it was left as 5 and should have been 6). As a result, the server returned an InvalidServiceVersionException. Users had reported this in a couple of forum threads (here and here) where we had provided a simple SQL script to fix the issue until this updated download was available.


    For anyone who has the original RTM installed (not the new release mentioned above) and has this issue, the fix from Vladimir will correctly address the issue, or you can contact customer support who will be able to help you. You would need to run this SQL script on each collection database. Please do not modify this SQL script or make any other changes to the TFS databases.

    IF EXISTS ( SELECT *
                FROM    sys.extended_properties
                WHERE   name = 'TFS_SERVICE_LEVEL'
                        AND value = 'Dev12.M53')
       AND EXISTS ( SELECT *
                FROM    sys.extended_properties
                WHERE   name = 'TFS_DATABASE_TYPE'
                        AND value = 'Collection')
        EXEC prc_SetServiceVersion 'Discussion', 6

    Fixed: Unable to install into a custom path

    When you install TFS 2013, you do not have to uninstall TFS 2012 – the installer takes care of it for you, preserves your settings, and provides a much better experience for upgrading databases compared to a patch. This was a feature we introduced with TFS 2012 specifically for easy installation of the TFS 2012 updates. There was a bug in the original TFS 2013 RTM release: if your TFS 2012 installation path did not contain “11” (for example, d:\tfs), you would not be able to change the path, and going forward with the installation would leave your TFS inoperable (if this has happened to you, contact customer support, and we’ll get it fixed for you).

    We have learned from both of these issues and now have checks in place to catch problems like them in the future.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Git and HTTPS with TFS 2013


    Philip Kelley has written a great blog post that explains using HTTPS with Team Foundation Server 2013. In it he explains basic auth and Windows auth among other things. If you are using a self-signed certificate, the post will tell you how to add it to your client so that git.exe will work.

    Git network operations in Visual Studio 2013

    Visual Studio 2013 includes in-the-box support for Git version control. That support includes the ability to perform Git network operations from inside of Visual Studio, such as cloning a repository, or performing a fetch, push, or pull operation on an existing repository.

    An endpoint for a clone, fetch, push, or pull operation in Git is called a “remote.” Visual Studio 2013’s Git support includes remotes that use the following protocols.


  • Buck Hodges

    How to fix an error with the Windows 8.1 upgrade utility


    I had gotten a Windows 8 upgrade key with the purchase of a computer last summer. I hadn’t used it, so I wanted to upgrade a desktop my family uses. I ran the utility WindowsSetupBox.exe that you can download from the Upgrade Windows with only a product key page.

    However, it failed quickly after choosing either to run the upgrade or to download it to media. I didn’t write down the error message, but it wasn’t very descriptive – it just said that something went wrong. It failed so quickly, it seemed like it wasn’t really trying.

    So I downloaded the Sysinternals Process Monitor in order to see which files and registry keys the upgrade utility was using. Scanning through the output, I saw C:\users\<acct>\AppData\Local\Microsoft\WebSetup. I deleted that folder. After that, the upgrade utility started working as expected.

    I noticed the dates on the files were from January, and that’s when I remembered having started to upgrade the computer from Win 7 to 8 and later canceling. Apparently having the old data there was causing the upgrade utility to fail.
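    In script form, the cleanup amounts to deleting that one cached folder. Here’s a minimal Python sketch (the helper name is mine, and on Windows you would pass in the value of %LOCALAPPDATA%):

    ```python
    import os
    import shutil

    def remove_stale_websetup(local_appdata):
        """Delete the cached Microsoft\\WebSetup folder if present; return True if removed."""
        target = os.path.join(local_appdata, "Microsoft", "WebSetup")
        if os.path.isdir(target):
            # Recursively delete the stale setup cache that blocks the upgrade utility.
            shutil.rmtree(target)
            return True
        return False
    ```

    Deleting the folder is safe in the sense that it only holds downloaded setup state; the utility recreates it on the next run.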

    I hope this helps anyone who runs into a similar problem.

  • Buck Hodges

    What’s installed on the build machine in Visual Studio Online?


    If you are using Visual Studio Online and wondering what’s installed on the build machine for your builds to use, we have a list here. Tarun Arora, one of our MVPs, put together a web site that shows a comprehensive list of what’s installed: http://listofsoftwareontfshostedbuildserver.azurewebsites.net/.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Team Foundation Server 2013 RC is go-live – use it in production


    Have you upgraded to TFS 2013 RC yet? It’s a supported, “go-live” release that you can use in production. We are using it ourselves in production on our biggest internal server (I’ve written about that server before).

    Download TFS 2013 RC and upgrade.

    You can check out what’s new here (hint: the background image at the top of that page shows off the new work item charting feature).

    One thing I want to point out is that Internet Explorer 8 is not supported for TFS 2013. You can use IE 8 (we don’t block it), but you may encounter problems, and the issues will get worse moving forward (i.e., updates to TFS 2013) since we aren’t testing it. Brian announced it back in February here.

    Here are a few reasons you should upgrade.

    Git support – full git support on the server where you can use our git experience built into VS 2013, the add-in for VS 2012, Eclipse, Xcode, Git for Windows, and any other git client you want.

    Code commenting in the web UI

    Agile portfolio management

    Work item charting

    Team rooms


    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    How to delete a team project from Team Foundation Service using the web UI


    [Update 18 Nov. 2013] It is now easier to get to the UI to delete a project. Navigate to the home page for your team project, then click on the gear icon in the upper left, and then you can click on the drop down arrow that will show up when you mouse over the name of the team project.

    You now have the ability to delete a team project from TF Service using the web UI rather than the command line (the command line still works, and you can find the instructions here if you need them). Unfortunately, this only applies to the service. In the future, we plan to have both team project creation and deletion be available from the web UI for the on-premises server, but we haven’t gotten to it yet.

    However, it’s a bit hidden. Here’s how to find it. First, go to the project home page for one of your team projects (you’ll need to be an account admin) and click on the gear icon in the upper right. It doesn’t matter at this point which team project you pick. Doing this will open a new tab with the administrative web page in it.


    Next, click on DefaultCollection in the navigation area in the upper left. Doing this will take you to a list of team projects.


    In the list of team projects, you will see a drop down arrow to the left of a team project name when you hover over the name with your mouse. Clicking on that drop down arrow will give you the option to delete the team project. Here I am deleting a team project called Awesome.


    Now you will be presented with a dialog to confirm deletion of the team project. Be sure you really want to delete the entire team project. The code, work items, etc. will all be destroyed on the server. There is no way to recover from this operation (it’s not a soft delete at this point – it’s in our plans for the future). So, be certain you want to delete the team project (and double check that the name in the dialog is the one you want to delete)!


    If you choose to delete the team project, you will see a progress bar. The time to delete a project depends on how large it is. For an empty team project, it will be done in seconds.


    That’s all there is to it. Just be careful that you don’t delete something you care about!

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    Team Foundation Server 2012.2 (aka Update 2) now available


    Today we released the second update for TFS 2012 (TFS, TFS Express). You will find a detailed list of features in this blog post. You need to install the one that matches your installation – full or express. You do not need to uninstall TFS 2012 RTM or TFS 2012 Update 1. Installing TFS 2012.2 will take care of uninstalling your previous TFS 2012 installation and then installing the new version. You also do not need to detach collections or anything else special. As always, you should make a backup of your server prior to upgrade (you do make and test backups regularly, right?).

    In this update, we preserve most of the configuration settings. This was a common complaint in the first update. We have a few more settings to preserve, most notably build machine settings, that we will address in the third update.

    The one feature I want to highlight is the compatibility with TFS 2010 build agents/controllers. I mentioned this in a previous post. This is one of those features that’s incredibly valuable but isn’t obvious – you have to know about it.

    This update process is completely new for the 2012 product cycle, and we learned a lot from our mistakes with the first update. Of course, our testing was even more thorough for this release.

    The biggest change we made was getting more users trying early releases. We did this because we realized that some of the issues we had to patch for the first update were due to the fact that customers exercise the product in different ways, and the combinations are nearly infinite. In a regular major version cycle, we have at least one public “go live” release that is crucial to flushing out bugs. We made CTP3 “go live” for our MVPs. We made CTP4 “go live” for everyone. That helped immensely. We investigated and fixed every issue that was reported to us.


    Follow me on Twitter at http://twitter.com/tfsbuck

  • Buck Hodges

    Using VS 2012 and getting “TF206018: The items could not be…” when creating a project using Git


    If you go to Team Foundation Service, create a new account, and then create a new team project that uses Git, you may receive the following error message when you either try to connect with VS 2012 or use “Open new instance of Visual Studio” from your project’s web page and try to add a new Visual Studio project to source control.

    TF206018: The items could not be added to source control because either no team projects have been set up, or because you do not have permission to access any of the team projects in the current collection.

    If you run into this situation, it is because Visual Studio 2012 does not know how to handle a team project that uses Git. The normal version control entries for the team project are not there, so when VS 2012 asks for $/<your team project>, the server sends back TF206018.

    To fix this, you need to install Update 2 for Visual Studio 2012 and the Visual Studio Tools for Git, as described in the topic Create a new code project.

    When the next major version of Visual Studio ships, the Git support will be built-in.

    Follow me on Twitter at twitter.com/tfsbuck

  • Buck Hodges

    How to see activity and job history in TFS 2012


    [Update 9/15/14] Here you can find permissions for these pages.

    [Update 4/24/15] Added information about filtering job history by result.

    With TFS 2012, we introduced a new feature in the web UI that makes it easy to look at the activity and job history on your TFS 2012 server (this feature was first introduced in 2012 RTM). Before the 2012 release, you would have had to look at things like tbl_Command directly in the database to get this information.

    To see this feature, just go to http://<yourserver>/tfs/_oi if you have admin privileges.

    Grant Holliday has written a great blog post, TFS2012: New tools for TFS Administrators, that walks you through the feature. One thing I’ll call attention to, for folks not used to looking at the info in tbl_Command: make sure you notice the Execution Count in the web UI, which we use to record a sequence of identical calls without writing a row per call. Grant explains it in his post, but it’s easy to overlook.

    One additional tip: on the Job History page, if you want to see successful jobs (which are not normally shown) or only a certain job outcome, you can add &result=N to the URL, where N is one of the following integers.

    • Succeeded = 0
    • PartiallySucceeded = 1
    • Failed = 2
    • Stopped = 3
    • Killed = 4
    • Blocked = 5
    • ExtensionNotFound = 6
    • Inactive = 7
    • Disabled = 8
    • JobInitializationError = 9

    Here’s an example to see successful executions of one particular job on my server: http://buckh-dev:8080/tfs/_oi/_jobMonitoring#_a=history&id=95593a11-ecab-4446-b129-07cd21dac1e0&result=0
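    As a small sketch of how the pieces fit together (the server name and job id are just the ones from the example above, and the helper function is mine, not part of TFS), the filtered URL can be assembled like this:

    ```python
    # Job outcome codes used by the &result=N query parameter, from the list above.
    RESULT_CODES = {
        "Succeeded": 0,
        "PartiallySucceeded": 1,
        "Failed": 2,
        "Stopped": 3,
        "Killed": 4,
        "Blocked": 5,
        "ExtensionNotFound": 6,
        "Inactive": 7,
        "Disabled": 8,
        "JobInitializationError": 9,
    }

    def job_history_url(server, job_id, result=None):
        """Build the _oi job-monitoring history URL, optionally filtered by outcome."""
        url = "http://{0}/tfs/_oi/_jobMonitoring#_a=history&id={1}".format(server, job_id)
        if result is not None:
            url += "&result={0}".format(RESULT_CODES[result])
        return url

    print(job_history_url("buckh-dev:8080", "95593a11-ecab-4446-b129-07cd21dac1e0", "Succeeded"))
    # → http://buckh-dev:8080/tfs/_oi/_jobMonitoring#_a=history&id=95593a11-ecab-4446-b129-07cd21dac1e0&result=0
    ```

    Omitting the result argument gives you the unfiltered history view, same as the URL without &result=N.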

    Also, if you are interested in what jobs run and when, check out TFS2012: What are all the different Jobs built-in to TFS?

    Here are a couple of screenshots to whet your appetite.

    Screenshot of TFS Activity Log Web Interface




    Follow me on Twitter at twitter.com/tfsbuck
