It’s been a while since I last talked about our work on TFS on Azure, and I’ve been feeling like it’s time to give some kind of an update. We made a bunch of announcements about our ALM roadmap at TechEd and we didn’t say too much about hosted TFS, so you may be wondering what we’ve been up to.

We’ve continued to work hard on it along with the rest of our V.Next work. As you’ll recall, our initial port to Azure was pretty quick and easy. The Azure technology stack isn’t all that different from the standard Microsoft on-premises stack, so the port itself isn’t all that hard. The hard part is all the work after getting the app up and running to really make it an internet-scale, cost-effective, manageable service. That’s what we’ve been working on for the past several months.

The big news, in my book, is that, technically speaking, TFS on Azure is up and running now. We deployed it in mid-April and have been operating it since. It’s not yet publicly available; we have a roadmap for gradually increasing the scope of people using it. And it’s not done. For instance, we haven’t hooked up any billing infrastructure to it, so we’ve still got a ways to go before anyone could call it “released”. But the reason I say it is “up and running” is that we’ve deployed an instance in one Azure data center and it will stay up “forever”. From here on out, we will treat it like a cloud service: updating it every few months, never losing any data and rarely having any service disruption.

At the moment, we have about 100 accounts and about 200 users on the running TFS instance. Most of the users are people on my team, but a handful (about 15 accounts) are the earliest wave of TAP customers (early, pre-“Beta” adopters). It’s a small start, but we have a plan to increase adoption every month as we continue to build out the infrastructure and user experience. Expect to hear more later this year about increasing availability. For now, I’m not able to make accounts available to people who want to try it; we’ve already got a backlog of hundreds of customers who want to join the TAP program and we’ll be working through that list. Once we’ve got some good experience operating the service with a decent load of projects and users, we’ll begin to open it up for more public access. Stay tuned.

We are working towards operating this as a real mission-critical service, and while our monitoring and operations are not entirely in place, overall it’s been good. Our availability for the last 24 hours is 100%, for the last 7 days is 99.99% and for the last 30 days is 99.98%. We had a few hiccups the first week we rolled it out. We had one operator error where our DNS entry was accidentally deleted (oops). A general rule for services is “people are your biggest source of error”. Don’t require anything to be done by a person. Everything should be automated and repeatable. We also had a dependent service go down for a little while, and that made our service unavailable. Another general rule of services is that every service should continue to function regardless of the health of the services it relies on. But as you can see from the availability data, it’s getting better week by week.
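
To put those availability percentages in perspective, here is a rough back-of-the-envelope sketch in Python that converts an availability figure into minutes of downtime over a window. It is only an illustration: the function name `downtime_minutes` is made up for this example, and it assumes availability is measured as plain uptime divided by total time in the window, which may not match exactly how the service's numbers are calculated.

```python
def downtime_minutes(availability_pct: float, window_days: float) -> float:
    """Minutes of downtime implied by an availability percentage.

    Assumption (not from the service's own tooling): availability is
    simply uptime / total time over the window.
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - availability_pct) / 100.0


# The three figures quoted above: (label, availability %, window in days)
windows = [
    ("last 24 hours", 100.0, 1),
    ("last 7 days", 99.99, 7),
    ("last 30 days", 99.98, 30),
]

for label, pct, days in windows:
    print(f"{label}: {pct}% is roughly {downtime_minutes(pct, days):.2f} minutes down")
```

Under that assumption, 99.99% over 7 days works out to about one minute of downtime, and 99.98% over 30 days to a bit under nine minutes.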

It was a VERY exciting day when we deployed the first TFS on Azure instance.  We had a nice barbecue lunch for the team to celebrate :)  I can’t wait to make it available and to have an easier way to deliver value on a more regular basis than we can with shipping boxed product.

We’ve just finished feature development on the next wave of TFS functionality (you saw some of it if you were at TechEd) and are about to begin the process of pre-production testing that will ultimately lead to a live upgrade of the TFS on Azure service – very exciting!  Stay tuned for more…

Brian