I’m Bruce Forstall, a developer on the Visual C++ IDE team. I currently work on the implementation of the CLR CodeDOM API for C++/CLI, which supports the WinForms designer as well as a number of other scenarios. Previously, I’ve worked on the Itanium code generator, the Java virtual machine team, and the Cairo OS project (anyone remember that?).
I thought I’d write about an internal tool named Gauntlet that we depend on for our development work.
Before Gauntlet, we had a number of problems. People often wouldn’t do all the pre-check-in validation builds or test runs they should have, or they would do them in an environment that wasn’t up-to-date. Conflicting changes from different people made on the same day would cause builds or tests to break because of integration problems. And in Visual Studio 2005, we started supporting three different architectures (x86, Itanium, x64). This meant we needed to do a lot more per-architecture testing, and not everybody had every type of machine in their office. Gauntlet solved these problems and more.
Gauntlet is a tool designed to improve the everyday quality of the product by performing a large amount of testing of each and every change before those changes are committed to the source control system, and made visible to other developers, testers, and our daily process. Each change, from every developer, is serialized. Thus, all testing happens on each change in sequence, detecting integration problems immediately. If a change passes all testing, it gets checked in. If not, it gets rejected and no damage is done to the “live” source base. Everyone can always assume the “live” source tree is good, because Gauntlet has validated every check-in. Gauntlet also tries to simplify life for developers by automatically sending email with information about the change and updating our bug database.
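The serialization idea can be illustrated with a minimal sketch. This is not Gauntlet's actual implementation: the `run_gauntlet` function, the dictionary model of the source tree, and the `validate` callback are all hypothetical names chosen for illustration. Each submission is tested against the live tree as left by every previously accepted change, so two changes that each pass in isolation but conflict with each other are caught at check-in time.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

Tree = Dict[str, str]  # hypothetical model: file name -> contents


def run_gauntlet(
    live: Tree,
    submissions: Iterable[Tuple[str, Tree]],
    validate: Callable[[Tree], bool],
) -> Tuple[Tree, List[str], List[str]]:
    """Process submissions strictly one at a time, in arrival order."""
    queue = deque(submissions)
    accepted: List[str] = []
    rejected: List[str] = []
    while queue:
        name, change = queue.popleft()
        # Test the change on top of the current live tree, which already
        # contains every change accepted before it.
        candidate = {**live, **change}
        if validate(candidate):
            live = candidate        # commit: the change becomes visible
            accepted.append(name)
        else:
            rejected.append(name)   # the live tree is left untouched
    return live, accepted, rejected


# Toy validation rule (an assumption): every file must match the "api" version.
def versions_agree(tree: Tree) -> bool:
    return all(value == tree["api"] for value in tree.values())


start = {"api": "v1", "caller": "v1"}
change_a = ("A", {"api": "v2", "caller": "v2"})  # upgrades the API and its caller
change_b = ("B", {"new_caller": "v1"})           # fine against v1, broken after A

final_tree, accepted, rejected = run_gauntlet(start, [change_a, change_b], versions_agree)
```

In this toy run, change B would have passed against the tree its author tested on, but it is rejected because it is validated after change A has been committed.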
The system consists of a web-based user interface, a SQL server that stores pending and completed check-in information along with other data such as usage metrics, a Gauntlet server process, and a large number of “slave” machines that perform builds and tests. A key to the system is that as much as possible runs in parallel, both to maximize throughput and to increase the amount of testing performed. We have a number of Gauntlet systems on the team, for different groups. The largest uses about 60 machines, comprising some of each of the three processor architectures. Each check-in takes between one and two hours (depending on the type) and runs hundreds of thousands of test cases. These cover each of the three processor architectures already mentioned; for code generator check-ins, the x86-hosted cross-compilers for Itanium and x64 are also tested, as are compilers built from the same code base for several Windows CE processor architectures.
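As a rough sketch of the fan-out described above (again, not the real Gauntlet server), a submission can be split into independent build and test jobs that run concurrently, with the submission passing only if every job passes. The `validate_in_parallel` function, the suite names, and the `run_job` callback are assumptions made for illustration; a thread pool stands in for the pool of lab machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Sequence


def validate_in_parallel(
    architectures: Sequence[str],
    suites: Sequence[str],
    run_job: Callable[[str, str], bool],
    max_workers: int = 8,
) -> bool:
    """Run one job per (architecture, suite) pair; pass only if all pass."""
    jobs = list(product(architectures, suites))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each job is independent, so the jobs can run side by side.
        results = list(pool.map(lambda job: run_job(*job), jobs))
    return all(results)


# Hypothetical job runner: everything passes except the Itanium test suite.
def fake_run_job(architecture: str, suite: str) -> bool:
    return not (architecture == "Itanium" and suite == "tests")


ARCHITECTURES = ["x86", "Itanium", "x64"]
all_pass = validate_in_parallel(ARCHITECTURES, ["build"], fake_run_job)
one_fails = validate_in_parallel(ARCHITECTURES, ["build", "tests"], fake_run_job)
```

With this shape, the elapsed time for a submission is bounded by its slowest job rather than by the sum of all jobs, which is what makes broad per-architecture coverage affordable.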
Gauntlet originated in the DHTML team. I rolled Gauntlet out to the Visual C++ team over five years ago. Initially, the change in process was resisted, but now the system is viewed as indispensable. In that time, our original Gauntlet has processed almost 15,000 submissions. On average, about 20% of the submissions fail for various reasons, such as a developer not testing a configuration or variation that Gauntlet tests. That means that about 3,000 faulty check-ins have been prevented from entering the “live” sources. That represents at least 3,000 bugs that didn’t have to be detected sometime later, with the faulty check-in then tracked down and backed out or fixed. And that means our everyday quality has been higher.
There are several simple lessons from our Gauntlet experience. First, testing a change as much as possible before committing the change is incredibly valuable. Enforcing that testing regimen, and enriching it by greatly expanding the testing performed, makes it even more valuable. Secondly, setting up and maintaining a system this large, simply from a hardware perspective, is costly and time-consuming, but we deem it to be absolutely worth the cost.
Is Gauntlet a Borland program? They have a product with the same name (http://www.borland.com/downloads/download_gauntlet.html). Or was this program created by internal programmers?
> and the Cairo OS project (anyone remember that?).
Indeed. The O-O cure to all that ails you, by promising features that Vista **still** won't deliver, more than (how long has it been?) 14 years later. I thought that WinFS was supposed to put us back on the road to Cairo, but apparently we have been detoured yet again.
Anyway, I don't think many people in the user community shed many tears when most of it never came to pass. As I recall, when we heard all these grandiose pronouncements about what lay in store under this next-generation OS, most folks snidely said, "Yeah, right!", and went back to cranking out more plain old C under UNIX...
Gauntlet sounds like good work! Hopefully developers such as yourself who are actually driving quality are able to manage AWOL executives. During the past year, I've been pretty critical of the horribly lacking quality of VS 2005 MFC & C++. Hopefully initiatives such as this will help redirect VS quality back up.
Is there any word on VS 2005 SP1 Beta 2? Do you know of any upcoming announcements regarding SP1?
Thanks again for your good work.
When working on a new version with large changes to existing functionality how do you work the process through to update Gauntlet?
Does Gauntlet get disabled at certain parts of the development cycle? Do developers update Gauntlet or do test engineers maintain the test framework accounting for new changes to your applications?
Sounds pretty cool, but like any automated testing framework there is almost always more maintenance overhead than most organizations plan for.
1. Thiago: The Gauntlet described here is not a Borland program: it is a completely internally-developed Microsoft tool. The name "Gauntlet" refers to the sense of "a severe trial" by my dictionary. Or, "running the gauntlet", where your code is submitted to severe punishment (by the tests) and it hopes to make it to the end of the trial successfully. But that might be stretching the analogy a bit far.
2. Wil: Cairo certainly dreamed big. I guess I could defend it for that, if nothing else. And in the end, quite a lot of the pieces, or at least ideas, of Cairo have shipped in various forms (the Win95 UI, the Windows Index service, Active Directory, etc). The OFS/WinFS/???FS piece might be the last large piece to go.
3. jamome: I'm sorry you haven't had a good experience with the quality of VS2005. Gauntlet was a part of the entire development cycle for Visual C++ as well as most other teams in Visual Studio. However, it is only as good as the tests that it runs. We've done a lot of work since VS2005 shipped on the tests we've got. I don't know about SP1 beta 2 -- my advice: stay tuned on MSDN.microsoft.com.
4. visionep: Regarding changed functionality during the development cycle: We never disable Gauntlet, and very rarely disable tests that are run in Gauntlet. As for who maintains the tests to account for changes, it is generally a team effort between the dev and test organizations. You do allude to something I didn't describe, which is the maintenance cost of this system. It is fairly heavy, and more than most people expect. The primary cost is just the maintenance of a large number of machines: OS patches, hardware failures, and lab issues (cooling, power, KVM, network, etc). Then, when developers change functionality that affects tests, especially those tests run in the Gauntlet system, those tests need to be updated as well. There's also a significant cost if there is any flakiness in the system, whether in the basic system itself or in any test. We get about 70% machine utilization almost 24/7, so you can imagine that any test that fails even 5% of the time wreaks havoc. We work really, really hard to eliminate any kind of flakiness.
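To make the cost of flakiness concrete, here is a back-of-the-envelope sketch. It assumes that flaky tests fail independently of one another and that a single spurious failure rejects the whole submission; the function name and the counts of flaky tests are hypothetical, and only the 5% figure comes from the reply above.

```python
def spurious_rejection_rate(flaky_tests: int, failure_rate: float = 0.05) -> float:
    """Chance that a correct check-in is rejected by at least one flaky test.

    Assumes each flaky test fails independently with probability failure_rate.
    """
    chance_all_pass = (1.0 - failure_rate) ** flaky_tests
    return 1.0 - chance_all_pass


# Tabulate the rate for a few hypothetical counts of flaky tests.
rates = {count: spurious_rejection_rate(count) for count in (1, 5, 10)}
for count, rate in rates.items():
    print(f"{count:2d} flaky test(s): {rate:.1%} of good check-ins rejected")
```

Under these assumptions, even a handful of tests that each fail 5% of the time would reject a large share of good check-ins, and every such rejection costs a developer a resubmission of an hour or more.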
You mentioned checkins. What does MS use for Version Control? Certainly not SourceSafe otherwise it would have improved over time.
I know for sure that MS do NOT use VSS for version control.
They might be using PVS.
We use SVN and it would be GREAT if MS provided out of the box SVN capabilities in Orcas.
Internally, we mostly use a thing called Source Depot. It's an internal tool that I think may have been an external product at one point that was purchased. Many groups use the Team Foundation source control in Visual Studio Team System. In the past, there were some projects that used VSS, but most used another internal tool, from the dawn of time, called SLM (which I think also was semi-productized pre-VSS).
Is this type of integration suitable for a team of 15 developers, given that each check-in takes 1-2 hours?
Why does anyone care about your internal program? I'm missing the point of this conversation.
>>The Gauntlet described here is not a Borland program
Are you really sure?
Borland's Gauntlet does exactly the same as the Gauntlet you're describing does. http://www.borland.com/us/products/silk/gauntlet/index.html