Yesterday morning we were sitting in the office of one of our usability researchers watching some DVCAM tapes from tests conducted a few weeks ago.

We had a discussion that got me thinking about a set of tests we ran several years ago to determine the discoverability characteristics of contextual tabs.

At the time, contextual tabs were struggling in the usability lab. The visuals and triggers were not obvious enough, and even when people noticed them, the tabs looked so different from normal tabs in the UI that participants thought they were decorative or unactionable.

We kept iterating and iterating on the design, and one of the desperate ideas we had was to pop up a little yellow balloon the first time a contextual tab set appeared, saying something like "Hey you, contextual tabs have appeared, you better click here to get to the tools for working with your table."

(I'm sure the real wording was a lot more Microsoft-esque.)

Anyway, we wrote a little app to enable us to pop up the balloon at the right time--but it was a totally manual process. We had two keyboards hooked up to the usability computer, and when the contextual tabs appeared, one of us in the back room would press F10 on our keyboard to make the balloon appear. So the timing was a little weird, but it was cheaper than building the feature directly into the product itself.

The balloon wasn't the only change in the new build being tested, however--we also tested improvements to the visuals and to the triggers that activated the contextual tabs.

The result of the tests? The usability characteristics of contextual tabs improved dramatically.

But now we had a quandary: which improvements precisely had caused the uptick in usability? The balloon? The substantive changes to the interaction model? The clearer visual design?

One could imagine a world in which we ran controlled, double-blind studies to test the impact of each element of the design separately to assess the best possible combination.

In reality, though, we tend to use an iterative process in which we bring an entire design to the next level and then (if the design is successful) figure out which parts of the improvements were non-critical. The advantage of this process is that it lets us move faster and abandon bad ideas sooner.

In this particular case, we felt kind of icky about the balloon, so we decided to run another set of tests to see how much the results changed from the previous successful test when we didn't show it. It turned out that the results were statistically unchanged; the usability of the feature was being driven far more by the substantive changes to the design than by the notification balloon.

Developing a contextual tab design that worked well took well over six months of concentrated iterations, followed by tweaks over the last two years or so as we continued to make progress on the design surrounding them.

The biggest reset recently was when we introduced new visuals for Beta 1 Technical Refresh last winter and we had to reevaluate usability of the entire UI based on the new look.

Some of the most interesting studies we did were eye tracker comparison tests which enabled us to see how and where the new visuals affected the scanning pattern of the UI. It turns out that moving group labels to the bottom of each group in the Ribbon, for instance, helps people target the control they're looking for a bit faster than in the Beta 1 visuals.

So, could we apply an even more incremental method of usability confirmation to more fully test each element of a design change in isolation?

Perhaps, but a design is much more than the sum of its parts, and the usability of one piece always has to be weighed against the usability of the overall product. This is where art meets science.

There's a talk I give to program managers internally at Microsoft in which I present a 100% guaranteed way to improve the discoverability of a fictitious "Send via Telegraph" feature in Word.

The point of that exercise: you can't evaluate the usability of just one feature or component of an overall design without understanding its impact on the entire product.

Good design is the art of balance.

It's an art that can be infused and informed by scientific rigor, but in the end it's still an art.