A couple of us on the UI team were having a conversation about the comments on yesterday's post.  Someone pointed out that it appears we've discovered a community of people on the internet just as obsessed with UI as we are around here. :)

I wanted to build on yesterday's conversation by helping you to understand more about how we measure results.

When you boil it down, we have a pretty straightforward set of high-level goals for the Office 12 UI redesign.  Help people use more of the functionality in Office.  Help people create better-looking, richer documents.  Save people time on the tasks they do frequently.  Make sure people can be productive right away using the new version.  Help normal people get results that only power users could get before.  Give power users a richer set of tools to go beyond what was possible before.  Stuff like that.

So, how do you know when a feature is right?  There are so many tools at our disposal.

The first line of defense is the people who work on the product.  I install the daily build of Office literally every day.  As soon as a new piece of functionality is in, you have to try it out and get that visceral feeling of "does it feel right?"  Sometimes, no matter how good the spec seemed or how promising the prototypes were, the first time you play with a feature you know it feels wrong.  You never get a second chance to form that first, instant impression of "is it right or not," and I put a lot of value in that initial impression.

Of course, there are way more people than me using interim builds.  And included in that set are the crustiest, most elite power users in the world: Microsoft employees.  You might expect Microsoft employees to be open to change and sympathetic to "work in progress," but some of the most advanced Office users in the world work here, and if you get in the way of their productivity, you're going to hear about it.  Collectively, these people exercise a cross-section of just about every feature in Office, so they provide a constant stream of opinions from people who are already experts in previous versions of Office.

We do usability tests, as I've talked about many times.  The test subjects range literally from people who have no experience with productivity software at all to experts who make a living writing Office add-ins.  We do tests in Redmond, of course, but we also do remote testing at sites all around the United States and in our labs in Europe and Asia.

A common misconception is that a usability test is all about data: that we receive a 100-page report full of graphs and tables, average the numbers, and let them make the design decisions for us.

It's not actually the raw data, though, that makes usability testing so compelling.  Most of the time, it's the "a-ha" moment you have watching someone with a different background and way of thinking from yours use the software.  Often, within the first five minutes you can see that you've failed, and you don't need a sheet of data to tell you that.  It's a humbling experience to sit and watch people struggle.  And your job as a PM is to figure out why and to fix it.

Yesterday's story about "Eat Dismiss Clicks" is an example of this.  One can argue the theoretical implications of focus issues until the cows come home, but watching people all around the world, of all different skill levels, fail again and again in the same way tells you the design is wrong.  When you repair the design and then see the same diversity of people succeed at the same tasks, you know you've done the right thing.  It's not about some computer spitting out data; it's about watching the experiences people have interacting with the software.  Watching their faces, hearing what they have to say.

Data from usability testing comes into play when answering questions like "which features are the hardest to find?" or "where does it take people longer to do something than it used to?"  For instance, we can benchmark how long it takes people to create particular kinds of documents and see where they do great and where they struggle.  We can then take a more in-depth look at why people are struggling in certain areas and make improvements.
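To make that concrete, here's a minimal sketch of the kind of comparison involved.  The task names, timings, and tooling are all invented for illustration; this isn't our actual benchmark data or analysis pipeline.

```python
# Minimal sketch: comparing how long tasks take across two builds.
# All task names and timings below are made up for illustration.
from statistics import median

# Seconds each participant took to complete a task, per build.
old_times = {
    "insert_table": [41, 38, 55, 47],
    "apply_style":  [72, 90, 66, 81],
}
new_times = {
    "insert_table": [29, 33, 40, 31],
    "apply_style":  [88, 95, 79, 102],
}

for task, times in old_times.items():
    before = median(times)
    after = median(new_times[task])
    change = (after - before) / before * 100
    note = "  <-- slower; take a closer look" if after > before else ""
    print(f"{task}: {before:.0f}s -> {after:.0f}s ({change:+.0f}%){note}")
```

A table like that doesn't tell you *why* a task got slower; it just tells you where to point the in-depth follow-up work.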

The next source of feedback is the many people who have been using the product over the last six months.  We have MVPs and a program of technical professionals who have been giving us feedback during all of this time.  For the last three months, we've been receiving feedback from thousands of beta testers.  We have rollouts of Office 12 at businesses on three continents, with people using it every day to get their jobs done and telling us what works and what doesn't.  Having thousands of vocal Office users providing a constant stream of feedback gives you a good idea of what people like and which parts of the product need some more thought or some more work.

There's all of you here on the UI blog as well, and people writing about Office 12 all over the internet.  We read the things you write, noting what you like and the questions you have.

And of course, the last piece: yes, we do have data.  Through the Customer Experience Improvement Program, we can look at aggregated statistics about which features people use, how they use them (keyboard or mouse), when they switch tabs in the Ribbon, and a lot of other things.  This provides a general "heartbeat" of how the overall project is doing and complements all of the anecdotal feedback we get from people using the product.
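If you're curious what "aggregated statistics" looks like in practice, here's a toy sketch of that kind of tally.  The event format and command names are invented for the example; this is not the real CEIP schema or pipeline.

```python
# Toy illustration of tallying instrumentation events.
# The event records and command names are invented, not real CEIP data.
from collections import Counter

# Each record notes which command ran and how it was invoked.
events = [
    {"command": "Bold", "input": "keyboard"},
    {"command": "Bold", "input": "mouse"},
    {"command": "InsertTable", "input": "mouse"},
    {"command": "Bold", "input": "keyboard"},
    {"command": "PasteSpecial", "input": "mouse"},
]

by_command = Counter(e["command"] for e in events)
by_command_input = Counter((e["command"], e["input"]) for e in events)

print("Most-used commands:", by_command.most_common(3))
kb = by_command_input[("Bold", "keyboard")]
print(f"Bold invoked via keyboard {kb} of {by_command['Bold']} times")
```

Aggregates like these can't tell you what any individual was trying to do, but across millions of sessions they show you which parts of the product carry the most traffic.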

So, how do we measure results against our high-level goals?  We have to synthesize all of those inputs.  Yes, there is a ton of data.  We get bushels of anecdotal opinions.  We talk to partners and beta users.  We watch people use the software here and at their places of work.  We watch our parents use it.  We watch our children use it.  We talk to people at the grocery store or on the airplane.  We look at what you write here.  And we try to stay true to our design tenets around simplicity, efficiency, predictability, respect for screen real estate...

All of these things combine to provide the true picture of how we measure success.