Today, we got back a pile of data from a recent card sort exercise.

We brought in 17 Word users and 9 Excel users and gave them a huge stack of virtual "cards," each containing the name of a command and a short description of what the command does.  They were also given the proposed names of Ribbon tabs (both core tabs and contextual tabs).  The subjects were asked to stack the commands where they thought they belonged, based solely on the names of the tabs and the commands.

Beyond that, the subjects weren't given any other instructions and they weren't allowed to see Office 12 or the new UI at all.

What we got back were these wonderful huge Excel spreadsheets, each one containing 20 worksheets chock full of raw data.  I haven't had time to go through it in detail yet, and, in fact, the usability team itself hasn't had time to fully analyze it either.  It's hot off the presses.  But still, I can't resist looking it over and making some snap judgments.


These are not the cards we use in usability tests

There are always some things in usability data that make you scratch your head.  You think to yourself, "Really, someone thinks Check Spelling belongs on the Format Picture tab?"  Do you ignore that data?  Assume there's a bug in the test?  Chalk it up to disinterest on the part of the subject?  Or take it to heart, following the usability principle "the software's wrong, not the user"?

Overall, tab categories were scored based on two criteria:

  • Number of "Errors of Omission": how many commands should have been placed in a tab but were instead placed somewhere else.
     
  • Number of "Errors of Inclusion": how many commands people placed in a tab that were supposed to be somewhere else.

Some tabs scored marvelously (above 80% correct with few errors of inclusion); a few did less well (less than 50% correct).
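To make the two criteria concrete, here is a rough sketch of how a tally like this could be computed.  It is not the usability team's actual analysis.  The function name, the sample commands, the "Review" tab, and the definition of "percent correct" (correct placements divided by all placements of commands intended for that tab) are all assumptions made up for illustration.

```python
from collections import Counter


def score_tabs(expected, placements):
    """Tally card-sort results per tab.

    expected:   dict mapping command name -> tab it was intended for
    placements: iterable of (command, chosen_tab), one per card a subject placed
    """
    correct = Counter()
    omission = Counter()   # belonged in this tab, but was placed elsewhere
    inclusion = Counter()  # placed in this tab, but belonged elsewhere

    for command, chosen in placements:
        intended = expected[command]
        if chosen == intended:
            correct[intended] += 1
        else:
            omission[intended] += 1
            inclusion[chosen] += 1

    scores = {}
    for tab in set(expected.values()) | set(inclusion):
        total = correct[tab] + omission[tab]
        scores[tab] = {
            "omission": omission[tab],
            "inclusion": inclusion[tab],
            # Assumed definition: share of intended placements that landed here.
            "pct_correct": 100.0 * correct[tab] / total if total else None,
        }
    return scores


# Hypothetical data: two subjects, three cards each.
expected = {
    "Check Spelling": "Review",
    "Bold": "Write",
    "Crop": "Format Picture",
}
placements = [
    ("Check Spelling", "Review"),
    ("Check Spelling", "Format Picture"),  # the head-scratcher
    ("Bold", "Write"),
    ("Bold", "Write"),
    ("Crop", "Format Picture"),
    ("Crop", "Format Picture"),
]

scores = score_tabs(expected, placements)
for tab, result in sorted(scores.items()):
    print(tab, result)
```

Note that a single misplaced card counts twice: once as an error of omission for the tab it was meant for, and once as an error of inclusion for the tab it landed in.  That is why a tab can be 100% correct and still collect errors of inclusion.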

Interestingly, the first tab of each application, which is designed as a kind of efficient clearinghouse for the most-used commands, scored poorly in this blind test because the current tab names ("Write" in Word, for instance) are not descriptive.  I tend to take that data point a bit less seriously because we've seen people use the first tab successfully over and over.  Because it feels like "home", it seems to matter a bit less what we name it.  (There was a time when we thought about naming the first tab "Home" or "Start" or even the program name.)

So, there's tons of data to look through here, and I can't wait to see what we learn from it.

What are the actions we could take based on the card sort data?  There are at least three different possibilities.  We could decide that a tab name is not descriptive enough and try out different names for it.  Or, we could decide that our organization isn't fitting with the way people think and shuffle things around accordingly.  Or, we could use other kinds of tests to explore a particular aspect of the results from a different angle, trying to validate or invalidate the need to take action.

Honestly, I'm feeling pretty good about most of our organization, although we do continue to move things around.  Finalizing the tab names will be a meaningful process, and I'm already feeling a bit of the fear of commitment.  I want to live up to the legacy of the people who came up with "File, Edit, View, Insert, Format, Tools, ..." and make sure that we ship the best names and feature organization possible.