Friday’s Wall Street Journal contained an article by Mr. Wagstaff (available here to subscribers) which, I believe, clearly issued a challenge to you. The article discusses the advancements and challenges of voice recognition by focusing on the dictation scenario. Mr. Wagstaff acknowledges that voice recognition is quite good and focuses on the challenge of having voice recognition interact with people. Here is what Mr. Wagstaff had to say:

You see, it turns out that the problem with speech recognition isn't recognizing what you're saying. The problem is interpreting what you want the computer to do beyond that. (Speaking Up for Voice Recognition, Wall Street Journal, July 15, 2005)

Your challenge is summed up in that pair of sentences.

Voice User Interface (VUI) design is the art of designing and integrating a speech interface that "anticipate[s] the needs and preferences of the user and conform[s] to the user's mental model of the domain and of spoken language." (Dr. Hura, Heuristics: Lessons in the Art of Automated Conversation) Or, as I like to think of it, it's the “say what?” model. If, in a conversation with you about vintage cars, I suddenly ask you what time high tide is in Seattle, you might respond with “say what?” I switched context, and you had to quickly interpret what I wanted (you could also say I was a poor conversationalist). It's a challenge of context.

One of the most effective ways we've found of defining context is the core set of scenarios that an application must support. What scenarios are you enabling? What happens if users step outside of those scenarios?

Imagine that you are speech-enabling a store locator application. One core scenario you want to enable: a user telephones in and, in three prompt steps, is able to find the store closest to them. There could be other scenarios you would like to enable as well: step-by-step directions as the user travels to the store, a call transfer to a store employee for additional details, a query for a certain item in the store's stock, etc. What are the steps that you take? How would you set the context for the user? What questions would you ask the user that would result in responses you are prepared to act on? For instance, suppose your store application asks: "What time would you like to pick up that item?" Some possible responses: 4 o'clock, in 30 minutes, 1600 hours, this afternoon. What if the response is "when I'm ready"?
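To make the interpretation problem concrete, here is a minimal sketch of how a dialog layer might map those pickup-time responses to something the application can act on. The function name and categories are hypothetical, and a real deployment would use a speech grammar (e.g. SRGS) with semantic tags rather than regular expressions; the point is that the out-of-grammar response ("when I'm ready") must fall through to a clarifying re-prompt.

```python
import re

def interpret_pickup_time(utterance: str):
    """Map a recognized utterance to a pickup-time intent (hypothetical sketch)."""
    text = utterance.lower().strip()

    # Absolute clock time: "4 o'clock"
    m = re.match(r"(\d{1,2}) o'?clock$", text)
    if m:
        return ("absolute", int(m.group(1)))

    # Military time: "1600 hours"
    m = re.match(r"(\d{2})(\d{2}) hours$", text)
    if m:
        return ("absolute", int(m.group(1)))

    # Relative time: "in 30 minutes"
    m = re.match(r"in (\d+) minutes?$", text)
    if m:
        return ("relative_minutes", int(m.group(1)))

    # Coarse part of day: "this afternoon"
    if text in ("this morning", "this afternoon", "this evening"):
        return ("part_of_day", text)

    # Out-of-grammar ("when I'm ready"): hand control back to the
    # dialog manager with a clarifying re-prompt.
    return ("reprompt", "What time works for you? For example, say "
                        "'four o'clock' or 'in thirty minutes'.")
```

Notice that the hard part is not recognizing the words; it is deciding, in advance, which responses the application is prepared to act on and what to say when the caller steps outside them.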

My point is that while speech recognition has gotten quite good, your challenge is to design a Voice User Interface that uses speech technology to its fullest potential and addresses your key user scenarios.

VUI is a design discipline that has entered the mainstream and taken its place as a peer alongside visual design, engineering design, information design and the others. Like those disciplines, VUI design involves both technique and art. There is an opportunity here: the percentage of people in organizations who are good at VUI design and at evangelizing speech technologies is very small.

As developers, managers, executives and educators you are tasked with developing your vision for speech enabled applications. Here are some suggested steps:

  1. Identify the right individuals in your organization who will become your leaders in VUI design. Please remember that this expertise spans individual disciplines. For instance, developers might be focused on the tools they need, the application architectures required to enable the scenarios, error handling, performance, grammars, prompts, etc. Product managers might be focused on the key scenarios, market requirements, business rules, context requirements, etc. What skills do your students, employees and managers need to effectively leverage speech technologies?
  2. Don't bolt on speech without making VUI design a top priority. Suppose, for instance, that you are speech-enabling a preexisting web form; the appropriate VUI may be different from the form's layout. If the form has two text input boxes, size and type, your speech-enabled application could have users answer in such a way that both boxes are filled out automatically, such as "I'd like a small pepperoni pizza.” The visual fields could be automatically populated with the values small and pepperoni. Don't just add on speech: the chances are very good that you'll miss the right context for your users. Make yours a speech-enabled, not speech-disabled, application by setting the right VUI.
  3. Evaluate your business and service model. How can speech-enabled applications help your organization? Are there obvious opportunities in your organization where speech-enabled applications might save time and money or enhance services?
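The slot-filling idea in step 2 can be sketched in a few lines. This is a hypothetical illustration assuming a form with size and type fields; a production system would use a grammar with semantic interpretation tags rather than the simple keyword spotting shown here. The point is that one utterance fills several fields at once, instead of one prompt per field.

```python
# Hypothetical vocabularies for the two form fields.
SIZES = {"small", "medium", "large"}
TYPES = {"pepperoni", "cheese", "veggie"}

def fill_order_slots(utterance: str) -> dict:
    """Extract size/type slots from one utterance by keyword spotting."""
    words = utterance.lower().replace(",", " ").split()
    slots = {"size": None, "type": None}
    for w in words:
        if w in SIZES:
            slots["size"] = w
        elif w in TYPES:
            slots["type"] = w
    return slots

# "I'd like a small pepperoni pizza" fills both fields in one turn:
# {"size": "small", "type": "pepperoni"}
```

If a slot comes back empty, the dialog can fall back to prompting for just that field, which is exactly the kind of context decision VUI design is about.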

You are setting the context for your application. What do you want your application to do after the user says hello?


Additional Resources:

VUI Design
Heuristics: Lessons in the Art of Automated Conversation
Voice User Interface Design - Purpose and Process
Best Practices in Designing Speech User Interfaces
Six Steps for Creating a Speech Recognition Application or Speech-Enabling Your DTMF IVR

Business Value
Speech Server Case Studies

This posting is provided "AS IS" with no warranties, and confers no rights.