
Hi All,

A quick update -- my laptop's hard drive failed on my way here. It should be fixed later today and then I will have several posts ready for your consumption.

Cheers!
Chris

This posting is provided "AS IS" with no warranties, and confers no rights.

I'll be at SpeechTek in New York from August 1-4. Stop by the Microsoft Speech Server booth and ask for me if you'd like to chat.

I'll be blogging my experiences at SpeechTek so keep watching.

Cheers!
Chris

One of my tasks is to identify the reference applications that will ship with the next version of Speech Server. Before I prejudice the jury with how we are thinking about reference applications, I'd like your feedback. This will be an ongoing discussion, and I expect to make many posts about it as I hear from you.

Reference applications are sample applications that ship with the product to demonstrate a particular feature set. The code is free for download and review, and we step people through how to create the application in a tutorial format.

Some sample applications that shipped with the v1.x product:

Speech-Enabled Fitch and Mather Stocks Application (FMStocksVoice)
Speech Application SDK -- contained a number of sample applications.

So, to get started, what do you want reference applications to do? How would they best help you? What is the gold standard of reference applications for you? If you were Speech Server, what would be the best reference applications you would develop?

I want to hear from everyone: developers, managers, executives, evaluators, sales, etc. If you don't want to respond publicly on the thread, please feel free to contact me directly using the contact link above this post.

Let me say this now: I can't take the covers off our plans for the next release. As we publicly announce details, I'll be sure to bring them into this conversation and this blog.

Cheers!
Chris

Friday’s Wall Street Journal contained an article by Mr. Wagstaff (available here to wsj.com subscribers) which, I believe, clearly issued a challenge to you. The article discusses the advancements and challenges of voice recognition by focusing on the dictation scenario. Mr. Wagstaff acknowledges that voice recognition is quite good and focuses on the challenge of having voice recognition interact with people. Here is what Mr. Wagstaff had to say:

You see, it turns out that the problem with speech recognition isn't recognizing what you're saying. The problem is interpreting what you want the computer to do beyond that. (Speaking Up for Voice Recognition, Wall Street Journal, July 15, 2005)

Your challenge is summed up in that pair of sentences.

Voice User Interface (VUI) design is the art of designing and integrating a speech interface that "anticipate[s] the needs and preferences of the user and conform[s] to the user's mental model of the domain and of spoken language." (Dr. Hura, Heuristics: Lessons in the Art of Automated Conversation) Or, as I like to think about it, it's the “say what?” model. If, in a conversation with you about vintage cars, I suddenly ask you what time high tide is in Seattle, you might respond with “say what?” I switched context and you had to quickly interpret what I wanted (you could also say I was a poor conversationalist). It's a challenge of context.

One of the most effective ways we've found of defining context is the core set of scenarios that an application must support. What scenarios are you enabling? What happens if users step outside of those scenarios?

Imagine that you are speech-enabling a store locator application. One core scenario you want to enable is: the user telephones in and, in three prompt steps, is able to find the store closest to them. There could be other scenarios you would like to enable as well: step-by-step directions as the user travels to the store, a call transfer to a store employee for additional details, a query for whether the store carries a certain item, etc. What are the steps that you take? How would you set the context for the user? What questions would you ask the user that result in responses you are prepared to act on? For instance, suppose your store application asked: "What time would you like to pick up that item?" Some possible responses: 4 o'clock, in 30 minutes, 1600 hours, this afternoon. What if the response is: "when I'm ready"?
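To make the pickup-time question concrete, here is a minimal sketch of interpreting the recognized text. It assumes the recognizer has already turned speech into a string, and the function names, the day-part table and the rule that "4 o'clock" means the next upcoming 4 o'clock are my own illustrative assumptions, not anything that ships with Speech Server.

```python
import re
from datetime import datetime, timedelta

# Hypothetical defaults for vague day-part phrases (an assumption for this sketch).
DAY_PARTS = {"this morning": 9, "this afternoon": 15, "this evening": 18}

def interpret_pickup_time(utterance, now):
    """Return a datetime for responses we are prepared to act on, else None."""
    text = utterance.strip().lower().rstrip(".")

    # "4 o'clock": assume the caller means the next upcoming 4 o'clock.
    m = re.fullmatch(r"(\d{1,2}) o'clock", text)
    if m and 1 <= int(m.group(1)) <= 12:
        candidate = now.replace(hour=int(m.group(1)) % 12, minute=0,
                                second=0, microsecond=0)
        while candidate <= now:
            candidate += timedelta(hours=12)
        return candidate

    # "in 30 minutes": an offset from the current time.
    m = re.fullmatch(r"in (\d+) minutes?", text)
    if m:
        return now + timedelta(minutes=int(m.group(1)))

    # "1600 hours": 24-hour clock, rolled to tomorrow if already past.
    m = re.fullmatch(r"(\d{2})(\d{2}) hours", text)
    if m and int(m.group(1)) < 24 and int(m.group(2)) < 60:
        candidate = now.replace(hour=int(m.group(1)), minute=int(m.group(2)),
                                second=0, microsecond=0)
        if candidate <= now:
            candidate += timedelta(days=1)
        return candidate

    # "this afternoon": map to a default hour, but only if it is still ahead.
    if text in DAY_PARTS:
        candidate = now.replace(hour=DAY_PARTS[text], minute=0,
                                second=0, microsecond=0)
        return candidate if candidate > now else None

    # Anything else ("when I'm ready") is outside the scenario.
    return None

def respond(utterance, now):
    """Confirm a time we understood, or re-prompt with examples that set context."""
    when = interpret_pickup_time(utterance, now)
    if when is None:
        return ("Sorry, I didn't catch a time. You can say something like "
                "'4 o'clock' or 'in 30 minutes'.")
    return "Okay, your item will be ready at " + when.strftime("%H:%M") + "."
```

The interesting branch is the last one: the re-prompt is where the application resets the context for a caller who stepped outside the scenario.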

My point is that while speech recognition has gotten quite good, your challenge is to design a Voice User Interface that uses speech technology to its fullest potential and addresses your key user scenarios.

VUI is a design discipline that has entered the mainstream and takes its place as a peer alongside visual design, engineering design, information design and the others. Like the other design disciplines, VUI design involves both technique and art. There is an opportunity here: the percentage of people in organizations who are good at VUI and at evangelizing speech technologies is very small.

As developers, managers, executives and educators you are tasked with developing your vision for speech enabled applications. Here are some suggested steps:

  1. Identify the right individuals in your organization who will become your leaders in VUI design. Please remember that this expertise spans individual disciplines. For instance, developers might be focused on the tools they need, the application architectures required to enable the scenarios, error handling, performance, grammars, prompts, etc. Product managers might be focused on the key scenarios, market requirements, business rules, context requirements, etc. What skills do your students, employees and managers need to effectively leverage speech technologies?
  2. Don't bolt on speech without making VUI design a top priority. Suppose you are speech-enabling a preexisting web form: the appropriate VUI may differ from the form's list of fields. For instance, say the form has two text input boxes: size and type. Your speech-enabled application could let users answer in a way that fills out both boxes at once, such as "I'd like a small pepperoni pizza." The visual fields could then be populated automatically with the values small and pepperoni. Don't just add on speech; the chances are very good that you'll miss the right context for your users. Make yours a speech-enabled, not speech-disabled, application by setting the right VUI.
  3. Evaluate your business and service model. How can speech-enabled applications help your organization? Are there obvious opportunities in your organization where speech-enabled applications might save time and money or enhance services?
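
The pizza example in step 2 can be sketched in a few lines. This is only an illustration of filling several form fields from one utterance: the vocabularies and the function names are hypothetical, and a real application would use a speech grammar rather than simple word matching on recognized text.

```python
# Hypothetical vocabularies for the size and type fields in the pizza example.
SIZES = {"small", "medium", "large"}
TYPES = {"pepperoni", "cheese", "veggie"}

def fill_form(utterance):
    """Fill as many form fields as a single utterance allows."""
    words = utterance.lower().replace(".", " ").replace(",", " ").split()
    form = {"size": None, "type": None}
    for word in words:
        if word in SIZES and form["size"] is None:
            form["size"] = word
        elif word in TYPES and form["type"] is None:
            form["type"] = word
    return form

def missing_fields(form):
    """List the fields the application still has to prompt for."""
    return [name for name, value in form.items() if value is None]

form = fill_form("I'd like a small pepperoni pizza.")
print(form)                  # both fields filled from one sentence
print(missing_fields(form))  # nothing left to ask
```

If the caller only says "A large one, please", missing_fields reports that type is still open, and the application prompts for just that field instead of walking through the form box by box.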

You are setting the context for your application. What do you want your application to do after the user says hello?

Cheers!
Chris

Additional Resources:

VUI Design
Heuristics: Lessons in the Art of Automated Conversation
Voice User Interface Design - Purpose and Process
Best Practices in Designing Speech User Interfaces
Six Steps for Creating a Speech Recognition Application or Speech-Enabling Your DTMF IVR

Business Value
Speech Server Case Studies

I heard from a number of you after my welcome message. Thanks very much for welcoming me to the blog-o-sphere. I have enabled comments on my posts, which was a popular suggestion. Talk with you soon.

Cheers!
Chris

 

Hi All,

A bit about me. I’m a Program Manager in Microsoft Speech Server. I see speech as the next enabler of technology: a means to make interfacing with technology easier and simpler every day. This blog, I hope, will get you thinking about Speech enabling your technology and be a venue for you to teach me what Speech means to you.

Please feel free to contact me and join the blog discussion.

Cheers!
Chris
