The shouting match

Does shouting at a speech recognition system make it work better?

Here's a good example from the front lines - a frustrated human shouting Main menu! at the top of her voice (audio via her sister-in-law). But I bet the recognizer didn't get it. To cut an interesting story unjustly short: speech recognition systems are trained on the acoustic features of "typical" speech, i.e. people talking normally. Shouting distorts those features, not just in terms of higher amplitude, but also in all sorts of other ways, phonetic and prosodic.

So only if the underlying engine had been trained largely on irate speakers would she have had a better shot at recognition by shouting. But sufficient amounts of shouted data are probably not commercially available, and even if they were, an engine trained on that data would produce so many misrecognitions for people with sweet tempers that those people would probably soon turn bad and become dangers to society.

But then again, what's a human to do here? The machine doesn't recognize your polite, restrained voice, even after a courteous repetition or two. You have to try something else, and your only tool is 8 kHz audio and a two- or three-second window. Wait... it's a computer. And when you're training the dog and Rover doesn't get it the first couple of times, shouting actually does work some of the time. Plus you've read somewhere that computers aren't even as smart as dogs, so why not shout even louder?

Or is it more about making the machine suffer? You were gracious enough to grant the system a response to its question - even though it's a machine - and you gave it your clearest articulation, twice, and now it's trying to tell you that you can't even speak your own language! The cheek - got to teach that machine a thing or two. Main menu!  That's better. Always good to offload a wad of phonetic and prosodic distortion down the line at high amplitude. One up for the humans!

Or was it a cry for help? Stay with me, now. Frustration is often about the lack of control. Different people have different ways of reacting to machines that don't do what they want them to do (when was the last time you swore at your hardware? Or operating system? ;-) Sometimes it even gets physical. Except you can't get physical with a machine at the other end of the phone line, so your options are much more limited. All you've got is your voice, so shouting is the only way to express the anguish of powerlessness. Right?

Enough dodgy psychology. This particular story ends nicely - the company behind this auto-attendant obviously keeps an eye on their data and they even redesigned the system on the basis of feedback from someone's Dad.

And back to the original question: does shouting at a speech recognition system make it work better?

The answer is no, not immediately. Shouting will almost certainly make recognition worse (but it may make you feel better). However, the answer may also be yes in the long term. If someone behind the scenes is watching the logs and listening to the audio, you could be helping to make the system better for the next person. (Was "main menu" in the grammar then? Maybe it is now.) Deploying the application doesn't mean that the company's job is done, far from it. If some of the time spent on application design were devoted to closing the feedback loop of user input after initial trials and deployment, there would be a lot less shouting, and a lot more happy customers. There's no data like angry data.
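To make the feedback loop concrete, here is a minimal sketch, not a description of how any particular company processes its data. The log format (records with `transcript` and `recognized` fields), the grammar contents, and the function name `grammar_candidates` are all assumptions invented for the example. It counts what rejected callers actually said, as transcribed by someone listening to the audio, so that frequent out-of-grammar phrases can be considered for the grammar.

```python
from collections import Counter


def grammar_candidates(log_records, grammar, min_count=2):
    """Return (phrase, count) pairs for frequent out-of-grammar utterances.

    log_records: iterable of dicts with the hypothetical keys
        "transcript" (human transcription of the caller's audio) and
        "recognized" (True if the recognizer accepted the utterance).
    grammar: set of phrases the application currently accepts.
    """
    known = {phrase.lower() for phrase in grammar}
    misses = Counter()
    for record in log_records:
        phrase = record["transcript"].strip().lower()
        # Only count rejections of phrases the grammar does not cover;
        # a rejected in-grammar phrase is a different problem (acoustics).
        if not record["recognized"] and phrase not in known:
            misses[phrase] += 1
    return [(p, n) for p, n in misses.most_common() if n >= min_count]


# Hypothetical log data, invented purely for illustration.
logs = [
    {"transcript": "Main menu", "recognized": False},
    {"transcript": "main menu", "recognized": False},
    {"transcript": "MAIN MENU", "recognized": False},
    {"transcript": "operator", "recognized": True},
    {"transcript": "billing", "recognized": False},
]
print(grammar_candidates(logs, {"operator", "billing", "account balance"}))
```

With this made-up data, "main menu" surfaces as the one phrase worth adding, while the rejected "billing" is left out because it is already in the grammar and needs a different kind of investigation.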

Published Monday, May 22, 2006 5:10 PM by Stephen Potter

Comments

Tuesday, May 23, 2006 7:38 AM by marshallharrison

# re: The shouting match

Shouting seems like a natural response for most people. I think that we should really step back and take a closer look at our speech patterns when dealing with automated systems. Have you ever had someone leave you a voice message that was easily understood, only to speed up their delivery when they leave their callback number? We have a natural tendency to speed up when saying phone numbers. We also have other speech habits like this that need to be addressed. Some of these patterns can create problems for speech recognition.
Tuesday, May 23, 2006 7:43 AM by marshallharrison

# re: The shouting match

I also don't think people understand the cell phone and cell tower interaction.

Your cell phone has limited transmit power (I think around 1.5 watts) while the cell tower has a lot more transmitting power. This means that you may be hearing the remote side (i.e. the IVR side) just fine but your little cell phone signal may not be reaching the tower (and thus the IVR) with the same amount of clarity.

Just because you hear it doesn't mean it hears you.
Tuesday, May 23, 2006 3:15 PM by Stephen Potter

# re: The shouting match

Good comments, Marshall. The technology improvements are incremental right now. I think a lot of the problem is about setting the right user expectations. This will come about gradually as people get more exposure to the limitations of SR systems, but it could be a long hike.
Thursday, May 25, 2006 12:04 AM by mycall

# re: The shouting match

For menus that don't recognize, it is good practice to fall back to DTMF for that one prompt after 2 misrecognitions (if it is possible), then switch back. Either that, or make sure 0 always works :-)
Thursday, May 25, 2006 6:48 AM by marshallharrison

# re: The shouting match

I always have DTMF available for all prompts. I just don't tell them to use it until I have had two consecutive norecos or silence events.
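The policy described in the two comments above (only mention touch-tone input after two consecutive failures, then go back to normal) can be sketched as a small counter. This is an illustrative sketch, not code from any IVR platform: the class name `PromptState`, the event names, and the action names are all hypothetical.

```python
class PromptState:
    """Tracks consecutive failures on one prompt and decides when to offer DTMF."""

    # Hypothetical event names for the two failure cases mentioned above.
    FAILURES = {"noreco", "silence"}
    SUCCESSES = {"recognized", "dtmf"}

    def __init__(self, max_failures=2):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    def next_action(self, event):
        """Return "reprompt", "offer_dtmf" or "continue" for the given event."""
        if event in self.FAILURES:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                # DTMF was accepted all along; now the prompt says so.
                return "offer_dtmf"
            return "reprompt"
        if event in self.SUCCESSES:
            # Any successful input resets the count, so the next prompt
            # starts again as a plain speech prompt.
            self.consecutive_failures = 0
            return "continue"
        raise ValueError(f"unknown event: {event!r}")


state = PromptState()
actions = [state.next_action(e) for e in ["noreco", "silence", "dtmf", "noreco"]]
print(actions)
```

Because the counter resets on any successful input, a single miss on a later prompt leads to an ordinary reprompt rather than an immediate touch-tone offer.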
Thursday, May 25, 2006 10:43 AM by KevinJohnson

# re: The shouting match

The best solution would be to recognise the prosodic nature of the voice (are they shouting/angry?) and then respond accordingly ("I'm sorry, I seem to be upsetting you. I'll pass you on to my friend, the human operator").

Until this is possible, the opt-out must be available. This, along with correctly designed dialogues, analysis/feedback loops and user education, is what is needed, as mentioned above.
Friday, May 26, 2006 11:08 PM by J Kandra

# re: The shouting match

This video clip is a satire on voice recognition based on HAL, the computer from the movie 2001: A Space Odyssey. In this clip we see Dave talking to his computer Hallie, who has a sweet voice but a serious attitude problem.
See how Dave gets frustrated by Hallie when trying to get Hallie to open applications and perform other voice recognition tasks.

This video clip is meant to be humorous.  It just shows how a person can become frustrated when interacting with a computer.

The dialog spoken by Hallie is in response to voice recognition commands that are spoken by Dave.
Friday, May 26, 2006 11:09 PM by J Kandra

# re: The shouting match

Here's the URL:
http://www.youtube.com/watch?v=IdAQgVTl7wk

(wasn't sure if the URL came through on my last submit).