How to punish a speech recognition system
We've all had frustrating experiences with speech recognition systems, and as a race we're not beyond punishing virtual beings the same way we would punish people. So, what to do when that voicebot won't behave? Teach it a lesson! Here are some tips on how to get your own back on a telephony speech recognition system.
1. Play loud noise in the background. Music, car engines, crowded bar noise... all good. Systems typically calibrate background noise levels at the start of the call as a baseline against which to separate the speech signal. Blasting noise right up its input channel at start-up is going to give the system such a distorted view of your audio world, it won't have a hope of picking out your voice. For extra points, play loud music and get the song recognized instead of your voice: (How may I help you?) I can't get no... (I think you said 'account get new', is that right?) ...Satisfaction... (Got it!)
2. Speak long utterances without a pause. Great way to tie up system resources! Speech recognition doesn't come cheap in terms of CPU, and the longer you can make it process your big shiny audio, the sweeter your revenge. Pick up a newspaper, start reading and keep going without taking a breath. Keep it up for long enough and the system will eventually bail with a 'babble timeout' - you win.
3. Stay silent. The stealth-mode way to confuse the system. There it is, listening hard, straining at the lowest levels of the audio stack for your voice - but don't speak or make a noise. (Tip: put the phone on mute.) You might be tempted to chuckle during the silences, but keep your nerve, and laugh inside at every "I didn't catch that". It won't be long before the system just hangs up in perplexity.
4. Shout as loud as you can. This causes 'clipping' in the audio - basically, you're exceeding the maximum amplitude the input channel can represent, which flattens the peaks of the waveform and introduces all kinds of distortion. Recognize that!
5. Pretend you're different people as the session progresses. Bit subtle this one, but in order to improve accuracy, speech recognizers like to decide early on what kind of speaker you are - male/female, child/adult, etc., and assume that you won't change. Nice try, reco-bot. You can wipe the floor with this futile assumption simply by first pretending to be a middle-aged man and then suddenly a twelve-year-old girl! (You might want to practice voices beforehand.) A fun variant of this is to get different kinds of people together, and hand the phone between them at each dialog turn - great party game.
6. Play "Dialog-Turn-the-Tables". This one is not only very satisfying to do a number of times in a single call, it also has the potential to mislead the underlying data analysis algorithms that try to improve accuracy. The idea is to answer the system's questions with some information (so you might say for example I'm in Seattle), but then when the system tries to confirm it (Am I right with Seattle?), you can triumphantly say No! if it's right, and Yes! if it's wrong. You are messing with that heap of code, big time.
7. Chirp DTMF. DTMF (a.k.a. 'touch-tone') chirping is a skill that requires simultaneously humming and whistling a pair of different tones in order to mimic a keypress. This takes a lot of practice, but stick at it - the payoffs are big. Imagine: the system asks you to "Press or say '1'..." but you do neither, you chirp #! Your voice just snubbed the SR engine and shoved it to the DTMF recognizer with a tone that was out-of-grammar! Beautiful!
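For anyone serious about chirping, here is what your mouth is up against. Each touch-tone key is a pair of tones, one from a low "row" group and one from a high "column" group, using the standard keypad frequencies. The sketch below is illustrative only: the function names are made up for this post, and the Goertzel filter is just one common way to spot a tone, not necessarily what any particular telephony platform does.

```python
import math

# Standard DTMF keypad: each key = one row tone + one column tone (Hz).
ROW_HZ = [697, 770, 852, 941]
COL_HZ = [1209, 1336, 1477]
KEYS = ["123", "456", "789", "*0#"]

SAMPLE_RATE = 8000  # assumed narrowband telephony sample rate


def dtmf_pair(key):
    """Return the (low, high) frequency pair for a keypad key."""
    if len(key) == 1:
        for r, row in enumerate(KEYS):
            c = row.find(key)
            if c != -1:
                return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def synthesize(key, duration=0.05):
    """Generate the two summed sine waves a perfect 'chirp' would produce."""
    low, high = dtmf_pair(key)
    n = int(SAMPLE_RATE * duration)
    return [
        0.5 * math.sin(2 * math.pi * low * t / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * high * t / SAMPLE_RATE)
        for t in range(n)
    ]


def goertzel_power(samples, freq):
    """Signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect(samples):
    """Pick the strongest row tone and strongest column tone."""
    row = max(range(len(ROW_HZ)), key=lambda i: goertzel_power(samples, ROW_HZ[i]))
    col = max(range(len(COL_HZ)), key=lambda i: goertzel_power(samples, COL_HZ[i]))
    return KEYS[row][col]


if __name__ == "__main__":
    low, high = dtmf_pair("#")
    print(f"To chirp '#', hum {low} Hz and whistle {high} Hz at the same time.")
    print("Detector hears:", detect(synthesize("#")))
```

So to chirp #, you hum 941 Hz while whistling 1477 Hz. Good luck.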
Note: these techniques should be applied only when you have no interest in the outcome of your call (or in what an analyst of the audio logs of your call might think of you). If you want the system to provide information, conduct a transaction or put you through to an operator, don't do these things. Speech recognition engines are fragile, graceful things of beauty that will improve with love, patience, and lots of training data. Speak normally in a quiet environment, and do what you're told.
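If you'd like to see why a quiet room and a normal speaking volume matter (tips 1 and 4 in reverse), here is a rough sketch with made-up numbers. It measures the speech level against a noise floor estimated from the opening audio, and counts how many samples hit the ceiling when you shout. The helper names, the full-scale limit of 1.0 and the test tones are all assumptions for illustration; real systems are considerably more sophisticated.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(speech_rms, noise_floor_rms):
    """Signal-to-noise ratio in decibels."""
    return 20 * math.log10(speech_rms / noise_floor_rms)


def clip(samples, limit=1.0):
    """What the channel does to samples beyond full scale: flatten them."""
    return [max(-limit, min(limit, x)) for x in samples]


def clipped_fraction(samples, limit=1.0):
    """Share of samples sitting at (or beyond) the full-scale limit."""
    return sum(1 for x in samples if abs(x) >= limit) / len(samples)


def tone(amplitude, freq=200, rate=8000, n=800):
    """A stand-in for a voice: a plain sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]


if __name__ == "__main__":
    voice = rms(tone(0.3))
    # Same voice, two different noise floors measured at the start of the call.
    print("quiet room SNR (dB): ", round(snr_db(voice, 0.003), 1))
    print("crowded bar SNR (dB):", round(snr_db(voice, 0.3), 1))
    # Normal speech fits within full scale; shouting does not.
    print("clipped when speaking:", clipped_fraction(clip(tone(0.5))))
    print("clipped when shouting:", round(clipped_fraction(clip(tone(4.0))), 2))
```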