Jay's blog on text-to-speech (Now Defunct)

What's up with text-to-speech (TTS) and Microsoft? Heck, what's up with TTS in general these days? Speech, language, and technology. Cool stuff, indeed.

Text to Speech in Mission Impossible 3: A Dissection

Besides being the best of the three MI movies, Mission: Impossible III contained two instances of TTS that deserve some discussion (and clarification). One scene was simple and plausible, while the second was a definite stretch (i.e., not doable with today's technology).

In the first scene with TTS, one of the "good" guys was automating the destruction of a warehouse full of "bad" guys, using vehicles equipped with large guns. When the automation started, the computer began speaking some information aloud using TTS. I'm pretty sure it was Mac OS X TTS. Definitely low on the naturalness scale, but intelligible nonetheless. (Can anyone confirm which TTS voice this was?)

In the second scene(s), THE "good" guy (i.e., Tom Cruise's character) forces THE "bad" guy, at gunpoint, to read several sentences off a business-card-sized card; the sentences were syntactically well-formed but semantically nonsensical. Within seconds of the reading, another "good" guy, positioned beneath the complex, has intercepted the recorded waveform and used it to generate a highly natural and intelligible TTS voice. That voice is sent back to our protagonist in a bathroom, who can then talk with the "bad" guy's voice. OK, so I'm actually quite forgiving with movies and give the technology the benefit of the doubt (i.e., I pretend that I'm watching sci-fi and not a modern-day action movie). So if we assume that this was some technology beyond TTS, great. No worries. However, if you insist that the movie follow currently plausible technology, then here's what's wrong with the TTS in this second scene:

1) The TTS engine was generated from just a few sentences. Today, it takes many, many hours of recordings to build a natural-sounding engine.

2) The recording was done in a bathroom next to a loud party and then streamed to a nearby underground location. Not likely to result in the high quality recordings that one would need for TTS.

3) The recording was streamed through rock. I imagine that some signal loss would occur in real life.

4) The resulting TTS sounded almost EXACTLY like the "bad" guy (egads, as if it truly was the other actor speaking with Tom lip-synching)! Even the BEST concatenative engines (i.e., those built from 40+ hours of recordings of a person's voice) won't sound just like the real person.

Comments? Alternative takes?

Published Friday, May 12, 2006 9:47 PM by jaywaltm

Comments

 

Rosyna said:

Uhm, the latter part wasn't a TTS engine at all. It was a frequency/pitch-matching engine dealie. He had him read a card full of consonant, vowel, and combination sounds. This audio was sent (I imagine digitally) to Luther; Luther's many computers compiled a modulation algorithm based on Tom Cruise's voice and sent the finished algorithm back to the chip inside his neck. This program just adjusted frequency and whatnot. It did not actually have to do any text-to-speech. It turned the sound of one voice into the sound of another. No biggie. It's like how you can make male speech sound like female speech in an audio program.

However, the timing of each syllable for Cruise and what's-his-face would have to be identical for this to be convincing.
May 12, 2006 5:37 PM
 

Literal-Minded » Blog Archive » A Panphonic Poem for Mission: Impossible 3 said:

May 28, 2006 4:06 PM
 

jaywaltm said:

Yes, you are right. I tend to mix my worlds of speech synthesis and text-to-speech. Technically, it was speech synthesis and not text-to-speech.
June 9, 2006 3:24 PM
 

The Howdy Kid » Is my thesis Mission Impossible? said:

October 9, 2006 10:00 AM
 


About jaywaltm

Jay is a Program Manager in the Speech and Natural Language Group at Microsoft. He has a Ph.D. in Linguistics from the University of Washington.

© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement