The voices behind the machine

Here's an inside look at the 'voice talent' industry - the people who provide prompts (and thereby branding) for companies with automated systems - with the following data points:

  • Perhaps half of all Fortune 1,000 companies still rely on their own staffs to provide voices for their phone systems (Spotter: yep, that explains a lot)
  • ...a big trucking company went with a sultry, female voice for a system that truckers used to get their next dispatch assignments.
  • Voice provider: we have this window of 10 to 15 years before you don't need me any more because of the technology...

On that last point, low-cost deployments will undoubtedly benefit from advances in TTS technology. But it'll probably be a lot longer before a TTS system can match the sparkle from the human touch of a good voice talent.

(from the Atlanta Journal-Constitution)

Published Friday, October 27, 2006 6:51 PM by Stephen Potter

Comments

Saturday, October 28, 2006 10:49 AM by Florian Laws

# re: The voices behind the machine

"it'll probably be a lot longer before a TTS system can match the sparkle from the human touch of a good voice talent."

Current state-of-the-art TTS systems synthesize their output from snippets of recordings of real human voices. (The snippets, which can be as small as a single phoneme but can also span whole words, are more or less concatenated to form the desired utterance and then digitally smoothed and altered to get the right prosody.) This approach is known as unit selection synthesis.

Because of that, developers of TTS systems still need voice talents to record the building blocks of their systems, and I think you can still identify the original speaker's voice in a TTS system's output. Thus, if you wanted to deploy a TTS system whose voice has certain qualities (such as female and sultry in your trucking example), you'd still need a voice talent with those qualities.

So I think even with growing adoption of TTS systems, voice talents will still be needed.

Fully synthetic speech sounds totally unacceptable, and since speech synthesis research seems to have pretty much abandoned full synthesis in favor of concatenative synthesis, this is unlikely to change soon.

Saturday, October 28, 2006 1:08 PM by flawed concepts

# Sprachsynthese vs. Professionelle Sprecher

Stephen Potter reports on an article about a company that places professional voice talent for automated voice systems, so that the voice of such a system also conveys the desired image. These are "classic" Sprachco

Monday, October 30, 2006 2:51 PM by Stephen Potter

# re: The voices behind the machine

Hi Florian, yes I agree. And I would add that a sentence recorded by a good voice talent for a particular dialog state in a particular application is likely to be more contextually appropriate in terms of prosody, etc. - and hence more acceptable to callers - than the same sentence generated by a TTS system.

Wednesday, February 07, 2007 9:58 PM by Working the Spoken Word

# Prompt 1: Take n

"He never works when he's in a bad mood..." says the Boston Globe in a profile of successful voice
