By Dieudonné Mayi

A decade ago, who would have imagined Interactive Voice Response (IVR) systems speaking almost as clearly as a human operator, let alone walking the user through phone menus or intelligently routing calls without expensive telephony hardware?  Speech technology has come a long way and is becoming pervasive in our daily lives. Call any major company, from Amtrak and Qwest to Microsoft and T-Mobile, and you get a non-human voice that seamlessly guides you through phone menus, asks questions, gives suggestions, and even cracks a joke here and there.

The experience is better than it was a few years ago. Remember the rush to press 0 to bypass the speech agent? No more.  Nonetheless, it is still frustrating when the speech system does not understand what you just said and keeps prompting you to repeat your selection. This typically happens when the caller speaks with a non-native English accent or is in a noisy environment, such as driving on a freeway.  Despite these limitations of voice recognition, the technology has matured considerably.  In this blog entry, I give an overview of the disciplines emerging in the speech technology field.


Why speech automation?

The core value proposition of speech technology for call centers is cost reduction: it cuts the expense of hiring, training, and retaining thousands of call support workers to handle routine calls.  Revenue generation is another opportunity. Most call centers are embracing speech technology not just as an alternative to a human voice but as a cost-saving measure.  In addition, call centers are becoming a strategic asset for companies, thanks to the ability to analyze customer calls and extract valuable data (speech analytics).

Speech technology is not used exclusively in call centers.  A number of desktop applications have been developed: tools that help people with speech impairments, voice command tools for hands-free operation (for physically challenged users and others alike), dictation software (e.g., Dragon NaturallySpeaking, Microsoft Office 2007), and electronic book readers.


The Speech Industry

The speech industry is growing at a fast pace. With the adoption of markup standards such as VoiceXML and Speech Application Language Tags (SALT), more IVRs are shipping, and multimodal communication using voice is gaining a foothold in wireless mobile devices (cell phones, navigation systems, etc.).  The following areas have emerged or are emerging:

Speech engine

In 2007 and 2008, the speech engine market saw significant growth in the size of recognition engines as well as in the number of supported languages. The number of embedded devices with speech recognition support has also grown steadily. The strongest growth is in the Asian and Eastern European markets.  This growth is due to advances in natural language processing, larger vocabularies, and faster processors. According to Speech Technology Magazine (Sept. 2008), the leaders in this segment are LumenVox, Nuance Communications, and Loquendo, with Nuance maintaining a lead in accuracy. IBM and Microsoft have also been called out as contenders in this space, with Microsoft building a speech engine into Windows Vista.


Speech self-service suite

As the IVR systems market matures, more vendors are adding capabilities to their speech suites.  The general trend is a move away from legacy, proprietary systems toward open platforms running VoiceXML-based applications.  According to Speech Technology Magazine, the leaders in this segment are Avaya, SpeechCycle, and Voxeo. Intervoice is a contender in this segment.


Speech security

Speech security is concerned with adding voice biometrics to speech applications in consumer-facing deployments. How does it work? The provider records a set of phrases spoken by the user and stores them in a database. When the user calls, the IVR asks the user to speak the phrases and compares the result with the stored voice. If there is a match, the user is routed to a live operator, who then verifies other credentials before letting the user access resources via voice.  The hardest problem is the absence of standard compliance tests.  In this segment, Speech Technology Magazine calls out the leaders as Agnitio, VoiceVerified, and VoiceVault.
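
To make the enroll, compare, and route steps above more concrete, here is a minimal sketch in Python. It is only an illustration under some loud assumptions: the class and function names (VoiceprintStore, route_call), the 0.85 threshold, and the use of cosine similarity are all mine and do not come from any vendor's product. It also assumes the audio has already been turned into a numeric feature vector by some front end, which is the hard part and is not shown.

```python
import math

# Hypothetical match threshold; a real deployment would tune it on recorded data.
MATCH_THRESHOLD = 0.85


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VoiceprintStore:
    """In-memory stand-in for the provider's voiceprint database."""

    def __init__(self):
        self._prints = {}  # user_id -> {phrase: feature vector}

    def enroll(self, user_id, phrase, features):
        """Save the features extracted from the user's recorded phrase."""
        self._prints.setdefault(user_id, {})[phrase] = list(features)

    def verify(self, user_id, phrase, features):
        """Compare a freshly spoken phrase with the stored one."""
        stored = self._prints.get(user_id, {}).get(phrase)
        if stored is None:
            return False  # unknown user or phrase: never a match
        return cosine_similarity(stored, features) >= MATCH_THRESHOLD


def route_call(store, user_id, phrase, features):
    """On a match, hand off to a live operator for further credential checks."""
    if store.verify(user_id, phrase, features):
        return "live_operator"
    return "reprompt"
```

A caller would be enrolled once with store.enroll(...), and each later call would go through route_call(...). Note that a voice match alone does not grant access here; as described above, it only decides whether the caller reaches the operator who checks the remaining credentials.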


Speech analytics

This segment of the speech industry grew by 106 percent in 2007. Speech analytics is mainly used by call centers to analyze customer calls and extract data that companies can turn into business opportunities. Companies can apply the same kind of analytics they already use for chat, e-mail, and web data to understand customer behavior and customer problems, and feed that data back into strategic business decisions.  Speech Technology Magazine identifies the leaders in this segment as NICE Systems, Verint, and Nexidia, with Nuance Communications joining this landscape.


Mobile speech applications

Voice-powered applications are gaining ground in the mobile phone and navigation arena. Here are a few systems:

· Nuance Communications has a vSearch engine on the iPhone.

· Microsoft subsidiary Tellme Networks has speech-enabled applications and voice control running on a large number of devices, including those from Samsung, Motorola, and Sony Ericsson.

· Vlingo introduced Yahoo! oneSearch with Voice, which layers speech capabilities on top of Yahoo's oneSearch application.


Future directions

I believe that speech technology will continue to grow, both in profitability and in the range of its applications.  One question is whether the computer speech agent will replace the human agent completely. Maybe, in some cases, but is it desirable to eliminate customer service agents entirely in favor of speech robots? I am not sure from a customer satisfaction standpoint, but it may all be a matter of getting used to new ways of doing things. One hurdle is extending speech recognition to languages beyond the major European and Eastern languages.  Another area of ongoing research is adding emotion to speech agents and more intelligence to IVRs.  The day is not far off when people will be able to interact fully with computing devices via speech.


Speech Technology Magazine is a good source of information on the speech technology industry.