One of the coolest tools of the Microsoft Speech Server (MSS) platform (and, in my opinion, the most underappreciated one to date) has got to be the Prompt Engine. I think this tool is a key differentiator for MSS and deserves more attention. Managing and using recorded prompts is one of the most tedious, time-consuming aspects of developing speech applications. As a former VoiceXML developer, I spent countless hours writing hundreds of lines of VoiceXML code like the following:
<prompt>
  <audio src="you_ordered_a.wav">You ordered a</audio>
  <audio src="../numbers/{%1}.wav">{%1}</audio>
  <audio src="pizza.wav">pizza.</audio>
  <audio src="and_your_telephone_number_is.wav">And your telephone number is</audio>
  <audio src="../numbers/{%2}.wav">{%2}</audio>
  <audio src="is_that_correct.wav">Is that correct?</audio>
</prompt>
Some of the difficulties with this approach are:
- You have to verify that each referenced .wav file exists, is named correctly, and is in the right directory. (As your library of prompts grows, the directory structure that houses them gets more elaborate, and the more complicated your directory structure, the harder it becomes to reference the .wav files.)
- You also have to verify that the TTS fallback string is in sync with the recorded prompt. With thousands of prompts to manage, there will invariably be missing prompts, misnamed .wav files, and TTS strings that don't match their referenced .wav files. Debugging these errors can be costly and trying, and often the only way to know what a .wav file contains is to listen to it.
- Deploying thousands of recorded prompts is painful as well: one corrupted .wav file could stymie the deployment of all the rest.
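To make the first two checks concrete, here is a minimal Python sketch of the kind of audit script the per-file approach tends to require. It assumes the prompt markup is well-formed XML, that `src` paths resolve against a hypothetical `base_dir`, and that a hypothetical `transcripts` dictionary records what each .wav file actually says. It is an illustration only, not part of any VoiceXML or MSS tooling.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def audit_prompt(markup, base_dir, transcripts):
    """Return a list of problems found in one VoiceXML-style <prompt>.

    base_dir    -- hypothetical directory that relative src paths resolve against
    transcripts -- hypothetical map from src path to the words actually recorded
    """
    problems = []
    root = ET.fromstring(markup)
    for audio in root.iter("audio"):
        src = audio.get("src")
        # Collapse whitespace so " pizza. " compares equal to "pizza."
        fallback = " ".join((audio.text or "").split())
        if not os.path.isfile(os.path.join(base_dir, src)):
            problems.append(f"missing file: {src}")
            continue
        recorded = transcripts.get(src)
        if recorded is None:
            problems.append(f"no transcript on file: {src}")
        elif recorded != fallback:
            problems.append(f"TTS text out of sync: {src}")
    return problems


if __name__ == "__main__":
    markup = (
        '<prompt>'
        '<audio src="pizza.wav"> pizza. </audio>'
        '<audio src="is_that_correct.wav"> Is that correct? </audio>'
        '</prompt>'
    )
    with tempfile.TemporaryDirectory() as d:
        # Only pizza.wav exists on disk, and its transcript lacks the period.
        open(os.path.join(d, "pizza.wav"), "wb").close()
        for problem in audit_prompt(markup, d, {"pizza.wav": "pizza"}):
            print(problem)
```

Even this toy version has to be rerun every time a file is renamed or re-recorded, and it can only compare against transcripts someone typed in by hand.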
And here are some of the reasons why I think the Microsoft Prompt Engine is such a godsend to speech application developers:
- With the Microsoft Prompt Engine, you build a database of all your prompts and deploy a single binary file. No risk of inadvertently losing one or two prompts during deployment.
- You don't have to reference each .wav file individually: the Prompt Engine automatically finds the extractions it needs in the right .wav files. No more relative file paths, complicated directory structures, or filename mismatches! In fact, you no longer need a separate .wav file for each concatenated piece of your prompt. You can extract as many words or phrases from a single .wav file as you like.
- There's no sync problem with the TTS fallback string, because the same string is used both for TTS fallback and for looking up the recorded prompt.
The sample prompt above is much simpler on MSS with the Prompt Engine:
sPrompt = "You ordered a <div/> " + {%1} + " <div/> pizza <div/> and your telephone number is <div/> " + {%2} + ". <div/> Is that correct?"
The <div/> tags mark the boundaries of the phrases (extractions) that the Prompt Engine looks up in the prompt database.
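As a rough mental model of what those boundaries buy you, the Python sketch below splits a prompt string at `<div/>`, looks each segment up in a table of recorded extractions, and falls back to TTS for anything not found. This is a toy model under stated assumptions, not Microsoft's implementation or API: the `extractions` table, its `(wav_file, start_ms, end_ms)` entries, the case- and punctuation-insensitive matching, and the runtime values `size` and `phone` are all hypothetical.

```python
import re

# Matches <div/> or <div />, the boundary markers used in the prompt string.
DIV = re.compile(r"<div\s*/>")


def normalize(text):
    """Lowercase and drop punctuation so "Is that correct?" matches "is that correct"."""
    return " ".join(re.sub(r"[^\w\s']", "", text).lower().split())


def plan_prompt(prompt_text, extractions):
    """Split prompt text at <div/> and choose recorded audio or TTS per segment.

    extractions -- hypothetical map from normalized phrase to the
                   (wav_file, start_ms, end_ms) span it was cut from
    """
    plan = []
    for segment in DIV.split(prompt_text):
        text = " ".join(segment.split())
        if not text:
            continue
        span = extractions.get(normalize(text))
        if span is not None:
            plan.append(("recorded", text, span))
        else:
            plan.append(("tts", text, None))
    return plan


if __name__ == "__main__":
    # Several extractions can come from the same recording (order.wav).
    extractions = {
        "you ordered a": ("order.wav", 0, 850),
        "pizza": ("order.wav", 850, 1300),
        "and your telephone number is": ("order.wav", 1300, 2900),
        "is that correct": ("confirm.wav", 0, 900),
    }
    size, phone = "large", "555 0100"  # hypothetical runtime values
    prompt = ("You ordered a <div/> " + size + " <div/> pizza <div/> "
              "and your telephone number is <div/> " + phone +
              ". <div/> Is that correct?")
    for kind, text, span in plan_prompt(prompt, extractions):
        print(f"{kind:8} {text!r} {span or ''}")
```

Because the same string drives both the lookup and the TTS fallback, there is no second copy of the text to drift out of sync.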
Of course, keeping track of which extractions you have in the database can be just as complicated as keeping track of which .wav files you've recorded. No problem! With the Prompt Validator, you simply enter the text you want your application to speak, and you can find out instantly which words or phrases you're missing. In the sample screen below, you'll see that "large" has no recording and is spoken with TTS.
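The kind of check the Prompt Validator reports can be approximated in a few lines. The Python sketch below covers the input text with recorded phrases using a greedy longest-match scan and returns the words left uncovered, which would fall back to TTS. The greedy strategy, the `max_len` cap, and the phrase list are assumptions for illustration; the actual validator may search quite differently.

```python
import re


def words_of(text):
    """Lowercased words with punctuation stripped (apostrophes kept)."""
    return re.findall(r"[\w']+", text.lower())


def find_missing(text, recorded_phrases, max_len=8):
    """Return the words in `text` that no recorded phrase covers.

    Greedy longest match: at each position, take the longest recorded
    phrase (up to max_len words) that matches; otherwise report the word
    as missing, i.e. it would be spoken with TTS.
    """
    phrases = {tuple(words_of(p)) for p in recorded_phrases}
    words = words_of(text)
    missing = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            if tuple(words[i:i + n]) in phrases:
                i += n
                break
        else:  # no recorded phrase starts at position i
            missing.append(words[i])
            i += 1
    return missing


if __name__ == "__main__":
    recorded = ["you ordered a", "small", "medium", "pizza", "is that correct"]
    print(find_missing("You ordered a large pizza. Is that correct?", recorded))
    # prints ['large']
```

Greedy matching is the simplest choice; it can miss a cover that a full search (for example, dynamic programming over word positions) would find.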

A bit long-winded, I know, but I think the MSS Prompt Engine is really cool. If you're a speech app developer and you haven't tried it out yet, check it out!