I'm pleased to announce that our new managed Speech API is included in the Avalon & Indigo Beta 1 RC!

 

With the System.Speech namespace you can incorporate both speech recognition and speech synthesis in your applications.

 

Recognition:

 

The main classes for speech recognition are:

  • DesktopRecognizer: abstracts the recognizer shared by apps on the desktop.
  • SpeechRecognizer: abstracts a recognition engine for exclusive use by your app.
  • RecognitionResult: exposes the text and semantics returned by a recognizer.
  • SrgsDocument: builds recognition grammars (the rules for which phrases a recognizer should listen for in your app).

 

For example, to load a grammar containing your app’s commands into the shared desktop recognizer:

 

DesktopRecognizer desktopRecognizer = new DesktopRecognizer();
desktopRecognizer.LoadGrammar(new Grammar(new Uri(grammarPath)));
desktopRecognizer.SpeechRecognized += delegate(object sender, RecognitionEventArgs e)
    {
        // Handle the recognition here, for example:
        // Console.WriteLine("User said {0}", e.Result.Text);
    };
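
For reference, the file at grammarPath could be a W3C SRGS XML grammar. The following is only a minimal sketch: the rule name "command" and the three phrases are made up for illustration and do not come from the API.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical command grammar; the rule id and phrases are illustrative only. -->
<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- The root rule matches exactly one of the listed phrases. -->
  <rule id="command" scope="public">
    <one-of>
      <item>open file</item>
      <item>save file</item>
      <item>close window</item>
    </one-of>
  </rule>
</grammar>
```

With a grammar like this loaded, the recognizer would listen only for those three phrases.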

 

You’ll also need a speech recognition (SR) engine installed.  There are several ways to get one: tablets already have an engine, recent versions of Office include one, and you can also download one from the SAPI web site: http://www.microsoft.com/speech/download/sdk51/.

 

Synthesis:

 

The main classes for speech synthesis are:

  • SpeechSynthesizer: abstracts a synthesis engine.
  • PromptBuilder: builds a prompt containing text with emphasis, loudness, pre-recorded sounds, and other characteristics.

 

For example, if you want your app to say “hello world”, just write:

 

SpeechSynthesizer synth = new SpeechSynthesizer();
synth.Speak("Hello world!");

 

You can easily splice this with a “ding” wave file by using the PromptBuilder:

 

PromptBuilder builder = new PromptBuilder();
builder.AddAudio(new Uri(@"file://\windows\media\ding.wav"));  // play the ding first
builder.AddText("Hello world!");                                // then speak the text

SpeechSynthesizer synth = new SpeechSynthesizer();
synth.Speak(builder);

 

Windows comes with a synthesis engine, so you don't need to install anything extra for synthesis.

 

The API uses the W3C standard formats for recognition grammars (SRGS) and synthesis (SSML).
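
To give a feel for the SSML side, a prompt that plays a sound and then says “Hello world!” might look roughly like the markup below. This is a hand-written sketch of standard SSML 1.0, not output captured from the API, and the audio file name is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">
  <!-- Placeholder path: point src at a real wave file. -->
  <audio src="ding.wav"/>
  Hello world!
</speak>
```

The audio element plays the recording first, and the text that follows it is spoken by the synthesis engine.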