I'm delighted to announce that our new managed Speech API is in the Avalon & Indigo Beta 1 RC!


With the System.Speech namespace you can incorporate both speech recognition and speech synthesis in your applications.




The main classes for speech recognition are:

  • DesktopRecognizer: abstracts the recognizer shared by apps on the desktop.
  • SpeechRecognizer: abstracts a recognition engine for exclusive use by your app.
  • RecognitionResult: examine text and semantics returned by a recognizer.
  • SrgsDocument: used to build recognition grammars (the rules for what phrases a recognizer should listen for in your app).


For example, to load a grammar containing your app’s commands into the shared desktop recognizer:


DesktopRecognizer desktopRecognizer = new DesktopRecognizer();

desktopRecognizer.LoadGrammar(new Grammar(new Uri(grammarPath)));

desktopRecognizer.SpeechRecognized += delegate(object sender, RecognitionEventArgs e)

{

        // Do appropriate handling when we get a recognition

        // Console.WriteLine("User said {0}", e.Result.Text);

};

You’ll also need a speech recognition (SR) engine installed, and there are several ways to get one. Tablets already have an engine. If you have a recent version of Office, you’ll have an engine. You can also download an engine from the SAPI web site: http://www.microsoft.com/speech/download/sdk51/.
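
As for the grammar file that grammarPath points at in the example above, here is a minimal sketch of what one could look like in the W3C SRGS XML format. The rule name and the command phrases are made up for illustration, and I'm assuming the file is loaded as plain SRGS XML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xml:lang="en-US" root="command"
         xmlns="http://www.w3.org/2001/06/grammar">

  <!-- Hypothetical root rule: matches a verb followed by an object,
       e.g. "open file", "close window" -->
  <rule id="command" scope="public">
    <one-of>
      <item>open</item>
      <item>close</item>
    </one-of>
    <one-of>
      <item>file</item>
      <item>window</item>
    </one-of>
  </rule>

</grammar>
```

With a grammar like this loaded, the recognizer would listen only for those four two-word commands.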




The main classes for speech synthesis are:

  • SpeechSynthesizer: abstracts a synthesis engine.
  • PromptBuilder: builds a prompt containing text, emphasis, loudness, pre-recorded sounds, and other characteristics.


For example, if you want your app to say “hello world”, just write:


SpeechSynthesizer synth = new SpeechSynthesizer();

synth.Speak("Hello world!");


You can easily splice this with a “ding” wave file by using the PromptBuilder:


PromptBuilder builder = new PromptBuilder();

builder.AddAudio(new Uri(@"file://\windows\media\ding.wav"));

builder.AddText("Hello world!");


SpeechSynthesizer synth = new SpeechSynthesizer();

// Speak the prompt assembled above (the ding, then the text)
synth.Speak(builder);

Windows comes with a synthesis engine.


The API uses the W3C standard formats for recognition grammars (SRGS) and synthesis (SSML).
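
To give a feel for the synthesis side of those standards, here is a minimal SSML sketch that roughly corresponds to the ding-plus-hello prompt above. The wave file path is a placeholder, the emphasis is added purely for illustration, and whether PromptBuilder produces exactly this markup is an assumption on my part.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis">

  <!-- Placeholder path: point src at a real wave file -->
  <audio src="ding.wav"/>

  <!-- Stress the first word, then speak the rest normally -->
  <emphasis level="strong">Hello</emphasis> world!

</speak>
```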