WinFX contains the System.Speech.Synthesis and System.Speech.Recognition namespaces, which allow developers to add speech synthesis or recognition easily to a Windows application. The following diagram illustrates how both managed and unmanaged Windows applications interact with the underlying speech synthesis and speech recognition engines:
Application interaction with speech engines
Managed applications access the speech synthesis engine directly through the classes in System.Speech.Synthesis, while unmanaged applications access it through the Speech API (SAPI). For more info on the breadth of features of the WinFX speech technology, see the MSDN article Exploring New Speech Recognition and Synthesis APIs in Windows Vista. The rest of this blog posting looks at how to enable speech synthesis in a WPF application.
Enabling speech synthesis
The Windows SDK Beta 2 contains a sample WPF application, Speech Sample, which provides a simple example of how to enable speech synthesis. BTW, speech synthesis is often referred to as TTS, or text-to-speech. When you build and run the sample application, the following window appears:
Click the Echo Textbox button to hear the displayed sample string, “Hello, world”. Try entering other text and hearing what it sounds like.
You can change the rate of speech by moving the slider control labeled Speaking rate, which slows down or speeds up playback. You can also change the volume by moving the slider control labeled Volume.
Clicking the Say Date and Say Time buttons causes the current date and time to be spoken. Clicking the Say Name button causes the current speech persona to be identified. For Windows XP, the default speech persona is Microsoft Sam. For Vista, you can choose the default speech persona to be Sam, Lili, or Anna.
A Slider control is used to control the volume of the spoken text, and a second Slider control is used to control the rate of speaking.
A set of four Button controls is used to control playback: one speaks the text entered in the TextBox, and the other three speak specific strings of text (date, time, and speech persona):
<!-- Buttons -->
<Button Grid.Column="1" Grid.Row="3" Click="ButtonEchoOnClick">Echo Textbox</Button>
<Button Grid.Column="1" Grid.Row="4" Click="ButtonDateOnClick">Say Date</Button>
<Button Grid.Column="1" Grid.Row="5" Click="ButtonTimeOnClick">Say Time</Button>
<Button Grid.Column="1" Grid.Row="6" Click="ButtonNameOnClick">Say Name</Button>
To convert the text in the TextBox object to speech, call the SpeakAsync method of the SpeechSynthesizer object. Really...that's all there is!
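Here is a minimal sketch of that call wired to a button click. It is not the SDK sample's code: the EchoWindow class, the textToSpeak field, and the choice to build the UI in code rather than XAML are my own assumptions, made so that everything fits in one file. Only the ButtonEchoOnClick handler name is taken from the button markup in the sample.

```csharp
using System;
using System.Speech.Synthesis;   // needs a reference to System.Speech.dll
using System.Windows;
using System.Windows.Controls;

// Hypothetical stand-in for the sample's window. The UI is built in code
// here (instead of XAML) only to keep the sketch in a single file.
public class EchoWindow : Window
{
    private readonly SpeechSynthesizer synthesizer = new SpeechSynthesizer();
    private readonly TextBox textToSpeak = new TextBox { Text = "Hello, world" };

    public EchoWindow()
    {
        Title = "Echo sketch";
        Width = 300;
        Height = 120;

        var echoButton = new Button { Content = "Echo Textbox" };
        echoButton.Click += ButtonEchoOnClick;

        var panel = new StackPanel();
        panel.Children.Add(textToSpeak);
        panel.Children.Add(echoButton);
        Content = panel;

        // Release the synthesizer when the window goes away.
        Closed += (sender, e) => synthesizer.Dispose();
    }

    // SpeakAsync queues the text and returns right away,
    // so the click handler does not block the UI thread.
    private void ButtonEchoOnClick(object sender, RoutedEventArgs e)
    {
        synthesizer.SpeakAsync(textToSpeak.Text);
    }

    [STAThread]
    public static void Main()
    {
        new Application().Run(new EchoWindow());
    }
}
```

To build this you would reference System.Speech along with the usual WPF assemblies (PresentationFramework, PresentationCore, WindowsBase).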
You can also identify the currently selected speech persona by retrieving the Voice object and referencing the Name property:
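A small console sketch of that lookup follows. The SayNameSketch class and the spoken phrase are assumptions of mine, not the sample's code. Voice, Name, GetInstalledVoices, and Speak are the System.Speech.Synthesis members being illustrated.

```csharp
using System;
using System.Speech.Synthesis;   // needs a reference to System.Speech.dll

// Hypothetical console program; the sample does this in a button handler.
public static class SayNameSketch
{
    public static void Main()
    {
        using (var synthesizer = new SpeechSynthesizer())
        {
            // Voice is a VoiceInfo object describing the current persona.
            string personaName = synthesizer.Voice.Name;
            Console.WriteLine("Current voice: " + personaName);

            // List every voice installed on this machine.
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                Console.WriteLine("Installed voice: " + voice.VoiceInfo.Name);
            }

            // Speak (unlike SpeakAsync) blocks until the phrase has been spoken,
            // which keeps a console program alive long enough to be heard.
            synthesizer.Speak("My name is " + personaName);
        }
    }
}
```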
Recall that two Slider controls were defined in XAML to control the volume and playback rate. The two event handlers below set the Volume and Rate properties for the SpeechSynthesizer object:
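The handlers might look roughly like the sketch below. The window class, the handler names, and the slider ranges are assumptions of mine rather than the sample's code. The ranges follow the documented limits of the two properties: Volume takes an integer from 0 to 100 and Rate an integer from -10 to 10, so the slider's double value is rounded before it is assigned.

```csharp
using System;
using System.Speech.Synthesis;   // needs a reference to System.Speech.dll
using System.Windows;
using System.Windows.Controls;

// Hypothetical window; class and handler names are illustrative only.
public class RateVolumeWindow : Window
{
    private readonly SpeechSynthesizer synthesizer = new SpeechSynthesizer();

    public RateVolumeWindow()
    {
        Title = "Rate and volume sketch";
        Width = 300;
        Height = 160;

        // Start each slider at the synthesizer's current setting.
        var volumeSlider = new Slider { Minimum = 0, Maximum = 100, Value = synthesizer.Volume };
        volumeSlider.ValueChanged += VolumeSliderOnValueChanged;

        var rateSlider = new Slider { Minimum = -10, Maximum = 10, Value = synthesizer.Rate };
        rateSlider.ValueChanged += RateSliderOnValueChanged;

        // A button so the effect of the sliders can be heard.
        var speakButton = new Button { Content = "Speak" };
        speakButton.Click += (sender, e) => synthesizer.SpeakAsync("Hello, world");

        var panel = new StackPanel();
        panel.Children.Add(volumeSlider);
        panel.Children.Add(rateSlider);
        panel.Children.Add(speakButton);
        Content = panel;

        Closed += (sender, e) => synthesizer.Dispose();
    }

    // Slider values are doubles; Volume is an int in the range 0..100.
    private void VolumeSliderOnValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
    {
        synthesizer.Volume = (int)Math.Round(e.NewValue);
    }

    // Rate is an int in the range -10..10.
    private void RateSliderOnValueChanged(object sender, RoutedPropertyChangedEventArgs<double> e)
    {
        synthesizer.Rate = (int)Math.Round(e.NewValue);
    }

    [STAThread]
    public static void Main()
    {
        new Application().Run(new RateVolumeWindow());
    }
}
```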
Creating a text-to-speech reading application
With a little more effort you could create a text-to-speech reading application that could read in ASCII text files. You would probably want to add functionality that allowed you to pause, play, and go back a certain interval.
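As a starting point, here is a rough console sketch of the pause and play part, assuming the file path is passed as the first command-line argument. The class name and key bindings are made up for illustration. It uses the Pause, Resume, and SpeakAsyncCancelAll methods of SpeechSynthesizer and does not attempt the "go back a certain interval" feature, which would need extra bookkeeping.

```csharp
using System;
using System.IO;
using System.Speech.Synthesis;   // needs a reference to System.Speech.dll

// Hypothetical console reader; a real application would likely use a WPF UI.
public static class TextFileReaderSketch
{
    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: TextFileReaderSketch <path-to-text-file>");
            return;
        }

        string text = File.ReadAllText(args[0]);

        using (var synthesizer = new SpeechSynthesizer())
        {
            // Queue the whole file; SpeakAsync returns immediately.
            synthesizer.SpeakAsync(text);
            Console.WriteLine("P = pause, R = resume, Q = quit");

            while (true)
            {
                ConsoleKey key = Console.ReadKey(true).Key;

                if (key == ConsoleKey.P && synthesizer.State == SynthesizerState.Speaking)
                {
                    synthesizer.Pause();
                }
                else if (key == ConsoleKey.R && synthesizer.State == SynthesizerState.Paused)
                {
                    synthesizer.Resume();
                }
                else if (key == ConsoleKey.Q)
                {
                    // Resume first so the synthesizer is not left paused,
                    // then drop anything still queued.
                    if (synthesizer.State == SynthesizerState.Paused)
                    {
                        synthesizer.Resume();
                    }
                    synthesizer.SpeakAsyncCancelAll();
                    break;
                }
            }
        }
    }
}
```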
Enjoy,
Lorin