In my two previous posts, I covered how to create speech recognition engines and use them to parse through WAV files containing a sample “Hello World” recording.
This post focuses on two things: first, simple real-time recognition with a hardcoded grammar, and second, a way to dynamically improve the system by updating that grammar at run time.
The core Windows operating system has had built-in speech recognition since Vista. You must enable Windows Speech Recognition for this tutorial to work.
On Windows 7:
This is a simple modification of the great MSDN tutorial for the System.Speech namespace. It is an application that listens for colors predefined in a Choices object and writes each color to a text box when it is recognized. Inside of this app I will show how to:
And as always, the Windows SDK is a requirement. Please install it to follow along with this tutorial.
First, launch Visual Studio and start with a blank WPF app. In the designer, add two text boxes, three labels and one button like so:
Set the larger text box to read-only so that it only ever shows recognition output. Once the UI objects have been created and laid out, open the window’s code-behind file.
Next, add System.Speech to the project references:
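Now, add the following fields and constructor code to the window class. The file also needs using directives for System.Collections.Generic and System.Speech.Recognition: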
SpeechRecognizer sr;
List<string> colorList;
Choices colors;

public MainWindow()
{
    InitializeComponent();

    sr = new SpeechRecognizer();
    colors = new Choices();
    colorList = new List<string>() { "red", "yellow", "green", "blue" };

    InitializeSpeechRecogonition();
}
Once that’s done, it is time to create the InitializeSpeechRecogonition and LoadGrammar helper methods:
private void InitializeSpeechRecogonition()
{
    // First load the grammar, then wire up the speech recognition event
    LoadGrammar();
    sr.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sr_SpeechRecognized);
}
private void LoadGrammar()
{
    // Rebuild the Choices object from the current color list so that
    // repeated calls do not accumulate duplicate entries
    colors = new Choices(colorList.ToArray());

    // Populate the GrammarBuilder and create a Grammar from it
    GrammarBuilder grammarBuilder = new GrammarBuilder(colors);
    Grammar testGrammar = new Grammar(grammarBuilder);

    // Unload any previously loaded grammar before loading the new one
    sr.UnloadAllGrammars();
    sr.LoadGrammar(testGrammar);
}
There are three objects of interest: the SpeechRecognizer, the Choices, and the Grammar. SpeechRecognizer hooks into the operating system’s shared recognizer but initializes its own engine to handle recognition events. The benefit of using SpeechRecognizer is that the developer doesn’t have to worry about the audio input; the trade-off is that it doesn’t allow for more advanced recognition scenarios.
Choices is a basic way to create simple grammars without having to define a detailed external grammar file. For this tutorial, and for basic scenarios that only need to recognize a short list of words, it is a great way to get started.
The Grammar object is instantiated from the GrammarBuilder, which creates a root rule for the list of strings contained in the Choices object. This is *essential* for the speech recognition engine. Without a grammar, the engine has nothing to match the incoming audio against, so it cannot determine what was said.
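GrammarBuilder can do a little more than wrap a flat word list. As a rough sketch (not part of the original app), the helper below builds a grammar that expects the fixed word "select" followed by one of the colors. The class name ColorGrammarFactory, the method name, the "select" prefix, and the grammar name "SelectColor" are all made up for illustration.

```csharp
using System.Speech.Recognition;

public static class ColorGrammarFactory
{
    // Builds a grammar that matches the phrase "select <color>"
    // for every color name supplied. All names here are hypothetical.
    public static Grammar CreateSelectColorGrammar(string[] colorNames)
    {
        Choices colors = new Choices(colorNames);

        // Start with a fixed phrase, then append the list of alternatives
        GrammarBuilder builder = new GrammarBuilder("select");
        builder.Append(colors);

        Grammar grammar = new Grammar(builder);
        grammar.Name = "SelectColor"; // a label that helps identify the grammar later
        return grammar;
    }
}
```

A grammar built this way could be passed to sr.LoadGrammar in place of the plain color grammar, in which case the recognizer would expect phrases such as "select red" rather than "red" on its own.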
Now that all the speech recognition objects have been wired up, write the SpeechRecognized handler so that it appends each recognized word to the output text box:
void sr_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Append the recognized text as a new line in the output text box
    textBox1.Text = textBox1.Text + e.Result.Text + "\r\n";
}
Once all of this is completed, the app will listen for whatever colors you defined in the colorList. In this example those colors are red, yellow, green and blue.
Launch the application and enable Windows Speech Recognition. If you have to click outside of your application to enable recognition, make sure you click back into the test app afterwards.
Now try it out! Speak into your microphone and say one of the words that exist in the list. A few things will happen:
This is an example of ‘bad’ recognition happening to me:
In this case of bad recognition, the word I said and the way I said it were apparently close enough to a word in the grammar for the recognizer to match it.
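One possible way to cut down on false matches like this is to look at the confidence score that comes back with each result (a float between 0 and 1 exposed as e.Result.Confidence). The sketch below is only an illustration: the ConfidenceFilter class and its members are hypothetical, and the 0.7 cutoff is an arbitrary starting point rather than a recommended value.

```csharp
using System.Speech.Recognition;

public static class ConfidenceFilter
{
    // Hypothetical cutoff; tune it against your own microphone and voice.
    public const float MinConfidence = 0.7f;

    // Returns true when the recognizer's confidence (0.0 to 1.0) meets the cutoff.
    public static bool IsConfident(float confidence)
    {
        return confidence >= MinConfidence;
    }

    // Returns the recognized text, or null when the result is too uncertain to use.
    public static string AcceptedText(RecognitionResult result)
    {
        return IsConfident(result.Confidence) ? result.Text : null;
    }
}
```

Inside sr_SpeechRecognized, the handler could call ConfidenceFilter.AcceptedText(e.Result) and only write to the text box when the return value is not null.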
Using the text box and button that were also added to the app, I am going to enable the user to improve the recognition experience by adding words to the list and reloading the grammar.
All you need to do is wire up the button’s click event and add the following code:
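private void button1_Click(object sender, RoutedEventArgs e)
{
    // Ignore clicks when the input text box is empty
    if (!String.IsNullOrEmpty(textBox2.Text))
    {
        // Add the new word to the list and rebuild the grammar from it
        colorList.Add(textBox2.Text);
        LoadGrammar();

        // Confirm the addition and clear the input for the next word
        textBlock1.Text = "Added: " + textBox2.Text;
        textBox2.Text = string.Empty;
    }
}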
Now, when users add a word and click the Add button, the Speech Recognizer’s grammar will be reloaded with the new word.
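Because every click feeds raw text straight into the grammar, it may be worth tidying the input before adding it. The helper below is a minimal sketch of that idea and is not part of the original app: WordListHelper and TryAddWord are hypothetical names, and it assumes .NET 4 or later for String.IsNullOrWhiteSpace.

```csharp
using System;
using System.Collections.Generic;

public static class WordListHelper
{
    // Trims the input and adds it to the list unless it is blank or already
    // present (compared case-insensitively). Returns true when the list changed.
    public static bool TryAddWord(List<string> words, string input)
    {
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        string word = input.Trim();
        foreach (string existing in words)
        {
            if (string.Equals(existing, word, StringComparison.OrdinalIgnoreCase))
            {
                return false; // already in the list, nothing to reload
            }
        }

        words.Add(word);
        return true;
    }
}
```

The click handler could then call WordListHelper.TryAddWord(colorList, textBox2.Text) and reload the grammar only when it returns true, which avoids rebuilding the grammar for blank or repeated entries.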
The grammar is updated and new words are recognized based on user input.
Try out other grammar scenarios. My previous two posts use a grammar file that can be modified and loaded up externally.
There are some useful tutorials and code examples over on MSDN. You can check them out here:
Check out my other blog posts on the Speech Recognition Engine:
Special thanks to Steve Meyer for reviewing this post and suggesting the blog title!