
I apologize for not keeping up my blogging as expected. I will continue my follow-up from the PDC demos shortly, but first I will answer some of the questions that were left in recent comments:

   Jay asked: "... has Microsoft developed an application that the common user can use out of the box without much developing experience?"

   Yes! Robert demonstrated the new Speech User Experience in Windows Vista during the first part of our PDC session. Check out the video once it becomes available and you will be pleasantly surprised (we hope)! Look especially for new things like the correction experience, mouse grid and numbering.

   Sahil Malik asked: "Will the WinFX speech API be more than just a wrapper on the SASDK available as COM components already? Or will there be other enhancements - perhaps a more natural sounding Text to speech?"

   First let me point out that the SASDK is for telephony speech, not desktop speech (it is the SDK for the Microsoft Speech Server 2004 product). The System.Speech managed API is a complete redesign of the API that follows the .NET API design guidelines. And yes, the new voices in Windows Vista are a big improvement (IMHO) over our previous offering. The new voice, Anna, is built using a database of pre-recorded samples (rather than DSP algorithms).

   Tommy asked: "Is it possible to implement a text to speech system only run at stand a lone pocket pc."

   There are currently no announced plans from Microsoft to expose speech synthesis (and speech recognition) functionalities to developers on the Pocket PC platform.

- Philipp

[I apologize for the delay in following up my promise to start blogging about my PDC demo.]

At the PDC I demoed a speech-enabled RSS reader, built on top of the WPF SDK Viewer sample application. The first functionality I showed was a button that, when toggled, reads the contents of the blog article using Vista's new built-in voice Anna. Under the covers I used a few regular expressions to extract the content from an HTML blog entry. Once I have the text I can simply do the following 4 steps:

  1. Add a reference to the speech assembly, currently called Speech.dll in the 'Add Reference' dialog (after installing WinFX).
  2. Import the namespace into my project:

        using System.Speech.Synthesis;

  3. Create a static instance of a SpeechSynthesizer (we only need one instance, so for convenience's sake I declare a static instance in the application's main class):

        public static SpeechSynthesizer MySynthesizer = new SpeechSynthesizer();

  4. Speak the article text:

        MySynthesizer.Speak(new FilePrompt(<path to text file containing the article>));

    The FilePrompt class is derived from the Prompt class. It is a convenient helper when the source for the Speak() call is a file on disk.
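
The four steps above can be put together roughly as follows. This is only a minimal sketch, not the demo code: it assumes the released .NET Framework version of the API (where SpeechSynthesizer lives in the System.Speech.Synthesis namespace inside System.Speech.dll, which may differ from the WinFX preview build), and the ExtractText helper and the sample HTML string are hypothetical stand-ins for the regular expressions used in the demo.

```csharp
using System;
using System.Net;
using System.Speech.Synthesis;          // needs a reference to System.Speech.dll (Windows only)
using System.Text.RegularExpressions;

public static class ReadArticleSketch
{
    // One shared synthesizer for the whole application, as in step 3.
    public static readonly SpeechSynthesizer MySynthesizer = new SpeechSynthesizer();

    // Hypothetical helper: crude tag stripping with regular expressions.
    // This is not a real HTML parser and will not handle every page.
    public static string ExtractText(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]+>", " ");
        string decoded = WebUtility.HtmlDecode(withoutTags);
        // Collapse the runs of whitespace left behind by the removed tags.
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        // Hypothetical blog entry; a real reader would take this from the RSS feed.
        string html = "<h1>Speech at the PDC</h1><p>Hello from <b>System.Speech</b>!</p>";
        string text = ExtractText(html);

        // Send the audio to the default sound device and speak synchronously.
        MySynthesizer.SetOutputToDefaultAudioDevice();
        MySynthesizer.Speak(text);

        MySynthesizer.Dispose();
    }
}
```

Speak() blocks until the text has been read, which keeps the sketch simple. An application with a UI would more likely use SpeakAsync() so the window stays responsive while the article is being read.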

In our newsgroups (public.microsoft.speech_tech) we are often asked how to send the output of the speech synthesizer to a WAV file instead of the default audio device.

Here's the code snippet that does just that (taken from my demo):

            MySynthesizer.SetOutputToWaveFile(<path to WAV file>);

            MySynthesizer.Speak("Let's wreck a nice beach.");

            MySynthesizer.Speak("It is now " + DateTime.Now.ToShortTimeString());
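
For anyone who wants to try the snippet outside the demo, here is a self-contained sketch of the same idea. It assumes the released System.Speech.Synthesis namespace, and the output file name (speech-demo.wav in the temp folder) is made up for illustration.

```csharp
using System;
using System.IO;
using System.Speech.Synthesis;          // needs a reference to System.Speech.dll (Windows only)

public static class WaveFileSketch
{
    public static void Main()
    {
        // Hypothetical output location; use whatever path suits your application.
        string wavPath = Path.Combine(Path.GetTempPath(), "speech-demo.wav");

        using (var synthesizer = new SpeechSynthesizer())
        {
            // Redirect synthesis from the sound card to a WAV file.
            synthesizer.SetOutputToWaveFile(wavPath);

            // Both utterances are rendered into the same file, one after the other.
            synthesizer.Speak("Let's wreck a nice beach.");
            synthesizer.Speak("It is now " + DateTime.Now.ToShortTimeString());

            // Detach from the file so it is closed before we inspect it.
            synthesizer.SetOutputToNull();
        }

        Console.WriteLine("Wrote " + new FileInfo(wavPath).Length + " bytes to " + wavPath);
    }
}
```

Calling SetOutputToDefaultAudioDevice() instead of SetOutputToNull() would switch the same synthesizer back to the speakers, which is handy if one instance is shared across the application.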

In my next blog I will show how you can use the Pause() and Resume() methods to control the speaking action. I'll also talk about some of the events raised by the SpeechSynthesizer class.

- Philipp

 

 

Thank you all for coming to our Friday lunch session at the PDC! The large attendance was a pleasant surprise given the timing of our session.

Rob was first up. He demonstrated the new Speech User Experience in Windows Vista. Several times during his demo he got spontaneous applause from the audience :-). He demonstrated the new user interface and the free functionality that every application gets, including dictation (if their text fields use RichEdit 2.0 controls). The fact that Windows Vista ships with our latest speech recognition engine is a big plus for developers who no longer have to worry about how their users will get a competitive recognition engine on their machines.

I was next. I did some live coding demos of speech synthesis (including synthesizing to a file) and speech recognition. The demos worked w/o a problem (although my pressing the 'End' instead of the 'Page Down' button on the slide machine caused some confusion). I will be posting the code snippets from my demos on this blog over the next couple of weeks.

Finally, Steve Chang from the Speech Server team showed off some of the prototype systems that they have built. Despite some tricky setup problems (coordinating phone line access with the room's speaker system) his demos went really well. It was very impressive to see both the multilingual support of their platform and the ability to build working systems in a matter of days, not weeks or months. And things will only get better ...

The ensuing Q & A session lasted for about 20 minutes (it helped that there was no talk scheduled in our room after us). I hope we were able to answer your questions. If you have any further questions (or have questions after viewing the online recording) feel free to post them to this blog.

- Philipp

 

After a somewhat lengthy tech check last night (telephony demos always take a bit more time to set up with the A/V folks) we finally made it to the BAR (Big A.. Room) for the 'Ask the Experts' session. There were quite a few folks already waiting for us! I had a great time talking to them, finding out what they are doing with SAPI, what their pain points are with our current release (I hope we won't repeat the same mistakes), and what their expectations are for the future.

I'm very much looking forward to showing them our demos today at noon during our PDC session. It will be fun!

- Philipp

Rob and I are both in the speaker's work room at the PDC working on our demos. I can hear Rob talking to his computer - it's going to be a cool demo!

My demo is pretty much set now (snippets and all). In the remaining time I'll work on my talking points and practice my typing, since I will be doing live coding demos. Anyone who has ever done a speech demo can appreciate the 'thrill'!

- Philipp

I just turned on anonymous comments for my blog. Sorry for any inconvenience.

- Philipp

I am very excited to have the opportunity to present our work over the last 2 years at the PDC with my co-worker Robert Brown. Unfortunately, our session is at noon on Friday, at a time when a lot of attendees will likely be on their way back to their loved ones. But I hope there will be at least some of you interested enough to stop by our session to see how far speech recognition and speech synthesis have come and how easy it is to use them from managed code:

Session Details
PRSL03 - Ten Amazing Ways to Speech-Enable Your Application
September 16, 12:00 PM - 12:45 PM
502 AB

Philipp Schmid, Robert Brown

Microsoft is supercharging Windows and Windows Server System with state-of-the-art speech technology. WinFX has a powerful API for enabling your users to speak to your apps and your apps to speak to your users. A future version of Speech Server will use the same API for extending the reach of your .NET applications to the telephone. The technology just works and the code's easy to write. During this lunch session, see some great examples, like using speech to access information in Windows Vista ("Longhorn"), speech enabling your rich Windows Presentation Foundation ("Avalon") application, and how to extend your application to the telephone for ubiquitous access.

- Philipp

Hi, my name is Philipp Schmid and I am the development lead for the speech APIs at Microsoft. My team delivers both the COM-based sapi.dll (which has shipped as part of the OS since Windows XP) and the new managed API, System.Speech (part of WinFX). I was one of the original developers on SAPI 5.0 (which shipped in Windows XP and Office XP). I then became my own customer as I used sapi.dll to speech-enable the shell in Windows for Tablet PC V 1.0. After that I returned to the speech group to become the development lead for the managed API effort.

Before joining MSFT in 1999, I got my Ph.D. in Computer Science from the Oregon Graduate Institute (now part of the Oregon Health & Science University) and then spent 2 wonderful years as a Postdoctoral Associate in the Spoken Language Systems Group at MIT's Laboratory for Computer Science.

But enough about me. This blog is about all things speech recognition and speech synthesis. On the side I am very interested in enterprise technologies and will comment on those from time to time.

One more thing: 2 more days until I leave for the PDC ... More about that in my next blog.

- Philipp

 