speech @ microsoft

  • New top level MSDN node for Speech

    Check it out here! Now you can go to one location on MSDN to get any docs you need for your speech application needs...
  • Use Tellme Studio to create voice applications

    Tobin Coziahr, from Tellme, has been kind enough to write a post describing how you can use the Tellme VoiceXML studio environment to build prototype voice applications. Thanks Tobin!

     

    --- 

     

    Tellme, a Microsoft subsidiary, has a fully featured VoiceXML studio environment that allows anyone to learn how to make voice applications powered by the Tellme platform, for free.

     

    Head over to the getting started section to hear examples of functioning voice applications, and try out the VoiceXML scratchpad, which lets you write your own VXML and then immediately call an access number to try it out. There’s also a great code library with tutorials and sample code.

     

    There’s even a Visual Studio 2008 plugin that gives you a powerful GUI to design and develop voice applications using Microsoft’s Visual Studio Domain Specific Language toolkit.

     

    Sign up for a Tellme Studio account today!

     

    Tobin Coziahr

    Senior Software Design Engineer

  • Technical Preview of the Windows Speech Recognition Profile tool is now available for download!

    It's an exciting time here in Redmond, as we're finally ready to release the CTP (Client Technical Preview) of the Windows Speech Recognition Profile tool (WSRProfile for short). WSRProfile allows you to back up and restore your speech profile. It's fast, and it's easy. You can download WSRProfile here.

    Be sure to download the release notes as they contain useful information as well as user instructions. Please note that this is a "technical preview" version and as such there are a few known issues with the tool.

    As always, we welcome any and all feedback at listen@microsoft.com.

  • Microsoft sponsors the 2009-2010 AVIOS Speech Application Programming Contest

    As we have for the past 2 years, Microsoft is once again proud to announce our sponsorship of the 2009-2010 AVIOS Speech Application Programming Contest. Last year, winners of the contest received some pretty cool prizes from Microsoft, like copies of Visual Studio Professional, Xbox 360s, and Zunes... That's in addition to being flown out to San Diego for the Awards Ceremony at the Voice Search 2009 Conference. This year, Microsoft is offering similar prizes for 1st, 2nd, and 3rd place.

    Do you have what it takes to win the 2009-2010 Contest? If you think you do, check out the official rules and how to use Microsoft Speech offerings here.

  • Windows Speech Recognition Macros v1.0 is now official!

    Here’s a quick note to let you know that WSR Macros v1.0 is now officially available for download here.

    Notable new features:

    • Improved UI for creating, editing, and signing WSR Macros
    • New UI for creating/editing WSR Macros directly in XML

    Check out the download page above.

    Questions? Comments? Send us email here.

  • Response Point wins InfoWorld 2009 Technology of the Year Award!

    Congratulations to the Response Point team for winning one of the InfoWorld 2009 Technology of the Year Awards!

    Microsoft overall won 4 out of 40 awards announced earlier this week:

    Congrats to the Response Point team and to all the teams at Microsoft for the 4 wins...

  • WA State Deploys Microsoft Audio Indexing Solution

    I just posted this over on my blog, but I thought it was cool enough to put directly on the Speech team blog as well.

    --

    It’s official. Microsoft’s audio indexing solution (born out of Microsoft Research) is now online as a part of a Washington State pilot program aimed at making audio recordings from 1973 to present available to the public, with an easy to use search interface.

    You can read more about it here or you can just play with it here. You can read the official press release here.

    I just tried it out, and it worked great!

    I love seeing the technology transfer from research to product groups, and research using existing technology off the shelf from the product groups.

    Nice job Microsoft Research. Nice job Speech team.

  • Sample Source Code for Speech Developers

    If you are a software developer wanting to incorporate speech recognition and voice output into your application, we have sample source code that can get you started.  We have samples for the native Speech API (SAPI) and the managed System.Speech namespace (Recognition and Synthesis).

    The samples are part of the Windows SDK and the most recent versions were included in the Windows SDK for Windows Server 2008 and Vista SP1.

    The Windows SDK completely replaces the older SAPI 5.1 SDK and supports development on Windows XP, Windows Server 2003, Windows Vista, and Windows Server 2008.

    To get the samples, customers can download this SDK as a DVD image (1,330MB ISO file), or go through a guided setup process where only the components they need are downloaded.  Speech is part of the base install.

    More Information:

  • Windows Speech Recognition Macros is now available!!

    I'm very pleased to announce that the first Technical Preview of Windows Speech Recognition Macros is now available for immediate download on downloads.microsoft.com.

    The Windows Speech Recognition Macros tool (aka WSRMacros) extends the usefulness of the speech recognition capabilities already included in Windows Vista. Users can now create powerful macros that are triggered by spoken commands. These macros can perform a single task or a series of tasks. Macros can be as simple as inserting your mailing address or as complex as providing a completely different speech interaction, using a number of built-in capabilities or custom JScript/VBScript actions.

    You can read more about WSRMacros in the release notes here.

    Stay tuned to the following blogs for more information on how you can best utilize the full power of Windows Speech Recognition Macros:

    Additionally, we'd love to hear from you about what you like and what you don't like about WSRMacros. Please send us email at listen@microsoft.com and let us know what you think. We won't be able to answer all the emails that we receive, but we'll do our best to use your questions and comments to make WSRMacros better over the coming months.

    NOTE: While we have tried to make it easy to use, this release of WSRMacros is a technical preview of technology we are planning to release in the future. Not all of the features we have planned are included, and some features are incomplete. Users are cautioned to treat this release as “pre-beta.”

  • How do you get a list of Speech Commands for an application in Vista?

    We often get requests for lists of available speech commands in Vista applications. The challenge for most people is that, when given these lists, they can only remember a few commands for a few applications. To address this challenge, we employ “Say what you can see” as the basic principle behind the Windows Speech Recognition (WSR) UI in Vista.

     

    Of course, sometimes you can see a control but you don’t know what it’s called; this is the case with the playback controls in Windows Media Player, for example. To find out what these controls are called, you can hover the mouse over them or you can say “show numbers”. You can then speak the number of the desired control followed by the word “OK”; the control will be activated, and its name will be displayed in the speech recognition notification area.

     

    Typically, if there is text associated with a control or action, you can speak any substring of it and have the control activated. If there’s ambiguity about which control you want to activate, the system will highlight the possible controls and ask you to say the number of the desired control followed by the word “OK”.
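
    As a rough illustration of the matching behavior described above, here is a minimal Python sketch. It is not how WSR is implemented; the function names, the `MatchResult` type, and the case-insensitive comparison are assumptions made for the example. A phrase that matches exactly one visible label activates that control; a phrase that matches several returns a numbered list to choose from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MatchResult:
    """Outcome of matching a spoken phrase against visible control labels."""
    activated: Optional[str] = None                       # label activated directly, if unambiguous
    choices: Dict[int, str] = field(default_factory=dict)  # numbered candidates when ambiguous


def match_spoken_phrase(phrase: str, visible_labels: List[str]) -> MatchResult:
    """Match a phrase against on-screen labels by case-insensitive substring.

    Hypothetical helper: only labels that are currently visible are considered,
    mirroring the "say what you can see" principle.
    """
    needle = phrase.strip().lower()
    if not needle:
        return MatchResult()
    hits = [label for label in visible_labels if needle in label.lower()]
    if len(hits) == 1:
        # Exactly one control matches, so it can be activated right away.
        return MatchResult(activated=hits[0])
    # Zero or several matches: nothing is activated; several matches get numbers.
    return MatchResult(choices={n: label for n, label in enumerate(hits, start=1)})


def choose_by_number(result: MatchResult, spoken_number: int) -> Optional[str]:
    """Resolve an ambiguous match once the user says a number followed by "OK"."""
    return result.choices.get(spoken_number)
```

    With the labels “Save”, “Save As”, and “Print” on screen, saying “print” activates Print directly, while saying “save” yields the numbered choices 1 (Save) and 2 (Save As).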

     

    For more details on how to get the most out of Windows Speech Recognition (WSR) please consider taking the tutorial.

     

    I'll close with one special consideration (beyond general SR limitations) that might cause you difficulties. If the system isn't always recognizing what you've said, it’s possible that you aren't waiting long enough for WSR to interrogate the application to obtain all of its commands. The blue doughnut in the Windows Speech control indicates that it is still obtaining information from the application. While the blue doughnut is present, not all commands will be available. And finally, be sure that the command or item you are trying to speak is visible. If you can't see it, you can't say it!

  • Where can I download the SAPI 5.3 SDK?

    We periodically get asked where someone can download the SAPI 5.3 SDK.

    For SAPI 5.0 & 5.1, the SDK was a standalone download.  It is now part of the platform SDK.  So, all you need to do is go to http://msdn.microsoft.com and search for what you are looking for.  For example, if you search for "SAPI 5.3", the first result returned is the overview page for SAPI 5.3.  There is a discussion on what is new in SAPI 5.3, and then of course all the API documentation and samples are there.

  • Where does dictation work in Windows Speech Recognition?

    I get a lot of questions about dictation in Windows Speech Recognition for Vista.  One of the most frequent questions is why dictation doesn't work in a particular application.  The short answer is that dictation relies on a technology called Text Services Framework to interact with applications.  If the application doesn't support it, then dictation doesn't work well.  There is a very basic form of dictation support that can be activated when you check 'Enable Dictation Everywhere' in the Options submenu of the speech recognition context menu (right-click on the speech recognition UI).

     What applications support Text Services Framework? Well, I know that the following applications definitely support Text Services Framework:

    • Microsoft Word 2003, 2007 (possibly XP; I haven't tried it)
    • WordPad
    • Notepad
    • Microsoft Publisher 2003 and 2007 (I haven't used earlier versions)

    I also know that the following applications do NOT support Text Services Framework:

    • Microsoft Excel
    • Microsoft PowerPoint
    • Microsoft Works
    • Microsoft Word 2000 and earlier

    Dictation should work in any program that uses plain text controls or richedit controls, as well.

  • Live Search for Mobile - Now with Speech Recognition

    • Do you have a Windows Mobile phone?
    • Do you ever find yourself looking for a phone number, or directions, or gas prices, or movie listings while you are on the go?
    • Do you wish you could just speak your search queries, rather than fumble with your phone's tiny keypad?

    If you answered ‘yes’ to the three questions above, the Live Search for Mobile client is for you. Point your Windows Mobile phone's Internet Explorer to http://wls.live.com and install the client on your phone.

    The Speech Recognition Team at Microsoft spent the last few months helping the Live Search team to speech enable this exciting application, and we can't wait for everybody in America to give this little gem a try. Install it now!

    Here are some of the cool things you can do with the speech enabled Live Search for Mobile client:

    • Search for businesses by name
    • Search for businesses by name and location
    • Search for businesses by category
    • Find locations by saying a zip code, city and state, or full address

    Some examples of the things you can say:

    • Microsoft Corporation
    • Microsoft in Redmond Washington
    • Mexican Restaurant
    • Los Angeles California
    • 98052
    • Miami Florida 33101
    • One Microsoft Way, Redmond Washington 98052

    You can check out a cool video of the feature here

    The speech recognition functionality for the application doesn't actually sit on the Windows Mobile phone. Instead, the phone takes your speech input and sends it to a server; the server does its recognition magic and sends the results back to the phone.

    All in all, you'll find the whole experience a lot faster than typing your search query on your keyboard. Especially if you have a phone with a 10 key keypad. Can you imagine typing "One Microsoft Way Redmond Washington 98052" with that? Or even just "Mexican Restaurant"? No way! Speech makes things way easier.

    And hey, can you imagine how much money you are going to save in 411 fees? Behold the power of speech!

    And even if you don't have a Windows Mobile phone, you can always call Microsoft's new free 411 service at 1-800-CALL-411.

  • Speech Language Support in Windows Vista

    In my earlier post about how to download and install SR languages for Vista, I somehow neglected to mention what languages are supported.  Not surprisingly, we got some questions about this.

    Q: What languages is Windows Speech Recognition available in?

    A: For Speech Recognition, this info is available on the page on www.microsoft.com that talks about the feature: http://www.microsoft.com/windows/products/windowsvista/features/details/speechrecognition.mspx.  It says "Windows Speech Recognition is available in English (U.S.), English (U.K.), German (Germany), French (France), Spanish (Spain), Japanese, Chinese (Traditional), and Chinese (Simplified)."  It is not possible to extend this support in Vista.  If you have a recognizer for another language, you can use it in other applications, but not in Windows Speech Recognition.

    For TTS, Vista ships with US English and Chinese (Simplified) synthesizers.  The Narrator screen reader (http://www.microsoft.com/windows/products/windowsvista/features/details/accessibility.mspx#narrator) does support other languages and other engines.

    So, you might ask, why the difference?  Well, Windows Speech Recognition includes lots of grammars that have to be localized into each language.  Localizing the grammars is reasonably difficult, and will result in errors that need to be found through testing the integrated system.  Since we didn't have engines in languages other than those mentioned above, we had no way to find and fix issues and verify that grammars in other languages were correct.  Without being able to test, we can't ship, and without being able to ship there is no point in going through the effort in the first place.

    Narrator, on the other hand, doesn't have anything complex that needs localizing, just strings.  Localizing strings is part of the standard Windows localization process, so Narrator can support any language you have a TTS engine in.

  • Speech API and engine Availability

    This question came in to our SAPI5@microsoft.com alias today:

    Q: Can you please tell me where I can get Microsoft Speech API 5.3 apart from those versions packaged with Vista and .Net framework 3?

    A: This question and others like this could refer to one of a few things: SAPI, our COM API; System.Speech, our managed API in the .NET Framework 3.0; the TTS engine; or the SR engine.

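    OS    | Managed API                                     | Unmanaged API      | TTS engine                                         | SR engine
    ------|-------------------------------------------------|--------------------|----------------------------------------------------|---------------------------------------------------------
    XP    | System.Speech (download the .NET Framework 3.0) | SAPI 5.1 in the OS | MS Sam in the OS, MS Mary and MS Mike as downloads | SR engine in TabletPC, or from Office XP and Office 2003
    Vista | System.Speech in .NET Framework 3.0 in the OS   | SAPI 5.3 in the OS | MS Anna in the OS                                  | SR engine v8 in the OS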

     

    If you want your application to work on both XP & Vista and are using SAPI (the COM API) you need to limit yourself to the 5.1 subset of SAPI.  The SAPI headers in the current platform SDK have the appropriate ifdefs in them so that if you target your app for XP, you will only see functions that are available in XP.
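
    One way to reason about this rule is as a lookup over the availability information above: the SAPI version you can safely target is the lowest one shipped across the operating systems you support. The Python sketch below is only an illustration; the dictionary layout and the function name are assumptions, and the version numbers are the ones listed in this post.

```python
from typing import Dict, Iterable, Tuple

# SAPI version shipped in each OS, transcribed from this post.
# The (major, minor) tuple layout is an assumption made for the example.
SAPI_VERSION_BY_OS: Dict[str, Tuple[int, int]] = {
    "XP": (5, 1),
    "Vista": (5, 3),
}


def common_sapi_version(target_oses: Iterable[str]) -> Tuple[int, int]:
    """Return the highest SAPI version available on every target OS.

    That is simply the minimum of the versions shipped by the targets.
    Raises KeyError for an OS that is not in the table above.
    """
    versions = [SAPI_VERSION_BY_OS[name] for name in target_oses]
    if not versions:
        raise ValueError("at least one target OS is required")
    return min(versions)


if __name__ == "__main__":
    major, minor = common_sapi_version(["XP", "Vista"])
    print(f"Limit yourself to the SAPI {major}.{minor} subset")
```

    Targeting both XP and Vista gives (5, 1), which matches the advice to stay within the 5.1 subset; targeting Vista alone gives (5, 3).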

    System.Speech relies on SAPI for some functionality, so there are a couple places where functionality is restricted on XP.

    The newer engines are more accurate/better sounding than the old engines.

    So, if you want your application to work on both XP and Vista, that is easily done. But if you want the newest functionality, or the best accuracy and voice quality/intelligibility, you'll want Vista. If you want to use the newest functionality on Vista but still run on XP with reduced functionality, try to create the new interfaces at run time: if that succeeds (as it will on Vista), use them; if it fails (as it will on XP), skip the part of your code that uses the new APIs.
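
    The "try the new interface, fall back if it fails" approach can be sketched in a few lines. The Python below only models the control flow; it is not SAPI code. `create_interface`, `InterfaceNotAvailable`, and the interface names are hypothetical stand-ins for whatever creation call and error your real code uses.

```python
from typing import List, Set


class InterfaceNotAvailable(Exception):
    """Raised by the hypothetical factory when the OS lacks an interface."""


def create_interface(name: str, available: Set[str]) -> str:
    """Stand-in for creating an API interface at run time.

    `available` simulates the set of interfaces the current OS provides.
    """
    if name not in available:
        raise InterfaceNotAvailable(name)
    return name


def enabled_features(available: Set[str]) -> List[str]:
    """Always enable the baseline path; add the newer path only if creation succeeds."""
    features = ["baseline (5.1 subset)"]
    try:
        create_interface("new-interface", available)
    except InterfaceNotAvailable:
        # Older OS: skip the code that depends on the newer API.
        return features
    features.append("newer (5.3 only)")
    return features


if __name__ == "__main__":
    print(enabled_features({"baseline-interface"}))                   # simulated XP
    print(enabled_features({"baseline-interface", "new-interface"}))  # simulated Vista
```

    Probing for the capability rather than checking the OS version keeps the decision next to the code that needs it, and it keeps working if the newer interfaces later become available on an older system.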

    (I edited the post to put the info in table form, which is easier to understand.)


© 2009 Microsoft Corporation. All rights reserved. Terms of Use  |  Trademarks  |  Privacy Statement