<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Rob's Response Point : Speech</title><link>http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx</link><description>Tags: Speech</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>SP2 has been updated to support speech barge-in</title><link>http://blogs.msdn.com/robertbrown/archive/2009/02/20/sp2-has-been-updated-to-support-speech-barge-in.aspx</link><pubDate>Sat, 21 Feb 2009 04:50:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9436803</guid><dc:creator>RobertBrown</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/9436803.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=9436803</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=9436803</wfw:comment><description>You wanted barge-in back. We heard you. It's back. Response Point SP2 now supports speech barge-in as a checkbox in the Configure Automated Receptionist dialog box: To install it.... 
Rather than supply a separate patch that would need to be applied every...(&lt;a href="http://blogs.msdn.com/robertbrown/archive/2009/02/20/sp2-has-been-updated-to-support-speech-barge-in.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9436803" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Response+Point/default.aspx">Response Point</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category></item><item><title>Streamlining your Automated Receptionist Greeting</title><link>http://blogs.msdn.com/robertbrown/archive/2009/02/13/streamlining-your-automated-receptionist-greeting.aspx</link><pubDate>Sat, 14 Feb 2009 02:32:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9420312</guid><dc:creator>RobertBrown</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/9420312.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=9420312</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=9420312</wfw:comment><description>(Don't ping me on barge-in right now - or at least, if you do, I don't have any more info yet. Hopefully soon. 
UPDATE 3/14/09: barge-in is supported in SP2 ) In the meantime, here's a nice technique for streamlining your Automated Receptionist greeting....(&lt;a href="http://blogs.msdn.com/robertbrown/archive/2009/02/13/streamlining-your-automated-receptionist-greeting.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9420312" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Response+Point/default.aspx">Response Point</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category></item><item><title>Trouble-Shooting Speech Recognition</title><link>http://blogs.msdn.com/robertbrown/archive/2009/01/14/trouble-shooting-speech-recognition.aspx</link><pubDate>Thu, 15 Jan 2009 06:22:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:9319955</guid><dc:creator>RobertBrown</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/9319955.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=9319955</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=9319955</wfw:comment><description>(UPDATE 3/14/09: take a look at the 10 ways to optimize speech recognition quick guide.) There have been lots of posts all over the web about how cool Response Point ’s speech recognition is. 
But every now and then somebody tells us it’s not working out...(&lt;a href="http://blogs.msdn.com/robertbrown/archive/2009/01/14/trouble-shooting-speech-recognition.aspx"&gt;read more&lt;/a&gt;)&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=9319955" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Response+Point/default.aspx">Response Point</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category></item><item><title>New Speech Server video on Channel 9</title><link>http://blogs.msdn.com/robertbrown/archive/2006/06/29/new-speech-server-video-on-channel-9.aspx</link><pubDate>Thu, 29 Jun 2006 20:04:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:650956</guid><dc:creator>RobertBrown</dc:creator><slash:comments>0</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/650956.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=650956</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=650956</wfw:comment><description>&lt;P&gt;&lt;A href="http://channel9.msdn.com/Showpost.aspx?postid=208891"&gt;&lt;FONT face=Arial&gt;http://channel9.msdn.com/Showpost.aspx?postid=208891&lt;/FONT&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;Albert, Mithun and Dave talk us through some of the great new capabilities of the next version of Speech Server.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;I worked on the initial version of Speech Server (2004), as well as some of the plumbing of the server and API in the upcoming version.&amp;nbsp; It's great to see what the Speech Server team is delivering.&amp;nbsp; They've made excellent use of the last couple of years to build an impressive upgrade.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;EDIT: I forgot to link to the beta download instructions: &lt;A href="http://www.microsoft.com/speech/preview/default.mspx"&gt;http://www.microsoft.com/speech/preview/default.mspx&lt;/A&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;---&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;As an aside, one of the comments (from &lt;/FONT&gt;&lt;A href="http://channel9.msdn.com/Niners/jsampsonPC"&gt;&lt;FONT face=Arial&gt;jsampsonPC&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt;) made me grin:&lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;EM&gt;&lt;FONT face=Arial&gt;"I'm an outsider to Microsofts guts, but it seems like every smart programmer on the MS team has a foreign accent &lt;IMG src="http://channel9.msdn.com/emoticons/emotion-1.gif" border=0&gt; Perhaps channel nine should start each show with, "Who are who, what's your position, and what country do you hail from, mate?"&amp;nbsp; It would be an interesting stat to the [ratio] of foreign/local employees at MS &lt;IMG src="http://channel9.msdn.com/emoticons/emotion-1.gif" border=0&gt;"&lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;I like the suggestion.&amp;nbsp; The development teams I've been a part of are very &lt;/FONT&gt;&lt;A href="http://encarta.msn.com/dictionary_/cosmopolitan.html"&gt;&lt;FONT face=Arial&gt;cosmopolitan&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt;. &amp;nbsp;It's one of the things&amp;nbsp;I find enjoyable and enriching about working here.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=650956" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>Updating speech samples for beta 2</title><link>http://blogs.msdn.com/robertbrown/archive/2006/06/22/updating-speech-samples-for-beta-2.aspx</link><pubDate>Thu, 22 Jun 2006 20:31:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:643066</guid><dc:creator>RobertBrown</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/643066.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=643066</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=643066</wfw:comment><description>&lt;P&gt;&lt;FONT face=Arial&gt;When beta 1 shipped, we published &lt;/FONT&gt;&lt;A href="http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/"&gt;&lt;FONT face=Arial&gt;this article&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt;&amp;nbsp;in MSDN Magazine. &lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;We've made some minor tweaks to the API for beta 2, and I figure what better way to illustrate them than to walk through the samples in that article and update them for beta 2.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;First, the install.&amp;nbsp; I installed two pieces of software:&lt;/FONT&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;FONT face=Arial&gt;Vista beta 2.&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face=Arial&gt;Visual C# Express Edition&lt;/FONT&gt;&lt;/LI&gt;&lt;/OL&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;I didn't need to install WinFX (it's part of Vista and installed by default in beta 2).&amp;nbsp; I didn't need to install the SDK.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;One of the most frequently asked questions in beta 1 was "how on Earth do I add a reference to System.Speech?".&amp;nbsp; The answer then was to first install the SDK, then add a reference to "Speech", not "System.Speech".&amp;nbsp; Ack!&amp;nbsp; Sorry about that.&amp;nbsp; It now works as it should.&amp;nbsp; I just went into the "Add Reference" dialog in VC# and double clicked on System.Speech.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;Today's updated sample is the &lt;/FONT&gt;&lt;A href="http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx?fig=true#fig5"&gt;&lt;FONT face=Arial&gt;first synthesis sample&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt; from the article.&amp;nbsp; The only real change is that we dropped "SpeakText()" and overloaded "Speak()" to take a string.&amp;nbsp; Pretty&amp;nbsp;simple.&lt;/FONT&gt;&lt;/P&gt;&lt;FONT color=#0000ff size=2&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;using&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt; System;&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;&lt;FONT face="Courier New"&gt;using&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt; System.Collections.Generic;&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;&lt;FONT face="Courier New"&gt;using&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt; System.Text;&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;&lt;FONT face="Courier New"&gt;using&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt; System.Speech.Synthesis;&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;&lt;FONT face="Courier New"&gt;namespace&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt; Synth_sample_1&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;{&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color=#0000ff size=2&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;class&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#008080 size=2&gt;Program&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color=#0000ff size=2&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;static&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;void&lt;/FONT&gt;&lt;FONT size=2&gt; Main(&lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;string&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt;[] args)&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color=#008080 
size=2&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SpeechSynthesizer&lt;/FONT&gt;&lt;FONT size=2&gt; synth = &lt;/FONT&gt;&lt;FONT color=#0000ff size=2&gt;new&lt;/FONT&gt;&lt;FONT size=2&gt; &lt;/FONT&gt;&lt;FONT color=#008080 size=2&gt;SpeechSynthesizer&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt;();&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;synth.SelectVoice(&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" color=#800000 size=2&gt;"Microsoft Anna"&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt;);&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;synth.Speak(&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" color=#800000 size=2&gt;"Speak text has been replaced by an overload on speak."&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New"&gt;);&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;}&lt;/FONT&gt;&lt;/P&gt;&lt;/FONT&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;Well, that's all for today.&amp;nbsp; More soon...&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=643066" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>Recent ZDNet article on speech</title><link>http://blogs.msdn.com/robertbrown/archive/2006/06/15/recent-zdnet-article-on-speech.aspx</link><pubDate>Fri, 16 Jun 2006 00:03:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:632797</guid><dc:creator>RobertBrown</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/632797.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=632797</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=632797</wfw:comment><description>&lt;P&gt;&lt;A href="http://insight.zdnet.co.uk/software/applications/0,39020466,39274587,00.htm"&gt;&lt;FONT face=Arial&gt;http://insight.zdnet.co.uk/software/applications/0,39020466,39274587,00.htm&lt;/FONT&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;A high-level tour around speech synthesis and recognition.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=632797" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category></item><item><title>MSDN Magazine article on speech in Vista</title><link>http://blogs.msdn.com/robertbrown/archive/2005/11/15/msdn-magazine-article-on-speech-in-vista.aspx</link><pubDate>Tue, 15 Nov 2005 21:56:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:493036</guid><dc:creator>RobertBrown</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/493036.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=493036</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=493036</wfw:comment><description>&lt;P&gt;&lt;FONT face=Arial&gt;Okay, so I haven't been completely idle: &lt;A href="http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/#void"&gt;http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/#void&lt;/A&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;(Thanks to Robert Stumberger, Rob Chambers, and the other folks here at Microsoft who helped put this together).&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=493036" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>Building a speech telephony app</title><link>http://blogs.msdn.com/robertbrown/archive/2005/09/14/building-a-speech-telephony-app.aspx</link><pubDate>Wed, 14 Sep 2005 22:07:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:466385</guid><dc:creator>RobertBrown</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/466385.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=466385</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=466385</wfw:comment><description>&lt;P&gt;&lt;A href="http://channel9.msdn.com/ShowPost.aspx?PostID=111967#111967"&gt;&lt;FONT face=Arial&gt;Bosky wrote&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt;:&lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;P&gt;&lt;EM&gt;&lt;FONT face=Arial&gt;"basically i want my speech server to be able to accept a call, quickly port it say to another 'something' . so is this possible ? i hear MS offices themselves at some places have replaced the phone opperator with spech recognition based speech servers . so how to tackle this&amp;nbsp; situation of :&lt;/FONT&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;EM&gt;&lt;FONT face=Arial&gt;having multiple calls coming in to the speech server concurrently.&lt;/FONT&gt;&lt;/EM&gt; 
&lt;LI&gt;&lt;EM&gt;&lt;FONT face=Arial&gt;what changes would i need to make in program considering that right now i am not using remoting. &lt;/FONT&gt;&lt;/EM&gt;
&lt;LI&gt;&lt;EM&gt;&lt;FONT face=Arial&gt;and what additional hardware apart from the telephony card ?"&lt;/FONT&gt;&lt;/EM&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;You could build this from the ground up with SAPI (or WinFX), but that's a *LOT* of work.&amp;nbsp; Telephony apps need to interface with telephony hardware, implement a lot of call-control logic, and present some pretty sophisticated UI built out of question-and-answer dialogs.&amp;nbsp; The recognition and synthesis APIs we provide are such a small piece of this that you would still have a lot of work ahead of you.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;My recommendation is to use &lt;/FONT&gt;&lt;A href="http://www.microsoft.com/speech/default.mspx"&gt;&lt;FONT face=Arial&gt;Speech Server&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt;, since it takes care of the telephony stuff for you, and provides&amp;nbsp;a lot of &lt;/FONT&gt;&lt;A href="http://www.microsoft.com/downloads/details.aspx?familyid=1194ED95-7A23-46A0-BBBC-06EF009C053A&amp;amp;displaylang=en"&gt;&lt;FONT face=Arial&gt;tools for building the dialog UI&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial&gt;.&lt;/FONT&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=466385" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>WinFX speech API changes in PDC CTP</title><link>http://blogs.msdn.com/robertbrown/archive/2005/09/14/winfx-speech-api-changes-in-pdc-ctp.aspx</link><pubDate>Wed, 14 Sep 2005 21:28:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:466324</guid><dc:creator>RobertBrown</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/466324.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=466324</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=466324</wfw:comment><description>&lt;P&gt;&lt;A href="https://blogs.msdn.com/robertbrown/archive/2005/09/12/464081.aspx#464363"&gt;Joseph Kilada wrote &lt;/A&gt;"Robert, can you give any hints as to whether the WinFX Speech APIs have changed at all in build 5219 (or whatever the PDC build is) compared to Beta 1?"&lt;/P&gt;
&lt;P&gt;Sometime soon I'll post updates to all the samples I've put in this blog, to make them work with the PDC CTP.&lt;/P&gt;
&lt;P&gt;The major functionality hasn't changed.&amp;nbsp; But we renamed some functions, changed some parameter lists, and rearranged some namespaces, primarily to address scenarios and feedback raised to us since we published the beta 1 release candidate.&amp;nbsp; Thank you to all of you who posted and mailed feedback.&lt;/P&gt;
&lt;P&gt;We also added a new set of classes for simplifying grammar building.&amp;nbsp; We retained the SRGS classes from beta 1.&amp;nbsp; But these are overkill for most grammars.&amp;nbsp; So the GrammarBuilder and its associated classes were added to address mainstream scenarios.&lt;/P&gt;
&lt;P&gt;The idea is that you'll be able to do things like "I'd like a &amp;lt;size&amp;gt; &amp;lt;topping&amp;gt; pizza" very easily:&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;&lt;FONT color=#0000ff&gt;Dim&lt;/FONT&gt;&amp;nbsp;pizzaBuilder &lt;FONT color=#0000ff&gt;As&lt;/FONT&gt; &lt;FONT color=#0000ff&gt;New&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt; GrammarBuilder&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;pizzaBuilder.AppendPhrase(&lt;/FONT&gt;&lt;FONT face="Courier New" color=#800000&gt;"I'd like a"&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;pizzaBuilder.AppendChoices(&lt;FONT color=#0000ff&gt;New&lt;/FONT&gt; Choices(&lt;FONT color=#800000&gt;"small"&lt;/FONT&gt;, &lt;FONT color=#800000&gt;"regular", "large"&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;)&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;pizzaBuilder.AppendChoices(&lt;FONT color=#0000ff&gt;New&lt;/FONT&gt; Choices(&lt;FONT color=#800000&gt;"pepperoni"&lt;/FONT&gt;, &lt;FONT color=#800000&gt;"cheese"&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;)&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;pizzaBuilder.AppendPhrase(&lt;FONT color=#800000&gt;"pizza"&lt;/FONT&gt;&lt;/FONT&gt;&lt;FONT face="Courier New"&gt;)&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT color=#008000&gt;&lt;FONT face="Courier New"&gt;'load it into the recognizer&lt;/FONT&gt;&lt;/P&gt;&lt;/FONT&gt;&lt;FONT size=2&gt;
&lt;P&gt;&lt;FONT face="Courier New" size=3&gt;_reco.LoadGrammar(&lt;FONT color=#0000ff&gt;New&lt;/FONT&gt;&lt;FONT size=2&gt;&lt;FONT face="Courier New" size=3&gt; Grammar(pizzaBuilder)&lt;/FONT&gt;&lt;/FONT&gt;)&lt;/FONT&gt;&lt;/P&gt;
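&lt;P&gt;For C# readers, the same grammar might look like the sketch below.&amp;nbsp; It assumes the Append overloads and the SpeechRecognitionEngine class found in later System.Speech releases, so the method names differ from the CTP calls in the Visual Basic sample above.&amp;nbsp; The class name PizzaGrammarSketch and the variable reco are purely illustrative, and running it needs a speech recognizer installed on the machine.&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;using System.Speech.Recognition;&lt;BR&gt;&lt;BR&gt;class PizzaGrammarSketch&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Same phrase structure as the Visual Basic sample: fixed text, two choice lists, fixed text&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;GrammarBuilder pizzaBuilder = new GrammarBuilder();&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pizzaBuilder.Append("I'd like a");&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pizzaBuilder.Append(new Choices("small", "regular", "large"));&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pizzaBuilder.Append(new Choices("pepperoni", "cheese"));&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;pizzaBuilder.Append("pizza");&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Load it into an in-process recognizer (assumed here; the sample above uses an existing _reco field)&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (SpeechRecognitionEngine reco = new SpeechRecognitionEngine())&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;reco.LoadGrammar(new Grammar(pizzaBuilder));&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;}&lt;/FONT&gt;&lt;/P&gt;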
&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;/FONT&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=466324" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>Introducing SAPI 5.3</title><link>http://blogs.msdn.com/robertbrown/archive/2005/08/24/introducing-sapi-5-3.aspx</link><pubDate>Thu, 25 Aug 2005 02:32:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:455919</guid><dc:creator>RobertBrown</dc:creator><slash:comments>3</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/455919.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=455919</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=455919</wfw:comment><description>&lt;P&gt;&lt;FONT face=Arial&gt;Vista (a.k.a. Longhorn)&amp;nbsp;has a new version of SAPI: 5.3.&lt;/FONT&gt;&lt;/P&gt;&lt;FONT face=Arial&gt;
&lt;P&gt;&lt;FONT face=Arial&gt;SAPI 5.3 is an incremental update to SAPI 5.1.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;The core mission and architecture are unchanged across all 5.x releases.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; Among a variety of tweaks, &lt;/SPAN&gt;SAPI 5.3 has these overall improvements:&lt;/FONT&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;FONT face=Arial&gt;Support for W3C grammar and TTS formats (SRGS and SSML, as well as support for one of the SML drafts)&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face=Arial&gt;Support for semantic interpretation script within grammars.&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face=Arial&gt;Performance improvements.&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face=Arial&gt;An overall scrub for quality and security.&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/FONT&gt;
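&lt;P&gt;&lt;FONT face=Arial&gt;To give a feel for the SSML support, here is a rough sketch from managed code.&amp;nbsp; It assumes the SpeakSsml method on the System.Speech SpeechSynthesizer class rather than the native SAPI COM interfaces, and the markup string and the class name SsmlSketch are illustrative assumptions, not taken from the SAPI documentation.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New"&gt;using System.Speech.Synthesis;&lt;BR&gt;&lt;BR&gt;class SsmlSketch&lt;BR&gt;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;static void Main()&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Illustrative SSML 1.0 document with one emphasized word&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;string ssml =&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"&amp;lt;speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\"&amp;gt;" +&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"This sentence has one &amp;lt;emphasis&amp;gt;emphasized&amp;lt;/emphasis&amp;gt; word." +&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"&amp;lt;/speak&amp;gt;";&lt;BR&gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// Speak the markup through the default audio device&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;using (SpeechSynthesizer synth = new SpeechSynthesizer())&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;synth.SpeakSsml(ssml);&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;}&lt;BR&gt;}&lt;/FONT&gt;&lt;/P&gt;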
&lt;P&gt;&lt;FONT face=Arial&gt;The Vista beta 1 docs have an early draft of the documentation:&amp;nbsp; &lt;/FONT&gt;&lt;A href="http://windowssdk.msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI53/html/Welcome.asp"&gt;&lt;FONT face=Arial&gt;http://windowssdk.msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI53/html/Welcome.asp&lt;/FONT&gt;&lt;/A&gt;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=455919" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>Speech recognition of interviews &amp;amp; videos</title><link>http://blogs.msdn.com/robertbrown/archive/2005/07/14/speech-recognition-of-interviews-amp-videos.aspx</link><pubDate>Thu, 14 Jul 2005 19:52:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:438842</guid><dc:creator>RobertBrown</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/438842.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=438842</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=438842</wfw:comment><description>&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Jonathan Tregear&amp;nbsp;recently posted some &lt;/FONT&gt;&lt;a href="http://blogs.msdn.com/robertbrown/archive/2005/05/29/422995.aspx#comments"&gt;&lt;FONT face=Arial color=#000000&gt;comments/questions on speech recognition of interviews&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial color=#000000&gt; (in response to a brief discussion I had with Scoble in my &lt;/FONT&gt;&lt;A href="http://channel9.msdn.com/ShowPost.aspx?PostID=72163"&gt;&lt;FONT face=Arial color=#000000&gt;Channel 9 interview&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial color=#000000&gt; a couple of months 
back).&amp;nbsp; I looped in Frank Seide from &lt;/FONT&gt;&lt;A href="http://www.research.microsoft.com/speech/"&gt;&lt;FONT face=Arial color=#000000&gt;Microsoft Research Asia&lt;/FONT&gt;&lt;/A&gt;&lt;FONT face=Arial color=#000000&gt;&amp;nbsp;to answer these, since he's done a lot of work in this field.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Jonathan: &lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Late in your interview with Robert Scoble, he asked you about the possibility of using speech recognition to produce transcripts of his interviews, for example. Your answer was that the results he would get would not be very good unless the speech engine was trained to each of the speakers' voices. &lt;/FONT&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Frank:&lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;This is true in our experience. In interviews, a number of problems come together. The first is spontaneous speech style -- speech is irregular with interspersed self-corrections, stutters, uhms, etc.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;This can put off a recognizer. Second, recognizers always have a predefined vocabulary. For generic interviews, topics can be very broad and important words like names and special terminology might not be included in the vocabulary.&lt;/FONT&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Jonathan:&lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;I've heard about ASR engines produced by companies like Autonomy/Virage&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;that claim to be able to do a decent job of speaker-independent and unconstrained-domain voice recognition for similar uses like indexing and searching newscasts etc. Do you have experience with or an opinion about how good those engines are? &lt;/FONT&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Frank:&lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;If the transcript should be searchable but not necessarily readable, lower accuracies will be sufficient. Errors happen disproportionately in short words like "is", which are needed for readability but which you don't care about when searching. The downside is, however, that for searching, the topic-specific words like names and terminology are often the ones you are interested in. If your vocabulary contains them, you are fine; otherwise it will not work, and you will need to resort to phonetic search techniques.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;I do not know how good the mentioned engines are. State-of-the-art research systems achieve ~90% accuracy for broadcast news transcription.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;News is, however, a rather narrow domain, and a broadcast news recognizer will not work well for, say, entertainment, sports, or talk shows.&lt;/FONT&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Jonathan: &lt;/FONT&gt;&lt;/P&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Related to this is another question I've been wondering about: Suppose you pointed an engine at a video like the interview example above, but instead of using it to produce a transcript of the interview you were only interested in finding instances of a well defined list of keywords.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;This would be useful in indexing and searching libraries of audio content also. Would that be an easier problem to solve for speaker independent (i.e. untrained) speech recognition? &lt;/FONT&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;Frank:&lt;/FONT&gt;&lt;/P&gt;&lt;FONT face=Arial color=#000000&gt;
&lt;BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px"&gt;
&lt;P&gt;You need to distinguish between frequent (common) and infrequent (uncommon) keywords. Frequent words are often rather short, and recognizers can only recognize them reliably by also considering the words surrounding them. For those, a keyword spotter as you describe would not work well. Infrequent words, on the other hand, are often longer, and surrounding word context is less helpful because those words are rare and thus reliable statistics are not available anyway. For those, the method can work well.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;/FONT&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=438842" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item><item><title>Speech interview on Channel 9</title><link>http://blogs.msdn.com/robertbrown/archive/2005/05/29/speech-interview-on-channel-9.aspx</link><pubDate>Sun, 29 May 2005 06:21:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:422995</guid><dc:creator>RobertBrown</dc:creator><slash:comments>2</slash:comments><comments>http://blogs.msdn.com/robertbrown/comments/422995.aspx</comments><wfw:commentRss>http://blogs.msdn.com/robertbrown/commentrss.aspx?PostID=422995</wfw:commentRss><wfw:comment>http://blogs.msdn.com/robertbrown/rsscomments.aspx?PostID=422995</wfw:comment><description>&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;&lt;A href="http://channel9.msdn.com/"&gt;Channel 9&lt;/A&gt; just posted an &lt;A href="http://channel9.msdn.com/ShowPost.aspx?PostID=72163"&gt;interview with me&lt;/A&gt;.&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face=Arial color=#000000&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;FONT face=Arial color=#000000&gt;&lt;FONT size=3&gt;Some links to stuff I talk about:&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;FONT face=Arial color=#000000&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;FONT face=Arial color=#000000&gt;&lt;FONT size=3&gt;The new speech&amp;nbsp;API I &lt;a href="https://blogs.msdn.com:443/robertbrown/archive/2005/05/23/421085.aspx"&gt;posted about a few days ago&lt;/A&gt;&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;FONT face=Arial color=#000000&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;FONT face=Arial color=#000000&gt;&lt;FONT size=3&gt;The app I demo when I dial 0 is running on &lt;/FONT&gt;&lt;A href="http://www.microsoft.com/speech/default.mspx"&gt;&lt;FONT size=3&gt;Speech Server&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3&gt;, and the &lt;/FONT&gt;&lt;A href="http://www.microsoft.com/resources/casestudies/CaseStudy.asp?CaseStudyID=16039"&gt;&lt;FONT size=3&gt;case study I mention is here&lt;/FONT&gt;&lt;/A&gt;&lt;FONT size=3&gt;.&lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;You may also be interested in some of our research web pages, since I mention them at one stage in the interview: &lt;/FONT&gt;&lt;SPAN&gt;&lt;FONT face=Arial color=#000000&gt;&lt;A href="https://research.microsoft.com/speech/tts/"&gt;Synthesis research&lt;/A&gt;; &lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;FONT color=#000000&gt;&lt;FONT face=Arial&gt;&lt;A href="http://research.microsoft.com/srg/"&gt;Speech research in Redmond&lt;/A&gt;; &lt;/FONT&gt;&lt;/FONT&gt;&lt;/SPAN&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;FONT face=Arial color=#000000 size=3&gt;&lt;A href="http://www.research.microsoft.com/speech/"&gt;Speech research in Asia&lt;/A&gt;.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt;&lt;FONT face=Arial color=#000000&gt;I also mentioned the &lt;A href="http://www.microsoft.com/windowsmobile/downloads/voicecommand/default.mspx"&gt;Voice Command&lt;/A&gt; app for Windows Mobile.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=422995" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech/default.aspx">Speech</category><category domain="http://blogs.msdn.com/robertbrown/archive/tags/Speech+API/default.aspx">Speech API</category></item></channel></rss>