<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/atom.xsl" media="screen"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-US"><title type="html">Craig's List (of Speech Server miscellany)</title><subtitle type="html" /><id>http://blogs.msdn.com/craigfisher/atom.xml</id><link rel="alternate" type="text/html" href="http://blogs.msdn.com/craigfisher/default.aspx" /><link rel="self" type="application/atom+xml" href="http://blogs.msdn.com/craigfisher/atom.xml" /><generator uri="http://communityserver.org" version="2.1.61025.2">Community Server</generator><updated>2006-01-22T18:58:00Z</updated><entry><title>Daewoo's voice-recognizing microwave</title><link rel="alternate" type="text/html" href="http://blogs.msdn.com/craigfisher/archive/2006/09/27/774953.aspx" /><id>http://blogs.msdn.com/craigfisher/archive/2006/09/27/774953.aspx</id><published>2006-09-28T05:47:48Z</published><updated>2006-09-28T05:47:48Z</updated><content type="html">&lt;p&gt;Seems like speech recognition is turning up everywhere these days. Now&amp;nbsp;you can talk to your microwave. What's next!? 
What devices do you wish you could talk to?&amp;nbsp;&lt;/p&gt; &lt;p&gt;&lt;a href="http://www.engadget.com/2006/09/27/daewoos-voice-recognizing-microwave/"&gt;Link to Daewoo's voice-recognizing microwave - Engadget&lt;/a&gt;&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=774953" width="1" height="1"&gt;</content><author><name>craigfis</name><uri>http://blogs.msdn.com/members/craigfis.aspx</uri></author></entry><entry><title>Speech as a new computer-to-computer protocol?</title><link rel="alternate" type="text/html" href="http://blogs.msdn.com/craigfisher/archive/2006/02/15/532742.aspx" /><id>http://blogs.msdn.com/craigfisher/archive/2006/02/15/532742.aspx</id><published>2006-02-15T20:18:00Z</published><updated>2006-02-15T20:18:00Z</updated><content type="html">&lt;P&gt;We have this system - Microsoft Speech Server -&amp;nbsp;that is able to make (or receive) phone calls. Usually we expect that the other end of the line is going to be a person answering our call - or that the caller is a person. Sometimes though when an outbound call is made it isn't actually a person that has answered. This is a problem I've been working on lately, but that aside it got me thinking... when it is a computer that answered - such as a voice mail system, we effectively have two computers talking to each other using human (synthesized) speech. Kind of a weird protocol for two computers to use to talk to each other huh! I wonder if that will actually become more common one day (e.g. 20-30 years out?) - for the lingua franca for computers to communicate with each other to become speech. In that future possible world computers wouldn't actually distinguish between whether they were communicating with a person or a computer. 
I guess by then computers (or the software driving them) would be sophisticated enough in many cases that even we wouldn't be able to tell whether we were communicating with a human or a computer if we weren't in the physical presence of the other party. I believe that solutions for the Turing test will become increasingly common and there will be a sort of arms race for people to think up ever more sophisticated forms of the Turing test and more sophisticated solutions to it.&lt;/P&gt;
&lt;P&gt;The flaw in my imagining here, though, is that speech is such an inefficient and error-prone communication mechanism. It is full of ambiguity and contextual sensitivity. People have to go back and forth to confirm that what they thought they heard was what was intended, for example. Or they fail to confirm and the mistake is only discovered later down the line. Like when the passenger gets off the plane in Melbourne, Florida, rather than Melbourne, Australia! That's why schemas are so important in the world of computer information exchange - to remove ambiguity and inconsistencies. Maybe we need to schematize human speech so we can communicate as efficiently as computers? Of course it would then be culture-neutral also, so we could understand each other around the world. No, I don't seriously think that will ever happen - so much of our communication is tied to our common cultural experiences.&lt;/P&gt;
&lt;P&gt;OK, enough of my random rambling for now. (Hmmm, how long before computers will be able to post original random rambling blogs?) Hey did you wonder if maybe this was authored by a computer? Huh?&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=532742" width="1" height="1"&gt;</content><author><name>craigfis</name><uri>http://blogs.msdn.com/members/craigfis.aspx</uri></author></entry><entry><title>I'm new here</title><link rel="alternate" type="text/html" href="http://blogs.msdn.com/craigfisher/archive/2006/01/22/516069.aspx" /><id>http://blogs.msdn.com/craigfisher/archive/2006/01/22/516069.aspx</id><published>2006-01-23T02:58:00Z</published><updated>2006-01-23T02:58:00Z</updated><content type="html">&lt;DIV&gt;&lt;FONT face=Arial size=2&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;Allow me to introduce myself.&lt;SPAN style="mso-spacerun: yes"&gt;&amp;nbsp; &lt;/SPAN&gt;I'm a relative newcomer to the speech space and to the Microsoft Speech Server team. I was fortunate to land a Program Manager job with the team in August of last year. I have spent the time since swimming in a sea of technology acronyms and learning lots (with help from my patient team-mates). I'm responsible for the area of Speech Server known as the "Speech Engine Services" layer, which handles loading of prompts and grammars and passes audio to the Speech Recognition engines and requests for text to speech to the Speech Synthesis engines. Prior to joining the MSS team I worked on System Center Data Protection Manager, and before that Commerce Server.&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;I thought I'd start this blog off with some of the terms I've encountered, with brief definitions&amp;nbsp;- which may be useful for anyone else who is also new to this area.&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;SSML - &lt;A href="http://www.w3.org/TR/speech-synthesis/"&gt;Speech Synthesis Markup Language&lt;/A&gt; - a W3C recommendation that defines an XML schema for describing how text should be converted to speech by a speech synthesizer.&lt;BR&gt;SRGS - &lt;A title=http://www.w3.org/TR/speech-grammar/ href="http://www.w3.org/TR/speech-grammar/"&gt;Speech Recognition Grammar Specification&lt;/A&gt; - another W3C recommendation, which defines the syntax of the grammars a speech recognizer matches against.&lt;BR&gt;SALT - &lt;A title=http://www.saltforum.org/ href="http://www.saltforum.org/"&gt;Speech Application Language Tags&lt;/A&gt; - defines XML elements that can be incorporated into web pages to speech-enable a web application.&lt;BR&gt;&lt;A title=http://www.voicexml.org/ href="http://www.voicexml.org/"&gt;VoiceXML&lt;/A&gt; - XML schema for describing speech applications.&lt;BR&gt;SAPI - Speech API built into Windows. Provides both speech synthesis and speech recognition functionality.&lt;BR&gt;VUI - Voice User Interface&lt;BR&gt;TTS - Text To Speech&lt;BR&gt;IVR - Interactive Voice Response (i.e. a speech recognition enabled telephony system)&lt;BR&gt;UPL - User Perceived Latency - time taken between a user finishing speaking and the system providing a response&lt;BR&gt;VOIP - &lt;A title=http://www.voip-info.org/wiki/ href="http://www.voip-info.org/wiki/"&gt;Voice Over IP&lt;/A&gt; - a set of protocols used for doing voice and telephony over the internet.&lt;BR&gt;RTP - Real-Time Transport Protocol - used as part of VOIP for the actual transmission of voice audio streams.&lt;BR&gt;SIP - Session Initiation Protocol - the protocol used by VOIP for establishing and managing sessions between endpoints.&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;SDP - Session Description Protocol - used by SIP for, well, describing sessions!&lt;BR&gt;CTI - Computer Telephony Integration - technology for integrating telephony systems (switches etc.) with computers. Used in call centers to allow operators to see information about a caller, for example.&lt;BR&gt;DTMF - Dual-Tone Multi-Frequency - the standard tones that are sent when you push buttons on your phone.&lt;BR&gt;CPA - Call Progress Analysis - analysis that is done, e.g. by gateways, to figure out whether an outbound call has received a busy signal, a number-out-of-service tone, an answering machine or a live person etc.&lt;BR&gt;&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;And then there's a host of Speech Server specific ones, such as:&lt;BR&gt;TIS - Telephony Interface Service&lt;BR&gt;TAS - Telephony Application Service&lt;BR&gt;TIM - Telephony Interface Module&lt;BR&gt;SES - Speech Engine Services&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;I wish I could talk a bit about some of the stuff we're working on for the next release of Speech Server, but it's still under wraps at this point - suffice it to say there's some exciting stuff coming.&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P style="FONT-SIZE: 11pt; MARGIN: 0in; FONT-FAMILY: Calibri; mso-outline-level: 1"&gt;By the way, I'm planning to attend SpeechTek West next week in San Francisco. I look forward to meeting other speech-minded people there and perhaps learning a few more acronyms!&lt;/FONT&gt;&lt;/P&gt;&lt;/DIV&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=516069" width="1" height="1"&gt;</content><author><name>craigfis</name><uri>http://blogs.msdn.com/members/craigfis.aspx</uri></author></entry></feed>