Allow me to introduce myself. I'm a relative newcomer to the speech space and to the Microsoft Speech Server team. I was fortunate to land a Program Manager job with the team in August of last year. I have spent the time since swimming in a sea of technology acronyms and learning a lot (with help from my patient team-mates). I'm responsible for the area of Speech Server known as the "Speech Engine Services" layer, which handles loading of prompts and grammars, passes audio to the speech recognition engines, and passes text-to-speech requests to the speech synthesis engines. Prior to joining the MSS team I worked on System Center Data Protection Manager, and before that Commerce Server.

 

I thought I'd start this blog off with some of the terms I've encountered, with brief definitions - which may be useful for anyone else who is also new to this area.

 

SSML - Speech Synthesis Markup Language - a W3C recommendation that defines an XML-based markup for controlling how text is converted to speech by a speech synthesizer.
SRGS - Speech Recognition Grammar Specification - another W3C recommendation, this one defining the syntax for the grammars that tell a speech recognizer which words and phrases to listen for.
SALT - Speech Application Language Tags - defines XML elements that can be incorporated into web pages to speech-enable a web application
VoiceXML - XML schema for describing speech applications.
SAPI - Speech API built into Windows. Provides both speech synthesis and speech recognition functionality.
VUI - Voice User Interface
TTS - Text To Speech
IVR - Interactive Voice Response (i.e. an automated telephony system that callers interact with by voice or touch-tone input)
UPL - User Perceived Latency - time taken between a user finishing speaking and the system providing a response
VOIP - Voice Over IP - a set of protocols used for doing voice and telephony over the internet.
RTP - Real-Time Transport Protocol - used as part of VOIP for the actual transmission of voice audio streams.
SIP - Session Initiation Protocol - this is the protocol used by VOIP for establishing and managing sessions between endpoints.

SDP - Session Description Protocol - used by SIP for, well, describing sessions!
CTI - Computer Telephony Integration - technology for integrating telephony systems (switches, etc.) with computers. Used in call centers, for example, to allow operators to see information about a caller.
DTMF - Dual-Tone Multi-Frequency - the standard tones that are sent when you push buttons on your phone.
CPA - Call Progress Analysis - Analysis that is done, e.g. by gateways, to figure out whether an outbound call has received a busy signal, a number-out-of-service tone, an answering machine or a live person etc.
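
To make the DTMF entry above a little more concrete: each key on the phone keypad is signalled as a pair of simultaneous tones, one from a low-frequency "row" group and one from a high-frequency "column" group. Here is a minimal Python sketch of that lookup. The function name dtmf_frequencies is just something made up for illustration and has nothing to do with Speech Server itself.

```python
# Standard DTMF keypad: every key is one row tone plus one column tone (Hz).
ROW_HZ = (697, 770, 852, 941)        # low-frequency group, one per keypad row
COL_HZ = (1209, 1336, 1477, 1633)    # high-frequency group, one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (low, high) tone pair in Hz for one keypad character.

    Hypothetical helper for illustration only.
    """
    if len(key) != 1:
        raise ValueError("expected a single keypad character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key.upper())
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError("not a DTMF key: %r" % key)


if __name__ == "__main__":
    for k in "5#":
        print(k, dtmf_frequencies(k))
```

The A to D keys in the fourth column are part of the DTMF standard but do not appear on ordinary phones.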

 

And then there's a host of Speech Server specific ones, such as:
TIS - Telephony Interface Service
TAS - Telephony Application Service
TIM - Telephony Interface Module
SES - Speech Engine Services

 

I wish I could talk a bit about some of the stuff we're working on for the next release of Speech Server, but it's still under wraps at this point - suffice to say there's some exciting stuff coming.

 

By the way, I'm planning to attend SpeechTek West next week in San Francisco. I look forward to meeting other speech-minded people there and perhaps learning a few more acronyms!