Cool demos for the next SAPI

We need some sample apps to include with the next version of SAPI, and today we had another brainstorming session to think up some good ideas.  Here are some of mine:
 
ESL: something for English as a Second Language students.  Transcribe the person's speech, then tell them (based on our SR engine's confidence scoring information) how closely what they said resembles what our models expected.
 
Educational: lets kids (or others who don't like keyboards) learn math or something. Speak a phrase like "Seven times four" and it shows the result (28).
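
Here is a rough sketch of the arithmetic half of that idea. It assumes the recognizer has already handed back the phrase as plain text, so no speech API is called. The names NUMBER_WORDS, OPERATORS and evaluate_phrase are made up for illustration, and the vocabulary is deliberately tiny.

```python
import operator

# Hypothetical vocabulary: a real grammar would cover far more numbers and phrasings.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

OPERATORS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
}


def evaluate_phrase(phrase):
    """Evaluate a three-word phrase such as 'Seven times four'.

    Assumes the text is already transcribed; raises ValueError for
    anything outside the tiny vocabulary above.
    """
    words = phrase.lower().split()
    if len(words) != 3:
        raise ValueError("expected '<number> <operator> <number>'")
    left, op, right = words
    try:
        return OPERATORS[op](NUMBER_WORDS[left], NUMBER_WORDS[right])
    except KeyError as exc:
        raise ValueError(f"unrecognized word: {exc.args[0]}") from exc


print(evaluate_phrase("Seven times four"))  # 28
```

Restricting the phrases to a small fixed vocabulary like this is also what would make the recognition side easy, since the engine only has to choose among a handful of words.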
 
Audio indexing: give me a WAV or WMA file and a keyword.  I tell you the first place in the file where that keyword was spoken.  Lets you find where somebody said the word "Grandma" in a home video.
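
The lookup step might look something like the sketch below. It assumes the recognition pass has already produced a list of (word, start time in seconds) pairs in playback order. That list format and the name first_occurrence are my own inventions for illustration, not anything SAPI actually returns.

```python
from typing import Iterable, Optional, Tuple


def first_occurrence(words: Iterable[Tuple[str, float]], keyword: str) -> Optional[float]:
    """Return the start time of the first word matching keyword, or None.

    Assumes `words` is ordered by time. Matching ignores case and any
    punctuation attached to the word.
    """
    target = keyword.casefold()
    for word, start_seconds in words:
        if word.strip(".,!?\"'").casefold() == target:
            return start_seconds
    return None


# Made-up sample data standing in for a recognized home video soundtrack.
transcript = [("say", 11.9), ("hi", 12.3), ("Grandma,", 12.7), ("grandma", 30.1)]
print(first_occurrence(transcript, "Grandma"))  # 12.7
```

The hard part is of course producing that word list from the WAV or WMA file in the first place; once you have it, the search itself is a simple scan.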
 
Foreign language phrasebook:  somebody says a common phrase in Japanese, we recognize it and reply with an English translation.
 
What are some other ideas? 
Published 14 October 04 08:42 by sprague

Comments

# Scott said on October 14, 2004 3:11 PM:
I like the audio indexing idea. Hope that makes it in.
# AT said on October 14, 2004 3:20 PM:
I don't know if this exists:

Connect a SAPI system to a phone, record logs of all messages, convert them to text and send them via email.
And the reverse: receive an email, dial a phone number and speak the message to somebody.

This is probably a trivial feature to implement, but it would use both the recognition and speech parts of SAPI. It would also be easy for developers to use this app as a starter project.

Feel free to ignore my posting if you consider it lame ;-)
# Ray said on October 14, 2004 3:25 PM:
Google! The Google Deskbar is a pretty nice tool, but imagine being able to say "google [insert random keywords here]" and have it open the search results for those keywords, or "google image boat" and have it use Google Images to search for pictures of a boat.
# Brian Groth said on October 14, 2004 10:39 PM:
I would like to see real-time WMA to text (Closed Captioning) so when I listen to any WMA/WMV file, I can see the text scrolling across the bottom of the video or audio window.
# casey chesnut said on October 15, 2004 9:08 PM:
create a tool like FestVox to create a voice model of my own voice. basically i would read certain sentences, and then the phonemes / diphones would be chopped up out of them. that would be compiled into some sort of voice database to be used for speech synthesis. then when i posted a text blog, TTS would be done using my voice model, and it would create a media file for a podcast