How to transcribe any audio on your PC
Chris Pratley asked me if there is an easy way to transcribe the audio in a OneNote recording. Well, there is, though it depends a little on your computer's audio configuration, and you won't necessarily (okay, you probably won't) get great recognition results. Here's what works on my Toshiba M200 Tablet PC:
First, open your Speech control panel.
Click the audio input button and you'll see this:
It's probably set by default to "Use preferred audio input device", so you'll have to change that to whatever your machine's built-in sound card is. On my Tablet, that's SoundMAX Digital Audio. Then select Properties.
The default is to use an automatically chosen line, so change that to "Wave Out Mix". This is kind of a hack, because as I understand it, Wave Out Mix is the virtual switchboard inside your computer's audio system that controls how the different audio sources are combined. When you tell it that the input is the combination of input and output, you're putting the computer into an internal loop where it hears only itself. In human body terms, it's kind of like unplugging the speaker in your throat and vocal cords, and then plugging in the output of the little microphones inside your ears, so that your brain thinks your throat is creating all the noises that your ears hear.
I tried this on some of my OneNote recordings and the result was, well, somewhat humorous. Instead of a long meeting brainstorming about Speech Server and TV, my transcription looked like a stream-of-consciousness rambling about ... I'm not sure.
You could do a lot to optimize the scenario and maybe even get it working well. For example, OneNote lets you record in much higher quality (albeit with bigger file sizes). Of course you could also use a high-quality headset microphone. I bet that would work pretty well right there, especially if you're a mainstream unaccented American English speaker, like the ones who provided most of our acoustic model training data. Finally, if there's a particular speaker or audio situation you'll be using a lot, you could also train the system on it and make that one of your speaker profiles.
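Since recording quality makes such a difference, it can help to check what format a recording is actually in before you try to transcribe it. Here's a minimal sketch in Python that does this using only the standard library's wave module. It assumes you already have the audio as an uncompressed PCM WAV file (getting a OneNote recording into that form is up to you), and describe_wav is just a name I made up for illustration; it isn't part of OneNote or the speech SDK.

```python
import wave


def describe_wav(path):
    """Return basic format facts for an uncompressed PCM WAV file.

    Hypothetical helper for illustration only; it reads the WAV header
    and does not perform any speech recognition.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()       # 1 = mono, 2 = stereo
        sample_rate = wav.getframerate()    # samples per second, per channel
        bits = wav.getsampwidth() * 8       # getsampwidth() returns bytes
        frames = wav.getnframes()           # total sample frames in the file

    return {
        "channels": channels,
        "sample_rate_hz": sample_rate,
        "bits_per_sample": bits,
        "duration_s": frames / sample_rate,
        # Size of the raw audio data per second, ignoring the file header.
        "bytes_per_second": sample_rate * channels * bits // 8,
    }
```

Calling describe_wav on a recording gives you the sample rate, bit depth, and channel count, plus a rough idea of how quickly the file grows at that setting, which is the trade-off between quality and file size mentioned above.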
I should conclude by pointing out that the above hack is just the beginning if you're serious about transcribing a bunch of audio data. It's pretty easy to write your own tool using the SAPI 5.1 SDK that will take an audio file and turn it into text. We even ship one such tool in the SDK, called "WavetoText.exe". Unfortunately you'll have to download the whole 200MB SDK to get it, unless your favorite search engine points you to a copy that somebody has lying around the Internet someplace. (Our license says it's okay to do that.)
Oh, and one more thing: this is all going to be a lot better in Vista, so remind me to write more details about that in a future post.