I just tried out a very cool system from http://www.dictomail.com/, using an interactive demo hosted by a company called Admiral Online. Their web site tells you to call a phone number, leave any message, and hang up. Here is how it transcribed my message:
ANI:4255551212 HI JENNIFER, THIS IS RICHARD SPRAGUE. I'M JUST CALLING TO ASK YOU A COUPLE QUESTIONS ABOUT THE MEETING AND TO TELL YOU THAT THE RAIN IN SPAIN FALLS MAINLY ON THE PLAIN. YOU CAN REACH ME AGAIN AT 2136782217. I LOOK FORWARD TO HEARING FROM YOU, BYE., -- Your voice message was translated by DictoMail. Receive messages on your cellphone and desktop. Call 818-206-0775 to order service now.
The transcription was perfect, in spite of the following ways I tried to fool it:
  • the "rain in Spain" thing, spoken quickly and with blurred speech.
  • the return phone number (which I made up) spoken with lots of 'ums' and repeated digits. I actually said "two one three, um, six seven eight, ah, two--um--two two one seven."
  • "I look forward..." spoken as quickly as I could.
Obvious questions I'd like answered: Where did they get their ASR engine? Why didn't they use Microsoft's engine? And how did they make such a cool demo?