Spykdog introduced me to an interesting 15-min podcast interview with Mike Cohen, who runs speech recognition development at Google.

  1. The big recent trend in SR is concept recognition, as in how-may-I-help-you systems, but these systems are most interesting for cutting the number of menus you have to navigate, not for making other tasks more straightforward.
  2. Multimodal will take several more years to take off because the web infrastructure isn't there yet, so in the meantime it's better to focus on those conversational-type systems.
  3. He doesn't have an opinion about whether we'll see more hosted speech systems (the "in-the-cloud" environments like Angel.com) or whether it's better to run your own server.
  4. Speech development is so complicated that it's unclear whether packaged dialog or grammar libraries help much -- you will still need lots of experts for customization.
  5. The best speech hires are computational linguists, because an intuitive feel for language is often more relevant than computer skills.

We'll see where this goes. I'm a little surprised Google isn't more interested in predictive AX modeling, but then this was only a short interview.