Real-time transcription

Alec Saunders wonders about Jeff Pulver's challenge to the internet communications community: why isn't there a good real-time captions application?

Answer: because it's super-hard (as I've discussed before). Legal and medical transcription are huge, multibillion-dollar industries, and if automatic transcription were easy they would have adopted it already. [Side note: when you see those roadside signs advertising that you can make $300/day working at home, do you know what that work is? Voice transcription for medical or legal companies.]

Incidentally, even some subsets of the problem are tough. For example, wouldn't it be nice to get an alert every time someone in a live meeting mentions your name (or a keyword of interest to you)? That's doable, if you're willing to tolerate a certain number of false alarms. But the Holy Grail, a system that transcribes just like a human, is far harder than it looks.
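To make the false-alarm tradeoff concrete, here is a minimal sketch of keyword alerting. It assumes a hypothetical recognizer that emits one word at a time with a confidence score; the `Hypothesis` shape, the `keyword_alerts` function, and the sample numbers are all made up for illustration and are not any real product's API.

```python
from typing import Iterable, Iterator, NamedTuple


class Hypothesis(NamedTuple):
    """One recognized word (hypothetical shape, not a real recognizer's output)."""
    word: str
    confidence: float  # 0.0 to 1.0, as reported by the assumed recognizer
    time_s: float      # seconds from the start of the meeting


def keyword_alerts(
    stream: Iterable[Hypothesis],
    keywords: Iterable[str],
    threshold: float = 0.6,
) -> Iterator[Hypothesis]:
    """Yield every hypothesis that matches a keyword at or above the threshold."""
    wanted = {k.lower() for k in keywords}
    for hyp in stream:
        if hyp.word.lower() in wanted and hyp.confidence >= threshold:
            yield hyp


# Invented sample data: the second "Alec" stands for a misheard word that
# the recognizer reported with low confidence.
meeting = [
    Hypothesis("thanks", 0.95, 1.0),
    Hypothesis("Alec", 0.91, 1.4),
    Hypothesis("the", 0.88, 6.9),
    Hypothesis("Alec", 0.35, 7.2),
    Hypothesis("report", 0.90, 7.6),
]

strict = list(keyword_alerts(meeting, ["alec"], threshold=0.6))
loose = list(keyword_alerts(meeting, ["alec"], threshold=0.3))
print(len(strict), "alert(s) at 0.6;", len(loose), "alert(s) at 0.3")
```

Lowering the threshold catches more genuine mentions, but it also lets through low-confidence guesses like the second one, which is exactly the false-alarm cost you have to be willing to pay.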

Part of the problem is that when people say they want a transcription, they often mean they want a summary. Real-time speech is full of "ums," "ahs," and repetitions that a human transcriber would ignore.
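The easy part of that cleanup can be sketched in a few lines. The example below assumes the verbatim transcript already exists as text; the `FILLERS` list and the `clean_transcript` function are illustrative names, and the rule for repetitions is deliberately naive (it would also collapse a legitimate "had had"). It does nothing toward producing a real summary, which is the hard part.

```python
import re

# Illustrative filler list; a real system would need a much richer one.
FILLERS = {"um", "uh", "ah", "er", "hmm"}


def _bare(token: str) -> str:
    """Lowercase a token and strip punctuation so 'Um,' compares equal to 'um'."""
    return re.sub(r"[^\w']", "", token).lower()


def clean_transcript(text: str) -> str:
    """Drop filler words and immediately repeated words from a verbatim transcript."""
    kept = []
    for token in text.split():
        bare = _bare(token)
        if bare in FILLERS:
            continue  # skip "um", "uh", and friends
        if kept and _bare(kept[-1]) == bare:
            continue  # skip a stuttered repeat of the previous word
        kept.append(token)
    return " ".join(kept)


print(clean_transcript("Um, I I think we we should uh ship it"))
```

For the sample sentence this prints "I think we should ship it", which is closer to what a human transcriber would write down than the literal word stream.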

That said, I agree with Alec that it would be wonderful if this problem could be solved. Fortunately, here at Microsoft we have plenty of very smart people thinking about it, and along the way we come up with some pretty nice solutions to other practical problems, like the concept recognition feature that's proving to be such a hit with RP.

Published 31 January 08 01:47 by sprague
