A number of stories about audio search products have appeared in the mainstream news in the last few days, and it's good to see this application of speech recognition technology gaining some traction. Audio mining will certainly take off as stored resources of audio and multimedia content continue to grow across many industries, including entertainment, contact center and surveillance.
Here's what I've seen in recent days.
BBN's release last week of the Podzinger podcast searching tool generated positive buzz in the blogging community, although podcast search tools from blinkx.tv and Podscope have been around longer. BBN has a long history of research to draw from, and this is presumably another product built on the core audio indexing technology on which their commercial Avoke tool is based.
A new entrant from Germany also got some airtime: Com Vision's AudioClipping product (an unfortunate name - "audio clipping", I'm sure you know, is generally used for the phenomenon of audio recorded at an amplitude too high for the input system, resulting in a 'clipped' signal that is harsh on the ear and fatal to speech recognizers). A breathless article from the Irish Times (via TMCNet) introduces AudioClipping in the context of "the next Google", with the goal of a lucrative indexing product that will render media monitoring services obsolete. And yes, it mentions Star Trek, of course (but wait, the engineers were inspired to develop the technology in order to find episodes containing particular quotations. [Pause for effect.] Say no more.)
Last week the Boston Globe (via boston.com) highlighted the NICE Systems analytical tool for call centers, which mines data not just for lexical and syntactic patterns but also for prosodic indicators of satisfaction and frustration. This can apparently be used to train an online emotion detection component that can trigger supervisor interventions with angry callers. (Superb! This raises the distinct possibility that Paul English will find an IVR system that requires expletives and/or shouting in order to get to an operator, and publish the details on his cheat sheet...) Since the author of this article talked to analysts (Datamonitor, this time), the de rigueur sci-fi reference for this article is 2001: A Space Odyssey. There's definitely a theme here.
Nexidia is also doing interesting things here. I've seen good demos at recent conferences and trade shows, and they seem to have a sharper focus on scale and performance than many. No recent announcements or consumer products, though.
Microsoft Research has undertaken audio search projects with some very exciting results. In particular, the Audio Information Management and Extraction (AIME) project in MSR Asia's Speech Technology Group is chartered to develop an audio search engine for a variety of conversational, voicemail and broadcast settings. Frank Seide has been working on this. One forthcoming application of the technology is the next version of OneNote, OneNote 12: this will enable audio and video search - see Chris Pratley's OneNote blog for more details. Last year, Frank also shared some answers to general questions about the technology with Robert Brown. Ciprian Chelba is another researcher who has some cool audio search demos and interesting publications.
In Speech Server, we've considered audio search functionality for future versions of our analytical tools. Is it a major priority for customers to use audio mining tools on IVR data gathered from SR systems (beyond the mining they might do on agent/caller interactions)? Are there any other Microsoft platforms that would benefit from audio indexing and search capabilities?