Keyword spotting

There's lots of buzz about using speech recognition (SR) technology to pick up keywords in an audio stream. See Robert Scoble's demo of Nexidia or this post by Eduardo Olvera. The basic idea has been around for years, especially if you believe the NSA has been listening in on phone conversations. Even back in the Cold War, there was a demand for systems that could listen to a zillion Russian-language phone calls and alert a human analyst every time somebody said a magic word. Keyword spotting is also not that hard, and I like it because it's an example of an application that still works even though the technology isn't perfect.
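The basic alerting idea above fits in a few lines. This is a minimal sketch, not Nexidia's or anyone else's actual system: it assumes a hypothetical recognizer that emits (word, start time, confidence) hits, and the watch list and 0.7 confidence threshold are made-up values for illustration. A human analyst reviews every alert, so an imperfect recognizer can still be useful; raising the threshold trades missed keywords for fewer false alarms.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One keyword hypothesis from a hypothetical speech recognizer."""
    word: str
    start_s: float     # time into the audio, in seconds
    confidence: float  # recognizer score in [0, 1]


# Hypothetical watch list and threshold, chosen for illustration only.
WATCH_LIST = {"missile", "launch"}
THRESHOLD = 0.7


def alerts(hits, watch_list=WATCH_LIST, threshold=THRESHOLD):
    """Return the hits a human analyst should review."""
    return [
        h for h in hits
        if h.word.lower() in watch_list and h.confidence >= threshold
    ]


if __name__ == "__main__":
    stream = [
        Hit("weather", 3.2, 0.95),
        Hit("missile", 12.8, 0.82),
        Hit("launch", 40.1, 0.41),  # low confidence: likely a misrecognition
    ]
    for h in alerts(stream):
        print(f"ALERT: '{h.word}' at {h.start_s:.1f}s (confidence {h.confidence:.2f})")
```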

What's hard is figuring out what the real topic is at a given point in the audio.  If I say "camcorder" in the middle of a conversation, is it relevant to throw up an ad for a camcorder?  Probably yes if the conversation is actually about camcorders.  But what if I'm talking about something completely different and the word or phrase "camcorder" is an incidental side comment of no relevance to the topic at hand? 
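One crude way to separate a topical mention from an incidental one is to check how often the keyword, or a synonym for it, recurs around each mention. The sketch below is a hypothetical heuristic, not a recommended method: the (word, time in seconds) input format, the synonym set, the 60-second window, and the minimum of three mentions are all assumptions picked for illustration.

```python
def topical_mentions(hits, terms, window_s=60.0, min_count=3):
    """Return times of mentions of `terms` that look topical.

    hits: list of (word, time_s) pairs, e.g. from a speech recognizer.
    terms: the keyword plus any synonyms, lowercase.
    A mention counts as topical if at least `min_count` mentions of any
    term (itself included) fall within +/- window_s / 2 of it.
    """
    times = sorted(t for w, t in hits if w.lower() in terms)
    half = window_s / 2
    topical = []
    for t in times:
        nearby = sum(1 for u in times if abs(u - t) <= half)
        if nearby >= min_count:
            topical.append(t)
    return topical


if __name__ == "__main__":
    hits = [
        ("camcorder", 5.0),
        ("zoom", 9.0),
        ("video camera", 20.0),  # phrase hit for a synonym
        ("camcorder", 31.0),
        ("lunch", 200.0),
        ("camcorder", 400.0),    # isolated mention
    ]
    print(topical_mentions(hits, {"camcorder", "video camera"}))  # [5.0, 20.0, 31.0]
```

The isolated mention at 400 s is treated as an incidental side comment, while the cluster near the start is treated as the actual topic. Synonyms only help if someone lists them, which is exactly the weakness the next paragraph points out.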

Look at this posting, for example. What's the most relevant word? It's in the title, but nowhere else. You know what I'm talking about because I mentioned it at the very beginning, but after that I refer to it only as "it" or by various vaguely relevant synonyms.

Published 25 April 07 07:14 by sprague

Comments

Drew Lanham said on April 27, 2007 11:04 AM:

I agree that a single word is not necessarily representative of the content of a discussion. In our experience, however, the spoken word, when looked at holistically, can be an excellent indicator of content and context. To be clear, Nexidia is not proposing to use a single word as an indicator of which ads to run. In the video example, our technology has categorized the content (without the benefit of tags or metadata) and determined how often the word occurs and how it relates in time to other words (e.g., "camcorder" within 10 seconds of "video" and/or "home movie"). Once content and context have been refined, this information is passed to the ad server, which combines it with other information (demographics, geography, search history, etc.) to serve the highest-value advertisement to the user at that point in time. The timing of the word was shown to illustrate accuracy, but also to show that we can time the ad to correspond with the content and so increase the frequency of the ad or ads. Timing and frequency matter here because this kind of ad isn't intrusive to the experience (unlike a front-roll ad, for example), so there is no harm to the user in changing it throughout playback.
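A minimal sketch of the time-window co-occurrence test this comment describes ("camcorder" within 10 seconds of "video" and/or "home movie"). It is a hypothetical illustration, not Nexidia's implementation: only the 10-second window and the example terms come from the comment, while the (term, time in seconds) input format and the function name are assumptions.

```python
def cooccurring_hits(hits, target, context_terms, window_s=10.0):
    """Return times of `target` hits that have a context term within window_s seconds.

    hits: list of (term, time_s) pairs; multi-word phrases such as
    "home movie" are assumed to arrive as single phrase hits.
    """
    context_times = [t for term, t in hits if term in context_terms]
    return [
        t for term, t in hits
        if term == target
        and any(abs(t - c) <= window_s for c in context_times)
    ]


if __name__ == "__main__":
    hits = [
        ("video", 2.0),
        ("camcorder", 8.5),    # 6.5 s after "video": counts
        ("camcorder", 60.0),   # no context term within 10 s: ignored
        ("home movie", 75.0),
        ("camcorder", 83.0),   # 8 s after "home movie": counts
    ]
    print(cooccurring_hits(hits, "camcorder", {"video", "home movie"}))  # [8.5, 83.0]
```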

Nexidia is not just word spotting; we render spoken-word content fully searchable in over 33 languages. Our technology makes tens of thousands of hours of audio content searchable every day in call center, legal, media, and government applications. We add an analytics layer on top of this capability to extract actionable knowledge from this otherwise unusable volume of unstructured data. Using a single dual-core, dual-processor box, we index audio or video for search at 340 times faster than real time. This equates to roughly 8,000 hours of content per day per box. Because of this efficiency, infrastructure cost is not a barrier to experimenting in the promising spaces of media search, categorization, and contextual ad targeting in audio and video.
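A quick check of the throughput figure, assuming one box runs around the clock at the quoted speed: 340 × 24 = 8,160 hours of audio per day, consistent with the roughly 8,000 hours quoted. The 24-hour duty cycle is an assumption; the code just does the multiplication.

```python
SPEEDUP = 340       # "340 times faster than real time", from the comment
HOURS_PER_DAY = 24  # assumption: the box runs continuously


def audio_hours_per_day(speedup=SPEEDUP, wall_hours=HOURS_PER_DAY):
    """Hours of audio indexed in `wall_hours` of wall-clock time."""
    return speedup * wall_hours


print(audio_hours_per_day())  # 8160
```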
