Slashdot has a discussion about how open source speech corpora can help in the development of new SR engines.  Here's the excerpt:

"VoxForge collects free GPL Transcribed Speech Audio that can be used in the creation of Acoustic Models for use with Open Source Speech Recognition Engines. We are essentially creating a user-submitted repository of the 'source' speech audio for the creation of Acoustic Models to be used by Speech Recognition Engines. The Speech Audio files will then be 'compiled' into Acoustic Models for use with Open Source Speech Recognition engines such as Sphinx, HTK, CAVS and Julius."

I've argued before that some kind of freely-available corpus of audio data would be a wonderful thing -- and I would want Microsoft to aggressively participate.  I would need to look more closely at VoxForge's licensing policies, but if it's based on the GPL, I wonder how they expect any commercial company to turn it into products.  For example, say you're a car company making a speech-controlled navigation system.  If you use this audio data to build your system, does that mean your nav system (i.e. your crown jewels) is open source now too?

More importantly, I wonder what Spykdog thinks?