As I posted yesterday, I got a chance to check out the machine that Shanen used for the Financial Analysts Meeting demo. I confirmed that it was just what I suspected: an audio gain issue.

If you watch the video clip on MSN Video you can see in the speech user interface that the microphone "volume" is very high. It pushes up into the red frequently while Shanen is speaking to the computer. That happened because the audio sub-system wasn't respecting the audio gain settings we asked it to use.

This is a known bug in current builds, and has already been fixed by the audio team in their private builds in preparation for RTM.

A little more on audio gain ...

Have you ever heard a car drive by that had the stereo blasting away, and the audio sounded absolutely horrible? In simple terms, the system is set up incorrectly: it's being driven harder than it can handle, and the result is a problem called clipping.

Microphones and sound cards can have similar problems when converting the analog signal from the microphone element into a digital signal for use by software on the PC (for example, speech recognition software). That's why it's important to have the audio gain set correctly for the microphone and/or sound card that you're using. That's the whole point of having the user run through our "Microphone Setup Wizard". That piece of Windows Speech Recognition takes great care to analyze the sound of your voice and properly set the audio input gain on the mic / sound card to eliminate clipping.
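To make the idea of "setting the gain correctly" a bit more concrete, here's a minimal sketch in Python. To be clear, this is not how the Microphone Setup Wizard actually works; it's just an illustration of the general idea, assuming samples are floats where 1.0 is full scale. The function names and the 0.8 target peak are made up for the example.

```python
def suggest_gain(samples, target_peak=0.8):
    """Return a gain that scales the loudest sample to target_peak.

    Hypothetical helper: assumes samples are floats measured before any
    gain is applied, with 1.0 meaning digital full scale.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return 1.0  # silence: nothing to measure, leave the gain alone
    return target_peak / peak


def clipped_fraction(samples, gain, full_scale=1.0):
    """Return the fraction of samples that reach full scale after gain."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s * gain) >= full_scale)
    return clipped / len(samples)


voice = [0.1, -0.25, 0.5, -0.2]
gain = suggest_gain(voice)            # leaves headroom below full scale
print(clipped_fraction(voice, gain))  # nothing clips at the suggested gain
print(clipped_fraction(voice, 3.0))   # a gain that's too high clips the peak
```

If the gain that actually gets applied is higher than the one that was asked for, the second case is what the recognizer ends up hearing.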

The problem in this demo was simply a matter of the audio sub-system not respecting the audio gain "request" that WSR made. So effectively, all the audio data that WSR received was being clipped, and thus was incredibly distorted.

Here's what Wikipedia says about clipping in digital signal processing:

In digital signal processing, clipping occurs when the signal is restricted by the range of a chosen representation. For example in a system using 16-bit signed integers, 32767 is the largest positive value that can be represented, and if during processing the amplitude of the signal is doubled, sample values of 32000 should become 64000, but instead they are truncated to the maximum, 32767. Clipping is preferable to the alternative in digital systems — wrapping occurs if the digital hardware is allowed to "overflow", ignoring the most significant bits of the magnitude, and sometimes even the sign of the sample value, resulting in terrible clipping distortion of the signal.
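Here's a small Python sketch of the example in that quote, showing the difference between clipping (saturating at the limit) and wrapping (overflowing) for 16-bit signed samples. The helper names are mine, purely for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def saturate_int16(value):
    """Clip: pin out-of-range values to the nearest 16-bit limit."""
    return max(INT16_MIN, min(INT16_MAX, value))


def wrap_int16(value):
    """Wrap: keep only the low 16 bits, as two's-complement overflow does."""
    return (value - INT16_MIN) % 65536 + INT16_MIN


doubled = 32000 * 2             # 64000, too big for 16 bits
print(saturate_int16(doubled))  # 32767
print(wrap_int16(doubled))      # -1536
```

Clipping flattens the loud peak at the maximum, which sounds bad. Wrapping turns that same loud positive peak into a small negative value, which sounds far worse.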

Why didn't we catch that before Shanen went on stage?

That's a good question. The reality of the situation is that Shanen and the demo setup team were aware of these issues, and great care was taken to try to eliminate the possibility of this gain setting being a problem.

Shanen practiced the demo a few times, both off-stage and then again on-stage just prior to FAM starting. The whole demo worked perfectly several times.

Unfortunately, the nature of this specific audio sub-system bug is that it's intermittent. It worked great every single time in rehearsal. Right up until that one live demonstration -- the one that counted. ;-)

It's too bad that it didn't go more smoothly. The analysts would have been very happy with WSR's performance had they seen it working the way it normally works. Rest assured that we have the issue under control here in Redmond, and when Vista ships later this year, this audio gain issue will be a thing of the past.

There'll be more public demonstrations of WSR coming up in the near future. Then, we can finally show the world just how amazing Windows Speech Recognition really is!