OK, so it's been a while since I last posted.  Since then we have welcomed my son Nathan into the world.  Needless to say, with two kids under 18 months, it has been quite difficult for my wife and me in terms of sleep and stress.

I thought I would discuss an interesting acronym called DTMF in today's post.  DTMF stands for Dual-Tone Multi-Frequency and is what allows the touch tones on your phone to be understood by a speech application.  All of us have dialed a credit card company, bank, or store and been asked to enter a number using the phone's keypad.  But how does the application understand these tones, or is it just magic?

The application understands these tones because phones (at least in the US) follow a standard called DTMF.  The basics are quite simple.  Each key press produces two tones at once, one from a high-frequency group and one from a low-frequency group (hence the name "dual-tone multi-frequency").  The high frequencies are 1209 Hz, 1336 Hz, and 1477 Hz.  The low frequencies are 697 Hz, 770 Hz, 852 Hz, and 941 Hz.  Why are two tones necessary instead of one?  Human speech is very unlikely to contain exactly one of the high tones and one of the low tones at the same time, so it is relatively easy to tell touch tones apart from speech.  If we used only a single tone, a voice could easily be mistaken for a key press, and the applications today that let you either touch-tone or speak the numbers would be far less reliable.
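To make the "two tones at once" idea concrete, here is a minimal Python sketch that builds the samples for one key press by adding two sine waves together.  The 8000 Hz sample rate, the 0.1 second duration, and the function name dual_tone are my own assumptions for illustration, not part of the DTMF standard.

```python
import math

SAMPLE_RATE = 8000  # assumed telephone-quality sample rate, in Hz


def dual_tone(low_hz, high_hz, duration_s=0.1, sample_rate=SAMPLE_RATE):
    """Return samples in [-1, 1] for the sum of two equal-strength sine waves."""
    n = round(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low_hz * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high_hz * i / sample_rate)
        for i in range(n)
    ]


# One low frequency plus one high frequency makes one key press.
samples = dual_tone(697, 1209)
```

Each sine wave is scaled by 0.5 so that the sum never leaves the range -1 to 1.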

If we pair each of the three high tones with each of the four low tones, we get 3 x 4 = 12 combinations, one for each key.  These correspond to the 0-9, *, and # keys.  There is also a fourth high frequency, 1633 Hz, that is used for the A-D keys, which do not exist on many phones.
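The pairing can be written down as a small lookup table.  The sketch below assumes the usual keypad layout, where each row shares a low frequency and each column shares a high frequency; the names LOW_HZ, HIGH_HZ, KEYPAD, and KEY_TO_FREQS are just ones I picked for this example.

```python
LOW_HZ = (697, 770, 852, 941)        # one low frequency per keypad row
HIGH_HZ = (1209, 1336, 1477, 1633)   # one high frequency per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")

# Map every key to its (low, high) frequency pair.
KEY_TO_FREQS = {
    key: (LOW_HZ[r], HIGH_HZ[c])
    for r, row in enumerate(KEYPAD)
    for c, key in enumerate(row)
}

print(KEY_TO_FREQS["5"])  # the pair for the 5 key
```

With the fourth column included, the table holds 4 x 4 = 16 keys instead of 12.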

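So now we know how the keys are encoded, but how does a speech application decode these keys?  A handy algorithm called the Goertzel algorithm is often used to accomplish this.  It measures how much energy a signal contains at one particular frequency, so a decoder can run it once for each DTMF frequency and pick out the strongest low tone and the strongest high tone.  The math behind it takes some getting used to, and details about it can be found through a search of the web.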
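For the curious, here is a rough Python sketch of that decoding idea.  It is a simplified illustration under my own assumptions (8000 Hz sample rate, clean input, the usual keypad layout), not a production detector; a real one would also check that the tones are loud enough and last long enough before accepting a key.  The function names goertzel_power and decode_key are made up for this example.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate, in Hz
LOW_HZ = (697, 770, 852, 941)
HIGH_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, target_hz, sample_rate=SAMPLE_RATE):
    """Relative signal power near target_hz, using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * target_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def decode_key(samples, sample_rate=SAMPLE_RATE):
    """Pick the strongest low and high frequency, then look up the key."""
    row = max(range(4), key=lambda r: goertzel_power(samples, LOW_HZ[r], sample_rate))
    col = max(range(4), key=lambda c: goertzel_power(samples, HIGH_HZ[c], sample_rate))
    return KEYPAD[row][col]
```

Given a tenth of a second of samples containing 770 Hz and 1336 Hz together, decode_key should come back with "5".  Note that this sketch always returns some key, even for silence or speech, which is exactly the gap the extra checks in a real detector are there to close.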

If you would prefer not to think about the math, TAPI (the Telephony API) contains methods that will help you decode DTMF.  The easiest way to decode DTMF is simply to use Microsoft Speech Server 2004 and the corresponding SDK.  Using the SDK and Speech Server, you can create DTMF grammars to recognize DTMF or spoken numbers and translate them to a form your application will understand.