Giving Computers a Voice

Published 31 October 06 05:20 AM | Coding4Fun 
  This article demonstrates how easy it is to enable managed applications to finally talk back using Microsoft's Speech API (SAPI).

Difficulty: Easy
Time Required: 1-3 hours
Cost: Free
Software: Visual Studio Express Editions
Hardware:
Download:

 

Summary

Most of us talk to our computers all the time. You wouldn't believe the things my wife says to her poor machine when something goes wrong. This article demonstrates how easy it is to enable managed applications to finally talk back using Microsoft's Speech API (SAPI). But don't worry: since you control what the application says, you can ensure it uses nicer language than my wife.

SAPI

SAPI is the speech API that gives applications access to speech recognition and text-to-speech (TTS) engines. This article focuses on TTS. For TTS, SAPI takes text as input and uses the TTS engine to output that text as spoken audio. This is the same technology used by the Windows accessibility tool, Narrator. Every version of Windows since XP has shipped with SAPI and an English TTS engine.

TTS puts the user's ears to work. It allows applications to send information to the user without requiring the user's eyes or hands. This is a very powerful output option that isn't often utilized on PCs.

Three steps are needed to use TTS in a managed application:

  1. Create an interop DLL

    Since SAPI is a COM component, an interop DLL is needed to use it from a managed app. To create this, open the project in Visual Studio. Select the Project menu and click Add Reference. Select the COM tab, select "Microsoft Speech Object Library" in the list, and click OK. These steps add this reference to your project and create an Interop.SpeechLib.dll in the same folder as your executable. This interop DLL must always be in the same folder as your .exe to work correctly.

  2. Reference the interop namespace

    Include this namespace in your application. In C#, add "using SpeechLib;"; in VB, add "Imports SpeechLib".

  3. Call Speak()

    Create a SpVoice object and call Speak():

    Visual C#

    SpVoice voice = new SpVoice();
    voice.Speak("Hello World!", SpeechVoiceSpeakFlags.SVSFDefault);

    Visual Basic

    Dim voice As New SpVoice()
    voice.Speak("Hello World!", SpeechVoiceSpeakFlags.SVSFDefault)

That's it! The downloads for this article are simple C# and VB.NET "hello world" samples to try out.

Another Example

One of the best uses of TTS is for audio notifications. The System Monitor sample on coding4fun is a perfect candidate to be TTS-enabled. It informs a user when a particular system event has occurred through a message box, system tray balloon tip, or email. I'll use its extensible notification infrastructure to add TTS to this list.

First, create an interop DLL and add a reference to the namespace using the steps above.

Second, add this file to the SystemMonitor project:

SpeechNotifier.cs:

Visual C#

using System;
using SpeechLib;

namespace SystemMonitor.Notifiers
{
    class SpeechNotifier : NotifierBase
    {
        public SpeechNotifier()
        {
            _voice = new SpVoice();
        }

        public override void Execute(string title, string message)
        {
            string text = "System monitor warning! " + title;
            _voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault);
        }

        private SpVoice _voice;
    }
}

SpeechNotifier.vb:

Visual Basic

Imports System
Imports SpeechLib

Namespace SystemMonitor.Notifiers

    Class SpeechNotifier
        Inherits NotifierBase

        Private _voice As SpVoice

        Public Sub New()
            _voice = New SpVoice()
        End Sub

        Public Overrides Sub Execute(ByVal title As String, ByVal message As String)
            Dim text As String = "System monitor warning! " & title
            _voice.Speak(text, SpeechVoiceSpeakFlags.SVSFDefault)
        End Sub

    End Class

End Namespace

Third, add this line in the App.config file to any monitors you want to be spoken:

<notifier type="SystemMonitor.Notifiers.SpeechNotifier,SystemMonitor" />

 

For example:

<monitor runFrequency="00:04" type="SystemMonitor.Monitors.DiskSpaceMonitor,SystemMonitor">
  <settings>
    <setting name="driveLetter" value="C" />
    <setting name="freeMegabytes" value="10000" />
  </settings>
  <notifiers>
    <notifier type="SystemMonitor.Notifiers.SpeechNotifier,SystemMonitor" />
  </notifiers>
</monitor>

 

This example will say "System monitor warning! Disk space low." when the space on c:\ drops below the specified value.

General TTS Tips

  • Keep it short. The voice used by the default XP TTS engine is not exactly soothing. Most users would not want to hear it speak long texts, but it often works well for conveying short, functional pieces of information.
  • Have a preface. Many TTS apps are designed to be run in the background and speak audio when the user is focused on a different application. In this case it's helpful to let the user know the context. It wouldn't make sense if your computer randomly said "Bill Gates," but you would know what "New mail from Bill Gates" meant.
  • Keep it "speakable." Test out the text your app intends to say. The default TTS engine has some limitations you can avoid. For example, "c:\" would be pronounced "see colon backslash". This can be avoided by speaking just "c" instead.
  • Change defaults. The speech control panel allows the user to select the default TTS engine (for computers with more than one installed) and set the default speaking rate.
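
The first three tips can be applied to the text before it ever reaches Speak(). Here is a minimal sketch, assuming a hypothetical SpeechText helper class (it is not part of SAPI or of the System Monitor sample, and the word limit is just an illustrative choice):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of SAPI or any sample.
public static class SpeechText
{
    // A drive root such as "c:\" that is not followed by a path segment.
    private static readonly Regex DriveRoot =
        new Regex(@"\b([A-Za-z]):\\(?!\w)");

    // "Keep it speakable": reduce "c:\" to just "c".
    public static string MakeSpeakable(string text)
    {
        return DriveRoot.Replace(text, "$1");
    }

    // "Have a preface": put the context in front of the message.
    public static string WithPreface(string preface, string message)
    {
        return preface.Trim() + " " + message.Trim();
    }

    // "Keep it short": keep at most maxWords words of the text.
    public static string Shorten(string text, int maxWords)
    {
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length <= maxWords)
        {
            return text;
        }
        return string.Join(" ", words, 0, maxWords);
    }
}
```

With a helper like this, the text passed to Speak() could be prepared first, for example by calling SpeechText.MakeSpeakable(text) and handing the result to voice.Speak(). Longer paths such as "c:\temp" are left untouched by this sketch; how to read those aloud is a design decision for your app.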

Cool project ideas

The TTS functionality on PCs is an under-used resource with much promise. Here are some more project ideas:

  • GotDotNet has the source code of a few email and RSS readers. Reading the title of a message as it arrives in the background would be a great use of TTS.
  • The System Monitor sample above just scratches the surface of event notifications. That application could be the foundation for monitoring any kind of system resource or application event. And with TTS you can let your ears do the work while your hands and eyes are busy.
  • The It's Hot in Here coding4fun sample uses a web service to read weather forecasts. Don't bother checking a web page for the weather; just modify this app to read it with a hotkey while your eyes do more important things. TTS is great for any similar web service update application: news headlines, stock quotes, sports scores, auction prices, and more.
  • Perhaps more importantly, that same article also introduces us to Phidgets. Upstage the guy next door by sending your Phidget battlebot into the fray with a TTS "You killed my father, prepare to die!".

What's next?

This article outlines how you can use TTS on PCs today. But the real story is what's coming up.

WinFX will contain a fully managed API for speech. No more ugly COM interop. Developers will get all the benefits of a managed API and full IntelliSense support.

The default TTS engine on Windows Vista (formerly Windows code name "Longhorn") is much better than the default XP engine. Check out some of the resources below for updates as Windows Vista gets closer to shipping.

Resources

Here are some blogs from members of the Speech Components Group at Microsoft:

Comments

# B R Pawan Kumar, V-enable Software Pvt Ltd said on April 13, 2007 1:43 AM:

Thabk you for explaining all the fatures so well and i am extrwemely greatful that your website is helping me to create new applications. I will really appreatiate if you could please guide me to a page that gives an idea abt the internal working of sapi and how we can use it to recognise speech.

# Asish Halder said on April 27, 2007 2:24 AM:

i think its a easy guideline for making  TTS application for the beginers. want to try it first .

# Prashant D said on May 2, 2007 5:52 AM:

Thank you this document helped me a lot..

# Max R. said on May 9, 2007 2:04 PM:

Hello! Very interesting. Thank you.

# thecobra said on July 16, 2007 11:38 AM:

cool totoring but were do u use speach to text

# Armando said on August 27, 2007 5:59 PM:

Hi!

There`s a way to do that but using a Spanish TTS?

Thanks!

# Nevin said on November 3, 2007 4:07 PM:

This is a good tutorial, but is there any way to use the same funtionality on a web application.  I have built one that works fine on a local server, but when deployed to the remote server, it has problems accessing the Speechlib in the BIN folder without creating an error.

Nevin

# ahmed awad said on November 7, 2007 9:27 PM:

thank you very much but if you can send me how can i read from more languages specialy arabic and frensh.

my mail (elbanna23@yahoo.com)

thank you again

bye

# Noticias externas said on January 25, 2008 8:08 AM:

It is not easy to find good and free Text to Speech libraries for your .NET application. Even the solution

# one software developer said on March 2, 2008 9:37 AM:

I have worked on that but it does not have so good quality. For excellent quality there is a project from AT&T. It's free and you can use text-to-speech C# (http://developeronline.blogspot.com/2008/01/text-to-speech-for-masses.html)

# Deepak Trama said on March 10, 2008 3:42 AM:

Hi,

I'm writing an app which generates a .wav file from some text.

I need to sync some pictures with this text and present it to the user as a slide show.

'Sync' is a relative term here, I need the pictures to appear some what close to the words when they are spoken.

Is there some way (a SAPI api) which will tell me where (duration in seconds) a particular word would occur in the generated audio file.

Any ideas?

My email is zodiac.seven@gmail.com

# Coding4Fun said on March 10, 2008 4:28 PM:

@Deepak Trama:  If you know the text you're going to say, create a dictionary of the words and their spoken lengths beforehand.

Without diving into the SAPI api, I couldn't tell you if it does or doesn't have that functionality.  Sorry.

# Ritchie said on April 2, 2008 9:03 AM:

This is a good very good article. There seems to be quite a bit of information on TTS. But, can somebody help me with STT Speech-To-Text conversions? RichardMaliwatu@hotmail.com

# mehwish said on May 7, 2008 10:20 AM:

i want to know that is it possible to convert voice to digital signal and then to identify the speaker...

if yes then plz provide me the C# code for voice recognition...

thank you!!

# aday said on June 4, 2008 12:50 AM:

Thanks very much for this excellent tutorial.

Can you teach me how to choose other voices in c# ?

# venkat said on July 2, 2008 1:49 AM:

does SAPI supports multi languages like german,spanish.

# amin said on September 9, 2008 10:54 AM:

hi

it was so interesting article.

pleas introduce me some more example about speech recognition.I need something simple with no grammer recognition. something that just understand one word.

thanks alot

amin_mhsn@yahoo.com

amin

# Heath said on November 7, 2008 5:45 PM:

Check out http://www.iSpeech.org/  The work is already done for you.  Not to mention it uses good voices and it's free.

# vigneshkumar said on April 8, 2009 7:45 AM:

Thanks a lot for the information provided above
