Matthew van Eerde's web log
I am a Software Development Engineer in Test working for the Windows Sound team. You can contact me via email: mateer at microsoft dot com
A while ago I wrote a post on Implementing a "say" command using ISpVoice from the Microsoft Speech API, which showed how to use the Speech API to do text-to-speech but could only play the generated audio on the default audio device.
Recently on the Windows Pro Audio forums, user falven asked a question about how to grab the output of the text-to-speech engine as a stream for further processing.
Here's how to do it.
The key part is to use ISpStream::BindToFile to save the audio data to a .wav file, and ISpStream::SetBaseStream to save to a given IStream. Then call ISpVoice::SetOutput with the ISpStream, prior to calling ISpVoice::Speak.
ISpStream *pSpStream = nullptr;
hr = CoCreateInstance(
    CLSID_SpStream, nullptr, CLSCTX_ALL,
    __uuidof(ISpStream),
    (void**)&pSpStream
);
if (FAILED(hr)) {
    ERR(L"CoCreateInstance(ISpStream) failed: hr = 0x%08x", hr);
    return -__LINE__;
}
ReleaseOnExit rSpStream(pSpStream);

if (File == where) {
    // file: write the synthesized audio to a .wav file
    hr = pSpStream->BindToFile(
        file,
        SPFM_CREATE_ALWAYS,
        &SPDFID_WaveFormatEx,
        &fmt,
        0
    );
    if (FAILED(hr)) {
        ERR(L"ISpStream::BindToFile failed: hr = 0x%08x", hr);
        return -__LINE__;
    }
} else {
    // stream: capture the audio into an in-memory IStream
    pStream = SHCreateMemStream(NULL, 0);
    if (nullptr == pStream) {
        ERR(L"SHCreateMemStream failed");
        return -__LINE__;
    }

    hr = pSpStream->SetBaseStream(
        pStream,
        SPDFID_WaveFormatEx,
        &fmt
    );
    if (FAILED(hr)) {
        ERR(L"ISpStream::SetBaseStream failed: hr = 0x%08x", hr);
        return -__LINE__;
    }
}

// route the output of ISpVoice::Speak into the ISpStream
hr = pSpVoice->SetOutput(pSpStream, TRUE);
if (FAILED(hr)) {
    ERR(L"ISpVoice::SetOutput failed: hr = 0x%08x", hr);
    return -__LINE__;
}
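The snippet passes &fmt, a WAVEFORMATEX describing the PCM format the stream should carry, but doesn't show how it is filled in. For integer PCM the derived fields follow directly from channel count, sample rate and bit depth. Here is a portable sketch; PcmFormat is a hypothetical stand-in that mirrors the WAVEFORMATEX field names so it builds without the Windows SDK, and MakePcmFormat is likewise a hypothetical helper, not the post's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the fields of WAVEFORMATEX (mmreg.h),
// so this sketch builds without the Windows headers.
struct PcmFormat {
    uint16_t wFormatTag;      // 1 == WAVE_FORMAT_PCM
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;          // no extra format bytes for plain PCM
};

// Fill in an integer-PCM format; the derived fields follow from the others.
PcmFormat MakePcmFormat(uint16_t channels, uint32_t samplesPerSec, uint16_t bitsPerSample) {
    assert(bitsPerSample % 8 == 0); // whole bytes per sample
    PcmFormat fmt{};
    fmt.wFormatTag = 1;
    fmt.nChannels = channels;
    fmt.nSamplesPerSec = samplesPerSec;
    fmt.wBitsPerSample = bitsPerSample;
    // Bytes in one frame: one sample for every channel.
    fmt.nBlockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);
    // Bytes per second of audio.
    fmt.nAvgBytesPerSec = samplesPerSec * fmt.nBlockAlign;
    fmt.cbSize = 0;
    return fmt;
}
```

For example, MakePcmFormat(2, 22050, 16) yields a 4-byte block and 88200 bytes per second. The 22050 Hz rate is only an assumption for illustration; the post doesn't state which rate it uses.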
Updated source and binaries attached.
Usage:
>say.exe
say "phrase" [--file <filename> | --stream]
runs phrase through text-to-speech engine
if --file is specified, writes to .wav file
if --stream is specified, captures to a stream
if neither is specified, plays to default output
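The usage text describes three mutually exclusive destinations. Since the post's code branches on File == where, one way to model the command line is an enum of destinations. Below is a sketch of that parsing under this assumption; Where, Options and ParseArgs are hypothetical names, and the argument list excludes the program name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names: the post's code tests `File == where`, so this sketch
// models the output destination as an enum with a matching File member.
enum class Where { Default, File, Stream };

struct Options {
    std::wstring phrase;
    Where where = Where::Default;
    std::wstring file;  // only meaningful when where == Where::File
};

// Parse: say "phrase" [--file <filename> | --stream]
// args excludes the program name. Returns false on any usage error
// and leaves `out` unchanged in that case.
bool ParseArgs(const std::vector<std::wstring>& args, Options& out) {
    if (args.empty()) return false;  // the phrase is required
    Options opts;
    opts.phrase = args[0];
    if (args.size() == 1) {
        // neither flag: play to the default output device
    } else if (args.size() == 2 && args[1] == L"--stream") {
        opts.where = Where::Stream;
    } else if (args.size() == 3 && args[1] == L"--file") {
        opts.where = Where::File;
        opts.file = args[2];
    } else {
        return false;
    }
    out = opts;
    return true;
}
```

Rejecting anything outside the three documented shapes keeps --file and --stream mutually exclusive, as the usage line requires.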
Here's how to generate a .wav file (uh.wav attached)
>say.exe "uh" --file uh.wav
Stream is 1
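BindToFile takes care of writing the RIFF/WAVE container around the audio. To check what ended up in a file such as uh.wav, a reader can walk the file's chunks. The following is a minimal sketch that works on the file's bytes already loaded into memory; ParseWav and WavInfo are hypothetical names, and the sketch reads only the plain PCM fields of the "fmt " chunk rather than handling every WAVE variant. It walks the chunks instead of assuming a fixed 44-byte header, because writers may emit a longer "fmt " chunk or extra chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

struct WavInfo {
    uint16_t channels = 0;
    uint32_t samplesPerSec = 0;
    uint16_t bitsPerSample = 0;
    size_t dataOffset = 0;   // byte offset of the sample data
    uint32_t dataBytes = 0;  // size of the "data" chunk
};

// Little-endian readers; RIFF stores all integers little-endian.
static uint16_t Le16(const uint8_t* p) { return static_cast<uint16_t>(p[0] | (p[1] << 8)); }
static uint32_t Le32(const uint8_t* p) {
    return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
           (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Walk the RIFF chunks of an in-memory .wav file and return the format plus
// the location of the sample data, or std::nullopt if it isn't a usable WAVE.
std::optional<WavInfo> ParseWav(const std::vector<uint8_t>& buf) {
    if (buf.size() < 12 || std::memcmp(buf.data(), "RIFF", 4) != 0 ||
        std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
        return std::nullopt;
    }
    WavInfo info;
    bool haveFmt = false;
    size_t pos = 12;  // first chunk follows "RIFF" <size> "WAVE"
    while (pos + 8 <= buf.size()) {
        const uint8_t* hdr = buf.data() + pos;
        uint32_t size = Le32(hdr + 4);
        size_t body = pos + 8;
        if (size > buf.size() - body) return std::nullopt;  // truncated chunk
        if (std::memcmp(hdr, "fmt ", 4) == 0 && size >= 16) {
            info.channels = Le16(buf.data() + body + 2);
            info.samplesPerSec = Le32(buf.data() + body + 4);
            info.bitsPerSample = Le16(buf.data() + body + 14);
            haveFmt = true;
        } else if (std::memcmp(hdr, "data", 4) == 0) {
            if (!haveFmt) return std::nullopt;  // need the format first
            info.dataOffset = body;
            info.dataBytes = size;
            return info;
        }
        pos = body + size + (size & 1);  // chunks are padded to even length
    }
    return std::nullopt;
}
```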
And here's how to generate an output stream. The app consumes this and prints the INT16 sample values to the console. uh.txt attached.
>say.exe "uh" --stream
Stream is 1
0 0; 0 0; 0 0; 0 0
0 0; 0 0; 0 0; 0 0
...
86 86; -1052 -1052; -2839 -2839; -3774 -3774
-4199 -4199; -4581 -4581; -4284 -4284; -3640 -3640
-3100 -3100; -2011 -2011; -393 -393; 533 533
...
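The console dump prints each frame as a pair of INT16 values, with frames separated by semicolons, which suggests 16-bit stereo. The sketch below produces that kind of listing from a raw byte buffer. It assumes the memory stream holds raw little-endian INT16 interleaved PCM in the format passed to SetBaseStream; FormatInt16Frames is a hypothetical helper, not the post's code. Getting the bytes out of the IStream (for example by seeking back to the start with IStream::Seek and reading with IStream::Read) is left to the Windows side.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Format interleaved little-endian INT16 PCM bytes as text: each frame's
// channel values are space-separated, frames are joined with "; ", and a
// newline starts every framesPerLine frames. A trailing partial frame is ignored.
std::string FormatInt16Frames(const std::vector<uint8_t>& bytes, size_t channels, size_t framesPerLine) {
    assert(channels > 0 && framesPerLine > 0);
    const size_t frameBytes = channels * 2;
    const size_t frames = bytes.size() / frameBytes;
    std::string out;
    for (size_t f = 0; f < frames; ++f) {
        if (f > 0) out += (f % framesPerLine == 0) ? "\n" : "; ";
        for (size_t c = 0; c < channels; ++c) {
            const uint8_t* p = bytes.data() + f * frameBytes + c * 2;
            // Reassemble the little-endian 16-bit value, then reinterpret as signed.
            int16_t sample = static_cast<int16_t>(static_cast<uint16_t>(p[0] | (p[1] << 8)));
            if (c > 0) out += ' ';
            out += std::to_string(sample);
        }
    }
    return out;
}
```

With channels = 2 and framesPerLine = 4, the bytes 56 00 56 00 E4 FB E4 FB format as "86 86; -1052 -1052".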
Looks great!