When WSR shows a window (for example, the alternates dialog, the Disambiguation Numbers UI, or the dictation scratchpad), that window does not take focus. 

These windows do not take focus because moving the focus could alter the state of the application that WSR is talking to.  For example, if you’re renaming a file in Windows Explorer, when the focus goes away, the file gets renamed.

Unfortunately, though, accessibility applications often have problems finding these windows, even though these windows fire the standard WinEvents (several types of windows are plain dialogs, for example).

In Windows 7, WSR fires a custom UI Automation event when it shows or hides one of its custom windows (Disambiguation Numbers, dictation scratchpad, Alternates, Spelling, Pronunciation, Switch Windows, or Launch Application).

To do this, WSR uses the IUIAutomationRegistrar interface.

IUIAutomationRegistrar::RegisterEvent defines a new event ID given a GUID and a name.  If multiple applications register an event using the same GUID and name, they all get the same event ID.  The event ID can also be used in a call to SetWinEventHook, if your application is using WinEvents.

Beware: all registrations are invalidated when you release the IUIAutomationRegistrar interface pointer!  So you must retain the interface pointer until you no longer wish to process events.  (MSDN doesn’t make this clear.)

So, if you want to find out when WSR is showing one of its custom windows, call IUIAutomationRegistrar::RegisterEvent with these two event declarations, and check for these events in your event processing code.

const GUID GUID_ShowEvent = { 
    0x3891149e, 0x7190, 0x47d0, {0xa5, 0x18, 0xca, 0x1c, 0xdb, 0x40, 0xf7, 0xe3}
  };
#define UI_SHOW      L"Microsoft.Speech.UI.Shown"

const GUID GUID_HideEvent = { 
    0x987a1c35, 0x597b, 0x4947, {0x9e, 0xf8, 0xe7, 0xa6, 0x81, 0x42, 0xfd, 0x11}
  };
#define UI_HIDDEN    L"Microsoft.Speech.UI.Hidden"
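As a rough sketch of how the registration and the event check might fit together: the helper names below (SpeechUiEventIds, ClassifySpeechUiEvent, RegisterSpeechUiEvents) are made up for illustration, and the Windows-only part assumes COM is already initialized on the calling thread. The caller keeps the registrar pointer alive for as long as it wants the events.

```cpp
#ifdef _WIN32
#include <windows.h>
#include <uiautomation.h>
#endif

// Event IDs handed back by RegisterEvent; 0 means "not registered yet".
struct SpeechUiEventIds {
    int shown = 0;
    int hidden = 0;
};

enum class SpeechUiChange { None, Shown, Hidden };

// Call this from your event processing code with the ID of the event received.
inline SpeechUiChange ClassifySpeechUiEvent(const SpeechUiEventIds& ids, int eventId)
{
    if (eventId != 0 && eventId == ids.shown)  return SpeechUiChange::Shown;
    if (eventId != 0 && eventId == ids.hidden) return SpeechUiChange::Hidden;
    return SpeechUiChange::None;
}

#ifdef _WIN32
// Registers both WSR events. On success, *ppRegistrar holds a reference that
// the caller must keep until it no longer wants the events, then Release().
inline HRESULT RegisterSpeechUiEvents(IUIAutomationRegistrar** ppRegistrar,
                                      SpeechUiEventIds* ids)
{
    static const GUID guidShow =
        { 0x3891149e, 0x7190, 0x47d0, {0xa5, 0x18, 0xca, 0x1c, 0xdb, 0x40, 0xf7, 0xe3} };
    static const GUID guidHide =
        { 0x987a1c35, 0x597b, 0x4947, {0x9e, 0xf8, 0xe7, 0xa6, 0x81, 0x42, 0xfd, 0x11} };

    HRESULT hr = CoCreateInstance(CLSID_CUIAutomationRegistrar, NULL,
                                  CLSCTX_INPROC_SERVER, IID_IUIAutomationRegistrar,
                                  reinterpret_cast<void**>(ppRegistrar));
    if (FAILED(hr))
        return hr;

    const UIAutomationEventInfo showInfo = { guidShow, L"Microsoft.Speech.UI.Shown" };
    const UIAutomationEventInfo hideInfo = { guidHide, L"Microsoft.Speech.UI.Hidden" };

    EVENTID shown = 0, hidden = 0;
    hr = (*ppRegistrar)->RegisterEvent(&showInfo, &shown);
    if (SUCCEEDED(hr))
        hr = (*ppRegistrar)->RegisterEvent(&hideInfo, &hidden);

    if (FAILED(hr)) {
        (*ppRegistrar)->Release();
        *ppRegistrar = NULL;
        return hr;
    }
    ids->shown = shown;
    ids->hidden = hidden;
    return S_OK;
}
#endif
```

The classification helper is kept separate from the COM code so that the same check can be used whether the event arrives through a UI Automation handler or through a WinEvent hook.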

In a previous post, I mentioned that you need to set the GUID_PROP_COMPOSING property across text in a composition, or else Windows XP will terminate the composition.

That’s true.

I then provided a code snippet to set that property.

That snippet is wrong.

It turns out that you can’t set the GUID_PROP_COMPOSING property via ITfProperty::SetValue.  If you try, you’ll get a TF_E_READONLY error (even if you are in a read-write edit session), which can cause great consternation.

Instead, you need to call ITfComposition::ShiftStart and/or ITfComposition::ShiftEnd to set the bounds of the composition after changing the text in your composition.

These calls update the GUID_PROP_COMPOSING property internally.

Sorry for the confusion.

Now that the Beta of Windows 7 is out, it’s time to talk about the improvements and new features in Windows Speech Recognition.

For Windows 7, we focused primarily on improving the user experience and removing the “rough spots” that we did not have time to fix in Vista.

First and foremost, we focused on performance. 

  • We rewrote the logic that builds the “say what you see” grammar to use the new native UI Automation API (instead of the MSAA IAccessible API).  This dramatically reduces the number of cross-process COM calls (by an order of magnitude), and speeds up the grammar generation by about 5-6 times.
  • The document harvester is also substantially faster.
  • Building the “start application” grammar is much faster as well.

Second, we focused on usability.

  • Dictation into TSF-unaware applications works much better than it did before.  Now, when you dictate into an unaware application, the dictation scratchpad appears.
    • You can use the scratchpad as a temporary document, and it is voice-, mouse-, and keyboard-enabled; you can type, use the arrow keys for navigation, or use the mouse or voice commands to select and correct text before inserting the finished text into the unaware application.
    • If you don’t like the scratchpad, you can turn it off, and your dictations will be directly inserted into the unaware application.
  • Sleep mode works much better than it did before; false recognitions of “start listening” have been greatly reduced.
  • We simplified the transitions between OFF and SLEEP mode.  For security reasons, we now default to OFF after “stop listening”, although users can change the default to SLEEP mode.  (We call this “voice activation” in the Control Panel and First Time User Experience.)

Third, we looked at accuracy.

  • The SR engine now uses the WASAPI audio stack, so we support array microphones and echo cancellation; this vastly improves WSR’s accuracy when used without a headset.
  • Document harvesting runs periodically, rather than just at startup; this lets the harvester pick up new documents as you create them, rather than having to wait for you to reload speech.
  • You can upload your training data to Microsoft, so that we can improve the recognizers in the future.  (You have to initiate this, incidentally; we will not upload any data without your explicit consent.)
  • The Chinese recognizer has substantial accuracy improvements as well.

Lastly, we made a few tweaks to the recognizer.  In Vista, third-party applications couldn’t tell whether the shared recognizer was ON or SLEEPing.  For Win7, there are new APIs that expose SLEEP mode.

The TSF samples (long missing from MSDN) have finally been uploaded to the MSDN Code Gallery.  The documentation is also available (in plain-text form) in each sample.  MSDN’s web page should be updated soon to point to the code gallery.

Many people have emailed me asking about the TSF samples on MSDN.  They’re supposed to be on MSDN code gallery, although they don’t appear to be there.

They are, however, part of the Windows SDK. After installation, you'll find them in %programfiles%\Microsoft SDKs\Windows\v6.1\Samples\winui\Input\tsf.

There are actually more samples in the Windows SDK than were on MSDN, including some examples of how to write a text store, as well as how to interact with TSF in UILess mode.

One common cause of dictation not working is that CTFMon is not running.  This is a helper process used by the Text Services Framework to implement features such as global compartments.

If dictation is not working, try running this command from an elevated command prompt:

schtasks /Query /TN \Microsoft\Windows\TextServicesFramework\MsCtfMonitor

(all on one line)

You should get some output that looks like this:

 

TaskName            Next Run Time       Status
=================== =============       =======
MsCtfMonitor        N/A                 Running

 

If the status doesn’t say ‘Running’, then you need to restart CTFMon like this:

schtasks /Run /TN \Microsoft\Windows\TextServicesFramework\MsCtfMonitor

 

Run the query again, and verify that the MsCtfMonitor task is running.

Once it is, restart Windows Speech Recognition, and dictation should work again.
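If you want to automate that check, a minimal sketch might look like the following. It only parses text that you have already captured from the query command; TaskIsRunning is a hypothetical helper, and it assumes the English table layout shown above (the status text may differ on localized systems).

```cpp
#include <sstream>
#include <string>

// Returns true if queryOutput contains a row whose first column is taskName
// and whose last column is "Running".
inline bool TaskIsRunning(const std::string& queryOutput, const std::string& taskName)
{
    std::istringstream lines(queryOutput);
    for (std::string line; std::getline(lines, line);) {
        std::istringstream cols(line);
        std::string first, col, last;
        if (!(cols >> first) || first != taskName)
            continue;                 // header, separator, or another task
        while (cols >> col)
            last = col;               // remember the final column
        return last == "Running";
    }
    return false;                     // no row for this task
}
```

If this returns false for MsCtfMonitor, run the /Run command and query again.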

I’ve heard from a number of sources that there isn’t any good documentation about the ‘inline’ dictation commands.  These commands can be uttered in the middle of a dictation stream (in other words, you don’t have to stop speaking to use these commands), and are used to guide the Speech Recognition Engine to produce the desired results.

Command              Description
tab                  Inserts a <tab> character.
new-line             Inserts a new line character and forces the next word to be capitalized.
new-paragraph        Inserts two new line characters and forces the next word to be capitalized.
caps <word>          Capitalizes the first letter of <word>.
no caps <word>       Lowercases <word>.
all caps <word>      Uppercases <word>.
no space             Does not insert a space before the next word.
literal <argument>   Does not perform any inverse text normalization on <argument>.
numeral <argument>   Forces <argument> into numeric form, if possible.

Simple Examples

Tab, new-line, new-paragraph, caps, no caps, and all caps are pretty straightforward:

User Input                                          Recognition Result
king tab county                                     king<tab>county
this is a test new-line this is another test        This is a test
                                                    This is another test
this is a test new-paragraph this is another test   This is a test

                                                    This is another test
no caps C I A                                       cia
C I A                                               CIA
I have caps in my closet                            I have In my closet
I have all caps nothing                             I have NOTHING
I have no space available                           I haveavailable
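
To make the rules in the table concrete, here is a toy interpreter for a subset of the commands (tab, new-line, new-paragraph, caps, no caps, all caps, no space). It is only a sketch of the behavior described above, not how WSR implements it: ApplyInlineCommands is a made-up name, and the sketch ignores sentence-initial capitalization and inverse text normalization.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Applies a subset of the inline dictation commands to a spoken word sequence.
inline std::string ApplyInlineCommands(const std::string& spoken)
{
    std::istringstream in(spoken);
    std::vector<std::string> words;
    for (std::string w; in >> w;)
        words.push_back(w);

    enum class Casing { AsSpoken, Caps, NoCaps, AllCaps };
    Casing pending = Casing::AsSpoken;  // casing to apply to the next word
    bool joinNext = true;               // true: no space before the next word
    std::string out;

    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::string& w = words[i];
        const std::string next = (i + 1 < words.size()) ? words[i + 1] : "";

        if (w == "tab")           { out += '\t';   joinNext = true; continue; }
        if (w == "new-line")      { out += '\n';   joinNext = true; pending = Casing::Caps; continue; }
        if (w == "new-paragraph") { out += "\n\n"; joinNext = true; pending = Casing::Caps; continue; }
        if (w == "caps")                   { pending = Casing::Caps; continue; }
        if (w == "no"  && next == "caps")  { pending = Casing::NoCaps;  ++i; continue; }
        if (w == "all" && next == "caps")  { pending = Casing::AllCaps; ++i; continue; }
        if (w == "no"  && next == "space") { joinNext = true; ++i; continue; }

        // An ordinary word: apply any pending casing, then emit it.
        std::string word = w;
        if (pending == Casing::Caps) {
            word[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(word[0])));
        } else if (pending != Casing::AsSpoken) {
            for (char& c : word) {
                const unsigned char u = static_cast<unsigned char>(c);
                c = static_cast<char>(pending == Casing::AllCaps ? std::toupper(u)
                                                                 : std::tolower(u));
            }
        }
        if (!joinNext)
            out += ' ';
        out += word;
        joinNext = false;
        pending = Casing::AsSpoken;
    }
    return out;
}
```

Each command affects only the next ordinary word, which is why "caps" in "I have caps in my closet" capitalizes "in" and then has no further effect.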

Examples using <literal>

In order to understand what <literal> does, you need to know what Inverse Text Normalization does.  Inverse Text Normalization is the process of converting spoken forms into a preferred textual representation.   Some examples are:

User Input                  Recognition Result
doctor smith                Dr. Smith
period                      .
united states of america    United States of America
three hundred and five      305

The <literal> command prevents that conversion from occurring, and forces the text to be input exactly as spoken.

Examples using <numeral>

The best way to describe the <numeral> command is to give some examples:

User Input                                Recognition Result
numeral two                               2
two                                       two
numeral twelve                            12
twelve                                    12
numeral four eight seven six two three    487623
four eight seven six two three            four eight seven six two three
numeral one and a half million            1,500,000
numeral one point five million            1.5 million
numeral one point four five               1.45

If the phrase cannot be interpreted as a number, then the numeral command does nothing:

User Input                                Recognition Result
numeral I want to go to the store         I want to go to the store

Note that this phrase has a possible number (to -> two -> 2), but the intervening words deactivate the command.

David LeBlanc wrote an excellent overview of encrypted documents in Office.  A long, long time ago, I worked on the Word conversions team (it wasn’t even called Office then).  As part of my job, I wrote a document encryption filter. 

More specifically, I wrote (around 1990 or so) a document obfuscation filter.  I say ‘obfuscation’ because one of the requirements was that the password had to be stored with the (encrypted) document.   Of course, that rendered any possible security null and void, but the customer needed to be able to recover documents with forgotten passwords.

When David talks about XOR obfuscation, I believe that the incredibly weak security was a ‘feature’, not a bug – quite a few companies wanted to be able to prevent casual snoopers from reading their documents, but also wanted to be able to recover from a forgotten password.

DES was a standard in 1990 (it didn’t really fall until 1998), and I know the Office devs had an implementation around to use, as my manager wrote a real encryption filter using DES a couple of months after I wrote my obfuscation filter.

Well, for one thing, TSF is not that great for grabbing the current selection in the foreground application.

There are several problems here:

1) You have to get your text service loaded into the target application. This can be slow.

2) Once your text service is loaded, you can only really work with TSF-enabled applications, which, sadly, are few and far between.

What do I recommend?  There aren’t any really good solutions.  MSAA doesn’t have good text support.  UI Automation doesn’t have good native support (although it works very nicely in managed applications, and has good text support).  You can use the clipboard, but users tend to be somewhat attached to their clipboard contents.  (Saving & restoring the clipboard is possible, but can be tricky – application vendors will often register private clipboard formats that contain pointers which get invalidated when the clipboard changes.  When you put the clipboard back, you will crash the application.  Most vendors don’t do that, but it’s something you should be aware of.)

If you do decide to use the clipboard, then this guy has implemented a pretty fancy clipboard manager, and has a good discussion of various traps & pitfalls with the clipboard.

WSR dictation should always work in WordPad.  If you're having problems with dictation, first check whether it works in WordPad.

 

If dictation doesn’t work in WordPad:

1)  Start regedit, go to the key HKEY_CURRENT_USER\Software\Microsoft\Speech\Preferences\en-us\ and remove any value named DictationEnabled.

If there is no registry value named DictationEnabled, then

2)  Check %windir%\ime\sptip.dll - it should exist, be 128KB in size, and have a file version of 6.0.6001.17128 and a product version of 6.0.6000.16386.

3)  The file %windir%\ime\en-us\sptip.dll.mui should also exist.

4)  Make sure ctfmon.exe is running.  It should start at login (there is a scheduled task that starts it).

 

If dictation works in WordPad, but doesn't work in your application:

Your application probably doesn't support TSF.  Complain to your vendor.  You can enable a (very limited) version of dictation by checking 'Enable Dictation Everywhere' in the Windows Speech Recognition context menu.

In an earlier post on keyboards, I talked briefly about text service categories.  I'd like to talk more about categories.

TSF will make sure that at most one text service in any category is enabled at any given time.

So, for example, you can enable one text service with GUID_TFCAT_TIP_KEYBOARD, one text service with GUID_TFCAT_TIP_SPEECH, and one text service with GUID_TFCAT_TIP_HANDWRITING, and all three text services will be active at the same time.

But if you enable another text service with GUID_TFCAT_TIP_KEYBOARD, the first one will be deactivated. 

If you don't want this to happen, then you should call RegisterCategory with your own category GUID.
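
The rule is easy to picture with a toy model. The class below is purely illustrative (CategoryModel is a made-up name, and the strings stand in for the category and text service GUIDs); it is not TSF's implementation, just the "at most one per category" bookkeeping described above.

```cpp
#include <map>
#include <string>

class CategoryModel {
public:
    // Activates service in category. Returns the service it displaced,
    // or an empty string if the category had no other active service.
    std::string Activate(const std::string& category, const std::string& service)
    {
        std::string displaced;
        const auto it = active_.find(category);
        if (it != active_.end() && it->second != service)
            displaced = it->second;
        active_[category] = service;
        return displaced;
    }

    bool IsActive(const std::string& category, const std::string& service) const
    {
        const auto it = active_.find(category);
        return it != active_.end() && it->second == service;
    }

private:
    std::map<std::string, std::string> active_;  // category -> active service
};
```

In this model, a service registered under its own category never displaces (and is never displaced by) the keyboard, speech, or handwriting services.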

If you're building a text service DLL, you almost certainly don't want to use Visual Studio 2008's compiler.  The problem is that Visual Studio 2008 uses a new C Runtime Library, and if you build your text service with Visual Studio, your text service likely won't load in all applications.  (Plus, you would have to redistribute the C Runtime Library.)

What to do instead?

Well, I would recommend installing the Vista (or XP) DDK and using the DDKWizard instead.  The DDK comes with its own C/C++ compiler that uses the C Runtime Library that ships with the OS (and won't cause problems with other applications), and the DDKWizard will let you use all of Visual Studio's awesome capabilities.

I recently had two people ask me the same question:

"Why can't I insert more than one character into a composition on Notepad?"

It's actually a bit more complicated than that, since this behavior only appears to happen on Windows XP with a US English text service.  (Japanese text services appear to work correctly.)

I just had a chance to figure out what was going on here, and, since it doesn't seem to be documented anywhere else, I thought I'd post the answer so that the next poor sod doesn't have to spend eternity scratching his head wondering why he can't get it to work.

After a longish debugging bout, and a bunch of searching through the XP source code, I have the answer.

The answer is that there's a Text Event Sink attached to the TSF-unaware context, and when the context changes, this event sink looks to see if the changed text has the GUID_PROP_COMPOSING property attached to it.  If it doesn't, it terminates the composition.

So, if you want to have your text services insert more than one character into a composition, you need to make sure that your text has the GUID_PROP_COMPOSING property set to 1.

I had previously written some code to manage this property.  However, you cannot explicitly set the GUID_PROP_COMPOSING property; ITfProperty::SetValue() will return TF_E_READONLY if you do.

What you do need to do is use the ITfComposition::ShiftStart and ITfComposition::ShiftEnd methods to move the start and ending points of your composition to cover the text.  These methods will update the GUID_PROP_COMPOSING property directly.

The code to do that would look like this:

 

BOOL CTextService::_SetCompositionComposing(TfEditCookie ec, ITfRange *pRangeText)
{
    HRESULT hr;

    // pRangeText must cover all of the text that belongs to the composition,
    // and ec must come from a read-write edit session
    if (_pComposition == NULL || pRangeText == NULL)
        return FALSE;

    // move the composition's start and end anchors to the bounds of the text;
    // TSF updates GUID_PROP_COMPOSING over the new range itself
    // (calling ITfProperty::SetValue on it fails with TF_E_READONLY)
    hr = _pComposition->ShiftStart(ec, pRangeText);
    if (hr == S_OK)
        hr = _pComposition->ShiftEnd(ec, pRangeText);

    return (hr == S_OK);
}

When you're ready to finalize your composition, you should clear the property over the composed text by calling ITfProperty::Clear().

Note that none of the sample text services do this.

If you've tried to use the modified version of Scintilla that I described in my MSDN article, you will find that the zipped sources don't actually have the changes that I made.  That was my fault; when I was packaging the sources, I had two versions of ScintillaWin.cxx around, and I picked the newer one, which (sigh) was the wrong one.  Anyway, here's a link to the correct version of ScintillaWin.cxx with TSF support.

I got caught by this recently.  ITfCompartmentEventSink::OnChange means what it says.  If you repeatedly store the same value into a global compartment, the event sinks will not fire.  If you store a different value into the compartment, the event sinks fire just fine.
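
A toy model of that behavior, for illustration only: CompartmentModel is a made-up class, not the TSF compartment API, and it assumes that the very first store into an empty compartment counts as a change.

```cpp
#include <functional>
#include <utility>
#include <vector>

class CompartmentModel {
public:
    // Registers a sink that is called with the new value on every change.
    void Advise(std::function<void(int)> sink) { sinks_.push_back(std::move(sink)); }

    // Stores value; sinks fire only if it differs from what is already stored.
    void SetValue(int value)
    {
        if (hasValue_ && value == value_)
            return;                   // same value again: nothing changed, no event
        value_ = value;
        hasValue_ = true;
        for (auto& sink : sinks_)
            sink(value_);
    }

private:
    bool hasValue_ = false;
    int value_ = 0;
    std::vector<std::function<void(int)>> sinks_;
};
```

So if you are using a compartment as a "something happened" signal, writing the same value each time will only notify listeners once.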
