
The TSF samples (long missing from MSDN) have finally been uploaded to the MSDN Code Gallery.  The documentation is also available (in plain-text form) in each sample.  MSDN’s web page should be updated soon to point to the code gallery.

Many people have emailed me asking about the TSF samples on MSDN.  They’re supposed to be on the MSDN Code Gallery, although they don’t appear to be there.

They are, however, part of the Windows SDK. After installation, you'll find them in %programfiles%\Microsoft SDKs\Windows\v6.1\Samples\winui\Input\tsf.

There are actually more samples in the Windows SDK than were on MSDN, including some examples of how to write a text store, as well as how to interact with TSF in UILess mode.

One common cause of dictation not working is that CTFMon is not running.  CTFMon is a helper process that the Text Services Framework uses to implement features such as global compartments.

If dictation is not working, try running this command from an elevated command prompt:

schtasks /Query /TN \Microsoft\Windows\TextServicesFramework\MsCtfMonitor

(all on one line)

You should get some output that looks like this:

 

TaskName            Next Run Time       Status
=================== =============       =======
MsCtfMonitor        N/A                 Running

 

If the status doesn’t say ‘Running’, then you need to restart CTFMon like this:

schtasks /Run /TN \Microsoft\Windows\TextServicesFramework\MsCtfMonitor

 

Run the query again, and verify that the MsCtfMonitor task is running.

Once it is, restart Windows Speech Recognition, and dictation should work again.

I’ve heard from a number of sources that there isn’t any good documentation about the ‘inline’ dictation commands.  These commands can be uttered in the middle of a dictation stream (in other words, you don’t have to stop speaking to use these commands), and are used to guide the Speech Recognition Engine to produce the desired results.

Command             Description
==================  ===========
tab                 Inserts a <tab> character.
new-line            Inserts a new line character and forces the next word to be capitalized.
new-paragraph       Inserts two new line characters and forces the next word to be capitalized.
caps <word>         Capitalizes the first letter of <word>.
no caps <word>      Lowercases <word>.
all caps <word>     Uppercases <word>.
no space            Does not insert a space before the next word.
literal <argument>  Does not perform any inverse text normalization on <argument>.
numeral <argument>  Forces <argument> into numeric form, if possible.

Simple Examples

Tab, new-line, new-paragraph, caps, no caps, all caps, and no space are pretty straightforward:

User Input                                         Recognition Result
=================================================  ==================
king tab county                                    king<tab>county
this is a test new-line this is another test       This is a test
                                                   This is another test
this is a test new-paragraph this is another test  This is a test

                                                   This is another test
no caps C I A                                      cia
C I A                                              CIA
I have caps in my closet                           I have In my closet
I have all caps nothing                            I have NOTHING
I have no space available                          I haveavailable
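
To make the rules concrete, here is a toy model of four of these commands (tab, caps, all caps, and no space). This is only a sketch that reproduces the examples above; it is not how the Speech Recognition Engine is implemented, and the function name ApplyInlineCommands is made up for illustration. It assumes the utterance is already split into space-separated words, and it treats every occurrence of a command word as a command, whereas the real engine has to decide from context.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy model of a few inline dictation commands. This is NOT the speech
// engine's implementation; it only mirrors the examples in the table.
std::string ApplyInlineCommands(const std::string& utterance) {
    std::istringstream in(utterance);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);

    enum Casing { kAsSpoken, kCaps, kAllCaps };
    Casing casing = kAsSpoken;   // casing to apply to the next ordinary word
    bool suppress_space = true;  // no separator before the very first word
    std::string out;

    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::string& w = words[i];
        const bool has_next = i + 1 < words.size();

        // Commands consume their own words and only change pending state.
        if (w == "tab") { out += '\t'; suppress_space = true; continue; }
        if (w == "caps") { casing = kCaps; continue; }
        if (w == "all" && has_next && words[i + 1] == "caps") {
            casing = kAllCaps; ++i; continue;
        }
        if (w == "no" && has_next && words[i + 1] == "space") {
            suppress_space = true; ++i; continue;
        }

        // Ordinary word: apply the pending casing and spacing, then reset both.
        std::string word = w;
        if (casing == kCaps) {
            word[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(word[0])));
        } else if (casing == kAllCaps) {
            for (char& c : word)
                c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        if (!suppress_space) out += ' ';
        out += word;
        suppress_space = false;
        casing = kAsSpoken;
    }
    return out;
}
```

The point of the model is that each command only affects the next word (or the next separator) and then resets, which is why you can utter them mid-stream without stopping.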

Examples using <literal>

In order to understand what <literal> does, you need to know what Inverse Text Normalization does.  Inverse Text Normalization is the process of converting spoken forms into a preferred textual representation.   Some examples are:

User Input                Recognition Result
========================  ========================
doctor smith              Dr. Smith
period                    .
united states of america  United States of America
three hundred and five    305

The <literal> command prevents that conversion from occurring, and forces the text to be input exactly as spoken.

Examples using <numeral>

The best way to describe the <numeral> command is to give some examples:

User Input                              Recognition Result
======================================  ==============================
numeral two                             2
two                                     two
numeral twelve                          12
twelve                                  12
numeral four eight seven six two three  487623
four eight seven six two three          four eight seven six two three
numeral one and a half million          1,500,000
numeral one point five million          1.5 million
numeral one point four five             1.45

If the phrase cannot be interpreted as a number, then the numeral command does nothing:

User Input                         Recognition Result
=================================  =========================
numeral I want to go to the store  I want to go to the store

Note that this phrase has a possible number (to –> two –> 2), but the intervening words deactivate the command.

David LeBlanc wrote an excellent overview of encrypted documents in Office.  A long, long time ago, I worked on the Word conversions team (it wasn’t even called Office then).  As part of my job, I wrote a document encryption filter. 

More specifically, I wrote (around 1990 or so) a document obfuscation filter.  I say ‘obfuscation’ because one of the requirements was that the password had to be stored with the (encrypted) document.   Of course, that rendered any possible security null and void, but the customer needed to be able to recover documents with forgotten passwords.

When David talks about XOR obfuscation, I believe that the incredibly weak security was a ‘feature’, not a bug – quite a few companies wanted to be able to prevent casual snoopers from reading their documents, but also wanted to be able to recover from a forgotten password.

DES was a standard in 1990 (it didn’t really fall until 1998), and I know the Office devs had an implementation around to use, as my manager wrote a real encryption filter using DES a couple of months after I wrote my obfuscation filter.

Well, for one thing, TSF is not that great for grabbing the current selection in the foreground application.

There are several problems here:

1) You have to get your text service loaded into the target application. This can be slow.

2) Once your text service is loaded, you can only really work with TSF-enabled applications, which, sadly, are few and far between.

What do I recommend?  There aren’t any really good solutions.  MSAA doesn’t have good text support.  UI Automation doesn’t have good native support (although it works very nicely in managed applications, and has good text support).  You can use the clipboard, but users tend to be somewhat attached to their clipboard contents.  (Saving & restoring the clipboard is possible, but can be tricky – some application vendors register private clipboard formats that contain pointers which get invalidated when the clipboard changes.  When you put the clipboard back, you can crash the application.  Most vendors don’t do that, but it’s something you should be aware of.)

If you do decide to use the clipboard, then this guy has implemented a pretty fancy clipboard manager, and has a good discussion of various traps & pitfalls with the clipboard.

WSR Dictation should always work in WordPad.  If you're having problems with dictation, make sure it works in WordPad.

 

If dictation doesn’t work in WordPad:

1)  Start regedit, go to the key HKEY_CURRENT_USER\Software\Microsoft\Speech\Preferences\en-us\ and remove any value named DictationEnabled.

If there is no registry value named DictationEnabled, then work through the following checks:

2)  Check %windir%\ime\sptip.dll - it should exist, be 128KB in size, and have a file version of 6.0.6001.17128 and a product version of 6.0.6000.16386.

3)  The file %windir%\ime\en-us\sptip.dll.mui should also exist.

4)  Make sure ctfmon.exe is running.  It should start at login (there is a scheduled task that starts it).

 

If dictation works in WordPad, but doesn't work in your application:

Your application probably doesn't support TSF.  Complain to your vendor.  You can enable a (very limited) version of dictation by checking 'Enable Dictation Everywhere' in the Windows Speech Recognition context menu.

In an earlier post on keyboards, I talked briefly about text service categories.  I'd like to talk more about categories.

TSF will make sure that at most one text service in any category is enabled at any given time.

So, for example, you can enable one text service with GUID_TFCAT_TIP_KEYBOARD, one text service with GUID_TFCAT_TIP_SPEECH, and one text service with GUID_TFCAT_TIP_HANDWRITING, and all three text services will be active at the same time.

But if you enable another text service with GUID_TFCAT_TIP_KEYBOARD, the first one will be deactivated. 

If you don't want this to happen, then you should call ITfCategoryMgr::RegisterCategory with your own category GUID.
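
Here is a toy model of that rule, just to make the bookkeeping explicit. It is a sketch of the behavior described above, not TSF code: ModelCategoryTable and the service names are made up, and categories are represented by strings rather than GUIDs.

```cpp
#include <map>
#include <string>

// Toy model of per-category exclusivity. The class and the service names
// are illustrative only; none of this is part of the TSF API.
class ModelCategoryTable {
public:
    // Activates `service` in `category`.
    // Returns the service it displaced, or "" if the category was free.
    std::string Activate(const std::string& category, const std::string& service) {
        std::string displaced;
        auto it = active_.find(category);
        if (it != active_.end() && it->second != service)
            displaced = it->second;
        active_[category] = service;
        return displaced;
    }

    // True if `service` is currently the active service in some category.
    bool IsActive(const std::string& service) const {
        for (const auto& entry : active_)
            if (entry.second == service) return true;
        return false;
    }

private:
    std::map<std::string, std::string> active_;  // category -> active service
};
```

In this model, a service registered under a category of its own never shares a slot with anyone else, so nothing can displace it.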

If you're building a text service DLL, you almost certainly don't want to use Visual Studio 2008's compiler.  The problem is that Visual Studio 2008 uses a new C Runtime Library, and if you build your text service with Visual Studio, your text service likely won't load in all applications.  (Plus, you would have to redistribute the C Runtime Library.)

What to do instead?

Well, I would recommend installing the Vista (or XP) DDK and using the DDKWizard instead.  The DDK comes with its own C/C++ compiler that uses the C Runtime Library that ships with the OS (and won't cause problems with other applications), and the DDKWizard will let you use all of Visual Studio's awesome capabilities.

I recently had two people ask me the same question:

"Why can't I insert more than one character into a composition on Notepad?"

It's actually a bit more complicated than that, since this behavior only appears to happen on Windows XP with a US English text service.  (Japanese text services appear to work correctly.)

I just had a chance to figure out what was going on here, and, since it doesn't seem to be documented anywhere else, I thought I'd post the answer so that the next poor sod doesn't have to spend eternity scratching his head wondering why he can't get it to work.

After a longish debugging bout, and a bunch of searching through the XP source code, I have the answer.

The answer is that there's a Text Event Sink attached to the TSF-unaware context, and when the context changes, this event sink looks to see if the changed text has the GUID_PROP_COMPOSING property attached to it.  If it doesn't, it terminates the composition.

So, if you want to have your text services insert more than one character into a composition, you need to make sure that your text has the GUID_PROP_COMPOSING property set to 1.

The code to do that would look like this:

 

BOOL CTextService::_SetCompositionComposing(TfEditCookie ec, ITfContext *pContext)
{
    ITfRange *pRangeComposition;
    ITfProperty *pComposingProperty;
    HRESULT hr;

    // get the range covered by the current composition
    if (_pComposition->GetRange(&pRangeComposition) != S_OK)
        return FALSE;

    hr = E_FAIL;

    // get the GUID_PROP_COMPOSING property from the context
    if (pContext->GetProperty(GUID_PROP_COMPOSING, &pComposingProperty) == S_OK)
    {
        VARIANT var;
        // set the value over the range
        var.vt = VT_I4;
        var.lVal = 1;

        hr = pComposingProperty->SetValue(ec, pRangeComposition, &var);

        pComposingProperty->Release();
    }

    pRangeComposition->Release();
    return (hr == S_OK);
}

When you're ready to finalize your composition, you should clear the property over the composed text by calling ITfProperty::Clear().

Note that none of the sample text services do this.

If you've tried to use the modified version of Scintilla that I described in my MSDN article, you will find that the zipped sources don't actually have the changes that I made.  That was my fault; when I was packaging the sources, I had two versions of ScintillaWin.cxx around, and I picked the newer one, which (sigh) was the wrong one.  Anyway, here's a link to the correct version of ScintillaWin.cxx with TSF support.

I got caught by this recently.  ITfCompartmentEventSink::OnChange means what it says.  If you repeatedly store the same value into a global compartment, the event sinks will not fire.  If you store a different value into the compartment, the event sinks fire just fine.
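
The behavior can be modeled in a few lines. This is a sketch of the semantics described above, not the TSF implementation: ModelCompartment is a made-up class, it only holds integers, and it uses a plain callback in place of ITfCompartmentEventSink.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a compartment holding an integer; NOT the TSF API.
class ModelCompartment {
public:
    using Sink = std::function<void(int)>;

    void AdviseSink(Sink sink) { sinks_.push_back(std::move(sink)); }

    // Stores the value; sinks are notified only if the stored value changed.
    void SetValue(int value) {
        if (has_value_ && value_ == value)
            return;  // same value again: nothing changed, so no notification
        has_value_ = true;
        value_ = value;
        for (const Sink& sink : sinks_)
            sink(value_);
    }

private:
    bool has_value_ = false;  // an empty compartment holds no value yet
    int value_ = 0;
    std::vector<Sink> sinks_;
};
```

If you want to use a compartment as a repeated signal, the model suggests storing something that changes every time, such as an incrementing counter, rather than writing the same flag value again.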

I've been working with compartments recently, and I've run across a few 'features' that tripped me up.  I figure if I've run across them, others have too.

Although MSDN says that you can put integers, BSTRs, and interface pointers into a compartment, you cannot store interface pointers or strings into a global compartment (that you get from ITfThreadMgr::GetGlobalCompartment).  You can only store integers and empty variants.

Side note:  I would be very cautious about storing interface pointers into a (local) compartment, as well.  If an application has multiple UI threads (e.g., Explorer), TSF will load multiple instances of the text service (one instance per thread).  When each instance retrieves the interface pointer from an event sink and tries to call it, the interface pointer is properly marshalled across the apartment boundaries (COM marshals the IUnknown in the variant returned from ITfCompartment::GetValue), but you do have a proxy in the way, and calls across the interface pointer will be quite slow.

Text Services Framework assumes that your text service follows a particular processing path.  If your text service doesn't conform to these assumptions, then your programming job will be more complicated.  (Not impossible, just more complex.)  The text service samples on MSDN also follow these assumptions, but they aren't explicitly stated anywhere (that I know of).   I've mentioned some of these assumptions in previous articles, but I thought I'd bring them together in one post.

Text Services makes the following assumptions:

  1. Your service must perform all changes to a context or range object within an edit session.  Text Services Framework enforces this assumption through the use of edit cookies.
  2. Your service should not assume that it is possible to request a synchronous edit session.  (I discussed this here.)
  3. Your service should track focus changes between applications and between controls within an application.  This means that your text service must install event sinks for ITfThreadFocusSink and ITfThreadMgrEventSink.
  4. Your text service should use compositions to handle partially formed input. 

This last assumption is the big one.  It can cause problems for text services that aren't keyboard-related (speech, for example).

The problem is that TSF handles the (admittedly, very difficult) job of interacting with non-TSF aware applications entirely through compositions.  Once you close the composition, TSF assumes that you're completely finished with that piece of input.

Unfortunately, it's hard to tell beforehand when you're done with a piece of dictation.  SAPI will tell you when it's recognized a piece of text, obviously, but, ideally, once you've dictated some text, you would like to be able to correct it.  That requires that you leave the composition open.

In an application that isn't TSF-aware, though, you need to close that composition as soon as you can (it's bad form to have large open compositions; most IMEs have compositions that are a few characters in size).

So there's a tradeoff here.  Dictation in Windows Vista currently closes the composition as soon as the text is recognized.  (In fact, it doesn't use compositions at all.)   That works fine for TSF-aware applications, but causes problems with TSF-unaware applications.  In particular, once you've dictated some text, you can't correct it by voice.  That's why Windows Speech Recognition makes you confirm every dictation into a TSF-unaware application.

I received an interesting email the other day asking about how to get the character code from the parameters passed to the ITfKeyEventSink::OnKeyDown method.

The answer is that most keyboard related text services only work with a particular keyboard layout, and the text service manages the mapping from virtual key codes to character codes.

It is actually surprisingly difficult to write a keyboard-related text service that is keyboard-layout agnostic, as there isn't a public API that exposes the details of the keyboard layout.

"What about ToAscii/ToUnicode (and the related APIs)", you say?  Well, unfortunately, those APIs don't actually work across all languages.  In particular, they don't deal well with dead keys and obscure shift states (AltGr, SGCAPS, etc.), and, of course, fail utterly with languages like Japanese, Chinese, and Korean.  Michael Kaplan has an extensive set of articles about using ToUnicode (summarized here).

If you want to write a text service that is completely agnostic as to keyboard layouts, then you probably don't want to use ITfKeyEventSink.  What can you use to monitor keyboard changes?

So far, the best recommendation I have is to use ITfTextEditSink::OnEndEdit, and if either of the following two conditions is true, then you can conclude that a keyboard made the changes:

1) Call ITfEditRecord::GetTextAndPropertyUpdates to see if it contains a range with the property GUID_PROP_COMPOSING. TSF will set this property on text that is part of a composition.

2) Check if the context that you get from OnEndEdit has one or more ITfCompositionView objects within it.  Here's some sample code to demonstrate:

hr = pContext->QueryInterface(IID_ITfContextComposition,
                              (void **)&pContextComposition);
if (hr == S_OK)
{
    IEnumITfCompositionView *pEnumCompositionView;

    hr = pContextComposition->EnumCompositions(&pEnumCompositionView);
    if (hr == S_OK)
    {
        ITfCompositionView *pCompositionView;

        while (pEnumCompositionView->Next(1, &pCompositionView, NULL) == S_OK)
        {
            ITfRange *pRange;
            hr = pCompositionView->GetRange(&pRange);
            if (hr == S_OK)
            {
                // Do Stuff Here
                pRange->Release();
            }
            pCompositionView->Release();
        }
        pEnumCompositionView->Release();
    }
    pContextComposition->Release();
}
