- Our Users Are Leading Authorities
-
Throughout my career at Microsoft, I've eagerly participated in mailing lists, newsgroups, and web forums to engage customers, learn more about their needs, and foster direct communication.
One of the better forums for speech recognition is run by Professor Itamar Even-Zohar of Tel Aviv University, where he teaches Culture Research. Itamar has been a long-time user of speech recognition and a vocal source of feedback on Windows Speech Recognition (WSR). His website on speech recognition contains useful information on WSR and on the speech recognition included in Office XP and Office 2003. In particular, his ms-speech forum is invaluable.
Recently, when David Pogue of the New York Times wrote about the newest version of NaturallySpeaking, Itamar was quick to write to David and set him straight on a few matters, including a plug for Windows Speech Recognition Macros!
David wrote of Itamar, "Clearly, I’ve unearthed the world’s leading authority on speech-recognition foreign-language versions."
If you follow the links I'm providing, you'll see that Professor Even-Zohar is not enamored of everything we do. He's critical of several aspects of WSR, and while he "gets it" regarding WSR Macros, he's quick to point out both flaws and missing features.
We need more users like this: people who are highly experienced and unafraid to share their opinions. The information they provide is valuable to me and to the rest of the product teams. On the flip side, we have to be careful about users' expectations. Bending our ear doesn't mean you'll get whatever feature you asked for, or get it within a particular timeframe.
Often we have more feature requests than time or people available, so we have to be very choosy about where to spend our resources. Even a #1 priority sometimes has to take a back seat to a lesser feature because the lesser feature was one we could complete with the time and resources available.
Feedback from experienced users, though, helps us make the most of the resources we have. We can prioritize better and have confidence that what we're doing will have the greatest impact.
To everyone who writes to us at listen, speak, and sapitech: thank you, and keep the feedback rolling!
- WSR Accuracy Survey
-
We're always looking for feedback on how to improve Windows Speech Recognition. If you are a frequent user, please take a moment to respond with your experiences. You can email us, or leave a comment below.
- What mode of microphone control do you use most often?
  - I use “start listening” and “stop listening”
  - I press CTRL+WIN to change listening modes
  - I use my headset/microphone mute button
- If you use “start listening” (or have in the past), how reliable is it in your environment?
  - Very reliable: WSR only listens when I say “start listening”
  - Somewhat reliable: WSR occasionally wakes up even if I did not say “start listening,” but it’s not a problem for me.
  - Not reliable: I cannot use Sleep mode because “start listening” is recognized too frequently.
- Were you aware that pressing CTRL+WIN was a possible means of controlling the listening state?
  - No
  - Yes
- When you add words to the speech dictionary, do you also record a pronunciation?
  - Yes, always
  - Sometimes
  - Only if WSR still does not recognize the word correctly after I add it
  - What’s the speech dictionary?
- When you correct misrecognized phrases, do you find that WSR still misrecognizes the phrase, even after one or more corrections?
  - Always
  - Frequently
  - Enough that I notice
  - Occasionally
  - Rarely
- What is your favorite feature or aspect of Windows Speech Recognition?
- Conversely, what is the one thing you’d like to change?
Thanks for taking the time to share your feedback! We do value it and use it to help guide future development.
- Enumerating TTS Engines using System.Speech.Synthesis
-
Here is a quick and dirty C# console application that lists the installed TTS voices and their properties, then speaks a short greeting. Make sure you add System.Speech to your project's list of references.
using System;
using System.Collections.Generic;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;

namespace SelectVoice
{
    class SelectVoice
    {
        static void Main(string[] args)
        {
            Console.WriteLine("SelectVoice Example");

            // SpeechSynthesizer implements IDisposable, so release it when finished.
            using (SpeechSynthesizer ttsSynth = new SpeechSynthesizer())
            {
                bool annaAvailable = false;

                Console.WriteLine("Listing installed speech synthesizer voices...");
                foreach (InstalledVoice ttsVoice in ttsSynth.GetInstalledVoices())
                {
                    VoiceInfo info = ttsVoice.VoiceInfo;
                    Console.WriteLine("Name:\t{0}", info.Name);
                    Console.WriteLine("Desc:\t{0}", info.Description);
                    Console.WriteLine("Id:\t{0}", info.Id);
                    Console.WriteLine("Gender:\t{0}", info.Gender);
                    Console.WriteLine("Age:\t{0}", info.Age);

                    Console.WriteLine("Supported Audio Formats:");
                    foreach (SpeechAudioFormatInfo audioFormat in info.SupportedAudioFormats)
                    {
                        Console.WriteLine("\tEncodingFormat:\t{0}", audioFormat.EncodingFormat);
                        Console.WriteLine("\tChannelCount:\t{0}", audioFormat.ChannelCount);
                        Console.WriteLine("\tBits/sample:\t{0}", audioFormat.BitsPerSample);
                        Console.WriteLine("\tAvg Bytes/sec:\t{0}", audioFormat.AverageBytesPerSecond);
                        Console.WriteLine("\tSamples/sec:\t{0}", audioFormat.SamplesPerSecond);
                        Console.WriteLine("\tBlockAlign:\t{0}", audioFormat.BlockAlign);
                    }

                    Console.WriteLine("Additional Information:");
                    foreach (KeyValuePair<string, string> kvp in info.AdditionalInfo)
                        Console.WriteLine("\t{0}: {1}", kvp.Key, kvp.Value);
                    Console.WriteLine();

                    // Remember whether the voice we want to select is present and enabled.
                    if (info.Name == "Microsoft Anna" && ttsVoice.Enabled)
                        annaAvailable = true;
                }
                Console.WriteLine("Finished listing installed voices.");

                // Microsoft Anna ships with Windows Vista; on systems without it,
                // keep the default voice instead of letting SelectVoice fail.
                if (annaAvailable)
                    ttsSynth.SelectVoice("Microsoft Anna");
                ttsSynth.Speak("Greetings, my name is " + ttsSynth.Voice.Name);
            }
        }
    }
}
- The "Mojave Experiment"
-
Check out the "Mojave Experiment", where Microsoft brought in people to show them an unreleased version of Windows.
Having been on the Windows 95 team and then having shipped components in Windows 98, Windows 2000, and Windows Vista, I'm used to people complaining that the newest version of Windows is not as good as the previous version.
There are a lot of misconceptions about Windows Vista: that it's slow, that feature X is not as good as it was in Windows XP, and any number of other complaints. Many of the people doing the complaining haven't used Vista, or installed it on hardware that didn't meet the recommended system requirements.
If you have Windows XP and are concerned that Windows Vista won't work with your hardware or existing applications, check out the very useful "Windows Vista Upgrade Advisor" tool.
What's your favorite Windows Vista feature? Of course, I'm partial to Speech Recognition and the Microsoft Anna TTS voice, but there are many new features; tell me yours.
Update: The New York Times collects some reaction.
- Where can I get the Microsoft Bob SDK?
-
My friend and colleague Karin Meier is the person I work with when putting updated speech content into the Windows SDK.
She recently blogged about some odd requests she's gotten for old software. One of them was "Where can I get the Microsoft Bob SDK?" I've been feeling nostalgic about Bob recently, since I visited Dan Rose's exhibit on Microsoft Bob, part of Dan's 20th Century Abandonware site.
My first job at Microsoft, in 1994, was working on Microsoft Bob, which was code-named Utopia. My contribution to the product was minor, but in my opinion, it never deserved the trashing it got from the industry and pundits at the time and ever since. Someday I'll write more about what went right and what went wrong with Microsoft Bob.
In the meantime, and for a stroll down memory lane, check out Dan's website.
- SAPI Documentation Errata: ISpRecoGrammar::SetRuleState
-
There is a typo in the documentation for the ISpRecoGrammar::SetRuleState method in SAPI 5.3. The input parameters are listed as:
HRESULT SetRuleState(
    LPCWSTR *pszName,
    void *pReserved,
    SPRULESTATE NewState
);
Instead, it should be:
HRESULT SetRuleState(
    LPCWSTR pszName,
    void *pReserved,
    SPRULESTATE NewState
);
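Note that the parameter is "pszName", not "*pszName": pszName is an LPCWSTR (a pointer to a null-terminated wide-character string containing the rule name), not a pointer to an LPCWSTR.
We'll update MSDN and the Windows SDK documentation, but in the meantime we wanted to publish this correction.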
- Speech Content in the Windows SDK
-
I'm happy to announce the availability of the RTM release of the Windows SDK. This release, the first RTM release since Vista, contains the following speech-related items:
- Updated: SAPI 5.3 documentation
- Updated: System.Speech documentation
- Updated: Sample source code
  - 8 C++ projects
  - 3 C# projects
  - 2 sample engines (TTS and SR)
- New: the Grammar Compiler tool (GC.EXE) is now part of the tool binaries included in the SDK
The Windows SDK completely replaces the older SAPI 5.1 SDK and supports development on Windows XP, Windows Server 2003, Windows Vista, and Windows Server 2008.
Customers can download this SDK as a DVD image (1,330MB ISO file) from this location:
http://www.microsoft.com/downloads/details.aspx?FamilyId=F26B1AA4-741A-433A-9BE5-FA919850BDBF
Alternatively, they can go through a guided setup process that downloads only the components they need. Speech is part of the base install.
http://www.microsoft.com/downloads/details.aspx?FamilyId=E6E1C3DF-A74F-4207-8586-711EBE331CDC
I'm particularly interested in your feedback on the Windows SDK as a whole, and on how easy it is to get speech information from it in particular.
- Display Context Menus Where The Cursor Is, Not Where the Mouse Is
-
This is a little user interface rant of mine, since I'm speech- and keyboard-oriented. While editing text, when I say "Press Shift F Ten" or press the Application Key (to the right of the spacebar on Windows keyboards), I expect the context menu to appear at the text cursor location, since that's where the action is going to take place.
However, some applications assume the mouse activated the functionality and position the context menu wherever the mouse is. Since I'm using speech or typing and haven't touched the mouse in a while, the menu appears nowhere near the cursor or selection.
A more common variant of this is a menu that appears in the upper-left corner of the edit box when activated from the keyboard.
The article titled Using Menus on MSDN contains sample code that always uses lParam for the X/Y location to display the menu. The documentation on WM_CONTEXTMENU is clear:
If the context menu is generated from the keyboard—for example, if the user types SHIFT+F10—then the x- and y-coordinates are -1 and the application should display the context menu at the location of the current selection rather than at (xPos, yPos).
That advice is ignored in the Using Menus topic, so I'll file a bug on this so that it can be fixed in the future. In the meantime, using MSDN's Community Content feature, I added the following note to the Using Menus article:
When processing the WM_CONTEXTMENU message, remember that the X/Y coordinates might be -1/-1, which indicates that the keyboard generated the message. In that case, the menu should be shown at the text cursor (caret) location or at the location of the selection, NOT at -1/-1 or at the mouse pointer location.
The samples currently in this article do not account for this and will attempt to display the menu at -1/-1, which is confusing to the keyboard user. Pressing the Application Key on Windows keyboards (to the right of the spacebar) generates the VK_APPS virtual-key code, which by default generates a WM_CONTEXTMENU message. You also get this message if the user presses SHIFT+F10.
Never handle SHIFT+F10 or VK_APPS yourself to pop up a context menu. Rely on the WM_CONTEXTMENU message, and if the location given is -1/-1, fall back to the text cursor and/or selection information to place the menu.
- The Desktop Is Not For Programs
-
I'm constantly amazed that people think that putting shortcuts to programs on the desktop makes accessing that program "easier".
For the second time in about a week, I've encountered people asking how to put shortcuts to programs on the desktop.
The desktop is ill-suited for this. To start with, items located on it are often not visible because other windows are in front of the desktop. Depending on the current window layout, you might have to perform one or more mouse or keyboard operations to select the desktop item you want.
To make matters worse, items shift position as the screen resolution changes (because of games, connecting monitors, etc.) and as new items are added.
Commercial software often litters the desktop with shortcuts, but the purpose is to increase visibility, not ease of use.
Why not use the Start Menu? If WordPad is a program you use often, just "Pin" it to the Start Menu and it'll always be there, available in fewer keystrokes than it takes to launch it from the desktop.
If you really want quick access, pin the item to the Start Menu and then modify the item's properties to have a shortcut key assigned. Only items in the Start Menu can have shortcut keys assigned to them.
Update 9/4/2007: My bad: shortcut keys can be assigned to shortcut files that are located on the desktop. My initial test of this failed, and since I knew that only certain locations respect shortcut keys, I figured that the desktop was not one of them. I'll try to find a definitive list, but it appears that any of the Start Menu locations and the Desktop are valid places for a shortcut file to have a shortcut key assigned. Interestingly, shortcut keys for items in the Quick Launch toolbar location are not respected.
- Cool Developers Go Flying
-
Last week the speech team wrapped up a milestone of work, and to celebrate I took our group program manager, Richard Sprague, up for a quick tour of downtown Seattle, the Eastside area including Bill Gates' house, and the main Microsoft campus in Redmond.
Richard took some video of the trip and posted the highlights on MSN's Soapbox video service:
Video: Flying over Seattle
- Background on Audio Volume in Windows Vista
-
Larry Osterman, our friend in the multimedia group and a prolific blogger, is writing a series of articles on how volume is treated in Windows Vista.
There is a whole new audio subsystem in Vista, and Larry's blog is a great source of information for developers.
Volume in Windows Vista, part 1: What is "volume"?
Volume in Windows Vista, part 2: Types of volume in Windows Vista
- Upgrading to Windows Vista
-
I've personally found the upgrade process from Windows XP to Windows Vista to be seamless. However, I know that people have concerns about their software and devices working with a new version of the operating system.
Windows Vista includes both proactive and reactive technology to help with compatibility issues, so I think compatibility problems will not be as big a concern. However, with over 200 million users, there will be issues for some people. The best thing you can do is plan ahead.
Use the Windows Vista Upgrade Advisor on your current machine to identify potential problem areas. In many cases where there are compatibility issues, a new version is already available from the manufacturer, and the Upgrade Advisor links directly to where you can get it.
For technical folks, you can browse the Hardware Compatibility List (HCL) and see the level of support available.
Now for a purely personal comment. In the past, I've found that software that doesn't work with newer versions of the operating system (not just Windows Vista) tends to be of lower quality overall, or tends to perform tasks that require intertwining with the operating system. The days of allowing applications free rein to manipulate the file system and registry are over. Too many applications abused this, and Windows has had to clamp down to prevent exploits.
Already have Windows Vista installed and are having problems? Here is some help.
Now, for Speech API developers, I'm happy to say that Windows Vista includes SAPI 5.3, which is backward compatible with SAPI 5.1, the version included with Windows XP.
- Every single thing Windows Vista Speech Recognition is listening for
-
Rob Chambers is passionate about speech and a prolific blogger. Something that I've always wanted was a list of all the commands that Windows Speech Recognition recognizes. I knew I could probably scan through the internal grammars that WSR uses, but what I didn't know was that Rob had already posted such a list nearly a year ago.
One command that surprised me was "move speech recognition to the bottom" (or "top"). Sometimes the UI panel at the top of the screen gets in the way. I knew I could click the Minimize button, but that isn't an option for everyone trying to use their computer hands-free.
Link to Rob's Rhapsody : Every single thing Windows Vista Speech Recognition is listening for
- Windows Live WiFi Hotspot Locator
-
How I wish I'd had this tool last month, when I was without power and telephone/DSL service at home and looking for an Internet hookup.
Sure, Starbucks and McDonald's have WiFi, but Starbucks wants $9.95 for 24 hours. McDonald's will give you 2 hours for $2.95, which is not bad.
The Windows Live WiFi Hotspot Locator will accept location information and show you the hotspots within a selected radius. You can easily print the list or map each location. Very handy.
- How NASA Can Help Microsoft
-
On an internal blog at Microsoft, I came across a posting by a Corporate VP about some books he was going to read while on vacation. One of the books was an autobiography of one of my heroes, Gene Kranz, who was a Flight Director for several flights in Project Gemini and Apollo.
Kranz led his team of Flight Controllers (known as the White Team) during two of the most dramatic events in the space program at the time: the landing of the Apollo 11 Lunar Module, Eagle, and the explosion aboard the Apollo 13 Service Module.
Kranz's book, Failure Is Not an Option, is wonderfully written, and I recommend it highly. In my opinion, echoed in the internal blog posting I read, the lessons of the space race of the late 1950s and 1960s are valuable to any large organization. Kranz's management style is no-nonsense, and constant practice through simulation kept everyone alert and allowed them to react quickly to unplanned situations.
Probably the worst thing in his Flight Control world was encountering a problem that no one had already thought about. The idea was that if anything went wrong, everyone would already have experience with the problem from simulation and could react very quickly. There are absolutely numerous lessons for Microsoft in how NASA and its contractors approached the race to the Moon.
In this arena, anything could happen, so you assumed that from the start, planned for it, designed for it, and executed with that in mind. Imagine a world in which software developers assume everything could fail, and in which simulation (testing) deliberately makes everything fail. The result would be much more robust code.
I think Microsoft does more of this kind of testing than any other major software maker. While software components, particularly their interactions with the operating system platform, are complex, the various systems of the Apollo missions were incredibly complex as well. Several major companies and hundreds of small contractors provided pieces of the system, and they all had to work together perfectly.
After reading his book, I got to meet Gene at a MoF dinner a few years ago. He's got a great personality and mentality for thinking through problems.
I also recommend Angle of Attack by Mike Gray, about Harrison Storms, who as a senior executive at North American Aviation was personally responsible for the Apollo Command Module. It too has lessons on how a business can react to tragic failure, such as when the Apollo 1 command module caught fire during a ground test, killing three astronauts.
These are not business books, but accounts of headstrong people leading large organizations doing high-profile work.