Welcome to our blog dedicated to the engineering of Microsoft Windows 7
Microsoft has been working on handwriting recognition for over 15 years going back to the Pen extensions for Windows 3.0. With the increased integration and broad availability of the handwriting components present in Windows Vista we continue to see increased use of handwriting with Windows PCs. We see many customers using handwriting across a wide variety of applications including schools, hospitals, banking, insurance, government, and more. It is exciting to see this natural form of interaction used in new scenarios. Of course one thing we need to continue to do is improve the quality of recognition as well as the availability of recognizers in more languages around the world. In this post, Yvonne, a Program Manager on our User Interface Platform team, provides a perspective on engineering new recognizers and recognition improvements in Windows 7. --Steven
Hi, my name is Yvonne and I’m a Program Manager on the Tablet PC and Handwriting Recognition team. This post is about the work we’ve done to improve recognition in handwriting for Windows 7.
Microsoft has invested in pen-based computing since the early 1990s, and with the release of Windows Vista handwriting recognizers are available for 12 languages, including USA, UK, German, French, Spanish, Italian, Dutch, Brazilian Portuguese, and Chinese (Simplified and Traditional), Japanese and Korean. Customers frequently ask us when we plan to ship more languages and why a specific language is not yet supported. We are planning to ship new and improved languages for Windows 7, including Norwegian, Swedish, Finnish, Danish, Russian, and Polish, and the list continues to grow. Let’s explore what it takes to develop new handwriting recognizers.
Windows has true cursive handwriting recognition: you don’t need to learn to write in a special way. In fact, we’ve taught (or “trained” as we say) Windows the handwriting styles of thousands of people, and Windows learns more about your style as you use it. Over the last 16 years we’ve developed powerful engines for recognizing handwriting, and we continue to tune these to make them more accurate and faster, and to add new capabilities, such as the ability to learn from you in Vista. Supporting a new language is much more than adding new dictionaries – each new language is a major investment. It starts with collecting native handwriting; next we analyze the data and go through iterations of training and tuning; and finally the system gets to you and continues to improve as you use it.
The development of a new handwriting recognizer starts with a huge data collection effort. We collect millions of words and characters of written text from tens of thousands of writers from all around the world.
Before I describe our collection efforts, I would like to answer a question we are frequently asked: “Why can’t you just use an existing recognizer with a new dictionary?” One reason is that some languages have special characters or accents. But the overriding reason is that people in different regions of the world learn to write in different ways, even in countries that share a language, like the UK and the US. Characters that may look very similar to you can actually be quite different to the computer. This is why we need to collect real-world data that captures exactly how characters, punctuation marks and other shapes are written.
Setting up a data collection effort is challenging and time consuming because we want to ensure that we collect the “right kind of data”. We carefully choose our collection labs in the respective countries for which we develop recognizers.
Before we start our data collection in the labs, we configure our collection tools, prepare documentation, and compile language scripts that will guide our volunteers through the collection process. Our scripts are carefully prepared by native speakers in the respective language to ensure that we collect only orthographically correct data, data from different writing styles, and data that covers all characters, numbers, symbols and signs that are relevant to a specific language. All of our scripts are proofread and edited before they are blessed to be used at the collection labs.
Once our tools and scripts are ready, we open our labs and start to recruit volunteers to donate their handwriting samples. Our recruitment efforts ensure that we have balanced demographics, such as gender, age, left-handedness, and educational background, that represent the majority of the population of that country.
A supervisor at the lab instructs the volunteers to copy the text as it is displayed in the collection tool in their own writing style. What is important to note is that we want to collect writing samples that accurately represent the person’s natural way of writing. We therefore encourage volunteers to treat “pen and tablet” like “pen and paper”. If one of the volunteers tends to write in big, curvy strokes, then we want to collect his/her big, curvy strokes during the collection session. High quality data in this context refers to data that was naturally written.
Here is a snapshot of what our collection tool looks like:
Figure 1: Collection Tool
A collection session lasts between 60 and 90 minutes, by which point our volunteer has donated a significant amount of handwritten data without feeling fatigued. The donated data is then uploaded and stored in our database at Microsoft, ready for future use. The written samples contain important information like stroke order, start and end points, spacing, and other characteristics that are essential to train our new recognizer.
Let’s take a look at some of our samples in our database to illustrate the great variation among ink samples:
Figure 2: Ink samples illustrating different stroke orders.
The screenshot shows how three different volunteers inked the word “black”. The different colors are used to illustrate the exact stroke orders in which the word was written. Our first two volunteers used five strokes to write the word “black”; our third volunteer used four strokes. Please also note how our third volunteer used one stroke only to ink the letters “ck”, while our first volunteer used three strokes for the same combination of letters. All of this information is used to train our recognizers.
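To make the idea of stroke information concrete, here is a minimal sketch of how an ink sample could be represented. The `Stroke` and `InkSample` types and all coordinates are hypothetical and invented for illustration; they are not the format of the actual collection database. The sketch only mirrors what is described above: each sample keeps its strokes in writing order, and each stroke has a start and an end point.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    """One pen-down to pen-up movement (hypothetical representation)."""
    points: List[Point]  # pen positions in the order they were drawn

    @property
    def start(self) -> Point:
        return self.points[0]

    @property
    def end(self) -> Point:
        return self.points[-1]


@dataclass
class InkSample:
    """A word as written by one volunteer (hypothetical representation)."""
    label: str             # the text the volunteer was asked to copy
    strokes: List[Stroke]  # strokes in the order they were written

    def stroke_count(self) -> int:
        return len(self.strokes)


# Made-up coordinates: the same word written with five strokes and with four.
first_volunteer = InkSample("black", [
    Stroke([(0.0, 0.0), (0.0, 10.0)]),
    Stroke([(5.0, 0.0), (5.0, 10.0)]),
    Stroke([(10.0, 3.0), (14.0, 0.0)]),
    Stroke([(18.0, 4.0), (16.0, 0.0)]),
    Stroke([(22.0, 10.0), (22.0, 0.0)]),
])
third_volunteer = InkSample("black", [
    Stroke([(0.0, 0.0), (0.0, 10.0)]),
    Stroke([(5.0, 0.0), (5.0, 10.0)]),
    Stroke([(10.0, 3.0), (14.0, 0.0)]),
    Stroke([(18.0, 4.0), (22.0, 10.0), (22.0, 0.0)]),  # "ck" in one stroke
])

for sample in (first_volunteer, third_volunteer):
    print(sample.label, sample.stroke_count())
```

Two samples with the same label but different stroke counts look identical on paper, yet they are different inputs to a recognizer that works on stroke sequences.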
Once we have collected a sufficient amount of inked data, we split our data into a training set, used by our development team, and a “blind” set, used by our test team. The training set is then employed to train the Neural Network, which is largely responsible for the magic that takes place during the recognition process. Good, naturally written data is essential in developing a high quality recognizer; the recognizer can’t be any better than its training set. The more high quality data we feed into our Neural Network, the better equipped we are to handle sloppy cursive handwriting.
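The split into a training set and a blind set can be sketched as follows. This is only an illustration under stated assumptions: the post does not say how the split is made or in what proportion, so splitting by writer, the 20% blind fraction, and the `writer` field are all hypothetical choices made for this example.

```python
import random
from typing import Dict, List, Tuple


def split_by_writer(
    samples: List[Dict],
    blind_fraction: float = 0.2,  # assumed ratio, not taken from the post
    seed: int = 0,
) -> Tuple[List[Dict], List[Dict]]:
    """Split samples so that no writer appears in both sets."""
    writers = sorted({s["writer"] for s in samples})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(writers)
    n_blind = max(1, round(len(writers) * blind_fraction))
    blind_writers = set(writers[:n_blind])
    training = [s for s in samples if s["writer"] not in blind_writers]
    blind = [s for s in samples if s["writer"] in blind_writers]
    return training, blind


# Made-up data: 10 writers, 3 samples each.
samples = [
    {"writer": f"w{w}", "label": word}
    for w in range(10)
    for word in ("black", "white", "green")
]
training_set, blind_set = split_by_writer(samples)
print(len(training_set), len(blind_set))
```

Holding out whole writers means the blind set measures how the recognizer copes with handwriting it has never seen, which matches the purpose of a blind set described above.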
Our Neural Network is a Time-Delay Neural Network (TDNN) that can handle connected letters of cursive scripts. A TDNN takes the preceding and following stroke segments into consideration when computing the probabilities of letters, digits and characters for each segment of ink. The output of the TDNN is powerful but not good enough when handwriting is sloppy. In order to come within reach of human recognition accuracy, we have to employ information that goes beyond the shape of the letter: we call this the Language Model context. The majority of this Language Model context comes in the form of the lexicon, which is a wordlist of valid spellings for a given language. For many languages, this is the same lexicon that the spellchecker uses. The TDNN and the lexicon work closely together to compute word probabilities and output the top suggestions for the given input.
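The interplay between per-segment letter probabilities and the lexicon can be illustrated with a toy example. This is a simplified sketch, not the actual recognizer: it assumes exactly one ink segment per letter, uses made-up probabilities in place of real TDNN output, and the `rank_words` function and its scoring (sum of log probabilities with a small floor) are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_words(
    segment_probs: List[Dict[str, float]],
    lexicon: List[str],
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    """Score each lexicon word by the letter probabilities of its segments."""
    floor = 1e-6  # probability assumed for letters the shape model did not propose
    scored = []
    for word in lexicon:
        if len(word) != len(segment_probs):
            continue  # toy assumption: one segment per letter
        log_p = sum(
            math.log(probs.get(ch, floor))
            for ch, probs in zip(word, segment_probs)
        )
        scored.append((word, log_p))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up shape probabilities for a sloppy "black": the second letter
# looks slightly more like "i" than "l".
segment_probs = [
    {"b": 0.9, "h": 0.1},
    {"i": 0.5, "l": 0.4, "r": 0.1},
    {"a": 0.6, "o": 0.4},
    {"c": 0.7, "e": 0.3},
    {"k": 0.9, "h": 0.1},
]
lexicon = ["black", "block", "blank", "brick"]

shape_only = "".join(max(p, key=p.get) for p in segment_probs)
suggestions = rank_words(segment_probs, lexicon)
print(shape_only)                       # best guess from letter shapes alone
print([word for word, _ in suggestions])  # top suggestions after using the lexicon
```

Taking the most likely letter for each segment gives a string that is not a word, while restricting the candidates to valid spellings recovers the intended word. That is the kind of help the Language Model context provides when the shapes alone are ambiguous.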
Training the Neural Network is an involved process that takes time. We often experiment with borrowing data from other languages to increase the size of the training data, with the ultimate goal of boosting recognition accuracy. Borrowing characters from other languages does not always lead to success. As I mentioned above, stroke order, letter shape, writing style and letter size can differ significantly from country to country, and these differences can have a negative impact on the performance of the TDNN. It often takes us several rounds of training, re-training and tuning before we find “the right formula” that will lead to high recognition accuracy.
How do we know if we are headed in the right direction when we build a new recognizer? This is an important question that the test team and native speakers answer for us. The test team is responsible for generating our recognition accuracy metrics that reflect how good our recognizer is. These accuracy metrics are based on our blind test set which is the collected data that development could not use for training. In addition to our accuracy metrics, we work with native speakers in house and at our world-wide subsidiaries to get feedback and further input.
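As a rough illustration of an accuracy metric computed on a blind set, the sketch below measures the fraction of samples whose top suggestion matches the expected text. The post does not define the metrics the test team uses, so word-level top-1 accuracy is an assumption, and `word_accuracy` and the stand-in recognizer are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def word_accuracy(
    recognize: Callable[[object], str],
    blind_set: Iterable[Tuple[object, str]],
) -> float:
    """Fraction of blind samples whose top suggestion equals the expected word."""
    total = 0
    correct = 0
    for ink, expected in blind_set:
        total += 1
        if recognize(ink) == expected:
            correct += 1
    return correct / total if total else 0.0


# Stand-in recognizer: a lookup table from made-up ink IDs to its top suggestion.
fake_results = {"ink1": "black", "ink2": "white", "ink3": "grean", "ink4": "blue"}
blind_samples = [
    ("ink1", "black"),
    ("ink2", "white"),
    ("ink3", "green"),
    ("ink4", "blue"),
]

accuracy = word_accuracy(fake_results.get, blind_samples)
print(f"{accuracy:.2f}")
```

Because the blind samples were never used for training, a number like this reflects how the recognizer behaves on unseen handwriting rather than how well it memorized its training data.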
In the previous paragraphs I have outlined how we develop high quality recognizers that can handle a wide variety of different writing styles. But there is more, as each person can also train the recognizer on his/her unique writing style. The training that is done to teach the recognizer a personal writing style is the same training that happens before Microsoft ships the product. The only difference is that we are now collecting unique training data from a specific person (and not that of thousands of people). We call this process “Personalization”.
Figure 3: Personalization Wizard (Sentence module).
As the screenshot of our Personalization wizard illustrates, a person is asked to write the requested sentence to provide his/her ink samples. The more data a person donates during the personalization process, the better the recognizer will become. In addition to providing writing samples based on specified sentences, a person can target specific recognition errors, shapes, and characters that will all be used for training. Our Personalization feature is complex and offers a variety of different modules that enable a person to optimally tune the recognizer. We are proud to announce that Personalization will be available for all Vista languages and all new Windows 7 languages. We encourage you to use this feature to improve your recognition accuracy.
We continue to work on improving our recognizers, which also means that we are incorporating our customers’ feedback through online telemetry (anonymous, private, voluntary, and opt-in). In Windows Vista we released a new feature called “Report Handwriting Recognition Errors”, which gives people the opportunity to submit those ink samples that the recognizer did not recognize correctly. After the person has corrected a word in the Tablet Input Panel (TIP), we enable a menu that allows a person to send the misrecognized ink together with its corrected version to our team.
Here is a screenshot of what our error reporting tool looks like:
Figure 4: With “Report Handwriting Recognition Errors” people can choose which of the misrecognized ink samples they want to submit.
We receive approximately 2000 error reports per week. Each error report is stored in our database before we analyze it and use it to improve our next generation of recognizers. As you can imagine, real world data is extremely helpful because it is only this type of data that can reveal shortcomings of our recognizers.
We value and appreciate every single error report. Keep sending us your feedback, so that we can use it to improve the magic of our present and future recognizers.
– Yvonne representing the handwriting recognition efforts
And what are the improvements in Windows 7?
Now I want a tablet PC :D
For example, the new Dell Latitude XT2.
Handwriting works amazingly well on Tablet PCs, but a major problem I'm having with pen input (at least on my HP Tablet PC) is terrible calibration and no reliable way to improve it. The pointer will often be half a centimeter off from the point where the pen tip is touching the screen. In addition, at different positions on the screen, the calibration is correspondingly better or worse. Now, perhaps this is more of a hardware issue (which Microsoft has less control over), but it would be great if some kind of better calibration tool came with Windows 7 (there is one that I believe came with HP on my machine, but it only had 4 calibration points on it and more often than not makes calibration even worse than it already is). The more calibration points on this tool the better--I wouldn't mind pecking away at the screen every once in a while if I could enjoy a realistic "pen and paper" experience.
Impressive. My nasty handwriting may actually be usable.
I take it that you're planning on releasing updates for the recognition if you're setting up an opt-in feedback service?
After years of using the pen, I find it serves me better to focus on where the cursor is rather than where the pen tip is. It's never more than a mm off for me anyway, so it's not too much of a hardship.
Although I should mention there is a better calibration tool with Windows 7 that uses 16 points, but I haven't noticed an appreciable difference due to it.
Thanks for the excellent write up.
What about the Math Input Panel, new to Windows 7? Is there any overlap in how it was created? If so, is there a broader tool/API that could be created which Microsoft and third parties could use to create other recognition sets?
AND FINALLY! NORWEGIAN IS GOING TO BE SUPPORTED! :D Therefore, I must get a tablet PC as soon as Windows 7 gets released ^^,
I think I have a lot of "bad habits" when it comes to pen input in Windows -- my first pen input device was an HP iPaq 3850 running PocketPC 2002. I learned to make my L's as curly-L's, even though I don't do that normally -- it increased recognition.
After reading this post I just went and did about half of the handwriting exercises. Hopefully this will help my Win7 tablet with my URL entry in non-IE browsers. :D
Wonderful post! I'm incredibly interested in this sort of thing, so I really appreciate your time writing this up Yvonne. Also thanks to Steven for posting it. :)
Pretty awesome, now if I only had a tablet/convertible PC myself.
I'm wondering though: you said that the recognizer will use the lexicon that your spellchecker uses to improve the recognizer's accuracy. How does this work if you use 2 languages at the same time? Let's say I'm making lecture notes, which are given in Dutch in my case, but have to use quite a few English words while doing so. Wouldn't using 2 languages at the same time decrease accuracy in this case? Or is the recognizer intelligent enough to know that you are not consistently using the same language in that same piece of text?
I have to admit though, I'm really interested in buying myself a nice tablet/convertible laptop in the future and this post made me even more eager to have one.
I have to say - the handwriting recognition software in Windows is excellent.
I actually found a very interesting use: when I was taking Chinese class in school, a tablet PC was by far the best way of using a dictionary to find the meaning and pronunciation of a character. By hand this requires looking the character up by radical and stroke count (a laborious process by any standard) - writing the character on screen with a tablet PC was massively easier (and worked incredibly well).
Perhaps you should advertise tablets to students studying East Asian languages?
bdodson, I'm learning Chinese too and have had the same idea about using Chinese (Mandarin) character recognition: using OneNote to practice writing them down and using recognition to see if I've done so correctly :)
@bdodson & @lozmatic -- Hey, I did the same thing!
During my first demo of the new recognizers I even got a chance to show off a little by writing a bit of Russian (a very little bit).
<pedant mode="on">Since when were 'USA' and 'UK' languages? Surely you mean English with USA or UK spelling & grammar?</pedant>
Very interesting and very impressive!
What about Speech and Narrator?