Welcome to our blog dedicated to the engineering of Microsoft Windows 7
Microsoft has been working on handwriting recognition for over 15 years, going back to the Pen extensions for Windows 3.0. With the increased integration and broad availability of the handwriting components in Windows Vista, we continue to see increased use of handwriting with Windows PCs. We see many customers using handwriting across a wide variety of settings, including schools, hospitals, banking, insurance, government, and more. It is exciting to see this natural form of interaction used in new scenarios. Of course, one thing we need to continue to do is improve the quality of recognition as well as the availability of recognizers in more languages around the world. In this post, Yvonne, a Program Manager on our User Interface Platform team, provides a perspective on engineering new recognizers and recognition improvements in Windows 7. --Steven
Hi, my name is Yvonne and I’m a Program Manager on the Tablet PC and Handwriting Recognition team. This post is about the work we’ve done to improve recognition in handwriting for Windows 7.
Microsoft has invested in pen-based computing since the early 1990s, and with the release of Windows Vista handwriting recognizers are available for 12 languages: English (US and UK), German, French, Spanish, Italian, Dutch, Brazilian Portuguese, Chinese (Simplified and Traditional), Japanese, and Korean. Customers frequently ask us when we plan to ship more languages and why a specific language is not yet supported. We are planning to ship new and improved languages for Windows 7, including Norwegian, Swedish, Finnish, Danish, Russian, and Polish, and the list continues to grow. Let’s explore what it takes to develop a new handwriting recognizer.
Windows has true cursive handwriting recognition: you don’t need to learn to write in a special way. In fact, we’ve taught (or “trained”, as we say) Windows the handwriting styles of thousands of people, and Windows learns more about your style as you use it. Over the last 16 years we’ve developed powerful engines for recognizing handwriting, and we continue to tune them to make them more accurate and faster, and to add new capabilities, such as the ability to learn from you, introduced in Vista. Supporting a new language is much more than adding new dictionaries; each new language is a major investment. It starts with collecting native handwriting; next we analyze the data and go through iterations of training and tuning; and finally the system gets to you and continues to improve as you use it.
The development of a new handwriting recognizer starts with a huge data collection effort. We collect millions of words and characters of written text from tens of thousands of writers from all around the world.
Before I describe our collection efforts, I would like to answer a question we are frequently asked: “Why can’t you just use an existing recognizer with a new dictionary?” One reason is that some languages have special characters or accents. But the overriding reason is because people in different regions of the world learn to write in different ways, even between countries with the same language like the UK and US. Characters that may look visually very similar to you can actually be quite different to the computer. This is why we need to collect real world data that captures exactly how characters, punctuation marks and other shapes are written.
Setting up a data collection effort is challenging and time consuming because we want to ensure that we collect the “right kind of data”. We carefully choose our collection labs in the respective countries for which we develop recognizers.
Before we start our data collection in the labs, we configure our collection tools, prepare documentation, and compile language scripts that will guide our volunteers through the collection process. Our scripts are carefully prepared by native speakers of the respective language to ensure that we collect only orthographically correct data, data from different writing styles, and data that covers all characters, numbers, symbols, and signs that are relevant to a specific language. All of our scripts are proofread and edited before they are approved for use at the collection labs.
Once our tools and scripts are ready, we open our labs and start to recruit volunteers to donate their handwriting samples. Our recruitment efforts ensure that we have balanced demographics such as gender, age, left-handedness, and educational background that represent the majority of the population for that country.
A supervisor at the lab instructs the volunteers to copy the text as it is displayed in the collection tool in their own writing style. What is important to note is that we want to collect writing samples that accurately represent the person’s natural way of writing. We therefore encourage volunteers to treat “pen and tablet” like “pen and paper”. If one of the volunteers tends to write in big, curvy strokes, then we want to collect those big, curvy strokes during the collection session. High quality data in this context refers to data that was naturally written.
Here is a snapshot of what our collection tool looks like:
Figure 1: Collection Tool
A collection session lasts between 60 and 90 minutes, at which point our volunteer has donated a significant amount of handwritten data without feeling fatigued. The donated data is then uploaded and stored in our database at Microsoft, ready for future use. The written samples contain important information like stroke order, start and end points, spacing, and other characteristics that are essential for training our new recognizer.
Let’s take a look at some of our samples in our database to illustrate the great variation among ink samples:
Figure 2: Ink samples illustrating different stroke orders.
The screenshot shows how three different volunteers inked the word “black”. The different colors are used to illustrate the exact stroke orders in which the word was written. Our first two volunteers used five strokes to write the word “black”; our third volunteer used four strokes. Please also note how our third volunteer used one stroke only to ink the letters “ck”, while our first volunteer used three strokes for the same combination of letters. All of this information is used to train our recognizers.
Once we have collected a sufficient amount of inked data, we split our data into a training set, used by our development team, and a “blind” set, used by our test team. The training set is then employed to train the Neural Network, which is largely responsible for the magic that is taking place during the recognition process. Good, naturally written data is essential in developing a high quality recognizer; the recognizer can’t be any better than its training set. The more high quality data we feed into our Neural Network, the more equipped we are to handle sloppy cursive handwriting.
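As a rough illustration of this kind of split (the function and field names here are hypothetical, not our actual tooling), dividing the data by writer rather than by sample keeps each volunteer's handwriting entirely on one side, so the blind set really does contain unseen writing styles:

```python
import random

def split_by_writer(samples, blind_fraction=0.2, seed=7):
    """Split ink samples into a training set and a 'blind' test set.

    Splitting by writer (not by individual sample) ensures every
    volunteer's data lands on exactly one side of the split, so the
    blind set contains handwriting styles the trainer never saw.
    """
    writers = sorted({s["writer_id"] for s in samples})
    rng = random.Random(seed)
    rng.shuffle(writers)
    n_blind = max(1, int(len(writers) * blind_fraction))
    blind_writers = set(writers[:n_blind])
    train = [s for s in samples if s["writer_id"] not in blind_writers]
    blind = [s for s in samples if s["writer_id"] in blind_writers]
    return train, blind
```

The deterministic seed makes the split reproducible, which matters when development and test teams must agree on which data is off-limits for training.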
Our Neural Network is a Time-Delay Neural Network (TDNN) that can handle the connected letters of cursive scripts. A TDNN takes the preceding and following stroke segments into consideration when computing the probabilities of letters, digits, and characters for each segment of ink. The output of the TDNN is powerful but not good enough when handwriting is sloppy. In order to come within reach of human recognition accuracy, we have to employ information that goes beyond the shape of the letter: we call this the Language Model context. The majority of this Language Model context comes in the form of the lexicon, which is a wordlist of valid spellings for a given language. For many languages, this is the same lexicon that the spellchecker uses. The TDNN and the lexicon work closely together to compute word probabilities and output the top suggestions for the given input.
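To make the interplay concrete, here is a heavily simplified sketch, not the actual engine, of how per-segment character probabilities (standing in for the TDNN's output) can be combined with a lexicon to rank candidate words. Note how restricting the lexicon changes the top suggestion:

```python
import math

def score_words(segment_probs, lexicon):
    """Score each lexicon word against per-segment character probabilities.

    segment_probs: one dict per ink segment, mapping character -> probability
    (a stand-in for the TDNN's output). Only words whose length matches the
    number of segments are scored; a real decoder also searches over
    alternative segmentations of the ink.
    """
    scores = {}
    for word in lexicon:
        if len(word) != len(segment_probs):
            continue
        log_p = 0.0
        for ch, probs in zip(word, segment_probs):
            p = probs.get(ch, 1e-6)  # small floor for characters the model ruled out
            log_p += math.log(p)
        scores[word] = log_p
    # Highest log-probability words first
    return sorted(scores, key=scores.get, reverse=True)
```

For example, if the shape model finds the second letter slightly more 'o'-like than 'a'-like, a lexicon containing "cot" will rank it first; a lexicon without it falls back to "cat". This is the sense in which the lexicon constrains sloppy ink toward valid spellings.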
Training the Neural Network is an involved process that takes time. We often experiment with borrowing data from other languages to increase the size of the training data with the ultimate goal to boost recognition accuracy. Borrowing characters from other languages does not always lead to success. As I mentioned above, stroke order, letter shape, writing styles and letter size can differ significantly from country to country and can have a negative impact on the performance of the TDNN. It often takes us several rounds of training, re-training and tuning before we find “the right formula” that will lead to high recognition accuracy.
How do we know if we are headed in the right direction when we build a new recognizer? This is an important question that the test team and native speakers answer for us. The test team is responsible for generating our recognition accuracy metrics that reflect how good our recognizer is. These accuracy metrics are based on our blind test set which is the collected data that development could not use for training. In addition to our accuracy metrics, we work with native speakers in house and at our world-wide subsidiaries to get feedback and further input.
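The headline metric is simple in principle: a word-level accuracy computation over the blind set might look like the sketch below (hypothetical names, not the test team's actual harness):

```python
def word_accuracy(recognize, blind_set):
    """Fraction of blind samples whose top recognition result exactly
    matches the transcription the volunteer was asked to copy.

    recognize: function mapping an ink sample to its top word guess.
    blind_set: list of (ink_sample, correct_word) pairs withheld
    from training.
    """
    correct = sum(1 for ink, label in blind_set if recognize(ink) == label)
    return correct / len(blind_set)
```

Because the blind set was never used for training, this number estimates how the recognizer will behave on genuinely new handwriting rather than on ink it has memorized.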
In the previous paragraphs I have outlined how we develop high quality recognizers that can handle a wide variety of writing styles. But there is more: each person can also train the recognizer on his/her unique writing style. The training that is done to teach the recognizer a personal writing style is the same training that happens before Microsoft ships the product. The only difference is that we are now collecting training data from one specific person (and not from thousands of people). We call this process “Personalization”.
Figure 3: Personalization Wizard (Sentence module).
As the screenshots of our Personalization wizard illustrate, a person is asked to write the requested sentence to provide his/her ink samples. The more data a person donates during the personalization process, the better the recognizer will become. In addition to providing writing samples based on specified sentences, a person can target specific recognition errors, shapes, and characters that will all be used for training. Our Personalization feature is complex and offers a variety of different modules that enable a person to optimally tune the recognizer. We are proud to announce that Personalization will be available for all Vista languages and all new Windows 7 languages. We encourage you to use this feature to improve your recognition accuracy.
We continue to work on improving our recognizers, which also means that we incorporate our customers’ feedback through online telemetry (anonymous, private, voluntary, and opt-in). In Windows Vista we released a new feature called “Report Handwriting Recognition Errors”, which gives people the opportunity to submit those ink samples that the recognizer did not recognize correctly. After the person has corrected a word in the Tablet Input Panel (TIP), we enable a menu that allows a person to send the misrecognized ink together with its corrected version to our team.
Here is a screenshot of what our error reporting tool looks like:
Figure 4: With “Report Handwriting Recognition Errors” people can choose which of the misrecognized ink samples they want to submit.
We receive approximately 2000 error reports per week. Each error report is stored in our database before we analyze it and use it to improve our next generation of recognizers. As you can imagine, real world data is extremely helpful because it is only this type of data that can reveal shortcomings of our recognizers.
We value and appreciate every single error report. Keep sending us your feedback, so that we can use it to improve the magic of our present and future recognizers.
– Yvonne representing the handwriting recognition efforts
Will Ultimate Extras come in Windows 7 Ultimate? From my point of view it is an important feature for Windows; many bought the Ultimate version not only for all the features but also for the Ultimate Extras. Many hope that this feature is not removed from the final version.
I've upgraded my Thinkpad X60 Tablet running Vista to the Windows 7 beta. Nice! Except, now I don't have Dutch recognition anymore. Under Vista I could also put it to Dutch, in Win7 I only get the on-screen keyboard when set to the Dutch language. (Note: I ran English Vista, not Dutch, so that's not it...)
Is the beta missing pen input languages other than English, or can I turn it on somewhere?
Oh yeah, I do like the 'in place' recognition of Win7! Easier than the Vista way. And Win7 does indeed do a better job of recognizing, IMHO.
I was waiting for Arabic language recognition from the first days of tablets, years ago.
I know that Arabic language recognition is very different from Latin languages, but if Microsoft did not do it, who will?
When will you add support for Arabic language?
In this version we have significantly improved the accuracy for four East Asian languages (Simplified Chinese, Traditional Chinese, Japanese, and Korean), we have also provided a better personalization scheme, and we also support a text prediction function for CHS and CHT. Actually, our accuracy has surprised many users, since initially they didn't expect their cursive writing could be recognized!
This will be a big benefit for students and the learning process in general. The days of taking handwritten notes may soon become a thing of the past. I would also imagine this could reduce the amount of paper used, in that all notes can be consolidated.
Well, handwriting itself works spotlessly, BUT I used to use TRUST's graphics tablet, which seems not to work with anything other than the handwriting collector, as the cursor sticks to the edges of the screen and I can only move it up and down :( Win7's handwriting works well, however.
See subject-line above:
I have YET to see a valid technical response from anyone online, be they Microsoft networking personnel (or otherwise) as to:
1.) Why HOSTS files in VISTA/Server 2008/Windows 7 cannot use the more efficient on disk smaller 0 blocking IP address
(vs. the larger & slower to load 127.0.0.1 loopback adapter, or the slightly less efficient 0.0.0.0, for stopping known bad site access)
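For readers comparing the formats being discussed, a HOSTS file line simply maps a hostname to an address; the disagreement is over which address to use (`example.com` is a placeholder):

```
0         example.com    (shortest on disk; the form this comment says Vista+ no longer accepts)
0.0.0.0   example.com    (the unspecified address; slightly larger)
127.0.0.1 example.com    (loopback; the traditional and largest form)
```

All three aim to stop the machine from reaching a known bad site by resolving its name to a non-routable destination.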
2.) Why the GUI front-end for PORT FILTERING has been removed in VISTA/Server 2008/Windows 7... or, rather, moved to the Advanced section of Windows Firewall controls.
(&, the reasoning from the VISTA resource kit is poor in this regard, because removing only the PORT FILTERING gui control feature doesn't prevent the other 2-3 methods of layered security from working WITH (or against) one another, as the reasoning was stated)
Removing the PORT FILTER 3 part design (not just the Local Connection gui control layer of security) is a BAD MOVE, imo, & for the SAME REASON "zone defenses" are usually better than "man-to-man" ones in sports!
Now - the reasoning given by the VISTA reskit was that it was removed because the methods in software firewalls, IP Security Policies, Port Filtering & even RRAS do not "automatically sync" w/ one another...
WELL - so what?
I say that, because this actually WORKS IN FAVOR of "layered security", because if 1 of them goes down (or, is taken down, which is what malware often seek to do, say, in the case of software firewalls), the other layered security methods are in the way...
This is much like folks using deadbolt locks, door handle locks, & chain locks on the doors of their homes - break 1? The others still function to stop intruders.
Nobody seems to be answering why this was done, especially in favor of BOTH of the above points, as to WHY it was done... could it be MS has made a mistake here, & is unwilling to admit it publicly?
Until I see a SOLID, LOGICAL TECHNICAL REASON for both of the above occurring - because I have not to date, @ this point, from YOU folks @ MS, or from others interested in the area of TCP/IP networking online -
I am leaning to my conclusion here - MS has messed up...
P.S.=> Following up on what I wrote up above!
(That's so others here have some documentation from Microsoft themselves, & especially in regards to the differences in HOW their security works now)
Thus, I'll now note how:
1.) The TCP/IP packet processing path differences: how Windows 2000/XP/Server 2003 did it (IPSEC.SYS (IP Security Policies), IPNAT.SYS (Windows Firewall), IPFLTDRV.SYS (Port Filtering), & TCPIP.SYS (base IP driver))...
2.) AND, how VISTA/Server 2008/Windows 7 do it now currently, using a SINGLE layer (WFP)...
First off, here is HOW it worked in Windows 2000/XP/Server 2003 - using 3 discrete & different drivers AND LEVELS/LAYERS of the packet processing path they worked in:
The Cable Guy - June 2005: TCP/IP Packet Processing Paths
The following components process IP packets:
IP forwarding: Determines the next-hop interface and address for packets being sent or forwarded.
TCP/IP filtering: Allows you to specify, by IP protocol, TCP port, or UDP port, the types of traffic that are acceptable for incoming local host traffic (packets destined for the host). You can configure TCP/IP filtering on the Options tab from the advanced properties of the Internet Protocol (TCP/IP) component in the Network Connections folder.
Filter-hook driver: A Windows component that uses the filter-hook API to filter incoming and outgoing IP packets. On a computer running Windows Server 2003, the filter-hook driver is Ipfltdrv.sys, a component of Routing and Remote Access. When enabled, Routing and Remote Access allows you to configure separate inbound and outbound IP packet filters for each interface using the Routing and Remote Access snap-in. Ipfltdrv.sys examines both local host and transit IP traffic (packets not destined for the host).
Firewall-hook driver: A Windows component that uses the firewall-hook API to examine incoming and outgoing packets. On a computer running Windows XP, the firewall-hook driver is Ipnat.sys, which is shared by both Internet Connection Sharing and Windows Firewall. Internet Connection Sharing is a basic network address translator (NAT). Windows Firewall is a stateful host-based firewall. Ipnat.sys examines both local host and transit IP traffic. On a computer running Windows Server 2003, Ipnat.sys is shared by Internet Connection Sharing, Windows Firewall, and the NAT/Basic Firewall component of Routing and Remote Access. If the NAT/Basic Firewall component of Routing and Remote Access is enabled, you cannot also enable Windows Firewall or Internet Connection Sharing.
IPsec: The IPsec component, Ipsec.sys, is the implementation of IPsec in Windows to provide cryptographic protection to IP traffic. Ipsec.sys examines both local host and transit IP traffic and can permit, block, or secure traffic.
1.) After receiving the IP packet, Tcpip.sys passes it to Ipsec.sys for processing.
If the packet has IPsec protection (the IP Protocol field value indicates either Authentication Header [AH] or Encapsulating Security Payload [ESP]), it is processed and removed. If the Windows Firewall: Allow authenticated IPSec bypass Group Policy setting applies to the computer, Ipsec.sys sets an IPsec Bypass flag associated with the packet. Ipsec.sys passes the resulting packet back to Tcpip.sys.
If the packet does not have IPsec protection, based on the set of IPsec filters, Ipsec.sys determines whether the packet is permitted, blocked, or requires security. If permitted, Ipsec.sys passes the packet back to Tcpip.sys without modification. If the packet is blocked or requires security, Ipsec.sys silently discards the packet.
2.) Tcpip.sys passes the packet to Ipfltdrv.sys for processing.
Based on the interface on which the packet was received, Ipfltdrv.sys compares the packet to the configured inbound IP packet filters.
If the inbound IP packet filters do not allow the packet, Ipfltdrv.sys silently discards the packet. If the inbound IP packet filters allow the packet, Ipfltdrv.sys passes the packet back to Tcpip.sys.
3.) Tcpip.sys passes the packet to Ipnat.sys for processing.
If Internet Connection Sharing or the NAT/Basic Firewall is enabled and the interface on which the packet was received is the public interface connected to the Internet, Ipnat.sys compares the packet to its NAT translation table. If an entry is found, the IP packet is translated and the resulting packet is treated as source traffic.
Windows Firewall checks the IPsec Bypass flag associated with the packet. If the IPsec Bypass flag is set, Windows Firewall passes the packet back to Tcpip.sys.
If the IPsec Bypass flag is not set, Windows Firewall compares the packet to its exceptions list. If the packet matches an exception, Ipnat.sys passes the IP packet back to Tcpip.sys. If the IP packet does not match an exception, Ipnat.sys silently discards the IP packet.
4.) Tcpip.sys compares the IP packet to the configured set of allowed packets for TCP/IP filtering.
If TCP/IP filtering does not allow the packet, Tcpip.sys silently discards the packet. If TCP/IP filtering allows the packet, Tcpip.sys continues processing the packet, eventually passing the packet payload to TCP, UDP, or other upper layer protocols.
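The layered design being defended here can be sketched as a chain of independent checks: if any one layer discards the packet, processing stops, and disabling one layer leaves the others standing. The checks below are toy stand-ins for the real drivers, not their actual logic:

```python
def ipsec_check(packet):
    # Ipsec.sys-style decision: permit, block, or require security
    # (simplified here to a single boolean)
    return packet.get("ipsec_ok", True)

def port_filter_check(packet):
    # Ipfltdrv.sys-style per-interface inbound packet filter
    allowed_ports = {80, 443}
    return packet["dst_port"] in allowed_ports

def firewall_check(packet):
    # Ipnat.sys-style stateful firewall exceptions list
    return packet.get("matches_exception", False)

# Each layer is independent; none relies on the others being present.
LAYERS = [ipsec_check, port_filter_check, firewall_check]

def process_inbound(packet):
    """Pass the packet through each layer in order; any layer can
    silently discard it, as in the numbered steps quoted above."""
    for layer in LAYERS:
        if not layer(packet):
            return "discarded"
    return "delivered"
```

The "deadbolt plus chain lock" argument in this comment amounts to saying that removing one function from LAYERS still leaves the remaining checks in the path.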
NOW, the new method, "WFP", used by Windows VISTA, Windows Server 2008, & the upcoming Windows 7:
"The IPsec Policy Agent service and Windows Firewall are examples of WFP applications that are included with Windows Vista and Windows Server 2008"
"Because all the applications and services use the same filtering engine, it is easier to determine whether other applications or services exist that perform the same function."
(JUST A SINGLE LAYER/LEVEL OF WORK, instead of 3 discrete-separate ones)
SO - what is the "problem" I have with this NEW method?
(That yes, does seem to "sync" what was 'out-of-sync' in older Windows 2000/XP/Server 2003, but, what I felt was a STRENGTH of that, & NOT a weakness)
THE NEW "WFP" METHOD apparently only REPRESENTS A SINGLE POINT TO ATTACK FOR MALWARE MAKERS...
(I.E.=> ONLY 1 THING TO "TAKE OUT" vs. 3 like before... actually making it EASIER to attack because of this!).
The HOSTS file issue I note above? I have NO DOUBT on that one... but, I'd like to see the reasoning for PORT FILTERING being changed the most though! Thanks for your time... apk
I am probably unusual in that I so prefer the comfort (form factor) of handwriting recognition that I use it almost 100% of the time, even for documents 50 pages long. As one of the owners of a medical software company, I also use it in order to understand the advantages and limitations of this tool. Compared to earlier versions, I do detect some modest improvement in the accuracy of recognition in Windows 7, and I like some of the UI changes. There are two problems, however.

The new version has a greater propensity to revise already-recognized words based on new writing that is spatially well separated from the already-recognized word. For example, I am using handwriting now. When I wrote "am using" in the last sentence, it insisted on revising "am" (which it had already recognized) to combine it with "using" to create "amusing." That one is easy to fix with the split gesture, but "at times" became "attorney." This happens very frequently now (probably 8-10 times as I wrote the above). Sometimes it even goes two words back. I'd say it offsets the other improvements. I had become adept at fixing wrong words, but now I have to re-write larger segments that had already been recognized correctly and were then changed in a way that forces me to re-write them.

The second issue is that with intensive usage (as it gets from me) it develops some sort of problem and stops working. This manifests as a failure to insert the text into the target field. You still see it in the handwriting dialog, and when you hit "insert" you see it disappear, but it doesn't show up in the target. This was an issue when handwriting first came out, but it was fixed by a service pack. It's back now. The problem is that once it starts, it will continue intermittently and get worse. You have to reboot to fix the problem. I hope there will be a fix for this soon.
On further experience with the problem inserting text into the target field, it appears that it can recover without rebooting if given enough time (20 seconds to several minutes). Sometimes the problem manifests as a delay in insertion. More often, the text to be inserted is lost.
BTW, I am observing this on a new fully loaded HP tablet PC.
Unfortunately, the tendency of the Windows 7 handwriting system to re-think already recognized words as you continue to write subsequent words is a serious step backwards. The majority of the time, it turns a correctly recognized word into a wrong one. It sometimes rethinks it twice in quick succession, which usually makes the end result even worse. The rethought words are seldom easily correctable. Besides decreasing accuracy, the feel of instability is quite unsettling. It was a mistake to go in this direction.
You guys really need to tone down the tendency of the 7.0 handwriting system to rethink itself. I'm constantly seeing behavior like what I just encountered: I was trying to write "Isn't that true" and it had already recognized "Isn't that", but when I wrote "true" it decided I must have been trying to write "infinite thrust". It misses more than it gets right in this re-thinking it now does!