
Try the ESL Assistant

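Since I started working, I have had to write something in English almost every day, but I am still unsure about many usages. So I often use English expressions I'm not sure of as search queries, to see whether many people on the Internet use the same phrasing. Building on this experience, my colleagues at Microsoft Research and I developed a prototype automatic English proofreading system called the ESL Assistant. Its target users are people like me: users of English who are not native speakers. The ESL Assistant uses statistical language models to detect problems in written English and to suggest corrections. Users can then check the suggested corrections with the system's web example-sentence search tool.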

I've tried it myself and found it quite good. If you're interested, give it a try too: ESL Assistant http://www.eslassistant.com/ 

Be sure to read the instructions before using it: http://www.eslassistant.com/Help.aspx 

 

Currently, the ESL Assistant can correct the following types of errors:

  • Article insertion and deletion
    • You can use search engine as way to check spelling and grammar.
    • We often go to movie theater downtown.
  • Preposition confusion
    • It can be found easily in everywhere.
    • This is the most direct way to prevent to criminals from doing more harm.
    • He relies by his friends too much.
  • Adjective confusion
    • She is very interesting in the problem.
  • Word order
    • I bought a nice red big bag.
    • He is a my good friend.
    • I don't know what has he told the police.
  • Noun number (singular/plural)
    • I like to hike with friend at the weekend.
    • There are still many problems of pollutions.
  • Verb forms
    • Every line of code that I writed was rewrited.
    • We can succeeds if we try.
    • We might succeeded if we tried.
    • We could worked it out.
  • Auxiliary verb usage
    • My teacher does is a great person.
    • To learn English we should be speak it as much as possible.
  • Gerunds and infinitives
    • I encourage young people change their job.
    • By keep a hobby, he has something to do.

... a modest goal ...

... or (for those who understand the cultural reference here): all your errors are not belong to us.

Well, we all (i.e. we non-native speakers of English, myself as a native speaker of German included) would like a tool that could just take what we write and turn it into grammatical and fluent English. Come on, how hard can it be?

Bear with me while I try to explain why it's anything BUT simple. Note that this will be a non-technical post: if you have some knowledge of natural language processing you'll be better off reading this technical paper. Now, first of all, in order to be able to offer perfect correction, we need to have some computer understanding of human language. We require some magic algorithm that truly understands what you want to say in the first place, and then puts it into nice prose. But despite the claims you often find (mostly in commercial applications), no such thing as language understanding by computers currently exists. Not even for well-formed and well-structured English - unless you define "language understanding" as something that has little to do with the common use of the phrase.

So what about just targeting sentences with a single/simple error? Again, we're in very difficult territory. What is a mistake, how many types of mistakes are there, and how do you detect them? One frustrated user of our service observed that "This sentence lots of mistakes contains" does not trigger any suggestion. The mistake in this example is a mix-up in word order: the verb "contains" appears at the end of the sentence, but it should occur after the subject "this sentence". We could target this kind of mistake by looking at misplaced verbs and then checking if the sentence gets "better" according to an automatic score if we move the verb. But why should we target this kind of mistake? This is an artificial example that has little to do with the errors that non-native speakers really make. As you can imagine, you have to be fairly selective about the errors you try to fix, otherwise just about any kind of word order change/word insertion/word deletion and any combination thereof needs to be considered as a viable alternative for every sentence. Sorry, but not all errors can be dealt with. But some of them can, and here is how we try to deal with them.
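To make the combinatorial problem concrete, here is a minimal sketch that is not the ESL Assistant's code. It enumerates every sentence you can reach by moving a single word in the example above. The function name `single_word_moves` is hypothetical. For this six-word sentence it already yields 25 candidates, and for n distinct words the count is (n-1)², before insertions, deletions, or any combination of edits are even considered.

```python
def single_word_moves(tokens):
    """Yield every distinct sentence obtained by moving exactly one token elsewhere."""
    seen = set()
    for i in range(len(tokens)):
        rest = tokens[:i] + tokens[i + 1:]           # sentence with token i removed
        for j in range(len(rest) + 1):               # every possible re-insertion point
            candidate = tuple(rest[:j] + [tokens[i]] + rest[j:])
            # Skip the unchanged sentence and duplicates (swapping two neighbours
            # can be reached by moving either of them).
            if candidate != tuple(tokens) and candidate not in seen:
                seen.add(candidate)
                yield list(candidate)

sentence = "This sentence lots of mistakes contains".split()
candidates = list(single_word_moves(sentence))
target = "This sentence contains lots of mistakes".split()
print(len(candidates))        # 25
print(target in candidates)   # True
```

Every candidate would still have to be scored, and the correct one would have to be picked out. This is why the team restricts itself to error types that learners actually produce.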

First, we identified a list of common and typical errors that actually occur in English written by non-native speakers (starting with writers whose native language is East Asian). We did that by reading through error analyses produced by other researchers and by doing our own analysis of real-life data.

Second, we investigated the different error types to see which kind of technical solution works best: rules, machine learning, or a mix of both. We then designed separate error modules geared towards the different errors.

Here are two examples to illustrate the different techniques and levels of complexity. Some non-native writers of English have trouble with English verb morphology. After all, if it is "I kicked the ball", it should also be "I hitted the ball", right? All we need to do to fix this type of error is to look up incorrectly inflected irregular verbs in a relatively small list and suggest replacing them with the correct irregular form. A small rule is enough to detect this error and make a good suggestion.
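A rule of this kind might look roughly like the Python sketch below. It is an illustration, not the ESL Assistant's actual module. The table `IRREGULAR`, the set `PARTICIPLE_TRIGGERS` and the function `suggest_verb_fixes` are hypothetical, and the word list is deliberately tiny. The sketch also shows a wrinkle that even a "small rule" has to handle: in "I writed" the fix is "wrote", but "was rewrited" needs the past participle "rewritten". The sketch therefore checks the preceding word, which is a simplifying assumption.

```python
import re

# Hypothetical, deliberately tiny table: over-regularized form -> (simple past, past participle).
IRREGULAR = {
    "hitted": ("hit", "hit"),
    "writed": ("wrote", "written"),
    "rewrited": ("rewrote", "rewritten"),
    "teached": ("taught", "taught"),
    "buyed": ("bought", "bought"),
    "goed": ("went", "gone"),
}

# Auxiliaries after which English uses the past participle ("was written", "has gone").
PARTICIPLE_TRIGGERS = {"be", "been", "being", "is", "are", "was", "were", "has", "have", "had"}

def suggest_verb_fixes(sentence):
    """Return (wrong_form, suggestion) pairs for over-regularized irregular verbs."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    suggestions = []
    for i, token in enumerate(tokens):
        forms = IRREGULAR.get(token.lower())
        if forms is None:
            continue  # not a known over-regularized form
        past, participle = forms
        previous = tokens[i - 1].lower() if i > 0 else ""
        suggestions.append((token, participle if previous in PARTICIPLE_TRIGGERS else past))
    return suggestions

print(suggest_verb_fixes("Every line of code that I writed was rewrited."))
# [('writed', 'wrote'), ('rewrited', 'rewritten')]
print(suggest_verb_fixes("I hitted the ball."))
# [('hitted', 'hit')]
```

A lookup like this has essentially no false positives for the forms it lists, and that property makes a rule preferable to a statistical model for this error type.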

On the other hand, there are errors like the use of determiners ("I am teacher from city" versus "I am a teacher from the city") and the use of prepositions ("in the other hand"/"on the other hand"), where it becomes impossible to list all possible errors and corrections. For this type of error we decided to use machine-learning techniques. We feed the machine millions of sentences, together with the contexts in which determiners and prepositions occur, and let it figure out the patterns by itself. At the beginning of every noun phrase, the machine extracts several words and part-of-speech tags (verb, noun, adjective, etc.) to the left and to the right. From the many millions of data points gathered this way, it produces statistical generalizations. For example, it learns that if a sentence starts with a preposition followed by "the other hand", that preposition is most likely "On". But if you enter "I hold an ace in the other hand", the probability shifts and "in" becomes the more likely preposition. Statistical models like these have the very nice property that they can discover the patterns present in a large collection of text without being explicitly told what to look for. The downside is that they are only as good as the data they have seen. If confronted with unknown words, misspellings and unusual language, they start to make mistakes. That's why you might sometimes see a suggestion appear or disappear when you make a seemingly innocuous and unrelated change in the sentence.
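The sketch below is a toy version of this kind of context-based classifier. The post does not name the learning algorithm, so a multinomial naive Bayes model with add-one smoothing stands in for it. The following are all assumptions made for illustration: the window size (two tokens to the left, three to the right), the Penn Treebank-style tags, the four hand-tagged training sentences, and the names `context_features` and `NaiveBayesPreposition`. Each example is written with the preposition removed, and `slot` is the index of the first word of the noun phrase.

```python
import math
from collections import Counter, defaultdict

def parse(tagged):
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in tagged.split()]

def context_features(tokens, slot, left=2, right=3):
    """Words and POS tags around the gap before the noun phrase starting at `slot`."""
    feats = []
    for k in range(1, left + 1):
        word, tag = tokens[slot - k] if slot - k >= 0 else ("<s>", "<s>")
        feats += [f"w-{k}={word.lower()}", f"t-{k}={tag}"]
    for k in range(1, right + 1):
        word, tag = tokens[slot + k - 1] if slot + k - 1 < len(tokens) else ("</s>", "</s>")
        feats += [f"w+{k}={word.lower()}", f"t+{k}={tag}"]
    return feats

class NaiveBayesPreposition:
    """Multinomial naive Bayes with add-one smoothing over context features."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        for tokens, slot, label in examples:
            self.label_counts[label] += 1
            for f in context_features(tokens, slot):
                self.feature_counts[label][f] += 1
                self.vocabulary.add(f)

    def probabilities(self, tokens, slot):
        """Posterior probability of each preposition for the gap at `slot`."""
        total = sum(self.label_counts.values())
        v = len(self.vocabulary)
        scores = {}
        for label, n in self.label_counts.items():
            denom = sum(self.feature_counts[label].values()) + v
            scores[label] = math.log(n / total) + sum(
                math.log((self.feature_counts[label][f] + 1) / denom)
                for f in context_features(tokens, slot))
        top = max(scores.values())  # subtract the max before exp for numerical stability
        exp = {label: math.exp(s - top) for label, s in scores.items()}
        z = sum(exp.values())
        return {label: e / z for label, e in exp.items()}

training = [
    (parse("the/DT other/JJ hand/NN ,/, it/PRP works/VBZ"), 0, "on"),
    (parse("the/DT other/JJ hand/NN ,/, we/PRP lost/VBD"), 0, "on"),
    (parse("I/PRP hold/VBP an/DT ace/NN the/DT other/JJ hand/NN"), 4, "in"),
    (parse("she/PRP held/VBD a/DT pen/NN the/DT other/JJ hand/NN"), 4, "in"),
]
model = NaiveBayesPreposition()
model.train(training)

start = model.probabilities(parse("the/DT other/JJ hand/NN ,/, they/PRP won/VBD"), 0)
after_noun = model.probabilities(parse("he/PRP held/VBD a/DT cup/NN the/DT other/JJ hand/NN"), 4)
print(max(start, key=start.get))            # on
print(max(after_noun, key=after_noun.get))  # in
```

The right-hand context ("the other hand") is identical in both test sentences. Only the left-hand words and tags shift the probability from "on" to "in", which mirrors the ace example in the text.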

Finally, we know from proofing tool studies that there is nothing worse than flooding the user with incorrect suggestions. To cut back as much as possible on these so-called "false flags", we also use a language model as a filter. Think of a language model as a large table of words and word sequences with their counts/probabilities. A language model is "trained" on a large collection of sentences (containing billions of words in our case). When you present it with a new sentence, it can assign a "goodness" score to that sentence based on all the words and word sequences it has seen in the training data. We use the language model to show a suggestion to the user only if the score of the corrected sentence is (much) higher than that of the original. Unfortunately, this also means that we sometimes suppress perfectly good suggestions just because the system is not quite sure enough.
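Here is a minimal sketch of such a filter, under stated assumptions. A toy bigram model with add-one smoothing, trained on four sentences, stands in for the much larger model described above. The "goodness" score is the sentence's log-probability. The threshold `margin=1.0` is a hypothetical value, since the post does not say how large "much higher" is, and `BigramLM` and `should_show` are illustrative names.

```python
import math
from collections import Counter

class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.history = Counter()   # how often each token starts a bigram
        self.bigrams = Counter()   # how often each (previous, next) pair occurs
        vocab = set()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.history.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
            vocab.update(tokens[1:])
        self.vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words

    def score(self, sentence):
        """Log-probability ("goodness") of a sentence under the model."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(prev, word)] + 1) / (self.history[prev] + self.vocab_size))
            for prev, word in zip(tokens, tokens[1:]))

def should_show(lm, original, correction, margin=1.0):
    """Show a suggestion only if the correction beats the original by a clear margin."""
    return lm.score(correction) - lm.score(original) > margin

corpus = [
    "on the other hand it works",
    "on the other hand we lost",
    "i hold an ace in the other hand",
    "i am a teacher from the city",
]
lm = BigramLM(corpus)
print(should_show(lm, "in the other hand it works", "on the other hand it works"))          # True
print(should_show(lm, "I hold an ace in the other hand", "I hold an ace on the other hand"))  # False
```

Raising `margin` removes more false flags but also hides more good suggestions. This is the trade-off described in the paragraph above.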

We can't do magic, but maybe we can still be of some help with a set of common errors that we non-native speakers frequently make. A modest goal, but it's a start.

The Microsoft Research ESL Assistant Prototype is Live

The Natural Language Processing Group in Microsoft Research is pleased to announce that its ESL Assistant is now available on the World Wide Web via a link at http://www.eslassistant.com. A prototype writing assistance service designed for people who have learned English as a second or foreign language, the ESL Assistant uses statistical models to suggest corrections for a number of common learner error patterns that are not currently supported by the proofing tools in Microsoft Office products. The service also tries to help users judge whether a suggestion truly represents an improvement by showing real-life examples returned by a web search. We hope that both learners of English and TEFL/TESL professionals will experiment with this service and find it useful.

It is our hope that this prototype will begin to fill a long-standing gap in Microsoft's English language proofing tools. Every day, throughout the world, millions of non-native speakers write English at school or work, on blogs, or in personal e-mail, both to communicate with their counterparts from English-speaking countries, and as often as not, to communicate with each other. Writing errors made by learners are often different from those made by individuals who have grown up speaking the language, and can be complex and subtly contextual in ways that are not easily addressed by syntactic or lexical rules. For these reasons, ESL errors were not supported by Microsoft Office’s proofing tools when they were originally developed over a decade ago. Today, however, we are better positioned to address the challenge of providing writing assistance to this fast-growing segment of Microsoft customers. Modern statistical models and tools now allow solutions that were once impossible. In particular, our group's Machine Translation effort has generated valuable spinoffs in the form of huge language models, fast statistical taggers and other tools that we have reused for this proofing work, and we are now beginning to bring these to bear on the task of grammar checking.

Some features of the ESL Assistant service include:

  • Corrections for common ESL error types found in non-native English writing but not supported by the Office grammar checker. More information about the specific kinds of error patterns that are supported can be found in the ESL Assistant Help (FAQ) page and on the team website.
  • Implementation as a web service, which permits the deployment of huge statistical models to identify possible error locations.
  • Automatically generated searches to assist users by retrieving real-world contextual examples from the web.
  • A downloadable add-in for Outlook 2007, allowing mail text to be sent to the ESL Assistant web site for checking.
  • An inline thesaurus feature that proposes alternative words appropriate to the context. 
  • An initial pass through the Microsoft Office grammar and spell checkers, which can be turned on with a checkbox, to ensure more comprehensive error coverage.
  • Localized pages for users in the Chinese, Japanese, Korean, and Russian markets (also shown if the default language in Internet Explorer is set to one of these languages).

In its present form, the service is unapologetically experimental ("pre-alpha", despite the beta label).  The range of error types that it corrects is still small, it does not capture everything, and it sometimes gets things wrong.  Work to improve error detection accuracy and coverage is ongoing;  we expect to roll out new modules and updates as the service matures. The present interface certainly does not represent what one would expect to see if this service were some day fully integrated into a shipping product.  With the current prototype, we will be gathering data about where the service succeeds and where it does not, to help inform future directions of development. 

Some known issues that may affect the user experience include:

  • Owing to architecture differences, the Office speller/grammar checker is not fully integrated into the current version. The speller/grammar checker can occasionally obscure suggestions that are available in the ESL component. Toggling this feature on or off may be helpful.
  • Tokenization occasionally inserts a space before punctuation when a preceding word is replaced. We are working on this.

As prototypes go, however, the ESL Assistant is state-of-the-art in ESL error correction. It represents, we believe, the beginning of a new paradigm of automated editorial assistance services that deploy very large statistical resources.

Members of our team will be reporting on the project from time to time in this blog. We look forward to your feedback.   

 