
Office 2007 Language Interface Packs and spell-checkers are now available for Bengali (Bangladesh), Urdu, Sinhalese, Nepali, IsiZulu, IsiXhosa, and Afrikaans

Several Indic and African languages have recently been added to the list of Office 2007 Language Interface Packs (LIPs), which users can download free of charge to change the language of their user interface and to use a spell-checker. The new LIPs include Bengali (Bangladesh), Urdu, Sinhalese, Nepali, Afrikaans, IsiZulu, and IsiXhosa. The Sinhalese, Bengali (Bangladesh), and IsiXhosa spell-checkers are entirely new, since previous versions of Office did not include any proofing tools for these languages.

Like all the other Office 2007 LIPs we have discussed on this blog, these new language packs have been developed within the framework of the Microsoft Local Language Program, whose aim is to preserve local and regional languages and cultures and to enable users of software to work with interfaces in their own languages. You can click on the languages below to download the corresponding LIPs for Microsoft Office 2007:

·         Bengali (বাংলা (বাংলাদেশ)) is the official language of Bangladesh (where it is spoken by around 110 million people) and of the Indian states of West Bengal (with 55 million speakers) and Tripura. With more than 200 million speakers, it is the second most widely spoken language on the Indian subcontinent and one of the five languages with the most native speakers worldwide. Together with Assamese and Oriya (for which we also recently released LIPs and new spell-checkers), it belongs to the Eastern Indo-Aryan language family.

·         Urdu (اردو) is the national language and one of the two official languages of Pakistan. It is also one of the 22 official languages of India (it is spoken in 5 Indian states). There are about 60 million Urdu native speakers (for a total of around 180 million speakers).

·         Sinhalese (a.k.a. Sinhala, or සිංහල) is the language of the Sinhalese, the largest ethnic group of Sri Lanka. It has about 19 million speakers (including 16 million native speakers). Along with Tamil, it is one of the official languages of Sri Lanka.

·         Nepali (नेपाली), a language in the Indo-Aryan branch of the Indo-European language family, is the lingua franca of Nepal and is one of the official languages of India. It is also spoken in Bhutan and in parts of Myanmar. It is commonly written in the Devanagari script and is closely related to Hindi. There are about 20 million native speakers worldwide.

·         IsiZulu, a Bantu language, has more than 9 million native speakers (and about 25 million second-language speakers). It is one of the 11 official languages of South Africa and is also spoken in other countries such as Botswana, Lesotho, Mozambique, Malawi and Swaziland.

·         IsiXhosa, an agglutinative Bantu language with about 8 million speakers, is another one of the 11 official languages of South Africa.

·         Afrikaans, a Germanic language derived from the same 16th-century Dutch dialect that led to modern Dutch, is one of the 11 official languages of South Africa. It is mainly spoken in South Africa and in Namibia, Lesotho, Botswana and Swaziland. After Zulu and Xhosa, Afrikaans has the third-largest language community in South Africa, with about 6 million speakers.

-- Thierry Fontenelle (Program Manager)

Contextual spelling for French in Office 2010

At the Worldwide Partner Conference in New Orleans on 13 July 2009, we announced the launch of the Office 2010 Technical Preview. This technical preview can now be downloaded by thousands of customers. You can discover the innovations on the Office 2010 blog and watch really cool videos on www.microsoft.com/office2010. My colleague Stefanie Schiller wrote a few words about the proofing tools integrated in this Technical Preview and about some of the improvements we have made, specifically with respect to the English thesaurus.

French-speaking users will also be delighted. We have talked on multiple occasions on this blog about the English and Spanish contextual spellers that we launched in Office 2007 (Office 2007 also includes such a tool for German). We have improved them all in Office 2010, and we are happy to introduce a brand-new contextual speller for French which, combined with the French spell-checker and grammar checker, will greatly improve our French-speaking users’ proofing experience.

The French contextual speller in Office 2010 will detect many more mistakes that have so far gone unnoticed by traditional proofing tools. Unlike the grammar checker, which is based on a syntactic parser, our contextual speller is based on statistical analyses of very large text corpora and on “language models”, which enable the program to compare the user’s text with huge lists of word sequences and their frequencies. Words that exist in the language but are used improperly in a given context can then be flagged. A blue squiggly line will appear under mistakes such as the following:

Ils on faim. (on → ont)

Elles son malades. (son → sont)

Quand à moi, j’avoue que je sui fier de lui. (Quand → Quant; sui → suis)

Si je peu me permettre, dans son fort intérieur, elle pense qu’elle a raison. (peu → peux; fort → for)

Se test montre que le correcteur ne fonctionne pas trop mal. (Se → Ce)

L’installation de la fosse sceptique a pris plus de temps que prévu. (sceptique → septique)

Il arrive cet après midi. (après midi → après-midi)

Mon frère ma dit qu’il ne viendrait pas. (ma → m’a)

Il y a long temps que je l’aime, jamais je ne l’oublierai… (from a popular song) (long temps → longtemps)

En temps que client de l’hôtel, vous avez gratuitement accès à l’Internet. (temps → tant)
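The statistical idea described above can be illustrated with a minimal sketch: each word that belongs to a hypothetical confusion set (on/ont, son/sont, se/ce) is compared with its alternatives using bigram counts, and a flag is raised only when an alternative is much more likely in context. The counts, the confusion sets, and the function names are invented for illustration; this is not the Office 2010 implementation, whose internals the post does not describe.

```python
from collections import Counter

# Hypothetical confusion sets of French words that are easily mixed up.
CONFUSION_SETS = [{"on", "ont"}, {"son", "sont"}, {"se", "ce"}]

# Toy bigram counts standing in for a language model built from a large corpus.
BIGRAMS = Counter({
    ("ils", "ont"): 900, ("ils", "on"): 2,
    ("ont", "faim"): 300, ("on", "faim"): 1,
    ("elles", "sont"): 800, ("elles", "son"): 1,
    ("sont", "malades"): 150,
})


def score(prev, word, nxt):
    """Add-one smoothed product of the bigrams on either side of `word`."""
    return (BIGRAMS[(prev, word)] + 1) * (BIGRAMS[(word, nxt)] + 1)


def flag_confusions(sentence, ratio=10.0):
    """Return (index, written, suggested) triples for words whose alternative
    scores at least `ratio` times higher in the same context."""
    tokens = sentence.lower().rstrip(".").split()
    flags = []
    for i, word in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else "<s>"
        nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
        for cset in CONFUSION_SETS:
            if word not in cset:
                continue
            current = score(prev, word, nxt)
            for alt in sorted(cset - {word}):
                if score(prev, alt, nxt) >= ratio * current:
                    flags.append((i, word, alt))
    return flags


print(flag_confusions("Ils on faim."))        # [(1, 'on', 'ont')]
print(flag_confusions("Elles son malades."))  # [(1, 'son', 'sont')]
print(flag_confusions("Ils ont faim."))       # []
```

Raising `ratio` makes this toy checker more conservative (fewer false flags, lower recall), which mirrors the precision-first trade-off the post describes.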

The screenshot below shows this contextual speller in action:

What is a “contextual speller”? As you know, the traditional Office spellchecker flags the odd typo (omission of a letter, permutation of two letters, etc.) with a red squiggly line. The grammar checker deals with agreement mistakes (such as between a verb and its subject, or agreement in number and gender between a noun and an adjective in French). Mistakes involving words that are pronounced alike but spelled differently are very hard to detect, however. Anyone who knows a bit of French knows how frequently people (native and non-native speakers alike) mix up similar-sounding words like son/sont or on/ont. If I write “Ils on faim” (they are hungry), a grammar checker based on a syntactic parser has difficulty detecting the mistake (“on” should read “ont”), because the erroneous sentence is made up of a pronoun (Ils), followed by another pronoun (on, instead of the correct verb form ont) and a noun (faim). This structure is hard to make sense of, since it is not a traditional agreement problem as in “Ils mange du pain” (they eat bread), where “mange” (a singular verb) should read “mangent” (plural form).

Of course, you should not expect this tool to flag every kind of mistake. No existing tool can do that, and a tool that tried to would probably create a lot of false flags, or false positives, which tend to irritate users. I discussed the notions of precision and recall when I blogged about an academic evaluation of our Office 2007 contextual speller. When we developed our tool, we constantly tried to avoid false flags, and our tool has very high precision, which means it rarely makes mistakes when it flags something (in fact, it is right nearly all the time, although there will of course always be mistakes that go undetected). I tend to feel that this new contextual speller will quickly prove to be an indispensable tool for many an Office 2010 user who writes documents in French. It will certainly be a very useful complement to the range of proofing tools we make available to them.
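For reference, here are the standard definitions behind these terms, counting a flagged real error as a true positive (TP), a flag on a correctly used word as a false positive (FP), and an unflagged real error as a false negative (FN). F1 is the balanced F-measure that combines the two:

```latex
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
```

High precision thus means few false flags; high recall means few missed errors.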

Thierry Fontenelle

Microsoft Natural Language Group – Program Manager

Posted by nlgblog | 3 Comments

Proofing Tools in Office 2010

On Monday this week, Microsoft launched the Office 2010 Technical Preview.  Thousands of customers can now download this latest version of Office to try out the cool new features and provide feedback.

Updates and Replacements

The Technical Preview includes proofing tools for English as well as for French and Spanish. We are delighted to announce that these proofing tools include brand-new spell checkers, contextual spellers, hyphenators and thesauri. We are excited about the release of these new proofing tools and eager to receive your feedback.

Improvements in the English Thesaurus

The English thesaurus in particular has seen some changes. You will now be able to find more synonyms in fewer steps. In Office 2007, if you look up the word tables, the only suggestion is table – the same as the lookup word, only in the singular. Next you have to look up the word table, select the synonym you are looking for and manually add an ‘s’ to the word to convert it back to plural. Four steps that include manual edits for one simple lookup. All of this can be done in one simple step in Office 2010. The screenshot below shows the results of the thesaurus in Office 2010 for the word tables:

We call this “inflectional morphology”. Try the same with the word smarter in your Office 2007 version, and you will see why we think that inflectional morphology is a big step forward. We have also added more data to provide more synonyms for your lookups.
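The one-step lookup can be sketched as a three-stage pipeline: reduce the query to its lemma, look up synonyms for the lemma, and re-apply the original inflection to each synonym. The word lists and suffix rules below are hypothetical toys (real English morphology needs a full lexicon, for example for irregular forms); the post does not describe how the Office 2010 thesaurus implements this.

```python
# Hypothetical toy thesaurus keyed by lemma (base form).
THESAURUS = {
    "table": ["counter", "desk", "chart"],
    "smart": ["clever", "bright", "intelligent"],
}


def analyze(word):
    """Map an inflected form to (lemma, feature); feature is None for base forms."""
    if word in THESAURUS:
        return word, None
    if word.endswith("s") and word[:-1] in THESAURUS:
        return word[:-1], "plural"
    if word.endswith("er") and word[:-2] in THESAURUS:
        return word[:-2], "comparative"
    return None, None


def inflect(lemma, feature):
    """Re-apply the feature to a synonym using crude English rules."""
    if feature == "plural":
        return lemma + "s"
    if feature == "comparative":
        # Crude length heuristic standing in for real syllable-based rules.
        return lemma + "er" if len(lemma) <= 6 else "more " + lemma
    return lemma


def lookup(word):
    """Return synonyms already inflected like the query word."""
    lemma, feature = analyze(word.lower())
    if lemma is None:
        return []
    return [inflect(syn, feature) for syn in THESAURUS[lemma]]


print(lookup("tables"))   # ['counters', 'desks', 'charts']
print(lookup("smarter"))  # ['cleverer', 'brighter', 'more intelligent']
```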

Localized Versions and Language Packs

The current Technical Preview includes Proofing Tools for English, French and Spanish only. If you install the Technical Preview, you will not be able to run proofing tools for any other language. For this release we have made significant changes to the proofing infrastructure; as a result, the Language Packs from previous Office versions, including Office 2007, are not compatible with Office 2010. Proofing tools for other languages will become available with the release of the localized versions or with the subsequent release of the Language Packs for Office 2010.

Stefanie Schiller

Microsoft Natural Language Group – Program Manager

Add the Microsoft Translator to the Office Research pane

The Microsoft Translator from Microsoft Research (MSR) is now available for download in the Microsoft Download Center. Another translation service for Microsoft Office? Do you need it? Well, more choices are a good thing, especially in the area of languages and machine translation. After all, you don’t want to inadvertently say something to your grandmother or to your pen pal in a machine-translated letter that you don’t mean to say.

The Microsoft Translator is free, available in many language pairs with more being added soon, and can be used with Microsoft Office 2003 and Microsoft Office 2007. So you don’t need to worry about having an older version of Microsoft Office. The best feature of this new service is the side-by-side comparison of the original document and the translated document. This allows you to easily compare the translation quality to the original document in a single browser window.

If you discover that you prefer one service for one language and another service for a different language, you can easily change the preferred service for each language pair: in the Research pane, under the Search for text box, select Translation, and then click Translation options. In the Translation Options dialog box, under the Machine Translation section, scroll to the right of the relevant language pair and select the service you want from the drop-down menu.

Ultimately, human translation is optimal—so you don’t say something to your grandmother or your pen pal that you need to apologize for later—but machine translation is the next best solution. Having more translation service choices available in the product that you are using to create your letter to your grandmother or your pen pal makes it easier for you to say exactly what you want to say, in the language that you want to say it. You can also quickly understand the gist of the text you have received.

You can find more information about this feature in the “Translate Text” Help asset.

Daj Oberg

Office Content Publishing

Posted by nlgblog | 0 Comments

Check spelling or grammar in another language: a new training course on Office Online

Do you have difficulty checking spelling and grammar in a different language in Office 2007? If you do not know how to set the language of your text, check out the new “Check spelling or grammar in another language” training course, which is designed precisely to help multilingual users. You will learn how to set the language of a text, use the appropriate proofing tools (spelling and grammar), use shortcut keys to run the spelling and grammar checker, find out which proofing tool languages are installed on your computer, change the Word 2007 dictionary language from English (United States) to, say, English (United Kingdom), and more.

A very useful resource, courtesy of our Office Online colleagues…

Thierry Fontenelle – Program Manager

Office 2007 Language Interface Packs with Luxembourgish and Irish spell-checkers

The range of languages for which Office 2007 offers a spell-checker is constantly expanding, as the regular readers of this blog know. Luxembourgish and Irish have just been added to this list. Users of Office 2007 can now freely download a Language Interface Pack (LIP) for either of these languages. The pack will allow them to change the language of the user interface, if they wish. They can also keep the original language of their version of Office (English, French, German…) and simply use the Luxembourgish or Irish speller included in these packs.

The two new spellers have been created with the Lexicon Creator I presented at the European Association of Lexicography (Euralex) conference in 2008.

Luxembourgish is a Germanic language spoken by about 390,000 people, mainly in the Grand Duchy of Luxembourg. It is one of the national languages of that country (German and French being two of its administrative languages).

Irish is the national language of the Republic of Ireland. It is also one of the official languages of the European Union (since January 1st, 2007). About 1.8 million people can understand that language to some degree (it is the mother tongue of around 70,000 people).

These language packs have been developed within the framework of the Microsoft Local Language Program, whose aim is to preserve local and regional languages and cultures and to enable users of software to work with interfaces in their own languages. You can click on the following links to download these two new LIPs:

·         Luxembourgish Language Interface Pack for Microsoft Office 2007

·         Irish Language Interface Pack for Microsoft Office 2007

-- Thierry Fontenelle (Program Manager)

Why is chef-d’œuvre my favorite French word?

In this post, I would like to share with you the reasons why I love the French word chef-d’œuvre (=masterpiece). My interest in this word has nothing to do with its meaning. As a program manager working with computational linguists, I find it fascinating because it epitomizes the numerous difficult decisions one has to make when building natural language processing systems like word-breakers, tokenizers, spell-checkers, etc.

One of the most vexing problems in NLP is to decide whether hyphens and apostrophes are breaking characters or not. The identification of word boundaries (tokenization) is indeed essential, as I have argued elsewhere in an attempt to show that word-breaking is not easy. Hyphens frequently separate distinct tokens, as in le match France-Italie (nobody would argue that France-Italie is one word and nobody would expect to find the whole string in a dictionary). In chef-d’œuvre, however, the hyphen is part of the word and everyone will expect the whole string to be granted entry status in a dictionary. It should be considered as one token that has little to do with the word chef (which typically refers to a person), unless one considers the etymology of the word. This means that, in a search scenario, a user would not consider a document containing the words chef and oeuvre used separately as relevant if that user typed the keyword chef-d’œuvre.

The apostrophe in chef-d’œuvre is also interesting. Apostrophes are frequently used in French elided forms when a pronoun or a determiner is followed by a word that starts with a vowel (consider l’école [the school], je l’aime [I love her], j’arrive [I’m coming]). In such cases, it is normal to consider that the string is made up of two distinct tokens (l’école -> l’ + école). The apostrophe in chef-d’œuvre has a different status and is an integral part of the word, very much like in other French words such as aujourd’hui, prud’homme, and a few others.
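The tokenization decisions described in the last two paragraphs can be sketched as a small rule-based tokenizer: an exception lexicon keeps lexicalized forms such as chef-d’œuvre and aujourd’hui whole, other hyphens separate tokens (France-Italie), and elided forms are split after the apostrophe (l’école → l’ + école). The lexicon and the list of elided prefixes are illustrative assumptions, not the word-breaker used in Microsoft products.

```python
import re

# Hypothetical exception lexicon: here the hyphen/apostrophe is part of the word.
LEXICALIZED = {"chef-d'œuvre", "hors-d'œuvre", "main-d'œuvre",
               "aujourd'hui", "prud'homme"}

# Elided clitics/determiners, split off the following word (l'école -> l' + école).
ELISION = re.compile(r"(qu'|[cdjlmnst]')(\w.*)", re.IGNORECASE)


def tokenize(text):
    text = text.replace("\u2019", "'")  # normalize the typographic apostrophe
    tokens = []
    for chunk in re.findall(r"[\w'-]+|[^\w\s]", text):
        if chunk.lower() in LEXICALIZED:
            tokens.append(chunk)  # one token, as a dictionary entry would be
            continue
        for part in chunk.split("-"):  # any other hyphen separates tokens
            if not part:
                continue
            match = ELISION.fullmatch(part)
            if match:
                tokens.extend(match.groups())
            else:
                tokens.append(part)
    return tokens


print(tokenize("Ce chef-d’œuvre est à l’école."))
print(tokenize("le match France-Italie"))
```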

The word chef-d’œuvre is also interesting because it includes a special character, œ, known as a “ligature” (two or more letters joined together). Many other French words include ligatures such as œ or æ (œuf [egg], sœur [sister], cœur [heart], cæcum …), and many other languages use characters that are not traditionally found in English (the German ß or the Spanish ñ are cases in point). This reminds us that many NLP projects started with applications developed initially for English and subsequently required specific changes to take into account the non-ASCII characters found in many other languages. Until very recently, the OpenOffice.org French spell-checker flagged forms with ligatures like sœur or œuf as incorrect and only accepted the incorrect spellings with two separate characters (soeur, oeuf…). With the advent of Unicode, such problems are probably less frequent today, but it is clear that any multilingual project needs to consider idiosyncrasies such as the use of diacritics and other special characters that some languages love so much…
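The ligature issue is easy to demonstrate in Python: Unicode normalization does not decompose œ or æ (they have no decomposition mapping), whereas a typographic ligature such as ﬁ (U+FB01) does decompose to fi under NFKD. A speller that stores the ligature spellings and wants to suggest sœur for soeur therefore needs an explicit mapping, as in this hypothetical sketch (the lexicon and function names are illustrative only):

```python
import unicodedata

# NFKD leaves œ untouched but expands the presentation ligature U+FB01.
print(unicodedata.normalize("NFKD", "sœur"))     # sœur
print(unicodedata.normalize("NFKD", "\ufb01n"))  # fin

LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"}


def fold_ligatures(word):
    """Spell out ligatures as two letters to build a lookup key."""
    return "".join(LIGATURES.get(ch, ch) for ch in word)


# Hypothetical speller lexicon that stores the correct ligature spellings.
LEXICON = {"sœur", "œuf", "cœur", "cæcum"}
BY_FOLDED_KEY = {fold_ligatures(w): w for w in LEXICON}


def check_spelling(word):
    """Return (is_correct, suggestion or None)."""
    if word in LEXICON:
        return True, None
    return False, BY_FOLDED_KEY.get(fold_ligatures(word))


print(check_spelling("sœur"))   # (True, None)
print(check_spelling("soeur"))  # (False, 'sœur')
```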

From a morphological point of view, the word chef-d’œuvre is also atypical. While regular nouns typically take a final “s” in the plural in French (singular maison [house] -> plural maisons), the form *chef-d’œuvres is incorrect and should be flagged by a spell-checker, even if œuvres is correct on its own in other contexts. Rather, the plural is formed by adding an “s” at the end of the subtoken chef: des chefs-d’œuvre.
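A spell-checker can get this right by generating plural forms from its lexicon instead of blindly appending “s” to the whole string. The toy pluralizer below is an illustration only: it pluralizes just the head of compounds built with -d’ and ignores the many other irregular French plurals (e.g. -al → -aux).

```python
def pluralize(noun):
    """Toy French pluralizer (illustration only)."""
    noun = noun.replace("\u2019", "'")  # accept typographic apostrophes
    if "-d'" in noun:
        # In "X-d'Y" compounds only the head X takes the plural mark.
        head, rest = noun.split("-d'", 1)
        return pluralize(head) + "-d'" + rest
    if noun.endswith(("s", "x", "z")):
        return noun  # nouns ending in s/x/z are unchanged in the plural
    return noun + "s"


# Hypothetical lexicon; the speller accepts each entry and its generated plural.
LEXICON = ["maison", "chef-d'œuvre", "hors-d'œuvre"]
VALID_FORMS = {form for w in LEXICON for form in (w, pluralize(w))}

for form in ["maisons", "chefs-d'œuvre", "chef-d'œuvres", "hors-d'œuvre"]:
    print(form, "OK" if form in VALID_FORMS else "flag")
```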

Finally, if you consider how the word is pronounced, it is clear that chef-d’œuvre poses a number of challenges: why is the “f” pronounced in chef [ʃɛf] but not in chef-d’œuvre [ʃɛdœvR]? An interesting problem for my colleagues who create text-to-speech systems…

Chef-d’œuvre is certainly not the only complex word which encapsulates so many difficulties for those of us who create NLP applications. I could probably also have chosen le hors-d’œuvre [appetizer], which begins with an aspirated ‘h’ and does not admit elision, unlike homme [man] -> l’homme. Main-d’œuvre [manpower] would probably have been a nice candidate, too. In fact, there is no dearth of thorny issues linguists need to deal with. Well, languages are difficult, aren’t they? That’s probably what makes my job here so challenging and so interesting …

-- Thierry Fontenelle (Program Manager)

Natural Language Group blog featured in Language Tech News

The post I wrote a few months ago about how users can remove a word from the main dictionary of their Office speller has been reprinted in the latest edition of Language Tech News (vol.2, No.4, February 2009), a publication of the Language Technology Division of the American Translators Association (ATA). Besides some technical details about how to use these “exclude dictionaries”, which the editor of the Newsletter felt would interest his readers, this post was a nice opportunity to show how the new contextual spellers in Office 2007 reduce the need that some users feel to exclude certain words from their speller dictionary.

This issue of Language Tech News also includes interesting articles on Trados’s translation memory technology, terminology management tools, font converters, and statistical machine translation (SMT). It is interesting to read the following comment about the Windows Live Translator, the MT system created by our Microsoft Research colleagues: « Perhaps the most successful MT application in the world today, the Microsoft Knowledge Base, used by hundreds of millions of users across the globe, is mostly a SMT-based effort » (p.17).

Thierry Fontenelle (Program Manager)

Posted by nlgblog | 1 Comments

Office 2007 Language Interface Packs and spell-checkers for Armenian, Georgian, Telugu, Konkani, Punjabi, Kannada, and Oriya

In a recent post, Language Log discussed the localization of software into under-resourced languages like Yoruba. Mark Liberman noted that, via its “Unlimited Potential Program”, Microsoft had probably done more for linguistic diversity than any other software publisher (and perhaps more than the free software community) by providing localized versions of its software in dozens of languages. Meanwhile, the Microsoft Local Language Program has now made available a whole series of new Language Interface Packs for Office 2007. These LIPs enable users to work with user interfaces in their own languages while benefiting from spell-checkers (some of these languages had no speller in earlier versions of Office). They can be downloaded free of charge from the links you will access by clicking on the names of the languages below (five of these languages are spoken in India):

·         Armenian (or հայերեն լեզու, if you want to write it in Armenian; 7 million speakers, including 3 million in Armenia)

·         Georgian (4.1 million native speakers; the official language of the Republic of Georgia; anecdotally, consonant clusters are common in Georgian: some words contain up to 8 consecutive consonants, like გვბრდღვნი (gvbrdghvni), you tear us; nouns have 8 cases and verbal morphology is very complex)

·         Telugu (or తెలుగు; one of the four classical languages of India and one of the 22 official languages of the country; it is the official language of the state of Andhra Pradesh and is also spoken in Tamil Nadu, Karnataka, Orissa, and Pondicherry)

·         Konkani (or कोंकणी; 7.6 million speakers; one of the official languages of the Republic of India, mainly spoken in the Indian state of Goa; the LIP includes a brand-new speller)

·         Kannada (or ಕನ್ನಡ; it is one of the major Dravidian languages of India and the official and administrative language in the Indian state of Karnataka, in the South of India; about 35 million speakers)

·         Punjabi (or ਪੰਜਾਬੀ; an Indo-Aryan language spoken in the Punjab region, which is now split between India and Pakistan; it has 90 million native speakers, which makes it the 11th most widely spoken language in the world).

·         Oriya (or ଓଡ଼ିଆ; one of the official languages of India, mainly spoken in the Indian state of Orissa; about 30 million native speakers; the LIP includes a brand-new spell-checker for that language)

-- Thierry Fontenelle (Program Manager)

Hotfix for German hyphenation

Microsoft has released a hotfix for German hyphenation in Office 2007. This hotfix corrects a number of incorrect hyphenations involving the separation of prefixes (e.g. be-fasst, ein-lässt, auf-läuft, um-bringt), the separation of suffixes (e.g. Kennt-nis), and the separation of compound words (e.g. Haus-tür, Glas-tisch). The hotfix only affects the German (Germany) language setting, where the faulty hyphenations had occurred; German (Austria) and German (Switzerland) are not affected.
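To illustrate why morpheme boundaries matter, here is a minimal sketch that splits a German word into known prefixes, stems, and suffixes and hyphenates only at those boundaries. The morpheme list is a hypothetical toy, and a real hyphenator also breaks inside morphemes at syllable boundaries; this is not how the hotfix itself is implemented.

```python
from functools import lru_cache

# Hypothetical morpheme inventory (prefixes, stems, suffixes).
MORPHEMES = {"be", "ein", "auf", "um", "fasst", "lässt", "läuft", "bringt",
             "kennt", "nis", "haus", "tür", "glas", "tisch"}


def morpheme_split(word):
    """Split `word` into known morphemes (fewest pieces), or None if impossible."""
    lower = word.lower()

    @lru_cache(maxsize=None)
    def best(i):
        # Returns the end offsets of the best segmentation of lower[i:].
        if i == len(lower):
            return ()
        candidates = []
        for j in range(i + 1, len(lower) + 1):
            if lower[i:j] in MORPHEMES:
                rest = best(j)
                if rest is not None:
                    candidates.append((j,) + rest)
        return min(candidates, key=len) if candidates else None

    ends = best(0)
    if ends is None:
        return None
    pieces, start = [], 0
    for end in ends:
        pieces.append(word[start:end])  # slice the original to keep capitals
        start = end
    return pieces


def hyphenate(word):
    pieces = morpheme_split(word)
    return "-".join(pieces) if pieces else word


for w in ["befasst", "einlässt", "umbringt", "Kenntnis", "Haustür", "Glastisch"]:
    print(hyphenate(w))
```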

The hotfix is available via the KB article http://support.microsoft.com/kb/960500/en-us. Further information about the installation requirements can be found in the KB article. The hotfix will also be included in one of the next service packs.

Stefanie Schiller - Program Manager

Posted by nlgblog | 2 Comments

A context-sensitive speller for Spanish in Office 2007

A few weeks ago, Microsoft announced an initiative targeting the Hispanic community, with special offers for Microsoft Office 2007 and Microsoft Office 2007 Language Pack in Spanish.

It may be worth pointing out that the Spanish proofing tools in Office 2007 include a brand-new context-sensitive speller in addition to the regular spell-checker, thesaurus, hyphenator and grammar checker. We have discussed the English context-sensitive speller on multiple occasions on this blog and regular readers know that blue squiggly lines now appear under contextual mistakes (real-word errors which traditional spellers cannot flag) such as:

Nobody knows weather he is innocent or guilty. (→ whether)

President Bush addresses Untied Nations General Assembly. (→ United)

People say your hole life flashes before your eyes before you die. (→ whole)

Life insurance plays a very essential part in our every day life. (→ everyday)

If you teach Spanish, if you are a translator, or if you write documents in Spanish regularly, you will certainly understand how difficult it can be to spot the erroneous use of the word echo in a context where hecho should in fact be used. The blue squiggles under contextual mistakes testify to the progress made in this area in Office 2007 (if you right-click on the word echo in the sentence “Lo ha echo muy bien”, you will see that the speller correctly suggests hecho, as is shown in the examples below).

Missing accents (as in medico, which should be médico in the sentence “Fuimos al medico ayer”) are very frequent mistakes as well, and this new tool should prove useful for identifying these blatant errors before you publish your document or send your email.
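Accent errors such as medico for médico are a natural fit for the same statistical idea, because medico is itself a valid Spanish word (a form of the verb medicar) and therefore passes a plain spell-checker. The sketch below groups words by their accent-stripped form and uses hypothetical bigram counts to prefer the variant that fits the preceding word; the counts and function names are illustrative assumptions, not the Office implementation.

```python
import unicodedata
from collections import Counter


def strip_accents(word):
    """Drop combining marks after canonical decomposition: 'médico' -> 'medico'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical lexicon in which both spellings are valid words.
LEXICON = {"médico", "medico"}
VARIANTS = {}
for w in LEXICON:
    VARIANTS.setdefault(strip_accents(w), set()).add(w)

# Toy counts of (previous word, word) pairs.
BIGRAMS = Counter({("al", "médico"): 400, ("al", "medico"): 1,
                   ("yo", "medico"): 30, ("yo", "médico"): 5})


def suggest_accent(prev, word, ratio=10):
    """Return a differently accented variant that is at least `ratio` times
    more frequent after `prev`, or None."""
    current = BIGRAMS[(prev, word)] + 1
    best, best_count = None, 0
    for alt in VARIANTS.get(strip_accents(word), set()) - {word}:
        count = BIGRAMS[(prev, alt)] + 1
        if count >= ratio * current and count > best_count:
            best, best_count = alt, count
    return best


def check_sentence(sentence):
    tokens = sentence.lower().split()
    pairs = zip(["<s>"] + tokens, tokens)  # (previous word, word)
    return [(w, s) for prev, w in pairs if (s := suggest_accent(prev, w))]


print(check_sentence("Fuimos al medico ayer"))  # [('medico', 'médico')]
print(check_sentence("Yo medico a mi perro"))   # []
```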

 -- Thierry Fontenelle (Program Manager)

Posted by nlgblog | 2 Comments

An academic evaluation of the Office 2007 contextual spelling checker

A few days ago, I discovered an analysis of our Office 2007 contextual speller carried out by Prof. Graeme Hirst from the University of Toronto: An Evaluation of the Contextual Spelling Checker of Microsoft Office Word 2007.

We have discussed this new context-sensitive speller on several occasions on this blog (as well as here) and it is nice to see that it is attracting the attention of researchers in the academic world.

It’s an interesting paper, which provides some food for thought, especially with respect to how “aggressive” we should be in our approach to recall.

His conclusion nicely sums up our trade-offs and dilemmas (emphasis mine):

In an evaluation on 1400 examples, it is found to have high precision but low recall — that is, it fails to find most errors, but when it does flag a possible error, it is almost always correct.

The contextual spelling corrector in Microsoft Office Word 2007 is a cautious (low recall) but believable (high precision) system. However, its overall performance, as measured by F, is much poorer than that of the trigram method of Mays et al (1991).

The trade-off between the two systems is a difficult one. In simple terms, better performance is better; but believability is an important attribute for a consumer-level system (“if Word says it’s wrong then it’s wrong”) and could well be considered worth sacrificing performance for.  The problem with this, however, is that as users become familiar with the system, their expectations will rise and believability will start to apply also to what Word fails to flag (“If Word says it’s right then it’s right”).

A system that is more visibly error-prone might actually serve users better.

The methodology used by Prof. Hirst and his colleagues to evaluate the system deserves a few comments:

·         They automatically induced real-word errors by replacing words with spelling variants found in the lexicon of the ispell spelling checker, limiting the manipulation to an edit distance of 1. These errors are therefore artificial rather than natural mistakes.

·         They did not consider “malapropisms” (real-word mistakes) involving closed-class words, or words formed by the insertion or deletion of an apostrophe or by splitting a word: this means they excluded pairs which we have found to be extremely frequent in real texts (then/than; your/you’re; its/it’s; everyday/every day; to/too; their/there/they’re…). These pairs feature prominently in any analysis of real mistakes, especially in the literature devoted to English as a Second Language. It is also well known that many native speakers of English have difficulty mastering these confusables, which is why we decided to specifically target them.

·         They did not include phonetic confusables such as cymbal/symbol, principle/principal, pear/pair, there/their which have an edit distance > 1.
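The edit-distance criterion is easy to check with a standard Levenshtein distance (insertions, deletions, and substitutions each cost 1). Running it on the pairs mentioned above shows that several excluded pairs (then/than, to/too, its/it’s, everyday/every day) are within distance 1 and were left out for other reasons (closed-class words, apostrophes, word splitting), whereas your/you’re and the phonetic confusables are all at distance 2.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


PAIRS = [("then", "than"), ("to", "too"), ("its", "it's"),
         ("everyday", "every day"), ("your", "you're"),
         ("cymbal", "symbol"), ("principle", "principal"),
         ("pear", "pair"), ("there", "their")]

for a, b in PAIRS:
    print(f"{a}/{b}: distance {levenshtein(a, b)}")
```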

The categories they did not include in their tests are precisely those we focused on, because flagging these real and frequent mistakes is very useful for users of Office and Word. Assessing the “performance” of a system while ignoring them may therefore be a bit unfair, at least if one equates “performance” with “usefulness” (will users find the system more useful if we flag “have not lost monkey” (→ money), a rare and unnatural mistake, or if we flag “it is to expensive”, a mistake our data shows to be very frequent and which we seem to be good at flagging?). Recall would be a lot higher if pairs involving closed-class words and the standard phonetic confusables above were taken into account: our own metrics, based on a large corpus of real mistakes, show that our recall is in fact higher than the 20-25% found by Hirst, at around 40%. The alternative methods he proposes have even higher recall (50%), but their precision (50%) is far lower than our system’s (96%). Hirst clearly favors recall. His question is: do people want to use a system like Microsoft’s, which spots only one mistake out of 5 (our metrics show it is in fact closer to 2 out of 5, i.e. 40%) but is right nearly all the time? Our question is: would users really want a system based on the trigram method advocated by Prof. Hirst, which flags 50% of the mistakes but is wrong in 50% of the cases? The feedback we generally get indicates that our users tend to prefer unobtrusive tools and switch off a tool they consider unreliable.
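As a worked example using only the figures quoted in this post, the balanced F-measure (F1, the harmonic mean of precision and recall) can be computed for each scenario. This mixes precision and recall figures from different evaluations, so the result is indicative only, and Hirst's paper may use a different F variant.

```python
def f1(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs taken from the figures quoted in the post.
SCENARIOS = {
    "Word 2007, recall as measured by Hirst (20%)": (0.96, 0.20),
    "Word 2007, recall as measured by Hirst (25%)": (0.96, 0.25),
    "Word 2007, recall on real mistakes (40%)": (0.96, 0.40),
    "Alternative method (50% / 50%)": (0.50, 0.50),
}

for name, (p, r) in SCENARIOS.items():
    print(f"{name}: F1 = {f1(p, r):.2f}")
```

With recall of 20-25%, F1 (0.33-0.40) falls below the 0.50 of a 50%/50% system, which is consistent with Hirst's conclusion; with the 40% recall measured on real mistakes, it would rise to about 0.56.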

Interesting debate, isn’t it? I am really grateful to Prof. Hirst for making this discussion possible.

So, what do you think? We are interested in hearing your opinion. Do you prefer a tool which casts the net as wide as possible and catches many mistakes, at the risk of being frequently wrong and of creating many false flags (false positives), or do you prefer a tool which does not catch all possible mistakes, but which you can trust when it does catch one? Do not hesitate to leave your comments below…

Thierry Fontenelle – Program Manager

English Grammar Checker, Fragments, and Settings

A French translator was asking an interesting question the other day on the Word community newsgroup. He wanted to know how he could switch off the grammar rule which flags “Fragments”, i.e. incomplete sentence fragments that the writer is invited to revise. The user did not want to turn off the English spell-checker, which he found very useful, or even the grammar checker, but only that particular rule, which he found particularly annoying when he was translating texts into English.

As can be seen below, the English grammar checker flags sentences that are considered incomplete (consider the fragment “And oranges.”, which is green-squiggled, or a verb-less sentence like “He happy.”). If you right-click on the squiggled string, you will see that the grammar checker advises you to consider revising this fragment.  

 

The grammar checker is aware of the context in which the fragment has been used. A fragment such as “Flight to Paris” will not be flagged if it is not followed by a period (for instance in a title). The grammar checker will not say anything either if the same fragment is used in a bulleted or numbered list, as in the example below. In contexts other than titles, headings, or lists, however, it may seem reasonable to draw the user’s attention to a suspicious fragment. The user is of course free to ignore these flags.
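The context rule described above can be sketched as a simple decision function. This is a toy illustration, not Word's implementation: deciding whether a string is a fragment at all requires a parser, so the `is_fragment` field below is a hypothetical input, and the `Paragraph` class and `should_flag_fragment` function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    is_fragment: bool           # hypothetical: verdict of a parser's completeness check
    is_heading: bool = False    # title or heading style
    is_list_item: bool = False  # bulleted or numbered list


def should_flag_fragment(p: Paragraph) -> bool:
    """Toy version of the context rule: flag a fragment only in running text."""
    if not p.is_fragment:
        return False
    if p.is_heading or p.is_list_item:
        return False  # headings and list items are routinely verb-less
    # Without a final period the string reads like a title, so stay silent.
    return p.text.rstrip().endswith(".")


print(should_flag_fragment(Paragraph("And oranges.", is_fragment=True)))     # True
print(should_flag_fragment(Paragraph("Flight to Paris", is_fragment=True)))  # False
print(should_flag_fragment(Paragraph("Flight to Paris.", is_fragment=True, is_list_item=True)))  # False
```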

Let us come back to our translator’s original question. It is indeed possible to customize the grammar checker and to prevent this rule from firing if one finds it useless. To do so, open Word Options (via the Office button in the top-left corner). Click on Proofing, then on Settings in the section “When correcting spelling and grammar in Word”. You will then see the list of grammar rules used by the English grammar checker, as displayed in the screenshot below. You just need to uncheck the second rule (Fragments and Run-ons), between the “Capitalization” rule and the “Misused words” rule. Done! The green squiggle will no longer appear under fragments like the ones shown in the examples above.

 

If you are writing a document in multiple languages, place the cursor in text whose language has been set to English before opening these settings: the English grammar checker only works on text whose language is set to English. If the text is in French, the rules of the French grammar checker are obviously very different (the same applies to any other language), and you can customize them using the method described above. The settings of the French grammar checker are then displayed as follows:

 

As you can see, users have some freedom to decide how they want to use their grammar checker. For my part, I have unchecked the rule “Style – Contractions” in my English grammar checker because I constantly use contracted forms like “I’ll” and “you’ll” when writing to my colleagues. I know, however, that in many businesses, users who write very formal documents want to spot these contracted forms and replace them with fuller forms like “I will” and “you will”. That is why this range of settings is offered to the users of these complex tools.
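To illustrate what switching off a rule such as “Style – Contractions” means, here is a minimal Python sketch of a contraction rule with an on/off setting. The `EXPANSIONS` table, the `flag_contractions` function and its `enabled` parameter are hypothetical, and the sketch makes no claim about how Word implements its own rule, which covers far more forms.

```python
import re

# Hypothetical, deliberately small table of contractions and their fuller forms.
EXPANSIONS = {
    "i'll": "I will",
    "you'll": "you will",
    "don't": "do not",
    "it's": "it is",
}

# A word, a straight or curly apostrophe, then more letters (e.g. I’ll, don't).
CONTRACTION_RE = re.compile(r"\b[A-Za-z]+['’][A-Za-z]+\b")


def flag_contractions(text, enabled=True):
    """Return (contraction, suggestion) pairs, or [] when the rule is switched off."""
    if not enabled:
        return []
    flags = []
    for match in CONTRACTION_RE.finditer(text):
        word = match.group(0)
        suggestion = EXPANSIONS.get(word.replace("’", "'").lower())
        if suggestion is None:
            continue  # not in our table: leave it alone
        if word[0].isupper():
            suggestion = suggestion[0].upper() + suggestion[1:]  # keep sentence-initial capitals
        flags.append((word, suggestion))
    return flags


print(flag_contractions("I’ll send it today. You'll see."))
print(flag_contractions("I’ll send it today.", enabled=False))
```

Turning the setting off simply short-circuits the rule, which is the behaviour a user gets by unchecking it in the settings dialog.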

Do not hesitate to send your feedback or your comments.

-- Thierry Fontenelle (Program Manager)

Can I remove a word from Office’s speller dictionary?

The other day, I was discussing a number of suggestions to improve Office’s spell-checker. A customer was suggesting we should allow users to delete individual items from Word’s spell-checker lexicon. This feature is already available, in fact: if you want to specify a preferred spelling for a word and to exclude a given spelling from the main lexicon used by the Office speller, you need to use an “exclusion dictionary”. Your speller comes with an empty exclusion dictionary and you can add words to it if you want them to be permanently red-squiggled.

You first need to locate your exclusion dictionary, which, if you use Vista and Office 2007, can be found in the following folder:

C:\Users\User Name\AppData\Roaming\Microsoft\UProof\

Each language has a specific dictionary whose name starts with “ExcludeDictionary”, followed by the language code (EN for English, FR for French, SP for Spanish, GE for German…), followed by the LCID (locale identifier), written as four hexadecimal digits. The extension is .lex. For instance:

English (US): ExcludeDictionaryEN0409.lex
English (UK): ExcludeDictionaryEN0809.lex
English (Australia): ExcludeDictionaryEN0c09.lex
English (Canada): ExcludeDictionaryEN1009.lex
French: ExcludeDictionaryFR040c.lex
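The naming convention above is regular enough to be computed. The following Python sketch builds the file name from the language code and the numeric LCID (formatted as four lowercase hexadecimal digits, which matches the examples above) and the full path under the %APPDATA% folder, which on Vista points to C:\Users\User Name\AppData\Roaming. The function names are ours, and the language code has to be supplied by the caller; the sketch does not look it up.

```python
import os
from pathlib import PureWindowsPath


def exclusion_dictionary_name(lang_code: str, lcid: int) -> str:
    """Build e.g. ExcludeDictionaryEN0409.lex from a language code and a numeric LCID."""
    return f"ExcludeDictionary{lang_code}{lcid:04x}.lex"


def exclusion_dictionary_path(lang_code: str, lcid: int, appdata=None) -> PureWindowsPath:
    """Full path of the exclusion dictionary under %APPDATA% (the Roaming folder on Vista)."""
    appdata = appdata or os.environ.get("APPDATA", r"C:\Users\User Name\AppData\Roaming")
    return PureWindowsPath(appdata, "Microsoft", "UProof", exclusion_dictionary_name(lang_code, lcid))


# Prints the five file names listed above.
for code, lcid in [("EN", 0x0409), ("EN", 0x0809), ("EN", 0x0C09), ("EN", 0x1009), ("FR", 0x040C)]:
    print(exclusion_dictionary_name(code, lcid))
```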

 

You can open the file with Notepad or WordPad and add a word which you want the speller to flag as misspelled. Save and close the file. You are done!
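If you prefer to script this step rather than use Notepad, the sketch below adds a word to an exclusion dictionary. It assumes one word per line, and its `encoding` parameter is an assumption you need to check: it must match the encoding in which the existing .lex file was saved. The `add_excluded_word` helper is hypothetical.

```python
from pathlib import Path


def add_excluded_word(path, word, encoding):
    """Add `word` to an exclusion dictionary (one entry per line); return False if already present.

    `encoding` must match the encoding the existing .lex file was saved in.
    """
    path = Path(path)
    words = []
    if path.exists():
        text = path.read_text(encoding=encoding)
        words = [line.strip() for line in text.splitlines() if line.strip()]
    if word in words:
        return False  # nothing to do
    words.append(word)
    # Rewrite the whole file with Windows line endings, as Notepad would.
    with open(path, "w", encoding=encoding, newline="\r\n") as f:
        f.write("\n".join(words) + "\n")
    return True
```

For example, calling it with "manger" would make the speller flag every occurrence of manger, including the legitimate ones discussed below.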

 

You can type “exclude dictionary” or “exclusion dictionary” in the Office help to get more information about this feature.

Of course, caution should be exercised when you decide to remove a word from your Office speller. If you decide to remove the word manger because you frequently type program manger instead of program manager, you should not be surprised when your speller flags manger in a sentence like “Jesus was laid in a manger”. This is why we have introduced a contextual speller, which tries to identify words which exist but are misspelled in a given context (see the post I was referring to, in which I showed how Office 2007 flags some erroneous uses of manger in program manger).

To give another example where contextual spelling might be preferred over exclusion, consider the user who had contacted the Word newsgroup to find out how to exclude the word “ahs” from the main speller lexicon. This user kept typing ahs instead of has. The new context-sensitive speller in Office 2007 flags a number of contexts where "ahs" should not be used, however, which should address this user's problem without having to remove the word altogether from the lexicon. You will see a blue squiggly line under "ahs" if you write something like "He ahs never done it before", for instance. But you will not get any flag under "ahs" if you write "we definitely got oohs and ahs all around when we launched this product".
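Office's contextual speller is not described in detail here, so the following toy Python sketch uses a hand-written rule table instead. It only shows why looking at context lets a speller keep a word like "ahs" in its lexicon while still flagging its misuse. The `RULES` table and the `contextual_flags` function are hypothetical.

```python
import re

# Hypothetical confusion rules: a real word that is often a typo -> (intended word,
# left-hand neighbours after which the typo reading is far more likely).
RULES = {
    "ahs": ("has", {"he", "she", "it", "who", "that"}),
    "manger": ("manager", {"program", "project", "product"}),
}


def contextual_flags(sentence):
    """Return (word, suggestion) pairs for real words that look misused in this context."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    flags = []
    for i in range(1, len(tokens)):  # the first token has no left context
        rule = RULES.get(tokens[i].lower())
        if rule is None:
            continue
        intended, triggers = rule
        if tokens[i - 1].lower() in triggers:
            flags.append((tokens[i], intended))
    return flags


print(contextual_flags("He ahs never done it before"))                # [('ahs', 'has')]
print(contextual_flags("we definitely got oohs and ahs all around"))  # []
print(contextual_flags("Jesus was laid in a manger"))                 # []
```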

Thierry Fontenelle – Program Manager

Suggested improvements to Microsoft Word’s spell-checker

A few days ago, James wrote about the articles the Seattle Times published about our Natural Language Group and the Office spell-checkers. One of these articles encouraged Seattle Times readers to suggest improvements (What words would you add to the Microsoft spell-checker?). It is very interesting to read what our users consider pain points. The following suggestions from one reader (RogerKni) deserve a closer look:

Here are my four suggested improvements to Word's spell-checker:

1. Give the user the option to flag rare words with an orange wavy line. An example would be “manger,” which 99% of the time is a misspelling of “manager.” Or “fro”: usually a misspelling of “for.” Or “whey” for “why.” Allow users to delete or add individual items to the Word-provided list of rarities.

2. Automatically add the apostrophized version of any noun added to a dictionary.

3. Add a “picky-mode” option, which one could turn on to get the preferred spelling of certain words. Currently, if there are two options for spelling a word, Word flags neither. That’s usually what’s wanted—the user doesn’t want to be harassed about a minor issue. But sometimes, as when publishing a book, the user wants to pick the best spelling. (Second-best spellings could be flagged with a wavy purple line.)

4. Give the user the option to tell Word to aggressively correct misspellings with what it thinks is the Best Match. Some would prefer this to Word’s cautious policy of merely flagging such words in red and making the user choose the correction.

In past releases, we looked into solving the problem described in Suggestion (2) for our English users. The solution presupposes identifying which added words are (singular) nouns. We could ask users to tell us, opting into the ’s form for words like Palin (Palin’s), but not for the plural noun subproblems (*subproblems’s), the adjective semicontinuous (*semicontinuous’s) or dermatoglyphics (*dermatoglyphics’s; see my colleague Mari Olsen’s post on possessives and apostrophes). Of course, we could also guess from the words we already know (continuous is an adjective, so semicontinuous probably is too; problems is a plural noun, so subproblems probably is too), but the cost of that computation would have to be weighed against the benefit to the user.
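The guessing approach mentioned above can be sketched as follows: look for the longest known word that a new word ends with and borrow its part of speech, unless the user explicitly says whether the word is a singular noun. The tiny `LEXICON`, the part-of-speech labels and the function names are all hypothetical; this is an illustration, not a description of anything Word does.

```python
# Hypothetical part-of-speech lexicon; a real speller lexicon would be far larger.
LEXICON = {"problems": "plural_noun", "problem": "noun", "continuous": "adjective"}


def guess_pos(word, lexicon, min_tail=4):
    """Guess a new word's part of speech from the longest known word it ends with."""
    w = word.lower()
    if w in lexicon:
        return lexicon[w]
    for start in range(1, len(w) - min_tail + 1):
        tail = w[start:]  # tails shrink as start grows, so the first hit is the longest
        if tail in lexicon:
            return lexicon[tail]
    return None


def forms_to_add(word, lexicon, user_says_singular_noun=None):
    """Forms to store when `word` is added to the custom dictionary."""
    if user_says_singular_noun is None:
        is_noun = guess_pos(word, lexicon) == "noun"
    else:
        is_noun = user_says_singular_noun  # an explicit opt-in or opt-out wins over guessing
    return [word, word + "'s"] if is_noun else [word]


print(forms_to_add("Palin", LEXICON, user_says_singular_noun=True))  # ['Palin', "Palin's"]
print(forms_to_add("subproblems", LEXICON))                          # ['subproblems']
```

The guess fails as soon as the lexicon lacks the relevant base word (dermatoglyphics gets no ’s form here only because nothing matches), which is one reason the computation has to be weighed against the user benefit.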

Suggestion (1) is in fact already covered in a large number of cases in Office 2007 (albeit not with an orange wavy line: we used a blue wavy line to signal contextual mistakes, which have been mentioned on several occasions on this blog). Look at the following screenshots, which show that the contextual speller in Office 2007 is able to flag words like “manger” or “fro” used in the wrong contexts:


Note, too, that users have the possibility of deleting words from the Office speller lexicon. We will come back to this issue in a future post (if you are impatient, type “exclude dictionary” or “exclusion dictionary” in the Help file). You may use that feature to exclude the non-preferred variants (of course, this is an individual decision you need to make: we cannot impose your own preferred spelling onto everyone).

Last but not least, suggestion (4) also exists: internally, it is called AutoReplace. This feature “aggressively” corrects misspellings with what it thinks is the best match, as RogerKni suggests. Try typing “infomation”, for instance, and you will see that Word automatically corrects it to information as soon as you hit the space bar. To activate this feature, click the big Office button in the top-left corner of your Office application, then click Word Options, Proofing, and AutoCorrect Options. At the bottom of the screen, you will see the option “Automatically use suggestions from the spelling checker”. Tick the box as in the screenshot below:

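To give an idea of what "best match" might mean, the sketch below uses Python's `difflib.SequenceMatcher` similarity ratio, a stand-in chosen for this example rather than the algorithm Word uses. It only replaces a word when the best candidate is both similar enough (`cutoff`) and clearly ahead of the runner-up (`margin`); both thresholds are arbitrary assumptions, and the candidate list stands in for the speller's suggestions. The `auto_replace` function is hypothetical.

```python
from difflib import SequenceMatcher


def auto_replace(word, candidates, cutoff=0.8, margin=0.05):
    """Return a replacement only when one candidate is clearly the best match, else None."""
    scored = sorted(
        ((SequenceMatcher(None, word, cand).ratio(), cand) for cand in candidates),
        reverse=True,
    )
    if not scored or scored[0][0] < cutoff:
        return None  # nothing close enough: just flag the word
    if len(scored) > 1 and scored[0][0] - scored[1][0] < margin:
        return None  # too close to call: let the user choose
    return scored[0][1]


print(auto_replace("infomation", ["information", "formation", "informant"]))  # information
```

Raising `margin` makes the function more cautious, closer to Word's default of flagging the word and letting the user choose.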

 

I hope you will find these tips useful.

-- Thierry Fontenelle (Program Manager)

 
