Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition. Today's guest blog is a high-level explanation of how the engine works:
As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine. Statistical systems differ from rule-based ones in that the “rules” mapping words and phrases from one language to another are learned by the system rather than hand-coded. Training an SMT engine requires amassing a large amount of parallel training data, ideally of good quality and from heterogeneous sources, and training the engine on that data. (By parallel, we mean data in which the content in one language is a translation of the content in the other.) The engine learns the correspondences between words and phrases in one language and those in another, and these correspondences are reinforced by repeated occurrences of the same words and phrases throughout the input. For instance, in training the English-German system, if the engine sees the phrase All rights reserved on the English side and Alle Rechte vorbehalten on the German side, it may align these two phrases and assign some probability to this alignment. Repeated occurrences of the source and target phrases in the training data will reinforce this alignment.
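To make the idea of "assigning some probability to an alignment" concrete, here is a minimal sketch in Python. It is not the Microsoft Translator engine: it assumes the phrases have already been paired up (real systems have to discover those pairings from whole sentences), the training pairs are made-up toy data, and the `phrase_table` function is a hypothetical name. It simply estimates the probability of a German phrase given an English phrase by relative frequency, so each repeated occurrence of a pair raises its score.

```python
from collections import Counter, defaultdict

# Toy, already phrase-aligned training pairs (hypothetical data for illustration).
aligned_phrases = [
    ("all rights reserved", "alle rechte vorbehalten"),
    ("all rights reserved", "alle rechte vorbehalten"),
    ("all rights reserved", "alle rechte vorbehalten"),
    ("all rights reserved", "sämtliche rechte vorbehalten"),
    ("thank you", "danke"),
]


def phrase_table(pairs):
    """Estimate P(target | source) as count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


table = phrase_table(aligned_phrases)
candidates = table["all rights reserved"]
best = max(candidates, key=candidates.get)
print(best, candidates[best])  # alle rechte vorbehalten 0.75
```

The pair seen three times out of four gets probability 0.75, while the variant seen once gets 0.25. More data showing the same pairing would push the first number higher, which is the reinforcement effect described above.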
Generally, having parallel data for a language pair means we can train engines in both directions (i.e., both the English-German and the German-English systems can be trained on the same input sentences). Some of you asked why we released the English-Spanish system before we released Spanish-English. There were two reasons. First, English-Spanish was the first general domain language pair we released. Releasing one language pair allowed us to test the infrastructure before we started releasing more. Second, the technology for Spanish-English was slightly different from that used for English-Spanish, and we needed some additional time to make the necessary infrastructural changes to accommodate it. In the future, we plan to release new translation systems in pairs (with a couple of exceptions). I can’t reveal what languages we have planned next, but do expect some new ones soon!
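The point that one parallel corpus can serve both directions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the phrase pairs are invented, the `train` function is a hypothetical stand-in for real engine training, and it uses plain relative-frequency counts rather than anything the production systems do. The only thing it is meant to show is that swapping the two sides of the same data yields a model for the reverse direction.

```python
from collections import Counter

# Hypothetical English-German phrase pairs (toy data for illustration only).
pairs_en_de = [
    ("good morning", "guten morgen"),
    ("good morning", "guten morgen"),
    ("good morning", "guten tag"),
    ("good day", "guten tag"),
]


def train(pairs):
    """Return {(source, target): P(target | source)} by relative frequency."""
    pair_counts = Counter(pairs)
    source_counts = Counter(src for src, _ in pairs)
    return {(s, t): n / source_counts[s] for (s, t), n in pair_counts.items()}


# Same data, two directions: the reverse model just swaps each pair.
en_de = train(pairs_en_de)
de_en = train([(de, en) for en, de in pairs_en_de])

print(en_de[("good morning", "guten morgen")])
print(de_en[("guten tag", "good morning")])
```

Note that the two models are not mirror images of each other: the probabilities differ by direction because each one is normalized over its own source side.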
For those of you interested in technical discussions regarding our engines and how they work, please refer to some of the papers by the researchers who developed them. Three recent papers of note are:
Chris Quirk, Arul Menezes. Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation. Proceedings of HLT-NAACL 2006, New York, New York, USA, May 2006.
Chris Quirk, Arul Menezes. Dependency Treelet Translation: The convergence of statistical and example-based machine translation? Machine Translation, pp. 43-65, March 2006.
Chris Quirk, Arul Menezes. Using Dependency Order Templates to Improve Generality in Translation. Association for Computational Linguistics, July 2007.
Update: Check out the new installer you can download to make this really easy!
Following up on last week's post on the integration of translation into Office, here are the instructions to set it up in Office 2003, for our users who do not have Office 2007.
First bring up the task pane by selecting "Task Pane" on the View menu (or pressing Ctrl-F1):
In the task pane drop-down menu (here labeled "Getting Started"), select the "Research" task pane.
After you've chosen the "Research" task pane, there should be a "Research options" hyperlink at the bottom of the pane. Click on this hyperlink to bring up the research options dialog.
Here you'll need to type in the address of the Microsoft Translator Web Service: http://www.windowslivetranslator.com/officetrans/register.asmx
Then click the "Add" button to continue.
Just click the "Install" button in this dialog.
Note that you can't check any of the boxes; this is expected behavior. Translation systems, unlike other Research Pane plug-ins, are enabled in a different dialog. The next steps will cover this.
Now click "OK" to close the research options dialog.
At this point, Word may bring up a dialog saying, "Microsoft Word can't open the translation feature. This feature is not currently installed. Would you like to install it now?" Click "Yes" to install the feature.
Just below the combo boxes that allow you to select the source and target language, there should be a hyperlink labeled "Translation options...". Click on it to open the translation options dialog. (Depending on what text you have highlighted and which translation features are installed and enabled on your machine, the Research task pane may look slightly different. That's OK; just find the "Translation options..." hyperlink.)
This is where you specify which translation engines you'd like to use for each language pair. By default Word uses WorldLingo for all language pairs; this is where you can choose Windows Live Translator instead. (Certain Word installations don't seem to come with WorldLingo pre-installed, so you may not have to change anything here.)
The language pairs currently available from MSR-MT are as follows:
English ↔ Chinese (Simplified)
English ↔ French
English ↔ German
English ↔ Italian
English ↔ Japanese
English ↔ Spanish
English → Arabic
English → Chinese (Traditional)
English → Dutch
English → Korean
English → Portuguese (Brazil)
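If you want to check programmatically whether a given direction is covered, the list above can be written down as a small lookup. This is just a sketch: the constant and function names are hypothetical, and the data is copied from the list above (pairs marked ↔ work in both directions, pairs marked → only from English), so it reflects this post rather than any live service.

```python
# Pairs marked with a two-way arrow in the list above.
BIDIRECTIONAL = [
    "Chinese (Simplified)", "French", "German",
    "Italian", "Japanese", "Spanish",
]
# Pairs marked with a one-way arrow: translation from English only.
FROM_ENGLISH_ONLY = [
    "Arabic", "Chinese (Traditional)", "Dutch",
    "Korean", "Portuguese (Brazil)",
]


def supported_directions():
    """Build the set of (source, target) directions from the two lists."""
    directions = set()
    for lang in BIDIRECTIONAL:
        directions.add(("English", lang))
        directions.add((lang, "English"))
    for lang in FROM_ENGLISH_ONLY:
        directions.add(("English", lang))
    return directions


def is_supported(source, target):
    """Return True if the list above includes this translation direction."""
    return (source, target) in supported_directions()


print(is_supported("German", "English"))  # True
print(is_supported("Korean", "English"))  # False
print(len(supported_directions()))        # 17
```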
You may have slightly different settings for Bilingual Dictionaries (on the top half of the dialog); that's OK. You only need to look at and change the Machine Translation settings (on the bottom of the dialog). Again, if Windows Live Translator is already selected, you don't have to do anything.
Click OK to close the dialog. You should now be ready to translate!
Now find or create a document that has some content you'd like to translate.
The easiest way to bring up and use the translation task pane is to select some content in your Word document, right-click, and select the "Translation" option. You can also go to the Research task pane, type a query into the box, and select the Translation subpane.
By default, Word will list a variety of language pairs, even if you haven't installed a machine translation system for those pairs. In the "From" and "To" boxes, select a source and target language that correspond to one of the language pairs you installed above.
After a brief delay (during which the web service is invoked and the selected text is translated), the MT output should appear in the research pane.
At the bottom of the MT output, there's a button that allows you to easily insert the translated output into your document.
Windows Live Translator is now integrated into Office! One of the top features that our users ask for is simple integration of translation into Office, so that they can translate a document quickly. The feature is really easy to use, and you can translate a block of text or an entire document from within Office.
We have officially handed over our code to the Microsoft Office team for the integration of the translation tool directly in the Research Task Pane. Once they have finished their own testing and "flipped the switch" on their side, the feature will auto-update in existing versions of Office. I'll blog about it here again when that happens. At that point, no additional setup steps will be necessary.
In the meantime, you can use the instructions below to set up the service manually. For users of Office 2003, I'll post those instructions later this week.
Office 2007 Setup Instructions:
The language pairs currently available from Windows Live Translator are as follows:
English ↔ Arabic
English ↔ Chinese (Traditional)
English ↔ Dutch
English ↔ Korean
English ↔ Portuguese (Brazil)
Click OK to close the dialog. You are now ready to translate!
The easiest way to bring up and use the translation task pane is to simply select some content in your Word document, and click on the Translation icon in the Review tab. You can also go to the Research task pane, type a query into the box, and select the Translation subpane.
After a brief delay (during which the web service is invoked and the selected text is translated), the translated text should appear in the research pane.