Update: Check out the new installer you can download to make this really easy!
Following up on last week's post on the integration of translation into Office, here are the instructions to set it up in Office 2003, for our users who do not have Office 2007.
First bring up the task pane by selecting "Task Pane" on the View menu (or pressing Ctrl-F1):
In the Task pane drop-down menu (here labeled "Getting Started"), select the "Research" task pane.
After you've chosen the "Research" task pane, there should be a "Research options" hyperlink at the bottom of the pane. Click on this hyperlink to bring up the research options dialog.
Here you'll need to type in the address of the Microsoft Translator Web Service: http://www.windowslivetranslator.com/officetrans/register.asmx
Then click the "Add" button to continue.
Just click the "Install" button in this dialog.
Note that you can't check any of the boxes; this is expected behavior. Translation systems, unlike other Research Pane plug-ins, are enabled in a different dialog. The next steps will cover this.
Now click "OK" to close the research options dialog.
At this point, Word may bring up a dialog saying, "Microsoft Word can't open the translation feature. This feature is not currently installed. Would you like to install it now?" Click "Yes" to install the feature.
Just below the combo boxes that allow you to select the source and target language, there should be a hyperlink labeled "Translation options...". Click on it to open the translation options dialog. (Depending on what text you have highlighted and which translation features are installed and enabled on your machine, the Research task pane may look slightly different. That's OK; just find the "Translation options..." hyperlink.)
This is where you specify which translation engines you'd like to use for each language pair. By default Word uses WorldLingo for all language pairs; this is where you can choose Windows Live Translator instead. (Certain Word installations don't seem to come with WorldLingo pre-installed, so you may not have to change anything here.)
The language pairs currently available from MSR-MT are as follows:
English ↔ Chinese (Simplified)
English ↔ French
English ↔ German
English ↔ Italian
English ↔ Japanese
English ↔ Spanish
English → Arabic
English → Chinese (Traditional)
English → Dutch
English → Korean
English → Portuguese (Brazil)
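If it helps to see the list above as data, here is a minimal Python sketch of how the two kinds of pairs (two-way ↔ and one-way →) could be checked before picking "From" and "To" languages. The names `BIDIRECTIONAL`, `FROM_ENGLISH_ONLY` and `is_supported` are made up for illustration; they are not part of Office or of the translation service.

```python
# Language pairs copied from the list above (illustrative data, not an API).
BIDIRECTIONAL = {
    "Chinese (Simplified)", "French", "German",
    "Italian", "Japanese", "Spanish",
}
FROM_ENGLISH_ONLY = {
    "Arabic", "Chinese (Traditional)", "Dutch",
    "Korean", "Portuguese (Brazil)",
}


def is_supported(source: str, target: str) -> bool:
    """Return True if the source -> target direction appears in the list."""
    if source == "English":
        return target in BIDIRECTIONAL or target in FROM_ENGLISH_ONLY
    if target == "English":
        # Only the two-way pairs can translate *into* English.
        return source in BIDIRECTIONAL
    return False


print(is_supported("English", "Arabic"))   # True
print(is_supported("Arabic", "English"))   # False
```

Every pair in the list has English on one side, which is why the check only needs to look at the other language.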
You may have slightly different settings for Bilingual Dictionaries (on the top half of the dialog); that's OK. You only need to look at and change the Machine Translation settings (on the bottom of the dialog). Again, if Windows Live Translator is already selected, you don't have to do anything.
Click OK to close the dialog. You should now be ready to translate!
Now find or create a document that has some content you'd like to translate.
The easiest way to bring up and use the translation task pane is to simply select some content in your Word document, right click, and select the "Translation" option. You can also go to the Research task pane, type a query into the box, and select the Translation subpane.
By default, Word will list a variety of language pairs, even if you haven't installed a machine translation system for those pairs. In the "From" and "To" boxes, select a source and target language that correspond to one of the language pairs you installed above.
After a brief delay (during which the web service is invoked and the selected text is translated), the MT output should appear in the research pane.
At the bottom of the MT output, there's a button that allows you to easily insert the translated output into your document.
In celebration of International Mother Language Day, we are pleased to announce the addition of the Hmong language to our list of supported languages, made possible by a close partnership with the Hmong community. Anyone can now try out the new language on the Bing Translator site, or call it via the Microsoft Translator web service (Hmong Daw, language code mww). Hmong Daw is the dialect of Hmong the system supports, also known as White Hmong.
Instrumental to this effort were members of the Hmong community, who were able to leverage new tools from Microsoft Translator to help preserve and revitalize their language online. These new tools, currently in beta, make it possible to add automatic translation support for additional languages, or to build higher-quality systems for specific terminology and style in the established languages.
The addition of the Hmong language is an example of the first scenario: members of the community used existing translated material and new features of Microsoft Translator to train a new translation engine. This drew on Microsoft Translator’s ability to learn how to translate from a set of parallel documents (the same document in two languages), dictionaries, and texts in the target language (Hmong in this case). In addition to teaching the engine a new language, they also involved members of the community, partners and collaborators to create and review improved versions of the automated translation system, and to collect qualitative feedback about each “trained” system. Deploying a system that reaches a certain level of quality allows seamless use with the standard Microsoft Translator APIs, and with the many scenarios powered by the API, like the web translation widget. Feedback generated through these scenarios can be used again in the training process, creating a virtuous loop for improving the translation quality. Stay tuned for more details about these new tools and features.
Once again, on International Mother Language Day, we congratulate the Hmong community on their accomplishment. We are looking forward to working with many more partners and language communities in the near future.
Did you know that MSN Messenger recently became* the number one instant messenger in the world? Last summer, thanks to the efforts of Helvecio on our team, the MTBot prototype project quietly launched, to give the community of 28.6 million unique Messenger users a glimpse of what might be possible when you combine machine translation technology with instant messaging.
The MTBot prototype project was released in May 2007 with the main goal of understanding how useful machine translation would be in IM conversations. The bot acts as a human translator, participating in conferences and translating messages as they are sent by all parties.
A typical usage scenario would be something like this: let's assume you have a friend in Japan who does not speak English... Well, you would add MTBotemail@example.com to your Live Messenger buddy list, wait until the bot accepts your request (by switching status to Online) and then you would start your conversation by sending the "Hello" message... The Bot is going to wake up and display a list of languages - enter "ja" for Japanese. Once it gets a valid connection, the Bot will tell you to invite your friend to join the conversation. That's it... From this point on, everything you type will be translated from English to Japanese, and everything your friend types will be translated from Japanese to English.
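To make the back-and-forth in that scenario concrete, here is a rough Python sketch of the relay idea: each message is translated into the other party's language. This is not the bot's actual code. `make_relay` and `fake_translate` are hypothetical names, and `fake_translate` is only a stand-in that tags the text instead of calling a real translation service.

```python
from typing import Callable

# Signature assumed for illustration: (text, source_lang, target_lang) -> text
Translator = Callable[[str, str, str], str]


def make_relay(lang_a: str, lang_b: str, translate: Translator):
    """Return a function that translates a message into the other party's language."""
    def relay(sender_lang: str, text: str) -> str:
        if sender_lang == lang_a:
            return translate(text, lang_a, lang_b)
        if sender_lang == lang_b:
            return translate(text, lang_b, lang_a)
        raise ValueError(f"unexpected sender language: {sender_lang}")
    return relay


def fake_translate(text: str, source: str, target: str) -> str:
    """Stand-in translator: marks the direction instead of translating."""
    return f"[{source}->{target}] {text}"


relay = make_relay("en", "ja", fake_translate)
print(relay("en", "Hello"))       # [en->ja] Hello
print(relay("ja", "こんにちは"))   # [ja->en] こんにちは
```

Passing the translator in as a function keeps the relay logic separate from whatever service does the translating.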
Another typical use is as a portable translator: using Messenger from any smartphone, a user can translate simple sentences when traveling to other countries.
As with any prototype effort, do keep in mind that this is experimental and there is a possibility the bot might be offline from time to time. The usual caveats about the quality of machine translation also apply.
We always appreciate your feedback and suggestions, so feel free to share them on this thread.
* Update: The link pointing to the data on Messenger becoming the most used IM client is from around when (2003-04) it first claimed that crown. Messenger has continued that trend since then. (Thanks to our keen eyed readers for catching that one!)
In the current crisis in Haiti there are a number of initiatives to rapidly build software to assist in humanitarian aid. Responding to community requests for a machine translation (MT) system to translate between English and Haitian Creole, our team has been hard at work over the last few days. I am glad to announce that an experimental Haitian Creole MT system is now publicly available via several services and APIs powered by Microsoft Translator technologies. We will continue working on improving the system, but meanwhile we hope that, in spite of its experimental nature, it will be of use in the relief efforts.
1) What is being announced today?
Responding to requests from the community involved in Haitian relief efforts, Microsoft Research is making available today an experimental machine translation system for translating to and from Haitian Creole. You can try it at http://translate.bing.com or http://www.microsofttranslator.com.
2) How is it significant?
With the devastating disaster that struck Haiti, we have all been individually pitching in to help the efforts. This is our effort, as a team, to respond to the needs of communities such as Crisis Commons by delivering a Haitian Creole translator which can be of help to individual users, as well as other technology projects that could use a scalable translation system in their relief endeavors. Further, the usage of our API is completely free and it can be built into any application or website for immediate use. We hope that this might help the many applications being developed (such as those on crisiscommons.org) to aid the humanitarian efforts.
3) How can I use this system?
The Haitian Creole translator is now part of the Microsoft Translator web service, enabling many of the user scenarios powered by the service. Users can access the service through the Microsoft Translator web site. Developers may want to look at our APIs and choose between SOAP and HTTP (support for Haitian Creole in our AJAX API will be rolled out in the coming days).
4) How is it different from other efforts?
There have been some great efforts in quickly building dictionary and rule-based Haitian Creole translation tools. The statistical machine translation system behind Microsoft Translator allows for a continuous improvement in the quality of translations (by adding more training data). Also, by delivering this as part of our web service we can ensure scale and performance and open up the possibility of using our many scenarios (Bing Translator, Internet Explorer 8, Messenger Bot etc.) with Haitian Creole, as well as using our extensive API set to add such support to other software and web sites at no cost.
5) What was involved in getting this out of the door in record time?
The process involved identifying parallel (translated) data between English and Haitian Creole, and training the MT engine to create the requisite language models. We would also like to acknowledge the great work being done by the Crisis Commons folks, the dictionary builders at haitisurf.com, the folks at CMU who made parallel data available, and the Microsoft volunteers who challenged our team to action.
6) What should I expect in terms of quality?
This is an experimental system put together in record time. While our typical approach to adding new languages involves significantly larger amounts of training data and a higher threshold for quality testing, we decided that the upside warranted making the system available to the community as early as possible and continuing to improve it afterwards. We are working diligently to keep improving the quality, but bear with us if you encounter problems. You can always contact us at firstname.lastname@example.org with feedback. Our user and developer forums are also available to discuss any issues you encounter.
7) How can I help improve the system?
The best way you can help improve the system is by helping us find more training data. This typically means sentences or words translated between English and Haitian Creole. We intend to make the data that we collect for training purposes available to the larger community (via tausdata.org), as license restrictions permit. If you know of dictionaries, translated sentences, or websites that have such translations, we urge you to contribute them to TDA’s TAUS data sharing initiative. TDA is a non-profit organization providing a neutral and secure platform for sharing language data. If you have any concerns or questions feel free to contact us at email@example.com.
8) How can I help the broader Haiti relief efforts?
Go here to learn more about how you can help those devastated by the earthquake.
9) Where can I get more information?
Please stay tuned to our blog for further announcements. You can learn more about Microsoft Translator and the services we offer here.
10) What can we expect next?
In the coming days expect to see support for Haitian Creole added to even more of our scenarios (Translation Bot, Translator widget, Office etc) as well as the AJAX API. Known issues and announcements can also be found on our forums.
We hope that this contribution proves useful to the various humanitarian efforts underway, and please stay tuned to this blog for further news on the Haitian Creole language support. If you have any questions feel free to contact us at firstname.lastname@example.org.
Update (2:53 PM PST): The Messenger Translation Bot can now speak Haitian Creole. Add email@example.com to your messenger buddy list. Try the group conversation feature with a Kreyol speaker!
-Vikram Dendi, Senior Product Manager, Microsoft Translator
Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition. Today's guest blog is a high level explanation of how the engine works:
As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine. Statistical systems are different from rule-based ones in that the “rules” mapping words and phrases from one language to another are learned by the system rather than being hand-coded. Training an SMT engine requires amassing a large amount of parallel training data, hopefully of good quality and from heterogeneous sources, and training the engine on that data. (By parallel, we mean a source of data where the content for one language is the same as the content for the other.) The engine learns the correspondences between words and phrases in one language and those in another, which are often reinforced by repeated occurrences of the same words and phrases throughout the input. For instance, when training the English-German system, if the engine sees the phrase All rights reserved on the English side and also notices Alle Rechte vorbehalten on the German side, it may align these two phrases, and assign some probability to this alignment. Repeated occurrences of the source and target phrases in the training data will only reinforce this alignment.
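As a toy illustration of how repeated occurrences reinforce an alignment, the Python sketch below estimates a probability for each phrase pair by simple relative frequency: how often a target phrase was seen with a source phrase, divided by how often the source phrase was seen at all. This is a deliberate simplification and not the engine's actual training procedure. It assumes the phrase pairs have already been aligned, the function name `phrase_probabilities` is made up, and the second German rendering is invented toy data so that the counts have something to compete with.

```python
from collections import Counter, defaultdict


def phrase_probabilities(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned pairs."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    table = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        table[src][tgt] = n / source_counts[src]
    return table


# Toy data: three sightings of the pairing from the text, plus one
# invented alternative rendering (an assumption for illustration only).
pairs = [
    ("All rights reserved", "Alle Rechte vorbehalten"),
    ("All rights reserved", "Alle Rechte vorbehalten"),
    ("All rights reserved", "Alle Rechte vorbehalten"),
    ("All rights reserved", "Alle Rechte reserviert"),
]

table = phrase_probabilities(pairs)
print(table["All rights reserved"]["Alle Rechte vorbehalten"])  # 0.75
print(table["All rights reserved"]["Alle Rechte reserviert"])   # 0.25
```

Each additional sighting of the first pairing would push its share higher, which is the reinforcement effect described above.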
Generally, having parallel data for a language pair means we can train engines in both directions (i.e., both the English-German and the German-English systems can be trained on the same input sentences). Some of you asked why we released the English-Spanish system before we released Spanish-English. There were really two reasons. First, English-Spanish was the first general domain language pair we released. Releasing one language pair allowed us to test the infrastructure before we started releasing more. Second, the technology for Spanish-English was slightly different from that used for English-Spanish, and we needed some additional time to make the infrastructural changes necessary to accommodate it. In the future, we plan to release new translation systems in pairs (with a couple of exceptions). I can’t reveal what languages we have planned next, but do expect some new ones soon!
For those of you interested in technical discussions regarding our engines and how they work, please refer to some of the papers by the researchers who developed them. Three recent papers of note are:
Chris Quirk, Arul Menezes. Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation. Proceedings of HLT-NAACL 2006, New York, New York, USA, May 2006.
Chris Quirk, Arul Menezes. Dependency Treelet Translation: The convergence of statistical and example-based machine translation? Machine Translation, pp. 43-65, March 2006.
Chris Quirk, Arul Menezes. Using Dependency Order Templates to Improve Generality in Translation. Association for Computational Linguistics, July 2007.