Microsoft Translator (and Bing Translator) Official Team Blog

News and Views from the Microsoft Translator (and Bing Translator) Team in Microsoft Research.

Statistical Machine Translation - Guest Blog (Updated with additional paper)

Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition. Today's guest blog is a high-level explanation of how the engine works:

As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine. Statistical systems differ from rule-based ones in that the “rules” mapping words and phrases from one language to another are learned by the system rather than hand-coded. Training an SMT engine requires amassing a large amount of parallel training data, ideally of good quality and from heterogeneous sources, and training the engine on that data. (By parallel, we mean data in which the content in one language is the same as the content in the other.) The engine learns the correspondences between words and phrases in one language and those in another, and these correspondences are reinforced by repeated occurrences of the same words and phrases throughout the input. For instance, when training the English-German system, if the engine sees the phrase “All rights reserved” on the English side and “Alle Rechte vorbehalten” on the German side, it may align these two phrases and assign some probability to that alignment. Each further occurrence of the two phrases together in the training data reinforces this alignment.
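
To make the idea of learned, probability-weighted alignments concrete, here is a minimal sketch using IBM Model 1 style expectation-maximization, a textbook word-alignment technique. It is an illustration only and is not claimed to be the method Microsoft Translator uses. The function name `train_word_alignment` and the three-sentence toy corpus are assumptions made up for this example.

```python
from collections import defaultdict


def train_word_alignment(pairs, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 style EM."""
    # Start by giving every word pair the same (unnormalized) weight.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of links per source word
        for src, tgt in pairs:
            for w in tgt:
                # Spread one unit of evidence for w over the source words,
                # in proportion to the current estimates.
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # Re-estimate the probabilities from the expected counts.
        t = defaultdict(float, {ws: c / total[ws[1]] for ws, c in count.items()})
    return t


# Hypothetical toy corpus: (English tokens, German tokens).
pairs = [
    ("all rights reserved".split(), "alle rechte vorbehalten".split()),
    ("all rights".split(), "alle rechte".split()),
    ("all".split(), "alle".split()),
]
t = train_word_alignment(pairs)
for german in ("alle", "rechte", "vorbehalten"):
    print(f"t({german} | reserved) = {t[(german, 'reserved')]:.3f}")
```

Because “all” and “alle” also appear together in shorter sentence pairs, the estimate for that pair grows with each pass, which in turn pushes “reserved” toward “vorbehalten”. This is the reinforcement effect described above, shown at the word level rather than the phrase level.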

Generally, having parallel data for a language pair means we can train engines in both directions (i.e., both the English-German and the German-English systems can be trained on the same input sentences). Some of you asked why we released the English-Spanish system before Spanish-English. There were two reasons. First, English-Spanish was the first general-domain language pair we released, and releasing one language pair allowed us to test the infrastructure before we started releasing more. Second, the technology for Spanish-English was slightly different from that used for English-Spanish, and we needed some additional time to make the infrastructure changes required to accommodate it. In the future, we plan to release new translation systems in pairs (with a couple of exceptions). I can’t reveal which languages we have planned next, but do expect some new ones soon!

For those of you interested in technical discussions regarding our engines and how they work, please refer to some of the papers by the researchers who developed them.  Three recent papers of note are:

Chris Quirk, Arul Menezes. Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation. Proceedings of HLT-NAACL 2006, New York, New York, USA, May 2006.

Chris Quirk, Arul Menezes. Dependency Treelet Translation: The convergence of statistical and example-based machine translation? Machine Translation, pp. 43-65, March 2006. (Attached file)

Chris Quirk, Arul Menezes. Using Dependency Order Templates to Improve Generality in Translation. Association for Computational Linguistics, July 2007.

Attachment: Dependency Treelet Translation The convergence of statistical and example-based machinetranslation.pdf

  • Hey Machine Translation Team at MS, I have a small suggestion. Google's Translation now has a "Detect language" feature that automatically detects the foreign language, which is very useful. Can you add such a feature to Windows Live Translator?

  • Hello someone, thanks for the suggestion. We'll plan that for one of our next updates.

  • Hey, can I expect that the Polish language will be available in the near future?

  • Hi Slawek,  We are always looking to add more languages to improve our engine, but we do not have a specific timeline for individual languages.

  • Is there a way for programmers to access the translation directly from code, in C# or other .NET programming languages?  Thanks

  • I am hoping that the machine translation is available as a web service that would allow inputting text in one language and getting a translation into another language.  I am hoping that this would be available by making a call from a .NET programming language such as C#.  My company is an ISV Microsoft Partner that develops applications for retail and manufacturing companies.  Please let me know if this is available.

  • Please answer my request.  Can we, as a Microsoft Certified Partner (ISV), access the Microsoft Translation service from our program?

    Thanks

  • My colleagues at Microsoft Research announced it a few days ago: all the language pairs

  • The Translator team is excited to announce the availability of the English to Russian language pair on

  • The Microsoft Translator team is very proud to announce the technology preview of an innovative offering

  • This is a repost from the Microsoft Research Machine Translation (MSR-MT) Team Blog by permission, and

  • From elsewhere in the collective.

  • This is a helpful tip on how machine translation works. I'm writing a project on language translation techniques and reading this article has given me much insight.

  • It's good to understand how the machine translation works. But an average person doesn't need to understand this to use the tool.
