Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition. Today's guest blog is a high-level explanation of how the engine works:
As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine. Statistical systems are different from rule-based ones in that the “rules” mapping words and phrases from one language to another are learned by the system rather than being hand-coded. Training an SMT engine requires amassing a large amount of parallel training data (hopefully of good quality and from heterogeneous sources) and training the engine on that data. (By parallel, we mean a source of data where the content for one language is the same as the content for the other.) The engine learns the correspondences between words and phrases in one language and those in another, which are often reinforced by repeated occurrences of the same words and phrases throughout the input. For instance, in training the English-German system, if the engine sees the phrase “All rights reserved” on the English side and also notices “Alle Rechte vorbehalten” on the German side, it may align these two phrases and assign some probability to this alignment. Repeated occurrences of the source and target phrases in the training data will only reinforce this alignment.
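To make the idea of "assigning some probability to an alignment" a little more concrete, here is a minimal sketch in Python. It assumes the phrase pairs have already been aligned and simply estimates a probability by relative frequency. The function name, the toy data, and the alternative German rendering are all made up for illustration; this is not how the production engine is implemented.

```python
from collections import Counter, defaultdict


def phrase_translation_probs(aligned_pairs):
    """Estimate P(target | source) by relative frequency over aligned phrase pairs."""
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(src for src, _ in aligned_pairs)
    probs = defaultdict(dict)
    for (src, tgt), n in pair_counts.items():
        probs[src][tgt] = n / source_counts[src]
    return dict(probs)


# Toy, hand-made aligned phrase pairs (illustrative only, not real training data).
pairs = [
    ("All rights reserved", "Alle Rechte vorbehalten"),
    ("All rights reserved", "Alle Rechte vorbehalten"),
    ("All rights reserved", "Alle Rechte vorbehalten"),
    ("All rights reserved", "Alle Rechte reserviert"),  # invented competing rendering
]

probs = phrase_translation_probs(pairs)
candidates = probs["All rights reserved"]
best = max(candidates, key=candidates.get)
print(best, candidates[best])
```

Every additional occurrence of the same pair raises its share of the counts, which is the "reinforcement" described above in its simplest possible form.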
Generally, having parallel data for a language pair means we can train engines in both directions (i.e., both the English-German and the German-English systems can be trained on the same input sentences). Some of you asked why we released the English-Spanish system before we released Spanish-English. There were really two reasons. First, English-Spanish was the first general domain language pair we released. Releasing one language pair allowed us to test the infrastructure before we started releasing more. Second, the technology for Spanish-English was slightly different from that used for English-Spanish, and we needed some additional time to make the necessary infrastructure changes to accommodate it. In the future, we plan to release new translation systems in pairs (with a couple of exceptions). I can’t reveal what languages we have planned next, but do expect some new ones soon!
For those of you interested in technical discussions regarding our engines and how they work, please refer to some of the papers by the researchers who developed them. Three recent papers of note are:
Chris Quirk, Arul Menezes. Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation. Proceedings of HLT-NAACL 2006, New York, New York, USA, May 2006.
Chris Quirk, Arul Menezes. Dependency Treelet Translation: The convergence of statistical and example-based machine translation? Machine Translation, 43-65, March 2006.
Chris Quirk, Arul Menezes. Using Dependency Order Templates to Improve Generality in Translation. Association for Computational Linguistics, July 2007.
Every now and then I look at visitor logs on the various personal and professional sites/blogs that I administer. It is fascinating to see the many places worldwide that visitors come from. I have often wondered about non-English speakers and how I could make my writing more accessible to them. While some professional and company web sites have translated versions available, in many user forums and communities across the web there have been requests for a translated version of the pages/posts. Today, on many sites, I have to copy the text on the site, paste it into a translator and look at the translation. It is cumbersome and breaks up an otherwise smooth navigation experience.
I am very pleased to say Windows Live Translator solved this problem with the latest feature addition that rolled out this week. Now on the Live Translator home page you will find a new link "Add the web page Translator to your site". By clicking on this link you go to a page that offers snippets of code that can be added to individual web pages for which you wish to offer translations.
The code generator will create the appropriate widget depending on the source language of your site. Refer to the Live Translator introduction post where Andrea listed the language pairs that we currently support.
So here is what you do to have a link on your web page to translate it:
Step 1: Click on the Add the web page Translator to your site link
Step 2: Select the language your web page is written in (source language)
For example: Since all the articles on my blog are in English, I choose English as the source language
Step 3: The code that you need to copy and paste into your web page's HTML is generated in the box
For example: Since I chose English, the code that is generated looks like this
Step 4: Copy that code and paste it into the page that should offer translation.
For example: Say I want the blog post I wrote about Live Translator to be translated. I go into the blog editor and paste the code like so:
If the blog or web page uses templates, one could also paste the code into a template, thereby providing the Translate This Page widget on all pages.
Step 5: Enjoy an expanded (and hopefully more appreciative) audience!
The end result on my blog looks like this in the case of a single post translation:
The end result looks like this if I put it in the template (this allows for translation of every post):
For the more technically minded here is some more information on the parameters that the Live Translator accepts:
where lp is the language pair for the source and target languages (such as en_fr for English to French), and a is the URL you want translated.
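As a rough illustration of how the two parameters fit together, here is a small Python sketch that builds a translation link from lp and a. The base URL below is a hypothetical placeholder (the real endpoint is not reproduced in this post), and the helper name is made up; only the parameter names lp and a, and the en_fr style of language pair, come from the description above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical placeholder endpoint: substitute the real Live Translator address.
BASE_URL = "https://translator.example.com/translate"


def build_translate_link(page_url, source_lang, target_lang, base_url=BASE_URL):
    """Build a link that asks the translator to translate page_url."""
    lp = f"{source_lang}_{target_lang}"  # e.g. "en_fr" for English to French
    query = urlencode({"lp": lp, "a": page_url})  # percent-encodes the page URL
    return f"{base_url}?{query}"


link = build_translate_link("http://example.com/blog/post.html", "en", "fr")
print(link)
```

Encoding the page URL matters because it contains characters such as : and / that would otherwise be misread as part of the translator's own address.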
The Windows Live focused community site ViaWindowsLive is making creative use of the Live Translator to make their site available in multiple languages (look at the bottom left of the page). I would love to see how you might use this new feature. Feel free to post a link to your site in the comments.
Edit: Updating the parameters link
Klingon* is now a supported option on the Bing Translator site, allowing you to translate text snippets and web pages to and from Klingon. It is also available within the Translator widget, allowing Klingon visitors of your site to see it in their language. Bing Translator for Windows Phone has added Klingon as a supported language, for text mode input/output and camera mode output. On the Bing Translator site you can also choose to translate to both Latin-script Klingon and pIqaD (the Klingon script). Please note that if you are translating from Klingon, you need to explicitly select the language (rather than rely on Auto-detect).
This system has been built as a labor of love, in close partnership with members of the Klingon Language Institute (KLI) headed by Dr. Lawrence Schoen; Prof. Marc Okrand, the inventor of the Klingon language; and many other Klingon enthusiasts inside and outside Microsoft. We received fantastic support from our fellow Star Trek fans at Paramount and CBS.
Building a new translation system from scratch is a challenging affair, requiring a large amount of training documents and many iterations of training the engine, then reviewing and evaluating the output. What you initially get is mostly unintelligible, and with continued learning comes improvement, both in vocabulary and in fluency. While there is a great amount of training material for such a system in mainstream languages like English, French or German, Klingon is a language that does not (yet!) have a comparable volume of “parallel” (translated) text, or even material in Klingon alone. Our friends in the community were able to help us gather what is available, and used the Microsoft Translator Hub to train the initial engine. Members of the community were then able to review, critique and correct the translation errors this infant system was making. These corrections directly influenced the next training run, and thus the system has been getting better every day. Given its infancy, and the distance it has yet to travel to achieve the necessary fluency and vocabulary, Klingon will stay an experimental language in Bing Translator for the time being.
We wish to thank the Klingon language community, Prof. Okrand, Dr. Schoen and CBS/Paramount for helping make this a reality. If you are a Klingon speaker and wish to join the Hub community built around this effort, please email email@example.com or firstname.lastname@example.org. Not everyone can have Lieutenant Uhura translate for them, so we hope Bing Translator’s Klingon support comes in handy next time you are in a pinch.
lupDujHomwIj lubuy'moH gharghmey
- Vikram Dendi & the Translator team at Microsoft
Update (2:52 PM): Added note about auto-detection, and other minor edits.
* Klingon is a trademark of CBS Studios Inc.
Here is a translated version of the original Klingon Empire Announcement:
tlhIngan Hol 'oH qIb Hol wa'DIch'e' mughlaHbogh Bing Translator 'e' maq tlhIngan yejquv, boqbogh tlhIngan Hol yejHaD, Microsoft je.
Klingon is the first galactic language which can be translated by Bing Translator, announces the Klingon High Council, in alliance with the Klingon Language Institute and Microsoft.
qaStaHvIS DISmey, yuQjIjDIvI' luSuchtaHvIS tlhInganpu''e', Qatlhqu' tlhIngan Hol mughmeH 'ej tera' Holmey mu'tlhegh lIngmeH Qu', nuja' tlhInganpu'. tera' Holmey rurbe'chu' tlhIngan Hol, 'ej 'oH HaDtaH tera'ngan law'. wejmaH tera' Sep, Hoch puH'a' je Dab HaDwI'pu'. qIb ghatlh tlhIngan Hol, tlhIngan tIgh je 'e' 'agh ngoDvam.
For years, Klingons have told us that the task of translating Klingon and producing sentences in Earth languages while visiting the UFP is very difficult. Klingon is truly unlike Earth languages, and many Earthlings (continue to) study it. Students (of Klingon) live in thirty different Earth regions (countries) and all great landmasses. This fact demonstrates the galactic dominance of Klingon language and the Klingon Way.
tlhIngan Hol chelta'mo' Bing Translator, qIb lengwI'vaD, tlhIngan wo' SuchwI'vaD je nuH 'ut mojbej mughwI'. Hoch SepDaq, tera'nganvaD tlhIngan Hol, tlhIngan tIgh je lIH Bing Translator mughmeH laHmey. pIj mughwI' lo'chugh taghwI', nom tlhIngan Hol pab pIn moj.
Because Bing Translator has added Klingon, the translator will certainly become an essential weapon (tool) for (the) galactic traveler and (the) visitor to the Klingon Empire. In every region (country), the translation abilities of Bing Translator will introduce Earthlings to the Klingon language and the Klingon Way (culture). If beginners frequently use the translator, they will quickly become grammarians of the Klingon language.
Qo'noS Qombogh muD, tuj'a', Debmey tIn je SIQlaH tera'nganpu'. pIraQSIS Qaw'lu'mo' choHpu' Qo'noS 'e' leghlaH je. Bing Translator lo'taHvIS lengwI', lengDI' bel, 'ej roD batlhHa' vangbe'laH 'ej tIgh chach junlaH. Microsoft Bing Translator, qum chaw' je ghajchugh «SuvwI' lengmey» lengwI', tlhIngan SuvmeH tIgh 'ut ghojlaH, qagh SoplaH ghopDu'Daj lo'taHvIS, 'ej pIjHa' QumHa'.
Earthlings will be able to endure (experience) the quaking (turbulent) atmosphere, great heat and large deserts of Qo'noS. They will also be able to see that Qo'noS has changed due to the destruction of Praxis. While the traveler uses Bing Translator, he will be comfortable while travelling, and will usually be able to not act dishonorably and avoid cultural emergencies. With Microsoft Bing Translator and a government permit, "Warrior Tours" travelers can learn essential Klingon fighting, eat qagh with their hands and infrequently miscommunicate.
che'ronDaq mughwI' mu'tlheghmey, mu'mey je tobta' tlhIngan Hol yejHaD. jIjDI' tlhIngan Hubbeq, 'ejyo' je, toy'beH mughwI'. 'e' poQbej SuvwI' Hol. DaH not Hegh SuvwI' «HIjol» mughHa'DI' boq beq 'ej «HIQoj» mojDI'. taHmeH tlhIngan wo''a' HoSghaj, lI'chu' Bing Translator mughmeH laHmey.
The Klingon Language Institute has tested the translator's sentences and words on the battlefield. When the Klingon Defense Force and Starfleet cooperate, the translator will be ready to serve. A warrior language certainly requires that. Now warriors will never die when "Beam me up!" is mistranslated by an alliance crew and becomes "Beam me out!" In order that the powerful great Klingon Empire continue, the translation abilities of Bing Translator will be supremely useful.
Our friends over in the Internet Explorer building recently released a developer preview version of IE8.
There are a lot of interesting features in IE8 developer beta 1, ranging from improved standards compatibility to better security through elegant tweaks to the address bar. Web slices, an improved Favorites bar and the developer toolbar are some other welcome additions to the feature set.
The Activities feature in IE8 is a great way for users to access various web services in a single click. We are very excited to deliver translations through the Translation activity for IE8. If you don't already have it available through the activities menu you can get it (along with other great activities) from here.
For a detailed review of the Translation activity, and to hear it in "non product manager speak", you can check out Helvecio's blog post here. You can download the developer preview of IE8 from here. More information on newer releases and other features is available at the IE team blog.
We look forward to hearing about your experiences with the Translation activity for IE8.
Lee Schwartz is a Computational Linguist on the Microsoft Translator team. Today’s guest blog is about getting lost in (machine) translation…
Recently, a user seemed upset with the translation he received for "a metal paint can". No wonder. When he translated this into Spanish, he got "un metal pintura puede", which means "a metal paint is able to". And what is that supposed to mean? But then again, what is "meaning" to a machine translation system anyway? Does anything mean anything? Or is the computer just seeing words in combination in one language and corresponding words in another language? And is it assuming that because one sequence is used in the source language when another is used in the target, one is the translation of the other? Even if the machine translation program is just seeing words in combination, wouldn't it have seen "paint can" before and know that the "can" in this context is some kind of container? Then again, can you be sure that the computer behind the MT program knows anything about paint cans, or has seen those two words in combination? Why do you think it would have? But, giving it the benefit of the doubt, and assuming it knows all about paint cans, or at least has seen the string "paint can" a lot, how is it supposed to know how to translate "a metal paint can"? Maybe the computer has seen something like "The metal film on one side of the plate... may be obtained by ...spraying a metal paint or ....".
Ah ha! So there really are metal paints. And, if there are metal paints, why can't a metal paint can be the answer to a metal paint can, can't it? Well, it is just not likely that when you have the words paint and can in sequence, that can means be able to. But then again it is just not likely that can means anything but be able to. I guess we can say things and think things that are just not likely. I can easily understand what A metal paint can can, can't it? means. The computer might just think that I inadvertently typed can twice. Certainly, if it learns from real data, say from the Web, it will see can can a lot. Maybe that is why it won't translate He did the can can correctly. But really, what is English doing with so many types of cans anyway? We can even can worms, but we won’t open that one now.
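For the curious, here is a toy Python sketch of what "seeing words in combination" could look like for our troublesome "can". It picks a Spanish word for "can" based only on the English word that precedes it, and falls back to the overall most frequent choice when it has never seen that context. All of the counts are invented for illustration, and the function name is made up; a real engine weighs far more evidence than one preceding word.

```python
from collections import Counter

# Invented toy counts: how often "can" was aligned to each Spanish word,
# given the English word immediately before it.
CONTEXT_COUNTS = {
    "paint": Counter({"lata": 9, "puede": 1}),
    "he": Counter({"puede": 10}),
}

# Invented context-free counts: overall, "can" is mostly "be able to".
OVERALL_COUNTS = Counter({"puede": 95, "lata": 5})


def translate_can(previous_word):
    """Pick the most frequent translation of 'can' given the previous word."""
    counts = CONTEXT_COUNTS.get(previous_word.lower(), OVERALL_COUNTS)
    return counts.most_common(1)[0][0]


for prev in ["paint", "he", "metal"]:
    print(prev, "can ->", translate_can(prev))
```

With these made-up numbers, "paint can" comes out as a container, while an unseen context falls back to "be able to". That fallback is exactly the kind of guess that goes wrong when the evidence the system has seen points the other way.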