Will Lewis is a program manager on the Microsoft Translator team, working on language quality and data acquisition. Today's guest blog post is a high-level explanation of how the engine works:
As many of you know, under the hood Microsoft Translator is powered by a Statistical Machine Translation (SMT) engine. Statistical systems differ from rule-based ones in that the “rules” mapping words and phrases from one language to another are learned by the system rather than hand-coded. Training an SMT engine requires amassing a large amount of parallel training data, ideally of good quality and from heterogeneous sources, and then training the engine on that data. (By parallel, we mean a source of data in which the text in one language is a translation of the text in the other.) The engine learns the correspondences between words and phrases in one language and those in the other, and these correspondences are reinforced by repeated occurrences of the same words and phrases throughout the input. For instance, when training the English-German system, if the engine sees the phrase All rights reserved on the English side and Alle Rechte vorbehalten on the German side, it may align these two phrases and assign some probability to this alignment. Repeated occurrences of the source and target phrases in the training data will only reinforce this alignment.
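To make the idea of learning alignments from repeated co-occurrence concrete, here is a minimal sketch of IBM Model 1, a classic textbook word-alignment model trained with expectation-maximization. It is a swapped-in teaching example, not Microsoft Translator's actual training procedure, and the three-sentence corpus is hypothetical. Because "das" appears alongside "the" in two sentences, EM gradually concentrates probability on that pairing, which in turn frees "Haus" to align with "house".

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(target | source) with IBM Model 1 EM.

    corpus: list of (source_tokens, target_tokens) sentence pairs.
    Simplification (assumption): no NULL word, so every target word aligns to a real source word.
    """
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    # Start with uniform probabilities for every (target, source) pair that co-occurs.
    t = {(f, e): uniform for src, tgt in corpus for f in tgt for e in src}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: spread each target word's alignment over the source words in its sentence.
        for src, tgt in corpus:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize the expected counts into probabilities per source word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word)."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy parallel corpus (English source, German target).
corpus = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]
t = train_ibm_model1(corpus)
print(best_translation(t, "house"))  # Haus
```

Real systems add a NULL source word, far larger corpora, and phrase extraction on top of word alignments; the sketch keeps only the reinforcement effect described above.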
Generally, having parallel data for a language pair means we can train engines in both directions (i.e., both the English-German and the German-English systems can be trained on the same input sentences). Some of you asked why we released the English-Spanish system before Spanish-English. There were really two reasons. First, English-Spanish was the first general-domain language pair we released, and releasing one language pair allowed us to test the infrastructure before we started releasing more. Second, the technology for Spanish-English was slightly different from that used for English-Spanish, and we needed some additional time to make the infrastructure changes necessary to accommodate it. In the future, we plan to release new translation systems in pairs (with a couple of exceptions). I can’t reveal which languages we have planned next, but do expect some new ones soon!
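One way to picture training both directions from the same input is the relative-frequency phrase table used in standard phrase-based SMT: the same set of aligned phrase pairs is counted once and normalized two ways. This is a minimal sketch under that assumption, not a description of how Microsoft Translator stores its models, and the phrase pairs and counts are hypothetical. German "Recht" can mean both "law" and "right", so the two directions come out asymmetric even though they share one set of counts.

```python
from collections import Counter


def phrase_tables(phrase_pairs):
    """Relative-frequency phrase translation probabilities in both directions.

    phrase_pairs: list of (english, german) pairs, one entry per extracted occurrence.
    Returns (p_de_given_en, p_en_given_de), each keyed by (conditioning phrase, translation).
    """
    joint = Counter(phrase_pairs)
    en_counts = Counter(en for en, _ in phrase_pairs)
    de_counts = Counter(de for _, de in phrase_pairs)
    # Same joint counts, normalized by the English side for English-German...
    p_de_given_en = {(en, de): n / en_counts[en] for (en, de), n in joint.items()}
    # ...and by the German side for German-English.
    p_en_given_de = {(de, en): n / de_counts[de] for (en, de), n in joint.items()}
    return p_de_given_en, p_en_given_de


# Hypothetical extracted phrase pairs.
pairs = [("law", "Recht")] * 2 + [("right", "Recht")] * 2 + [("right", "richtig")] * 2
en_de, de_en = phrase_tables(pairs)
print(en_de[("law", "Recht")])  # 1.0
print(de_en[("Recht", "law")])  # 0.5
```

Here "law" always maps to "Recht", while "Recht" maps back to "law" only half the time, so the two systems learned from identical data can still make different choices.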
For those of you interested in technical discussions regarding our engines and how they work, please refer to some of the papers by the researchers who developed them. Three recent papers of note are:
Chris Quirk and Arul Menezes. “Do we need phrases? Challenging the conventional wisdom in Statistical Machine Translation.” Proceedings of HLT-NAACL 2006, New York, New York, USA, May 2006.
Chris Quirk and Arul Menezes. “Dependency Treelet Translation: The convergence of statistical and example-based machine translation?” Machine Translation, pp. 43-65, March 2006.
Chris Quirk and Arul Menezes. “Using Dependency Order Templates to Improve Generality in Translation.” Association for Computational Linguistics, July 2007.
The free Bing Translator app for Windows Phone continues to be one of the most popular and best-reviewed applications for Windows Phone, surpassing a million downloads and garnering average ratings between 4 and 5 stars since its release. Combining Augmented Reality Translation using your camera, speech and text translation, word-of-the-day live tiles, and a travel-optimized offline mode, the app has received rave reviews and has been highlighted as one of the most innovative translation apps on any platform.
For the past few weeks, the team has been heads down getting the app ready and tested for the new phones running Windows Phone 8, and we are pleased to announce that owners of the new Windows Phone devices can now download the app from the Windows Phone Store.
You can download the app from the marketplace here.
As a Windows Phone 8 user, you will also discover a new Translator “lens” whenever you launch your camera, giving you quick access to the app's camera-mode translation functionality.
For those of you who are new to the app, here is a behind-the-scenes look:
We hope you find the app useful as you navigate an increasingly multilingual universe.
- Vikram Dendi, Director of Product Management, Microsoft/Bing Translator
Klingon* is now a supported option on the Bing Translator site, allowing you to translate text snippets and web pages to and from Klingon. It is also available within the Translator widget, allowing Klingon-speaking visitors to your site to see it in their language. Bing Translator for Windows Phone has also added Klingon as a supported language for text-mode input and output and for camera-mode output. On the Bing Translator site, you can also choose to translate into both Latin-script Klingon and pIqaD (the Klingon script). Please note that if you are translating from Klingon, you need to select the language explicitly rather than rely on Auto-detect.
This system has been built as a labor of love, in close partnership with members of the Klingon Language Institute (KLI), headed by Dr. Lawrence Schoen; Prof. Marc Okrand, the inventor of the Klingon language; and many other Klingon enthusiasts inside and outside Microsoft. We received fantastic support from our fellow Star Trek fans at Paramount and CBS.
Building a new translation system from scratch is a challenging affair: it requires a large number of training documents and many rounds of training the engine, reviewing and evaluating the results, and training again. What you initially get is mostly unintelligible, and improvement comes with continued learning, both in vocabulary and in fluency. While there is a great amount of training material for such a system in mainstream languages like English, French or German, Klingon does not (yet!) have a comparable volume of “parallel” (translated) text, or even of material in Klingon alone. Our friends in the community helped us gather what was available and used the Microsoft Translator Hub to train the initial engine. Members of the community were then able to review, critique and correct the translation errors this infant system was making. These corrections directly influenced the next training run, and thus the system has been getting better every day. Given its infancy and the distance it has yet to travel to achieve the necessary fluency and vocabulary, Klingon will remain an experimental language in Bing Translator for the time being.
We wish to thank the Klingon language community, Prof. Okrand, Dr. Schoen and CBS/Paramount for helping make this a reality. If you are a Klingon speaker and wish to join the Hub community built around this effort, please email firstname.lastname@example.org or email@example.com. Not everyone can have Lieutenant Uhura translate for them, so we hope Bing Translator’s Klingon support comes in handy the next time you are in a pinch.
lupDujHomwIj lubuy'moH gharghmey (literally, “Serpents fill my shuttlecraft”)
- Vikram Dendi & the Translator team at Microsoft
Update (2:52 PM): Added note about auto-detection, and other minor edits.
* Klingon is a trademark of CBS Studios Inc.
Here is a translated version of the original Klingon Empire Announcement:
tlhIngan Hol 'oH qIb Hol wa'DIch'e' mughlaHbogh Bing Translator 'e' maq tlhIngan yejquv, boqbogh tlhIngan Hol yejHaD, Microsoft je.
Klingon is the first galactic language which can be translated by Bing Translator, announces the Klingon High Council, in alliance with the Klingon Language Institute and Microsoft.
qaStaHvIS DISmey, yuQjIjDIvI' luSuchtaHvIS tlhInganpu''e', Qatlhqu' tlhIngan Hol mughmeH 'ej tera' Holmey mu'tlhegh lIngmeH Qu', nuja' tlhInganpu'. tera' Holmey rurbe'chu' tlhIngan Hol, 'ej 'oH HaDtaH tera'ngan law'. wejmaH tera' Sep, Hoch puH'a' je Dab HaDwI'pu'. qIb ghatlh tlhIngan Hol, tlhIngan tIgh je 'e' 'agh ngoDvam.
For years, Klingons have told us that the task of translating Klingon and producing sentences in Earth languages while visiting the UFP is very difficult. Klingon is truly unlike Earth languages, and many Earthlings (continue to) study it. Students (of Klingon) live in thirty different Earth regions (countries) and all great landmasses. This fact demonstrates the galactic dominance of Klingon language and the Klingon Way.
tlhIngan Hol chelta'mo' Bing Translator, qIb lengwI'vaD, tlhIngan wo' SuchwI'vaD je nuH 'ut mojbej mughwI'. Hoch SepDaq, tera'nganvaD tlhIngan Hol, tlhIngan tIgh je lIH Bing Translator mughmeH laHmey. pIj mughwI' lo'chugh taghwI', nom tlhIngan Hol pab pIn moj.
Because Bing Translator has added Klingon, the translator will certainly become an essential weapon (tool) for (the) galactic traveler and (the) visitor to the Klingon Empire. In every region (country), the translation abilities of Bing Translator will introduce Earthlings to the Klingon language and the Klingon Way (culture). If beginners frequently use the translator, they will quickly become grammarians of the Klingon language.
Qo'noS Qombogh muD, tuj'a', Debmey tIn je SIQlaH tera'nganpu'. pIraQSIS Qaw'lu'mo' choHpu' Qo'noS 'e' leghlaH je. Bing Translator lo'taHvIS lengwI', lengDI' bel, 'ej roD batlhHa' vangbe'laH 'ej tIgh chach junlaH. Microsoft Bing Translator, qum chaw' je ghajchugh «SuvwI' lengmey» lengwI', tlhIngan SuvmeH tIgh 'ut ghojlaH, qagh SoplaH ghopDu'Daj lo'taHvIS, 'ej pIjHa' QumHa'.
Earthlings will be able to endure (experience) the quaking (turbulent) atmosphere, great heat and large deserts of Qo'noS. They will also be able to see that Qo'noS has changed due to the destruction of Praxis. While the traveler uses Bing Translator, he will be comfortable while travelling, and will usually be able to not act dishonorably and avoid cultural emergencies. With Microsoft Bing Translator and a government permit, "Warrior Tours" travelers can learn essential Klingon fighting, eat qagh with their hands and infrequently miscommunicate.
che'ronDaq mughwI' mu'tlheghmey, mu'mey je tobta' tlhIngan Hol yejHaD. jIjDI' tlhIngan Hubbeq, 'ejyo' je, toy'beH mughwI'. 'e' poQbej SuvwI' Hol. DaH not Hegh SuvwI' «HIjol» mughHa'DI' boq beq 'ej «HIQoj» mojDI'. taHmeH tlhIngan wo''a' HoSghaj, lI'chu' Bing Translator mughmeH laHmey.
The Klingon Language Institute has tested the translator's sentences and words on the battlefield. When the Klingon Defense Force and Starfleet cooperate, the translator will be ready to serve. A warrior language certainly requires that. Now warriors will never die when "Beam me up!" is mistranslated by an alliance crew and becomes "Beam me out!" In order that the powerful great Klingon Empire continue, the translation abilities of Bing Translator will be supremely useful.
Our friends over in the Internet Explorer building recently released a developer preview version of IE8.
There are a lot of interesting features in IE8 developer beta 1, ranging from improved standards compatibility to improved security through elegant tweaks to the address bar. Web Slices, an improved Favorites bar, and the developer toolbar are some other welcome additions to the feature set.
The Activities feature in IE8 is a great way for users to access various web services in a single click. We are very excited to deliver translations through the Translation activity for IE8. If it isn't already available in your Activities menu, you can get it (along with other great activities) from here.
For a detailed review of the Translation activity, in "non product manager speak", check out Helvecio's blog post here. You can download the IE8 developer preview from here. More information on newer releases and other features is available on the IE team blog.
We look forward to hearing about your experiences with the Translation activity for IE8.
Lee Schwartz is a Computational Linguist on the Microsoft Translator team. Today’s guest blog post is about getting lost in (machine) translation…
Recently, a user seemed upset with the translation he received for “a metal paint can.” No wonder. When he translated this into Spanish, he got “un metal pintura puede,” which means “a metal paint is able to.” And what is that supposed to mean? But then again, what is “meaning” to a machine translation system anyway? Does anything mean anything? Or is the computer just seeing words in combination in one language and corresponding words in another language? And is it assuming that, because one sequence is used in the source language when another is used in the target, one is the translation of the other? Even if the machine translation program is just seeing words in combination, wouldn't it have seen “paint can” before and know that “can” in this context is some kind of container? Then again, can you be sure that the computer behind the MT program knows anything about paint cans, or has seen those two words in combination? Why do you think it would have? But, giving it the benefit of the doubt and assuming it knows all about paint cans, or at least has seen the string “paint can” a lot, how is it supposed to know how to translate “a metal paint can”? Maybe the computer has seen something like “The metal film on one side of the plate... may be obtained by ...spraying a metal paint or ....”
Ah ha! So there really are metal paints. And, if there are metal paints, why can't “a metal paint can” be the answer to “a metal paint can, can't it?” Well, it is just not likely that when you have the words “paint” and “can” in sequence, “can” means “be able to.” But then again, it is just not likely that “can” means anything but “be able to.” I guess we can say things and think things that are just not likely. I can easily understand what “A metal paint can can, can't it?” means. The computer might just think that I inadvertently typed “can” twice. Certainly, if it learns from real data, say from the Web, it will see “can can” a lot. Maybe that is why it won't translate “He did the can can” correctly. But really, what is English doing with so many types of cans anyway? We can even can worms, but we won’t open that one now.
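The tension between “not likely in this context” and “not likely in general” can be sketched with a toy disambiguator: pick the most frequent sense of “can” after the preceding word, and fall back to the most frequent sense overall when that word was never seen. This is a hypothetical illustration with invented counts and invented names (`OBSERVATIONS`, `guess_sense`); it is not how Microsoft Translator resolves ambiguity.

```python
from collections import Counter, defaultdict

# Hypothetical sense-tagged observations of "can": (previous word, sense).
OBSERVATIONS = [
    ("paint", "container"), ("paint", "container"), ("paint", "modal"),
    ("soda", "container"),
    ("you", "modal"), ("we", "modal"), ("I", "modal"), ("it", "modal"),
]


def build_model(observations):
    """Count senses per preceding word, plus overall sense counts for backoff."""
    by_context = defaultdict(Counter)
    prior = Counter()
    for prev, sense in observations:
        by_context[prev][sense] += 1
        prior[sense] += 1
    return by_context, prior


def guess_sense(prev_word, by_context, prior):
    """Most frequent sense after prev_word; otherwise the most frequent sense overall."""
    counts = by_context.get(prev_word)  # .get avoids creating an empty entry
    if counts:
        return counts.most_common(1)[0][0]
    return prior.most_common(1)[0][0]


by_context, prior = build_model(OBSERVATIONS)
print(guess_sense("paint", by_context, prior))  # container
print(guess_sense("metal", by_context, prior))  # modal
```

With only one word of context, the sketch still cannot tell “a metal paint can” apart from “a metal paint can be sprayed”, which is exactly the kind of case described above.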