While we on the Machine Translation team have seen increasing traffic to our various offerings over the past few months, we noticed a sudden bump in traffic yesterday. Having grown up on Agatha Christie and Sherlock Holmes, I find such mysteries irresistible – and a number of other folks on the team were just as curious to find out what caused this sudden bump. We figured that the IE8 Activity/Accelerator, the Messenger Bot, Search translations, and Office translations were all showing the same upward trend as in the days before and thus were not the specific reason for this bump.
Eventually, we were able to identify one potential reason for the spike. Our user community found an oddity in how the machine translation engine translated several names from English to German. Given the current political atmosphere in the run-up to the US elections, it was to be expected that an engine translating the name of one party's candidate into the name of someone from the other party would end up as news. While we certainly welcome all the new users who came by to check this phenomenon out, we also want to share with our users why such things seem to happen from time to time with statistically trained machine translation systems, both ours and others'.
A Statistical Machine Translation engine is trained on lots and lots of parallel data, that is, data that exists in both a source language (e.g., English) and a target language (e.g., German), where the source and target are translations of one another. Our engine is trained on millions of sentences for each language pair we support. In order to train on a particular corpus of data (say, a large number of newswire articles in English which have been translated into German), we first have to break that corpus down into sentences. After the corpus is sentence broken, we feed the resulting sentences into a sentence aligner, the sole purpose of which is to find which sentences on the source side align with which sentences on the target side. This is no trivial task, since a sentence on one side could conceivably align with one or more sentences on the other side (or possibly none at all!). The aligner will sometimes make mistakes and misalign one sentence with another that is in fact not its translation. This can lead to some mistranslations, especially if the source and target contain words that occur infrequently. Since our translation engine is statistical, it is highly reliant on co-occurrence frequencies between words in the source and target data. If certain words occur infrequently (people’s names, for instance, may occur only a few times across a corpus of millions of sentences), the lack of frequency can lead to mistranslations resulting from incorrect “guesses” between source and target (i.e., low probabilities assigned to particular source and target words). This can lead to some comical gaffes in our translation system.
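To make the effect concrete, here is a minimal toy sketch in Python. It is not our engine's actual model: it scores word pairs with a simple Dice coefficient over sentence-level co-occurrence counts, which is only a stand-in for a real translation model. The names "Smith" and "Jones", the tiny corpus, and the functions train and best_translation are all made up for illustration. The sketch shows how a single misaligned sentence pair can decide the "translation" of a name that occurs only once, and how a few correctly aligned examples fix it.

```python
from collections import Counter

def train(pairs):
    """Count sentence-level occurrences and co-occurrences of words."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        joint.update((s, t) for s in src_words for t in tgt_words)
    return src_count, tgt_count, joint

def best_translation(word, model):
    """Return the target word with the highest Dice score, or None if unseen."""
    src_count, tgt_count, joint = model
    scores = {
        t: 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
        if s == word
    }
    return max(scores, key=scores.get) if scores else None

# Hypothetical toy corpus; the last pair is a misalignment in which the
# English sentence about "Smith" was paired with a German one about "Jones".
BASE_PAIRS = [
    ("the house is old", "das haus ist alt"),
    ("the house is new", "das haus ist neu"),
    ("the book is old", "das buch ist alt"),
    ("the book is new", "das buch ist neu"),
    ("the book is here", "das buch ist hier"),
    ("smith is here", "jones ist hier"),  # misaligned pair
]

# A few correctly aligned sentences that mention the rare name.
EXTRA_PAIRS = [
    ("smith is old", "smith ist alt"),
    ("smith is new", "smith ist neu"),
]

print(best_translation("house", train(BASE_PAIRS)))                # haus
print(best_translation("smith", train(BASE_PAIRS)))                # jones
print(best_translation("smith", train(BASE_PAIRS + EXTRA_PAIRS)))  # smith
```

A frequent word like "house" is safe because many sentence pairs agree on it. The name that appears only once has nothing to outvote the bad pair, so the wrong guess wins until more correctly aligned data arrives.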
So, that is how the “machine” decided to translate in a way that ended up with the community attributing it to the sense of humor of our team. While we continue to work hard to ensure proper alignments, it is to be expected from a statistical system that is built on millions to billions of words that such a situation could repeat.
The current issue with alignment should now be resolved, but we urge our community of users to keep helping us identify any such situations by contacting us through this blog.
Our web page translation includes a user interface we refer to as the Bilingual Viewer. It offers four types of bilingual views, which users can choose among depending on preference. The side-by-side and top/bottom views offer synchronized scrolling, highlighting, and navigation. In the two single-language views, you can hover your mouse pointer over a sentence in one language and the corresponding passage in the other language is automatically displayed nearby for ease of reference. Finally, we render the translated text progressively on a web page in order to make it more quickly available for the user to read, while other page elements are still being translated in the background.
To change your view, just click on one of the four options in the “Views” section on the upper right part of the site:
Original with hover translation view:
Translation with hover original view:
Note: when you click on “Translate this page” while using Live Search, the web page will be opened in the Bilingual Viewer (in side-by-side view or the view you selected during your last viewing session). Read more about that here.
Check out the bilingual viewer today if you haven’t played around with it before! And as always, let us know your feedback :)
Did you know that Microsoft Translator powers translations in Bing (formerly Live Search)?
For example, to translate this search result, click "Translate this page" at the end of the result description:
You'll see the page in a bilingual view, with the original page on the left, and the translated page on the right.
Here is the list of languages we support today:
We'll roll out more languages over the next several months.
So try clicking "Translate this page" in your search results. Let us know what you think!
Bing Search Blog
Update (11/15): Edited the link to the Bing Search blog.