There has been some great coverage recently on the beta release of our new online translation service. In this post I would like to provide you with information regarding Microsoft’s entry into the free online machine translation field – straight from the horse’s mouth, figuratively speaking.
The URL of the translation home page is http://www.microsofttranslator.com, where you can issue requests for text and web page translations:
Note that a check box option labeled Computer-related content allows you to get better-adapted translations for (computer-) technical text, provided by Microsoft Research’s own statistical machine translation engine. This service is available for the following language pairs:
English - Chinese Simplified
English - Chinese Traditional
English - French
English - German
English - Italian
English - Japanese
English - Portuguese
English - Spanish
Non-technical translations and additional language pairs are currently provided by the latest version of Systran. The additional language pairs available are:
Arabic - English
Chinese Simplified - English
Chinese Traditional - English
Dutch - English
English - Arabic
English - Dutch
English - Korean
French - English
French - German
German - English
German - French
Italian - English
Japanese - English
Korean - English
Portuguese - English
Russian - English
Spanish - English
Our innovative approach to web page translation includes a user interface we refer to as the Bilingual Viewer. It offers four types of bilingual views from which users may choose depending on their preference or screen size. The side-by-side and top/bottom views offer synchronized scrolling, highlighting, and navigation (and yes, we still have some wrinkles to iron out there). In the two single-language views, you can hover your mouse pointer over a sentence in one language, and the corresponding passage in the other language is automatically displayed nearby for ease of reference. Finally, we render the translated text progressively on a web page so that it becomes available to read more quickly, while other page elements are still being translated in the background.
Original with hover translation view:
Translation with hover original view:
Live Search will soon be exposing “Translate this page” links on the results page for search results which are in a language that is different from the user’s system language (provided that the required language pair is available from our service). When you click on a “Translate this page” link, the web page will be opened in the Bilingual Viewer (in side-by-side view or the view you selected during your last viewing session).
Please refer to our FAQ section for more answers to questions that have reached us, and please do make use of the option to send us Feedback. We have released our first version of this translation service as a Beta, so we can listen to and learn from you how to best meet your needs. Expect to see continuous improvements to the Windows Live Translator Beta.
1) Where can I find help with the translator service?
Help and FAQ are here.
2) How is the text translated?
Text is translated by computer software automatically and without human involvement. Web pages about computer-related topics are translated by Microsoft’s own state-of-the-art statistical machine translation technology which has been trained on large amounts of computer-related data. Web pages about other topics or into languages that are not included in Microsoft’s eight currently supported languages are translated by translation software from Systran.
3) Why is the quality of the translation not as good as I would like it to be?
Language translation is extremely difficult, as the meaning of words and phrases often depends on the context and specialized knowledge of the domain area or culture. Sentence structures and grammatical rules vary significantly between two languages, adding to the complexity of the translation challenge. Currently, it still requires human skills to translate sentences without errors. The quality of today's most advanced translation software is well below the accuracy and fluency of a professional translator, and many sentences are simply not understandable. Researchers are continuously working on improvements, but it may be many years before high quality translation can be consistently offered by a computer. For this reason, we display both the original text and its translation, anticipating that you will find it easier to understand the translation, comparing it with the original content if needed.
Some of our translation results (usually for computer-related content) are based on training our translation system on large amounts of bilingual text. The more bilingual or multilingual text we can train our system on, the better our translation quality will become. If you have large amounts of translated text in any subject domain, which you would be willing to share with us, please click here to let us know.
4) Where can I report translation problems? Can I make corrections to the translation you offer?
Use the Feedback link in the Bilingual Viewer or the Microsoft Translator Beta home page to report problems. We are currently working on additional ways to collect your corrections.
5) What is the difference between rule-based machine translation and statistical machine translation?
1. Rule-based MT systems require extensive dictionaries containing syntactic, semantic and morphological data, and large sets of rules to translate a word or phrase. Of multi-language, broad domain systems, Systran is the industry standard and benchmark for automatic translation, and relies on rule-based technology developed by a large team of linguists over many years. Other best-of-breed rule-based engines exist for specific language pairs, but their quality in a particular language pair does not necessarily scale to other pairs.
2. Statistical MT is based on machine-learning technologies, and relies on large volumes of parallel human-translated texts from which the MT engine can learn. This data must be obtained in every language pair and domain that the machine will be asked to translate in. While the quality of current systems is limited by the availability of parallel data, the potential for language coverage and quality improvements is very promising, as multilingual content on the web increases, and new techniques for mining parallel data are discovered.
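To make the idea of learning from parallel text concrete, here is a minimal sketch of IBM Model 1, a classic textbook word-alignment model, trained on three toy sentence pairs. It is not the Microsoft engine or anything close to it; the function name, the toy corpus, and the iteration count are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) from sentence pairs using EM.

    `pairs` is a list of (source_tokens, target_tokens) tuples.
    Assumes every sentence in the corpus is non-empty.
    """
    source_vocab = {w for src, _ in pairs for w in src}
    target_vocab = {w for _, tgt in pairs for w in tgt}

    # Start from a uniform distribution: no prior knowledge of either language.
    uniform = 1.0 / len(target_vocab)
    t = {(tw, sw): uniform for sw in source_vocab for tw in target_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) link counts
        total = defaultdict(float)  # expected link counts per source word

        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: re-estimate the probabilities from the expected counts.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]

    return t


# Toy parallel corpus (hypothetical data, English -> German).
toy_pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]

table = train_ibm_model1(toy_pairs)


def best_translation(source_word):
    """Return the target word with the highest learned probability."""
    candidates = {tw: p for (tw, sw), p in table.items() if sw == source_word}
    return max(candidates, key=candidates.get)
```

No dictionary or grammar rule is supplied anywhere: the word correspondences emerge only from which words co-occur across the sentence pairs, which is why the amount of parallel data matters so much for this family of systems.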
6) Which language translations does Microsoft’s own statistical machine translation engine support?
Microsoft’s own translation system is used for translation from English into: German, Spanish, French, Italian, Portuguese (Brazilian), Chinese (simplified and traditional), and Japanese. Other languages are translated using software from Systran.
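The routing described in this FAQ can be sketched as a simple rule. This is only an illustration of the behavior as the post describes it, not the service's actual code; the function name, the engine labels, and the `computer_related` flag (standing in for the "Computer-related content" check box) are assumptions.

```python
# Target languages handled by Microsoft's own engine when translating
# from English, as listed in this post.
MICROSOFT_TARGETS = {
    "Chinese Simplified",
    "Chinese Traditional",
    "French",
    "German",
    "Italian",
    "Japanese",
    "Portuguese",
    "Spanish",
}


def choose_engine(source, target, computer_related):
    """Return a label for the engine that would handle a request.

    Hypothetical helper: it does not check whether Systran actually
    offers the requested language pair.
    """
    if computer_related and source == "English" and target in MICROSOFT_TARGETS:
        return "microsoft-statistical"
    return "systran"
```

For example, `choose_engine("English", "German", True)` selects the statistical engine, while the same pair with `computer_related=False` falls through to Systran.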
7) How long has Microsoft been working on the machine translation engine?
Microsoft has been developing statistical machine translation technology in its Research division for over 3 years, and other natural language and machine translation technologies for over 12 years. The original focus of this work was on Microsoft’s own localization needs, where the technology has been very successful in enabling customers from many countries to access help files, knowledge base articles and MSDN articles. We are now pleased to bring it to you for general use. Research and development are ongoing.
8) Why aren't all web pages translated?
There are a number of reasons why we may not be able to translate content on a certain web page:
· System requirements may not be met
· We do not transmit any content from https (secure) web pages to our translation server, as this could be considered a phishing attempt. You can still navigate to an https site yourself and copy/paste paragraphs into the translator.live.com translation box, but in order to respect secure information 100%, we will not automatically send content that is protected by https to our server.
· Flash and text on images cannot be translated.
· The Bilingual Viewer uses frames to display translations to you. Pages containing scripts which enforce that a web page is not displayed in a frame will not be translated.
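Two of the conditions above (https pages and frame-blocking scripts) can be expressed as a small pre-check. This is a hedged sketch only: the function name and the `has_frame_busting_script` flag are hypothetical, and how the real service detects such scripts is not described in this post.

```python
from urllib.parse import urlparse


def skip_reason(url, has_frame_busting_script=False):
    """Return why a page would not be translated, or None if it may be.

    Illustrative only; covers just the https and frame-related rules.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme == "https":
        # Secure content is never sent to the translation server.
        return "https content is not transmitted to the translation server"
    if has_frame_busting_script:
        # The Bilingual Viewer relies on frames to show the page.
        return "page refuses to be displayed inside a frame"
    return None
```

A caller would treat any non-`None` result as a signal to show the original page untranslated.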
9) When do I see a “Translate this page” link next to a Live Search result?
This link is available when Live Search has found a web page in a language that differs from the default (or chosen) language of your browser, provided that we can offer a translation between the web page language and your default language.
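The condition for showing the link can be sketched as a lookup against the language pairs listed in this post. The function and variable names are assumptions for illustration, and the real service may detect and name languages differently.

```python
# Pairs served by Microsoft's own engine (English into eight languages).
ENGLISH_TARGETS = [
    "Chinese Simplified", "Chinese Traditional", "French", "German",
    "Italian", "Japanese", "Portuguese", "Spanish",
]

# Additional pairs listed in this post as provided by Systran.
ADDITIONAL_PAIRS = [
    ("Arabic", "English"), ("Chinese Simplified", "English"),
    ("Chinese Traditional", "English"), ("Dutch", "English"),
    ("English", "Arabic"), ("English", "Dutch"), ("English", "Korean"),
    ("French", "English"), ("French", "German"), ("German", "English"),
    ("German", "French"), ("Italian", "English"), ("Japanese", "English"),
    ("Korean", "English"), ("Portuguese", "English"),
    ("Russian", "English"), ("Spanish", "English"),
]

SUPPORTED_PAIRS = {("English", t) for t in ENGLISH_TARGETS} | set(ADDITIONAL_PAIRS)


def show_translate_link(page_language, user_language):
    """Decide whether a search result would get a translation link."""
    if page_language == user_language:
        return False  # nothing to translate
    return (page_language, user_language) in SUPPORTED_PAIRS
```

Note that the pairs are directional: a Russian page can be offered to an English-language user, but the post lists no English to Russian pair, so the reverse case returns `False`.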
10) How do I set my preferred translation language?
You can select a translation language on the Microsoft Translator Beta home page (www.microsofttranslator.com) for your text or web page translations.
Once you see web page translations in the Bilingual Viewer, you may change your original and translation language.
The Language tab under Search Options (accessible through the “Options” link on the Live Search page) allows you to select a translation language into which pages are translated. This may be other than the default language of your browser. It also allows you to tell the system to return results only in certain languages.
11) Can I set a preferred translation view within the bilingual viewer?
You can select a view from within the bilingual viewer. If you use a compliant browser, this preference will persist between viewing sessions.
12) How is this service similar to others in the market, and how is it different?
Like other systems in the market, the Microsoft system offers free text and web page translation. Microsoft’s web page translator differs from the competitors in its use of a unique Bilingual Viewer, which shows the original and translation side by side, and which customers have said they prefer over other viewing modes. It also offers higher quality translations of computer-related technical text through the use of Microsoft’s own statistical MT technology.
Microsoft’s MT portfolio is focused on meeting customer needs through intuitive integration of MT into products and services that are part of our users’ everyday workflow, and creating as user-friendly an experience as possible. As such, we welcome customer feedback and look forward to hearing suggestions.
13) Will I be able to translate documents within Word?
Currently, Microsoft Office allows you to translate words, phrases or documents. For some languages, Office offers a screen-tip dictionary lookup for individual words. A “Translation” option that is available in the Office ribbon-UI as well as in the right-click menu allows the user to request translations of words, entire phrases, and even Word documents via the Research & Reference pane. There, users may select a translation language and enter sentences for translation into a query box, or they may click on a “Translate the whole document” button which will provide translations for short documents.
14) I’ve read that there are five views offered with the bilingual viewer. Where is the fifth view?
This even had the creators of the Bilingual Viewer surprised :). They share two competing theories about what the mysterious 5th view might be, and you can help solve this mystery. Please send your discovery or guesses here. We will publish all unique findings on this blog … and who knows—maybe you will uncover a secret 6th or 7th view?
15) When can I use the translator for language X?
We are constantly working to improve the quality of existing translations and to prepare the languages that are not ready yet. Keep an eye on this blog for announcements of new languages being added to the service.
16) I have a feature suggestion, what do I do?
Use the Feedback link in the Bilingual Viewer or the Microsoft Translator Beta home page to send feature requests. While we might not be able to respond personally, be assured that the team will be looking at each request that comes in.
17) What are the system requirements for the service?
Welcome to our blog! We are very excited to bring to you news and insights into work (and fun) at the Machine Translation (MT) Group within Microsoft Research. We have a great mix of researchers, developers, testers, program managers, linguists, designers and product managers working on MT here, and we are pleased to launch this blog as a way to connect with customers, partners and other friends of MT. We hope this will provide greater insight into the work we do and who we are, and we are very excited to be talking to you.
Machine Translation (MT), for those who don’t know it, is exactly what it sounds like: using a “machine” (in most cases computer software) to translate text from one human language to another. Many different approaches have been developed in this area, and results have been improving over time. You will hear from members of the team who have been working on this technology and learn how the research breakthroughs are coming to a desktop near you. We will be introducing you to the team building the new Microsoft Translator, and you will get some background on the technology used for the site.
RSS and Atom feeds are available for all posts or specific categories on this blog. For now anonymous blog comments are under moderation – I am hopeful that as long as spam levels remain low we can keep it that way.
Once again, thank you for visiting the blog! Cheers!
Microsoft Research’s Machine Translation (MSR-MT) group has been among the leading research organizations in the machine translation space for over 8 years, and some of the foundational work in natural language processing at MSR began over 16 years ago. The team’s approach to machine translation integrates linguistic features with state-of-the-art statistical machine translation algorithms. The team’s focus has always been on automatically acquiring translation knowledge from bilingual corpora, i.e., parallel data consisting of original source language sentences and their corresponding translations by human translators. About 3 years ago, the team’s focus shifted from a purely rule-based approach to this task toward a hybrid approach that includes extensive statistical processing, allowing for greater scalability across domains and into new languages.
Microsoft’s Machine Translation technology was first developed for in-house localization purposes, to allow our Customer Support organization to publish technical support documents with a frequency and language breadth that would have been prohibitively expensive using human translators. With all of Microsoft’s previously human-translated documents and localized software at its disposal, the MT team was able to train its statistical MT engine automatically and achieve quite good quality in the technical domain. This technology was extended to support the Windows localization team, the Developer Division, MSDN, and several other groups within Microsoft. It has also allowed Microsoft to reach many more customers than would ever have been possible using human translation alone.
After focusing on Microsoft’s own translation needs, the team began to build a scalable web service that would allow it to provide translation services to the general public, as a standalone tool on the web, and as a feature within other products. Given that the Microsoft MT engine has been trained most heavily on technical data, it has not yet been tuned for translating text in other subject domains. However, we hope to continue improving the quality and breadth of the engine. We look forward to sharing our developments with you over the coming months on this blog.