Nearly a year ago Microsoft Translator unveiled an innovative new approach to translating web pages – one that enabled webmasters to bring the power of automatic machine translation to their sites with a snippet of JavaScript. Unlike any other quick and easy solution out there at that time, the Microsoft Translator webpage widget integrated the translation experience into your site, and did not take your users away to a different translation site. Here is our friend Doug Thomas, in his inimitable style, explaining how the same powerful translation technology that powers translation inside Office can power your site.
The widget was a showcase for the broad set of APIs that we announced at the same time – APIs that have been used by many partners since that time to build a variety of software, services and sites.
You all know that we were the first major translation service to provide a Haitian Creole system to help with the relief efforts underway in Haiti. One of the key motivators for us to build the system was Rick Engle, a fellow Microsoft professional who in his various endeavors to help with the relief efforts wanted to write an application to help the workers on the ground in Haiti. Since the time we added the language to our supported list, Rick went ahead and built the mobile app he had originally set out to build. You can find it here and it works for all languages that our service supports. The goal for having a full set of APIs (including HTTP, SOAP and AJAX) has always been to help developers like Rick focus on building great applications without a lot of heavy lifting, and we will continue to invest in that direction.
When we announced the availability of the widget and the APIs, we articulated our mission – to empower content providers, site owners and developers to deeply integrate translations into their sites and communities – truly bringing translations “anywhere” they are needed. As MIX 2010 approaches, we are working towards showcasing the next wave of our partner focused innovations.
We love MIX – where we get to meet developers that understand design, designers that understand strategy, strategists that understand technology... We get to discuss language technology with a German developer building software for an English company that serves customers from China to Brazil, and we get to hear great feedback about which new browsers we should be testing our AJAX controls against. It’s a brilliant “mix” of creativity, ingenuity and passion, and we are glad that we have made it “our” conference to share with the world the new things that we have been cooking up.
A bunch of us will be at MIX 2010, and those of you that will be there can expect some goodies in the attendee bag from our team. Do mark your schedule for our session – it’s at Lagoon H on Monday at 2 PM. If you were at last year’s session – you know how much fun it is. Oh also - we have some heavy boxes we are lugging with us. :)
If you are not at MIX (this is going to be the most attended MIX ever!), do not worry. We will have plenty of information posted here and on our site about what we are announcing at MIX on Monday. In addition, we hope to have Doug back – explaining the latest and the greatest in translation soon after that. Stay tuned!
- Vikram Dendi, Senior Product Manager, Microsoft Translator
A little while ago I was asked to figure out a solution to a user experience problem that was affecting some of our offerings such as the widget, the Bing text and web page translators. A “bug” was assigned to me, asking me to weigh in on how to deal with a problem of plenty: Given we were about to add a substantial set of new languages we were running out of space to display them properly. What could be a quick interim fix?
Several months ago, while announcing the availability of Hebrew in our language list, I had asked our community of users what else they wanted to see supported. Taking into account all the feedback that has come in since then, we have been hard at work adding support for new languages. This is why it’s always a pleasure to encounter problems like the one above – they indicate that this work is coming to fruition.
I am happy to announce the addition of seven new languages to our translation service. As always, they will be immediately available for your use through the APIs and all the products that consume the service. Here is the list of languages that have been added in the latest release. In addition there have been several updates to the Haitian Creole language since we last talked about it here.
ROM - Romanian
NOR - Norwegian
HUN - Hungarian
SKY - Slovak
SLO - Slovenian
LTH - Lithuanian
TRK - Turkish
This brings the number of languages we support to 30. Here is the full list:
ARA - Arabic
CHS - Chinese Simplified
CHT - Chinese Traditional
NLD - Dutch
ENU - English
FRA - French
DEU - German
HEB - Hebrew
HT - Haitian Creole
ITA - Italian
JPN - Japanese
KOR - Korean
PLK - Polish
PTB - Portuguese
RUS - Russian
ESN - Spanish
CSY - Czech
DAN - Danish
ELL - Greek
SVE - Swedish
THA - Thai
BGR - Bulgarian
FIN - Finnish
ROM - Romanian
NOR - Norwegian
HUN - Hungarian
SKY - Slovak
SLO - Slovenian
LTH - Lithuanian
TRK - Turkish
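For developers who want to carry this list around in their own code, here is a minimal Python sketch. The codes and names are copied from the list above; the dictionary layout and the helper names (`supported_languages`, `language_name`) are purely illustrative and are not part of any Microsoft Translator API.

```python
from typing import Dict, Optional

# Codes and names as listed in this post. The split into two dictionaries
# mirrors "previously supported" versus "added in this release".
PREVIOUSLY_SUPPORTED: Dict[str, str] = {
    "ARA": "Arabic", "CHS": "Chinese Simplified", "CHT": "Chinese Traditional",
    "NLD": "Dutch", "ENU": "English", "FRA": "French", "DEU": "German",
    "HEB": "Hebrew", "HT": "Haitian Creole", "ITA": "Italian",
    "JPN": "Japanese", "KOR": "Korean", "PLK": "Polish", "PTB": "Portuguese",
    "RUS": "Russian", "ESN": "Spanish", "CSY": "Czech", "DAN": "Danish",
    "ELL": "Greek", "SVE": "Swedish", "THA": "Thai", "BGR": "Bulgarian",
    "FIN": "Finnish",
}

NEWLY_ADDED: Dict[str, str] = {
    "ROM": "Romanian", "NOR": "Norwegian", "HUN": "Hungarian",
    "SKY": "Slovak", "SLO": "Slovenian", "LTH": "Lithuanian",
    "TRK": "Turkish",
}


def supported_languages() -> Dict[str, str]:
    """Return the full code-to-name table (hypothetical helper)."""
    return {**PREVIOUSLY_SUPPORTED, **NEWLY_ADDED}


def language_name(code: str) -> Optional[str]:
    """Look up a language name by code, ignoring case; None if unknown."""
    return supported_languages().get(code.upper())


if __name__ == "__main__":
    print(len(supported_languages()), "languages supported")
    print(language_name("ht"))
```

Keeping the new additions in their own dictionary makes it easy to check that 23 existing languages plus 7 new ones add up to the 30 mentioned above.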
Head on over to our forums if you have specific feedback or are looking for discussions about these new languages. We continue to work on adding even more languages to the service, so please keep sending us feedback and stay tuned for other announcements on this blog.
With the addition of these new languages, the approach I recommended in the short term is visible in the translation toolbar – the language list uses a smaller font size. In the future, we intend to move to either a multiple column list, or another style of display for the list.
I will look forward to more such problems, since it means we are meeting more expectations from you - our users. Enjoy the new languages!
Translating a website can be tricky – especially if it is not one that you built. A year back at MIX, we helped webmasters and developers take a step towards delivering a seamless, well integrated translation experience using the translator widget and the APIs. Yet there are still many sites out there that users need to translate without the help of such technology. Hence the continued popularity of our webpage translator, and of the bilingual viewer feature that it pioneered.
Some of you might have noticed a significant improvement in how the webpage translator handles certain web pages that it did not handle well in the past. A couple of weeks ago, we released an updated version of the webpage translator that improves site compatibility and delivers better performance. If you have not tried it recently, we urge you to retry, on this new release, any sites that webpage translators could not translate for you before. As always, if you do find problems please don’t hesitate to contact us.
Speaking of MIX, keep an eye on MIX 2010 this year. In addition to all the buzz around Windows Phone, don’t miss out on this session. :-)
Most of you know that we released the first publicly available Haitian Creole statistical machine translation engine last week and have been hard at work making it even better. I am pleased to announce that since last night we have rolled out two updates to the system and our site, which bring several improvements:
1) More training data = better translations. We trained the system on even more training data (including data that we hand translated), which should be reflected in better translations. We are nowhere near done yet, and we will continue to work on this.
2) Updating the AJAX API and widget. The Translator widget (and the underlying AJAX API) now accurately reflect “Haitian Creole” as the language selected in their UI. This was primarily a user interface fix (the Haitian Creole translation itself worked fine). You can use the widget to deliver any webpage in any of the languages we support (including Haitian Creole).
3) Please don’t forget the broad set of APIs and webmaster resources that are available for those that are building applications and websites to help with the relief efforts. There are several efforts underway to develop mobile apps (using the SOAP or HTTP API) and websites (using the AJAX API). If you are working on something along those lines, leave a link to your app/site in the comments and I will make sure to surface them up here so people can find them more easily.
We will continue to work on improving the system and we wish to thank everyone in the community that has been instrumental in helping us get this much requested translation engine out of the door. Stay tuned for more announcements!
Also, let me once again point to a resource where you can help with the broader Haiti relief efforts. Please help in any way you can!
Update (1/31): The DIPLOMAT project at CMU in the 1990s was an earlier project to create a Haitian Creole system for DOD/DARPA. As I mentioned in our earlier blog post, our system makes use of CMU’s data from that project.
In the current crisis in Haiti there are a number of initiatives to rapidly build software to assist in humanitarian aid. Responding to community requests for a machine translation (MT) system to translate between English and Haitian Creole, our team has been hard at work over the last few days. I am glad to announce that an experimental Haitian Creole MT system is now publicly available via several services and APIs powered by Microsoft Translator technologies. We will continue working on improving the system, but we hope that in the meantime, in spite of its experimental nature, it will be of use in the relief efforts.
1) What is being announced today?
Responding to requests from the community involved in Haitian relief efforts, Microsoft Research is making available today an experimental machine translation system for translating to and from Haitian Creole. You can try it at http://translate.bing.com or http://www.microsofttranslator.com.
2) How is it significant?
With the devastating disaster that struck Haiti, we have all been individually pitching in to help the efforts. This is our effort, as a team, to respond to the needs of communities such as Crisis Commons by delivering a Haitian Creole translator which can be of help to individual users, as well as other technology projects that could use a scalable translation system in their relief endeavors. Further, the usage of our API is completely free and it can be built into any application or website for immediate use. We hope that this might help the many applications being developed (such as those on crisiscommons.org) to aid the humanitarian efforts.
3) How can I use this system?
The Haitian Creole translator is now part of the Microsoft Translator web service, enabling many of the user scenarios powered by the service. Users can access the service through the Microsoft Translator web site. Developers can look at our APIs and choose from SOAP or HTTP (support for Haitian Creole in our AJAX API will be rolled out in the coming days).
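To give developers a feel for what calling an HTTP-style translation API involves, here is a minimal Python sketch that only builds a request URL. Everything specific in it is an assumption made for illustration: the endpoint `http://translator.example/Translate` is a placeholder, and the parameter names (`appId`, `text`, `from`, `to`) and the language codes `en` and `ht` should be checked against the actual API documentation before use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint (assumption): substitute the real URL from the API docs.
EXAMPLE_ENDPOINT = "http://translator.example/Translate"


def build_translate_url(endpoint: str, app_id: str, text: str,
                        source_lang: str, target_lang: str) -> str:
    """Build a GET URL for a translation request.

    The query parameter names used here are assumptions for illustration,
    not a confirmed description of the Microsoft Translator HTTP API.
    """
    query = urlencode({
        "appId": app_id,      # hypothetical application identifier
        "text": text,         # text to translate; urlencode escapes it
        "from": source_lang,  # assumed source language code
        "to": target_lang,    # assumed target language code
    })
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    url = build_translate_url(
        EXAMPLE_ENDPOINT, "my-app-id", "Where is the hospital?", "en", "ht"
    )
    print(url)
    # Parse the query string back to confirm the text survived escaping.
    print(parse_qs(urlsplit(url).query)["text"][0])
```

Letting `urlencode` handle the escaping matters for relief scenarios in particular, since the text to translate will often contain spaces, punctuation and accented characters.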
4) How is it different from other efforts?
There have been some great efforts in quickly building dictionary and rule-based Haitian Creole translation tools. The statistical machine translation system behind Microsoft Translator allows for a continuous improvement in the quality of translations (by adding more training data). Also, by delivering this as part of our web service we can ensure scale and performance and open up the possibility of using our many scenarios (Bing Translator, Internet Explorer 8, Messenger Bot etc.) with Haitian Creole, as well as using our extensive API set to add such support to other software and web sites at no cost.
5) What was involved in getting this out of the door in record time?
The process involved identifying parallel (translated) data between English and Haitian Creole, and training the MT engine to create the requisite language models. We would also like to acknowledge the great work being done by the Crisis Commons folks, the dictionary builders at haitisurf.com, the folks at CMU that made available parallel data and the Microsoft volunteers who challenged our team to action.
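To illustrate why parallel data matters so much, here is a toy Python sketch of how a statistical system can learn word translations from sentence pairs alone. It uses a simplified version of the textbook IBM Model 1 expectation-maximization procedure (no NULL word), which is a swapped-in teaching technique and not a description of our actual engine. The English and German sentence pairs are made-up toy data, and the function names are hypothetical.

```python
from collections import defaultdict

# Toy parallel corpus (illustrative only): (source sentence, target sentence).
TOY_PARALLEL_DATA = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]


def train_word_translation_table(sentence_pairs, iterations=10):
    """Estimate t(target_word | source_word) with simplified IBM Model 1 EM."""
    pairs = [(s.lower().split(), t.lower().split()) for s, t in sentence_pairs]
    target_vocab = {word for _, tgt in pairs for word in tgt}
    uniform = 1.0 / len(target_vocab)
    # table[(target_word, source_word)] starts out uniform.
    table = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) co-occurrences
        total = defaultdict(float)  # expected occurrences of each source word
        # E-step: share each target word among the source words in its pair.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(table[(tw, sw)] for sw in src)
                for sw in src:
                    fraction = table[(tw, sw)] / norm
                    count[(tw, sw)] += fraction
                    total[sw] += fraction
        # M-step: re-estimate probabilities from the expected counts.
        for (tw, sw), value in count.items():
            table[(tw, sw)] = value / total[sw]
    return table


def best_translation(table, source_word):
    """Return the most probable target word, or None if the word is unseen."""
    candidates = {tw: p for (tw, sw), p in table.items() if sw == source_word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    table = train_word_translation_table(TOY_PARALLEL_DATA)
    for word in ("the", "house", "book", "a"):
        print(word, "->", best_translation(table, word))
```

Nobody tells the program that "house" means "Haus"; it works that out because "the" also appears with "das" in another sentence. More sentence pairs give the procedure more evidence of this kind, which is why finding additional parallel data is the most direct way to improve quality.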
6) What should I expect in terms of quality?
This is an experimental system put together in record time. While our typical approach to adding new languages involves significantly larger amounts of training and a higher threshold for quality testing, we decided that the upside warranted making the system available to the community at the earliest and continuing to improve it subsequently. We are working diligently to keep improving the quality, but bear with us if you encounter problems. You can always contact us at firstname.lastname@example.org with feedback. Our user and developer forums are also available to discuss any issues you encounter.
7) How can I help improve the system?
The best way you can help improve the system is by helping us find more training data. This typically means sentences or words translated between English and Haitian Creole. We intend to make available to the larger community (via tausdata.org) data that we collect (as license restrictions permit) for training purposes. If you know of dictionaries, translated sentences, or websites that have such translations, we urge you to contribute them to TDA’s TAUS data sharing initiative. TDA is a non-profit organization providing a neutral and secure platform for sharing language data. If you have any concerns or questions feel free to contact us at email@example.com.
8) How can I help the broader Haiti relief efforts?
Go here to learn more about how you can help those devastated by the earthquake.
9) Where can I get more information?
Please stay tuned to our blog for further announcements. You can learn more about Microsoft Translator and the services we offer here.
10) What can we expect next?
In the coming days expect to see support for Haitian Creole added to even more of our scenarios (Translation Bot, Translator widget, Office etc) as well as the AJAX API. Known issues and announcements can also be found on our forums.
We hope that this contribution proves useful to the various humanitarian efforts underway, and please stay tuned to this blog for further news on the Haitian Creole language support. If you have any questions feel free to contact us at firstname.lastname@example.org.
Update (2:53 PM PST): The Messenger Translation Bot can now speak Haitian Creole. Add email@example.com to your messenger buddy list. Try the group conversation feature with a Kreyol speaker!
-Vikram Dendi, Senior Product Manager, Microsoft Translator