The following is the first of three blogs on the contributions of the Microsoft Research Asia Joint Lab Program (JLP), which recently celebrated its tenth anniversary. The JLP brings together the resources of Microsoft Research and major Chinese universities, facilitating collaboration on state-of-the-art research, academic exchange, and talent incubation. This blog focuses on the Microsoft-Harbin Institute of Technology joint lab (Microsoft-HIT; officially the China Ministry of Education–Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology).
Think of countries that have more than one official language. Which ones come to mind? Canada, with two official tongues? Switzerland, with four? How about China, which has no fewer than eight official languages and more than 50 unofficial but widely spoken indigenous tongues? Each of these languages is cherished as a cultural treasure in China, but the multiplicity of minority languages seriously impedes economic, technological, scientific, and educational exchanges between minority groups and the Mandarin-speaking Han, who make up a majority of China’s population.
Resolving this linguistic tangle is exactly the sort of challenge that prompted the creation of the Microsoft Research Asia Joint Lab Program (JLP), and it is the research focus of Microsoft-HIT. Since 2004, Microsoft-HIT researchers have published over 500 academic journal papers and, during just the last five years, presented more than 30 papers at such high-level events as the ACM-SIGIR Conference and the International Joint Conference on Artificial Intelligence (IJCAI).
The fruits of this labor can be seen in a Microsoft-HIT project called Minority Language Machine Translation. The project’s goal is to bridge the linguistic and cultural gulfs that separate different ethnic and national groups, both in China and around the world, and potentially to help preserve endangered minority languages. The project prototype is based on Microsoft Research’s Microsoft Translator Hub, a platform for machine translation between different languages. Utilizing the Microsoft Azure cloud-computing service, the prototype allows users to upload language and translation data and thus build a repository of lexical and grammatical information that can facilitate bilingual translation. While the work to date has focused on machine translation among Mandarin, English, and Uyghur, the underlying principles can be applied to translating between any two languages.
But this project isn’t the only focus of Microsoft-HIT. The joint lab also aims to serve as a talent incubator, mentoring the young researchers who will be the leaders of tomorrow. Microsoft-HIT not only employs a large number of the university’s faculty and graduate students but also holds an annual summer seminar on natural language processing. Since 2004, the summer seminar has given more than 2,000 students an opportunity to develop their skills and has laid the foundation for advanced research in language processing and speech technology.
Professor Sheng Li, seen here at the 2014 Microsoft Research Asia Faculty Summit, was instrumental in establishing the Microsoft-HIT joint lab.
Although the Microsoft-HIT joint lab dates from 2004, its antecedents stretch back to the 1990s, when Microsoft Research Asia worked with Harbin Institute of Technology professor Sheng Li to set up a laboratory on machine translation. In 2000, it became one of the first labs in the Microsoft Research Joint Lab Program, and in 2004, the Chinese Ministry of Education (MOE) accorded official recognition to this joint effort, designating it as an MOE-Microsoft Key Laboratory.
Professor Li, who is still deeply involved in the joint lab, credits it with having provided valuable experience to many young faculty members and promising students. He notes that many of these talented researchers have gone on to careers in related industries, but that a significant number have chosen to stay in the joint lab as either HIT professors or Microsoft researchers.
With the past 10 years of this program as a guide, we look forward to the next decade and beyond, confident that the Microsoft Research-HIT joint lab will foster even greater talent cultivation and research collaboration.
—Tim Pan, Director of University Relations, Microsoft Research Asia
Big data—that buzzword seems to dominate information technology discussions these days. But big data is so much more than a clever catchphrase: it’s a reality that holds enormous potential. We now have the largest and most diversified volume of data in human history. And it’s growing exponentially: approximately 90% of today’s data has been generated within the past two years. The exploding science of big data is changing the IT industry and exerting a powerful impact on everyday life.
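To put that growth claim in concrete terms, here is a minimal sketch of the arithmetic. It assumes that the cumulative volume of data grows at a steady exponential rate, which is an assumption and not something the 90% figure itself states. Under that assumption, if 90% of all data appeared in the past two years, the total today is ten times what it was two years ago, which works out to a doubling time of roughly seven months. The function name `implied_doubling_time` is hypothetical and used only for illustration.

```python
import math


def implied_doubling_time(recent_share: float, window_years: float) -> float:
    """Return the doubling time in years implied by the share of data
    created during the most recent window.

    Assumption (ours, not the source's): the cumulative total grows at a
    constant exponential rate.
    """
    if not 0.0 < recent_share < 1.0:
        raise ValueError("recent_share must be strictly between 0 and 1")
    if window_years <= 0.0:
        raise ValueError("window_years must be positive")
    # If 90% of the data is recent, the older 10% was the entire total at the
    # start of the window, so the total grew by a factor of 1 / (1 - 0.9) = 10.
    growth_factor = 1.0 / (1.0 - recent_share)
    # Solve growth_factor ** (t / window_years) == 2 for t.
    return window_years * math.log(2.0) / math.log(growth_factor)


doubling_years = implied_doubling_time(recent_share=0.90, window_years=2.0)
print(f"Implied doubling time: {doubling_years * 12:.1f} months")
```

If the true growth rate is not constant, the doubling time will differ; the point is only that the 90% figure describes very rapid growth.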
But what should big data science be, and where is it headed? These are the fundamental questions that have prompted Tsinghua University and Microsoft Research Asia to work together to establish a pioneering graduate course on Big Data Foundations and Applications. Turing Award winner and Tsinghua professor Andrew Chi-Chih Yao spent more than eight months developing the course, which launched in September 2014.
Turing Award winner and Tsinghua professor Andrew Chi-Chih Yao
Solidifying knowledge through academia-industry cooperation
On October 9, Hsiao-Wuen Hon, managing director of Microsoft Research Asia, delivered the course’s first lecture. Dr. Hon stressed that the importance of big data lies not only in its value in academic research but also in its application to real-world problems, which, he said, is why the academia-industry cooperation represented by the course is so critical.
“One of our purposes in launching this course with Tsinghua is to introduce Microsoft’s ideas to students, to let them get to know us better,” he explained. “Meanwhile, our top professional researchers can deepen their understanding of big data while teaching the students. So the course is not just about enhancing the students’ understanding of big data; it’s also about solidifying the researchers’ knowledge of big data.”
Hsiao-Wuen Hon, managing director of Microsoft Research Asia, delivered the course's first lecture.
Echoing the importance of the industry-academia connection, Professor Yao remarked, “Big data is an epoch-making subject. It has influenced all the other disciplines, including computer science and information technology. We should not only focus on the scientific research. Education development is also a new trend.”
Leading the forefront of big data science
Wei Chen, a senior researcher at Microsoft Research Asia, has been a visiting professor at Tsinghua University since 2007. He has helped design and launch several entry-level courses at Tsinghua, and he is a strong proponent of the new big data course.
“We certainly don’t expect this course to become a platform for [Microsoft’s] product promotion. Instead, it is being established to provide students with cutting-edge knowledge, to get them engaged in research and technology development, and to foster their ability to do research and experimentation,” he said.
Wei Chen, senior researcher at Microsoft Research Asia, talks with a student.
Professor Chen pointed out that while the course will provide an academic understanding of big data, it will also introduce students to real-life cases of Microsoft big data research and applications. In addition, students will have the opportunity to conduct experiments using Microsoft Azure, the company’s cloud-computing platform. He believes these practical, hands-on components distinguish this class from other big data courses.
Feeding the talent pipeline
Microsoft Research has a long tradition of collaborating with universities and has undertaken several initiatives to nurture the next generation of talented researchers. Since 2002, for instance, Microsoft Research Asia has hosted over 4,000 interns and carried out projects with more than 40 universities and institutes. The new big data course comes directly out of that tradition, and both Microsoft Research and Tsinghua University have high expectations for this collaboration. Professor Yao probably put it best, saying, “I believe this world-class course will give students a comprehensive understanding of big data and its knowledge structure, helping them reach their goals in future jobs and research.”
—Kangping Liu, Senior Research Program Manager, Microsoft Research Asia
Halloween 2013 brought real terror to an Austin, Texas, neighborhood, when a flash flood killed four residents and damaged roughly 1,200 homes. Following torrential rains, Onion Creek swept over its banks and inundated the surrounding community. At its peak, the rampaging water flowed at twice the force of Niagara Falls (source: USA Today).
While studying the flood site shortly afterwards, David Maidment, a professor of civil engineering at the University of Texas, ran into an old acquaintance, Harry Evans, chief of staff for the Austin Fire Department. Recognizing their shared interest in predicting and responding to floods, the two began collaborating on a system to bring flood forecasts and warnings down to the local level. The need was obvious: flooding claims more lives and costs more federal government money than any other category of natural disaster. A system that can predict local floods could help flood-prone communities prepare for and maybe even prevent catastrophic events like the Onion Creek deluge.
Soon, Maidment had pulled together other participants from academia, government, and industry to start the National Flood Interoperability Experiment (NFIE), with a goal of developing the next generation of flood forecasting for the United States. NFIE was designed to connect the National Flood Forecasting System with local emergency response and thereby create real-time flood information services.
The process of crunching data from the four federal agencies that deal with flooding (the US Geological Survey, the National Weather Service, the US Army Corps of Engineers, and the Federal Emergency Management Agency) was a burden for even the best-equipped physical datacenter, but not for the almost limitless scalability of the cloud. Maidment submitted a successful proposal for a Microsoft Azure for Research Award, which provided the necessary storage and compute resources via Microsoft Azure, the company’s cloud-computing platform.
Today, NFIE is using Microsoft Azure to perform the statistical analysis necessary to compare present and past data from flood-prone areas and thus build prediction models. By deploying an Azure-based solution, the NFIE researchers can see what’s happening in real time and can collaborate from anywhere, sharing data from across the country. The system has also proved easy to learn: programmers had their computer model, RAPID (Routing Application for Parallel computation of Discharge), up and running after just two days of training on Azure. Moreover, the Azure cloud platform provides almost infinite scalability, which could be crucial as the National Weather Service expands its forecasts from 4,500 locations to 2.6 million. Of course, the greatest benefits of this Azure-based solution accrue to the public, to folks like those living along Onion Creek, whose property and lives might be spared by the timely prediction of floods.
—Dan Fay, Director: Earth, Energy, and Environment, Microsoft Research