Imagine the informational and cultural isolation that can result if you don’t speak one of the world’s major languages. Think about how limited your Internet experience would be. This is a reality for billions of people worldwide, who find themselves cut off linguistically from this great knowledge resource.
A related problem affects millions of people whose primary fluency is in a major language but whose ancestral traditions arise from a different linguistic heritage. These people find themselves increasingly separated from their ancestral culture, which can only be fully appreciated through an understanding of its native tongue.
Seeking to bring the power of computing to bear on these problems, Microsoft Research is pleased to announce the launch of Microsoft Translator Hub. We’re extremely excited by the potential of this tool to provide meaningful machine translation of lower-resourced languages and to help researchers and others build more targeted language models. The value of the Hub was very apparent to me during two recent events I hosted on opposite sides of the world, the first in California, and the second in Nepal.
California Dreamin’—in Hmong
In late November 2011, Microsoft Research Connections hosted a two-day workshop on Hmong Language Preservation at California State University Fresno, during which the local Hmong community provided input on the White Hmong-English machine translator. (White Hmong, or Hmong Dao, is one of several Hmong dialects.) Hmong is one of the indigenous languages of the mountain people of Southeast Asia, thousands of whom now live in the United States, Australia, and France. As a result, many of the Hmong have raised their children and grandchildren without the benefit of immersion in their traditional culture and language. Instead, they have focused on integration into the dominant language and culture of the societies in which they now live.
In general, the second generation grows up somewhat bilingual, speaking Hmong with their parents and other elders, but using English at school and work. When they have children, they speak to them in English. This means the third generation acquires only limited fluency in their ancestral tongue by listening to their grandparents speak with their parents. And given that Hmong has only recently become a written language—within the last 60 years—many of the fluent speakers may not be literate.
These factors have led to a critical and progressive decline in the language’s usage in Hmong communities in the United States, making language preservation a major concern for the Hmong. During the California workshop, Microsoft Research Connections, in collaboration with Professor Phong Yang, a linguist at Cal State Fresno, explored machine translation as a method to preserve the Hmong language and culture.
The participation of the Hmong community was outstanding. Community members of all ages, from children to grandparents, worked with the Microsoft Translator Hub’s Reviewer UI, offering suggestions and words of encouragement. Hopes were realistic: no one expected the computer to provide a perfect translation between Hmong and English. One amused Hmong parent observed that “it speaks ‘Hmonglish,’ just like my children.” The overall reaction was extremely positive, reflecting the community’s strong desire to preserve their language and culture.
Thanks to the event, hard work by the Microsoft Translator team, and the continued efforts of the Fresno Hmong community, Microsoft released a public version of Hmong on Bing Translator on February 21 in honor of International Mother Language Day.
Teaching Students to Scale Language Technology Peaks in Nepal
In Nepal, Microsoft Research Connections co-hosted a two-day “Nepali Language Preservation Workshop” in conjunction with Kathmandu University and the nonprofit organization Language Technology Kendra. The goal was to begin the process of strengthening Nepali’s position in today’s digital world, bringing it up to the level of major world languages and increasing access to non-Nepali-language Internet content for monolingual Nepali speakers. These efforts expand the presence of Nepali in addition to keeping it vibrant. As a lower-resourced language with a large speaker population (more than 30 million), Nepali is an ideal candidate language for the Microsoft Translator Hub.
David Harrison, a professor of linguistics at Swarthmore College and one of the world’s foremost experts on endangered languages, and I led a session for linguists and translators that focused on reviewing translation quality and providing us with valuable feedback on the reviewer interface. Approximately 1,200 sentences were translated and edited on the first day, and more on the second. Participants reported a number of bugs and suggested improvements.
Meanwhile, in a parallel track, computer science students and educators met under the guidance of Microsoft researchers Christophe Poulain and Sundar Poudel. The purpose of this session was to teach tomorrow’s computer scientists and computer science educators how they can access the nascent Nepali translator model, being refined in the other session, through the Microsoft Translator APIs in a private workspace for automatic translation between Nepali and other languages. By training educators, we give them the tools to go back to their institutions and teach others how to develop web service translation applications, thereby growing young experts in the field of natural language processing.
The enthusiasm and productive work of the workshop participants affirmed that Nepali was an apt choice for the workshop. As one participant observed, “If we can translate Nepali, we can communicate with the outsider world easier.” Another noted that “the rural people don't understand English, so if we give them a translator, they will feel good and [find it] easy to read information on foreign-language websites.”
I firmly believe that translation systems that can engender community participation, such as Microsoft Translator Hub, can have a beneficial impact on reducing the decline of lower-resourced languages. But it takes a strong commitment by a community to make this a reality. Machine translation mimics how a human learns a new language. Like a person, the translation software needs materials to read comparatively in both languages. It has to be taught and makes mistakes, but it gets better and better as it gets more exposure to the new language (data). Building up that language data to give the system more exposure is one of the chief practical values of events such as these workshops, where the participants actually teach the computer how to speak their native language.
Whether helping to preserve the links to an ancestral culture or working to bring a language into the digital world, Microsoft Translator Hub demonstrates Microsoft’s ongoing engagement and commitment to creating positive social change through technology.
Take a look at the Microsoft Translator Hub website and ask for an invitation to participate.
—Kristin Tolle, Director, Natural Interactions, Microsoft Research Connections
In December, I blogged about the beta release of Layerscape, a free set of research tools from Microsoft that enable earth scientists to visualize and tell stories around large, complex data sets. The full release is now available to the public at Layerscape.
We’re calling Layerscape an “ecosystem” to emphasize its focus on earth science and to communicate that Layerscape’s research tools include a community-based content sharing website, powered by Windows Azure. I’m pretty excited about Layerscape because it offers researchers new ways of looking at lots and lots of data, both above and below the earth’s surface—but also because the community site provides a great venue for learning how people are actually using Layerscape. Our collaborators are starting to gain new insights into their data and make use of our communities to share and collaborate.
As a research program manager at Microsoft Research, I am fortunate to get to collaborate with scientists working out the enormous puzzle of how the Earth works as a system. Needless to say, it is complicated work to study this astonishing collection of interlocking components and their intricate interconnections. But Layerscape can help with this.
One of the important technical challenges for the environmental scientist is managing the flow and the visualization of research data. Layerscape harnesses your PC’s graphics processor to visualize large amounts of data—in space and in time. Layerscape could be used to render 3-D visualizations from such diverse data sets as historical surface temperature measurements, chlorophyll concentration, seismic activity, greenhouse gas diffusion, sea ice extent, wind patterns, ocean pH, insect biodiversity, aquifer storage, geothermal heat flux, antelope migratory patterns, or the transport of Saharan dust as it fuels plankton blooms across the surface of the Atlantic Ocean with nitrogen and iron. Layerscape can also create abstract visualizations where you do not necessarily need latitude and longitude—just coordinate axes.
In addition to rendering data in 3-D space and in time, Layerscape has what we call freedom of perspective and free narrative. You can place your virtual eye anywhere you like and connect a sequence of perspectives and automated transitions that emphasize what the data is doing and what story you want to communicate. Such storytelling is ideal for educational outreach, enabling you to share your results with the scientific community and the general public.
Layerscape consists of three parts. Part one is the WorldWide Telescope visualization engine, and part two is the website that supports communities of users and the content they (you!) generate. The third part is a tool for getting data into Layerscape. This tool is built on Microsoft Excel, so if your data is already in an Excel spreadsheet, you simply click a few buttons to send it to the visualization engine. The link from Excel is dynamic, meaning that as you change the data in Excel, your Layerscape rendering changes automatically.
Today, a number of scientists—from geologists to seismologists to oceanographers—are using Layerscape to study atmospheric circulation, validate climate models, and even unravel evolutionary patterns of seahorses, demonstrating the wide applicability of Layerscape. In recognition of today’s release of Layerscape, I’ll share some extended remarks from researchers who are already taking advantage of its unique capabilities.
Looking at oceans of data
James Bellingham, PhD, is chief technologist at the Monterey Bay Aquarium Research Institute, a position that puts him at the nexus of technology and ocean sciences. The ocean environment is complex, which means that James and his colleagues deal with lots of different kinds of data. As James says, “Many times, you can’t understand your biological measurements without understanding the ocean chemistry or perhaps understanding the physical ocean, the temperature and currents. So the real challenge is to somehow bring all of these disparate data sets together in a way that you can see the relationships.” James points out that forging such connections requires some way to make the data visual. “You need to be able to manipulate it and look at it from different ways. And that’s why we have become so excited about Layerscape.”
Layerscape allows James and his colleagues to manipulate the data in ways that were previously impossible. Instead of simply plotting the data and printing out graphs, the researchers can interact with the data. “Sometimes,” James notes, “we’re really more interested in a story, and Layerscape helps us tell stories. We could put a dataset in it and we can play it like it’s a movie…not just play a static movie, but actually play with the data in an interactive way.”
James is also excited by the long-distance collaborations that Layerscape can facilitate. “The problems we’re dealing with here are so big that no one organization has all of the people who can understand it. In the past, we’ve gotten together once a year and tried to make sense of the data. Now, using [Layerscape’s] data environment as the collaboration framework, there’s the prospect of real-time collaboration with a person in another city.”
You built your house where???
Mark Abbott, PhD, dean and professor, College of Earth, Ocean, and Atmospheric Sciences at Oregon State University, is another ocean scientist working with Layerscape. Like James Bellingham, Mark is excited by Layerscape’s ability to handle diverse and complex data types as well as the opportunities it presents for collaboration. As Mark observes, the complexity of ocean data sets, which are derived from countless tiny sensors, results in a fragmented view, rather like “looking at the ocean through soda straws and trying to piece that together to understand how the ocean is behaving as an integrated system.”
“Layerscape offers the opportunity to look at a whole range of variables and overlay them in space and, eventually, in time, so you can see how these ocean landscapes, as it were, change and respond to forces in the environment,” says Mark.
Mark believes that Layerscape will help us understand how people and the environment interact. “Are people building homes and roads in areas where natural hazards, say, tsunamis or earthquakes or coastal flooding, might make them more vulnerable to disaster?” Mark sees Layerscape as a unique way to visualize and communicate such data, thereby helping policymakers and ordinary citizens make informed decisions about how and where to build new infrastructure.
Mark points out the value of tools like Layerscape in one of his current projects, the Ocean Observatories Initiative, which, among other things, will employ high-definition video cameras to make real-time observations of deep sea vents. “We’re really excited about looking at these real-time data streams, these enormous data streams, and applying new tools to make it easier for scientists to do their research,” he notes.
Maps in 3-D, maps in time
Finally, back on dry land—really dry land—there’s Lee Allison, state geologist and director at the Arizona Geological Survey. His agency plays a critical role in public policy decisions, using science to help keep people and property safe, to promote economic resource utilization, and to protect the environment.
It’s a job Lee clearly enjoys. “Everything about this job is exciting,” he says. “We're exploring areas that have never been explored before. We're doing new things with technology that have never been done before. It's a chance to explore.”
He adds, “it's that interplay of being able to go out into the field and look at the rocks, discover things that people have never seen before, bring it back into the office and translate it—to tell people what it means to daily life.”
Lee points out that the age of discovery is alive and well, especially underground. “The subsurface is an area that's really never been fully explored. And that's what we're doing here.”
“Now, we're mapping the geology in 3-D and through 4-D, through geologic time. We're doing it by mapping the geology on the ground, but then using technology to go well beyond where we can travel as individuals.”
“And Layerscape is this incredible visualization tool that's coming along that's going to allow us to take all of these data that we bring together and view it in 3-D and be able to go in the subsurface and be able to fly around and look at it in ways that we've never been able to before. This whole concept of visualizing the data is revolutionizing the way not only we do our science, but the way we portray our science to the people who use that data. Not only the public, but to industry and government decision makers.”
From the ocean depths to the high desert, Layerscape is helping scientists visualize complex data, achieve new insights, collaborate with far-flung colleagues, and explain their work through narratives. Build your own virtual tours and discover the possibilities with Layerscape.
—Rob Fatland, Research Program Manager, Microsoft Research Connections
Last year, women accounted for only 14 percent of computer science college graduates in the United States, according to the Computing Research Association. That’s down from 35 percent in 1985, despite U.S. Labor Department statistics that show computing to be among the fastest-growing, most in-demand fields, with too few qualified candidates to fill the available openings. In addition, studies reveal that executives value the variety of perspectives that comes with team diversity, yet another reason for needing greater female participation in computing careers.
As a technology company and innovation leader, Microsoft is passionate about increasing the participation of women in computing, which means attracting more female students to science, technology, engineering, and math (STEM) programs. CEO Steve Ballmer has acknowledged this need, observing that “…we need to keep more women interested longer in their lives in STEM subjects.” We know this will require a concerted effort across private companies, NGOs, IGOs, government, and academia. We recognize that it’s vital for young women to get support during their undergraduate and graduate studies and to be exposed to opportunities in computer science, which is why Microsoft Research is proud to support the NCWIT Academic Alliance Seed Fund and to fund the Microsoft Research Graduate Women’s Scholarship.
I remember my first year of college engineering studies: I took Computer Science 101, studying PASCAL. I found it extremely boring, and I had no idea what careers were available in computer science, even though I was working at the school’s computer center where I supported students in computer labs, installed network cards into student computers, and helped the IT staff build the university’s firewall. At the time, I had no idea these duties, which I really enjoyed, were potential careers in computer science. After being approached by one of the professors to conduct research on building an animatronic bison for the engineering department, I decided to focus my energies on mechanical engineering and robotics. I didn’t realize that robotics could be part of the computer science world. A future in computer science engineering seemed out of the question—so there I was: one less woman in computer science.
Today, I want to do everything possible so that young women don’t make the same mistakes I did. It is critical for us at Microsoft Research to familiarize young women with the amazing career opportunities in computing. In furtherance of that goal, I would like to highlight the programs and recipients of this year’s NCWIT Academic Alliance Seed Fund and the Microsoft Research Graduate Women’s Scholarship.
NCWIT is a national coalition of more than 200 prominent corporations, academic institutions, government agencies, and nonprofits working to strengthen the technology workforce and cultivate innovation by increasing the participation of women. Its Academic Alliance brings together more than 250 distinguished representatives from the computer science and IT departments of colleges across the country, spanning research universities, community colleges, women’s colleges, and minority-serving institutions. In 2007, Microsoft Research initiated the Seed Fund in partnership with NCWIT Academic Alliance. The NCWIT Academic Alliance Seed Fund provides U.S. academic institutions with funds (up to US$15,000 per project) to develop and implement initiatives for recruiting and retaining women in computer science and information technology fields of study. To date, the Seed Fund has awarded US$315,450. In partnership with NCWIT Academic Alliance, we would like to announce the 2012 winners:
In addition, we know that a woman’s first two years of computer science graduate study are the most critical. During this time, she must determine her area of focus, increase her confidence in the field, enhance her capabilities in publishing and research, and build her network. This is why Microsoft Research created the Graduate Women’s Scholarship, which provides a US$15,000 stipend plus a US$2,000 travel and conference allowance to women in their second year of graduate study (at a U.S. or Canadian university), helping them gain visibility in their departments, acquire mentorship, and cover the burgeoning cost of graduate programs. Winners of the 2012 Microsoft Research Graduate Women’s Scholarship are:
Congratulations to all the winning programs and students. We look forward to great things from 2012’s women in computing.
—Rane Johnson-Stempson, Education and Scholarly Communication Principal Research Director, Microsoft Research Connections