Imagine the informational and cultural isolation that can result if you don’t speak one of the world’s major languages. Think about how limited your Internet experience would be. This is a reality for billions of people worldwide, who find themselves cut off linguistically from this great knowledge resource.
A related problem affects millions of people whose primary fluency is in a major language but whose ancestral traditions arise from a different linguistic heritage. These people find themselves increasingly separated from their ancestral culture, which can only be fully appreciated through an understanding of its native tongue.
Seeking to bring the power of computing to bear on these problems, Microsoft Research is pleased to announce the launch of Microsoft Translator Hub. We’re extremely excited by the potential of this tool to provide meaningful machine translation of lower-resourced languages and to help researchers and others build more targeted language models. The value of the Hub was very apparent to me during two recent events I hosted on opposite sides of the world, the first in California, and the second in Nepal.
California Dreamin’—in Hmong
In late November 2011, Microsoft Research Connections hosted a two-day workshop on Hmong Language Preservation at California State University Fresno, during which the local Hmong community provided input on the White Hmong-English machine translator. (White Hmong, or Hmong Daw, is one of several Hmong dialects.) Hmong is one of the indigenous languages of the mountain people of Southeast Asia, thousands of whom now live in the United States, Australia, and France. In these countries, many Hmong have raised their children and grandchildren without the benefit of immersion in their traditional culture and language. Instead, they have focused on integration into the dominant language and culture of the societies in which they now live.
In general, the second generation grows up somewhat bilingual, speaking Hmong with their parents and other elders, but using English at school and work. When they have children, they speak to them in English. This means the third generation acquires only limited fluency in their ancestral tongue by listening to their grandparents speak with their parents. And given that Hmong has only recently become a written language—within the last 60 years—many of the fluent speakers may not be literate.
These factors have led to a critical and progressive decline in the language’s usage in Hmong communities in the United States, making language preservation a major concern for the Hmong. During the California workshop, Microsoft Research Connections, in collaboration with Professor Phong Yang, a linguist at Cal State Fresno, explored machine translation as a method to preserve the Hmong language and culture.
The participation of the Hmong community was outstanding. Community members of all ages, from children to grandparents, worked with the Microsoft Translator Hub’s Reviewer UI, offering suggestions and words of encouragement. Hopes were realistic: no one expected the computer to provide a perfect translation between Hmong and English. One amused Hmong parent observed that “it speaks ‘Hmonglish,’ just like my children.” The overall reaction was extremely positive, reflecting the community’s strong desire to preserve their language and culture.
One tangible outcome of the event, together with hard work by the Microsoft Translator team and the continued efforts of the Fresno Hmong community, is that Microsoft released a public version of Hmong on Bing Translator on February 21, in honor of International Mother Language Day.
Teaching Students to Scale Language Technology Peaks in Nepal
In Nepal, Microsoft Research Connections co-hosted a two-day “Nepali Language Preservation Workshop” in conjunction with Kathmandu University and the nonprofit organization Language Technology Kendra. The goal was to begin the process of strengthening Nepali’s position in today’s digital world, bringing it up to the level of major world languages and increasing access to non-Nepali-language Internet content for monolingual Nepali speakers. These efforts expand the presence of Nepali in addition to keeping it vibrant. As a lower-resourced language with a large speaker population (more than 30 million), Nepali is an ideal candidate language for the Microsoft Translator Hub.
David Harrison, a professor of linguistics at Swarthmore College and one of the world’s foremost experts on endangered languages, and I led a session for linguists and translators that focused on reviewing translation quality and providing us with valuable feedback on the reviewer interface. Approximately 1,200 sentences were translated and edited on the first day, and more on the second. Participants reported a number of bugs and suggested improvements.
Meanwhile, in a parallel track, computer science students and educators met under the guidance of Microsoft researchers Christophe Poulain and Sundar Poudel. The purpose of this session was to teach tomorrow’s computer scientists and computer science educators how they can access the nascent Nepali translator model, being refined in the other session, through the Microsoft Translator APIs in a private workspace for automatic translation between Nepali and other languages. By training educators, we give them the tools to go back to their institutions and teach others how to develop web service translation applications, thereby growing young experts in the field of natural language processing.
The enthusiasm and productive work of the workshop participants affirmed that Nepali was an apt choice for the workshop. As one participant observed, “If we can translate Nepali, we can communicate with the outsider world easier.” Another noted that “the rural people don’t understand English, so if we give them a translator, they will feel good and [find it] easy to read information on foreign-language websites.”
I firmly believe that translation systems that can engender community participation, such as Microsoft Translator Hub, can have a beneficial impact on reducing the decline of lower-resourced languages. But it takes a strong commitment by a community to make this a reality. Machine translation mimics how a human learns a new language. Like a person, the translation software needs materials to read comparatively in both languages. It has to be taught and makes mistakes, but it gets better and better as it gets more exposure to the new language (data). Building up that language data to give the system more exposure is one of the chief practical values of events such as these workshops, where the participants actually teach the computer how to speak their native language.
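To make the idea of “reading comparatively in both languages” concrete, here is a toy sketch in Python. It is not how Microsoft Translator Hub works internally; it only illustrates, under simplifying assumptions, why more parallel sentences lead to better guesses. The target-language tokens (`zib`, `tam`, `lor`, `mek`) are made up for illustration and are not real Hmong or Nepali words, and the function name `best_translations` is hypothetical.

```python
from collections import Counter
from itertools import product

# Toy parallel corpus: (source sentence, target sentence).
# The target tokens are invented placeholders, not words from any real language.
TOY_PAIRS = [
    ("red house", "zib tam"),
    ("red bird", "zib lor"),
    ("big house", "mek tam"),
]


def best_translations(pairs):
    """Guess target words for each source word from co-occurrence alone.

    Uses the Dice coefficient as a simple association score. This is a
    stand-in for illustration, not the method used by any real system.
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Count every (source word, target word) pair seen in the same sentence pair.
        joint.update(product(src_words, tgt_words))

    guesses = {}
    for s in src_count:
        # Dice score: shared sentence pairs relative to each word's total appearances.
        scores = {
            t: 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])
            for t in tgt_count
            if joint[(s, t)]
        }
        top = max(scores.values())
        # Keep all tied candidates so that ambiguity stays visible.
        guesses[s] = sorted(t for t, score in scores.items() if score == top)
    return guesses


if __name__ == "__main__":
    print("one sentence pair:   ", best_translations(TOY_PAIRS[:1]))
    print("three sentence pairs:", best_translations(TOY_PAIRS))
```

With only the first sentence pair, the sketch cannot tell which made-up word goes with “red” and which with “house,” so both candidates tie. With all three pairs, each word settles on a single guess. That is the sense in which every sentence a workshop participant contributes gives the system more to learn from.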
Whether helping to preserve the links to an ancestral culture or working to bring a language into the digital world, Microsoft Translator Hub demonstrates Microsoft’s ongoing engagement and commitment to creating positive social change through technology.
Take a look at the Microsoft Translator Hub website and ask for an invitation to participate.
—Kristin Tolle, Director, Natural Interactions, Microsoft Research Connections
In December, I blogged about the beta release of Layerscape, a free set of research tools from Microsoft that enable earth scientists to visualize and tell stories around large, complex data sets. The full release is now available to the public at Layerscape.
We’re calling Layerscape an “ecosystem” to emphasize its focus on earth science and to communicate that Layerscape’s research tools include a community-based content sharing website, powered by Windows Azure. I’m pretty excited about Layerscape because it offers researchers new ways of looking at lots and lots of data, both above and below the earth’s surface—but also because the community site provides a great venue for learning how people are actually using Layerscape. Our collaborators are starting to gain new insights into their data and make use of our communities to share and collaborate.
As a research program manager at Microsoft Research, I am fortunate to get to collaborate with scientists working out the enormous puzzle of how the Earth works as a system. Needless to say, it is complicated work to study this astonishing collection of interlocking components and their intricate interconnections. But Layerscape can help with this.
One of the important technical challenges for the environmental scientist is managing the flow and the visualization of research data. Layerscape harnesses your PC’s graphics processor to visualize large amounts of data—in space and in time. Layerscape could be used to render 3-D visualizations from such diverse data sets as historical surface temperature measurements, chlorophyll concentration, seismic activity, greenhouse gas diffusion, sea ice extent, wind patterns, ocean pH, insect biodiversity, aquifer storage, geothermal heat flux, antelope migratory patterns, or the transport of Saharan dust as it fuels plankton blooms across the surface of the Atlantic Ocean with nitrogen and iron. Layerscape can also create abstract visualizations where you do not necessarily need latitude and longitude—just coordinate axes.
In addition to rendering data in 3-D space and in time, Layerscape has what we call freedom of perspective and free narrative. You can place your virtual eye anywhere you like and connect a sequence of perspectives and automated transitions that emphasize what the data is doing and what story you want to communicate. Such storytelling is ideal for educational outreach, enabling you to share your results with the scientific community and the general public.
Layerscape consists of three parts. Part one is the WorldWide Telescope visualization engine, and part two is the website that supports communities of users and the content they (you!) generate. The third part is a tool for getting data into Layerscape. This tool is built on Microsoft Excel, so if your data is already in an Excel spreadsheet, you simply click a few buttons to send it to the visualization engine. The link from Excel is dynamic, meaning that as you change the data in Excel, your Layerscape rendering changes automatically.
Today, a number of scientists, from geologists to seismologists to oceanographers, are using Layerscape to study atmospheric circulation, validate climate models, and even unravel evolutionary patterns of seahorses, demonstrating the wide applicability of Layerscape. In recognition of today’s release of Layerscape, I’ll share some extended remarks from researchers who are already taking advantage of its unique capabilities.
Looking at oceans of data
James Bellingham, PhD, is chief technologist at the Monterey Bay Aquarium Research Institute, a position that puts him at the nexus of technology and ocean sciences. The ocean environment is complex, which means that James and his colleagues deal with lots of different kinds of data. As James says, “Many times, you can’t understand your biological measurements without understanding the ocean chemistry or perhaps understanding the physical ocean, the temperature and currents. So the real challenge is to somehow bring all of these disparate data sets together in a way that you can see the relationships.” James points out that forging such connections requires some way to make the data visual. “You need to be able to manipulate it and look at it from different ways. And that’s why we have become so excited about Layerscape.”
Layerscape allows James and his colleagues to manipulate the data in ways that were previously impossible. Instead of simply plotting the data and printing out graphs, the researchers can interact with the data. “Sometimes,” James notes, “we’re really more interested in a story, and Layerscape helps us tell stories. We could put a dataset in it and we can play it like it’s a movie…not just play a static movie, but actually play with the data in an interactive way.”
James is also excited by the long-distance collaborations that Layerscape can facilitate. “The problems we’re dealing with here are so big that no one organization has all of the people who can understand it. In the past, we’ve gotten together once a year and tried to make sense of the data. Now, using [Layerscape’s] data environment as the collaboration framework, there’s the prospect of real-time collaboration with a person in another city.”
You built your house where???
Mark Abbott, PhD, dean and professor, College of Earth, Ocean, and Atmospheric Sciences at Oregon State University, is another ocean scientist working with Layerscape. Like James Bellingham, Mark is excited by Layerscape’s ability to handle diverse and complex data types as well as the opportunities it presents for collaboration. As Mark observes, the complexity of ocean data sets, which are derived from countless tiny sensors, results in a fragmented view, rather like “looking at the ocean through soda straws and trying to piece that together to understand how the ocean is behaving as an integrated system.”
“Layerscape offers the opportunity to look at a whole range of variables and overlay them in space and, eventually, in time, so you can see how these ocean landscapes, as it were, change and respond to forces in the environment,” says Mark.
Mark believes that Layerscape will help us understand how people and the environment interact. “Are people building homes and roads in areas where natural hazards, say, tsunamis or earthquakes or coastal flooding, might make them more vulnerable to disaster?” Mark sees Layerscape as a unique way to visualize and communicate such data, thereby helping policymakers and ordinary citizens make informed decisions about how and where to build new infrastructure.
Mark points out the value of tools like Layerscape in one of his current projects, the Ocean Observatories Initiative, which, among other things, will employ high-definition video cameras to make real-time observations of deep sea vents. “We’re really excited about looking at these real-time data streams, these enormous data streams, and applying new tools to make it easier for scientists to do their research,” he notes.
Maps in 3-D, maps in time
Finally, back on dry land (really dry land), there’s Lee Allison, state geologist and director at the Arizona Geological Survey. His agency plays a critical role in public policy decisions, using science to help keep people and property safe, to promote the economic use of resources, and to protect the environment.
It’s a job Lee clearly enjoys. “Everything about this job is exciting,” he says. “We're exploring areas that have never been explored before. We're doing new things with technology that have never been done before. It's a chance to explore.”
He adds, “It's that interplay of being able to go out into the field and look at the rocks, discover things that people have never seen before, bring it back into the office and translate it, to tell people what it means to daily life.”
Lee points out that the age of discovery is alive and well, especially underground. “The subsurface is an area that's really never been fully explored. And that's what we're doing here.”
“Now, we're mapping the geology in 3-D and through 4-D, through geologic time. We're doing it by mapping the geology on the ground, but then using technology to go well beyond where we can travel as individuals.”
“And Layerscape is this incredible visualization tool that's coming along that's going to allow us to take all of these data that we bring together and view it in 3-D and be able to go in the subsurface and be able to fly around and look at it in ways that we've never been able to before. This whole concept of visualizing the data is revolutionizing the way not only we do our science, but the way we portray our science to the people who use that data. Not only the public, but to industry and government decision makers.”
From the ocean depths to the high desert, Layerscape is helping scientists visualize complex data, achieve new insights, collaborate with far-flung colleagues, and explain their work through narratives. Build your own virtual tours and discover the possibilities with Layerscape.
—Rob Fatland, Research Program Manager, Microsoft Research Connections
Youngsters love gadgets. So wouldn’t it be great if they could build their own, and at school? This is exactly what more than 70 British students, ages 13 to 16, are doing by using .NET Gadgeteer. On January 30, they gathered at the Microsoft Research Cambridge Lab to present their final projects and celebrate the end of the first .NET Gadgeteer school pilot project in the United Kingdom (UK).
Microsoft .NET Gadgeteer is a platform that allows you to rapidly create prototypes of small electronic gadgets and embedded hardware devices. It combines the advantages of object-oriented programming, solderless assembly of electronics using a kit of hardware modules, and the quick fabrication of a physical enclosure using computer-aided design. The fact that .NET Gadgeteer covers a variety of sophisticated computer science and engineering skills, but requires minimal prior knowledge, makes it especially suitable for school education.
The UK school pilot involved eight secondary schools from the counties of Cambridgeshire and Essex. It was launched with an initial training workshop for teachers on October 6, 2011. After initial training, the schools used .NET Gadgeteer GHI FEZ Spider Starter Kits and worked through eight lesson plans created by Dr. Sue Sentance of Anglia Ruskin University. Lessons included construction of a digital camera, a stopwatch, and a game. The course was taught during lunch or after school over a 10-week period. The final weeks of the course were spent on individual and group .NET Gadgeteer projects that were then presented at the celebratory January 30 event at Microsoft Research.
Some of the student inventions from the .NET Gadgeteer pilot project
The celebration included talks by Christopher Bishop on “Secrets of the Web” and Andrew Fitzgibbon on “Kinect: Solving an Impossible Problem.” It also featured a show-and-tell of each school’s projects and a lunchtime demo session, which combined .NET Gadgeteer school and research demos with demos of cutting-edge Microsoft Research technologies such as KinectFusion, HoloDesk, and SecondLight.
Students from Comberton Village College present their group project.
Also attending the celebration was a team of students from The Greneway School, the winners of The Think Computer Science Great Gadgimagining! competition. Their winning entry, the Greneway Super Walking Stick, a cane “designed with elderly people in mind, helping them to keep their independence,” is intended to sense when the user may have fallen and notify an emergency contact. The walking stick was prototyped during the event by Greneway students, together with .NET Gadgeteer inventor Nicolas Villar, and presented at the end of the celebration.
The Greneway School team celebrates their winning entry, the Greneway Super Walking Stick.
The enthusiasm and dedication shown by the youngsters and teachers during the school pilot demonstrate the hunger for hands-on computer science in schools. The high level of student engagement is perhaps best captured in this message posted by one of the teachers in the pilot’s Edmodo group:
“First day back at school, the building is cold and miserable but 7 kids have turned up after school to start designing and making their own gadgets ready for 30th Jan! The atmosphere is amazing—two groups, one either side of a mobile whiteboard, planning and drawing their gadgets and code on either side. One group using polystyrene (and hair slides!) to construct the physical object ready for their code. Loving it!”
The .NET Gadgeteer pilot project aligns with the UK’s commitment to prioritize computer science education in schools, as spelled out by the Education Secretary, Michael Gove, in his speech at the BETT Show (see School ICT to be replaced by computer science programme).
We look forward to more schools, colleges, and universities utilizing .NET Gadgeteer to unleash their students’ creativity and enthusiasm in technology—in the UK, and beyond.
—Scarlet Schwiderski-Grosche, Program Manager, Microsoft Research Connections EMEA, and Steve Hodges, Principal Hardware Engineer, Microsoft Research Cambridge