Fifty Latin American researchers and former Microsoft Research interns and Fellows gathered at the Microsoft campus in Redmond, Washington, in July to participate in the LATAM Workshop. The goals of this research workshop: share research challenges and results and seek opportunities to work together across the Latin American region.
The event included presentations by representatives of the Microsoft Research-FAPESP Institute for IT Research in São Paulo, Brazil, and the Latin American and Caribbean Collaborative ICT Research (LACCIR) Federation. Representatives from Microsoft Research also participated in discussions and delivered presentations about advances in computing that can be applied to research challenges. This year’s event focused on how computer science can be applied to micro-economies, health and wellbeing, climate change, bioenergy, biodiversity, and tropical ecosystems.
“The Latin American Workshop played a significant role in sharing our research findings and perspectives with each other; not only with researchers from our region but also with colleagues from Microsoft Research,” said Domingo Mery, a professor from Catholic University of Chile and conference presenter. “This is an excellent way to nurture collaboration in Latin America and the Caribbean. Many thanks for this opportunity!”
While all of the presentations were impressive, we have chosen two to highlight here today: “The Brazilian Biodiversity Database and Information System (SinBiota),” presented by Tiago Egger Moellwald Duque Estrada, Instituto Virtual da Biodiversidade, Programa Biota/FAPESP; and “Live ANDES (Advanced Network for the Distribution of Endangered Species): A New Tool for Wildlife Conservation,” presented by Cristian Bonacic, associate professor, Ecosystems and Environment Department, Catholic University of Chile.
Session Highlight: The Brazilian Biodiversity Database and Information System: SinBiota
The BIOTA/FAPESP program (FAPESP is São Paulo’s State Foundation for Research Funding) was created 10 years ago to provide support for the São Paulo State Government to achieve the targets of the Convention on Biological Diversity. One of the essential components of the BIOTA/FAPESP program is the information system called SinBiota. A new version of the first SinBiota system, currently a prototype, runs on Microsoft Silverlight and uses Bing Maps to provide environmental data visualization.
The system has not been significantly upgraded in its first 10 years. With the renovation of the Biota/FAPESP program, a new system is needed to fulfill the demands of researchers, educators, NGOs, and governmental agencies.
“The workshop was an invaluable opportunity for researchers from São Paulo and their students to interact with colleagues from LACCIR and scientists from Microsoft Research,” said Carlos Henrique de Brito Cruz, scientific director, FAPESP. “We expect that high-impact scientific collaboration will follow.”
Session Highlight: Live ANDES (Advanced Network for the Distribution of Endangered Species): A New Tool for Wildlife Conservation
South America is home to some of the richest and most diverse ecosystems in the world. However, many species of mammals, birds, reptiles, and amphibians in these ecosystems are in danger of extinction. Additionally, vast areas of land have been minimally explored by scientists to assess the population status of various species and to identify unknown species. Scientists and conservationists can greatly improve their understanding of endangered species through access to reports about the local wildlife from residents of these regions.
ICT tools that citizens can use in natural areas could provide conservation scientists with vital information to help them protect wildlife. The Live ANDES platform, which is a citizen science project, is helping to create a global conservation community in South America. Citizens can upload and share wildlife data (such as notes, videos, and audio of endangered species) with scientists. This project enables local residents to contribute to biodiversity conservation by providing scientists with much-needed wildlife data.
This platform is currently available in beta version and enables users to share information online. The platform was built on the Microsoft .NET Framework and the web solution uses technologies such as ASP.NET MVC, Bing Map Services, Windows Communication Foundation data services, Microsoft SQL Server 2008, the ADO.NET Entity Framework, and LINQ. The mobile solution is based on the .NET Compact Framework for Windows Phone 7.
In a second version of Live ANDES, the project team will focus on data sharing among academics and policy makers, which requires more advanced tools for assessing data quality and for data analysis, as well as more detailed user profiles.
Graduate Student Participation
The response to these and other sessions was overwhelmingly positive. A key factor contributing to the workshop’s success was the participation of 20 graduate students who have worked as interns or Fellows at Microsoft Research. Some are Microsoft Research alumni and others are currently working with Microsoft Research. All were actively involved in research and the workshop exchanges.
This workshop was a wonderful opportunity for these students. Attending the workshop will help them with their research, and it will also help broaden their understanding of a wide range of technologies and approaches that will, in turn, support the advancement of their careers. The workshop also gave alumni a chance to reconnect and catch up with their Microsoft Research mentors.
—Juliana Salles, Senior Research Program Manager; Harold Javid, Director, Americas/ANZ Regional Programs; and Jaime Puente, Director, Latin America and Caribbean
Nearly a million children die from pneumonia each year, making it a leading cause of death and the single most important health issue facing children under the age of five. The standard vaccination schedule calls for three doses of pneumonia vaccine given at six weeks, 10 weeks, and 14 weeks of age. Naturally, the intention is to protect children from this disease as early as possible—but administering the vaccine at such an early age also reduces how long the vaccine protects the child.
The Oxford Vaccine Group is conducting a program in Nepal to determine if shifting the vaccination schedule can extend childhood immunity until the critical five-year point. For the trial, the team is scheduling the first two doses to be given at six weeks and 14 weeks, but the third dose is given much later, at eight months of age. The team is hopeful that delaying the final vaccination will protect children for much longer, thus reducing the mortality rate from this serious disease.
Building Solutions with Everyday Technology
One of the biggest problems in medical informatics is keeping track of the data. Researchers must meticulously log who collected each piece of information, how it was collected, and any associated details. Manually inputting this data takes time away from actual research and is prone to error, while incomplete entries may cause problems for other researchers who refer to the material later. A team that is working on software support for medical informatics at the University of Oxford’s Department of Computer Science is seeking ways to simplify the process and reduce the risk of errors.
With support from Microsoft Research, this team developed CancerGrid, a system to manage all the diverse data associated with a clinical trial. Each data item to be collected is associated with a clearly defined semantic label so that the precise meaning will be clear to clinical staff, and researchers can be certain that any two trials that use the same semantic label for an item of data are recording exactly the same thing. This makes it possible to reuse and combine data, making each trial far more valuable to researchers. Windows Azure and Microsoft Excel, SharePoint, and InfoPath are used to collect and organize the data, providing easy and intuitive access to data and implementing rules to ensure that critical data is recorded consistently and accurately. Forms, databases, and the associated infrastructure for each new trial can be generated at the touch of a button, permitting the deployment of trial support infrastructure in a fraction of the time and a fraction of the cost of conventional methods.
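To make the idea of shared semantic labels concrete, here is a minimal sketch in Python. It is not CancerGrid code: the DataItem class, the shared_items function, and the label names are all hypothetical and invented for illustration. It shows only the principle that two trials can be combined on exactly those items that carry the same label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataItem:
    """One item a trial collects. All fields here are hypothetical."""
    semantic_label: str  # assumed identifier for the item's agreed meaning
    unit: str            # assumed unit of measurement


def shared_items(trial_a, trial_b):
    """Return the semantic labels that both trials record, sorted by name."""
    labels_a = {item.semantic_label for item in trial_a}
    labels_b = {item.semantic_label for item in trial_b}
    return sorted(labels_a & labels_b)


# Two made-up trials; only the body-weight item uses the same label in both.
trial_a = [DataItem("body-weight-kg", "kg"), DataItem("age-at-dose-weeks", "weeks")]
trial_b = [DataItem("body-weight-kg", "kg"), DataItem("serum-antibody-titre", "titre")]

print(shared_items(trial_a, trial_b))
```

In this sketch, only the items returned by shared_items would be safe to pool across the two trials, because the matching label is what guarantees they record the same thing.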
It is this flexibility and automation that made it possible for CancerGrid to meet the needs of the Oxford Vaccine Group, rapidly generating full document management support for the Nepalese pneumonia vaccine trial. By using a secure Internet connection, researchers in Nepal now transmit data back to the University of Oxford, where it can be analyzed and the effectiveness of the new vaccination regime assessed. Working on CancerGrid has been a very satisfying collaboration for both the Oxford team and Microsoft Research. We are hopeful that it will prove to be a powerful tool in the fight against pneumonia and many other diseases.
—Simon Mercer, Director of Health and Wellbeing, Microsoft Research Connections
It’s long been known that many serious diseases—including heart disease, asthma, and many forms of cancer—run in families. Until fairly recently, however, medical researchers have had no easy way of identifying the particular genes that are associated with a given malady. Now genome-wide association studies, which take advantage of our ability to sequence a person’s DNA, have enabled medical researchers to statistically correlate specific genes to particular diseases.
Sounds great, right? Well, it is, except for this significant problem: to study the genetics of a particular condition, say heart disease, researchers need a large sample of people who have the disorder, which means that some of these people are likely to be related to one another, even if only distantly. This means that certain positive associations between specific genes and heart disease are false positives, the result of two people sharing a common ancestor rather than their sharing a common propensity for clogged coronaries. In other words, your sample is not truly random, and you must statistically correct for “confounding,” which is caused by the relatedness of your subjects.
This is not an insurmountable statistical problem: there are so-called linear mixed models (LMMs), which are models that can eliminate the confounding. Use of these, however, is a computational problem, because it takes an inordinately large amount of computer runtime and memory to run LMMs to account for the relatedness among thousands of people in your sample. In fact, the runtime and memory footprint that are required by these models scale as the cube and square of the number of individuals in the dataset, respectively. So, when you’re dealing with a 10,000-person sample, the cost of the computer time and memory can quickly become prohibitive. And it is precisely these large datasets that offer the most promise for finding the connections between genetics and disease.
Enter Factored Spectrally Transformed Linear Mixed Model (FaST-LMM), which is an algorithm for genome-wide association studies that scales linearly with the number of individuals in both runtime and memory use (see FaST linear mixed models for genome-wide association studies). Developed by Microsoft Research, FaST-LMM can analyze data for 120,000 individuals in just a few hours, whereas the current algorithms fail to run at all at even 20,000 individuals. This means that the large datasets that are indispensable to genome-wide association studies are now computationally manageable from a memory and runtime perspective.
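The difference between cubic, quadratic, and linear scaling is easy to underestimate, so a rough back-of-the-envelope sketch in Python may help. It is not a benchmark of FaST-LMM or of any other algorithm: it ignores constant factors entirely and compares only growth rates, taking the 10,000-person sample mentioned above as an arbitrary baseline. The function name relative_cost is an assumption made for this illustration.

```python
def relative_cost(n, baseline, exponent):
    """Cost at n individuals relative to the cost at `baseline` individuals,
    assuming cost grows as n**exponent and ignoring constant factors."""
    return (n / baseline) ** exponent


baseline = 10_000  # reference sample size, chosen only for illustration

for n in (20_000, 120_000):
    cubic = relative_cost(n, baseline, 3)      # runtime of a conventional LMM
    quadratic = relative_cost(n, baseline, 2)  # memory of a conventional LMM
    linear = relative_cost(n, baseline, 1)     # linear scaling in n
    print(f"{n:>7} individuals: cubic x{cubic:g}, quadratic x{quadratic:g}, linear x{linear:g}")
```

Under these assumptions, going from 10,000 to 120,000 individuals multiplies a cubic cost by 1,728 and a quadratic cost by 144, but a linear cost by only 12.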
With FaST-LMM, researchers will have the ability to analyze hundreds of thousands of individuals to look for relationships between our DNA and our traits, identifying not only what diseases we may get, but also which drugs will work well for a specific patient and which ones won’t. In short, it puts us one step closer to the day when physicians can provide each of us with a personalized assessment of our risk of developing certain diseases and can devise prevention and treatment protocols that are attuned to our unique hereditary makeup.
—David Heckerman, Distinguished Scientist, Microsoft Research Connections; Jennifer Listgarten, Researcher, Microsoft Research Connections