Download Research Tools
Think about supercomputers of the recent past. Just 15 years ago, supercomputers were rare and exotic machines. Government laboratories in the United States and Japan spent hundreds of millions of dollars on custom computing rigs and specialized facilities to house them, in a bid to tackle the world’s toughest problems.
But now there is an alternative that is more attractive for scientists and businesses. Today, you can rent supercomputing horsepower by the hour online from public cloud providers. Amazing.
Windows Azure can help ensure that you’re not paying more than you can afford for your supercomputing time, and it makes overall management of large-scale computations very simple. Unlike other cloud providers, Windows Azure has no virtual machine (VM) image for you to manage or store in your account; with tens of thousands of instances, that overhead could add up, from both a management and a cost standpoint. And Windows Azure provides the operating system for you (and keeps it up to date with patches)—you just copy your application to Windows Azure and run it in the cloud.
The Microsoft HPC Pack 2012 (a free download that will be available from the Microsoft Download Center later this year) makes it very easy to manage compute resources and schedule your jobs in Windows Azure. You take the proven cluster management tool from Windows Server, connect it to Windows Azure, and then let it do the work. All you need to get started is a Windows Azure account. A set-up wizard takes care of the preparation, and the job scheduler runs your computations.
What’s more, there’s no commitment: you can pay as you go, or you can negotiate a discount if you are going to use a lot of core hours. As Bill Hilf, general manager of product management for Windows Azure, observes, it’s easy to manage a wide range of sizes and types of workloads on Windows Azure. Like Bill, we, too, are extremely enthusiastic about the possibilities offered by the supercomputing prowess of Windows Azure. Such massive computational power is critical for “big data” studies that increase our understanding of complex systems.
The genome-wide association study (GWAS) is a case in point. Microsoft Research conducted a 27,000-core run on Windows Azure to crunch data from this study. With the nodes busy for 72 hours, the run completed 1 million tasks and consumed approximately 1.9 million compute hours. If the same computation had been run on an 8-core system, it would have taken roughly 25 years to complete!
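For a sense of scale, the core-hour arithmetic is easy to check yourself. The figures below come from the paragraph above; the year length assumes round-the-clock operation, which is why the result lands near, rather than exactly on, the quoted 25 years:

```python
cores = 27_000
wall_hours = 72

# Total work done on Windows Azure, in core-hours.
core_hours = cores * wall_hours
print(f"{core_hours:,} core-hours")   # 1,944,000 -- roughly the 1.9 million quoted

# The same work on a single 8-core machine, running 24/7.
hours_on_8_cores = core_hours / 8
years = hours_on_8_cores / (24 * 365)
print(f"~{years:.1f} years")          # ~27.7 years, in the ballpark of the quoted 25
```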
The GWAS offers a powerful approach to identifying genetic markers that are associated with human diseases. It used data from a Wellcome Trust study of the British population, which examined some 2,000 individuals with each of seven major diseases, along with a shared set of about 13,000 controls. But like all genome-wide association studies, this study had to overcome a significant problem: to study the genetics of a particular condition, say heart disease, researchers need a large sample of people who have the disorder, which means that some of these people are likely to be related to one another, even if only distantly. As a result, certain positive associations between specific genes and heart disease are false positives: the result of two people sharing a common ancestor rather than a common propensity for clogged coronaries. In other words, the sample is not truly random, and you must statistically correct for the “confounding” caused by the relatedness of your subjects.
This is not an insurmountable statistical problem: so-called linear mixed models (LMMs) can eliminate the confounding. Using them, however, is a computational problem, because it takes an inordinate amount of runtime and memory to run LMMs that account for the relatedness among thousands of people in a sample. In fact, the runtime and memory footprint required by these models scale as the cube and the square, respectively, of the number of individuals in the dataset. So when you’re dealing with a 10,000-person sample, the cost of computer time and memory can quickly become prohibitive. And it is precisely these large datasets that offer the most promise for finding the connections between genetics and disease.
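To see why the cube-and-square scaling bites so quickly, here is a rough, illustrative cost model (not the authors’ code): a naive LMM must store and eigendecompose the N × N relatedness (kinship) matrix for N individuals, so memory grows as N² and runtime as N³:

```python
def naive_lmm_cost(n_individuals, bytes_per_entry=8):
    """Return (relative runtime units, memory in bytes) for a naive LMM fit."""
    runtime = n_individuals ** 3                      # eigendecomposing an N x N matrix
    memory = n_individuals ** 2 * bytes_per_entry     # storing the N x N kinship matrix
    return runtime, memory

for n in (1_000, 10_000, 100_000):
    runtime, memory = naive_lmm_cost(n)
    print(f"N={n:>7,}: relative runtime {runtime:.1e}, "
          f"kinship matrix {memory / 1e9:.2f} GB")
```

Going from 10,000 to 100,000 individuals multiplies runtime by 1,000 and memory by 100 (from under 1 GB to 80 GB for the matrix alone), which is exactly the roadblock described above.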
To avoid this computational roadblock, Microsoft Research developed the Factored Spectrally Transformed Linear Mixed Model (better known as FaST-LMM), an algorithm that extends the ability to detect new biological relations by using data that is several orders of magnitude larger. It allows much larger datasets to be processed and can, therefore, detect more subtle signals in the data.
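The published FaST-LMM algorithm is considerably more involved, but its core factoring trick can be sketched. When the kinship matrix is built from k genetic markers and k is smaller than the number of individuals N, its spectrum can be recovered from a factorization of the much smaller N × k genotype matrix, avoiding the N³ eigendecomposition entirely. The NumPy sketch below illustrates only this one idea; the sizes and variable names are invented for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 500, 50                      # individuals, markers (illustrative sizes, k << N)
G = rng.standard_normal((N, k))     # stand-in genotype matrix

# Naive route: form the N x N kinship matrix and eigendecompose it.
# Time O(N^3), memory O(N^2) -- the bottleneck described above.
K = G @ G.T / k
evals_naive = np.linalg.eigvalsh(K)[-k:]   # the k nonzero eigenvalues

# Factored route: singular values of the N x k matrix G give the same spectrum.
# Time O(N * k^2), memory O(N * k) -- linear in N for fixed k.
s = np.linalg.svd(G, compute_uv=False)
evals_fast = s ** 2 / k

print(np.allclose(np.sort(evals_naive), np.sort(evals_fast)))  # True
```

The two routes agree to numerical precision, but only the factored one stays tractable as N grows into the tens of thousands.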
By using Windows Azure, Microsoft Research ran FaST-LMM on data from the Wellcome Trust, analyzing 63,524,915,020 pairs of genetic markers, looking for interactions among these markers for bipolar disease, coronary artery disease, hypertension, inflammatory bowel disease (Crohn’s disease), rheumatoid arthritis, and type I and type II diabetes. The result: the discovery of new associations between the genome and these diseases—discoveries that could presage potential breakthroughs in prevention and treatment.
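The pair count quoted above is internally consistent with an exhaustive all-pairs scan. Assuming every unordered pair of markers was tested (my inference; the marker count itself is not stated in the text), the figure implies about 356,441 markers:

```python
from math import isqrt

pairs = 63_524_915_020

# Solve n * (n - 1) / 2 == pairs for n.
n = (1 + isqrt(1 + 8 * pairs)) // 2
print(n)                           # 356441

# The solution is exact: the quoted figure really is a whole all-pairs count.
assert n * (n - 1) // 2 == pairs
```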
Results from individual pairs and the FaST-LMM algorithm are available via online query in the Epistasis GWAS for 7 common diseases dataset in the Windows Azure Marketplace (free access), so researchers can independently validate results that they find in their lab.
Today’s smartphones have put a computer in your pocket. Now, with cloud computing through Windows Azure, you have a supercomputer in your—well, not in your pocket, but probably within your budget. Whatever your big-data concerns, Windows Azure can provide supercomputing power at an affordable price.
—David Heckerman, Distinguished Scientist, Microsoft Research; Robert Davidson, Principal Software Architect, Microsoft Research, eScience; Carl Kadie, Principal Research Software Design Engineer, Microsoft Research, eScience; Jeff Baxter, Development Lead, Windows HPC, Microsoft; Jennifer Listgarten, Researcher, Microsoft Research Connections; and Christoph Lippert, Researcher, Microsoft Research Connections
As computer scientists, we have the privilege of working on challenging problems, the solutions to which can markedly improve lives—and in some cases, even save them. It is just such a challenge that Senior Researcher Antonio Criminisi and his team at Microsoft Research Cambridge have undertaken, as they strive to develop software to help physicians more accurately and rapidly identify the anatomy of aggressive brain tumors, a feat that will enable better-targeted therapy.
Brain-scan images depicting tumors
As described in the feature article, “Coming to the Aid of Brain-Tumor Patients,” Antonio and his colleagues are using decision forests, an innovation in machine learning, to speed up and potentially fully automate the now time-consuming process of creating a 3-D image of brain tumors. This work has the potential to drastically reduce the amount of time a highly trained radiotherapist needs to spend processing medical images, saving time and money in clinical care and, most importantly, getting patients into the most appropriate therapy at the soonest possible moment. Moreover, this technology—which, by the way, enables the Kinect sensor to identify players in Xbox video games—could be applied to many other challenges in medical image analysis.
To support this broader exploration, Microsoft Research Connections is establishing a medical imaging initiative, designed to compile a large, well-annotated, and shareable collection of medical images for the purpose of comparing and improving the algorithms that analyze them. Scientific advances often rely on such comparisons of different experimental approaches, which enable us to determine which is the most effective. Based on our results, over the coming year, we plan to begin providing the tools needed to accelerate innovation in the field of medical imaging. I will use this blog to provide further details of this initiative as it unfolds in the months ahead.
—Simon Mercer, Director of Health and Wellbeing, Microsoft Research Connections
The best way to describe how I’m feeling is deeply honored and emotionally moved. This is the feeling I get every time we start a Microsoft Translator Hub project in language preservation or translation because it is always an honor and privilege to work on preserving a language. Whether it’s in Fresno, California, working to preserve Hmong, or in distant Dhulikhel, Nepal, working to provide translations for Nepali, the feeling’s the same—a visceral sense of making an impact. I can attest that this feeling is a distinct benefit of being a part of the Microsoft Research Connections team.
The last week of September, I visited the Mexican states of Yucatan and Quintana Roo—or more accurately, I was warmly welcomed to these homelands of the Mayan people. Together with my colleagues Erick Stephens, director of technology at Microsoft Mexico, and Adrian Hernandez Becerril, a program manager at Microsoft Mexico, I came to the Universidad Intercultural Maya de Quintana Roo to finalize a project to preserve the Mayan language. Our visit marked the culmination of months-long discussions with the university and various government officials and was, in my opinion, a significant day on any calendar (more on calendars below).

The future of the Mayan language is uncertain. University president Francisco Javier Rosado-May said it best when we first spoke back in May at the 2012 Latin American Faculty Summit in Cancun: “If we do not do anything to stop it, Mayan will be extinct within two generations.” President Rosado-May is extremely motivated to turn the tide, to change the future of the Mayan language, and his enthusiasm is infectious. So we and our partners in Microsoft Mexico decided to sponsor a project at his university along with Assistant Professor Martin Esquivel-Pat, to enable Mayan to survive the present and leap into the next b'ak'tun (in other words, the next long cycle of the Mayan calendar).

For those of you who are concerned that the Mayan calendar predicts the end of the world this December 21, let me assure you, as my hosts in Quintana Roo assured me, this is simply the end of a time period in the Mesoamerican Long Count calendar developed by the Mayans—a timekeeper more accurate than our own Julian calendar, by the way. What the Mayans say is that this December 21, we will be starting the next b'ak'tun, and with that, we hope, an era where Mayan remains a viable language for generations to come.
On arrival at the university, we were greeted by Javier Díaz Carvajal, head of the Secretariat of Economic Development for Quintana Roo, who, on behalf of the governor, extended me the honor of being made an “adopted citizen” of Quintana Roo. Afterwards, we signed an agreement with the government and the university to work on developing a Mayan language translation system that is solely built by the community and shared only when they decide to do so. And that is the real benefit of the Microsoft Translator Hub: it places the power of developing automatic translation models into the hands of the community where it belongs.
For the remainder of the day and the one that followed, we gave presentations and trained our hosts, professional translators, and students at the university on using the Microsoft Translator system, both through the Hub interface (which any bilingual person can use with a little training) and programmatically (which requires some technical knowledge). The latter is significant, as the university is looking to establish a computer and information science program, and this programmatic work with the Microsoft Translator Hub can help them build expertise in this area. My colleagues and I wanted to assist them in this endeavor in every way possible. But back to building the language translation system. Microsoft Translator Hub makes the process easy, but it still takes time and commitment from the community—it doesn’t just happen overnight. It took our partners at California State University, Fresno, and the Hmong Language Partners more than seven months to collect and add enough parallel data (between Hmong and English), upload it to the system, train, build, and release the Hmong translator.
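For readers curious about the programmatic route, the Microsoft Translator service of that era exposed a simple HTTP interface. The sketch below only constructs a request URL; the endpoint and parameter names reflect the legacy v2 HTTP API as I understand it and should be treated as assumptions—check the current Microsoft Translator documentation before building on them:

```python
from urllib.parse import urlencode

# Era-specific endpoint (an assumption here) -- verify against current docs.
BASE_URL = "http://api.microsofttranslator.com/v2/Http.svc/Translate"

def build_translate_url(text, from_lang, to_lang, app_id="YOUR_APP_ID"):
    """Build a GET request URL for the legacy Translator HTTP API (illustrative only)."""
    params = {"appId": app_id, "text": text, "from": from_lang, "to": to_lang}
    return BASE_URL + "?" + urlencode(params)

# Example: request an English-to-Spanish translation of "Hello".
url = build_translate_url("Hello", "en", "es")
print(url)
```

A community-built Hub system plugs into the same service, which is what lets a trained Hmong (or, one day, Mayan) model be reached from ordinary application code.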
We got a preview of how the Mayan translation system might work at a workshop we ran in Quintana Roo—which focused largely on building a translator system between Spanish and Yucatec (the Mayan language spoken in the region). Participants employed another distinguishing feature of the Microsoft Translator Hub, which enables you to build translation systems directly between any two languages instead of pivoting (and propagating errors) through English. How long it will take to build a functional Mayan translator is unknown right now, but I know the community is very motivated to get it done early in the next b'ak'tun!
I believe it is vital to the future of the human race that we remember and preserve our past. My colleagues and I are thrilled to have the opportunity to play even a small part in making that happen.
—Kristin Tolle, Director, Natural User Interactions Team, Microsoft Research Connections