Twenty years; two decades; a fifth of a century—we can phrase it several ways, but what does it mean? To a person, it’s the onset of adulthood (or maybe the point marking only 10 more years of living in Mom and Dad’s basement); to a dog, it’s senescence. But to us at Microsoft Research, it marks the lifetime of our organization, which has grown and evolved in a remarkable era of transformation and innovation in computer science and scientific research.
Yes, Microsoft Research turns 20 this September, and in keeping with the tradition of honoring base-10 birthdays, this seems like an appropriate time to look back on some significant accomplishments and take stock of our future. Over the next four weeks, we will highlight some particularly noteworthy research: from using computing to better understand the body’s immune response to HIV and AIDS, to measuring and modeling complex ecosystems and global environmental conditions, to tools that inspire and enable citizen-scientists around the world.
As you will see, the vast majority of these scientific advances were made possible because of joint efforts between Microsoft Research and academic, government, and industry scientists. Collaborative research is the sine qua non of my group, Microsoft Research Connections. We work with the world’s top academic and scientific researchers, institutions, and computer scientists to shape the future of computing in fields such as parallel programming, software engineering, natural user interfaces, and data-intensive scientific research. It is through the connection of dedicated researchers at Microsoft Research’s worldwide labs with the top minds in academia that we are able to push technology to tackle some of the world’s most pressing problems. Similarly, it is through our fellowships and grants that we are able to foster the next generation of world-class computer scientists.
As we look forward to our next 20 years, we do so with renewed vigor and a reaffirmed commitment to improve the world through basic and applied research in computer science and software engineering. Whether it’s the extension of the computer into people’s everyday lives through our research on natural user interfaces, or our ongoing efforts to create educational tools such as the WorldWide Telescope, or our quest to apply algorithms to solve the mysteries of disease, we will be guided by the words of Rick Rashid, who started Microsoft Research in September 1991 and today heads its worldwide operations:
"We are investing for the future, an insurance policy for the future. We’re doing things that, when we start, we don’t know if they are going to be successful. For us, it’s more about ideas and taking risks. Basic research is about agility. It’s about giving you the ability to change when you most need it."
The ability to change when you need it most… now there’s something to celebrate, for sure.
—Tony Hey, Corporate Vice President, Microsoft Research Connections
It’s long been known that many serious diseases—including heart disease, asthma, and many forms of cancer—run in families. Until fairly recently, however, medical researchers have had no easy way of identifying the particular genes that are associated with a given malady. Now genome-wide association studies, which take advantage of our ability to sequence a person’s DNA, have enabled medical researchers to statistically correlate specific genes to particular diseases.
Sounds great, right? Well, it is, except for this significant problem: to study the genetics of a particular condition, say heart disease, researchers need a large sample of people who have the disorder, which means that some of these people are likely to be related to one another, even if only distantly. This means that certain positive associations between specific genes and heart disease are false positives, the result of two people sharing a common ancestor rather than their sharing a common propensity for clogged coronaries. In other words, your sample is not truly random, and you must statistically correct for “confounding” caused by the relatedness of your subjects.
This is not an insurmountable statistical problem: so-called linear mixed models (LMMs) can eliminate the confounding. Using them, however, poses a computational problem, because running LMMs to account for the relatedness among thousands of people in your sample takes an inordinate amount of computer runtime and memory. In fact, the runtime and memory footprint that these models require scale as the cube and square of the number of individuals in the dataset, respectively. So, when you’re dealing with a 10,000-person sample, the cost of the computer time and memory can quickly become prohibitive. And it is precisely these large datasets that offer the most promise for finding the connections between genetics and disease.
Enter Factored Spectrally Transformed Linear Mixed Model (FaST-LMM), an algorithm for genome-wide association studies that scales linearly in the number of individuals in both runtime and memory use (see “FaST linear mixed models for genome-wide association studies”). Developed by Microsoft Research, FaST-LMM can analyze data for 120,000 individuals in just a few hours, whereas the current algorithms fail to run at all at even 20,000 individuals. This means that the large datasets that are indispensable to genome-wide association studies are now computationally manageable from a memory and runtime perspective.
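To get a feel for what those scaling laws mean, here is a back-of-the-envelope sketch in Python. It assumes only the asymptotic behavior described above (runtime growing as the cube and memory as the square of the number of individuals for a conventional LMM, and both growing linearly for FaST-LMM) and ignores constant factors entirely. The function name `growth_factor` and the choice of a 10,000-person baseline are ours, for illustration; the printed numbers are relative growth ratios, not measured runtimes.

```python
def growth_factor(n_from: int, n_to: int, exponent: int) -> float:
    """Relative cost increase for a cost proportional to n**exponent
    when the sample grows from n_from to n_to individuals."""
    return (n_to / n_from) ** exponent


# Illustrative sample sizes: a 10,000-person study scaled up to 120,000.
baseline, target = 10_000, 120_000

lmm_runtime = growth_factor(baseline, target, 3)  # cubic runtime
lmm_memory = growth_factor(baseline, target, 2)   # quadratic memory
linear_cost = growth_factor(baseline, target, 1)  # linear runtime and memory

print(f"Sample grows {target / baseline:.0f}x")
print(f"  cubic runtime grows     {lmm_runtime:,.0f}x")
print(f"  quadratic memory grows  {lmm_memory:,.0f}x")
print(f"  linear cost grows       {linear_cost:,.0f}x")
```

Under these assumptions, a twelvefold increase in sample size multiplies a cubic runtime by 1,728 and a quadratic memory footprint by 144, while a linear cost merely grows twelvefold.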
With FaST-LMM, researchers will have the ability to analyze hundreds of thousands of individuals to look for relationships between our DNA and our traits, identifying not only what diseases we may get, but also which drugs will work well for a specific patient and which ones won’t. In short, it puts us one step closer to the day when physicians can provide each of us with a personalized assessment of our risk of developing certain diseases and can devise prevention and treatment protocols that are attuned to our unique hereditary makeup.
—David Heckerman, Distinguished Scientist, Microsoft Research Connections; Jennifer Listgarten, Researcher, Microsoft Research Connections
One of the core missions of Microsoft Research Connections is to support the creation of software tools that advance data-intensive science, especially those tools that are judged praiseworthy by their creators’ peers. With this in mind, we were pleased to present the first Microsoft Research Distinguished Artifact Award at ESEC/FSE 2011, the joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering.
This new, competitive award honors the most outstanding software tool submitted to the ESEC/FSE series of conferences. As explained in the call for submissions, the Distinguished Artifact competition is intended to reward creation of artifacts and replication of experiments. An Artifact Evaluation Committee was established to review the submissions and to formally recognize those artifacts that pass muster and fast-track them for additional presentation. Artifacts deemed especially meritorious were singled out for special recognition in the proceedings and at the conference, and the creators of the best artifact received a prize of US$1,000, a handsome certificate, and a memento from the Pacific Northwest, the last a reminder of their friends at Microsoft Research Connections in Redmond, Washington.
Professor Andreas Zeller (left) presents the award to Jérôme Vouillon (right) while Christian Bird (center) of Microsoft Research looks on.
So, are you wondering which artifact took home the big prize? Well, wonder no more: the winning artifact was Coinst, an application based on the paper “On Software Component Co-Installability,” by Jérôme Vouillon of CNRS and Roberto Di Cosmo of Université Paris Diderot and INRIA. Coinst resolves the common and frustrating problem of finding co-installation conflicts; what’s more, it does so in a scientifically strong manner (by using a theorem prover), and it runs very effectively. Coinst not only satisfies all the expectations established in the paper, but exceeds them in several ways: by working quickly, performing better than presented in the paper, finding real errors in installed systems, and rapidly identifying frustrating problems that the reviewers have encountered in their own computer usage.
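For readers who haven't run into the co-installability problem, here is a toy illustration in Python. To be clear, this is not Coinst's algorithm: it is a brute-force sketch over a tiny, made-up repository, and every package name in it is hypothetical. It assumes a simplified model in which each package lists dependencies (each one satisfiable by any of several alternatives) and conflicts, and it asks whether two packages can be installed together in some consistent set.

```python
from itertools import combinations

# Hypothetical repository. Each dependency is a set of alternatives;
# installing any one alternative satisfies that dependency.
REPO = {
    "editor": {"depends": [{"gui-a", "gui-b"}], "conflicts": set()},
    "player": {"depends": [{"gui-b"}], "conflicts": set()},
    "legacy": {"depends": [{"gui-a"}], "conflicts": set()},
    "gui-a": {"depends": [], "conflicts": {"gui-b"}},
    "gui-b": {"depends": [], "conflicts": {"gui-a"}},
}


def is_consistent(installed: frozenset) -> bool:
    """True if every dependency is met and no conflict is present."""
    for name in installed:
        pkg = REPO[name]
        if any(not (alternatives & installed) for alternatives in pkg["depends"]):
            return False
        if pkg["conflicts"] & installed:
            return False
    return True


def co_installable(a: str, b: str) -> bool:
    """Brute force: try every set of extra packages alongside a and b."""
    others = [p for p in REPO if p not in (a, b)]
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            if is_consistent(frozenset({a, b, *extra})):
                return True
    return False


for pair in combinations(["editor", "player", "legacy"], 2):
    print(pair, "co-installable" if co_installable(*pair) else "CONFLICT")
```

In this toy repository, "player" and "legacy" can each be installed alone but never together, because they depend on mutually conflicting components. Exhaustive search like this becomes hopeless as a repository grows to thousands of packages, which is why an efficient, rigorously verified approach matters.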
Professor Andreas Zeller of Saarland University, the initiator of the award, spoke about its importance, noting that "Far too often, researchers publish their results, but keep their data and tools for themselves. In the long term, this hurts science, because one cannot reproduce results or build on the achievements of others. Vouillon and Di Cosmo make their tools widely available and usable, providing value not only for other researchers, but for everyone. This way, they act as role models for the research community. With this award, we are proud to recognize their extraordinary efforts."
The winners themselves had this to say: “Free software components are growing at an astonishing pace, and it is important to identify quality issues quickly. We show how to efficiently extract from huge collections of free software a compact representation that quickly identifies component incompatibilities that would go otherwise unnoticed for a long time. We are thrilled to provide a tool based on a sophisticated algorithm that has been machine checked and that paves the way for the large-scale analysis and visualization of software component collections.”
Well done, Jérôme and Roberto.
—Judith Bishop, Director of Computer Science, Microsoft Research Connections