Antibiotics, antivirals, NSAIDs—the list of modern “wonder drugs” goes on and on. And yet many diseases remain resistant to drug therapy, and in other instances, the side effects of drug treatment are as bad as or worse than the disorder. Why, the public wonders, aren’t more new and better drugs coming to market?
The answer, in a word, is cost. Modern drug discovery involves identifying likely candidates and then screening them for biological efficacy and potential toxicity. This process is enormously, often prohibitively, expensive.
Toxicity prediction in particular remains one of the great challenges of drug discovery. Even after decades of unprecedented funding, scientists still struggle to predict the toxic side effects for any given compound. Traditional statistical models that are based on empirical data, while wonderful in theory, have one key shortcoming. Unless researchers have access to either a state-of-the-art corporate datacenter or one of the world’s few supercomputers, there’s just too much data to analyze efficiently. The identification of compounds that will cause a desired biological effect requires a huge investment in technical infrastructure.
At least it did until recently. Now, the power of cloud computing offers a relatively inexpensive alternative to the huge up-front costs of building out a high-powered computing infrastructure. Researchers from Molplex, a small drug-discovery company; Newcastle University; and Microsoft Research Connections are working together to use cloud computing to help scientists across the globe deliver new medicines faster and at lower cost. This collaborative partnership has helped Molplex develop Clouds Against Disease, an offering of high-quality drug discovery services based on a new molecular discovery platform that draws its power from Windows Azure.
The Clouds Against Disease computational platform runs algorithms that rapidly calculate the numerical properties of molecules. As a result, Molplex has been able to produce drug discovery results on a much larger scale than has ever been seen before.
The Molplex method enables researchers to address practical issues when screening compounds. Will the compound be toxic? Will it pass safely through the human intestine? Will it stay in the body long enough? The Molplex process features extreme front loading that identifies viable drug candidates early in the research process. Contrast this with the traditional approach, which involves a great deal of up-front experimental work that is wasted when the researchers later learn that the hoped-for drug is toxic.
Access to Windows Azure, Microsoft’s cloud platform, was critical to the success of Clouds Against Disease. Molplex can take advantage of 100 or more Windows Azure nodes, which are in effect virtual servers, to process data rapidly. The physical-world alternative would be to source, purchase, provision, and then manage 100 or more physical servers, which represents a significant financial investment. Scientists taking this traditional approach would have to raise hundreds of thousands, or even millions of dollars before they could begin drug research. That’s a huge barrier for scientists around the world who want to engage in drug discovery. Windows Azure helps to eliminate start-up costs by allowing new companies to pay for only what they use in computing resources.
One of the biggest potential impacts of Clouds Against Disease lies in its ability to make drug discovery affordable for tropical diseases and niche disorders—categories that have long been low priority for drug companies, due to their limited commercial payoff. The requirement of a multi-million dollar investment before even going into the clinic doesn’t work for scientists studying drugs to combat such diseases. Radically reducing the cost of drug discovery makes it feasible for scientists to tackle these scourges and bring hope to countless sufferers around the world.
—Fabrizio Gagliardi, Cloud Engagements in EU, Microsoft Research Connections
When wildfires strike, all eyes turn to the clouds, hoping for a downpour that will quench the flames. Now, wildfire prevention teams on the Greek island of Lesvos are looking to a different kind of cloud for help, thanks to the VENUS-C Fire application and the computing power of Windows Azure.
The Fire app determines the daily wildfire risk on Lesvos during the months of May to October, when the annual dry season turns the island’s forests into a tinder box. The application not only alerts fire prevention teams of the risk, it also enables firefighters to design and coordinate an effective response when a wildfire breaks out. As a result, the island’s fire prevention personnel have been better prepared to predict, respond to, and stop fires, preventing potential loss of life and property.
Developed by the Geography of Natural Disasters Laboratory at the University of the Aegean in Greece, the Fire app is designed to calculate and visualize the risk of wildfire ignition and to simulate fire propagation. The end users are primarily emergency responders, including the fire service, fire departments, and civil protection agencies that address wildfires on the island of Lesvos.
The app was built with functionality from multiple resources, giving it both technological depth and a visual interface that is accessible to non-technical users. It integrates Bing Maps, Microsoft Silverlight, and Windows Azure in a single system that enables users to see the potential of an emerging fire.
All of the Fire app’s data is stored in the cloud via Windows Azure. And a lot of data it is, including information on topography, vegetation, weather patterns, and past fire patterns. This is “big data,” and crunching it requires the computing power of a large cloud infrastructure, such as Windows Azure.
Professor Kostas Kalabokidis of the University of the Aegean calls Windows Azure essential to the app, noting that “the cloud provides us with the necessary processing power and storage that is required. That means the real end users for the fire department do not need to have any huge processing power or storage capabilities locally.” Indeed, on the end-user side, all that’s needed to access the tool is a regular computer or laptop, an Internet connection, and a web browser that supports Silverlight.
The Geography of Natural Disasters Laboratory team built the Fire app in 2011. Microsoft Research partnered with the lab during the development phase, providing funding, high-performance computing resources, and cloud computing infrastructure. As part of that collaboration, Microsoft built a tool called the Generic Worker (GW) that greatly simplified the challenges faced by Kalabokidis’ team.
GW was critical, according to Professor Kalabokidis, who states that “Generic Worker provides a robust environment for job execution that fulfilled the requirements of the University of the Aegean’s scenario for running forest fire risk and fire propagation models in the cloud. GW provides interoperability through OGF [Open Grid Forum] Basic Execution Service, which is very important in the Aegean scenario to execute tasks in a hybrid cloud environment, such as VMs [virtual machines] of different cloud solutions. Furthermore, GW provides scalability: for example, VMs are increased or decreased according to the needs of deployment. Users are also notified about the status of the job, which is important for the execution of the fire propagation simulation.”
The Fire app is just one of many big data projects that benefit from Windows Azure’s scalability, storage capacity, and computational power. There is no question that cloud computing is having a significant impact throughout the research world, as information from instruments, online sources, and social media combines to create a data tsunami. This has ushered in the era of data-intensive science, which the late Jim Gray predicted would be the Fourth Paradigm of scientific research, and Windows Azure is in the forefront of making it possible.
Cloud computing, and the processing power that accompanies it, has made it possible for researchers to reduce processing job times from months to just hours. The thing that excites me about my job is the possibility that we can change the way science is conducted. I believe that cloud computing is a revolutionary change in an era of big data and the exploration of large data collections.
—Dennis Gannon, Director of Cloud Research Strategy, Microsoft Research Connections
Think about supercomputers of the recent past. Just 15 years ago, supercomputers were rare and exotic machines. Government laboratories in the United States and Japan spent hundreds of millions of dollars on custom computing rigs and specialized facilities to house them, in a bid to tackle the world’s toughest problems.
But now there is an alternative that is more attractive for scientists and businesses. Today, you can rent supercomputing horsepower by the hour online from public cloud providers. Amazing.
Windows Azure can help ensure that you’re not paying more than you can afford for your supercomputing time, and it makes overall management of large-scale computations very simple. Unlike other cloud providers, Windows Azure has no virtual machine (VM) image you need to manage or store in your account; with tens of thousands of instances, this could add up, from both a management and a cost standpoint. And Windows Azure provides the operating system for you (and keeps it up to date with patches): you just copy your application to Windows Azure and run it in the cloud.
The Microsoft HPC Pack 2012 (a free download that will be available from the Microsoft Download Center later this year) makes it very easy to manage compute resources and schedule your jobs in Windows Azure. You take the proven cluster management tool from Windows Server, connect it to Windows Azure, and then let it do the work. All you need to get started is a Windows Azure account. A set-up wizard takes care of the preparation, and the job scheduler runs your computations.
What’s more, there’s no commitment: you can pay as you go, or you can negotiate a discount if you are going to use a lot of core hours. As Bill Hilf, general manager of product management for Windows Azure, observes, it’s easy to manage a wide range of sizes and types of workloads on Windows Azure. Like Bill, we, too, are extremely enthusiastic about the possibilities offered by the supercomputing prowess of Windows Azure. Such massive computational power is critical for “big data” studies that increase our understanding of complex systems.
The genome-wide association study (GWAS) is a case in point. Microsoft Research conducted a 27,000-core run on Windows Azure to crunch data from this study. With the nodes busy for 72 hours, 1 million tasks were consumed—the equivalent of approximately 1.9 million compute hours. If the same computation had been run on an 8-core system, it would have taken 25 years to complete!
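As a rough check on these figures, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate only: it assumes every core was busy for the full 72 hours and that the same work would scale perfectly down to 8 cores. The function names are illustrative and are not part of any Microsoft tool.

```python
HOURS_PER_YEAR = 24 * 365


def core_hours(cores: int, wall_clock_hours: float) -> float:
    """Total compute consumed if every core is busy for the whole run."""
    return cores * wall_clock_hours


def years_on_smaller_machine(total_core_hours: float, cores: int) -> float:
    """Idealized wall-clock years to do the same work on fewer cores."""
    return total_core_hours / cores / HOURS_PER_YEAR


# Figures taken from the text: 27,000 cores kept busy for 72 hours.
azure_run = core_hours(cores=27_000, wall_clock_hours=72)

# Assumption: perfect scaling, no overhead, on an 8-core machine.
eight_core_years = years_on_smaller_machine(azure_run, cores=8)

print(f"{azure_run:,.0f} core-hours")
print(f"{eight_core_years:.1f} years on 8 cores")
```

Under these idealized assumptions the run comes to just over 1.9 million core-hours, and the 8-core equivalent lands in the high twenties of years, the same ballpark as the figure quoted above.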
The GWAS offers a powerful approach to identifying genetic markers that are associated with human diseases. It used data from a Wellcome Trust study of the British population, which examined some 2,000 individuals and a shared set of about 13,000 controls for each of seven major diseases. But like all genome-wide association studies, this one had to overcome a significant problem: to study the genetics of a particular condition, say heart disease, researchers need a large sample of people who have the disorder, which means that some of these people are likely to be related to one another, even if only distantly. As a result, certain apparent associations between specific genes and heart disease are false positives, the result of two people sharing a common ancestor rather than sharing a common propensity for clogged coronaries. In other words, the sample is not truly random, and you must statistically correct for the “confounding” caused by the relatedness of your subjects.
This is not an insurmountable statistical problem: there are so-called linear mixed models (LMMs) that can eliminate the confounding. Use of these, however, is a computational problem, because it takes an inordinately large amount of computer runtime and memory to run LMMs to account for the relatedness among thousands of people in your sample. In fact, the runtime and memory footprint that are required by these models scale as the cube and square of the number of individuals in the dataset, respectively. So, when you’re dealing with a 10,000-person sample, the cost of the computer time and memory can quickly become prohibitive. And it is precisely these large datasets that offer the most promise for finding the connections between genetics and disease.
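To make the cube-and-square scaling concrete, here is a minimal sketch of how cost grows with cohort size. It assumes only what the text states (runtime proportional to n cubed, memory proportional to n squared); the memory estimate further assumes a dense n-by-n relatedness matrix stored in 8-byte floats, which is an illustrative assumption rather than a description of any particular LMM implementation.

```python
from typing import Tuple

BYTES_PER_FLOAT64 = 8  # assumption: double-precision storage


def growth_factors(n_before: int, n_after: int) -> Tuple[float, float]:
    """Return (runtime_factor, memory_factor) when the cohort grows.

    Runtime is taken to scale as n**3 and memory as n**2, so any
    constant factors cancel out in the ratio.
    """
    ratio = n_after / n_before
    return ratio ** 3, ratio ** 2


def dense_matrix_gib(n_individuals: int) -> float:
    """Size in GiB of one dense n-by-n matrix of 8-byte floats."""
    return n_individuals ** 2 * BYTES_PER_FLOAT64 / 2 ** 30


runtime_x, memory_x = growth_factors(1_000, 10_000)
print(f"10x more people: {runtime_x:,.0f}x runtime, {memory_x:,.0f}x memory")
print(f"One 10,000 x 10,000 matrix: {dense_matrix_gib(10_000):.2f} GiB")
```

The point of the ratio form is that a tenfold increase in sample size costs a thousandfold in runtime regardless of the hardware, which is why simply buying a faster machine does not remove the roadblock.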
To avoid this computational roadblock, Microsoft Research developed the Factored Spectrally Transformed Linear Mixed Model (better known as FaST-LMM), an algorithm that extends the ability to detect new biological relations by using data that is several orders of magnitude larger. It allows much larger datasets to be processed and can, therefore, detect more subtle signals in the data.
By using Windows Azure, Microsoft Research ran FaST-LMM on data from the Wellcome Trust, analyzing 63,524,915,020 pairs of genetic markers, looking for interactions among these markers for bipolar disorder, coronary artery disease, hypertension, inflammatory bowel disease (Crohn’s disease), rheumatoid arthritis, and type I and type II diabetes. The result: the discovery of new associations between the genome and these diseases, discoveries that could presage potential breakthroughs in prevention and treatment.
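The size of that pair count follows from simple combinatorics: testing every unordered pair among n markers means n(n-1)/2 tests. The sketch below shows the formula. The marker count of 356,441 is not stated in the article; it is inferred here because it is the value of n that reproduces the quoted number of pairs exactly.

```python
def marker_pairs(n_markers: int) -> int:
    """Number of unordered pairs that can be formed from n markers."""
    return n_markers * (n_markers - 1) // 2


# Assumption: inferred from the pair count in the text, not quoted from it.
INFERRED_MARKER_COUNT = 356_441

print(f"{marker_pairs(INFERRED_MARKER_COUNT):,} pairs")
```

Because the number of pairs grows with the square of the number of markers, an exhaustive pairwise search of this kind is exactly the sort of workload that benefits from tens of thousands of cores.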
Results from individual pairs and the FaST-LMM algorithm are available via online query in Epistasis GWAS for 7 common diseases in the Windows Azure Marketplace (free access), so researchers can independently validate results that they find in their lab.
Today’s smartphones have put a computer in your pocket. Now, with cloud computing through Windows Azure, you have a supercomputer in your... well, not in your pocket, but probably within your budget. Whatever your big-data concerns, Windows Azure can provide supercomputing power at an affordable price.
—David Heckerman, Distinguished Scientist, Microsoft Research; Robert Davidson, Principal Software Architect, Microsoft Research, eScience; Carl Kadie, Principal Research Software Design Engineer, Microsoft Research, eScience; Jeff Baxter, Development Lead, Windows HPC, Microsoft; Jennifer Listgarten, Researcher, Microsoft Research Connections; and Christoph Lippert, Researcher, Microsoft Research Connections