Halloween 2013 brought real terror to an Austin, Texas, neighborhood, when a flash flood killed four residents and damaged roughly 1,200 homes. Following torrential rains, Onion Creek swept over its banks and inundated the surrounding community. At its peak, the rampaging water flowed at twice the force of Niagara Falls (source: USA Today).
While studying the flood site shortly afterward, David Maidment, a professor of civil engineering at the University of Texas, ran into an old acquaintance, Harry Evans, chief of staff for the Austin Fire Department. Recognizing their shared interest in predicting and responding to floods, the two began collaborating on a system to bring flood forecasts and warnings down to the local level. The need was obvious: flooding claims more lives and costs the federal government more money than any other category of natural disaster. A system that can predict local floods could help flood-prone communities prepare for, and perhaps even prevent, catastrophic events like the Onion Creek deluge.
Soon, Maidment had pulled together other participants from academia, government, and industry to start the National Flood Interoperability Experiment (NFIE), with a goal of developing the next generation of flood forecasting for the United States. NFIE was designed to connect the National Flood Forecasting System with local emergency response and thereby create real-time flood information services.
The process of crunching data from the four federal agencies that deal with flooding (the US Geological Survey, the National Weather Service, the US Army Corps of Engineers, and the Federal Emergency Management Agency) was a burden for even the best-equipped physical datacenter, but not for the almost limitless scalability of the cloud. Maidment submitted a successful proposal for a Microsoft Azure for Research Award, which provided the necessary storage and compute resources via Microsoft Azure, the company’s cloud-computing platform.
Today, NFIE is using Microsoft Azure to perform the statistical analysis needed to compare present and past data from flood-prone areas and thus build prediction models. By deploying an Azure-based solution, the NFIE researchers can see what’s happening in real time and can collaborate from anywhere, sharing data from across the country. The system has also proved easy to learn: programmers had their computer model, RAPID (Routing Application for Parallel computation of Discharge), up and running after just two days of training on Azure. Moreover, the Azure cloud platform provides almost infinite scalability, which could be crucial as the National Weather Service is in the process of increasing its forecasts from 4,500 to 2.6 million locations. Of course, the greatest benefits of this Azure-based solution accrue to the public: to folks like those living along Onion Creek, whose property and lives might be spared by the timely prediction of floods.
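To make "compare present and past data" a little more concrete, here is a minimal sketch of one such comparison: ranking a current streamflow reading against a gauge's historical record and flagging readings near the top of that record. It is not NFIE's analysis and not the RAPID model (RAPID routes water through whole river networks); the function names, the made-up record, and the 0.99 threshold are all hypothetical.

```python
from bisect import bisect_left

def percentile_rank(history: list[float], current: float) -> float:
    """Fraction of historical observations strictly below `current`."""
    if not history:
        raise ValueError("need at least one historical observation")
    ordered = sorted(history)
    return bisect_left(ordered, current) / len(ordered)

def flag_high_flow(history: list[float], current: float,
                   threshold: float = 0.99) -> tuple[bool, float]:
    """Hypothetical check: does `current` rank above `threshold` of the record?

    The 0.99 default is an illustrative choice, not an NFIE setting.
    """
    rank = percentile_rank(history, current)
    return rank >= threshold, rank

# Made-up daily discharge record (e.g. cubic meters per second), not real data.
record = [float(q) for q in range(1, 101)]
print(flag_high_flow(record, 150.0))  # (True, 1.0)
print(flag_high_flow(record, 50.0))   # (False, 0.49)
```

Sorting the record on every call keeps the sketch short; a system tracking many gauges would more likely precompute a threshold value per gauge and compare incoming readings against it.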
—Dan Fay, Director: Earth, Energy, and Environment, Microsoft Research
The forests that surround Campos do Jordão are among the foggiest places on Earth. With a canopy shrouded in mist much of the time, these are the renowned cloud forests of the Brazilian state of São Paulo. It is here that researchers from the São Paulo Research Foundation (better known by its Portuguese acronym, FAPESP) have partnered with Rafael Oliveira, professor of ecology at the University of Campinas, in an ambitious effort to understand the climate and ecology of these spectacular woodlands. Their aptly named Cloud Forest Project has both conservation and practical goals, as it seeks to understand how to protect one of Brazil’s largest forested areas while learning to manage access to water and other natural resources more effectively.
The researchers want to unravel the impact of micro-climate variation in the cloud forest ecosystem. Essentially, they want to understand how the forest works: how carbon dioxide, water, nitrogen, and other nutrients cycle through plants, animals, and microorganisms in this complex ecosystem. To do so, they’ve placed some 700 sensors in 15 forest plots, positioning the devices at different levels, from beneath the soil to the top of the canopy.
The integration of such a vast number of sensor data streams poses difficult challenges. Before the researchers can analyze the data, they have to determine the reliability of the devices, so that they can eliminate data from malfunctioning ones. They also need to translate scientific questions into analysis of the time-series data streams—a process much more sophisticated than the traditional “open all the data in Excel spreadsheets” approach.
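A first pass at determining the reliability of the devices is often a set of simple plausibility checks. The sketch below assumes three such checks: too many missing readings, values outside a physically plausible range, and a long run of identical values that suggests a stuck sensor. The project's actual quality-control rules are not described here, so the function name and every threshold are hypothetical.

```python
from dataclasses import dataclass
from math import isnan
from typing import Optional

@dataclass
class QCResult:
    sensor_id: str
    ok: bool
    reasons: list[str]

def _valid(r: Optional[float]) -> bool:
    """A reading counts as present if it is neither None nor NaN."""
    return r is not None and not isnan(r)

def check_sensor(sensor_id: str, readings: list[Optional[float]],
                 lo: float, hi: float,
                 max_missing: float = 0.2, max_flat_run: int = 12) -> QCResult:
    """Hypothetical screen of one sensor's time series.

    All thresholds are illustrative placeholders, not project settings.
    """
    if not readings:
        return QCResult(sensor_id, False, ["no data"])
    reasons = []
    valid = [r for r in readings if _valid(r)]
    missing = len(readings) - len(valid)
    # 1. Too many gaps suggests a power or logging fault.
    if missing / len(readings) > max_missing:
        reasons.append(f"{missing}/{len(readings)} readings missing")
    # 2. Physically implausible values for this sensor type.
    bad = sum(1 for r in valid if not lo <= r <= hi)
    if bad:
        reasons.append(f"{bad} readings outside [{lo}, {hi}]")
    # 3. A long run of identical values (gaps skipped) suggests a stuck sensor.
    longest = run = (1 if valid else 0)
    for prev, cur in zip(valid, valid[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest >= max_flat_run:
        reasons.append(f"value unchanged for {longest} consecutive readings")
    return QCResult(sensor_id, not reasons, reasons)
```

Returning the reasons, rather than a bare pass/fail, lets a researcher see why a device was excluded before discarding its data.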
Consequently, the project scientists have collaborated with Microsoft Research to manage the data with help from the Microsoft Azure for Research project. Think of it as cloud to cloud: cloud forest data being managed and analyzed through the power of cloud computing. The work proceeds in parallel: some researchers develop the sensors, power supplies, and data flow in the cloud forest; others set up the cloud storage that receives those massive incoming data streams; and everyone strives to reach a level of confidence that new insights can be discovered and explored through the data.
Reliance on cyber infrastructure built on the Microsoft Azure cloud platform frees the researchers from purchasing and maintaining physical computers, saving time and money and eliminating the aggravation of learning how to be a computer system administrator. Moreover, the cloud-based system gives researchers the power to combine interrelated data to create “virtual sensors” that quantify things that cannot be measured readily by one type of sensor. For example, measuring fog is difficult and expensive with just one sensor, but the presence of fog can be inferred by combining data from temperature, sunlight, and humidity sensors.
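The fog example can be sketched as a small "virtual sensor" function. This version assumes fog is likely when the air is close to saturation (air temperature within about 1 °C of the dew point, computed with the standard Magnus approximation) and, in daylight, when measured sunlight is well below its expected clear-sky value. The thresholds, the `light_fraction` input, and the function names are illustrative assumptions, not the Cloud Forest Project's method.

```python
import math
from typing import Optional

# Magnus-formula coefficients for dew point over water (°C).
A, B = 17.62, 243.12

def dew_point_c(temp_c: float, rh_percent: float) -> float:
    """Dew point in °C from air temperature (°C) and relative humidity (%, > 0)."""
    gamma = math.log(rh_percent / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)

def fog_likely(temp_c: float, rh_percent: float,
               light_fraction: Optional[float] = None,
               max_spread_c: float = 1.0,
               max_light_fraction: float = 0.3) -> bool:
    """Hypothetical virtual fog sensor built from three physical sensors.

    light_fraction: measured sunlight divided by expected clear-sky sunlight;
    pass None at night, when sunlight says nothing about fog.
    Both thresholds are illustrative, not project values.
    """
    spread = temp_c - dew_point_c(temp_c, rh_percent)  # dew-point depression
    near_saturation = spread <= max_spread_c
    if light_fraction is None:
        return near_saturation
    return near_saturation and light_fraction <= max_light_fraction

print(fog_likely(12.0, 99.5, light_fraction=0.1))  # True
print(fog_likely(15.0, 60.0))                      # False
```

No single input decides the answer: high humidity alone could be dew on a clear night, and low light alone could be ordinary cloud cover, which is the point of combining sensors.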
Similar cloud-computing advantages are available in almost any research project that involves the collection, management, and analysis of big data. If that describes your research, you’ll want to check out the Microsoft Azure for Research project, especially its award program, which offers substantial grants of Microsoft Azure compute resources to qualified projects. Your research might not involve a cloud forest, but if it entails a forest of data, the Microsoft Azure cloud could be your ticket to a more productive and less costly project.
—Rob Fatland, Senior Research Program Manager, Microsoft Research
Research published in academic journals is trustworthy. Or is it? This question is being asked more and more these days. While few doubt the integrity of the researchers, many in the scientific community are concerned about the inability to reproduce experiments. If the work is not reproducible, how can its reliability be judged?
Making research reproducible is far from trivial. It involves sharing not just the results, methods, and data, but also the implied knowledge of the original researcher, without which it’s difficult to independently reproduce the complete results. The sharing of results and methods has become established practice over the last 350 years through academic publication. Data sharing is coming of age, and huge open-data efforts by researchers and the wider community are starting to bear fruit. But that still leaves a big problem: how to share the original researcher’s implied knowledge. Fortunately, the possibility of attaining completely reproducible computational experiments is drawing closer, as we’ve seen this week (December 8–11, 2014) during the Recomputability 2014 workshop at the 7th IEEE/ACM International Conference on Utility and Cloud Computing in London.
During the workshop, Tom Crick of Cardiff Metropolitan University, Benjamin Hall of the University of Cambridge, Samin Ishtiaq of Microsoft Research, and I presented new ways of thinking about reproducibility. We’ve been considering what it takes to make reproducible research not only possible but attractive to researchers. Much of the current work around data sharing and reproducibility focuses on the person trying to reproduce the work. As such, it often fails to take into account the intense pressures under which the original researcher labors: the stress of having to produce and publish results. We therefore propose building systems that make researchers more productive during their day-to-day work: automated systems that help make their work reproducible. In computational domains, we researchers have many cloud-based tools that make our lives easier, such as GitHub, Visual Studio Online, figshare, Office 365, and OneDrive. We use these tools every day, but usually in a disconnected way. We envisage a new world of “reproducibility as a service,” wherein these disparate services are brought together to make it much easier for researchers to think, develop, test, and publish their computational work wherever they are. Read more about our thoughts in our workshop paper, “Share and Enjoy: Publishing Useful and Usable Scientific Models.”
Tom Crick from Cardiff Metropolitan University and Lars Kotthoff from University College Cork kick off deep discussions at Recomputability 2014.
"This is an exciting area of research and one that could have a profound impact on the way that computational science is performed. By rethinking how we develop, use, benchmark, and share algorithms, software, and models, alongside the development of integrated and automated e-infrastructure to support recomputability and reproducibility, we will be able to improve the efficiency of scientific exploration as well as promoting open and verifiable scientific research," says Crick.
We are also excited to be working with Ian Gent and his team at the University of St Andrews on Recomputation.org, whose work on using virtual machines in the cloud to freeze, and later unfreeze, computational experiments is very promising. It’s like creating an exact copy of Michael Faraday’s lab and then being able to reproduce and extend his experiments. During the summer, we had a fantastic time exploring what’s possible; read more about our experiences trying to reproduce several experiments. These efforts provide a great starting point, although even with access to detailed lab notebooks, it is very difficult to know every detail of the original experimenter’s experience and implied knowledge. So this approach is not the whole solution, but it is certainly a move in the right direction.
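The freeze-and-unfreeze idea can be illustrated, at a much smaller scale than a full virtual machine, by recording a manifest of what a run depended on and checking a later rerun against it. In this sketch, `run_experiment` is a hypothetical stand-in for real research code, and `freeze` and `verify` are illustrative names, not part of Recomputation.org. The manifest stores the random seed and SHA-256 hashes of the input and result, plus the Python version and platform for reference. Unlike a frozen VM, it does not capture installed libraries or the operating system, and none of it captures the original experimenter's implied knowledge.

```python
import hashlib
import json
import platform
import random
import sys

def run_experiment(data: list[float], seed: int) -> float:
    """Hypothetical stand-in for real research code: a seeded bootstrap mean."""
    rng = random.Random(seed)
    means = [sum(rng.choices(data, k=len(data))) / len(data) for _ in range(200)]
    return sum(means) / len(means)

def sha256_of(obj) -> str:
    """Hash any JSON-serializable object, independent of dict key order."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()

def freeze(data: list[float], seed: int) -> dict:
    """Run once and record what the run depended on and what it produced."""
    result = run_experiment(data, seed)
    return {
        "python": sys.version.split()[0],   # recorded for reference, not checked
        "platform": platform.platform(),    # recorded for reference, not checked
        "seed": seed,
        "input_sha256": sha256_of(data),
        "result_sha256": sha256_of(result),
    }

def verify(manifest: dict, data: list[float]) -> tuple[bool, str]:
    """Rerun from the manifest and report whether the frozen result reappears."""
    if sha256_of(data) != manifest["input_sha256"]:
        return False, "input data differ from the frozen copy"
    result = run_experiment(data, manifest["seed"])
    if sha256_of(result) != manifest["result_sha256"]:
        return False, "rerun produced a different result"
    return True, "reproduced"

manifest = freeze([1.0, 2.0, 3.0, 4.0], seed=42)
print(json.dumps(manifest, indent=2))
print(verify(manifest, [1.0, 2.0, 3.0, 4.0]))  # (True, 'reproduced')
```

Comparing exact hashes of the result only works for deterministic code; for floating-point work that may differ slightly across machines, checking the rerun against the stored value within a tolerance would be the safer design.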
All of these steps toward reproducibility are becoming more achievable thanks to rapid developments in the cloud, and we were excited to share our own experiences with the research community during the conference. Ant Rowstron of Microsoft Research described the need to think at rack scale, that is, to treat racks rather than individual servers as the building blocks of our data centers. Ross Smith from Skype and Lori Ada Kilty of Microsoft Research showed how crowdsourcing and gamification can improve online conferencing systems, such as Lync and Skype. I was delighted to discuss where we are heading with hyperscale cloud computing, from efficiency, consistency, and productivity points of view. And it was great to see that Claudio Rosso from the Microsoft Innovation Centre at the Istituto Superiore Mario Boella (ISMB) received the best poster award for the FLOODIS flood information project, which uses Microsoft Azure: an example of how the cloud can augment mobile devices to help citizens in times of crisis. To round it off, we had fun with dozens of researchers as they learned about Microsoft Azure and Azure Machine Learning and explored how these platforms can help with their work.
All in all, it was an amazing week in London. We were thrilled to share how cloud computing is making a difference for researchers today, and the promise it holds for the future. If you’re interested in how cloud computing can help with your research, we invite you to learn more through our Azure for Research program, which offers a host of tools and tips, not to mention substantial grants of cloud resources for qualified projects.
—Kenji Takeda, Solutions Architect and Technical Manager, Microsoft Research