April, 2014

Microsoft Research Outreach Blog

The Microsoft Research Outreach blog shares stories of collaborations with computer scientists at academic and scientific institutions to advance technical innovations in computing, as well as related events, scholarships, and fellowships.

  • Microsoft Research Outreach Blog

    Reproducible research: are we there yet?


     “If I have seen further, it is by standing on the shoulders of giants.”
    —Sir Isaac Newton

    Standing on the shoulders of giants is a metaphor we often use to describe how research advances. More than an aphorism, it is a mindset that we ingrain in students when they start graduate school: take the time to understand the current state of the art before attempting to advance it further. Having to justify why you have reinvented the wheel during your PhD defense is not a comfortable situation to be in. Moreover, the value of truly reproducible research is reinforced every time a paper is retracted because its results cannot be reproduced, or every time that promising academic research—such as pursuit of important new drugs—fails to meet the test of reproducibility.

    Is your research reproducible?

    Of course, to truly learn from work that has preceded yours, you need access to it. How can you build on the latest research if you don’t know its details? Thankfully, open access (OA) is making it easier to find research papers, and Microsoft Research is committed to OA. Though it’s a good start, OA articles contain only words and pictures. What about the data, software, input parameters, and everything else needed to reproduce the research?

    While research software provides the potential for better reproducibility, most people agree that we are some way from achieving this. It’s not just a matter of throwing your source code online. Even though tools such as GitHub provide excellent sharing and versioning, it is up to the researcher or developer to make sure the code can not only be re-run but also be understood by others. There are still technical issues to overcome, but the social ones are even harder to tackle. The development of scientific software and researchers’ selection of which software to use and reuse are all intertwined. We at Microsoft Research are concerned with this; see “Troubling Trends in Scientific Software” in the May 17, 2013, issue of Science magazine.

    Kenji Takeda talks about reproducible research and the cloud at CW14.
    Photo: Tim Parkinson, CC-BY

    This year’s Collaboration Workshop (CW14), run by the Software Sustainability Institute (SSI), brought together likeminded innovators from a broad spectrum of the research world—researchers, software developers, managers, funders, and more—to explore the role of software in reproducible research. This theme couldn’t have been timelier, and I was excited to take part in this dynamic event again with a talk on reproducible research and the cloud. The “unconference” format—where the agenda is driven by attendees’ participation—was perfect for exploring the many issues around reproducible research and software. So, too, was the eclectic make-up of the attendees, so unlike that at more conventional conferences.

    Hack Day winners receive Windows 8.1 tablets for Open Source Health Check. Left to right: Arfon Smith (GitHub), Kenji Takeda (Microsoft Research), James Spencer (Imperial College), Clyde Fare (Imperial College), Ling Ge (Imperial College), Mark Basham (DIAMOND), Robin Wilson (University of Southampton), Neil Chue-Hong (Director, SSI), Shoaib Sufi (SSI)

    Instead of leaving after two days, many participants stayed on for Hack Day—a hackathon that challenged them to create real solutions to problems surfaced at the workshop. Eight team leaders had to pitch their ideas to the crowd, as the researchers and software developers literally voted with their feet to join their favorite team. The diversity of ideas was impressive, such as scraping the web to catalogue scientific software citations, extending GitHub to natively visualize scientific data, and assessing research code quality online. We made sure that teams were able to use Microsoft Azure to quickly set up websites, Linux virtual machines, and processing back-ends to build their solutions.

    Arfon Smith from GitHub and I served as judges, and we had a tough time choosing a winning project. After much back-and-forth, we awarded the honor to the Open Source Health Check team, which created an elegant and genuinely usable service that combines some of the best practices discussed during the workshop. Their prototype runs a checklist on any GitHub repository to make sure that it incorporates the critical components for reproducibility, including documentation, an explicit license, and a citation file. The team worked furiously to implement this, including deploying it on Microsoft Azure and integrating it with the GitHub API, to demonstrate a complete online working system.
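
    To give a feel for what such a checklist might look like, here is a minimal sketch in Python. It is not the team’s code: the checklist items come from the description above, but the file-name prefixes, the `CHECKLIST` table, and the function names are assumptions made for illustration, and it inspects a plain list of file names rather than calling the GitHub API.

```python
from pathlib import Path

# Hypothetical checklist: item -> lowercase file-name prefixes that satisfy it.
# These prefixes are illustrative assumptions, not the Hack Day team's rules.
CHECKLIST = {
    "documentation": ("readme",),
    "license": ("license", "licence", "copying"),
    "citation": ("citation",),
}


def health_check(filenames):
    """Return {item: bool} for each checklist item, given top-level file names."""
    lowered = [Path(name).name.lower() for name in filenames]
    return {
        item: any(name.startswith(prefixes) for name in lowered)
        for item, prefixes in CHECKLIST.items()
    }


def check_directory(repo_dir):
    """Run the same checklist on the top level of a local checkout."""
    return health_check(p.name for p in Path(repo_dir).iterdir() if p.is_file())
```

    Calling `health_check(["README.md", "LICENSE", "setup.py"])` would report the documentation and license items as present and the citation item as missing. A real service would also want to look inside the files, for example to confirm that the license text is a recognized one.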

    In addition to our role at CW14, Microsoft Research is delighted to be supporting teams working on new approaches to scientific reproducibility as part of our Microsoft Azure for Research program:

    • The first project aims to make computational experiments easily reproducible decades into the future, and is focused on taking advantage of virtual machines to preserve software experiments. By packaging up a researcher’s entire experimental setup in a VM, it becomes trivial to replicate their work. Once these VMs have been uploaded to VMDepot, it takes just a few mouse clicks to call up the complete experiment in Microsoft Azure, log in, and rerun the research. From there, it is possible to drill down and dissect the work, extend it, and then share a new version online. It’s a great collaboration with the Software Sustainability Institute, and the cloud provides an ideal environment for this platform.
    • Patrick Henaff, of IAE de Paris, is working with Zeliade Systems on enhanced IPython notebooks shared via the cloud. Their vision for social coding around reproducible software tackles some of the cultural issues by allowing researchers to easily share, discover, and reuse their work in an executable way.
    • Titus Brown, of Michigan State University, is conducting pioneering work on open biological science, open protocols, and provenance-preserving analyses in the cloud. His pilot project uses publicly available data from the Marine Eukaryotic Transcriptome Sequencing Project to move processing workflows into Microsoft Azure in a reproducible way, allowing researchers to tweak and remix their analyses.

    While we still have not achieved truly reproducible research, CW14 proved that the community is dedicated to improving the situation, and cloud computing has an increasingly important role to play in enabling reproducible research.

    Kenji Takeda, Solutions Architect and Technical Manager, Microsoft Research Connections

  • Microsoft Research Outreach Blog

    Enhancing learning through the cloud


    Most parents want their children to have access to the best educational opportunities at schools with broad, enriching curricula. Students attending such schools may find themselves challenged with finding sufficient time to study any one subject adequately—in or out of the classroom. MyCloud, an innovative e-learning platform developed by Microsoft Research Asia, helps solve this problem by providing students and teachers with an interactive space for collaboration, exploration, and enrichment.

    Students from Singapore Nan Chiau primary school, which has been using the MyCloud e-learning platform since 2011 for Chinese language instruction.

    Originally developed to assist in teaching Chinese to students in Singapore, MyCloud is a web-based, interactive platform that allows teachers and students to extend learning beyond the classroom. Students can use a tablet, smartphone, laptop, or desktop computer to access the platform and complete assignments. Having grown up with technology, today’s students are very comfortable with using it in an educational setting; in fact, the high-tech aspect of this innovative platform captures students’ attention and interest, promoting engagement and fueling an intrinsic motivation to excel academically.

    By using MyCloud, teachers can upload assigned lessons directly, knowing that their students can readily access their assignments and easily follow their instructions. It also enables teachers to upload supplemental activities and lessons, thereby complementing and expanding upon the material covered during limited classroom time. These supplemental activities not only broaden and enhance course content; they allow students to learn at their own pace, as the student controls MyCloud. Students can take uploaded tests on the e-learning platform to help them assess their progress with their studies. And the audio component is particularly valuable to students who are learning a foreign language, as they can practice their speaking and listening skills and readily learn new vocabulary. Video uploads will soon be added to promote students’ learning even further.

    The value of this e-learning platform is evident at Nan Chiau Primary School in Singapore, which has been using it since 2011 for Chinese language instruction. Teachers at Nan Chiau understand that students must complete time-consuming exercises to learn Chinese vocabulary and tonal inflections, but the time allotted for classroom instruction is limited. MyCloud has allowed the students to pursue their mastery of Chinese on their own time and at their own pace, reinforcing the significance of the rate at which individuals learn, while enhancing students’ enjoyment of learning. Students have shown increased proficiency in Chinese language as a result of using the e-learning platform, and Microsoft’s partnership with Nan Chiau Primary School demonstrates how schools can successfully use its technology to enhance learning and empower students.

    —Winnie Cui, Microsoft Research Asia, Senior UR Manager

  • Microsoft Research Outreach Blog

    FetchClimate—harnessing the cloud to find and share environmental data


    Scientists around the world are striving tirelessly to monitor and model the environment—to understand the intricate workings of our ecosystem—so that policymakers can make informed decisions that lead to a sustainable future for “spaceship Earth.” This research involves using the thousands of available environmental datasets, on everything from agriculture and biodiversity to climate and the oceans. But finding, browsing, choosing, and downloading the right data can be ridiculously hard, even for the experts.

    What if finding environmental data were as simple as clicking on a map?

    Draw a box around the geographic area you’re interested in, select the environmental information you want, and view the data on Bing Maps within seconds

    Enter FetchClimate, a tool that makes locating environmental information as easy as searching for a hotel or coffee shop online. Just draw a box around the geographic area you’re interested in, select the environmental information you want, and view the data on Bing Maps within seconds. What used to take researchers hours, days, or even weeks can now be done very quickly—by anyone. When possible, FetchClimate calculates data uncertainty, so you know how reliable the information is, and the tool allows you to specify precisely the size of the area and the period of time for your query.
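
    For readers who think in code, the three ingredients of such a query (a box, a variable, and a period of time) can be captured in a small data structure. The Python sketch below is purely illustrative: `BoxQuery`, its field names, and `to_params` are hypothetical and are not part of the FetchClimate API. It only shows how the inputs could be represented and sanity-checked before being sent to any service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxQuery:
    """Hypothetical container for one map query; not the FetchClimate API."""

    variable: str      # e.g. "temperature" or "precipitation"
    lat_min: float     # southern edge of the box, in degrees
    lat_max: float     # northern edge of the box, in degrees
    lon_min: float     # western edge of the box, in degrees
    lon_max: float     # eastern edge of the box, in degrees
    year_start: int    # first year of the period
    year_end: int      # last year of the period (inclusive)

    def __post_init__(self):
        # Reject boxes and periods that cannot describe a real query.
        if not -90 <= self.lat_min <= self.lat_max <= 90:
            raise ValueError("need -90 <= lat_min <= lat_max <= 90")
        if not -180 <= self.lon_min <= self.lon_max <= 180:
            raise ValueError("need -180 <= lon_min <= lon_max <= 180")
        if self.year_start > self.year_end:
            raise ValueError("year_start must not be after year_end")

    def to_params(self):
        """Flatten the query into a plain dictionary of request parameters."""
        return {
            "variable": self.variable,
            "bbox": [self.lat_min, self.lon_min, self.lat_max, self.lon_max],
            "years": [self.year_start, self.year_end],
        }
```

    For example, `BoxQuery("precipitation", 5.0, 35.0, 65.0, 100.0, 1961, 1990)` describes a box over southern Asia for a thirty-year period. This sketch deliberately ignores boxes that cross the 180° meridian, which a real tool would need to handle.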

    FetchClimate runs in the cloud, on Microsoft Azure, meaning there is no physical limit on how much information can be added. You can not only look at historical climate data but also peer into the future, as we have included forecast data from the latest climate simulation experiments. For example, you can see what the predicted temperature or precipitation in your area will be in 2050.

    Visualization of year-to-year precipitation averages in southern Asia

    The Computational Ecology and Environmental Science group in Microsoft Research has spent several years developing FetchClimate, working with Moscow State University, which provided software development, and the DigiLab at the London College of Communication, which designed an interface that makes finding and understanding environmental information stress-free. So we’re excited to be releasing FetchClimate—in three different ways—for anyone to use for research, study, or just to satisfy their curiosity about our planet.

    • First, anyone can access our own FetchClimate service via a web explorer, which features a number of useful climatological layers, including temperature, precipitation, and sea depth. (If you would like to make your information and data easily available within our service, please contact us. We are interested in augmenting our current climate and environmental data with socio-economic and health information—or any other global information best viewed on a map.)
    • Second, you can access the same service via an API, from inside a program written in any of several languages, including .NET, R, and Python.
    • And third, we’re releasing a deployment package so that the more technically savvy of you can set up your own FetchClimate-powered service to make your data available to your colleagues, and to the wider world, via the same web explorer that runs over our own service. We’ll soon be open sourcing that explorer portion of the deployment, so you can customize or skin it how you like (all we ask is that you acknowledge that the service is powered by FetchClimate).

    The deployment package will be attractive to individuals, research teams, national laboratories, and international collaborations who are used to dealing with geographical data and are keen to share it with colleagues and the outside world in a more dynamic way. For example, Ireland’s Marine Institute has created the Irish Digital Ocean–SMART Marine Research Platform to stimulate collaborative research across the marine sector. As Eoin O’Grady, Information Services & Development Manager at the Marine Institute, explains, “FetchClimate greatly simplifies access to scientific data, promoting reuse. We see it as an excellent way to share Irish marine research data, part of the Irish Digital Ocean, with a broad range of users in the marine community, to support research and innovation and as input into public information services.”

    In addition, we are currently sponsoring a special Climate Data Initiative that offers grants of Microsoft Azure resources to help early adopters set up their own FetchClimate-powered services. Using the deployment package, you will be able to implement your own instance of FetchClimate, including your datasets and a web front end that is customized for your own site—and we’ll provide the space on Azure! If you would like to pursue this, please submit a proposal by June 15, 2014. We will be selecting 40 awardees from among these proposals.

    We created FetchClimate as a way to turn data into actionable information, and to make that information easily available to the world. There are some exciting features that we haven’t discussed here (hint: what if you could upload a model, not just data?), and FetchClimate is just one of several exciting tools for environmental science that we are developing. All of these tools illustrate how, with a bit of imagination, we can begin to deliver research-as-a-service on Microsoft Azure. We hope these tools will help scientists, policymakers, and the public become more informed and better equipped to take care of our planet.

    Kenji Takeda, Solutions Architect and Technical Manager, Microsoft Research

    Kristin Tolle, Director of Environmental Science Infrastructure Development, Microsoft Research
