“If I have seen further, it is by standing on the shoulders of giants.”—Sir Isaac Newton
Standing on the shoulders of giants is a metaphor we often use to describe how research advances. More than an aphorism, it is a mindset that we ingrain in students when they start graduate school: take the time to understand the current state of the art before attempting to advance it further. Having to justify why you have reinvented the wheel during your PhD defense is not a comfortable situation to be in. Moreover, the value of truly reproducible research is reinforced every time a paper is retracted because its results cannot be reproduced, or every time that promising academic research—such as pursuit of important new drugs—fails to meet the test of reproducibility.
Of course, to truly learn from work that has preceded yours, you need access to it. How can you build on the latest research if you don’t know its details? Thankfully, open access (OA) is making it easier to find research papers, and Microsoft Research is committed to OA. Though it’s a good start, OA articles only contain words and pictures. What about the data, software, input parameters, and everything else needed to reproduce the research?
While research software provides the potential for better reproducibility, most people agree that we are some way from achieving it. It’s not just a matter of throwing your source code online. Even though tools such as GitHub provide excellent sharing and versioning, it is up to the researcher or developer to make sure the code can not only be re-run but also understood by others. There are still technical issues to overcome, but the social ones are even harder to tackle. The development of scientific software and researchers’ selection of which software to use and reuse are all intertwined. We at Microsoft Research are concerned with this—see “Troubling Trends in Scientific Software” in the May 17, 2013, issue of Science magazine.
Kenji Takeda talks about reproducible research and the cloud at CW14. Photo: Tim Parkinson, CC-BY
This year’s Collaboration Workshop (CW14), run by the Software Sustainability Institute (SSI), brought together like-minded innovators from a broad spectrum of the research world—researchers, software developers, managers, funders, and more—to explore the role of software in reproducible research. This theme couldn’t have been timelier, and I was excited to take part in this dynamic event again with a talk on reproducible research and the cloud. The “unconference” format—where the agenda is driven by the attendees themselves—was perfect for exploring the many issues around reproducible research and software. So, too, was the eclectic make-up of the attendees, so unlike that of more conventional conferences.
Hack Day winners receive Windows 8.1 tablets for Open Source Health Check. Left to right: Arfon Smith (GitHub), Kenji Takeda (Microsoft Research), James Spencer (Imperial College), Clyde Fare (Imperial College), Ling Ge (Imperial College), Mark Basham (DIAMOND), Robin Wilson (University of Southampton), Neil Chue-Hong (Director, SSI), Shoaib Sufi (SSI)
Instead of leaving after two days, many participants stayed on for Hack Day—a hackathon that challenged them to create real solutions to problems surfaced at the workshop. Eight team leaders had to pitch their ideas to the crowd, as the researchers and software developers literally voted with their feet to join their favorite team. The diversity of ideas was impressive, such as scraping the web to catalogue scientific software citations, extending GitHub to natively visualize scientific data, and assessing research code quality online. We made sure that teams were able to use Microsoft Azure to quickly set up websites, Linux virtual machines, and processing back-ends to build their solutions.
Arfon Smith from GitHub and I served as judges, and we had a tough time choosing a winning project. After much back-and-forth, we awarded the honor to the Open Source Health Check team, which created an elegant and genuinely usable service that combines some of the best practices discussed during the workshop. Their prototype runs a checklist on any GitHub repository to make sure that it incorporates the critical components for reproducibility, including documentation, an explicit license, and a citation file. The team worked furiously to implement this, including deploying it on Microsoft Azure and integrating it with the GitHub API, to demonstrate a complete online working system.
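To make the idea concrete, here is a minimal sketch of the kind of checklist the Open Source Health Check prototype ran. The actual service was deployed on Microsoft Azure and used the GitHub API; this simplified version instead inspects a local clone for the three reproducibility components mentioned above. The function and file-name lists are illustrative assumptions, not the team’s actual implementation.

```python
import os

# Hypothetical checklist: the files flagged at CW14 as critical
# for reproducibility (documentation, license, citation file).
CHECKS = {
    "documentation": ("README", "README.md", "README.rst", "README.txt"),
    "license": ("LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"),
    "citation": ("CITATION", "CITATION.cff", "CITATION.md"),
}

def health_check(repo_path):
    """Map each check name to True/False for a local repository clone."""
    entries = {name.lower() for name in os.listdir(repo_path)}
    return {
        check: any(candidate.lower() in entries for candidate in candidates)
        for check, candidates in CHECKS.items()
    }
```

A service version would perform the same lookups against a repository’s file listing fetched over the GitHub API and render the pass/fail results as a report page.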
Recomputation.org aims to make computational experiments easily reproducible decades into the future.
Beyond CW14, Microsoft Research is delighted to be supporting teams that are working on new approaches to scientific reproducibility through our Microsoft Azure for Research program.
While we still have not achieved truly reproducible research, CW14 proved that the community is dedicated to improving the situation, and cloud computing has an increasingly important role to play in enabling reproducible research.
—Kenji Takeda, Solutions Architect and Technical Manager, Microsoft Research Connections
The warm, sunny days of late August in Saint Petersburg, Russia’s “northern capital,” were made even brighter by the 2012 Microsoft Research Russian Summer School. An annual Microsoft Research event, the Russian Summer School is intended for doctoral and master’s students, as well as young scientists. This year, the program focused on concurrency and parallelism in software, and featured lectures from eight of the world’s foremost experts in this field. The school was co-chaired by Judith Bishop, the director of computer science at Microsoft Research, and Bertrand Meyer, professor of software engineering at ETH Zurich and St. Petersburg National Research University of Information Technologies, Mechanics, and Optics (ITMO).
2012 Microsoft Research Russian Summer School participants
This year’s Russian Summer School follows the highly successful schools of previous years: Computer Vision School 2011, MIDAS 2010, and HPC 2009. It represents another of the many collaborative efforts between Microsoft Research Connections and the world’s top research professionals and institutions. The school gave the participating students a unique opportunity to learn from leading scientists in the field of concurrency and parallelism: lectures covered the fundamentals of the field and explored the latest research topics. The school also provided a great venue for networking, enabling the students to establish connections with one another and with the lecturers. Students had Sunday free to explore the beautiful city of Saint Petersburg—referred to as the “Venice of the North” because of its picturesque canals—and to carry on individual work.

Competition for admission to the school was particularly intense. Registrations on the school website exceeded 600, and the overall acceptance rate was less than 10 percent. Most of the applicants were exceptionally strong, which made the selection process extremely difficult. The 60 admitted students came from 27 cities in Russia, Ukraine, Belarus, and Kazakhstan, and represented 47 academic institutions and companies. We are happy to report continuing growth in the number of female students; women comprised more than 20 percent of this year’s class.
Students were excited in their praise of the school’s program, which they found professionally stimulating and personally rewarding. They, and we, are looking forward to the 2013 Russian Summer School in Moscow!
—Fabrizio Gagliardi, Director, Microsoft Research Connections EMEA (Europe, Middle East, and Africa)
I have just returned from the ninth annual Microsoft eScience Workshop, held in conjunction with the 2012 IEEE International Conference on eScience, at the Hyatt Regency Chicago. As in previous years, the Microsoft workshop focused on exploring where we are now and what future progress we can anticipate in extending science through computing. True to the conference theme, eScience in Action, computer science and scientific discovery merged into a lively discussion of results.
The keynotes supported the theme. Drew Purves of Microsoft Research Cambridge shared computer-based environmental models, including geographical visualizations of continent-wide temperature variations, both measured and modeled. David Heckerman of Microsoft Research described trends in computational biology, providing examples from genomics to vaccines. Antony Williams, the 2012 Jim Gray eScience Award winner, used his work on ChemSpider to show us how scientists can stand on the shoulders of others through easy web access to scientific knowledge. ChemSpider, an Internet-based chemical database, provides access to data on the profusion of new chemical compounds being identified and explored by the growing community of chemistry researchers.
The workshop breakout sessions covered a breadth of topics, ranging from the contributions that citizen scientists can offer to the knowledge that new generations of data scientists will need. Perspectives were diverse, and I came away impressed by the maturity of the community and the richness of the discussion.
As I look back over the two days of the workshop, I remember being taught as a child—by my grandmother, who possessed timeless wisdom—that I must always assess truth for myself, and not simply trust what the media present so beautifully. In many ways, this lesson, drummed into me when knowledge was mainly passed on in unsearchable print, was the underlying theme of this eScience Workshop. Web designers certainly know how to package information and make it beautiful, but to discover the truth the seeker must look more deeply. Drew Purves’s presentation showed results, but the challenge he posed was the “defensibility” of models: how can we know that they are predicting accurately? David Heckerman shared how pharmacists of the future will check prescriptions against an individual’s genome to help identify which prescription will be most effective—yet another way of discovering what’s true. Antony Williams opened our eyes to the challenge of determining the accuracy of chemical data already on the Internet.
Looked at in one way, every presentation was about truth, whether a citizen scientist’s contribution to her community was accurate, whether the scientific results in a publication could be replicated, or whether we can trace the code and data that together generated a result. You can view the keynotes and session presentations and see for yourself if what I am saying is true.
—Harold Javid, Director, Microsoft Research Connections