One of the high points of the annual eScience Workshop is the presentation of the Jim Gray Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. I'd like to say a bit more about Jim and why we've named an award after him, but first, here's the identity of this year's winner: Philip Bourne. Yes, the Bourne identity.
Now that you've stopped laughing, let me say just a few words about Phil and why he's this year's honoree. Phil's contributions to open access in bioinformatics and computational biology are legion, and are exactly the sort of groundbreaking accomplishments in data-intensive science that we celebrate with the Jim Gray Award. In particular, Phil's role as the founding editor-in-chief of the open-access journal PLoS Computational Biology has significantly advanced open access in mathematical and computational biology. Phil is also co-Director of the Protein Data Bank (PDB), whose vast store of data has become a key resource for the biology and genomics research communities; most major journals and funding agencies now require scientists to submit relevant protein structure data to PDB. Phil also co-founded SciVee.tv, a website that lets scientists upload videos, lectures, and presentations covering a variety of disciplines. He is committed to the free dissemination of scientific knowledge through new open-access models that link textual publications to data in order to preserve the scientific record. It is this work on education, open access, and open science that so perfectly aligns with Jim Gray's vision.
Like me, Phil is a transplant here in America. He's originally from Australia, where he trained as a chemist. He's now a professor in the Department of Pharmacology at the Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California, San Diego, so I have to admire his choice of American climates. He came to UCSD circuitously, leaving his native land for post-doc work at Sheffield University in the UK, and then arriving at Columbia University, where he became director of the Cancer Center Computing Facility.
Now, a few words about the award itself. It was established in 2008 as a tribute to Jim Gray, a Technical Fellow for Microsoft Research who disappeared at sea in 2007. Jim was intrigued by the explosive growth of data in modern science. He viewed the accumulation, organization, and utilization of this data deluge as the next step in the evolution of scientific exploration, and was utterly dedicated to the idea that data-intensive computing would help solve some of society's greatest challenges. So, in honor of the memory of Jim Gray, we celebrate the achievements of Phil Bourne and the other dedicated researchers who are striving daily to make Jim's predictions a reality.
--Tony Hey, corporate vice president of the External Research Division of Microsoft Research
Microsoft Research's 7th annual eScience Workshop is in full swing this week in lovely Berkeley, California. This event has brought together over 200 scientists from diverse fields (and diverse geographies), all united around their interest in using data-intensive science to advance their research. The theme of this year's workshop is "Scaling the Science," which is all about understanding processes at the molecular level and then scaling them up to larger systems, such as the human body or worldwide evaporation patterns.
New technologies in the physical and biological sciences play a huge role in this scaling effort, and this year's eScience workshop showcases several. In particular, we are excited to be highlighting the Microsoft Biology Foundation (MBF) and an environmental research collaboration between Microsoft Research and UC Berkeley that leverages MODISAzure.
MBF is a prime example of the power of using enormous datasets to advance research. It provides researchers with advanced tools to detect connections among a vast store of bioinformatics functions, such as finding a correlation between a particular human gene sequence and the likelihood of developing a certain disease. Researchers at Johnson & Johnson are already using MBF to build and mine advanced biological and chemical databases, helping them to make discoveries more rapidly. By taking advantage of MBF's store of pre-existing functions, the Johnson & Johnson scientists don't have to reinvent the wheel as they search for meaningful linkages among bioinformatics data. This is a huge timesaver, and a potential lifesaver.
MBF is part of the Microsoft Biology Initiative, and is available under an open source license. It is freely downloadable at http://research.microsoft.com/bio/.
Another technology for data-intensive research leverages MODISAzure. The new technology takes images from MODIS, a NASA satellite instrument that images patches of the Earth, and then runs them through an image processing pipeline on the Microsoft Windows Azure cloud-computing platform. Records from ground-based sensors are layered in, and then the mammoth dataset is combined via biophysical modeling. This research allows scientists from diverse disciplines to share data and algorithms, which enables them to better visualize and understand how ecosystems behave as climate change occurs. By so doing, it takes earth science a giant step toward having systems that are present everywhere and running all the time. Using this research, scientists will be able to mine a vast array of data to better understand such environmental issues as the impact of specific sources of CO2 emissions on climate change in a given ecosystem. The project was created by Dennis Baldocchi, biometeorologist at U.C. Berkeley; Youngryel Ryu, biometeorologist at Harvard University; and Catharine van Ingen, Microsoft eScience researcher.
--Tony Hey, corporate vice president of the External Research Division of Microsoft Research
Here in External Research, we collaborate with dozens of computer and research scientists around the world. Through these collaborations, we regularly ask the question, "How can software make you a better scientist?"
This seemingly simple question has evoked many exciting answers - and indeed drives the Research Accelerators we choose to incubate for the community.
Scientists operate on a number of platforms, use a number of languages, and have to deal with many different types of data and data formats.
"Open Science" and the benefits of collaboration are changing scientific culture - the amount of data and knowledge being shared between scientists and across scientific disciplines is growing dramatically. In this increasingly collaborative climate, interoperability and standards are becoming progressively more important.
Software for scientists must meet these evolving needs.
At the same time, Microsoft has been going through important changes. Many people don't realize that Microsoft now works with over 150 standards organizations and 350 working groups. Microsoft contributes to over 100 open source projects - including Linux, Samba, PHP, and IronPython - in the interest of improved interoperability. This is not the same Microsoft people remember from ten years ago.
(If these facts about Microsoft make you curious, you may be interested in my presentation "Ten Things You Don't Know About Microsoft," available at http://docs.com/@derickc.)
In our team in Microsoft Research (and many across Microsoft), we embrace the principles of "Open Software," so we can meet the needs of open science researchers. Open software has clear APIs, interoperability targets, and documented file and protocol formats, and has been developed collaboratively with its users.
(Incidentally, Microsoft server products and tools also have a public, highly formalized set of Common Engineering Criteria to improve integration, manageability, security, and reliability.)
In several cases, open software can also be open source software. Open source software is helpful when your community is prepared to contribute and help drive development, and when the software is not too broad or complex for new developers to come up to speed.
Not all user communities include developers, and building open source software can add overhead to a project. Developers aren't standing by waiting to contribute to every new open source project that arrives. In fact, it's quite difficult to establish a community and find interested developers for an open source project. It requires additional investment and governance - plus people with great community leadership and management skills. There are no guarantees these investments will result in contributions. Accordingly, the decision to make a project open source should not be taken lightly.
In our team - External Research - community collaboration is why we exist. We bring academic and Microsoft research teams together to create communities and technologies that advance and accelerate science. As a result, developing open source software is often the right choice for our business and our academic collaborators. Most of our releases are open source.
As part of Microsoft, a company that earns revenue from software and intellectual property (IP) licensing, our team has additional challenges to overcome when we ship open source software.
These challenges are not unlike the challenges academics face when pursuing open science or running their own open source projects. Universities manage patent portfolios, incubate competitive startup businesses, and collaborate with commercial entities interested in protecting their IP.
So, how does one protect commercial IP while also investing in open source efforts? A good way is to pursue open source efforts through a third-party, nonprofit organization, such as the Outercurve Foundation.
The Outercurve Foundation
The Outercurve Foundation was created as a forum in which open source communities and the software development community can come together with the shared goal of increasing participation in open source community projects. It's structured like a museum, with thematic galleries, each filled with projects that relate to the gallery theme.
I'm really excited to announce that we are opening a new gallery for Research Accelerators with the Outercurve Foundation - to help our team grow our open source software efforts. The anchor project in our gallery is our Scientific Workflow Workbench (code-named Project Trident). By assigning this project to the Outercurve Foundation, we hope to grow academic participation in its ongoing development.
I've been really pleased with the reception academics have given to the Outercurve Foundation. So far, two researchers are interested in collaborating on Trident: Professor Beth Plale of Indiana University and Susan Cuddy, a researcher with Australia's CSIRO Land and Water research organization. Professor Plale, a researcher in Indiana University's Linked Environments for Atmospheric Discovery II (LEAD II) research project, used Project Trident to provide workflows to support the National Science Foundation (NSF) funded Vortex2 project that gathers data on tornadoes.
Incidentally, if you develop open source software, you'll want to take a closer look at another Outercurve Foundation project sponsored by Microsoft called CoApp. CoApp will provide open source developers with tools for package management on Windows - enabling the entire ecosystem of open source software projects to be more easily installed and managed on Windows.
Our engineering team is always working on something new, or the next version of something cool and useful for scientists. We have a number of exciting projects in the works that you'll hear more about on this blog and at our events. Many of these projects will reside with the Outercurve Foundation as well, to help facilitate great academic/Microsoft collaborations.
Stay tuned for more!
Derick Campbell, director of engineering, Microsoft Research
P.S. How can software make you a better scientist? Let me know your thoughts at derickc at microsoft.com.