It seems like only yesterday that the eScience team at Microsoft Research came up with the idea of recognizing outstanding contributions to the field of data-intensive computing with an award named in memory of Jim Gray. Jim was a man of vision. The breadth and clarity of the agenda he set forth has provided a roadmap that extends beyond traditional data-intensive research to the maturing field of eScience.
Last night, October 9, our annual Jim Gray Award banquet brought the 2012 Microsoft eScience Workshop to a close. As I stood on stage, presenting the Jim Gray eScience Award to Antony John Williams, I remembered Jim and thought to myself, “Jim would be pleased with this choice.”
Antony is leading the charge to show how experience, knowledge, insight, and crowd-sourced contributions can build a platform to facilitate a semantic web for chemistry. ChemSpider provides the means by which that can be realized now. Jim valued doers, and, with his pioneering spirit and energy, Antony is exactly that: a doer.
Jim Gray himself was the ultimate doer, a man with far-ranging interests (from astronomy to zoology, literally A to Z), but none was dearer to him than the idea of using computers to make scientists more productive. Jim had the clarity to see the revolutionary impact of what’s come to be known as Big Data: how data-intensive science had ushered in a new era, which he called the Fourth Paradigm. At the time of his loss at sea (while sailing, another of his myriad interests), Jim was working with the science community to build a worldwide digital library to integrate all scientific literature and its underlying data in one easily accessible collection.
Which is why the selection of Antony is so very apt. Antony’s work on ChemSpider aligns precisely with Jim’s vision of a global digital library of science. Jim would also have appreciated the diversity of Antony’s many endeavors. Currently vice president of strategic development and head of Chemoinformatics for the Royal Society of Chemistry, Antony has pursued a career built on rich experience in experimental techniques, implementation of new nuclear magnetic resonance (NMR) technologies, research and development, and teaching, as well as analytical laboratory management.
His selection as the 2012 winner of the Jim Gray eScience Award acknowledges Antony’s leadership in making chemistry publicly available through collective action. ChemSpider provides fast text and structure search access to data and links on more than 28 million chemicals, and this marvelous resource is freely available to the scientific community and the general public. Like those of the previous five winners of the Jim Gray award, Antony’s contributions to eScience have led to the advancement of science through the use of computing. As I said, I am sure that Jim would be pleased with this year’s choice.
—Tony Hey, Vice President, Microsoft Research Connections
This week, the annual Microsoft eScience Workshop is being held in Chicago (the “Windy City”), providing an unparalleled opportunity for domain scientists, researchers, and technologists to discuss the benefits and difficulties of incorporating more computing and information technology into the scientific process. Over the years, the eScience workshop has provided a forum where scientists could voice their data and technology challenges and get input from those who’ve confronted similar issues. Front and center this year are topics related to Big Data—be it the management of the rising data flood, the analysis of the data tsunami, or even the visualization of the data explosion. In addition, this year's workshop explores questions about how to train and develop data scientists, and how citizen scientists can play a role in gaining insights from the vast amounts of information.
Many of these topics are examined in the book The Fourth Paradigm: Data-Intensive Scientific Discovery, which is an excellent resource for these discussions. And, as evidenced in that book, the Big Data “opportunity” has actually been building for some time, but now it has reached the tipping point in terms of awareness across more science domains. The commoditization of devices, sensors, storage, and connectivity, paired with technologies like cloud computing, has made the idea of capturing and maintaining all data in those science domains a plausible reality. As a result, scientists are thinking about what can be done, rather than lamenting what could be done if only they had the research infrastructure.
In preparing for this year’s event, I looked back at the very first Microsoft eScience Workshop, held in 2004. I revisited Jim Gray’s keynote and put together this six-slide composite of the main challenges Jim identified back then. As you’ll notice, while some progress has been made, many of those challenges are still being addressed.
For instance, global federation has remained a key issue for distributed and disparate databases. Do you move all the data to one location? Or do you ensure that the data owners continue to curate the data and safeguard the quality of the datasets? The approach taken by SkyQuery has really advanced federation, by demonstrating how multiple datasets can be queried seamlessly and by implementing novel approaches, such as spatial join queries. If you want more details, check out the paper, SkyQuery: A WebService Approach to Federate Databases.
Six-slide composite of the main challenges that Jim Gray identified at the first Microsoft eScience Workshop in 2004
To truly tackle these data challenges, scientific datasets need the following attributes: discoverability, accessibility, and consumability. If a dataset doesn't have all three, it might as well be kept in a file cabinet. Much work has been done lately on discoverability: for example, the emergence of different “data.gov” domain science catalogs, and even commercial ones like the Windows Azure Marketplace. The “Open Data for Open Science” session at this year’s eScience Workshop explores how to address some of these challenges from the science side and looks at how simple, Internet-based protocols, such as OData (the Open Data Protocol), can help ensure that the end-user scientist can use the data.
The Monday evening event at the Adler Planetarium showcases how scientific data and information can be communicated to the public, through amazing 3-D tours powered by Microsoft Research WorldWide Telescope (WWT) and brought to life in the planetarium’s Grainger Sky Theater. Microsoft researcher Jonathan Fay, architect of WWT, has been working with the Adler to ensure that tours originally developed to be shown in a planetarium can be taken home and experienced later. An example of the great work from the Adler is the Welcome to the Universe show and the WWT tour narrated by astronomer Mark SubbaRao. You can play the tour in your browser. You can find more tours powered by WorldWide Telescope at the Layerscape website.
Whether you're attending the Microsoft eScience Workshop or just wishing you could, I encourage you to dive into these Big Data challenges.
—Dan Fay, Director, Earth, Energy, and Environment; Microsoft Research Connections
I have just returned from the ninth annual Microsoft eScience Workshop, held in conjunction with the 2012 IEEE International Conference on eScience, at the Hyatt Regency Chicago. As in previous years, the Microsoft workshop focused on exploring where we are now and what future progress we can anticipate in extending science through computing. True to the conference theme, eScience in Action, computer science and scientific discovery merged into a lively discussion of results.
The keynotes supported the theme. Drew Purves of Microsoft Research Cambridge shared computer-based environmental models, and we saw geographical visualizations of continent-wide temperature variations, both measured and modeled. David Heckerman of Microsoft Research described the trend in computational biology, providing examples from genomics to vaccines. Antony Williams, the 2012 Jim Gray eScience Award winner, used his work on ChemSpider to show us how scientists can stand on the shoulders of others when scientific knowledge is easy to access on the web. ChemSpider, an Internet-based chemical database, provides access to data on the profusion of new chemical compounds that are being identified and explored in the growing community of chemistry researchers.
The workshop breakout sessions covered a breadth of topics, ranging from the contributions that citizen scientists can offer to the knowledge that new generations of data scientists will need. Perspectives were diverse, and I came away impressed by the maturity of the community and the richness of the discussion.
As I look back over the two days of the workshop, I remember being taught as a child—by my grandmother, who possessed timeless wisdom—that I must always assess truth for myself, and not necessarily trust what the media present in such beauty. In many ways, this lesson, drummed into me when knowledge was mainly passed on in unsearchable print, was the underlying theme of this eScience Workshop. Web designers certainly know how to package information and make it beautiful, but to discover the truth the seeker must look more deeply. Drew Purves’s presentation showed results, but the challenge he posed was the “defensibility” of models: how can we know that they are predicting accurately? David Heckerman shared how pharmacists of the future will check prescriptions against an individual’s genome to help identify which prescription will be most effective—yet another discovery of what’s true. Antony Williams opened our eyes to the challenge of determining the accuracy of chemical data already on the Internet.
Looked at in one way, every presentation was about truth: whether a citizen scientist’s contribution to her community was accurate, whether the scientific results in a publication could be replicated, or whether we can trace the code and data that together generated a result. You can view the keynotes and session presentations and see for yourself if what I am saying is true.
—Harold Javid, Director, Microsoft Research Connections