It seems like only yesterday that the eScience team at Microsoft Research came up with the idea of recognizing outstanding contributions to the field of data-intensive computing with an award named in memory of Jim Gray. Jim was a man of vision. The breadth and clarity of the agenda he set forth have provided a roadmap that extends beyond traditional data-intensive research to the maturing field of eScience.
Last night, October 9, our annual Jim Gray Award banquet brought the 2012 Microsoft eScience Workshop to a close. As I stood on stage, presenting the Jim Gray eScience Award to Antony John Williams, I remembered Jim and thought to myself, “Jim would be pleased with this choice.”
Antony is leading the charge to show how experience, knowledge, insight, and crowd-sourced contributions can build a platform to facilitate a semantic web for chemistry. ChemSpider provides the means by which that can be realized now. Jim valued doers, and, with his pioneering spirit and energy, Antony is exactly that: a doer.
Jim Gray himself was the ultimate doer, a man with far-ranging interests (from astronomy to zoology, literally A to Z), but none was dearer to him than the idea of using computers to make scientists more productive. Jim had the clarity to see the revolutionary impact of what’s come to be known as Big Data: how data-intensive science had ushered in a new era, which he called the Fourth Paradigm. At the time of his loss at sea (while sailing, another of his myriad interests), Jim was working with the science community to build a worldwide digital library to integrate all scientific literature and its underlying data in one easily accessible collection.
Which is why the selection of Antony is so very apt. Antony’s work on ChemSpider aligns precisely with Jim’s vision of a global digital library of science. Jim would also have appreciated the diversity of Antony’s many endeavors. Currently vice president of strategic development and head of Chemoinformatics for the Royal Society of Chemistry, Antony has pursued a career built on rich experience in experimental techniques, implementation of new nuclear magnetic resonance (NMR) technologies, research and development, and teaching, as well as analytical laboratory management.
His selection as the 2012 winner of the Jim Gray eScience Award acknowledges Antony’s leadership in making chemistry publicly available through collective action. ChemSpider provides fast text and structure search access to data and links on more than 28 million chemicals, and this marvelous resource is freely available to the scientific community and the general public. Like those of the previous five winners of the Jim Gray award, Antony’s contributions to eScience have led to the advancement of science through the use of computing. As I said, I am sure that Jim would be pleased with this year’s choice.
—Tony Hey, Vice President, Microsoft Research Connections
We’re happy to announce that the beta release of the new Try F# has arrived! We’re proud of this new release, and with good reason: Try F# makes programming in F# 3.0 easy to learn, simple to use, and straightforward to share—all through the browser.
F# 3.0 is ideal for analytical, data-rich, and parallel component development, harnessing the power of functional programming.
If you are a researcher who’s been longing to learn the basics or learn about the incredible new type providers in F# 3.0 that deliver information to your fingertips, then Try F# is for you. Likewise, if you’re a teacher who wants to introduce students to the power of this elegant and pragmatic language, then the Try F# browser-based platform with easy-to-use tutorials is also for you.
What’s more, you’ll be treated to a new “learn” experience, complete with sample materials to get you started and a way to give us feedback so that you can tell us what you think about the look and feel and ease of navigation. The Try F# beta even includes new “create and share” experiences that help you write simple code to solve complex problems and then easily share snippets or sample packs with others. And remember, Try F# is in a browser-based environment, so it’s accessible from any operating system.
We would love for you to be part of the Try F# beta, which provides the tutorials, resources, and tools to begin working with F# right away. By participating, you’ll experience F# 3.0’s unique, information-rich programming features for Big Data analytics, and you’ll get the power to solve complex problems more efficiently.
F# communities make it easy to get involved:
Here’s that beta site again. Now get out there, try Try F#, and give us your feedback so we can keep improving it.
—Evelyne Viegas, Director of Semantic Computing, Microsoft Research Connections, and Kenji Takeda, Solutions Architect and Technology Manager, Microsoft Research Connections
This week, the annual Microsoft eScience Workshop is being held in Chicago (the “Windy City”), providing an unparalleled opportunity for domain scientists, researchers, and technologists to discuss the benefits and difficulties of incorporating more computing and information technology into the scientific process. Over the years, the eScience workshop has provided a forum where scientists could voice their data and technology challenges and get input from those who’ve confronted similar issues. Front and center this year are topics related to Big Data—be it the management of the rising data flood, the analysis of the data tsunami, or even the visualization of the data explosion. In addition, this year's workshop explores questions about how to train and develop data scientists, and how citizen scientists can play a role in gaining insights from the vast amounts of information.
Many of these topics are examined in the book, The Fourth Paradigm: Data-Intensive Scientific Discovery, which is an excellent resource for these discussions. And, as evidenced in that book, the Big Data “opportunity” has actually been building for some time, but now it has reached the tipping point in terms of awareness across more science domains. The commoditization of devices, sensors, storage, and connectivity, paired with technologies like cloud computing, has made the idea of capturing and maintaining all data in those science domains a plausible reality. As a result, scientists are thinking about what can be done, rather than lamenting what could be done if only they had the research infrastructure.
In preparing for this year’s event, I looked back at the very first Microsoft eScience Workshop, held in 2004. I revisited Jim Gray’s keynote and put together this six-slide composite of the main challenges Jim identified back then. As you’ll notice, while some progress has been made, many of those challenges are still being addressed.
For instance, global federation has remained a key issue for distributed and disparate databases. Do you move all the data to one location? Or do you ensure that the data owners continue to curate the data and safeguard the quality of the datasets? The approach taken by SkyQuery has really advanced federation, by demonstrating how multiple datasets can be queried seamlessly and by implementing novel approaches, such as spatial join queries. If you want more details, check out the paper, SkyQuery: A WebService Approach to Federate Databases.
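For readers unfamiliar with the term, a spatial join pairs up records from two datasets by sky position rather than by a shared key. The sketch below is only an illustration of that idea, not SkyQuery's actual implementation: the function names, the tiny in-memory catalogs, and the 2-arcsecond match radius are all assumptions made for the example, and a real federated system would push this work to the databases and use a spatial index instead of comparing every pair.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation, in degrees, between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def spatial_join(catalog_a, catalog_b, radius_deg):
    """Return (id_a, id_b) for every pair of objects closer than radius_deg.

    Brute force over all pairs: fine for a demonstration, too slow for real catalogs.
    """
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        for id_b, ra_b, dec_b in catalog_b:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((id_a, id_b))
    return matches

# Hypothetical catalogs: (object id, right ascension, declination), in degrees.
optical = [("opt-1", 10.0000, 20.0000), ("opt-2", 150.0, -5.0)]
infrared = [("ir-1", 10.0002, 20.0001), ("ir-2", 200.0, 45.0)]

# Assumed match radius of 2 arcseconds, converted to degrees.
matches = spatial_join(optical, infrared, radius_deg=2.0 / 3600)
print(matches)
```

Here only the first optical source has an infrared counterpart within the radius, so the join returns a single pair. The federation question raised above is where this comparison runs: each archive keeps and curates its own catalog, and only the candidate matches travel between them.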
Six-slide composite of the main challenges that Jim Gray identified at the first Microsoft eScience Workshop in 2004
To truly tackle these data challenges, scientific datasets need the following attributes: discoverability, accessibility, and consumability. If a dataset doesn't have all three, it might as well be kept in a file cabinet. There has been much work done lately on discoverability: for example, the emergence of different “data.gov” domain science catalogs, and even commercial ones like the Windows Azure Marketplace. The “Open Data for Open Science” session at this year’s eScience Workshop explores how to address some of these challenges from the science side and looks at how simple, Internet-based protocols, such as OData (the Open Data Protocol), can help ensure that the end-user scientist can use the data.
The Monday evening event at the Adler Planetarium showcases how scientific data and information can be communicated to the public, through amazing 3-D tours powered by Microsoft Research WorldWide Telescope (WWT) and brought to life in the planetarium’s Grainger Sky Theater. Microsoft researcher Jonathan Fay, architect of WWT, has been working with the Adler to ensure that tours that were originally developed to be shown in a planetarium can be taken home and experienced later. An example of the great work from the Adler is the Welcome to the Universe show and the WWT tour narrated by astronomer Mark SubbaRao. You can play the tour in your browser. You can find more tours powered by WorldWide Telescope at the Layerscape website.
Whether you're attending the Microsoft eScience Workshop or just wishing you could, I encourage you to dive into these Big Data challenges.
—Dan Fay, Director, Earth, Energy, and Environment; Microsoft Research Connections