This week, the annual Microsoft eScience Workshop is being held in Chicago (the “Windy City”), providing an unparalleled opportunity for domain scientists, researchers, and technologists to discuss the benefits and difficulties of incorporating more computing and information technology into the scientific process. Over the years, the eScience workshop has provided a forum where scientists could voice their data and technology challenges and get input from those who’ve confronted similar issues. Front and center this year are topics related to Big Data—be it the management of the rising data flood, the analysis of the data tsunami, or even the visualization of the data explosion. In addition, this year's workshop explores questions about how to train and develop data scientists, and how citizen scientists can play a role in gaining insights from the vast amounts of information.
Many of these topics are examined in the book The Fourth Paradigm: Data-Intensive Scientific Discovery, which is an excellent resource for these discussions. As that book shows, the Big Data “opportunity” has been building for some time, but it has now reached a tipping point in awareness across more science domains. The commoditization of devices, sensors, storage, and connectivity, paired with technologies like cloud computing, has made capturing and maintaining all the data in those science domains a plausible reality. As a result, scientists are thinking about what can be done, rather than lamenting what could be done if only they had the research infrastructure.

In preparing for this year’s event, I looked back at the very first Microsoft eScience Workshop, held in 2004. I revisited Jim Gray’s keynote and put together this six-slide composite of the main challenges Jim identified back then. As you’ll notice, while some progress has been made, many of those challenges are still being addressed. For instance, global federation has remained a key issue for distributed and disparate databases. Do you move all the data to one location? Or do you leave the data with its owners, who continue to curate it and safeguard the quality of the datasets? The approach taken by SkyQuery has really advanced federation by demonstrating how multiple datasets can be queried seamlessly and by implementing novel approaches, such as spatial join queries. If you want more details, check out the paper SkyQuery: A WebService Approach to Federate Databases.
Six-slide composite of the main challenges that Jim Gray identified at the first Microsoft eScience Workshop in 2004
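For readers unfamiliar with the term, a spatial join in astronomy usually means a positional cross-match: pairing objects from two catalogs whose sky positions fall within a small angular radius of each other. The sketch below is a minimal, brute-force, in-memory illustration of that idea, assuming each catalog is a list of hypothetical (id, RA, Dec) tuples in degrees. It is not SkyQuery's implementation and does not attempt to model how SkyQuery distributes the work across federated archives.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two (RA, Dec) positions in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))

def cross_match(catalog_a, catalog_b, radius_arcsec):
    """Naive O(n*m) spatial join: pair each object in A with every object in B within radius_arcsec.

    Catalog entries are hypothetical (id, ra_deg, dec_deg) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        for id_b, ra_b, dec_b in catalog_b:
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                matches.append((id_a, id_b))
    return matches

if __name__ == "__main__":
    survey_a = [("a1", 10.0, 20.0), ("a2", 150.0, -30.0), ("a3", 359.9999, 0.0)]
    survey_b = [("b1", 10.0, 20.0003), ("b2", 200.0, 0.0), ("b3", 0.0001, 0.0)]
    print(cross_match(survey_a, survey_b, radius_arcsec=2.0))
```

The haversine form is used because it stays numerically stable for the tiny separations typical of cross-matching, and it handles the wrap-around at RA = 0/360 degrees (objects a3 and b3 above) without special cases. A production system would replace the nested loops with a spatial index so that each object is compared only against nearby candidates.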
To truly tackle these data challenges, scientific datasets need three attributes: discoverability, accessibility, and consumability. If a dataset doesn't have all three, it might as well be kept in a file cabinet. Much recent work has focused on discoverability: for example, the emergence of various “data.gov” domain science catalogs, and even commercial ones like the Windows Azure Marketplace. The “Open Data for Open Science” session at this year’s eScience Workshop explores how to address some of these challenges from the science side, and looks at how simple, Internet-based protocols, such as OData (the Open Data Protocol), can help ensure that the end-user scientist can actually use the data.

The Monday evening event at the Adler Planetarium showcases how scientific data and information can be communicated to the public through amazing 3-D tours powered by Microsoft Research WorldWide Telescope (WWT) and brought to life in the planetarium’s Grainger Sky Theater. Microsoft researcher Jonathan Fay, architect of WWT, has been working with the Adler to ensure that tours originally developed for the planetarium can be taken home and experienced later. Examples of the Adler’s great work include the Welcome to the Universe show and the WWT tour narrated by astronomer Mark SubbaRao, which you can play in your browser. You can find more tours powered by WorldWide Telescope at the Layerscape website.

Whether you're attending the Microsoft eScience Workshop or just wishing you could, I encourage you to dive into these Big Data challenges.
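To give a feel for why OData counts as a simple, Internet-based protocol, here is a minimal sketch of how a client might build an OData query URL using the standard system query options $filter, $select, and $top. The service root, the entity set name Observations, and the properties Site and Temperature are hypothetical placeholders, not a real data service.

```python
from urllib.parse import quote, urlencode

def build_odata_query(service_root, entity_set, filter_expr=None, select=None, top=None):
    """Build an OData query URL from standard system query options.

    service_root and entity_set are hypothetical; any OData service exposes
    its data as entity sets under a service root URL.
    """
    options = {}
    if filter_expr is not None:
        options["$filter"] = filter_expr      # e.g. "Temperature gt 20"
    if select:
        options["$select"] = ",".join(select)  # return only these properties
    if top is not None:
        options["$top"] = top                 # limit the number of results
    url = service_root.rstrip("/") + "/" + entity_set
    if options:
        # Encode spaces as %20 rather than '+', and leave '$', ',' and "'" readable.
        url += "?" + urlencode(options, safe="$,'", quote_via=quote)
    return url

if __name__ == "__main__":
    print(build_odata_query(
        "https://example.org/odata/",        # hypothetical service root
        "Observations",                      # hypothetical entity set
        filter_expr="Temperature gt 20",
        select=["Site", "Temperature"],
        top=10,
    ))
```

Because the whole query is an ordinary HTTP GET URL, a scientist can try it in a browser, paste it into a spreadsheet tool, or call it from a script, without installing a database client.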
—Dan Fay, Director, Earth, Energy, and Environment; Microsoft Research Connections
The long tail: sure, it’s a well-known concept in business and marketing, but there’s a very important “hidden” long tail in the sciences, too. So, what is this hidden long tail of science? It consists of the millions of datasets that are not stored in a databank and therefore are not available for use by other scientists. Every day, researchers throughout the world are observing, calculating, and compiling data, recording it all on their local machines within their labs—often not even as a shared resource to their institutions. Regrettably, much of this data never gets deposited in larger web-accessible data repositories where it could be reused by other investigators around the globe.
As a researcher who works with colleagues around the globe, I am acutely aware of scientific data pain points; after all, those of us in the research community understand better than anyone that data preservation, curation, and sharing are critical for the advancement of scientific discovery. We want to share our data beyond our immediate groups, but we often find ourselves hindered by a lack of tools and services designed to promote data curation and sharing.
Enter DataUp, an open-source tool that helps us document, manage, and archive our tabular data. The DataUp project was born out of this need for seamless integration of data management into researchers’ current workflows. The University of California Curation Center (UC3) at the California Digital Library (CDL), with sponsorship from Microsoft Research and the Gordon and Betty Moore Foundation (GBMF), focused on creating a tool that could be used by researchers in the environmental sciences. They recognized that this field epitomizes the problems of data management and curation: in particular, data is often stored locally without the descriptive information (metadata), such as where, when, and by whom it was collected, that would make it more usable by others.
By surveying attendees at ecological and environmental science events, CDL found that most of these scientists use spreadsheets to collect and organize their data. Rather than make them learn a new program, UC3 recognized the need for a tool that works with a program most scientists already know: Microsoft Excel.
Further surveys showed that about half of the scientists preferred a tool installed on their laptops, while the other half wanted a web-based tool they could use on any device. Well, we sponsors and the UC3 team were not about to let this divided preference thwart the creation of a much-needed tool, so, together, we decided that there needed to be two versions of the tool: an open-source add-in (extension) for Microsoft Excel, and an open-source web application.
To achieve the project goals of facilitating data management, sharing, and archiving, both the add-in and the web application accomplish four main tasks:
The California Digital Library established the initial repository, ONEShare. Researchers will be able to find tools from the DataUp project as part of the Investigator Toolkit for DataONE.
I want to thank Carly Strasser, Trisha Cruse, John Kunze, and Stephen Abrams from UC3 for their passion and commitment to bring DataUp to life. I also want to thank Chris Mentzel from GBMF for co-funding the project with Microsoft Research Connections.
Now, get out there and DataUp!
—Kristin Tolle, Director, Microsoft Research Connections
Are you a student looking to win a little extra spending money? Or maybe just get some props for your coding chops? If so, you’ll want to enter your Windows Phone or Windows 8 app in the Project Hawaii Mobile Code Jam Challenge. But you’d better act quickly—you’ll need to register your project by October 30.
The Code Jam is being featured as an integral part of the upcoming IEEE Consumer Communications & Networking Conference (CCNC 2013), where three winners will be selected. The first-place winner will receive US$1,500; the second-place winner, $1,000; and the third-place winner, $700. Not bad, especially since you’ll get recognized in front of your peers at CCNC. And you can win some money to blow in Vegas.
Your project must be an app that runs on Windows Phone (version 7.5) or Windows 8, and it must use one or more of the Project Hawaii services. Oh, and it has to be available for use, free of charge, in academic and research settings. Visit the Mobile Code Jam site for full contest details.
So, you ask, what are the Project Hawaii services? Well, with Project Hawaii, you can develop cloud-enhanced Windows Phone apps that access a set of cloud services, including Social Mobile Sharing Service (SMASH), Path Prediction, Key Value, Translator, Optical Character Recognition, Speech to Text, Relay, and Rendezvous.
While prizes and recognition are certainly nice, the main goal of the contest is to encourage researchers and, especially, students to advance the field of mobile apps and services. You can dream up any scenario you want: maybe an app that solves a societal problem, or one that uses mobile technology to help the elderly or infirm. Or maybe something to beat the odds at pai gow. You’re bound only by your creativity and imagination.
As noted above, you’ll need to register your project by October 30. The other key date is December 14, which is the deadline for submitting your overview paper describing your entry. You’re encouraged to prepare as much documentation as possible, including examples of how the app might be used and screenshots or other displays showing the software in action. Entries will be peer-reviewed and finalists will be invited to demonstrate their software to a panel of judges during the conference program.
Remember, if you want to kick out the jams at IEEE CCNC, you’ll need to register your project by October 30. If the trick-or-treaters show up and you’re still pondering your entry, you’re out of luck, so get jammin’.
—Arjmand Samuel, Senior Research Program Manager, Microsoft Research Connections