This week, the annual Microsoft eScience Workshop is being held in Chicago (the “Windy City”), providing an unparalleled opportunity for domain scientists, researchers, and technologists to discuss the benefits and difficulties of incorporating more computing and information technology into the scientific process. Over the years, the eScience workshop has provided a forum where scientists could voice their data and technology challenges and get input from those who’ve confronted similar issues. Front and center this year are topics related to Big Data—be it the management of the rising data flood, the analysis of the data tsunami, or even the visualization of the data explosion. In addition, this year's workshop explores questions about how to train and develop data scientists, and how citizen scientists can play a role in gaining insights from the vast amounts of information.
Many of these topics are examined in the book, The Fourth Paradigm: Data-Intensive Scientific Discovery, which is an excellent resource for these discussions. And, as evidenced in that book, the Big Data “opportunity” has actually been building for some time, but now it has reached the tipping point in terms of awareness across more science domains. The commoditization of devices, sensors, storage, and connectivity, paired with technologies like cloud computing, has made the idea of capturing and maintaining all data in those science domains a plausible reality. As a result, scientists are thinking about what can be done, rather than lamenting what could be done if only they had the research infrastructure.
In preparing for this year’s event, I looked back at the very first Microsoft eScience Workshop, held in 2004. I revisited Jim Gray’s keynote and put together this six-slide composite of the main challenges Jim identified back then. As you’ll notice, while some progress has been made, many of those challenges are still being addressed. For instance, global federation has remained a key issue for distributed and disparate databases. Do you move all the data to one location? Or do you ensure that the data owners continue to curate the data and safeguard the quality of the datasets? The approach taken by SkyQuery has really advanced federation by demonstrating how multiple datasets can be queried seamlessly and by implementing novel approaches, such as spatial join queries. If you want more details, check out the paper, SkyQuery: A WebService Approach to Federate Databases.
Six-slide composite of the main challenges that Jim Gray identified at the first Microsoft eScience Workshop in 2004
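To make the spatial join idea mentioned for SkyQuery a little more concrete, here is a minimal Python sketch. It is not SkyQuery's actual method or API: it is a naive nested-loop cross-match that pairs objects from two separately held catalogs when their sky positions fall within a tolerance. The catalog contents, the field names (`id`, `ra`, `dec`), and the function names are all hypothetical and chosen only for illustration.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def spatial_join(catalog_a, catalog_b, tolerance_deg):
    """Return (id_a, id_b) pairs whose positions lie within tolerance_deg of each other."""
    matches = []
    for a in catalog_a:
        for b in catalog_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= tolerance_deg:
                matches.append((a["id"], b["id"]))
    return matches


# Two hypothetical catalogs, each imagined as staying with its own data owner.
optical = [
    {"id": "opt-1", "ra": 10.0000, "dec": 20.0000},
    {"id": "opt-2", "ra": 150.0000, "dec": -5.0000},
]
radio = [
    {"id": "rad-1", "ra": 10.0002, "dec": 20.0001},
    {"id": "rad-2", "ra": 200.0000, "dec": 45.0000},
]

# Match objects that sit within 2 arcseconds of each other.
pairs = spatial_join(optical, radio, tolerance_deg=2 / 3600)
print(pairs)
```

A real federated service would push this kind of comparison down to the participating databases and use a spatial index rather than comparing every pair, but the sketch shows the basic question a spatial join answers: which rows in one dataset describe the same patch of sky as rows in another.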
To truly tackle these data challenges, scientific datasets need the following attributes: discoverability, accessibility, and consumability. If a dataset doesn't have all three, it might as well be kept in a file cabinet. There has been much work done lately on discoverability: for example, the emergence of different “data.gov” domain science catalogs, and even commercial ones like the Windows Azure Marketplace. The “Open Data for Open Science” session at this year’s eScience Workshop explores how to address some of these challenges from the science side and looks at how simple, Internet-based protocols, such as OData (the Open Data Protocol), can help ensure that the end-user scientist can use the data.
The Monday evening event at the Adler Planetarium showcases how scientific data and information can be communicated to the public, through amazing 3-D tours powered by Microsoft Research WorldWide Telescope (WWT) and brought to life in the planetarium’s Grainger Sky Theater. Microsoft researcher Jonathan Fay, architect of WWT, has been working with the Adler to ensure that tours that were originally developed to be shown in a planetarium can be taken home and experienced later. An example of the great work from the Adler is the Welcome to the Universe show and the WWT tour narrated by astronomer Mark SubbaRao. You can play the tour in your browser. You can find more tours powered by WorldWide Telescope at the Layerscape website.
Whether you're attending the Microsoft eScience Workshop or just wishing you could, I encourage you to dive into these Big Data challenges.
—Dan Fay, Director, Earth, Energy, and Environment; Microsoft Research Connections
As many of you know by now, I am super passionate about how we are going to double the number of women and ethnic minorities in computer science and informatics across the world. As part of my efforts to take on this achievable but daunting task, I have hired two outstanding women (who are pursuing their PhDs) as my interns this summer: Katie Doran and Meagan Rothschild. This month, Katie will tell you about her research and her experience working with me to increase the number of women and ethnic minorities in computing. You will hear from Meagan in December, when she is closer to completing her research. Before we hear from Katie, let me tell you a little about her.
Katie Doran is pursuing a PhD in computer science at North Carolina State University with an emphasis on educational technologies and serious gaming. She is particularly interested in exploring how emerging games technologies, such as augmented reality and ubiquitous features, can facilitate novel interactions among players and increase learning potential. Katie is heavily involved in the Broadening Participation in Computing Community and leads multiple science, technology, engineering, and mathematics (STEM) outreach programs. I had the opportunity to meet Katie during the poster session at the CRA-W Grad Cohort event that Microsoft Research sponsors. I am excited to have her working with me on evaluating ChronoZoom as an educational tool. ChronoZoom is a web-based, interactive visualization of Big History, the broadest possible view of the past, stretching from 13.7 billion years ago to today. Our vision is to enable innovative ways of teaching Big History and its various components, and to empower interdisciplinary studies.
I’d like to hand this blog over to Katie now to tell you about the exciting projects she’s been working on.
—Rane Johnson-Stempson, Principal Research Director for Education and Scholarly Communication, Microsoft Research Connections
In addition to my work on ChronoZoom, which has included hands-on sessions with more than 60 students, I have taken the lead on multiple outreach initiatives. Twice, I was able to bring student groups to the Microsoft Redmond campus for hands-on demos of TouchDevelop and IllumiShare, panels with successful women from across Microsoft, and tours of the Microsoft Home. The first group was all middle-school girls from Girls Gather for Computer Science, a summer camp focusing on hands-on science, technology, engineering, and mathematics (STEM) activities. Our second group was from the University of Washington’s Math Academy, a program for high-school students from underrepresented groups who are on track to complete the highest-level math requirements at their schools before graduation. Both groups of students were phenomenal and left campus with an entirely different perspective on what it would be like to have a career as a computer scientist, especially here at Microsoft. Watching the students’ reactions as they heard about the breadth of work being done by Microsoft employees here in Building 99, across campus, and around the world was very encouraging. At the end of both sessions, I went home knowing that each of those students had been exposed to opportunities they never even knew existed.
My third outreach event of the summer was attending STARS Celebration 2012 in Hampton, Virginia. STARS, which stands for Students and Technology in Academia, Research, and Service, is a National Science Foundation-funded Broadening Participation in Computing project that focuses on professional development for university students in STEM fields as well as outreach with elementary and high-school students to build and reinforce interest in studying STEM topics. This event was particularly fun for me, because I have been an active member of STARS since 2008. At STARS Celebration, I was able to present on my own work (STEM outreach in Haiti, evaluating outreach, and outreach with game design and development) and the significant work being done by Microsoft Research to promote an interest in computer science! I presented two sessions on Microsoft tools for outreach use and both were standing room only. Everyone in attendance was impressed with the number of free tools that Microsoft makes available for outreach activities, such as TouchDevelop, Kodu, Pex for fun, and Microsoft .NET Gadgeteer. As a NASA Fellow, the highlight for me was getting to show off the incredibly adorable Mars Rover additions to Kodu. Based on the response I received, I expect large numbers of game designers and astronauts in about 10 years!
My research and outreach work with Microsoft Research this summer has led me to the biggest annual event for women in computing—the Grace Hopper Celebration 2012 (GHC) in Baltimore, Maryland. I’ve spent the past few weeks working with Rane to organize Microsoft’s presence at GHC. It’s been a big undertaking because Microsoft has an incredible 165 people registered, including six executives and six senior women! It is inspiring to see Microsoft employees taking such an interest in growing the number of women in computer science. With the energy I put towards this effort, it is thrilling to know that the girls I help inspire can apply to a company that is eager to hire, retain, and support exceptional women after they complete their degrees.
In addition to being overwhelmed with the amazing presence that Microsoft has here, I’ve been busy supporting Anita’s Quilt, a blog from the Anita Borg Institute that allows remarkable women in technical fields to motivate and empower one another through their stories. I’ve been handing out stickers and sharing the story of Anita’s Quilt since I arrived on Tuesday, but if we haven’t met yet, keep your eyes open for me. I’d love to give you a sticker and fill you in. I could also tell you about the wonderful young women I look forward to meeting at the NCWIT Aspirations in Computing Award Winners Reception tomorrow. They are an impressive group of brilliant, enthusiastic high-school girls who are going to go on to be the next leaders in computer science. You can find me, my mentor Rane, and a group of other talented, passionate Microsoft women volunteering at the Microsoft booth. Stop by booth #1315 to say hello, get information on internship and career opportunities, and develop your own Windows Phone application! If you don’t have time to say hello, or you didn’t make it to GHC, you can find out about many of our initiatives at our Women in Computing website. I hope you’re all having as fantastic and inspiring an experience here at Grace Hopper as I am!
—Katie Doran, Intern, Microsoft Research Connections
The long tail: sure, it’s a well-known concept in business and marketing, but there’s a very important “hidden” long tail in the sciences, too. So, what is this hidden long tail of science? It consists of the millions of datasets that are not stored in a databank and therefore are not available for use by other scientists. Every day, researchers throughout the world are observing, calculating, and compiling data, recording it all on their local machines within their labs—often not even as a shared resource to their institutions. Regrettably, much of this data never gets deposited in larger web-accessible data repositories where it could be reused by other investigators around the globe.
As a researcher who works with other researchers from around the globe, I am acutely aware of scientific data pain points; after all, those of us in the research community understand better than anyone that data preservation, curation, and sharing are critical for the advancement of scientific discovery. We want to share our data beyond our immediate groups, but many times we find ourselves hindered by a lack of tools and services designed to promote data curation and sharing.
Enter DataUp, an open-source tool that helps us document, manage, and archive our tabular data. The DataUp project was born out of this need for seamless integration of data management into the researchers’ current workflows. The University of California Curation Center (UC3) at the California Digital Library (CDL), with sponsorship from Microsoft Research and the Gordon and Betty Moore Foundation (GBMF), focused on creating a tool that could be used by researchers in the environmental sciences. They recognized that this field epitomizes the problems of data management and curation; in particular, the storage of data locally without data description (metadata)—such as where it was collected, by whom, and when—that would make it more usable by others.
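To illustrate why missing data description matters, here is a minimal Python sketch of the kind of check a curation tool could run before a dataset is shared. It is not DataUp's code or API. The field names (`collected_where`, `collected_by`, `collected_when`) are hypothetical stand-ins for the "where, by whom, and when" mentioned above, and the sample record is invented for illustration.

```python
# Hypothetical metadata fields covering where, by whom, and when data was collected.
REQUIRED_FIELDS = ("collected_where", "collected_by", "collected_when")


def missing_metadata(metadata):
    """Return the required fields that are absent, None, or blank in a metadata dict."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# An invented record describing a locally stored spreadsheet.
record = {
    "collected_where": "Example field site",
    "collected_by": "   ",  # present but blank, so it counts as missing
}

gaps = missing_metadata(record)
if gaps:
    print("Add these fields before sharing:", ", ".join(gaps))
else:
    print("Metadata looks complete.")
```

Even a check this simple makes the gap visible to the person who knows the answers, at the moment when filling it in is still easy.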
By conducting surveys at ecological and environmental science events, CDL found that the majority of these scientists use spreadsheets to collect and organize their data, so rather than make them learn a new program, UC3 recognized a need for a tool that works with a program most scientists already know: Microsoft Excel.
From the results of further surveys, it was determined that about half of the scientists preferred a tool that would be installed on their laptop, while the other half wanted a web-based tool that they could use on any device. Well, we sponsors and the UC3 team were not about to let this divided preference thwart the creation of a much-needed tool, so, together, we decided that there needed to be two versions of the tool: an open-source add-in (extension) for Microsoft Excel, and an open-source web application.
To achieve the project goals of facilitating data management, sharing, and archiving, both the add-in and the web application accomplish four main tasks:
The California Digital Library established the initial repository, ONEShare. Researchers will be able to find tools from the DataUp project as part of the Investigator Toolkit for DataONE.
I want to thank Carly Strasser, Trisha Cruse, John Kunze, and Stephen Abrams from UC3 for their passion and commitment to bring DataUp to life. I also want to thank Chris Mentzel from GBMF for co-funding the project with Microsoft Research Connections.
Now, get out there and DataUp!
—Kristin Tolle, Director, Microsoft Research Connections