The long tail: sure, it’s a well-known concept in business and marketing, but there’s a very important “hidden” long tail in the sciences, too. So, what is this hidden long tail of science? It consists of the millions of datasets that are not stored in a databank and therefore are not available for use by other scientists. Every day, researchers throughout the world are observing, calculating, and compiling data, recording it all on their local machines within their labs—often not even as a shared resource to their institutions. Regrettably, much of this data never gets deposited in larger web-accessible data repositories where it could be reused by other investigators around the globe.
As a researcher who works with other researchers from around the globe, I am acutely aware of scientific data pain points; after all, those of us in the research community understand better than anyone that data preservation, curation, and sharing are critical for the advancement of scientific discovery. We want to share our data beyond our immediate groups, but many times we find ourselves hindered by a lack of tools and services designed to promote data curation and sharing.
Enter DataUp, an open-source tool that helps us document, manage, and archive our tabular data. The DataUp project was born out of this need for seamless integration of data management into the researchers’ current workflows. The University of California Curation Center (UC3) at the California Digital Library (CDL), with sponsorship from Microsoft Research and the Gordon and Betty Moore Foundation (GBMF), focused on creating a tool that could be used by researchers in the environmental sciences. They recognized that this field epitomizes the problems of data management and curation; in particular, the storage of data locally without data description (metadata)—such as where it was collected, by whom, and when—that would make it more usable by others.
By conducting surveys at ecological and environmental science events, CDL found that the majority of these scientists use spreadsheets to collect and organize their data, so rather than make them learn a new program, UC3 recognized a need for a tool that works with a program most scientists already know: Microsoft Excel.
From the results of further surveys, it was determined that about half of the scientists preferred a tool that would be installed on their laptop, while the other half wanted a web-based tool that they could use on any device. Well, we sponsors and the UC3 team were not about to let this divided preference thwart the creation of a much-needed tool, so, together, we decided that there needed to be two versions of the tool: an open-source add-in (extension) for Microsoft Excel, and an open-source web application.
To achieve the project goals of facilitating data management, sharing, and archiving, both the add-in and the web application accomplish four main tasks:
The California Digital Library established the initial repository, ONEShare. Researchers will be able to find tools from the DataUp project as part of the Investigator Toolkit for DataONE.
I want to thank Carly Strasser, Trisha Cruse, John Kunze, and Stephen Abrams from UC3 for their passion and commitment to bring DataUp to life. I also want to thank Chris Mentzel from GBMF for co-funding the project with Microsoft Research Connections.
Now, get out there and DataUp!
—Kristin Tolle, Director, Microsoft Research Connections
Big data—that buzzword seems to dominate information technology discussions these days. But big data is so much more than a clever catchphrase: it’s a reality that holds enormous potential. We now have the largest and most diversified volume of data in human history. And it’s growing exponentially: approximately 90% of today’s data has been generated within the past two years. The exploding science of big data is changing the IT industry and exerting a powerful impact on everyday life.
But what should big data science be, and where is it headed? These are the fundamental questions that have prompted Tsinghua University and Microsoft Research Asia to work together to establish a pioneering graduate course on Big Data Foundations and Applications. Turing Award winner and Tsinghua professor Andrew Chi-Chih Yao spent more than eight months developing the course, which launched in September 2014.
Solidifying knowledge through academia-industry cooperation
On October 9, Hsiao-Wuen Hon, managing director of Microsoft Research Asia, delivered the course’s first lecture. Dr. Hon stressed that the importance of big data lies not only in its value in academic research but also in its application to real-world problems, which, he said, is why the academia-industry cooperation represented by the course is so critical.
“One of our purposes in launching this course with Tsinghua is to introduce Microsoft’s ideas to students, to let them get to know us better,” he explained. “Meanwhile, our top professional researchers can deepen their understanding of big data while teaching the students. So the course is not just about enhancing the students’ understanding of big data; it’s also about solidifying the researchers’ knowledge of big data.”
Echoing the importance of the industry-academia connection, Professor Yao remarked, “Big data is an epoch-making subject. It has influenced all the other disciplines, including computer science and information technology. We should not only focus on the scientific research. Education development is also a new trend.”
Leading the forefront of big data science
Wei Chen, a senior researcher at Microsoft Research Asia, has been a visiting professor at Tsinghua University since 2007. He has helped design and launch several entry-level courses at Tsinghua, and he is a strong proponent of the new big data course.
“We certainly don’t expect this course to become a platform for product promotion. Instead, it is being established to provide students with cutting-edge knowledge, to get them engaged in research and technology development, and to foster their ability to do research and experimentation,” he said.
Wei Chen, senior researcher at Microsoft Research Asia, talks with a student.
Professor Chen pointed out that while the course will provide an academic understanding of big data, it will also introduce students to real-life cases of Microsoft big data research and applications. In addition, students will have the opportunity to conduct experiments using Microsoft Azure, the company’s cloud-computing platform. He believes these practical, hands-on components distinguish this class from other big data courses.
Feeding the talent pipeline
Microsoft Research has a long tradition of collaborating with universities and has undertaken several initiatives to nurture the next generation of talented researchers. Since 2002, for instance, Microsoft Research Asia has hosted over 4,000 interns and carried out projects with more than 40 universities and institutes. The new big data course comes directly out of that tradition, and both Microsoft Research and Tsinghua University have high expectations for this collaboration. Professor Yao probably put it best, saying, “I believe this world-class course will give students a comprehensive understanding of big data and its knowledge structure, helping them reach their goals in future jobs and research.”
—Kangping Liu, Senior Research Program Manager, Microsoft Research Asia
I enjoy being able to work in technology because it has the potential for great impact in a range of research areas. But, more specifically, I have the privilege to work with the São Paulo Research Foundation (FAPESP) in Brazil to achieve that goal. Together, Microsoft Research and FAPESP created a joint research center in 2006—the Microsoft Research-FAPESP Institute for IT Research—that has been going strong ever since.
The Microsoft Research-FAPESP Institute for IT Research supports high-quality fundamental research in information and communication technologies. In the beginning, the center focused on addressing social and economic development needs in the São Paulo region. However, in recent years, the research has focused primarily on learning about the environment by using advanced technologies.
Fueling the Future: FAPESP
I am very pleased to announce that the Microsoft Research-FAPESP Institute for IT Research recently concluded its sixth request for proposal (RFP) cycle by selecting two winning proposals in the area of environmental sciences.
The objective of the sixth RFP was to explore the application of computing science to the challenges of basic research in areas related to global climate change and the environmental sciences. Researchers worldwide acknowledge that technology can provide powerful tools for environmental research, benefiting society and aiding in the planet’s sustainability. As such, this request for proposals focused on turning data into knowledge. Respondents had the option of developing their own technology application or using an existing technology that was developed by Microsoft Research (for example, research accelerators that can be used to manage, publish, analyze, or visualize a research project, or new tools for science).
The two winning proposals from the submissions received are:
Towards an Understanding of Tipping Points within Tropical South American Biomes, submitted by Principal Investigator Ricardo Torres of the Universidade Estadual de Campinas (UNICAMP)
Combining New Technologies to Monitor Phenology from Leaves to Ecosystems, submitted by Principal Investigator Patrícia Morellato of the Universidade Estadual Paulista (UNESP)
Notably, Morellato’s project is a follow-up to the recently concluded e-Phenology project, which was selected under the fourth Microsoft Research-FAPESP RFP. The research team also submitted a proposal to the Computational Ecology and Environmental Science (CEES) team and was awarded a set of Mataki sensors to monitor small animals in the location where they are already monitoring the vegetation. The goal of that project is to explore correlations between the vegetation and small animals’ behavior and to gather a more complete understanding of how the ecosystem functions.
Congratulations to both the UNICAMP and UNESP teams on their winning submissions! I look forward to seeing the results of their research.
—Juliana Salles, Senior Research Project Manager, Microsoft Research Connections