While we live and breathe data science year-round at Microsoft Research, this summer, we offered a broad range of data science education opportunities for young researchers. Participation in these events was extremely rewarding—for both the students and the organizers.
Students and advisors at the National Water Center for the Summer Institute
DS3 announcement leads the way
The educational blitz really began in the spring, when we announced openings for the upcoming Data Science Summer School (DS3), an eight-week course (June 15 to August 7) taking place at the Microsoft Research New York City Laboratory. Limited to just eight top-level undergrads, DS3 provided hands-on training and a deep level of understanding of data science. The students not only learned how to acquire, clean, and utilize the “messy” real-world data that is the raw material of today’s research, they were also introduced to problems in applied statistics and machine learning.
National Flood Interoperability Experiment Summer Institute 2015
While DS3 may have been the first program announced, the first student summer event to actually get under way was a Summer Institute in association with the National Flood Interoperability Experiment on June 1, in Tuscaloosa, Alabama. Fifty students from around the world spent seven weeks learning about and analyzing US hydrology data collated for the first time ever. In the run-up to the event, my Microsoft colleagues and I helped with the data architecture and cyberinfrastructure on Microsoft Azure. During the event, we trained and continued to mentor the students on how to leverage the cloud and Azure ML for their research projects.
The “flood institute” culminated at the 3rd CUAHSI Conference on Hydroinformatics at the University of Alabama, where students presented their group projects during talks and poster sessions. Several participants went on to submit papers and show their outcomes at other events, including the NSF Data Science Workshop, hosted by the University of Washington and New York University from August 5 to 7. This Seattle-based event invited students to submit a white paper on data science research. The students who submitted the top 100 papers were invited to participate in the event and present posters on their research. Several Microsoft employees, including two from Microsoft Research, participated in the event as panelists, speakers, and mentors. What struck me the most was that all of the posters at the NSF Data Science Workshop relied on multidisciplinary collaborations to drive research projects.
I participated in the National Flood Interoperability Experiment and the [NSF] Data Science workshop. I gained numerous new acquaintances, some of whom I now consider pals, and two projects currently underway that will lead on to publication… [I was impressed by the] diversity of ideas and curiosity to look outside my own little world of research problems.
—Solomon Vimal, visiting scholar, University of North Carolina at Chapel Hill
Heidelberg Laureate Forum
The last data science outreach happened at the Heidelberg Laureate Forum, where 200 young researchers came to the Heidelberg Institute of Technology and Science to interact directly with Abel Prize, Fields Medal, Nevanlinna, and Turing Award laureates. It was a once-in-a-lifetime opportunity for these students and new faculty to have direct access to the minds that have shaped computing and mathematics for our generation. It was an honor to present the outcomes of the National Flood Interoperability Experiment to the assembly, which included Turing Award winners from Microsoft Research—Butler Lampson, Leslie Lamport, and Tony Hoare—as well as Jennifer Chayes and Christian Borgs from the Microsoft Research New York and New England Laboratories.
The most exciting component of the Heidelberg Laureate Forum was the gathering of luminaries who have achieved the highest award in their respective fields. Many of these luminaries gave talks at the HLF that were full of insight for young researchers like myself, and all were enthusiastically involved in interacting with us and answering our questions.
—Mayan Kejriwal, PhD student, University of Texas at Austin
A commitment to growing the next generation
Kris Tolle presenting at the Heidelberg Laureate Forum
Whether they include 8 or 200 young researchers, these events have the potential to shape the future of data science. Interacting with these young researchers and guiding them toward future success is one of the most rewarding aspects of my job. My advice to these young minds was to do something that really matters and not to leave the science out of data science.
And while the 2015 summer of data science is behind us, we are jumping into autumn with equal vigor. Stay tuned for announcements on the Data Science webpage.
—Kristin Tolle, Director, Data Science Initiative, Microsoft Research
Web browsing is one of the core applications on smartphones. After all, who hasn’t checked Facebook or watched the latest news—or amusing cat videos—on their mobile phone? However, mobile browsers on smartphones are primarily optimized for performance, not energy efficiency, so web browsing—especially the loading of web pages—tends to drain batteries and frustrate users.
Recognizing this problem, Yunxin Liu, a researcher at Microsoft Research Asia, and a team from the Korea Advanced Institute of Science and Technology (KAIST) have collaborated to reduce the energy needed to load web pages without increasing page load time or compromising the user experience. In a recent research paper, they present three techniques to reduce the energy consumption of web page loading on smartphones. Two of these, network-aware resource processing and adaptive content painting, address energy inefficiencies in smartphones’ content processing and graphic processing pipelines. The third, application assisted scheduling, takes advantage of ARM’s big.LITTLE architecture to save energy.
The researchers have implemented the proposed techniques on Chromium and Firefox mobile browsers and have conducted comprehensive evaluations using real-world websites and the latest-generation smartphones. Experimental results and user studies indicate that the techniques significantly reduce the energy cost of web page loading while introducing only barely perceivable increases in page load time. When tested for browsing with Chromium on a latest-generation big.LITTLE smartphone, the techniques achieved a 24.4% average system energy saving using Wi-Fi and a 22.5% saving when using 3G, with no discernible impact on average page load time.
The collaboration between Liu and the team at KAIST resulted from one of those fortuitous encounters that happen at scientific conferences. During the annual International Conference on Mobile Systems, Applications, and Services, Liu struck up an acquaintanceship with Duc Hoang Bui, a Vietnamese PhD student from KAIST. They had a good conversation, which resulted in Bui becoming an intern at Microsoft Research Asia and joining Liu’s project.
After the first period of research, Bui returned to KAIST to continue his doctoral studies, under the supervision of Prof. Insik Shin. Liu and Shin knew one another already, and, now, with Bui as the link, they readily saw the advantages of working together on the second stage of the research. Focusing on their strengths, Shin’s team contributed largely to the big.LITTLE technique, while Liu focused on the energy-saving work.
“Prof. Shin was very supportive during the research. We had a very nice cooperation together,” said Liu.
The research paper was presented at MobiCom 2015. One of the top international conferences on mobile computing and networking, MobiCom is an annual event sponsored by ACM SIGMOBILE (the Association for Computing Machinery Special Interest Group on Mobility of Systems, Users, Data, and Computing).
“I’m really flattered to publish the paper at this top conference,” said Liu. “It’s big news for our project and the whole research team.” The researchers now plan to apply their application, which is still a prototype, in additional browsers. Their ultimate goal, of course, is to get it into real-world use, where it just might save your battery long enough for one final download of cat videos for the day.
—Miran Lee, Principal Research Program Manager, Microsoft Research
Many people living in the People’s Republic of China and Hong Kong have a new habit: they check the air pollution index before venturing outside. Air quality has deteriorated rapidly in China, with nitrogen dioxide and particulate matter levels frequently exceeding safety guidelines set by the World Health Organization.
While poor air quality clearly impacts public health, many cities have a dearth of air-quality monitoring stations: there are just 35 in Beijing and 15 in Hong Kong, for example. This lack of monitoring stations hinders evidence-based decision-making and leads to harsh criticisms of the transparency and public relevance of China’s official air pollution index.
Researchers proposed that by analyzing the relationship between urban dynamics data and measured air quality, they could estimate air quality at locations not covered by monitoring stations.
Since air pollution is highly location-dependent, a citywide air-quality monitoring system would require building many additional monitoring stations, a solution that is prohibitively expensive. And so there is a grim reality: people may diligently check the air pollution index, but they cannot really know the air quality in their specific locale.
“Do we have sufficient data to produce reliable air-quality metrics by using urban computing?” Professor Victor Li, chairman of information engineering and head of the department of Electrical and Electronic Engineering at the University of Hong Kong, posed this question to his research team. As luck would have it, while Li was challenging his researchers in Hong Kong, Microsoft Research Asia was launching a Call for Research Proposals dealing with urban computing. (Urban computing is a process of acquisition, integration, and analysis of big and heterogeneous data generated by a diversity of sources in urban spaces to address issues that major cities face, such as air pollution, excessive energy consumption, and traffic congestion.) Li promptly submitted a proposal, which came to the attention of Microsoft researcher Yu Zheng, who has conducted extensive research on urban computing and was also looking into the issues surrounding urban air quality.
As Li and his researchers were detailing the correlations between air pollution and various urban dynamics (noting, for instance, that air quality and temperature are spatially correlated), Yu and his team had concluded that a variety of urban big data could be used to compensate for the lack of monitoring stations. Li and Yu decided to collaborate and drive more in-depth research.
The joint team proposed that by analyzing the relationship between urban dynamics data, such as vehicular traffic and measured air quality, they could estimate air quality at locations not covered by monitoring stations. “We can infer real-time, fine-grained air quality information throughout a city based on the historical and real-time air quality data reported by existing monitoring stations, combined with a variety of data sources we observe in the city,” predicted Yu.
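To make the idea of estimating air quality between monitoring stations concrete, here is a minimal sketch of a far simpler baseline: inverse-distance weighting, which averages nearby station readings and gives closer stations more weight. This is not the joint team's method, which also draws on traffic, meteorological, and other urban data; the function names, the `power` parameter, and the sample stations below are all hypothetical and chosen only for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def idw_estimate(target, stations, power=2.0):
    """Estimate a reading at `target` as a distance-weighted mean of station readings.

    `stations` is a list of ((lat, lon), reading) pairs. Each station's weight
    is 1 / distance**power, so nearer stations dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for location, reading in stations:
        d = haversine_km(target, location)
        if d < 1e-9:
            # The target sits on a station: use its reading directly.
            return reading
        w = 1.0 / d ** power
        weighted_sum += w * reading
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total


# Hypothetical stations and readings, invented for this example only.
stations = [
    ((22.50, 114.00), 50.0),
    ((22.50, 114.20), 100.0),
    ((22.70, 114.10), 80.0),
]
estimate = idw_estimate((22.55, 114.05), stations)
print(f"Estimated reading: {estimate:.1f}")
```

A baseline like this uses only distance, so it cannot capture a busy road or a local weather pattern between two stations. That gap is what combining station readings with other urban data sources is meant to close.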
However, the researchers now grappled with the challenge of processing the massive volume of human dynamics data. A challenge initially imposed by a lack of data had become a problem of having too much data—a 180-degree swing from one extreme to the other!
But a solution to the problem of crunching the big data soon arrived, when Li’s project received a Microsoft Azure for Research Award. As Julie Zhu, a doctoral student on the project team, noted, “The program arrived at exactly the right time. We were just looking into building multi-node clusters.”
Zhu and her colleagues attended Microsoft Azure training and quickly set up the new computing environment. Citing the enormous amounts of data provided by just one city, Shenzhen, Zhu described the value of Azure. “We need to collect and process about 1 terabyte per month of urban data on air quality, meteorological data, and traffic information, and so on,” she said. “What’s great about Microsoft Azure is that it goes way beyond the data storage. It integrates all the functionalities we need for data crawling, indexing, training, and visualization. Microsoft Azure truly enables us to do the real-time and scalable data processing.”
Professor Li presenting the project at the 2014 Microsoft Research Asia Faculty Summit.
Within months, the team had arrived at their initial results for Shenzhen. When Li and Yu presented their work at the 2014 Microsoft Research Asia Faculty Summit in Beijing, their novel approach generated great excitement, not only for how it processed the massive volumes of data, but also for finding the data sources in the first place. As Li commented, “All the data we used are from public channels. There are tons of data out there.” Li and Yu are now working together to build a predictive model, which would address the initial dilemma of the paucity of air-quality monitoring stations.
Yu mused on how the challenge had morphed from insufficient data into a surplus of data. “We now know that the lack of data and big data do not conflict with each other. They co-exist in the same problem. What’s important is to identify the dataset from various channels that are key contributors,” Yu explained. To which Zhu, now a Microsoft Research intern, added, “You just need a good platform and tools to handle them.”
—Winnie Cui, Senior Research Program Manager, Microsoft Research