Machine learning is the cornerstone of modern data analysis. The gurus of “big data” analytics are all well versed in machine learning, but most domain specialists still must hire data scientists to meet their data-analysis needs. It's inevitable, though, that the data-modeling chain will become largely automated—simplified to the point where off-the-shelf data-transformation tools will be as pervasive as those for word processing and spreadsheets. Data analysis will then be like driving a car: the user will focus on the route to the destination without worrying about how the engine works.
We refer to this vision as the automation of machine learning, or AutoML for short. To advance toward this goal, ChaLearn, an organization that promotes machine-learning challenges, has launched a contest to democratize machine learning. Built on the new CodaLab platform, the contest offers US$30,000 in prizes donated by Microsoft. More than 60 teams entered during the Prep round, and now, until October 15, 2015, you can enter any of five additional rounds: novice, intermediate, advanced, expert, or master. Visit the ChaLearn Automatic Machine Learning Challenge site for each round's deadlines. You can enter even if you did not participate in previous rounds.
Five rounds remain in the Automatic Machine Learning Challenge, each round consisting of AutoML and Tweakathon phases.
The contest problems are drawn from a variety of domains. They include challenges in the classification of text, the prediction of customer satisfaction, the recognition of objects in photographs, the recognition of actions in video data, as well as problems involving speech recognition, credit ratings, medical diagnoses, drug effects, and the prediction of protein structures.
Five datasets of progressive difficulty are introduced during each round. The rounds alternate between (1) AutoML phases, during which submitted code is blind tested in limited time on our platform, using datasets you have never seen before; and (2) Tweakathon phases, in which you are given time to improve your methods by tweaking them on those datasets and running them on your own systems, without computational resource limitation and without requirement of code submission.
During the novice round, which runs through April 14, you will encounter only binary classification problems, with no missing values and no categorical variables. All the datasets are formatted as simple data tables—no sparse matrix format, though one dataset does include a lot of zeros. The classes are balanced. The number of features does not exceed 2,000, and the number of examples does not exceed 6,000. The metric of evaluation is simply classification accuracy.
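As a concrete, unofficial illustration of a novice-round task, the sketch below builds a synthetic dense table with balanced binary labels and scores a simple nearest-centroid baseline by classification accuracy. All sizes, names, and the model itself are illustrative assumptions; the challenge's own starter kit defines the actual data format and submission interface.

```python
import numpy as np

# Synthetic stand-in for a novice-round table: dense, balanced classes,
# no missing values, no categorical variables (sizes are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 20% for evaluation, mirroring the round's accuracy metric.
split = 800
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Nearest-centroid baseline: assign each test example to the class whose
# training-set mean it is closest to in Euclidean distance.
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])
dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
pred = dists.argmin(axis=1)

accuracy = (pred == y_test).mean()
print(f"holdout accuracy: {accuracy:.3f}")
```

In an actual AutoML-phase submission the emphasis shifts from the model to the automation: the code must choose and fit a method on unseen data within the platform's time budget.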
For more details, read our white paper.
Enter the AutoML challenge for a rich learning and research experience, and a chance to win!
—Isabelle Guyon, President, ChaLearn; Evelyne Viegas, Director, Microsoft Research; Rich Caruana, Senior Researcher, Microsoft Research
More than half of the world’s population now lives in cities and suburbs, and as just about any of these billions of people can tell you, urban traffic can be a nightmare. Cars stack up bumper-to-bumper, clogging our highways, jangling our nerves, taxing our patience, polluting our air, and taking a toll on our productivity. In short, traffic jams impair our emotional, physical, and economic well-being.
A study by the Brazilian National Association of Public Transport put the economic toll of the country’s traffic at about US$7.2 million in 1998. Unfortunately, the problem is only worsening: Brazil now has about three times as many vehicles, and congestion has grown disproportionately, according to Fernando de Oliveira Pessoa, a traffic expert in Belo Horizonte, Brazil’s sixth-largest city.
Microsoft Research has joined forces with the Federal University of Minas Gerais, home to one of Brazil’s foremost computer science programs, to tackle the seemingly intractable problem of traffic jams. The immediate objective of this research is to predict traffic conditions over the next 15 minutes to an hour, so that drivers can be forewarned of likely traffic snarls.
The aptly named Traffic Prediction Project plans to combine all available traffic data—including both historical and current information gleaned from transportation departments, Bing traffic maps, road cameras and sensors, and the social networks of the drivers themselves—to create a solution that gets motorists from point A to point B with minimal stop-and-go. Two aspects of the project are unique: its use of historical data and its use of information from social networks.
By using algorithms to process all these data, the project team intends to predict traffic jams accurately so that drivers can make smart, real-time choices, like taking an alternative route, using public transit, or maybe even just postponing a trip. The predictions should also be invaluable to traffic planners, especially when they are working to accommodate traffic from special events and when planning for future transportation needs.
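As a hedged sketch of the general idea only (the article does not describe the project's actual models or features), the code below forecasts the next 15-minute slot by blending a historical time-of-day speed profile with the deviation observed in the most recent readings. All data, the `predict_next` helper, and the blending weight are synthetic assumptions for illustration.

```python
import numpy as np

# Synthetic "historical" speeds: a daily rush-hour pattern plus noise,
# sampled every 15 minutes over 30 days (96 slots per day).
rng = np.random.default_rng(1)
slots_per_day = 96
t = np.arange(slots_per_day)
daily = (60
         - 25 * np.exp(-((t - 34) ** 2) / 40)   # morning rush dip
         - 25 * np.exp(-((t - 70) ** 2) / 40))  # evening rush dip
history = daily + rng.normal(0, 4, size=(30, slots_per_day))

# Historical profile: mean speed for each time-of-day slot.
profile = history.mean(axis=0)

def predict_next(slot, recent):
    """Blend the historical profile for the next slot with the deviation
    seen in the most recent readings (hypothetical illustration)."""
    deviation = recent.mean() - profile[slot - len(recent):slot].mean()
    return profile[slot % slots_per_day] + 0.5 * deviation

# Example: it is slot 33 (morning rush) and the last hour ran about
# 10 km/h slower than usual, e.g., because of an incident.
recent = history[-1, 29:33] - 10
forecast = predict_next(33, recent)
print(f"forecast for next 15-minute slot: {forecast:.1f} km/h")
```

The real system would fold in live sensor feeds and social-network signals at the "recent readings" step, which is exactly where the project expects its accuracy gains.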
Achieving reliable predictions will involve processing terabytes of data, which is why the researchers are using Microsoft Azure as the platform for the service. Azure's exceptional scalability, immense storage capacity, and prodigious computational power make it well suited to this data-intensive project. And because Azure is cloud-based, the Traffic Prediction service is accessible to all users, in real time, all of the time.
To date, the researchers have tested their prediction model in some of the world’s most traffic-challenged cities: New York, Los Angeles, London, and Chicago. The model achieved a prediction accuracy of 80 percent, and that was based on using only traffic-flow data. The researchers expect the accuracy to increase to 90 percent when traffic incidents and data from social networks are folded in.
So the next time your highway resembles a long, thin parking lot, you might calm yourself by contemplating how Microsoft Azure and the Traffic Prediction Project might help you avoid such tie-ups in the future.
—Juliana Salles, Senior Program Manager, Microsoft Research
Have you ever found yourself waiting for results from your Internet search engine? Oh, sure, search for Kim Kardashian and the results come flying back at warp speed. But queries with vague terms are often automatically reformulated into complex queries that may take significantly longer to provide results.
Achieving a consistently fast response time, regardless of the obscurity of the search term, is a challenging goal, one that requires the combined efforts of experts in engineering systems, operational data, distributed systems, machine learning, and performance optimization. Recently, Microsoft Research joined forces with Pohang University of Science and Technology (POSTECH) in Korea to tackle this challenge, and together, they’ve attained promising results.
Leading the collaboration are Professor Seung-won Hwang from POSTECH and researchers Sameh Elnikety and Yuxiong He from Microsoft Research.
Professor Seung-won Hwang participated in the 2014 Korea Day event at Microsoft Research Asia.
The goal of the collaborative project is to improve Bing search results. Even a few search queries that take too long to process (known as tail queries) can undermine user satisfaction and hurt revenues. To reduce the latency of tail queries, the team must predict whether a query will take a long time to process and will need extra resources, such as selective parallelization, to resolve quickly.
The team received the best paper runner-up award at WSDM 2015 in February. Pictured are Prof. Seung-won Hwang and Saehoon Kim from POSTECH (second and fourth from left), Yuxiong He from Microsoft Research (third from left), and WSDM program committee chairs.
The collaborative team has developed techniques that first identify and then accelerate tail queries, improving server throughput by more than 70% in experimental trials. Using past query logs, the team built a predictor that spots tail queries with high accuracy (98.9%). Those time-consuming queries are then handed to a resource manager the team developed, which allocates additional hardware resources to them. These techniques have been presented at top-tier conferences, including SIGIR 2014 and WSDM 2015, where the work received the best-paper runner-up award.
As shown in this diagram, the predictor identifies time-consuming queries, which then are allocated additional hardware resources by the resource manager.
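To make the predict-then-parallelize pipeline concrete, here is a hedged simulation: an imperfect predictor flags likely tail queries, and only those receive extra parallelism. The latency distribution, the 3x speedup on 4 cores, and the flagging threshold are illustrative assumptions, not the published system's numbers.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Synthetic sequential processing times (ms): most queries are fast,
# a heavy tail is slow; a log-normal is a common stand-in for this shape.
seq_time = rng.lognormal(mean=3.0, sigma=1.0, size=n)

# Stand-in predictor: the real system learns from query logs; here we
# mimic an imperfect model with a noisy estimate of the true time.
predicted = seq_time * rng.lognormal(0, 0.2, size=n)
tail = predicted > np.quantile(predicted, 0.95)  # flag the top ~5%

# Resource manager: parallelize flagged queries on extra cores,
# assuming an imperfect 3x speedup; the rest run sequentially.
served = np.where(tail, seq_time / 3.0, seq_time)

p99_before = np.percentile(seq_time, 99)
p99_after = np.percentile(served, 99)
print(f"99th-percentile latency: {p99_before:.0f} ms -> {p99_after:.0f} ms")
```

The design point this illustrates is selectivity: parallelizing every query would waste cores on requests that are already fast, while targeting only predicted tail queries cuts tail latency at little extra cost.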
The search engine project is part of a larger effort under the Korea Government Collaboration Program, run with the Korean Ministry of Science, ICT, and Future Planning (MSIP). Through this program, some of Professor Hwang’s doctoral students have worked as interns at Microsoft Research; later, during a sabbatical, the professor herself came to Microsoft Research as a visiting scientist.
Professor Hwang praises the program for exposing students to production-scale system problems, and calls it a great opportunity to work with top-notch researchers and to publish in top-tier conferences. The benefits of the program are mutual, as Microsoft researcher Yuxiong He points out. “The complementary knowledge and skill sets of the team members have empowered us to solve important practical problems for Microsoft and the entire IT industry,” she observes.
Sameh Elnikety also highly praised the program: “From my personal experience, this program has a positive impact to all involved: students get excellent training, faculty members work on important practical problems, and researchers collaborate with top faculty members, resulting in useful publications and tech transfers.”
Professor Hwang continues to collaborate with Microsoft Research to improve search results. The team’s next challenge is to optimize the tools to better handle queries generated from mobile devices—queries that often involve searching through geo-tagged datasets. And they’re making headway: Professor Hwang will be demonstrating a geo-tagged query optimizer at the upcoming 2015 Korea Day at Microsoft Research Asia, once again showing the power of academic-industry collaboration.
—Miran Lee, Principal Research Program Manager, Microsoft Research Asia