In keeping with our mission to collaborate with top academic and scientific researchers to foster innovations in scientific inquiry, Microsoft Research Connections was proud to sponsor the 2013 KDD Cup, arguably the world’s best-known competition in data mining. The winning teams were announced at KDD 2013, the 19th annual conference of ACM SIGKDD (the Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining), which took place in Chicago in August. KDD is the premier event for researchers grappling with today’s data deluge, as it’s the only conference spanning big data, data mining, data science, and analytics, along with all the related algorithms, foundations, applications, and practices.
2013 KDD Cup challenge winners, Team Algorithm, from National Taiwan University
The 2013 KDD Cup challenge focused on the ability to search literature and to collect metrics around publications, a capability that is essential to modern research, as academic and industry researchers increasingly rely on search to discover what has been published and by whom. The competition used a dataset of 250,000 authors and 2.5 million published papers. The dataset was split into a labeled training set, a validation set used for the leaderboard, and a test set. The competitors faced two tasks: first, a prediction task to determine whether an author had written a given paper, and second, a name disambiguation task to identify duplicate authors in a dataset containing name variants.
These tasks go to the heart of one of the main challenges of information extraction and curation in any people-centric dataset: resolving people-name ambiguity. In the scholarly publishing world, many authors publish under several variations of their own name, and to add to the complexity of discovery, different authors might share a similar or even the same name. As a result, the profile of an author with an ambiguous name tends to contain noise, that is, papers incorrectly assigned to that author. The KDD Cup task challenged participants to determine which papers in an author profile were truly written by a given author. Read the full parameters of the challenge.
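To make the name-variant problem concrete, here is a minimal sketch of one common first step in name disambiguation: grouping author records under a coarse key (last name plus first initial) to produce candidate duplicates. This is an illustration only, not the method used by any competing team. The function names, the sample author IDs, and the sample names are all hypothetical, and the key is an assumption chosen for simplicity.

```python
import unicodedata
from collections import defaultdict


def name_key(full_name: str) -> tuple[str, str]:
    """Reduce a name to (last name, first initial), a deliberately coarse key."""
    # Strip accents so that "María" and "Maria" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", full_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Convert the "Last, First" form to "First Last".
    if "," in ascii_name:
        last, first = ascii_name.split(",", 1)
        ascii_name = f"{first} {last}"
    # Treat periods and hyphens as separators, then split on whitespace.
    tokens = ascii_name.lower().replace(".", " ").replace("-", " ").split()
    if not tokens:
        return ("", "")
    return (tokens[-1], tokens[0][0])


def candidate_duplicates(authors: dict[int, str]) -> list[list[int]]:
    """Group author IDs whose names share a key; keep groups with 2+ IDs."""
    groups = defaultdict(list)
    for author_id, name in authors.items():
        groups[name_key(name)].append(author_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records (IDs and names are made up for illustration).
authors = {
    1: "Maria Gonzalez",
    2: "M. Gonzalez",
    3: "González, María",
    4: "Mark Gonzalez",  # a different person who shares the same coarse key
    5: "Wei Zhang",
}
print(candidate_duplicates(authors))
```

Note that the hypothetical "Mark Gonzalez" lands in the same group as the three "Maria Gonzalez" variants. A coarse key only proposes candidates; deciding which records truly belong to the same person requires further evidence such as coauthors, affiliations, or venues, which is exactly the difficulty the challenge describes.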
The competition was fierce, with more than 800 teams from more than 40 countries developing approximately 12,000 data-mining models over the course of a few months. The winning solution, created by Professor Chih-Jen Lin and Team Algorithm from National Taiwan University, was the product of outstanding teamwork: eighteen students and three teaching assistants designed a graduate course around the competition. Other winners included teams from the University of Illinois at Urbana-Champaign, Moscow State University, and FICO. Winners presented their solutions at a KDD Cup workshop and poster session at the conference. Moreover, solutions created for the competition resulted in 10 research papers that are available through the KDD Cup 2013 Workshop proceedings.
KDD Cup poster session participants at KDD 2013
On behalf of Microsoft Research Connections, I would like to thank the key collaborators who helped make this competition a success. The Microsoft Research Connections proposal for the KDD Cup challenge was selected after careful deliberation by 2013 KDD Cup chairpersons Claudia Perlich and Brian Dalessandro of Media6°. Partnering with me in designing the contest rules and evaluation criteria were Professors Martine DeCock of Ghent University and Senjuti Basu Roy of the University of Washington Tacoma, along with Ben Hamner and Will Cukierski of Kaggle. Swapna Savvana and Yitao Li from the University of Washington Tacoma helped with the logistics of the contest execution.
So congrats to the KDD Cup winners, and kudos to everyone who accepted the challenge. The many outstanding solutions showed great creativity, which is exactly what we’ll need as we move forward in this new world of data-intensive scientific inquiry.
—Vani Mandava, Senior Program Manager, Microsoft Research Connections
As part of our Windows Azure for Research program, announced on September 9, Microsoft Research is facilitating cloud training classes designed to show researchers how Windows Azure can accelerate their research.
As the global training coordinator for this program, I’m pleased to announce that the first of these worldwide classes has been scheduled for September 16–17 at the University of Washington, in Seattle, co-hosted with the university’s eScience Institute. This will be followed by courses in October in Campinas, Brazil, and Beijing, China, with subsequent events scheduled across the globe. We will update the full schedule as courses are added, so keep checking back!
Windows Azure is an open and flexible global cloud platform supporting any language, tool, or framework, and is ideally suited to the needs of researchers across disciplines. After attending our intensive technical course, researchers should feel confident in applying cloud computing in their current and future investigations.
This two-day course is offered free of charge and is presented by trainers who specialize in Windows Azure for research. Attendees will be able to access Windows Azure from their own laptops during the training and, for evaluation purposes, for up to six months after the event. Attendees’ laptops do not need to run the Windows operating system, because Windows Azure is accessed through a web browser.
The course is intended specifically for active scientists who are interested in coding in a modern computing context, as well as for computer scientists who are working with such researchers. This is a hands-on class, so some ability to program in a modern language is useful, but the course is suitable for researchers using any language, framework, or platform. This includes Linux, Python, R, MATLAB, Java, Hadoop, Storm, Spark, and all appropriate Microsoft technologies, such as C#, F#, .NET, Windows Azure SQL Database, and various Windows Azure services. Some basic exposure to cloud computing is helpful, but no real expertise or usage experience is required; teaching those skills is the focus of the class.
The training outcomes of the course include:
Review the full course description (PDF 561 KB), which includes the schedule, intended audience, prerequisites, and learning objectives.
If you would like to attend one of these courses, please follow the instructions on Windows Azure for Research Training. You will be sent a registration link if space is available in the session. Spaces are limited, so potential attendees are encouraged to register early.
If you can’t find a course in a location near you, we will consider suggestions—you can find instructions for submitting your suggestions on the same webpage. We can’t promise to provide a course in your requested location, but we will consider all requests. Moreover, if there is sufficient interest, an online version of the training may be created.
We look forward to seeing how scientists and researchers use cloud computing in their research!
—Stewart Tansley, Director, Microsoft Research Connections
Microsoft Research is pleased to announce a new initiative to help the research community use the cloud to advance scientific discovery. Three years ago, we partnered with researchers to experiment with cloud computing on Windows Azure. The results from these early efforts—many of which are described on our website—have been outstanding. These pioneering projects have cut across disciplines, from bioinformatics to ecology, social network analysis, civil engineering, mobile computing, natural language processing, and more.
The successes of our early efforts have convinced us of the immense value of using Windows Azure in scientific research. Moreover, they made us determined to do all we can to bring “cloud power” to the broader community of researchers. One reason for our confidence is that the Windows Azure platform has expanded to include a number of fantastic new capabilities. The original Platform as a Service capabilities remain, but Windows Azure now supports persistent Windows and Linux virtual machines; Hadoop services through HDInsight; mobile services support for Windows, Android, and iOS clients; virtual networks and identity management; various database services; Windows Media Services; and programming support for C++, C#, F#, Java, Python, Ruby, and R.
By taking advantage of the same platform that thousands of our commercial customers—and we at Microsoft Research—rely on, scientists can accelerate the speed and dissemination of scientific discovery.
Science is at an inflection point where the challenges of dealing with massive amounts of data and the growing requirements of distributed multidisciplinary collaborations make moving to the Windows Azure cloud extremely attractive. This is true for the individual researcher who does not want to manage local physical infrastructure and for large teams that need to share their discovery resources and services with the larger research community.
Many researchers from many scientific disciplines are ready to benefit from the flexibility and convenience of the cloud. We look forward to supporting them by launching a program called Windows Azure for Research.
Windows Azure for Research has four components:
Microsoft Research is committed to helping the scientific community build and use cloud-based data collections and tools that will drive new discoveries and create new and innovative scenarios for using cloud computing. We are encouraged by the many new open-source scientific tools that are now available on Windows Azure, and we are eager to see more being built and distributed in the year ahead. In short, we are extremely excited to engage with the research community on this endeavor.
—Dennis Gannon, Director of Cloud Research Strategy, Microsoft Research Connections