On July 10, in Boston, the External Research division of Microsoft Research will introduce the Microsoft Biology Initiative, resources designed to help biological scientists and programmers conduct research more efficiently and affordably. These include the first post-beta release of the Microsoft Biology Foundation (MBF), a language-neutral bioinformatics tool kit built as an extension to the Microsoft .NET Framework. In addition to a new genome assembler, performance enhancements, and other improvements, MBF builds upon the vision and goals that drove the development of the beta versions. Those included a commitment to community involvement, extensibility, cross-platform and interoperable functionality, language neutrality, and support for best practices. While there are other libraries of biological functionality available, MBF supports universally accepted standards of the bioinformatics community and implements a range of unique functionality derived from original Microsoft research. The code for MBF and supporting documents is available on CodePlex.
Like MBF itself, the audience during the 11th annual Bioinformatics Open Source Conference, held in conjunction with the 18th annual International Conference on Intelligent Systems for Molecular Biology, represents a powerful combination of technology and biology. To harness technology in support of biological discovery, MBF implements parsers for common bioinformatics file formats and algorithms used to manipulate DNA, RNA, and protein sequences. In addition, it provides a set of connectors to biological Web services, such as the Basic Local Alignment Search Tool, as well as a utility that enables scientists to view their data within Excel easily and quickly.
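To give a flavor of what parsing a common bioinformatics file format and manipulating sequences involves, here is a minimal sketch in Python. It is not MBF code and does not use the MBF API; the function names `parse_fasta`, `reverse_complement`, and `transcribe` are hypothetical, and FASTA is assumed here only as one example of a widely used format.

```python
from typing import Dict, Iterable

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_id: sequence} dict.

    Illustrative only: the ID is the first whitespace-delimited token
    after '>', and multi-line sequences are joined into one string.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Return the RNA transcript of a DNA coding strand (T becomes U)."""
    return dna.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    sample = [">seq1 demo record", "ACGT", "TTGA", ">seq2", "GGCC"]
    for seq_id, seq in parse_fasta(sample).items():
        print(seq_id, seq, reverse_complement(seq), transcribe(seq))
```

A tool kit such as MBF wraps this kind of routine work, for many more formats and with far more validation, so that researchers do not have to rewrite it for each project.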
From its core technology to the free availability of the code on which it is built, MBF is the result of collaboration between Microsoft Research and industrial and academic partners, with the aim of building the tools scientists need to pursue biological research. With Microsoft .NET as its base, MBF makes it easier for developers to leverage current technologies, with thousands of functions and a common code base that can be accessed and used with great flexibility.
One of the areas in which MBF is particularly valuable is the field of genomics, which has experienced tremendous advances since the human genome first was sequenced a decade ago. A full understanding of the human genome offers great potential for advances in health care. To reduce the computational complexity of reconstructing the whole genome, MBF includes a new whole-genome-assembly algorithm, PaDeNA (Parallel de Novo Assembler). PaDeNA has the potential to reconstruct the DNA sequence of a patient rapidly from huge volumes of experimental data, the first step in using the genome in health care. While PaDeNA is provided freely as a part of MBF, it is designed to be modular and is fully documented, enabling experimental biologists and software developers to tweak the basic algorithm and add features to meet the needs of their research.
Another example of MBF at work is the research undertaken by David Heckerman, senior director of the eScience group within Microsoft Research. Heckerman, an expert in machine learning, is working on the design of HIV vaccines, which requires an understanding of how the virus evolves in each individual. The next versions of the biological applications Heckerman is developing will use functions built into MBF. Heckerman's applications will continue to be made available for free download on CodePlex.
In keeping with the bioinformatics community's strong tradition of sharing expertise in support of ongoing discovery, I invite you to download MBF, use it for your work, and contribute your experience to the global research community.
Simon Mercer, director, Health & Wellbeing, External Research, a division of Microsoft Research
During the Association for Computing Machinery's 33rd annual SIGIR Conference, on July 19-23, 2010, in Geneva, Microsoft Research is announcing enhancements to the Microsoft Web N-gram Services, available free via a cloud-based platform. Microsoft Research created Microsoft Web N-gram Services to help drive discovery and innovation by enabling scientists to conduct research on real-world web data. Microsoft Web N-gram Services support many research areas that have the potential to change lives, including natural language processing and new web search capabilities that empower people to take advantage of the vast amounts of information available on the Internet.
Introduced late last year, in partnership with Bing, the Microsoft Web N-gram Services public beta now is being extended beyond professors at accredited universities to include all researchers worldwide, provided they are using the service for non-commercial purposes. The service now also includes a predictive API in support of query-language models. By opening the service up to more researchers and making these important service enhancements, Microsoft Web N-gram Services will expand not only its audience, but also access to high-quality feedback.
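For readers new to n-gram models, the idea behind predicting the next word of a query can be sketched in a few lines of Python. This is a toy illustration only: it does not call the Microsoft Web N-gram Services or reproduce their API, the function names `train_bigrams` and `predict_next` are hypothetical, and the four sample queries are invented for the example. It assumes the simplest case, a bigram model with no smoothing.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def train_bigrams(queries: Iterable[str]) -> Dict[str, Counter]:
    """Count how often each word follows each other word.

    Queries are lowercased and split on whitespace; "<s>" marks the
    start of a query so that first words can be predicted as well.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for query in queries:
        tokens = ["<s>"] + query.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(
    counts: Dict[str, Counter], prev_word: str, k: int = 3
) -> List[Tuple[str, float]]:
    """Return up to k likely next words with their conditional probabilities.

    The probability is the unsmoothed estimate count(prev, next) / count(prev).
    """
    following = counts.get(prev_word.lower())
    if not following:
        return []  # word never seen as a predecessor
    total = sum(following.values())
    return [(word, n / total) for word, n in following.most_common(k)]


if __name__ == "__main__":
    sample_queries = [
        "new york weather",
        "new york times",
        "new york times crossword",
        "new car prices",
    ]
    model = train_bigrams(sample_queries)
    print(predict_next(model, "new", k=2))
    print(predict_next(model, "york"))
```

A web-scale service differs mainly in the size of the counts and in the use of higher-order n-grams and smoothing, which is what makes hosted access to real-world data valuable to researchers.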
In the video below, Kuansan Wang, principal researcher at Microsoft Research Redmond, offers a more detailed explanation of Microsoft Web N-gram Services. Wang works with a team focused on developing technologies that provide a better understanding of human languages.
Professional gatherings such as the Web N-gram workshop during SIGIR 2010 serve as another important channel for using real-world expertise to enhance ongoing development of Microsoft Web N-gram Services. Research papers, selected by an international program committee, will be presented during the workshop and will be followed by discussions about the use of web-based data services for research. Workshops and other gatherings have been critical to the development of Microsoft Web N-gram Services from the beginning. After the expansion of beta availability announced during the International World Wide Web Conference in April 2010, for example, many researchers took advantage of the opportunity to work with the services. One such researcher, Li Ding of Rensselaer Polytechnic Institute, has his work on multiword tag clouds featured in this demo.
In addition to presentations, the workshop will include a panel discussion on issues related to query representation, including a rigorous definition of the task, modeling for the task, challenges and opportunities, implications for industrial research, and future research directions.
If you are attending SIGIR 2010, I cordially invite you to attend the workshop, at 9 a.m. on July 23, and take advantage of this opportunity to share your perspectives and connect with other researchers in the field. To stay updated and to learn about opportunities to participate in ongoing development, please visit the Microsoft Web N-gram Services home page.
Evelyne Viegas, senior research program manager, Microsoft External Research, a division of Microsoft Research
In even-numbered years, North America's Computing Research Association (CRA) gathers computer-science department heads, deans, provosts, and major computer-science funding agencies at the Snowbird Resort and Conference Center in the Wasatch Mountains, not far from Salt Lake City. Hot on the heels of Microsoft Research's Faculty Summit, the Snowbird Conference occurred July 18-20. The insight shared during sessions on statistics, trends, and the best ways to communicate computer science is applicable to the field as a whole, as are other ideas addressed in sessions held during the conference, including:
- A Call to Action: Peter Harsha, who represents CRA in Washington, D.C., led a session providing an inside view of how the legislative process can affect the funding of computer-science research. He also explained the role the Computer Research Advocacy Network plays in ensuring that elected officials receive targeted, timely communications.
- Understanding the Ranking of Graduate Programs: Charlotte Kuh of the National Research Council gave a progress report on a survey to update the 1995 database of Ph.D. rankings. The session chair, Jim Kurose of the University of Massachusetts Amherst, outlined the impact the CRA had in ensuring that its data included conference papers and citations in the computer-science field.
- Computer Games: Michael Mateas of the University of California, Santa Cruz made a case for graduate research in game design and development, presenting an array of research areas important to the industry, including artificial intelligence, procedural generation, and interactive narratives. Donald Brinkman of Microsoft Research External Research presented educational game-related activities such as Kodu and the game-themed programming approach, outlining Microsoft's near-term plans to drive next-generation educational games.
- Social Good: Lakshminarayanan Subramanian of New York University led a discussion on the potential computer-science departments have to promote social advancements through global initiatives. Examples included high-speed, point-to-point, solar-powered Wi-Fi and the use of technology to detect counterfeit currency, prescriptions, and other documents.
- Basic Computing Knowledge: Andy van Dam from Brown University presented the findings from the CRA Education Committee on trends critical to the future of computer science, including diversity, pipeline issues, and general apathy toward the field of computer science. The report, two years in the making, details best practices to introduce students to computational thinking, to address computer-science curricula, and to identify and develop cognitive, mastery, and research skills.
- Communicating Computer Science, The Hot Under the Cool: Chaired by Judith Bishop, director of Computer Science within Microsoft Research External Research, this session explored how to communicate innovation in computer science to a world already overwhelmed by technical advancements. Other participants in the session included Shyno Chacko Pandeya from the New Image of Computing Initiative, which uses the Dot Diva brand to attract middle-school girls; Virginia Gold from the Association for Computing Machinery, who provided insight into the marketing aspects of the first Computer Science Education Week campaign; and Jon Kleinberg of Cornell University, who introduced his new book Networks, Crowds and Markets, co-written with Cornell colleague David Easley, which is aimed at large classes from all fields of study.
Microsoft Research is a full member of the CRA and the conference. Rico Malvar, managing director of Microsoft Research Redmond, provided new insight into the work of the association in promoting the interests of the members of the computing research community.
The conference was a tremendous opportunity to help support advancement of the CRA strategy and agenda, as well as network with computer-science thought leaders in North America.
Daron Green, general manager, External Research, a division of Microsoft Research