    Cast Your Spell on Spelling


    As some of you may recall, in December 2010, Microsoft Research and Bing jointly announced the Speller Challenge, the first ever Microsoft Research-Bing contest, which enticed participants to grapple with the issue of spelling correction of web search queries. Participants vied to build a speller that would propose the most plausible spelling alternatives for each search query. As a follow-up, we now invite the community to submit papers to the upcoming Spelling Alteration for Web Search Workshop, which will take place on July 19, 2011, at the Microsoft Bing Headquarters in Bellevue, Washington.
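
    To make the task concrete, here is a minimal sketch of a speller that proposes ranked alternatives for a single query term. It is not the approach of any Challenge entrant: the tiny word-count table, the `suggest` function, and the single-edit candidate generation are all illustrative assumptions, and a real system would draw its counts from web-scale data and handle multi-word queries.

```python
from collections import Counter

# Hypothetical toy vocabulary with made-up counts (assumption, not real data).
WORD_COUNTS = Counter(
    {"weather": 50, "whether": 30, "wether": 1, "seattle": 40, "search": 60}
)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string that is one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [
        left + ch + right[1:] for left, right in splits if right for ch in ALPHABET
    ]
    inserts = [left + ch + right for left, right in splits for ch in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def suggest(word, k=3):
    """Return up to k (candidate, probability) pairs, most plausible first.

    Probabilities are the candidates' counts, normalized over all known
    candidates within one edit of the input.
    """
    candidates = {w for w in edits1(word) | {word} if w in WORD_COUNTS}
    if not candidates:
        # Nothing known nearby: leave the term unchanged.
        return [(word, 1.0)]
    total = sum(WORD_COUNTS[w] for w in candidates)
    ranked = sorted(candidates, key=lambda w: (-WORD_COUNTS[w], w))
    return [(w, WORD_COUNTS[w] / total) for w in ranked[:k]]


if __name__ == "__main__":
    for term in ("wether", "seatle", "zzzz"):
        print(term, suggest(term))
```

    Ranking by raw frequency is the simplest possible choice; it ignores how likely each particular typo is and the context supplied by the rest of the query, which is exactly where web-scale approaches have room to differ.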

    The Spelling Alteration for Web Search Workshop addresses the challenges of web-scale natural language processing, with a focus on search-query spelling correction. The workshop will include a prize ceremony for the winners of the Speller Challenge and will provide a forum for participants to exchange ideas and share experience, encouraging community discussions about future research directions for spelling alteration in web search. The workshop website provides details of the workshop, including information on data services and resources, submission formats, and deadlines.

    Although the workshop is not limited to the Speller Challenge participants, we encourage the inclusion of the on-demand evaluation web application (for those who submitted theirs before registration closed) and the referenced testing dataset in prospective workshop submissions. These inclusions will facilitate the comparison of different systems and approaches against the same benchmark. We also ask that submitted systems be made publicly accessible throughout the workshop, to enable live demonstrations.

    The deadline for workshop submissions is June 25, so time is of the essence. If you are interested, submit your paper online.

    Evelyne Viegas, Director of Semantic Computing, Microsoft Research Connections

    Arming the Immune System Against HIV


    Learn how the immune system fights HIV-infected cells

    In the now decades-long battle against HIV and AIDS, researchers have been stymied by the virus’s ability to evade attacks by our immune system.

    Normally, a cell that is infected by a pathogen displays on its surface characteristic pieces of the pathogen. These peptides, known as epitopes, are recognized by the body’s immune system and trigger immune responses. HIV-infected cells produce these epitopes, but because HIV mutates so readily, so do the epitopes, leading to an ongoing struggle between our immune system and HIV. Therefore, scientists have been eager to gain a better understanding of the process of epitope production in HIV-infected individuals, in the hope that such knowledge could lead to ways to beat HIV at this game.

    Recent work on the degradation of HIV proteins in infected cells has provided new insight into the process of HIV epitope presentation. This important research was conducted by a team at the Ragon Institute (a joint venture of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard University), with contributions from Carl Kadie and myself at the eScience Group of Microsoft Research. The paper was published online in the Journal of Clinical Investigation (JCI) on May 9, 2011, and will appear in the June issue of JCI’s print publication.

    The Ragon team, which was led by Sylvie Le Gall and included Estibaliz Lazaro, Pamela Stamegna, Shao Chong Zhang, Pauline Gourdain, Nicole Y. Lai, Mei Zhang, and Sergio A. Martinez, examined the stability of short HIV peptides in the cytosol of human cells. They discovered that the stability of these HIV-derived peptides is extremely variable: some degraded within seconds, while others remained largely intact after an hour.

    The Ragon team observed that peptide stability is crucial to determining how much of the epitope will be displayed on the cell surface: the less peptide degradation in the cytosol, the more epitope will be present on the surface of an infected cell.

    Carl and I then performed a computational analysis of the residues of 166 tested HIV peptides, looking for specific biochemical features that characterized stable and unstable peptides. This enabled us to identify multiple motifs or patterns that allow us to predict how stable or unstable a given epitope will be. A prediction tool based on our findings is available online.

    So, what is the value in predicting epitope stability? To answer that question, we first need to know that some researchers believe HIV has both protective and non-protective epitopes. When infected cells are attacked by the immune system, protective epitopes force the virus to mutate into a version that will not survive, protecting an individual against chronic HIV infection. Non-protective epitopes, in contrast, do not induce a protective immune response. We also need to understand that epitopes are cross-reactive, which means that when the immune system learns to fight a specific epitope, it can also recognize and attack similar epitopes. Suppose, therefore, we know that HIV epitope X is protective but is also unstable. That means it won’t be produced in large quantities and thus will likely not be a successful target for an immune response. But if by using the results of our research we could identify an epitope X-prime that is cross-reactive to X but more stable, we could then develop a vaccine based on X-prime that would yield a strong immune response to both X-prime and X. This would enable the immune system to more effectively attack HIV-infected cells that express the X epitope and thereby weaken the virus.

    Our collaboration with the Ragon Institute has uncovered a path that will help us better present pieces of HIV to activate the immune system and thus, we hope, design an effective vaccine against HIV. The collaboration has been extremely productive and rewarding.

    David Heckerman, Senior Director, eScience Research Group, Microsoft Research

    Paradigm Shifting in Scholarly Communications


    Earlier this year, Lee Dirks, Cheri Ekholm, and I attended Phil Bourne’s Beyond the PDF workshop at the University of California, San Diego. This workshop advanced the premise that scholarly communication can and should evolve from static and disparate data and knowledge representation, as embodied in today’s typical PDF representations of research papers, to rich, integrated content that grows and changes the more we learn. In the few months since this event, there’s been a great deal of activity: Martin Fenner and Mark Hahnel are working on Wordpress for Scientists, Peter Sefton has launched the Scholarly HTML effort, and our team here at Microsoft Research is hard at work on some great new features for our next release of the Article Authoring Add-in for Word (the beta release of version 3.0 is due later this year).

    But even more important than developing tools or formats, we need to persuade the current generation of researchers to challenge the status quo. Scholars do see the value (to varying degrees) in sharing a greater level of detail of their research as a part of the scholarly communication process, recognizing that such communications enhance the scientific record and accelerate discovery. But active researchers are also players in a system that rewards the traditional research paper format to the exclusion of all else.

    A major thread in this conversation stresses the enormous potential of shared research data in facilitating experimental reproducibility and validation. Much ink has been spilled in the past few years on the “data deluge” and the promise of new advances in science that are not based on the traditional hypothesis-experiment-analysis-conclusion paradigm, but rather start with previously unseen patterns, anomalies, or correlations within the existing wealth of collected data themselves as the catalyst for new investigations and experiments.

    To this end, it is clear that the sharing of scientific research data holds great promise for the scientific discoveries of the future. And yet the system of academic research achievement does not yet recognize or reward researchers for sharing their data. Change is afoot, however, and the next generation will look back on this decade as one of profound transformation in determining which parts of the scientific research process are recorded and how researchers are rewarded.

    In the meantime, organizations like BioMed Central are stepping up to recognize those researchers who are in the vanguard of this movement. Microsoft Research has contributed to BioMed Central’s Research Awards for several years, and we have been proud sponsors of the Open Data Award since its inception in 2010. This year’s award recognized biologist Tommi Nyman from Finland for the article “How common is ecological speciation in plant-feeding insects? A 'Higher' Nematinae perspective,” published in the open-access journal BMC Evolutionary Biology.

    Dr. Peter Murray-Rust (left); Jean-Luc Bouvé (center), accepting the Open Data Award
    on behalf of Dr. Tommi Nyman; and Alex Wade (right)

    Dr. Nyman and his colleagues published three additional data files with their article:

    • The collection data for their specimens and taxonomic and ecological background information
    • The sequence data used in phylogeny reconstruction and resultant phylogenetic trees
    • The data file and run parameters for Bayesian Evolutionary Analysis Sampling Trees (BEAST)

    The data are well labeled and readily understandable by other scientists; moreover, the authors showed great transparency in their work, particularly in their first additional data file, which fully documents how they sampled their insects. This level of openness is not commonly seen and it demonstrates real leadership. Their article serves as an outstanding example of how evolutionary biology research should be presented and the data published to enable other scientists to validate and build on the work.

    Microsoft Research is honored to be an ongoing sponsor of the Open Data Award, and we are thrilled to be able to play a role in encouraging the Open Data movement in this way.

    Alex D. Wade, Director for Scholarly Communication, Microsoft Research Connections

