Microsoft Research Connections Blog

The Microsoft Research Connections blog shares stories of collaborations with computer scientists at academic and scientific institutions to advance technical innovations in computing, as well as related events, scholarships, and fellowships.

    Building a Better Speller: Bing and Microsoft Research Offer Prizes for Best Search Engine Spelling Alteration Services


    When you type a word or phrase into a search engine, there are a number of things that could go wrong. You might not know how a term is spelled or, in your rush to jump to the results, you could transpose or otherwise mistype some characters.

    Spelling alteration is a popular search technique used to translate apparent typographical errors, alternative spellings, and synonyms into an improved query that returns the best possible results on the first try.

    But this approach is not without its pitfalls. You might correctly enter a word that is not widely used but has a neighbor in the dictionary that is much more popular on the Internet. One person's spelling error could be another's perfect query. Which results should the search engine provide, and how should any useful alternative searches be represented?

    That's the task being offered to researchers and students around the world in the Speller Challenge, presented by Microsoft Research in partnership with Bing. The goal is to develop a spelling alteration system suitable for large-scale statistical data mining-based web search.

    A common approach to spelling alteration is the noisy channel model, in which the received query (q) is treated as a noise-corrupted version of the target query (c). In this model, the spelling alteration system chooses the c that is most probable given q, alters q into c, and returns the latter's results. By Bayes' rule, that choice depends on two statistics: how likely c is as a query in the first place, and how likely c is to be corrupted into q. The techniques to best identify query/target pairs and best estimate these statistics are the active research problem that underlies this challenge.
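
    The noisy channel idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by Bing or by any challenge entrant: the query counts are made-up numbers standing in for a web-scale language model, and the error model (a fixed penalty per edit) and the names `QUERY_COUNTS`, `log_channel`, and `correct` are hypothetical.

```python
import math

# Made-up counts standing in for a web-scale language model (assumption).
QUERY_COUNTS = {"weather": 900, "whether": 300, "wether": 2, "leather": 100}
TOTAL = sum(QUERY_COUNTS.values())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def log_prior(c: str) -> float:
    """log P(c): how likely the target query is in the first place."""
    return math.log(QUERY_COUNTS[c] / TOTAL)


def log_channel(q: str, c: str, error_rate: float = 0.05) -> float:
    """log P(q | c) under a hypothetical model: each edit costs a factor of error_rate."""
    return edit_distance(q, c) * math.log(error_rate)


def correct(q: str, max_edits: int = 2) -> str:
    """Return the candidate c that maximizes log P(c) + log P(q | c)."""
    candidates = [c for c in QUERY_COUNTS if edit_distance(q, c) <= max_edits]
    if not candidates:
        return q  # nothing plausible nearby: leave the query unchanged
    return max(candidates, key=lambda c: log_prior(c) + log_channel(q, c))
```

    With these toy numbers, `correct("wether")` picks "weather" even though "wether" is itself in the table, because the popular neighbor's prior outweighs the one-edit penalty. That is the same trade-off between rare but valid words and popular neighbors that any real estimate of these statistics has to manage.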

    But that's just the foundation. Place the spelling alteration task in the context of web search, and you have another dimension to consider. For a lot of spelling applications, target queries are assumed to be composed of tokens (i.e., words and phrases) that are drawn from a predetermined vocabulary. The limited effectiveness of a fixed lexicon is a known problem: it can lead the speller not only to miss "real word" errors but also to misrecognize out-of-vocabulary tokens as errors.

    In the context of search, the scale of the web magnifies this problem considerably. The challenge is therefore not necessarily to alter queries to conform to a specific dictionary of words and phrases, but rather to provide relevant documents that have high matching scores in ranking.

    If this sounds like the type of problem you (or the search developer in your life) would enjoy solving, the task is to build the best speller web service that proposes the most plausible spelling alternatives for a wide range of search queries. Spellers are encouraged to take advantage of cloud computing and must be submitted to the challenge in the form of REST (Representational State Transfer) web services.
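
    A minimal sketch of what a REST speller endpoint might look like, using only Python's standard library. The URL path, the `q` parameter name, the tab-separated response format, and the `propose_alternatives` helper are all assumptions made for illustration; the challenge's actual interface specification is not given in this post.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def propose_alternatives(query):
    """Hypothetical stand-in for a real speller: (alternative, probability) pairs."""
    known = {
        "wether forecast": [("weather forecast", 0.9), ("wether forecast", 0.1)],
    }
    return known.get(query, [(query, 1.0)])


def render_response(path):
    """Turn a request path such as '/speller?q=...' into (status, body)."""
    params = parse_qs(urlparse(path).query)
    if "q" not in params:
        return 400, "missing required parameter: q\n"
    alternatives = propose_alternatives(params["q"][0])
    lines = [f"{alt}\t{prob:.3f}" for alt, prob in alternatives]
    return 200, "\n".join(lines) + "\n"


class SpellerHandler(BaseHTTPRequestHandler):
    """Answers GET requests with one alternative per line as plain text."""

    def do_GET(self):
        status, body = render_response(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the service; blocks until interrupted."""
    HTTPServer(("localhost", port), SpellerHandler).serve_forever()
```

    Calling `serve()` and then requesting `/speller?q=wether+forecast` would return the two alternatives with their probabilities. Keeping `render_response` separate from the handler makes the query logic testable without opening a socket.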

    For the purpose of the Speller Challenge, a development dataset (derived from the publicly available TREC queries that are based on the 2008 Million Query Track) will be made available to the public through the Microsoft Web N-gram service. This TREC Evaluation Dataset is annotated by using the same guidelines and processes as in the creation of the Bing Test Dataset, which is the dataset used to select the winners.

    The top five competitors will receive the following prizes:

    First place US$10,000
    Second place US$8,000
    Third place US$6,000
    Fourth place US$4,000
    Fifth place US$2,000

     —Evelyne Viegas, Director of Semantic Computing for the External Research division of Microsoft Research

    How Microsoft Technology and Research Are Helping Create a Clearer Picture of HIV


    Nearly 30 years after its discovery, the human immunodeficiency virus (HIV) remains difficult to pin down because it mutates so rapidly, a trait that has so far made an effective vaccine for this often-deadly condition impossible to develop.

    That might change, thanks in part to new Microsoft tools that are being used to construct maps of the mutating virus, which may in turn help identify prospective vaccine candidates.

    HIV mutates at such a high rate that the virus is distinct for each individual patient. The level of viral variation in one HIV patient is comparable to the worldwide level of variation during the course of an influenza epidemic.

    The PhyloD Viewer draws proteins as circles to reveal mutation patterns
    that could aid in HIV vaccine design.

    A first step in overcoming this challenge is to identify consistent patterns in viral adaptation. Tools such as PhyloD, PhyloD Viewer, and Phylo Detective can be used to identify and visualize HIV covariation and adaptation. By identifying patterns and constraints in HIV evolution, scientists are able to focus on HIV's weaknesses, with the goal of designing a vaccine that remains effective as HIV mutates. The arcs in the circle pictured above, drawn with the PhyloD Viewer, represent how HIV in a single patient is connected to itself as parts of it mutate.

    This research delivered a statistical approach that could help further research into HIV mutation. It also led to the observation that patterns of HIV evolution are broadly predictable based on host immunogenetic profiles. In other words, we found a promising consistency in the way that HIV adapts to the human immune response, which could pave the way for vaccine design. 

    It's worth noting that this work is built on the Microsoft Biology Foundation, which provides consistent file formats, statistical packages, and resources to farm out computations to clusters of machines—permitting scientists to focus on the science of modeling the virus and identifying its vulnerabilities.

    —Jonathan Carlson, Researcher for Microsoft Research, eScience

    Multicore Workshop Attendees Work to Integrate Software and Hardware for Optimal Performance and New Applications


    Attendees at the Second Barcelona Multicore Workshop

    The latest innovations in multicore technology are meaningless if the software you run is not written to take advantage of the advanced hardware design. To help address this and other issues, attendees at the Second Barcelona Multicore Workshop (BMW) met October 21-22, 2010, to critically examine developments in computer chip technology in the two years since the highly successful 2008 workshop.

    Today, sequential chips are almost entirely superseded by multicore processors. The hardware community is focused on designing these processors to maximize the potential performance. Meanwhile, software developers need to know how best to program for machines that use this multicore technology, particularly when it is used for desktop workloads or on mobile devices rather than traditional scientific applications.

    To help understand and address these concerns in a multidisciplinary manner, representatives from Barcelona Supercomputing Center, HiPEAC, and Microsoft Research, along with academics and researchers from Europe, Asia, and the United States, met and cross-fertilized ideas across the hardware and software communities. Many participants report that the workshop sparked new plans for collaboration, including company partnerships with academia and the sharing of valuable tools and ideas.

    Among the key discussions were:

    • Parallel programming models for the Barrelfish research operating system that Microsoft Research has developed with ETH Zurich in Switzerland. Barrelfish treats the internals of a multi-processor machine as a distributed system: Each core runs independently, and they communicate via message passing. Project leaders are working with Barcelona Supercomputing Center to use their experience with the StarSs programming model to write parallel programs that run on Barrelfish.
    • How developers are using low-power vector processors to apply ideas originally developed for high-performance computing to applications such as face and speech recognition, machine learning, and column-store databases that might run in the cloud or on future mobile devices.
    • Whether the software industry is ready for multicore and heterogeneous hardware trends, the subject of the panel "Can Software Keep Up with the Pace of Hardware Development?" One major discussion explored whether processor designers could help by focusing less on specific applications and single-use benchmarks and more on the operating system and the need for hardware to efficiently support many different processes on the machine at the same time. "There is a growing sense among those who do research into system software that computer architects, those who design processors and other system components, need to change their focus," reports Timothy Roscoe of ETH Zurich. "This is partly because many commercially important workloads are now OS-intensive, and some current processor designs incur a high overhead when switching to kernel mode, and partly because as chips become more parallel, the need to coordinate multiple tasks and communicate between multiple applications on cores becomes a key performance bottleneck."
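
    The message-passing style described for Barrelfish in the first item above can be illustrated with a small Python sketch. This is not Barrelfish code and does not use the StarSs programming model: threads and queues stand in for cores and their communication channels, and the names `core_worker` and `run`, along with the "add"/"report" message format, are hypothetical. The point it shows is that each simulated core keeps private state and interacts with the others only through messages.

```python
import queue
import threading


def core_worker(core_id, inbox, outbox):
    """Simulated core: owns private state and reacts only to messages."""
    local_total = 0  # private to this worker, never shared directly
    while True:
        msg = inbox.get()
        if msg[0] == "add":
            local_total += msg[1]
        elif msg[0] == "report":
            outbox.put((core_id, local_total))
            return


def run(values, num_cores=4):
    """Distribute values round-robin by message and collect partial sums."""
    inboxes = [queue.Queue() for _ in range(num_cores)]
    results = queue.Queue()
    workers = [
        threading.Thread(target=core_worker, args=(i, inboxes[i], results))
        for i in range(num_cores)
    ]
    for worker in workers:
        worker.start()
    for i, value in enumerate(values):
        inboxes[i % num_cores].put(("add", value))
    for inbox in inboxes:
        inbox.put(("report",))  # ask each core for its result, then it exits
    for worker in workers:
        worker.join()
    partials = dict(results.get() for _ in range(num_cores))
    return sum(partials.values()), partials
```

    Calling `run(range(1, 101))` returns the overall sum together with one partial sum per simulated core. No lock is needed around `local_total` because no other thread ever touches it; coordination happens entirely through the queues.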

    —Tim Harris, Senior Researcher, System and Networking Group at Microsoft Research Cambridge
