Spelling correction in PubMed

One of my hobbies is bioinformatics and the cool ways computers can help us understand biology, so I was intrigued when I saw this notice from the CLIP Colloquium announcing a talk by John Wilbur of the National Library of Medicine, "Spelling Correction in the PubMed Search Engine":

It is known that users of internet search engines often enter queries with misspellings in one or more search terms. Several web search engines make suggestions for correcting misspelled words, but the methods used are proprietary and unpublished to our knowledge. Here I will describe the methodology we have developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and makes use of statistics harvested from user logs to estimate the probabilities of different types of edits that lead to misspellings. The unique problems encountered in correcting search engine queries will be discussed and our solutions outlined.
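To make the noisy channel idea a bit more concrete, here is a minimal sketch of how such a corrector might score candidates: pick the intended word c that maximizes P(c) × P(typed | c). This is not the PubMed implementation, which the abstract does not spell out. The vocabulary counts, the per-edit probabilities, and the names `TERM_COUNTS`, `EDIT_PROB`, `NO_ERROR_PROB`, `single_edits`, and `correct` are all made up for illustration; a real system would estimate the edit probabilities from user logs, as the abstract describes.

```python
from collections import Counter

# Hypothetical term counts standing in for the prior P(c).
TERM_COUNTS = Counter({"protein": 900, "proteins": 400, "kinase": 300, "protean": 5})

# Hypothetical probabilities of each kind of typing error, standing in for
# statistics that would be harvested from query logs.
EDIT_PROB = {"delete": 0.004, "insert": 0.003, "substitute": 0.002, "transpose": 0.001}
NO_ERROR_PROB = 0.95  # assumed chance that a word was typed correctly

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_edits(typed):
    """Yield (candidate, error_type) pairs, where error_type names the
    single typing error that would turn candidate into `typed`."""
    splits = [(typed[:i], typed[i:]) for i in range(len(typed) + 1)]
    for left, right in splits:
        for ch in ALPHABET:
            # The user dropped a letter, so put one back.
            yield left + ch + right, "delete"
            if right:
                # The user hit a wrong key, so swap it for another letter.
                yield left + ch + right[1:], "substitute"
        if right:
            # The user typed an extra letter, so remove it.
            yield left + right[1:], "insert"
        if len(right) > 1:
            # The user swapped two adjacent letters, so swap them back.
            yield left + right[1] + right[0] + right[2:], "transpose"


def correct(typed):
    """Return the known term maximizing P(c) * P(typed | c), or `typed`
    unchanged if no known term is within one edit."""
    total = sum(TERM_COUNTS.values())
    scores = {}
    if typed in TERM_COUNTS:
        scores[typed] = TERM_COUNTS[typed] / total * NO_ERROR_PROB
    for candidate, error_type in single_edits(typed):
        if candidate in TERM_COUNTS and candidate != typed:
            score = TERM_COUNTS[candidate] / total * EDIT_PROB[error_type]
            scores[candidate] = max(scores.get(candidate, 0.0), score)
    return max(scores, key=scores.get) if scores else typed


print(correct("protien"))
print(correct("kinse"))
```

The sketch only considers candidates one edit away and treats each query word in isolation. Real queries bring extra wrinkles (multi-word phrases, rare but valid terms such as gene names), which is presumably part of what the talk means by the "unique problems" of correcting search engine queries.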

About the Speaker

John Wilbur is a Senior Scientist in the Computational Biology Branch of the National Center for Biotechnology Information. He is a principal investigator leading a research group in the study and development of statistical text processing algorithms. While at NCBI he has developed the algorithm that produces PubMed related documents and the algorithm that in PubMed allows fuzzy phrase matching. Most recently his group has developed algorithms for phrase identification in natural language text that are used in NCBI's electronic textbook project and allow for easy reference from MEDLINE documents to related textbook material. He has a strong interest in machine learning and natural language processing techniques and a focus of current research is improvements in named entity recognition in the field of molecular biology and medicine.

Published 27 March 06 02:53 by sprague
