When you type a word or phrase into a search engine, there are a number of things that could go wrong. You might not know how a term is spelled or, in your rush to jump to the results, you could transpose or otherwise mistype some characters.
Spelling alteration is a popular search technique used to translate apparent typographical errors, alternative spellings, and synonyms into an improved query that returns the best possible results on the first try.
But this approach is not without its pitfalls. You might correctly enter a word that's not widely used but has a dictionary neighbor that's much more popular on the Internet. One person's spelling error could be another's perfect query. Which results should the search engine provide, and how should any useful alternative searches be presented?
That's the task being offered to researchers and students around the world in the Speller Challenge, presented by Microsoft Research in partnership with Bing. The goal is to develop a spelling alteration system suitable for large-scale web search based on statistical data mining.
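A common approach to spelling alteration is the noisy channel model, in which the received query (q) is treated as a noise-corrupted version of the target query (c). In this model, the spelling alteration system alters q into the most probable c and returns the latter's results. By Bayes' rule, that means choosing the c that maximizes P(c | q), which is proportional to P(q | c) × P(c): an error model describing how a target gets corrupted into the typed query, multiplied by a prior on the target itself. The techniques to best identify query/target pairs and best estimate these statistics are the active research problem that underlies this challenge.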
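The noisy channel decision rule (pick the target c that maximizes P(q | c) × P(c)) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the challenge's method: QUERY_COUNTS is a hypothetical query log standing in for the prior P(c), candidates are limited to strings one edit away, and the error model simply assigns a fixed, assumed probability (TYPO_RATE) to any single edit. A real web-scale speller would estimate both models from far larger data.

```python
import math
from collections import Counter

# Hypothetical query log used as the prior P(c); a real system would draw on
# web-scale statistics such as n-gram counts.
QUERY_COUNTS = Counter({
    "britney spears": 900,
    "brittany spears": 40,
    "new york": 1000,
    "new yolk": 1,
})
TYPO_RATE = 0.05  # assumed probability that the user made exactly one edit
LETTERS = "abcdefghijklmnopqrstuvwxyz "


def edits1(s):
    """All strings one deletion, transposition, substitution, or insertion away from s."""
    splits = [(s[:i], s[i:]) for i in range(len(s) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    substitutes = [a + ch + b[1:] for a, b in splits if b for ch in LETTERS]
    inserts = [a + ch + b for a, b in splits for ch in LETTERS]
    return set(deletes + transposes + substitutes + inserts)


def log_prior(c):
    """log P(c): add-one smoothed frequency, so unseen queries keep nonzero mass."""
    total = sum(QUERY_COUNTS.values()) + len(QUERY_COUNTS) + 1
    return math.log((QUERY_COUNTS[c] + 1) / total)


def log_channel(q, c):
    """log P(q | c): crude error model, identity versus exactly one edit."""
    return math.log(1 - TYPO_RATE) if q == c else math.log(TYPO_RATE)


def correct(q):
    """Return the candidate c maximizing log P(q | c) + log P(c)."""
    candidates = {q} | {c for c in edits1(q) if c in QUERY_COUNTS}
    return max(candidates, key=lambda c: log_channel(q, c) + log_prior(c))
```

With this data, correct("new yolk") returns "new york", while a query with no popular neighbor nearby, such as "xyzzy", passes through unchanged. The first case also shows the pitfall described above: someone who really meant "new yolk" would be overruled by the more popular neighbor.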
But that's just the foundation. Place the spelling alteration task in the context of web search, and you have another dimension to consider. For a lot of spelling applications, target queries are assumed to be composed of tokens (i.e., words and phrases) drawn from a predetermined vocabulary. Relying on a fixed lexicon is a known problem because it can cause the speller not only to miss "real word" errors but also to misrecognize out-of-vocabulary tokens as errors.
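Both failure modes of a fixed lexicon are easy to reproduce. The sketch below uses a tiny, hypothetical LEXICON and a hypothetical flag_errors helper that flags any token not in it; it is only an illustration of the problem, not how any real speller works.

```python
# Hypothetical fixed lexicon standing in for a predetermined vocabulary.
LEXICON = {"new", "york", "yolk", "search", "engine", "spears"}


def flag_errors(query):
    """Return the tokens a fixed-lexicon checker would flag as misspellings."""
    return [tok for tok in query.lower().split() if tok not in LEXICON]


# "new yolk" is a likely real-word error, but every token is in the lexicon.
missed = flag_errors("new yolk")  # []
# "bing" is a valid out-of-vocabulary name, but it gets flagged anyway.
false_alarm = flag_errors("bing search engine")  # ['bing']
```

On the web, new names, products, and slang appear constantly, so the second failure grows with the size of the corpus.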
In the context of search, the scale of the web magnifies this problem considerably. The challenge is therefore not necessarily to alter queries to conform to a specific dictionary of words and phrases, but rather to surface relevant documents that receive high matching scores in ranking.
If this sounds like the type of problem you (or the search developer in your life) would enjoy solving, here is the task: build the best speller web service, one that proposes the most plausible spelling alternatives for a wide range of search queries. Spellers are encouraged to take advantage of cloud computing and must be submitted to the challenge in the form of REST (Representational State Transfer) web services.
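Since entries take the form of REST web services, here is a minimal sketch of what such an endpoint could look like using only Python's standard library. The request shape (a GET with a q parameter) and the response format (one tab-separated alternative and probability per line) are assumptions for illustration, not the challenge's published interface, and ALTERNATIVES is a hypothetical stand-in for a real speller model.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical lookup table standing in for a real speller model.
ALTERNATIVES = {
    "new yolk": [("new york", 0.9), ("new yolk", 0.1)],
}


def propose(query):
    """Return (alternative, probability) pairs; unknown queries pass through unchanged."""
    return ALTERNATIVES.get(query, [(query, 1.0)])


def handle_request(path):
    """Map a request path such as '/?q=new+yolk' to an HTTP status and a response body."""
    params = parse_qs(urlparse(path).query)
    if "q" not in params:
        return 400, "missing q parameter\n"
    query = params["q"][0].strip().lower()
    lines = [f"{alt}\t{prob:.3f}" for alt, prob in propose(query)]
    return 200, "\n".join(lines) + "\n"


class SpellerHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper that delegates all speller logic to handle_request."""

    def do_GET(self):
        status, body = handle_request(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="localhost", port=8080):
    """Block forever, answering speller requests on host:port."""
    HTTPServer((host, port), SpellerHandler).serve_forever()
```

After calling serve(), a request to http://localhost:8080/?q=new+yolk would return the two weighted alternatives. Keeping handle_request free of HTTP machinery makes the speller logic testable without starting a server, and the same function could sit behind a cloud-hosted front end.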
For the purpose of the Speller Challenge, a development dataset (derived from publicly available TREC queries from the 2008 Million Query Track) will be made available to the public through the Microsoft Web N-gram service. This TREC Evaluation Dataset is annotated using the same guidelines and processes used to create the Bing Test Dataset, which is the dataset used to select the winners.
The top five competitors will receive the following prizes:
—Evelyne Viegas, Director of Semantic Computing for the External Research division of Microsoft Research
As computer-science researchers, we at Microsoft Research are committed to strong computer-science education programs. With more than 800 researchers worldwide, we know firsthand the value of a solid computer-science education, which is why Microsoft Research is a proud supporter of the second annual Computer Science Education Week (CSEdWeek), celebrated this year from December 5-11 in the United States.
This week's focus on computer-science education couldn't be timelier. The recent report from the Association for Computing Machinery, Running on Empty, reveals that the state of computer-science education in the United States is alarming. Only nine states count computer-science courses as a core academic subject in high-school graduation requirements. Meanwhile, 1.4 million new computing jobs are projected by 2018, so we need more states to see the light and join in producing qualified students.
Making matters worse, funding cuts in local school districts have hit computer-science programs heavily. Statistics show that in many school districts, teachers who can teach computer science are being reassigned to mathematics or science classes. How can we as a nation afford to cut corners on this vital component of a 21st-century education?
CSEdWeek is a call to action. It's a rallying point for teachers, parents, and schools at all levels—from K-12 through college—to focus attention on this problem and to build more robust computer-science education programs throughout the United States. Corporations, too, must make their voices heard in support of policies and programs that advance the state of computer-science education. I'm proud that Microsoft is a supporter of CSEdWeek and a strong advocate of computer-science education.
At Microsoft Research, we have a number of tools designed for high-school students, showing them how scientific computing can be part of their daily lives—and fun, as well. Two of these are showcased on our associated website:
I encourage you to give them a try!
Recently, Chris Stephenson, executive director of the Computer Science Teachers Association, made a telling statement: "Effective computer-science education means far more than learning how to use a computer. It is about computational thinking: problem decomposition, data analysis, and solution design, all of which can be incorporated across disciplines and benefit students with interests outside of computer science. But we know that until we eliminate the roadblocks to quality computer-science education, we are denying access to important skills and future opportunities."
Echoing the same sentiments, Rick Rashid, senior vice president of Microsoft Research, said: "Today, more than ever, we need to empower students with the enthusiasm and creative problem-solving skills needed to address some of the world's greatest challenges, from improving healthcare to reducing our impact on the environment."
Do you want to do more to advance computer-science education? You can explore CSEdWeek resources, including tools, suggestions for celebrations, reports and statistics, lesson plans, and event listings across the United States and Canada. You also can sign a pledge in support of CSEdWeek. And please don't confine your activism to this week. Boosting the state of computer-science education in the United States is a 24/7/52 matter!
—Judith Bishop, Director of Computer Science for the External Research division of Microsoft Research
In case you missed it, there was a great deal of passion expressed last week about the state of computer science education in our society. Outreach efforts were under way, programs were highlighted, and a number of online discussions ensued. Overall, there was impressive growth in broad awareness and activity across the board compared with last year.
I decided to use the opportunity to spend a bit more dedicated time catching up on some online reports, material, and people.
I started with Alfred Thompson's blog. He writes one of the most widely read and highly respected blogs on computer science in K-12. A former high school teacher, Alfred is smart, funny, and honest, but most importantly he has a remarkable ability to appreciate the perspective of today’s youth, a solid understanding of pedagogy, and a passion and talent for computer science. His blog is stop #1, #2, and #3 for me on this topic.
Mark Guzdial's Computing Education Blog is usually my next stop. Mark’s comments tend to be more education-centric than Alfred’s broader technology posts. As a professor in the School of Interactive Computing at Georgia Institute of Technology, Mark sees firsthand the quality and quantity of students coming from our secondary school system. Mark is also very involved in the most active higher-education debates on computer science, and he frequently exchanges opinions and ideas with other influencers in the field.
I also took time to read the September 2010 update of "Rising Above the Gathering Storm," which is posted on the National Academies Press website. It is sobering, alarming, convincing, and motivating. This revision is appropriately subtitled: Rapidly Approaching Category 5.
Reflecting a bit, and at the risk of stating the obvious: the challenge is enormous and sometimes feels overwhelming, but it is also worth both support and action, even if the action seems small relative to the change needed.
It is extremely satisfying to work for Microsoft in this situation because I feel that we are working toward the public good in this area and that I am a contributing member of these efforts.
Microsoft supports thousands of people involved in outreach, including our own employees, who are frequent visitors and speakers at schools through a program called EduConnect, which enables Microsoft employees to share their knowledge and expertise with local school districts. We extend our outreach through the skills and enthusiasm of our Microsoft Student Partners—a program that recognizes top college students who are passionate about technology and communication, and equips them to share their computer know-how and enthusiasm.
We also try to motivate students through programs like the Imagine Cup and the upcoming Microsoft bliink 2011 web-design contest. Some students are more motivated by learning outside the classroom, and these programs encourage them to exercise both creativity and teamwork.
Obviously, our efforts would not be complete without connecting through social media, and I believe the Microsoft Tech Student effort is the best of the lot.
If you're a computer scientist, an IT professional, or simply a concerned citizen, I encourage you to get involved with your local schools and work to ensure that our students are getting the 21st-century education they need.
—Jim Pinkelman, Senior Director in the External Research division of Microsoft Research