When you type a word or phrase into a search engine, there are a number of things that could go wrong. You might not know how a term is spelled or, in your rush to jump to the results, you could transpose or otherwise mistype some characters.
Spelling alteration is a popular search technique used to translate apparent typographical errors, alternative spellings, and synonyms into an improved query that returns the best possible results on the first try.
But this approach is not without its pitfalls. You might enter a word correctly that's not widely used but has a neighbor in the dictionary that's much more popular on the Internet. One person's spelling error could be another's perfect query. Which results should the search engine provide, and how should any useful alternative searches be represented?
That's the task being offered to researchers and students around the world in the Speller Challenge, presented by Microsoft Research in partnership with Bing. The goal is to develop a spelling alteration system suitable for large-scale statistical data mining-based web search.
A common approach to spelling alteration is the noisy channel model, in which the received query (q) is treated as a noise-corrupted version of the target query (c). In this model, the spelling alteration system alters q into the most probable c and returns the latter's results. How best to identify query/target pairs and estimate the probabilities the model depends on is the active research problem that underlies this challenge.
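As a rough illustration of the noisy channel idea, the sketch below picks the target c that maximizes P(c) · P(q | c) for a received query q. Everything in it is an assumption made for illustration: the tiny query log, the edit-distance error model, and the names `QUERY_LOG`, `log_channel`, and `correct` are hypothetical and are not part of the Speller Challenge or any Bing component.

```python
import math
from collections import Counter

# Hypothetical toy query log standing in for web-scale query statistics.
QUERY_LOG = Counter({"weather": 50, "whether": 30, "wether": 1, "feather": 5})
TOTAL = sum(QUERY_LOG.values())


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def log_prior(c):
    """log P(c): how often the target query occurs in the log."""
    return math.log(QUERY_LOG[c] / TOTAL)


def log_channel(q, c, error_rate=0.05):
    """log P(q | c) under an assumed model where each edit costs error_rate."""
    return edit_distance(q, c) * math.log(error_rate)


def correct(q):
    """Return the logged query c that maximizes log P(c) + log P(q | c)."""
    return max(QUERY_LOG, key=lambda c: log_prior(c) + log_channel(q, c))


print(correct("wether"))   # a rare but valid word gets altered to a popular neighbor
print(correct("whether"))  # a popular query is left alone
```

Note that "wether" is itself a legitimate word in the toy log, yet the model still prefers its more popular neighbor. That is the pitfall described earlier: one person's spelling error could be another's perfect query.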
But that's just the foundation. Place the spelling alteration task in the context of web search, and you have another dimension to consider. For a lot of spelling applications, target queries are assumed to be composed of tokens (i.e., words and phrases) that are drawn from a predetermined vocabulary. Relying on a fixed lexicon is a known problem because it can lead the speller not only to miss "real word" errors but also to misrecognize out-of-vocabulary tokens as errors.
In the context of search, the scale of the web magnifies this problem considerably. The challenge is therefore not necessarily to alter queries to conform to a specific dictionary of words and phrases, but rather to provide relevant documents that have high matching scores in ranking.
If this sounds like the type of problem you (or the search developer in your life) would enjoy solving, the task is to build the best speller web service that proposes the most plausible spelling alternatives for a wide range of search queries. Spellers are encouraged to take advantage of cloud computing and must be submitted to the challenge in the form of REST (Representational State Transfer) web services.
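To make the REST requirement concrete, here is a minimal sketch of a speller exposed as a web service using only the Python standard library. The interface details are not given above, so the `q` query parameter, the tab-separated "candidate, probability" response, the port, and the names `propose_alternatives`, `render`, and `SpellerHandler` are all assumptions chosen for illustration, not the challenge's actual specification.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def propose_alternatives(query):
    """Placeholder speller: returns the query unchanged with probability 1.0.

    A real entry would return several (candidate, probability) pairs.
    """
    return [(query, 1.0)]


def render(candidates):
    """Format candidates as one 'text<TAB>probability' line each (assumed format)."""
    return "\n".join(f"{text}\t{prob:.4f}" for text, prob in candidates)


class SpellerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the query from a hypothetical ?q=... parameter.
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = render(propose_alternatives(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the service; blocks until interrupted."""
    HTTPServer(("", port), SpellerHandler).serve_forever()
```

Calling `serve()` and requesting a URL such as `http://localhost:8080/?q=wether` would return the placeholder speller's single candidate. Swapping in a real model only requires replacing `propose_alternatives`.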
For the purpose of the Speller Challenge, a development dataset (derived from the publicly available TREC queries that are based on the 2008 Million Query Track) will be made available to the public through the Microsoft Web N-gram service. This TREC Evaluation Dataset is annotated by using the same guidelines and processes as in the creation of the Bing Test Dataset, which is the dataset used to select the winners.
The top five competitors will receive the following prizes:
—Evelyne Viegas, Director of Semantic Computing for the External Research division of Microsoft Research
When a wildfire strikes, every second counts. Time lost can all too often be measured in lost life, deforestation, and property damage. Enter the Virtual Fire application, based on Microsoft Bing Maps, ESRI ArcGIS, and other software. This web geographic information system (GIS) platform is designed to support wildfire early warning, control, and civil protection by sharing information and tools produced by the Geography of Natural Disasters Laboratory at the University of the Aegean in Greece.
With these new tools, firefighting personnel, emergency crews, and other authorities can design an operational plan to contain the forest fire, pinpointing the best ways to put it out with new levels of precision. Fire management professionals can locate fire service vehicles and other resources online and in real time. Fire patrol aircraft use Global Positioning System (GPS) tracking and communications to send coordinates for each item to Virtual Fire, which depicts them on a web GIS. Cameras can augment this data by transmitting images of high-risk areas into the Virtual Fire system.
One of the compelling advantages of Virtual Fire is that it enables fire management professionals to take advantage of GIS capabilities without extensive training on complicated GIS applications. The platform enables end-users to query the databases and get answers immediately, locate points of interest in high-resolution satellite images, and download information to their portable computers or GPS devices.
But the Virtual Fire application offers services beyond simple coordination of emergency efforts. Remote automatic weather stations and a weather forecasting system based on the SKIRON weather model (developed by the Atmospheric Modeling and Weather Forecasting group at the University of Athens) provide crucial data needed for fire prevention and early warning. Virtual Fire provides a geographical representation of the fire risk potential and identifies high-risk areas in different local regions daily, based on a high-performance computing (HPC) pilot application that runs on Windows HPC Server.
"Virtual Fire hosts and visualizes models used for predicting forest fire risk and behavior to understand how the fire is likely to spread, based on the actual meteorological data, vegetation, and landscape morphology," says Kostas Kalabokidis, geography professor at the University of the Aegean and principal investigator of the Virtual Fire initiative. "These prediction data (along with a plethora of other information spanning roads, location of water tanks, the positioning of aircraft and vehicles, vegetation types, and weather data) will be visualized over online maps such as Bing Maps. This will enable fire fighters in control centers, or on-site via handheld devices, to more effectively manage forest fires and deal with any other emergency situations that may arise."
The system runs on servers that were donated by Hewlett Packard (three quad-core computing nodes: one head node and two computing nodes). By using the FARSITE and FlamMap fire behavior software (created by Missoula Fire Sciences Laboratory), maps are produced on demand to graphically represent the spread and intensity of a forest fire at different times and places. In addition, user feeds and email messages provide effective communication between users and administrators for reporting events.
During the course of its development, the Virtual Fire platform delivered some early successes in combating and even deterring wildfires. On July 8, 2009, an extremely dangerous wildfire broke out on Lesvos Island. The Virtual Fire system, which was at its initial stage and only partly operational (with the fire-risk probability index and the weather forecasting and monitoring), provided the fire service with a better grasp of local topography and details of current and imminent weather as well as the high-risk prediction map. This resulted in a prompt initial response that prevented the fire from spreading uncontrollably and encroaching on nearby sensitive ecological preserves and a military base camp. Virtual Fire successfully predicted the fire risk for the particular area where the event took place, which led to its status as a preferred fire risk prediction tool in 2010.
During the 2010 fire season (from April to October), no serious fires broke out on Lesvos Island, in contrast to other Greek islands such as Samos. Almost all of the fire events were confronted promptly; fires were not allowed to grow, and they responded to initial efforts to subdue them. Evidence currently under investigation suggests that Virtual Fire played an important role in these improved results, offering the local fire service valuable information for decision support alongside its own considerable operational experience and knowledge.
The results of the Virtual Fire initiative were presented July 6, 2010, at the Coordinating Prefecture Board of Lesvos, Mytilene, in Greece. Event attendees included the prefect and counsellors of Lesvos Prefecture, mayors and representatives of the Municipalities of Lesvos Island, heads of Civil Protection, officers and fire fighters of the North Aegean and Mytilene Fire Services, staff of Lesvos Forest Service, commanders and officers of military and public service authorities, representatives of social services and fire-fighting volunteer organizations of Lesvos Island, and the partners of the project from University of the Aegean, University of Athens, Microsoft Research, Microsoft Hellas, and Microsoft Innovation Center—Greece. For more information, read the press release.
—Scarlet Schwiderski-Grosche, Research Program Manager, External Research division of Microsoft Research, Cambridge
As computer-science researchers, we at Microsoft Research are committed to strong computer-science education programs. With more than 800 researchers worldwide, we know firsthand the value of a solid computer-science education, which is why Microsoft Research is a proud supporter of the second annual Computer Science Education Week (CSEdWeek), celebrated this year from December 5 to 11 in the United States.
This week's focus on computer-science education couldn't be timelier. The recent report from the Association for Computing Machinery, Running on Empty, reveals that the state of computer-science education in the United States is alarming. Only nine states count computer-science courses as a core academic subject in high-school graduation requirements. Meanwhile, there will be a projected 1.4 million new computing jobs by 2018, so we need more states to see the light and join in producing qualified students.
Making matters worse, funding cuts in local school districts have hit computer-science programs heavily. Statistics show that in many school districts, teachers who can teach computer science are being reassigned to mathematics or science classes. How can we as a nation afford to cut corners on this vital component of a 21st-century education?
CSEdWeek is a call to action. It's a rallying point for teachers, parents, and schools at all levels—from K-12 through college—to focus attention on this problem and to build more robust computer-science education programs throughout the United States. Corporations, too, must make their voices heard in support of policies and programs that advance the state of computer-science education. I'm proud that Microsoft is a supporter of CSEdWeek and a strong advocate of computer-science education.
At Microsoft Research, we have a number of tools designed for high-school students, showing them how scientific computing can be part of their daily lives—and fun, as well. Two of these are showcased on our associated website:
I encourage you to give them a try!
Recently, Chris Stephenson, executive director of the Computer Science Teachers Association, made a telling statement: "Effective computer-science education means far more than learning how to use a computer. It is about computational thinking: problem decomposition, data analysis, and solution design, all of which can be incorporated across disciplines and benefit students with interests outside of computer science. But we know that until we eliminate the roadblocks to quality computer-science education, we are denying access to important skills and future opportunities."
Echoing the same sentiments, Rick Rashid, senior vice president of Microsoft Research, said: "Today, more than ever, we need to empower students with the enthusiasm and creative problem-solving skills needed to address some of the world's greatest challenges, from improving healthcare to reducing our impact on the environment."
Do you want to do more to advance computer-science education? You can explore CSEdWeek resources, including tools; suggestions for celebrations; reports and statistics; lesson plans; and event listings across the United States and Canada. You also can sign this pledge in support of CSEdWeek. And please don't confine your activism to this week. Boosting the state of computer-science education in the United States is a 24/7/52 matter!
—Judith Bishop, Director of Computer Science for the External Research division of Microsoft Research