One use of our service is word breaking based on n-gram probabilities. No linguistic knowledge necessary.
Language Modeling 101: An introduction to conditional probabilities in the context of language data.
A quick tutorial on the MicrosoftNgram Python library.
The top 100K words for the Apr10 body stream are now available for analysis.
Announcing new datasets from Spring 2010. Now serving 5-grams!
Some additional FAQs for the now-open Microsoft Research Speller Challenge.
The Generative-Mode API gives you new insight into the language data of the web.
Language Modeling 102: A lesson on joint probabilities.
Some simple performance tips that may speed up your WCF application.
Introducing the Speller Challenge, a contest from Microsoft Research and Bing.
Learn about some of the details of tokenization in our service.
Working with large lexicons means engineering trade-offs become necessary.
What happens when you encounter the unknown?
Different models reflect different writing styles on the web.
A very brief introduction to the Microsoft Web N-Gram service.