Language Modeling 101: An introduction to conditional probabilities in the context of language data
Announcing new datasets from Spring 2010. Now serving 5-grams!
A quick tutorial on the MicrosoftNgram Python library.
The top 100K words for the Apr10 body stream are now available for analysis.
Some additional FAQs for the now-open Microsoft Research Speller Challenge.
One use of our service is to break words based on n-gram probability info. No linguistic knowledge necessary.
Generative-Mode API gives you new insight into the language data of the web.
Introducing the Speller Challenge, a contest from Microsoft Research and Bing.
Some simple performance tips that may speed up your WCF application.
Language Modeling 102: A lesson on joint probabilities
Working with large lexicons means engineering trade-offs become necessary.
What happens when you encounter the unknown?
Learn about some of the details of tokenization in our service.
Different models reflect different writing styles on the web.
A very brief introduction to the Microsoft Web N-Gram service.