Microsoft Web N-Gram

Bringing you web-scale language model data. Web N-Gram is a joint project between Microsoft Bing and Microsoft Research.

Tagged Content List
  • Blog Post: Perf tips for using the N-Gram service with WCF

    The support in Visual Studio for WCF makes writing a SOAP/XML application for the Web N-Gram service a pretty straightforward process. If you're new to this, the Quick Start guide might be helpful to you. There are a few tweaks you can make, however, to improve the performance of your application if...
  • Blog Post: The messy business of tokenization

    So what exactly is a word, in the context of our N-Gram service? The devil, it is said, is in the details. As noted in earlier blog entries, our data comes straight from Bing. All tokens are case-folded and, with a few exceptions, all punctuation is stripped. This means words like I'm or didn't are...
  • Blog Post: UPDATE: Serving New Models

    Today's post was delayed slightly, but we have good news: additional language model datasets are now available. As always, the easiest way to get a list is to simply navigate to http://web-ngram.research.microsoft.com/rest/lookup.svc. Shown below are the new items, in URN form: ...
  • Blog Post: What can data do for you?

    Let's think of the scale of different lexicons, in terms of order of magnitude:
      1,000 - the day-to-day vocabulary of someone in the United States
      10,000 - the number of different words in Moby Dick
      100,000 - the number of words understood by a state-of-the-art speech recognition engine
      ...
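
The normalization described in the tokenization post (case-folding, then stripping punctuation) can be roughly sketched in Python. This is only an illustration, not the service's actual tokenizer: the function name `tokenize` and the `keep` parameter are hypothetical, the excerpt does not say which punctuation characters are the exceptions, and replacing punctuation with a space (rather than deleting it) is an assumption.

```python
import string

def tokenize(text, keep=frozenset()):
    """Case-fold `text`, strip ASCII punctuation, and split on whitespace.

    `keep` is a hypothetical set of punctuation characters to preserve.
    The post mentions "a few exceptions" without listing them in the
    excerpt, so nothing is preserved by default.
    """
    # Assumption: stripped punctuation becomes a space, so "web-scale"
    # yields two tokens rather than one fused token.
    drop = {ord(ch): " " for ch in string.punctuation if ch not in keep}
    return text.casefold().translate(drop).split()

print(tokenize("Hello, World!"))
```

Passing `keep={"'"}` would leave a contraction such as I'm as a single token; whether the real service does so is not stated in the excerpt.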