In previous posts I wrote about how the Web N-Gram service answers the question: what is the probability of word w in the context c?  This is useful, but sometimes you want to know: what are some words {w} that could follow the context c?  This is where the Generative-Mode APIs come into play.

Examples are the easiest way to demonstrate this feature, and the Python library is the easiest way to show examples:

>>> import MicrosoftNgram
>>> s = MicrosoftNgram.LookupService(model='urn:ngram:bing-body:jun09:2')
>>> for t in s.Generate('paris', maxgen=5): print(t)
...
('hilton', -0.9303072)
('france', -1.227608)
('and', -1.536233)
('in', -1.8312980000000001)
('the', -1.862304)

OK, those results won't surprise anyone in the U.S.  The scores are base-10 log probabilities, so the next word to follow paris is hilton about 12% of the time, and france about 6% of the time.  Ouch.
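If you want to check those percentages yourself, here is a minimal sketch. It assumes the scores are base-10 log probabilities, which is consistent with the 12% and 6% figures quoted here; the helper name to_probability is mine, not part of the library.

```python
# Scores copied from the session above.
# Assumption: each score is a base-10 log probability.
results = [
    ('hilton', -0.9303072),
    ('france', -1.227608),
    ('and', -1.536233),
    ('in', -1.8312980000000001),
    ('the', -1.862304),
]

def to_probability(log10_p):
    """Convert a base-10 log probability back to a plain probability."""
    return 10.0 ** log10_p

for word, score in results:
    # Print each word with its probability as a percentage.
    print('%-8s %5.1f%%' % (word, 100 * to_probability(score)))
```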

Using this API turns out to be quite an interesting way to explore the data.  Our competitor has something a bit like this in the form of AutoComplete/AutoSuggest.  Some of you may also have heard of Google Scribe, an application that uses their API.  So how do we differ?  Our data, for better or for worse, is unfiltered raw data.  This is quite evident if you try the Paris example in your browser in that other search engine.  We will also give you the probability information so you can make your own decisions about the data.  What we won't give you is a multi-term completion; a multi-term result, however, is something you could construct with multiple calls to our API based on your own criteria.  Now you can write your own version of Scribe!
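To make the "multiple calls" idea concrete, here is one possible sketch of a multi-term completion built as a small beam search. Everything in it is an assumption for illustration: the function complete, the generate callable it takes, and the toy table with made-up scores that stands in for the service so the example runs offline. The criterion used here (highest summed log probability) is just one choice; yours may differ.

```python
def complete(generate, context, length=3, beam=2, maxgen=5):
    """Extend `context` by up to `length` words, keeping the `beam` best phrases.

    `generate(context, maxgen)` is a hypothetical callable that returns
    (word, log10_probability) pairs, like the tuples shown earlier.
    """
    beams = [(0.0, context)]  # (summed log score of the added words, phrase)
    for _ in range(length):
        candidates = []
        for score, phrase in beams:
            for word, logp in generate(phrase, maxgen):
                # Log probabilities add where plain probabilities multiply.
                candidates.append((score + logp, phrase + ' ' + word))
        if not candidates:
            break  # no phrase could be extended any further
        candidates.sort(reverse=True)  # best (least negative) score first
        beams = candidates[:beam]
    return beams

# Toy stand-in for the service: made-up scores, NOT real service output.
TOY_MODEL = {
    'paris': [('hilton', -0.93), ('france', -1.23)],
    'hilton': [('hotel', -0.5), ('head', -1.0)],
    'france': [('and', -0.9)],
    'hotel': [('in', -0.6)],
}

def toy_generate(context, maxgen):
    # Like a bigram model, only the last word of the context matters here.
    last_word = context.split()[-1]
    return TOY_MODEL.get(last_word, [])[:maxgen]

for score, phrase in complete(toy_generate, 'paris', length=2):
    print('%s  (%.2f)' % (phrase, score))
```

With the real library you would presumably swap toy_generate for a thin wrapper around the call shown earlier, something like lambda ctx, n: s.Generate(ctx, maxgen=n), though I have not verified how the service treats a multi-word context.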