One use of our service is word breaking based on n-gram probability information. No linguistic knowledge is necessary.
A quick tutorial on the MicrosoftNgram Python library.
The top 100K words for the Apr10 body stream are now available for analysis.
Learn about some of the details of tokenization in our service.
Different models reflect different writing styles on the web.
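
To give a feel for the word-breaking idea mentioned above, here is a minimal sketch of probability-driven segmentation. It does not call the service or the MicrosoftNgram library: the `TOY_PROBS` table, the unknown-word penalty, and the function names are all assumptions made up for illustration, and it uses only unigram probabilities with a simple dynamic program. A real setup would presumably look the probabilities up from one of the service's models instead.

```python
import math

# Hypothetical unigram probabilities (toy values, not taken from the service).
TOY_PROBS = {
    "the": 0.05, "there": 0.004, "rest": 0.002, "restaurant": 0.001,
    "a": 0.03, "au": 0.00001, "aura": 0.00005, "rant": 0.0001,
}

# Assumed per-character log-probability penalty for words not in the table.
UNKNOWN_LOGP_PER_CHAR = math.log(1e-10)


def word_logp(word):
    """Log-probability of a single candidate word."""
    p = TOY_PROBS.get(word)
    if p is not None:
        return math.log(p)
    return UNKNOWN_LOGP_PER_CHAR * len(word)


def segment(text, max_len=20):
    """Split text into the word sequence with the highest total log-probability."""
    n = len(text)
    best = [0.0] + [-math.inf] * n   # best[i]: best score for text[:i]
    back = [0] * (n + 1)             # back[i]: start index of the last word in text[:i]
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + word_logp(text[start:end])
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Walk the back-pointers from the end to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


print(segment("therestaurant"))
```

With this toy table, `segment("therestaurant")` returns `['the', 'restaurant']`, because that split has a higher total log-probability than alternatives such as "the rest au rant". No dictionary rules or grammar are involved, only the probabilities.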