<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Mike Taghizadeh's Blog : Stemming</title><link>http://blogs.msdn.com/miketag/archive/tags/Stemming/default.aspx</link><description>Tags: Stemming</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>MOSS Search Word Stemming - Part 2</title><link>http://blogs.msdn.com/miketag/archive/2006/12/27/moss-search-word-stemming-part-2.aspx</link><pubDate>Wed, 27 Dec 2006 09:35:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1368370</guid><dc:creator>miketag</dc:creator><slash:comments>38</slash:comments><comments>http://blogs.msdn.com/miketag/comments/1368370.aspx</comments><wfw:commentRss>http://blogs.msdn.com/miketag/commentrss.aspx?PostID=1368370</wfw:commentRss><description>&lt;DIV&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;B&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;So How Does MOSS Expand Search Query Terms to Related Words?&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Here is how this works in MOSS:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;In MOSS, stemming is used in combination with the word breaker component, which determines where word boundaries are. The word breaker is used at both index and query time, while the stemmer is used only at query time for most languages (the exceptions currently are Arabic and Hebrew) to perform both morphological analysis and morphological generation. In the case of Arabic and Hebrew, stemming is restricted to morphological analysis at both query and index time. A stemmer links word forms to their base form. For example, “running,” “ran,” and “runs” are all variants of the verb “to run.” Stemming is currently turned off by default for some languages, including English. Stemmers are only available for languages which have significant morphological variation among their word forms. This means that for languages where stemmers are not available (such as Vietnamese), turning on this feature in the Search Result Page (CoreResult Web Part) will not have any effect, since in such languages exact match is all that is needed.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face=Verdana&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Word Stemming is &lt;U&gt;NOT&lt;/U&gt; the same thing as Wild Card Searching, which our engine supports as well. Wild Card searching means putting * in the query. This asks the search engine to find all words that start with the given text string and end with anything, since * matches any character any number of times up to the end of the word, which in most languages (excluding most East Asian languages) is indicated by white space.&amp;nbsp; So a search query using * such as "Share*" will return results including "SharePoint", while a search query using morphological processing would bring back "sharing", which is an inflectional variant of Share. The terms Wild Card searching and Word Stemming are often used interchangeably, but they are in fact separate mechanisms which can return different results. &lt;/SPAN&gt;&lt;/P&gt;
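&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;To make the difference concrete, here is a minimal Python sketch. It is not how the MOSS engine is implemented: the INFLECTIONS table, the function names and the sample word list are all hypothetical, chosen only to show that a prefix pattern and an inflection lookup can return different sets of words.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
import fnmatch

# Hypothetical inflection table (an assumption for illustration);
# a real stemmer would derive these forms from a lexicon.
INFLECTIONS = {
    "share": ["share", "shares", "shared", "sharing"],
}


def wildcard_match(pattern, words):
    """Return the words that match a *-style pattern, ignoring case."""
    return [w for w in words if fnmatch.fnmatchcase(w.lower(), pattern.lower())]


def stemmed_match(term, words):
    """Return the words that are known inflectional variants of term."""
    variants = set(INFLECTIONS.get(term.lower(), [term.lower()]))
    return [w for w in words if w.lower() in variants]


indexed_words = ["Share", "SharePoint", "sharing", "shares"]

# Pure string matching: everything that starts with "share".
wildcard_hits = wildcard_match("Share*", indexed_words)

# Inflection lookup: forms of the verb "share" only.
stemmed_hits = stemmed_match("Share", indexed_words)

print(wildcard_hits)
print(stemmed_hits)
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;In this sketch the wild card query picks up "SharePoint" but misses "sharing", while the stemmed query does the opposite.&lt;/SPAN&gt;&lt;/P&gt;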
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face=Verdana&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Word Stemming would bring back words closely related to the query terms (usually inflectional variants for most languages, but for some languages derivational variants as well). &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face=Verdana&gt;&lt;/FONT&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;For example, here are some sample results for the following queries: &lt;/SPAN&gt;&lt;/P&gt;
&lt;UL type=disc&gt;
&lt;LI class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;If you type in "run" --&amp;gt; in addition to exact matches on “run”, it will bring back matches on "runs", "ran"&amp;nbsp;and "running" &lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;If you type in "page" --&amp;gt; in addition to exact matches on “page” it will bring back matches on "pages", "paged"&amp;nbsp;and "paging" &lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;If you type in "basket" --&amp;gt; in addition to exact matches on “basket” it will find "baskets", but it will not find "basketball".&amp;nbsp; A wild card search for “basket*”, which our engine also supports and which I will discuss in another article, would find "basketball". Word Stemming does not currently handle this because we have focused on matching inflectional variants of words rather than derivational variants. &lt;/SPAN&gt;&lt;/LI&gt;&lt;/UL&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;However, this option is turned off by default for English and some other languages. You can turn it on by going to the Search Results Web Part, then Options, and enabling the feature called “Enable Search Term Stemming”. &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Thanks to Ian Johnson from the Natural Language Group at Microsoft for providing his feedback on this.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Hope that helps&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Mike&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;&lt;/DIV&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=1368370" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/miketag/archive/tags/MOSS/default.aspx">MOSS</category><category domain="http://blogs.msdn.com/miketag/archive/tags/Search/default.aspx">Search</category><category domain="http://blogs.msdn.com/miketag/archive/tags/Stemming/default.aspx">Stemming</category><category domain="http://blogs.msdn.com/miketag/archive/tags/Word+Stemming/default.aspx">Word Stemming</category></item><item><title>MOSS Search Word Stemming - Part 1</title><link>http://blogs.msdn.com/miketag/archive/2006/12/21/moss-search-word-stemming-part-1.aspx</link><pubDate>Thu, 21 Dec 2006 21:37:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:1341749</guid><dc:creator>miketag</dc:creator><slash:comments>6</slash:comments><comments>http://blogs.msdn.com/miketag/comments/1341749.aspx</comments><wfw:commentRss>http://blogs.msdn.com/miketag/commentrss.aspx?PostID=1341749</wfw:commentRss><description>&lt;DIV&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN lang=it&gt;&lt;FONT face=Verdana size=2&gt;Hi all,&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN lang=it&gt;&lt;FONT face=Verdana size=2&gt;So I have been getting some questions around how MOSS does Word Stemming, so here is what I have been able to gather/research:&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;B&gt;&lt;SPAN lang=IT style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;MOSS Search Stemming:&lt;/SPAN&gt;&lt;/B&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN lang=IT style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The word &lt;B&gt;Stemming&lt;/B&gt; refers to the process of stripping off endings of words at query/index time so that different search terms will match and retrieve documents containing related words in the index. A hypothetical example of this would be the reduction of the related words “diet”, “diets”, “dieting”, “dieted”, “dietary”, “dietician”, “dieticians”, etc. to a single stem “diet”, which would allow a query for any one of these words to be matched against documents containing any of the other words.&amp;nbsp; Computational implementations of stemming are typically not based on linguistic notions of stem and affix, but rather just on frequently occurring character sequences. &lt;/SPAN&gt;&lt;/P&gt;
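&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;As a rough illustration of stemming by character sequence, the Python sketch below strips a few endings from the “diet” family. The SUFFIXES list, the three-character minimum and the naive_stem name are assumptions made up for this example; they are not taken from MOSS or from any published stemming algorithm.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical suffix list (an assumption for illustration), ordered
# longest first so that "icians" is tried before "s".
SUFFIXES = ["icians", "ician", "ary", "ing", "ed", "s"]


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


words = ["diet", "diets", "dieting", "dieted", "dietary", "dietician", "dieticians"]

# Every member of the family collapses to the same stem.
stems = {naive_stem(w) for w in words}
print(stems)

# The same rule also mangles unrelated words, because it only sees characters.
print(naive_stem("during"))
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The whole family reduces to “diet”, but “during” is cut down to “dur”, which shows why pure string stripping is not the same as linguistic analysis.&lt;/SPAN&gt;&lt;/P&gt;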
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The correct linguistic terminology for the process of stemming described above is &lt;B&gt;&lt;U&gt;morphological processing&lt;/U&gt;&lt;/B&gt;, meaning the linking of a given word form to its base form and other related word forms.&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Morphological processing has two main aspects: &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoListParagraph style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;1.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Morphological Analysis &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoListParagraph style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.25in"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;2.&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Morphological Generation.&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;B&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Morphological Analysis&lt;/SPAN&gt;&lt;/B&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt; refers to the process of analyzing a given word form by assigning its correct morphosyntactic data (such as person, number, gender, tense, etc.) and identifying its internal structure in terms of base form and any prefixes and suffixes. These affixes can be &lt;U&gt;inflectional&lt;/U&gt; (they do not change the part of speech or meaning of the word, e.g. diet, diets) or &lt;U&gt;derivational&lt;/U&gt; (they do change the part of speech or meaning of the word, e.g. diet, dietary).&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;B&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Morphological Generation&lt;/SPAN&gt;&lt;/B&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt; goes one step further.&amp;nbsp; In addition to analyzing an inflected form down to its base form, it also generates all inflected forms which are related to the same base form.&amp;nbsp; For example, the inflected verb form “loved” is reduced to the base form “love” by Morphological Analysis, and then Morphological Generation generates all related inflected forms, i.e. “love”, “loves”, “loving”, “loved”. These are then matched against the document index, and all matches are retrieved and displayed to the user (normally in search engines exact matches are given a higher weighting than the related words and are displayed at the top of the results list).&lt;/SPAN&gt;&lt;/P&gt;
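&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The two steps can be sketched in Python as a pair of lookups. The LEXICON table and the analyze, generate and expand_query functions are hypothetical names invented for this illustration; a real morphological component would use a full lexicon and rules rather than a hand-written dictionary.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
# Hypothetical lexicon (an assumption for illustration):
# each base form maps to its inflected forms.
LEXICON = {
    "love": ["love", "loves", "loving", "loved"],
    "run": ["run", "runs", "running", "ran"],
}

# Reverse lookup built once: inflected form to base form.
BASE_OF = {form: base for base, forms in LEXICON.items() for form in forms}


def analyze(word):
    """Morphological analysis: reduce a word form to its base form."""
    lowered = word.lower()
    return BASE_OF.get(lowered, lowered)


def generate(base):
    """Morphological generation: list every inflected form of a base form."""
    return LEXICON.get(base, [base])


def expand_query(term):
    """Expand a query term, keeping the exact term first in the list."""
    exact = term.lower()
    related = [form for form in generate(analyze(exact)) if form != exact]
    return [exact] + related


print(expand_query("loved"))
print(expand_query("ran"))
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Putting the exact term first in the expanded list mirrors the idea that exact matches are weighted above related forms.&lt;/SPAN&gt;&lt;/P&gt;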
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The decision as to whether to apply Morphological Analysis or both Morphological Analysis and Morphological Generation in Search depends on two major considerations:&amp;nbsp; (a) whether there are any restrictions on the size of the search index, and (b) the level of morphological complexity of the language.&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoListParagraph style="MARGIN: auto 0in auto 0.5in; TEXT-INDENT: -0.25in"&gt;&lt;FONT face=Verdana&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: #1f497d"&gt;(a)&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; COLOR: #1f497d; FONT-STYLE: normal; FONT-FAMILY: Verdana; FONT-VARIANT: normal"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;If we do not want the size of the index to increase as a result of this morphological processing, then we leave the index unchanged and use Morphological Analysis and Morphological Generation at query time to expand a given search query term into a list of inflectionally- and/or derivationally-related word forms. These are all then matched against the search index, returning a list of results in which exact matches are listed first.&amp;nbsp; If the size of the index is permitted to increase, then we can use Morphological Analysis at index time to store a base form along with each inflected form in the index.&amp;nbsp; At query time we also use Morphological Analysis to attach a base form to the query term and then search on both items.&amp;nbsp; All matches for both the query term and its base form are retrieved, with any exact matches on the query term listed first.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoListParagraph style="MARGIN: auto 0in auto 0.5in; TEXT-INDENT: -0.25in"&gt;&lt;FONT face=Verdana&gt;&lt;SPAN style="FONT-SIZE: 10pt; COLOR: #1f497d"&gt;(b)&lt;/SPAN&gt;&lt;/FONT&gt;&lt;SPAN style="FONT-WEIGHT: normal; FONT-SIZE: 7pt; COLOR: #1f497d; FONT-STYLE: normal; FONT-FAMILY: Verdana; FONT-VARIANT: normal"&gt;&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;If the morphology of a particular language is very rich, then a very large number of query terms could be generated at run time.&amp;nbsp; This could have a profound effect on processing efficiency.&amp;nbsp; In languages such as Arabic and Hebrew, which have a very large number of forms of a single word (the number can get into the thousands), it is therefore preferable to avoid Morphological Generation and just use Morphological Analysis at both index and query time.&amp;nbsp; This increases the size of the index but has the advantage that the number of forms of a word stored in the index and generated from the query will not exceed 2 (the word itself and its associated base form).&lt;/SPAN&gt;&lt;/P&gt;
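&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The trade-off between the two approaches can be sketched with a toy inverted index in Python. Everything here is an assumption for illustration: the LEXICON table, the three sample documents, and the function names are hypothetical, and whitespace splitting stands in for a real word breaker.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;
```python
from collections import defaultdict

# Hypothetical lexicon and documents (assumptions for illustration).
LEXICON = {"love": ["love", "loves", "loving", "loved"]}
BASE_OF = {form: base for base, forms in LEXICON.items() for form in forms}
DOCS = {1: "she loves search", 2: "he loved it", 3: "all you need is love"}


def build_index(docs, store_base_forms):
    """Map each word to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
            if store_base_forms:
                # Index-time analysis: also file the document under the base form.
                index[BASE_OF.get(word, word)].add(doc_id)
    return index


def search_with_generation(index, term):
    """Unchanged index: expand the term into all inflected forms at query time."""
    base = BASE_OF.get(term, term)
    hits = set()
    for form in LEXICON.get(base, [base]):
        hits |= index.get(form, set())
    return hits


def search_with_analysis_only(index, term):
    """Enlarged index: look up at most two keys, the term and its base form."""
    base = BASE_OF.get(term, term)
    return index.get(term, set()) | index.get(base, set())


def posting_count(index):
    """Total number of (word, document) entries, a rough measure of index size."""
    return sum(len(doc_ids) for doc_ids in index.values())


plain_index = build_index(DOCS, store_base_forms=False)
enriched_index = build_index(DOCS, store_base_forms=True)

print(search_with_generation(plain_index, "loved"))
print(search_with_analysis_only(enriched_index, "loved"))
print(posting_count(plain_index), posting_count(enriched_index))
```
&lt;/PRE&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Both searches find all three documents, but the first issues one lookup per generated form against a smaller index, while the second issues at most two lookups against a larger one.&lt;/SPAN&gt;&lt;/P&gt;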
&lt;P class=MsoListParagraph style="MARGIN: auto 0in auto 0.5in; TEXT-INDENT: -0.25in"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;PS: For Arabic related issues and questions, please see:&lt;/SPAN&gt;&lt;/P&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;
&lt;P class=MsoNormal dir=ltr style="DIRECTION: ltr; unicode-bidi: embed; TEXT-ALIGN: left"&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Arabic specific MOSS&lt;/SPAN&gt; &lt;SPAN style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"&gt;&lt;A title=http://www.microsoft.com/middleeast/arabicdev/office/office2007/SharePoint.aspx href="http://www.microsoft.com/middleeast/arabicdev/office/office2007/SharePoint.aspx"&gt;&lt;FONT color=#800080&gt;http://www.microsoft.com/middleeast/arabicdev/office/office2007/SharePoint.aspx&lt;/FONT&gt;&lt;/A&gt; , &lt;BR&gt;&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Arabic MOSS search &lt;A title=http://www.microsoft.com/middleeast/arabicdev/office/office2007/Search.aspx href="http://www.microsoft.com/middleeast/arabicdev/office/office2007/Search.aspx"&gt;http://www.microsoft.com/middleeast/arabicdev/office/office2007/Search.aspx&lt;/A&gt; ,&lt;BR&gt;&lt;/SPAN&gt;&lt;SPAN style="FONT-SIZE: 11pt; COLOR: #1f497d; FONT-FAMILY: 'Calibri','sans-serif'"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Arabic 2007 office servers &lt;A title=http://www.microsoft.com/middleeast/arabicdev/office/office2007/office2007server.aspx 
href="http://www.microsoft.com/middleeast/arabicdev/office/office2007/office2007server.aspx"&gt;http://www.microsoft.com/middleeast/arabicdev/office/office2007/office2007server.aspx&lt;/A&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;/SPAN&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;The key point of difference between wild card searching and word stemming (or morphological processing) is that the former is just string based and allows the user to find in some cases both inflectional and derivational variants of the query term.&amp;nbsp; The Stemming approach deals very well with inflectional variants but we currently don’t handle derivational morphology for most languages.&amp;nbsp; In languages where we do have some limited treatment of derivational morphology such as Arabic and Hebrew, this treatment is limited to high frequency terms.&lt;/SPAN&gt;&lt;/P&gt;&lt;/DIV&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;SPAN style="FONT-SIZE: 10pt; FONT-FAMILY: Verdana"&gt;Thanks to Ian Johnson from the Natural Language Group at Microsoft for providing his feedback on this.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face=Verdana size=2&gt;Stay tuned for Part 2, where I will explain how MOSS does Stemming in more detail.&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face=Verdana size=2&gt;Hope that helps&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt"&gt;&lt;FONT face=Verdana size=2&gt;Mike&lt;/FONT&gt;&lt;/P&gt;
&lt;P class=MsoNormal style="MARGIN: 0in 0in 0pt" mce_keep="true"&gt;&amp;nbsp;&lt;/P&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=1341749" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/miketag/archive/tags/MOSS/default.aspx">MOSS</category><category domain="http://blogs.msdn.com/miketag/archive/tags/SharePoint/default.aspx">SharePoint</category><category domain="http://blogs.msdn.com/miketag/archive/tags/Search/default.aspx">Search</category><category domain="http://blogs.msdn.com/miketag/archive/tags/Stemming/default.aspx">Stemming</category><category domain="http://blogs.msdn.com/miketag/archive/tags/Word+Stemming/default.aspx">Word Stemming</category></item></channel></rss>