
MOSS Search Word Stemming - Part 2

So How Does MOSS Expand Search Query Terms to Related Words?

 

Here is how this works in MOSS:

 

In MOSS, stemming is used in combination with the word breaker component, which determines where word boundaries are. The word breaker is used at both index and query time, while the stemmer is used only at query time for most languages (the exceptions currently are Arabic and Hebrew) to perform both morphological analysis and morphological generation. In the case of Arabic and Hebrew, stemming is restricted to morphological analysis at both query and index time. A stemmer links word forms to their base form. For example, "running," "ran," and "runs" are all variants of the verb "to run." Stemming is currently turned off by default for some languages, including English. Stemmers are only available for languages that have significant morphological variation among their word forms. This means that for languages where stemmers are not available (such as Vietnamese), turning on this feature in the Search Result Page (CoreResult Web Part) will not have any effect, since in such languages exact match is all that is needed.

 

Word Stemming is NOT the same thing as Wild Card Searching, which our engine supports as well. Wild Card searching means using * in the query. You are asking the search engine to find all words that start with the given text string and end with anything, since * means match any character any number of times until the end of the word, which in most languages (excluding most East Asian languages) is indicated by white space. So a search query using * such as "Share*" will return results including "SharePoint", while a search query using morphological processing would bring back "sharing", which is an inflectional variant of "share". The terms Wild Card searching and Word Stemming are often used interchangeably, but they are in fact separate mechanisms that can return different results.
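
To make the wild card half of this distinction concrete, here is a minimal Python sketch of prefix matching over words separated by white space. It is an illustration only and says nothing about how the MOSS engine is implemented; `wildcard_match` and the sample text are hypothetical names introduced for this example.

```python
def wildcard_match(pattern, text):
    """Return the words in text that match a trailing-* prefix pattern.

    Toy model: words are split on white space, matching is
    case-insensitive, and only a single trailing * is supported.
    """
    if not pattern.endswith("*"):
        raise ValueError("this sketch only handles patterns ending in *")
    prefix = pattern[:-1].lower()
    return [word for word in text.split() if word.lower().startswith(prefix)]


sample = "Share SharePoint sharing shared documents"
matches = wildcard_match("Share*", sample)
print(matches)
```

In this toy model "sharing" is not returned, because it does not begin with the literal string "share". That is the kind of match that morphological processing, rather than a wild card, would be needed for.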

 

Word Stemming would bring back words closely related to the query terms (usually inflectional variants for most languages, but for some languages derivational variants as well).

 

For example, here are some sample queries and the results they return:

  • If you type in "run" --> in addition to exact matches on "run", it will bring back matches on "runs", "ran" and "running"
  • If you type in "page" --> in addition to exact matches on "page", it will bring back matches on "pages", "paged" and "paging"
  • If you type in "basket" --> in addition to exact matches on "basket", it will find "baskets", but it will not find "basketball". A wild card search for "basket*" would find "basketball"; our engine supports wild card search, and I will discuss it in another article. Word Stemming does not handle this currently because we have focused on matching inflectional variants of words only rather than derivational variants.
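
The examples above can be sketched as a query-time expansion. This is a hedged Python illustration that assumes a tiny hand-written table of inflectional variants in place of a real stemmer. `INFLECTIONS`, `expand_query`, and `search` are hypothetical names made up for this sketch and do not reflect MOSS internals.

```python
# Hypothetical inflection table; a real stemmer derives these forms
# through morphological analysis and generation rather than a lookup.
INFLECTIONS = {
    "run": {"run", "runs", "ran", "running"},
    "page": {"page", "pages", "paged", "paging"},
    "basket": {"basket", "baskets"},
}


def expand_query(term):
    """Expand a query term to its inflectional variants at query time."""
    term = term.lower()
    for forms in INFLECTIONS.values():
        if term in forms:
            return set(forms)
    return {term}  # unknown word: fall back to exact match


def search(term, documents):
    """Return documents containing any variant of term as a whole word."""
    variants = expand_query(term)
    return [doc for doc in documents if variants & set(doc.lower().split())]


docs = ["she ran home", "basketball season", "two baskets of fruit"]
print(search("run", docs))
print(search("basket", docs))
```

Note that the documents themselves are never modified in this sketch. Only the query is widened, and "basketball" is not matched because it is not in the set of inflectional variants of "basket".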

However, this option is turned off by default out of the box for English and some other languages. You can turn it on by going to the Search Results Web Part, then Options, and turning on the feature called “Enable Search Term Stemming”.

 

Thanks to Ian Johnson from the Natural Language Group at Microsoft for providing his feedback on this.

 

Hope that helps

Mike

 

Published Wednesday, December 27, 2006 6:35 AM by miketag


Comments

# re: MOSS Search Word Stemming - Part 2

Great investigation!  Thanks!

Wednesday, December 27, 2006 1:53 AM by aches

# MOSS Search Word Stemming

Two posts that explain Search Word Stemming in MOSS by Mike Taghizadeh: MOSS Search Word Stemming - Part...

Wednesday, December 27, 2006 2:37 AM by Mohamed Yehia - Microsoft

# MOSS 2007 Search Stemming

This article implies support for stemming but it does not seem to be enabled by default. Stemming =...

# re: MOSS Search Word Stemming - Part 2

Still wondering where that option is. I've looked all over and can't find it. I googled and still can't find it. Maybe the option to turn on stemming doesn't exist. At least that's what me thinks.

Tuesday, January 02, 2007 12:49 AM by John Smith

# re: MOSS Search Word Stemming - Part 2

Ah, didn't realise it was switched off by default.  I've just chatted about this in more detail over at my blog, but if you are looking to switch it on, here's my quick and dirty method:

- Go to the search page, enter any old query to return the search results page

- Under Site Actions, select 'Edit page'

- Locate the 'Search Core Results' web part (usually in the bottom zone)

- From the Edit button, select 'Modify shared web part'

- In the task pane that appears on the right hand side, under 'Results Query Options', check the box labeled 'Enable Search Term Stemming'

Hey presto, stemming is switched on.  Be warned, it will increase your index size and potentially impact search performance

Tuesday, January 02, 2007 7:36 AM by Sharon

# re: MOSS Search Word Stemming - Part 2

Thank you Sharon. Works like a charm.

Tuesday, January 02, 2007 11:21 AM by John Smith

# Recommended Reading for January

Recommended Reading for January (click here for previous recommendations): · MOSS Search Word Stemming:

Thursday, January 04, 2007 4:42 AM by Microsoft SharePoint Products and Technologies Team Blog

# re: MOSS Search Word Stemming - Part 2

A quick follow up on Sharon's comment.

Is turning on stemming really increasing the index size?

My reading of the post is that stemming is only configurable for the search web part. Which means that when you type the keyword run, the search engine will also search for ran, running, runs, etc... in the index. But it doesn't change the composition of the index.

Also are there any numbers out there on the impact of turning on that feature in terms of precision/recall and performances?

Thanks,

Tony.

Thursday, January 04, 2007 10:05 AM by Tony

# re: MOSS Search Word Stemming - Part 2

Is there a way to turn on stemming just for People search?

Thursday, January 04, 2007 11:56 AM by Tom Baldwin

# re: MOSS Search Word Stemming - Part 2

Stemming is for the results web part, which brings content and people back. You can look into building your own web part which separates the two, so you can pick and choose.

Thursday, January 04, 2007 1:14 PM by miketag

# re: MOSS Search Word Stemming - Part 2

Wildcard Search (e.g. "Share*") doesn't work on my MOSS Box. Even if I enable stemming.

Is there anything else that I need to configure?

Friday, January 05, 2007 11:59 AM by Klaus

# re: MOSS Search Word Stemming - Part 2

Stemming is NOT the same thing as wild card searching. I talked about this in the article. We do support wild card as well; look into the SDK.

Friday, January 05, 2007 1:17 PM by miketag

# re: MOSS Search Word Stemming - Part 2

The only thing that I found in the SDK regarding wild card search involves building a custom web part. Is there no way to simply flip a switch to allow for wild cards in search?

Friday, January 05, 2007 7:07 PM by Attila

# re: MOSS Search Word Stemming - Part 2

No, we support this through building a custom web part.

Friday, January 05, 2007 8:40 PM by miketag

# Recommendation for January

I warmly recommend reading these articles.... · MOSS Search Word Stemming: Part 1 and Part 2 – written

Saturday, January 06, 2007 9:15 AM by .: Stefan Gabriel Georgescu's blog :.

# re: MOSS Search Word Stemming - Part 2

I think this is really annoying. Why do I have to write my own WebPart just to have Wild Card search? This is such an important feature, which customers complained a lot about in SPS2003. Why can't MS give us this feature out of the box? I really don't get it.

Saturday, January 06, 2007 9:45 AM by Klaus

# re: MOSS Search Word Stemming - Part 2

I think because the performance would suffer... but I also think this is needed out of the box.

Wednesday, January 10, 2007 7:54 AM by Manfred

# re: MOSS Search Word Stemming - Part 2

Does anyone know where I can find an example of how to build a custom web part that has the wild card search feature?

Thanks

Monday, January 15, 2007 8:57 AM by Miguel

# re: MOSS Search Word Stemming - Part 2

I agree with the responder just above me - WHERE can we find the steps to create the custom SQL query.  Wildcards do not work 'out of the box' in MOSS 2007.  Get over it.  Help us create a custom SQL part for it then!

Friday, February 02, 2007 10:40 AM by --David

# re: MOSS Search Word Stemming - Part 2

Hi Mike,

I am new to Enterprise Search, and I would highly appreciate it if you could help me with the following: I have a Document Library called "CVs" that includes Word documents for our employees' CVs. An example of a file name: Bashir's CV. When I type CV in the search center search box, I only get all the employees' CVs, without the URL of the Document Library "CVs". However, if I type CVs, I get the Document Library URL and all the employees' CVs. Should I enable any feature in MOSS?

Thank you in advance

Bashir Jadallah

bjadallah@netways.com

Monday, February 12, 2007 4:48 AM by Bashir Jadallah

# Mike Taghizadeh covering MOSS 2007 Search Capabilities

Mike Taghizadeh covering MOSS 2007 Search Capabilities

Thursday, February 22, 2007 4:40 PM by Lars Fastrup on SharePoint Search

# re: MOSS Search Word Stemming - Part 2

If the configuration to allow stemming is set in the core results web part, will that setting be published from an authoring environment to a production environment in a WCM publishing scenario? Or do I have to go to the search page on the production host and make the same config?

Wednesday, July 25, 2007 7:54 AM by Rblitz

# re: MOSS Search Word Stemming - Part 2

I followed your instructions (thanks!!!!), checked the checkbox for enable stemming in the search core results webpart.  Next I did a full crawl.  After the crawl, when I searched for "egger" I got a hit, but "egg" still got me nuthin.  Went back and checked that the box was still checked, which it was.  Am I missing something?

Wednesday, August 29, 2007 2:02 PM by Steve Rubin

# re: MOSS Search Word Stemming - Part 2

Is stemming and word breaker supported on the Danish language?

It works for me in English, but not in Danish, even though I have set the browser language to da-DK.

Am I missing something?

Friday, November 16, 2007 9:07 AM by Keutmann

# re: MOSS Search Word Stemming - Part 2

Ok, if a search engine cannot support wildcard search OOTB as MS says, then it's a total failure. I mean, why do I have to write a custom web part for a very basic requirement? I can write a custom web part for custom client needs, for example, but not for something that should be out of the box. All I am saying to my clients, after installing SharePoint, is not to rely on its search engine. Prove us wrong and provide wildcard search out of the box!!!!

Monday, November 19, 2007 9:39 AM by Ray

# re: MOSS Search Word Stemming - Part 2

Ray (et al) I'm in total agreement. why bother saying you have search capabilities when it does not include wildcard. isn't that the whole idea behind any search?? If I'm "searching" for something vague like "micro" the search results should bring back microscope, microgram, micrometer, microsoft, microphone, etc. get it??? now I would have to actually type in microsoft to get info about that searched phrase! so how MOSS has search working would basically NOT allow someone to find documents, names in SP that have microsoft in them if the search keyword = micro.

Wildcard should be the default and NON wildcard should be an option if needed to be turned off.

I'm so close to wrapping up my custom search for a client and now have to figure out wildcard searches to be 100% done.

Tuesday, November 27, 2007 3:16 PM by Jeff

# re: MOSS Search Word Stemming - Part 2

I cannot get this to work as well. I am using an English MOSS environment where the "Enable Search Term Stemming" checkbox is unchecked in the search results page.

I have done the following:

Uploaded a text file with the term "new york" to a MOSS site;

replaced the tsneu.xml in C:\Program Files\Microsoft Office Servers\12.0\Data\Office Server\Applications\<GUID>\Config with the following contents

<XML ID="Microsoft Search Thesaurus">

<thesaurus xmlns="x-schema:tsSchema.xml">

 <diacritics_sensitive>0</diacritics_sensitive>

 <expansion>

  <sub>detroit</sub>

  <sub>new york</sub>

 </expansion>

</thesaurus>

</XML>

Restarted Office SharePoint Server Search and performed a full crawl afterwards.

Searched for "detroit" expecting to retrieve the "new york" file.

But I did not get the expected results. Did I miss something?

I hope you can help me.

Best regards,

Andries

Thursday, December 06, 2007 8:55 AM by Andries den Haan

# re: MOSS Search Word Stemming - Part 2

Wow, as much as I love MOSS yet no freakin' wildcard capability in search?  Every, and I mean _EVERY_ search engine in the world has this - most of them free - yet MS can't figure it out in its flagship enterprise application??  What an unbelievable farce...

Thursday, December 06, 2007 6:29 PM by Clayton

# re: MOSS Search Word Stemming - Part 2

I have turned on Word Stemming and it works fine if you go to the Search Subsite, but what if you use the Search Box on the Home Page? This displays the results in the OSSSearchResults.aspx page in the _layouts folder and does not seem to return the same results, i.e. it will not implement Word Stemming. Is there a way to make this work the same as the Search Subsite?

Thursday, December 20, 2007 4:44 PM by Troy

# MOSS Search Word Stemming - Followed by Problem with indexing

We configured Word Stemming for Finnish language support and after that the search fails to function correctly. There seems to be something wrong with the crawl, because the number of indexed items has dropped radically. Our environment has been migrated from SPS2003. Can "legacy leftovers from SPS2003" cause problems? Another question: How much do browser settings affect the search?

Tuesday, January 22, 2008 2:06 AM by Mikko

# re: MOSS Search Word Stemming - Part 2 - Danish

In response to the question by Keutmann: yes, stemming is supported for the Danish language. I have it running in a solution.

Friday, February 01, 2008 2:56 PM by Allan Pedersen

# re: MOSS Search Word Stemming - Part 2

I can't believe there is no wild card search. I've been searching all over the place trying to get this functionality. I have tried Faceted search, wild card people search http://www.sharepointbuzz.com/index.php/2008/01/24/how-to-extend-wildcard-people-search-on-moss-2007/, Ontilica search, and everything on codeplex.com, and nothing really does what is needed. I just had a search engagement here with Microsoft and the consultant can't understand why MS did not include wildcard search either. Please give us this feature.

Friday, April 04, 2008 9:55 AM by John

# re: MOSS Search Word Stemming - Part 2

This is absurd!  I too am flabbergasted that such a BASIC search feature is not included by default!  I am not going to sit here and write a custom web part for a feature that is included in *every* search engine in the world.

If Windows Desktop Search can do this, why the *hell* can't MOSS 2007?  We are in the pilot stage for MOSS 2007 and if I have to code this by hand, then trust me, we will not be buying such a POS.

Tuesday, May 27, 2008 4:55 PM by james barker

# re: MOSS Search Word Stemming - Part 2

Is it possible to highlight the matched word when using stemming? If I type in tutorial, the word tutorial will be highlighted, but not in results where it's tutorials.

An answer would be appreciated. Thanks

Thursday, July 17, 2008 7:32 PM by Kevin Gauthier
