In this part we will touch on the core relevance model. Be careful: the modifications described here can be a disaster for your search results if you don't know what you are doing.
The theory behind this part of relevance tuning comes from a technical article written by Dmitriy Meyerzon, Avi Schmueli and Jo-Anne West: Evaluating and Customizing Search Relevance in SharePoint Server 2007. You can find it in the MOSS SDK and the MSDN Library; it should be your pre-reading material.
After going through the article, you should understand that several parameters can be changed to meet custom needs, for example the weight of a managed property.
Scenario: the end users have a request. A field in their Lotus Notes system is very important, and when they search, they want the top results to emphasize this field.
Brilliant idea. But it would be very tough if your search engine could not make this change. Luckily, the relevance model in MOSS can be adjusted to meet the request. Here are the steps.
1. Map the field to a managed property. For example, the field is named "NotesTitle", but it is actually the subject, not the title, of the document. Set up the Lotus Notes content source and crawl the database first; this makes the field name show up in the crawled properties list. Then create a new managed property and map the crawled field to it.
2. After the field "NotesTitle" is mapped to the managed property "NotesTitle", you can change the weight of that property. This requires calling the object model, but there is a free tool on SharePointSearch.com that does it for you.
You can see the weight and length parameters of "NotesTitle" are both zero. By default, only three properties in MOSS have this setting enabled: Title (75.855), Author (8.215) and FileName (29.43). If you want a new managed property to matter more, write a higher value into its weight parameter.
Let's change it to a bigger number; the bigger the weight, the more important the property becomes. (Please note there is a bug in this tool: you cannot actually enter any value higher than 99.)
Then press "Update". No service restart is needed; the new relevance settings are applied automatically.
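The article cited above describes the MOSS ranking function as a BM25F-style formula, in which each managed property's term frequency is scaled by that property's weight before BM25 saturation and IDF are applied. Here is a minimal sketch of that idea (my own simplification for illustration, not the actual MOSS code; length normalization and the other ranking features are omitted, and the weights are just the defaults mentioned above):

```python
import math

# Default property weights mentioned above; "NotesTitle" starts at zero.
WEIGHTS = {"Title": 75.855, "Author": 8.215, "FileName": 29.43, "NotesTitle": 0.0}

def bm25f_score(term_freq_per_property, doc_count, docs_with_term, k1=1.2):
    """Simplified BM25F: combine per-property term frequencies using the
    property weights, then apply BM25 saturation and IDF.
    (Length normalization omitted for brevity.)"""
    wtf = sum(WEIGHTS[p] * tf for p, tf in term_freq_per_property.items())
    idf = math.log((doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
    return idf * wtf / (k1 + wtf)

# A document matching only in NotesTitle scores 0 while the weight is 0...
low = bm25f_score({"NotesTitle": 3}, doc_count=18000, docs_with_term=40)
WEIGHTS["NotesTitle"] = 90.0   # ...but jumps up once we raise the weight.
high = bm25f_score({"NotesTitle": 3}, doc_count=18000, docs_with_term=40)
print(low, high)
```

This is why raising the weight pushed the NotesTitle matches to the top in the screenshots below: matches in a zero-weight property simply contribute nothing to the score.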
So here is the result. (If you can read Chinese, you will get a much clearer picture.)
Result before weight tuning: note that the "NotesTitle" field is listed in grey, below the Path property. The keyword does not appear in NotesTitle anywhere in the top results, and the Rank of the first result is 854.
Result after tuning: the keyword is now found not in the general title but in our NotesTitle, and the Rank of the first result is 879.
This technique applies to many other factors. For example, you can modify the languageprior parameter to bias results by language: an English user searching will get English results first, while a Chinese user will get Chinese content first. The language is detected from the language setting of the client's browser.
Feel free to modify it and use it on your own box. It may need some tweaks to fit your environment.
This is another result of the Hylanda wordbreaker testing...
The story is interesting. After we got 18,000+ search results for each wordbreaker, we needed to evaluate them to decide when, where and how the new wordbreaker beats the original one. Such a thing cannot be decided by me alone, nor by any purely physical measurement. When people say "you know, sometimes Yahoo is better than Google", they are measuring with their own eyes.
This is quite similar to my research for my Master's degree. I studied psychological acoustics for two years back then, trying to find ways to measure sound quality; this is called subjective quality evaluation. So I used the same method to measure search result quality.
The method is called paired comparison. The tester is given two sets of results and chooses which one is better. For example, say there are 50 results in each set (A and B). First A1 and B1 are shown and the user chooses A, then A2/B2 are shown and the user chooses B, and so on. In the end, count the votes for A and B and you have the tester's preference.
To rule out psychological interference during the test, the pair order and the sides can be randomized. The tester does not know which set is A and which is B; they just choose the better one.
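The voting procedure above can be sketched as follows (my own minimal reconstruction, not the tool's actual source; the `judge` callback is a stand-in for the human tester):

```python
import random

def paired_comparison(set_a, set_b, judge, seed=None):
    """Show each pair in random order with random left/right placement,
    collect blind votes, and return the tally for A, B and abstentions."""
    rng = random.Random(seed)
    votes = {"A": 0, "B": 0, "none": 0}
    pairs = list(zip(set_a, set_b))
    rng.shuffle(pairs)                      # randomize the pair order
    for a, b in pairs:
        if rng.random() < 0.5:              # randomize which side is shown left
            left, right, left_is_a = a, b, True
        else:
            left, right, left_is_a = b, a, False
        choice = judge(left, right)         # "left", "right" or None (abstain)
        if choice is None:
            votes["none"] += 1
        elif (choice == "left") == left_is_a:
            votes["A"] += 1
        else:
            votes["B"] += 1
    return votes

# Example: a "tester" who always prefers the longer snippet,
# so set A (the longer snippets) wins every pair.
tally = paired_comparison(
    ["aaa", "aaaa"], ["b", "bb"],
    judge=lambda l, r: "left" if len(l) > len(r) else "right",
    seed=1)
print(tally)
```

Because the judge never sees which side is A, the randomization removes any position or label bias, which is the whole point of the blind setup.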
Here's the interface of the program.
Testing. If the tester cannot decide which set is better (sometimes the two sets are equally good, or equally bad), he can abstain or vote for neither.
The final result is displayed in a message box.
Of course, this is just a simple proof of concept. I have shared the source and a sample in the release package so you can do your own research.
When I was testing the new wordbreaker from Hylanda, they gave me a keyword library with 100,000+ keywords. Each of them needed to be queried, and the results saved.
Even after some sorting and garbage removal, 18,000+ keywords remained. Querying them by hand would have been a real pain.
There was also a request for such a tool at http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2512852&SiteID=1.
So I wrote a small tool to do the job automatically. It reads a keyword list from a plain text file (one keyword per line), queries each keyword against MSS/MOSS, and saves the results as XML files on disk.
The source code is very simple; most of it comes from the MOSS SDK, so feel free to use it.
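The tool's loop is roughly the following (a Python sketch of the same idea, not the actual .NET source; `run_query` is a hypothetical stand-in for the real MOSS query call, which in the original comes from the SDK):

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def run_query(keyword):
    """Hypothetical stand-in for the real MSS/MOSS query call
    (the original tool uses the MOSS SDK query interfaces)."""
    return [{"title": f"result for {keyword}", "rank": 854}]

def batch_query(keyword_file, out_dir):
    """Read one keyword per line, query each, save results as XML files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = Path(keyword_file).read_text(encoding="utf-8").splitlines()
    for i, line in enumerate(lines):
        keyword = line.strip()
        if not keyword:
            continue                        # skip blank lines
        root = ET.Element("results", keyword=keyword)
        for hit in run_query(keyword):
            item = ET.SubElement(root, "result", rank=str(hit["rank"]))
            item.text = hit["title"]
        ET.ElementTree(root).write(out / f"{i:06d}.xml", encoding="utf-8")
```

One file per keyword keeps the 18,000+ result sets easy to diff between the two wordbreakers afterwards.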