<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Jie Li's GeekWorld : WordBreaker</title><link>http://blogs.msdn.com/opal/archive/tags/WordBreaker/default.aspx</link><description>Tags: WordBreaker</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>Subject Search Result Evaluation Tool</title><link>http://blogs.msdn.com/opal/archive/2007/12/17/subject-search-result-evaluation-tool.aspx</link><pubDate>Mon, 17 Dec 2007 08:08:20 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6786400</guid><dc:creator>Jie Li</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/opal/comments/6786400.aspx</comments><wfw:commentRss>http://blogs.msdn.com/opal/commentrss.aspx?PostID=6786400</wfw:commentRss><description>&lt;p&gt;&lt;a title="http://www.codeplex.com/SubjectSearchEva" href="http://www.codeplex.com/SubjectSearchEva"&gt;http://www.codeplex.com/SubjectSearchEva&lt;/a&gt;&lt;/p&gt; &lt;p&gt;This is another result of the Hylanda wordbreaker testing... &lt;/p&gt; &lt;p&gt;The story is quite interesting. After we got 18,000+ search results for each wordbreaker, we needed to evaluate them to decide when, where, and how the new wordbreaker is better than the original one. Such a thing cannot be decided by me alone, or by any physical measurement. When people say "Hey, you know, sometimes Yahoo is better than Google", they are judging with their own eyes.&lt;/p&gt; &lt;p&gt;This is quite similar to the research for my Master's degree. I studied psychoacoustics for two years at that time, trying to find ways to measure sound quality. This is called subjective quality evaluation. 
So I used the same method to measure search result quality.&lt;/p&gt; &lt;p&gt;This method is called Paired Comparison. The tester is given two sets of results and chooses which is better. For example, suppose there are 50 results in each set (A and B). First, A1 and B1 show up and the tester chooses A, then A2 and B2 show up and the tester chooses B, and so on. In the end, count the number of votes for A and for B, and you get the tester's preference. &lt;/p&gt; &lt;p&gt;To make sure there is no psychological bias during the test, the order and the set assignment can be randomized. The tester does not know which is A and which is B; they just need to choose the better one.&lt;/p&gt; &lt;p&gt;Here's the interface of the program.&lt;/p&gt; &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/SubjectSearchResultEvaluationTool_B453/snap135_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="250" alt="snap135" src="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/SubjectSearchResultEvaluationTool_B453/snap135_thumb.jpg" width="459" border="0"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt; &lt;p&gt;Testing. 
If the tester cannot decide which is better (sometimes the two sets are equally good or bad), they can choose not to vote, or vote for neither.&lt;/p&gt; &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/SubjectSearchResultEvaluationTool_B453/snap136_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="317" alt="snap136" src="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/SubjectSearchResultEvaluationTool_B453/snap136_thumb.jpg" width="644" border="0"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt; &lt;p&gt;The final result is displayed in a message box.&lt;/p&gt; &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/SubjectSearchResultEvaluationTool_B453/snap138_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="204" alt="snap138" src="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/SubjectSearchResultEvaluationTool_B453/snap138_thumb.jpg" width="371" border="0"&gt;&lt;/a&gt;&lt;/p&gt; &lt;p&gt;&amp;nbsp;&lt;/p&gt; &lt;p&gt;Of course, this is just a simple proof of concept. 
I have shared the source and a sample in the release package so you can do your own research.&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=6786400" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/opal/archive/tags/Enterprise+Search/default.aspx">Enterprise Search</category><category domain="http://blogs.msdn.com/opal/archive/tags/User+Experience/default.aspx">User Experience</category><category domain="http://blogs.msdn.com/opal/archive/tags/WordBreaker/default.aspx">WordBreaker</category><category domain="http://blogs.msdn.com/opal/archive/tags/Subject+Search+Result+Evaluation/default.aspx">Subject Search Result Evaluation</category></item><item><title>Improve User Experience in Enterprise Search Step By Step - Part IV - Relevancy Tuning by WordBreaker</title><link>http://blogs.msdn.com/opal/archive/2007/11/23/improve-user-experience-in-enterprise-search-step-by-step-part-iv-relevancy-tuning-by-wordbreaker.aspx</link><pubDate>Fri, 23 Nov 2007 16:18:04 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:6483779</guid><dc:creator>Jie Li</dc:creator><slash:comments>1</slash:comments><comments>http://blogs.msdn.com/opal/comments/6483779.aspx</comments><wfw:commentRss>http://blogs.msdn.com/opal/commentrss.aspx?PostID=6483779</wfw:commentRss><description>&lt;p&gt;We have been talking about XSL/XML for a long time. Now we want to give relevancy a shot.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Hey, relevancy? &lt;/strong&gt;&lt;/p&gt; &lt;p&gt;Yes, relevancy.&lt;/p&gt; &lt;p&gt;Relevancy is the most important thing for a search engine: more important than the number of pages it has crawled, and in most cases more important than the result update interval, because users always look at the top results. &lt;/p&gt; &lt;p&gt;Relevancy is a very complex problem. It is affected by many factors, and it differs considerably between languages. In this article, we will take English and Chinese as examples. 
(I know a little German and Japanese as well but ...)&lt;/p&gt; &lt;p&gt;In this series we will go through wordbreakers, weighting, and other useful stuff. Because I'm at a karaoke party right now, I cannot describe everything in detail. I assume you already know how to deal with the Best Bets and "Did you mean" features. If you want more information, please read &lt;em&gt;Luca Bandinelli's multilingual whitepaper.&lt;/em&gt;&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Wordbreaker&lt;/strong&gt;&lt;/p&gt; &lt;p&gt;The wordbreaker is the first thing to look at if you hate a search engine. Although word breaking in languages written in the Latin script (English, Spanish, French, German, Dutch...) is much easier than in other languages (Chinese, Japanese, Korean, Arabic...), it is still a tedious thing to deal with.&lt;/p&gt; &lt;p&gt;MOSS/MSS comes with many wordbreakers, but sometimes you may not be very satisfied with them. Is there a third-party wordbreaker you can use? Yes, some Microsoft partners have a long history of delivering word breaking technology for production use. For example, Hylanda is a leading Chinese word breaking technology company. They built a pretty good Chinese wordbreaker for SQL Server 2000. Since the wordbreaker interface has not changed even in MOSS/MSS, it can be used directly here. &lt;/p&gt; &lt;p&gt;To change a wordbreaker, you need to do the following.&lt;/p&gt; &lt;p&gt;a. Register the wordbreaker.&lt;/p&gt; &lt;p&gt;This depends on the installation manual of the wordbreaker :). But generally speaking, it should be something like this:&lt;/p&gt; &lt;p&gt;regsvr32 YourWordBreaker.dll &lt;/p&gt; &lt;p&gt;b. Get the GUID string of your new wordbreaker. We will need it in the next step.&lt;/p&gt; &lt;p&gt;Search for your wordbreaker DLL's name in the registry, and you will find an entry located in the CLSID branch. 
For example:&lt;/p&gt; &lt;p&gt;HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\&lt;font color="#ff0000"&gt;{4474fffd-da87-4116-9be9-874939d2bd04}&lt;/font&gt;&lt;/p&gt; &lt;p&gt;Copy this GUID string for later use.&lt;/p&gt; &lt;p&gt;c. &lt;font color="#004000"&gt;&lt;font color="#000000"&gt;Navigate to the registry branch for your language. Replace the values with those of your wordbreaker.&lt;/font&gt;&lt;/font&gt;&lt;/p&gt; &lt;p&gt;HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\LanguageResources\Default\&lt;font color="#004000"&gt;YourLanguage&lt;/font&gt;&lt;/p&gt; &lt;p&gt;WBDLLPathOverride is the path to your wordbreaker DLL. In my case, the wordbreaker is located at C:\Hylanda\HlChsBrKr.dll&lt;/p&gt; &lt;p&gt;WBreakerClass is the GUID string you just got.&lt;/p&gt; &lt;p&gt;&lt;a href="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/ImproveUserExperienceinEnterpriseSearchS_7AD1/snap119_2.jpg"&gt;&lt;img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" height="203" alt="snap119" src="http://blogs.msdn.com/blogfiles/opal/WindowsLiveWriter/ImproveUserExperienceinEnterpriseSearchS_7AD1/snap119_thumb.jpg" width="835" border="0"&gt;&lt;/a&gt; &lt;/p&gt; &lt;p&gt;Don't forget to restart your search service with net stop osearch followed by net start osearch. Then do a FULL crawl of all content sources. If the wordbreaker used at crawl time does not match the newly installed wordbreaker, search results will be poor, because queries are broken into words by the new wordbreaker at query time while the index was built with the old one.&lt;/p&gt; &lt;p&gt;Sorry, I can't post images of the search results; they are still in the testing process. 
But I can tell you the improvement is HUGE.&lt;/p&gt;&lt;img src="http://blogs.msdn.com/aggbug.aspx?PostID=6483779" width="1" height="1"&gt;</description><category domain="http://blogs.msdn.com/opal/archive/tags/Enterprise+Search/default.aspx">Enterprise Search</category><category domain="http://blogs.msdn.com/opal/archive/tags/Microsoft+Search+Server+2008/default.aspx">Microsoft Search Server 2008</category><category domain="http://blogs.msdn.com/opal/archive/tags/User+Experience/default.aspx">User Experience</category><category domain="http://blogs.msdn.com/opal/archive/tags/WordBreaker/default.aspx">WordBreaker</category></item></channel></rss>