Relevance is about how closely the search results match what the user wanted to find.
To improve the search results that MOSS Search returns, we need to understand how search results are ranked:
SharePoint performs two types of ranking: dynamic ranking and static ranking. Dynamic ranking happens on the query servers and depends on the query and term matching, whereas static ranking is query-independent and is computed at index time. Let's dive deeper into each of these:
Dynamic Ranking
This looks at the content or property values for a content item such as:
Anchor Text
This evaluates the link text that describes a target, e.g. <a href="http://portal/site">Company Name Enterprise Gateway Portal</a>
Property Weighting
Property weighting means that a match on a specific property value can be more relevant than a match on other property values or in the document’s body. The code below lists the current ranking parameters and their values:
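// Namespaces used: System, Microsoft.SharePoint, Microsoft.Office.Server.Search.Administration
string strURL = "http://<SiteName>"; // replace with the URL of your site
SearchContext srchContext;
using (SPSite site = new SPSite(strURL))
{
    srchContext = SearchContext.GetContext(site);
}
Ranking ranking = new Ranking(srchContext);
// Print the name and current value of every ranking parameter
foreach (RankingParameter param in ranking.RankingParameters)
{
    // Parameters can also be looked up by name through the indexer
    RankingParameter lookedup = ranking.RankingParameters[param.Name];
    Console.WriteLine(lookedup.Name + ": " + lookedup.Value);
}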
Title Extraction
Titles are very important for ranking but are often wrong (e.g. “Slide 1” or “Word Template Name”). MOSS 2007 has an intelligent way of overcoming this problem: it uses a text extraction algorithm that generates a shadow title. How does it find a shadow title if one does not exist? It uses the headings inside your document, which are normally formatted with styles such as Heading 1 or Heading 2.
Please note that this only works for Office file types; in other words, it relies on the Office IFilter that MOSS 2007 search uses to pick up this information.
URL Matching
The name of a website is a common type of query. MOSS Search matches the site name to its URL equivalent.
Static Ranking
This describes the ranking that is not impacted by the content or property values for a content item.
File Type Biasing
In most search scenarios, certain file types are more relevant than others. This affects how MOSS Search calculates relevance ranks.
Automatic Language Detection
Foreign-language results are less relevant than results in the user’s language.
Click Distance from authoritative pages
NOTE: Click Distance and URL Depth are not the same thing. Click distance is not based on URL depth but rather on the path the user takes through pages to get to the information.
Authoritative pages are configured in SharePoint Central Administration.
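To make the difference from URL depth concrete, click distance can be thought of as the fewest links a user has to follow from an authoritative page to reach an item. The sketch below is not how MOSS computes it internally; it is just a breadth-first search over a made-up link graph, and the class name, method name and URLs are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ClickDistanceSketch
{
    // Breadth-first search from the authoritative pages over a link graph
    // (page -> pages it links to). Returns the minimum number of clicks
    // needed to reach each reachable page.
    public static Dictionary<string, int> Compute(
        Dictionary<string, string[]> links, IEnumerable<string> authoritativePages)
    {
        Dictionary<string, int> distance = new Dictionary<string, int>();
        Queue<string> queue = new Queue<string>();
        foreach (string page in authoritativePages)
        {
            if (distance.ContainsKey(page)) continue;
            distance[page] = 0; // authoritative pages are zero clicks away
            queue.Enqueue(page);
        }
        while (queue.Count > 0)
        {
            string current = queue.Dequeue();
            string[] targets;
            if (!links.TryGetValue(current, out targets)) continue;
            foreach (string target in targets)
            {
                if (distance.ContainsKey(target)) continue; // already reached by a shorter path
                distance[target] = distance[current] + 1;
                queue.Enqueue(target);
            }
        }
        return distance;
    }

    public static void Main()
    {
        // Hypothetical link graph for illustration only.
        Dictionary<string, string[]> links = new Dictionary<string, string[]>();
        links["http://portal/"] = new string[] { "http://portal/hr/", "http://portal/a/b/c/policy.docx" };
        links["http://portal/hr/"] = new string[] { "http://portal/hr/forms.aspx" };

        Dictionary<string, int> result = Compute(links, new string[] { "http://portal/" });
        foreach (KeyValuePair<string, int> entry in result)
        {
            Console.WriteLine(entry.Value + " click(s): " + entry.Key);
        }
    }
}
```

In this made-up graph the deeply nested policy.docx is linked straight from the authoritative home page, so it is only one click away even though its URL is long, while forms.aspx has a shorter URL but is two clicks away.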
URL Depth
Items with shorter URLs are more relevant than items at longer URLs, e.g. http://msw/ vs http://portal/divisionalsite/ProjectSite1/MeetingSite/. Short URLs are like prime real estate, and organisations tend to allocate them to the most important content.
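As a rough illustration, URL depth can be approximated by counting the path segments of a URL. This is only a sketch of the idea, assuming depth means the number of non-empty path segments; the exact measure MOSS uses is not documented here, and the class and method names are hypothetical.

```csharp
using System;

public static class UrlDepthSketch
{
    // Counts the non-empty path segments of an absolute URL.
    public static int Depth(string url)
    {
        Uri uri = new Uri(url);
        string[] segments = uri.AbsolutePath.Split(
            new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);
        return segments.Length;
    }

    public static void Main()
    {
        // The two example URLs from the text above.
        Console.WriteLine(Depth("http://msw/"));
        Console.WriteLine(Depth("http://portal/divisionalsite/ProjectSite1/MeetingSite/"));
    }
}
```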
Relevance Metrics
· Precision@N: Avg. no. of relevant documents in the top 5, 10, etc.
· Mean Average Precision: Avg. precision from N = 1 to R
· Reciprocal Rank: 1/rank of the top relevant document
· Normalized Discounted Cumulative Gain (NDCG): Represents the ratio of the current ranking to the ideal ranking
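The metrics above are easier to follow with a small worked example. The sketch below computes them for a single query from hand-made relevance judgments; it is not part of MOSS, and all class and method names are hypothetical. It makes a few assumptions: Precision@N is expressed as a fraction of N, average precision is divided by the number of relevant results found in the list (i.e. every relevant document is assumed to have been returned), and NDCG uses one common formulation, gain / log2(rank + 1). Averaging these per-query values over many queries gives the "mean" figures.

```csharp
using System;

public static class RelevanceMetricsSketch
{
    // relevant[i] is true when the result at rank i + 1 was judged relevant.
    public static double PrecisionAtN(bool[] relevant, int n)
    {
        if (n <= 0) throw new ArgumentOutOfRangeException("n");
        int hits = 0;
        for (int i = 0; i < n && i < relevant.Length; i++)
        {
            if (relevant[i]) hits++;
        }
        return (double)hits / n;
    }

    // Average of the precision values at every rank that holds a relevant result.
    public static double AveragePrecision(bool[] relevant)
    {
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < relevant.Length; i++)
        {
            if (!relevant[i]) continue;
            hits++;
            sum += (double)hits / (i + 1);
        }
        return hits == 0 ? 0.0 : sum / hits;
    }

    // 1 / rank of the first relevant result, or 0 if there is none.
    public static double ReciprocalRank(bool[] relevant)
    {
        for (int i = 0; i < relevant.Length; i++)
        {
            if (relevant[i]) return 1.0 / (i + 1);
        }
        return 0.0;
    }

    // Discounted cumulative gain: each gain is divided by log2(rank + 1).
    private static double Dcg(double[] gains)
    {
        double dcg = 0.0;
        for (int i = 0; i < gains.Length; i++)
        {
            dcg += gains[i] / Math.Log(i + 2, 2);
        }
        return dcg;
    }

    // Ratio of the DCG of the current ordering to the DCG of the ideal ordering.
    public static double Ndcg(double[] gains)
    {
        double[] ideal = (double[])gains.Clone();
        Array.Sort(ideal);
        Array.Reverse(ideal); // best results first
        double idealDcg = Dcg(ideal);
        return idealDcg == 0.0 ? 0.0 : Dcg(gains) / idealDcg;
    }

    public static void Main()
    {
        // Made-up judgments for the top five results of one query.
        bool[] judged = { true, false, true, false, false };
        double[] gains = { 3, 0, 2, 0, 0 };
        Console.WriteLine("Precision@5:       " + PrecisionAtN(judged, 5));
        Console.WriteLine("Average precision: " + AveragePrecision(judged));
        Console.WriteLine("Reciprocal rank:   " + ReciprocalRank(judged));
        Console.WriteLine("NDCG:              " + Ndcg(gains));
    }
}
```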
User’s Perceived Relevance
· Summarization and Highlighting: Query-dependent summarization and highlighting of hits within the summary.
· Duplicate removal: Near-duplicate documents are detected across the index and removed at query time; can be disabled by an admin
· Best Bets: Best Bets promotion IS NO LONGER PART OF the ranking algorithm
· Did you mean?: Index-informed spell checker; only available for English, Spanish, French and one other language (not sure which).
Optimization
· First, crawl your content :)
· Manage authoritative pages and demoted sites carefully
· Mine query logs to identify keywords
· Review list of descriptions, keywords, and best bets periodically as content prioritization can change over time
· Use the admin object model CAREFULLY to change the weight given to properties
· Features in the ranking formula can also be added using the object model to personalize ranking criteria:
o http://msdn2.microsoft.com/en-us/library/microsoft.office.server.search.administration.ranking.rankingparameters.aspx
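For the query-log tip above, a first pass can be as simple as counting how often each query appears. The sketch below assumes the queries have already been exported from the logs into a plain list of strings (how you export them is not covered here); the class and method names and the sample queries are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class QueryLogSketch
{
    // Returns the most frequent queries, ignoring case and surrounding whitespace.
    public static List<KeyValuePair<string, int>> TopQueries(IEnumerable<string> queries, int top)
    {
        Dictionary<string, int> counts =
            new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string raw in queries)
        {
            string query = raw.Trim();
            if (query.Length == 0) continue; // skip blank entries
            int seen;
            counts.TryGetValue(query, out seen); // seen stays 0 for a new query
            counts[query] = seen + 1;
        }

        List<KeyValuePair<string, int>> ranked = new List<KeyValuePair<string, int>>(counts);
        // Sort by count, most frequent first.
        ranked.Sort(delegate(KeyValuePair<string, int> a, KeyValuePair<string, int> b)
        {
            return b.Value.CompareTo(a.Value);
        });
        if (ranked.Count > top) ranked.RemoveRange(top, ranked.Count - top);
        return ranked;
    }

    public static void Main()
    {
        // Made-up sample of logged queries.
        string[] log = { "expense form", "Expense Form", "holiday policy", " expense form ", "timesheet" };
        foreach (KeyValuePair<string, int> entry in TopQueries(log, 3))
        {
            Console.WriteLine(entry.Value + " x " + entry.Key);
        }
    }
}
```

Queries that come out on top are good candidates for keywords and best bets.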