SharePoint Strategery

Best used when *strategy* breaks down... (blog by Brian Pendergrass, Microsoft SharePoint - Premier Field Engineer)

Beware crawling the non-Default zone for a SharePoint 2013 Web Application


Update: I've since published a follow-up post, "Problems Crawling the non-Default zone *Explained", which explains the underlying causes of the behaviors I warned about and described in this post...

---------------------------------------

After playing with SharePoint 2013 Search for a while, I thought we were out of the woods regarding crawls of the non-Default Alternate Access Mapping (AAM) zone of a SharePoint Web Application. Crawling a non-Default zone caused all sorts of problems in earlier versions of SharePoint (primarily busted contextual scopes, broken social tagging, and workflow emails linking to the incorrect zone) because components throughout SharePoint have a built-in assumption that the Default zone is being crawled.

I'm still working to fully nail down the impacts for SP2013, but from my initial testing, when crawling a non-Default URL, all search results are returned relative to the URL that was crawled rather than the URL from which you query, meaning you will get unexpected URLs when you query. (I also suspect it's going to break scoping rules for queries.)

Update: I want to seriously caution against using Server Name Mappings as a workaround for this, particularly in SharePoint 2013. Admittedly, in SharePoint 2010, Server Name Mappings did appear to provide a workaround. However, even where they appear to work, Server Name Mappings were definitely not designed for this particular scenario.

Second, in SharePoint 2013, I know for certain that some managed properties in the index (e.g. SPSiteUrl and ParentUrl, to name two) absolutely do not get updated by Server Name Mappings, so adding mappings will only make the problem worse! In other words, some URL-based managed properties (MPs) will be relative to the crawled URL while others will be relative to the mapped URL...
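To make that mixed-URL problem concrete, here is a toy Python model of a single indexed item after a Server Name Mapping is applied. It is only an illustration, not how the SharePoint index actually works: the IndexedItem class, its property values, and apply_server_name_mapping are hypothetical, and the assumption that Path gets rewritten while SPSiteUrl and ParentUrl do not is taken from the behavior reported above.

```python
from dataclasses import dataclass, replace
from urllib.parse import urlsplit


@dataclass(frozen=True)
class IndexedItem:
    # Hypothetical stand-ins for three URL-based managed properties
    # (Path, SPSiteUrl, ParentUrl).
    path: str
    sp_site_url: str
    parent_url: str


def apply_server_name_mapping(item: IndexedItem, crawled: str, mapped: str) -> IndexedItem:
    """Toy model: rewrite only Path from the crawled URL to the mapped URL.

    Assumption (from this post): SPSiteUrl and ParentUrl are left untouched.
    """
    if item.path == crawled or item.path.startswith(crawled + "/"):
        item = replace(item, path=mapped + item.path[len(crawled):])
    return item


def hosts(item: IndexedItem) -> set:
    """Collect the host:port of every URL property on the item."""
    return {urlsplit(u).netloc for u in (item.path, item.sp_site_url, item.parent_url)}


# Made-up property values for an item crawled through the Internet zone URL.
crawled_item = IndexedItem(
    path="http://bargainclownmart:88/sites/myTeam/result1.aspx",
    sp_site_url="http://bargainclownmart:88/sites/myTeam",
    parent_url="http://bargainclownmart:88/sites/myTeam/Pages",
)
mapped_item = apply_server_name_mapping(
    crawled_item, "http://bargainclownmart:88", "http://sp-foo:88"
)

print(mapped_item.path)            # http://sp-foo:88/sites/myTeam/result1.aspx
print(mapped_item.sp_site_url)     # http://bargainclownmart:88/sites/myTeam
print(sorted(hosts(mapped_item)))  # ['bargainclownmart:88', 'sp-foo:88']
```

Before the mapping, every property on the item points at one host; afterwards the same item points at two, which is the inconsistency described above.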

Because Server Name Mappings were not intended for this scenario, I would not expect them to work in all cases.

For example, if I issue a query from some site in the Web Application http://initech, then I should expect all results from this Web Application to be returned relative to http://initech (as in http://initech/result1.aspx and http://initech/result2.aspx). However, if I were crawling the URL of a non-Default zone, then my results would all be returned relative to that non-Default URL (such as http://bargainclownmart:88/sites/myTeam/result1.aspx and http://bargainclownmart:88/sites/myTeam/result2.aspx).

Update: I recently published "Alternate Access Mappings (AAMs) *Explained" to provide more insight into AAMs and to better illustrate their often misunderstood concepts.

In the scenario below, I have two Web Applications with the following Alternate Access Mappings (as a side note, I believe host-named site collections are now preferred over AAMs, but I wanted to demonstrate this as an example):

Web Application   Internal URL                                Zone      Public URL for Zone
http://sp-foo:88  http://sp-foo:88                            Default   http://sp-foo:88
http://sp-foo:88  http://testingfoo:88                        Intranet  http://testingfoo:88
http://sp-foo:88  http://bargainclownmart:88                  Internet  http://bargainclownmart:88
http://sp-foo:88  http://bargainclownmart.officespace.lab:88  Extranet  http://bargainclownmart.officespace.lab:88
http://faceman    http://faceman                              Default   http://faceman
http://faceman    http://initech                              Intranet  http://initech
http://faceman    http://initech.officespace.lab              Internet  http://initech.officespace.lab

 

Observed behaviors when crawling the Default URLs...

In my content source, I specify http://faceman and http://sp-foo:88 as the start addresses and then perform a full crawl.

As expected, result URLs are relative to the URL from which the query is performed. For example, notice that the browser's address bar shows http://sp-foo:88 and that the results for this Web Application are also displayed relative to this same http://sp-foo:88 URL:

Results related to another Web App are also relative to the zone from which the query is performed (which, to my knowledge, is new in SP2013). For example, if I query from the http://initech URL (in other words, from the Intranet zone), then all results related to that Web App are relative to the http://initech URL (such as http://initech/result1.aspx, http://initech/result2.aspx, etc.), as seen in the last two results in the screen shot below...

 

For comparison, observed behaviors when crawling the non-Default URLs...

In my content source, I then specify http://faceman and the Internet zone URL http://bargainclownmart:88 as the start addresses and perform a full crawl.

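For queries from any zone of either Web App, search results related to the http://sp-foo:88 Web App are always returned relative to the URL that was crawled, in this case http://bargainclownmart:88. In other words, even a query issued from http://sp-foo:88 itself returns result URLs such as http://bargainclownmart:88/sites/myTeam/result1.aspx...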
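The two sets of observations can be captured in a small toy model. This is a sketch under stated assumptions, not SharePoint's actual query-time logic: result URLs indexed under a Web Application's Default zone are rewritten to the zone the query came from (falling back to the Default URL when that Web Application has no such zone, which is purely an assumption), while URLs indexed under a non-Default zone are returned exactly as crawled. The AAMS dictionary mirrors the table above; zone_of and display_url are hypothetical helpers.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlsplit

# Web Application -> {zone: public URL}, mirroring the AAM table in this post.
AAMS: Dict[str, Dict[str, str]] = {
    "http://sp-foo:88": {
        "Default": "http://sp-foo:88",
        "Intranet": "http://testingfoo:88",
        "Internet": "http://bargainclownmart:88",
        "Extranet": "http://bargainclownmart.officespace.lab:88",
    },
    "http://faceman": {
        "Default": "http://faceman",
        "Intranet": "http://initech",
        "Internet": "http://initech.officespace.lab",
    },
}


def origin(url: str) -> str:
    """Return scheme://host[:port] in lower case."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def zone_of(url: str) -> Optional[Tuple[str, str]]:
    """Find the (Web Application, zone) whose public URL matches the URL's origin."""
    for web_app, zones in AAMS.items():
        for zone, public_url in zones.items():
            if origin(public_url) == origin(url):
                return web_app, zone
    return None


def display_url(indexed_url: str, query_url: str) -> str:
    """Toy model of the URL a user sees for one search result."""
    hit = zone_of(indexed_url)
    if hit is None or hit[1] != "Default":
        # Crawled through a non-Default zone: returned exactly as indexed.
        return indexed_url
    web_app, _ = hit
    query_hit = zone_of(query_url)
    query_zone = query_hit[1] if query_hit else "Default"
    zones = AAMS[web_app]
    # Assumption: fall back to the Default URL if this Web App lacks the zone.
    target = zones.get(query_zone, zones["Default"])
    # Only the path is carried over; query strings are ignored in this toy model.
    return target + urlsplit(indexed_url).path


# Default URLs crawled: the result follows the zone of the query.
print(display_url("http://faceman/result1.aspx", "http://initech"))
# http://initech/result1.aspx

# Internet zone URL crawled: the result stays on the crawled URL.
print(display_url("http://bargainclownmart:88/sites/myTeam/result1.aspx", "http://sp-foo:88"))
# http://bargainclownmart:88/sites/myTeam/result1.aspx
```

The key design point of the model is the early return: once an item is indexed under a non-Default URL, nothing at query time maps it back, which matches what the screen shots in this post show.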

 

The moral to this story...

Always crawl the Default zone URL (the URL being crawled must be in a zone that uses Windows authentication) unless there is a REALLY good reason otherwise.
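One way to apply this rule is to check each content source start address against the AAM table before running a full crawl. The sketch below is hypothetical: it uses a hand-built dictionary of public URL to zone (copied from the table in this post) instead of reading the farm configuration, and it does not check whether the zone uses Windows authentication.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlsplit

# Public URL -> zone, copied by hand from the AAM table (hypothetical input).
ZONE_BY_PUBLIC_URL: Dict[str, str] = {
    "http://sp-foo:88": "Default",
    "http://testingfoo:88": "Intranet",
    "http://bargainclownmart:88": "Internet",
    "http://bargainclownmart.officespace.lab:88": "Extranet",
    "http://faceman": "Default",
    "http://initech": "Intranet",
    "http://initech.officespace.lab": "Internet",
}


def origin(url: str) -> str:
    """Return scheme://host[:port] in lower case."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def non_default_start_addresses(start_addresses: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (address, zone) for every start address that is not a Default zone URL.

    zone is None when the address matches no public URL in the table.
    """
    lookup = {origin(u): z for u, z in ZONE_BY_PUBLIC_URL.items()}
    flagged = []
    for address in start_addresses:
        zone = lookup.get(origin(address))
        if zone != "Default":
            flagged.append((address, zone))
    return flagged


print(non_default_start_addresses(["http://faceman", "http://sp-foo:88"]))
# []
print(non_default_start_addresses(["http://faceman", "http://bargainclownmart:88"]))
# [('http://bargainclownmart:88', 'Internet')]
```

The first call corresponds to the Default-zone crawl described above and flags nothing; the second corresponds to the Internet zone crawl and flags http://bargainclownmart:88.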

 
