Use a Dedicated Web Front End for Crawling

In SPS 2003 we found that indexing could be a pretty heavyweight operation.  In a small or medium environment it's not a big deal.  In an enterprise environment crawling hundreds of GB to TBs, indexing can potentially generate more traffic than your users do.  The solution ended up being to set up a web front end for the index server to crawl against, so that crawling doesn't impact normal user traffic (at least on the user-facing web front end servers).  We used to call this the "Target" box: a dedicated web front end for crawling.  This was accomplished by adding the IP and host name of the farm to the hosts file of the index server.  One bright customer found that making the "Target" box the Index server itself gave amazing indexing performance, since it didn't have to make requests to another box that then made requests to SQL.  (By the way, this topology, with the index server also acting as a WFE, is not a supported topology in SPS 2003, and the topology page will complain.)
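To make the hosts file trick concrete, here is a minimal Python sketch that builds the lines you would add on the index server so every farm hostname resolves to the Target box. The IP address and hostnames are made-up placeholders, and hosts_entries is a hypothetical helper, not anything that ships with the product.

```python
def hosts_entries(target_ip, hostnames):
    """Return hosts-file lines that point each farm hostname at the Target box."""
    return [f"{target_ip}\t{name}" for name in hostnames]


# Hypothetical values for illustration only.
entries = hosts_entries(
    "10.0.0.25",
    ["portal.contoso.local", "mysites.contoso.local"],
)
print("\n".join(entries))
```

On a Windows server these lines would go in %SystemRoot%\System32\drivers\etc\hosts on the index server only, so user traffic keeps resolving through DNS and the load balancer.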

In MOSS 2007 the product supports both the Index/WFE server role in a topology AND the dedicated web front end for crawling.  What you need to know is how the product supports this scenario.  When you select this option (Central Admin > Operations > Services on Server > (MOSS) Search configuration) to set up this special role, the Index server adds hosts file entries that map the hostnames of all web apps to the IP address of the WFE.  Everything's cool, right?  Well, just make sure it's the correct IP.  If your server only has one IP, everything's cool.  If it's not the correct IP, you'll have communication problems, failed indexing, etc.  Even if you try to correct these entries, they will be overwritten.  The solution?  Make sure the first NIC has the IP you're using.  The alternate solution, if you have multiple NICs or alternate DIPs (dedicated static IP addresses) for each site, is to simply turn off this radio option, which removes the static entries from the hosts file, and then maintain the list yourself.  It's not rocket science.
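If you want to check those automated entries rather than eyeball them, something like the following Python sketch could help. It parses hosts file text and reports any web app hostname that is missing or mapped to an IP other than the one you expect. The function names, the sample content, and the addresses are all assumptions for illustration; on a real index server you would read the text from %SystemRoot%\System32\drivers\etc\hosts.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to the IP on its line."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip
    return mapping


def wrong_entries(text, expected_ip, hostnames):
    """Return hostnames that are missing or mapped to an unexpected IP."""
    mapping = parse_hosts(text)
    return [h for h in hostnames if mapping.get(h.lower()) != expected_ip]


# Hypothetical hosts file content: the second entry points at the wrong NIC.
sample = """
# entries added for the dedicated crawl WFE
10.0.0.25   portal.contoso.local
10.0.0.99   mysites.contoso.local   # wrong NIC
"""

bad = wrong_entries(
    sample,
    "10.0.0.25",
    ["portal.contoso.local", "mysites.contoso.local"],
)
print(bad)  # ['mysites.contoso.local']
```

An empty list means every hostname resolves to the crawl WFE you intended. Anything else is worth fixing before you spend hours wondering why crawls fail.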

So in a nutshell...  If you have a 3/4 server farm or 5+, you can configure a dedicated WFE for crawling.  Now imagine how much more efficient it would be to simply have your index server be a web front end as well.  With the Index/WFE server as its own crawl target, it can hit the web service locally to index the content, reducing unnecessary network traffic.  In a WAN environment, you'll need to put in hosts file entries for the other servers that are "WFEs dedicated for crawling" to reduce unnecessary indexing load on your user-facing web front ends.  With enhancements in the product it is now easier to set up this scenario by choosing that the server is a dedicated web front end for crawling, i.e. the target box.  By the way, you can put a robots.txt file in the root of the web app on servers that should not be used for indexing.  That prevents indexing there and guards against misconfigurations or rogue crawlers.  You really have 2 options: either manage the hosts file yourself to configure the index server to crawl itself, or select "use a dedicated web front end for crawling", which will add entries to your hosts file for all web apps.  I recommend verifying the IP of those automated entries to avoid long hours of troubleshooting, wondering why you lost connection with the rest of the farm and why sites now won't render.
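For the robots.txt tip, a minimal sketch of a "keep all crawlers out" file looks like the one below. The Python around it only uses the standard library's robots.txt parser to confirm that a well-behaved crawler would be refused. The crawler name and URL are placeholders, and this shows how a compliant crawler reads the file, not how the SharePoint crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole web app.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical crawler name and URL, for illustration only.
allowed = parser.can_fetch(
    "ExampleCrawler",
    "http://portal.contoso.local/sites/team/default.aspx",
)
print(allowed)  # False
```

Keep in mind that robots.txt is advisory. It stops crawlers that honor it, so treat it as a safety net next to the hosts file configuration, not a replacement for it.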

Cool.  Enjoy.  The Target server scenario is a very cool one when executed well.  It can really decrease crawl times and allow you to run more threads without impacting end user traffic (well, slightly, since it's still traffic against SQL, but it keeps more of the traffic off the network).

Let me give Daniel Webster credit for finding that the logic for adding these hosts file entries doesn't work all the time.  It's also a topic that hasn't received the posts it deserves.  No longer a backend network hack, this "target" server concept is now a real, considered and tested scenario, and one of my first recommendations for scaling an environment with more than 500GB.  Why have an Index only server?

<update 2/8/07> 

I was thinking about this more and had a few threads on this topic.  One question... why make a WFE an Index box?  I think you need to look at it the other way around.  It's basically giving your index server a WFE role as well, which removes one of the hops.  You won't want to put this particular WFE in the load balancing rotation.

Another thought...  It also makes a good Central Admin and SSP Admin box.  You can extend the SSP and Central Admin web apps on more than 1 box.  Even if you provision these web apps on your other WFEs, configuring the IIS app pool to shut down when idle will reduce memory usage and the number of worker processes.

Dedicated Index...  (2 hops)

Index -> WFE -> SQL

Index/WFE (1 hop)

Index/WFE -> SQL

</update>

Published Tuesday, February 06, 2007 7:09 AM by joelo

Comments

Tuesday, February 06, 2007 8:21 AM by JG

# re: Use a Dedicated Web Front End for Crawling

Cool article.

Tuesday, February 06, 2007 9:57 AM by Bob Fox

# re: Use a Dedicated Web Front End for Crawling

Great Post Joel.   We learn something new everyday  :)

Thursday, February 08, 2007 12:10 PM by sundanceca

# re: Use a Dedicated Web Front End for Crawling

Is there a separate article on "why have an index only server"?

My deployment is four very large regional hubs...I noticed that MS apparently uses one central index server.... thoughts?

Friday, February 09, 2007 12:26 AM by joelo

# re: Use a Dedicated Web Front End for Crawling

MS has regional deployments, each with SSPs that have their own index servers.  For enterprise search, they crawl all the other regions.  As well, if you were to look at the index server in the main SSP in Redmond, you'd notice hosts file entries for each of the regional "target" servers.

Wednesday, February 14, 2007 2:01 PM by Piotr's blog

# Index as Dedicated Front End for Crawls?

Hmm Joelo raises an interesting point regarding the use of the Index server as a Dedicated WFE. Aside...

Wednesday, March 21, 2007 4:39 PM by Joel Oleson's SharePoint Land

# Farm Topology Gotcha... Query server caution!

In a previous post I mentioned the WFE/Index role and how making the index server have a WFE (Use a Dedicated

Wednesday, September 12, 2007 1:32 AM by Joel Oleson's SharePoint Land

# SharePoint Disk Allocation and Disk I/O

Had a good conversation with a large customer this morning at TechED SEA. They said we have questions

Monday, December 03, 2007 2:52 PM by Joel Oleson's Blog SharePoint Land

# SharePoint Server Topology - Server Roles and Services on Server

Many of you may think this is basic info, so don't over think what I'm saying here. I'm sitting in Shane

Monday, December 03, 2007 3:56 PM by Noticias externas

# SharePoint Server Topology - Server Roles and Services on Server

Many of you may think this is basic info, so don't over think what I'm saying here. I'm sitting

Wednesday, December 05, 2007 4:24 PM by Joel Oleson's Blog SharePoint Land

# 10 Things To Optimize your SharePoint Server Indexing

1) Put your Search db and on separate disks transaction logs, both the fastest most optimized disks with

Wednesday, December 05, 2007 5:19 PM by Noticias externas

# 10 Things To Optimize your SharePoint Server Indexing

1) Put your Search db and on separate disks transaction logs, both the fastest most optimized disks with

Saturday, January 05, 2008 2:26 PM by Joel Oleson's Blog SharePoint Land

# I'm using forms or kerb auth and search/crawl (indexing) isn't working

Updated with Apology to Dan and extra clarrification and bonus tip to optimizing your app pools and saving

Monday, January 28, 2008 9:31 PM by Joel Oleson's Blog SharePoint Land

# Anatomy of Indexing

Those of you who follow my blog, know I'm a big fan and instigator of local crawling (WFE + Index server

Wednesday, May 28, 2008 8:47 PM by Mirrored Blogs

# Search Deployment Guidance - Part 3 with notes from the SharePoint UG session

Body: In the previous two posts in this series I showed some ideas and thoughts on how to approach your

Tuesday, October 28, 2008 3:54 AM by Shan

# re: Use a Dedicated Web Front End for Crawling

I have configured a dedicated WFE for crawling by manually editing a host file entry as per Microsoft's instructions at http://technet.microsoft.com/en-us/library/cc262267.aspx

Is there a way to verify that the index server I specified to be a dedicated WFE for crawling is really a dedicated WFE? How do we verify that this server is basically acting as a dedicated WFE?

Thanks

Thursday, April 09, 2009 10:33 AM by DavidLewis27

# re: Use a Dedicated Web Front End for Crawling

I am building a 7-server farm.  I will have 3 WFEs, 2 app servers, and 2 database servers.  So should I install the index on one of the WFEs and install the Query service on the other 2 (for redundancy purposes), and leave the query service off the 2 app servers?  Or keep both services on the same WFE as a dedicated WFE search server?

thanks
