The configuration of the content enrichment feature only supports a single web service endpoint. This can be limiting for a number of common scenarios:
There are several possible technologies one could consider to solve these scenarios. This blog post focuses on using the Windows Communication Foundation (WCF) Routing Service technology included in .NET Framework 4.0. You can also check out our upcoming blog post on how to deploy a network load balancing cluster for high performance and availability.
WCF Routing enables development of complex routing logic, load-balancing, and fault tolerance. All of these mechanisms support our underlying requirement for scaling out, but not all of them need to be implemented for all scenarios. We’ll look closer at routing logic in particular in this blog post.
In short, the benefits of WCF Routing are:
We’ll start by recapping some basics, and then move on to a concrete example.
A search topology can consist of anywhere from 1 to n content processing components. The role of a content processing component is to parse and transform the data coming into the system before delivering it to the indexing component. This processing takes place in discrete processing flows that can range from 0 to n instances within a specific content processing component. The number of active flow instances depends on available resources and the amount of data being crawled. As a ballpark figure, expect roughly three flow instances per physical core on the host, although there is no guarantee that this ratio will hold in future versions.
When content enrichment is enabled for the Search service application, all active flow instances will potentially call out to the configured web service endpoint for every document. Assuming a web service that responds instantly, the web service will receive roughly the same number of calls per second as the crawl rate (documents per second) of the farm. How much of a bottleneck the web service becomes, if at all, depends on the following factors:
The following is a simple visual representation of how a search topology with two content processing components can be configured to communicate with a single WCF Routing Service. The WCF Routing Service in turn distributes incoming requests to the appropriately registered service endpoints based on a set of defined filters and the content of the received SOAP envelope. Each service implementation has a backup endpoint that will ensure high availability in case of a failure situation. Typically a CommunicationException or TimeoutException will cause the router to try the backup endpoint.
Even though a single connector appears between nodes in the drawing, there will most likely be multiple HTTP connections at run time. The number of allowed concurrent calls and sessions can be throttled through the serviceThrottling element of the service behavior section in the web configuration file (for IIS hosting). By default the underlying connections are persistent, which creates less overhead than re-creating an HTTP connection for every call.
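As a rough sketch of the throttling mentioned above, a serviceThrottling element can be placed inside the service behavior. The behavior name and the three limits below are placeholder assumptions rather than recommendations, and these settings cap concurrent calls, sessions, and instances rather than raw HTTP connections.

```xml
<behaviors>
  <serviceBehaviors>
    <!-- Behavior name is a placeholder; use the one your service references -->
    <behavior name="RoutingServiceBehavior">
      <!-- Illustrative limits only; tune them against your own hosts -->
      <serviceThrottling maxConcurrentCalls="64"
                         maxConcurrentSessions="200"
                         maxConcurrentInstances="264" />
    </behavior>
  </serviceBehaviors>
</behaviors>
```

A reasonable starting point is to size maxConcurrentCalls against the number of active flow instances you expect across all content processing components, then adjust after observing real crawls.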
There may be situations where you have different web service implementations aimed at different types of content. You can pack all of them into a single service and handle requests differently depending on content, but in other cases you may know a priori that some content will be tougher to process and that it’s desirable to dedicate a particularly beefy host to those documents. Also important, maintainability of your service implementations may decrease if you have no separation of business logic. To show you how to achieve this, we’ll walk through an implementation of a WCF Routing Service where we do content-based routing predicated on the content source of an item.
The following fictitious values are used in the example.
| Role | Value |
| --- | --- |
| Web Service 1 | servicehost1.contoso.com |
| Web Service 1 backup | servicehost2.contoso.com |
| Web Service 2 | servicehost3.contoso.com |
| Web Service 2 backup | servicehost4.contoso.com |
| WCF Routing Host | routinghost.contoso.com |
While there are different ways of implementing a WCF Routing Service, and different levels of complexity, we’ll focus on a very simple router that we can express mostly declaratively through the web configuration file. Initially you’ll need to have Internet Information Services (IIS) set up on a server and create a new site (including a new directory on your local drive for the site).
Let’s start with the web.config file and look at the different sections in it separately before tying it all together in a full example. Every section described below is a descendant of the <system.serviceModel> node. We’ll start with the binding used by both the router’s exposed service and the clients it talks to.
We’ve created a single basicHttpBinding where we’ve configured large values for the readerQuotas and the maxReceivedMessageSize. These values can be reduced later on once you know the limits you want to have in place. They are used to limit the allowed size and complexity of the received SOAP envelope.
<basicHttpBinding>
  <binding name="basicHttpBinding_IContentProcessingEnrichmentService"
           maxReceivedMessageSize="8388608">
    <readerQuotas maxDepth="32"
                  maxStringContentLength="2147483647"
                  maxArrayLength="2147483647"
                  maxBytesPerRead="2147483647"
                  maxNameTableCharCount="2147483647" />
    <security mode="None" />
  </binding>
</basicHttpBinding>
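Once the expected payload sizes are known, the binding could be tightened along the following lines. This is only a sketch: the 1 MB caps are assumed values chosen for illustration, while maxBytesPerRead and maxNameTableCharCount are simply set back to the WCF defaults.

```xml
<basicHttpBinding>
  <binding name="basicHttpBinding_IContentProcessingEnrichmentService"
           maxReceivedMessageSize="1048576">
    <!-- Assumed limits: 1 MB strings and arrays instead of Int32.MaxValue -->
    <readerQuotas maxDepth="32"
                  maxStringContentLength="1048576"
                  maxArrayLength="1048576"
                  maxBytesPerRead="4096"
                  maxNameTableCharCount="16384" />
    <security mode="None" />
  </binding>
</basicHttpBinding>
```

Requests that exceed these quotas are rejected by the router before they reach any enrichment service, so choose limits with some headroom above the largest item you expect to send.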
This is where we define the endpoint that the router uses to expose itself. The content enrichment feature in SharePoint is configured to call this endpoint through its PowerShell cmdlets. Note that the baseAddress attribute is not required when hosting in IIS; it is included here only to make it clear which host this service runs on.
<service behaviorConfiguration="RoutingServiceBehavior"
         name="System.ServiceModel.Routing.RoutingService">
  <host>
    <baseAddresses>
      <add baseAddress="http://routinghost.contoso.com:800" />
    </baseAddresses>
  </host>
  <endpoint name="RoutingServiceEndpoint"
            address=""
            binding="basicHttpBinding"
            bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
            contract="System.ServiceModel.Routing.IRequestReplyRouter" />
</service>
Here we define the endpoints to the content enrichment web service implementations that the router will route to. These are not different from a normal implementation that you host in a single-service scenario. As can be seen in the following example, we’re configuring a total of four client endpoints. These cover our two different service implementations, with an additional backup for each in case of a failure.
<client>
  <endpoint name="Service1"
            address="http://servicehost1.contoso.com:800/ContentEnrichmentService.svc"
            binding="basicHttpBinding"
            bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
            contract="*" />
  <endpoint name="Service1Backup"
            address="http://servicehost2.contoso.com:800/ContentEnrichmentService.svc"
            binding="basicHttpBinding"
            bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
            contract="*" />
  <endpoint name="Service2"
            address="http://servicehost3.contoso.com:800/ContentEnrichmentService.svc"
            binding="basicHttpBinding"
            bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
            contract="*" />
  <endpoint name="Service2Backup"
            address="http://servicehost4.contoso.com:800/ContentEnrichmentService.svc"
            binding="basicHttpBinding"
            bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
            contract="*" />
</client>
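Because the router only falls back to a backup endpoint after the primary call fails or times out, the timeouts on the client-side binding influence how quickly failover happens. The fragment below is a hedged sketch of giving the client endpoints their own binding with shorter timeouts. The binding name enrichmentClientBinding and the timeout values are assumptions made for illustration, and both elements are meant to sit under <system.serviceModel>.

```xml
<bindings>
  <basicHttpBinding>
    <!-- Hypothetical binding used only by the router's client endpoints -->
    <binding name="enrichmentClientBinding"
             maxReceivedMessageSize="8388608"
             openTimeout="00:00:02"
             sendTimeout="00:00:05">
      <!-- Add readerQuotas here as in the shared binding if replies are large -->
      <security mode="None" />
    </binding>
  </basicHttpBinding>
</bindings>
<client>
  <!-- Same endpoint as in the example, pointed at the hypothetical binding -->
  <endpoint name="Service1"
            address="http://servicehost1.contoso.com:800/ContentEnrichmentService.svc"
            binding="basicHttpBinding"
            bindingConfiguration="enrichmentClientBinding"
            contract="*" />
</client>
```

With a setup like this, the time spent waiting on the primary plus the time spent on the backup should stay below whatever timeout the caller itself is configured with; otherwise the caller may give up before the backup has had a chance to answer.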
We need to create a service behavior where we reference the name of the filter table that will be defined in the next step. In addition, to enable full inspection of the SOAP envelopes in our XPath filters, we set the attribute routeOnHeadersOnly to false.
<behavior name="RoutingServiceBehavior">
  <routing filterTableName="ContentSourceFilters"
           routeOnHeadersOnly="False" />
</behavior>
Here we define the filters and the filter table where we map the filters to normal endpoints and backup endpoints. The XPath expressions look for all Property nodes in the SOAP envelope by using a predicate that specifies the name of the property and the value. This predicate is used to match against specific content sources. There are various types of filters that we can use, but the XPath type is sufficient in speed and functionality for this example. To develop more complex scenarios, take a look at custom filters in the online WCF documentation.
<routing>
  <namespaceTable>
    <!-- Define prefix for Content Enrichment namespace, used in XPath filters -->
    <add prefix="cc"
         namespace="http://schemas.microsoft.com/office/server/search/contentprocessing/2012/01/ContentProcessingEnrichment" />
  </namespaceTable>
  <!-- Filter definitions -->
  <filters>
    <filter name="Sharepoint"
            filterType="XPath"
            filterData="//cc:Property[cc:Name[. = 'ContentSource'] and cc:Value[. = 'Local Sharepoint Sites']]" />
    <filter name="Fileshare"
            filterType="XPath"
            filterData="//cc:Property[cc:Name[. = 'ContentSource'] and cc:Value[. = 'Large Fileshare']]" />
  </filters>
  <!-- Filter mappings -->
  <filterTables>
    <filterTable name="ContentSourceFilters">
      <add filterName="Sharepoint" endpointName="Service1" backupList="BackupSharepoint" />
      <add filterName="Fileshare" endpointName="Service2" backupList="BackupFileshare" />
    </filterTable>
  </filterTables>
  <!-- Backup lists -->
  <backupLists>
    <backupList name="BackupSharepoint">
      <add endpointName="Service1Backup" />
    </backupList>
    <backupList name="BackupFileshare">
      <add endpointName="Service2Backup" />
    </backupList>
  </backupLists>
</routing>
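The two filters above only match items from the two named content sources. If other content sources might also be crawled, one possible approach is a lower-priority catch-all entry, sketched below. The filter name AnySource is hypothetical, and sending unmatched items to Service1 is an arbitrary choice made for illustration.

```xml
<routing>
  <filters>
    <!-- The "Sharepoint" and "Fileshare" XPath filters stay as defined in the example -->
    <!-- Hypothetical catch-all filter; MatchAll matches every message -->
    <filter name="AnySource" filterType="MatchAll" />
  </filters>
  <filterTables>
    <filterTable name="ContentSourceFilters">
      <!-- Entries with a higher priority value are evaluated first -->
      <add filterName="Sharepoint" endpointName="Service1"
           backupList="BackupSharepoint" priority="1" />
      <add filterName="Fileshare" endpointName="Service2"
           backupList="BackupFileshare" priority="1" />
      <!-- Only consulted when no priority 1 filter matched -->
      <add filterName="AnySource" endpointName="Service1"
           backupList="BackupSharepoint" priority="0" />
    </filterTable>
  </filterTables>
</routing>
```

The priorities matter because a request-reply router can forward each message to only one endpoint. Without them, the catch-all would match alongside a specific filter at the same level.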
It’s time to tie it all together in a single configuration. The following example uses all the previous pieces to build a complete configuration file.
<?xml version="1.0"?>
<configuration>
  <system.serviceModel>
    <bindings>
      <basicHttpBinding>
        <binding name="basicHttpBinding_IContentProcessingEnrichmentService"
                 maxReceivedMessageSize="8388608">
          <readerQuotas maxDepth="32"
                        maxStringContentLength="2147483647"
                        maxArrayLength="2147483647"
                        maxBytesPerRead="2147483647"
                        maxNameTableCharCount="2147483647" />
          <security mode="None" />
        </binding>
      </basicHttpBinding>
    </bindings>
    <services>
      <service behaviorConfiguration="RoutingServiceBehavior"
               name="System.ServiceModel.Routing.RoutingService">
        <host>
          <baseAddresses>
            <add baseAddress="http://routinghost.contoso.com:800" />
          </baseAddresses>
        </host>
        <endpoint name="RoutingServiceEndpoint"
                  address=""
                  binding="basicHttpBinding"
                  bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                  contract="System.ServiceModel.Routing.IRequestReplyRouter" />
      </service>
    </services>
    <client>
      <endpoint name="Service1"
                address="http://servicehost1.contoso.com:800/ContentEnrichmentService.svc"
                binding="basicHttpBinding"
                bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                contract="*" />
      <endpoint name="Service1Backup"
                address="http://servicehost2.contoso.com:800/ContentEnrichmentService.svc"
                binding="basicHttpBinding"
                bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                contract="*" />
      <endpoint name="Service2"
                address="http://servicehost3.contoso.com:800/ContentEnrichmentService.svc"
                binding="basicHttpBinding"
                bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                contract="*" />
      <endpoint name="Service2Backup"
                address="http://servicehost4.contoso.com:800/ContentEnrichmentService.svc"
                binding="basicHttpBinding"
                bindingConfiguration="basicHttpBinding_IContentProcessingEnrichmentService"
                contract="*" />
    </client>
    <behaviors>
      <serviceBehaviors>
        <behavior name="RoutingServiceBehavior">
          <routing filterTableName="ContentSourceFilters"
                   routeOnHeadersOnly="False" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
    <routing>
      <namespaceTable>
        <add prefix="cc"
             namespace="http://schemas.microsoft.com/office/server/search/contentprocessing/2012/01/ContentProcessingEnrichment" />
      </namespaceTable>
      <filters>
        <filter name="Sharepoint"
                filterType="XPath"
                filterData="//cc:Property[cc:Name = 'ContentSource' and cc:Value = 'Local Sharepoint Sites']" />
        <filter name="Fileshare"
                filterType="XPath"
                filterData="//cc:Property[cc:Name = 'ContentSource' and cc:Value = 'Large Fileshare']" />
      </filters>
      <filterTables>
        <filterTable name="ContentSourceFilters">
          <add filterName="Sharepoint" endpointName="Service1" backupList="BackupSharepoint" />
          <add filterName="Fileshare" endpointName="Service2" backupList="BackupFileshare" />
        </filterTable>
      </filterTables>
      <backupLists>
        <backupList name="BackupSharepoint">
          <add endpointName="Service1Backup" />
        </backupList>
        <backupList name="BackupFileshare">
          <add endpointName="Service2Backup" />
        </backupList>
      </backupLists>
    </routing>
  </system.serviceModel>
</configuration>
The markup code of the service file needs to reference the RoutingService class and Routing assembly, rather than your own implementation/assembly, which would be the normal procedure. The code part can be just an empty implementation since it won’t be used.
<%@ ServiceHost Language="C#" Debug="true" Service="System.ServiceModel.Routing.RoutingService, System.ServiceModel.Routing, Version=4.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" %>
To summarize, we’ve shown how WCF Routing can overcome some of the limitations of a single web service endpoint. The router itself is still a single point of failure, but this can be addressed with other load balancing mechanisms such as Network Load Balancing (NLB).
If you want to learn more about how to customize search with content enrichment, check out the official documentation on MSDN, and the other blog posts on content enrichment.