Summary: Learn how to use the Business Connectivity Services (BCS) feature of SharePoint Server 2010 to search content in an external system when the number of records in that system exceeds the throttling limits defined for it.
Searching external systems that hold a large number of records is a complex problem. On one hand, BCS limits the number of records that can be fetched so that neither the external system nor SharePoint is brought down; on the other hand, that same throttling can prevent the system from being crawled. In this walkthrough, you will see how properties can be configured on the metadata model to chunk the calls for crawling the external system. The article also shows how the requirements for implementing search for external systems have changed.
Applies to: Microsoft SharePoint Server 2010
Search In BCS
Setup Server Environment
Setup BCS in Farm
Create Metadata Model
In this scenario, the AdventureWorks database hosted in SQL Server 2008 is used as the external system containing the records to be crawled. A metadata model file is created with the IdEnumerator and SpecificFinder stereotypes implemented to retrieve the data for an External Content Type (ECT). Properties are configured in the model to help BCS get the IDs of the external items to be crawled in batches while staying under the throttling limits.
The search functionality in BCS has changed, and IdEnumerator is no longer a requirement for implementing the crawl. BCS uses one of the following stereotypes for crawling the external system:
<Method Name="ReadList"> <Properties> <Property Name="RootFinder" Type="System.String"></Property></Properties> <Parameters>
In addition to the above, the SpecificFinder stereotype is used if it returns more columns than the Finder/IdEnumerator.
Other properties, shown in Table 1, affect how search behaves in the UI and how data gets crawled.
This property is applied on the Model Metadata object. It specifies that a LobSystemInstance in the model file should be displayed in the search user interface, for example on the page for defining the Line of Business search content source.
Specifies the Finder method that will be used to enumerate the items to crawl
If set on the Finder or IdEnumerator method instance, the values returned are cached and the SpecificFinder is not called unless there is a cache miss.
Table 1-Metadata model properties for search
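The caching behavior described in Table 1 can be pictured with a small simulation. This is only a sketch under stated assumptions: `enumerate_items`, `specific_finder`, and `crawl` are hypothetical stand-ins rather than BCS APIs, the sample records are made up for illustration, and the real cache is internal to SharePoint Search.

```python
# Hypothetical simulation of the caching behavior; none of these names are BCS APIs.

# Pretend external system (illustrative sample data): id -> full record.
EXTERNAL_ITEMS = {
    43659: {"SalesOrderID": 43659, "SalesOrderNumber": "SO43659"},
    43660: {"SalesOrderID": 43660, "SalesOrderNumber": "SO43660"},
    43661: {"SalesOrderID": 43661, "SalesOrderNumber": "SO43661"},
}

# Records every id for which the SpecificFinder stand-in was invoked.
specific_finder_calls = []


def enumerate_items():
    """Stand-in for a Finder/IdEnumerator whose returned values are cached."""
    # Only two of the three items are returned here, to force one cache miss.
    return [EXTERNAL_ITEMS[43659], EXTERNAL_ITEMS[43660]]


def specific_finder(item_id):
    """Stand-in for the SpecificFinder: fetch one full record by id."""
    specific_finder_calls.append(item_id)
    return EXTERNAL_ITEMS[item_id]


def crawl(ids_to_index):
    """Index each id, calling specific_finder only when the cache has no entry."""
    cache = {item["SalesOrderID"]: item for item in enumerate_items()}
    indexed = []
    for item_id in ids_to_index:
        item = cache.get(item_id)
        if item is None:  # cache miss
            item = specific_finder(item_id)
        indexed.append(item)
    return indexed


indexed_items = crawl([43659, 43660, 43661])
```

In this sketch, only the item missing from the cached values triggers a call to the SpecificFinder stand-in; the other two are indexed from the cache.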
The list of software required to run this scenario is as follows:
The AdventureWorks sample database, available on CodePlex, is used in this scenario to represent the external system.
The first step in this scenario is to set up the BCS infrastructure in the farm. This includes setting up the BCS Service Application and the profile pages for the ECT.
Use the following steps to set up the infrastructure:
SharePoint Designer 2010 (SPD) provides functionality to create ECTs and save or export them in a metadata model file, but the IdEnumerator stereotype cannot be modeled using SPD. This stereotype therefore has to be modeled manually. To make things easier and get the basic metadata ready, it is recommended to create the initial metadata model using SPD. The model can then be edited with any XML editor or Notepad to add the IdEnumerator or other method types that cannot be created using SPD.
In this scenario we will use the IdEnumerator to crawl the system. After getting the list of IDs identifying the external items, search calls the SpecificFinder on each ID returned by the enumerator to index the content, because the SpecificFinder returns more fields than the IdEnumerator.
The LastId filter type (named LastIdSeen in this model) is used to batch the calls to the external system. SharePoint Search uses this filter to pass the identifier of the last external item it has seen as part of the call to the IdEnumerator method. It calls the IdEnumerator in a loop, passing that identifier as the filter value each time. The external system should implement the logic that returns records within the throttling limits, using the filter value as the point from which the next set of items is fetched. In this scenario, 43658 is the default value for the filter. It is used when the IdEnumerator is called for the first time and Search has not yet seen any ID.
The snippet below shows the filter descriptor for the LastIdSeen filter and the input parameter associated with it.
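The batching loop described above can be sketched as a small simulation. It is not SharePoint or BCS code: `id_enumerator`, `crawl_all_ids`, the 1000-item limit, and the 2,500 sample IDs are assumptions made for illustration. The sketch also assumes the loop ends when a call returns no items, and it uses a strictly-greater-than comparison for simplicity.

```python
# Hypothetical simulation of LastId batching; not BCS or SharePoint API code.

THROTTLE_LIMIT = 1000    # assumed per-call item limit
DEFAULT_LAST_ID = 43658  # default filter value used in this scenario

# Pretend external system holding 2,500 consecutive ids starting at 43659.
ALL_IDS = list(range(43659, 43659 + 2500))


def id_enumerator(last_id_seen):
    """Return the next batch of ids after last_id_seen, capped at the limit."""
    return [i for i in ALL_IDS if i > last_id_seen][:THROTTLE_LIMIT]


def crawl_all_ids():
    """Call id_enumerator in a loop, feeding back the last id of each batch."""
    last_id_seen = DEFAULT_LAST_ID
    batches = []
    while True:
        batch = id_enumerator(last_id_seen)
        if not batch:  # assumed stop condition: nothing new was returned
            break
        batches.append(batch)
        last_id_seen = batch[-1]
    return batches


batches = crawl_all_ids()
batch_sizes = [len(batch) for batch in batches]
```

Each call stays within the assumed limit, and feeding the last ID of one batch into the next call lets the loop walk the whole ID range without fetching everything at once.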
<FilterDescriptor Type="LastId" Name="LastIdSeen">
  <Properties>
    <Property Name="UsedForDisambiguation" Type="System.Boolean">false</Property>
    <Property Name="IsDefault" Type="System.Boolean">false</Property>
    <Property Name="CaseSensitive" Type="System.Boolean">false</Property>
  </Properties>
</FilterDescriptor>
<Parameter Direction="In" Name="@SalesOrderId">
  <TypeDescriptor TypeName="System.Int32" IdentifierName="SalesOrderID" AssociatedFilter="LastIdSeen" Name="SalesOrderID">
    <DefaultValues>
      <DefaultValue MethodInstanceName="SalesOrderHeaderReadIds" Type="System.Int32">43658</DefaultValue>
    </DefaultValues>
  </TypeDescriptor>
</Parameter>
The metadata snippet below shows the IdEnumerator method. If the SalesOrderId value passed in is less than 43658, the query returns the first 1000 rows; otherwise it returns the records starting from the LastIdSeen parameter (SalesOrderId) passed to the method.
<Method IsStatic="false" Name="SalesOrderHeaderReadListIds">
  <Properties>
    <Property Name="RdbCommandType" Type="System.Data.CommandType, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089">Text</Property>
    <Property Name="RdbCommandText" Type="System.String">IF (@SalesOrderId &lt; 43658) BEGIN
SELECT TOP 1000 SalesOrderId, ModifiedDate FROM [Sales].[SalesOrderHeader] ORDER BY SalesOrderId ASC
END ELSE BEGIN
SELECT SalesOrderId, ModifiedDate FROM [Sales].[SalesOrderHeader] WHERE SalesOrderId BETWEEN @SalesOrderId AND (@SalesOrderId+1000) ORDER BY [SalesOrderId] ASC
END</Property>
  </Properties>
<!-- Parameters and MethodInstances omitted --></Method>
The file containing the metadata is available at the end of the article as an attachment.
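To see what the two branches of the IdEnumerator query return, the logic can be reproduced against an in-memory SQLite table. This is a sketch, not the article's model: `read_ids` is a hypothetical helper, the T-SQL `TOP 1000` is translated to SQLite's `LIMIT 1000`, the `IF`/`ELSE` branch is moved into Python, and the table holds 2,500 made-up rows with a placeholder ModifiedDate.

```python
import sqlite3


def read_ids(conn, sales_order_id):
    """Hypothetical helper mirroring the two branches of the RdbCommandText query."""
    if sales_order_id < 43658:
        # First branch: the first 1000 rows in id order (TOP 1000 in T-SQL).
        rows = conn.execute(
            "SELECT SalesOrderId FROM SalesOrderHeader "
            "ORDER BY SalesOrderId ASC LIMIT 1000"
        )
    else:
        # Second branch: ids from the filter value up to filter value + 1000.
        rows = conn.execute(
            "SELECT SalesOrderId FROM SalesOrderHeader "
            "WHERE SalesOrderId BETWEEN ? AND (? + 1000) "
            "ORDER BY SalesOrderId ASC",
            (sales_order_id, sales_order_id),
        )
    return [row[0] for row in rows]


# In-memory stand-in for [Sales].[SalesOrderHeader] with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderHeader (SalesOrderId INTEGER PRIMARY KEY, ModifiedDate TEXT)"
)
conn.executemany(
    "INSERT INTO SalesOrderHeader VALUES (?, ?)",
    [(i, "2008-01-01") for i in range(43659, 43659 + 2500)],  # placeholder dates
)

first_batch = read_ids(conn, 43658)             # default filter value
second_batch = read_ids(conn, first_batch[-1])  # last id seen in first batch
```

Because `BETWEEN` is inclusive on both ends, the second call in this sketch returns the last ID seen again as its first row and can return up to 1001 rows. Whether that matters depends on the throttling limit configured for the environment.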
Once the ECT has been defined, external data can be crawled and the items can be indexed by SharePoint search. The prerequisites to enable crawling are:
Once all of the above is done, start a full crawl of the content source and confirm in the crawl log that the content was crawled successfully.
Search for any sales order that is present in the external system and verify that results are returned for it. In our example, search for the term "SO43659" and you will get one result from the external system, as shown. The URL of the result item is the profile page used to show the ECT profile. Selecting the link in the search result takes you to the profile page, which shows the details of the external item.
BCS search has been simplified, and items can now be crawled easily by using SpecificFinder stereotypes. In addition, the throttling features provide governance to ensure that the external system and SharePoint are not overwhelmed by user requests.
Thanks for putting this together and doing the TechNet Radio interview with Bill. Without these I would not have known about Last Id Seen or the IDEnumerator requirement for 2010.
I can see this was written a long time ago, but I hope you can still help me with this. I am trying to implement the IdEnumerator with the LastId filter type as you have described here. It seems to crawl all records just fine, but it does not seem to be using the SpecificFinder and hence only has a value in the ID column, i.e. the return parameter of the IdEnumerator. I would also like to implement incremental crawl with ModifiedDate, but that would be the next step once I get this right. I would much appreciate your help.