New Update to the MOSS 2007 C# Managed Protocol Handler (2007.4)
Heads up, I've released an update to my protocol handler sample on CodePlex: http://www.codeplex.com/MOSSPH/Release/ProjectReleases.aspx?ReleaseId=23453
This release includes several fixes and enhancements that make building a Protocol Handler even easier than before. If you tried the earlier sample and gave up, take a look at the new one, as the improvements may get you past some of the difficulties you encountered:
- XML-based test content source - once compiled and installed, this version will actually crawl content! I've created an XML file (TestData.xml) that is used as a content source and serves as a set of nested containers and items. The places where I've injected the XML-specific code show by example where your own code should go. Then turn on tracing to see exactly what typical output looks like. All XML-specific code in the ContentEnumerator class is bracketed with #regions, including the usual TODO comments. Play around with TestData.xml to:
- Add custom properties
- Experiment with the last-modified dates/times for incremental crawling.
- Change the content to see modified documents reflected in the search results.
- Note: the content of a pretend document is written locally to a temporary file and that filename is crawled.
- PreserveSig - the interfaces are now decorated with the PreserveSig attribute, which eliminates all the kluge around throwing exceptions to return HRESULT codes. Every method that needs to return an HRESULT now does so directly, and the implementer need only consider which HRESULT to return, not how to return it. This also eliminates the annoying error messages in the crawl log.
- IContentEnumerator - it's easier than ever to abstract your custom logic away from the sample. With the introduction of IContentEnumerator the communications between the ProtocolHandler/Accessor class and the ContentEnumerator are more formal, thus allowing you to have multiple ContentEnumerators for different content sources or for experimentation (such as the supplied XML source). Note: the ContentEnumerator class no longer inherits from Uri.
- Support for custom properties - the ContentEnumerator class now exposes a property that can be populated with an array of custom properties for the container or item. There should be no need to create a custom IFilter to chunk the custom properties to the gatherer. I've also cleaned up the existing properties and grouped them together into classes for more readability. The custom properties may also be of a variety of data types.
- More efficient incremental crawling - the accessor now chunks a container's URLs with the date/time using the DIRLINK_WITH_TIME ID. This way the gatherer won't even create an accessor for a URL whose date is not newer than the last crawl (see http://msdn.microsoft.com/en-us/library/aa965720.aspx).
- Container URLs now return a last-modified date as well, so if the container has not changed, its URLs don't need to be emitted.
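To illustrate the PreserveSig change, here is a minimal sketch of what returning an HRESULT directly looks like. The interface and class names (IHypotheticalAccessor, HypotheticalAccessor) and the GetSize method are made up for this illustration and are not the sample's actual definitions; only the PreserveSig attribute and the standard S_OK/E_FAIL values are real.

```csharp
using System;
using System.Runtime.InteropServices;

// Standard COM HRESULT values.
public static class HResults
{
    public const int S_OK = 0;
    public const int E_FAIL = unchecked((int)0x80004005);
}

// Hypothetical interface, for illustration only (not taken from the sample).
[ComVisible(true)]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IHypotheticalAccessor
{
    // Without [PreserveSig] this would be declared as `ulong GetSize()`,
    // and a failure could only be reported by throwing an exception.
    [PreserveSig]
    int GetSize(out ulong size);
}

// Hypothetical implementation backed by an in-memory buffer.
public class HypotheticalAccessor : IHypotheticalAccessor
{
    private readonly byte[] _content;

    public HypotheticalAccessor(byte[] content)
    {
        _content = content;
    }

    public int GetSize(out ulong size)
    {
        if (_content == null)
        {
            size = 0;
            // The failure code is returned to the caller; nothing is thrown.
            return HResults.E_FAIL;
        }

        size = (ulong)_content.Length;
        return HResults.S_OK;
    }
}
```

The point of the design is that the method body decides only which code to return; the interop layer passes the int through unchanged instead of translating an exception.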
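The incremental crawling idea can be pictured with a small sketch. This is not code from the sample, and it only models the decision the gatherer makes on its side once it has been given each child URL together with a timestamp; the ChildLink and IncrementalCrawlSketch names are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pairing of a child URL with its last-modified time,
// roughly the information a DIRLINK_WITH_TIME chunk carries.
public class ChildLink
{
    public string Url;
    public DateTime LastModifiedUtc;

    public ChildLink(string url, DateTime lastModifiedUtc)
    {
        Url = url;
        LastModifiedUtc = lastModifiedUtc;
    }
}

public static class IncrementalCrawlSketch
{
    // Returns only the URLs modified after the previous crawl. Anything
    // older is skipped, so no accessor would need to be created for it.
    public static List<string> UrlsNeedingCrawl(
        IEnumerable<ChildLink> links, DateTime lastCrawlUtc)
    {
        var result = new List<string>();
        foreach (var link in links)
        {
            if (link.LastModifiedUtc > lastCrawlUtc)
            {
                result.Add(link.Url);
            }
        }
        return result;
    }
}
```

Editing a last-modified value in TestData.xml and running an incremental crawl with tracing on is a quick way to check this behaviour against the real gatherer.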
At this point I think it's pretty solid in terms of functionality and layout. Let me know if you have any further suggestions.
Please access the new source using the "Source" tab. I haven't modified any of the documentation, so if you still need that, please refer to release 2007.2.
Let me know how it goes. Good luck and happy crawling!
-John