shajan's notes on microsoft search

IFilter: Used for extracting text from files

Microsoft Search uses IFilters (also called filters) to extract text and metadata from documents. IFilters are COM objects that implement the IFilter interface. One example is the HTML filter, which extracts the text of HTML documents. In addition to text, it can also emit metadata such as titles and links. The IFilter COM interface is documented on MSDN.
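To make the idea concrete, here is a rough conceptual sketch in Python of what an HTML filter produces: body text plus metadata such as the title and links. This is not the COM IFilter interface and not how Microsoft Search implements it. The names HtmlTextExtractor and extract are hypothetical, and the sketch uses Python's standard html.parser module purely for illustration.

```python
from html.parser import HTMLParser


class HtmlTextExtractor(HTMLParser):
    """Hypothetical stand-in for an HTML filter (not the COM IFilter API)."""

    def __init__(self):
        super().__init__()
        self.text_parts = []   # visible text found in the document
        self.title = ""        # metadata: document title
        self.links = []        # metadata: href targets of <a> tags
        self._in_title = False
        self._skip_depth = 0   # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return the text and metadata found in an HTML string."""
    parser = HtmlTextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "text": " ".join(parser.text_parts),
        "title": parser.title.strip(),
        "links": parser.links,
    }


sample = (
    "<html><head><title>Hello</title></head>"
    "<body><p>Some <a href='http://example.com/'>linked</a> text.</p></body></html>"
)
result = extract(sample)
print(result)
```

A real IFilter hands its output to the indexer incrementally, chunk by chunk, rather than returning one dictionary. The sketch only shows the kind of information a filter extracts.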

For a list of IFilters developed by independent software vendors, please see http://addins.msn.com/

 

Published Tuesday, February 15, 2005 12:29 AM by Shajan Dasan

About Shajan Dasan

Leads a development team in Microsoft Search, responsible for the crawler, text extraction, and linguistic components. Products that ship these components include Office 12, Windows Desktop Search, SQL Server and Index Server. Before Search, implemented the type-safety verifier in the .NET Just-In-Time compiler, and did the initial implementation of the Code Access Security system, including the Security Policy System, Stack Walk, Link Demands and IsolatedStorage.

© 2008 Microsoft Corporation. All rights reserved.