
Displaying PDF icons in search results

I received some questions today on how to display PDF icons in search results. Once you can index PDF documents (after installing the PDF IFilter, restarting the search service and all that good stuff), follow the instructions in the blog post below to surface the PDF icons in search results. Note that you need to do this on each web front end (WFE).

 http://www.sharepointblogs.com/wpowell/archive/2007/08/09/pdf-document-type-icon-in-moss.aspx

Posted by Deb Haldar | 1 Comments

Adobe releases a 64-bit IFilter

Adobe has finally released a true 64-bit version of the PDF IFilter. Adobe has successfully tested the filter on the following platforms:

Environment   Operating System                          Application
Desktop       Windows XP x64 SP2                        * Windows Desktop Search 3 and 4
                                                        * Windows Indexing Service
Desktop       Windows Vista x64 SP1                     * Windows Search 4
Server        Windows Server 2003 x64 Edition R2 SP2    * Microsoft Office SharePoint Server 2007
                                                        * Microsoft Exchange Server 2007
                                                        * Windows Desktop Search 3
Server        Windows Server 2008 x64 SP1               * Microsoft Office SharePoint Server 2007
                                                        * Microsoft SQL Server 2005
                                                        * Windows Search 4

Information - http://blogs.adobe.com/acrobat/2008/12/adobe_pdf_ifilter_9_for_64bit.html

Download - http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025

One of our consultants, Jie Li, did some initial benchmarking of the filter alongside the Foxit 64-bit IFilter and found the Adobe PDF filter to be about 5 times slower than its Foxit counterpart. You can read about it on Jie's blog. The Adobe filter, albeit slow, presents a zero-cost solution for those who need to index PDF documents on a tight budget.

Errata for Filter Pack KB.

This correction applies to both Office SharePoint Server 2007 and Search Server 2008.

1. In step 2, the correct registry key to set the CLSID of the filter is:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\

2. In step 7, the CLSID for OneNote filter is incorrect. The correct value is:

.One = {B8D12492-CE0F-40AD-83EA-099A03D493F1}

The KB will be revised shortly to reflect these changes. 

 

Posted by Deb Haldar | 7 Comments

MS Filter Pack released!

I'm pleased to announce that after months of blood, sweat and toil, the MS Filter Pack is finally available! The package can be downloaded from:

 http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en

Contents:

The filter pack includes the following IFilters:

·         Metro (.docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb)

·         Zip (.zip)

·         OneNote (.one)

·         Visio (.vdx, .vsd, .vss, .vst, .vsx, .vtx)

Supported Products:

·         SPS2003, MOSS2007, Search Server 2008, Search Server 2008 Express

·         WSSv3

·         Exchange 2007

·         SQL 2005, SQL 2008

·         Windows Desktop Search 3.01, WDS 4

Overview:

·         The Filter Pack installs the above IFilters on the machine

·         Each IFilter is registered with Windows Indexing Service

·         Each product above has a corresponding KB to describe how to register the filters

Q&A:

“I noticed <product X> is not listed as a supported product, why is it not included?”

-          When we created the project plan we came up with the list of Microsoft Search products that we would be supporting.  During the project lifecycle we’ve tested to ensure that the Filter Pack works properly with each of these products.  We will work to determine if any new Search products can be supported in the future.

 

“Is the Filter Pack localized for <language y>?”

-          The Filter Pack will be localized in 36 different languages (see below).  It has been passed off for localization; details will be posted as they become available.  At the time of release (12/18), the Filter Pack will be available in en-us only.

 

Fully Localized SKU Languages    Language Pack Languages
Arabic                           Bulgarian
Brazilian                        Croatian
Chinese (SC)                     Estonian
Chinese (TC)                     Hindi
Czech                            Latvian
Danish                           Lithuanian
Dutch                            Romanian
English                          Serbian (Latin)
Finnish                          Slovak
French                           Slovenian
German                           Ukrainian
Greek
Hebrew
Hungarian
Italian
Japanese
Korean
Norwegian (Bokmal)
Polish
Portuguese
Russian
Spanish
Swedish
Thai
Turkish

 

“Is the Filter Pack available for x64/x86?”

-          The Filter Pack will be available in both x64 and x86 – there are two separate downloads (same location).

 

“What about the TIFF/MODI IFilter?”

-          Unfortunately, at the time of release, the TIFF filter is not shipped with the Filter Pack. We do understand how important the issue is for our customers and will be working on providing an alternative solution.

Posted by Deb Haldar | 63 Comments

FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Some time back I had the chance to run a performance and international sufficiency analysis on the Adobe and FOXIT IFilters for some of our customers. The following report is now being made available to a broader audience.

 

PERFORMANCE ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER

Machine :   Intel Xeon CPU @ 1.4 GHz (4 hyperthreaded processors)

                    4.00 GB of RAM

                    32-bit Win2K3 SP1

                    Indexer performance set to partly reduced.

 

 

                            FOXIT v1.0                      ADOBE v.8
Total # of pdf documents    10917                           10917
# successful crawls         10871                           10909
# errors                    44 (expired ebooks etc)         0
# warnings                  2 (corrupted doc)               2 (corrupted doc)

CRAWL TIME:
  Portal Content            00:49:21.163                    03:34:39.237
  Anchor Crawl 1            00:02:03.527                    00:02:39.073
  Anchor Crawl 2            00:00:02.173                    00:00:02.437
  TOTAL Crawl Time          00:51:26.863 (~ 51 minutes)     03:38:00.747 (~ 218 minutes)

                       

 

Analysis:

 

1.      The FOXIT filter is about 4.27 times faster than the Adobe filter on a quad-proc machine (roughly 218 minutes versus 51 minutes of total crawl time). This is expected, since the Adobe filter is not truly multithreaded and serializes the threads.

2.      The Adobe filter crawls some documents which ideally should not be crawled (expired ebooks etc).
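
As a rough cross-check of the ratio above, here is a minimal Python sketch that converts the two TOTAL crawl times from the table into seconds and divides them. The function names are made up for this illustration. Working from the unrounded totals rather than the rounded minutes, the ratio comes out slightly lower, at roughly 4.2.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS.mmm' crawl time into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def speedup(slow: str, fast: str) -> float:
    """Return how many times longer the slow crawl took than the fast one."""
    return to_seconds(slow) / to_seconds(fast)


# TOTAL crawl times as reported in the table above.
FOXIT_TOTAL = "00:51:26.863"
ADOBE_TOTAL = "03:38:00.747"

if __name__ == "__main__":
    print(f"Adobe / FOXIT crawl time: {speedup(ADOBE_TOTAL, FOXIT_TOTAL):.2f}x")
```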

 

 

INTL SUFFICIENCY ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER

Neither the Adobe nor the FOXIT filter returns the correct locale for non-English documents. Both always emit LOCALE = 1033 (en-us). Hence we pass the text to the neutral wordbreaker, which compromises search relevance.

Tests were performed on JPN, CHS, FRE and HEB pdf documents using both the indexer and standalone test tools.

 

Language  # Tokens  MOSS returns result  MOSS returns result  Correct locale     Correct locale
                    with FOXIT?          with Adobe?          emitted by FOXIT?  emitted by Adobe?
JPN       2         No                   No                   No                 No
CHS       2         No                   No                   No                 No
FRE       2         Yes                  Yes                  No                 No
HEB       2         Yes                  Yes                  No                 No

 

 Note that since French is syntactically very close to English, we still get back valid results. In case of the Hebrew documents, I’d say it’s a matter of coincidence that the token the language expert gave me was correctly wordbroken.
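
To make the "correct locale" columns concrete, here is a small Python sketch of the check. The LCID values are the standard Windows locale identifiers for the four test languages; the function name and the shape of the input are assumptions made for this illustration and are not part of the test tools used above.

```python
# Standard Windows LCIDs for the languages tested above.
EXPECTED_LCID = {
    "JPN": 1041,  # ja-JP
    "CHS": 2052,  # zh-CN
    "FRE": 1036,  # fr-FR
    "HEB": 1037,  # he-IL
}
ENGLISH_US = 1033  # the value both filters were observed to emit


def locale_report(emitted):
    """Map each language code to True if the emitted LCID matches the expected one.

    `emitted` is a dict of language code -> LCID reported by the filter
    (a hypothetical input format chosen for this sketch).
    """
    return {lang: EXPECTED_LCID[lang] == lcid for lang, lcid in emitted.items()}


if __name__ == "__main__":
    # Both filters emitted 1033 for every document, so every check fails.
    observed = {lang: ENGLISH_US for lang in EXPECTED_LCID}
    print(locale_report(observed))
```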

 

Posted by Deb Haldar | 16 Comments

64-bit support for Adobe PDF IFilter finally available.

Finally, Adobe has come up with an interim solution to address the non-availability of a 64-bit IFilter. One can now use the 32-bit Adobe IFilter on 64-bit platforms after installing a DCOM add-in from Adobe. The installation instructions can be found on the Adobe Labs Wiki.

This is a great opportunity for folks who have a 64-bit installation of Microsoft Office SharePoint Server 2007 and want to index PDF documents, but do not want to spend money on a 64-bit FOXIT IFilter.

DWG (Autocad) Filter is now available.

I'm pleased to announce that after a lot of trials and tribulations, the DWG IFilter is finally ready. Those who have been following the posts by Marco and myself under Chronicles of an IFilter development must be curious to know what the end result was. Well, here's the latest on the topic in Marco's own words:

CAD & Company would like to introduce you to the DWG IFilter 2007, our newest version of DWG IFilter. This release is our answer to the changing search and indexing needs with DWG and DXF files.

 

Why this release? At CAD & Company, the DWG IFilter has been available for some time now. Due to evolution in Microsoft Search technology and the continuing progress in the DWG file format, we needed to add some new features to the IFilter to support these changes. In addition to these technical improvements to the product, DWG IFilter 2007 makes the installation easier than ever before by including a one-click support to register with SharePoint 2007 (MOSS & WSS) and SQL Server 2005 FTS. DWG IFilter 2007 will also support the 2007 and 2008 AutoCAD DWG file types in addition to previous versions of DWG file types.

 

What is an IFilter? An IFilter generally allows your search and search indexing product to access a certain file type; and for non-Microsoft file types you will need a non-Microsoft IFilter.

 

Will DWG IFilter 2007 improve desktop searching? With DWG IFilter 2007, you will easily be able to add AutoCAD DWG files to your desktop search results. DWG IFilter 2007 can be used with any popular desktop tool, including Microsoft Windows Desktop Search.

 

Will DWG IFilter 2007 work with Microsoft SharePoint? DWG IFilter 2007 adds DWG and DXF content search to your SharePoint sites in just one click! Depending on your version, you can extend the scope of the SharePoint search to include network shares, intranet sites and Microsoft Exchange Public folders, making search an incredibly powerful tool to locate all your files.

 

The DWG IFilter 2007 will enable your search tool to serve as a quick index to all your DWG files. Depending on your search tool, your results list will include a “teaser” which displays the extracted text surrounding your search terms. This makes locating the right DWG file faster and easier than ever before.

 

Please visit www.dwgifilter.com to download your free trial and find out more about DWG IFilter 2007!

 

For any questions or comments, feel free to contact us at ifilter@cadcompany.nl.

 

 

Kind Regards,

 

Marco van Schagen

DWG IFilter Team

www.dwgifilter.com

 

CAD & Company

Postbus 37218

1030 AE Amsterdam

the Netherlands

T +31 20 494 66 66, F +31 20 494 66 67

www.cadcompany.nl

 

Posted by Deb Haldar | 22 Comments

Long awaited 64-bit PDF IFilter finally available.

Finally we have a 64-bit PDF IFilter. Surprisingly, the solution is not from Adobe or Microsoft, but from a company called Foxit Software. The IFilter is compatible with the following Microsoft products: Windows Indexing Service, MSN Desktop Search, Internet Information Server, SharePoint Portal Server, Windows SharePoint Services (WSS), Site Server, Exchange Server, SQL Server and all other products based on Microsoft Search technology.

 There's one simple workaround to get the filter running on 64 bit MOSS 2007. The steps are given below.

1.       Install Foxit 64bit PDF Ifilter. http://www.foxitsoftware.com/pdf/ifilter/

2.       Add a pdf extension in MOSS search settings

3.       Open regedit, locate [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]

4.       Change the default value to {987f8d1a-26e6-4554-b007-6b20e2680632} .

5.       Recycle the search service:

             net stop osearch
             net start osearch

6.       Start a full crawl to index your pdf documents :)
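
If you would rather not edit the value by hand in regedit, steps 3 and 4 can be captured as a .reg file. The following Python sketch only builds the file text; the key path and CLSID are copied from the steps above, while the function and constant names are made up for this example. Review the generated text before importing it on an index server.

```python
# Key path and CLSID copied from steps 3 and 4 above.
PDF_FILTER_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0"
    r"\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf"
)
FOXIT_CLSID = "{987f8d1a-26e6-4554-b007-6b20e2680632}"


def reg_default_value(key_path: str, value: str) -> str:
    """Return .reg file text that sets the (Default) string value of one key."""
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'@="{value}"',  # '@' denotes the (Default) value in .reg syntax
        "",
    ])


if __name__ == "__main__":
    print(reg_default_value(PDF_FILTER_KEY, FOXIT_CLSID))
```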

 

Posted by Deb Haldar | 35 Comments

Indexing pdf documents with Adobe Reader v.8 and MOSS 2007

Version 8 of Adobe Reader has some significant architectural changes (for the better, of course), including a built-in IFilter to index PDF documents. Previously the Adobe IFilter was available as a separate download. This change in architecture broke the ability to search PDF documents from within MOSS 2007, although the PDF filter works fine with WDS 3.0. Many consultants therefore recommend v.6 of the Adobe IFilter for indexing PDF documents through MOSS 2007, and v.8 of Adobe Reader for indexing through WDS 3.0 or higher. But what if we want to index PDF documents using both WDS and MOSS 2007? Here's how you can use MOSS 2007 with Adobe Reader v.8, the version currently favored by WDS :)

1. Download Adobe Reader v.8 .

2. Add the filter-extension to the File types crawled:

Start -> Program -> Microsoft Office Server -> SharePoint 3.0 Central Administration  -> <Name of SharedService Provider> -> Search Settings -> File Types -> New File Type (Add extension  pdf here)

3. Modify the following registry keys by changing their "Default" value to the new CLSID of the Adobe IFilter, {E8978DA6-047F-4E3D-9C78-CDBE46041603}:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Default --> {E8978DA6-047F-4E3D-9C78-CDBE46041603}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Default --> {E8978DA6-047F-4E3D-9C78-CDBE46041603}

4. Add the installation directory of Adobe Reader v.8 to the system path. For example, if the Reader is installed under "D:\Program Files\Adobe", then add "D:\Program Files\Adobe\Reader 8.0\Reader" to the system path by:

           --> Right Click on My Computer -> Properties -> Advanced -> Environment Variables -> Path (Under System Variables) -> Edit -> (Add "D:\Program Files\Adobe\Reader 8.0\Reader").

 

This effectively tells the adobe IFilter where to pick up the dependent DLLs.

 

5. Recycle the search service:

           > net stop osearch
           > net start osearch

 

6. Voilà! Now we can crawl and search PDF documents with the v.8 Reader.
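
Step 4 is the one most often missed, so it can be worth checking the PATH value before recycling the service. Below is a minimal Python sketch of such a check. It uses the standard `ntpath` module so that Windows path rules (case-insensitivity, trailing backslashes) are applied; the function name is hypothetical, and the directory is the example one from step 4.

```python
import ntpath


def is_on_path(path_value: str, directory: str) -> bool:
    """Return True if `directory` appears as an entry in a ';'-separated PATH value."""
    wanted = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')  # PATH entries are sometimes quoted
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    import os

    # Example directory from step 4; adjust it to your own installation.
    reader_dir = r"D:\Program Files\Adobe\Reader 8.0\Reader"
    print(is_on_path(os.environ.get("PATH", ""), reader_dir))
```

Note that the search service only picks up a changed system path after it is restarted, which is why the recycle in step 5 comes after this step.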

Posted by Deb Haldar | 76 Comments

Indexing XPS documents with MOSS 2007

With the XML Paper Specification (XPS) document format gaining popularity, many are wondering how to integrate MOSS 2007 to index and search XPS documents. Here's a quick recipe to configure your index server to crawl and catalog tokens in XPS documents:

     1.       Install the XPS Essentials pack from :

http://www.microsoft.com/whdc/xps/viewxps.mspx

 

2.       Add the filter-extension to the File types crawled:

Start -> Program -> Microsoft Office Server -> SharePoint 3.0 Central Administration  -> <Name of SharedService Provider> -> Search Settings -> File Types -> New File Type (Add extension  xps here)

 

3.       Verify that the xps entry  is added to the  extensions list under:

 

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<Site>\Gather\Portal_Content\Extensions\ExtensionList]

 

4.       Add the following registry key:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\Filters\.xps]
        Default = (value not set)
        Extension = xps
        FileTypeBucket REG_DWORD = 0x00000001 (1)
        MimeTypes = application/xps     

5.       Identify the xps filter to MOSS by adding the following registry key:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.xps]

             Set the "Default" value to the CLSID of the XPS IFilter.

             Default REG_SZ = {1E4CEC13-76BD-4ce2-8372-711CB6F10FD1}

6.      Finally, recycle the Search Service by executing the following command from the command window:

   D:\> net stop osearch

   D:\> net start osearch

 

7.   Add the xps documents in the content source and initiate the crawl.
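
The registry edits in steps 4 and 5 can also be written out as a single .reg file so that they are easy to review and repeat on another index server. The Python sketch below renders that text. The key paths, value names and CLSID come from the steps above; treating "Extension" and "MimeTypes" as plain string (REG_SZ) values is an assumption, since the steps do not state their types, and the helper name is made up for this example.

```python
FILTERS_ROOT = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"

# (key path, [(value name or None for Default, data)]); ints are written as DWORDs.
XPS_ENTRIES = [
    (
        FILTERS_ROOT + r"\Filters\.xps",
        [
            ("Extension", "xps"),             # assumed REG_SZ
            ("FileTypeBucket", 1),            # REG_DWORD, per step 4
            ("MimeTypes", "application/xps"), # assumed REG_SZ
        ],
    ),
    (
        FILTERS_ROOT + r"\ContentIndexCommon\Filters\Extension\.xps",
        [(None, "{1E4CEC13-76BD-4ce2-8372-711CB6F10FD1}")],  # CLSID from step 5
    ),
]


def render_reg(entries) -> str:
    """Render key/value entries as .reg file text."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key_path, values in entries:
        lines.append(f"[{key_path}]")
        for name, data in values:
            label = "@" if name is None else f'"{name}"'
            if isinstance(data, int):
                lines.append(f"{label}=dword:{data:08x}")
            else:
                lines.append(f'{label}="{data}"')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_reg(XPS_ENTRIES))
```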

                

Notes: 1. Tested on Win2K3 and MOSS 2007

           2. More Info on XPS: http://www.microsoft.com/whdc/xps/default.mspx

           3. Document with screenshot of configuration procedure attached. 

Insights into MS IFilter Testing Strategy.

Ever since I started dealing with filters, I've seen numerous questions along the lines of "What does the proper validation of an IFilter mean? What tests should we execute, and how do we execute them?" Hence, it's only appropriate that we publish a document detailing our rigorous test procedure so that everyone targeting components at MS Search products can benefit from it.

 Disclaimer: The following list presents only a subset of the testing methodologies we apply at MS Search and is by no means meant to be a quick recipe for weeding out ALL security vulnerabilities in your filter. The list is meant to provide an overview of the issues one should think about while testing and implementing filters.

----------------------------------------------------------------------------------------------------------------

A. Architectural Considerations : - COMPLIANCE REQUIRED

 


1. The Filter DLL does not require the client to be installed on the indexing machine.
2. The Filter dll does not make references to other binaries during compile time.
3. The filter dll is monolithic and self-contained, without any other external dependencies.
For an overview of the problems caused by non-monolithic DLLs, please see:
http://blogs.msdn.com/ifilter/archive/2006/11/20/breaking-the-monolithic-filter-dll.aspx

B. Threading Model: - COMPLIANCE REQUIRED

 


Filter threading model must be marked as either "BOTH" or "Free" under:
HKEY_CLASSES_ROOT\CLSID\{GUID}\InprocServer32

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{GUID}\InprocServer32

We recommend using the "BOTH" threading model. An object that is marked with a threading model of "Both" takes on the threading model of the thread that created the object. Marking the threading model as "Both" requires that the filter be thread-safe.
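
A quick way to verify this requirement on a test machine is to read the value and compare it against the two accepted settings. The Python sketch below assumes the standard COM layout, where the setting is stored as a value named ThreadingModel under the InprocServer32 key listed above; the function names are made up for this example, and the registry read only works on Windows.

```python
ACCEPTED_MODELS = {"both", "free"}


def threading_model_ok(value) -> bool:
    """Return True if a ThreadingModel value is 'Both' or 'Free' (case-insensitive)."""
    return value is not None and value.strip().lower() in ACCEPTED_MODELS


def read_threading_model(clsid: str):
    """Read ThreadingModel for a filter CLSID from HKEY_CLASSES_ROOT (Windows only)."""
    import winreg  # imported lazily so the rest of the sketch runs anywhere

    sub_key = "CLSID\\" + clsid + "\\InprocServer32"
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, sub_key) as key:
            value, _type = winreg.QueryValueEx(key, "ThreadingModel")
            return value
    except FileNotFoundError:
        return None  # key or value is missing


if __name__ == "__main__":
    for sample in ("Both", "Free", "Apartment", None):
        print(sample, threading_model_ok(sample))
```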

C. OS Versions Supported:


The filter should support the following OS versions:
--> WinXP & Win2K3: Filtering of <document format> should be checked with WDS 3.0.
--> For Vista and Longhorn, use the built-in search facility.

D. Backwards compatibility with SPS2003:


1. Register filter dll with SPS 2003.
2. Create a content source with your documents, crawl and query.

E. Loading Mechanisms : - COMPLIANCE REQUIRED

 


The filter needs to support all three loading mechanisms for backward and forward compatibility reasons. We recommend trying to load via IPersistStream first, and falling back to IPersistStorage or IPersistFile only if IPersistStream is not supported.

The IFilterExplorer can be used to check which loading mechanisms are supported:
http://www.citeknet.com/Products/IFilters/IFilterExplorer/tabid/62/Default.aspx

F. Dedicated support for 64 bit platforms :


For 64-bit platforms, there should be no dependency on 32-bit binaries, i.e., nothing that has to run under WOW64.
Run <Depends.exe> to check if dependencies are satisfied to prevent runtime errors.

Known Issue: A dependency on MSJAVA.dll shows up in red in dependency walker. You can safely ignore this.

G. Code Coverage:

We recommend at least 70% code coverage. This can be easily profiled using VS 2005 Team System.

H. IFiltTst - Consistency, Legitimacy and Illegitimacy tests:

 


IFiltTst can be used to run the following tests:
Consistency Test: The chunks emitted by the filter should be consistent between two runs.
Legitimacy Test: This test validates that the filter is initialized with a proper config and that GetText() and GetValue() are functioning as expected.
Illegitimacy Test: In essence, this test tries to validate that the filter is well behaved by exercising inappropriate configs during initialization, and also by calling GetText() on value-type chunks and vice versa.

Details of using IFiltTst can be found here: http://msdn2.microsoft.com/en-us/library/ms692580.aspx

I. Security tests with Fuzzing :


1. Fuzz a minimum of 0.5 million documents of each format handled by the filter and feed them to FilterTest.
2. Have PageHeap enabled throughout the Fuzz test run.
3. Analyze any heap corruption, stack overflow, buffer overrun, crashes etc and resolve/fix the bugs.

Pageheap can be enabled with Appverifier. Download here:
http://www.microsoft.com/downloads/details.aspx?familyid=bd02c19c-1250-433c-8c1b-2619bd93b3a2&displaylang=en

NOTE: The fuzzer is an internal tool. A list of external fuzzers is provided here: http://www.infosecinstitute.com/blog/2005/12/fuzzers-ultimate-list.html

Again, use these at your own risk:)
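
For readers who have never seen a fuzzer, the idea behind step 1 can be illustrated with a very small mutation routine. This Python sketch is not the internal tool mentioned above and is far simpler than the external fuzzers in the list; it just flips a handful of bits in a valid document so that the filter is fed slightly corrupted input. The function name and parameters are made up for this example.

```python
import random


def flip_bits(data: bytes, flips: int, seed: int) -> bytes:
    """Return a copy of `data` with one bit flipped at `flips` distinct byte offsets.

    A fixed seed makes each mutated file reproducible, which helps when a
    crash has to be investigated later.
    """
    if flips > len(data):
        raise ValueError("cannot flip more bytes than the document contains")
    rng = random.Random(seed)
    mutated = bytearray(data)
    for offset in rng.sample(range(len(data)), flips):
        mutated[offset] ^= 1 << rng.randrange(8)  # XOR with one set bit always changes the byte
    return bytes(mutated)


if __name__ == "__main__":
    original = bytes(range(64))  # stand-in for the bytes of a real document
    for seed in range(3):
        variant = flip_bits(original, flips=5, seed=seed)
        changed = sum(a != b for a, b in zip(original, variant))
        print(f"seed {seed}: {changed} bytes changed")
```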

J. Performance Scaling:

 


Optimum usage of processors in a server environment is crucial for performance. The goal is to achieve 80% performance scaling with the addition of each new processor. Here's the test outline.
1. On a quad-proc machine, use ifilttst.exe with one thread to filter a large corpus of documents and note down the time taken (T1).
2. Now use ifilttst.exe with two threads to filter the same corpus. The time taken should be about 0.556 * T1 (that is, T1 / 1.8).
3. With the addition of each subsequent thread, the target time T(N) can be found with the formula:
T(N) = T1 * 1/[(1.8)^(log2 N)], where T1 is the single-thread time and N is the number of threads.
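
The target times are easy to tabulate before a test run. The following Python sketch simply evaluates the formula above; the function name and the 100-minute baseline are made up for illustration.

```python
import math


def target_time(single_thread_time: float, threads: int, scaling: float = 1.8) -> float:
    """Target filtering time for `threads` threads: T1 / scaling ** log2(threads)."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return single_thread_time / scaling ** math.log2(threads)


if __name__ == "__main__":
    baseline = 100.0  # hypothetical single-thread time, in minutes
    for n in (1, 2, 3, 4):
        print(f"{n} thread(s): {target_time(baseline, n):.1f} minutes")
```

With two threads the target is 1/1.8 of the baseline, which matches the 0.556 factor in the outline; with four threads it is 1/3.24 of the baseline.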

K. AppVerifier Tests :


The AppVerifier tests seek to weed out critical security and performance defects. The tests should be conducted in 3 layers, with each layer executed in a separate test run. The layers are described below.
1. BASIC:
--> Exceptions - Ensures that the application does not hide AVs using structured exception handling.
--> Handles - Ensures that the application does not attempt to use invalid handles.
--> Heaps - Checks for memory corruption issues in the heap.
    SETTINGS: Full Page Heap
              Dll : <IFilter Dll>
--> Locks - Verifies correct usage of critical sections and identifies potential deadlocks (timeout 7 minutes).
--> Memory - Ensures that APIs for virtual space manipulation are used correctly.
--> Threadpool - Checks for dirty threadpool threads and other threadpool-related issues.
--> TLS - Ensures that Thread Local Storage APIs are used correctly.

The expectation for this scenario is that the application does not break into the debugger. This means that you have no errors that need to be addressed.

2. LOW RESOURCE SIMULATION: Accept the default settings. Filter a corpus (a large collection of documents) containing 10000+ files. Use IFiltTst to loop through the corpus filtering the files. As long as we can get through the corpus without breaking into the debugger, it should be fine.

3. MISCELLANEOUS: Here, check the following:
--> Dangerous APIs - Checks for proper usage of API calls such as "TerminateThread".
--> Dirty Stack - Detects uninitialized variables in future function calls in that thread's context.
Accept the DEFAULT settings here as well.


HOW TO RUN THE TESTS:
1. Start Appverifier.
2. Add your application (IFiltTst) to Appverifier.
3. Check off the tests mentioned above. You need to run the tests three times,
   once for each layer.
4. Save your application.
5. Set the PROPAGATE property to true -> this ensures appverifier settings are
   applied to any threads spawned by IFiltTst.
6. Run IFiltTst from the command line on a corpus containing 10000+ files.
7. Save the Logs from the three runs.

Detailed information about using Appverifier can be found here:
http://msdn2.microsoft.com/en-us/library/aa480483.aspx

L. Globalization:

 


If the document format allows the language / locale of the contents to be marked (e.g. MS Word), filtering of documents marked with such language tags must be verified. This is important because the filter emits locale information based on the language of the document, which MSSearch uses to invoke the correct WordBreaker and Stemmer for the document.

M. Registry and File I/O:

 


1. Use Filemon.exe with the Filemon filter set to the name of your dll and verify that the IFilter initiates no file system I/O other than on the documents it is indexing. Take special note if the filter is creating temp files.
2. Use Regmon.exe to verify that no registry read/write operations are performed.

www.sysinternals.com has both 32 and 64 bit versions of Filemon and Regmon.

N. Prefix/Prefast for Vista :


In the Office team, OACR checks for this if we build with the Windows PREfast requirements. In other environments, however, we need to use the Visual Studio build configuration manager to enable PREfast error checking.

More info( MS Employees):
PREFIX internal website
PREFAST: wrapped in OACR

WWW Resources:

http://msdn2.microsoft.com/en-us/library/ms933794.aspx 

O. Calls to undocumented windows API :


Run APIScan to ensure we do not make any calls to undocumented Windows APIs.

Note: This requirement is solely for MS and MS partners to avoid situations like Secret API fiasco.

P. SAL annotation :


SAL annotation is an excellent way to weed out potential security flaws in the code. More info at:
http://msdn2.microsoft.com/en-us/library/ms235402(VS.80).aspx

Q. UI Popups :


Use Filtdump to filter the document and ensure there are No UI Popups.

R. International Sufficiency:

We've seen a lot of issues in the past where Unicode / DBCS characters were not handled correctly by IFilters and Protocol Handlers. The problem is a bit more serious in Protocol Handlers, as the address of the content source might be encoded in a DBCS charset, in which case the data retrieval fails.

  • Use multiple special Unicode characters in the file contents and test for their output. The following figure provides a sample of Unicode characters to test:

    [Figure: sample Unicode characters to test]

S. Security Code Review:

This is the final line of defense against introducing security bugs in your code. DO NOT be skimpy on this!!! :)

    Debugging IFilters with WDS 3.0 and Windows Vista.

    With the public release of Vista a week back, soon developers will be wondering how to write and debug IFilters in the vista environment. Here's a quick and easy way, in addition to the one stated by Marco under Chronicles of an IFilter development - inception to deployment.:

     1.     Disable back-off (this prevents the indexer from stopping when Windows events are generated):

    HKLM\Software\Microsoft\Windows Search\Gathering Manager:DisableBackOff = 1

    2.     Disable filter host termination:

    HKLM\Software\Microsoft\Windows Search\Gathering Manager:DebugFilters = 1

    3.     Restart the WSearch service.

    4.     Attach a debugger to SearchFilterHost.exe.

    5.     Set break points in your IFilter dll.

    6.     Set symbols for your IFilter.

    7.     Touch the file of interest by opening it and modifying it slightly.

    8.     Wait for Break Point to hit.

    9.     Start debugging.

    Note that the FilterHost process is the same in vista and WDS.

    Debugging IFilters in MOSS/WSS.

    The steps we use to debug searchfilters with MOSS/ WSS are listed below:

    1.     Disable filter host termination, add Assert to suspend filter host when it starts:

    MOSS  [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager]

    WSS  [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Global\Gathering Manager]

    "DebugFilters"=dword:00000001

    2.     Restart the OSearch service.

    3.     MOSS:  Start a full/incremental crawl

    WSS:  Touch the file of interest by opening it and modifying it slightly, and wait a few minutes.

    4.     When you see the filter host assert, attach a debugger to mssdmn.exe.

    5.     In the debugger, sxe ld <filter>.dll  (or you can do this via Image File Execution Options)

    6.     Set break points in your IFilter dll.

    7.     Set symbols for your IFilter.

    8.     Wait for Break Point to hit.

    9.     Start debugging.

    Co-existence of 32 bit and 64 bit IFilter binaries in MS Search Products.

    Over the last few months we've seen numerous questions from consultants and customers alike on:

       1. Why are certain IFilter binaries not available in BOTH 32 bit and 64 bit incarnations? 95% of the questions in this category pointed to the unavailability of 64 bit Adobe IFilter.

       2. What is Microsoft's strategy for dealing with 32 bit binaries from within 64 bit search processes?

       3. Is there a technical workaround I can apply to use 32 bit IFilter from 64 bit search process?

    ================================================================================ 

    Let's handle the issues in reverse order :)

    Issue# 3

    From a strictly technical perspective, one can use the 32 bit PDF filter from the 64 bit MOSS search service by creating a utility to drop the 32 bit filter as a COM+ service component. The other option is to use dllhost.exe as a surrogate host. However, this will NOT be officially supported by Microsoft, and when the 64 bit PDF filter dll does become available, you'd need to unregister the COM+ service and re-index the PDF documents.

    Issue # 1 & 2

    Our Test Manager, Dwight summed up the answers for issue#1 and 2 in his reply to one of our consultants:

    " As suggested by Deb below, our search server products (Microsoft Office Sharepoint Server, Windows Sharepoint Services, SQL, Exchange, etc) do not support loading 32-bit binaries, and we have been referring customers who are using 3rd party IFilter products to original IFilter developers. As a result, we have not deployed 32-bit versions of these filters on our 64-bit installations, and we therefore do not index file formats which do not support 64-bit filters on our server products deployed internally.

    We have kept our internal deployments 'pure' so that our dog-food experience exactly matches that of our customers.  Overall, this has been a question we have struggled with for a while now. How do we support our server customers who may be running 3rd party code in our process, insulate ourselves from aberrant 3rd party code, and allow well-behaved 3rd party code to run without restriction? We have designed the system to recover from many classes of software malfunctions and virus attacks, but it is an arms race in the end.

    I appreciate that IT organizations have a conflicting set of requirements: 99.99%+ uptime, and support 3rd party tools; some of which never were designed to be deployed in a server environment.

    Adobe should be proud of their work over the last year to make their 32-bit IFilter multi-thread safe, and I would encourage them to continue on the server bandwagon by generating a 64-bit version. "

    Dwight

    Chronicles of an IFilter development - inception to deployment.

    I often asked myself the question: how do independent vendors develop IFilters for MS Search products, and what are the challenges they face? It occurred to me that if I could somehow document the development lifecycle of an IFilter developed by someone other than Microsoft, it would probably provide answers to a lot of baffling questions facing independent vendors developing these components.

    I recently had the opportunity to have a detailed discussion with Marco van Schagen from CAD & Company, who has just embarked on a fascinating voyage of refactoring the CAD (DWG) IFilter.

    This thread is meant to address the issues faced by Marco which may be of broader interest to several of us implementing our own filters.

     In Marco's own words:

    Currently I am planning a new version of our 2005 version DWG iFilter. This is to support the newer 2007 DWG file format, and to address questions on its operation with SQL 2005 and MOSS (SharePoint) 2007.

    If you are interested in information on this product, please visit http://www.cadcompany.nl/ifilter

    As I am inexperienced with iFilter development, I have many questions to find answers for.

    In my preparations, Deb Haldar has provided me with crucial information to help me get on the right track. I would like to share this information in his blog to help make this the "One stop shop" for IFilter related issues. A separate thread will probably be created to track the development cycle of an iFilter from scratch.

    The existing 2005 version iFilter project is coded mostly in C++, in VC 6.0.

    Some questions I'd like to discuss:

    - Should we use C++ or transfer to dot Net and why?

    - What is required for coding a proper iFilter

    - Testing for multithreading compatibility

    - Registration with Sharepoint and SQL 2005

    I would like to start sharing my information soon, and I am interested in your comments.

    Marco van Schagen

     

    Posted by Deb Haldar | 55 Comments