
Errata for the Filter Pack KB

This correction applies to both Office SharePoint Server 2007 and Search Server 2008.

1. In step 2, the correct registry key under which to set the CLSID of the filter is:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\

2. In step 7, the CLSID for the OneNote filter is incorrect. The correct value is:

.One = {B8D12492-CE0F-40AD-83EA-099A03D493F1}

The KB will be revised shortly to reflect these changes. 

 

Posted by Deb Haldar | 7 Comments

MS Filter Pack released!

I'm pleased to announce that after months of blood, sweat and toil, the MS Filter Pack is finally available! The package can be downloaded from:

 http://www.microsoft.com/downloads/details.aspx?FamilyId=60C92A37-719C-4077-B5C6-CAC34F4227CC&displaylang=en

Contents:

The filter pack includes the following IFilters:

·         Metro (.docx, .docm, .pptx, .pptm, .xlsx, .xlsm, .xlsb)

·         Zip (.zip)

·         OneNote (.one)

·         Visio (.vsd, .vss, .vst, .vdx, .vsx, .vtx)

Supported Products:

·         SPS2003, MOSS2007, Search Server 2008, Search Server 2008 Express

·         WSSv3

·         Exchange 2007

·         SQL 2005, SQL 2008

·         Windows Desktop Search 3.01, WDS 4

Overview:

·         The Filter Pack installs the above IFilters on the machine

·         Each IFilter is registered with Windows Indexing Service

·         Each product above has a corresponding KB to describe how to register the filters

Q&A:

“I noticed <product X> is not listed as a supported product, why is it not included?”

-          When we created the project plan we came up with the list of Microsoft Search products that we would be supporting.  During the project lifecycle we’ve tested to ensure that the Filter Pack works properly with each of these products.  We will work to determine if any new Search products can be supported in the future.

 

“Is the Filter Pack localized for <language y>?”

-          The Filter Pack will be localized into 36 languages (see below). It has been handed off for localization; details will be posted as they become available. At the time of release (12/18), the Filter Pack will be available in en-us only.

 

Fully Localized SKU Languages:

Arabic, Brazilian, Chinese (SC), Chinese (TC), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian (Bokmal), Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish

Language Pack Languages:

Bulgarian, Croatian, Estonian, Hindi, Latvian, Lithuanian, Romanian, Serbian (Latin), Slovak, Slovenian, Ukrainian

 

 

“Is the Filter Pack available for x64/x86?”

-          The Filter Pack will be available in both x64 and x86 – there are two separate downloads (same location).

 

“What about the TIFF/MODI IFilter?”

-          Unfortunately, at the time of release, the TIFF filter is not shipped with the Filter Pack. We understand how important this issue is for our customers and will be working on providing an alternative solution.

Posted by Deb Haldar | 32 Comments

FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Some time back I had the chance to run a performance and international sufficiency analysis of the Adobe and FOXIT IFilters for some of our customers. The report is now being made available to a broader audience.

 

PERFORMANCE ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER

Machine :   Intel Xeon CPU @ 1.4 GHz (4 hyperthreaded processors)

                    4.00 GB of RAM

                    32-bit Win2K3 SP1

                    Indexer performance set to partly reduced.

 

 

                          FOXIT v1.0                     ADOBE v.8
Total # of pdf documents  10917                          10917
# successful crawls       10871                          10909
# errors                  44 (expired ebooks etc)        0
# warnings                2 (corrupted doc)              2 (corrupted doc)

CRAWL TIME:
  Portal Content          00:49:21.163                   03:34:39.237
  Anchor Crawl 1          00:02:03.527                   00:02:39.073
  Anchor Crawl 2          00:00:02.173                   00:00:02.437
  TOTAL Crawl Time        00:51:26.863 (~ 51 minutes)    03:38:00.747 (~ 218 minutes)

                       

 

Analysis:

 

1.      The FOXIT filter is 4.27 times faster than the Adobe filter on a quad-processor machine. This is expected, since the Adobe filter is not truly multithreaded and serializes its threads.

2.      The Adobe filter crawls some documents which ideally should not be crawled (expired ebooks etc).
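
As a rough check of the speedup figure, the reported total crawl times can be converted to seconds and divided. This is only a sketch using the two totals quoted above; the helper name `to_seconds` is made up for illustration. The exact timestamps give a ratio of about 4.24, while the quoted 4.27 corresponds to the rounded minute figures (218 / 51).

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS.mmm' crawl time into seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Total crawl times as reported in the table above.
FOXIT_TOTAL = "00:51:26.863"
ADOBE_TOTAL = "03:38:00.747"

# Ratio from the exact timestamps.
exact_ratio = to_seconds(ADOBE_TOTAL) / to_seconds(FOXIT_TOTAL)

# Ratio from the rounded minute figures quoted next to the totals.
rounded_ratio = 218 / 51

print(f"exact timestamps: {exact_ratio:.2f}x")
print(f"rounded minutes:  {rounded_ratio:.2f}x")
```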

 

 

INTL SUFFICIENCY ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER

Neither the Adobe nor the FOXIT filter returns the correct locale for non-English documents; both always emit LOCALE = 1033 (en-us). Hence the text is passed to the neutral wordbreaker, which compromises search relevance.

Tests were performed on JPN, CHS, FRE and HEB pdf documents using both the indexer and standalone test tools.

 

Language  # Tokens  MOSS returns result  MOSS returns result  Correct locale     Correct locale
                    with FOXIT?          with Adobe?          emitted by FOXIT?  emitted by Adobe?
JPN       2         No                   No                   No                 No
CHS       2         No                   No                   No                 No
FRE       2         Yes                  Yes                  No                 No
HEB       2         Yes                  Yes                  No                 No

 

 Note that since French is syntactically very close to English, we still get back valid results. In case of the Hebrew documents, I’d say it’s a matter of coincidence that the token the language expert gave me was correctly wordbroken.
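
To make the "correct locale" columns concrete, here is a small sketch that compares the locale reported above (1033 for every document) against the Windows locale identifier one would expect for each test language. The mapping of the abbreviations to ja-JP, zh-CN, fr-FR and he-IL is an assumption on my part, and `locale_is_correct` is a made-up helper, not part of any test tool mentioned here.

```python
# Windows LCIDs assumed for the four test languages:
# ja-JP = 1041, zh-CN = 2052, fr-FR = 1036, he-IL = 1037.
EXPECTED_LCID = {"JPN": 1041, "CHS": 2052, "FRE": 1036, "HEB": 1037}

# Both filters were observed to emit en-us for every document.
EMITTED_LCID = 1033


def locale_is_correct(language: str, emitted: int = EMITTED_LCID) -> bool:
    """Return True when the emitted LCID matches the document language."""
    return EXPECTED_LCID[language] == emitted


for language in EXPECTED_LCID:
    verdict = "Yes" if locale_is_correct(language) else "No"
    print(f"{language}: correct locale emitted? {verdict}")
```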

 

Posted by Deb Haldar | 8 Comments

64-bit support for Adobe PDF IFilter finally available.

Adobe has finally come up with an interim solution to address the lack of a 64-bit IFilter. One can now use the 32-bit Adobe IFilter on 64-bit platforms after installing a DCOM add-in from Adobe. The installation instructions can be found on the Adobe Labs Wiki.

This is a great opportunity for folks who have a 64-bit installation of Microsoft Office SharePoint Server 2007 and want to index PDF documents, but do not want to spend money on a 64-bit FOXIT IFilter.

DWG (Autocad) Filter is now available.

I'm pleased to announce that after a lot of trials and tribulations, the DWG IFilter is finally ready. Those who have been following the posts by Marco and myself under Chronicles of an IFilter development must be curious to know what the end result was. Well, here's the latest on the topic in Marco's own words:

CAD & Company would like to introduce you to the DWG IFilter 2007, our newest version of DWG IFilter. This release is our answer to the changing search and indexing needs with DWG and DXF files.

 

Why this release? At CAD & Company, the DWG IFilter has been available for some time now. Due to evolution in Microsoft Search technology and the continuing progress in the DWG file format, we needed to add some new features to the IFilter to support these changes. In addition to these technical improvements to the product, DWG IFilter 2007 makes the installation easier than ever before by including a one-click support to register with SharePoint 2007 (MOSS & WSS) and SQL Server 2005 FTS. DWG IFilter 2007 will also support the 2007 and 2008 AutoCAD DWG file types in addition to previous versions of DWG file types.

 

What is an IFilter? An IFilter generally allows your search and search indexing product to access a certain file type; and for non-Microsoft file types you will need a non-Microsoft IFilter.

 

Will DWG IFilter 2007 improve desktop searching? With DWG IFilter 2007, you will easily be able to add AutoCAD DWG files to your desktop search results. DWG IFilter 2007 can be used with any popular desktop tool, including Microsoft Windows Desktop Search.

 

Will DWG IFilter 2007 work with Microsoft SharePoint? DWG IFilter 2007 adds DWG and DXF content search to your SharePoint sites in just one click! Depending on your version, you can extend the scope of the SharePoint search to include network shares, intranet sites and Microsoft Exchange Public folders, making search an incredibly powerful tool to locate all your files.

 

The DWG IFilter 2007 will enable your search tool to serve as a quick index to all your DWG files. Depending on your search tool, your results list will include a “teaser” which displays the extracted text surrounding your search terms. This makes locating the right DWG file faster and easier than ever before.

 

Please visit www.dwgifilter.com to download your free trial and find out more about DWG IFilter 2007!

 

For any questions or comments, feel free to contact us at ifilter@cadcompany.nl.

 

 

Kind Regards,

 

Marco van Schagen

DWG IFilter Team

www.dwgifilter.com

 

CAD & Company

Postbus 37218

1030 AE Amsterdam

the Netherlands

T +31 20 494 66 66, F +31 20 494 66 67

www.cadcompany.nl

 

Posted by Deb Haldar | 19 Comments

Long awaited 64-bit PDF IFilter finally available.

Finally we have a 64-bit PDF IFilter. Surprisingly, the solution is not from Adobe or Microsoft, but from a company called Foxit Software. The IFilter is compatible with the following Microsoft products: Windows Indexing Service, MSN Desktop Search, Internet Information Server, SharePoint Portal Server, Windows SharePoint Services (WSS), Site Server, Exchange Server, SQL Server and all other products based on Microsoft Search technology.

One simple workaround gets the filter running on 64-bit MOSS 2007. The steps are given below.

1.       Install Foxit 64bit PDF Ifilter. http://www.foxitsoftware.com/pdf/ifilter/

2.       Add a pdf extension in MOSS search settings

3.       Open regedit, locate [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]

4.       Change the default value to {987f8d1a-26e6-4554-b007-6b20e2680632} .

5.       Recycle the search service:

net stop osearch
net start osearch

6.       Start a full crawl to index your pdf documents :)
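
Before recycling the service, it can help to confirm that the registry change from steps 3 and 4 took effect. The following is a read-only sketch using Python's standard `winreg` module; the helper names `same_clsid` and `registered_clsid` are made up for illustration. It assumes a 64-bit Python on the index server, so that the 64-bit registry view is read, and it does not modify anything.

```python
import sys

# Key and CLSID taken from steps 3 and 4 above.
FILTER_KEY = (
    r"SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"
    r"\ContentIndexCommon\Filters\Extension\.pdf"
)
FOXIT_CLSID = "{987f8d1a-26e6-4554-b007-6b20e2680632}"


def same_clsid(a: str, b: str) -> bool:
    """Compare two CLSID strings, ignoring case, braces and outer whitespace."""
    def normalize(value: str) -> str:
        return value.strip().strip("{}").lower()
    return normalize(a) == normalize(b)


def registered_clsid(subkey: str = FILTER_KEY):
    """Return the default value of the HKLM subkey, or None if it is missing.

    Windows only: winreg is imported here so the module still loads elsewhere.
    """
    import winreg
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, "")
            return value
    except FileNotFoundError:
        return None


if __name__ == "__main__" and sys.platform == "win32":
    current = registered_clsid()
    if current is not None and same_clsid(current, FOXIT_CLSID):
        print("The .pdf extension points at the Foxit IFilter.")
    else:
        print(f"Unexpected value for the .pdf extension: {current!r}")
```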

 

Posted by Deb Haldar | 27 Comments

Indexing pdf documents with Adobe Reader v.8 and MOSS 2007

Version 8 of Adobe Reader has some significant architectural changes (for the better, of course), including a built-in IFilter to index PDF documents. Previously the Adobe IFilter was available as a separate download. This change in architecture broke the ability to search PDF documents from within MOSS 2007, although the PDF filter works fine with WDS 3.0. Many consultants recommend using v.6 of the Adobe IFilter to index PDF documents through MOSS 2007, and v.8 of Adobe Reader to index PDF documents through WDS 3.0 or higher. But what if we want to index PDF documents using both WDS and MOSS 2007? Here's how you can use MOSS 2007 with Adobe Reader v.8, the version that WDS works with :)

1. Download Adobe Reader v.8 .

2. Add the filter-extension to the File types crawled:

Start -> Program -> Microsoft Office Server -> SharePoint 3.0 Central Administration  -> <Name of SharedService Provider> -> Search Settings -> File Types -> New File Type (Add extension  pdf here)

3. Modify the following registry keys by changing their "Default" value to the new CLSID of the Adobe IFilter: {E8978DA6-047F-4E3D-9C78-CDBE46041603}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Default --> {E8978DA6-047F-4E3D-9C78-CDBE46041603}

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf

Default --> {E8978DA6-047F-4E3D-9C78-CDBE46041603}

4. Add the installation directory of Adobe Reader v.8 to the system path. For example, if the Reader is installed under "D:\Program Files\Adobe", then add "D:\Program Files\Adobe\Reader 8.0\Reader" to the system path as follows:

           --> Right Click on My Computer -> Properties -> Advanced -> Environment Variables -> Path (Under System Variables) -> Edit -> (Add "D:\Program Files\Adobe\Reader 8.0\Reader").

 

This effectively tells the adobe IFilter where to pick up the dependent DLLs.

 

5. Recycle the search service:

> net stop osearch
> net start osearch

 

6. Voilà! Now we can crawl and search PDF documents with the v.8 Reader.
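
A common slip in step 4 is a path entry that does not quite match the Reader directory. Here is a minimal sketch for checking a PATH value; `dir_on_path` is a hypothetical helper, and the directory is the example location from step 4, not necessarily where the Reader is installed on your server. It uses `ntpath` so that Windows path rules apply regardless of where the script runs.

```python
import ntpath
import os

# Example install location from step 4; adjust to the actual server.
READER_DIR = r"D:\Program Files\Adobe\Reader 8.0\Reader"


def dir_on_path(path_value: str, directory: str) -> bool:
    """Return True if `directory` is one of the ';'-separated PATH entries.

    Comparison ignores case, surrounding quotes and trailing backslashes.
    """
    wanted = ntpath.normcase(ntpath.normpath(directory))
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(ntpath.normpath(entry)) == wanted:
            return True
    return False


if __name__ == "__main__":
    found = dir_on_path(os.environ.get("PATH", ""), READER_DIR)
    print("Reader directory on PATH:", found)
```

Note that the search service only sees the new system path after it has been restarted, which is why step 5 follows the path change.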

Posted by Deb Haldar | 57 Comments

Indexing XPS documents with MOSS 2007

With the XML Paper Specification (XPS) document format gaining popularity, many are wondering how to integrate MOSS 2007 to index and search XPS documents. Here's a quick recipe to configure your index server to crawl and catalog tokens in XPS documents:

     1.       Install the XPS Essentials pack from :

http://www.microsoft.com/whdc/xps/viewxps.mspx

 

2.       Add the filter-extension to the File types crawled:

Start -> Program -> Microsoft Office Server -> SharePoint 3.0 Central Administration  -> <Name of SharedService Provider> -> Search Settings -> File Types -> New File Type (Add extension  xps here)

 

3.       Verify that the xps entry  is added to the  extensions list under:

 

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Applications\<Site>\Gather\Portal_Content\Extensions\ExtensionList]

 

4.       Add the following registry key:

 [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\Filters\.xps]
        Default = (value not set)
        Extension = xps
        FileTypeBucket REG_DWORD = 0x00000001 (1)
        MimeTypes = application/xps     

5.       Identify the xps filter to MOSS by adding the following registry key:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.xps]

             Set the "Default" value to the CLSID of the XPS IFilter.

             Default REG_SZ = {1E4CEC13-76BD-4ce2-8372-711CB6F10FD1}

6.      Finally, recycle the Search Service by executing the following command from the command window:

   D:\> net stop osearch

   D:\> net start osearch

 

7.   Add the xps documents in the content source and initiate the crawl.

                

Notes: 1. Tested on Win2K3 and MOSS 2007

           2. More Info on XPS: http://www.microsoft.com/whdc/xps/default.mspx

           3. Document with screenshot of configuration procedure attached. 
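
The registry entries from steps 4 and 5 can also be captured as a .reg file and imported, rather than typed into regedit by hand. The sketch below only builds the file text; the helper names `reg_section` and `build_reg_file` are made up for illustration, and the keys and values are the ones listed in the steps above. Review the output before importing it on a real index server.

```python
BASE = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup"

# Keys and values from steps 4 and 5; an empty name means the default value.
XPS_ENTRIES = {
    BASE + r"\Filters\.xps": {
        "Extension": "xps",
        "FileTypeBucket": 1,  # REG_DWORD
        "MimeTypes": "application/xps",
    },
    BASE + r"\ContentIndexCommon\Filters\Extension\.xps": {
        "": "{1E4CEC13-76BD-4ce2-8372-711CB6F10FD1}",
    },
}


def reg_section(key: str, values: dict) -> str:
    """Render one registry key and its values in .reg file syntax."""
    lines = [f"[{key}]"]
    for name, value in values.items():
        label = "@" if name == "" else f'"{name}"'
        if isinstance(value, int):
            lines.append(f"{label}=dword:{value:08x}")
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{label}="{escaped}"')
    return "\n".join(lines)


def build_reg_file(entries: dict) -> str:
    """Return the full text of a .reg file for the given keys."""
    header = "Windows Registry Editor Version 5.00"
    sections = [reg_section(key, values) for key, values in entries.items()]
    return header + "\n\n" + "\n\n".join(sections) + "\n"


if __name__ == "__main__":
    print(build_reg_file(XPS_ENTRIES))
```

The search service still has to be recycled afterwards, as in step 6.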

Insights into MS IFilter Testing Strategy.

Ever since I started dealing with filters, I've seen numerous questions along the lines of "What does proper validation of an IFilter mean? What tests should we execute, and how do we execute them?" Hence, it's only appropriate that we publish a document detailing our rigorous test procedure so that everyone targeting components at MS Search products can benefit from it.

Disclaimer: The following list presents only a subset of the testing methodologies we apply at MS Search and is by no means meant to be a quick recipe for weeding out ALL security vulnerabilities in your filter. The list is meant to provide an overview of the issues one should think about while testing and implementing filters.

----------------------------------------------------------------------------------------------------------------

A. Architectural Considerations : - COMPLIANCE REQUIRED

 


1. The filter DLL does not require the client to be installed on the indexing machine.
2. The filter DLL does not make references to other binaries during compile time.
3. The filter DLL is monolithic and self-contained, without any external dependencies.
For an overview of the problems caused by non-monolithic DLLs, please see:
http://blogs.msdn.com/ifilter/archive/2006/11/20/breaking-the-monolithic-filter-dll.aspx

B. Threading Model : - COMPLIANCE REQUIRED

 


Filter threading model must be marked as either "BOTH" or "Free" under:
HKEY_CLASSES_ROOT\CLSID\{GUID}\InprocServer32

HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{GUID}\InprocServer3