
FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Some time back I had the chance to run a performance and international sufficiency analysis on the Adobe and FOXIT IFilters for some of our customers. The following report is now being made available to a broader audience.

 

PERFORMANCE ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER

Machine :   Intel Xeon CPU @ 1.4 GHz (4 hyperthreaded processors)

                    4.00 GB of RAM

                    32-bit Win2K3 SP1

                    Indexer performance set to partly reduced.

 

 

                            FOXIT v1.0                      ADOBE v.8
Total # of pdf documents    10917                           10917
# successful crawls         10871                           10909
# errors                    44 (expired ebooks etc)         0
# warnings                  2 (corrupted doc)               2 (corrupted doc)

 

 

 

CRAWL TIME:                 FOXIT v1.0                      ADOBE v.8
    Portal Content          00:49:21.163                    03:34:39.237
    Anchor Crawl 1          00:02:03.527                    00:02:39.073
    Anchor Crawl 2          00:00:02.173                    00:00:02.437
    TOTAL Crawl Time        00:51:26.863 (~ 51 minutes)     03:38:00.747 (~ 218 minutes)

                       

 

Analysis:

 

1.      The FOXIT filter is approximately 4.27 times faster than the Adobe filter on a quad-processor machine (218 minutes versus 51 minutes of total crawl time, using the rounded totals). This is expected, since the Adobe filter is not truly multithreaded and serializes the threads.

2.      The Adobe filter crawls some documents which ideally should not be crawled (expired ebooks etc).
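
The speedup figure can be re-derived from the crawl times in the table. The following is a minimal sketch, not part of the original test harness; the helper name `to_ms` and the dictionary layout are assumptions made for illustration. It works in integer milliseconds and computes the ratio twice: once from the reported totals and once from the sum of the three listed phases. As listed, the three Adobe phases add up to 03:37:20.747, which is 40 seconds less than the reported total, so the two ratios differ slightly, and both come out a little below the 4.27 obtained from the rounded minute counts.

```python
def to_ms(stamp: str) -> int:
    """Convert an 'HH:MM:SS.mmm' string to integer milliseconds."""
    hms, millis = stamp.split(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


# Crawl phases copied from the table above: (FOXIT, Adobe).
PHASES = {
    "Portal Content": ("00:49:21.163", "03:34:39.237"),
    "Anchor Crawl 1": ("00:02:03.527", "00:02:39.073"),
    "Anchor Crawl 2": ("00:00:02.173", "00:00:02.437"),
}
REPORTED_TOTAL = ("00:51:26.863", "03:38:00.747")

# Totals recomputed from the individual phases.
foxit_sum = sum(to_ms(foxit) for foxit, _ in PHASES.values())
adobe_sum = sum(to_ms(adobe) for _, adobe in PHASES.values())

# Totals as reported in the table.
foxit_total, adobe_total = (to_ms(t) for t in REPORTED_TOTAL)

speedup_reported = adobe_total / foxit_total
speedup_summed = adobe_sum / foxit_sum

print(f"speedup from reported totals: {speedup_reported:.2f}x")
print(f"speedup from summed phases:   {speedup_summed:.2f}x")
print(f"Adobe reported total minus summed phases: {adobe_total - adobe_sum} ms")
```

Whichever total is used, the conclusion is unchanged: the FOXIT crawl finishes in a little under a quarter of the time the Adobe crawl takes on this machine.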

 

 

INTL SUFFICIENCY ANALYSIS OF 32-BIT FOXIT PDF IFILTER vs. ADOBE PDF IFILTER

Neither the Adobe nor the FOXIT filter returns the correct locale for non-English documents. Both always emit LOCALE = 1033 (en-US). Hence we pass the text to the neutral wordbreaker, and this compromises search relevance.

Tests were performed on Japanese (JPN), Simplified Chinese (CHS), French (FRE) and Hebrew (HEB) PDF documents using both the indexer and standalone test tools.

 

Language    # Tokens    MOSS returns result with FOXIT?    MOSS returns result with Adobe?    Correct locale emitted by FOXIT?    Correct locale emitted by Adobe?
JPN         2           No                                 No                                 No                                  No
CHS         2           No                                 No                                 No                                  No
FRE         2           Yes                                Yes                                No                                  No
HEB         2           Yes                                Yes                                No                                  No

 

 Note that since French is syntactically very close to English, we still get back valid results. In the case of the Hebrew documents, I’d say it’s a matter of coincidence that the token the language expert gave me was correctly wordbroken.
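
To illustrate why the wrong wordbreaker hurts some languages more than others, here is a toy sketch. It is not the MOSS neutral wordbreaker: `neutral_break` and `matches` are hypothetical helpers that simply split on whitespace and punctuation, and the two sample sentences are made up for the example. Under that assumption, a French query term matches because French words are space-delimited, while a Japanese query term does not, because the whole sentence is indexed as a single token.

```python
import re


def neutral_break(text: str) -> list[str]:
    """Toy stand-in for a language-neutral wordbreaker:
    split on whitespace and punctuation only."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def matches(query: str, document: str) -> bool:
    """Return True if every query token appears as an indexed document token."""
    index = set(neutral_break(document))
    return all(token in index for token in neutral_break(query))


# Made-up sample sentences, both meaning roughly
# "the performance report is available".
french_doc = "Le rapport de performance est disponible."
japanese_doc = "性能レポートが利用可能です。"

print(neutral_break(french_doc))           # six separate word tokens
print(neutral_break(japanese_doc))         # one token covering the whole sentence
print(matches("rapport", french_doc))      # True
print(matches("レポート", japanese_doc))    # False
```

A language-specific wordbreaker would segment the Japanese sentence into words, so a query for a single word could match. That is the step that never happens when the filter reports 1033 for every document.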

 

Published Wednesday, November 14, 2007 7:37 PM by Deb Haldar


Comments


# re: FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Thursday, November 15, 2007 10:45 AM by smuehlst

Deb,

regarding your comments about the returned locale information: Do the indexed PDF documents actually contain locale information? As far as I know only "Tagged PDF" documents can optionally contain locale information, and those are fairly rare.

Regards

Stephan

# re: FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Thursday, November 15, 2007 1:53 PM by Deb Haldar

Stephan, thanks for bringing up the point. I was not aware that only Tagged PDF docs contain locale info.

A lot of our customers (especially ones in East Asia) complained about poor relevance in search results on localized PDF docs. The reason is that, since we always get back an English locale, the proper wordbreaker and stemmer are never invoked. As Stephan mentioned above, tagging the PDF documents with the correct locale might give better relevance. Any thoughts? :)

regards,

Deb.

# Adobe PDF IFilter finally released in 64-bit

Tuesday, November 27, 2007 4:03 AM by Igor Macori

Adobe has finally released the 64-bit version of the IFilter for indexing PDF documents

# filter pack

Friday, November 30, 2007 9:31 AM by Cindy Foxenshu

Filter pack - when will it be available?  We are now in December and there was an announcement elsewhere that it would be available in July and an announcement here that it would be available in August.

If it was released can you tell us where to find it, and if it was not released can you tell us whether it ever will be?

# re: FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Tuesday, January 22, 2008 1:56 AM by Elisha Snow

it's weird. i can index and search non-english documents with Foxit ifilter. hmmmm.

# re: FOXIT vs. Adobe PDF IFilter [ 32-bit only ]

Tuesday, January 22, 2008 2:02 AM by Amy

Foxit PDF iFilter supports Chinese/Japanese/Korean PDF documents
