
PDF iFilter Battle, second round

If you remember the last round of our PDF iFilter battle, FoxIT won it. In this round we bring in another challenger: the TET PDF iFilter. It is also available for x86 and x64, and it is free for non-commercial desktop use; a server installation requires a license.

Here are the new results for file set II:

 

iFilter | Files | Total size (MB) | Avg size (MB) | Crawl time (m:s) | Crawl time (s) | Files/s | Success | Errors
FoxIT   | 2676  | 2406            | 0.90          | 7:46             | 466            | 5.74    | 2759    | 0
Adobe   | 2676  | 2406            | 0.90          | 40:58            | 2458           | 1.09    | 2757    | 2
TET     | 2676  | 2406            | 0.90          | 13:48            | 828            | 3.23    | 2752    | 0
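The derived columns of this table (seconds, files per second, average size) follow directly from the raw figures. Here is a minimal Python sketch that recomputes them as a sanity check. The helper names `to_seconds` and `files_per_second` are hypothetical, and the only inputs are the file count, total size, and crawl times reported above.

```python
def to_seconds(clock: str) -> int:
    """Convert an 'm:s' or 'h:m:s' crawl-time string into whole seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


def files_per_second(file_count: int, clock: str) -> float:
    """Crawl throughput, rounded to two decimals like the table."""
    return round(file_count / to_seconds(clock), 2)


# Raw figures for file set II, copied from the table above.
FILE_COUNT = 2676
TOTAL_MB = 2406
CRAWL_TIMES = {"FoxIT": "7:46", "Adobe": "40:58", "TET": "13:48"}

print(f"Avg file size: {TOTAL_MB / FILE_COUNT:.2f} MB")
for name, clock in CRAWL_TIMES.items():
    print(f"{name}: {to_seconds(clock)} s, {files_per_second(FILE_COUNT, clock)} files/s")
```

The same `to_seconds` helper also handles the h:m:s values used for the larger archive, e.g. `to_seconds("05:19:04")` gives 19144.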

 

I also obtained an archive of People's Daily covering 2001 to 2006: about 20,000 PDF files, 13.4 GB in total. This set was tested on an 8-core Xeon box.

 

 

iFilter | Files | Total size (MB) | Avg size (MB) | Crawl time (h:m:s) | Crawl time (s) | Files/s | Success | Errors
FoxIT   | 19890 | 13793           | 0.69          | 00:30:53           | 1853           | 10.73   | 19884   | 7
Adobe   | 19890 | 13793           | 0.69          | 05:19:04           | 19144          | 1.04    | 19887   | 4
TET     | 19890 | 13793           | 0.69          | 01:40:09           | 6009           | 3.31    | 19879   | 12
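The ranking across both corpora is easier to compare as a speedup over Adobe. A minimal Python sketch, assuming only the crawl times in seconds from the two tables; because every iFilter in a table crawled the same files, the ratio of crawl times equals the ratio of throughputs. `speedup_vs` is a hypothetical helper name.

```python
# Crawl times in seconds, copied from the two tables above.
CRAWL_SECONDS = {
    "File set II": {"FoxIT": 466, "Adobe": 2458, "TET": 828},
    "People's Daily": {"FoxIT": 1853, "Adobe": 19144, "TET": 6009},
}


def speedup_vs(baseline: str, times: dict) -> dict:
    """How many times faster each iFilter crawled the same corpus than the baseline."""
    base = times[baseline]
    return {name: round(base / secs, 1) for name, secs in times.items()}


for corpus, times in CRAWL_SECONDS.items():
    print(corpus, speedup_vs("Adobe", times))
```

On these numbers FoxIT is about 5x (file set II) to 10x (People's Daily) faster than Adobe, while TET stays around 3x on both.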

 

And here is the licensing comparison for production use (USD):

Vendor | Desktop                     | Server   | 1-2 cores per server | 4 cores per server | 8+ cores per server
Adobe  | Free                        | Free     | Free                 | Free               | Free
Foxit  | Free                        | Not free | 329.99               | 589.97             | 1109.93
TET    | $119 for commercial use     | Not free | 595                  | 595                | 595
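Whether TET or Foxit is cheaper depends on how each price scales with core count. A minimal Python sketch using the server prices from the table above; `server_license_cost` is a hypothetical name, and because the post does not say how core counts between the listed tiers (3, or 5 to 7 cores) are priced, the function assumes they are billed at the next Foxit tier up.

```python
def server_license_cost(vendor: str, cores: int) -> float:
    """Production server license price in USD for one server with `cores` cores.

    Prices come from the licensing table above. ASSUMPTION: core counts that fall
    between the listed Foxit tiers are billed at the next tier up.
    """
    if vendor == "Adobe":
        return 0.0          # free at every tier
    if vendor == "TET":
        return 595.0        # flat per-server price, independent of cores
    if vendor == "Foxit":
        if cores <= 2:
            return 329.99
        if cores <= 4:
            return 589.97
        return 1109.93      # 8+ cores tier (and, by assumption, 5-7 cores)
    raise ValueError(f"unknown vendor: {vendor}")


# A "2 way quad core" box: 2 sockets x 4 cores = 8 cores.
cores = 2 * 4
for vendor in ("Adobe", "Foxit", "TET"):
    print(f"{vendor}: ${server_license_cost(vendor, cores):.2f}")
```

On a 1-2 core server the order flips: Foxit's $329.99 is below TET's flat $595.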

 

Summary

It is good to see another vendor join this market. TET showed good performance, although it is still behind Foxit. However, TET is licensed per server rather than per core, so on a typical two-socket quad-core (8-core) box its cost would be lower than Foxit's ($595 vs. $1109.93).

Published Tuesday, March 10, 2009 4:30 AM by Jie Li

Comments


re: PDF iFilter Battle, second round

Friday, July 31, 2009 12:41 PM by cy21

Great post.

What are the errors that were encountered? FoxIT shows 7, Adobe shows 4, and TET shows 12. Are they true errors, or are they notices for items that are correctly not crawled, such as expired items, items marked as not to be crawled, password-protected files, etc.?

I think this would be a large factor when considering which iFilter to use.  One may consider a slower rate of indexing to be acceptable if a larger percentage of the corpus will be properly indexed.
