Index and Search PDF Files in SharePoint Server 2010


As with Office SharePoint Server 2007, SharePoint Server 2010 has no out-of-the-box (OOTB) PDF iFilter. If you add PDF as a file type for SharePoint Search without installing one, you will get the following result:

[Screenshot: snap0086]

You can see that only the file attributes are indexed.

To index the file content, you need to install an x64 PDF iFilter. There are three PDF iFilters on the market: Adobe, Foxit, and TET. You can refer to my earlier post for a comparison. Because the registry key name changed in 2010, you may need to modify the registry manually to get an iFilter registered. Foxit recently updated their installer to reflect this change.

http://www.foxitsoftware.com/pdf/ifilter/

Quote from Foxit PDF iFilter change log:

Version Number: 1.0.0.3213

* Fixes a crash issue that is caused by embedded fonts.

* Adds the following registry settings in the installation program: 

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf]

"Extension"=".pdf"

"FileTypeBucket"=dword:00000001

"MimeTypes"="application/pdf"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf]

@="{987f8d1a-26e6-4554-b007-6b20e2680632}"
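If your iFilter installer does not write these keys for you, one way to prepare them is to generate a .reg file and review it before importing. The sketch below is only an illustration: the key paths and values are copied from the change log quoted above, while the names `PDF_FILTER_SETTINGS`, `format_value`, and `build_reg_file` are hypothetical helpers, not part of any Foxit or SharePoint tooling.

```python
# Registry settings quoted from the Foxit change log above.
# The dictionary layout and helper names are illustrative assumptions.
PDF_FILTER_SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\Filters\.pdf": {
        "Extension": ".pdf",
        "FileTypeBucket": 1,
        "MimeTypes": "application/pdf",
    },
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\14.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf": {
        "@": "{987f8d1a-26e6-4554-b007-6b20e2680632}",  # "@" marks the default value
    },
}


def format_value(value):
    """Render a value in .reg syntax: integers as dword, strings quoted."""
    if isinstance(value, int):
        return "dword:{:08x}".format(value)
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"{}"'.format(escaped)


def build_reg_file(settings):
    """Build the text of a .reg file from a {key_path: {name: value}} mapping."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key_path, values in settings.items():
        lines.append("[{}]".format(key_path))
        for name, value in values.items():
            label = "@" if name == "@" else '"{}"'.format(name)
            lines.append("{}={}".format(label, format_value(value)))
        lines.append("")  # blank line between keys
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(PDF_FILTER_SETTINGS))
```

The CLSID shown is the Foxit one from the quote. A different iFilter would need its own CLSID in the second key.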

So run the installer, and then restart the SharePoint Server Search 14 service. The service name may change at RTM, but you get the idea.

[Screenshot: snap0088]

Recrawl the files.

[Screenshot: snap0089]

 

It worked. Please note that the installer does not give you a PDF icon file. You need to follow the steps at http://www.foxitsoftware.com/pdf/ifilter/installation.html to download the icon file and modify DOCICON.XML.
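The DOCICON.XML change amounts to adding a `Mapping` entry for the `pdf` extension under `ByExtension`. The following is a minimal sketch of that edit, not the vendor's procedure: `SAMPLE_DOCICON` is a simplified stand-in for the real file, the icon file name `pdficon.gif` is an assumption (use whatever name the downloaded icon actually has), and `add_icon_mapping` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for DOCICON.XML; the real file has many more entries
# and attributes. The values here are illustrative only.
SAMPLE_DOCICON = """<DocIcons>
  <ByExtension>
    <Mapping Key="doc" Value="icdoc.gif"/>
  </ByExtension>
</DocIcons>"""


def add_icon_mapping(xml_text, extension, icon_file):
    """Return DOCICON-style XML with a Mapping for `extension` added or updated."""
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        by_ext = ET.SubElement(root, "ByExtension")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == extension.lower():
            mapping.set("Value", icon_file)  # update an existing entry
            break
    else:
        # No entry for this extension yet, so append one.
        ET.SubElement(by_ext, "Mapping", {"Key": extension, "Value": icon_file})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # "pdficon.gif" is an assumed file name for the downloaded icon.
    print(add_icon_mapping(SAMPLE_DOCICON, "pdf", "pdficon.gif"))
```

Working on a copy of the file and keeping a backup of the original is a sensible precaution, since a malformed DOCICON.XML can affect how every document icon is rendered.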

This also applies to Search Server 2010. FAST Search indexes PDF files OOTB, so you don’t need these steps there.

Jie Li

Technical Product Manager, SharePoint

Comments
  • What about Adobe IFilter for PDF? Any idea?

    I tried to set up the same registry entries, except for {987f8d1a-26e6-4554-b007-6b20e2680632}, which may be:

    • {F6594A6D-D57F-4EFD-B2C3-DCD9779E382E}

    or

    • {B801CA65-A1FC-11D0-85AD-444553540000}

    Not working yet.

  • @Pablo

    First of all, I don't recommend the Adobe PDF iFilter. If budget is not a major concern, you should avoid it and use one of the alternatives for much better performance.

    You can get the CLSID from their documentation: www.adobe.com/.../configuring_pdf_ifilter_for_ms_sharepoint_2007.pdf

  • Each time I try to index a pdf, I receive this warning: The FAST Search backend reported warnings when processing the item. ( Document conversion failed:

    To make sure, I've pushed a pdf into the collection on the FAST server:

    PS C:\FASTSearch\tmp> docpush -c sp .\en062410.pdf

    [2010-06-24 11:52:58.481] WARNING    sp Reported warning with http://cohowinery.com/.\en062410.pdf: processing::Document conversion failed:

    [2010-06-24 11:52:58.481] INFO       sp All add operations completed

    Any help greatly appreciated since we've a LOT of pdf files to index.

  • Hello

    FAST crawls PDF files OOTB; this error is due to a configuration issue. The FAST user and the connector user need to be the same, or you need to add the connector user to the FAST user group.

    After this, docpush and crawl work.

  • I have FAST Search Server for SharePoint 2010, and it does not index PDF text content out of the box. It is a standalone server connected to my 2010 farm through http:\\. .DOC, .TXT, and other common file formats work, but not PDF. I have also installed the advanced filter pack via PowerShell.

    Please explain or prove to me that FAST handles PDF out of the box, as it's not doing it for me. In addition, it has been a huge pain to set up, and I still don't have PDFs indexing. I have spent many days looking for a technical whitepaper or something that officially states what FAST actually supports. Please point me to such an article.

    Otherwise, I have installed the Foxit iFilter and tested the filter by running this command (IFilter2Html.exe RadEditorAjaxEndUserManual.pdf > out.xml). To my surprise, it is using the iFilter; however, the back end still reports that the PDF document could not be converted, while other common documents work. A big part of the processing engine is written in Python. Is it possible to debug this error any further, and if so, how?

    Any help is appreciated --> robjay@gmail.com

  • Tool for search pdf and ppt files : http://isabout.info/

  • Good news: the issue I had with PDF files not indexing was to do with the FAST Search Server installation. I had installed the product onto a drive where the drive permissions were not set correctly for the install account. For some reason, when I installed FAST, not all the folders had permissions for the FAST service account.

    What I ended up doing was reapplying permissions on all files and subfolders, giving the FAST install account full access.

  • Setting the permissions worked for me as well. I applied Full Control rights on the \FASTSearch directory and its subdirectories to the FAST Search account (the one specified during the FAST install).

  • I tried everything but still cannot get search results from PDF files.

    I was able to get the PDF icon to show up, but search still does not work.

  • Please find the PowerShell script for installing the PDF iFilter on SharePoint 2010.

    www.directsharepoint.com/.../powershell-script-for-installing-pdf.html

  • I work for Adlib. In case you were wondering, they are one of the leading experts in PDF archive software (http://adlibsoftware.com/). Check them out if you want to index and search PDF files in SharePoint Server 2010.

    -Ron

  • I've used C# for catalog indexing, but how can I search only folder and file names (instead of file content), along with size and path?

    I've searched files and folders using the enumerate, getdirectory, and getfolder methods in VB.NET, but I could not index on this. Kindly provide information about indexing using .NET.

  • Hi. I am using SharePoint 2010. We have a number of PDF documents that are secured, that is, copy protected (content copy and paste is restricted) and print protected.

    I tried installing Adobe PDF iFilter v6 and then v9 as well, but these documents still cannot be searched.

    The PDFs were secured using Acrobat Professional XI, with the "changes allowed" and "printing allowed" settings set to None. The 'Enable copying of text, images and other content' option is also unchecked.

  • This also applies to Search Server 2010. FAST Search index PDF files OOTB, so you don’t need to go with these steps.
