office 2007 files in sharepoint v2 (the lurking danger and how to avoid it)

first off, if you're using office 2007 files inside sharepoint v2, you need to read this post. there are some very important things you need to be aware of.

as some background, office 2007 introduced a new xml-based file format for office documents. the new .docx, .xlsx and .pptx files are actually all zip files containing several xml files which make up the document and its styling.

want to see for yourself?

try renaming one of those new file types with a .zip extension and you'll be able to look inside the file, or extract it and examine all the xml documents. you'll see something like this:

[image: contents of the renamed file]
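if you'd rather not rename anything, the same peek inside can be sketched in a few lines of python. this is only an illustration: the sample package below is built in memory with three part names of the kind a .docx typically contains, and the helper names `build_sample_package` and `list_parts` are made up for this example.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Build a tiny in-memory zip with docx-like part names (illustrative only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("docProps/core.xml", "<coreProperties/>")
        package.writestr("word/document.xml", "<document/>")
    return buffer.getvalue()


def list_parts(package_bytes: bytes) -> list[str]:
    """Return the names of the files stored inside a zip package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


parts = list_parts(build_sample_package())
for name in parts:
    print(name)
```

to inspect a real document, read its bytes from disk and pass them to `list_parts` instead of the sample package.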

getting back to sharepoint, microsoft released kb article 939909 with details on the support for office 2007 documents in sharepoint v2... but essentially it says they aren't supported. if you're using them, you need to be aware of the potential issue described in this post.

sharepoint automatically stays in sync with office files and the metadata stored inside of them. when a change happens in either sharepoint or the file itself, a synchronization between the sources will occur. there are two terms used to describe this action:

  • demotion - sharepoint writing metadata from the library into a document
  • promotion - sharepoint reading metadata from a document and writing it into the library
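as a rough mental model (not sharepoint's actual implementation), the two directions can be sketched as copies between two dictionaries. the function names `demote` and `promote` and the sample values are assumptions made up for this illustration.

```python
def demote(library_columns: dict, document_properties: dict) -> None:
    """Demotion: copy column values from the library into the document."""
    document_properties.update(library_columns)


def promote(document_properties: dict, library_columns: dict) -> None:
    """Promotion: copy property values from the document into the library."""
    library_columns.update(document_properties)


columns = {"customer": "contoso"}   # what the library shows
properties = {}                     # what is stored inside the file

demote(columns, properties)             # library -> document
properties["customer"] = "fabrikam"     # value changed inside the document
promote(properties, columns)            # document -> library

print(columns)
```

after both steps the library and the document agree again, which is the "in sync" state the rest of this post depends on.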

you can learn a bit more about this whole promotion / demotion thing on msdn.

there is a metadata issue with this that affects both windows sharepoint services v2 and sharepoint portal server 2003. since sharepoint v2 is not aware of the office 2007 file format, it does not demote sharepoint metadata into the files, as it does with previous office formats.

so why does this matter to me?

scenario time:

let's say you have a sharepoint v2 document library with some required custom metadata fields, say "customer" and "business unit". when a new office 2007 document is created and saved back to sharepoint, a dialog will prompt the user to enter any required fields.

once entered, the metadata is saved back to sharepoint and displayed correctly in the library, but it is not demoted into the document.

so what are the implications of that?

well, when sharepoint and a file are out of sync, it can cause unexpected results and loss of metadata. a prime scenario for this is upgrading to sharepoint v3... metadata can and will be lost in that scenario.

the dangerous part is that the metadata will appear to have been upgraded successfully, and will show up in sharepoint document libraries... however, as soon as a document is modified in any way (as simple as edit > ok), its metadata will be reset to defaults, effectively losing all values. that's why, if you aren't aware of this issue, it can be quite confusing and potentially quite damaging.
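the scenario above can be walked through with a toy simulation. this is a simplified model under stated assumptions, not how sharepoint is implemented: `Item`, `save_in_v2`, `edit_in_v3`, the list of legacy extensions and the sample values are all hypothetical, and promotion is modeled as the file's embedded values simply overwriting the library's columns.

```python
from dataclasses import dataclass, field

# Assumption: the formats this toy "v2" knows how to demote into.
LEGACY_FORMATS = {".doc", ".xls", ".ppt"}
DEFAULTS = {"customer": "", "business unit": ""}


@dataclass
class Item:
    name: str
    # Values shown in the document library.
    columns: dict = field(default_factory=lambda: dict(DEFAULTS))
    # Values stored inside the file itself.
    embedded: dict = field(default_factory=lambda: dict(DEFAULTS))


def extension(name: str) -> str:
    """Return the lowercased file extension, including the dot."""
    return name[name.rfind("."):].lower()


def save_in_v2(item: Item, values: dict) -> None:
    """Store values in the library; demote only into formats v2 understands."""
    item.columns.update(values)
    if extension(item.name) in LEGACY_FORMATS:
        item.embedded.update(values)


def edit_in_v3(item: Item) -> None:
    """Model any edit after the upgrade as a promotion from the file."""
    item.columns.update(item.embedded)


values = {"customer": "contoso", "business unit": "sales"}
old_format = Item("proposal.doc")
new_format = Item("proposal.docx")
for item in (old_format, new_format):
    save_in_v2(item, values)

looked_fine_before_edit = new_format.columns == values

for item in (old_format, new_format):
    edit_in_v3(item)

print(old_format.columns)
print(new_format.columns)
```

in this model the .docx looks fine right up until the first edit, at which point the empty values inside the file replace the ones in the library, while the .doc keeps its values because they were demoted into it back in v2.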

safely upgrading from sharepoint v2 to sharepoint v3 with office 2007 files

if you have office 2007 files in sharepoint v2, you will need to upgrade from v2 to v3 in a certain order and run a tool after the upgrade to bring the files back in sync with their metadata.

the upgrade process will be:

  1. run prescan on your v2 database(s) and fix any issues identified.

  2. disable any sybari / forefront scanning.

  3. attach your v2 database(s) to your v3 farm, upgrading them.

  4. after upgrading, run the metadata_refresher tool below to sync any office 2007 documents with their metadata.

    note: modifying documents and/or their metadata in any way before running the refresh tool will cause metadata loss. the metadata will be reset to default values, so ensure this tool (or something like it) is run immediately after upgrades to v3 have completed.

metadata refresher tool

the source code attached to this post contains an application which can correct the metadata sync issue. two things to note:

  1. only the most recent version of files will be corrected.
  2. any documents which are linked copies will be skipped to avoid breaking the relationship, but these can be located in "scan mode".
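to make those two rules concrete, here is a small sketch of the selection logic they imply. it is not the code from the attached tool: `Candidate`, `plan_refresh`, the field names and the urls are all hypothetical, and the second list returned only stands in for the kind of report a scan might produce.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    is_latest_version: bool
    is_linked_copy: bool


def plan_refresh(candidates: list[Candidate]) -> tuple[list[str], list[str]]:
    """Split candidates into urls to refresh and linked copies to report."""
    to_refresh: list[str] = []
    linked_copies: list[str] = []
    for candidate in candidates:
        if not candidate.is_latest_version:
            continue  # rule 1: older versions are left untouched
        if candidate.is_linked_copy:
            linked_copies.append(candidate.url)  # rule 2: skip, but report
            continue
        to_refresh.append(candidate.url)
    return to_refresh, linked_copies


candidates = [
    Candidate("docs/plan.docx", is_latest_version=True, is_linked_copy=False),
    Candidate("docs/plan.docx?version=1", is_latest_version=False, is_linked_copy=False),
    Candidate("docs/copy-of-plan.docx", is_latest_version=True, is_linked_copy=True),
]
to_refresh, linked_copies = plan_refresh(candidates)
print(to_refresh)
print(linked_copies)
```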

see the readme.txt file for details on usage.

disclaimer: while this code has been tested and proven to work, anyone considering its use should test and validate in their own test environments before attempting to use on any type of production server/data. the code is provided as is, with no warranty of any kind and confers no rights.

metadata_refresher.zip