Modifying Open XML Documents that are in SharePoint Document Libraries using Web Services


One of the most basic operations when combining the Open XML SDK with SharePoint web services is to retrieve a document from a document library, modify it using the Open XML SDK (and LINQ to XML), and save it back to the library.  This post describes how to do this and provides a sample in C#.

It is simple to extend this sample to iterate through all documents in a library, apply some changes to each one, and save them back.  In an upcoming post, I’ll present a sample that ‘sanitizes’ all documents in a document library: it removes comments, accepts revisions, and removes personal information.  I find this useful because I keep a library of documents that I send externally as needed, and it’s best not to have personal information embedded in them.  By running that sample regularly, I can make sure that the document library stays clean, even if other people are editing documents in it.

For a brief tutorial on SharePoint web services, see “Getting Started with SharePoint (WSS) Web Services using LINQ to XML”.  For this example, you need to add two references to web services (both Lists and Copy).  The procedure for adding a reference to the Copy web service is the same as adding a reference to the Lists web service.

This code uses the Open XML SDK.  Remember to add a reference to the Open XML SDK assembly.  This code uses V1 of the SDK.  It should work with V2 CTP but I haven't tried it.

The code references the System.IO.FileFormatException class, which is in the WindowsBase assembly, so add a reference to it.

This code uses the technique of converting XmlNode to XElement (and back again), as detailed in “Convert XElement to XmlNode (and Convert XmlNode to XElement)”, so that we can use LINQ to XML instead of XmlDocument.
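
As a minimal sketch of why the conversion is convenient, the following stand-alone program converts an XmlNode to an XElement and then queries it with LINQ to XML.  The XML in SampleResponse is made-up sample data standing in for what a web service call returns, and the names ConversionSketch, SampleResponse, and FindListId are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ConversionSketch
{
    // Same technique as in the full listing: stream the XmlNode into a new XDocument.
    public static XElement GetXElement(this XmlNode node)
    {
        XDocument xDoc = new XDocument();
        using (XmlWriter xmlWriter = xDoc.CreateWriter())
            node.WriteTo(xmlWriter);
        return xDoc.Root;
    }

    // Made-up sample data (an assumption), standing in for a web service result.
    public static XmlNode SampleResponse()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(
            "<Lists xmlns='http://schemas.microsoft.com/sharepoint/soap/'>" +
            "<List Title='Shared Documents' ID='{1}' />" +
            "<List Title='Open XML Documents' ID='{2}' />" +
            "</Lists>");
        return xmlDoc.DocumentElement;
    }

    // Find the ID attribute of the list with the given title, or null if there is none.
    public static string FindListId(XmlNode response, string title)
    {
        XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";
        return response.GetXElement()
            .Elements(s + "List")
            .Where(l => (string)l.Attribute("Title") == title)
            .Select(l => (string)l.Attribute("ID"))
            .FirstOrDefault();
    }

    public static void Main()
    {
        Console.WriteLine(FindListId(SampleResponse(), "Open XML Documents"));
    }
}
```

Once the node is an XElement, the query is an ordinary LINQ expression, which is the reason the listing converts instead of working with XmlDocument directly.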

One important aspect of the code is that you retrieve the document as a byte array:

ModifyDoc.CopyWebService.FieldInformation[] fields;
byte[] byteArray;
copy.GetItem(url, out fields, out byteArray);

 

After retrieving the byte array, you can write the byte array to a MemoryStream, and use the MemoryStream to open an in-memory Open XML document.  After modifying the in-memory document, you can convert it back to a byte array and serialize back to the SharePoint document library.  The technique is described in the post, “Working with In-Memory Open XML Documents”.
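
The reason for writing the bytes into a new MemoryStream, rather than wrapping the array directly, is that a stream created with new MemoryStream(byteArray) cannot grow, while modifying a document usually makes it larger.  The following stand-alone sketch illustrates the difference using only the .NET base class library.  The three-byte array is a stand-in for the bytes returned by copy.GetItem, and the names MemoryStreamSketch and ToResizableStream are hypothetical.

```csharp
using System;
using System.IO;

public static class MemoryStreamSketch
{
    // Copy the bytes into a resizable stream, as the listing does.
    public static MemoryStream ToResizableStream(byte[] byteArray)
    {
        MemoryStream mem = new MemoryStream();
        mem.Write(byteArray, 0, byteArray.Length);
        return mem;
    }

    public static void Main()
    {
        byte[] original = { 1, 2, 3 };   // stand-in for the bytes from copy.GetItem

        using (MemoryStream mem = ToResizableStream(original))
        {
            // Simulate a modification that makes the content longer.
            mem.Write(new byte[] { 4, 5 }, 0, 2);
            Console.WriteLine("Resizable stream length: " + mem.ToArray().Length);
        }

        using (MemoryStream fixedSize = new MemoryStream(original))
        {
            try
            {
                // Writing past the end of the wrapped array is not allowed.
                fixedSize.Seek(0, SeekOrigin.End);
                fixedSize.Write(new byte[] { 4, 5 }, 0, 2);
            }
            catch (NotSupportedException)
            {
                Console.WriteLine("A stream wrapped around the array cannot grow.");
            }
        }
    }
}
```

After the modification, mem.ToArray() returns exactly the bytes written so far, which is what gets uploaded.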

Here is the code to serialize it back to the SharePoint document library:

string[] urls = { url };
ModifyDoc.CopyWebService.CopyResult[] copyResults;
copy.CopyIntoItems(url, urls, fields, mem.ToArray(), out copyResults);
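
The call reports per-destination outcomes in copyResults rather than throwing, so it is worth inspecting them after the call.  The following is a sketch under stated assumptions: the CopyResult and CopyErrorCode types below are stand-ins declared here so that the example compiles on its own; they are modeled on what I understand the generated Copy web service proxy to contain, so check the member names against your own web reference.  The sample results and the DescribeFailures helper are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in types (assumptions).  In the real program, use the types that the
// web reference generates, for example ModifyDoc.CopyWebService.CopyResult.
public enum CopyErrorCode
{
    Success, DestinationInvalid, DestinationMWS, SourceInvalid,
    DestinationCheckedOut, InvalidUrl, Unknown
}

public class CopyResult
{
    public CopyErrorCode ErrorCode { get; set; }
    public string ErrorMessage { get; set; }
    public string DestinationUrl { get; set; }
}

public static class CopyResultSketch
{
    // Return one description per destination that did not report success.
    public static List<string> DescribeFailures(CopyResult[] copyResults)
    {
        return copyResults
            .Where(r => r.ErrorCode != CopyErrorCode.Success)
            .Select(r => string.Format("{0}: {1} ({2})",
                r.DestinationUrl, r.ErrorCode, r.ErrorMessage))
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample results; real ones come from copy.CopyIntoItems.
        CopyResult[] copyResults =
        {
            new CopyResult
            {
                ErrorCode = CopyErrorCode.Success,
                DestinationUrl = "http://localhost/docs/A.docx"
            },
            new CopyResult
            {
                ErrorCode = CopyErrorCode.DestinationCheckedOut,
                ErrorMessage = "sample message",
                DestinationUrl = "http://localhost/docs/B.docx"
            }
        };

        foreach (string failure in DescribeFailures(copyResults))
            Console.WriteLine(failure);
    }
}
```

In the real program, a non-empty list of failures means the document was not saved back, even though no exception was thrown.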

 

Now that we’ve covered these basics, in the near future I'll show how to use SharePoint web services and the Open XML SDK to do some more interesting things.

Here is the complete listing (the code is added as an attachment to this post):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

namespace ModifyDoc
{
    public static class MyExtensions
    {
        // Get the XML of the part as an XDocument.  The XDocument is cached as an
        // annotation on the part, so repeated calls return the same instance.
        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            using (StreamReader sr = new StreamReader(part.GetStream()))
            using (XmlReader xr = XmlReader.Create(sr))
                xdoc = XDocument.Load(xr);
            part.AddAnnotation(xdoc);
            return xdoc;
        }

        public static void PutXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.GetXDocument();
            if (xdoc != null)
            {
                // Serialize the XDocument object back to the package.
                using (XmlWriter xw =
                    XmlWriter.Create(part.GetStream
                   (FileMode.Create, FileAccess.Write)))
                {
                    xdoc.Save(xw);
                }
            }
        }

        public static string StringConcatenate(
            this IEnumerable<string> source)
        {
            return source.Aggregate(
                new StringBuilder(),
                (s, i) => s.Append(i),
                s => s.ToString());
        }

        // Convert an XmlNode to an XElement so that we can use LINQ to XML.
        public static XElement GetXElement(this XmlNode node)
        {
            XDocument xDoc = new XDocument();
            using (XmlWriter xmlWriter = xDoc.CreateWriter())
                node.WriteTo(xmlWriter);
            return xDoc.Root;
        }

        // Convert an XElement back to an XmlNode to pass to the web service.
        public static XmlNode GetXmlNode(this XElement element)
        {
            using (XmlReader xmlReader = element.CreateReader())
            {
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.Load(xmlReader);
                return xmlDoc;
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            string documentLibraryName = "Open XML Documents";
            string documentName = "Test.docx";

            XNamespace s = "http://schemas.microsoft.com/sharepoint/soap/";

            // Make sure that you use the correct namespace, as well as the correct reference
            // name.  The namespace (by default) is the same as the name of the application
            // when you created it.  You specify the reference name in the Add Web Reference
            // dialog box.
            //
            // Namespace  Reference Name
            //    |           |
            //    V           V
            ModifyDoc.ListsWebService.Lists lists =
                new ModifyDoc.ListsWebService.Lists();

            // Fix Namespace and Reference Name for the Copy web service too
            ModifyDoc.CopyWebService.Copy copy =
                new ModifyDoc.CopyWebService.Copy();

            // Update the following URLs to point to the Lists and Copy web services
            // for your SharePoint site.
            lists.Url = "http://localhost/_vti_bin/Lists.asmx";
            copy.Url = "http://localhost/_vti_bin/Copy.asmx";

            lists.Credentials = System.Net.CredentialCache.DefaultCredentials;
            copy.Credentials = System.Net.CredentialCache.DefaultCredentials;

            XElement listCollection = lists.GetListCollection().GetXElement();

            // get the node for the library that we want
            XElement library = listCollection
                .Elements(s + "List")
                .Where(l => (string)l.Attribute("Title") == documentLibraryName)
                .FirstOrDefault();

            if (library == null)
            {
                Console.WriteLine("Library {0} doesn't exist.", documentLibraryName);
                Environment.Exit(0);
            }

            // get the ID of the library
            string libId = (string)library.Attribute("ID");

            XElement item = GetItemByLinkFilename(lists, libId, documentName);

            if (item == null)
            {
                Console.WriteLine("Document {0} doesn't exist.", documentName);
                Environment.Exit(0);
            }

            // get the document from the doc library as a byte array
            string url = item.Attribute("ows_EncodedAbsUrl").Value;

            ModifyDoc.CopyWebService.FieldInformation[] fields;
            byte[] byteArray;
            copy.GetItem(url, out fields, out byteArray);

            // create a resizable memory stream and write the byte array to it
            using (MemoryStream mem = new MemoryStream())
            {
                mem.Write(byteArray, 0, byteArray.Length);
                try
                {
                    // create a WordprocessingDocument from the memory stream
                    using (WordprocessingDocument wordDoc =
                        WordprocessingDocument.Open(mem, true))
                    {
                        XNamespace w =
                            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

                        // modify the document as necessary
                        // for this example, we'll insert a simple paragraph at the
                        // beginning of the document
                        XDocument doc = wordDoc.MainDocumentPart.GetXDocument();
                        doc.Element(w + "document")
                            .Element(w + "body")
                            .AddFirst(
                                new XElement(w + "p",
                                    new XElement(w + "r",
                                        new XElement(w + "t", "Hello, there")
                                    )
                                )
                            );

                        // write the XDocument back into the Open XML document
                        wordDoc.MainDocumentPart.PutXDocument();
                    }

                    // use the Copy web service to save the document back to the
                    // document library.
                    string[] urls = { url };
                    ModifyDoc.CopyWebService.CopyResult[] copyResults;
                    copy.CopyIntoItems(url, urls, fields, mem.ToArray(), out copyResults);
                }
                catch (System.IO.FileFormatException e)
                {
                    // document is invalid
                    Console.WriteLine(e);
                    Environment.Exit(0);
                }
            }
        }

        private static XElement GetItemByLinkFilename(
            ModifyDoc.ListsWebService.Lists lists, string libId,
            string documentName)
        {
            XNamespace z = "#RowsetSchema";

            // get the XElement for the row that contains info about the document
            // that we want to modify
            XElement queryOptions = new XElement("QueryOptions",
                new XElement("Folder"),
                new XElement("IncludeMandatoryColumns", false)
            );
            XElement viewFields = new XElement("ViewFields");
            XElement item = lists.GetListItems(libId, "", null,
                viewFields.GetXmlNode(), "", queryOptions.GetXmlNode(), "")
                .GetXElement()
                .Descendants(z + "row")
                .Where(i => (string)i.Attribute("ows_LinkFilename") == documentName)
                .FirstOrDefault();
            return item;
        }
    }
}

Posted: Friday, January 09, 2009 1:54 AM by EricWhite
Attachment(s): ModDocument.cs

Comments

Julien Chable said:

At the start of 2009, with many people on vacation, the web has not been overflowing with

# January 9, 2009 6:05 AM

Erika Ehrli said:

Here is a list of links that I want to share with you. LINQ for Office Developers Some Office solutions

# January 9, 2009 3:19 PM

Kerry said:

Dear Eric,

Everything works well, but the changes are not copied back to the SharePoint site (Services 3.0).

When I look at 'copyResults', it says the document must be checked out before changes can be made.

I would like to merge this method for accessing the docs with your "move-insert-delete-paragraphs-in-word-processing-documents" post. Is this feasible?

Look forward to hearing your comments!

Thanks

# February 9, 2009 6:46 AM

EricWhite said:

Hi Kerry,

I made a mistake in this blog post.  The Copy web service protocol specification, located at http://msdn.microsoft.com/en-us/library/cc313170.aspx, states, "This protocol does not provide a way to control whether the overwriting of files during the copy operation is allowed."  It also states, "Consider using different protocols for copying files when the protocol client needs to control whether the overwriting of existing files during the copy operation is allowed".  I believe that overwriting works with plain WSS, but perhaps doesn't with MOSS.  I haven't verified this, and the protocol spec doesn't indicate the circumstances when you can't overwrite.

I also believe (but haven't verified) that it is possible to always overwrite using FrontPage Server Extensions.  I intend to update this post soon with new code and explanations.

Regarding whether it would be feasible to merge this method for accessing the docs with the move-insert-delete-paragraphs code - absolutely!  One of the overloads of the BuildOpenDocument method takes a stream - this can be used with a memory stream.  You can then get the byte array to upload from the memory stream.  This would enable powerful scenarios.

-Eric

# February 9, 2009 8:05 AM

Kerry said:

Thanks Eric,

Look forward to your update; it's the missing piece in the jigsaw!

Kerry

# February 9, 2009 9:24 AM

Ryan Riley said:

How would you go about just reading the file instead of copying it? I don't see a way of getting a read stream from just the Lists service.

Thanks!

# February 24, 2009 5:59 PM

EricWhite said:

Hi Ryan,

Actually, what you get is a byte array.  From that you can create a memory stream, if a stream is what you need.  Does this help?

-Eric

# February 24, 2009 6:09 PM

Julien Chable said:

These past few weeks were quite full and complex, and I lacked the time to share with

# March 6, 2009 11:47 PM