
SharePoint at the Intersection

One of the really cool things about working at Microsoft is the chance to work on next generation technologies that really make a difference.  A few weeks ago, my boss offered me the chance to work on the technical evangelism effort for the next version of SharePoint, and I jumped at the chance.

Note: This isn't replacing our evangelism efforts around Open XML, it is augmenting them.  More about this in a minute.

We, of course, use SharePoint throughout Microsoft.  Fairly often, when working in a large company, you end up in a situation where you must communicate something to a large group of internal users, and SharePoint is by far the easiest way to do this.  Sometimes it is as simple as posting some documents on your //my site, and letting the group know about them.  There are many more elaborate scenarios too, of course.  When writing the LINQ to XML documentation, I wrote some quick and dirty tools to convert the documentation set to a SharePoint wiki.  Then, the dev team was able to do their technical review of the docs in the wiki, simply changing the code or text as appropriate, adding comments, or whatever.  The ease with which they could do the technical review really facilitated great participation within the team.  Their barrier to entry was really low – they just had to be on the corporate network, have a browser, and they were off to the races.  I used the history feature of SharePoint wikis to see the exact changes made by each team member.  In contrast, other tech review tools involve downloading a special application, with associated installation issues, or passing around documents, which means that the developers or program managers can't see comments or corrections already made by another member of the team, which adds to their (and my) workload.  Using SharePoint wikis worked well - the end result is that to date, I haven't found any cases where the LINQ to XML docs are not semantically complete.

The beauty of this is the ease with which you can use SharePoint.  I don't believe that I ever cracked open a manual or read a book on SharePoint.  I just jumped onto my //my site and did it.

About a year ago, I was talking to my (then) boss, and he asked me what I would really like to do in Microsoft.  My response:  LINQ to XML is one of the most powerful and enabling technologies that I have come across.  There are many XML dialects used throughout the Microsoft stack (Open XML, CAML, XAML, XHTML, to name a few), and I wanted to work at the intersection of these technologies, using the power of LINQ to XML to connect them.

I can imagine writing SharePoint code that takes a collection of Open XML documents, and presents a new view of them in Silverlight.  LINQ to XML has the capacity to reduce code line counts by an order of magnitude.  You can write a couple of hundred lines of code in this scenario that do some incredibly cool things.  This vision of Open XML within SharePoint is particularly compelling.  With the ISO ratification of Open XML, we have this dialect documented in extreme detail.  This means that we have the ability within SharePoint to write .NET code that cracks open every document in a document collection, a site, or a site collection, and presents the extracted information in innovative and powerful ways.  Leveraging custom XML within Open XML means that embedded business or semantic information can be aggregated and transformed, giving upper management a new lens with which to view the activity of their group or division.  Combining this with the metadata stored in SharePoint opens up even more possibilities.

Another scenario: take some external data source, generate Open XML documents, and automatically provision a SharePoint site with documents that enable a group to accomplish their mission in an easier way.

As any of the many SharePoint experts could tell you, these are some of the underlying reasons behind the dramatic growth of the use of SharePoint!  I'm really excited about the future!

Posted by EricWhite | 1 Comments

Using Power Tools for Visual Studio to Modify Open XML Documents

There is a cool add-in for Visual Studio that I use.  It is the Open XML Package Editor, which is a graphical treeview based editor for examining and editing Open XML Package files.  Once you install the add-in, you can drag an Open XML document into Visual Studio, and browse through the parts, and open specific parts for editing in Visual Studio's great XML editor.  Visual Studio doesn't keep the file open, so you can use it in the following scenarios:

  • You can use Word 2007 to create a file, save the file, and close Word.  You can then drag the document onto Visual Studio, and look at the markup that was created.  You can then open the doc again in Word, change it, close it, and then Visual Studio will tell you that the file has changed outside the editor.  You can tell VS to reload the doc, and see the changed markup.  Because the Open XML Package Editor doesn't keep the file open, you don't need to close the file in VS before opening it again in Word.
  • You can write a program to manipulate the document in C# or VB.  If you have the Open XML document open in VS, you can run the program, and VS will tell you that the file has changed, and you can reload to see the results of your modifications.
  • You can manually modify the XML, and open the file in Word to see if your manual modifications worked and did what you wanted.  You can then close the doc, tweak the XML, and open it in Word again.

This is a very convenient way to examine and edit the XML in Open XML docs!
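
For the second scenario, here is a minimal sketch of a program that reads a part out of a package.  It relies only on the fact that an Open XML package is a ZIP file.  The class and method names are hypothetical, and the part name "word/document.xml" is the conventional location of the main document part in a Word document, not something the package guarantees; a more careful program would follow the package relationships instead.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class PackageSketch
{
    // Hypothetical helper: loads one part of an Open XML package as an XDocument.
    // The caller supplies the part name; the path is an assumption, not a lookup.
    public static XDocument LoadPart(Stream packageStream, string partName)
    {
        // leaveOpen: true so that the caller stays in charge of the stream.
        using (var zip = new ZipArchive(packageStream, ZipArchiveMode.Read, true))
        {
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null)
                throw new FileNotFoundException("Part not found in package: " + partName);

            using (Stream partStream = entry.Open())
                return XDocument.Load(partStream);
        }
    }
}
```

You would call it with something like PackageSketch.LoadPart(File.OpenRead("Test.docx"), "word/document.xml"), where Test.docx is a placeholder file name.  Because the stream is opened and closed by your own code, the document is not held open afterwards.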

Of course, Word does not indent the XML when serializing.  You can use the XML formatting option in the XML editor to make it easy to view and edit the XML.  I like to set an option in VS so that when you format the XML, it lines up the attributes.  This is a far easier way to see the XML when there are lots of attributes, or lots of namespaces.  To set this option, select Tools, Options on the menu, expand Text Editor, expand XML, click on Formatting, and set the option, "Align attributes each on a separate line".

To format an XML document, select Edit, Advanced, Format Document on the menu (or press Ctrl+E, D).
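
If you want a similar view from code rather than from the editor, the following is a rough sketch using XmlWriterSettings.  The class and method names are hypothetical, and the output only approximates the Visual Studio option; it is not guaranteed to match it character for character.  Treat the result as something to read, not something to write back into the package.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class XmlFormatSketch
{
    // Re-serializes XML with indentation and with each attribute on its own line.
    public static string Format(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        var settings = new XmlWriterSettings
        {
            Indent = true,               // indent child elements
            NewLineOnAttributes = true,  // put each attribute on a separate line
            OmitXmlDeclaration = true    // keep the output focused on the markup
        };

        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
            doc.Save(writer);

        return sb.ToString();
    }
}
```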

You can download the Microsoft Visual Studio Tools for the Office System Power Tools here.

Posted by EricWhite | 2 Comments

New Version of Functional Programming Tutorial

Some time ago, I was talking to some members of a team that used LINQ and LINQ to XML in one particular area of their code.  They said that the code written using a conventional approach to XML was on the order of 6000 lines of code.  When re-written using LINQ and LINQ to XML, it was around 800 lines of code.  It was faster to code, and it was easier to debug.  And there is some correlation between lines of code and bugs regardless of the type of code, so reducing lines of code tends to mean fewer bugs.

In the fall of 2006, I wrote a tutorial on Query Composition using Functional Programming Techniques in C# 3.0.  In that tutorial, I tried to expose my learning process as I discovered for myself the transition in approach to writing code in the functional programming style.  I wrote the tutorial very shortly after I had (some of) the necessary functional programming epiphanies.  I wanted to communicate the ideas behind functional programming while the transition from object-oriented programming to functional programming was fresh in my mind.  It was fun to write.

I’ve written an updated version of that tutorial, containing my new insights into coding in the functional style using C#.  For those who read the previous tutorial, there is new material here.  As with the previous tutorial, I've targeted this tutorial to C# developers who have no functional programming experience, i.e. the typical object-oriented coder.

One thing that I want to say at this point:  This stuff is easy.  It's also really fun.  You don't have to read academic papers to learn about, enjoy, and benefit from functional programming in C# 3.0.  There are about half a dozen concepts you need to learn, each one easy: for example, a new way to write a method that has no name (a lambda expression), or a new way to write a static method for a class (an extension method).  Then you put them all together, and the result is more than the sum of the parts.

Some traditional OOP developers that I have talked to have had various reactions, including complete resistance to learning about functional programming, and wondering why they would want to use it.  However, my message is that there IS something to get here.  In the last 18 months, I have written a fair number of programs, almost all of them in the pure functional style.  My experience is that functional programming does reduce code line count for certain types of problems.

At the time I wrote the tutorial, I wasn't actually sure how my development as a functional programmer would proceed.  Would I continue to like it?  What else would I learn about this style of development?  Well, I've learned quite a bit.  I made a couple of theoretical mistakes in the previous version of this tutorial, and they are corrected in this tutorial.  I also made a couple of implementation mistakes, also fixed.  Some of the stuff in that tutorial isn't something that you use often, so I'm relegating that material to an appendix (not yet written).  I have learned new coding approaches to solving problems in the FP style.  And I have also developed a code formatting style that works well for me.

A few years ago, I took a class in mountaineering.  I learned all about lots of stuff, including figure 8 knots, belay devices, and Prusik knots.  A Prusik is a knot that is commonly used to ascend a rope if you don't have a mechanical device (a jumar ascender).  At one point, we had to take a test where we had to start at the bottom of a rope and ascend the rope to the top using our Prusik knots.  The instructors tied a knot in the rope that we were ascending, so we had to learn the techniques of bypassing a knot or obstruction in the rope, all the while staying tied securely to the rope, and maintaining good safety.  One of the students asked the question, "Why do we have to learn this particular technique?  Is it really likely that we will end up in the wild, and need to ascend a rope and get around a knot?"  The instructor's reply was enlightening.  She said, "It isn't really that you are going to encounter this exact situation.  However, the skills that you learn doing this exercise are the skills that you will apply in many other situations.  If you can do this properly, you can do many other necessary technical climbing activities properly."

The example presented in this tutorial shows you how to write code to extract the text of paragraphs of an Open XML document, along with the associated style of each paragraph.  You may not have a particular need to do so (or you may).  However, just as with learning to ascend a rope using Prusik knots, if you work through the example, and fully understand the code in each step of the tutorial, you will gain a set of skills that you can combine and recombine in many other situations to solve very different problems.
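
To give a flavor of the kind of query involved, here is a minimal sketch, not the code from the tutorial.  It assumes you already have the main document part loaded as an XDocument.  The class name, the method name, and the defaultStyle parameter are hypothetical; in particular, this sketch does not look up the real default paragraph style in the styles part, it simply uses whatever the caller passes in.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ParagraphSketch
{
    // The WordprocessingML main namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one (style name, text) pair per paragraph, in document order.
    public static IEnumerable<KeyValuePair<string, string>> StylesAndText(
        XDocument mainPart, string defaultStyle)
    {
        return mainPart.Descendants(W + "p").Select(p =>
            new KeyValuePair<string, string>(
                // w:p/w:pPr/w:pStyle/@w:val, or the caller's default if absent
                (string)p.Elements(W + "pPr")
                         .Elements(W + "pStyle")
                         .Attributes(W + "val")
                         .FirstOrDefault() ?? defaultStyle,
                // concatenate the text of every w:t element in the paragraph
                string.Concat(p.Descendants(W + "t").Select(t => (string)t))));
    }
}
```

The whole query is a single expression with no loops and no mutable state, which is the style of code that the tutorial builds up to step by step.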

Learning about functional programming made me a better and faster coder.

You can find the tutorial here:

http://blogs.msdn.com/ericwhite/pages/FP-Tutorial.aspx

 

The GroupAdjacent Query Operator Extension Method

This is a bit of a geeky post for the LINQ and LINQ to XML folks.

This post introduces a GroupAdjacent generic extension method that groups elements in a collection with adjacent elements based on a common key.  For example, grouping the following array creates six groups (not 3, as with GroupBy): 

int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 }; 

 

GroupAdjacent groups them like this:

Group 1
1

Group 0
0
0

Group 2
2

Group 0
0
0
0

Group 2
2

Group 0
0
0
0

In contrast, the standard GroupBy extension method groups them like this:

Group 1
1

Group 0
0
0
0
0
0
0
0
0

Group 2
2
2

Quite often when you are processing an Open XML document using LINQ to XML, you want to group adjacent paragraphs that have the same style.

Let's say that the above array represents paragraphs in a document.  If the value is > 0, the paragraphs are a heading of the specified level.  The zeros represent paragraphs styled as normal.  We want to do some processing on the collection of paragraphs in such a way that we process all adjacent paragraphs of the same style in a single query.  We could group them, and execute a query on each group, but as you saw above, the standard GroupBy extension method will group all of the 0's together and all of the 2's together.

As with the GroupBy extension method, we use a key selector function to get the key for each element in the source collection.  GroupAdjacent should have the same set of overloads as GroupBy.  When I get time, I'll code them and post them.
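
As an illustration of what one such overload might look like, here is a rough sketch of a variant that takes an IEqualityComparer<TKey>, mirroring the GroupBy overload with a comparer.  This is not the implementation given later in this post.  To keep the sketch self-contained it uses a hypothetical name (GroupAdjacentRuns) and returns each group as a key/list pair instead of an IGrouping.

```csharp
using System;
using System.Collections.Generic;

public static class AdjacentSketch
{
    // Hypothetical variant: groups neighbouring elements whose keys are equal
    // according to the supplied comparer (or the default comparer if null).
    public static IEnumerable<KeyValuePair<TKey, List<TSource>>> GroupAdjacentRuns<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        comparer = comparer ?? EqualityComparer<TKey>.Default;
        TKey last = default(TKey);
        List<TSource> run = null;

        foreach (TSource s in source)
        {
            TKey k = keySelector(s);

            // A key change closes the current run.
            if (run != null && !comparer.Equals(k, last))
            {
                yield return new KeyValuePair<TKey, List<TSource>>(last, run);
                run = null;
            }

            // Start a new run, remembering the key of its first element.
            if (run == null)
            {
                run = new List<TSource>();
                last = k;
            }
            run.Add(s);
        }

        // Emit the final run, if the source was not empty.
        if (run != null)
            yield return new KeyValuePair<TKey, List<TSource>>(last, run);
    }
}
```

With StringComparer.OrdinalIgnoreCase as the comparer, for instance, adjacent paragraphs styled "heading1" and "Heading1" would land in the same group.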

Using the identity function for the key selector, the following code shows the use of GroupAdjacent:

int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };

var groups = ia.GroupAdjacent(i => i);

foreach (var g in groups)
{
    Console.WriteLine("Group {0}", g.Key);
    foreach (var i in g)
        Console.WriteLine(i);
    Console.WriteLine();
}

This produces the output that you saw above.

Implementation Notes

 

This method is lazy.  Until the results are iterated, the source is not iterated.  However, for each group, GroupAdjacent creates a List<T>, and populates it with the group.  As it turns out, this is the correct way to do it.  The entire source collection must be iterated, and if we were to not iterate through one of the groups, then iteration would not work.  Also, we MUST create the List<T> so that each group can be iterated multiple times.  Therefore, I decided to populate the List<T> for each group.  Iteration through the group then iterates through the list.
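
The laziness can be observed directly.  The following is a small sketch, separate from the real implementation, that counts how many elements have been pulled from the source.  The names LazySketch, Pulled, Source, and Runs are all hypothetical, and the sketch works on int values only to stay short.

```csharp
using System.Collections.Generic;

public static class LazySketch
{
    // Number of elements that have been read from the source so far.
    public static int Pulled;

    // A source that records each element as it is consumed.
    public static IEnumerable<int> Source(int[] values)
    {
        foreach (int v in values)
        {
            Pulled++;
            yield return v;
        }
    }

    // Buffers each run of equal adjacent values into its own list.
    public static IEnumerable<List<int>> Runs(IEnumerable<int> source)
    {
        List<int> run = null;
        foreach (int v in source)
        {
            if (run != null && run[0] != v)
            {
                yield return run;   // the run is complete before the caller sees it
                run = null;
            }
            if (run == null)
                run = new List<int>();
            run.Add(v);
        }
        if (run != null)
            yield return run;
    }
}
```

Calling Runs(Source(...)) reads nothing from the source.  Asking for the first run reads only as far as the first element that does not belong to it.  Because each run is handed out as a fully populated list, the caller can skip it or iterate it more than once without disturbing the pass over the source.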

The following code is valid and should give the expected results:

int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };

var groups = ia.GroupAdjacent(i => i);

foreach (var g in groups)
{
    Console.WriteLine("Group {0}", g.Key);
    foreach (var i in g)
        Console.WriteLine(i);

    // it is perfectly valid to iterate through the group more than once.
    foreach (var i in g)
        Console.WriteLine(i);

    Console.WriteLine();
}
 

If any C# experts out there know a better way to implement this, I'd be happy to hear it.

Implementation

 

This approach was suggested by one of the LINQ architects a year or two ago.  I finally got around to it.  Thanks for the idea.

Following is the implementation of GroupAdjacent:

using System;
using System.Collections.Generic;
using System.Linq;

public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
    public TKey Key { get; set; }
    private List<TSource> GroupList { get; set; }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
    }

    System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
    {
        foreach (var s in GroupList)
            yield return s;
    }

    public GroupOfAdjacent(List<TSource> source, TKey key)
    {
        GroupList = source;
        Key = key;
    }
}

public static class LocalExtensions
{
    public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector)
    {
        TKey last = default(TKey);
        bool haveLast = false;
        List<TSource> list = new List<TSource>();

        foreach (TSource s in source)
        {
            TKey k = keySelector(s);
            if (haveLast)
            {
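                // Compare through the default comparer so that a null key does not throw.
                if (!EqualityComparer<TKey>.Default.Equals(k, last))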
                {
                    yield return new GroupOfAdjacent<TSource, TKey>(list, last);
                    list = new List<TSource>();
                    list.Add(s);
                    last = k;
                }
                else
                {
                    list.Add(s);
                    last = k;
                }
            }
            else
            {
                list.Add(s);
                last = k;
                haveLast = true;
            }
        }
        if (haveLast)
            yield return new GroupOfAdjacent<TSource, TKey>(list, last);
    }
}

class Program
{
    static void Main(string[] args)
    {
        int[] ia = new int[] { 1, 0, 0, 2, 0, 0, 0, 2, 0, 0, 0 };

        var groups = ia.GroupAdjacent(i => i);

        foreach (var g in groups)
        {
            Console.WriteLine("Group {0}", g.Key);
            foreach (var i in g)
                Console.WriteLine(i);

            Console.WriteLine();
        }

    }
}

New Version of the Open XML SDK is Available for Download

 

Erika Ehrli has just posted the news that the April 2008 CTP of the Open XML SDK is now live on the web, and available for download!

 

I'm really excited about the direction that the Open XML SDK team is taking.  My favorite new feature of the SDK is the ability to add annotations to parts, which enables a better programming experience when using LINQ to XML, my fav XML programming interface.

 

I recently posted on what annotations are, and what they are good for.  In addition, there is a topic in the Open XML SDK documentation that goes into greater depth.

 

•        Download Page: http://www.microsoft.com/downloads/details.aspx?FamilyId=AD0B72FB-4A1D-4C52-BDB5-7DD7E816D046&displaylang=en

•        Online docs: http://msdn2.microsoft.com/en-us/library/bb448854.aspx

•        Office Open XML Formats Resource Center:  http://msdn2.microsoft.com/en-us/office/bb265236.aspx

•        XML in Office Developer Portal: http://msdn2.microsoft.com/en-us/office/aa905545.aspx

•        MSDN Forum: Open XML Format SDK: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1647&SiteID=1

 

I'm really impressed with the folks on the Open XML SDK team.  I can't wait to see what they come up with next!

 

Good job, team!

Posted by EricWhite | 0 Comments

Stop the Hypocrisy

Jan van den Beld has a great post that exposes the hypocrisy of the anti-Open XML community.  An excerpt from his post:

Many views should come to bear on decisions made in standards bodies. It is ok to represent your views. It’s healthy. But it seems inappropriate to me to tear down anyone and any organization who disagrees with you through allegations of corruption. It is time to move on, particularly when you are throwing rocks while living in a house with LOTS of glass. Let the voices of 61 countries around the world stand and stop the attacks, if only to stop highlighting your own hypocrisy.

Posted by EricWhite | 2 Comments

Standards Norway Responds to Allegations

Stephen McGibbon notes that Standards Norway have posted a press release "Standard Norges behandling av OOXML for avstemming i ISO". He has posted a quick translation.  He will replace it with an official text if Standards Norway releases an English version.

Oliver Bell also has a good post about alleged irregularities in Germany and Norway.

Posted by EricWhite | 1 Comments

FUD About Availability of the Final Specification

Some of the opponents of Open XML have stated that because a final draft of the specification with all approved changes is not available, National Standards Bodies should decide to vote Disapprove.

This is pure FUD, and they know it.

The final draft of the specification can't be made available until after the vote reconsideration period, per ISO/IEC JTC1 rules.  Furthermore, this was announced at the Ballot Resolution Meeting.  Participants were instructed that national bodies need to base their decision on the documents that came out of the BRM.  Doug Mahugh, who was a member of the US delegation to the BRM, blogged about this.  The opponents of Open XML understand this process fully, but are deliberately spreading disinformation to attempt to obfuscate the issue and derail the process.  They should behave more ethically, and stop it.

I've done some research into the exact rules that apply.  Bear with me here, as understanding the procedures fully gets a little technical.  However, the facts are the only recourse that we have when confronted with outright deception.  And my experience is that when these opponents are confronted with the facts, they simply can't respond.

Usually, the opponents say that this period following the BRM comprises the "Final Draft International Standard (FDIS) Vote".

The approval of DIS 29500 falls under the "JTC 1 Directives", which apply to the Joint Technical Committee #1.  The two parents of JTC1 are ISO and IEC. Section 13 of these JTC 1 Directives describes the JTC 1 Fast-Track process, and there is NO "FDIS vote" in this JTC 1 Process.

Here are the facts.  The DIS 29500 vote occurred last September - all that is happening now is a reconsideration of National Body voting positions based on the work done up to and through the BRM. That is why there is no announcement of a new vote, and why the electronic ballot system is not used.

The opponents are using confusion about the exact procedures of the Fast-Track process to spread their misinformation.  There is a different ISO/IEC Fast-Track Process (as opposed to the JTC1 Fast-Track Process), which is described in the ISO/IEC Directives Part 1, Annex F.  This Fast-Track Process applies to other Technical Committees in either ISO or IEC, but not to JTC1.  In that ISO/IEC Fast-Track Process, there is an FDIS vote that lasts two months, and is simply a yes/no vote.  No BRM follows that vote.  There are other important differences, but it is key to understand that the other Fast-Track Process does have a later FDIS vote, but the JTC1 Fast-Track Process does not.

Some individuals and National Bodies who have not studied the JTC1 Directives in detail may be under the misunderstanding that this is an FDIS vote.  The opponents of Open XML are happy with this misunderstanding, of course.

JTC 1 Directives Section 9.6 states:

For a FDIS/DIS/FDAM/DAM/FDISP to be approved, the count taken by ITTF shall meet the following criteria:

  • At least two-thirds of the P-members voting shall have approved;
  • Not more than one-quarter of the total number of votes cast are negative.
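
The two criteria are simple arithmetic, and can be sketched as follows.  The class name, method name, and parameters are hypothetical.  The sketch assumes that "P-members voting" means the P-members who voted either to approve or to disapprove; the text quoted above does not say how abstentions are treated, so the caller is responsible for passing in counts that match the rules.

```csharp
public static class BallotSketch
{
    // pApprove, pDisapprove: votes from P-members only.
    // totalCast, totalNegative: votes from all national bodies that voted.
    public static bool MeetsCriteria(
        int pApprove, int pDisapprove, int totalCast, int totalNegative)
    {
        int pVoting = pApprove + pDisapprove;   // assumption: abstentions excluded
        if (pVoting == 0 || totalCast == 0)
            return false;

        // At least two-thirds of the P-members voting approved.
        bool twoThirdsApprove = 3 * pApprove >= 2 * pVoting;

        // Not more than one-quarter of the votes cast are negative.
        bool atMostQuarterNegative = 4 * totalNegative <= totalCast;

        return twoThirdsApprove && atMostQuarterNegative;
    }
}
```

With made-up numbers, 20 of 30 voting P-members approving and 10 negative votes out of 40 cast would satisfy both tests exactly; one fewer approval or one more negative vote would fail.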

In the case of the JTC1 Fast-Track Process, of the five options in section 9.6 (FDIS/DIS/FDAM/DAM/FDISP), the relevant option is DIS, not FDIS.

JTC 1 Directives Section 13.9 states, "If, after the deliberations of this ballot resolution group, the requirements of 9.6 are met, the Project Editor shall prepare the amended DIS (or DAM) and send it to the SC Secretariat who shall forward it to the ITTF for publication as an IS."

Notice the sequence of events here:

  1. Complete the BRM.
  2. See if the requirements of section 9.6 are met (e.g., 2/3 of voting P-Members vote Yes, etc.)
  3. Then the Project Editor prepares the amended DIS.

There is no statement that the Project Editor must prepare a draft before ISO/IEC determines if the requirements of 9.6 are met.  In fact, the exact opposite is true.

JTC1 Directives 13.12 is as follows:

The time period for post ballot activities by the respective responsible parties shall be as follows:

  • Immediately after the vote, ITTF shall send the results of the vote to the JTC 1 Secretariat and to the SC Secretariat, and for the latter to distribute the results without delay to its NBs, to any NBs having voted that are not members of the SC and to the proposer;
  • As soon as possible after the distribution of the results of the vote to its NBs but in not less than two and one-half months the SC Secretariat shall convene a ballot resolution group meeting, if required;
  • In not more than one month after the ballot resolution group meeting the SC Secretariat shall distribute the final report of the meeting and final DIS text in case of acceptance.

Notice that the final DIS text is distributed only in case of acceptance.

ISO/IEC clearly understood this when they ruled that there would be no new documents released before the end of March.  ECMA has no say about this.

So what should a National Standards Body do (or probably already has done)?

Excerpting from Doug's blog post:

  • Study the original DIS29500 submission, or the ECMA-376 specification. That's the starting point: the standard as submitted to ISO/IEC. We started studying this in the US V1 committee in January 2007, and some people were looking at it even earlier as it went through the Ecma process in 2006.
  • Next, study the national body comments submitted with the votes last September. The comments from your country show the main concerns of your country, so you'll want to focus on those first, but you can also review other countries' comments as well. (You'll see a lot of duplication, including word-for-word duplication of specific comments across many countries -- that's from the "denial of service attack" strategy the anti-Open XML crowd was using during the ballot period.)
  • Now take a close look at the proposed dispositions that Ecma distributed on January 14. For each national body comment, Ecma proposed a change to address the problem (in the majority of cases), or explained why they felt a change wasn't a good idea at this time (in a small minority of comments).
  • The final set of documents to review is the BRM resolutions that describe changes approved at the ballot resolution meeting. These are solutions to technical problems, editorial changes, or other changes that were suggested by BRM attendees and approved by the majority of countries attending the BRM.

What should the opponents of Open XML do?

They should stop deliberately misleading National Standards Bodies about the process.

Posted by EricWhite | 9 Comments

Czech Republic Votes Yes on DIS 29500

Stephen McGibbon has posted on Czech Republic voting "Yes" on Open XML.  The CNI press release is here.

CNI is an ISO/IEC JTC1 P Member.

An excerpt from the press release:

During processing the standard proposal ISO/IEC DIS 29500 CNI was observing the maximum openness and transparency of the whole process and created conditions allowing every interested person to join the expert discussion. All received suggestions were carefully discussed and their enlistment into the standard proposal considerably contributed to the improvement of its technical expertise.

Posted by EricWhite | 3 Comments

A Tale of Two Debates

So I've been in this job for nearly three months.  Seems much longer than that.  I've been feeling the need to blog about the nature of the debates around Open XML and the ISO process.

There are two separate and distinct debates running.

 

One of them is a reasoned, fairly dispassionate discussion about the technical issues, IPR issues, process issues and the like.  Contributors to this debate are people such as Patrick Durusau, Jan van den Beld, Rick Jelliffe, and Alex Brown.

 

The other is a weird debate by people at the fringes.  These people are pursuing a policy of "Defeat Open XML at *ALL* costs", regardless of the consequences for their future relationship with ISO or with the National Bodies of the 'O' countries.  Rob Weir, Bob Sutor and Andy Updegrove seem to make the majority of the noise in this debate, although there are others.

 

It's simply my hope and belief that the obstructionist elements will not prevail, and that this standard will pass, and allow the world to get on with the fun and challenging tasks of building good tools and good office suites based on the standard.  I just can't wait until I get to help some large companies implement some powerful tools using Open XML that revolutionize the way that they create, edit, search, and store documents.

 

Anyway, just to illustrate, here are some links that I think epitomize the nature of the two debates.

 

IBM disenfranchises 'O' member countries when convenient.   How do all these 'O' member countries feel about how they were welcomed by Rob and IBM UNTIL it becomes convenient for IBM to try to marginalize the 'O' members?  And I have another question:  How do the people at ISO feel having Rob and IBM attack their very integrity, when the BRM was run according to the ISO rules? I simply have to think that upper management at IBM isn't aware exactly of what Rob is doing, otherwise cooler heads would have prevailed.

 

There are certain 'O' member countries that have important standards people who are also employees of IBM; nothing wrong with that, but it sure puts these people in an awkward situation.  Their own employer is marginalizing the influence of their own country, whose interests they are supposed to represent as a standards body participant.

 

From the ISO/IEC DIS 29500 Ballot Resolution Meeting (BRM) Frequently Asked Questions (FAQ):

 

http://www.jtc1sc34.org/repository/0932.htm#q6-8

 

6.8  If votes are taken during the BRM, who votes?

Those present.

 

And what about basically calling Patrick Durusau a corrupt person, with no evidence presented whatsoever?  I honestly don't know how the accuser lives with himself.

 

Gray Knowlton makes an interesting point about the intimidation of National Bodies by Bob Sutor.

 

But fortunately the other debate continues.  Of course, not everyone in the debate agrees with Ecma, or with Microsoft.  But the discourse is basically a civil one.

 

I appreciate Patrick Durusau's letter on the openness of Open XML.

 

Jan van den Beld, former Secretary General of Ecma International, has challenged ISO/BRM critics to create a better process.

 

And on Mr. van den Beld's blog, he presents his view of the BRM proceedings.

 

Rick Jelliffe has contributed to the conversation.  I know that some people attempt to paint him with the Microsoft colors, but if you actually *read* what Rick has written and contributed to the process, you can see the technical quality of his comments.  And you will see the evidence that he questions decisions made on the standard.  You can see that his goal is to improve the quality of the resulting standard.  And he certainly feels free to apply pressure to Microsoft.

 

There are many, many more examples of participants in the two debates, and their messages.  I am just struck by the disparity in quality between the two.

 

There has been a vast amount of important work accomplished during this process.  I believe that the world will see the value of an open, comprehensive, *complete* standard for word processing documents, spreadsheets, and presentations.  I sincerely encourage national standards bodies to see that the best conclusion is to place the maintenance of this standard in the hands of ISO.  Let's not waste this work.

Posted by EricWhite | 18 Comments
Filed under:

Open XML Sets the Standard in Cross-Platform Implementation

Developers can implement Open XML on a variety of platforms.

 

I've written a fair amount of code that processes Open XML markup, and it is obvious that the markup is not platform specific.  The descriptions of paragraphs, styles, workbooks, worksheets, rows, and cells don't have anything to do with whether you are writing code for Windows, Macintosh, Linux, or the iPhone.  Opponents of Open XML have said that some parts of the specification are platform dependent, such as the specification of embedded, linked objects, but this simply isn't true.  Of course, the best proof of platform neutrality is the many implementations of Open XML on a variety of platforms.

 

Here is a screen clipping of an Open XML document that is being edited using ThinkFree.com.  The screen clipping is of the Firefox web browser, and shows the editing of a document in a browser window. 

 

 

ThinkFree's rich web client runs using the Java Plug-in 1.6.0_05.  It uses JRE version 1.6.0_05 Java HotSpot(TM) Client VM.  ThinkFree's server is a Linux server running Apache:

 

 

Hey, for those who want to live in a "No Microsoft Zone", feel free to use ThinkFree to edit your Open XML documents.  ThinkFree's browser implementation works just as well on other platforms, such as Macintosh and Safari.

 

The same document looks like this in Microsoft Word 2007:

 

 

The opponents of Open XML who say that Open XML is tied to one particular platform are spreading disinformation.

 

I've made a short list of the various ways that Open XML embodies good cross-platform design:

  • Open XML is based on standards that have been implemented on many platforms: XML and Unicode.
  • Open XML is based on the widely deployed ZIP file format, as documented in the PKWARE specification.  Open XML only uses the DEFLATE compression method, which is the first and most commonly implemented compression method for ZIP files.
  • Several countries wanted improved interoperability with existing W3C standards by eliminating dependencies on specific Web browsers, such as Mozilla Firefox, Microsoft Internet Explorer, or Apple Safari.  In the BRM, it was proposed to have a mechanism where applications can customize content for browsers according to their support for different levels of W3C HTML, XHTML, and CSS content.
  • Some people falsely believed that the spec was tied to COM.  However, object embedding and linking is implementable on multiple platforms.  The KParts example that I posted demonstrates that object embedding and linking is actually quite easy.  Some schema processors are not fully compliant with the specification.  The Open XML reference schemas were tweaked so that they could accommodate a broad set of platforms.
  • Beyond that, Open XML allows for schema languages other than XSD for the validation of Custom XML and Structured Document Tags.   Specification-conformant Open XML can be validated using RELAX NG, Schematron, and NVDL schemas.
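
To make the ZIP point in the list above concrete, here is a minimal sketch in plain C, with no platform-specific calls, that reads the compression method from a ZIP local file header.  The function name zip_entry_method and the enum names are made up for this illustration.  It assumes the buffer starts exactly at a local file header, and it relies on the layout in the PKWARE specification: a 30-byte fixed header that begins with the bytes 50 4B 03 04 and holds the little-endian method field at offset 8 (0 means stored, 8 means DEFLATE).

```c
#include <stddef.h>

/* Method codes from the PKWARE ZIP specification. */
enum { ZIP_METHOD_STORED = 0, ZIP_METHOD_DEFLATE = 8 };

/* Hypothetical helper for illustration only.
   Returns the compression method of the ZIP local file header that
   starts at buf, or -1 if buf is too short or has no header signature. */
int zip_entry_method(const unsigned char *buf, size_t len)
{
    /* The fixed part of a local file header is 30 bytes long. */
    if (len < 30)
        return -1;

    /* Signature "PK\3\4". */
    if (buf[0] != 0x50 || buf[1] != 0x4B || buf[2] != 0x03 || buf[3] != 0x04)
        return -1;

    /* Compression method: 2 bytes, little-endian, at offset 8. */
    return buf[8] | (buf[9] << 8);
}
```

Because the function only looks at bytes in a buffer, the same source should compile unchanged on Windows, Macintosh, or Linux; reading the package from disk and walking from one entry to the next is left out of this sketch.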

 

The specification is very much platform agnostic.  In the BRM, it was made even more so.

 

The proof of the cross-platform capabilities is found in actual implementations:

  • Microsoft Office 2008 for the Mac uses as its native file format ECMA 376, compatible with Office 2007 for Windows.
  • ThinkFree allows users to access Open XML documents via a web interface, or through a rich client interface.  The rich client interface is supported on Windows, Linux, and Macintosh systems.
  • Other implementations include iPhone from Apple, Dataviz Documents to Go from Palm, and Gnumeric from Gnome.

 

Posted by EricWhite | 1 Comments
Filed under:

Technical Improvements in the Open XML SDK

Sometimes I get to write a blog post that is really fun to write, and this is one of them.  This particular subject started brewing in my mind last November and December, before I started in my current job.  At the time, I was writing some code to see the most effective and approachable way to access Open XML documents using LINQ to XML.

One of the problems that I ran into is that after I had populated an XML tree from a part, there was no good place to keep that populated XDocument.  It would be possible to keep it in a dictionary, and then look it up from the part every time you need it, but this didn't appeal to me.  However, if the Open XML SDK had annotations, in the style of LINQ to XML, then after populating an XDocument from a part, we could attach the XDocument to the part.  Before populating the XDocument, we would first check to see if we already have one.  Well, annotations have been added to the April 2008 CTP of the Open XML SDK.

This makes it easier to deal with the XML contained in the parts.  All a developer needs to do is to load the WordprocessingDocument, and get the XDocument for specific parts as necessary.  If the XDocument has already been loaded, the work to load it will not be repeated.

There are more sophisticated uses of this new feature.  One possible enhancement: automatically reserialize the XDocument objects back to the package if the XDocument was changed.  I'll be blogging more on this.

In the following example, I've written an extension method, GetXDocument, that you can call on any OpenXmlPart.  You can see how this method uses annotations.

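public static XDocument GetXDocument(this OpenXmlPart part)
{
    // If an XDocument was already attached to this part, reuse it.
    XDocument xdoc = part.Annotation<XDocument>();
    if (xdoc != null)
        return xdoc;
    // Otherwise, load the part's XML from its stream.
    using (StreamReader streamReader = new StreamReader(part.GetStream()))
        xdoc = XDocument.Load(XmlReader.Create(streamReader));
    // Attach the XDocument to the part so the next call skips the load.
    part.AddAnnotation(xdoc);
    return xdoc;
}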

 

Here is the entire example:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using Microsoft.Office.DocumentFormat.OpenXml.Packaging;
using System.Xml;
using System.Xml.Linq;

namespace OpenXmlSdkExample
{
    public class Comment
    {
        public int Id { get; set; }
        public string Text { get; set; }
        public string Author { get; set; }
        public Paragraph Parent { get; set; }
        public Comment(Paragraph parent) { Parent = parent; }
    }

    public class Paragraph
    {
        public XElement ParagraphElement { get; set; }
        public string StyleName { get; set; }
        public string Text { get; set; }
        public IEnumerable<Comment> Comments()
        {
            XNamespace w =
              "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XElement p = ParagraphElement;

            var commentIds = p
                             .Elements(w + "commentRangeStart")
                             .Attributes(w + "id")
                             .Select(c => (int)c);

            return
                commentIds
                .Select(i =>
                    new Comment(this)
                    {
                        Id = i,
                        Author =
                            Parent.MainDocumentPart.CommentsPart.GetXDocument()
                            .Root
                            .Elements(w + "comment")
                            .Where(c => (int)c.Attribute(w + "id") == i)
                            .First()
                            .Attribute(w + "author")
                            .Value,
                        Text =
                            Parent.MainDocumentPart.CommentsPart.GetXDocument()
                            .Root
                            .Elements(w + "comment")
                            .Where(c => (int)c.Attribute(w + "id") == i)
                            .First()
                            .Descendants(w + "p")
                            .Select(run => run
                                           .Descendants(w + "t")
                                           .StringConcatenate(e => (string)e)
                                           + "\n")
                            .Aggregate(new StringBuilder(), (sb, v) => sb.Append(v), sb => sb.ToString())
                            .Trim()
                    }
                );
        }
        public WordprocessingDocument Parent { get; set; }
        public Paragraph(WordprocessingDocument parent) { Parent = parent; }
    }

    public static class LocalExtensions
    {
        public static string DefaultStyle(this WordprocessingDocument doc)
        {
            XNamespace w =
              "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XDocument styleXDocument = doc.MainDocumentPart.StyleDefinitionsPart.GetXDocument();
            return (string)(
                from style in styleXDocument.Root.Elements(w + "style")
                where (string)style.Attribute(w + "type") == "paragraph" &&
                      (string)style.Attribute(w + "default") == "1"
                select style
            ).First().Attribute(w + "styleId");
        }

        public static IEnumerable<Paragraph> Paragraphs(this WordprocessingDocument doc)
        {
            // a good convention to use is to name the XNamespace
            // variable with the same name as the namespace prefix,
            // and to name XName variables with the local name of the element
            XNamespace w =
              "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
            XName r = w + "r";
            XName ins = w + "ins";
            string defaultStyle = doc.DefaultStyle();

            // query for all paragraphs in the document.
            return
                from p in doc
                          .MainDocumentPart
                          .GetXDocument()
                          .Root
                          .Element(w + "body")
                          .Descendants(w + "p")
                let styleNode = p
                                .Elements(w + "pPr")
                                .Elements(w + "pStyle")
                                .FirstOrDefault()
                select new Paragraph(doc)
                {
                    ParagraphElement = p,
                    StyleName = styleNode != null ?
                        (string)styleNode.Attribute(w + "val") :
                        defaultStyle,
                    // in the following query, we need to select both
                    // the r and ins elements in order to assemble the text
                    // properly for paragraphs that have tracked changes.
                    Text = p
                           .Elements()
                           .Where(z => z.Name == r || z.Name == ins)
                           .Descendants(w + "t")
                           .StringConcatenate(element => (string)element)
                };
        }

        public static XDocument GetXDocument(this OpenXmlPart part)
        {
            XDocument xdoc = part.Annotation<XDocument>();
            if (xdoc != null)
                return xdoc;
            using (StreamReader streamReader = new StreamReader(part.GetStream()))
                xdoc = XDocument.Load(XmlReader.Create(streamReader));
            part.AddAnnotation(xdoc);
            return xdoc;
        }

        public static string StringConcatenate<T>(this IEnumerable<T> source,
            Func<T, string> func)
        {
            StringBuilder sb = new StringBuilder();
            foreach (T item in source) sb.Append(func(item));
            return sb.ToString();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open("Test.docx", true))
            {
                Console.WriteLine(wordDoc.DefaultStyle());
                foreach (var p in wordDoc.Paragraphs())
                    Console.WriteLine("{0}:{1}", p.StyleName.PadRight(20), p.Text);
            }
        }
    }
}
 

Posted by EricWhite | 9 Comments
Filed under: ,

Spreadsheet Simulation Demonstrating an Embedded KPart on Linux

In the review of the Open XML specification, several national standards bodies submitted comments regarding the embedding of linked objects.  The complaint was that Open XML markup was tied too specifically to Microsoft's technologies, i.e. OLE.

Ecma's response was something along the lines of this:

OLE is referenced within the DIS 29500 specification, but as a generalized approach for embedding objects from any external component technology. OLE in this context does not refer to Microsoft's OLE technology.  Instead, it refers to the generalized abstraction of embedding and linking objects within a document or spreadsheet. The oleLink element is analogous to §9.3.3 of the ODF standard where the draw:object-ole element is defined.

This isn't verbatim, but is the gist of the response. 

This particular response was singled out for criticism by some of the opponents of Open XML.  Some people didn't believe that other object embedding technologies, such as KParts, could be implemented using the existing Open XML markup.

My friend, Bob McClellan, and I decided to put together a proof-of-concept that shows that the markup in DIS 29500 is capable of configuring object linking and embedding technologies other than OLE.  Bob wrote a small program in C++, to be run on Linux, which shows the use of the markup to embed a KPart.  This program is a "simulated spreadsheet" application.  The part that renders the spreadsheet from the markup is pretty simplistic.  It is there just to provide a framework and context so that the embedded KPart can be put on the window in a fashion that would be similar to how you would do it in a real application.  The following screen shot shows the simulated spreadsheet with an embedded HTML viewer KPart:

 

In the above screenshot, you can see the data in the simulated cells.  The cells contain: 111, 222, 333, etc.  You can also see the embedded HTML viewer.

The following blog page presents the complete story of embedding a KPart object using DIS 29500 markup.  It also contains complete source code.  Those who are interested are free to compile and link this KDE application, and see the instantiation of the embedded KPart.

Here is the blog page: Spreadsheet KPart Simulation

Posted by EricWhite | 3 Comments
Filed under:

An Analysis of the ODF Alliance's "Top 10 Worst Dispositions"

The stridency and shrillness of the anti-Open XML contingent reached a new high a few weeks ago with their release of "Ecma's Top 10 Worst Responses to NB Comments."  Mauricio Ordoñez analyzed their complaints in detail.  You can find the results here.  I really appreciate how Mauricio summarized the situation when he said, "It seems the DIS29500 project editor did a good job of addressing the ODF Alliance concerns if these 11 represent what they consider the “Top Worst”."
Posted by EricWhite | 1 Comments
Filed under:

The Legacy Hashing Algorithm in Open XML

In Open XML, there is a feature whereby you can restrict editing, and allow only users who have a password to modify the file.  Now, understand, this isn’t a password that really protects the file.  It is easy to write an XML program that opens a file that has had its editing restricted, and modifies the file.  It is easy to use an XML API to remove the hashed password.  In other words, this hashed password isn’t really about security of data, or elevation of privilege; it is about restricting editing for an average information worker.

Let’s not confuse what this hashing function is used for.  It’s used to control UI behavior for modifying the document.  An application that needs to implement restricted editing for legacy files will need to be able to hash a password using this algorithm.  I would recommend that applications implement the legacy hash for backwards compatibility, but when re-saving the document, they should use one of the secure algorithms recommended by the standard.  Note that this particular feature is optional.  If an application has not implemented the hashing algorithm, the default behavior would be to open the file and allow editing.

But in the interests of furthering computing in general, a friend of mine (Bob McClellan) put together a C implementation of the obsolete hashing algorithm.  He used the Open XML specification.

As an aside, Bob McClellan is one of the best developers that I’ve ever known.  He is super competent.  You will be hearing more about Bob on my blog in the next few days.

Anyway, here is the hashing algorithm in C.  It is just sample code that we have put together.  However, we’re reasonably confident about the quality of the code.  If you find anything wrong with this code, please let us know.  We’ll fix it ASAP.

Note that this code only deals with UTF-16; if you need to hash a password using any other encoding, you would need to convert the string to UTF-16 first.  This exercise is left to you.
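
As a side note before the listing: if your password starts out as UTF-8, the conversion might look something like the sketch below.  This is not part of Bob's sample.  The function name utf8_to_utf16 is made up for this illustration, and it uses uint16_t for the code units rather than the typedef in the listing.  It is deliberately minimal: it checks the shape of each multi-byte sequence, but it does not reject overlong encodings or surrogate code points encoded as UTF-8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the original sample: converts a
   NUL-terminated UTF-8 string to UTF-16 code units.
   Returns the number of code units written to out, or (size_t)-1 if the
   input is malformed or the output buffer is too small. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t out_cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    while (*s) {
        uint32_t cp;   /* decoded code point */
        int extra;     /* number of continuation bytes expected */

        /* The lead byte tells us how long the sequence is. */
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;
        s++;

        /* Each continuation byte has the form 10xxxxxx. */
        while (extra-- > 0) {
            if ((*s & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (*s & 0x3F);
            s++;
        }

        if (cp >= 0x10000) {
            /* Outside the BMP: encode as a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 > out_cap)
                return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > out_cap)
                return (size_t)-1;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}
```

You would call it with the password and a buffer large enough for the result, then hand the resulting code units and count to the hashing code.  The function does not write a terminating zero.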

#include <stdio.h>

// Replace this typedef with a more appropriate type for UTF-16, if necessary
typedef unsigned short char16_t;

// These tables are used to generate the high word of the hash.
static unsigned short lookup_length[15] = {
    0xE1F0, 0x1D0F, 0xCC9C, 0x84C0, 0x110C, 0