
The Microsoft Office Word Team's Blog

All things Microsoft Office Word, from the Word team.
Why A New Default File Format?

Word 2007 (along with Excel and PowerPoint 2007) has a new XML-based default file format (.docx). I've referred to Brian Jones' blog for more info on these new formats before, and I highly recommend checking out his recent post summarizing why we've adopted this new default format. We're really excited about this format and what it means for Word, and that post does an excellent job of explaining why.

-Jonathan

Posted: Wednesday, October 25, 2006 10:55 AM by wrdblog

Comments

Tom Edwards said:

There's one thing I don't understand about all this. If it's an open XML format, why can't I view my documents in Notepad or other text editors as I would a normal XML file? I'm not applying any protection to the documents, nor am I signing them. I'm just hitting Save, and it's coming out in what I presume to be the whitepaper's 'binary'.

# October 26, 2006 6:18 AM

wrdblog said:

Hi Tom - The whole file won’t open in Notepad or similar editors because the new formats are ZIP packages that contain multiple, ordinary XML files. Once extracted from the ZIP package, any of these XML files can be opened with Notepad or another text editor.

To check this out:

1) Copy a file saved in one of the new formats to your desktop.

2) With extensions showing, change the file extension to .zip

3) Open the resulting ZIP folder

4) Copy any of the XML files to your desktop

5) Open the XML file in Notepad or another text editor

ZIP is used for its compression, and the package/parts approach makes it easier to work with individual parts of a document (like body text, images, etc.).
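The steps above can also be done without renaming anything, since any ZIP library can read the package directly. Here is a minimal Python sketch, assuming the parts are UTF-8 encoded. The function names `list_parts` and `read_part` are made up for illustration, and `word/document.xml` is the usual name of the main body part in a Word document rather than something stated in this post.

```python
import zipfile


def list_parts(package_path):
    """Return the names of all parts stored in the ZIP package."""
    with zipfile.ZipFile(package_path) as package:
        return package.namelist()


def read_part(package_path, part_name):
    """Return one part of the package as text.

    Assumption: the part is UTF-8 encoded XML.
    """
    with zipfile.ZipFile(package_path) as package:
        return package.read(part_name).decode("utf-8")


# Example usage (the file name is hypothetical):
#   for name in list_parts("report.docx"):
#       print(name)
#   print(read_part("report.docx", "word/document.xml")[:500])
```

The original file is only read, never modified, so this is safe to try on a real document.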

-Jonathan

# October 26, 2006 1:39 PM

Tom Edwards said:

Got to say, I wouldn't have expected that! Well played. :-)

You might have opened up a can of worms now, though. I've been looking through my documents' XML trees and I've found that some paragraphs are split up into fragments, by the looks of it to specify an xml:space="preserve" property (the XML gives no other reason for the fragmentation: the text is uniformly formatted, etc.). This is one example: http://www.steamreview.org/external/docx-extract.xml. What's going on with that para?

# October 26, 2006 3:49 PM

wrdblog said:

Hi Tom - Sorry for the delay. Would you be able to forward the affected file?

-Jonathan

# November 3, 2006 2:58 PM

wrdblog said:

Hi Tom -

If I understand your question correctly, you are curious why text runs that appear to be uniformly formatted are split up into fragments such as the second bold text run (“Yeah, what went wrong there?”) in the file you provided.

This particular fragmentation is because there is no rsidRPr on the text “went wrong”. This was likely caused by the fact that this specific text was brought into the document as bold (so we didn’t rsid stamp it as having its run properties changed versus the other segments of the run which were formatted in Word).

The structure and behavior of WordprocessingML (including rsidRPr and the other rsid attributes) is described in detail in the Ecma Office Open XML File Formats Standard (http://www.ecma-international.org/news/TC45_current_work/TC45_available_docs.htm).
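To see this kind of split for yourself, you can list each run in a paragraph alongside its rsidRPr value. The Python sketch below does that for a hand-written fragment; the fragment and its rsid value are hypothetical stand-ins (not the contents of the file discussed here), and `list_runs` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: three bold runs, the middle one has no rsidRPr.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r w:rsidRPr="00A12B34"><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Yeah, what </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>went wrong</w:t></w:r>
  <w:r w:rsidRPr="00A12B34"><w:rPr><w:b/></w:rPr><w:t xml:space="preserve"> there?</w:t></w:r>
</w:p>"""


def list_runs(paragraph_xml):
    """Return (rsidRPr, text) for each run that is a direct child of the paragraph."""
    paragraph = ET.fromstring(paragraph_xml)
    runs = []
    for run in paragraph.findall(f"{{{W}}}r"):
        rsid = run.get(f"{{{W}}}rsidRPr")  # None when the attribute is absent
        text = "".join(t.text or "" for t in run.findall(f"{{{W}}}t"))
        runs.append((rsid, text))
    return runs


for rsid, text in list_runs(SAMPLE):
    print(repr(rsid), repr(text))
```

All three runs carry the same bold formatting, so the only visible difference between them is whether the rsidRPr attribute is present.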

-Jonathan

# November 6, 2006 2:02 PM

Tom Edwards said:

Actually the whole document is a transcription, so nothing was pasted in. But that does open up the possibility that it's to do with the fact that there were many unnatural pauses in my typing as I listened to what I needed to write next?

Whatever the case, I'm going to stop bothering you now. Congrats on RTM! :-)

# November 6, 2006 2:46 PM

The Microsoft Office Word Team's Blog said:

It's been a while, but for what it's worth, the time has been spent planning a great Word vNext. Stay

# June 7, 2007 1:57 PM