New blog: Gray Matter

Doug Mahugh - Office Interoperability

Doug Mahugh
18 Dec 2007 12:02 PM

People often ask me how much smaller Open XML documents are than corresponding Office binary documents. It's a hard question to answer with any precision, because the difference in size is so dependent on the document content. For example, a long simple text document will compress by orders of magnitude, a typical document with graphics and other types of content compresses somewhat less, and you can even create documents that are a bit larger in Open XML format than they are in the binary formats.
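The dependence on content can be illustrated with a minimal sketch. Open XML packages are ZIP archives, so this uses Python's `zlib` (DEFLATE) directly. The two inputs are hypothetical stand-ins rather than real Office files: repetitive text for a long simple document, and random bytes for content that is already compressed, such as embedded images.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return compressed size divided by original size (DEFLATE, level 9)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical stand-ins for document content, not real Office files.
plain_text = b"The quick brown fox jumps over the lazy dog. " * 2000
# Random bytes behave much like already-compressed data (e.g. JPEG images).
already_compressed = os.urandom(len(plain_text))

text_fraction = compressed_fraction(plain_text)
binary_fraction = compressed_fraction(already_compressed)

print(f"repetitive text: {text_fraction:.1%} of original size")
print(f"random bytes:    {binary_fraction:.1%} of original size")
```

The repetitive text shrinks to a small fraction of its size, while the random bytes do not shrink at all and can come out slightly larger because of container overhead.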

Gray Knowlton's post on "File size reduction for Open XML" covers some of the issues involved, and explains what you can expect in general terms. Gray is a group product manager in Office, and I'm on his team. We have a lot going on around Open XML these days, and this post is the first in a 3-part series he'll be doing on file size reduction, document "sanitization" and improvements in document format security. You can subscribe to Gray's blog right here, and you can expect he'll have some interesting things to say about Open XML going forward.

Comments
  • Geek Lectures - Things geeks should know about » Blog Archive » New blog: Gray Matter
    18 Dec 2007 1:23 PM

    PingBack from http://geeklectures.info/2007/12/18/new-blog-gray-matter/

  • Dave S.
    19 Dec 2007 1:36 PM

    I downloaded Gray's test cases and found that zipping the Test 5 MSO binary format resulted in a file 58% (15/26ths) of the MSO-XML format file size.

    Zipping the MSO-XML file reduced it to 85% (22/26) of its original size, for a zip-off final in which the zipped MSO binary is about 70% (15/22) of the size of the zipped MSO-XML.

    After extracting and re-zipping the MSO-XML file, the file size is 14k, or 14/15 of the zipped MSO binary file, a 7% reduction.

    It looks like zipping the old format is the easiest way to get a decent reduction.

    Resuffixing (to .zip), extracting, resuffixing (to .docx), and re-zipping the MSO-XML contents offers a slight size advantage for a lot more work than click & zip.
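The extract-and-re-zip steps described in this comment could be scripted rather than done by hand. The following is only a sketch: the function name `repack` and the paths in the usage note are hypothetical, and it uses Python's standard `zipfile` module at its highest DEFLATE level, which will not necessarily match what WinZip or Office produce.

```python
import zipfile


def repack(src_path: str, dst_path: str, level: int = 9) -> None:
    """Copy every member of a ZIP package into a new archive using DEFLATE at `level`."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(
        dst_path, "w", zipfile.ZIP_DEFLATED, compresslevel=level
    ) as dst:
        for info in src.infolist():
            # Read the uncompressed bytes and write them under the same member name.
            dst.writestr(info.filename, src.read(info.filename))
```

For example, `repack("test5.docx", "test5-repacked.docx")` (both file names are placeholders) would write a recompressed copy, and comparing the two files with `os.path.getsize` shows whether anything was gained. No renaming to .zip is needed, because `zipfile` does not care about the file extension.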

    A comparison of the zipped files shows the MS-zip algorithm is consistently worse than WinZip's. The largest difference (% and byte count) was on the document.xml file, which MS-zip squeezed down by 97% and WinZip by 99%, leaving the MS-zip output at 3% of the original size versus 1% for WinZip. (All values refer to Test 5.)
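A per-file comparison like this one can be read straight from the ZIP directory, which records both the stored and the original size of every member. This is a sketch under that assumption: `member_reductions` is a hypothetical helper name, and it reports whatever the archive's creator achieved rather than re-running any compressor.

```python
import zipfile


def member_reductions(package_path: str) -> dict:
    """Map each non-empty member name to its size reduction, e.g. 0.97 for 97%."""
    with zipfile.ZipFile(package_path) as package:
        return {
            info.filename: 1 - info.compress_size / info.file_size
            for info in package.infolist()
            if info.file_size > 0  # skip empty members to avoid dividing by zero
        }
```

Running it on the same package as saved by two different tools and comparing the entries for `word/document.xml` would give figures comparable to the 97% and 99% quoted above.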
