The interoperability team here at Microsoft has posted about an open source C# project on SourceForge that converts binary Office documents to Open XML. The blog post indicates that the code works with Mono, so it provides some level of portability across operating systems. The post also has a good explanation of the project's architecture. There's more good information on the SourceForge site. The Developer's Corner there has a number of good links to Open XML resources: the binary file format specs, the Open XML specs, and the Implementation Notes site.
What caught my eye are the well-written papers that contain detailed information about some of the issues involved in conversion. Of particular note is the guide that provides a very nice explanation of Freeform Shapes in the Office Drawing Format.
How to Retrieve Text from a Binary .doc File
A Guide to Table Formatting
The Storage of Macros and OLE Objects