<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="http://blogs.msdn.com/utility/FeedStylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx</link><description>I've blogged in the past about some of the things we did with the SpreadsheetML format to help improve the performance of opening and saving those files. While a file format most likely won't affect the performance of an application once the file is loaded,</description><dc:language>en-US</dc:language><generator>CommunityServer 2.1 SP1 (Build: 61025.2)</generator><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#875712</link><pubDate>Thu, 26 Oct 2006 13:49:48 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:875712</guid><dc:creator>Adam</dc:creator><description>&lt;p&gt;IMHO, optimisation of what is intended to be a long-lived format, in order to suit current parsers and CPUs is a bad trade-off.&lt;/p&gt;
&lt;p&gt;If people want the next generation of file formats to last at least 20 years (not unreasonable, I've heard requests for a 100-year format), then let's consider where we'll be in 10 years' time.&lt;/p&gt;
&lt;p&gt;Let's see, if Moore's Law holds, CPUs will have 2^7 times as many transistors, or be roughly 100 times more powerful than they are now. Not only that, but with a 2 year release cycle, you could completely rewrite your parser *5* times in order to squeeze every last instruction out of the encoding and decoding processes.&lt;/p&gt;
&lt;p&gt;And that's only half-way through a short lifecycle.&lt;/p&gt;
&lt;p&gt;If you're sacrificing simplicity, consistency, ease of (re)implementation, etc... in the standard, which is almost guaranteed to outlast any given implementation of it, to make the current (reference?) implementation quicker, I personally think that's a bad trade-off.&lt;/p&gt;
&lt;p&gt;Are MS really thinking 20+ years down the line when they're considering some of the decisions here?&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#875906</link><pubDate>Thu, 26 Oct 2006 15:56:13 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:875906</guid><dc:creator>Francis</dc:creator><description>&lt;p&gt;Hitting the limits is not as hard as you think. I hit the old 65k row limit. I have also hit the new 1 million row limit.&lt;/p&gt;
&lt;p&gt;Many data archives, e.g. the renowned General Social Survey and World Values Surveys, have hundreds, if not thousands, of columns, and hundreds of thousands, if not millions of rows. And the U.S. census, even a 1% extract, would still overwhelm Excel 2007 with 3 million rows.&lt;/p&gt;
&lt;p&gt;Incidentally, have you tried opening part 4 of the OpenXML spec in Word B2TR? It takes about 10 minutes on my computer, and that is not including post-opening repagination.&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#875930</link><pubDate>Thu, 26 Oct 2006 16:17:51 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:875930</guid><dc:creator>hAl</dc:creator><description>&lt;p&gt;I think the 1 million row limit is too low myself. &lt;/p&gt;
&lt;p&gt;I would have preferred dimensions of 16 million rows by 1000 columns.&lt;/p&gt;
&lt;p&gt;We use Excel to manipulate certain large query dumps, which often results in a large number of rows but very few columns. Raising the limit to a million rows is a major advancement for us, so much so that we already use an Office 2007 test version for that. &lt;/p&gt;
&lt;p&gt;The performance of the 2007 beta version with such large files is mediocre to poor but still faster than OOo 1.x, which seems unable to handle certain large loads. &lt;/p&gt;
&lt;p&gt;I hope the performance of the final version will improve quite a bit still.&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#875970</link><pubDate>Thu, 26 Oct 2006 16:38:14 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:875970</guid><dc:creator>Doug</dc:creator><description>&lt;p&gt;It would seem that Francis and hAl are manipulating data on a scale that is better suited to a database.&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#876043</link><pubDate>Thu, 26 Oct 2006 17:32:24 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:876043</guid><dc:creator>Ben Langhinrichs</dc:creator><description>&lt;p&gt;Brian - &lt;/p&gt;
&lt;p&gt;I tend to think the whole debate about tag size (your #2) is overrated, both by Rob Weir and you. &amp;nbsp;It seems unlikely to make a huge difference in speed, but if it does, I hardly think that the &amp;quot;human-readable&amp;quot; ODF tags are wildly preferable. &amp;nbsp;Let's face it, anybody manipulating XML is going to have to look up names and such anyway, so I see every reason to keep things short and simple. &amp;nbsp;This is definitely an area where Open XML seems better to me, although I do think even Open XML loses some of the advantage by nesting elements more deeply than I would like. Still, I suspect I think that only because I have not seen more complex examples where the deeper nesting is important.&lt;/p&gt;
&lt;p&gt;The issues in your #3 and #4 seem the more important and relevant. &amp;nbsp;Rob even states in his post that because of Microsoft's limitations on beta benchmarks, he can't use a real life example, but this is a case where his performance evaluation clearly doesn't take into account the very important logic of on-demand parsing and use. &amp;nbsp;It makes complete sense to split things up into more smaller files, and this is an area where I think ODF could be improved. &amp;nbsp;If the breaking into smaller files is done logically, and I don't know the details of how this is done in Open XML, this could make a huge difference in larger, complex documents. &amp;nbsp;If I join the ODF TC, which I am considering, I would definitely advocate for looking at how Open XML implements both #3 and #4 to see how ODF could be improved. &amp;nbsp;These are the sorts of areas where I hope both ODF and Open XML TCs look at the other spec and see what can be learned. &amp;nbsp;Not just copied, but learned.&lt;/p&gt;
&lt;p&gt;- Ben&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#876067</link><pubDate>Thu, 26 Oct 2006 17:41:20 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:876067</guid><dc:creator>sinleeh</dc:creator><description>&lt;p&gt;Dear Brian,&lt;/p&gt;
&lt;p&gt;Is there a theoretical limit on how many cells SpreadsheetML can handle, or are the limits simply the result of hardware limitations?&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#877118</link><pubDate>Thu, 26 Oct 2006 21:05:05 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:877118</guid><dc:creator>hAl</dc:creator><description>&lt;p&gt;@Doug&lt;/p&gt;
&lt;p&gt;We export data from the database because it is easier to manipulate in Excel than with the database tools we have access to. Also, it is easy to combine data from other databases or sorted files into the spreadsheet afterwards.&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#877275</link><pubDate>Thu, 26 Oct 2006 23:10:37 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:877275</guid><dc:creator>Francis</dc:creator><description>&lt;p&gt;Doug: much of this information is flat, and a spreadsheet is essentially a flat-file database. The data sources I mentioned have no relationships (or objects), so a relational (or object-oriented) database offers no advantages. And databases usually do not offer the sort of functions that a good spreadsheet does.&lt;/p&gt;
&lt;p&gt;Besides, easy-to-use desktop databases that handle this volume of data are few and far between. (Access 2007's unchanged 256-column limit means it is even more limited than Excel.)&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#877543</link><pubDate>Fri, 27 Oct 2006 02:28:21 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:877543</guid><dc:creator>Dean Harding</dc:creator><description>&lt;p&gt;Adam: That's an insane argument! Don't develop performant file formats because in 7 YEARS computers will be fast enough to handle a slower one? In the meantime, nobody is going to use the slow one!&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#877665</link><pubDate>Fri, 27 Oct 2006 03:19:34 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:877665</guid><dc:creator>biff</dc:creator><description>&lt;p&gt;Adam, Dean: it's even worse than that - there are plenty of 7-year-old computers around, and the ones people buy today will stay around for years and years, so designing a format that is dog slow right now is not an option.&lt;/p&gt;
&lt;p&gt;There are people who would suffer degraded performance for their ideology; for the rest of us there are better formats :)&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#889760</link><pubDate>Sat, 28 Oct 2006 18:45:39 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:889760</guid><dc:creator>Dennis E. Hamilton</dc:creator><description>&lt;p&gt;And if you get the performance now, 7 years from now it will simply be astounding!&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#917326</link><pubDate>Wed, 01 Nov 2006 10:10:09 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:917326</guid><dc:creator>assman</dc:creator><description>&lt;p&gt;Why can't you get the best of both worlds? &amp;nbsp;Why not have both a format with long names and a corresponding version with short ones. &amp;nbsp;In fact I don't understand why you even need a textual representation of xml. &amp;nbsp;Couldn't you have a binary version of xml which could be automatically viewed as a textual representation through the action of a simple one to one translator. &amp;nbsp;All you need is some standard binary serialization format for XML. &amp;nbsp;I think W3C is doing some work in this area:&lt;/p&gt;
&lt;p&gt;&lt;a rel="nofollow" target="_new" href="http://www.w3.org/XML/Binary/"&gt;http://www.w3.org/XML/Binary/&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You would then have a zip container which when unzipped gives you a serialized binary which when unserialized gives you pure xml. &amp;nbsp;&lt;/p&gt;
&lt;p&gt;I guess the problem with this solution is that there is no standardized way of doing binary serialization of XML, and therefore serializing it yourself makes your format less open. &amp;nbsp;However, I think serialization could be done in an extremely simple, transparent way which would be very easy to duplicate. &amp;nbsp;&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#928166</link><pubDate>Thu, 02 Nov 2006 07:25:00 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:928166</guid><dc:creator>Brutus</dc:creator><description>&lt;p&gt;@Ben&lt;/p&gt;
&lt;p&gt;I don't think &amp;quot;human readable&amp;quot; tags are worth the speed hit because humans are generally not going to be mucking with XML themselves via eye-balling. &amp;nbsp;99.999999999% of the time computer programs will be manipulating the XML, not human beings.&lt;/p&gt;
</description></item><item><title>re: Performance of an XML file format for spreadsheets</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#932074</link><pubDate>Thu, 02 Nov 2006 17:37:23 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:932074</guid><dc:creator>BrianJones</dc:creator><description>&lt;P&gt;I think it's also worth looking at the XML itself and deciding what you think is human readable.&lt;/P&gt;
&lt;P&gt;HTML, which most folks would say is human readable, does the following for a table:&lt;/P&gt;
&lt;P&gt;&amp;lt;table&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;lt;tr&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;td&amp;gt;A1&amp;lt;/td&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;td&amp;gt;B1&amp;lt;/td&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;lt;/tr&amp;gt;&lt;BR&gt;&amp;lt;/table&amp;gt;&lt;/P&gt;
&lt;P&gt;SpreadsheetML does this:&lt;/P&gt;
&lt;P&gt;&amp;lt;sheetData&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;lt;row&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;c&amp;gt;&amp;lt;v&amp;gt;A1&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;c&amp;gt;&amp;lt;v&amp;gt;B1&amp;lt;/v&amp;gt;&amp;lt;/c&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;lt;/row&amp;gt;&lt;BR&gt;&amp;lt;/sheetData&amp;gt;&lt;/P&gt;
&lt;P&gt;ODF does this:&lt;/P&gt;
&lt;P&gt;&amp;lt;table:table&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;lt;table:table-row&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;table:table-cell&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;text:p&amp;gt;A1&amp;lt;/text:p&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/table:table-cell&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;table:table-cell&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;text:p&amp;gt;B1&amp;lt;/text:p&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;lt;/table:table-cell&amp;gt;&lt;BR&gt;&amp;nbsp;&amp;lt;/table:table-row&amp;gt;&lt;BR&gt;&amp;lt;/table:table&amp;gt;&lt;/P&gt;
&lt;P&gt;So, while OpenXML does use terse tag names, it's just like HTML where you just look up the meaning and from then on it's pretty straightforward.&lt;/P&gt;
&lt;P&gt;-Brian&lt;/P&gt;</description></item><item><title>Going offline for a week</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#3981924</link><pubDate>Sat, 21 Jul 2007 02:22:25 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:3981924</guid><dc:creator>Brian Jones: Open XML Formats</dc:creator><description>&lt;p&gt;I'll be offline for the next week or so. Sorry if I don't answer your e-mails or comments during that&lt;/p&gt;
</description></item><item><title>Harmonization: Finding the Differences</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#7361937</link><pubDate>Fri, 01 Feb 2008 03:08:27 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7361937</guid><dc:creator>Brian Jones: Open XML Formats</dc:creator><description>&lt;p&gt;There have been some discussions over the past several years about the harmonization of Open XML with&lt;/p&gt;
</description></item><item><title>More thoughts on harmonization</title><link>http://blogs.msdn.com/brian_jones/archive/2006/10/26/performance-of-xml-file-formats.aspx#7369026</link><pubDate>Fri, 01 Feb 2008 09:55:54 GMT</pubDate><guid isPermaLink="false">91d46819-8472-40ad-a661-2c78acb4018c:7369026</guid><dc:creator>Doug Mahugh</dc:creator><description>&lt;p&gt;Brian Jones has a post this afternoon on the concept of harmonization of document formats, and in particular&lt;/p&gt;
</description></item></channel></rss>