
XML's overhead will open wallets?

There's a new article on the overhead that XML creates on networks, and what can be done about it: "Eyes, wallets wide open to XML traffic woes in 2005". This is a topic near and dear to my heart: I've been involved in long-running threads on the xml-dev mailing list on this, and gave a paper / presentation on it at the XML 2004 conference. Let's look at the points raised in the searchwebservices article in some detail. I think it addresses a real challenge that people are having with XML, but it paints a somewhat misleading picture of the alternative solutions.

 

First, the article begins: "Enterprise affection for XML Web services may have C-level hearts fluttering over the immediate efficiency and productivity gains, but the other shoe is about to drop in this relationship." The obvious rejoinder is that in most organizations, human efficiency and productivity gains add vastly more to the bottom line than savings on hardware and wired network bandwidth, which gets cheaper and cheaper all the time.


Next, it is true that many people are starting to "realize en masse how taxing XML is on enterprise networks", but only in a couple of fairly specific scenarios. As I put it in the XML 2004 paper:

 

It is quite clear from surveying the research in this area that XML really does impose a significant overhead on a significant set of real-world applications, especially those in enterprise-class transaction processing environments and those involving wireless communication. In both scenarios it is clear that developers, vendors, and customers desire the benefits of standards-based portability and interoperability, but are unable to use XML in its current form.

Furthermore, currently deployed technological fixes do not alleviate this pain for these two classes of users. As for reducing size, conventional text compression algorithms do not work at all on the short messages with little redundant text that are common in web services applications and preferred by wireless developers. Likewise, the studies noted above generally show that the processing cost of these algorithms often negates any perceived performance benefit from reducing the amount of bandwidth needed to send a message. Furthermore, "throwing hardware at the problem" is not a viable option for battery-powered mobile devices with intrinsically limited bandwidth, where every extra CPU cycle drains the battery all the sooner.
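
To make the point about short messages concrete, here is a minimal sketch (mine, not taken from the paper) that runs a general-purpose compressor over one tiny message and over one long, repetitive document. The element names and the stock-quote payload are invented for illustration, and the exact ratios will vary with the data and the compressor settings.

```python
import zlib

# Hypothetical short web-services-style message (element names are made up).
short_msg = (
    '<?xml version="1.0"?>'
    '<quote><symbol>MSFT</symbol><price>26.72</price></quote>'
).encode("utf-8")

# Hypothetical long document: the same markup repeated 500 times.
long_doc = (
    '<?xml version="1.0"?><quotes>'
    + "".join(
        f"<quote><symbol>SYM{i}</symbol><price>{i}.50</price></quote>"
        for i in range(500)
    )
    + "</quotes>"
).encode("utf-8")


def ratio(data: bytes) -> float:
    """Return compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


if __name__ == "__main__":
    print(f"short message: {len(short_msg)} bytes, ratio {ratio(short_msg):.2f}")
    print(f"long document: {len(long_doc)} bytes, ratio {ratio(long_doc):.2f}")
```

The long document compresses well because the compressor can keep referring back to markup it has already seen; the short message offers very little to refer back to, and the fixed framing overhead of the compressed format eats into whatever is saved.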

 

Let's be clear, however -- this refers to a relatively small number of use cases in which XML could be valuable, but its size or processing overhead stands in the way of its widespread use today.  The article says "Users and experts expect 2005 to be the year companies realize en masse how taxing XML is on enterprise networks, sparking a spending spree on XML acceleration products and optimized appliances that offload this burden."  Time will tell, of course, but I would find these predictions more credible if the article itself didn't have some factual errors.

 

For example, the author asserts that "standards bodies like the World Wide Web Consortium (W3C) work in the shadows on the ratification of a single binary XML standard that could bring an about-face to the commitment companies have to the ASCII text encoding that is currently the foundation of XML 1.0."  This is not even close to being true.  Besides the technical point that XML 1.x is defined in terms of Unicode text, not an ASCII encoding :-) the real objective of the W3C XML Binary Characterization Working Group is:

 

gathering information about use cases where the overhead of generating, parsing, transmitting, storing, or accessing XML-based data may be deemed too great for a particular application, characterizing the properties that XML provides as well as those that are required by the use cases, and establishing objective, shared measurements to help judge whether XML 1.x and alternate (binary) encodings provide the required properties.

 

The W3C is explicitly not ratifying a single binary XML standard; it is investigating whether that is even worth attempting.  The early indication seems to be that while specialized, proprietary binary formats are widespread across the XML industry, finding a generalized standard binary format will be somewhere between politically difficult and technically impossible.

 

Finally, the article quotes James Kobielus of the Burton Group:

Network managers are going to implement these XML acceleration appliances to offload the overhead of XML processing from application servers so [the app servers] can focus on their core competency, which is business logic.

 

I'm highly skeptical of this, although I am intrigued by the capabilities of the XML acceleration appliances.  First, as it stands now, the acceleration appliances can only be used by network managers to offload processing of standalone operations such as XSLT transformations or processing WS-Security SOAP headers.  Using them to offload the time consuming aspects of XML processing from general-purpose hardware requires more involvement and investment from the industry as a whole.  Examples would include  software products that detect the presence of XML acceleration hardware and use APIs that exploit it, and standards for efficiently exchanging parsed XML Infosets across hardware components in a distributed system.  

 

Where does that leave us?  As I see it (and stealing from my XML 2004 presentation):

 

  • We have to deal with the reality that XML really requires too much bandwidth for many wireless scenarios, and requires far more processing resources than equivalent formats in transaction processing scenarios.  Moore's Law won't make these problems go away, because it doesn't apply to wireless bandwidth or batteries. The bare facts are not really in dispute; what is in dispute is how to reduce the costs without destroying the benefits of XML.  There are numerous alternatives being researched, including XML-specific compression algorithms and improved XML text parsers, that would not require end-user eyes or wallets to be opened.
  • No known alternative offers anything resembling a silver bullet.  There are probably plenty of alternative serializations of the XML Infoset that would be both smaller to transmit and faster to parse than XML text, but whether they offer enough value to justify putting them into the XML core  is not at all clear. Likewise it is clear that dedicated hardware  components can parse XML more efficiently than conventional parsers, but it is much less clear whether this translates into more cost-effective systems.
  • As with all software optimization, the first thing to do is to determine where the bottlenecks are and figure out how to address them.  Many of the "XML is bloated and slow" complaints I hear could be alleviated by being more clever about how technology is used: "Doctor, Doctor it costs me lots of money per megabyte of XML I download to my mobile device."  Uhh, get a better mobile data plan? Or,  "Doctor, doctor, it hurts when I try to process a 1-MB file to find the two attribute values I need"!  Uhh, so don't DO that.  Don't use expensive validation unless you really get value from it, restructure the XML so that a pull parser or SAX can find what you need quickly, use the right tool for the job, whatever it takes.
  • Use enterprise-class tools to do the heavy lifting: Leverage the support for XML in database products such as SQL Server 2005 to pull out small chunks of relevant XML rather than forcing the parser to do that job. Use the fastest XML technology available, even if it costs money.
  • Accept that premature standardization is the problem, not the solution.  It is probably best to let individual industries such as wireless figure out serializations that meet their needs and then come to more global organizations such as W3C for standardization.  It may be that experimentation and evolution bring us to a single, optimal serialization format toward which we can all migrate, but it is a very good bet that design-by-committee and consortium politics will not.  Yes, there will be a period of confusion and inefficiency as developers have to support multiple formats for different user bases, and it will probably be obvious in 20/20 hindsight what we should have done.  But so long as the alternative formats are relatively simple, it should be no more difficult to handle diversity than it is to handle the multiple graphics formats that are in widespread use -- even mobile devices typically support JPEG, GIF, and PNG.
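
As a rough illustration of the "so don't DO that" advice above, here is a minimal sketch of the pull-parsing approach: stream through the document, grab the attribute values you need from the first matching element, and stop. It uses Python's standard xml.etree.ElementTree.iterparse; the function name find_first_attrs, the element names, and the sample document are all hypothetical, and the sketch assumes the XML has been structured so that the interesting element appears early.

```python
import io
import xml.etree.ElementTree as ET


def find_first_attrs(stream, tag, attr_names):
    """Return the requested attributes of the first <tag> element, then stop.

    The stream is parsed incrementally, so nothing after the first match
    is ever turned into a tree.
    """
    # On a "start" event the element's attributes are already available.
    for _event, elem in ET.iterparse(stream, events=("start",)):
        if elem.tag == tag:
            return {name: elem.get(name) for name in attr_names}
    return None


# Hypothetical document: a small summary element up front, bulky detail after it.
doc = (
    b'<orders><summary count="2" total="99.50"/>'
    + b"".join(b'<order id="%d"><item>widget</item></order>' % i for i in range(20000))
    + b"</orders>"
)

result = find_first_attrs(io.BytesIO(doc), "summary", ["count", "total"])
print(result)
```

Whether this actually helps depends on where the bottleneck is, which is the point of the bullet above: measure first, then pick the tool.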

Published Tuesday, December 28, 2004 2:05 PM by mikechampion

Comments

# re: XML's overhead will open wallets?

Tuesday, December 28, 2004 4:18 PM by damien morton
For me, binary xml is way overdue. I'm working on real-time financial applications, and have found that serialization and deserialization of xml messages is a very significant overhead.

I know that xml is text, and therefore 'open', but I don't believe that would be a problem if binary xml was standardised. As noted in the text of the paper, most "view source" commands are actually displaying a serialised version of their internal data structures. A standardised binary xml would have a plethora of "view source" tools available for it.

# re: XML's overhead will open wallets?

Tuesday, December 28, 2004 4:40 PM by Stephane Rodriguez

# re: XML's overhead will open wallets?

Tuesday, December 28, 2004 4:43 PM by Stephane Rodriguez

By the way, I think those pointing out XML overhead are quite right when it comes to using taxonomies rather than simple XML.
As a corollary, what about a "simple XML" proposal (without entities, and I'd go as far as dropping support for attributes too)?

# re: XML's overhead will open wallets?

Tuesday, December 28, 2004 5:13 PM by Mike Champion
Stephane Rodriguez: Thanks for the XML Optimization link! That is the kind of "work smarter" thing I was talking about.

damien morton: I agree; the paper http://www2003.org/cdrom/papers/alternate/P872/p872-kohlhoff.html I cited in the XML 2004 stuff opened my eyes on that. The question is whether a standard binary XML format that meets the needs of the financial industry will also meet the needs of others, e.g. the wireless folks. That's an open question, but from what I've seen, I'm not terribly optimistic. I hope I'm unduly pessimistic!

# re: XML's overhead will open wallets?

Wednesday, December 29, 2004 5:16 AM by Kurt Cagle
One of the central problems with any new technology (not that XML is all that "new" anymore) comes from learning the best methodologies involved in the handling of that technology. Binary serialization of XML misses the point that XML is an abstract representation of data, and the moment that you move into the binary realm the abstraction goes away; at which point you're basically left with the whole DCOM/CORBA mess of attempting to rectify simple data types, not to mention complex object types with differing methods of platform serialization.

Thus, certain sectors that are dealing with the fairly high bandwidth cost of that XML need to make some ugly decisions: forgo standardization and platform neutrality by choosing a specific format (albeit one not standards "blessed") for binary encapsulation of the XML, or rethink how much XML actually needs to go over the wire and be serialized in the first place. My suspicion is that better designed schemas, the use of "LOD" instances which can load in additional information as required, and other such optimization schemes could go a long way to alleviating the problems that XML faces in that space.

# re: XML's overhead will open wallets?

Wednesday, December 29, 2004 7:38 AM by Softwaremaker
+ 1, Mike

From a slightly different perspective, on the design front, I have been advocating against the use of XML in all layers of an enterprise application, esp. when tightly bound object technology is much more desirable. In my presentations on SO(A), I have always preached using service-messaging as communication b/w applications, NOT between tiers of an application.

However, many businesses are using XML Services as a communication mechanism JUST SO they can be seen as implementing an SO(A)... and of course, all for the wrong reasons.

Hence, many of them complain when performance suffers. What do they expect when they are making verbose calls to their own Data Access Layer via SOAP ?

# re: XML's overhead will open wallets?

Wednesday, December 29, 2004 7:54 AM by Jack
Both the links to your paper and presentations are broken. Could you please share the latest links ? Thanks.

# re: XML's overhead will open wallets?

Wednesday, December 29, 2004 9:39 AM by Mike Champion
Jack: Thanks! There was a problem in the PPT link in that it had some escaped spaces at the end. That's fixed. The other link worked ... It's possible that these are password protected and I just have a cookie set, but I've checked this on a couple of machines and a couple of browsers.

# re: XML's overhead will open wallets?

Friday, December 31, 2004 2:09 PM by damien morton
Kurt - I don't understand how moving from text to binary changes the level of abstraction.

Yes, you will need to come up with e.g. standardised representations of floats and ints, but I think that problem is far simpler than the problem of managing text encodings in xml.

I also agree that any given binary xml standard can't solve everyone's problems, but I think that any standard is better than no standard, if it makes for significantly faster processing and significantly more compact representations. From what I understand, those two goals are completely complementary. The Sun presentation on Fast WebServices is particularly interesting, in that they achieve a 5-10 times increase in processing performance, and a message size one fifth that of the equivalent soap message.
http://java.sun.com/developer/technicalArticles/WebServices/fastWS/

As far as LOD goes - for my problems we already have LOD implemented, trimming down the message elements to the minimum possible, and still the xml processing overhead of our client-side application is too high - much higher than if the messages were binary encoded. Even the simple conversion of text to floating point accounts for 5-10% of our application performance.

Think about it this way: what is gained from converting an IEEE 8-byte double into a 12-digit text representation wrapped in ~10-character xml start and end element tags (all of it in various unicode encodings) as it travels between two parties?

We could come up with our own binary encoding, but we work in a heterogeneous technology environment, and it would be better to use standardised libraries for a standardised encoding.





# Today and tomorrow: Will Indigo heal...? Explicitness is good, especially for Contracts

Thursday, February 10, 2005 5:45 AM by Christian Weyer: Smells like service spirit