Welcome to MSDN Blogs Sign in | Join | Help

Use of GetElementsByTagName considered harmful

As you may already be aware, there is an article on msdn about the great performance improvements we made in the V2 xml stack.  This is a pretty big event for the team, since increasing performance was one of the big goals of this release. 

While I posted this to give a shout-out to the team for the great perf work in they did for Whidbey, I want to take a minute to point out another interesting part of the article.  Up at the top of this article you will find a this paragraph:

2.       We also found that Sun is using the GetElementsByTagName method in C#, and this results in a mismatch of the C# and Java versions. Although both harnesses use XmlElement.GetElemensByTagName(), the C# implementation of this method keeps track of a live list of nodes which is negatively impacted by some of the DOM test scenarios that have edits. A better method for element selection in the C# harness is XmlElement.SelectNodes() which matches the Java harness in functionality.

When I first read this, it caused me some concern.  Were we somehow ‘cheating’ by changing the nature of the test?  Was there something wrong with GetElementsByTagName that I (and the various xml developers out there) should be aware of?  I did a little digging, and wanted to share the results.

The W3C Level1 DOM spec has this to say about the results of GetElementsByTagName (I underlined the last sentence for emphasis):

The DOM also specifies a NodeList interface to handle ordered lists of Nodes, such as the children of a Node, or the elements returned by the Element.getElementsByTagName method, and also a NamedNodeMap interface to handle unordered sets of nodes referenced by their name attribute, such as the attributes of an Element. NodeLists and NamedNodeMaps in the DOM are "live", that is, changes to the underlying document structure are reflected in all relevant NodeLists and NamedNodeMaps.

 

In other words, GetElementsByTagName is, according to the spec, supposed to return a ‘live list’ where changes to the underlying DOM are reflected in the returned NodeList.  The MS implementation of this function conforms to the spec, but the Sun/Java implementation does not.

Unfortunately, this conformance comes at a cost.  In this case, the cost is a non-trivial hit to the speed and working set of code which uses GetElementsByTagName.  Let me explain why.

 

In order to keep the NodeLists returned from GetElementsByTagName live, each list needs to ‘register’ somehow with the related element so that it can be notified when the children of that element change.  When we first implemented this function, we thought about adding a series of events to XmlElement.  NodeLists could register for these events and react appropriately.  Unfortunately, this turns out to bloat the size of a DOM tree, and the performance costs caused by the related cache misses were unacceptable.  Because of this we moved to a model where only XmlDocument exposed ‘OnChange’ events.  When an item is added into the DOM, the XmlDocument.NodeInsterted event fires, each NodeList can determine if the new element should be added to its list or not, and then take the appropriate action.

 

This solution has the advantage that it adds no extra overhead to scenarios which do not call GetElementsByTagName, and the overhead is minimal even for scenarios which do use GetElementsByTagName, as long as those scenarios do not change the DOM (in this case the only overhead is the cost of registering for the event and a small increase in allocated memory for the created delegate). 

 

The real problem shows up when you combine calls to GetElementsByTagName and edits to the DOM.  Imagine that you had a moderate sized DOM (a few hundred elements) and for each element you called GetElementsByTagName, which results in a few hundred NodeLists registering for the XmlDocument.NodeInsterted.  Now, after walking through the tree, you add a few elements.  Each time you add an element, the NodeInserted event fires, and hundreds of delegate functions will get called.  As you can imagine, this can cause quite a hit to the performance of your application.

 

The bottom line?  If you are using GetElementsByTagName, especially in scenarios where you edit the DOM, you should investigate using SelectNodes instead.  SelectNodes takes an XPath expression, but all you would need to do to get back the same elements as what is called by GetElementsByTagName would be to pass in “\\ElementName” [edit: that should read "//ElementName] to SelectNodes where you would have passed “ElementName” in to GetElementsByTagName.  As always, you should measure how this changes your perf before switching over, especially in cases where you are not editing the DOM, since SelectNodes does a little more work to support the full syntax, but don’t be afraid to try.  Even if your perf is equivalent in both cases, you may want to look at switching over, since we are very focused on improving XPath performance, so you may find your perf improve over time along with our XPath implementation (note: as always Microsoft’s plans are always subject to change, while we currently are focused on XPath performance, some future change may alter this in the future).

 

Published Wednesday, July 20, 2005 5:58 AM by eriksalt

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# GetElementsByTagName

It's been quiet on this blog as we get closer to Whidbey RTM.  Meanwhile, Erik Saltwell - the dev...
Wednesday, July 20, 2005 2:05 AM by Microsoft XML Team's WebLog

# re: Use of GetElementsByTagName considered harmful

It's //ElementName, not \\ElementName.
Wednesday, July 20, 2005 5:17 AM by Oleg Tkachenko

# re: Use of GetElementsByTagName considered harmful

Or even "descendant::ElementName". That probably would be faster.
Wednesday, July 20, 2005 5:18 AM by Oleg Tkachenko

# re: Use of GetElementsByTagName considered harmful

Thanks Oleg (so much for my proof readong skills :-). I updated the page.
Tuesday, July 26, 2005 12:55 AM by eriksalt

# re: Use of GetElementsByTagName considered harmful

I would be surprised if descendent gave you much better perf here, since you are - by definition of the API call - going to be at the root of your subtree, so everything will be your descendant. I haven't tried it though, so I would be interested to see what the numbers look like.
Tuesday, July 26, 2005 1:05 AM by eriksalt

# re: Use of GetElementsByTagName considered harmful

I think the real problem with GetElementsByTagName is not live collection and event handlers themselves, but inability to remove/dispose collection when it is not need any more. XmlNodeList is not IDisposable, so once it is created and handlers are registered, it's for the document lifetime. Second issues is that second call to GetElementsByTagName for the same name returns new live collection and register more event handleres. In my experience these 2 factors hurt performance the most, not the live property of the collection alone.
Tuesday, November 01, 2005 2:01 PM by Dima

# re: Use of GetElementsByTagName considered harmful

Dima, thanks these are well thought out comments. There are some areas where I disagree with you - 1) It is not always for the document lifetime, there are times where we detect that the list is no longer around and destroy un-register, but this is really just a technicality. I agree that we should have put more thought into IDisposable before the code went into the wild. 2) I don't know if I would recomend that we return the same list. This has some usage implications which I don't care for.

If you would like to discuss this more, please feel free to drop me an email at erik.saltwell@microsoft.com
Wednesday, November 16, 2005 12:18 PM by eriksalt

# embattle Brussels

Tuesday, November 21, 2006 6:43 AM by embattle Brussels

# indigested London

Wednesday, November 22, 2006 2:34 AM by indigested London

# piccalilli Edinburgh

Thursday, December 14, 2006 12:07 PM by piccalilli Edinburgh

# prejudice Berlin

prejudice Berlin  http://www.mynhotel.info/sleet_Germany/cuneiform_Bavaria%20(Bayern)/prejudice_Munich_1.html

Friday, December 15, 2006 10:51 PM by prejudice Berlin

# re: Use of GetElementsByTagName considered harmful

mmm.. nice design, I must say..

Saturday, February 17, 2007 10:00 AM by ...

# re: Use of GetElementsByTagName considered harmful

Lavoro eccellente! ..ringraziamenti per le informazioni..realmente lo apprezzo: D

Saturday, February 24, 2007 8:26 AM by ...

# re: Use of GetElementsByTagName considered harmful

The information I found here was rather helpful. Thank you for this.

Monday, February 26, 2007 2:41 AM by ...

# re: Use of GetElementsByTagName considered harmful

luogo interessante, soddisfare interessante, buon!

Monday, March 05, 2007 5:34 AM by ...

# re: Use of GetElementsByTagName considered harmful

Luogo molto buon:) Buona fortuna!

Saturday, March 10, 2007 4:44 PM by ...

# re: Use of GetElementsByTagName considered harmful

luogo interessante, soddisfare interessante, buon!

Thursday, March 15, 2007 5:20 PM by ...

# re: Use of GetElementsByTagName considered harmful

Lo trovo piuttosto impressionante. Lavoro grande fatto..)

Saturday, March 17, 2007 11:48 AM by ...

# re: Use of GetElementsByTagName considered harmful

Stupore! ho una sensibilit molto buona circa il vostro luogo!!!!

Monday, March 19, 2007 2:55 AM by ...

# re: Use of GetElementsByTagName considered harmful

Interessare, molto interessante. Come avete fatto questo?

Tuesday, April 10, 2007 4:13 AM by ...

# re: Use of GetElementsByTagName considered harmful

L'information interessante que vous avez! I'am allant revenir bientot.

Thursday, April 12, 2007 10:37 PM by ...

# re: Use of GetElementsByTagName considered harmful

Thank you for clearing this up. I had huge problems when using GetElementsByTagName on an XML file of 3MB, where appending child nodes something took eons. But I didnt find the other SelectNodes() much faster (as I did alot of SelectNodes()), so I ended up writing a quick one that satisfied my needs and nothing more. (ugly but quick :)

private static XmlElement GetFirstElementByTagName(XmlElement parent, String tagName)

{

foreach (XmlNode node in parent.ChildNodes)

{

if ((node is XmlElement) && (node.Name == tagName))

{

return (XmlElement)node;

}

}

return null;

}

Tuesday, May 22, 2007 10:47 AM by redsolo

# I do not think so

I do not think so go to http://apartments.waw.pl/

Tuesday, August 14, 2007 2:31 PM by warsaw hotels

Leave a Comment

(required) 
required 
(required) 
 
Page view tracker