Marcelo's WebLog

Improving the world one entity at a time (now tweeting on @mlrdev)

Faster XML - don't resolve DTDs if you don't need to

When you load an XML document through XmlDocument.Load or XDocument.Load, the default behavior on finding a DTD reference is to resolve its URL, which typically means one or more web requests. Often, however, the DTD is there more as a marker of what the document contains than anything else, and you might not need to reference its entities or validate against it.

For example, let's say I have the following code. This will load an XML document from a string, then write it out to the console.

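// Requires: using System; using System.Xml;
const string xmlText =
@"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html lang='en' xml:lang='en' xmlns='http://www.w3.org/1999/xhtml'>
<head><title>Hello</title></head>
<body><h1>Hello from XHTML</h1></body>
</html>";

// Parse the document from the string. With the default resolver,
// this also fetches the DTD named in the DOCTYPE.
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlText);

// Write the markup back out, then wait for a key press.
Console.WriteLine(doc.InnerXml);
Console.ReadKey();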

The problem here is that there is no reason for us to access the DTD for this particular application, and yet the following requests will go out.

http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent 

The first solution, available on Silverlight, is to use the XmlPreloadedResolver. By default it contains the DTDs for XHTML and RSS, but you can add your own. In this case, you get the same level of functionality as if the real resources had been requested. For example, you can use entity references such as &nbsp; to mean "non-breaking space". If you assign an instance as your XmlResolver, you're good to go.
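As a rough sketch of the preloaded-resolver approach, the snippet below assumes a runtime that ships the System.Xml.Resolvers namespace, where XmlPreloadedResolver and XmlKnownDtds live. The class and method names (PreloadedDtdExample, LoadXhtml) are made up for illustration, and the resolver is limited to the XHTML 1.0 set rather than everything it can preload.

```csharp
using System;
using System.Xml;
using System.Xml.Resolvers;

static class PreloadedDtdExample
{
    // Hypothetical helper: loads XHTML text using only the DTDs bundled with the framework.
    public static XmlDocument LoadXhtml(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        // Serve the XHTML 1.0 DTDs and entity files from the preloaded set
        // instead of requesting them from www.w3.org.
        doc.XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10);
        doc.LoadXml(xmlText);
        return doc;
    }

    static void Main()
    {
        const string xmlText =
@"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html lang='en' xml:lang='en' xmlns='http://www.w3.org/1999/xhtml'>
<head><title>Hello</title></head>
<body><p>Hello&nbsp;from&nbsp;XHTML</p></body>
</html>";

        XmlDocument doc = LoadXhtml(xmlText);
        // &nbsp; is declared in the XHTML entity files, so the reference can be resolved.
        Console.WriteLine(doc.InnerXml);
    }
}
```

Because the entity declarations come from the preloaded DTD, the document can keep using named entities such as &nbsp; without any network access.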

A second solution, which is also available on the full .NET Framework, is to simply set the XmlResolver property to null. In this case, no requests will be made at all, but you won't be able to use entity references defined in the DTD or have the document validated against it.
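A minimal sketch of the null-resolver option follows. The class and method names (NullResolverExample, LoadWithoutDtdRequests) are hypothetical; the only essential line is the assignment to XmlResolver, which needs to happen before the document is loaded.

```csharp
using System;
using System.Xml;

static class NullResolverExample
{
    // Hypothetical helper: parses the text without resolving any external DTD.
    public static XmlDocument LoadWithoutDtdRequests(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        // With no resolver, the external DTD named in the DOCTYPE is never requested.
        doc.XmlResolver = null;
        doc.LoadXml(xmlText);
        return doc;
    }

    static void Main()
    {
        const string xmlText =
@"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html lang='en' xml:lang='en' xmlns='http://www.w3.org/1999/xhtml'>
<head><title>Hello</title></head>
<body><h1>Hello from XHTML</h1></body>
</html>";

        XmlDocument doc = LoadWithoutDtdRequests(xmlText);
        Console.WriteLine(doc.InnerXml);
    }
}
```

The sample document here deliberately avoids named entities, since none of the DTD's declarations are available with this option.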

Note that this isn't just available on XmlDocument, but also through XmlReaderSettings, anytime you're creating an XmlReader (which can of course be used with both XmlDocument and XDocument, along with many other parts of the framework).
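The same idea through XmlReaderSettings might look like the sketch below, here feeding an XDocument. The class and method names are hypothetical. It also assumes the DtdProcessing setting, which exists on .NET Framework 4 and later; on earlier versions the equivalent step would be setting ProhibitDtd to false so that the DOCTYPE is accepted at all.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class ReaderSettingsExample
{
    // Hypothetical helper: reads the text through an XmlReader that has no resolver.
    public static XDocument LoadWithoutDtdRequests(string xmlText)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        // No resolver: the external DTD is never requested.
        settings.XmlResolver = null;
        // Assumption: .NET 4 or later. Accept the DOCTYPE instead of rejecting it.
        settings.DtdProcessing = DtdProcessing.Parse;

        using (XmlReader reader = XmlReader.Create(new StringReader(xmlText), settings))
        {
            return XDocument.Load(reader);
        }
    }

    static void Main()
    {
        const string xmlText =
@"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>
<html lang='en' xml:lang='en' xmlns='http://www.w3.org/1999/xhtml'>
<head><title>Hello</title></head>
<body><h1>Hello from XHTML</h1></body>
</html>";

        XDocument doc = LoadWithoutDtdRequests(xmlText);
        Console.WriteLine(doc);
    }
}
```

Passing the same reader to XmlDocument.Load instead of XDocument.Load should work the same way, since the resolver choice lives in the reader.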

In the next release of the .NET Framework, there will be a few other ways in which you may save the resource requests, but these options are available to you today.

Enjoy!

  • I didn't realise the XmlDocument class did that!

    Are the DTD requests cached at all, or are they retrieved every time?  What happens if a network connection isn't available - does the lookup just fail silently?

  • Daniel, I think the response I wrote was lost (I was messing with my network settings while submitting), but I've upgraded it to a full post, should be up on Monday.
