Removing duplicate namespaces in XML Literals (Shyam Namboodiripad)


A common problem with XML literals and the LINQ to XML API is duplicate XML namespace declarations. Consider the following example, in which the code imports a default XML namespace, “hello”.

Code:

Imports <xmlns="hello">

Module Module1
    Sub Main()
        Dim x = <A>
                    <%= <B></B> %>
                </A>
        Console.WriteLine("x:")
        Console.WriteLine(x)

        Dim y = <A></A>
        y.Add(<B></B>)
        Console.WriteLine("y:")
        Console.WriteLine(y)
    End Sub
End Module

Output:

x:
<A xmlns="hello">
  <B></B>
</A>


y:
<A xmlns="hello">
  <B xmlns="hello"></B>
</A>

As you can see, in the output for variable y, the element <B> has a spurious (duplicate) XML namespace declaration. Although this XML is technically ‘correct’ (i.e. legal XML according to the XML 1.0 specification), several tools have problems consuming XML that contains duplicate namespaces.

For example, consider the code below, where I load an MSBuild project file (.vbproj) and try to modify its contents. MSBuild projects are essentially just XML files, so I can use VB XML literals to work with such files. I have set the default XML namespace for my code to match the namespace of the XML in the MSBuild file. Notice that the output has the same problem as before: the <NoWarn> node that I added has a duplicate XML namespace.

If I were to save the XML produced by this code as a “.vbproj” file and build it using MSBuild, MSBuild would fail to process the file because of the duplicate namespace on the <NoWarn> element. In other words, even though the XML is legal, MSBuild can’t consume XML that contains duplicate namespaces.

Code:

Imports <xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

Module Module2
    Sub Main()
        Dim projectFile As String = "..\..\ConsoleApplication1.vbproj"
        Dim y = XDocument.Load(projectFile)
       
        Dim element = <NoWarn>42016</NoWarn>
        y.<Project>.<PropertyGroup>.First.Add(element)
        Console.WriteLine("y:")
        Console.WriteLine(y)
    End Sub
End Module

Output:

y:
<Project ToolsVersion="4.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>

    <NoWarn xmlns="http://schemas.microsoft.com/developer/msbuild/2003">42016</NoWarn>
  </PropertyGroup>

</Project>

Why does VB allow XML with duplicate namespaces to be generated?

Consider the first code example above. As you can see from the output for variable x, the VB compiler correctly figures out that a namespace declaration need not be emitted on node <B>. The compiler can figure this out because node <B> is ‘syntactically’ embedded (using an embedded expression) inside node <A>, which is already in the same namespace (i.e. “hello”).

For variable y, however, it is very hard for the VB compiler to know that the node <B> is actually going to be embedded inside node <A>. To know this, the compiler would have to inspect the code flow and figure out that the node <B> is being passed to a function named ‘Add’ that is defined on type ‘XElement’, and that the target object for the function (i.e. y) actually holds a node <A> that is already in the same namespace (i.e. “hello”). Even if the compiler were smart enough to figure this out for this case, it would be almost impossible to make it smart enough for cases (like the second code example above) where the source XML is not part of the program but comes from some file or network packet.

Because the compiler doesn’t know what document the node <B> is going to end up inside, it ‘fully qualifies’ it with the default namespace of the code file (i.e. “hello”).
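
To see this ‘full qualification’ for yourself, you can inspect a literal before it is added to anything. The following is a minimal sketch, assuming the same “hello” import as above (the module name is made up for illustration). It lists the namespace declarations stored on a standalone <B> element using XAttribute.IsNamespaceDeclaration.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns="hello">

' Hypothetical module name, used only for this illustration.
Module StandaloneLiteralSketch
    Sub Main()
        ' A literal created on its own, before it has any parent.
        Dim b = <B></B>

        ' List the namespace declarations stored on the element itself.
        For Each attr In b.Attributes().Where(Function(a) a.IsNamespaceDeclaration)
            Console.WriteLine("{0} = {1}", attr.Name, attr.Value)
        Next
    End Sub
End Module
```

Any declaration listed here travels with the element when it is later passed to ‘Add’, which is where the duplicate in the output for y comes from.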

Ok so the VB compiler can’t figure this out. Surely the LINQ to XML API can, can’t it?

Yes, the LINQ to XML API can figure out and remove duplicate namespaces. But it does not enforce the removal of duplicate namespaces by default. I think the reason it doesn’t is performance (i.e. there is a performance hit involved in checking each node to see whether the node has any duplicate namespaces). After all, the XML is legal (albeit a bit ugly) even when it has duplicate namespaces – so why force an extra namespace check always?

In VS 2010 / .NET 4.0, the LINQ to XML API provides ways to work around this problem and generate better-looking XML. You can add an ‘annotation’ to the root XElement / XDocument node that tells the API not to emit duplicate namespaces, as demonstrated in the example below. The API will then check each node as it is added and remove any unnecessary duplicate namespace declarations from the node.

Alternatively, you can use ‘SaveOptions’ / ‘ReaderOptions’, as demonstrated in the examples below. In this case, the XML is generated with duplicate namespaces, but the API does the extra work of removing them when the XML is saved or read.

Code:

Imports <xmlns="hello">

Module Module1
    Sub Main()
        Dim y = <A></A>
        y.AddAnnotation(SaveOptions.OmitDuplicateNamespaces)
        y.Add(<B></B>)
        y.Add(<C></C>)
        Console.WriteLine("y:")
        Console.WriteLine(y)
    End Sub
End Module

'If you wish to save the XML to a file
Module Module2
    Sub Main()
        Dim y = <A></A>
        y.Add(<B></B>)
        y.Add(<C></C>)
       
        y.Save("out.xml", SaveOptions.OmitDuplicateNamespaces)
    End Sub
End Module

'If you wish to create an XmlReader object for your XML
Module Module3
    Sub Main()
        Dim y = <A></A>
        y.Add(<B></B>)
        y.Add(<C></C>)

        Dim reader = y.CreateReader(ReaderOptions.OmitDuplicateNamespaces)
    End Sub
End Module

Output:

y:
<A xmlns="hello">
  <B></B>
  <C></C>
</A>
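
The same option can be tried on the MSBuild example from earlier. The following is a rough sketch, not a drop-in fix: the inline project XML is a made-up stand-in for the real .vbproj file, and the module name is hypothetical. It uses the XNode.ToString(SaveOptions) overload to print the document with declarations that are already in scope left out.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

' Hypothetical module name, used only for this illustration.
Module OmitDuplicatesSketch
    Sub Main()
        ' Stand-in for XDocument.Load(projectFile): a tiny made-up project.
        Dim xml = "<Project xmlns=""http://schemas.microsoft.com/developer/msbuild/2003""><PropertyGroup /></Project>"
        Dim y = XDocument.Parse(xml)

        ' The literal carries its own namespace declaration when it is added.
        y.<Project>.<PropertyGroup>.First.Add(<NoWarn>42016</NoWarn>)

        ' Ask the serializer to leave out declarations that are already in scope.
        Console.WriteLine(y.ToString(SaveOptions.OmitDuplicateNamespaces))
    End Sub
End Module
```

To write the result to disk instead, the XDocument.Save overload that takes a SaveOptions argument can be used in the same way.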

Hope this helps clean up the XML for your apps! :)


Some references from MSDN:

XDocument.Save Method
SaveOptions Enumeration
Imports Statement (XML Namespace)

Ways to remove duplicate namespaces before VB 2010:

Bill McCarthy has a couple of blog posts about how you can clean up namespaces if you are using VS 2008 / .NET 3.5 -
http://msmvps.com/blogs/bill/archive/2007/12/09/more-on-xml-namespaces-in-vb.aspx
http://msmvps.com/blogs/bill/archive/2007/11/24/cleaning-up-your-xml-literal-namespaces.aspx
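
For a rough idea of what such a cleanup can look like with the LINQ to XML API alone, here is a minimal sketch. It is not taken from the posts above: the helper name ‘RemoveDuplicateNamespaces’ is made up, and the code assumes Option Infer is on. It walks the descendants of a root element and removes any namespace declaration that the parent already has in scope with the same value.

```vb
Imports System
Imports System.Linq
Imports System.Xml.Linq
Imports <xmlns="hello">

' Hypothetical module and helper names, used only for this illustration.
Module RemoveDuplicatesSketch
    ' Removes namespace declarations from descendants of root when the
    ' parent element already has the same declaration in scope.
    Sub RemoveDuplicateNamespaces(ByVal root As XElement)
        For Each el In root.Descendants().ToList()
            Dim declarations = el.Attributes().Where(Function(a) a.IsNamespaceDeclaration).ToList()
            For Each attr In declarations
                Dim inScope As XNamespace
                If attr.Name.NamespaceName = "" Then
                    ' Default declaration: xmlns="..."
                    inScope = el.Parent.GetDefaultNamespace()
                Else
                    ' Prefixed declaration: xmlns:p="..."
                    inScope = el.Parent.GetNamespaceOfPrefix(attr.Name.LocalName)
                End If
                ' Remove only when the parent resolves to the same namespace.
                If inScope IsNot Nothing AndAlso inScope.NamespaceName = attr.Value Then
                    attr.Remove()
                End If
            Next
        Next
    End Sub

    Sub Main()
        Dim y = <A></A>
        y.Add(<B></B>)
        RemoveDuplicateNamespaces(y)
        Console.WriteLine(y)
    End Sub
End Module
```

Because elements are visited in document order, a child is compared against its parent only after the parent's own declarations have been cleaned up.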

  • For those who still cannot use .NET Framework 4, and thus the SaveOptions.OmitDuplicateNamespaces value, I wrote the following C# code, which runs on .NET 3.5 (but can easily be adapted to .NET 2.0):

    public static void RemoveDuplicateNamespaceDeclarations(this XmlDocument xmlDoc)
    {
        var xmlElem = xmlDoc.DocumentElement;
        if (xmlElem != null)
        {
            var namespaceMgr = new XmlNamespaceManager(xmlDoc.NameTable);
            RemoveDuplicateNamespaceDeclarations(xmlElem, namespaceMgr);
        }
    }

    private static void RemoveDuplicateNamespaceDeclarations(
        XmlElement xmlElem,
        XmlNamespaceManager namespaceMgr)
    {
        namespaceMgr.PushScope();

        // XPath on "@xmlns:*" seems not to work, so filter the attributes by prefix instead.
        // Note: this matches prefixed declarations (xmlns:p="...") only; a default
        // declaration (xmlns="...") has an empty Prefix and is left untouched.
        var namespaceDefinitions = (from xmlAttr in xmlElem.Attributes.OfType<XmlAttribute>()
                                    where "xmlns".Equals(xmlAttr.Prefix, StringComparison.Ordinal)
                                    select new {
                                        Prefix = xmlAttr.LocalName,
                                        Url = xmlAttr.Value,
                                        Node = xmlAttr
                                    }).ToArray();

        foreach (var namespaceDefinition in namespaceDefinitions)
        {
            var currentUrl = namespaceMgr.LookupNamespace(namespaceDefinition.Prefix);
            if (!string.IsNullOrEmpty(currentUrl))
            {
                // The prefix is already declared by an ancestor: drop the redeclaration.
                Debug.Assert(namespaceDefinition.Url.Equals(currentUrl, StringComparison.OrdinalIgnoreCase));
                xmlElem.RemoveAttributeNode(namespaceDefinition.Node);
            }
            else
            {
                // First declaration of this prefix: remember it for the descendants.
                namespaceMgr.AddNamespace(namespaceDefinition.Prefix, namespaceDefinition.Url);
            }
        }

        // Recurse into child elements
        foreach (XmlNode xmlChildNode in xmlElem.ChildNodes)
        {
            if (xmlChildNode.NodeType == XmlNodeType.Element)
            {
                RemoveDuplicateNamespaceDeclarations((XmlElement)xmlChildNode, namespaceMgr);
            }
        }

        namespaceMgr.PopScope();
    }

  • Thank you Duarte! I have also updated my post above with links to a couple of blog posts from Bill McCarthy that talk about the same thing (i.e. how to remove duplicate namespaces with VB 2008 / .NET 3.5).

    -Shyam
