We have recently blogged about the new XML Schema Designer and the various views over schemas it offers. Here we visit the concept of schema sets, which are the central organizing concept behind what the designer shows for a buffer in Visual Studio for which we have XML Schema information (be it an XSD file, an XML file, or a Visual Basic project with associated schemas). A schema set can be thought of as a collection of pairs, the first part of each pair being an XML namespace and the second part being the location of an XSD file that is associated with that namespace in the set. A namespace can have multiple files associated with it in a particular set, and a file can be part of multiple namespaces (due to the XML Schema concept of chameleons, whereby a file without a targetNamespace attribute defined for its schema element automatically assumes the targetNamespace of any and all files that include it). We explore below the ways in which schema sets are built and computed and how they are visualized in the XML Schema Designer.

A schema set is basically built by walking the tree of external references from a particular set of root XSD files. An external reference is either an include element, which brings the schema at the specified schemaLocation into the same namespace as the current schema, or an import element, which specifies a particular XML namespace to import as well as, optionally, a particular schema location where the schema processor can find an XSD file for that namespace (there is also a redefine element defined in XML Schema, but we can treat it as essentially equivalent to an include for this discussion). To illustrate these concepts and how they get applied, let’s see them in use in a particular sample industry schema (brainml.xsd, an XML Schema for neurological modeling defined at http://brainml.org ).

When we first open brainml.xsd in Visual Studio, this becomes the root of the XML Schema Set to be constructed and displayed in the Schema Explorer tree that comes up to help visualize the hierarchy of a schema set. Analyzing the external references (which must always be declared at the top before any globals are defined), we see the following two:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" />

<xs:import namespace="urn:bml/brainml.org:internal/BrainMetaL/1" schemaLocation="citation.xsd"/>

This gets interpreted as a request to bring in schemas for the two namespaces “http://www.w3.org/XML/1998/namespace” and “urn:bml/brainml.org:internal/BrainMetaL/1”. In the second case, we are also given a location hint for where to find an XSD file for this namespace (more on how the first one is resolved a bit later), so we bring the citation.xsd file into the set and, analyzing it, we see the following externals:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" />

<xs:import namespace="http://www.w3.org/1999/xlink" schemaLocation="xlink.xsd"/>

<xs:include schemaLocation="brainmetal.xsd"/>

The first import is like the one we had previously seen for the XML namespace and will get resolved the same way. The second again brings in a new namespace into the set and a new file (xlink.xsd) for this namespace. The third includes another file (brainmetal.xsd) that has the same namespace (“urn:bml/brainml.org:internal/BrainMetaL/1”) as the targetNamespace of the current file. This process gets repeated for the rest of the unprocessed files in the set, though no new references are introduced in any of these, so finally we end up with the following tree view in our schema explorer of the namespaces and files in the set.

[Screenshot: Schema Explorer tree view of the namespaces and files in the schema set]
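The first step of the walk described above, finding the external references of one schema, can be approximated with the System.Xml.Schema object model. This is only a sketch and not how the designer itself is implemented: the class and method names are made up for illustration, and the helper lists the references of a single schema without following them or resolving well-known namespaces.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Schema;

public static class ExternalReferenceLister
{
    // Hypothetical helper (not part of Visual Studio): parses one schema and
    // returns a line per import, include, or redefine declared at its top.
    public static List<string> ListExternals(string schemaText)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(schemaText))
        {
            // Read only parses the schema; it does not load referenced files.
            XmlSchema schema = XmlSchema.Read(reader, null);

            foreach (XmlSchemaExternal external in schema.Includes)
            {
                var import = external as XmlSchemaImport;
                string kind;
                string ns;
                if (import != null)
                {
                    // An import names the namespace it brings into the set.
                    kind = "import";
                    ns = import.Namespace;
                }
                else
                {
                    // Includes and redefines stay in the current target namespace.
                    kind = (external is XmlSchemaRedefine) ? "redefine" : "include";
                    ns = schema.TargetNamespace;
                }

                string location = external.SchemaLocation ?? "(no location hint)";
                lines.Add(kind + " " + ns + " -> " + location);
            }
        }
        return lines;
    }
}
```

Calling ListExternals on the text of citation.xsd should yield one line for each of the three externals shown above. Repeating the call for every file that a location hint points to is the essence of the tree walk.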

So how did the xml.xsd file get found and associated to the xml namespace in the set? The answer is that, aside from schema location, Visual Studio also has other places that it looks in for schemas to resolve namespaces when a specific schema location is not provided (remember, the schemaLocation is just a hint to the schema processor, which can apply its knowledge of the environment to figure out how to resolve a namespace). Visual Studio will look in the current project, solution and even other open schemas to resolve schema references, and also comes preconfigured with a set of well known schemas, such as the schema for the xml namespace that is being referenced here.

So how do we know which well-known namespaces and associations are available in a particular context? This can be seen through the schema dialog. For example, in the case above, the property window for the brainml.xsd code buffer shows a “Schemas” property, and clicking on it brings up the following dialog.

[Screenshot: the Schemas dialog]

This dialog shows a table view of namespaces for which Visual Studio has a known association and the locations of the files that are known to provide schemas for those namespaces. The left hand column, labeled “Use”, allows us to control when or if these known associations are used. The default option is “Automatic”, which means use the schema if needed to resolve an import (such as the current scenario of finding a schema for the well known xml namespace). An option of “Use” says the schema is to be used in the current set. Note that this option is pre-selected for the files computed to be in the set; selecting it for a new file would essentially introduce a new root into the set computation described above. Finally, there is also a do not use option to allow us to exclude a file from a set that would otherwise be included in one of the above scenarios.

There are also two special values (localized strings that are not valid XML identifiers) used where a namespace name would normally appear in the schema explorer to help visualize and distinguish two error conditions that could arise in building a schema set. One is the “Not Found or Invalid” name that is used when the path specified in a schemaLocation is either not found (i.e. an include of a non-existent or non-readable file) or if the file is present but cannot be parsed as an XML schema (i.e. it is either not valid XML or we do not find a schema root element). For example, if we edit brainmetal.xsd in the set above to have the root element read “schemaInvalid” our schema explorer view of the schema changes as in the following diagram.

[Screenshot: Schema Explorer showing the “Not Found or Invalid” name]

There is another special name that can appear in place of the namespace: the “Unauthorized Zone” name. The files that appear under this name are files that a schema file in a different security zone attempted to import or include into the schema set without having permission to access the zone those files are in. This is similar to the Internet Explorer policy whereby a web page cannot redirect to or read from a location on the user’s machine or intranet (i.e. a different security zone) unless the machine’s zone policy has been configured to allow this. The schema processor is essentially acting as a proxy for the remote site when requesting included or imported files on its behalf and thus enforces the zone security policies that are in place. This prevents attacks whereby a malicious external site uses the processing of schema externals either to force opening a file or, in combination with other exploits, to post back or glean information about the user’s system.

For example, imagine there is a schema available on an external web site named “Remote_Import_Local.xsd” that attempts to import a local file from the path “d:\schemas\Security\Local.xsd”. Even if this path exists on your local machine and there is a valid schema file there, it will not be included in the schema set (and will in fact not even be opened as part of building the set), and you will instead get the following view in the schema explorer.

[Screenshot: Schema Explorer showing Local.xsd under “Unauthorized Zone”]

The inclusion of Local.xsd in the “Unauthorized Zone” and the warnings in the error pane about not being able to resolve the schema location are an indication to the end user that the schema they were visiting attempted to bring in a schema from a zone that it is not authorized to access.

-Fred Garcia

SDE, XML Tools Team

We are happy to announce that we are releasing the sources for LINQ to XSD on CodePlex at http://linqtoxsd.codeplex.com. LINQ to XSD allows you to program with strongly-typed classes based on an XSD schema and was previously released on MSDN as an alpha preview.

We have recently received a number of requests from customers who would like to use this preview in their projects and re-distribute it. We hope the CodePlex project will be able to satisfy the requirements of the customers who need to use this technology today. The community is welcome to submit their improvements and contributions on CodePlex and carry this project forward.

Complex XSLT stylesheets often contain several include and/or import instructions. The VS XML Editor has very limited support for such scenarios. (Most noticeably, it lacks the concept of a "primary stylesheet".)

In an XSLT debugging session you may need to put a breakpoint in a template that is defined in one of the referenced files, and VS doesn't provide a convenient way of opening these files.

Putting a breakpoint on a built-in template rule is even trickier: there is no way to find and open these rules in VS.

VS 2008 SP1 has a solution for this problem, but unfortunately it is not enabled by default. I hope VS 2010 will fix this.

This feature can be enabled by setting the registry key 'XsltImportTree' under 'HKEY_CURRENT_USER\Software\Microsoft\VisualStudio\9.0\XmlEditor' to 'True'.

With this feature enabled, each time you start the XSLT Debugger the XSLT import/include tree is added to the Solution Explorer. To see it, you need to have the "Solution Explorer" tool window open (View | Solution Explorer).

This is a sample screenshot of the XSLT Debugger in VS 2008 SP1 with the feature enabled:

[Screenshot: XSLT Debugger]

I hope you find this useful.

(See also http://www.tkachenko.com/blog/archives/000740.html)

 

Sergey Dubinets 

 

I've written in the past about XML and languages, and why you might be interested in being aware of the language associated with text.

Text with no language is just not quite there
Impact of text language on WPF  
Text, language and sorting

For dealing with languages, xml:lang is your friend, as you can tell from these older posts.

Something that is a bit special about xml:lang is that xml is a reserved namespace. From http://www.w3.org/TR/REC-xml-names/#xmlReserved

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. It MAY, but need not, be declared, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

Here is the code you can use to write an xml:lang attribute using an XmlWriter.

XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;

using (StringWriter textWriter = new StringWriter())
using (XmlWriter writer = XmlWriter.Create(textWriter, settings))
{
    writer.WriteStartElement("e");

    writer.WriteStartElement("t1");
    writer.WriteAttributeString("xml", "lang", null, "en-US");
    writer.WriteString("Hello, world!");
    writer.WriteEndElement();

    writer.WriteStartElement("t2");
    writer.WriteAttributeString("xml", "lang", null, "es-AR");
    writer.WriteString("¡Hola, mundo!");
    writer.WriteEndElement();

    writer.WriteEndElement();
    writer.Flush();

    Trace.WriteLine(textWriter.ToString());
}

Here is the traced output.

<?xml version="1.0" encoding="utf-16"?>
<e>
  <t1 xml:lang="en-US">Hello, world!</t1>
  <t2 xml:lang="es-AR">¡Hola, mundo!</t2>
</e>

If you are not using XmlWriter, but instead prefer to use LINQ to XML, it is even easier. This is because LINQ to XML has support for the xml namespace built in. Here is the code you could use to set the language on elements by adding the xml:lang attribute after creating the XDocument. Notice the lack of the xml:lang attribute in the source.

XDocument doc = XDocument.Parse(@"
    <e>
        <t1>Hello, world!</t1>
        <t2>¡Hola, mundo!</t2>
    </e>");

XNamespace xmlNs = XNamespace.Xml;

foreach (var element in doc.Root.Descendants())
{
    if (element.Name.LocalName == "t1")
        element.SetAttributeValue(xmlNs + "lang", "en-US");
    if (element.Name.LocalName == "t2")
        element.SetAttributeValue(xmlNs + "lang", "es-AR");
}

Trace.WriteLine(doc.ToString());

Here is the traced output.

<e>
  <t1 xml:lang="en-US">Hello, world!</t1>
  <t2 xml:lang="es-AR">¡Hola, mundo!</t2>
</e>

Enjoy!

Marcelo Lopez Ruiz

http://blogs.msdn.com/marcelolr/

 

Here's a good word of warning: even if an object "feels" read-only because you're not calling code to modify it, if it's not documented as safe for use from multiple threads, then you shouldn't risk it.

As an example, let’s look at XmlSchema and XmlSchemaSet. Initializing these has a cost associated with it, and so it's nice to be able to build them once and then reuse them. But you have to be very careful in doing this. The docs say that instance members are not guaranteed to be thread safe, but you don't really call them directly during validation, so it's hard to tell from the outside what's safe and what's not.

In a nutshell, the only thing you can do that is safe for concurrent usage is to use a validating reader. Here's the sample code to try this out (for some reason, this "breaks" more on 64-bit machines, but it's unsafe on all architectures).

First, a little helper to create an XmlSchema.

private XmlSchema CreateSchema()
{
  string schemaText = @"<?xml version='1.0'?>
<xs:schema id='play' targetNamespace='http://tempuri.org/play.xsd'
 elementFormDefault='qualified' xmlns='http://tempuri.org/play.xsd'
 xmlns:xs='http://www.w3.org/2001/XMLSchema'>
 <xs:element name='myShoeSize'>
  <xs:complexType>
   <xs:simpleContent>
    <xs:extension base='xs:decimal'>
     <xs:attribute name='sizing' type='xs:string' />
    </xs:extension>
   </xs:simpleContent>
  </xs:complexType>
 </xs:element>
</xs:schema>";

  using (StringReader reader = new StringReader(schemaText))
  {
    return XmlSchema.Read(reader, null);
  }
}

Next, a simple XmlSchemaSet.

private XmlSchemaSet CreateSchemaSet(XmlSchema schema)
{
  XmlSchemaSet set = new XmlSchemaSet();
  set.Add(schema);
  set.Compile();
  return set;
}

Finally, some validation:

private void ValidateDocument(XmlSchemaSet set)
{
  string doc = @"<myShoeSize xmlns='http://tempuri.org/play.xsd' sizing='123' />";
  XmlReaderSettings settings = new XmlReaderSettings();
  settings.Schemas = set;
  settings.ValidationEventHandler += new ValidationEventHandler(settings_ValidationEventHandler);
  using (StringReader reader = new StringReader(doc))
  using (XmlReader x = XmlReader.Create(reader, settings))
  {
    while (x.Read()) { }
  }
}

private int failCount;

void settings_ValidationEventHandler(object sender, ValidationEventArgs e)
{
  System.Threading.Interlocked.Increment(ref failCount);
}

Now, armed with these, I will show you some code that is thread-safe, but that would break if a single line were moved.

XmlSchema schema = CreateSchema();
Thread[] threads = new Thread[10];
XmlSchemaSet set = CreateSchemaSet(schema);
for (int i = 0; i < threads.Length; i++)
{
  threads[i] = new Thread((x) =>
    {
      for (int j = 0; j < 1000; j++)
      {
        // If the CreateSchemaSet were here
        // instead of outside this would break!
        //
        // Don't add the schema to the
        // XmlSchemaSet from multiple threads!
        //
        // XmlSchemaSet set = CreateSchemaSet(schema);
        //
        ValidateDocument(set);
      }
    });
}

Array.ForEach(threads, (t) => t.Start());
Array.ForEach(threads, (t) => t.Join());
this.Text = "Failure count: " + failCount;
 

The part before the thread creation runs on a single thread, and so there are no multi-thread concerns; the stuff inside the callback is happening on multiple threads at the same time. You can only use the set for validation here! 

Enjoy!

Marcelo Lopez Ruiz

http://blogs.msdn.com/marcelolr/ 

 

Converting from XmlDocument to XDocument has a number of benefits, including the ability to use LINQ to XML, a much cleaner object model, better name handling with XName, and functional constructors. However, there are a lot of XmlDocuments out there, so what is the best way to convert an XmlDocument to an XDocument?

This question came up in the forums a little while ago, and I thought it might be interesting to do some comparisons.

I first came up with a few ways of turning an XmlDocument into an XDocument.

private static XDocument DocumentToXDocument(XmlDocument doc)
{
  return XDocument.Parse(doc.OuterXml);
}

private static XDocument DocumentToXDocumentNavigator(XmlDocument doc)
{
  return XDocument.Load(doc.CreateNavigator().ReadSubtree());
}

private static XDocument DocumentToXDocumentReader(XmlDocument doc)
{
  return XDocument.Load(new XmlNodeReader(doc));
}

Next I whipped up a quick-and-dirty function to time these. I collect garbage first so that past activity doesn't leave much behind, and I warm up the action a bit (I also warm up the Stopwatch methods, just in case).

private static long Time(int count, Action action)
{
  GC.Collect();
  for (int i = 0; i < 3; i++)
  {
    action();
  }

  Stopwatch watch = new Stopwatch();
  watch.Start();
  watch.Stop();
  watch.Reset();
  watch.Start();

  for (int i = 0; i < count; i++)
  {
    action();
  }

  long result = watch.ElapsedMilliseconds;
  watch.Stop();
  return result;
}

And finally, all together:

StringBuilder sb = new StringBuilder();
sb.Append("<parent>");
for (int i = 0; i < 1000; i++)
{
  sb.Append(" <child>text</child>");
}
sb.Append("</parent>");

string text = sb.ToString();
XmlDocument doc = new XmlDocument();
doc.LoadXml(text);

long docToXDoc = Time(1000, () => DocumentToXDocument(doc));
long docToXDocNavigator = Time(1000, () => DocumentToXDocumentNavigator(doc));
long docToXDocReader = Time(1000, () => DocumentToXDocumentReader(doc));
 

Note that the actual numbers don't matter much, as this is my laptop running a bunch of things in the background, in the debugger and whatnot, but the relative values are interesting to see.

These are the values I got (they vary a bit each run, but not by much).

  • Using OuterXml: 1973 ms.
  • Using a navigator over the document: 1254 ms.
  • Using a reader over the document: 1154 ms.

Not surprisingly, avoiding the creation of a big string just to re-parse it is a big win - save the planet, use less CPU power!

So if we like the reader option, what is a convenient way of encapsulating that? Well C# 3 extension methods aren't too bad.

Here is one way of writing the methods.

public static class XmlDocumentExtensions
{
  public static XDocument ToXDocument(this XmlDocument document)
  {
    return document.ToXDocument(LoadOptions.None);
  }

  public static XDocument ToXDocument(this XmlDocument document, LoadOptions options)
  {
    using (XmlNodeReader reader = new XmlNodeReader(document))
    {
      return XDocument.Load(reader, options);
    }
  }
}

Now, as long as the class is visible to the code you're writing, you can write code like this.

XmlDocument doc = new XmlDocument();
doc.LoadXml("<parent><child>text</child></parent>");

XDocument xdoc = doc.ToXDocument();
var children = xdoc.Document.Element("parent").Elements("child");
foreach (var child in children)
{
  Console.WriteLine(child.Value);
}

Of course, if you could you would just start off from an XDocument - these address the cases where you already have an XmlDocument around and you can't just change all code to use XDocument.

One thing that I like about extension methods is that they help bridge dependencies across libraries in a clean way.

Enjoy!

Marcelo Lopez Ruiz

http://blogs.msdn.com/marcelolr/ 

Under certain conditions, opening Web projects from remote sites may become very slow. We've seen quite a few 'hang' reports submitted via the 'Send Information to Microsoft' feedback (aka Dr. Watson). The reason is that the XML editor (which is used when you edit web.config) may sometimes begin walking the remote Web site's file structure looking for XML schema files. Over a slower link this process may take minutes, and the IDE will appear unresponsive.

 

A hotfix is available at http://code.msdn.microsoft.com/KB958094

 

Irinel Crivat

Program Manager | Data Programmability

Today, we officially released MSXML4.0 Service Pack 3 (SP3) on the Microsoft Download Center as a stand-alone installer in multiple languages.

MSXML4.0 SP3 is a complete replacement of MSXML4.0, MSXML4.0 SP1 and MSXML4.0 SP2 and contains a number of bug fixes to enhance security and reliability.

Approximately nine years ago, MSXML4.0 was released to the web. MSXML4.0 was superseded by MSXML6.0 six+ years ago and is only intended to support legacy applications.  After MSXML4.0 SP3, there are no future service packs planned.  Also, please note that SP2 support will end in November 2009. 

We recommend that MSXML4.0 customers upgrade to MSXML6.  MSXML6 has new functionality as well as performance and security improvements.  In addition, MSXML6 provides improved W3C compliance and increased compatibility with System.XML in .NET. Key changes introduced between MSXML4 and MSXML6, and the migration itself, are described in Upgrading to MSXML6.0.

MSXML6 is now available for all supported down-level platforms.  It is either “in-the-box” in Windows (e.g. in XP SP3+ and Vista+) or can be downloaded from the Microsoft Download Center.

- MSXML Team

The new XML Schema Designer (see this previous blog post for a download location for the CTP) offers four views over XML Schema Sets: XML Schema Explorer, Graph View, Content Model View, and Start View.  This entry focuses on the Content Model View, which provides a detailed view of individual nodes in a schema set.

Basics of the view:

Q: What’s the purpose of the Content Model View?

A: The Content Model View (CMV) allows you to understand the specifics of a schema set. It lets you drill down into the content model of schema nodes, and will facilitate a deeper understanding of the schema set and the XML instance document the schema describes.

Q: How do you get to it?

A: You open an XSD in VS, drag nodes onto the design surface and switch the designer to the Content Model View using the toolbar embedded into the design surface. You can also switch to the CMV for a particular schema node from the Graph View or Schema Explorer, using the right click menu.

Q: What do you show on the CMV?

A: While you can only add global nodes to the CMV, you can drill down to the local nodes by using the expand/collapse button on the nodes. The content that is displayed in the CMV is the ‘compiled’ content for a schema node.
Elements, attributes, complex and simple types, groups, and attribute groups are shown as nodes, while occurrence constraints (minOccurs, maxOccurs), derivation relations (extends, restricts), and compositors (sequence, choice) are shown along with the nodes to which they apply. References are also shown in the CMV, but not as top level nodes: you have to drill down to get to them.

Q: What if I want to view more than one node, like in the Graph View?

A: The vertical “filmstrip” on the left, titled Workspace, shows you all of the nodes in your workspace. The highlighted nodes are shown on the designer canvas. You can highlight multiple nodes to compare them side by side in the CMV.

Like in the Graph View, you can zoom and pan. You can also view the properties of the currently selected node in the Properties window.

Here’s a screenshot:

Here the CMV shows a complexType schema node, expanded to view the content model. Notice how the schema nodes can be expanded and collapsed, depending on what you’re examining at the moment.

- Rohit Eipe

SDET, DP XML Tools

 

In certain scenarios, it is important to be able to compare two XML trees for equivalence.  For example, suppose you are writing a web service that serves the results of queries, and you want to cache query results so that duplicate queries use previously cached results instead of always accessing the underlying database.  The senders of those queries may be using a variety of tools to generate the queries, and these tools may introduce trivial differences into the XML.  The intent of the queries may be identical, but XNode.DeepEquals returns false if you compare XML that contains semantically equivalent, but trivially different, queries.

This post describes an approach and presents code for normalizing LINQ to XML trees.  After normalizing XML trees, you can call XNode.DeepEquals with a greater chance that semantically equivalent XML trees will evaluate as equivalent.  The approach presented in this post makes use of the assistance that XSD can provide when normalizing XML trees.  If an XML document has been validated using XSD, and the XML tree has been annotated with the Post Schema Validation Infoset (PSVI), then we can use that PSVI to help normalize the tree.  Much of this post is based on the OASIS specification Schema Centric XML Canonicalization Version 1.0 (C14N), which describes a variety of ways to normalize an XML tree, including through the use of PSVI.

Let’s look at a few common cases where XNode.DeepEquals reports that equivalent XML trees are unequal.  In the following example, the two trees have exactly the same content.  The only difference is that one of them uses a default namespace, and the other uses the same namespace using a namespace prefix.  XNode.DeepEquals will report that the two trees are different.

XElement root1 = XElement.Parse(
@"<Root xmlns='http://www.northwind.com'>
    <Child>1</Child>
</Root>");

XElement root2 = XElement.Parse(
@"<n:Root xmlns:n='http://www.northwind.com'>
    <n:Child>1</n:Child>
</n:Root>");

if (XNode.DeepEquals(root1, root2))
    Console.WriteLine("Equal");
else
    Console.WriteLine("Not Equal");

Here’s another case where XNode.DeepEquals returns false:

XElement root1 = XElement.Parse(
@"<Root a='1' b='2'>
    <Child>1</Child>
</Root>");

XElement root2 = XElement.Parse(
@"<Root b='2' a='1'>
    <Child>1</Child>
</Root>");

if (XNode.DeepEquals(root1, root2))
    Console.WriteLine("Equal");
else
    Console.WriteLine("Not Equal");

These two trees are equivalent, but their attributes appear in a different order.  However, because you can’t have two attributes in an element that have the same qualified name, we can sort the attributes by namespace and name, and then compare.
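As a rough illustration of sorting before comparing, the following sketch clones a tree while emitting each element's attributes ordered by namespace and then by local name. The class and method names are hypothetical, and the sketch handles only attribute order; it is not the full normalization code this post discusses.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class AttributeOrdering
{
    // Hypothetical helper: returns a copy of the element whose attributes
    // (at every level) are sorted by namespace name, then by local name.
    public static XElement WithSortedAttributes(XElement element)
    {
        return new XElement(element.Name,
            element.Attributes()
                   .OrderBy(a => a.Name.NamespaceName, StringComparer.Ordinal)
                   .ThenBy(a => a.Name.LocalName, StringComparer.Ordinal),
            element.Nodes().Select(n =>
            {
                // Recurse into child elements; other nodes are copied as-is.
                var child = n as XElement;
                return child != null ? WithSortedAttributes(child) : n;
            }));
    }
}
```

With this in place, XNode.DeepEquals(AttributeOrdering.WithSortedAttributes(root1), AttributeOrdering.WithSortedAttributes(root2)) should report the two trees from the previous example as equal.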

The situation gets more interesting when we consider normalizing a document that has been validated using XSD.  Consider the following simple schema:

<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='Root'
               type='xsd:double'/>
</xsd:schema>

Let’s say that we are comparing two small XML documents:

Document 1:

<Root>25</Root>

Document 2:

<Root>+25</Root>

These two documents have essentially the same content, but the ‘+’ before the 25 in the second document will cause the two documents to compare as not equal.  However, if we have first validated the documents and annotated the tree with the PSVI using the XDocument.Validate extension method, we know that the data type of the element is double, and we can normalize the value of the element.  Once normalized, the two documents will compare as equal.
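A minimal sketch of that idea, assuming the one-element schema shown above: validate with the PSVI turned on, check the annotated type, and rebuild the element from its value as a double. The class and method names are invented for this example, and only the root element is handled.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class DoubleNormalization
{
    // The simple schema from the text: a Root element of type xsd:double.
    const string Xsd =
@"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='Root' type='xsd:double'/>
</xsd:schema>";

    // Hypothetical helper: validates the document and returns a normalized
    // copy of its root element.
    public static XElement ValidateAndNormalize(string xml)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(Xsd)));

        XDocument doc = XDocument.Parse(xml);

        // The final argument (addSchemaInfo) asks Validate to annotate the
        // tree with the PSVI; the handler turns any problem into an exception.
        doc.Validate(schemas,
            (sender, e) => { throw new InvalidOperationException(e.Message); },
            true);

        XElement root = doc.Root;
        if (root.GetSchemaInfo().SchemaType.TypeCode == XmlTypeCode.Double)
        {
            // Converting to double and back yields one canonical text form.
            return new XElement(root.Name, (double)root);
        }
        return new XElement(root);
    }
}
```

Passing the two documents above through ValidateAndNormalize should give two elements that XNode.DeepEquals treats as equal.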

Going further, a schema can declare default attributes and elements.  If an XML document in one case has an attribute with the default value, and in another case, the XML document is missing the default attribute, the attribute can be added when validating, and the two trees will compare as equivalent.

For illustration, consider this schema and two documents:

<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='Root'>
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base='xsd:string'>
          <xsd:attribute name='ADefaultBooleanAttribute'
                         default='false'/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Document 1:

<Root/>

Document 2:

<Root ADefaultBooleanAttribute='false'/>

The above two documents will compare as equivalent after validation and adding PSVI.
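To see the default attribute being filled in, something like the following sketch can be used. It embeds the schema above, and the class and method names are only illustrative; the key step is passing true for the addSchemaInfo parameter of Validate.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class PsviDefaults
{
    // The schema from the text, declaring an attribute with a default value.
    const string Xsd =
@"<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>
  <xsd:element name='Root'>
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base='xsd:string'>
          <xsd:attribute name='ADefaultBooleanAttribute' default='false'/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>";

    // Hypothetical helper: parses and validates a document, populating the
    // tree with the PSVI (which is what adds missing default attributes).
    public static XDocument ValidateWithPsvi(string xml)
    {
        var schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(Xsd)));

        XDocument doc = XDocument.Parse(xml);
        doc.Validate(schemas,
            (sender, e) => { throw new InvalidOperationException(e.Message); },
            true); // addSchemaInfo
        return doc;
    }
}
```

After PsviDefaults.ValidateWithPsvi("<Root/>"), the root element should carry ADefaultBooleanAttribute even though the source text did not, which is why the two documents then compare as equivalent.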

In the remainder of this post, I’ll present a list of areas where it would be possible to normalize a tree before comparison.  Then, I’ll discuss a simple implementation of normalization using LINQ to XML.

Normalization Issues

The following is a list of issues that can be addressed when normalizing an XML tree:

  • Insignificant white space should not exist in a normalized tree.
  • Namespace prefixes and the use of default namespaces should not be significant.  It is sufficient to compare qualified names while disregarding whether the namespaces are serialized by a prefix, or as the default namespace.
  • Missing default elements and attributes should be added to the XML tree when normalizing.
  • Values of elements and attributes of certain data types can be normalized.  Types that can be normalized include xsd:boolean, xsd:dateTime, xsd:decimal, xsd:double, xsd:float, xsd:hexBinary, and xsd:language.
  • The attributes xsi:schemaLocation and xsi:noNamespaceSchemaLocation exist only to give hints to a schema processor about the location of the schema.  We can discard these attributes when normalizing an XML tree.
  • We can order attributes alphabetically by namespace and name, eliminating insignificant ordering differences.
  • Comments and processing instructions are not semantically significant when comparing trees.  We can remove them when normalizing.

One important point about normalization is that you can define application-specific normalization rules.  It would be easy to modify the normalization code presented in this post to, say, look for a particular named complex type in the PSVI, and normalize that element per specific rules, perhaps converting attribute or element values to upper or lower case, or some other bespoke normalization.

LINQ to XML Implementation of Normalization

The approach that I took when normalizing is to generate a new, normalized XML tree (instead of modifying the existing tree to normalize it.)  This has the advantage that we can write this code in the pure functional style.  The resulting code will be small and easy to maintain.  In the example code that I present in this post, the pure functional transform to generate a new normalized XML tree is about 190 lines of code.  For more information on this style of cloning, see the blog post “Manually Cloning LINQ to XML Trees”.

Instead of optimizing for processing speed, I optimized for compactness and readability of code.  This is probably a good trade-off.  This code will be pretty fast in any case, and other processes in our solution, such as database access, can be multiple orders of magnitude slower, so it probably doesn’t matter if there are ways to write the code so that it will execute slightly faster.

XML Trees Must Validate before Normalization

The first and most important point is that this code relies on the XML validating successfully before normalizing.  The code allows you to specify no schema, but in this case, it only does the normalization that doesn’t require PSVI.  But if you specify a schema, and the code doesn’t throw an exception, then the document(s) were valid per the schema.

White Space

The default behavior of LINQ to XML is to discard insignificant white space when populating an XML tree.  No text nodes for insignificant white space are materialized.  This makes it easy for us – the code presented here assumes that insignificant white space is not in the XML tree.

Namespace Prefix Normalization

Because LINQ to XML has a simple model for namespaces, normalization is easy.  The full namespace is included in every XName object.  A tree can contain XAttribute objects that are namespace declarations; however, these “attributes” only have an effect upon serialization, where they determine which prefixes are written.  For the purposes of normalization, we can simply remove all XAttribute objects that are namespace declarations.  We can determine whether an attribute is a namespace declaration using the XAttribute.IsNamespaceDeclaration property.
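To make the point about prefixes concrete, here is a minimal sketch (not part of the attached code) that compares two trees that differ only in the prefix they use.  The class name, the helper name StripNamespaceDeclarations, and the http://example.com/ns namespace are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDeclarationDemo
{
    // Returns a copy of the element with every xmlns / xmlns:prefix attribute removed.
    public static XElement StripNamespaceDeclarations(XElement source)
    {
        XElement copy = new XElement(source);   // deep clone; the input is left untouched
        copy.DescendantsAndSelf()
            .Attributes()
            .Where(a => a.IsNamespaceDeclaration)
            .Remove();
        return copy;
    }

    public static void Main()
    {
        XElement a = XElement.Parse(
            "<p:root xmlns:p='http://example.com/ns'><p:child/></p:root>");
        XElement b = XElement.Parse(
            "<q:root xmlns:q='http://example.com/ns'><q:child/></q:root>");

        // The declarations are attributes with different names (xmlns:p vs. xmlns:q),
        // so the raw trees do not compare equal.
        Console.WriteLine(XNode.DeepEquals(a, b));

        // Once the declarations are gone, only the fully qualified XName values remain.
        Console.WriteLine(XNode.DeepEquals(
            StripNamespaceDeclarations(a), StripNamespaceDeclarations(b)));
    }
}
```

The first comparison should print False and the second True, because the element names carry the full namespace regardless of which prefix was used in the source text.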

Adding Default Elements and Attributes

One of the overloads of the Validate extension methods for XDocument allows us to pass a Boolean parameter that indicates that the XML tree be populated with the PSVI.  As part of this population of the PSVI, default elements and attributes will be added to the tree.  They will, then, be cloned in the new tree.
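The following sketch shows that overload in isolation.  The schema, the Root element, and the units attribute are hypothetical and exist only for this example; the Validate(XmlSchemaSet, ValidationEventHandler, bool) call is the same one that the Normalize method uses.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class DefaultAttributeDemo
{
    // Hypothetical schema: Root has an optional 'units' attribute that defaults to "metric".
    const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='Root'>
    <xs:complexType>
      <xs:attribute name='units' type='xs:string' default='metric'/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static XDocument ValidateWithPsvi(string xml)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add("", XmlReader.Create(new StringReader(Xsd)));

        XDocument doc = XDocument.Parse(xml);
        // A null handler means validation errors are thrown as exceptions;
        // 'true' asks for the tree to be annotated with the PSVI, which is
        // also what adds default attributes and elements to the tree.
        doc.Validate(schemas, null, true);
        return doc;
    }

    public static void Main()
    {
        XDocument doc = ValidateWithPsvi("<Root/>");
        // The source text had no 'units' attribute; after validation it should be present.
        Console.WriteLine((string)doc.Root.Attribute("units"));
    }
}
```

With this schema the program should print metric, even though the source document never mentioned the attribute.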

Normalizing Values of Elements and Attributes of Certain Data Types

For five data types (xsd:boolean, xsd:decimal, xsd:double, xsd:dateTime, and xsd:float), we can take advantage of one particular aspect of LINQ to XML semantics: LINQ to XML is lax when validating values that are passed to constructors, and is strict when serializing values.  For instance, if we know that an attribute is type xsd:double, we can create a new normalized attribute like this:

return new XAttribute(a.Name, (double)a);

 

And in a similar way, we can create a normalized xsd:double element like this:

return new XElement(element.Name,
    NormalizeAttributes(element, havePSVI),
    (double)element);
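As a small illustration of the lax-parse, strict-serialize behavior described above, this sketch round-trips a few lexical forms through the typed casts.  The class and method names are made up for the example, and it does not use a schema; in the real code the PSVI tells us which cast to apply.

```csharp
using System;
using System.Xml.Linq;

public static class TypedValueDemo
{
    // Parse the attribute value as xsd:double, then let LINQ to XML serialize it again.
    public static string NormalizeDouble(string lexical)
    {
        XAttribute original = new XAttribute("v", lexical);
        XAttribute normalized = new XAttribute(original.Name, (double)original);
        return normalized.Value;
    }

    // Same idea for xsd:boolean, which allows "1"/"0" as well as "true"/"false".
    public static string NormalizeBoolean(string lexical)
    {
        XAttribute original = new XAttribute("v", lexical);
        XAttribute normalized = new XAttribute(original.Name, (bool)original);
        return normalized.Value;
    }

    public static void Main()
    {
        Console.WriteLine(NormalizeDouble("1.50"));    // 1.5
        Console.WriteLine(NormalizeDouble("15E-1"));   // 1.5
        Console.WriteLine(NormalizeBoolean("1"));      // true
        Console.WriteLine(NormalizeBoolean("true"));   // true
    }
}
```

Different spellings of the same value end up with the same serialized text, which is what lets XNode.DeepEquals treat them as equal after normalization.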

 

Elements and attributes of type xsd:hexBinary and xsd:language are case insensitive.  We can normalize by converting them to lower case.

Removing Comments, Processing Instructions, and the Attributes xsi:schemaLocation and xsi:noNamespaceSchemaLocation

This is straightforward – the code trims these while cloning.

Ordering Attributes by Name

While cloning the tree, it is straightforward to order attributes by name.

Normalization Issues Not Handled

Schema Centric XML Canonicalization Version 1.0 (C14N) describes a number of other possibilities for normalization that I’ve not handled:

  • When using an XML programming interface other than LINQ to XML, there are some things we could do to normalize entities from a DTD.  However, because LINQ to XML has eliminated entity support, and all entities are expanded before the LINQ to XML tree is populated, we can disregard normalization issues with entities.
  • Data stored in base64Binary can have white space interjected at very specific points.  Normalization could include making sure that this white space conforms to the specification.  I left this as an exercise for the reader.
  • This code does no character model normalization other than any normalization done by XmlReader when deserializing the XML.
  • An XML document can contain an element or attribute that contains an XPath expression.  This expression will probably rely on the use of particular namespace prefixes.  This code does not attempt to normalize XPath expressions that use specific namespace prefixes.  If the XML that you are normalizing contains XPath expressions, it will be necessary to add in the XAttribute namespace definitions so that the XML will serialize with the correct namespace prefix.  The code presented here doesn’t deal with these issues.
  • There are rules for normalization of white space within attributes for certain data types.  This code doesn’t do any normalization of this white space.
  • The C14N document states that if a complex element contains only child elements (mixed content is not allowed), and if the element validates against a model group whose compositor is xsd:all, then the order of child elements is not significant.  It could be possible to sort them by namespace and name when normalizing.  Initially, I wrote the code to do this normalization.  However, I think it’s possible that code can implement differing behavior based on ordering, so I elected to not implement this.

About the Code

This post presents two methods: Normalize and DeepEqualsWithNormalization.

Normalize

This method creates and returns a new, cloned, normalized XDocument.

Because it relies on schema validation, it normalizes an XDocument, not an XElement.  However, it is easy to create an XDocument from an XElement if you need to normalize an XML tree rooted in an XElement.
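For example, wrapping a subtree in a document might look like the sketch below.  The element names and the ToDocument helper are hypothetical.  Note also that for validation to succeed, the schema would need to declare the subtree's root as a global element.

```csharp
using System;
using System.Xml.Linq;

public static class WrapElementDemo
{
    public static XDocument ToDocument(XElement element)
    {
        // If the element already has a parent, LINQ to XML adds a clone to the
        // new document, so the original tree is not modified.
        return new XDocument(element);
    }

    public static void Main()
    {
        XElement tree = XElement.Parse("<Orders><Order id='1'/></Orders>");
        XElement order = tree.Element("Order");

        XDocument doc = ToDocument(order);

        Console.WriteLine(doc.Root.Name);          // Order
        Console.WriteLine(order.Parent == tree);   // True: the source tree is intact
    }
}
```

The resulting XDocument can then be passed to Normalize like any other document.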

It is valid to pass null for the schema parameter, in which case the method will only do the normalizations that are possible without using PSVI.

Signature:

public static XDocument Normalize(XDocument source, XmlSchemaSet schema)

 

DeepEqualsWithNormalization

This method compares two XDocument objects after normalization.

It is valid to pass null for the schema parameter, in which case the method will only do the normalizations that are possible without using PSVI.

Signature:

public static bool DeepEqualsWithNormalization(XDocument doc1, XDocument doc2,
    XmlSchemaSet schemaSet)

 

Test Cases

In the attached code, along with the public and private methods that do the normalization, there are about a dozen test cases that give full code coverage of the normalization code.  The sample code runs each test case and reports whether the new normalized trees are equivalent.

The Code

The following listing shows the code used for XML tree normalization and comparison, including an extensions class and the Normalize and DeepEqualsWithNormalization methods:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class MyExtensions
{
    // Serializes a document with each attribute on its own line, which makes
    // normalized trees easier to compare by eye.
    public static string ToStringAlignAttributes(this XDocument document)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;
        settings.NewLineOnAttributes = true;
        StringBuilder stringBuilder = new StringBuilder();
        using (XmlWriter xmlWriter = XmlWriter.Create(stringBuilder, settings))
            document.WriteTo(xmlWriter);
        return stringBuilder.ToString();
    }
}

class Program
{
    private static class Xsi
    {
        public static XNamespace xsi = "http://www.w3.org/2001/XMLSchema-instance";

        public static XName schemaLocation = xsi + "schemaLocation";
        public static XName noNamespaceSchemaLocation = xsi + "noNamespaceSchemaLocation";
    }

    public static XDocument Normalize(XDocument source, XmlSchemaSet schema)
    {
        bool havePSVI = false;
        // validate, throw errors, add PSVI information
        if (schema != null)
        {
            source.Validate(schema, null, true);
            havePSVI = true;
        }
        return new XDocument(
            source.Declaration,
            source.Nodes().Select(n =>
            {
                // Remove comments, processing instructions, and text nodes that are
                // children of XDocument.  Only white space text nodes are allowed as
                // children of a document, so we can remove all text nodes.
                if (n is XComment || n is XProcessingInstruction || n is XText)
                    return null;
                XElement e = n as XElement;
                if (e != null)
                    return NormalizeElement(e, havePSVI);
                return n;
            }
            )
        );
    }

    public static bool DeepEqualsWithNormalization(XDocument doc1, XDocument doc2,
        XmlSchemaSet schemaSet)
    {
        XDocument d1 = Normalize(doc1, schemaSet);
        XDocument d2 = Normalize(doc2, schemaSet);
        return XNode.DeepEquals(d1, d2);
    }

    private static IEnumerable<XAttribute> NormalizeAttributes(XElement element,
        bool havePSVI)
    {
        // Drop namespace declarations and schema location hints, then sort the
        // remaining attributes by namespace and local name.
        return element.Attributes()
                .Where(a => !a.IsNamespaceDeclaration &&
                    a.Name != Xsi.schemaLocation &&
                    a.Name != Xsi.noNamespaceSchemaLocation)
                .OrderBy(a => a.Name.NamespaceName)
                .ThenBy(a => a.Name.LocalName)
                .Select(
                    a =>
                    {
                        if (havePSVI)
                        {
                            var dt = a.GetSchemaInfo().SchemaType.TypeCode;
                            switch (dt)
                            {
                                case XmlTypeCode.Boolean:
                                    return new XAttribute(a.Name, (bool)a);
                                case XmlTypeCode.DateTime:
                                    return new XAttribute(a.Name, (DateTime)a);
                                case XmlTypeCode.Decimal:
                                    return new XAttribute(a.Name, (decimal)a);
                                case XmlTypeCode.Double:
                                    return new XAttribute(a.Name, (double)a);
                                case XmlTypeCode.Float:
                                    return new XAttribute(a.Name, (float)a);
                                case XmlTypeCode.HexBinary:
                                case XmlTypeCode.Language:
                                    // ToLowerInvariant so that the result does not
                                    // depend on the current thread culture
                                    return new XAttribute(a.Name,
                                        ((string)a).ToLowerInvariant());
                            }
                        }
                        return a;
                    }
                );
    }

    private static XNode NormalizeNode(XNode node, bool havePSVI)
    {
        // trim comments and processing instructions from normalized tree
        if (node is XComment || node is XProcessingInstruction)
            return null;
        XElement e = node as XElement;
        if (e != null)
            return NormalizeElement(e, havePSVI);
        // Only thing left is XCData and XText, so clone them
        return node;
    }

    private static XElement NormalizeElement(XElement element, bool havePSVI)
    {
        if (havePSVI)
        {
            var dt = element.GetSchemaInfo();
            switch (dt.SchemaType.TypeCode)
            {
                case XmlTypeCode.Boolean:
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        (bool)element);
                case XmlTypeCode.DateTime:
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        (DateTime)element);
                case XmlTypeCode.Decimal:
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        (decimal)element);
                case XmlTypeCode.Double:
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        (double)element);
                case XmlTypeCode.Float:
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        (float)element);
                case XmlTypeCode.HexBinary:
                case XmlTypeCode.Language:
                    // ToLowerInvariant so that the result does not depend on
                    // the current thread culture
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        ((string)element).ToLowerInvariant());
                default:
                    return new XElement(element.Name,
                        NormalizeAttributes(element, havePSVI),
                        element.Nodes().Select(n => NormalizeNode(n, havePSVI)),
                        // An empty string keeps a non-empty element that has no
                        // text (such as <a></a>) from being cloned as <a/>.
                        (!element.IsEmpty && !element.Nodes().OfType<XText>().Any()) ?
                            "" : null
                    );
            }
        }
        else
        {
            return new XElement(element.Name,
                NormalizeAttributes(element, havePSVI),
                element.Nodes().Select(n => NormalizeNode(n, havePSVI)),
                // Same empty-string trick as in the PSVI branch above.
                (!element.IsEmpty && !element.Nodes().OfType<XText>().Any()) ?
                    "" : null
            );
        }
    }
}

 

Eric White
http://blogs.msdn.com/ericwhite/

 

The new XML Schema Designer (see this previous blog post for a d/l location to the CTP) offers four views over XML Schema Sets: XML Schema Explorer, Graph View, Content Model View, and Start View.  This entry will focus on the Graph View.  The Graph View is a hyperbolic 2D browser over nodes and relationships in a schema set.

Basics of the view:

Q: What’s the purpose of the Graph View?

A: To give you an overview of your schema set: the global nodes in it and how the nodes are related.  This is a 5000-foot view of a schema set and should answer questions such as: How big is the schema?  What does the type hierarchy look like?  What’s the design pattern (Russian Doll, Garden of Eden, ...)?

Q: How do you get to it?

A: You open an XSD in VS, drag nodes onto the design surface and switch the designer to the Graph View.

image

Q: Are all nodes and all relationships between nodes shown?

A: No.  In the CTP we just show Complex Types and Elements.  In current builds we show all global nodes.  For relationships we show relationships for a node’s content model up until we hit a local def. 

Q: Can you edit nodes on the design surface?

A: Not on the surface directly.  This is something the team has spent a fair bit of time on and we’re interested in your feedback:  How important is this to you?  What does it mean to you to edit a node?  For example, do you want to just change node names, or do you want to drag relationships around, change extension/restriction, etc.?  What editing features are compelling?  Note that you have tight integration with the XML Editor, so you can tile an XML Editor instance with the designer and get support for side-by-side editing, viewing, navigating in the XML Editor to specific nodes from the design surface, etc.

The view is built with WPF and has nice zoom + pan capability.  We also show the properties of nodes using the Property Window.

Here are a few screenshots:

Graph view showing a type hierarchy (all types and relationships between them):

image

Graph view showing all globals and relationships from the brainml & brainmetal namespaces (note the mix of attrgroups, types, and elements and edges between them):

 image

Shoot me an email at timlav@microsoft.com or post any comments & feedback you have here.

More on the Start View next week….

Thanks!

- Tim Laverty

PM, Data Programmability Tools

The MSXML Team is pleased to announce that MSXML 4.0 Service Pack 3 (SP3) Beta is available for public testing. This new service pack includes a number of security bug fixes as well as reliability improvements.

MSXML4 SP3 is a complete replacement of previous service packs. The new service pack includes:

• A number of security bug fixes, which provide a safer browsing experience.

• Reliability improvements.

MSXML 4 SP3 is a Download Center only release and is applicable to the following Windows operating systems:

· Windows 2000 SP4

· Windows XP SP2

· Windows XP SP3

· Windows Server 2003 SP1

· Windows Server 2003 SP2

· Windows Vista RTM

· Windows Vista SP1

· Windows Server 2008

We want to know what you think!  Please:

• Run your existing application against MSXML 4.0 Service Pack 3 (SP3) Beta to ensure that the new service pack is compatible with your application and causes no behavior change.

• Send your feedback or file bugs to our product team through the Microsoft Connect site. If you have not signed up yet, please do so - we are listed as “MSXML4 SP3 Beta” under the “Data Platform Development” site.

What you think is important to us. If you have any questions or comments, or need any further information about the new service pack, please feel free to send us your feedback.

-MSXML Team

The MSXML Team is getting ready to release the MSXML 4.0 Service Pack 3 Beta very soon!

MSXML4 SP3 is a complete replacement of previous MSXML 4.0 service packs. The new service pack includes:

- A number of security bug fixes

- Reliability improvements

The Beta will be available on Microsoft Download Center in the very near future.  The RTM release is scheduled in the next few months.

MSXML 4 SP3 is applicable to the following Windows operating systems:

•     Windows 2000 SP4

•     Windows XP SP2

•     Windows XP SP3

•     Windows Server 2003 SP1

•     Windows Server 2003 SP2

•     Windows Vista RTM

•     Windows Vista SP1

•     Windows Server 2008

What you think is important to us. If you have any questions or comments, or need any further information about the new service pack, please feel free to reply on this blog.

 

- MSXML Team

In the Orcas SP1 release we shipped our XML Schema Explorer.  In the Visual Studio 10 PDC VPC we shipped our first rev of the next set of planned features for our XML Schema Designer.  Check out the Visual Studio 2010 and .NET Framework 4.0 CTP VPC here.  Over the next few weeks I’ll be posting in more detail about the features in the CTP.

- Tim Laverty

PM, Data Programmability XML Tools

Beth Massi and Yang Xiao did a great Channel 9 session that centered on XML Literals and the XML Schema Explorer. Check the session out here. One thing that doesn’t come across in the video is that the Explorer can be used with any XSD set, not just with XSDs associated w/ your VB project. See our previous blog posts here and here on the Schema Explorer for more detail. …and thanks to Beth & Yang for doing the video!

Tim Laverty

PM, Data Programmability XML Tools
