Prettification of XML Serialization within Web Services

This came up yesterday on an internal mailing list.

A colleague remarked that following best practices in modular schema design results in bloated serialized messages, particularly because of repeated namespace attributes. The issue shows up when using .NET Web Services, on either the client or the server, in either .NET 1.x or 2.0.

Was this person right? Yes and no. This bloat can happen, but it can be avoided pretty simply in .NET if you know the knob to twist. Before I show you the knob, I want to explain the problem a bit more.

The Schema and the Code

Consider the case where there is a common schema that is used across the enterprise. As a simple example, an address schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:my-enterprise:Basics"
           elementFormDefault="qualified">

  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="Town" type="xs:string" />
      <xs:element name="Street" type="xs:string" />
      <xs:element name="Number" type="xs:int" />
    </xs:sequence>
  </xs:complexType>
</xs:schema>

If you are following a schema-first design, then you generate your data types from this XSD. Using the Xsd.exe tool in .NET, you would get this sort of class declaration:

  [System.Xml.Serialization.XmlTypeAttribute(Namespace="urn:my-enterprise:Basics")]
  public class Address
  {
    public string Town;
    public string Street;
    public int Number;
  }

Each business unit in the enterprise defines services and messages in its own namespace, building on the general definitions in the canonical schema namespace. For example, an AccountHolder element:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:my-unit:AccountTypes"
           xmlns:s1="urn:my-enterprise:Basics"
           elementFormDefault="qualified">

  <!-- the location is a hint only -->
  <xs:import namespace="urn:my-enterprise:Basics"
             schemaLocation="Basics.xsd"/>

  <xs:element name="AccountHolder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Name" type="xs:string" />
        <xs:element name="DateOfBirth" type="xs:date" />
        <xs:element name="HomeAddress" type="s1:Address" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

If you generated code for this type, you'd get something like this:

  [System.Xml.Serialization.XmlTypeAttribute(Namespace="urn:my-unit:AccountTypes")]
  [System.Xml.Serialization.XmlRootAttribute(Namespace="urn:my-unit:AccountTypes", IsNullable=false)]
  public class AccountHolder
  {
    public string Name;
    [System.Xml.Serialization.XmlElementAttribute(DataType="date")]
    public System.DateTime DateOfBirth;
    public Address HomeAddress;
  }

We know about the impedance mismatch problem between Schema and Code, but in this example, the schema is pretty simple and so the problem does not rear its ugly head. We can use the generated classes to serialize and de-serialize instances to and from XML. Cool.

Prettification during Explicit Xml Serialization

Let's say we use the AccountHolder element in a scenario where we just want to manually serialize it into XML. The XML Serialization capability that is built into .NET makes this easy. The output XML looks like this:

<AccountHolder xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:my-unit:AccountTypes">
  <Name>Smythe</Name>
  <DateOfBirth>2006-01-05</DateOfBirth>
  <HomeAddress>
    <Town xmlns="urn:my-enterprise:Basics">Willingham</Town>
    <Street xmlns="urn:my-enterprise:Basics">Ryland</Street>
    <Number xmlns="urn:my-enterprise:Basics">123</Number>
  </HomeAddress>
</AccountHolder>

The XML is well-formed and valid, and can be "de-serialized" by some other application, somewhere else, running on some other platform. Maybe a Java app using XMLBeans or JAXB, etc. The code fragment that would produce this XML is here:

  // holder is an AccountHolder instance populated elsewhere
  XmlSerializer s1 = new XmlSerializer(holder.GetType());
  s1.Serialize(System.Console.Out, holder);
  System.Console.WriteLine("\n");

But you'll notice the output XML is not the prettiest or leanest it could be. THE DREADED BLOAT. You can see the repetition of xmlns in the serialized form. The AccountHolder element and its sub-elements (including HomeAddress) use an XML namespace of "urn:my-unit:AccountTypes", but the child elements of HomeAddress are defined in the namespace "urn:my-enterprise:Basics". As a result, the serializer declares that XML namespace explicitly on each of those elements. You can also see the declarations of the xsi and xsd prefixes, which are not used anywhere in the document.

If we are flaming XML aesthetes, we can tweak the format using the .NET serialization classes to prettify it. In particular, if we pass a namespace collection to the serializer, using the Serialize() overload that accepts an XmlSerializerNamespaces, we can optimize the XML to look like this:

<AccountHolder xmlns:b="urn:my-enterprise:Basics" xmlns="urn:my-unit:AccountTypes">
  <Name>Smythe</Name>
  <DateOfBirth>2006-01-05</DateOfBirth>
  <HomeAddress>
    <b:Town>Willingham</b:Town>
    <b:Street>Ryland</b:Street>
    <b:Number>123</b:Number>
  </HomeAddress>
</AccountHolder>

Exactly equivalent, but leaner (35% fewer bytes), and easier on the eyes. It can still be validated against the original XSD, and it can still be "de-serialized" by an app running on a non-.NET platform. Of course, it goes without saying a .NET app could also de-serialize that XML. The code that produces this leaner XML is here:

  // Map the prefix "b" to the shared Basics namespace
  XmlSerializerNamespaces xmlns = new XmlSerializerNamespaces();
  xmlns.Add("b", "urn:my-enterprise:Basics");

  // Pass the prefix mappings to the overload that accepts them
  XmlSerializer s1 = new XmlSerializer(holder.GetType());
  s1.Serialize(System.Console.Out, holder, xmlns);
  System.Console.WriteLine("\n");
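If you want to try the two fragments side by side, here is a minimal, self-contained sketch. It assumes hand-simplified versions of the two types shown earlier; the NamespaceDemo class and the ToXml helper are names I made up for illustration, not anything generated by Xsd.exe. It serializes the same instance both ways, prints the character counts so you can measure the difference yourself, and then reads the lean form back in to confirm it still de-serializes.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Simplified stand-ins for the generated types shown earlier.
[XmlType(Namespace = "urn:my-enterprise:Basics")]
public class Address
{
    public string Town;
    public string Street;
    public int Number;
}

[XmlType(Namespace = "urn:my-unit:AccountTypes")]
[XmlRoot(Namespace = "urn:my-unit:AccountTypes", IsNullable = false)]
public class AccountHolder
{
    public string Name;
    [XmlElement(DataType = "date")]
    public DateTime DateOfBirth;
    public Address HomeAddress;
}

// Hypothetical demo class; the name is illustrative only.
public static class NamespaceDemo
{
    // Serializes to a string. Pass null to get the serializer's default namespace handling.
    static string ToXml(AccountHolder holder, XmlSerializerNamespaces namespaces)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(AccountHolder));
        using (StringWriter writer = new StringWriter())
        {
            if (namespaces == null)
                serializer.Serialize(writer, holder);
            else
                serializer.Serialize(writer, holder, namespaces);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        AccountHolder holder = new AccountHolder();
        holder.Name = "Smythe";
        holder.DateOfBirth = new DateTime(2006, 1, 5);
        holder.HomeAddress = new Address();
        holder.HomeAddress.Town = "Willingham";
        holder.HomeAddress.Street = "Ryland";
        holder.HomeAddress.Number = 123;

        // Prefix mapping for the shared Basics namespace.
        XmlSerializerNamespaces xmlns = new XmlSerializerNamespaces();
        xmlns.Add("b", "urn:my-enterprise:Basics");

        string bloated = ToXml(holder, null);
        string lean = ToXml(holder, xmlns);

        Console.WriteLine(lean);
        Console.WriteLine("default: {0} chars, with prefix: {1} chars",
                          bloated.Length, lean.Length);

        // Round trip: the lean form should de-serialize to the same data.
        XmlSerializer reader = new XmlSerializer(typeof(AccountHolder));
        AccountHolder copy;
        using (StringReader input = new StringReader(lean))
        {
            copy = (AccountHolder)reader.Deserialize(input);
        }
        Console.WriteLine("round-tripped town: " + copy.HomeAddress.Town);
    }
}
```

Because the sketch writes to a StringWriter, the XML declaration will name a different encoding than the console output shown above, and the counts are characters rather than bytes on the wire. Treat the numbers as a rough comparison.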

Prettification within Web Services?

Now, let's consider the scenario where we use these types within a Web Service. Maybe we are writing an ASP.NET (ASMX) Service using those types, or maybe we're writing a client that accesses an external service that uses those types. We are doing WSDL First because we are smart and like interop, so it really does not matter what platform the external service is running on, or what language it is implemented in. It's all XML.

In this case, the XML serialization is done implicitly by .NET. This is good, because it saves labor. This affects our prettification approach though, because we are no longer instantiating an XmlSerializer and explicitly calling a Serialize() method. Instead we are calling a proxy method. At the application layer, there is no XmlSerializer object exposed, and therefore the app cannot interact with it and specify which XmlSerializerNamespaces to pass in.

Which means, we are now back to the big ugly XML, instead of the lean, pretty XML. It looks like this:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <getAccountHolderResponse xmlns="urn:my-unit:AccountsServiceMessages">
      <AccountHolder xmlns="urn:my-unit:AccountTypes">
        <Name>Smythe</Name>
        <DateOfBirth>2006-01-05</DateOfBirth>
        <HomeAddress>
          <Town xmlns="urn:my-enterprise:Basics">Willingham</Town>
          <Street xmlns="urn:my-enterprise:Basics">Ryland</Street>
          <Number xmlns="urn:my-enterprise:Basics">123</Number>
        </HomeAddress>
      </AccountHolder>
    </getAccountHolderResponse>
  </soap:Body>
</soap:Envelope>

(Don't get me started on the cultural bias that says lean=pretty for people. This is a technical blog, so we'll save that for a different venue.) By the way, this goes beyond mere aesthetics. Leanness and readability can be an issue if your network is overburdened or if you are logging and auditing messages, or even tracing messages. Most real-world XML schemas are going to have much more data, more nesting, and longer namespace names. All of which means a large portion of each message, maybe 30% or more, could be spent on nothing but xmlns="...".

But wait! There's a way. By modifying the type definitions in code, we can tickle the XmlSerializerNamespaces that is used for a given type. Simply modify the generated type so it looks like this:

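  // The unqualified names below need this directive at the top of the file
  using System.Xml.Serialization;

  [System.Xml.Serialization.XmlTypeAttribute(Namespace="urn:my-enterprise:Basics")]
  public class Address
  {
    // The serializer reads prefix mappings for this type from this field
    [XmlNamespaceDeclarations]
    public XmlSerializerNamespaces namespaces;

    public Address()
    {
      // Map the prefix "b" to the namespace of this type's elements
      namespaces = new XmlSerializerNamespaces();
      namespaces.Add("b", "urn:my-enterprise:Basics");
    }

    public string Town;
    public string Street;
    public int Number;
  }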

Then, use the type as normal. The on-the-wire XML you will get will look like this:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <getAccountHolderResponse xmlns="urn:my-unit:AccountsServiceMessages">
      <AccountHolder xmlns="urn:my-unit:AccountTypes">
        <Name>Smythe</Name>
        <DateOfBirth>2006-01-05</DateOfBirth>
        <HomeAddress xmlns:b="urn:my-enterprise:Basics">
          <b:Town>Willingham</b:Town>
          <b:Street>Ryland</b:Street>
          <b:Number>123</b:Number>
        </HomeAddress>
      </AccountHolder>
    </getAccountHolderResponse>
  </soap:Body>
</soap:Envelope>

That's a 12% reduction in message size in this simple example, but the potential is much larger for more complex messages with more nesting of types.

That's all for now. Keep it lean and mean!

-Dino

[Update: as several commenters have pointed out, one can use the partial classes added in .NET 2.0 to eliminate the need to modify generated source code. A very good point - see this post for details.]
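To make the partial-class idea from the update concrete, here is a rough sketch. It assumes the generated Address class is declared partial, and it uses a simplified version of that class with public fields; real generator output may look different. The CreateNamespaces helper is a hypothetical name of my own. In a real project the two parts would live in separate files, so regenerating the first never touches the second.

```csharp
using System.Xml.Serialization;

// Part 1: stand-in for the generated code (simplified). Leave this file untouched.
[XmlType(Namespace = "urn:my-enterprise:Basics")]
public partial class Address
{
    public string Town;
    public string Street;
    public int Number;
}

// Part 2: hand-written, normally kept in its own file next to the generated one.
public partial class Address
{
    // The serializer picks up prefix mappings for this type from this field.
    [XmlNamespaceDeclarations]
    public XmlSerializerNamespaces namespaces = CreateNamespaces();

    // Hypothetical helper: builds the prefix-to-namespace mapping.
    private static XmlSerializerNamespaces CreateNamespaces()
    {
        XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
        ns.Add("b", "urn:my-enterprise:Basics");
        return ns;
    }
}
```

I used a field initializer here instead of a constructor so that the hand-written part cannot collide with a parameterless constructor that the generated part might already define.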