Welcome to The Metaverse

Navigating the service-oriented, identity aware metaverse

News

  • Disclaimer:
    The contents of this blog are my own personal opinions and do not necessarily represent Microsoft's position, commitments or strategy. In addition, my thoughts and opinions often change. As a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out-of-date posts to reflect my current thoughts and opinions.




Exchanging Data - shipping typeless containers

I have had several discussions over the last couple of weeks with various parties interested in whether or not it's wise to exchange data contained in datasets between a caller and a service.

I really dislike using datasets to pass information back and forth between a caller and a service. Why would you want to define a service's contract to say “hey, you can pass me whatever data you like and I'll return you some data that you have no idea what it is until you start to pick it apart”? That is the most non-committal form of design that there is. In my experience, the main reason that designers/developers do this is that they haven't sat down and thought through their application's design sufficiently. To me that's a warning signal that has always served me well: in almost all the applications I have come into contact with, the signal has correctly indicated a system with several difficulties.

What kinds of problems am I referring to? Here are a few that spring to mind:

  1. Correctness: In systems which predominantly pass untyped data (i.e. in Datasets):
    • The flow of information around the system is often inferred and is liable to be highly unpredictable and error prone.
    • In such systems, it is often the case that the dev team have to implement all manner of framework and substrate technologies to make sense of the non-explicit, “fluffy” nature of the application.
    • Debugging such applications is usually very complex and difficult, and the root cause usually turns out to be an incorrect understanding of, or incorrect assumptions about, the nature of a particular process or part of the app.
  2. Datasets impact performance: Datasets are not represented tersely when serialized. A dataset contains the XML schema of the data it contains along with the data itself and also serializes a bunch of internal dataset specific information. I hear that the Data team here are working on making datasets far more efficiently serializable, but I still maintain that the design issues I discussed above apply.
  3. Datasets limit interoperability: When you serialize a dataset onto the wire, the consumer (be it the service being called or the caller receiving the return of a method) must understand how to de-serialize the wire format back into something it can parse and re-construct internally. If the receiver is not working with a Microsoft dataset, then it will have to do a lot of work to tease the data out of the overall XML document from the wire.
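
To make the second point concrete, here is a minimal sketch that writes the same small dataset to XML with and without its inline schema. The table, column names, and the `DataSetWire` helper are hypothetical, and `DataSet.WriteXml` is used only as a convenient way to see the schema-plus-data shape; the exact format a given web service stack puts on the wire may differ.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper: the names and sample data are illustrative only.
public static class DataSetWire
{
    // Build a tiny dataset with one table and one row.
    public static DataSet BuildCustomers()
    {
        var ds = new DataSet("Customers");
        DataTable table = ds.Tables.Add("Customer");
        table.Columns.Add("AccountNumber", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(42, "Ada");
        return ds;
    }

    // Write the dataset as XML, either with or without its inline XSD.
    public static string ToXml(DataSet ds, XmlWriteMode mode)
    {
        using (var writer = new StringWriter())
        {
            ds.WriteXml(writer, mode);
            return writer.ToString();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        DataSet ds = DataSetWire.BuildCustomers();
        string withSchema = DataSetWire.ToXml(ds, XmlWriteMode.WriteSchema);
        string dataOnly = DataSetWire.ToXml(ds, XmlWriteMode.IgnoreSchema);

        // The schema travels with every payload when it is written inline.
        Console.WriteLine(withSchema);
        Console.WriteLine("With schema: {0} chars, data only: {1} chars",
            withSchema.Length, dataOnly.Length);
    }
}
```

Even for a single row, the inline schema is a noticeable share of the payload, and a receiver without DataSet support has to interpret that schema itself.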

A more wire-efficient and type-safe alternative is to design [Serializable] structs for your services to exchange information and to pass these serialized structs around on the wire. These structures clearly define what types of information are expected / returned from each action and vastly reduce incorrect assumptions, resulting in more reliable code that is easier to develop and maintain.
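
As a rough sketch of what such a struct might look like, the example below defines a hypothetical `Customer` type and round-trips an array of them through `XmlSerializer`. The type, its fields, and the `CustomerWire` helper are assumptions made for illustration, not something taken from a real service.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical contract type: the name and fields are illustrative only.
[Serializable]
public struct Customer
{
    public int AccountNumber;
    public string Name;
    public string Surname;
}

public static class CustomerWire
{
    // Serialize a typed array to XML using XmlSerializer.
    public static string ToXml(Customer[] customers)
    {
        var serializer = new XmlSerializer(typeof(Customer[]));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, customers);
            return writer.ToString();
        }
    }

    // Rebuild the typed array from the XML produced by ToXml.
    public static Customer[] FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Customer[]));
        using (var reader = new StringReader(xml))
        {
            return (Customer[])serializer.Deserialize(reader);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var sent = new[]
        {
            new Customer { AccountNumber = 42, Name = "Ada", Surname = "Lovelace" }
        };

        string xml = CustomerWire.ToXml(sent);
        Customer[] received = CustomerWire.FromXml(xml);

        Console.WriteLine(xml);
        Console.WriteLine(received[0].Surname);
    }
}
```

The payload contains only the fields the contract names, and both sides can see from the type alone what is expected and what is returned.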

I have also had discussions about whether it's wise to engineer services with methods such as:


void DoWork(string action, string data) 
 

As I am sure you can already guess, I dislike this approach too! While the size and interop issues have been remedied by using serializable strings, such an infrastructure is essentially laying a dumber layer of SOAP over SOAP! Do you really want to build a message dispatcher? If so, shouldn't you be implementing a SOAP message dispatcher? But then, wouldn't you be reimplementing ASMX/WSE/Indigo? Applications should use the platform on which they are built to the fullest, utilizing all the services and features the platform provides. All you'd be doing by implementing something like this is taking on the burden of implementing a message processor and dispatcher. Is that your application's role? If not, I suggest avoiding this pattern entirely.

Also, from a consumer's perspective, how do I call your actions? Where is the list of valid action types? And what about the data? What do I pass to you for any given action? It's too arbitrary and error-prone. Say what your methods do. Say what types of data you want to exchange. Let the platform you choose do the work of making those calls happen.
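
The contrast can be sketched as follows. Everything here is hypothetical: the `OrderStore` data, the `"GetOrderTotal"` action string, and both service classes exist only for illustration, and `DoWork` returns a string (unlike the `void` signature above) so that the dispatcher has something to hand back.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical in-memory data standing in for a real back end.
public static class OrderStore
{
    public static readonly Dictionary<int, decimal> Totals =
        new Dictionary<int, decimal> { { 1001, 59.90m }, { 1002, 12.50m } };
}

// The dispatcher style: one opaque entry point that the service itself
// must parse, validate, and route by hand.
public class DispatcherService
{
    public string DoWork(string action, string data)
    {
        switch (action)
        {
            case "GetOrderTotal":
                int orderId;
                if (!int.TryParse(data, NumberStyles.Integer,
                        CultureInfo.InvariantCulture, out orderId))
                {
                    throw new ArgumentException("data must be an order id", "data");
                }
                return OrderStore.Totals[orderId].ToString(CultureInfo.InvariantCulture);
            default:
                // A misspelled action is only discovered at run time.
                throw new ArgumentException("Unknown action: " + action, "action");
        }
    }
}

// The explicit style: the method name and types are the contract.
public class OrderService
{
    public decimal GetOrderTotal(int orderId)
    {
        return OrderStore.Totals[orderId];
    }
}

public static class Program
{
    public static void Main()
    {
        var dispatcher = new DispatcherService();
        Console.WriteLine(dispatcher.DoWork("GetOrderTotal", "1001"));

        var orders = new OrderService();
        Console.WriteLine(orders.GetOrderTotal(1001));

        try
        {
            dispatcher.DoWork("GetOrderTotl", "1001"); // typo compiles fine
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

With the explicit method, the list of valid actions and the shape of the data are visible in the signature, and a misspelled call fails at compile time rather than in production.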

 

Posted: Friday, July 30, 2004 12:38 PM by RichTurner666

Comments

Roaan Vos said:

Regarding the DoWork (string action, string data), how would you implement the following if the mentioned DoWork is not acceptable?
I have a "service" that accepts a document (parameters) but my service is developed in such a way that it can cope with different versions of the document i.e. it can work with old clients and new clients without the old clients having to renegotiate their contract with the service. Since you do not recommend passing a string for the data (e.g. XML document), how would one go about doing it?
PS. Isn't one of the "features" (for lack of a better word) of web services (cross platform integration via SOA) the idea that we move away from actual types? That's the idea that I (maybe incorrectly) got from a lot of Don Box's presentations and that's why there is a move towards schemas.
# July 30, 2004 1:05 PM

anon said:

Although I agree in many respects, one big drawback to the type of design espoused in this post is its rigidity over time.

What happens when version 2 of your custom data type needs to be passed to the old web service? If your service interfaces are that tightly coupled then it might be difficult--even if the web service could safely ignore the new attributes, it wouldn't get the chance because the type checking would get in the way.

Plus, it's probably worth pointing out that the appropriateness of typeless containers depends (as do so many things) on the situation. I can imagine numerous cases where the whole point of the service may be to expose amorphous information that might still have some structure of its own. In such cases, a DataSet (serialisation weight aside) might be a fine choice.
# July 30, 2004 1:11 PM

Bob Beauchemin said:

Your posting describes the DataSet correctly, but then refers to it as a typeless container.

You said: "A dataset contains the XML schema of the data it contains along with the data itself..."

This makes it a *self-describing* container, not a typeless one. Relational resultsets (which is what DataSets represent) have always been self-describing, strongly typed, "arrays of structures", if you want to think of them that way.

If you want to restrict your output to a *specific* resultset (eg, output of "select au_id .... from authors"), then include the schema for that *instance* of the resultset in the WSDL (your strongly typed structure) and return an array of them. But that is (an instance of) a DataSet. I've posted about this in "About Web Services and Schema + Any".
# July 30, 2004 2:04 PM

Fumiaki Yoshimatsu said:

Share schema, not "your type". That is the tenet, and DataSet does exactly that (though when serialized, it is schema-invalid because of diffgrams). You must always understand how to deserialize what was sent, if you ever want to deserialize it. In many situations we don't want to deserialize XML (or deserialize it the way you want me to), so that is not a "DataSet is not interoperable" kind of problem. DataSet is useful in many situations, including cross-tier data transfer, because of its "untypedness", just as an untyped language is useful in many situations. Note that it is untyped for typed-heads. For others, it is not untyped, just as Bob told us above.
# July 30, 2004 3:38 PM

Rich Turner said:

Roaan - don't confuse type within your development toolset with type on the wire. Yes, we are moving to a world where we share schema and not types. Types are how you represent a data entity within your applications. Schemas define what data looks like on the wire. If you need to represent a document on the wire, you could either define a document schema and generate the necessary .NET classes from the XSD, or you could declare the type as an XmlDocument etc. rather than as a Dataset or string.

Anon - your questions are related to versioning. While I won't be able to do this subject justice right now, I will post a more detailed response at another time. However, one needs to recognise that if you update a published schema and add new mandatory elements, then older code will not be able to generate/consume this new contract without modification. If you update an existing contract with optional elements, then your legacy apps and services will usually ignore/discard these additional elements. Note, however, that if the app needs these elements to be processed / operated upon, then you will again need to update the code. A strategy to allow more flexibility would be to design your schema to include optional fields or open content types. More on this in another post.

Bob - I call the dataset "typeless" because without parsing the message payload, I can't easily determine the thing that the payload contains. Imagine parsing the wire-structure of an ADO Dataset on a non-Microsoft platform.

The XSD included inside a dataset specifically describes the information that the dataset contains which may be a superset of what the calling code was expecting. If you know you're returning a list of customers, then create a customer struct and return an array of them rather than returning a dataset that could contain completely irrelevant information.

If you do return strongly typed data, there's little chance your methods will be invoked unless the wire payload matches the schema you built against.

I understand that sometimes apps need to exchange arbitrary information such as a Word document, but I prefer to create business data schemas and corresponding structures rather than just pass strings or datasets.
# July 30, 2004 4:30 PM

Bob Beauchemin said:

You can't tell (easily or otherwise) what *any* XML data in any payload contains unless you either:
1. Have a schema for it that arrives at design time
2. Have a schema for it that arrives at execution time
3. Code intimate knowledge of it into every program that uses it.

With DataSet you have one more option (#2) than with any other way. And you can accomplish 1 and 3 with relative ease.

There's very little irrelevant information in the DataSet (although it would be nice to have a version that replaces isDataSet="true" with an xsi:type hint) unless the end user either doesn't understand the schema (the ms: specific annotation schema is posted now) or chooses to map it to a less accurate representation than the relational resultset that it represents. If you choose the latter, either because your platform doesn't have such a representation (and you don't build one) or because you've removed information in the sending, you'll have to code the logic into each end-user program (both sides, if you map it to serialized XML) yourself. It's a representation of implied semantics within the data itself.
But it would also be nice if the DataSet had a "bare mode" for this purpose. A "WSDLSIMPLE" mode, if you would.
# July 30, 2004 6:08 PM

Roaan Vos said:

Rich - That is exactly my point. The DoWork is the method that retrieves (gets) that data on the wire. If we use the (very useful) web services features built into .NET, the framework does all the type "creation/conversion" from the schema for us, calls the applicable method, but that means we can't handle (for example) the different versions of the document we want to handle (my and anon's issue).
Looking forward to your post on "...A strategy to allow more flexibility..." (as replied to anon) that does not have a DoWork (string method, string data, string schema) method.
# July 31, 2004 2:15 AM

drebin said:

I had flashbacks of making everything a "Variant" in the old VB days.. :-(

A loosely-typed system is a system INVITING bugs and unexpected behaviour.

Loosely-typed=bad
# July 31, 2004 7:47 AM

John Cavnar-Johnson said:

I see that drebin has boiled down your post to its real content: loosely-typed = bad. I disagree with that sentiment. Let’s take a look at your rhetorical question: “Why would you want to define a service's contract to say hey, you can pass me whatever data you like and I'll return you some data that you have no idea what it is until you start to pick it apart”. First, that’s a blatant mischaracterization of what it means to define a service contract that accepts or returns a dataset. As Bob Beauchemin has pointed out, a dataset is a self-describing container, not a typeless container. A dataset is not a variant. A fairer description of a service method that accepts a dataset and returns a dataset would be: pass me a self-describing data container and I’ll return a self-describing data container to you. There is a huge semantic difference there and I suspect that the reason you can’t see that is that you come from the perspective of a statically typed language. You only see the untyped parameters. You trot out all the same arguments that the advocates of static typing use: dynamic typing is for people who can’t think clearly, dynamic typing leads to bad performance, static typing guarantees a basic level of correctness.

The world is a dynamic place and our systems need to be flexible and forgiving. Meditate for a moment on the meaning of the word: service. It means work done for others, as a servant does. The others, in this case, are not the units of code that call the service, but the people who use the system. They are the ones who determine the requirements. If they are best served by a dynamic and flexible system, then that’s what we should build. Advocates of strictly enforced static typing always mistake brittleness for reliability. Static typing just guarantees that the slightest change in the data formats will break the system. It does relieve the programmer of a small burden, but it comes at a price. Whether that price is significant depends entirely on the system requirements.

You make some specific accusations against systems that pass datasets. First, you say that information flow is often inferred, and therefore highly unpredictable and error prone. Oddly enough, that sounds like a pretty accurate description of most business processes. Perhaps these systems are just accurately modeling the processes they support. If that’s true, how is your approach more “correct”? I’m not saying that the dataset approach is always right, just that it can sometimes be the best approach (you’re making the broader claim, that it’s the wrong approach). In the cases where it’s the right approach, re-architecting the system in a more statically typed manner simply results in a system that fails to meet the needs of its users.

Next, you say that developers have to implement a lot of code to deal with the system’s dynamic nature (which you describe as “fluffy”). This is quite true, but my response is, so what? If that code meets a real requirement of the system, why is it automatically suspect in your mind? There’s certainly less code in DOS 6.0 than in Windows Server 2003, but I wouldn’t assert that DOS is a better OS. As for your comments on debugging, they don’t match with my experience. I’ve seen just as many problems caused by developers who believed that static typing ensured the correctness of their design, when all it really did was get the compiler to shut up.

The performance of datasets (especially large ones) is problematic, but the performance of any large structure serialized to an XML DOM is going to be an issue in a performance-bound application. Fortunately, most applications aren’t performance-bound in this manner. And your comments about the interoperability of datasets are quite laughable, given that your suggested alternative is to use serialized structures. .NET structures serialized via the XML Serializer are just as limited in their interoperability as datasets.

I’m not arguing that dynamically typed systems are always better, just that by dismissing them out of hand, you’re limiting the quality of your solutions. Ultimately, the dynamic vs. static typing argument is another artichokes vs. roses argument. That’s my term for all the “mine is better than yours” junk that developers like to argue about (Windows vs. Linux, J2EE vs. .NET, Remoting vs. Web Services, VB vs. C#, Python vs. Perl, ad infinitum).
# July 31, 2004 6:03 PM

Scott Stewart said:

The shape of a message
# August 6, 2004 12:51 AM

Brendan Tompkins said:

Good SOA and Indigo Blog
# August 20, 2004 9:52 AM

Raph said:

Gosh!

Why are we talking data here? We should be passing interfaces which define the encapsulated actions to take. Then the backing structure is, frankly, less relevant. Passing lots of XML around as a "magic container" is bad, whether that's a data set or a command as XML. This can ease the deployment burden, too.

A good service design will not require re-architecting. It may require a little incremental change as new requirements are realised. So be it. IT systems evolve to match business need, and no one individual or team can determine their shape or direction. And a SOA does NOT need to be SOAP or web services, although they may help.

To conclude, though, I'd be very concerned about any remote interface which is exposed in terms of a "command action" or works with a "magic container" which is data-centric. (One exception, though - the UnitOfWork pattern). However, web services are very data-centric - they excel at passing XML documents. But as an RPC? No - use XML-RPC for that...

Just to whet your appetites, we're using ContextBoundObjects to implement AOP with .NET. This allows us to have cross-cutting logging and security code on our services in one place...
# August 24, 2004 1:04 AM

Rich Turner said:

I don't follow what you mean by "we should be passing interfaces". Are you proposing that I effectively receive back a pointer or object reference to an object at the server which I have to call to retrieve the name, surname, account number, etc.? If so, this is distributed system suicide. The performance of such systems is dire.

We're talking data because we're discussing how to send/return data to/from a service most effectively. For example, if I ask a service for a list of customer account details, I should expect back a collection of data that represents the requested information. Should this data come back as a wire-represented struct that can be validated by an agreed schema (note - I haven't mandated XML here although it's an obvious candidate), or should the data be returned as a serialized representation of a platform specific runtime object? The point of this blog entry was to propose that the former should be the default approach for the majority of scenarios.

When it comes to versioning, sure, some internal or small incremental interface changes can often be made to an existing service, but sometimes radical change is needed, necessitating new interfaces and therefore new services. Some propose that the way to eliminate such interface change is to publish services with few methods such as "string DoWork(string request)". The problem with such approaches is that the developer then has a heck of a time coding against this service (just how do I format the request? how do I decode the responses? how do I implement the state machine that executes incoming requests?), and over time such services mutate into horrible, horrible monsters.

And the same can be true of passing other forms of typeless container - such as DataSets, Variants, etc.
# August 24, 2004 12:45 PM