Tim Mallalieu's Blog.

Just a PM's random musings on data, models, services...

Thoughts on data in a world of services and process...



What does the data story look like in a service oriented world? This is something I have been worrying about for some time. In fact, I would go as far as to say it keeps me up at night. The reason I moved from the MBF team to the Data Programmability Team at Microsoft was so that I could explicitly spend my time worrying about this problem. The move to service orientation raises a number of questions, some of which have been answered and some of which are left somewhat open ended. We of course have the four tenets of SO:

• Boundaries are explicit
• Services are autonomous
• Share schema and contract, not class
• Compatibility based on policy

From a Microsoft perspective, the Indigo team has been a thought leader and an implementation leader in driving infrastructure capabilities consistent with the four tenets. Beyond the fundamental notion of service orientation we now have hype and substance around application and enterprise architectures that compose on top of the fundamental services infrastructure. We have the notion of an Enterprise Service Bus, the notion of process oriented applications, composite applications, event driven architecture and more. What I spend my time thinking about, however, is the way that we look at data in a service oriented world. Here are some random musings about what I think data looks like in that world…

Qualities of data in a world of services and process

Let’s think about data a bit. Pat Helland wrote a paper some time ago entitled Data on the Outside vs. Data on the Inside. Maarten Mullender wrote an article entitled CRUD, only when you can afford it. The analysts write about creating autonomous services in one article while talking about data virtualization and entity aggregation in others. So there really seems to be a heck of a lot of thought around data, but have we done anything as a company to build a unified solution that really helps customers with data in a service and process oriented world? In a naïve way one could say that all we need to do is to listen to customers and evolve ADO.NET accordingly. But is this sufficient? It could be, if we make the right investments and innovations and listen not only to customers but also to our existing product groups and the market, so that we can build a roadmap for what data access will look like 3, 5, 10 years from now. Looking into my crystal ball, based on my interpretation of customer, product team and market trend information, I think that the qualities of data to come are along the following lines:

1: Conceptual Models that are store and location agnostic

So you have a number of autonomous systems and you want to provision processes and applications across autonomous systems. You want to define a common set of business entities that can participate in the processes and be consumed by the applications. How do you do this? Today we talk about canonical schemas and entity aggregation. It would be nice, however, to be able to have a notion of a conceptual model that is a first class construct in our data programming infrastructure and which is backed by metadata about any items defined in the model so that processes and applications can reason about the items. It would also be nice if this conceptual model was sufficiently expressive to be able to describe data from the “common” persistence models (relational, nested relational, xml) and could indeed describe a model which was the aggregate of concerns in multiple persistent forms.

Being able to define such a model and being able to reason about items in terms of the model is a good first step. Having this model surfaced in ADO.NET is a great second step. The next concern with the model would be location independence. I should be able to define a model within an autonomous system and have an external model which is meaningful for one or more processes that span systems. The external model can be a trivial projection of the internal model or may have a rich transformation (mapping).

Finally the simple existence of a model should facilitate a set of services related to the model such as the ability to reason about the model and its items and query and interact with instances of items defined in the model. The location and specifics about the underlying data source should be a separate concern from the model. Of course, there should be ways to impose constraints on data so that service boundary and location specific semantics related to the data may be surfaced but these decorations and services are orthogonal to the existence of the model.
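To make the separation between the model and the underlying source a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (`EntityType`, `Mapping`, the field names, the two sample systems); it is not an ADO.NET API. It only shows one conceptual entity described by metadata, with a separate mapping per source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """Metadata for one item in the conceptual model (hypothetical)."""
    name: str
    fields: tuple  # conceptual field names, independent of any store


@dataclass(frozen=True)
class Mapping:
    """Maps conceptual field names to the names one particular source uses."""
    entity: EntityType
    source_fields: dict  # conceptual field name -> source field name

    def to_conceptual(self, record: dict) -> dict:
        # Keep only the fields the model defines, under the model's names.
        return {f: record[self.source_fields[f]] for f in self.entity.fields}


CUSTOMER = EntityType("Customer", ("id", "name", "email"))

# Two autonomous systems expose the same concept with different shapes.
crm_mapping = Mapping(
    CUSTOMER, {"id": "cust_id", "name": "full_name", "email": "mail"}
)
billing_mapping = Mapping(
    CUSTOMER, {"id": "AccountNo", "name": "AccountName", "email": "ContactEmail"}
)

crm_row = {"cust_id": 7, "full_name": "Ada Lovelace", "mail": "ada@example.com"}
billing_row = {
    "AccountNo": 7,
    "AccountName": "Ada Lovelace",
    "ContactEmail": "ada@example.com",
    "Balance": 12.5,  # source-specific detail, not part of the model
}

from_crm = crm_mapping.to_conceptual(crm_row)
from_billing = billing_mapping.to_conceptual(billing_row)
```

A process that spans both systems can reason about `CUSTOMER.fields` without knowing which source a given instance came from; the mapping is where the source specifics live.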

2: Late bound aggregations of state with external behavior

Consider composite applications and business processes again. The data that exists in the process models and applications of the future will be described declaratively and composed contextually. The shape of a customer entity may include all of its order history in one context and just the contact information in another. These shapes of the customer entity are contextual projections of the conceptual model. The mechanisms that surface this data in a process step or in a client UI will have some context and some model (including declarative rules and intent) but are not likely to be tightly coupled or early bound objects. Indeed, even though there will be a notion of type (such as customer), the shape of the type is contextual and late bound, so CLR types as they exist today are not a good fit for encapsulating this kind of state.
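As a rough illustration of a contextual projection, here is a small Python sketch. The context names, field names and the `project` helper are all assumptions made up for the example. The point is only that the shape is chosen at run time from a declarative description rather than fixed in a class.

```python
# The conceptual model for a customer: the full set of known fields (assumed).
CUSTOMER_MODEL = {"id", "name", "email", "phone", "orders"}

# Declarative description of the shape each context needs (hypothetical names).
CONTEXTS = {
    "contact_card": ("id", "name", "email", "phone"),
    "order_history": ("id", "name", "orders"),
}


def project(instance: dict, context: str) -> dict:
    """Return the late bound shape of `instance` for the given context."""
    fields = CONTEXTS[context]
    unknown = set(fields) - CUSTOMER_MODEL
    if unknown:
        raise ValueError(f"context {context!r} uses unknown fields: {unknown}")
    return {f: instance[f] for f in fields}


customer = {
    "id": 7,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "phone": "555-0100",
    "orders": [{"order_id": 1, "amount": 120}],
}

card = project(customer, "contact_card")      # no order history here
history = project(customer, "order_history")  # no contact details here
```

Both results are still "a customer" in terms of the model, yet neither corresponds to a single early bound class.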

A related point, and one that I struggled with originally as a former OO bigot, is that apps of tomorrow will express functional intent on data and will not be modeled as objects with state and behavior. The coupling of state and behavior as in OOAD does not lend itself well today to this model of late bound, location independent aggregations of state. Instead, intent is expressed at stages of processes (be this a UI process or a workflow that is triggered by a state change in the model). The intent that is expressed may act upon local data or may send a message for fulfillment elsewhere. Note: I am not saying that OO is dead. Within service boundaries or simple client applications OOAD will still have a role; it is really in this world of processes and services that some of the OO factoring does not provide the best fit.

3: In memory aggregation and query processing of data

So now we have conceptual models, we have metadata to reason about items in a model and let us assume we can realize the retrieval of data from stores and services. The next step is to have a first class disconnected model in which this data is aggregated and where one can express queries about this data. It would be nice also if the queries could be expressed in declarative and imperative manners. Of course the ability to aggregate and query in memory is only the first step. One would likely want to be able to persist this data locally; it would also be nice to have collaboration schemes with peer-wise replication and synchronization of data state based on policy and process.
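Here is a hedged sketch of what in-memory aggregation with both query styles might look like, again in Python and again with invented names and data (`aggregate`, `query`, `big_spenders`, the sample rows). It assumes the customers came from a store and the orders from a service, and that both have already been retrieved.

```python
# Data already retrieved from two different places (sample values are made up).
customers = [
    {"id": 1, "name": "Ada", "region": "EU"},
    {"id": 2, "name": "Grace", "region": "US"},
]
orders = [
    {"customer_id": 1, "amount": 120},
    {"customer_id": 1, "amount": 30},
    {"customer_id": 2, "amount": 40},
]


def aggregate(customers, orders):
    """Combine the two disconnected sets into one in-memory shape."""
    by_customer = {}
    for order in orders:
        by_customer.setdefault(order["customer_id"], []).append(order)
    return [{**c, "orders": by_customer.get(c["id"], [])} for c in customers]


def query(rows, where):
    """Declarative style: the caller states the condition as data."""
    return [r for r in rows if all(r.get(k) == v for k, v in where.items())]


def big_spenders(rows, threshold):
    """Imperative style: the caller spells out how to compute the answer."""
    names = []
    for row in rows:
        total = 0
        for order in row["orders"]:
            total += order["amount"]
        if total > threshold:
            names.append(row["name"])
    return names


rows = aggregate(customers, orders)
eu_customers = query(rows, {"region": "EU"})
spenders = big_spenders(rows, 100)
```

Local persistence and peer-wise synchronization are deliberately left out; this only covers the aggregate-and-query step.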

4: Fundamental data semantics

Conceptual models and aggregated data services are all fine but we still have to deal with explicit data semantics based upon the nature of the data that one is dealing with. If we can decorate data in a manner that expresses what kind of data it is, we can determine what kinds of services are available for it. Here are some thoughts as to the types of decorations that would be meaningful:

- Authoritative data

Data which one is the authoritative owner of.

- Reference data

Data which one retrieves from elsewhere and which can only be used for reference purposes.

- Transient data

Data which will not be persisted in any form but can be used in the context of an activity.

- Transactional data

Data which participates in some form of OLTP activity.

- Message

Data which has been received in the form of a message. Reference and transactional data will often first arrive as messages, which may then be transformed into some usable form (either some logical structure or actual objects).
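One way to picture the decorations above is as a tag on the data that decides which services apply to it. The Python sketch below is only that, a picture: the `DataKind` names follow the list above, but the service names ("read", "update", "persist", "transform") and the table of which kind allows which service are my assumptions, not a defined platform feature.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DataKind(Enum):
    AUTHORITATIVE = auto()
    REFERENCE = auto()
    TRANSIENT = auto()
    TRANSACTIONAL = auto()
    MESSAGE = auto()


# Assumed mapping from decoration to available services (illustrative only).
SERVICES = {
    DataKind.AUTHORITATIVE: {"read", "update", "persist"},
    DataKind.REFERENCE: {"read"},
    DataKind.TRANSIENT: {"read", "update"},
    DataKind.TRANSACTIONAL: {"read", "update", "persist"},
    DataKind.MESSAGE: {"read", "transform"},
}


@dataclass
class Decorated:
    """A payload plus the decoration that says what kind of data it is."""
    kind: DataKind
    payload: dict


def allowed(item: Decorated, service: str) -> bool:
    return service in SERVICES[item.kind]


def update(item: Decorated, changes: dict) -> Decorated:
    """Apply changes only if the decoration permits updates."""
    if not allowed(item, "update"):
        raise PermissionError(f"{item.kind.name} data cannot be updated")
    return Decorated(item.kind, {**item.payload, **changes})


price_list = Decorated(DataKind.REFERENCE, {"sku": "A1", "price": 10})
draft = Decorated(DataKind.TRANSIENT, {"sku": "A1", "qty": 1})

draft = update(draft, {"qty": 2})  # fine: transient data may change in an activity
try:
    update(price_list, {"price": 1})
    reference_update_rejected = False
except PermissionError:
    reference_update_rejected = True
```

The interesting part is that the check lives outside the data itself, so the same payload can carry a different decoration on the other side of a service boundary.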

5: Evolutionary

If someone were to build a data platform where one could realize these qualities of data it would be ideal if the abstractions that a developer used to interact with this data were an evolutionary step on the current data access stack (ADO.NET). We would want developers to be able to opt into the new services that are provided while being able to preserve existing investments and application portfolios.

These thoughts are just half-baked opinions on what the qualities of a data platform are. Clearly there needs to be more thinking and interaction with customers and product teams but at first glance these seem like compelling qualities to consider. But then again these are just random musings from a PM.

