What is the correct level of abstraction for the Enterprise Canonical Data Model (ECDM)?
As I blogged before, the ECDM is used to decide what data should be passed through the integration infrastructure in the notifications that occur on business events. The canonical schemas that define "things" are all subsets of the ECDM (or extensions, as we will see).
In some organizations, there are fairly few variations in basic 'things' like order, product, and agreement. In other organizations, including Microsoft, the need for independent variation is more apparent. As we move more toward "Software as a service," the number and types of products will only grow. And what exactly is an order if we are using click-stream billing for a service call? This will be fun. So we need lots of flexibility as the business grows and changes. An ECDM that is too prescriptive or too large can end up constraining the business' ability to grow and change.
There are basically two types of messages that need to rely on the ECDM: event notifications and full data entities. Both are transitory, in that they state a fact at a particular point in time, but the event notifications are more transitory because they are only sent once across the infrastructure. We need to be able to replay them, but (with the exception of BAM), we don't often query them.
In general, I'd say the rule for event notifications should be:
Communicate sparingly, communicate clearly, allow for questions.
Communicate sparingly: Define your entities to the minimum level needed to share "concepts" and "relationships" across the enterprise. If an order happens from "company ABC" for 10,000 licenses of "product XSP" under marketing program "VLR", then the canonical schema for that order needs to be pretty short, and the event notification even shorter, so that receiving systems can decide if they even care. Remember that your event system will send a LOT of events. Keep them small but provide enough information for the recipient to decide if they need to know more. So, perhaps the "order placed" notification has things like order id, customer id, partner id, reseller id, program id (sales are made under marketing programs) and a list of product categories that the items in the order represent. That's it. The receiving system can decide if they need to know more.
Communicate Clearly: The IDs must be generic and enterprise-wide. If a receiving system gets a notification or a canonical element (like the full order), it has to be able to interpret it consistently. That means that the systems listening for the events have to know what the IDs mean and how to get more information on an ID if they don't already have it.
Allow for Questions: The infrastructure needs to provide a generic way to ask the question: "I need to know more about order 1234 for customer ABC on program VLC."
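To make the three rules concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `OrderPlaced` fields simply mirror the list given above, and `EntityLookup` is a hypothetical stand-in for whatever generic "ask a question" facility the infrastructure provides. It is not a real API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical 'order placed' notification: enterprise-wide IDs only."""
    order_id: str
    customer_id: str
    partner_id: str
    reseller_id: str
    program_id: str
    product_categories: Tuple[str, ...]


class EntityLookup:
    """Hypothetical generic query facility, keyed by entity type and ID."""

    def __init__(self) -> None:
        self._resolvers: Dict[str, Callable[[str], dict]] = {}

    def register(self, entity_type: str, resolver: Callable[[str], dict]) -> None:
        # A source system registers how to answer questions about its entities.
        self._resolvers[entity_type] = resolver

    def ask(self, entity_type: str, entity_id: str) -> dict:
        # Any receiver can ask for more detail using only the shared ID.
        return self._resolvers[entity_type](entity_id)


def handle(event: OrderPlaced, lookup: EntityLookup,
           interesting: Set[str]) -> Optional[dict]:
    """Decide from the small notification alone whether to ask for more."""
    if interesting.isdisjoint(event.product_categories):
        return None  # most receivers stop here
    return lookup.ask("order", event.order_id)
```

The point of the design is that the notification carries just enough for `handle` to decide; the full order only crosses the wire when a receiver actually asks for it.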
So if the needs of the event notification are for brevity and consistency, what are the needs for full data entities?
When a system gets an event notification, it will look at the event and decide if it cares. Most of the time, it won't, and our use case ends. Sometimes it will. When it does, it needs to ask for full details of that data entity. Perhaps it wants to store data. Perhaps it wants to calculate something to append to the records for the customer, the partner, the reseller, the sales team that made the sale, or the product group that made the product. There are lots of reasons why the system getting the message will need more data. We have the ability to 'ask questions' listed above, and that ability applies to full data entities as well.
I'd say the rule for full data entities is:
Provide a complete document, at a point in time, allow for questions
Provide a complete document - the full data entity contains all of the data that the source system can share about it, including denormalized details about related entities. For example, if I get an order as stated above, for 10,000 licenses for product XSP, we would provide the full "legal name" for the product and some attributes for the product (like the fact that it is a license, what country it is sold in, languages, product family id, etc). On the other hand, we don't want to constrain the business, so allow for optional fields in the semantics of the canonical object. Allow a system that doesn't have a data element (like a price or even a quantity) to send the order anyway. Also allow the system that is sending data to append 'system specific' data elements. That way, a team can use the canonical model to send data to another closely related system in the same business stream, where those 'system specific details' can be understood and used.
At a point in time - Recognize that your documents are not static. Provide dates and version numbers for each and every document and allow a document to be called back up on the basis of those dates and version numbers. This is key to being able to recreate a data stream later in time, an operational necessity that is often overlooked. So, yes, your order has a version number.
Allow for questions: As complete as your order document is, it will still need to have codes in it referring to other things. For example, each product may have a product family. By including the product family code, you are stating this: "At the time this order was placed, product 'Sharepoint' was part of the 'Office Family' of products." For some products, this may not change much, but for others, it could. So you include the product family code, but there is no need to include attributes of the product family. The receiving system can ask the same infrastructure for product family details if it needs to follow up.
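A rough sketch of what such a document could look like, again in Python and again with invented names: `OrderLine`, `OrderDocument`, and `DocumentStore` are hypothetical, and the in-memory dictionary only stands in for whatever storage would really hold the versions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass
class OrderLine:
    product_id: str
    product_legal_name: str          # denormalized detail about the product
    product_family_code: str         # code only; family attributes are asked for separately
    quantity: Optional[int] = None   # optional: a source system may not have it
    price: Optional[float] = None    # optional for the same reason


@dataclass
class OrderDocument:
    order_id: str
    version: int                     # every document carries a version number
    as_of: date                      # and the date at which it was true
    lines: Tuple[OrderLine, ...]
    extensions: Dict[str, str] = field(default_factory=dict)  # 'system specific' data


class DocumentStore:
    """Hypothetical store that can call a document back up by version or date."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, int], OrderDocument] = {}

    def save(self, doc: OrderDocument) -> None:
        self._docs[(doc.order_id, doc.version)] = doc

    def get(self, order_id: str, version: int) -> OrderDocument:
        return self._docs[(order_id, version)]

    def as_of(self, order_id: str, when: date) -> Optional[OrderDocument]:
        # Latest version dated on or before `when`, or None if there is none.
        candidates = [d for (oid, _), d in self._docs.items()
                      if oid == order_id and d.as_of <= when]
        return max(candidates, key=lambda d: d.version, default=None)
```

Keeping every version, rather than overwriting, is what makes it possible to recreate the data stream later.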
Hopefully, with these simple guidelines, we can build the ECDM at the right level of abstraction.
From your perspective, would you define the canonical data model at the SOAP/XSD level (as in service contract) or are you talking full-blown enterprise data model as in having a single data store (ala SQL/MDM)?
The enterprise canonical data model is a structure that looks (from a distance, if you squint) like a database model. It has a lot of entities in it. It has all the basic entities that the business uses to do what they need to do. So, in that respect, it is like a single data store...
Only you don't store data in it.
You use it as a model to make sure that everyone agrees about what the entities mean and how they relate.
The enterprise model doesn't contain enough data or enough detail to form the Enterprise Data Warehouse. It is not a model for BI, although it is closely related because the data warehouse will need to use the same 'agreement' formed by this model to understand the data delivered by the source systems.
In fact, if done right, the ECDM will make the development of an Enterprise Data Warehouse MUCH easier because it creates a consistent context for the data in every major system. If data needs to be translated when it leaves a system in order to be part of the canonical model, it will need a very similar translation to be part of the data warehouse.
hope that helps
Your post provides a good heuristic to determine notification granularity first of all. And tying it to a CDM would ensure every provider has a consistent vocabulary to deal with data elements. Neat post, thanks!
Could you elaborate a little on the CDM itself? Is this an entity archetype that is communicated across the organization and fidelity to it enforced via best practice and governance? Where does one start on the CDM, if there are a hundred assorted data models, all loosely representing similar things? How does one version and manage this? Are there organizations that do this?! Any best practices you could share or point out?
You have asked a good question. I'm not sure I have a 'right' answer. I will say this: the ECDM cannot be formed centrally and communicated outward. It has to be formed at the edges and communicated in.
Where one starts is with the business. What does the business understand about their data? If you don't start with the conceptual data model, the canonical data model is unattainable or fictitious.
Normally the business will use different terms for the same entities. Most of the problem of developing the ECDM is in dealing with people, not technology.
That makes sense. It seems like an attempt to bring together a common "contract" for, say, 10k services in a large enterprise.
Do you consider the ECDM to be some form of executable contract (à la SOAP/XML, etc.) or more an attempt at establishing a set of guidance and documentation across a large set of services?
Do I consider the ECDM to be an executable contract? Part of one, yes. There is the behavior of the service as part of a message exchange pattern that is entirely outside the ECDM, and then there are the data elements that I pass that are formed from entries within the ECDM.
I view it as more than guidance. It is a consensus, hopefully. An agreement for excellence.
Business objects are the essential data elements that constitute the majority, if not the entirety, of the information which circulates between IT systems.
It seems only logical that the basis for the analysis of an information flow’s structure in an execution contract be based upon these business objects.
The canonical data model should "ideally" be derived from an enterprise-wide business object model (also known as a Business Data Model or Semantic Model, among others).
Briefly on this, an enterprise-wide model contains a detailed, yet high-level view (i.e., without system datatypes!) of the objects manipulated by the organization, the relationships between these objects and possibly the objects' lifecycle.
Being derived from the generic enterprise business object model, the canonical data model becomes itself a generic data model that is conventionally used by mediation (or integration) platforms such as EAI or ESB platforms.
In opposition to application or repository/data warehouse data models that are very often specific to the software package and may only be required to hold a limited amount of a business' data (objects & attributes), another goal of the canonical data model is to be an application-independent model or PIM (Platform Independent Model: see MDA principles by OMG).
It is for this reason that canonical data models are preferred by mediation platforms, as they provide the following:
- the basis for construction of the canonical data types for business objects (made up of one or many business objects)
- the basis for the standardisation of information flows that use the above mentioned business objects
- an independent format to which the mapping of specific information flow data models can be performed
- an adherence to the recommended hub-and-spoke integration model
In SOA architectures the canonical data model, and in particular the canonical data types, can be used for defining the parameters that make up a service’s signature (transmission of objects as parameters rather than simply a list of values). Indeed, the definition of the different parameters (data & data types), for all request & response messages, can refer to external XSD schemas which represent the complex canonical data types. It will be up to the services, then, to pick from the full set of data as is required to perform their processing.
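As a small illustration of passing a canonical object as a parameter rather than a list of values, here is a sketch in Python. The `CanonicalCustomer` type and both service functions are hypothetical; in practice the type would be generated from, or validated against, the shared XSD schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CanonicalCustomer:
    """Hypothetical canonical data type shared by every service signature."""
    customer_id: str
    legal_name: Optional[str] = None
    country: Optional[str] = None
    credit_limit: Optional[float] = None


def check_credit(customer: CanonicalCustomer, order_total: float) -> bool:
    # This service picks only the credit limit out of the full object.
    if customer.credit_limit is None:
        return False
    return order_total <= customer.credit_limit


def shipping_region(customer: CanonicalCustomer) -> str:
    # This one picks only the country and ignores everything else.
    return customer.country or "UNKNOWN"
```

Both services accept the same type, so a change to the canonical customer is made in one place and each service continues to read only the fields it needs.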
I think that I may have covered a lot of subjects here, each of which could be the subject of one or several articles, but I hope that this may have helped a little to better understand the benefits of the ECDM, both as a reference structure and as an executable contract in today's information systems.
You use the term Business Object Model. I use the term Conceptual Data Model. We mean the same thing.
Thanks for your contribution.
I would consider the Business Object Model (or Semantic Model) as a higher level of concern than that of the Conceptual Data Model, if only for the sake of taking a step back from IT.
Indeed, what the IT industry calls a Conceptual Data Model is a data model that has a pre-defined scope. The scope, here, is generally that of a target IT solution that a Software Architect is drafting up. In past CASE methodologies, and even in those of today, it remains the role of the Software Architect/Engineer to build the high-level conceptual data model based upon the business' requirements for the target IT solution.
A Business Object Model is one that does not relate to IT <em>at all</em>, and one that is (or should be) drawn up by those that have the business knowledge: that is to say, the business users (with a little help from those who know how to draw a meaningful diagram!).
A Business Object Model represents the business knowledge, the business constraints on those business objects and business characteristics (attributes, taxonomies, etc.)
The Conceptual Data Model can be <i>derived</i> in a certain manner from the Business Object Model, regrouping only those objects that fall into the scope of the target IT solution. It is in this model that the objects & attributes are further worked upon and completed to include high-level data and system architecture considerations. From here on we follow the traditional route of software development process: conceptual -> logical -> physical models ... and, I hope, respecting the MDA fashion.
But, I guess we're getting away from the subject of your original article.
Point well taken. I was not using the term correctly. You are right. I should not use the term Conceptual Data Model to refer to the Enterprise Business Object Model.