This prompts me to follow up on my "SOA, 6 blind men and an elephant" post and talk a little bit about an aspect that is often overlooked in designing and building Connected Systems: DATA.
In my experience talking to many enterprises, the discussion often gravitates around services: how to design a service, what the right granularity is, what the appropriate contract is, etc. One aspect that is often not discussed enough is how to think about data in this new class of applications, where systems are all getting connected together.
The classic problem to solve is the 360-degree view of the customer. In a typical enterprise, information about customers is stored (and often duplicated) in several systems. For example, part of my customer info is in the CRM systems, other parts of it sit in a couple of proprietary line-of-business applications, and of course, if the enterprise has several "relationships" with the customer, the information is duplicated for each "relationship". In addition, data is more and more in non-relational formats (XML, email, .xls spreadsheets…) and in non-transactional stores (batch-oriented mainframes, file systems, stores accessible via a web service call). See image below:
With the movement from "classic" 3-tier architecture to connected systems, data is evolving from session-oriented, relational and transactional to message-driven and loosely coupled. Among other things, this promotes the creation of several "types" of data, each calling for its own specific treatment. These different types are:
The picture below gives you an overview of the characteristics of these different types of data:
Also, you want to think hard about single "ownership" of data: only the data owner is allowed to update the data, even though the data is used by many other parties. In this context you also need to introduce the concept of tentative update requests: if you are not the owner of the data but need to update it, the best you can do is to request the update. In a non-transactional environment (frequent in these systems), the update might or might not happen; you therefore need to be able to cope either way (potentially introducing compensating logic, manual intervention...).
As you can see (and these are just a few examples), there is a significant level of architectural thinking that needs to take place around Data in Connected Systems.
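To make the tentative update idea a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical and only for illustration: the names OwnedRecordStore, UpdateRequest, Outcome and compensate are mine, the "store" is an in-memory dictionary, and the owner's decision rule is a simple callback. A real system would carry the requests over messages between services, but the shape is the same: non-owners ask, the owner decides, and the requester copes with either answer.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PENDING = "pending"
    APPLIED = "applied"
    REJECTED = "rejected"


@dataclass
class UpdateRequest:
    """A tentative update: a request to change data, not the change itself."""
    requester: str
    key: str
    new_value: str
    outcome: Outcome = Outcome.PENDING


@dataclass
class OwnedRecordStore:
    """Hypothetical store with a single owner; only the owner applies changes."""
    owner: str
    records: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)

    def request_update(self, requester, key, new_value):
        # Non-owners never write directly; they only queue a request.
        req = UpdateRequest(requester, key, new_value)
        self.inbox.append(req)
        return req

    def process_inbox(self, accept):
        # The owner decides, request by request, whether to apply the change.
        for req in self.inbox:
            if accept(req):
                self.records[req.key] = req.new_value
                req.outcome = Outcome.APPLIED
            else:
                req.outcome = Outcome.REJECTED
        self.inbox.clear()


def compensate(req, followups):
    # Placeholder compensating logic: record the rejection for manual follow-up.
    followups.append(f"manual follow-up: {req.requester} -> {req.key}")


# The CRM owns the customer data; billing can only request changes.
store = OwnedRecordStore(owner="crm", records={"cust-1/address": "old street"})
ok = store.request_update("billing", "cust-1/address", "new street")
bad = store.request_update("billing", "cust-1/address", "")

# Assumed owner rule for this sketch: reject empty values.
store.process_inbox(accept=lambda r: r.new_value != "")

# The requester has to cope either way.
followups = []
for r in (ok, bad):
    if r.outcome is Outcome.REJECTED:
        compensate(r, followups)
```

The point of the design is that the requester holds on to its request and checks the outcome later, instead of assuming the write went through.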
How can you learn more about all this? Well, ask Roger, as it is what his job is all about now :) To be fair to him, he just started on this initiative a few days ago, so let's give him some time; but expect some very cool stuff coming out soon.
Some additional topics I discussed with Roger are:
Do not hesitate to email me topics you would like covered as part of the data pillar (or leave a comment or trackback here).
Finally, to give you something to chew on while we craft some of our magic, here is a very good paper on entity aggregation.
In summary, thinking about services is important, but it cannot be at the expense of data. Hopefully this post gave you some motivation to revisit how much time you dedicate to data architecture (beyond "classic" database design) and, of course, made you realize that there is much more than services and ESBs to worry about when you design, build and deploy your Connected Systems (err. SOA).