
New Master Data Management White Paper Series

 

My friend Tyler Graham is writing a series of white papers on the practical aspects of implementing an MDM system.  The first three are available here: http://technet.microsoft.com/en-us/library/cc505992(TechNet.10).aspx  Tyler came to Microsoft as part of the Stratature acquisition so he has a lot of experience in implementing MDM systems at Stratature customers.  That means these papers are full of practical advice instead of the high-level theory often found in MDM articles.  Thanks for doing this Tyler and we’re looking forward to the rest of the series.

Choosing MDM Hub styles

 

A couple weeks ago, someone asked me how to choose which MDM hub style would work best for an application.  I thought I had covered this in one of my white papers but I couldn’t find a good reference to give him so I thought I would write up something here.  To review what I’ve covered elsewhere, there are three basic types of MDM hubs:

 

·         Registry – the hub doesn’t contain the actual master data.  It contains links to where the master data exists in the source systems.  In most cases, the link takes the form of the primary key and system name of the source system.

·         Repository - the master data is actually moved from the source systems to the MDM hub and the source systems are rewritten to get their master data from the MDM hub instead of from their local database.  Mapping to the source systems isn’t required because the master data isn’t stored in the source system.  This style is often called Transactional.

·         Hybrid - as the name implies, hybrid is a combination of the other two styles.  The hub contains references to the master data entities in the source systems but also contains the shared portion of the master data.  This means it can supply links to source records when required and also serve as the master data source for new applications.
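To make the difference between the three styles concrete, here is a minimal Python sketch of what one hub entry might hold in each case.  The class names, field names, system names and keys are all illustrative assumptions, not the schema of any real MDM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SourceLink:
    """Pointer to one copy of a master record in a source system."""
    system_name: str   # illustrative, e.g. "ERP" or "CRM"
    primary_key: str   # key of the record inside that source system


@dataclass
class RegistryEntry:
    """Registry style: the hub stores only links, not the master data."""
    hub_id: int
    links: List[SourceLink] = field(default_factory=list)


@dataclass
class RepositoryEntry:
    """Repository style: the hub holds the only copy, so no links are kept."""
    hub_id: int
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class HybridEntry:
    """Hybrid style: links to source records plus the shared attributes."""
    hub_id: int
    links: List[SourceLink] = field(default_factory=list)
    shared_attributes: Dict[str, str] = field(default_factory=dict)


# A hybrid entry can answer both "where are the copies?" and "what is the name?"
customer = HybridEntry(
    hub_id=1,
    links=[SourceLink("ERP", "C-1001"), SourceLink("CRM", "8842")],
    shared_attributes={"name": "Roger Wolter"},
)
```

The hybrid entry is simply the union of the other two, which is why a hybrid hub can start out looking like a registry and grow attributes over time.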

 

So which style should you use? 

 

Repository

 

The repository style seems like the best option.  There are no synchronization or latency issues with updates getting propagated to multiple copies of the master data.  There are no update conflicts caused by updates to more than one copy of the master data.  In general, a single copy of the master data is significantly easier to manage and will generally be of a higher quality than multiple copies with all the potential synchronization and mapping issues.  On the other hand, if we look at what is required to get a repository style hub up and running, you may see why this style isn’t very common:

 

1.    Decide on a common data model for all applications – this will be a difficult task both politically and technically. 

2.    Transform and load all the current databases into the hub, removing duplicates in the process.

3.    Change all your applications to use the new master data tables and database.  This can be a huge effort.  If your current applications use a variety of databases you will need to deal with multi-database distributed transactions.  If you use purchased applications, you may not have the source to change the application to use the new data source and even if you do, you are likely to run into support issues.

4.    Figure out how to handle history – you are changing your databases to use a new key for all your master data so you have to deal with many years of history that was created using different keys for the master data.  In many cases you will need to create the same kind of key mapping that the other two MDM styles require to be able to access history records.
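The key mapping mentioned in step 4 can be sketched in a few lines of Python.  This is only an illustration under assumed data: the system names, keys, and history rows are hypothetical.

```python
# Hypothetical mapping built during migration:
# (source system, old key) -> new hub key.
key_map = {
    ("ERP", "C-1001"): 1,
    ("CRM", "8842"): 1,    # same customer, different key in another system
    ("CRM", "9000"): 2,
}

# History rows still carry the old, system-specific keys.
history = [
    {"system": "ERP", "customer_key": "C-1001", "amount": 250.0},
    {"system": "CRM", "customer_key": "8842", "amount": 100.0},
    {"system": "CRM", "customer_key": "9000", "amount": 75.0},
]


def history_for(hub_key: int) -> list:
    """Return the history rows that belong to one hub record."""
    return [
        row for row in history
        if key_map.get((row["system"], row["customer_key"])) == hub_key
    ]


total_for_customer_1 = sum(row["amount"] for row in history_for(1))
```

Without the mapping, the two rows for hub customer 1 would look like history for two unrelated customers.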

 

In many cases, this process is too difficult or too expensive to provide a significant return on investment and even if it is justified, it can take many years to make the transition so the Repository style of MDM hub may not be suitable for many projects.

 

Registry

 

The Registry style hub is attractive because it’s generally fairly quick to implement and avoids some of the political issues around a common data model.  Because only pointers to records are stored, there is no need to agree on a common data model.  There is also less need for a data quality program because the data is left in the source systems.  To be clear, it’s probably not possible to create a pure registry style MDM hub.  One of the main things this hub is used for is mapping duplicate records in the source systems to a single record in the hub.  In order to do this, each record must be matched on a set of attributes to determine if it is a duplicate of a record already in the hub.  For example, customers would probably be matched on name and address and products might be matched on descriptions and dimensions.  If you want to avoid searching every database in every source system when a new record is added to the hub, you will need to keep the matching attributes for each master record in the hub so you can tell whether an incoming record is a duplicate of one of the hub records.  This matching won’t work reliably unless the attributes stored in the hub are accurate and high quality so you will probably have to do a significant amount of data quality work to ensure the address is right and in a common format, and maybe even enrich the attributes with data from an external source like Dun & Bradstreet.  Once you have all this established, you’re a significant way down the road toward creating a hybrid hub so you can consider a registry hub to be a hybrid hub that’s not done yet.
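Here is a rough Python sketch of why a registry ends up storing matching attributes.  It assumes customers are matched on an exact, normalized name and address, which is far cruder than the fuzzy matching real tools do; the class and function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def match_key(name: str, address: str) -> Tuple[str, str]:
    """Normalize the matching attributes so trivial differences do not block a match."""
    def norm(text: str) -> str:
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return norm(name), norm(address)


class RegistryHub:
    def __init__(self) -> None:
        # The matching attributes live in the hub, so no source system is searched.
        self._by_key: Dict[Tuple[str, str], int] = {}
        self.links: Dict[int, List[Tuple[str, str]]] = {}
        self._next_id = 1

    def register(self, system: str, primary_key: str, name: str, address: str) -> int:
        """Link a source record to an existing hub record, or create a new one."""
        key = match_key(name, address)
        hub_id = self._by_key.get(key)
        if hub_id is None:
            hub_id = self._next_id
            self._next_id += 1
            self._by_key[key] = hub_id
            self.links[hub_id] = []
        self.links[hub_id].append((system, primary_key))
        return hub_id


hub = RegistryHub()
first = hub.register("ERP", "1001", "Roger Wolter", "1 Main St.")
second = hub.register("CRM", "77", "roger  wolter", "1 Main St")
other = hub.register("CRM", "78", "Jane Doe", "5 Oak Ave")
```

The first two calls resolve to the same hub record only because the name and address normalize to the same key, which is exactly why the quality of those stored attributes matters.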

 

The biggest disadvantage of the registry style MDM hub is that while it helps you find all the duplicate and inconsistent copies of your master data, it doesn’t give you much help in cleaning them up.  If Roger Wolter has 3 records in the ERP database, 6 records in the CRM database and 2 records in the customer support database, and among the 11 copies there are 4 phone numbers, a registry hub will tell you where all the records are but won’t help you get them to agree on a phone number.

 

Hybrid

 

The Hybrid style of MDM hub has some of the attributes of both the Registry and Repository styles.  Like the Registry style, the Hybrid style maintains links to the copies of a master data record in the source system so you won’t have to replace the master data access parts of all your applications.  Like the Repository style, the Hybrid style maintains the shared part of the master data in the MDM hub so that you can improve its quality and enrich its content in a single place.  Thus the advantage of the Hybrid approach is that it provides a single, authoritative source for shared master data without the necessity of changing all your applications to use it.

 

The most significant disadvantage of the Hybrid style is that keeping the MDM hub copy of the data synchronized with all the source systems can be a complex process.  If you allow all the source systems to change master data, you will have a continuous data integration problem caused by incompatible changes coming from different systems.  You can reduce this problem by requiring changes to the master data to be made only to the copy in the MDM hub but this may be difficult to implement and enforce.  Also, keep in mind that MDM synchronization is more complex than data replication because the data may have to be transformed both when it is loaded into the hub and when it is sent from the MDM hub back to the source systems because the data models of the source systems may all be different.
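The two-way transformation can be illustrated with a small Python sketch.  The field mappings and system names below are made up for the example; a real integration would also convert formats and codes, not just rename fields.

```python
# Hypothetical field mappings: hub attribute name -> source-system field name.
FIELD_MAPS = {
    "ERP": {"name": "CUST_NAME", "phone": "PHONE_NO"},
    "CRM": {"name": "fullName", "phone": "phoneNumber"},
}


def to_hub(system: str, source_record: dict) -> dict:
    """Transform a source-system record into the hub's data model."""
    mapping = FIELD_MAPS[system]
    return {
        hub_field: source_record[src_field]
        for hub_field, src_field in mapping.items()
        if src_field in source_record
    }


def from_hub(system: str, hub_record: dict) -> dict:
    """Transform a hub record back into one source system's data model."""
    mapping = FIELD_MAPS[system]
    return {
        src_field: hub_record[hub_field]
        for hub_field, src_field in mapping.items()
        if hub_field in hub_record
    }


# A change arrives from the ERP system and is republished to the CRM system.
hub_record = to_hub("ERP", {"CUST_NAME": "Roger Wolter", "PHONE_NO": "555-0100"})
crm_record = from_hub("CRM", hub_record)
```

Plain replication would copy the ERP record as-is; here every source system needs its own mapping in each direction.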

 

Conclusion

 

So what’s the best choice for you?  As in everything – it depends.  Moving from Registry to Hybrid to Repository style increases cost and complexity but also increases usefulness and data quality so you have to pick the solution that provides the data quality you need in a timeframe and a budget that you can afford.  My recommendation is usually the Hybrid approach.  The Registry approach is relatively simple and quick to implement but few users will be satisfied with the data quality it provides over the long run.  The Repository style is generally too hard to do and too expensive for most companies even though it provides the best data quality.  Hybrid implementations can evolve over time.  You might start with a minimum number of attributes for each entity stored in the MDM hub so it is pretty close to being a Registry Style hub and then over time, as your needs change and your MDM data management and stewardship capabilities improve you gradually add attributes until the MDM hub is a complete source of master data.  At this point, new applications can start using the MDM hub directly for their master data so the hub evolves gradually toward the Repository style.  While not too many people will be able to move completely to the Repository style, eventually it may become the predominant approach for applications as the old apps are replaced.

One of my pet peeves is that after we ship something we normally have a Post Mortem meeting to discuss what we should learn from the experience.  I'm not against the meeting.  I think they're great and we learn a lot.  Sometimes we even have pizza!  What bugs me is the name Post Mortem.  This suggests something just died and we're getting together to figure out why.  Come on!  We just shipped a great product that we spent years of our life developing.  Nothing died - something was born!  After our next release, I'm scheduling a Postpartum review!

Master Data is not Metadata

 

I regularly get emails from people asking about the Microsoft Metadata story.  While I assume that’s mainly confusion over the MDM acronym which could reasonably be interpreted as Metadata Management as well as Master Data Management, there’s also enough overlap between Master Data and Metadata to lead to confusion.  It turns out I’m uniquely qualified to comment on Master Data vs. Metadata. I spent a few years in the early 90’s building one of the early metadata repositories.  I worked with four other repositories after that and now I’m working on a Master Data product.

 

Metadata as I’m sure you’re aware is data about data.  It describes data but it isn’t generally considered business data.  For example, customer metadata would describe the attributes of a customer entity, the datatype and size of the attributes, which programs produce the data, which programs consume the data, what business rules are enforced on the attributes, etc.  In a BI environment, derivation, transformations, source system, and last load time are also important.  The key thing to understand is that you can know the complete metadata for customers without knowing who a single customer is.  Master Data, on the other hand, is the real business data.  Customer master data is an authoritative list of customers.

 

While Master Data and Metadata are two distinct things, managing master data generally requires working with metadata.  The Microsoft Master Data hub is metadata driven.  The data model used to store the Master Data instances is defined with metadata stored in the hub.  Data stewardship and data governance depend on metadata to understand where the data comes from, what each attribute means, what transformations are done when the data is loaded, what business rules are satisfied or violated, and who modified the data.  The Microsoft MDM hub stores most of this metadata and the types of metadata stored will be expanded before we release the product.  The metadata required to manage master data is probably one source of the confusion between the two types of data.  While the MDM hub stores significant amounts of metadata, all the data is related directly to the master data so you can’t really describe a master data hub as a general purpose metadata repository.
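A small Python sketch may help show both the distinction and the dependency.  The model below is metadata, the customer list is master data, and the hub uses the first to check the second.  The attribute definitions and rules are assumptions for illustration only, not the actual schema of the Microsoft hub.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AttributeDef:
    """Metadata for one attribute: it describes data but holds no business data."""
    name: str
    datatype: type
    max_length: Optional[int] = None
    required: bool = True


# Metadata: a description of the customer entity.  No customer appears here.
CUSTOMER_MODEL = [
    AttributeDef("name", str, max_length=100),
    AttributeDef("phone", str, max_length=20, required=False),
]

# Master data: the actual list of customers (the second one is deliberately bad).
customers = [
    {"name": "Roger Wolter", "phone": "555-0100"},
    {"phone": 5550100},
]


def validate(record: dict, model: List[AttributeDef]) -> List[str]:
    """Check one master data record against the metadata-defined model."""
    errors = []
    for attr in model:
        value = record.get(attr.name)
        if value is None:
            if attr.required:
                errors.append(f"{attr.name}: missing required attribute")
            continue
        if not isinstance(value, attr.datatype):
            errors.append(f"{attr.name}: expected {attr.datatype.__name__}")
        elif attr.max_length is not None and len(value) > attr.max_length:
            errors.append(f"{attr.name}: longer than {attr.max_length}")
    return errors


results = [validate(customer, CUSTOMER_MODEL) for customer in customers]
```

Changing the model means editing `CUSTOMER_MODEL`, not the validation code, which is the practical meaning of "metadata driven".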

 

Now that we’ve talked about the difference between master data and metadata, we’ll dig a little deeper into metadata because I think there are some parallels with Master Data Management that are useful to understand.  Metadata is usually managed in a repository.  The original metadata repositories modeled the metadata as entities, attributes, and relationships.  Some of the more recent repositories use objects and properties but the models are pretty similar.  Metadata repositories evolved from data dictionaries that were used to manage the schema for databases.  A repository can model the whole IT environment to provide a unified picture that goes well beyond database schema.  An accurate enterprise model can provide valuable insight into impact analysis and drive data analysis and data integration projects.  With the current compliance and auditing environment that enterprises operate in, tracking where data is produced and what processing and transformations are done to it is almost as important as the data itself.  A metadata repository populated with accurate, current metadata is a great resource for compliance, auditing and integration projects.

 

The problem with metadata repositories is that they require constant maintenance to ensure the metadata is current and correctly represents the data it describes.  In many cases metadata repositories were laboriously populated with data descriptions and documentation but without adequate tools and procedures to keep the metadata current, the metadata gradually became inaccurate and the users stopped using it because they couldn’t rely on it.  The basic issue is that there really aren’t good tools to capture all the metadata that a typical enterprise needs to track.  There are many different sources – database schema, ETL logs, source code, copy books, design documents, policies, documentation, etc.  Some of this data can be extracted using automated tools and some of it must be entered by people who understand the systems involved.  The resulting system can be very complex and require a fair amount of manual data entry.  The more effort required to keep the metadata current, the less likely it is to stay current.  If this sounds like the same kind of issues that a master data management system runs into, you’re right.  A master data hub that isn’t surrounded by the tools, processes and policies required to maintain the quality of the data will gradually lose value and become unusable.

 

I may be stretching the limits of causality but I believe that some of the problems that master data management is trying to solve were actually caused by flawed metadata solutions.  About twenty years ago the state of the art in metadata management was the data dictionary that was carefully maintained by a data administration organization.  The DA organization maintained control over data quality and consistency by requiring all changes to the database schema to be fully documented and approved before they were implemented.  While this ensured high-quality database schema and accurate metadata, the heavyweight process was very frustrating to developers trying to keep up with their users’ demands.  Resourceful developers solved this problem by installing a departmental server with SQL Server or some other easy-to-use database, copying the data they needed out of the corporate system and making the schema changes they needed without DA approval.  In a few years there were many variations on the corporate schema with data that was not necessarily synchronized with the corporate databases.  Eventually these quick-hit applications became the core enterprise systems, replacing the corporate databases (and in many cases the DA organization) with loosely coupled data chaos.  Several years later, MDM was invented to get a handle on the many disparate data sources.

Master Data Management Philosophy

 

Last week I did a presentation that included a slide on my philosophy for MDM so I decided to expand that slide into a blog post.  While this is my philosophy, I think it comes pretty close to the way the rest of the MDM product team looks at MDM.  As always, I welcome any comments and feedback.

 

      Multi-domain hub – while there are definite advantages to specialized MDM applications that handle data quality, match-merge, and standardization for a particular type of data, once you have cleaned up your incoming data the processes to maintain the data are common across all domains.  There are definite advantages to a single point of management and single set of tools and processes for managing master data.  Some vendors approach cross domain master data management by implementing relationships that span domains stored in different repositories but this often means that different data must be managed in different ways by data stewards. I think there’s a real advantage to maintaining all master data in the same hub because the same processes and techniques can be applied to all types of master data.  This means a hub with enough flexibility in the data model to allow any master data domain to be modeled and managed.

      Open interfaces – there are so many kinds of master data that no single vendor can provide a toolset that spans all domains.  That’s why it’s important for an MDM hub to have open interfaces for domain specific tools to plug into.  If your hub vendor provides the data management and data stewardship facilities you’re looking for but doesn’t provide data import and data quality tools that specialize in the kind of data you want to store, you can find best-of-breed tools (or write your own) that interface to the MDM hub through the open interfaces provided.  In these days of SOA, web services interfaces are probably the most useful.

      Incremental implementation - while a single source for all your organization’s master data is the goal of Master Data Management, not many organizations have the resources and patience to consolidate all their data in a single project.  For most companies, starting with a single domain and a subset of data sources to learn how MDM works and demonstrate early success is a much better approach than a “big science” MDM project.

      Partner for domain specific solutions – as I said earlier when discussing open interfaces, it’s not realistic for a general-purpose MDM product to handle all the possible variations of Master Data domains.  For this reason, we plan to cultivate a rich partner ecosystem to provide the expertise in the specialized domains we don’t have the resources to develop ourselves.

      Use existing integration capabilities – before we started the MDM product team, I spent about a year researching how to build MDM systems with existing Microsoft technologies.  What I found was that Microsoft has a wealth of data integration technologies – SSIS, BizTalk, FRx, etc. so when we looked for an MDM product to buy we looked first for a product that excelled at managing data, data stewardship, business rules, workflow, etc. with the assumption that our current data integration capabilities would handle the rest.  By using data integration, transformation, standardization, orchestration and profiling capabilities that other teams continue to develop we can have world-class capabilities in this area while we concentrate our resources on core Master Data Management capabilities.

      Tight integration with Microsoft Products – when we started talking about MDM around Microsoft, we found that quite a few Microsoft products had requirements that an MDM product could meet.  Developing tight integrations with Microsoft products will not only be a great benefit to our customers but will give us a chance to “dogfood” the integration interfaces we plan to ship with the product.

      Hierarchy Management a critical capability – the structure of master data is often as important as the data itself.  While this is obvious if you are using MDM to manage your chart of accounts, organization structures and product hierarchies are also critically important data.  Stratature has some unique capabilities in hierarchy management that are proving very useful to MDM users.

      Data Stewardship is a key success factor – MDM systems make data quality and accuracy more important than ever because a mistake in master data can cause issues in all the systems that consume the data.  While automated match-merge, standardization and data quality tools are becoming more capable all the time, at some point real human beings who are passionate about the data are required to make decisions that tools can’t make and monitor the processes to ensure that business rules and data standards are being enforced correctly.  This is just one more example of the people aspects of MDM being as important as the technology.  While processes, policies, governance, and standards are the real success factors, a good MDM hub can provide tools and capabilities that make a data steward’s job easier.  Some of the more useful capabilities are business rule enforcement, workflow, versioning, searching, auditing, and eventing.  The combination of the Microsoft MDM system and SharePoint supplies all of these capabilities and more.
 

      Analytical and Operational MDM are just two uses for the same data – I did a whole blog post on this a couple months ago so I won’t spend a lot of time on it here but I wanted to point out that all the data management, stewardship, and quality capabilities I have been talking about so far apply equally whether you are using your master data to build cubes in a warehouse or to provide clean data to your operational systems.  I don’t think an MDM system that doesn’t support both operational and analytical uses for master data is a good investment.  Many companies start by using their master data for analytical purposes because it is generally easier to implement and shows positive results faster but investing the time, money and effort to create a clean source of master data without eventually using it to improve your operational systems can be a significant missed opportunity.  Doing an analytical project to learn the technology and develop the policies and processes necessary to manage master data followed by another project to use the high quality master data obtained to improve operations is often a winning approach to MDM.

Check out the new Microsoft MDM web site:  http://www.microsoft.com/sharepoint/mdm/default.mspx 

Gartner MDM Conference

 

I just got back from the Gartner MDM Conference.   I learned a lot and had the chance to talk to a lot of people about MDM and what they are doing in their organizations.  Maybe it was because of the sample I happened to talk to but it seems like a lot of people are interested in MDM but not a lot of them have projects in place.  I assume that’s an indication of the state of the MDM industry – while some people have been doing MDM for several years, the mainstream is just now feeling the need to learn about MDM.

 

One thing I learned that explains some confused conversations I’ve had with several people is that Gartner is using some of the same terms I have been using to describe MDM but using them to mean very different things.  I don’t think I disagree with what they’re saying but I think some translation between their terms and the terms I have used might help.  The main point of confusion is in what Gartner calls the styles of MDM.  The way I describe the MDM Hub contents uses different terms but I think in general we’re talking about the same things:

 

Gartner’s term: Registry Style – store references to master data.  Data continues to reside at source system.  Data quality controlled at source.  Bidirectional data flow (into the hub and from the hub out into the source systems).
My term: Registry Style – pretty much the same concept and obviously the same term.

Gartner’s term: Coexistence Style – Data as well as references to source systems stored in the master data repository.  Data quality controlled at both the source and the MDM hub.  Bidirectional data flow (into the hub and from the hub out into the source systems).
My term: Hybrid Style – again, pretty much the same definition but with a different label.  We agree that there’s often a natural progression from the registry style to the hybrid or coexistence style.

Gartner’s term: Transaction Style – Master Data resides solely at the MDM repository.  All applications use MDM as their source of master data directly.
My term: Repository Style – again, pretty much the same thing with a different name.  I think there’s some disagreement on how practical this style is.

Gartner’s term: Consolidation Style – MDM repository just a destination for master data.  One directional flow into the MDM repository – data never goes back into the source systems.  This style used only for analysis.
My term: none.  I’ve never really talked about this style of MDM repository.  Quite a few MDM projects start out using MDM primarily for analytical data so I suppose this might make sense but I look at this as a minor variation of the hybrid style so I haven’t talked about it as a separate style.

 

As you can see, I’m pretty much totally in agreement with the Gartner style classification but I have been using different terms.  I’m not sure what to do about this.  If I change to use the Gartner terms, I may confuse people who have been reading my articles for a while.  On the other hand, I think it’s pretty obvious why some people have been in violent disagreement with my stance that analytical MDM is really the same as transactional MDM.  I have been talking about two different uses of MDM data, while under the Gartner meaning of the terms they are two different kinds of MDM repository.  I think I am going to change my terms in this case to analytical and operational uses of master data because transactional has two different meanings.

 

Analytical and Transactional MDM

 

I was talking to someone about Analytical and Transactional MDM recently and we realized that while there are quite a few conceptual differences between the two, there’s a significant amount of overlap in the implementation details.  For my purposes, I’ll define Analytical MDM as the processes and tools to manage the dimensions in a data warehouse or OLAP cube and Transactional MDM as the processes and tools to manage the master data used in transactional systems.

 

With few exceptions, the data for the two styles of MDM looks the same.  Transactional MDM might have a few more attributes associated with a given entity because there are things the operational system cares about that aren’t required for analysis.  An Analytical MDM hub will probably store more hierarchies than a transactional hub because there are generally hierarchies that are interesting in analysis and reporting that the operational system may not care about.  These differences aren’t incompatible and it probably makes sense to use the same MDM hub to store both analytical and transactional master data because it will be much easier to manage in one place than if you had separate hubs for the two uses of the same entities.  This seems like a good argument for looking for an MDM hub solution that isn’t limited to only a single style of master data.
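As a rough illustration of one hub serving both uses, the Python sketch below keeps a single set of product members and layers two hierarchies over them.  The product data, the hierarchy names and the `Hierarchy` class itself are all hypothetical.

```python
from collections import defaultdict
from typing import List


class Hierarchy:
    """One named parent-child arrangement over shared master data members."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._children = defaultdict(list)

    def add(self, parent: str, child: str) -> None:
        self._children[parent].append(child)

    def descendants(self, node: str) -> List[str]:
        """All nodes below the given node, in depth-first order."""
        result = []
        for child in self._children.get(node, []):
            result.append(child)
            result.extend(self.descendants(child))
        return result


# One set of members, stored once in the hub.
products = {"P1": {"name": "Road bike"}, "P2": {"name": "Helmet"}}

# A hierarchy the operational systems might also use.
by_category = Hierarchy("category")
by_category.add("All", "Bikes")
by_category.add("All", "Accessories")
by_category.add("Bikes", "P1")
by_category.add("Accessories", "P2")

# An extra hierarchy that only reporting cares about.
by_brand = Hierarchy("brand")
by_brand.add("All", "Contoso")
by_brand.add("Contoso", "P1")
by_brand.add("Contoso", "P2")
```

Adding the reporting hierarchy does not touch the members themselves, so the analytical and transactional uses do not get in each other's way.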

 

Another difference between the two styles of MDM might be in the way data is loaded and published.  Loading an analytical hub is usually done in batches – maybe once a day while most transactional style systems are loaded an entity at a time as the entities are created or modified in the operational systems.  Other than this, the transformations, duplicate checks, business rules checks, etc. involved in loading master data into a hub are the same in either style.  This means that other than the batch size (one in the case of transactional and “N” in the case of analytical) there’s really not much difference in the load processes.
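The point about batch size can be sketched in Python: one load function handles both cases, and only the size of the list passed in changes.  The transformation, duplicate check, and business rule below are deliberately trivial stand-ins for real ones.

```python
from typing import Dict, List, Tuple


def load(hub: Dict[int, dict], batch: List[dict]) -> List[Tuple[dict, str]]:
    """Load a batch of records into the hub; return rejected records with a reason."""
    rejected = []
    for record in batch:
        # Transformation: standardize the name.
        clean = {"id": record["id"], "name": record["name"].strip().title()}
        existing_names = {stored["name"] for stored in hub.values()}
        if not clean["name"]:
            # Business rule check.
            rejected.append((record, "name is required"))
        elif clean["name"] in existing_names:
            # Duplicate check.
            rejected.append((record, "duplicate"))
        else:
            hub[clean["id"]] = clean
    return rejected


hub: Dict[int, dict] = {}

# Analytical style: a daily batch of N records.
nightly_rejects = load(hub, [
    {"id": 1, "name": " roger wolter "},
    {"id": 2, "name": "jane doe"},
])

# Transactional style: the same code with a batch of one.
single_rejects = load(hub, [{"id": 3, "name": "Roger Wolter"}])
```

The single-record load is rejected as a duplicate by the same check that the nightly batch went through.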

 

Publishing data is different in the two styles of MDM but not in an incompatible way.  Transactional MDM data is generally published in a “push” method where changes to the master data are pushed out to the operational systems but there are many applications that either don’t expose the required interfaces or are run by groups that won’t allow data to be pushed into their system so a “pull” style publication is required.  Analytical MDM data is generally pulled from the hub when the OLAP cube is built or when the data warehouse is updated.  This means that an MDM hub will usually have to support both push and pull style synchronization with operational systems and warehouses so again there’s not a significant difference between the requirements of transactional and analytical MDM.
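A hub that supports both publication styles might look something like the Python sketch below.  It is a toy in-memory model under assumed names (`PublishingHub`, `changes_since`); a real hub would push through messaging or web services rather than direct callbacks.

```python
from typing import Callable, List, Tuple


class PublishingHub:
    def __init__(self) -> None:
        self._changes: List[Tuple[int, dict]] = []           # ordered change log
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        """Register a system that accepts pushed changes."""
        self._subscribers.append(callback)

    def update(self, record: dict) -> int:
        """Record a change and push it to every subscriber."""
        sequence = len(self._changes) + 1
        self._changes.append((sequence, record))
        for callback in self._subscribers:
            callback(record)
        return sequence

    def changes_since(self, sequence: int) -> List[Tuple[int, dict]]:
        """Pull style: return every change after the given sequence number."""
        return [(s, r) for s, r in self._changes if s > sequence]


hub = PublishingHub()

erp_inbox: List[dict] = []
hub.subscribe(erp_inbox.append)          # operational system that allows pushes

hub.update({"id": 1, "name": "Roger Wolter"})
hub.update({"id": 2, "name": "Jane Doe"})

# The warehouse (or a system that refuses pushes) pulls on its own schedule.
warehouse_load = hub.changes_since(0)
later_load = hub.changes_since(1)
```

Both consumers read from the same change log, so supporting pull in addition to push adds very little to the hub itself.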

 

 

I’ve heard from many people that a transactional MDM system needs higher performance and scalability than an analytical system but I’m not sure that’s necessarily true.  The data quantities are going to be identical in either case because if you add 10,000 customers a day to your operational systems, you will need to load 10,000 into the MDM hub whether the data is used in the operational system or the warehouse.  In fact, the loading is probably spread out over the business day for transactional MDM while analytical MDM loading probably has to fit into a batch window at the end of the day so the analytical system may actually need more loading performance.  Publishing to several operational systems will take more processing than publishing to a data warehouse but the transformations, encoding, and messaging required can be easily offloaded to a separate server so this doesn’t affect MDM hub performance much.  In some architectures latency might be a bigger issue for transactional MDM than analytical MDM so shorter processing lengths and asynchronous business rule enforcement might be necessary.
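To put rough numbers on the loading argument, here is a quick calculation in Python.  The 10,000 customers per day comes from the example above; the 8-hour business day and the 1-hour batch window are assumptions chosen purely for illustration.

```python
CUSTOMERS_PER_DAY = 10_000   # figure from the example above
BUSINESS_DAY_HOURS = 8       # assumption: transactional loads spread over 8 hours
BATCH_WINDOW_HOURS = 1       # assumption: analytical load fits a 1-hour window


def records_per_second(records: int, hours: float) -> float:
    """Average load rate needed to process the records in the given time."""
    return records / (hours * 3600)


transactional_rate = records_per_second(CUSTOMERS_PER_DAY, BUSINESS_DAY_HOURS)
analytical_rate = records_per_second(CUSTOMERS_PER_DAY, BATCH_WINDOW_HOURS)

print(f"transactional: {transactional_rate:.2f} records/s")   # 0.35
print(f"analytical:    {analytical_rate:.2f} records/s")      # 2.78
```

Under these assumptions the batch load needs eight times the sustained throughput of the trickle load, even though the daily volume is identical.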

 

So what does this all mean?  My take on it would be to look for an MDM solution that can support both transactional and analytical styles.  I think in most cases the logical progression would be to start with analytical MDM to master the data models, rules, technology and stewardship required to manage your master data in a less mission-critical environment.  Once you have achieved some successes in analytical MDM, you can use the same data, models and processes to manage the master data for your transactional systems by just adding the publishing logic to push the master data into the operational systems.

 

Stratature Misinformation

 

Do you remember the “telephone” game we used to play in school, where you line a bunch of people up in a row and whisper something to the first person in line, who whispers it to the next one, and so on until the last person repeats what they heard?  This is usually hilarious – “return of the Jedi” comes out as “Jeni has pink-eye”.  As someone on the inside of the Stratature MDM acquisition, I often marvel at how our plans and commitments have gotten distorted as they made their way to print.  Some of this might be malicious but most of it is probably just the kind of miscommunication that happens as information is passed from person to person.

 

For example, one of the things that most impressed us about the Stratature product is that they do a better job than just about anybody we have seen at managing hierarchies.  When we talked to our internal IT people they said they were buying a copy of Stratature +EDM primarily for its hierarchy management capabilities because they found many people were spending a significant amount of time managing hierarchies in spreadsheets on their desktops and this not only led to lots of duplication of effort but in some cases could be error-prone if the wrong spreadsheet was used.

 

This information led to quite a few statements that Stratature was only a hierarchy management system.  Stratature is a very fully-featured MDM hub and hierarchy management – while it’s cool – is only a small part of what it does.  Going back to the whispering analogy, this is like starting with a statement that I bought a pair of shoes because they had really cool laces and ending up with I bought a pair of shoe laces.

 

A similar example is our thinking that we can add a lot of value to the basic Stratature hub by integrating it with our BI tools being interpreted as we only plan to do analytical MDM.  This is ignoring our rich set of tools – SSIS, BizTalk, WCF, WF, Service Broker, etc. that make operational MDM a very attractive market for us.  Releasing an analytical MDM only product just doesn’t make sense for us.

 

Probably the biggest piece of misinformation comes from our statement that we’re temporarily taking the Stratature product off the market.  Once we start selling the Stratature product, it becomes a Microsoft product and as such it must adhere to a whole bunch of quality, security, and legal standards that a non-Microsoft product doesn’t have to deal with.  Until we jump through all the required hoops, we can’t release the Stratature product from Microsoft.  It doesn’t require a lot of thought to conclude that this means we can’t sell Stratature for a while until we get the required changes made.  This simple fact has been interpreted to mean we are planning to hack up the product and only keep the few pieces we need to do hierarchy management and analytical MDM (see above).  This interpretation is a little too bizarre to be a case of simple miscommunication so I assume there’s a deliberate attempt to spread FUD here.  Obviously some people hope that current Stratature customers will think we’re abandoning them so their only hope is to run out and buy something else.  Microsoft is often accused of being many things but is not often said to be stupid, and this would be just plain stupid.

 

Well, that’s the way I see things from the inside.  All I ask you to do is to watch what we do in the coming months and judge for yourself what the real story is.  I think you’ll be pleasantly surprised.

Master Data Management, Microsoft, and Stratature

 

If you’re a regular reader, you are aware that I have been blogging about MDM for about a year now.  I’m very excited by the news that Microsoft acquired Stratature last week for two reasons – I think it’s a great move for Microsoft and it means that I am now the second employee on the Microsoft MDM team.  We picked up a great team with many years of MDM experience in the Stratature team so we already have a solid MDM product team in place and we’re hiring as fast as we can get interviews done.

 

If you have read my blog and white papers on MDM you know that I think an MDM hub is the key to a successful MDM implementation and Stratature has one of the best MDM hubs out there.  It supports metadata-driven schema for entity management, sophisticated hierarchy management, versioning, business rules, and workflow.  When we combine this with the rest of the Microsoft platform – SSIS, BizTalk, WCF, WF, SharePoint, InfoPath, etc. – we will have one of the most complete MDM offerings available.

 

The Stratature team has many years of experience in BI and data management and a solid base of current MDM customers.  Most of their customers are large enterprises with very demanding requirements.  I’m not sure which ones are public but the website lists McKesson, GlaxoSmithKline, and Tiger Brands (a very large South African company).  This injection of MDM experience will be a tremendous kick start for the Microsoft MDM efforts.

 

If you have questions about our MDM effort, we have established an email alias mdmvibe@microsoft.com that you can use.  As we get further into the process and have more information available, I’ll post it here.

Master Data Management at TechEd

 

I notice that no MDM sessions made it into the TechEd schedule this year.  If you’re a regular reader, you know that MDM is my current passion.  If you’re interested in MDM, I would love to talk to you at the Architecture Track lounge at TechEd.  I haven’t totally forgotten Service Broker so if you’re interested in Service Broker, I would love to talk also.

Windows WF on SQL Service Broker

 

From the time we first started working on the Service Broker programming models five or six years ago, it was obvious that most SSB programs end up looking a lot like a workflow.  Messages come in and the application processes them, often sending out more messages to other services.  When the application is waiting for a response to a message, the state of the application is stored so that it doesn’t have to be kept in memory if it takes a long time for the response to come back.  The Conversation Group ID is a great way to identify state so that when a message arrives it’s easy to find the right state for the message.  The original design for Service Broker even included a “Contract Language” to define the flow for messages within a contract.  This was dropped early on because we didn’t want to invent yet another workflow language, but some kind of workflow is something I have been talking about for years.  When the workflow guys first started talking about making workflow hostable, I thought hosting it on Service Broker was a natural move.  My friend Harry Pierson has been working on WF hosted on Service Broker for one of his projects so when I needed to do a demo for an MDM talk I was giving, I decided to write my own WF service hosted on Service Broker.

 

Harry has played with WF in a CLR stored procedure but for my MDM hub application, I wanted to run the service outside SQL Server which also simplified some of the hosting code.  Even though I thought going in that Service Broker was a good fit for workflow, I was surprised how naturally workflow integrated with the Service Broker programming model.  WF is really well designed so it wasn’t too hard to do what I needed.  I did just enough to support the demo I needed to do but I tried whenever possible to build a general purpose hosting application.  When I have a little more time, I’ll try to finish up the more general purpose code.  I think the whole WF environment can be a great way to develop Service Broker services.  It won’t be as efficient as a stored procedure for simple services but it makes writing complex Service Broker services much more approachable.  The workflow designer is hostable so it should be possible to build a whole SSB service development environment based on WF.  My MDM sample could be expanded into a toolkit for building MDM synchronization applications.  I’m also playing with some string pattern matching algorithms implemented as CLR stored procedures for possible use in duplicate detection workflows.

 

If you have ever written Service Broker code, you will recognize the main processing loop in the WF hosting code.  It’s just a loop that receives a message at the top and then drops into a switch statement based on the message type.  The EndDialog and DialogError messages are handled in the hosting code for now.  At some point it may make sense to pass these up into the workflow so you can build some custom event handling for them.  Normal message types are passed up into a workflow instance for handling.  The initial message in a workflow starts up a new workflow instance, assigns the Conversation Group ID as the Workflow Instance ID, and passes the message contents as parameters into the workflow.  Any subsequent messages are packaged as Workflow events and passed into the appropriate instance identified by the conversation group ID.  This algorithm obviously requires a way to tell the difference between a message that starts a new workflow and one that is passed to an existing workflow.  The current code has a hard-coded message type as the initial message but I think this can be generalized so that the first message in a dialog where this service is a target will always start a new instance.  I plan to try this later this week.
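To make the shape of that loop concrete, here is a minimal sketch in Python rather than the actual WF hosting code.  It assumes an in-memory queue in place of a real Service Broker RECEIVE.  The class names, the START_SYNC message type, and the sample messages are all made up for illustration; only the two system message type URIs are the real Service Broker ones.

```python
from collections import deque
from dataclasses import dataclass, field

# Built-in Service Broker system message types.
END_DIALOG = "http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog"
DIALOG_ERROR = "http://schemas.microsoft.com/SQL/ServiceBroker/Error"
# Hypothetical application message type that starts a new workflow.
START_SYNC = "//example/mdm/StartSync"


@dataclass
class Message:
    conversation_group_id: str
    dialog_handle: str
    message_type: str
    body: str = ""


@dataclass
class WorkflowInstance:
    instance_id: str   # same value as the conversation group ID
    parameters: str    # contents of the initial message
    events: list = field(default_factory=list)


class Host:
    def __init__(self):
        self.instances = {}        # conversation group ID -> workflow instance
        self.closed_dialogs = []
        self.failed_dialogs = []

    def run(self, queue):
        # Stand-in for the RECEIVE loop: take one message, branch on its type.
        while queue:
            msg = queue.popleft()
            if msg.message_type == END_DIALOG:
                self.closed_dialogs.append(msg.dialog_handle)
            elif msg.message_type == DIALOG_ERROR:
                self.failed_dialogs.append(msg.dialog_handle)
            elif msg.message_type == START_SYNC:
                # Initial message: start a new instance keyed by the group ID.
                self.instances[msg.conversation_group_id] = WorkflowInstance(
                    msg.conversation_group_id, msg.body)
            else:
                # Later message: deliver it as an event to the existing instance.
                instance = self.instances[msg.conversation_group_id]
                instance.events.append((msg.message_type, msg.body))


host = Host()
host.run(deque([
    Message("cg-1", "d-1", START_SYNC, "customer 42"),
    Message("cg-1", "d-2", "//example/mdm/SyncReply", "ok"),
    Message("cg-1", "d-2", END_DIALOG),
]))
```

The hard-coded START_SYNC branch mirrors the hard-coded initial message type described above; generalizing it would mean replacing that one comparison with a check for the first message on a dialog where the service is the target.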

 

Most of the logic I wrote was event handling code because pretty much all communication between the hosting code and the workflow is done through events.  Begin Dialog, End Dialog, and Send Message are events from the workflow to the hosting code and Message Received is an event from the host code into the workflow.  I also wrote several events to handle the MDM activities I needed for the demo.  All the database activity is done in the hosting code so I can use the same database connection for everything and make the database transactions work the way I wanted to for data integrity.  I made all the event handlers into custom activities so I can use them pretty simply from the designer surface because they are available in the toolbox.

 

WF comes with a persistence class that stores workflow state in SQL Server but this class manages the SQL connection and transactions.  I needed to store the workflow state using the same connection and transaction as the Service Broker messages and my MDM hub updates so I had to write my own persistence class.  Fortunately there’s a good sample so it didn’t take long to write.  There’s quite a bit of logic for locking the state to prevent simultaneous access but since the state is always locked by the Conversation Group lock, locks are not necessary for my persistence class.
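The reason for sharing one connection and transaction is easier to see in a small example.  The sketch below is not the WF persistence class; it uses Python's sqlite3 module with three made-up tables (queue, hub, workflow_state) standing in for the Service Broker queue, the MDM hub, and the workflow state store.  The point it illustrates is that when all three changes ride on one transaction, a failure partway through puts the message back and leaves no half-saved state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE queue (id INTEGER PRIMARY KEY, conversation_group TEXT, body TEXT);
    CREATE TABLE hub (entity TEXT PRIMARY KEY, value TEXT);
    CREATE TABLE workflow_state (conversation_group TEXT PRIMARY KEY, state TEXT);
    INSERT INTO queue (conversation_group, body) VALUES ('cg-1', 'customer 42');
""")
conn.commit()


def process_one(conn, fail=False):
    """Dequeue a message, update the hub, and save state in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT id, conversation_group, body FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False
        msg_id, group, body = row
        conn.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        conn.execute("INSERT OR REPLACE INTO hub VALUES (?, ?)", (body, "synced"))
        if fail:
            raise RuntimeError("simulated failure before state is saved")
        conn.execute(
            "INSERT OR REPLACE INTO workflow_state VALUES (?, ?)",
            (group, "waiting for reply"),
        )
    return True


def count(table):
    # Table names here come only from this sketch, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


try:
    process_one(conn, fail=True)
except RuntimeError:
    pass
# The message is back on the queue and nothing else was written.
after_failure = (count("queue"), count("hub"), count("workflow_state"))

process_one(conn)
after_success = (count("queue"), count("hub"), count("workflow_state"))
```

A persistence class that opened its own connection would commit the state independently, so the failure case could leave saved state for a message that was never consumed, or the reverse.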

 

There are parts of WF that I haven’t looked into yet, like transaction scope and compensation, but so far I don’t need them.  I have only played with sequential workflows but I think a state machine workflow would also work well.  I use the dialog handles in messages as correlation ids to tie sends to receives so that I can send out multiple messages on multiple dialogs in parallel and still get the received message routed to the correct event handler.  So far it looks like my parallel activities execute serially but I’m hoping that’s just because I’m running in the debugger.
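The dialog-handle correlation boils down to a lookup table.  Here is a rough Python sketch of the idea, not the WF code itself: the Correlator class, the CRM and ERP lookups, and the use of a random UUID as the handle are all assumptions made for illustration.

```python
import uuid


class Correlator:
    """Routes each reply to the handler registered for its dialog handle."""

    def __init__(self):
        self.pending = {}  # dialog handle -> handler awaiting a reply

    def send(self, body, on_reply):
        # A real host would issue BEGIN DIALOG and SEND here; this sketch
        # only makes up a handle and remembers who is waiting on it.
        handle = str(uuid.uuid4())
        self.pending[handle] = on_reply
        return handle

    def receive(self, handle, body):
        # Look up the waiting handler by handle and hand it the reply.
        handler = self.pending.pop(handle)
        handler(body)


results = {}
correlator = Correlator()
h_crm = correlator.send("lookup in CRM", lambda body: results.update(crm=body))
h_erp = correlator.send("lookup in ERP", lambda body: results.update(erp=body))

# Replies may arrive in any order; the handle picks the right handler.
correlator.receive(h_erp, "ERP record 7")
correlator.receive(h_crm, "CRM record 3")
```

Because the handle travels with every message on the dialog, the host never has to guess which outstanding send a reply belongs to, no matter how many dialogs are open at once.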

 

Like most demo software, this has been debugged to the extent that it can handle two messages in a row pretty consistently but not much more.  There really isn’t any error handling to speak of so it’s way too fragile for use in a real application.  As with all prototypes, my next step is to take what I learned, throw out the prototype and start over.

MDM Hub Architecture White Paper

 

I took my MDM Hub Architecture series of posts, cleaned them up and combined them into a white paper that was just published on MSDN:  http://msdn2.microsoft.com/en-us/architecture/bb410798.aspx

 

CDI-MDM Summit

 

I went to the CDI-MDM Spring 2007 Summit in San Francisco last week.  I learned a lot and became more optimistic about the future of MDM.  I didn’t go to the original summit last spring but from what I heard, this one was about twice as big and included many more real CDI-MDM users as attendees.  As I understand it, last year was primarily product vendors scoping each other out but this year there were a significant number of users in attendance and many of the talks included real user experiences.  Even the vendor presentations were less “this is what our tools can do” and more “this is what our customers are doing with our tools”.  While the market is still pretty new, a few vendors are starting to establish themselves as mature solution providers in the space.

 

Even though this was billed as a CDI-MDM conference, there were a few PIM (Product Information Management) sessions which was good to see because I think eventually PIM will be as big if not bigger than CDI.  It seems to be easier to quantify the financial impact of poor quality product information than poor quality customer information.  Just in the area of incorrect invoices caused by inconsistent product specs and pricing information, some companies are saving millions of dollars by implementing PIM solutions.

 

Several Microsoft people gave talks but they were all given by Microsoft IT people who are implementing CDI solutions for the Microsoft internal systems.  No “Microsoft as an MDM platform” talks – maybe next year.

 

On the way back while standing in line at security I got an idea for a great new product.  About half the people in line were staring off into space while talking on their Bluetooth headsets.  What the world really needs is a fake Bluetooth headset.  Not only can you get out of any boring or awkward conversations by pretending to answer an incoming call but for those of us who tend to talk to ourselves in public, we can now do it to our heart’s content without attracting scorn and ridicule.  Sure, you can use a real Bluetooth headset but you can get all the real benefits from a fake one at a fraction of the cost.

Alpha Males and Data Disasters

 

If you have been following my MDM posts, you will know that one of the key pieces of an MDM program is data governance.  As part of my research into data governance, I came across the book Alpha Males and Data Disasters: The Case for Data Governance by Gwen Thomas.  While Data Governance obviously has some technical aspects, this book treats those fairly lightly and puts most of its emphasis on the human and organizational aspects of Data Governance.  There are many good suggestions on what a Data Governance organization should look like, how it should make decisions, how it should communicate its decisions, and how it should resolve conflicts.

 

The Alpha Male part of the title refers to the typical middle-management style that emphasizes handling decisions yourself and only consulting peers or higher authority when there is no other alternative.  This can lead to decisions about data that may make perfect sense to the manager’s immediate organization but are disastrous to the enterprise.  The author explains how clear rules and guidelines will take these decisions out of the Alpha Male manager’s hands and prevent many of the data disasters that have been making headlines lately.  (By the way, she does say that women can also be Alpha Male managers.)

 

While I read this book in relation to my MDM research, Data Governance has much wider applicability than MDM.  Indeed, with the current emphasis on compliance, Data Governance is something all organizations will have to consider.  The author covers what kinds of organizations need Data Governance and what organizations can live without it.

 

In conclusion, I found this book to be very useful in understanding how to do Data Governance and would highly recommend it.  It’s relatively short and a quick, entertaining read.
