MSDN Blogs

The "F" in the DMF

[Note: This is personal opinion; it doesn't reflect the viewpoint of Microsoft or the Microsoft CRM team. This is my take on the DMF, both its shortcomings and its ultimate potential. Don't assume that anything that reads like a prediction here is likely to happen. I'm only peripherally involved with the DMF team, and I don't set direction for them.]

It's a Framework

The "F" in DMF is all about frameworks. Why? Because creating a general-purpose data migration tool or product is extremely difficult, expensive, error-prone, and unlikely to meet our customers' needs. That's right. The DMF is a framework because that's the best approach we could take and the most we could provide without setting unrealistic expectations. Simply put, there isn't a way for us to create a tool that can detect all possible data formats from all possible CRM "applications" and correctly get that data into your shiny new (or slightly used) MS-CRM without the potential for serious data disaster.

Let's look at a few scenarios to see why the R&D team recommended and pursued the framework approach. First, we can assume that existing CRM systems have been customized (I don't have the exact numbers, but my gut tells me that it's a high percentage). Next, we can assume that a CRM system has been in use long enough to collect a reasonable amount of data (otherwise, why would we worry about migrating data from an existing system to a shiny new MS-CRM?).

Given that an existing system has been customized and has been running for some time, there are likely to be a few "dirty" bits of data floating around. That doesn't mean that there's a bug in the in-place system. By "dirty" I simply mean that the data in any given database column can have both syntactic and semantic problems. For example, in the U.S., states are typically abbreviated to two uppercase letters, but that hasn't always been the case. Minnesota is conventionally abbreviated MN (at least that's what the post office would like to see), but it's conceivable that collected data includes other abbreviations like "Minn.", misspellings, fully spelled-out names, and even missing values.
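
Here is a minimal Python sketch of one syntactic cleanup rule for the state example. It assumes a hand-built table of known variants. STATE_VARIANTS and normalize_state are illustrative names, not part of the DMF. The function deliberately returns None for anything it doesn't recognize, so a person can look at the value instead of the rule guessing.

```python
# Hypothetical variant table: in practice it would be built by looking at
# what actually appears in the source column, not written up front.
STATE_VARIANTS = {
    "MN": "MN",
    "MINN": "MN",
    "MINNESOTA": "MN",
}


def normalize_state(raw):
    """Map a raw state value to its two-letter code.

    Returns None for missing or unrecognized values (e.g. misspellings),
    so they can be routed to manual review rather than silently guessed.
    """
    if raw is None:
        return None
    key = raw.strip().rstrip(".").upper()  # " Minn. " -> "MINN"
    if not key:
        return None
    return STATE_VARIANTS.get(key)


for value in [" Minn. ", "minnesota", "MN", "Minnisota", ""]:
    print(repr(value), "->", normalize_state(value))
```

The misspelling "Minnisota" comes back as None on purpose. Adding it to the table is a decision for someone who knows the data.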

That's just one simple case. Phone number formats and addresses are notoriously hard to agree upon. More about that particular problem in a few days when I get around to talking about why duplicate detection is actually damned difficult to do well.
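
Phone numbers follow the same pattern. The sketch below is an illustration under stated assumptions, not DMF code. It only understands North American ten-digit numbers, and normalize_us_phone is a hypothetical name. It refuses everything else, including numbers with extensions, which is exactly where agreeing on a format gets hard.

```python
import re


def normalize_us_phone(raw):
    """Normalize a North American number to NNN-NNN-NNNN.

    Assumes ten-digit NANP numbers only; returns None for anything else
    (short numbers, extensions, international numbers) so it can be reviewed.
    """
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses
    # Accept a leading country code of 1 on an otherwise ten-digit number.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


for value in ["(612) 555-0100", "+1 612.555.0100", "555-0100", "612-555-0100 x12"]:
    print(repr(value), "->", normalize_us_phone(value))
```

Sending the extension case to review instead of quietly dropping digits is a design choice. A fuller rule set would probably split the extension into its own field.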

What we wanted to provide and what we did provide

Ideally, we would have liked to ship something with a lot less user-facing emphasis on the "F" part of DMF. One of our goals, which we simply didn't meet, was to provide a Big Green Button that, when pressed, would discover your other CRM data, clean it up, normalize it, automatically match it to your new MS-CRM system (including all the customizations you put in place and any others that we might discover while migrating your data), and, last but not least, migrate that data. Really, that's what we wanted to do. [Bobert, if you're reading this, you'll remember working on another system just like this about 10 years ago and about 9000 miles away.] Well, we didn't ship one of those, so what did we ship?

The general idea behind the DMF is that you're not migrating a single system just once. You might be, in which case the DMF still provides a ton of value. One of our assumptions was that MS-CRM customers would be migrating from any number of essentially unknown systems. So, without some really great AI, we would need a bit of manual intervention. That is, we'd need to ask a number of questions about your data: what format is it in, which source systems hold it, what syntactic modifications does it need, and what semantic rules apply? In many cases we assumed that at least the latter two questions couldn't be answered directly: you would need to discover those rules as you went.
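
One low-tech way to start discovering those rules is to profile each source column and see which raw values occur and how often. Below is a minimal Python sketch. It assumes the source rows have already been read into dictionaries; profile_column and the sample data are illustrative only.

```python
from collections import Counter


def profile_column(rows, column):
    """Count the distinct raw values of one column, most common first.

    Missing and empty values are grouped under "<missing>" so gaps show up
    in the profile alongside the real variants.
    """
    counts = Counter()
    for row in rows:
        value = row.get(column)
        counts[value if value not in (None, "") else "<missing>"] += 1
    return counts.most_common()


# Illustrative sample rows, not real data.
source_rows = [
    {"name": "Acme", "state": "MN"},
    {"name": "Beta", "state": "Minn."},
    {"name": "Gamma", "state": "MN"},
    {"name": "Delta", "state": ""},
    {"name": "Epsilon", "state": "Minnesota"},
]

for value, count in profile_column(source_rows, "state"):
    print(f"{value!r}: {count}")
```

The profile doesn't fix anything by itself. It tells a person which variants exist, and those findings become the variant tables and rules used during cleansing.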

Why is this a multi-step process?

It was precisely this problem that drove the idea of the intermediary staging database: the CDF. The idea was to give partners an incentive either to create adapters from source systems for resale (e.g., connecting the CDF to Act or Goldmine) or to build a consulting business around migrating custom data (e.g., Access databases and Excel files). We would provide the back-end services, such as constructing the CDF from your customizations and moving the data into your production system.
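
To illustrate the adapter idea, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for a staging database. The table layout, load_csv_export, and the column_map argument are all hypothetical and do not reflect the real CDF schema. The point is only that an adapter translates one source's export into a shared staging shape and keeps the raw values, tagged with their source, for later cleansing.

```python
import csv
import io
import sqlite3


def create_staging(conn):
    # Hypothetical staging table: values are stored raw and tagged with the
    # system they came from, so cleansing can happen later and be traced back.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS staging_contact (
               source_system TEXT NOT NULL,
               source_key    TEXT NOT NULL,
               full_name     TEXT,
               state         TEXT,
               phone         TEXT,
               PRIMARY KEY (source_system, source_key))"""
    )


def load_csv_export(conn, source_system, csv_text, column_map):
    """Adapter for one CSV export; column_map maps staging column -> source column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (
            source_system,
            row[column_map["source_key"]],
            row.get(column_map["full_name"]),
            row.get(column_map["state"]),
            row.get(column_map["phone"]),
        )
        for row in reader
    ]
    with conn:  # commit the whole batch as one transaction
        conn.executemany("INSERT INTO staging_contact VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
create_staging(conn)
export = "ID,CUST NM,ST,PHONE\n17,Pat Smith,Minn.,(612) 555-0100\n"
loaded = load_csv_export(
    conn,
    "access_db",
    export,
    {"source_key": "ID", "full_name": "CUST NM", "state": "ST", "phone": "PHONE"},
)
print(loaded, conn.execute("SELECT * FROM staging_contact").fetchall())
```

Supporting another source would mean writing another adapter with its own column_map. Everything downstream of the staging table would stay the same.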

There were three huge problems with this model: we didn't get the partners we wanted; we didn't provide a key piece of technology; and we didn't finish the CDF construction logic. In retrospect, I think the partner model would have been easier to sell if we (and the partners) had been up-front about including data acquisition, cleansing, and migration costs directly in the CRM purchase price. Not doing so left customers with an unexpected bill for these services. The missing piece of technology was the data cleansing middleware that would have taken all the source data, applied a set of cleansing rules, and produced useful production data. The problem is simply that this technology is extremely hard to get right and, even when it is right, still requires a set of domain-aware eyeballs to verify the production rules. Finally, we could have, and should have, done a better job of reading your customizations (pick lists and pick-list value mappings in particular) and applying them to the CDF and to the cleansing and mapping rules.
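
The cleansing step can be sketched as a set of per-column rules applied to staged rows. Anything a rule can't decide is routed to a person instead of into production. This is a simplified Python illustration, not the middleware we didn't ship. The names cleanse, name_rule, and state_rule are hypothetical.

```python
def name_rule(value):
    """Collapse runs of whitespace; an empty name needs a human."""
    cleaned = " ".join((value or "").split())
    return cleaned or None


def state_rule(value):
    """Map known Minnesota variants to MN; anything else needs a human."""
    known = {"MN": "MN", "MINN": "MN", "MINNESOTA": "MN"}  # illustrative table
    return known.get((value or "").strip().rstrip(".").upper())


def cleanse(rows, rules):
    """Apply per-column rules and split rows into (clean, needs_review).

    Each rule returns the cleaned value, or None when it cannot decide.
    needs_review holds (original_row, [failing columns]) pairs.
    """
    clean, needs_review = [], []
    for row in rows:
        out = dict(row)
        problems = []
        for column, rule in rules.items():
            cleaned = rule(row.get(column))
            if cleaned is None:
                problems.append(column)
            else:
                out[column] = cleaned
        if problems:
            needs_review.append((row, problems))
        else:
            clean.append(out)
    return clean, needs_review


rows = [
    {"name": "Pat   Smith", "state": "Minn."},
    {"name": "Lee Jones", "state": "Minnisota"},
    {"name": "", "state": "MN"},
]
clean, needs_review = cleanse(rows, {"name": name_rule, "state": state_rule})
print(clean)
print(needs_review)
```

The review queue is where the domain-aware eyeballs come in. Each decision made there can become a new entry in a rule's table for the next pass.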

What's next for the DMF?

That's a good question. I know where I'd like to see the DMF go in future releases, but I can't promise that the team shares my point of view. In particular, I think we can do a much better job on back-end CDF construction; we can do a much better job with value maps; we should be able to manage keys better; and we should do a better job at basic data cleansing. That last item is the most important in my mind: without clean data, the value of your CRM system rapidly deteriorates. This isn't just a DMF problem, but if we could verify that source data, once scrubbed, met certain criteria, we would be a lot closer to helping with it.
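
Here is roughly what a value map with a pre-import check might look like, as a Python sketch. It assumes the target pick list is described by a dictionary mapping option values to labels. The options, the map, and apply_value_map are all made up for illustration. A source value counts as a failure if it isn't mapped or if it maps to an option the target doesn't have, and failures are reported before import.

```python
def apply_value_map(values, value_map, target_options):
    """Translate source pick-list labels into target option values.

    Returns (translated, unmapped). translated has None where a value could
    not be mapped; unmapped collects the offending source values so the map
    can be fixed before anything is imported.
    """
    translated, unmapped = [], set()
    for value in values:
        target = value_map.get((value or "").strip().lower())
        if target is None or target not in target_options:
            unmapped.add(value)
            translated.append(None)
        else:
            translated.append(target)
    return translated, unmapped


# Hypothetical target pick list and source-to-target value map.
target_options = {1: "Customer", 2: "Prospect", 3: "Partner"}
value_map = {"cust": 1, "customer": 1, "lead": 2, "prospect": 2, "reseller": 4}

translated, unmapped = apply_value_map(
    ["Cust", "Lead", "Reseller", "Vendor"], value_map, target_options
)
print(translated, sorted(unmapped))
if unmapped:
    print("Not ready to import; unmapped values:", sorted(unmapped))
```

"Reseller" is the interesting case: the map has an entry for it, but that entry points at an option the target pick list doesn't define. That is the kind of mismatch between a value map and your customizations a check like this is meant to catch.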

Another area where the DMF could stand some improvement is managing multi-phase migrations. The DMF approach works great for one-time migrations where all the source data from all the source systems is moved into the CDF at the same time. It doesn't necessarily help if the data is moved in piecemeal, unless the DMF includes basic rules for duplicate detection, prevention, and clean-up. If the CDF holds source data over time, we can get closer to solving the problem, because we can identify these issues during clean-up and "do the right thing." However, if the CDF takes on more of a bulk-load / bulk-import staging role, then the actual import step from the CDF into CRM needs to apply reasonable data clean-up rules at the platform level. That's another topic for another day, though.
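
Here is a minimal Python sketch of a duplicate rule for piecemeal loads. It keeps an index of match keys across load phases and flags incoming records that collide with something already loaded. The key (letters of the name plus the last ten phone digits) is an assumption chosen for brevity. It will miss "Pat" versus "Patricia", which is why real duplicate detection is so hard. DuplicateIndex and match_key are hypothetical names.

```python
import re


def match_key(record):
    """Crude match key: letters of the name plus the last ten phone digits."""
    name = re.sub(r"[^a-z]", "", (record.get("name") or "").lower())
    phone = re.sub(r"\D", "", record.get("phone") or "")[-10:]
    return (name, phone)


class DuplicateIndex:
    """Remembers match keys across load phases and flags likely duplicates."""

    def __init__(self):
        self._seen = {}  # match key -> (batch name, first record with that key)

    def load_batch(self, batch_name, records):
        """Index a batch; return (new_record, earlier_batch, earlier_record) triples."""
        duplicates = []
        for record in records:
            key = match_key(record)
            if key == ("", ""):
                continue  # nothing to match on; leave it for manual review
            if key in self._seen:
                earlier_batch, earlier_record = self._seen[key]
                duplicates.append((record, earlier_batch, earlier_record))
            else:
                self._seen[key] = (batch_name, record)
        return duplicates


index = DuplicateIndex()
index.load_batch("phase 1", [{"name": "Pat Smith", "phone": "(612) 555-0100"}])
dups = index.load_batch(
    "phase 2",
    [
        {"name": "PAT SMITH", "phone": "+1 612 555 0100"},
        {"name": "Lee Jones", "phone": "612-555-0199"},
    ],
)
for new, batch, old in dups:
    print(f"{new['name']!r} looks like {old['name']!r} from {batch}")
```

The index only works because it survives between phases. If each load started from an empty index, the second-phase "PAT SMITH" would go straight in as a new record.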

Published Tuesday, February 07, 2006 10:58 AM by mikemill

Comments

# re: The "F" in the DMF

Mr Miller,

I have been running CRM for about a year now. It seems to me that our post-dot-bomb era of development has completely forgotten what a database is. "That's that thing behind my web page, right?"

My consulting career dates back to the early eighties, and I am still amazed at the "production" level applications being shipped today with the worst freaking database designs I have EVER seen. I am not saying CRM has a bad design.

What I am saying is very simple. STOP trying to make stupid people smarter. You will never win that battle; stupid people will keep getting more stupid. If you can't clean your data and map it to a field that should be the same (e.g., First Name = FRST NM), you deserve what you get.

I still have a copy of POWERBUILD 6.5 (circa 1999) on my machine. WHY? DATAPIPES.  If you don't know what that is, ask any real DBA.

If CRM shipped with a RELATIONAL table set for importing data and a tool or "green button" to suck that data into CRM, you would solve 99% of the data import "problems". Data is managed by DBAs, not programmers.

I did a project at Kaiser several years ago. The programming staff couldn't believe I wrote database scripts to verify "their" interface filtering BEFORE I allowed an insert into my database. More than 60% of the existing application fields allowed invalid input. Lesson: a program is for navigation and data presentation; a database is for storing and managing data. Don't confuse the two.

Why did I take the time to write you? I was forced to do a clean install of CRM 3.0 because the 3.0 installation program said I had a non-standard 1.2 database (it is standard). The installation program didn't tell me what it found; it just told me I'm screwed, go away.

So now I have to migrate my data from 1.2 to 3.0, which, by the way, is a direct mapping except for the stupid security descriptors. Hey, you can't give me the name of that guy, can you? I really want to meet him... dark alley... blunt instrument...

Since I can't do that the way I would like (i.e., map the data and push it in, which I could do in a matter of hours, TOPS), I have to write a freakin' program to grab a bunch of XML docs and debug the whole damn mess.
Thursday, February 23, 2006 1:14 AM by Andrew Chumney