Microsoft CRM

Posts
  • Michaeljon Miller

    Why MS-CRM isn't a CRM product


    For the longest time I've held the belief that the underlying infrastructure behind MS-CRM was there to support arbitrary business applications. It's funny: for the last six months I've been betting my career on that belief, and today, more than ever, I truly believe that the core value proposition is that CRM is so extensible that its CRM-ness becomes secondary to how and why people buy it.

     

    I've been talking to a number of customers and partners who have purchased CRM not for the CRM aspects at all (although I suspect they end up using quite a bit of the functionality) but for the platform components. It seems to have come full circle - the CRM product was, a very long time ago (in industry terms), designed to be the platform on which bCentral applications were based. After a short segue into on-premise core CRM functionality where the extension capabilities were disabled or hidden, the product is really progressing into a true platform.

     

    It'll be interesting to see what happens over the next two releases. One that's been talked about is called Titan and that's the one that I think of as the core platform. I'm looking forward to seeing what that team can provide to the rest of Microsoft, and possibly to the industry, in terms of platform. Watch out SharePoint, there's a new kid in town and that kid knows all about business application requirements.

     

  • Michaeljon Miller

    Multiple forms per "entity"?


    Why not? I was just thinking about yesterday's post and had one of those disturbing thoughts that I get when I want to make software do something that its authors didn't intend (well, maybe, sorta, kinda).

     

    Let's say I want to create multiple forms per "entity" (note the quotes). One way I might do this is to define a set of "like" entities in the metadata all pointing at the same physical location. I'd have to jury-rig the relationships a bit so the platform knew how to traverse them during create and update requests, but that's not really necessary.

     

    If I were going to do this, what would I do? I'd define several entities that all mapped to a single, all-encompassing (data-wise) entity, I'd hide the super entity using roles and privileges because we don't want anybody messing with all that data (for some reason Steve Martin saying "you could melt all this stuff" pops into my head right about now… I don't know why), and then I'd configure-up all the forms for the shadow entities. Finally, I'd rename all the shadow entities so they show up, more or less, in the application. Then, and here's where the per-role stuff comes in, I'd define roles for each entity.

     

    In theory you should have different role-based views over the same set of data. Just a thought.

     

    Update 13:45 PDT - If I were really going to do this I'd just copy all of the metadata for the source entity to the destination entities. There's no reason to go through the pain of actually creating the physical tables. That's why CRM is metadata-driven - so you can define things declaratively and the system will do the rest.
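    To make the shadow-entity idea a little more concrete, here's a toy metadata model, not the CRM metadata API. Each shadow entity copies the source entity's attribute metadata, keeps the same physical table, and gets its own display name, form, and role. Every name in it (`EntityMetadata`, `make_shadow`, the `person` entity, the `Sales` and `HR` roles) is hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class EntityMetadata:
    """Hypothetical, simplified stand-in for an entity's metadata record."""
    logical_name: str
    display_name: str
    physical_table: str          # where the rows actually live
    attributes: tuple            # attribute logical names
    form_attributes: tuple = ()  # what this entity's form shows
    allowed_roles: frozenset = field(default_factory=frozenset)


def make_shadow(source, name, display, form, roles):
    """Copy the source metadata, keep its physical table, narrow the form."""
    missing = set(form) - set(source.attributes)
    if missing:
        raise ValueError(f"form uses unknown attributes: {sorted(missing)}")
    return replace(source, logical_name=name, display_name=display,
                   form_attributes=tuple(form), allowed_roles=frozenset(roles))


def visible_entities(entities, role):
    """Logical entities (and therefore forms) a given role can see."""
    return [e.logical_name for e in entities if role in e.allowed_roles]


# The "super entity" holds everything and no role can see it.
person = EntityMetadata("person", "Person", "PersonBase",
                        ("name", "phone", "salary", "territory"))
sales_view = make_shadow(person, "salesperson", "Sales Person",
                         ["name", "phone", "territory"], ["Sales"])
hr_view = make_shadow(person, "employee", "Employee",
                      ["name", "salary"], ["HR"])

entities = [person, sales_view, hr_view]
print(visible_entities(entities, "Sales"))   # ['salesperson']
print({e.physical_table for e in entities})  # {'PersonBase'}
```

    Because no physical table is created for the shadows, every role sees its own view over one set of rows.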

  • Michaeljon Miller

    Dealing with many 1:M CRM relationships


    A while back I mentioned in the CRM newsgroups that it was possible to create multiple relationships between two entities. For the most part this is true, but there are some hoops to jump through and each of those hoops is ringed in fire. More specifically, using this approach to solve the "many relationships" problem will plant you squarely in unsupported, probably-can't-upgrade land.

     

    The problem I'm talking about is where you have a single parent entity - the 1-side, and you want more than one 1:M relationship from that parent entity (let's call it P) to a child entity (smartly named C). Let's try an example. Say we have a Course that has an Author and an Instructor. Now, ignoring for a moment that CRM can't deal with additional party types and won't allow multiple relationships to SystemUser, we'll call Author and Instructor simply views over the same set of data. That is, they are the same entity with different roles.

     

    In CRM v3.0 you'd implement this by creating the three entities, setting up the Course-Author and Course-Instructor relationships, hiding the Instructor entity, and writing a callout to keep the Author and Instructor data in sync. That'll work and keep you supported. There are a few caveats that will cause some serious grief: you can't use the quick create functionality from an Instructor lookup, you need to keep the security consistent, and you can't completely hide the Instructor type as a separate type.

     

    Well, there is a way around this. It's a hack but it's been successfully hacked for some internal development deployments. Create the three entities as you normally would but keep the child entities Instructor and Author "empty". Set up the relationships using the configuration tools. You now have three entities with two relationships much like you had in the supported approach. Now, customize both Author and Instructor in identical ways by adding all the attributes that the type needs. When you're done you'll end up with two identical entities pointing at two different database structures. You're still in supported land.

     

    Do not add any instance data to either Author or Instructor yet.

     

    Next, change all the display attributes for the Instructor entity so that it looks like an Author (this really isn't necessary and might make sense usage-wise if you treat them as different). Ready to jump into unsupported territory? Open the metadata database and find the Instructor entity (let's assume that Author is the primary view) in the Entity table. You'll see a few columns that describe physical table mappings for the core and extension tables. Change the table names to reflect that the Instructor entity should instead point at the Author tables. You'll want to make a few modifications to the Instructor attributes as well so that they reflect the underlying physical Author attributes (I think the primary key in the base and extension table is the only attribute you need to really worry about).

     

    Now, every time you add an Instructor or Author the data ends up in the same physical table. That means that any query over either entity will return the same set of data. So, your lookup control from Course to Author will present the same physical data as Course to Instructor. As far as CRM is concerned you have two different entities with two different relationships. You end up with copies of all the web service methods and the schemas will look nearly the same (their logical views will reflect the as-configured names, not the physical names). But, at a physical level you only have one copy of the data that you look at and edit, there are no synchronization issues, and there's no tricky callout code to handle multi-master updates.
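    Here's a toy model of the remapping using an in-memory SQLite database. The `Entity` table with `Name`, `BaseTableName`, and `ExtensionTableName` columns, the `AuthorBase`/`InstructorBase` tables, and the helper functions are hypothetical simplifications (the real metadata schema has many more columns and the names may differ, and the primary key remapping is glossed over). The point is only that once both logical entities resolve to the same physical table, a query over either returns the same rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical, simplified metadata: logical entity -> physical tables.
    CREATE TABLE Entity (Name TEXT PRIMARY KEY,
                         BaseTableName TEXT, ExtensionTableName TEXT);
    INSERT INTO Entity VALUES ('author', 'AuthorBase', 'AuthorExtensionBase');
    INSERT INTO Entity VALUES ('instructor', 'InstructorBase', 'InstructorExtensionBase');

    -- Physical base tables (extension tables omitted for brevity).
    CREATE TABLE AuthorBase (AuthorId TEXT PRIMARY KEY, FullName TEXT);
    CREATE TABLE InstructorBase (InstructorId TEXT PRIMARY KEY, FullName TEXT);
""")


def base_table(entity):
    """Resolve a logical entity name to its physical base table via metadata."""
    (table,) = db.execute(
        "SELECT BaseTableName FROM Entity WHERE Name = ?", (entity,)).fetchone()
    return table


def query_entity(entity):
    # Table names can't be bound parameters; here they come only from metadata.
    return db.execute(
        f"SELECT FullName FROM {base_table(entity)} ORDER BY FullName").fetchall()


# The unsupported step: point Instructor at Author's physical tables.
db.execute("""UPDATE Entity
              SET BaseTableName = 'AuthorBase',
                  ExtensionTableName = 'AuthorExtensionBase'
              WHERE Name = 'instructor'""")

# A row written through either logical entity lands in the same table.
db.execute(f"INSERT INTO {base_table('instructor')} VALUES ('1', 'Grace Hopper')")
db.execute(f"INSERT INTO {base_table('author')} VALUES ('2', 'Ada Lovelace')")

print(query_entity("author"))      # [('Ada Lovelace',), ('Grace Hopper',)]
print(query_entity("instructor"))  # the same two rows
```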

     

    I might have missed a few steps here because I'm doing this from memory and not on a live CRM installation. If you find that some of the steps seem incorrect (missing, out of order, unnecessary) then post a comment here so everyone gets a chance to learn along with you.

  • Michaeljon Miller

    Any uptake on the RSS connector for CRM?


    Is anyone actively using the RSS connector for CRM? If so, how are you using it and did you customize it in any way? If not, what are the reasons for not using it?

     

    This is really for my information. The CRM team might watch this space and have some desire to use the information, but that's not why I'm asking. My new team is looking at some related technologies and we're / I'm wondering how this one fits.

  • Michaeljon Miller

    What is a "necessary evil"?


    Charles Eliot  stopped me in the café at lunch today and asked me to clarify my comments about why direct database access in MS-CRM is a necessary evil. Actually, I think I used the phrase "evil-but-absolutely-necessary" when I described the design and implementation of filtered views.

     

    Way back when the CRM product was in design we had a few ideas about how the database should look. One camp took a very esoteric, but correct, view. The other had a highly normalized and simplified view. The latter design was chosen. One of the core tenets of that design was its inherent readability. Not necessarily readable by average users, but readable by people who'd have to write those "special" reports that the system didn't provide.

     

    Why does this matter? Well, because the idea was that the database should be accessible and not locked away. I always believed, and still do, that the data and the database belong to the customer and they can do damned near anything they want.  We wrapped the database in a user-protective layer called the CRM platform and asked that anybody wanting to write to the database go through that layer. We never said you couldn't read directly from the database though.

     

    What we did say, although I suspect the written word is a little light, is that we reserve the right to modify the data structures on a release-by-release basis to provide greater functionality, faster access, bug fixes, and so forth. I even went so far as to say that we would break any application written which bypassed the platform layer, and I said it in public several times.

     

    With the release of 3.0 and the exposure of filtered views we still reserved the right to break any application written on those views. Only, in this case, because we released the bits as a "feature" and then started talking that same feature up, we got ourselves into a situation where breaking applications isn't a great idea. In short we said "go ahead and use the filtered views, they're good for you". But, we didn't look far enough into the future to see what effect that would have on innovation.

     

    So, I stand by my comment. The filtered views, and by extension direct database access, are evil-but-absolutely-necessary. They can't go away. They probably can't even change significantly. But, I expect those views to morph over time as new schema concepts are introduced.

     

     

    On continuous loop for the last few days is Zero: A Martin Hannett Story. It's a must-listen for any fan of Mancunian music from the late 70s and early 80s.

  • Michaeljon Miller

    New CRM blog and blogger


    Note: free exposure to CRM team members' blogs.

    Charles Elliot, a good friend and recently a co-worker of mine, has finally jumped into the blogging world. Knowing Charles, I'm expecting a lot of goodness to come our way. In particular, I'm waiting for him to start talking about music and his take on it.

    In other news, the CRM PM team has a new blog. I suspect they'll stay pretty much in the "supported" world and will talk to you about all things CRM V3 and what they're planning for future releases. As I've been steering clear of the CRM world for a few months now I'm not sure what goodies they'll talk about first (although I see Charles has posted something about the, IMO evil-but-absolutely-necessary, direct database access exposed in V3). It'll be very interesting to see what the CRM PM team blogs about that's not blogged about on other CRM PM team member blogs.

  • Michaeljon Miller

    Announcing the RSS connector for MS-CRM 3.0


    We've finally released the RSS connector for MS-CRM. I've mentioned this tool a few times. This has been a long release mired in a few documentation, legal, and technical issues. But, that's not your concern, you probably just want to download this thing, install it, and make things happen. Well, here's the backgrounder on what's making the connector tick. The MSDN article which comes with the connector download covers some basic information. Look for a longer whitepaper from our UE team in the next few weeks.

     

    We want to hear how you've used, modified, or extended the connector. If you use it, let me know. This is the last bit of code that I built for the CRM team (well, it's the last bit that I released, I was still working on the add-entity framework and address book right up to RTM) and I'd like to see what happens with it (and no, this isn't typical of the code quality that I usually write, this was a prototype first and a public release second; if we were going to release this it would be very different). We're releasing this under a different model (see the EULA) and we're very interested in its life once it leaves here.

     

    Basics of the connector

    The RSS connector for CRM is built on top of the advanced find and web service Fetch functionality. For the most part it directly executes the requested query and returns the results as RSS-formatted XML. However, there are a few changes that are made to the base query (if you're wondering, all queries are stored as serialized <fetch> requests, which means the connector gets to mess with XML).

     

    First thing that happens is that the connector loads the actual <fetch> definition for the requested user or system query. Next, it creates an array of the columns specified in the query's grid. These are used for specifying the simple list extension attributes for IE7 sorting and grouping. The one change from the grid columns is that the connector adds the modifiedon attribute if it's part of the underlying entity's definition.
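    The column-gathering step might look something like this sketch. It assumes the grid layout is stored as XML with `<cell name="..."/>` elements (the "cell list" the post refers to later) and that the entity's attribute names are available as a set; the function name and the sample layout are illustrative, not the connector's actual code.

```python
import xml.etree.ElementTree as ET

# Illustrative grid layout; the real stored layout may carry more attributes.
SAMPLE_LAYOUT = """
<grid name="resultset" object="1" jump="name" select="1" icon="1" preview="1">
  <row name="result" id="accountid">
    <cell name="name" width="300" />
    <cell name="telephone1" width="100" />
    <cell name="address1_city" width="100" />
  </row>
</grid>
"""


def grid_columns(layout_xml, entity_attributes):
    """Return the grid's column names, plus modifiedon if the entity has it."""
    root = ET.fromstring(layout_xml.strip())
    columns = [cell.get("name") for cell in root.iter("cell")]
    if "modifiedon" in entity_attributes and "modifiedon" not in columns:
        columns.append("modifiedon")
    return columns


account_attributes = {"accountid", "name", "telephone1", "address1_city", "modifiedon"}
print(grid_columns(SAMPLE_LAYOUT, account_attributes))
# ['name', 'telephone1', 'address1_city', 'modifiedon']
```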

     

    Once the connector has cached away the display attributes it modifies the in-memory copy of the query so that all attributes are available. The query definition has all of its selection criteria removed as well so that the feed data is as broad as possible given the caller's security attributes.
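    Broadening the query is a small XML rewrite. The sketch below drops every `<filter>` (including ones inside `<link-entity>` joins), removes the primary entity's `<attribute>` list, and adds `<all-attributes/>`, while keeping `<order>`. The element names are standard FetchXML; the function name and sample query are made up for illustration, and this is not the connector's code.

```python
import xml.etree.ElementTree as ET

SAVED_FETCH = """<fetch mapping="logical">
  <entity name="account">
    <attribute name="name" />
    <attribute name="telephone1" />
    <order attribute="name" descending="false" />
    <filter type="and">
      <condition attribute="statecode" operator="eq" value="0" />
    </filter>
  </entity>
</fetch>"""


def broaden_fetch(fetch_xml):
    """Select every attribute and drop all selection criteria from a <fetch>."""
    root = ET.fromstring(fetch_xml)
    # Drop criteria everywhere, including inside link-entity joins.
    for node in [n for n in root.iter() if n.tag in ("entity", "link-entity")]:
        for criteria in node.findall("filter"):
            node.remove(criteria)
    # Replace the primary entity's column list with <all-attributes/>.
    entity = root.find("entity")
    for attribute in entity.findall("attribute"):
        entity.remove(attribute)
    if entity.find("all-attributes") is None:
        entity.insert(0, ET.Element("all-attributes"))
    return ET.tostring(root, encoding="unicode")


print(broaden_fetch(SAVED_FETCH))
```

    The platform still applies the caller's security when the broadened query runs, so "as broad as possible" means everything that user is allowed to see.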

     

    User queries vs. system queries

    Under the covers system queries and user queries are structurally the same. They are stored in different tables in the database (don't ask, it was a decision that couldn't be undone by the time it was noticed), they have the same columns, and they have the same semantics. The primary difference is that the security model changes: system queries are effectively public and user queries are effectively private. A secondary difference is that they had different APIs during the TAP and alpha releases because they just happen to be written in two different languages (again, don't ask).

     

    This does have the nice side-effect that the RSS connector simply reads the query definition through the proper entity. It's really just a <fetch> that changes entity names and some minor decorations.
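    Since both kinds of query share a shape, one helper can build the `<fetch>` that reads a stored definition, varying only the entity and key names. This sketch assumes the entity names `savedquery` and `userquery`, the key names `savedqueryid` and `userqueryid`, and the attributes `fetchxml` and `layoutxml`; treat all of them as illustrative rather than confirmed.

```python
from xml.sax.saxutils import quoteattr

# Assumed entity and primary key names for the two query stores.
QUERY_ENTITIES = {"system": ("savedquery", "savedqueryid"),
                  "user": ("userquery", "userqueryid")}


def definition_fetch(kind, query_id):
    """Build a <fetch> that reads one stored query definition."""
    entity, key = QUERY_ENTITIES[kind]
    return (
        f'<fetch mapping="logical"><entity name="{entity}">'
        '<attribute name="name" /><attribute name="fetchxml" />'
        '<attribute name="layoutxml" />'
        f'<filter><condition attribute="{key}" operator="eq" '
        f'value={quoteattr(query_id)} /></filter>'
        "</entity></fetch>"
    )


print(definition_fetch("user", "{00000000-0000-0000-0000-000000000001}"))
```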

     

    When the RSS connector displays the nice HTML-based list of available feeds it does so in two sections. The top contains user-specific feeds and the bottom contains all accessible system feeds. The connector also generates HTML <link> elements for each user query. This tells RSS-aware browsers and readers that there are feeds published on the page. The connector only does this for user queries; otherwise the list would be unreasonably long.
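    The `<link>` elements are standard RSS autodiscovery markup. A minimal generator, assuming a hypothetical feed URL of the form `rss.aspx?queryid=...` (the connector's real URL scheme isn't given here), could look like this:

```python
from html import escape
from urllib.parse import urlencode


def feed_links(user_queries, base_url):
    """Emit one RSS autodiscovery <link> per user query (name, id pairs)."""
    lines = []
    for query_id, name in user_queries:
        # Hypothetical URL shape: base_url?queryid=<id>
        href = f"{base_url}?{urlencode({'queryid': query_id})}"
        lines.append(
            '<link rel="alternate" type="application/rss+xml" '
            f'title="{escape(name)}" href="{escape(href)}" />'
        )
    return "\n".join(lines)


print(feed_links([("1", "My Open Accounts")], "http://crm/rss.aspx"))
```

    System queries are deliberately left out, which keeps the `<head>` of the page short.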

     

    How the connector selects attributes for display

    Because the connector modifies the query definition to force all attribute selection (this isn't just done for display, it's also done to support instance delivery, but more on that in a minute) it's able to present a rich view of the instance data to the RSS aggregator. In WriteItemData the connector loops over the entity's attribute list in a semi-intelligent order. It writes the primary field, any audit attributes, any state or status attributes, ownership data, any "description" attributes, and then all the rest. Nearly all attributes retrieved from the platform are displayed: there are a few attributes that have no public view and no reasonable display label, so those are skipped.
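    The "semi-intelligent order" could be approximated with a stable sort over attribute groups. This sketch is not WriteItemData itself: the attribute names in each group (`createdon`, `statecode`, `ownerid`, and so on) and the `labels` lookup used to skip label-less attributes are assumptions for illustration.

```python
def display_order(attributes, primary_field, labels):
    """Order attributes roughly as described; skip ones with no display label."""
    audit = {"createdon", "createdby", "modifiedon", "modifiedby"}  # assumed names
    state = {"statecode", "statuscode"}
    ownership = {"ownerid", "owninguser", "owningbusinessunit"}

    def rank(name):
        if name == primary_field:
            return 0
        if name in audit:
            return 1
        if name in state:
            return 2
        if name in ownership:
            return 3
        if "description" in name:
            return 4
        return 5

    displayable = [a for a in attributes if labels.get(a)]
    # sorted() is stable, so attributes keep their original order within a group.
    return sorted(displayable, key=rank)


attrs = ["telephone1", "description", "ownerid", "statecode",
         "modifiedon", "name", "versionnumber"]
labels = {a: a.title() for a in attrs if a != "versionnumber"}
print(display_order(attrs, "name", labels))
# ['name', 'modifiedon', 'statecode', 'ownerid', 'description', 'telephone1']
```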

     

    Riding on coattails - using the list extensions

    The connector uses the query's grid columns to select the set of attributes used in the list extensions elements. Technically, this is a de-selection, because the connector rewrites the query to remove the attribute list and adds an <all-attributes> clause to the <fetch>. In the foreach loop in WriteCrmAttributes there's a check to see if the "current" attribute is in the cell list and if it's not it's skipped. The data is written in a manner that makes list extension display useful to the user and executable to the extension processor. That is, all "codes" and other internal details are tossed away and nice display values are used instead.
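    Here's a rough sketch of that output using the Simple List Extensions namespace (`cf:treatAs`, `cf:listinfo`, `cf:sort`, `cf:group`). The `urn:example:crm` namespace for per-item values, the choice to group only on text columns, and the function names are hypothetical; the sketch also assumes codes have already been turned into display strings.

```python
import xml.etree.ElementTree as ET

CF = "http://www.microsoft.com/schemas/rss/core/2005"  # Simple List Extensions
CRM = "urn:example:crm"  # hypothetical namespace for per-item CRM values
ET.register_namespace("cf", CF)
ET.register_namespace("crm", CRM)


def add_list_info(channel, columns):
    """columns: (attribute, label, data_type) tuples taken from the query's grid."""
    ET.SubElement(channel, f"{{{CF}}}treatAs").text = "list"
    info = ET.SubElement(channel, f"{{{CF}}}listinfo")
    for attribute, label, data_type in columns:
        ET.SubElement(info, f"{{{CF}}}sort", {
            "ns": CRM, "element": attribute, "label": label, "data-type": data_type})
        if data_type == "text":
            # Grouping only on text columns is a choice made for this sketch.
            ET.SubElement(info, f"{{{CF}}}group",
                          {"ns": CRM, "element": attribute, "label": label})


def add_item_values(item, display_values, columns):
    """Write display values (not codes) for grid columns only; skip the rest."""
    grid = {attribute for attribute, _, _ in columns}
    for attribute, value in display_values.items():
        if attribute in grid:
            ET.SubElement(item, f"{{{CRM}}}{attribute}").text = value


rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
cols = [("name", "Account Name", "text"), ("modifiedon", "Modified On", "date")]
add_list_info(channel, cols)
item = ET.SubElement(channel, "item")
add_item_values(item, {"name": "Contoso", "accountnumber": "A-1",
                       "modifiedon": "Tue, 06 Jun 2006 10:00:00 GMT"}, cols)
print(ET.tostring(rss, encoding="unicode"))
```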

     

    The lightweight metadata cache and the service proxy

    Two things that held up earlier release of the connector were technologies that I used to make the connector happen but which aren't supported outside of the CRM team: the 1.2 COM proxy (which is finally gone in the upcoming CRM release - I hated that thing because I had to code it over my wife's birthday a few years back and that got me in a lot of trouble) and the internal metadata cache assemblies.

     

    I didn't want to freak out our development team so I had to use the same metadata interfaces that everyone else uses. The problem is that the MD web service delivers too much data too slowly for me (speaking of slow, one optimization I'd like to see in the connector is to read the queries in one batch instead of on a per-entity type basis). To get around the metadata problem I rolled a very lightweight and purpose-built cache that uses the web service to read the metadata and keep it around in a static. There are a ton of problems with this approach: there's another copy of the cache floating around and CRM is already memory-hungry, and this cache isn't aware of customization changes (i.e. Publish) so it can get out of sync. I didn't consider either of these show-stoppers for this add-on, but the PM in charge of programmability does and he's doing something about it for V.next.
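    A process-wide cache of that kind can be sketched in Python as class-level state filled lazily by a loader. The `loader` hook stands in for the metadata web service call and is hypothetical. Like the connector's cache, nothing here notices a Publish; `invalidate` exists only to show where such a hook would go.

```python
import threading


class MetadataCache:
    """Process-wide ("static") cache of entity metadata, filled lazily."""

    loader = None  # hypothetical: a function that calls the metadata web service
    _entries = {}
    _lock = threading.Lock()

    @classmethod
    def get(cls, entity_name):
        # The lock is held during the load, so concurrent first requests for an
        # entity only hit the web service once (at the cost of serializing loads).
        with cls._lock:
            if entity_name not in cls._entries:
                cls._entries[entity_name] = cls.loader(entity_name)
            return cls._entries[entity_name]

    @classmethod
    def invalidate(cls, entity_name=None):
        # Nothing calls this automatically: the cache can't see customizations
        # being published, so it can drift out of sync.
        with cls._lock:
            if entity_name is None:
                cls._entries.clear()
            else:
                cls._entries.pop(entity_name, None)


calls = []


def fake_loader(entity_name):
    calls.append(entity_name)
    return {"name": entity_name, "attributes": ["name", "modifiedon"]}


MetadataCache.loader = fake_loader
MetadataCache.get("account")
MetadataCache.get("account")
print(calls)  # ['account']
```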

     

    One thing that I did that might be a little surprising was that I asked for the WSDL and then hand-edited it down to its absolute basic bits for this solution. I didn't want a 650 KB proxy loaded into the connector and I didn't want the connector to pay the late compilation and reflection hit when W3WP loaded the proxy. The connector only uses the Fetch method and the SOAP header. That means I was able to strip all the types and method definitions out (sorry Kevin and Arash). And no, I'm not using this as an apology for the web service shape in V3, it's a good thing. I didn't have to do the same thing with the metadata proxy because it's fairly small and the connector needs a lot of the definitions from it.
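    With a Fetch-only proxy, each call boils down to one SOAP request. This sketch builds that envelope by hand. The service namespace (`http://schemas.microsoft.com/crm/2006/WebServices`), the `Fetch`/`fetchXml` element names, and the `CallerId`/`CallerGuid` header shape are assumptions for illustration; check them against the WSDL your server actually emits.

```python
import xml.etree.ElementTree as ET

SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
CRM_WS = "http://schemas.microsoft.com/crm/2006/WebServices"  # assumed namespace
ET.register_namespace("soap", SOAP)


def fetch_envelope(fetch_xml, caller_id=None):
    """Build a SOAP 1.1 request that calls only the Fetch method."""
    env = ET.Element(f"{{{SOAP}}}Envelope")
    if caller_id is not None:
        header = ET.SubElement(env, f"{{{SOAP}}}Header")
        # Hypothetical header shape; copy the real one from your WSDL.
        caller = ET.SubElement(header, f"{{{CRM_WS}}}CallerId")
        ET.SubElement(caller, f"{{{CRM_WS}}}CallerGuid").text = caller_id
    body = ET.SubElement(env, f"{{{SOAP}}}Body")
    fetch = ET.SubElement(body, f"{{{CRM_WS}}}Fetch")
    # The query travels as an escaped string, not as nested XML elements.
    ET.SubElement(fetch, f"{{{CRM_WS}}}fetchXml").text = fetch_xml
    return ET.tostring(env, encoding="unicode")


print(fetch_envelope('<fetch mapping="logical"><entity name="account">'
                     '<all-attributes /></entity></fetch>'))
```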

     

    If you've gotten this far make sure you read my entry on using the offline client hosting process otherwise known as Cassini. I used the connector to verify that I could make the offline client web services work.

     

    Optional non-IE7 "list extension" behavior

    When the RSS feeds are displayed to the user there are two RSS icons and a text-based hyperlink. The two icons represent the "simple" RSS feed and the RSS feed with the complete instance data. The text link will show a down-level IE representation of the IE7 RSS viewer. This bit of code is a very early prototype put together by the IE and RSS team to show what the IE7 experience might look like. I lifted the code from those teams for the PDC demo and just never got around to removing it. Someone better versed in cross-browser AJAX stuff might be able to make this work better in other browsers. For now, this link can be ignored (and I would recommend replacing the link with the "real" RSS link and letting the browser figure it out).

     

    Delivering a complete CRM instance in the <item> data

    The RSS connector has the capability to deliver, as part of the item data, the XML serialized representation of a complete entity instance. It does this to enable a set of scenarios supported by Really Simple Sharing and by some hub-and-spoke delivery models that we're looking at. When this option is enabled the <channel> element contains the underlying entity's XSD (this is a different XSD generation process than the WSDL uses). When a smart RSS aggregator loads a feed with the CRM namespace it knows that the entity definition and entire entity instances are available to it. This means you can tunnel select CRM instance data over RSS without exposing the CRM web services. RSS provides the pipe through which this data moves. We've come up with dozens of applications for this delivery mechanism and will start building some software based on this model over the summer. (This is the project that I've left the CRM team to work on and I couldn't be more excited about it.)
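    The shape of that tunnel can be sketched on both ends: the feed side embeds a namespaced instance element in an ordinary `<item>`, and a CRM-aware reader pulls records out while ignoring plain items. The `urn:example:crm` namespace, the `instance` element, and its `entity` attribute are hypothetical, and the channel-level XSD is left out to keep the sketch short.

```python
import xml.etree.ElementTree as ET

CRM = "urn:example:crm"  # hypothetical namespace for embedded instances
ET.register_namespace("crm", CRM)


def item_with_instance(title, entity_name, record):
    """Build an RSS <item> that also carries a full serialized entity instance."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    instance = ET.SubElement(item, f"{{{CRM}}}instance", {"entity": entity_name})
    for attribute, value in record.items():
        ET.SubElement(instance, f"{{{CRM}}}{attribute}").text = str(value)
    return item


def read_instances(channel):
    """What a CRM-aware aggregator might do: recover records, ignore the rest."""
    found = []
    for inst in channel.iter(f"{{{CRM}}}instance"):
        fields = {child.tag.split("}", 1)[1]: child.text for child in inst}
        found.append((inst.get("entity"), fields))
    return found


channel = ET.Element("channel")
channel.append(item_with_instance("Contoso", "account",
                                  {"accountid": "A1", "name": "Contoso"}))
plain = ET.SubElement(channel, "item")  # an ordinary item with no instance
ET.SubElement(plain, "title").text = "Plain item"
print(read_instances(channel))
# [('account', {'accountid': 'A1', 'name': 'Contoso'})]
```

    An aggregator that doesn't know the namespace just sees two normal items, which is what lets the same feed serve both audiences.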

     

    Wrapping things up

    The rest of the code is just infrastructure used to make CRM data into RSS. It's missing support for HTTP 304 and ETags. I'm hoping that someone will add that and drop me an update so I can reverse-integrate it into the code. I'm assuming that the connector will fall under the "unsupported sample code" umbrella, which means that there isn't a formal support infrastructure in place for it. However, if you post a comment to this entry I'll see that it gets to someone on the CRM team.
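    For anyone tempted to add it, here's a minimal sketch of the ETag half of conditional GET: hash the rendered feed into a strong ETag and answer `304 Not Modified` when the client's `If-None-Match` matches. The function name and the (status, headers, body) return shape are hypothetical, not part of the connector.

```python
import hashlib


def conditional_response(feed_bytes, request_headers):
    """Return (status, headers, body), honoring If-None-Match with a strong ETag."""
    etag = '"' + hashlib.sha1(feed_bytes).hexdigest() + '"'
    client_tags = request_headers.get("If-None-Match", "")
    tags = [t.strip() for t in client_tags.split(",")]
    if etag in tags or client_tags.strip() == "*":
        # Nothing changed: send the validator back but no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/rss+xml"}, feed_bytes


status, headers, body = conditional_response(b"<rss/>", {})
print(status, headers["ETag"])
```

    Hashing the rendered feed still costs a full query per request; it saves bandwidth, not database work. A cheaper validator (for example, built from the query id and the newest `modifiedon` value) would also avoid the query, and `Last-Modified`/`If-Modified-Since` is the other half of the story.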

     

    Building and installing the connector is easy. I'm assuming that the MSDN document talks about this, but if it doesn't, here's the short and sweet. Included with the connector code is a small CMD script that, when run from a VS2003 command window, compiles the assembly and copies it to the bin directory. My demo installation uses the ISV extensions to add a "Web feeds" item to the menu which points at the RSS feed display and a convenient OPML page. There's a 16x16 PNG file that fits nicely in the menu and just happens to match the IE7 and Firefox RSS icons.

     

    More things to read

    RSS and CRM - a little history

    Where is the RSS connector for CRM 3.0

    “Democratizing” Business Logic and Data

    Simple List Extensions

    Really Simple Sharing

    Using the CRM SDK offline

    Microsoft Dynamics CRM RSS Connector

  • Michaeljon Miller

    Inside MS-CRM goes Outside


    It all started with an email to a few guys working on a replacement lead management solution for MSN. The point of that email was that we could change the way software was built and create a new model for our partners around linking software to services in the clouds. Wow, now that I look back on it, that seems like a long time ago. Funny thing is that reading that email today brings back lots of memories of being very excited about being on the brink of something huge. When I read that email again last week while dusting off my office I realized that the excitement is still there. It's just shifted a bit for me.

     

    The MS-CRM team has grown and changed over the last (nearly) seven years and I'm glad I was a part of it. I think the team set out to build something and after a few fits and starts actually outdid itself. We learned a lot on the way - both good and bad. I've grown and changed a lot over those same years. I've filled three roles on the CRM team: architect, developer, and overall pain in the butt. To any of those folks who dealt with me while I was on a rampage I apologize.

     

    My decision to leave the team really didn't come as easily as a lot of people might think. There's a lot of cool work to be done on the product and I wanted to be part of that. However, I leave the product in very capable and caring hands. I trust them to do the right thing and I trust that they'll probably bounce their ideas off me on occasion just to see what the old guy says.

     

    You know, it's kind of funny. I actually thought that MS-CRM would be the last team I'd work on at Microsoft. I really believe that the product has a future and I think the environment in which it sits today will start to adopt some of the principles that we put into the product. There were a few times where I figured this would be my last Microsoft job because I wasn't going to find anything else cool to work on. Yeah, I know, it sounds weird what with all the things that Microsoft does, but I couldn't find anything else I wanted to work on.

     

    For the foreseeable future… or the next year, whichever comes first… I'm going to be working on hybrid line of business applications. One of the things I'd like to do is go back to the vision in that original email and see if we can tie all the goodness that is MS-CRM with a bunch more goodness in the clouds. So, I guess I'm going to start looking at MS-CRM as an ISV… from the Outside.

     

    This is going to be pretty damned cool.

  • Michaeljon Miller

    Using the CRM SDK offline


    I've been meaning to write something about using the CRM SDK in an offline state, and I've been meaning to write it for a few years now. I guess I never had the right prodding, but recent newsgroup posts show that there are people interested in this, and that they're stuck.

     

    So, I started doing a little playing around to see what might happen. First thing I noticed is that, as expected, if the client isn't in an offline state you can't work with the local web server. There's code deep in the platform security layer that flat out stops the calls. Ok, that's easy enough to do - let's put the client in an offline state for a while and see what breaks next.

     

    I needed an "application" to test with and I just happened to have my RSS feed generator bits handy and hot off the press. They're really simple and use a very narrow set of CRM SWS (what we call the web service) methods. In fact, they only use Fetch() to do all of their magic (oh yeah, and they use a ton of metadata, but that's another posting). Well, as many of you have noticed, you can't get reasonable WSDL from the offline SWS because the module that generates our WSDL (which happens dynamically if you're wondering) isn't on the client. There's just no need for it there.

     

    I pulled WSDL from the server endpoint and hand-tweaked it so it had just the API set that I needed. This isn't strictly necessary, but given the size of the generated code and number of classes there's a significant hit to start-up performance as the CLR reflects over all those types. Anyway, all I needed was Fetch() so I removed everything else and compiled up the resulting CS file into a client proxy assembly.
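    The hand-trimming can be partly mechanized. This sketch works on WSDL 1.1: it keeps only the named operations in every `portType` and `binding`, then drops any `message` that nothing references anymore (binding-level `soap:header` references count, so the header message survives). It does not prune the XSD types, which is where most of the size lives, so treat it as a starting point. The sample WSDL is a made-up fragment.

```python
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"

# Made-up WSDL fragment with two operations and a header message.
SAMPLE = f"""<definitions xmlns="{WSDL}" xmlns:tns="urn:example"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" targetNamespace="urn:example">
  <message name="FetchIn" /><message name="FetchOut" />
  <message name="CreateIn" /><message name="CreateOut" />
  <message name="CallerIdHeader" />
  <portType name="CrmServiceSoap">
    <operation name="Fetch"><input message="tns:FetchIn" /><output message="tns:FetchOut" /></operation>
    <operation name="Create"><input message="tns:CreateIn" /><output message="tns:CreateOut" /></operation>
  </portType>
  <binding name="CrmServiceSoap" type="tns:CrmServiceSoap">
    <operation name="Fetch"><input><soap:header message="tns:CallerIdHeader" part="CallerId" use="literal" /></input></operation>
    <operation name="Create" />
  </binding>
</definitions>"""


def trim_wsdl(wsdl_xml, keep):
    """Keep only operations named in `keep`, then drop unreferenced messages."""
    root = ET.fromstring(wsdl_xml)
    parents = root.findall(f"{{{WSDL}}}portType") + root.findall(f"{{{WSDL}}}binding")
    for parent in parents:
        for op in parent.findall(f"{{{WSDL}}}operation"):
            if op.get("name") not in keep:
                parent.remove(op)
    # Messages are referenced as "prefix:Name" from portTypes and bindings.
    used = set()
    for parent in parents:
        for node in parent.iter():
            ref = node.get("message")
            if ref:
                used.add(ref.split(":")[-1])
    for msg in root.findall(f"{{{WSDL}}}message"):
        if msg.get("name") not in used:
            root.remove(msg)
    return root


trimmed = trim_wsdl(SAMPLE, {"Fetch"})
print([m.get("name") for m in trimmed.findall(f"{{{WSDL}}}message")])
# ['FetchIn', 'FetchOut', 'CallerIdHeader']
```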

     

    After installing everything I thought I'd need to run my application offline I noticed that there was a problem hitting the SWS in Cassini, particularly around executing queries. In this case the thing to remember is that queries are old V1.x functionality and that they're implemented in native C++. That means the SWS needs its own proxy to get at those C++ bits. That's where the COM proxy comes in (warning: the COM proxy has already been removed from the next release's build environment, so don't assume you can use this in any supported way for anything).

     

    You might have noticed that the COM proxy isn't on the client machine anywhere (although there is another client-specific COM proxy, but that's not the one we want for this exercise). Go to your install CD or grovel the COM proxy from somewhere off your server and copy it to the res/web/bin directory on the client. Then, and this is important, GAC it so it's accessible from the Cassini process.

     

    That's all I needed to do to get arbitrary query support on the client in a custom application offline. I haven't expanded to arbitrary reads through other messages, but I'm assuming that they should all work. I also haven't done anything with create / update / delete yet because those requests must end up in the playback queue. The COM proxy doesn't do this work. If I remember correctly, this happens somewhere in the RC proxy or in Cassini itself (it would make the most sense for this to work as an HTTP handler inside of Cassini since we want to capture SOAP requests for later playback).
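    The split described here, reads answered locally and writes captured for later playback, can be sketched as a tiny dispatcher. This is not how the CRM client implements it (as noted, the real capture lives somewhere in the RC proxy or Cassini); `OfflineDispatcher`, the set of write message names, and the handler callbacks are hypothetical. It also leaves out applying writes to the local store so the user sees them while offline.

```python
from collections import deque

WRITE_MESSAGES = {"Create", "Update", "Delete"}  # assumed set of write requests


class OfflineDispatcher:
    """Serve reads locally; queue writes so they can be replayed on reconnect."""

    def __init__(self, local_handler):
        self.local_handler = local_handler  # e.g. the locally hosted web service
        self.playback = deque()

    def handle(self, message, soap_body):
        if message in WRITE_MESSAGES:
            # Capture the raw SOAP request; the server will see it later.
            self.playback.append((message, soap_body))
            return "<queued />"
        return self.local_handler(message, soap_body)

    def replay(self, send_to_server):
        """Drain the queue in order once the client is back online."""
        while self.playback:
            message, body = self.playback.popleft()
            send_to_server(message, body)


d = OfflineDispatcher(lambda m, b: f"<result>{m}</result>")
print(d.handle("Fetch", "<fetch/>"))   # <result>Fetch</result>
print(d.handle("Create", "<create/>"))  # <queued />
```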

     

    Anyway, I hope that unblocks a few creative people and gets them moving in a direction that helps. I'd love to start seeing some add-on code running in an offline state. Granted, things like callouts and workflow won't work offline, so don't even bother trying to make them work.

     

    If anyone comes up with a cool offline add-on I'd like to hear about it.

  • Michaeljon Miller

    RSS connector revisited


    I'm getting close to finishing the RSS connector now. I've finished the cache code and I've switched the query engine over to use the CRM 3.0 web services. I'm not sure how we'll release this, but I suspect it'll be part of the CRM GotDotNet sandbox somewhere. Stay tuned.

  • Michaeljon Miller

    Where is the RSS connector for CRM 3.0


    It's stuck in LegalLand right now. It's not really Legal's fault either. When I put the original prototype together for the BizSummit and PDC I was able to take liberties with the set of APIs that I used. Yup, I ended up using unsupported functionality. Shouldn't be much of a surprise, I have access to all the internals, I'm on the team, I needed to get a job done, and this blog has always been about pushing past the envelope.

     

    Turns out that once the connector was public the demand went way up and the internal pressure to release it went up. Problem is there's no dedicated resource for "fixing" the bits that I cheated on. That means that we (read that Microsoft) can't release the connector without doing one of two expensive things: document the undocumented or fix the code.

     

    Ideally I'd like to fix the code. I really didn't push the envelope all that much. In fact, all I did was cheat a bit and use the COMProxy instead of the shiny new SWS (of which I am a long-time champion) for tweaking the requested queries, and I directly access the application-level metadata cache. Fixing the COMProxy issue is an afternoon's worth of work. It really just means pulling the WSDL, ripping out all the bits that have no bearing on RSS (so it loads faster), and tweaking a few query functions. The metadata cache is another issue altogether.

     

    Here's why. The application-level metadata cache has two nice properties: it's already loaded and can be shared with the application, thus cutting down on memory requirements, and it has a reasonable object model (note that I didn't say it has a good object model… if it did we would publish it). That means I need to define an object model that makes sense, and I'd want to make it "big enough" to be useful. Plus, I would need to write a bunch of code to read the metadata from the web service and transform it into the nice object model. I've been assuming that anything I do in that space, once released, will probably end up in general usage (I would actually hope so because I wouldn't want everyone to have to go through this same pain). If I'm right about that then I would want to make sure that the cache is really usable. But then that means I'd need to spend more time "getting it right". There's also the added problem that if I, as an aside, release a metadata cache programming model, people will come to expect something much like it in a future product release (which is why I never made the 1.2 web service code generally available - when I was finally ready to release it the product team took up the banner and built one themselves).

     

    If there really is demand for the RSS stuff and / or an object-based metadata cache, let me know and I'll try to get it on the radar. If not, let me know and I'll keep working on other things.

  • Michaeljon Miller

    CRM metadata browser


    It was interesting to see Mitch Milam's post about the metadata browser. This was a little tool I put together pre-V1.0 ship, but which didn't make the schedule until V3. Mitch points out the "published" component which displays the entity metadata in a nice format. If you edit list.aspx you'll see two sections commented out that provide links to individual entity schemas. This is unsupported and undocumented functionality that we considered calling "sample code". Turns out that it made it into the box but wasn't enabled by default. These schemas work nicely with the code generators that I blogged about a while back.

  • Michaeljon Miller

    Why duplicate detection is hard

    • 4 Comments

    In a past life (I'm fairly certain it was a past life because there are days when I'm sure I'm paying for it), I worked on a pair of very large-scale data-cleansing systems. They will go unnamed, except that I'll refer to them as System DBS and System ES. Both systems had a specific set of goals and, in the case of DBS, a very specific target problem domain. The goals were quite simple:

    • Acquire data from a number of related applications (where "related" is a very loose term meaning that the owning company for each application was effectively the same, but that otherwise they supported different business functions).
    • Define a common schema that captures the union of all the source data.
    • Define the set of candidate keys that lets you identify an item's source, its original key, its common key, and its goal key.
    • Construct from that data a common union of all interesting elements. That is, load it all up.
    • Apply a forward-chaining rule system over that data to coalesce duplicate records, identify "bad" data, and find missing data.
    • Push that nice clean data back out into the bad dirty world from which it came (but put it back nicely).
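    To make the key vocabulary in that list concrete, here is a minimal C# sketch, assuming an in-memory staging area. The CandidateKeys record, its field names, and the Coalescer class are hypothetical, and grouping on the common key is only a crude stand-in for the forward-chaining coalescing rules; it is not how DBS or ES were actually built.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical shape of the keys carried by each staged row.
    public sealed record CandidateKeys(
        string SourceSystem,  // which source application the row came from
        string OriginalKey,   // the row's key in that source system
        string CommonKey,     // the normalized key shared by all systems
        string GoalKey);      // the key the row should carry when pushed back out

    public static class Coalescer
    {
        // Rows sharing a common key are duplicate candidates; real rules would go much further.
        public static IEnumerable<IGrouping<string, CandidateKeys>> DuplicateGroups(
            IEnumerable<CandidateKeys> rows) =>
            rows.GroupBy(r => r.CommonKey).Where(g => g.Count() > 1);
    }
    ```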

    Given that DBS was specifically designed to run over data from a given problem domain (let's call it telecom data), one might assume that the problem was well-constrained. If one did assume that, one would be very wrong. So, a small team of developers set out to generalize the problem space to cover different domains and designed ES as a result. ES followed the same path as above, only in a very general way and without the overly complex rules engine (12 years later I think we might have been able to use that rules engine, but at the time our distaste for it was clear in the ES design). As an aside outside of parentheses, this was called the U model, mainly for the shape the model took on while we drew it on the whiteboard.

    So, what's the point of that history lesson and what does it have to do with duplicate detection? Well, remember that I mentioned that the primary domain was telecom. That means we covered concepts such as customers, addresses, telephone numbers, physical plant data (there are a lot of little pieces that go into getting telephone service over a land line), bills & invoices, and payments. In all, there were some 30 different systems involved in sourcing this data to our engine. One of the first things that needs to happen in the DBS problem space is that a set of candidate keys needs to be identified that works across all systems, or at least across enough systems that in the end all systems can be logically linked. In the telecom world that was the phone number.

    In the U.S. (actually in the North American dialing area) telephone numbers are 10 digits long and always follow a very specific format. I won't go into the formal names for those various groups of digits or even why there are groups, but let's just say that each of the groups can become part of a key. Nearly every installation of DBS was in the U.S., so building a telephone number parser wasn't too terribly difficult. You're either looking for 7 or 10 digits (unless you run across a PBX in the data and then you have to start messing about with extensions). Well, the big DBS installation I worked on, and what drove much of the ES design, was not based in the U.S., but was instead in another country with very different telephone formats. Some places used 4 digits, some 5, some 7… you get the point. The plan was to configure and run DBS to first dedup that data so we could convert the whole country to 10-digit dialing. A secondary goal, once the customer realized what we could do, was to dedup the whole lot of data to see what we could find (did you know that many times the phone company doesn't know that your home already has a connected telephone line, so they send a technician out with new gear to hook it up?).
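    Here's a minimal sketch of the easy case: a North American normalizer that accepts 7 or 10 digits and drops a PBX-style extension. It's an illustration under stated assumptions, not the DBS parser. NanpPhone, Normalize, and the defaultAreaCode parameter are hypothetical, only an "x" extension marker is handled, and it does nothing for the 4-, 5-, and 7-digit formats of the non-U.S. installation.

    ```csharp
    using System.Linq;

    public static class NanpPhone
    {
        // Returns the 10-digit form, or null when the value can't be normalized.
        // defaultAreaCode (hypothetical) promotes 7-digit local numbers to 10 digits.
        public static string Normalize(string raw, string defaultAreaCode = null)
        {
            if (string.IsNullOrWhiteSpace(raw)) return null;

            // Drop a PBX-style extension such as "x204"; real data has many more variants.
            int ext = raw.IndexOfAny(new[] { 'x', 'X' });
            string main = ext >= 0 ? raw.Substring(0, ext) : raw;

            // Keep ASCII digits only, discarding punctuation like "(425) 555-0100".
            string digits = new string(main.Where(c => c >= '0' && c <= '9').ToArray());

            // Strip a leading country code of 1.
            if (digits.Length == 11 && digits[0] == '1') digits = digits.Substring(1);

            if (digits.Length == 7 && defaultAreaCode != null && defaultAreaCode.Length == 3)
                digits = defaultAreaCode + digits;

            return digits.Length == 10 ? digits : null;
        }
    }
    ```

    Returning null rather than guessing is deliberate: a number that can't be parsed should go to a person, not into the key.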

    I can hear you now: "Get on with the discussion and tell us why duplicate detection is hard."

    Remember that I mentioned the step about identifying a candidate key? Well, in the case of a phone number it's fairly simple (let's also assume that phone numbers are never reused, each person has only one phone number, and numbers never move from person to person). Once you see a phone number in a normalized format you can then query over the set of existing data looking for other instances of that phone number. In a live RDBMS that can be done using a unique key constraint over the normalized phone number column that will throw back an error when an insert or update attempts to violate that key. In our simple world this works every time because when you get an error on an insert you know that you have either a duplicate record or an error in the key. For updates you know you have an error in one of the keys of the updated record or in at most one existing record in the database.
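    In that idealized world the unique index does all the work. A minimal in-memory sketch of the behavior, assuming numbers are already normalized; the PhoneKeyedTable class and its methods are hypothetical stand-ins for the database's unique constraint, not part of any product.

    ```csharp
    using System;
    using System.Collections.Generic;

    // In-memory stand-in for a table with a UNIQUE constraint on the normalized phone column.
    public sealed class PhoneKeyedTable
    {
        private readonly Dictionary<string, Guid> _byPhone = new Dictionary<string, Guid>();

        // Mirrors INSERT: succeeds, or reports the id of the row that already owns the key.
        public bool TryInsert(Guid id, string normalizedPhone, out Guid existingId)
        {
            if (_byPhone.TryGetValue(normalizedPhone, out existingId)) return false; // key violation
            _byPhone.Add(normalizedPhone, id);
            return true;
        }

        // Mirrors UPDATE of the key column: a clash implicates this row's key or exactly one other row.
        public bool TryUpdate(Guid id, string oldPhone, string newPhone, out Guid existingId)
        {
            if (_byPhone.TryGetValue(newPhone, out existingId) && existingId != id) return false;
            _byPhone.Remove(oldPhone);
            _byPhone[newPhone] = id;
            existingId = Guid.Empty;
            return true;
        }
    }
    ```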

    Now, let's extend this from our idealized phone number world to something that's more CRM-ish. A phone number is not a reasonable candidate key because it does change over time, two people can share it, and many people have many numbers. A solution to this problem is to identify a new candidate key. One approach is to construct a key from various bits of useful information. For example, one might use some normalized elements of a contact's name, a phone number, possibly an email address, and their home address. Even after we've extended the key to cover enough elements to guarantee uniqueness in our problem space (which is not possible and is left as a proof to the reader), we will invariably run into the case where that key isn't wide enough.
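    A synthesized key along those lines might look like this sketch. The layout (last name, first initial, phone, email, postal code) and the CompositeKey class are hypothetical choices for illustration.

    ```csharp
    using System.Linq;

    public static class CompositeKey
    {
        // Upper-case and keep only letters and digits, so "(425) 555-0100" and "425.555.0100" match.
        static string Norm(string s) =>
            new string((s ?? "").ToUpperInvariant().Where(c => char.IsLetterOrDigit(c)).ToArray());

        // Hypothetical layout: last name | first initial | phone | email | postal code.
        public static string Build(string first, string last, string phone, string email, string postal)
        {
            string firstNorm = Norm(first);
            string initial = firstNorm.Length > 0 ? firstNorm.Substring(0, 1) : "";
            return string.Join("|", Norm(last), initial, Norm(phone), Norm(email), Norm(postal));
        }
    }
    ```

    For example, Build("John", "Smith", "(425) 555-0100", "", "98052") and Build("john", "SMITH", "425.555.0100", null, "98052") both yield SMITH|J|4255550100||98052. That's exactly right for one person entered twice, and exactly wrong for a John Smith Sr. and Jr. sharing a household: the key isn't wide enough.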

    Let's see what happens when we insert a new record. First, we construct / synthesize a candidate key and put it into one of the columns in the INSERT statement. Then we fire the statement at the database and wait for an error. Let's assume we get a key violation back so we have a few options: we can change the key, we can ask the data supplier what to do, or we can punt. If we change the key automatically then we've simply ignored the duplicate detection problem and we might as well not have done any of this work. Same thing with punting except that it's overly harsh on the other side: the data doesn't go into the database.

    We might ask the user what to do. Well, simply telling them that their just-entered data would create a duplicate record in the system and therefore must be in error wouldn't be particularly useful. How would they know what part of it is in error? How would they even know what to do with a duplicate? One thing that DBS and ES did was return the new record and the existing duplicate(s) in a nice bit of dynamic UI so that the user could see both records essentially side-by-side and make a judgment call. This worked for our solution because we specifically engineered it so that there was a headless service running, backed by a staff of "Error Resolution and Correction Clerks". That is, people were sitting in the dark waiting for bad data to pop up on their screens; they'd make a call based on the original data, the duplicate data, and occasionally the data from the source system.

    Let's say we do something like the first approach where we simply return the "offending" records and the new record and let the user decide. Then, the user decides that these two records are actually different from one another but that the data is 100% correct. In this case the user or the system could mar the record in such a way that it's no longer considered a duplicate and complete the write operation. What just happened here? Well, the candidate key for the new item will no longer raise an error when it's the cause of a duplication because that key is different. We could mar the data in a predictable way so that the key stays intact but so that there's a "larger" unique key over the data, but again that wouldn’t cause the insert to fail.

    The next option is to use a unique key over the candidate key plus some invented data (invented in a predictable way that is) and query the data on the candidate key before attempting an insert. Now we're getting somewhere. We allow duplicate candidate keys but invent a wider key that guarantees uniqueness (see above for details on the widening proof). But we still haven't solved the problem because we don't have a reasonable way to verify that a duplicate is really a duplicate.
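    The query-first approach might look something like this sketch. WidenedKeyStore and StagedRecord are hypothetical; the unique index is on (candidate key, sequence), the sequence is the predictably invented data, and the decision that a candidate really is a different record happens outside the code, which is precisely the part that remains unsolved.

    ```csharp
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical staged row: the candidate key may repeat; (CandidateKey, Sequence) may not.
    public sealed record StagedRecord(string CandidateKey, int Sequence, string Payload);

    public sealed class WidenedKeyStore
    {
        private readonly Dictionary<(string, int), StagedRecord> _rows =
            new Dictionary<(string, int), StagedRecord>();

        // Query on the candidate key before writing and hand any matches to a reviewer.
        public IReadOnlyList<StagedRecord> FindCandidates(string candidateKey) =>
            _rows.Values.Where(r => r.CandidateKey == candidateKey).OrderBy(r => r.Sequence).ToList();

        // Call only after someone has judged the new record to be genuinely distinct.
        public StagedRecord InsertDistinct(string candidateKey, string payload)
        {
            var existing = FindCandidates(candidateKey);
            int next = existing.Count == 0 ? 0 : existing[existing.Count - 1].Sequence + 1;
            var row = new StagedRecord(candidateKey, next, payload);
            _rows.Add((candidateKey, next), row); // Add throws if the widened key were ever reused
            return row;
        }
    }
    ```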

    This all gets horribly complex when you're dealing with multiple record types or subclasses of types (think of the customer case in MS-CRM, where "customer" might mean Account or Contact). This means you need a candidate key that crosses type boundaries and a way to reconcile duplicates across types.
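    One way to get a key that crosses type boundaries is to project both entity types onto the same type-neutral fields. This sketch assumes a key of normalized display name plus main phone; the Account and Contact records are minimal hypothetical shapes, not the MS-CRM entities, and reconciling the matches it finds is still left to a person.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical minimal shapes; real Account and Contact entities carry far more.
    public sealed record Account(Guid Id, string Name, string Telephone1);
    public sealed record Contact(Guid Id, string FirstName, string LastName, string Telephone1);

    public static class CustomerKey
    {
        static string Norm(string s) =>
            new string((s ?? "").ToUpperInvariant().Where(c => char.IsLetterOrDigit(c)).ToArray());

        // Project both types onto the same type-neutral key: normalized display name + phone.
        public static string For(Account a) => Norm(a.Name) + "|" + Norm(a.Telephone1);
        public static string For(Contact c) => Norm(c.FirstName + c.LastName) + "|" + Norm(c.Telephone1);

        // Every (key, type, id) whose key appears more than once, across both types.
        public static List<(string Key, string Type, Guid Id)> CrossTypeDuplicates(
            IEnumerable<Account> accounts, IEnumerable<Contact> contacts)
        {
            var all = accounts.Select(a => (Key: For(a), Type: "account", a.Id))
                .Concat(contacts.Select(c => (Key: For(c), Type: "contact", c.Id)));
            return all.GroupBy(x => x.Key).Where(g => g.Count() > 1).SelectMany(g => g).ToList();
        }
    }
    ```

    With this key, an Account named "Jane Doe" and a Contact Jane Doe sharing 425-555-0100 land in the same group, which is the cross-type match you want a person to look at.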

    Anyway, that's why duplicate detection is hard. Note that I didn't say it was impossible, just hard.

  • Michaeljon Miller

    The "F" in the DMF

    • 1 Comments

    [Note: This is personal opinion, it doesn't reflect the viewpoint of Microsoft or the Microsoft CRM team. This is my take on the DMF and both its shortcomings and ultimate potential. Don't assume that anything that seems like a prediction here is apt to happen. I'm only peripherally involved with the DMF team and I don't set direction for them.]

    It's a Framework

    The "F" in DMF is all about frameworks. Why? Because creating a general-purpose data migration tool or product is extremely difficult, expensive, error-prone, and unlikely to meet our customers' needs. That's right. The DMF is a framework because that's the best approach we could take and the most we could provide without setting unrealistic expectations. Simply put, there isn't a way for us to create a tool that can detect all possible data formats from all possible CRM "applications" and correctly get that data into your shiny new (or slightly used) MS-CRM without the potential for serious data disaster.

    Let's look at a few scenarios to see why the framework approach was recommended and pursued by the R&D team. First, we can assume that existing CRM systems have been customized (I don't have the exact numbers, but my gut tells me that it's a high percentage). Next, we can assume that a CRM system has been in use long enough to collect a reasonable amount of data (otherwise why would we worry about migrating data from an existing system to a shiny new MS-CRM).

    Given that an existing system has been customized and has been running for some time, there are likely to be a few "dirty" bits of data floating around. That doesn't mean that there's a bug in the in-place system. By "dirty" I simply mean that the data in any given database column may have both syntactic and semantic problems. For example, in the U.S. states are typically abbreviated to two uppercase letters. But that hasn't always been the case. Minnesota is conventionally abbreviated MN (at least that's what the post office would like to see), but it's conceivable that collected data includes other abbreviations like "Minn.", misspellings, spelled-out names, and even missing data.
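    A cleansing rule for the state column might start life as an alias table. This is a minimal sketch with a deliberately tiny, hypothetical table (StateCleanser is not a DMF type); anything it doesn't recognize, including misspellings nobody anticipated, comes back null for review rather than being guessed at.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class StateCleanser
    {
        // Tiny illustrative alias table; a real one covers every state plus the variants found in the data.
        static readonly Dictionary<string, string> Aliases =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
            {
                ["MN"] = "MN", ["Minn"] = "MN", ["Minnesota"] = "MN", ["Minnesotta"] = "MN",
                ["WA"] = "WA", ["Wash"] = "WA", ["Washington"] = "WA",
            };

        // Returns the two-letter code, or null when a human (or a new rule) has to resolve the value.
        public static string Normalize(string raw)
        {
            if (string.IsNullOrWhiteSpace(raw)) return null;  // missing data
            string key = raw.Trim().TrimEnd('.');                // "Minn." -> "Minn"
            return Aliases.TryGetValue(key, out var code) ? code : null;
        }
    }
    ```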

    That's just one simple case. Phone number formats and addresses are notoriously hard to agree upon. More about that particular problem in a few days when I get around to talking about why duplicate detection is actually damned difficult to do well.

    What we wanted to provide and what we did provide

    Ideally we would like to have shipped something with a lot less user-facing emphasis on the "F" part of DMF. One of our goals, which we simply didn't meet, was to provide a Big Green Button that when pressed would discover your other CRM data, clean it up, normalize it, automatically match it to your new MS-CRM system (including all the customizations you put in place and any others that we might discover while migrating your data), and last, but not least, migrate that data. Really, that's what we wanted to do. [Bobert, if you're reading this you'll remember working on another system just like this about 10 years ago and about 9000 miles away.] Well, we didn't ship one of those, so what did we ship?

    The general idea behind the DMF is that you're not migrating a single system just once. You might be, in which case the DMF still provides a ton of value. One of the assumptions that we had was that MS-CRM customers would be migrating from any number of essentially unknown systems. So, without some really great AI we would need a bit of manual intervention. That is, we'd need to ask a number of questions about your data: what format is it, what source systems hold it, what are the syntactic modifications, and what semantic rules are applied. In many cases we assumed that at least the latter two questions couldn't be answered directly: you would need to discover those rules as you went.

    Why is this a multi-step process?

    It was precisely this problem that drove the idea of the intermediary staging database, the CDF. The idea here was to incentivize partners to either create adapters from source systems for resale (i.e. connect the CDF to Act or Goldmine) or to build a consulting business model around migrating custom data (Access databases, Excel files). We would provide the back-end services such as constructing the CDF from your customizations and moving the data into your production system.

    There were three huge problems with this model: we didn't get the partners we wanted; we didn't provide a key piece of technology; and we didn't get the CDF construction logic completed. In retrospect I think the partner model would have been easier to sell if we (and the partners) were up-front about including data acquisition, cleansing, and migration costs directly in the CRM purchase price. Not doing so left the customers with an unexpected bill for these services. We missed the key data cleansing middleware that would have taken all the source data, applied a set of cleansing rules, and produced useful production data. The problem is simply that the technology is extremely hard to get right and even when it is right still requires a set of domain-aware eyeballs to verify the production rules. Finally, we could have and should have done a better job reading your customizations (pick lists and pick list value mappings in particular) and applying them to the CDF and the cleansing / mapping rules.

    What's next for the DMF?

    That's a good question. I know where I'd like to see the DMF go in future releases, but I can't promise that the team has the same point of view. In particular, I think we can do a lot better in the back-end CDF construction; we can do a much better job with value maps; we should be able to better manage keys; and we should do a better job at basic data cleansing. This last bit is the most important in my mind: without clean data the value of your CRM system rapidly deteriorates. This isn't just a DMF problem, but if we could verify that source data, once scrubbed, met certain criteria, we would be a lot closer to helping with the problem.
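    Verifying that scrubbed data meets certain criteria could be as simple as running named checks over each staged row before it leaves the CDF. This is a sketch under the assumption that rows are column-name/value dictionaries; CleansingRule and CdfValidator are hypothetical names, not part of the DMF.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // A named check applied to a staged row (column name -> value).
    public sealed record CleansingRule(string Name, Func<IReadOnlyDictionary<string, string>, bool> Passes);

    public static class CdfValidator
    {
        // Names of the rules a row fails; an empty list means the row may move on to production.
        public static List<string> Failures(
            IReadOnlyDictionary<string, string> row, IEnumerable<CleansingRule> rules) =>
            rules.Where(r => !r.Passes(row)).Select(r => r.Name).ToList();
    }
    ```

    For example, a rule named "state is two letters" would hold back a row carrying "Minn." while letting a row carrying "MN" through.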

    Another area where the DMF could stand some improvement is managing multiple-phase migrations. The idea of the DMF works great for one-time migrations where all the source data from all the source systems is moved into the CDF at the same time. It doesn't necessarily help if the data is moved piecemeal unless the DMF includes basic rules around duplicate detection, prevention, and clean-up. If the CDF holds source data over time we can get closer to solving the problem because we can identify these issues during clean-up and "do the right thing." However, if the CDF takes on more of a bulk-load / bulk-import role as a staging area, then the actual import step from CDF to CRM needs to include reasonable rules for applying data clean-up at the platform level. That's another topic for another day though.

  • Michaeljon Miller

    3.0 SDK is now live on MSDN!

    • 0 Comments

    This just showed up in my inbox. Enjoy!

    From: Amy Langlois
    Sent: Friday, December 16, 2005 9:30 AM
    To: CRM Team
    Subject: 3.0 SDK is now live on MSDN!

    I am happy to announce that the online version of the Microsoft CRM 3.0 SDK is now live on MSDN: http://msdn.microsoft.com/library/en-us/CrmSdk3_0/htm/v3d0microsoftcrmv3d0sdk.asp

    The online version will be updated quarterly, beginning in January.

    You can find the download version, along with other downloads, here: http://msdn.microsoft.com/MBS/Downloads/CRMdownloads/default.aspx

    3.0 database diagrams will be available on the download page within the next few weeks.

  • Michaeljon Miller

    Old Inside Microsoft CRM blog is back

    • 0 Comments

    It took me a while, but the folks over at Blogger were able to dig up all the old posts and get them back online. Several articles were already reposted here, but there are a few that never made it (including the one that started this whole conversation). Hopefully the MS-CRM community will find the older articles helpful, or at least enlightening.

  • Michaeljon Miller

    How to add an "auto number" to a CRM entity

    • 1 Comments

    Warning: unsupported territory ahead

    Adding an “auto-number” field to MS-CRM is one of those features that has been requested several times. The problem is that there isn’t a solution that really meets everyone’s needs. I was asked during V3 to come up with a way to do this for an internal customer (actually, our development team) to track an ever-increasing number on customer issues. This was something that we needed so badly that we opened a DCR against ourselves to see if we could get it into the product. No dice.

    However, there is an unsupported way to do this, much like there are unsupported ways to do many things. Here’s the way I put this together for our internal site. Note that this will only work once on a table and only if the table doesn’t already have an IDENTITY column (I don’t remember adding any IDENTITY columns in the V1.x databases, but one might have slipped by).

    First, use Deployment Manager to add a new number attribute to your target entity (let’s call it myCounter). Once you’ve done this you should have a column named CFN_myCounter on the base table and in the entity view.

    Next, find this attribute definition in the metadata. You’ll want to tweak the ValidForCreateAPI and ValidForUpdateAPI bits to 0 so you don’t accidentally supply a value from outside of the platform. This will also give you a read-only attribute on the form. Also, keep in mind that this attribute won’t have a value in the database until after the row is committed the first time. This means the edit control on the form will be empty until a Save operation (which calls CreateAndRetrieve).

    Almost there. Next you need to drop the physical column from the underlying table (this might require some tweaks with replication which means you might need to use sp_repldropcolumn and sp_repladdcolumn instead). So,

    ALTER TABLE FooBase DROP COLUMN CFN_myCounter

    Then, turn around and recreate that column, but this time specify that the column is a non-NULL IDENTITY,

    ALTER TABLE FooBase ADD CFN_myCounter INT IDENTITY NOT NULL

    Finally, go add the attribute to the form, republish, and do all the other stuff we make you do in V1.x when you do a customization.

    As usual, you’re mucking with the physical database here. This will likely break in a V3 upgrade (because we won’t migrate the IDENTITY information to the extension table). If you do this, you’re on your own, but you won’t have problems with callouts locking transactions, hotspots in a counter table, and all the other problems that might come up.

    I haven’t tried this on a 1.x deployment (recently) so I can’t vouch for the correctness of the solution. I recommend backing up both your primary and metadata databases before starting with something like this (but then I recommend that even when you’re doing supported things).

  • Michaeljon Miller

    MS CRM V1.2 Logical Model on MSDN

    • 6 Comments

    Quite a few people have asked for the database schema for CRM 1.x. Well, we were a little hesitant about handing it out for a few reasons. One, we really don't think you should be partying on the database directly. Two, there was no reasonable delivery mechanism for doing so. Well, this morning our UE team posted the logical model on MSDN. This is a big Visio diagram that shows all the entities, the definitions (no attributes), and all the relationships, both logical and physical. Thanks to the CRM UE team for putting this together.

    A little trivia - this model was generated from the metadata using VBA and then hand-tweaked to fix some of the layout issues. Just thought that was an interesting take-away.

  • Michaeljon Miller

    More fun with CRM web services

    • 1 Comments

    I’ve received a dozen or so requests for the sample WSDL wrappers for CRM. I haven’t forgotten about sending these, I’ve been working on cleaning up and extending the sample code so it’s slightly more useful and includes some missing functionality that I think ISVs are using. Shouldn’t be long now.

  • Michaeljon Miller

    Playing with Microsoft CRM programming models

    • 8 Comments

    I’ve spent the last few days playing with a new programming model on top of the CRM platform. This work was part prototype, part investigation, and part complaint. I wanted to see what might be possible if I took a completely radical approach to the API. Some of you might remember an earlier article I wrote (which isn’t linked here because it’s been wiped out) about the CRM programming model in which I mentioned that I didn’t like the v1.x API set and that I wanted to see something different for V2. Well, we’re still working on a V2 model and we’re still trying to get our heads around what might make sense. In the meantime though I decided to see what was possible using V1.x.

    The motivation behind this is simple: I don’t like the mess we got ourselves into on the V1 product. There’s a lot of history about why we have what we have and I won’t try to defend it. Instead, I’m going to present a sample web service along with the build steps. This work is based on a few of the previous articles that I’ve written. The service is simple, completely type-safe, supports ‘pre’ and ‘post’ method events (because you own the code), and can be extended as necessary.

    The interface looks like this. The first thing you should notice is that this is the complete CRM web service; I’ve removed the per-entity endpoints because I think they just clutter up the story.

    [WebService(Namespace="http://www.microsoft.com/mbs/crm/services/2005")]
    [SoapDocumentService(SoapBindingUse.Literal, SoapParameterStyle.Bare)]
    public class Services : WebService
    {
        [WebMethod()]
        [return: XmlElement("entityId", typeof(Guid))]
        public Guid Save(businessEntity theEntity)
        {
            // implementation omitted (see the account Save case below)
        }

        [WebMethod()]
        public void Delete(businessEntity theEntity)
        {
            // implementation omitted
        }

        [WebMethod()]
        [return: XmlElement("businessEntity", Namespace=EntityNamespace)]
        public businessEntity Get(string entityType, Guid id)
        {
            // implementation omitted
        }

        [WebMethod()]
        public void SetAccess(businessEntity theEntity, securityAccessType accessType)
        {
            // implementation omitted
        }

        [WebMethod()]
        [return: XmlArray("securityAccessTypes", Namespace=EntityNamespace)]
        public securityAccessType[] GetAccess(businessEntity theEntity)
        {
            // implementation omitted
        }

        [WebMethod()]
        [return: XmlElement("results", Namespace=EntityNamespace)]
        public XmlElement Find(Microsoft.Crm.Query.fetch theQuery)
        {
            // implementation omitted
        }
    }

    The client side is just as simple now. I won’t show the whole interface (in particular, I’m going to skip showing what Find() looks like because I haven’t come up with a reasonable way to construct the <fetch> queries in code; for now, using XML actually is better).

    CRMServices.Services webService = new CRMServices.Services();
    // Supply a real user name, password, and domain here.
    webService.Credentials = new System.Net.NetworkCredential("", "", "");

    CRMServices.account theAccount = new CRMServices.account();
    theAccount.accountcategorycode = new CRMServices.picklistType();
    theAccount.accountcategorycode.Value = 1;
    theAccount.accountclassificationcode = new CRMServices.picklistType();
    theAccount.accountclassificationcode.Value = 5;
    theAccount.accountnumber = "A123456";
    theAccount.name = "A sample account - " + DateTime.Now.ToLongTimeString();

    theAccount.address1_line1 = "One Microsoft Way";
    theAccount.address1_line2 = "110/2284";
    theAccount.address1_city = "Redmond";
    theAccount.address1_stateorprovince = "Washington";

    theAccount.emailaddress1 = "mikemill@microsoft.com";
    theAccount.donotphone = new CRMServices.booleanType();
    theAccount.donotphone.Value = true;

    // No accountid is set, so Save creates a new record and returns its id.
    Guid id = webService.Save(theAccount);

    The steps I went through to build this site were:

    1. Create the correct XML schemas for the interesting entities. This was done using a modified version of the sample code I previously provided.

    2. I ran the resulting XML schema document (all of the entities are defined in a single schema to make life easier) through the XSD Object Generator. To work around a bug in the 1.1 CLR I needed to modify the resulting class and rename the __fooSpecified fields.

    3. Next, I created a schema for <fetch> so I could turn that into a class as well. That’s a topic for another rant someday.

    At this point I have schemas and classes for all the entity definitions. This works for added attributes as well because I just regenerate the schemas and classes as necessary. One way around this would be to add an extension point to the schema and let the client side figure out where to put the “found” attributes. I guess I prefer the code generator approach.

    4. The next step was to create the web service itself using the above class. By the way, I also wrote the test driver code at the same time to make sure that what I was building was going to work the way I wanted it to. TDD can be your friend when you’re experimenting.

    I decided to use the unsupported COM proxy for this project for a few reasons. First, I wanted to sit as close to the platform code as I could because I just didn’t see the point in going through yet another serialization step when I already had done that. Using the COM proxy is fairly straightforward, but there are a few gotchas (particularly around memory management issues). This resulted in code that looks a bit like this (this is an account Save operation).

    case "account":
    {
        account theObject = (account)theEntity;
        // Stamp the calling user as the owner before serializing.
        theObject.ownerid.Value = ua.UserId;
        theObject.ownerid.type = (int) ObjectType.otSystemUser;
        string entityXml = ToString(theObject);

        CRMAccount theService = new CRMAccountClass();

        // No id means a new record; otherwise update the existing one.
        if (IsMissingElement(theObject.accountid))
        {
            id = theService.Create(ref ua, entityXml);
        }
        else
        {
            theService.Update(ref ua, theObject.accountid.Value, entityXml);
        }

        break;
    }

    The ugly thing about this code is the big switch on entity type. There are ways around this, but none of them are really pretty. So, for now this code will be messy. As you might have noticed already the methods don’t need the CUserAuth structure because the web service takes care of this for you. I figured that it was rare enough for a platform caller to actually want to supply different credentials (and if that’s the case then adding a SOAP header to this with the desired credentials would be the better way to go).
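    One of the less ugly ways around the switch is a dispatch table keyed on the entity's runtime type. This is a sketch, not the sample's code: BusinessEntity, Account, Contact, and SaveDispatcher are hypothetical stand-ins for the generated classes, and in the real service each registered handler would wrap the COM proxy Create/Update calls shown above.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Stand-ins for the generated entity classes (hypothetical).
    public abstract class BusinessEntity { }
    public sealed class Account : BusinessEntity { public Guid? AccountId; }
    public sealed class Contact : BusinessEntity { }

    public sealed class SaveDispatcher
    {
        private readonly Dictionary<Type, Func<BusinessEntity, Guid>> _handlers =
            new Dictionary<Type, Func<BusinessEntity, Guid>>();

        // Register a strongly typed handler once; the cast lives here instead of in each case.
        public void Register<T>(Func<T, Guid> handler) where T : BusinessEntity =>
            _handlers[typeof(T)] = e => handler((T)e);

        // Look up the handler by the entity's runtime type instead of switching on a name.
        public Guid Save(BusinessEntity entity)
        {
            if (!_handlers.TryGetValue(entity.GetType(), out var handler))
                throw new NotSupportedException("No save handler for " + entity.GetType().Name);
            return handler(entity);
        }
    }
    ```

    The trade-off is that a missing registration shows up at run time rather than as a missing case you can see in the code.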

    One of the things I debated was changing the signature for Get to take a businessEntity as well. That would remove the need to pass the entity name. I might still do that, but I want to play around with this some more first.

    I have tested this code as part of a default CRM v1.2 installation. All I did was drop the resulting DLL into the wwwroot/bin directory and drop the ASMX file at the root of the application. Gotta love XCOPY installations.

    Please keep in mind that this is all sample code, that the V2 product will likely differ in style and substance, and that I don’t speak for the CRM team. That said though, this might make someone’s job a little easier for a while. If you’d like a copy of all the code let me know and I’ll package it up and send it. Right now I don’t have anywhere to post the bits for public consumption (I don’t want to use my SU personal pages) so I’ll have to send email.

    I have a few things left that I’d like to add to this sample – I really think we need a simple way to deliver commands as business documents which allow arbitrary business logic. I’ll probably do this using some flavor of Execute(). I also want to add support for the V1.x <columnset> parameter to Get(). That should be fairly simple, but I’ve left it out for now. I thought about using a delegate or reflection based model for loading ISV-defined code in Execute and for ‘callouts’, but that one I decided to leave as an exercise for the reader.

  • Michaeljon Miller

    Should CRM services return an error for invalid attributes?

    • 6 Comments

    It turns out that I was always working with an altered version of the CRM platform that was silently ignoring invalid-for-create and, more importantly, invalid-for-update attributes. Why is this important? Well, my recommendations for creating a friendlier programming environment were missing a critical step that isn’t available to “outside” developers. This means that creating classes from the XRM entity XSD will only work for a subset of the core scenarios.

    If you’re using the classes for data binding operations then you’ll quickly realize that a READ-EDIT-SAVE operation won’t always work. In fact, any time you run the READ-EDIT-SAVE loop on the same instance, things will break. There are two options available for working around this problem. One of them puts the burden on the developer and still doesn’t solve the R-E-S scenario: create different schemas for each of the Read, Create, and Update operations, then create corresponding classes to use them. This doesn’t work well because you now really need to pay attention to which attributes are applicable. The second option is for us to take a design change (which probably won’t make it into CRM 1.2, but it might be early enough to get it into CRM 2005) that makes the platform silently ignore “extra” attributes instead of returning an error.
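    Until the platform changes, a client can approximate the "silently ignore" behavior itself by stripping attributes the target operation won't accept before sending them. A minimal sketch, assuming per-attribute flags modeled on the ValidForCreateAPI and ValidForUpdateAPI metadata bits; AttributeInfo, Operation, and AttributeFilter are hypothetical, and the sample metadata below is made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public enum Operation { Create, Update }

    // Per-attribute flags, modeled on the ValidForCreateAPI / ValidForUpdateAPI metadata bits.
    public sealed record AttributeInfo(string Name, bool ValidForCreate, bool ValidForUpdate);

    public static class AttributeFilter
    {
        // Keep only the attributes the target operation accepts, so a READ-EDIT-SAVE
        // round trip doesn't send back values the platform would reject.
        public static Dictionary<string, object> ForOperation(
            IDictionary<string, object> values, IEnumerable<AttributeInfo> metadata, Operation op)
        {
            var allowed = new HashSet<string>(
                metadata.Where(a => op == Operation.Create ? a.ValidForCreate : a.ValidForUpdate)
                        .Select(a => a.Name),
                StringComparer.OrdinalIgnoreCase);
            return values.Where(kv => allowed.Contains(kv.Key)).ToDictionary(kv => kv.Key, kv => kv.Value);
        }
    }
    ```

    For example, with metadata saying name is valid for both operations, accountid only for create, and createdon for neither, filtering a read-back record for Operation.Update leaves just name.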

    Obviously I’m leaning toward the latter solution where we fix the problem… er, where we change the design. This isn’t a done deal though and there are strong feelings on both sides of the issue. So I’m asking the CRM development community (at least folks who read the stuff I write) for your opinion. Is it better to silently ignore attributes that don’t make sense for the operation or would you prefer the current behavior where we return an error?

  • Michaeljon Miller

    Roles, Privileges, and CRM Security

    • 8 Comments

    One of the feature requests that we get on a regular basis, and a feature that a few enterprising ISVs have started to build, is the ability to have per-role UI. This also comes under the name “field-level security” – but only because hiding access to attributes in the UI is a possible, but insecure, solution to the problem. That is, one way to implement per-attribute security is to implement per-role UI. On the surface that makes a lot of sense, but there’s always something deeper than the surface that we need to worry about first.

    Before we talk about why per-role UI is only a surface-level solution, and an insecure one at that, it’ll help to figure out what a role is in the context of MS-CRM. Believe it or not, neither the platform nor the application really has any notion of a role. It’s a concept that was invented to package privileges in a form that’s easy for administrators to manage. Current CRM builds have just over 320 distinct privileges, and managing them individually would be extremely difficult. So the idea of a security role as a privilege container was created. As a side note, early V1 builds had 20 or more security roles; it wasn’t until the last minute that we trimmed that list down to a manageable 7.

    So, what does this all mean? It means that the platform and application decide what users can do by inspecting the set of privileges that the user has. If a service needs privilege X to continue, then the user had better have privilege X in at least one of the roles assigned to that user. Which raises another side issue: which direction is the assignment? Do we assign roles to users, or users to roles? If you’re a CRM administrator, then it really doesn’t matter: you assign a role to a user. If you’re a CRM developer on the platform team working on V1 code, then you assign the user to a role.
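To make the privilege check concrete, here is a minimal sketch, assuming roles are plain sets of privilege names; `SecurityRole`, `userHasPrivilege`, and the privilege strings are illustrative, not the platform’s actual types or names. The check never asks which role the user is in, only whether any assigned role contains the privilege.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A security role is only a named container of privileges.
struct SecurityRole {
    std::string name;
    std::set<std::string> privileges;
};

// Succeeds if privilege X appears in at least one role assigned to the user.
// Which role supplied it is irrelevant to the check.
bool userHasPrivilege(const std::vector<SecurityRole>& assignedRoles,
                      const std::string& privilege) {
    for (const auto& role : assignedRoles) {
        if (role.privileges.count(privilege) != 0) {
            return true;
        }
    }
    return false;
}
```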

    “OK”, you ask, “what does this have to do with per-role UI?” Well, it has everything to do with it. The application loads the one form for the entity (ignoring special forms like Quick Create and Print) by looking it up in the form cache. This is a very fast and very simple lookup (ignoring the organization- and language-specific details). We could add a second level to the map key, perhaps a role identifier. That wouldn’t be difficult to do. We look at the user and then grab the information about that user from the cache. Simple enough. Next we look at the cached user information. Hmm, there’s no role information in there (remember, I said that the application doesn’t really know about roles), so we’ll need to extend the cache to hold on to each user’s roles. It’s all just code and memory, so that one’s simple enough. But wait, isn’t it possible to assign a user to multiple roles? Sure it is.

    That’s where per-role UI falls apart. It’s entirely possible that a user has been assigned to multiple, overlapping roles. If each role has an associated UI, and those UIs differ, then we have no way to know which UI to load for the entity, because we don’t know which role governs the interaction.
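The dead end is easy to see in code. This sketch assumes a hypothetical form cache keyed by (entity, role), which is not how the current cache works; `FormCache`, `FormLookup`, and `lookupForm` are made-up names. It collects the forms for every role the user holds and reports a conflict whenever they differ.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical per-role form cache: (entity name, role name) -> form.
using FormCache = std::map<std::pair<std::string, std::string>, std::string>;

struct FormLookup {
    std::optional<std::string> form;  // set only when exactly one form applies
    bool ambiguous = false;           // true when the user's roles disagree
};

FormLookup lookupForm(const FormCache& cache, const std::string& entity,
                      const std::set<std::string>& userRoles) {
    std::set<std::string> candidates;
    for (const auto& role : userRoles) {
        auto it = cache.find({entity, role});
        if (it != cache.end()) {
            candidates.insert(it->second);
        }
    }
    FormLookup result;
    if (candidates.size() == 1) {
        result.form = *candidates.begin();
    } else if (candidates.size() > 1) {
        result.ambiguous = true;  // nothing says which role governs
    }
    return result;
}
```

Any tie-breaking rule (first role, most privileged role) would be arbitrary, which is the point of the paragraph above.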

    One option is to introduce a new concept orthogonal to the security role. Well, it’s really not orthogonal, because security roles and these new “UI roles” would best be managed in the same spot, by the same person, at the same time. If there were two places to manage this information, then we’d double the administrative work, and probably bury the support team. There are other things that we could do, like forcing a single UI role on each user. But what happens if you want a role that can set attributes on Create, but only read them on Update? Then you need two different pieces of UI. This just gets us mired in the per-process, per-user, per-document UI mess, which doesn’t seem to have an elegant solution… yet.

  • Michaeljon Miller

    What's with the CRM security descriptor?


    In response to a number of questions I'm seeing on the Microsoft CRM newsgroup, I've gone back through the archives and dug this up. I hope it helps.

    There's been lots of speculation about the security descriptors in Microsoft CRM, and none of it is quite right. The “real” story behind the SD columns is covered briefly in the CRM architecture paper. They really are NT security descriptors, and in theory you can manipulate them using standard NT functionality. But... we use some of the access bits differently than they’re documented in MSDN, and we’ve added some of our own.

    As for the “hash” rumor, and the “encrypted” one too: there’s no such thing. We store binary data in the database in base64-encoded text fields. That means that attachments, sales literature, and security descriptors are really binary data wrapped up in a standard encoding. Somewhere around here I had a routine that would scan through a table, pull the descriptors, convert them back to the binary format that NT likes, and dump them to the console. It’s an interesting application for learning, but it doesn’t give you enough information to figure out which ACEs go into the descriptor itself.
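For the curious, here is a portable C++ sketch of the first half of such a routine (it is not the original one): it decodes the base64 text and reads the fixed 20-byte header of a self-relative NT security descriptor (the `SECURITY_DESCRIPTOR_RELATIVE` layout: revision, a padding byte, control flags, then owner, group, SACL, and DACL offsets, little-endian). It assumes the stored bytes are a self-relative descriptor, since that is the form that can live in a flat buffer. On Windows, `ConvertSecurityDescriptorToStringSecurityDescriptor` can render the same buffer as SDDL, but because CRM reuses some access bits, the standard rights names can mislead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Decode standard base64 (RFC 4648 alphabet). Whitespace is skipped and
// '=' ends the data. Returns std::nullopt on any other invalid character.
std::optional<std::vector<std::uint8_t>> decodeBase64(const std::string& text) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // only the low `bits` bits are pending
    int bits = 0;
    for (char c : text) {
        if (c == '=') break;
        if (c == ' ' || c == '\r' || c == '\n' || c == '\t') continue;
        std::size_t pos = alphabet.find(c);
        if (pos == std::string::npos) return std::nullopt;
        buffer = (buffer << 6) | static_cast<std::uint32_t>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

constexpr std::uint16_t kSeDaclPresent = 0x0004;   // SE_DACL_PRESENT
constexpr std::uint16_t kSeSelfRelative = 0x8000;  // SE_SELF_RELATIVE

// Header of a self-relative descriptor; an offset of 0 means "not present".
struct SdHeader {
    std::uint8_t revision;
    std::uint16_t control;
    std::uint32_t ownerOffset, groupOffset, saclOffset, daclOffset;
};

std::optional<SdHeader> parseSdHeader(const std::vector<std::uint8_t>& sd) {
    if (sd.size() < 20) return std::nullopt;
    auto u16 = [&sd](std::size_t i) {
        return static_cast<std::uint16_t>(sd[i] | (sd[i + 1] << 8));
    };
    auto u32 = [&sd](std::size_t i) {
        return static_cast<std::uint32_t>(sd[i]) |
               (static_cast<std::uint32_t>(sd[i + 1]) << 8) |
               (static_cast<std::uint32_t>(sd[i + 2]) << 16) |
               (static_cast<std::uint32_t>(sd[i + 3]) << 24);
    };
    SdHeader h;
    h.revision = sd[0];  // sd[1] is reserved padding (Sbz1)
    h.control = u16(2);
    h.ownerOffset = u32(4);
    h.groupOffset = u32(8);
    h.saclOffset = u32(12);
    h.daclOffset = u32(16);
    return h;
}
```

Walking the ACEs inside the DACL is the next step, and as noted above, even that won’t tell you why a given ACE is there.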

    When we tweak an object (create, update, share, assign, etc.) we recalculate the SD values based on the instance’s location in the organization and on the roles of security principals related to it. For instance, a sales manager with DEEP READ access to accounts will probably have an ACE in the SD by way of the role membership that granted her the DEEP READ access. An explicit share means that the principal gets an explicit ACE. Implicit access means the ACE came from a group (role, team, business, …).

    But don’t go thinking you’ve got it all figured out yet. You would still need to know how object hierarchies affect the SD value. That’s where this thread started. We have the concept of a ‘child entity’, which is essentially unsecured on its own. That means we need to get the access rights from somewhere, so we pull them from the ‘parent’ instance. You can find these types by looking through the metadata for entities that don’t have security descriptors but still have security behavior and APIs.
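A minimal model of that lookup, assuming each instance either carries its own descriptor or points at its parent; `Instance`, `effectiveDescriptor`, and the entity names are hypothetical, and the real platform resolves this through metadata rather than in-memory pointers.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model: an instance either carries its own security descriptor
// or, as a child entity, has none and relies on its parent instance.
struct Instance {
    std::string entity;
    std::optional<std::string> securityDescriptor;  // empty for child entities
    const Instance* parent = nullptr;               // the 'parent' instance, if any
};

// Resolve the descriptor that governs access: the instance's own, or the
// nearest ancestor's. Returns nullptr if no ancestor carries one.
const std::string* effectiveDescriptor(const Instance& instance) {
    for (const Instance* current = &instance; current != nullptr;
         current = current->parent) {
        if (current->securityDescriptor) {
            return &*current->securityDescriptor;
        }
    }
    return nullptr;
}
```

This is also why copying a child row’s SD column tells you nothing: there isn’t one to copy.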

    Oh, and the idea of creating an instance through the application or API and copying its SD for later SQL inserts will only sometimes work. Actually, it will probably fail more often than it succeeds. The reason is that there’s additional information in the database to track access to entities, and a copied SD doesn’t bring that information along. It’s not always kept up to date, and it’s not applied to all entities, but it’s there nonetheless. So, if you try the trick of copying a known SD value when doing a “data migration” at the SQL level, remember: caveat emptor.

    Hope that clears up some of the fog.

     

  • Michaeljon Miller

    CRM callouts are just plain hard to write


    Building callout handlers for MS-CRM is hard. It’s just plain hard. Not because writing a COM object that implements an interface is hard – we’ve got a boatload of tools to help with that. They’re hard to write because the model is inconsistent and incomplete. The v1.x callout interface is very simple and has very simple semantics. It’s this simplicity that makes it useful, powerful, and hard to use.

    The callout interface looks like this. You’ve seen it in the documentation, and probably tried to implement it somewhere along the way. You might have even succeeded.

    [
        uuid("F4233E5B-17DC-4661-9ABC-6707A9F99215"),
        dual
    ]
    interface ICRMCallout : IDispatch
    {
        HRESULT PostCreate([in] int ObjectType, [in] BSTR ObjectId, [in] BSTR OrigObjectXml);
        HRESULT PostUpdate([in] int ObjectType, [in] BSTR ObjectId, [in] BSTR OrigObjectXml);
        HRESULT PostDelete([in] int ObjectType, [in] BSTR ObjectId);
    };

    In case you’re a C# person, here’s the definition from the SDK. There’s not much difference.

    using System;                            // Int32, String
    using System.Runtime.InteropServices;    // GuidAttribute

    [GuidAttribute("F4233E5B-17DC-4661-9ABC-6707A9F99215")]
    public interface ICRMCallout
    {
        Int32 PostCreate(Int32 ObjectType, String ObjectId, String OrigObjectXml);
        Int32 PostUpdate(Int32 ObjectType, String ObjectId, String OrigObjectXml);
        Int32 PostDelete(Int32 ObjectType, String ObjectId);
    }

    The basic semantics are as follows:

    ·         PostCreate provides the handler with the instance XML as supplied by the platform consumer (this could be the application, integration, or another callout). The first parameter is self-explanatory, as is the second. OrigObjectXml on the other hand deserves some explanation and some discussion. I’ll get to that in a bit because it applies to PostUpdate as well.

    ·         PostUpdate follows the same pattern but happens after the changes have been submitted to the database. Note that I said “submit” and not “commit”. There’s an important distinction and this is one of those incomplete things about the callout interface. This problem applies to PostCreate as well so I’ll talk about it in a bit.

    ·         PostDelete is the simplest to understand and one of the hardest to use. It fires after a soft-delete request is made against an instance. The only information supplied is the instance identifier (the type code and ID).

     

    Side note: if you’re implementing a callout handler in C#, you need to declare your class as follows. One of the biggest PSS issues seems to be the GUID. You must supply your own GUID. Don’t use the one from the documentation, because someone else might have made the same mistake, and now there are two COM classes registered under the same GUID. Well, that’s not entirely true: there’s only one, the last one to get registered.

     

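    [ClassInterface(ClassInterfaceType.AutoDispatch)]
    [GuidAttribute("put a CLSID here")]   // your own new GUID, never the one from the documentation
    public class MyCallout : ServicedComponent, ICRMCallout
    {
        // PostCreate, PostUpdate, and PostDelete implementations go here
    }
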

    Submit vs. Commit

    Transaction semantics are not well-defined for callouts. Sometimes the callout is made inside a transaction and sometimes it’s made outside. Exactly when isn’t well understood on the CRM team right now. That’s one of the problems with building large software systems from the ground up with a growing team, but that’s a discussion for another time. The important bit is that the transaction rules around callouts are ill-defined, and the only assumption a callout author can make is that the instance in question is likely inaccessible while the callout function executes.

    What does this mean? Well, first it’s important to know how the platform calls the handler class. Let’s start by looking at the basic flow.

     

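        COSERVERINFO si;
        si.dwReserved1 = 0;
        si.dwReserved2 = 0;
        si.pAuthInfo = NULL;            // default authentication
        si.pwszName = wszComputerName;  // target machine name

        // Ask for one interface (pIID) on an out-of-proc instance of the callout class
        MULTI_QI qi = { pIID, NULL, 0 };
        hr = CoCreateInstanceEx( callout, NULL, CLSCTX_REMOTE_SERVER, &si, 1, &qi );
        // Both the call and the per-interface result in qi.hr must succeed
        if (SUCCEEDED(hr) && SUCCEEDED(qi.hr))
            pCallout = static_cast<ICRMCallout*>(qi.pItf);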

    The thing to notice is that the platform is using an explicit out-of-proc call. Why? Because we really don’t want the platform to crash if the callout crashes. This means that the callout handler must be registered as an out-of-proc server, which is easily done using a COM+ application. But that also means that callouts can’t just be dropped on the server machine and registered as COM objects.

    Let’s look at how the platform actually calls a handler.

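    // registeredCallouts: the callouts the platform managed to create
    for (ICRMCallout* pCallout : registeredCallouts)
    {
        // call the event handler; the returned HRESULT is ignored
        pCallout->PostCreate(otc, bstrId, bstrXml);

        // then release the interface pointer
        pCallout->Release();
    }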

    The platform walks the list of registered callouts and calls each one that it can create. But isn’t PostCreate supposed to return an HRESULT? Yup, and the platform ignores it, because there’s no clear answer about what should happen if a given callout fails. This is good and bad. It’s good because the platform doesn’t get mired in the transactional details necessary to “fix” things that might break in an unknown chunk of code. It’s also terribly bad for the same reason.

    Oh yeah, and all this happens deep, deep, deep in the platform infrastructure. So deep that we might as well consider callouts the equivalent of a database trigger. Not that triggers are bad or anything; many GP ISVs have built all kinds of solutions by adding triggers to the GP database. Callouts are bad because they look like triggers and behave like triggers, but just aren’t triggers: there’s no trigger context available. One thing that makes them very much like triggers is that they fire on every WRITE (for example), and WRITEs happen all the time in the platform for reasons that callout authors usually don’t care about (like a security descriptor update because someone changed a role; does the callout care? Probably not).

    The next thing to notice is that the platform calls the callout inline with the rest of the business logic running on the current thread. This means that your customer (typically the application user) is sitting there patiently waiting for your callout code to complete. If you’re making a long-lived call to another application, the user is effectively blocked from getting any other work done. It also means that the platform is blocked. At least one critical resource is waiting: the thread servicing the user request. Depending on which platform call was made, database resources may be blocked as well.

    Why does this matter?

    Well, if the database is blocked because the platform is in the middle of a transaction it means that other callers can’t get at the blocked resource. That caller might very well be your callout handler if it needs to call back into the platform to retrieve data. And, if the callout author is well-behaved then the callback to the platform is happening over SOAP, and that’s clearly an out-of-proc call.

    This is what I meant by incomplete earlier. The callout only gets the information that was supplied by the original caller, and this data is clearly incomplete. The auditing, owner, and default data is missing from the XML for PostCreate, and it’s probably old data for a PostUpdate. The way around this is simple: call back into the platform to query the rest of the data. Oops, now we’re in trouble, because we’ve probably gotten ourselves into a deadlock: the callback waits on database resources held by the original platform request, and that request is waiting for the callout to return.

    There’s also the issue of PostDelete. Normally the delete handler works fairly well. It gets called when something gets deleted and it gets enough information to do something about it. Let’s say that a Sales Order Detail was deleted though. The callout will get the line item instance ID and nothing else. How should it deal with this? There isn’t a story. You could cheat and make an ADO call into the database to try to read the Sales Order ID from the just-deleted line item, but this is bad because we don’t want you reading the database directly (because we can and will change the structures from release to release) and that pesky deletion service might beat you to it and really delete the instance. Like I said, callouts are incomplete and themselves don’t have enough context to help the callout author.

    Uh, then what?

    Given all that, what’s a callout coder to do? First, make the callout handler as short, fast, and simple as possible. I recommend converting the callout parameters, along with some general contextual data, into a message and dropping it in a queue somewhere. This doesn’t have to be anything fancy. It can be MSMQ, a database table (in another database, please), or even the file system. This way control is returned to the platform as soon as possible, which means control is returned to the user sooner. The next thing is to use a service (or other application) to watch that queue and do the expensive work out-of-band. This service is just another platform user and can call into the API to read the rest of the necessary data. Well, sort of.
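Here is a minimal C++ sketch of that split, with hypothetical names throughout (`CalloutMessage`, `CalloutQueue`, `onPostCreate`, `startWorker`). The handler side copies the callout parameters into a message and returns immediately; a worker thread stands in for the separate service. An in-memory `std::deque` stands in for the durable queue (MSMQ, a table in another database, or files) that a real deployment would need, since an in-process queue is lost if the process goes away.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// What the handler captures before returning (hypothetical message shape).
struct CalloutMessage {
    std::string event;          // "PostCreate", "PostUpdate", or "PostDelete"
    int objectType = 0;
    std::string objectId;
    std::string origObjectXml;  // empty for PostDelete
};

// Minimal thread-safe FIFO standing in for a durable queue.
class CalloutQueue {
public:
    void push(CalloutMessage message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(message));
        }
        ready_.notify_one();
    }

    // Blocks until a message is available; returns std::nullopt once the
    // queue has been closed and fully drained.
    std::optional<CalloutMessage> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty()) return std::nullopt;
        CalloutMessage message = std::move(items_.front());
        items_.pop_front();
        return message;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<CalloutMessage> items_;
    bool closed_ = false;
};

// Handler body: record the parameters and return at once, so the platform
// thread (and the user waiting on it) is released quickly.
int onPostCreate(CalloutQueue& queue, int objectType,
                 const std::string& objectId, const std::string& origObjectXml) {
    queue.push({"PostCreate", objectType, objectId, origObjectXml});
    return 0;  // the platform ignores the return value anyway
}

// Out-of-band worker: drains the queue on its own thread and can take as long
// as it needs, including calling back into the platform API.
std::thread startWorker(CalloutQueue& queue,
                        std::function<void(const CalloutMessage&)> process) {
    return std::thread([&queue, process] {
        while (auto message = queue.pop()) {
            process(*message);
        }
    });
}
```

The handler does nothing but a push, so nothing slow runs while the platform may still hold its transaction; the worker is just another caller once that transaction is gone.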

    Remember when I said callouts were incomplete (yeah, I think I’m starting to repeat myself here)? Sure, not getting the complete instance data is a problem, and if that problem were solved we could mitigate a lot of the call-back-into-the-platform problems. But what’s really missing here is the contextual information necessary to know what to do about the callout. For example, who made the original platform request? Was it an application user, or was it your own callout code? There is no way to know. Because the platform protects itself from callout crashes by running them out-of-proc, getting context (current transaction, current method, and current user) to the callout is really difficult.

    For now we’re stuck. We can’t change the interface without breaking all those people who’ve already started writing handlers. We can add a new interface, but it’ll still suffer from a lot of the same problems. We need to go back to the drawing board and rethink how all this stuff can and should work, where it might best fit, how the ISV community might use it, and what tools should be supported (COM? VB6? .NET?).

    If you’ve read this far looking for enlightenment, I’m sorry, I don’t have the answer. I’ve told you what’s wrong with callouts, but you probably already knew that. Though I have given a little insight into how they work and I’m hoping that helps you understand how to write handlers that behave well. But, for now, building callout handlers for MS-CRM is hard.

     

  • Michaeljon Miller

    Add / Entity scenario in CRM 2005


    I’m trying to get a feel for how ISVs and VARs might use custom tables and custom entities in MS-CRM. The canonical example around building 110 is that someone would add a BankAccount child entity to a customer (likely an Account). But that sure doesn’t seem like a valid solution.

    I mocked up a student management system using some of the custom entity features that were in the V2 alpha while we were doing some of the original design work around relationships. For the most part things fit into place pretty well, but the solution seems a little forced (I have a background in survey and assessment systems, so the concepts all work correctly; it’s just that I went out of my way to get all the relationship “rules” into the model).

    Those “rules” control how various relationships behave. The way we went about this was to devise a taxonomy of relationship types and the rules that surround those types. The rules are pretty simple:

    ·         Cascade – broadcast the action ‘down’ the link to any related entity. The receiving entity will perform the operation for any related instances. This is a "code" thing that the platform manages on the caller's behalf.

    ·         Do not cascade – do not broadcast the operation at all. This action is useful for MERGE and security operations in many relationship models. It really means STOP the current traversal action for this relationship graph.

    ·         Remove link – we’ve traditionally called this ‘cut link’. The action disconnects the relationship based on the relationship type: a 1:M relationship will set the relating attribute to NULL by sending a message to the ‘far’ entity; an M:M relationship will delete the specified rows from the association table, but no broadcast will occur.

    ·         Restrict operation – if an instance exists on the far end of a ‘restrict’ relationship then the whole operation is canceled. This is most useful for DELETE, but can apply to other operations.

    Keep in mind that in many cases there’s an overlap between ‘remove link’ and ‘do not cascade’ and we’ve been overloading their meaning based on the model.

    Note that operations only traverse in one direction. That is, an operation always starts at the 1-side of a relationship and terminates at the M-side. Reverse traversals are undefined.
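The rules and the one-way traversal can be modeled in a few lines. This is a hypothetical sketch, not the platform’s implementation: `Rule`, `Link`, `Plan`, and `broadcast` are made-up names, and each `Link` stands for an existing link between two instances. It plans an operation rather than executing it, so a Restrict hit can cancel the whole thing before anything changes. There is deliberately no “visited” set, so loops in the entity graph must be broken by the rules themselves.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The four relationship behaviors described above.
enum class Rule { Cascade, DoNotCascade, RemoveLink, Restrict };

// One existing 1:M link between two instances, with the behavior configured
// on its relationship for the operation being broadcast.
struct Link {
    std::string relationship;
    std::string oneSide;   // the 1-side (parent) instance
    std::string manySide;  // the M-side (related) instance
    Rule rule;
};

// Outcome of planning one operation (DELETE, ASSIGN, ...).
struct Plan {
    bool canceled = false;              // a Restrict link had a far-end instance
    bool unterminated = false;          // hit maxDepth: a loop no rule stopped
    std::vector<std::string> affected;  // instances that perform the operation
    std::vector<Link> cutLinks;         // links to disconnect (Remove link)
};

// Broadcast from `id`, always from the 1-side toward the M-side. A depth limit
// reports the case where a loop is not broken by a Do not cascade rule.
void broadcast(const std::vector<Link>& links, const std::string& id,
               Plan& plan, int depth = 0, int maxDepth = 32) {
    if (depth > maxDepth) {
        plan.unterminated = true;
        return;
    }
    plan.affected.push_back(id);
    for (const Link& link : links) {
        if (link.oneSide != id) continue;  // reverse traversal is undefined
        switch (link.rule) {
            case Rule::Cascade:
                broadcast(links, link.manySide, plan, depth + 1, maxDepth);
                break;
            case Rule::DoNotCascade:
                break;  // stop the traversal along this relationship
            case Rule::RemoveLink:
                plan.cutLinks.push_back(link);
                break;
            case Rule::Restrict:
                plan.canceled = true;  // the whole operation is canceled
                break;
        }
        if (plan.canceled || plan.unterminated) return;
    }
}
```

For instance, a loop where an account cascades to its contacts and a contact points back at an account (as a primary contact would) terminates only because the back link is set to Do not cascade; with Cascade on both links, the depth guard trips instead.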

    The following scenario shows one possible extension to MS-CRM. This extension models a student management subsystem. The system tracks Accounts and Contacts as customers and students. Those students can attend seminars to receive necessary training. Those training requirements are determined based on an assessment exam. Courses can have pre- and post-requisites. Exams can be created for pre- and post-assessment (to determine training needs and to assess learning after the course has been completed).

    Many of the preceding models are represented in the scenario. There are several entity graph loops present. The loops are terminated by defining traversal rules on the following relationships: primary_contact, preferred_location, held_at, and held_for. The numbers throughout the model refer to the model numbers from an internal design document (they’re not too important).

    Note that activities are represented by a fictitious entity called ‘Generic Activity’. Activity and Annotation relationships are not shown in the model but can be inferred from the associated model metadata. Assume for purposes of this example that Course, Exam, Seminar, Location, and Instructor have relationships to Annotation. Also assume that Attendance Record and Examination have associated scheduling activities.

    Ideally “Add Entity” in V2 will have sufficient infrastructure to build a subsystem like the one described above. A few modifications would make the model even more powerful, such as defining Instructor and Location as Resources, and defining Examination, Seminar, and Attendance Record as Activity types.

    Now, does this seem like the kind of solution that you’d want to build on top of MS-CRM, or is this way too deep? Is the core scenario like the one above, or more like “I need to add a simple 1:M relationship to hold some ‘extra’ stuff”. Just wondering.

     
