The Microsoft Dynamics CRM Blog
News and views from the Microsoft Dynamics CRM Team

Microsoft Dynamics CRM, Email correlation and smart matching

What is correlation and why is it required?

One of the important scenarios in email management within CRM is associating an incoming email with the correct object it is regarding. Consider the scenario where you have created an email related to a case and sent it to a customer. The customer responds to the email. The incoming email is tracked in CRM and should now be automatically associated with the same case that the original email was regarding.

We take a two-step approach to finding the correct regarding object for an incoming email. The first step is to find the correlated outgoing email to which the customer has responded, and the next step is to get the regarding object from the correlated email and set it on the incoming email.

How was correlation done in CRM 3.0?

In CRM 3.0 every outgoing email from CRM had a CRM token appended to its subject. The CRM token was in the format CRM:0001001 and was configurable via the system settings. When an incoming email was tracked in CRM, the email would be checked for the presence of a CRM token. If one was found, the system would then look for the most recent email with the same token to correlate the two. Once correlation was done, the regarding object of the correlated email, if found, was set on the incoming email.
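
To make the CRM 3.0 flow concrete, here is a minimal Python sketch. It assumes the token always looks like the CRM:0001001 example above (the real format is configurable), and the `sent_emails` record shape and the function names are made up for illustration; they are not CRM APIs.

```python
import re

# Assumed token shape, based on the "CRM:0001001" example in the text:
# a fixed "CRM:" prefix followed by seven digits.
TOKEN_PATTERN = re.compile(r"CRM:\d{7}")


def extract_token(subject):
    """Return the last CRM-style token in the subject, or None if absent."""
    matches = TOKEN_PATTERN.findall(subject)
    return matches[-1] if matches else None


def correlate_by_token(incoming_subject, sent_emails):
    """Return the regarding object of the most recent sent email that
    carries the same token as the incoming subject.

    sent_emails is a hypothetical list of dicts with the keys
    'subject', 'sent_on' and 'regarding'.
    """
    token = extract_token(incoming_subject)
    if token is None:
        return None
    candidates = [e for e in sent_emails if extract_token(e["subject"]) == token]
    if not candidates:
        return None
    latest = max(candidates, key=lambda e: e["sent_on"])
    return latest["regarding"]
```

In this sketch the token alone decides the match, which mirrors the CRM 3.0 behavior described above.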

How is correlation done in CRM 4.0?

Most of our customers did not want a fancy-looking token appended to the subject line of every email sent out of CRM. So in CRM 4.0 we introduced a new concept, smart matching, that is used to correlate emails. The use of the email token is optional and can be configured through system settings. The following blog article talks about it.

http://blogs.msdn.com/crm/archive/2008/01/29/what-s-new-in-microsoft-dynamics-crm-4-0-e-mail-integration.aspx

But there is a subtle difference in how the email token is used in CRM 3.0 and CRM 4.0. In CRM 3.0 the presence of the token was the only way to identify and correlate emails. In CRM 4.0 the presence of the token only increases the accuracy of the correlation but does not determine it. Thus it’s possible that an incoming email with an email token does not get correlated to the outgoing email with the same token. This is especially true if the customer has changed the subject of the email but retained the token, thinking it would be OK.

How does smart matching work?

Smart matching relies completely on the existence of similarity between emails. The subject and the recipient list (from, to, cc and bcc) are the two important components that are considered when checking for similarity.

When an email is sent from CRM, there are two sets of hashes generated for it and stored in the database.

a. Subject hashes:

To generate subject hashes, the subject of the email, which may include the CRM token if its usage is enabled in system settings, is first checked for noise words like RE:, FW:, etc. The noise words are stripped from the subject, and the remaining text is then tokenized. All the non-empty tokens (words) are then hashed to generate the subject hashes.

b. Recipient hashes:

To generate the recipient hashes the recipient (from, to, cc, bcc) list is analyzed for unique email addresses. For each unique email address an address hash is generated.
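
The two hashing steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the post does not say which hash function CRM uses (SHA-256 is a stand-in here), how the subject is tokenized (whitespace splitting is assumed), or whether addresses are case-normalized (lowercasing is assumed). The noise pattern is the default HashFilterKeywords expression quoted in this post; the function names are hypothetical.

```python
import hashlib
import re

# Default HashFilterKeywords expression quoted in this post.
NOISE = re.compile(r"^[\s]*([\w]+\s?:[\s]*)+")


def _hash(text):
    # Stand-in hash function; the one CRM actually uses is not documented here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def subject_hashes(subject):
    """Strip leading noise words, split on whitespace, hash each word."""
    cleaned = NOISE.sub("", subject)
    words = cleaned.split()  # split() never yields empty tokens
    return [_hash(word) for word in words]


def recipient_hashes(sender, to=(), cc=(), bcc=()):
    """Hash each unique address found in from, to, cc and bcc."""
    # Assumption: addresses are compared case-insensitively.
    unique = {a.strip().lower() for a in (sender, *to, *cc, *bcc) if a.strip()}
    return {_hash(address) for address in unique}
```

With these definitions, "RE: FW: Printer broken" and "Printer broken" produce the same two subject hashes, and an address that appears in both from and cc is hashed only once.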

Next, when an incoming email arrives and is tracked in CRM, the same method is followed to create its subject and recipient hashes.

To find the correlation between the incoming email and the outgoing email, the stored subject and recipient hashes are searched for matching values. Two emails are correlated if they have the same count of subject hashes and at least two matching recipient hashes.
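
A rough Python sketch of this default rule, combined with the two-step approach described earlier, might look like the following. Two things here are assumptions rather than statements from this post: the subject hashes are required to match in value as well as in count, and when several stored emails qualify the first one found wins. `EmailHashes` and both function names are hypothetical, and plain strings stand in for real hash values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmailHashes:
    subject: frozenset      # subject hashes of one email
    recipients: frozenset   # recipient hashes of the same email
    regarding: Optional[str] = None  # regarding object, if any


def is_correlated(incoming, outgoing):
    """Default rule: same subject hash count, at least two shared recipients."""
    same_subject_count = len(incoming.subject) == len(outgoing.subject)
    # Assumption: the subject hashes must also be equal in value.
    subjects_match = incoming.subject == outgoing.subject
    shared_recipients = len(incoming.recipients & outgoing.recipients)
    return same_subject_count and subjects_match and shared_recipients >= 2


def find_regarding(incoming, stored_outgoing):
    """Step 1: find a correlated outgoing email. Step 2: return its regarding object."""
    for candidate in stored_outgoing:
        # Assumption: the first qualifying email is used.
        if is_correlated(incoming, candidate):
            return candidate.regarding
    return None
```

Note that nothing in this rule looks at dates, so an old email with the same subject and recipients qualifies just as well as a recent one.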

How can smart matching be configured?

One size never fits all, so the constraint for correlation described above, which is the default out-of-the-box behavior of CRM, can be configured to suit individual needs.

There are four registry keys that allow you to manipulate the smart matching behavior. These registry keys need to be added under the CRM server registry hive only, i.e. HKLM\Software\Microsoft\MSCRM

1) HashFilterKeywords

    a. Description: This is a regular expression that is used to cancel out the noise in the subject line. All matching instances of the regular expression present in the subject line are replaced with empty strings before generating the subject hashes.

    b. Default value: ^[\s]*([\w]+\s?:[\s]*)+

In other words, by default we ignore any word (or run of such words) at the start of the subject line that ends with a “:”. For example:

    #   Subject         Ignored words
    1   Test            None
    2   RE: Test        RE:
    3   FW: RE: Test    FW: RE:

Note: By default we do not ignore leading phrases in the subject line like “Out of office:”, because the first word of the phrase is not immediately followed by a “:”. To ignore this phrase you can update the regular expression in the registry to “^[\s]*([\w]+\s?:[\s]*)+|Out of office:”. Do not include the surrounding double quotes shown in this example in the registry value. The text in the registry should be only the regular expression you want to use for ignoring words in the subject line.

2) HashMaxCount

    a. Description: This is the maximum number of hashes that will be generated for any subject or recipient list. For example, with the default value, if the subject after noise cancellation contains more than 20 words, only the first 20 words are considered.

    b. Default value: 20

3) HashDeltaSubjectCount

    a. Description: This is the maximum delta allowed between subject hash counts of the emails to be correlated.

    b. Default value: 0

4) HashMinAddressCount

    a. Description: This is the minimum hash count matches required on the recipients list for the emails to be correlated.

    b. Default value: 2
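
To see how the four settings fit together, here is a small Python sketch. It is an illustration only: the `SmartMatchSettings` class and function names are invented for this example, Python's `re` module stands in for the server's regular expression engine, and whitespace tokenization is an assumption. The defaults are the values listed above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SmartMatchSettings:
    # Defaults as listed in this post.
    hash_filter_keywords: str = r"^[\s]*([\w]+\s?:[\s]*)+"
    hash_max_count: int = 20
    hash_delta_subject_count: int = 0
    hash_min_address_count: int = 2


def subject_words(subject, settings):
    """Remove noise, split on whitespace (assumed), keep at most HashMaxCount words."""
    cleaned = re.sub(settings.hash_filter_keywords, "", subject)
    return cleaned.split()[: settings.hash_max_count]


def passes_thresholds(incoming_subject_count, outgoing_subject_count,
                      matching_recipients, settings):
    """Apply HashDeltaSubjectCount and HashMinAddressCount to hash counts."""
    delta = abs(incoming_subject_count - outgoing_subject_count)
    return (delta <= settings.hash_delta_subject_count
            and matching_recipients >= settings.hash_min_address_count)
```

For example, with the defaults `subject_words("FW: RE: Test", SmartMatchSettings())` keeps only `Test`, while "Out of office: Test" keeps all four words until `|Out of office:` is appended to the filter expression. Raising `hash_delta_subject_count` loosens the subject check, and raising `hash_min_address_count` tightens the recipient check.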

Limitations:

The email hashes are generated when the emails are sent out. If you change HashFilterKeywords or HashMaxCount via the registry, only new outgoing and incoming emails will be affected. The existing email hashes are not recalculated. Also, CRM does not provide any out-of-the-box functionality to recalculate the hashes.

Also, smart matching currently does not have a time limit on how old the correlated email can be. In CRM 5.0 we plan to address this along with other improvements to smart matching.

Shashi Ranjan

  • A fifth configurable option to list email addresses to be ignored when hashing would be great.

    We have various internal systems which send emails to users for certain events (e.g. a third-party bug tracking system will report when a bug is fixed).

    Users track these emails into CRM against a particular Case or customer, but then every future notification they get from this same system is tracked to the same record, when it may be about something else entirely.

    By excluding the email address of the automated sender from the hashes, this would mean the future email addresses would not match enough (since only one address would be hashed so MinCount of 2 would not be met).

    This functionality would leverage the existing MinCount and hashing methods while enabling us to keep rubbish out of the system, which would be fantastic.

  • Hi,

    We have a problem with emails our e-fax server is sending to our incoming queue. These are new and incoming with no relation to any outgoing email. They all contain a scanned form and have identical subject lines (Fax received from +372 6542942). We normally open the email, open the form and associate the email to the correct order manually.

    However, smart matching now will associate any new incoming emails from the e-fax server to this order based on the identical subject line, sender and recipients.

    Can this somehow be reconfigured ?? Thanks

  • Great post!  Question on reg keys - I assume the last three are DWORDS?  First one - is it a String Value?

  • We have the same problem. How can we turn this off and track manually?

  • Thanks for the helpful post.

    I think there's an error in the following statement: "Two emails are correlated if they have the same count of subject hashes and at least two matching recipient hashes."  I think the word "matching" needs to be added before "subject hashes".  If not, wouldn't a subject of "Status Meeting" correlate to a subject of "Performance Problems"?  And there would not be a need to hash the subject tokens.

    I plan on setting HashMinAddressCount to a large value so that no emails are "smart matched".  We'll go back to manually tracking.

    One thing to add to the post: What needs to be done for the registry changes to take effect? Restart the email router service? Restart Outlook clients? Reboot the CRM server?

    Also, I'm not sure how this "smart matching" can be improved for the problem I'm seeing.  We often get emails with generic subject lines that need to be regarding different CRM objects. E.g., "Contract renewal" or "Performance problems" or "Revised quote" or "Please help".  I'm seeing that as more and more emails incorrectly correlate with the wrong CRM object, it's easier for future emails to be correlated to the same wrong object because there are more and more emails with recipients to match on.  These emails are not system generated, they often originate from an external email address.  It's just that we often encounter the same issues and come up with the same short, basic subject line and the emails are often cc'd to the same people, therefore they often "smart match" wrong.

  • I have a suggestion for the email router.

    I am using the POP/SMTP option (with GMail), and the web browser client for CRM.  I have email tracking set up to track emails for Accounts, Contacts and Leads.  For incoming emails it seems to recognize emails from known contacts and bring them into CRM fine.  It ignores emails not related to Contacts/etc. (as it should).  My suggestion relates to outbound emails.  If I use the GMail client to send an email to a Contact that exists in CRM (with the same email address), it does not track that email in CRM.  However, if I look at the Windows Event Viewer on the Email Router machine, it seems to see that email in the sent folder, and skips it with a "... did not match any known records" message.  Why doesn't it bring this email into CRM and attach it to the Contact too?

    If there is already a way to make it do this, please let me know!

  • Smart matching is causing us the same issue as other posts related to "generic" subject lines that are routed to a public email box, thus the TO address and the Subject are constantly causing false matches.  Other than the post related to upping the HashMinAddressCount to a large number I see no corrective or system setting actions.  Does anyone from Microsoft respond to these posts?  I am new to the blogosphere on CRM.

  • Other than keying in the registry entries and increasing the HashMinAddressCount as indicated by a prior post, is there any other method to turn "smart" matching OFF?

  • There appears to be NO middle ground.  We wanted to up the HASH values to TURN OFF the smart matching but LEAVE ON the token tracking.  Unfortunately these are wed at the hip and we do not see an either/or method of tracking.  
