The Battle Of Evermore...
OK, I admit it. I've caught the CRM development bug. What started as a harmless bit of fun working on document library integration between CRM & SharePoint has now developed into an obsession. In this post I will describe how to build a plug-in that examines the body of any e-mail promoted from Outlook or delivered by the e-mail router and converts the HTML into plain text.
After a bit of searching, I found a good article which showed how you could use regular expressions to remove unwanted HTML tags leaving just the plain text - Convert HTML to Plain Text. Converting this from C# to VB (my preferred choice of language) and stripping out some of the bits I didn't need, I came up with the following code which forms the basis of this plug-in.
Private Function ConvertHTMLToText(ByVal Source As String) As String
Dim result As String = Source
' Remove formatting that will prevent regex from running reliably
' \r - Matches a carriage return \u000D.
' \n - Matches a line feed \u000A.
' \f - Matches a form feed \u000C.
' For more details see http://msdn.microsoft.com/en-us/library/4edbef7e.aspx
result = Replace(result, "[\r\n\f]", String.Empty, Text.RegularExpressions.RegexOptions.IgnoreCase)
' replace the most commonly used special characters:
result = Replace(result, "&lt;", "<", RegexOptions.IgnoreCase)
result = Replace(result, "&gt;", ">", RegexOptions.IgnoreCase)
result = Replace(result, "&nbsp;", " ", RegexOptions.IgnoreCase)
result = Replace(result, "&quot;", """", RegexOptions.IgnoreCase)
result = Replace(result, "&amp;", "&", RegexOptions.IgnoreCase)
' Remove ASCII character code sequences such as &#nn; and &#nnn;
result = Replace(result, "&#[0-9]{2,3};", String.Empty, RegexOptions.IgnoreCase)
' Remove all other special characters. More can be added - see the following for more details:
' http://www.degraeve.com/reference/specialcharacters.php
' http://www.web-source.net/symbols.htm
result = Replace(result, "&.{2,6};", String.Empty, RegexOptions.IgnoreCase)
' Remove all attributes and whitespace from the <head> tag
result = Replace(result, "< *head[^>]*>", "<head>", RegexOptions.IgnoreCase)
' Remove all whitespace from the </head> tag
result = Replace(result, "< */ *head *>", "</head>", RegexOptions.IgnoreCase)
' Delete everything between the <head> and </head> tags
result = Replace(result, "<head>.*</head>", String.Empty, RegexOptions.IgnoreCase)
' Remove all attributes and whitespace from all <script> tags
result = Replace(result, "< *script[^>]*>", "<script>", RegexOptions.IgnoreCase)
' Remove all whitespace from all </script> tags
result = Replace(result, "< */ *script *>", "</script>", RegexOptions.IgnoreCase)
' Delete everything between each <script> and its closing </script> tag
' (lazy .*? so that text between two separate script blocks is kept)
result = Replace(result, "<script>.*?</script>", String.Empty, RegexOptions.IgnoreCase)
' Remove all attributes and whitespace from all <style> tags
result = Replace(result, "< *style[^>]*>", "<style>", RegexOptions.IgnoreCase)
' Remove all whitespace from all </style> tags
result = Replace(result, "< */ *style *>", "</style>", RegexOptions.IgnoreCase)
' Delete everything between each <style> and its closing </style> tag
' (lazy .*? so that text between two separate style blocks is kept)
result = Replace(result, "<style>.*?</style>", String.Empty, RegexOptions.IgnoreCase)
' Insert tabs in place of <td> tags
result = Replace(result, "< *td[^>]*>", vbTab, RegexOptions.IgnoreCase)
' Insert single line breaks in place of <br> and <li> tags
result = Replace(result, "< *br[^>]*>", vbCrLf, RegexOptions.IgnoreCase)
result = Replace(result, "< *li[^>]*>", vbCrLf, RegexOptions.IgnoreCase)
' Insert double line breaks in place of <p>, <div> and <tr> tags
result = Replace(result, "< *div[^>]*>", vbCrLf + vbCrLf, RegexOptions.IgnoreCase)
result = Replace(result, "< *tr[^>]*>", vbCrLf + vbCrLf, RegexOptions.IgnoreCase)
result = Replace(result, "< *p[^>]*>", vbCrLf + vbCrLf, RegexOptions.IgnoreCase)
' Remove all remaining html tags
result = Replace(result, "<[^>]*>", String.Empty, RegexOptions.IgnoreCase)
' Replace repeating spaces with a single space
result = Replace(result, " +", " ")
' Remove any trailing spaces and tabs from the end of each line
result = Replace(result, "[ \t]+\r\n", vbCrLf)
' Remove any leading whitespace characters
result = Replace(result, "^[\s]+", String.Empty)
' Remove any trailing whitespace characters
result = Replace(result, "[\s]+$", String.Empty)
' Remove extra line breaks if there are more than two in a row
result = Replace(result, "\r\n\r\n(\r\n)+", vbCrLf + vbCrLf)
' That's it.
Return result
End Function
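To see the effect of the main passes in isolation, here is a minimal console sketch. It is not the plug-in code: the module name HtmlToTextDemo, the helper StripHtml, the sample HTML and the reduced set of passes are all assumptions made for illustration. It calls Regex.Replace explicitly, so it does not depend on how the unqualified Replace calls above are resolved by the project's imports.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module HtmlToTextDemo
    ' Hypothetical helper: a reduced subset of the passes in ConvertHTMLToText.
    Function StripHtml(ByVal source As String) As String
        Dim result As String = source
        ' Flatten the input so "." can match across what were separate lines
        result = Regex.Replace(result, "[\r\n\f]", String.Empty)
        ' Drop each <style> block; the lazy .*? stops at the nearest closing tag
        result = Regex.Replace(result, "<\s*style[^>]*>.*?<\s*/\s*style\s*>", String.Empty, RegexOptions.IgnoreCase)
        ' A line break for <br>, a blank line for <p>
        result = Regex.Replace(result, "<\s*br[^>]*>", vbCrLf, RegexOptions.IgnoreCase)
        result = Regex.Replace(result, "<\s*p[^>]*>", vbCrLf & vbCrLf, RegexOptions.IgnoreCase)
        ' Remove every remaining tag
        result = Regex.Replace(result, "<[^>]*>", String.Empty)
        ' Decode two common entities (the tags are already gone at this point)
        result = result.Replace("&nbsp;", " ").Replace("&amp;", "&")
        ' Collapse runs of spaces and trim leading/trailing whitespace
        result = Regex.Replace(result, " +", " ")
        Return result.Trim()
    End Function

    Sub Main()
        Dim html As String = "<style>p { color: red; }</style><p>Hello&nbsp;<b>world</b><br>Fish &amp; chips</p>"
        Console.WriteLine(StripHtml(html))
    End Sub
End Module
```

For this sample input the sketch prints "Hello world" on one line and "Fish & chips" on the next. Note that it decodes entities after stripping tags, which is a different order from the function above; that choice keeps a decoded "<" from being mistaken for the start of a tag.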
All that remains is to implement the IPlugin.Execute method. In order to be able to modify the e-mail message before the e-mail activity gets created in the database, I had to figure out which event(s) to intercept. Through a bit of trial and error, I observed that any e-mail promoted from Outlook triggers the "DeliverPromote" event, whereas any incoming e-mail handled by the e-mail router triggers the "DeliverIncoming" event. Interestingly enough, the "Create" event was also called as a child pipeline for these events, but modifying the message here didn't have any effect, even in the pre-processing stage.
Because plug-ins have the potential to introduce significant performance and scalability issues into your environment, it is important to ensure that the code is as efficient as possible. To that end, I added checks so that, even if the plug-in is registered on multiple events, the main code only runs when it is executing synchronously, against the email entity, in the pre-processing stage of a parent pipeline, and for one of the two messages described above:
Public Class ConvertHtmlToText
Implements IPlugin
Public Sub Execute(ByVal context As IPluginExecutionContext) Implements IPlugin.Execute
' Exit if any of the following conditions are true:
' 1. plug-in is not running synchronously
' 2. plug-in is not running against the 'Email' entity
' 3. plug-in is not running in the 'pre-processing' stage of the pipeline
' 4. plug-in is not running in a 'Parent' pipeline
If Not (context.Mode = 0) Or Not (context.PrimaryEntityName = "email") Or Not (context.Stage = 10) Or Not (context.InvocationSource = 0) Then
Exit Sub
End If
If (context.MessageName = "DeliverPromote") Or (context.MessageName = "DeliverIncoming") Then
For Each item In context.InputParameters.Properties
If (item.Name = "Body") Then
context.InputParameters.Properties.Item("Body") = ConvertHTMLToText(CStr(item.Value))
' Stop iterating once the body has been replaced
Exit For
End If
Next
End If
End Sub
End Class
As always, I have included the source code to my project here. Please do bear in mind that I haven't included any error handling or logging, so it's not production-ready. However, it should provide you with a good head-start.
This posting is provided "AS IS" with no warranties, and confers no rights.
Laughing Boy
Hi, I've adapted your plug-in to work with CRM 2011. Here is the code snippet I had to change:
Added references to Microsoft.Crm.Sdk.Proxy and Microsoft.Xrm.Sdk
Public Sub Execute(ByVal serviceProvider As System.IServiceProvider) Implements Microsoft.Xrm.Sdk.IPlugin.Execute
Dim context As Microsoft.Xrm.Sdk.IPluginExecutionContext = DirectCast(serviceProvider.GetService(GetType(Microsoft.Xrm.Sdk.IPluginExecutionContext)), IPluginExecutionContext)
' 4. plug-in is not running in a 'Parent' pipeline (I guess this is now configured in the registration tool, because I couldn't find an equivalent)
If Not (context.Mode = 0) Or Not (context.PrimaryEntityName = "email") Or Not (context.Stage = 10) Then ' Or Not (context.InvocationSource = 0)
Exit Sub
End If
Try
For Each elemento In context.InputParameters
If (elemento.Key = "Body") Then
Dim contenido As String = CStr(elemento.Value)
context.InputParameters.Item("Body") = ConvertHTMLToText(contenido)
'Throw New System.Exception("The key value has been modified: Value=" + context.InputParameters.Item("Body")) 'CStr(elemento.Value)) ' + elemento.ToString())
Exit For
End If
Next
Catch ex As Exception
Throw New System.Exception("The key value has been modified: " + ex.Message)
End Try
End Sub
_________________________________________________
Also, I've added these Replace statements, because I receive mails in Spanish:
' Case-sensitive on purpose: with IgnoreCase, "&aacute;" would also match "&Aacute;"
result = Replace(result, "&aacute;", "á", RegexOptions.None)
result = Replace(result, "&eacute;", "é", RegexOptions.None)
result = Replace(result, "&iacute;", "í", RegexOptions.None)
result = Replace(result, "&oacute;", "ó", RegexOptions.None)
result = Replace(result, "&uacute;", "ú", RegexOptions.None)
result = Replace(result, "&Aacute;", "Á", RegexOptions.None)
result = Replace(result, "&Eacute;", "É", RegexOptions.None)
result = Replace(result, "&Iacute;", "Í", RegexOptions.None)
result = Replace(result, "&Oacute;", "Ó", RegexOptions.None)
result = Replace(result, "&Uacute;", "Ú", RegexOptions.None)
result = Replace(result, "&Ntilde;", "Ñ", RegexOptions.None)
result = Replace(result, "&ntilde;", "ñ", RegexOptions.None)
result = Replace(result, " ", vbCrLf, RegexOptions.IgnoreCase)
_________________________________________________________
If you see something wrong let me know, but it's working like a charm.
Regards
Nice one Jorge. If I get a chance, I will republish in a new post. I wonder if there is a better way of identifying all language-specific character sets, rather than adding an exception for each character?
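One possible answer to that question, offered as a sketch rather than something tested inside CRM: the .NET Framework can decode every named and numeric HTML entity in a single call. System.Net.WebUtility.HtmlDecode is available from .NET Framework 4.0 onwards; on earlier versions, System.Web.HttpUtility.HtmlDecode does the same job but needs a reference to System.Web. The module name and the sample string below are illustrative only.

```vb
Imports System
Imports System.Net

Module EntityDecodeDemo
    Sub Main()
        ' Sample text mixing named entities and a numeric one (&#233; is é)
        Dim encoded As String = "Espa&ntilde;a: caf&eacute; &amp; t&#233;"
        ' Decodes all recognised entities in one pass, whatever the language
        Dim decoded As String = WebUtility.HtmlDecode(encoded)
        Console.WriteLine(decoded)
    End Sub
End Module
```

If this were used in ConvertHTMLToText, it would probably belong after the tags have been stripped, replacing the per-character Replace calls, so that a decoded "<" is not mistaken for the start of a tag.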
Hi
It would be great if you could just give me a keyword to google so I can find out how to implement such code in Dynamics CRM...
thank you a lot!
Hi again
Finally I managed to deploy the code to Dynamics using the plug-in registration tool. Now, I thought there would be a custom step in the workflow area... wrong again :) What do I have to do to remove the HTML tags from my mails?
thank you
regards
Hi Nicolas, after you have registered the plug-in, you need to register it against two specific events (steps). You can register these steps in the plug-in registration tool as well:
1. Event: DeliverPromote; Entity: email
2. Event: DeliverIncoming; Entity: email
Make sure these are registered to run synchronously in the pre-processing stage of the event pipeline.
Best regards, Simon
Hi Simon
Thank you very much! Everything works fine!
I expected a custom "action" for workflows... Now I know that your plugin converts all mail messages. I'll keep searching :)
Have a nice day
Best regards, Nicolas
Once this plug-in is compiled into a DLL, is it then somehow installed in Outlook? Once installed, how is it triggered: by a button? I would like to modify this to trigger when I click "Convert Email to CRM Case" and parse out the HTML body to auto-populate the Case fields.
Please forgive my foolish questions.
Thanks
A plug-in only runs when triggered by a CRM event (such as DeliverPromote), and does not show up in the Outlook or Web UI. To be able to use this as part of the "Convert E-mail To Case" function in the Outlook client, you will need to work out what events are triggered, and modify the plug-in to work with those events. Unfortunately I don't have the ability to check this out for the next couple of weeks, as I am on vacation right now.