The RegexOptions.Compiled Flag and Slow Performance on 64-Bit .NET Framework 2.0 [Josh Free]

Developers using System.Text.RegularExpressions.Regex with the RegexOptions.Compiled flag may notice degraded performance when their applications run on the 64-Bit .NET Framework 2.0.

The performance problem occurs in the Regex(String pattern, RegexOptions options) constructor when it is given a very large, un-optimized regular expression together with the RegexOptions.Compiled flag:

private static Regex nonwords = new Regex(@"\b("
   +@"a|aboard|about|above|absent|according\sto|across|after|against|ago|ahead\sof|ain't|all|along|alongside|"
   +@"also|although|am|amid|amidst|among|amongst|an|and|anti|anybody|anyone|anything|apart|apart\sfrom|are|"
   +@"aren't|around|as|as\sfar\sas|as\ssoon\sas|as\swell\sas|aside|at|atop|away|be|because|because\sof|before|"
   +@"behind|below|beneath|beside|besides|between|betwixt|beyond|but|by|by\smeans\sof|by\sthe\stime|can|cannot|"
   +@"circa|close\sto|com|concerning|considering|could|couldn't|cum|'d|despite|did|didn't|do|does|doesn't|don't|"
   +@"down|due\sto|during|each_other|'em|even\sif|even\sthough|ever|every|every\stime|everybody|everyone|"
   +@"everything|except|far\sfrom|few|first\stime|following|for|from|get|got|had|hadn't|has|hasn't|have|"
   +@"haven't|he|hence|her|here|hers|herself|him|himself|his|how|i|if|in|in\saccordance\swith|in\saddition\sto|in\scase|"
   +@"in\sfront\sof|in\slieu\sof|in\splace\sof|in\sspite\sof|in\sthe\sevent\sthat|in\sto|inside|inside\sof|"
   +@"instead\sof|into|is|isn't|it|itself|just\sin\scase|like|'ll|lots|may|me|mid|might|mightn't|mine|more|most|"
   +@"must|mustn't|myself|near|near\sto|nearest|no|no\sone|nobody|none|not|nothing|notwithstanding|now\sthat|of|"
   +@"ya|ye|yes|you|your|yours|yourself"
   +@")\b", (RegexOptions.IgnoreCase | RegexOptions.Compiled));
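
To check whether an application is affected, one option is to time the construction of the same pattern with and without the flag. The following is a minimal sketch, not part of the original post: the class and method names are hypothetical, and it also runs one match so that the measurement covers first use as well as the constructor (an extra precaution, not something the post states is required).

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing regex start-up cost.
public static class RegexStartupTimer
{
    // Returns the wall-clock time needed to construct the regex
    // and run a single match against a short sample string.
    public static TimeSpan Measure(string pattern, RegexOptions options)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Regex regex = new Regex(pattern, options);
        regex.IsMatch("sample input");
        watch.Stop();
        return watch.Elapsed;
    }

    // Prints the start-up cost with and without RegexOptions.Compiled.
    public static void Report(string pattern)
    {
        TimeSpan interpreted = Measure(pattern, RegexOptions.IgnoreCase);
        TimeSpan compiled = Measure(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
        Console.WriteLine("Without Compiled: {0} ms", interpreted.TotalMilliseconds);
        Console.WriteLine("With Compiled:    {0} ms", compiled.TotalMilliseconds);
    }
}
```

Calling RegexStartupTimer.Report with the full pattern above, in a 64-bit process, should show whether the Compiled variant is the slow one on a given machine.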

The compilation performance problem in the 64-Bit .NET Framework 2.0 is fixed by the hotfix at http://support.microsoft.com/kb/917507, and the fix will be released broadly in Service Pack 1 of .NET Framework 2.0.

There are also several workarounds to this issue.

Reduce the Regular Expression Pattern

Developers can reduce the size of their regular expressions by simplifying them.  For instance, the un-optimized pattern

"aa|ab|ac|ad|ae|af|ag|ah|ai|aj|ak"

can be replaced with this pattern:

"a[a-k]"

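Before swapping a long alternation for a shorter pattern, it can be worth confirming that the two accept the same inputs. The sketch below is an illustration only: the class name is hypothetical, and both patterns are anchored with ^ and $ (an addition not in the post) so that the comparison is about whole strings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical check that the simplified pattern matches the same
// strings as the original alternation.
public static class PatternEquivalence
{
    private static readonly Regex Alternation =
        new Regex("^(?:aa|ab|ac|ad|ae|af|ag|ah|ai|aj|ak)$");

    private static readonly Regex CharacterClass =
        new Regex("^a[a-k]$");

    // True when both patterns give the same answer for the input.
    public static bool Agree(string input)
    {
        return Alternation.IsMatch(input) == CharacterClass.IsMatch(input);
    }

    // Compares the two patterns on every lowercase pair from "aa" to "zz".
    public static bool AgreeOnAllLetterPairs()
    {
        for (char first = 'a'; first <= 'z'; first++)
        {
            for (char second = 'a'; second <= 'z'; second++)
            {
                string input = new string(new char[] { first, second });
                if (!Agree(input))
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

A check like this only covers the inputs it enumerates, so for a larger pattern the test strings should be drawn from the text the application actually processes.
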
Use Regex Pre-Compilation Instead of Compiling-on-the-Fly

Developers can use Regex.CompileToAssembly to build an assembly containing their regular expression, instead of always compiling the regular expression during application startup.  For more details on Regular Expression Compilation options please see the CLR Inside Out article in the January 2006 edition of MSDN Magazine.
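
A pre-compilation step might look roughly like the following. This is a sketch under stated assumptions: the type name, namespace, and assembly name are hypothetical, and a shortened pattern stands in for the full one. Regex.CompileToAssembly is part of the .NET Framework, which this post targets; newer .NET runtimes do not support it.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

// Hypothetical build-time tool: run it once as part of the build,
// not during application startup.
public static class RegexAssemblyBuilder
{
    public static void Build()
    {
        RegexCompilationInfo info = new RegexCompilationInfo(
            @"\b(a|aboard|about)\b",   // shortened stand-in for the full pattern
            RegexOptions.IgnoreCase,
            "NonWordsRegex",           // hypothetical name of the generated type
            "MyCompany.Text",          // hypothetical namespace
            true);                     // make the generated type public

        // Hypothetical assembly name; the call below writes
        // MyCompany.Text.Regexes.dll to the current directory.
        AssemblyName assemblyName = new AssemblyName("MyCompany.Text.Regexes");

        Regex.CompileToAssembly(new RegexCompilationInfo[] { info }, assemblyName);
    }
}
```

The application then references the generated assembly and creates the regex with new MyCompany.Text.NonWordsRegex() instead of calling the Regex constructor with RegexOptions.Compiled.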

Remove the RegexOptions.Compiled Flag From Your Code

If you have never profiled your application, or if you have profiled it and Regex is not the run-time bottleneck, consider dropping the RegexOptions.Compiled flag as a workaround until .NET Framework 2.0 Service Pack 1 is released.
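
One variation on this workaround, which the post itself does not propose, is to drop the flag only in 64-bit processes and keep it elsewhere. The sketch below assumes that variation; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical factory that omits RegexOptions.Compiled in 64-bit processes.
public static class NonWordsRegexFactory
{
    public static RegexOptions ChooseOptions()
    {
        RegexOptions options = RegexOptions.IgnoreCase;

        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process.
        if (IntPtr.Size != 8)
        {
            options |= RegexOptions.Compiled;
        }
        return options;
    }

    public static Regex Create(string pattern)
    {
        return new Regex(pattern, ChooseOptions());
    }
}
```

Once the hotfix or Service Pack 1 is installed, the check can be removed and the flag restored unconditionally.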

  • "until .NET Framework 2.0 Service Pack 1 is released"

    Oooh, is this a hint?  Any word on timeframe?

  • I noticed mention of .NET Framework 2.0 Service Pack 1 on a Microsoft site today. The BCL Team's

  • Questions: What is the tipping point for a large regex pattern? What is the meaning of an un-optimized regex pattern? Does this occur if one does the same but with the static version? Please feel free to post a response in the MSDN .Net Regex Forum http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1212&SiteID=1  

    Thanks.

  • >> What is the tipping point for a large regex pattern?

    Hi OmegaMan,

    The pattern in the blog post is well beyond the tipping point, as it contains 150+ OR operations.  When this expression is compiled (with RegexOptions.Compiled specified), about 40K of MSIL is generated, which is more than enough to demonstrate this issue.  Because of the 64-Bit JIT compilation issue in .NET 2.0 (http://support.microsoft.com/kb/917507), it can take several minutes to compile this expression.  Removing the RegexOptions.Compiled flag will cause the same expression to be constructed in a second.  A regular expression half the size can still experience the performance impact.

    >> What is the meaning of an un-optimized regex pattern?

    An un-optimized expression is a very long expression (like the one in the post, 150+ OR's) that can be simplified into a smaller expression.  The 2nd example in the blog post demonstrates an un-optimized expression that can be simplified into a smaller expression: "aa|ab|ac|ad|ae|af|ag|ah|ai|aj|ak"

    which can be replaced with the smaller pattern: "a[a-k]"

    >> Does this occur if one does the same but with the static version?

    No.  Only when using the RegexOptions.Compiled flag with the Regex(String pattern, RegexOptions options) constructor.  The underlying issue has to do with JIT compilation as documented on http://support.microsoft.com/kb/917507.

    I hope this helps,

    Josh
