Regex hangs with my expression [David Gutierrez]

Regex hangs with my expression [David Gutierrez]

  • Comments 9

Well, actually, it doesn't hang.  It just takes a really really long time, and you haven't waited long enough for it to finish.  One of the pitfalls with regular expression is that you can write expressions which don't perform very well.  In particular, you can end up with expressions whose search time grows exponentially with the length of the search string.  I get bugs reporting that Regex hangs about once a month, and it always turns out to be an exponentially slow expression.  Here's a simplified example of one of them:

([a-z]+)*=

There are two things interesting about this expression.  First, notice that it has two quantifiers nested within each other.  The inner one is the + quantifier for the character class, and the outer one is the *.  Second, it has a character (the equals char) that must be matched at the end of the result.  In English terms, this expression can be explained as

  1. match any character a-z, one or more times
  2. match step #1 zero or more times
  3. match an equals

What will happen is that Regex will breeze through step 1 and 2 only to find that it can't match in step 3.  That forces it to backtrack and try to match the first two steps differently.  The trouble is that there are a lot of different ways that steps 1 and 2 can match, and Regex needs to try every single one before it can determine that the expression does not match the string.  Here's some sample code to demonstrate it (if you're not using Whidbey, you'll need to change from using Stopwatch to DateTime.Now):

using System;
using System.Text;
using System.Text.RegularExpressions;

namespace Repro_RegExMatchHang {
    class Program {
        static void Main(string[] args) {
            Regex slowRegex = new Regex("([a-z]+)*=");
           
            string searchString = @"klhsaflkasdasdasdasdasdadasdasdasdasdasdadad";
            for (int i=1; i<searchString.Length; i++) {
                System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();

                Match m = slowRegex.Match(searchString.Substring(0, i));

                Console.WriteLine(i + "\t" + sw.Elapsed);
            }
        }
    }
}

This code loops over the search search string and passes successively longer strings to Regex.Match.  With each iteration, the match time takes twice as long.   Here's some of the output for a string of length 10 through 14:

10      00:00:00.0010704
11      00:00:00.0021082
12      00:00:00.0040549
13      00:00:00.0082593
14      00:00:00.0161723

 

Page 1 of 1 (9 items)