Why Regex IsMatch() hangs

I have an exepression to validate an email address:

string REGEX_EMAIL = @"^\w+([\.\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\¦\}\~]*\w+)*@\w+([\.\-]\w+)*\.\w+([\.\-]\w+)*$";

If address is correct IsMatch() method quickly shows true result. But if address string is long and wrong this method hangs.

What can I do to raise speed of this method?

Thanks.

Answers


You have a couple things going on which are hurting the performance in this regular expression.

  1. Catastrophic backtracking
  2. Too many optional statements

You can definitely improve performance by using the + instead of the * in a few key places, but this of course changes what the regular expression will and won't match. So the easiest fix I found is actually covered in the catastrophic backtracking article above. You can use the nonbacktracking subexpression to drastically improve performance in this case, without changing the regular expression's behavior in any way that matters.

The nonbacktracking subexpression looks like this... (?>pattern)

So try this regular expression instead:

^\w+(?>[\.\!\#\$\%\&\'\*\+\-\/\=\?\^\`\{\¦\}\~]*\w+)*@\w+([\.\-]\w+)*\.\w+([\.\-]\w+)*$

On a slightly related topic, my philosophy for checking for a valid email address is a bit different. For one, long regular expressions like this one can potentially have performance problems as you've found.

Secondly, there's the upcoming promise of email address internationalization which complicates all of this even more.

Lastly, the main purpose of any regular expression based email validation is to catch typos and blatant attempts to get through your form without entering a real email address. But to check if an email address is genuine requires that you send an email to that address.

So my philosophy is to err on the side of accepting too much. And that, in fact, is a very simple thing to do...

^.+@.+\..+$

This should match any conceivably valid email address, and some invalid ones as well.


You are experiencing catastrophic backtracking.

The simplified regex:

Regex regexObj = new Regex(@"^\w+([-.!#$%&'*+/=?^`{¦}~]*\w+)*@\w+([.-]\w+)*\.\w+([.-]\w+)*$");

Has potential problems e.g. ([.-]\w+)*\.

If the . is missing for example and you have a long string of characters before it then all possible combinations must be taken into account for your regex to find out that it actually fails.


Need Your Help

How to remove a Docker image from the Bluemix Hub?

docker bluemix

In the official docs I did not find any mentioning of how to remove a Docker image from Bluemix.

What parts of a Xamarin solution should be kept out of the repository?

version-control mercurial monodroid xamarin dvcs

I'm working with a designer on a Xamarin.Android application and we are having trouble keeping in sync. We use Mercurial, but my question should be as relevant for Git or any other DVCS. My .hgignore

About UNIX Resources Network

Original, collect and organize Developers related documents, information and materials, contains jQuery, Html, CSS, MySQL, .NET, ASP.NET, SQL, objective-c, iPhone, Ruby on Rails, C, SQL Server, Ruby, Arrays, Regex, ASP.NET MVC, WPF, XML, Ajax, DataBase, and so on.