.NET Regular Expressions in Infinite Cycle

I'm using .NET Regular Expressions to strip HTML code.

Using something like:

<title>(?<Title>[\w\W]+?)</title>[\w\W]+?<div class="article">(?<Text>[\w\W]+?)</div>

This works 99% of the time, but sometimes, when calling:

Regex.IsMatch(HTML, Pattern)

the call just blocks and stays on this line of code for several minutes, or indefinitely.

What's going on?

Answers


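Your regex will work just fine when your HTML string actually contains HTML that fits the pattern. But when your HTML does not fit the pattern, e.g. if the last tag is missing, your regex will exhibit what I call "catastrophic backtracking". When a later part of the pattern fails, the engine backtracks into each earlier lazy [\w\W]+? and lets it expand past the delimiter it stopped at, looking for a later </title> or <div class="article">, so the number of combinations it tries grows rapidly with the size of the page. The "Quickly Matching a Complete HTML File" section of the tutorial on catastrophic backtracking describes your problem exactly. Also, [\w\W]+? is a complicated way of saying .+? with RegexOptions.Singleline.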
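Here is a minimal C# sketch of that kind of fix applied to the pattern from the question. It is an illustration under stated assumptions, not a drop-in solution. Each "lazy run plus delimiter" is wrapped in an atomic group (?>...). Once the engine has found the first </title> (or the first <div class="article">) after a given start, it will not go back and stretch the match to a later one. It also uses . with RegexOptions.Singleline instead of [\w\W], and the Regex constructor overload that takes a matchTimeout (.NET 4.5 and later) as a safety net. The class name ArticleExtractor, the TryExtract helper and the 2-second limit are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ArticleExtractor
{
    // Same structure as the question's pattern, but each lazy run that ends in a
    // delimiter is atomic: after the first </title> / <div class="article"> is found,
    // backtracking cannot extend Title or the gap past it.
    // Note: unlike the original, Title can never span a later </title>.
    private const string AtomicPattern =
        "<title>(?>(?<Title>.+?)</title>)(?>.+?<div class=\"article\">)(?<Text>.+?)</div>";

    private static readonly Regex Article = new Regex(
        AtomicPattern,
        RegexOptions.Singleline,   // '.' also matches '\n', like [\w\W]
        TimeSpan.FromSeconds(2));  // hypothetical limit; tune for your page sizes

    // Hypothetical helper: returns false when the page does not fit the pattern
    // or when the engine gives up after the timeout.
    public static bool TryExtract(string html, out string title, out string text)
    {
        title = text = string.Empty;
        try
        {
            Match m = Article.Match(html);
            if (!m.Success) return false;
            title = m.Groups["Title"].Value;
            text = m.Groups["Text"].Value;
            return true;
        }
        catch (RegexMatchTimeoutException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string ok = "<html><title>Hello</title><body><div class=\"article\">Body\ntext</div></body></html>";
        string broken = "<html><title>Hello</title><body><div class=\"article\">Body text</body></html>"; // no </div>

        Console.WriteLine(TryExtract(ok, out var t, out var x)); // True
        Console.WriteLine(t + " | " + x);
        Console.WriteLine(TryExtract(broken, out _, out _));     // False
    }
}
```

With the atomic groups, a page that is missing the final </div> fails after roughly one scan per <title> occurrence instead of trying every combination of delimiter positions. The timeout only bounds the damage if some other input still turns out to be slow.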

