PCRE: Find matching brace for code block

Is there a way for a PCRE regular expression to count how many occurrences of one character it encounters (n), and to stop searching once it has found n occurrences of another character (specifically { and })?

This is to grab code blocks (which may or may not have code blocks nested inside them).

If it makes things simpler, the input will be a single-line string in which the only characters other than braces are digits, colons and commas. The input must match the following pattern before any attempt is made to extract code blocks:

$regex = '%^(\\d|\\:|\\{|\\}|,)*$%';

All braces will have a matching pair and will be nested correctly.

I would like to know if this can be achieved before I start writing a script to check every character in the string and count each occurrence of a brace. Regular expressions would be much more memory friendly as these strings can be several kilobytes in size!

Thanks, mniz.

Solution

http://stackoverflow.com/questions/2344747/pcre-lazy-and-greedy-at-the-same-time-possessive-quantifiers/2353753#2353753

Answers


PCRE supports recursive patterns, so you can do something like this:

$code_is_valid = preg_match('~^({ ( (?>[^{}]+) | (?1) )* })$~x', '{' . $code .'}');

That said, I don't think this will be faster or use less memory than a simple counter, especially on large strings.

And this is how to find all (valid) code blocks in a string:

preg_match_all('~ { ( (?>[^{}]+) | (?R) )* } ~x', $input, $blocks);
print_r($blocks);
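
As a quick check of the pattern above, here is a small sketch run on a made-up input string (the sample value is only an assumption for illustration, not from the question). It uses the same `preg_match_all` call; `(?R)` re-enters the whole pattern each time a nested block opens.

```php
<?php
// Hypothetical sample input: digits, colons, commas and balanced braces only.
$input = '1:{2,3:{4}},5:{6}';

// Same recursive pattern as above: (?R) recurses into the whole pattern
// whenever a nested '{' is met, so each match ends at its matching '}'.
preg_match_all('~ { ( (?>[^{}]+) | (?R) )* } ~x', $input, $blocks);

// $blocks[0] holds the full text of each outermost block, in order of appearance.
print_r($blocks[0]);
```

For this input, `$blocks[0]` should contain `{2,3:{4}}` and `{6}`. Nested blocks are not listed separately because each match consumes its whole outer block.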

This is exactly what regular expressions are not good for. It's the classic example.

You should just iterate over the string character by character, and keep a count of the nesting level.
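
A minimal sketch of that counting approach, assuming the braces are balanced as the question guarantees. The function name `extract_top_level_blocks` and the sample input are hypothetical, chosen only for illustration.

```php
<?php
/**
 * Hypothetical helper: returns every outermost {...} block in $input
 * by tracking the nesting depth one character at a time.
 * Assumes the braces are balanced, as stated in the question.
 */
function extract_top_level_blocks(string $input): array
{
    $blocks = [];
    $depth  = 0;
    $start  = 0;
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        if ($input[$i] === '{') {
            if ($depth === 0) {
                $start = $i; // remember where an outermost block opens
            }
            $depth++;
        } elseif ($input[$i] === '}') {
            $depth--;
            if ($depth === 0) {
                // back at depth 0: this is the brace matching the opener at $start
                $blocks[] = substr($input, $start, $i - $start + 1);
            }
        }
    }

    return $blocks;
}

print_r(extract_top_level_blocks('1:{2,3:{4}},5:{6}'));
```

This makes a single pass over the string and, apart from the extracted blocks themselves, only keeps two integers of state, so it should cope with inputs of several kilobytes without trouble.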


$regex='%^(\\d|\\:|\\{|\\}|,){0,25}$%';
preg_match($regex,$target,$matches);

Here, the 25 on the first line is the maximum number of occurrences. Then check:

$n=count($matches);

It is impossible with a strictly regular expression, since the language you are describing (balanced, nested braces) is not a regular language.

Use a parser instead.


I created a solution, and have posted it as an answer on my previous question.

Thanks for all your help, mniz.

