Perl Pattern Matching Question

I am trying to match patterns in Perl and need some help.

I need to delete from a string anything of the form [xxxx], i.e. an opening bracket, whatever is inside it, and the first closing bracket after it.

So I am trying to replace the opening bracket, everything inside it, and the first closing bracket with a space, using the following code:

  if($_ =~ /[/)
  {
    print "In here!\n";
    $_ =~ s/[(.*?)]/ /ig;
  }

Similarly, I need to remove anything of the form <xxxx>, i.e. an opening angle bracket, whatever is inside it, and the first closing angle bracket after it.

I am doing that with the following code:

  if($_ =~ /</)
  {
    print "In here!\n";
    $_ =~ s/<(.*?)>/ /ig;
  }

Somehow this does not seem to work. My sample data is below:

 'Joanne' <!--Her name does NOT contain "Kathleen"; see the section "Name"--> "'Jo'" 'Rowling', OBE [http://news bbc co uk/1/hi/uk/793844 stm Caine heads birthday honours list]  BBC News  17 June 2000  Retrieved 25 October 2000  , [http://content scholastic com/browse/contributor jsp?id=3578 JK Rowling Biography]  Scholastic com  Retrieved 20 October 2007  better known as 'J  K  Rowling' ,<ref name=telegraph>[http://www telegraph co uk/news/uknews/1531779/BBCs-secret-guide-to-avoid-tripping-over-your-tongue html Daily Telegraph, BBC's secret guide to avoid tripping over your tongue, 19 October 2006] is a British <!--do not change to "English" or "Scottish" until issue is resolved --> author best known as the creator of the [[Harry Potter]] fantasy series, the idea for which was conceived whilst on a train trip from Manchester to London in 1990  The Potter books have gained worldwide attention, won multiple awards, sold more than 400 million copies and been the basis for a popular series of films, in which Rowling had creative control serving as a producer in two of the seven installments  [http://www businesswire com/news/home/20100920005538/en/Warner-Bros -Pictures-Worldwide-Satellite-Trailer-Debut%C2%A0Harry Business Wire - Warner Bros  Pictures mentions J  K  Rowling as producer ] 

Any help would be appreciated. Thanks!

Answers


$_ =~ /someregex/ only tests $_; it will not modify it.

Just a note: $_ =~ /someregex/ and /someregex/ do the same thing, because a match without an explicit =~ target applies to $_.
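A minimal sketch of both points (the sample string is made up): the explicit and implicit matches give the same answer, neither one changes $_, and only the s/// that follows rewrites it.

```perl
use strict;
use warnings;

# Made-up sample line with one bracketed section.
$_ = 'keep [drop this] keep';
my $before = $_;

# Explicit and implicit forms: both match against $_.
my $explicit = ($_ =~ /\[/) ? 1 : 0;
my $implicit = /\[/ ? 1 : 0;

# Neither match changed $_.
my $unchanged = ($_ eq $before) ? 1 : 0;

# Only the substitution operator rewrites $_.
s/\[.*?\]/ /g;

print "explicit=$explicit implicit=$implicit unchanged=$unchanged\n";
print "after s///: '$_'\n";
```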

Also, you don't need to check for the existence of [ or < first, and you don't need the grouping parentheses:

s/\[.*?\]/ /g;

s/<.*?>/ /g;

will do the job you want.

Edit: changed the code to match the fact that you're modifying $_.
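For example, running the two substitutions on an excerpt of the sample data (cut verbatim from the question; nothing else is assumed) removes both the [link text] and the <!--comment-->. The foreach loop is just a convenient way to alias $_ to a named variable.

```perl
use strict;
use warnings;

# An excerpt of the sample data from the question.
my $text = q{'Joanne' <!--Her name does NOT contain "Kathleen"; see the section "Name"--> "'Jo'" 'Rowling', OBE [http://news bbc co uk/1/hi/uk/793844 stm Caine heads birthday honours list]  BBC News};

for ($text) {       # foreach aliases $_ to $text, so s/// edits $text
    s/\[.*?\]/ /g;  # [ ... first ]  -> one space
    s/<.*?>/ /g;    # < ... first >  -> one space
}

print "$text\n";
```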


To handle nested brackets, you need to use this:

1 while s/\[[^\[\]]*\]/ /g;

Demo:

% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe '1 while s/\[[^\[\]]*\]/NADA/g'
i have NADA in NADA and NADA today.

Versus the failing:

% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe 's/\[.*?\]/NADA/g'
i have NADA brackets] in NADA and NADA today.

The recursive regular expression I leave as an exercise for the reader. :)
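A minimal sketch wrapping that loop in a sub; strip_nested_brackets is a hypothetical name, and the replacement is a space as in the question. The second call uses the [[Harry Potter]] wiki link from the sample data, which the non-greedy \[.*?\] version would reduce to a stray ].

```perl
use strict;
use warnings;

# Hypothetical helper name. Each pass removes only innermost [...] groups,
# because [^\[\]]* cannot cross another bracket; the loop repeats until
# s/// reports no substitutions.
sub strip_nested_brackets {
    my ($s) = @_;
    1 while $s =~ s/\[[^\[\]]*\]/ /g;
    return $s;
}

my $demo   = strip_nested_brackets('i have [some [square] brackets] in [here] and [here] today.');
my $potter = strip_nested_brackets('the creator of the [[Harry Potter]] fantasy series');

print "$demo\n";
print "$potter\n";
```

Because each pass only removes groups that contain no brackets and the replacement adds none, an unclosed [ is simply left in place and the loop still terminates.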


EDIT: Eric Strom kindly provided a recursive solution that doesn't need 1 while:

% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe 's/\[(?:[^\[\]]*|(?R))*\]/NADA/g'
i have NADA in NADA and NADA today.
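A variant of the same recursive pattern, shown as a sketch: the possessive quantifier ++ (available since Perl 5.10, like (?R) itself) keeps a run of non-bracket characters from being re-split during backtracking, which the nested * of the original can otherwise do heavily when a [ is never closed. The unbalanced test string is made up.

```perl
use strict;
use warnings;

# (?R) recurses into the whole pattern, so a nested [...] is matched as one
# unit. [^\[\]]++ is possessive: once it consumes a run of non-bracket
# characters it never gives any back, which limits backtracking.
my $text    = 'i have [some [square] brackets] in [here] and [here] today.';
my $partial = 'a [b [c] d';    # made-up input with an unclosed [

(my $result         = $text)    =~ s/\[(?:[^\[\]]++|(?R))*\]/NADA/g;
(my $partial_result = $partial) =~ s/\[(?:[^\[\]]++|(?R))*\]/NADA/g;

print "$result\n";           # i have NADA in NADA and NADA today.
print "$partial_result\n";   # a [b NADA d  (only the closed [c] is replaced)
```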

  • Square brackets have special meaning in regex syntax, so escape them: /\[.*?\]/. (You also don't need the parentheses here, and case-insensitive matching (/i) is pointless, since the pattern contains no letters.)

  • It's been a long time since I had to wrestle with Perl, but note that testing $_ with a plain match (m//) does not modify $_; only s/// does. You don't need the test anyway: just run the replacement, and if the pattern doesn't match anywhere, it won't do anything.
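The points above can be checked with a short sketch. The unescaped pattern is built at run time from a variable only so that eval can catch the error; written literally as /[/, Perl refuses to compile the whole script.

```perl
use strict;
use warnings;

# 1. An unescaped "[" starts a character class that is never closed.
my $bad = '[';
my $ok  = eval { my $m = ('abc' =~ /$bad/); 1 };
my $err = $@;
my $msg = $ok ? "compiled\n" : "error: $err";
print $msg;

# 2. Escaped (here via quotemeta), the bracket is an ordinary character.
my $literal = quotemeta '[';              # gives '\['
my $found   = ('a[b' =~ /$literal/) ? 1 : 0;
print "literal [ found: $found\n";

# 3. s/// with no match returns a false value and leaves the string as is,
#    so no separate "does it contain [" test is needed first.
my $line  = 'no brackets here';
my $count = ($line =~ s/\[.*?\]/ /g);
print "substitutions: ", ($count || 0), "; line: '$line'\n";
```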

