Matching across a line vs matching words regex

Why is it that when I match across newlines, I can no longer seem to identify individual words? For example:

content = "COAL_STORIES
AUSTRALIA - blah blah blah
BOTSWANA – blah blah blah 

URANIUM_STORIES 
AUSTRALIA – blah
INDIA - blah

COPPER_STORIES
AUSTRALIA - blah blah blah
AUSTRALIA - blah blah blah
CHINA - blah blah blah

ALUMINIUM_STORIES"




sections = content.scan(/\w.*_.*\b/)

Gives an array:

[
    [0] "COAL_STORIES",
    [1] "URANIUM_STORIES",
    [2] "COPPER_STORIES",
    [3] "ALUMINIUM_STORIES"
]

But if I try that using the 'm' flag everything gets matched:

sections = content.scan(/\w.*_.*\b/m) gives an array:

[
    [0] "COAL_STORIES\nAUSTRALIA - blah blah blah\nBOTSWANA – blah blah blah \n\nURANIUM_STORIES \nAUSTRALIA – blah\nINDIA - blah\n\nCOPPER_STORIES\nAUSTRALIA - blah blah blah\nAUSTRALIA - blah blah blah\nCHINA - blah blah blah\n\nALUMINIUM_STORIES"
]

As far as I can tell I'm still looking for the same word boundaries?

Answers


To elaborate on Casimir's comment:

.* is greedy: it will match the longest possible string it can, including newlines if you let it (which you did by enabling multiline matching with the /m flag).

In your first example .* will not match newlines, so \b is forced to match a word boundary on the same line as where \w matched.
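A minimal sketch of that difference, using a made-up two-line string rather than the data from the question (the variable names are only illustrative):

```ruby
# Hypothetical two-line input, chosen only to show how "." treats "\n".
text = "ab\ncd"

# Without /m, "." does not match a newline, so ".*" stops at the end of the line.
without_m = text[/a.*/]   # "ab"

# With /m, "." also matches "\n", so the greedy ".*" runs to the end of the string.
with_m = text[/a.*/m]     # "ab\ncd"

puts without_m.inspect
puts with_m.inspect
```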

In your second example .* will match across lines, so when \w matches your first character, \b is free to match any word boundary, even many lines away, as long as there's an _ somewhere between the two. Specifically, for you, it looks like:

  • \w matched the first character in your input: "C" from "COAL_STORIES"
  • .* matched everything after that, across all the lines, up to and including "ALUMINIUM" on the last line
  • _ matched "_"
  • .* matched "STORIES"
  • \b matched the end of "STORIES"
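The trace above can be checked with capture groups. This is a sketch on a shortened version of the input; the capture groups and variable names are additions for illustration, and the last pattern is just one possible alternative, not something proposed in the answer:

```ruby
# Shortened version of the question's input (an assumption, for brevity).
content = "COAL_STORIES\nAUSTRALIA - blah\n\nALUMINIUM_STORIES"

# Same pattern as in the question, with capture groups added so that
# each piece of the match can be inspected separately.
m = content.match(/(\w)(.*)_(.*)\b/m)

first_char = m[1]  # "C"
middle     = m[2]  # "OAL_STORIES\nAUSTRALIA - blah\n\nALUMINIUM"
tail       = m[3]  # "STORIES"

# One way to get only the headings even with /m: avoid "." entirely,
# because \w never matches a newline.
headings = content.scan(/\b\w+_\w+\b/m)  # ["COAL_STORIES", "ALUMINIUM_STORIES"]

puts first_char.inspect, middle.inspect, tail.inspect, headings.inspect
```

Because the first .* is greedy, it initially consumes the whole string and then backtracks only as far as the last "_", which is why everything collapses into a single match.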
