Matching across a line vs matching words regex

Why is it that when I match across newlines, I can no longer pick out individual words? For example:

content = "COAL_STORIES
AUSTRALIA - blah blah blah
BOTSWANA – blah blah blah 

URANIUM_STORIES 
AUSTRALIA – blah
INDIA - blah

COPPER_STORIES
AUSTRALIA - blah blah blah
AUSTRALIA - blah blah blah
CHINA - blah blah blah

ALUMINIUM_STORIES"

sections = content.scan(/\w.*_.*\b/) gives an array:

    [0] "COAL_STORIES",

But if I add the 'm' flag, everything gets matched as one string.

sections = content.scan(/\w.*_.*\b/m) gives an array:

    [0] "COAL_STORIES\nAUSTRALIA - blah blah blah\nBOTSWANA – blah blah blah \n\nURANIUM_STORIES \nAUSTRALIA – blah\nINDIA - blah\n\nCOPPER_STORIES\nAUSTRALIA - blah blah blah\nAUSTRALIA - blah blah blah\nCHINA - blah blah blah\n\nALUMINIUM_STORIES"

As far as I can tell, I'm still looking for the same word boundaries?


To elaborate on Casimir's comment:

.* is greedy: it matches the longest string it can, including newlines if you let it. You did let it, by adding the /m flag; in Ruby, /m makes . match newlines as well.
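A minimal Ruby sketch of both points, using made-up strings for illustration: without /m, . refuses to match "\n"; with /m, it does. The last line shows greedy .* running to the end of the string and then backing off only as far as the last _.

```ruby
# Without /m, "." does not match "\n", so the pattern fails
p("a\nb" =~ /a.b/)    # => nil
# With /m, "." also matches "\n", so the match starts at index 0
p("a\nb" =~ /a.b/m)   # => 0

# Greedy .* grabs everything, then backtracks just enough to find the LAST "_"
p("x_1 x_2"[/x.*_/])  # => "x_1 x_"
```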

In your first example, .* will not match newlines, so \b is forced to match a word boundary on the same line where \w matched.
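To see the per-line behavior, here is a small Ruby sketch that rebuilds the question's input from its /m output (trailing spaces and en dashes included) and runs the original pattern without /m. Each match stays on a single line, so every header comes back as its own element. The array-of-lines construction is just a convenience for this sketch.

```ruby
# The question's input, rebuilt line by line from its /m output
content = [
  "COAL_STORIES",
  "AUSTRALIA - blah blah blah",
  "BOTSWANA – blah blah blah ",
  "",
  "URANIUM_STORIES ",
  "AUSTRALIA – blah",
  "INDIA - blah",
  "",
  "COPPER_STORIES",
  "AUSTRALIA - blah blah blah",
  "AUSTRALIA - blah blah blah",
  "CHINA - blah blah blah",
  "",
  "ALUMINIUM_STORIES"
].join("\n")

# Without /m, "." cannot cross "\n", so each match is confined to one line
sections = content.scan(/\w.*_.*\b/)
p sections
# => ["COAL_STORIES", "URANIUM_STORIES", "COPPER_STORIES", "ALUMINIUM_STORIES"]
```

"URANIUM_STORIES " loses its trailing space because \b has to sit between a word character and a non-word character, which here is just before the space.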

In your second example, .* will match across lines, so once \w matches your first character, \b is free to match any word boundary, even many lines away, as long as there's an _ somewhere between the two. Because .* is greedy, the engine settles on the last _ in the input and the farthest word boundary after it. Specifically, for you, it looks like:

  • \w matched the first character in your input: "C" from "COAL_STORIES"
  • .* matched everything from "OAL_STORIES" up to "ALUMINIUM" on the last line (that is, up to the last _ in the input)
  • _ matched "_"
  • .* matched "STORIES"
  • \b matched the end of "STORIES"
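The steps above can be checked in Ruby by wrapping each part of the pattern in a capture group. This sketch rebuilds the question's input, confirms that the /m scan yields one match covering everything, and then shows one possible fix that is not from the original answer: \w+_\w+ only allows word characters around the underscore, so a match can never run past a space or newline. The variable names (greedy, md, headers) are illustrative.

```ruby
# The question's input, rebuilt from its /m output
content = [
  "COAL_STORIES",
  "AUSTRALIA - blah blah blah",
  "BOTSWANA – blah blah blah ",
  "",
  "URANIUM_STORIES ",
  "AUSTRALIA – blah",
  "INDIA - blah",
  "",
  "COPPER_STORIES",
  "AUSTRALIA - blah blah blah",
  "AUSTRALIA - blah blah blah",
  "CHINA - blah blah blah",
  "",
  "ALUMINIUM_STORIES"
].join("\n")

# With /m, the greedy pattern produces a single match spanning the whole input
greedy = content.scan(/\w.*_.*\b/m)
p greedy.size              # => 1
p greedy.first == content  # => true

# Same pattern with one capture group per step in the breakdown
md = /(\w)(.*)_(.*)\b/m.match(content)
p md[1]                         # => "C"
p md[2].end_with?("ALUMINIUM")  # => true (the first .* ran up to the last "_")
p md[3]                         # => "STORIES"

# One possible fix: only word characters on either side of "_"
headers = content.scan(/\w+_\w+/)
p headers
# => ["COAL_STORIES", "URANIUM_STORIES", "COPPER_STORIES", "ALUMINIUM_STORIES"]
```

Simply dropping /m also works for this input, as the question's first example shows; \w+_\w+ just states the intent (a word containing an underscore) more directly. Note that \w itself includes _, so a name like A_B_C would be matched whole.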
