RegEx to exclude number using PHP

This question is a continuation of my previous question:

RegEx to exclude academic title

I want to split a paragraph string into an array of sentences, using a regular expression that splits on the dot character (.). My next problem is with numbers.

Here is an example:

In this year 2013. Hello Mr. Andre, your money is Rp 40.000.

The correct output should be:

Array ( [0] => In this year 2013 [1] => Hello Mr. Andre, your money is Rp 40.000 )

The title problem (Mr.) was already solved in my previous question. I've tried adding a regex for numbers, but it still doesn't work.

My non-working code:

$titles_number=array('(^[0-9]*)','(?<!Mr)', '(?<!Mrs)', '(?<!Ms)');
$sentences=preg_split('/('.implode('',$titles_number).')\./',$text);
print_r($sentences);

Can I do this in one go (one regex that handles both problems)? Tell me if it can't be done. Thanks in advance.

Answers


This will be easier to accomplish with preg_match_all():

preg_match_all(
    '/[^\s.][^.]*(?:\.(?:(?<=Prof\.|Dr\.|Mr\.|Mrs\.|Ms\.)|(?=\d))[^.]*)*\./',
    $subject, $result, PREG_PATTERN_ORDER);
print_r($result[0]);

Explanation:

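  • [^\s.] matches the first character of a sentence: any character that is neither whitespace nor a dot (so whitespace between sentences is skipped)
  • [^.]* gobbles up any non-dot characters
  • \. matches a dot inside the sentence, but only IF...
  • (?<=Prof\.|Dr\.|Mr\.|Mrs\.|Ms\.) ...it ends an honorific...
  • (?=\d) ...or it is followed by a digit, i.e. it is part of a number
  • the final \. matches the dot that ends the sentence, so each match includes its closing dot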
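
As a quick check, the pattern can be run against the example paragraph from the question. This is only a sketch: the helper name splitSentences is made up for illustration, and the substr() step is an addition that is not part of the answer (the pattern itself keeps each sentence's closing dot, while the desired output in the question has none). The type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (not from the original answer): wraps the pattern above.
function splitSentences(string $text): array
{
    $pattern = '/[^\s.][^.]*(?:\.(?:(?<=Prof\.|Dr\.|Mr\.|Mrs\.|Ms\.)|(?=\d))[^.]*)*\./';
    preg_match_all($pattern, $text, $result, PREG_PATTERN_ORDER);

    // Every match ends with exactly one closing dot; drop that last character
    // so the result looks like the output requested in the question.
    return array_map(function ($sentence) {
        return substr($sentence, 0, -1);
    }, $result[0]);
}

$text = 'In this year 2013. Hello Mr. Andre, your money is Rp 40.000.';
print_r(splitSentences($text));
```

For the example text this yields two elements, "In this year 2013" and "Hello Mr. Andre, your money is Rp 40.000". A sentence that does not end with a dot (for instance one ending in "?" or a trailing fragment) is not matched at all, so real input may need extra handling.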

Notes:

  1. (?<=Prof\.|Dr\.|Mr\.|Mrs\.|Ms\.) is legal because the alternation is at the top level. That is, it acts like several discrete lookbehinds, each with a fixed length. That's why I had to repeat the \. in every branch instead of using (?<=(?:Prof|Dr|Mr|Mrs|Ms)\.).

  2. \.(?=\d) seems to be sufficient for identifying a dot that's part of a number. If you really have to check for digits before and after the dot, you can use (?=(?<=\d\.)\d) instead.

  3. If this is for anything more serious than a homework problem, you should discard regexes and look for a natural-language processing library. Crude as all this is, it's very close to the limit of what you can do with regexes.
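
To illustrate note 2, here is a small sketch comparing the two number checks. The constant names LOOSE and STRICT, the helper matchSentences, and the sample text are all made up for this illustration. The sample deliberately contains a missing space after a sentence-ending dot ("one.2"), which is the case where the two checks disagree.

```php
<?php
// Pattern from the answer: a dot followed by a digit counts as part of a number.
const LOOSE = '/[^\s.][^.]*(?:\.(?:(?<=Prof\.|Dr\.|Mr\.|Mrs\.|Ms\.)|(?=\d))[^.]*)*\./';

// Variant from note 2: the dot must have a digit on both sides.
const STRICT = '/[^\s.][^.]*(?:\.(?:(?<=Prof\.|Dr\.|Mr\.|Mrs\.|Ms\.)|(?=(?<=\d\.)\d))[^.]*)*\./';

// Hypothetical helper: returns every sentence the given pattern matches
// (closing dots are kept, as in the answer's own code).
function matchSentences(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $result, PREG_PATTERN_ORDER);
    return $result[0];
}

$text = 'See page one.2 items remain. It costs Rp 40.000.';

print_r(matchSentences(LOOSE, $text));  // treats "one.2" as a number, giving 2 sentences
print_r(matchSentences(STRICT, $text)); // splits after "one.", giving 3 sentences
```

Both variants keep "Rp 40.000." in one piece, since that dot has a digit on each side. Whether the stricter check is worth it depends on how often the input has dots followed directly by a digit.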

