PHP regular expression for multiple tables

I need help building a regular expression to separate text. I have some text like:

text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text
<table class="table2">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text

I need a regular expression that separates the text from the tables. Currently I have this regular expression:

preg_match_all( "/(.*)(<table(?s).*?\/table>)(.*)/si", $value[ 'TEXT' ], $matches );

This expression works fine for text like:

text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>

It splits it into

text text text
text text text
<div> text text text </div>

and

<table class="table1">
<tr>
<td>
</td>
</tr>
</table>

But for the text

text text text
text text text
<div> text text text </div>
<table class="table1">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text
<table class="table2">
<tr>
<td>
</td>
</tr>
</table>
text text text
text text text
text text text

my regular expression doesn't work. It returns an array with

[0] =>"text text text
    text text text
    <div> text text text </div>
    <table class="table1">
    <tr>
    <td>
    </td>
    </tr>
    </table>
    text text text
    text text text
    text text text",
[1]=>"<table class="table2">
    <tr>
    <td>
    </td>
    </tr>
    </table>",
[2]=>"text text text
    text text text
    text text text"

How do I build the right regular expression?
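The leading `(.*)` is greedy: with the `s` modifier it first runs to the end of the string and then backtracks only as far as needed for `<table` to match, so it stops at the *last* table and swallows the first one into group 1. Instead of matching around the tables, you can split on them. Below is a minimal sketch using `preg_split()` with `PREG_SPLIT_DELIM_CAPTURE`, which keeps each captured table in the result between the text chunks. `splitTextAndTables()` is a hypothetical helper name, and the sketch assumes tables are never nested (the lazy `.*?` stops at the first `</table>`, so a nested table would be cut short).

```php
<?php
// Hypothetical helper: splits an HTML string into text chunks and whole tables.
// Assumes tables are not nested: the lazy .*? stops at the first </table>.
function splitTextAndTables(string $html): array
{
    // The capturing group makes preg_split() keep each table in the result.
    $parts = preg_split(
        '/(<table\b.*?<\/table>)/si',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // Trim each chunk and drop the ones that were only whitespace.
    $parts = array_map('trim', $parts);
    return array_values(array_filter($parts, fn (string $p): bool => $p !== ''));
}

$input = <<<HTML
text text text
<div> text text text </div>
<table class="table1">
<tr><td></td></tr>
</table>
text text text
<table class="table2">
<tr><td></td></tr>
</table>
text text text
HTML;

print_r(splitTextAndTables($input));
```

The result alternates text and tables in document order, so text before the first table, each table, and the text between and after them all end up as separate array elements.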

Answers


Rather than a regular expression, I'd use PHP's DOM extension. Something along these lines:

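$doc = new DOMDocument;
$doc->loadHTML('html string'); // replace with your HTML

// getElementsByTagName() returns a live list: removing nodes while
// iterating over it skips every other table, so copy it to an array first.
$tables = iterator_to_array($doc->getElementsByTagName('table'));
foreach ($tables as $table) {
    $table->parentNode->removeChild($table);
}

// Merge the text nodes that became adjacent after removing the tables.
$doc->normalizeDocument();

$text = array();
$xpath = new DOMXPath($doc);
$textnodes = $xpath->evaluate('//text()');
foreach ($textnodes as $textnode) {
    $text[] = $textnode->wholeText;
}
print_r($text);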

This code loads your HTML, finds and removes the tables, then collects all the text nodes into an array. Read more about PHP's DOM extension to fine-tune it to your needs.
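If you need the tables as well as the text, in their original order, the same DOM approach can be adapted. The following is a sketch, not the answerer's code: the helper name `splitWithDom()` is hypothetical, and it assumes the tables are direct children of `<body>` once libxml has parsed the fragment (a table nested inside a `<div>` would be flattened into text). Tables are kept as HTML via `saveHTML($node)`; every other node contributes only its `textContent`.

```php
<?php
// Hypothetical helper: returns text chunks and table HTML in document order.
// Assumes tables are direct children of <body> after parsing.
function splitWithDom(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    $parts = [];
    $buffer = '';
    foreach ($body->childNodes as $node) {
        if ($node instanceof DOMElement && $node->nodeName === 'table') {
            // Flush the text gathered before this table, then keep the table as HTML.
            if (trim($buffer) !== '') {
                $parts[] = trim($buffer);
            }
            $buffer = '';
            $parts[] = $doc->saveHTML($node);
        } else {
            // Any other node (text, <p>, <div>, ...) contributes only its text.
            $buffer .= $node->textContent;
        }
    }
    if (trim($buffer) !== '') {
        $parts[] = trim($buffer);
    }
    return $parts;
}

$input = <<<HTML
text text text
<div> text text text </div>
<table class="table1">
<tr><td></td></tr>
</table>
text text text
<table class="table2">
<tr><td></td></tr>
</table>
text text text
HTML;

print_r(splitWithDom($input));
```

Unlike the regex version, this copes with attributes or comments containing `</table>`-like strings, at the cost of losing the markup of non-table elements such as the `<div>`.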

