Why is this regex returning errors when I use it to fish img src values from HTML?

I'm writing a function that fishes out the src from the first image tag it finds in an HTML file. Following the instructions in another thread here, I got something that seemed to be working:

preg_match_all('#<img[^>]*>#i', $content, $match);

foreach ($match as $value) {
    $img = $value[0];
}

$stuff = simplexml_load_string($img);
$stuff = $stuff['src'];
return $stuff;

But after a few minutes of using the function, it started returning errors like this:

warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Premature end of data in tag img line 1 in path/to/script on line 42.

and

warning: simplexml_load_string() [function.simplexml-load-string]: tp://feeds.feedburner.com/~f/ChicagobusinesscomBreakingNews?i=KiStN" border="0"> in path/to/script on line 42.

I'm kind of new to PHP, but it seems like my regex is chopping up the HTML incorrectly. How can I make it more "airtight"?

Answers


These two lines of PHP code should give you a list of all the values of the src attribute in all img tags in an HTML file:

preg_match_all('/<img\s+[^<>]*src=["\']?([^"\'<>\s]+)["\']?/i', $content, $result, PREG_PATTERN_ORDER);
$result = $result[1];

To keep the regex simple, I'm not allowing file names to have spaces in them. If you want to allow this, you need to use separate alternatives for quoted attribute values (which can have spaces), and unquoted attribute values (which can't have spaces).
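
If you do want to allow spaces, here is a rough sketch of the "separate alternatives" idea, wrapped in a function that returns only the first image's src, as the question asks. The function name first_img_src is made up for illustration, and the pattern is one possible way to write it, not a complete HTML parser. It uses a PCRE branch-reset group, (?|...), so that whichever alternative matches, the value lands in group 1.

```php
<?php
// Hypothetical helper (name invented for this example): returns the src of
// the first <img> tag in $content, or null when no img with a src is found.
function first_img_src(string $content): ?string
{
    // Three alternatives for the attribute value:
    //   "..."  double-quoted, may contain spaces
    //   '...'  single-quoted, may contain spaces
    //   bare   unquoted, ends at whitespace, a quote, or an angle bracket
    // The branch-reset group (?|...) numbers each alternative's capture as 1.
    $pattern = '/<img\s+(?:[^<>]*?\s)?src\s*=\s*'
             . '(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'<>]+))/i';

    // preg_match stops at the first match, which is all we need here.
    if (preg_match($pattern, $content, $m) !== 1) {
        return null;
    }

    return $m[1];
}

// Usage: a quoted value containing a space is returned whole.
$html = '<p><img alt="x" src="my photo.jpg" border="0"></p>';
var_dump(first_img_src($html));
```

Because this reads the src directly from the regex match, there is no need to pass the tag through simplexml_load_string(), which expects well-formed XML and will typically complain about an unclosed <img ...> tag.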
