Why is this regex returning errors when I use it to fish img srcs out of HTML?

I'm writing a function that fishes out the src from the first image tag it finds in an HTML file. Following the instructions in a thread here, I got something that seemed to be working:

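preg_match_all('#<img[^>]*>#i', $content, $match);

// $match[0] holds every full <img ...> tag found, so on the first
// iteration $value[0] is the first image tag in the document.
foreach ($match as $value) {
    $img = $value[0];

    $stuff = simplexml_load_string($img);
    $stuff = $stuff['src'];
    return $stuff;
}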

But after a few minutes of using the function, it started returning errors like these:

warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Premature end of data in tag img line 1 in path/to/script on line 42.

warning: simplexml_load_string() [function.simplexml-load-string]: tp://feeds.feedburner.com/~f/ChicagobusinesscomBreakingNews?i=KiStN" border="0"> in path/to/script on line 42.

I'm kind of new to PHP, but it seems like my regex is chopping up the HTML incorrectly. How can I make it more "airtight"?
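A minimal reproduction of the first warning, as a sketch assuming the standard libxml-backed SimpleXML extension: an HTML img tag has no closing tag, so it is not well-formed XML and simplexml_load_string() returns false, while the same tag written with a closing slash parses fine. The tag strings are made up for illustration, and libxml_use_internal_errors(true) is used only so the parser errors are collected instead of printed as warnings.

```php
<?php
// Collect libxml parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$htmlTag = '<img src="photo.jpg" border="0">';   // HTML style: never closed
$xmlTag  = '<img src="photo.jpg" border="0"/>';  // self-closed: well-formed XML

// Fails: the parser reaches the end of the string while still inside <img>.
$bad = simplexml_load_string($htmlTag);
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n"; // the libxml parser message(s)
}
libxml_clear_errors();

// Succeeds: attributes of the root element are read with array syntax.
$good = simplexml_load_string($xmlTag);
echo (string) $good['src'], "\n"; // photo.jpg
```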


These two lines of PHP code should give you a list of all the values of the src attribute in all img tags in an HTML file:

preg_match_all('/<img\s+[^<>]*src=["\']?([^"\'<>\s]+)["\']?/i', $content, $result, PREG_PATTERN_ORDER);
$result = $result[1];
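Since the question only needs the src of the first image, here is a sketch that applies the same pattern with preg_match(), which stops after the first match, instead of preg_match_all(). The helper name first_img_src() and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: returns the src of the first <img> tag, or null.
function first_img_src(string $content): ?string
{
    // Same pattern as above; preg_match() stops at the first match.
    $pattern = '/<img\s+[^<>]*src=["\']?([^"\'<>\s]+)["\']?/i';
    if (preg_match($pattern, $content, $m)) {
        return $m[1]; // capture group 1 is the attribute value
    }
    return null;
}

$html = '<p>Hi</p><img border="0" src="http://example.com/a.gif"> <IMG SRC=b.png>';
echo first_img_src($html), "\n"; // http://example.com/a.gif
```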

To keep the regex simple, I'm not allowing file names to have spaces in them. If you want to allow this, you need to use separate alternatives for quoted attribute values (which can have spaces) and unquoted attribute values (which can't have spaces).
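A sketch of that idea, assuming PCRE's branch-reset group (?|...), which PHP's preg functions support: the double-quoted, single-quoted and unquoted alternatives all capture into group 1, so $result[1] still holds the list of values. The helper name all_img_srcs() and the sample markup are hypothetical, and this is still a regex heuristic rather than an HTML parser.

```php
<?php
// Hypothetical helper: all img src values, allowing spaces inside quotes.
function all_img_srcs(string $content): array
{
    // (?| ... ) resets group numbering in each branch, so whichever
    // alternative matches, its value lands in capture group 1:
    //   "..."  double-quoted, may contain spaces
    //   '...'  single-quoted, may contain spaces
    //   bare   unquoted, no spaces or quotes
    $pattern = '/<img\s+[^<>]*src=(?|"([^"<>]*)"|\'([^\'<>]*)\'|([^"\'<>\s]+))/i';
    preg_match_all($pattern, $content, $result, PREG_PATTERN_ORDER);
    return $result[1];
}

$html = '<img src="my photo.jpg"> <img alt="x" src=\'other pic.png\'> <img src=plain.gif>';
print_r(all_img_srcs($html));
```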

