SimpleXML & html entities = strange characters
I am fetching a feed like this:
$posts = new SimpleXMLElement(WP_ROOT_URL . 'feed/', 0, true);
One of the items in this feed contains an HTML entity, the one for the en dash character: –
However, when SimpleXML returns it, all I get is â€“. I have read similar questions on SO, and some say to make sure the page is set to UTF-8, but I don't see how that would stop SimpleXML from returning the strange characters.
In any case, I do have this on the page where the data is output:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
What can I do here to get the correct entity?
PHP strings have no built-in or managed encoding, so think of them as sequences of bytes rather than characters. The result always contains the bytes 0xE28093 (the UTF-8 encoding of the en dash); only their interpretation changes. You can verify this by calling bin2hex() on the result.
Interpreted as Windows-1252, those bytes display as â€“; interpreted as UTF-8, they display as –.
If you are echoing this on a web page, you can make the browser interpret your output as UTF-8 by sending this header:
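To check the bytes yourself, here is a minimal sketch. It assumes a made-up inline XML fragment (the `<item>`/`<title>` markup and the `&#8211;` numeric entity are placeholders, not taken from the actual feed) so that it runs without a network request:

```php
<?php
// Hypothetical feed fragment standing in for the real WordPress feed.
// &#8211; is the numeric character reference for the en dash (U+2013).
$xml = simplexml_load_string('<item><title>A &#8211; B</title></item>');

// SimpleXML hands text back as UTF-8 encoded bytes.
$title = (string) $xml->title;

echo bin2hex($title), "\n"; // 4120e280932042 (the dash is e2 80 93)
echo strlen($title), "\n";  // 7: strlen() counts bytes, and the dash takes three
```

If the hex dump shows e28093, SimpleXML has done its job and the string is correct; any strange characters come from how the output is later decoded.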
<?php
header("Content-Type: text/html; charset=UTF-8"); // Put this before any output
echo "stuff";