# preg_replace does not work correctly with UTF-8 characters?

I am using this function to replace bad words in phrases. It works well with English letters, but not with UTF-8 characters.

I found that the \b word boundary does not work properly with UTF-8 characters. Is there an alternative way to do this?

I had to add \b because I need to replace the exact word only. For example, I don't want to replace `popo_one` with `p***o`; I only want to replace `popo` with `p***o`. I hope that is clear.

```php
public function wordfilter($phrase) {
    $filter = array('/popo\b/i', '/blabla\b/i');
    $replace = array('p***o', 'b***a');
    $newphrase = preg_replace($filter, $replace, $phrase);
    return $newphrase;
}
```


Any ideas are appreciated.

\b (a word boundary) is the position between a character from the \w character class and either a character outside that class or the limit of the string (beginning or end).
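A small sketch of why this breaks, assuming PHP's default character tables, in which bytes above 0x7F are not in \w (the subject strings are made-up examples). Without the u modifier, "é" is two separate bytes, neither of which is a word character, so \b finds no boundary after "café" and a spurious one inside "popoé".

```php
<?php
// Without u, "é" is the two bytes 0xC3 0xA9, and neither byte is in \w.

// No boundary after "é": the byte 0xA9 and the following space are both non-\w.
var_dump(preg_match('/café\b/i', 'café au lait')); // int(0)

// Spurious boundary inside "popoé": "o" is \w, the byte 0xC3 is not.
var_dump(preg_match('/popo\b/i', 'popoé'));        // int(1)
```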

By default \w contains only [a-zA-Z0-9_], but if you use the u modifier, the \w character class contains all Unicode letters and digits (it becomes equivalent to [\p{L}\p{N}_]). As a result, the meaning of \b changes too.
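Applied to the question's code, a minimal sketch might look like this. It is written as a standalone function rather than a class method, and the leading \b on each pattern is my own assumption (it also rejects matches such as "xpopo"); neither change comes from the original code.

```php
<?php
// Hypothetical standalone version of the question's wordfilter() method.
// The u modifier makes \w, and therefore \b, Unicode-aware.
// The leading \b (not in the original) also stops matches such as "xpopo".
function wordfilter(string $phrase): string
{
    $filter  = ['/\bpopo\b/iu', '/\bblabla\b/iu'];
    $replace = ['p***o', 'b***a'];

    $newphrase = preg_replace($filter, $replace, $phrase);
    // preg_replace() returns null on failure; with u, invalid UTF-8 input is one cause.
    if ($newphrase === null) {
        throw new RuntimeException('preg_replace() failed; is the input valid UTF-8?');
    }
    return $newphrase;
}

echo wordfilter('popo popo_one popoé BlaBla!'), "\n"; // p***o popo_one popoé b***a!
```

Throwing instead of returning the unfiltered phrase is a design choice: for a bad-word filter, silently passing input through on error would defeat its purpose.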

The u modifier has a second effect: the pattern and the subject string are no longer treated as single-byte (ASCII) strings, but as UTF-8 strings.
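A quick check of this second effect, using made-up strings and assuming the PHP source file is saved as UTF-8. Without u, . matches single bytes and a quantifier after "é" applies only to its last byte. With u, both operate on whole characters, and a subject that is not valid UTF-8 makes the call fail.

```php
<?php
// Without u, "." matches single bytes: "é" is two bytes, so "né" gives 3 matches.
var_dump(preg_match_all('/./', 'né'));   // int(3)
// With u, "." matches whole characters: 2 matches.
var_dump(preg_match_all('/./u', 'né'));  // int(2)

// Without u, "+" applies only to the last byte of "é", so "éé" does not match.
var_dump(preg_match('/^é+$/', 'éé'));    // int(0)
// With u, "é+" repeats the whole character.
var_dump(preg_match('/^é+$/u', 'éé'));   // int(1)

// With u, a subject that is not valid UTF-8 makes the call fail.
var_dump(preg_match('/./u', "\xC3"));    // bool(false)
```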

The u modifier is a combination of two directives: (*UCP), which changes the meaning of the shorthand character classes (\w, \d, \s, ...), and (*UTF8), which makes the pattern and the subject string be read as UTF-8. These directives can be placed directly at the very beginning of the pattern instead of using the u modifier. In PCRE2, which PHP uses since version 7.3, the UTF directive is spelled (*UTF).
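A sketch showing that the in-pattern directives behave like the u modifier, assuming PHP 7.3 or later (PCRE2), where the UTF directive is written (*UTF) rather than (*UTF8). The subject string is a made-up example.

```php
<?php
$subject = 'Café au lait';

// Unicode-aware \w and \b via the u modifier.
var_dump(preg_match('/\bcafé\b/iu', $subject));             // int(1)
// Same behaviour with the directives at the very start of the pattern.
var_dump(preg_match('/(*UTF)(*UCP)\bcafé\b/i', $subject));  // int(1)
// Neither: byte-oriented \b, so there is no boundary after "é".
var_dump(preg_match('/\bcafé\b/i', $subject));              // int(0)
```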