Unicode Replacement Characters In The Php Htmlspecialchars Function
Solution 1:
There is only one, universal replacement character: U+FFFD REPLACEMENT CHARACTER. If you are writing out UTF-8, this code point is emitted in its UTF-8 encoding (the bytes EF BF BD). If not, you get the corresponding character reference &#xFFFD; instead.
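The following is a minimal Python sketch of that behaviour. It is an analogue, not PHP itself: escape_with_substitute is a hypothetical helper, and it assumes UTF-8 input. Python's errors="replace" decoding plus html.escape only approximate htmlspecialchars with ENT_SUBSTITUTE (for instance, html.escape writes the single quote as a different entity than PHP does), and only U+FFFD is turned into a character reference when the output is not UTF-8.

```python
import html

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def escape_with_substitute(data: bytes, utf8_output: bool = True) -> str:
    """Hypothetical analogue of htmlspecialchars(..., ENT_SUBSTITUTE).

    Assumes the input is meant to be UTF-8.
    """
    # Each invalid UTF-8 sequence is replaced by U+FFFD while decoding.
    text = data.decode("utf-8", errors="replace")
    # Escape &, <, >, " and ' (roughly what htmlspecialchars does with ENT_QUOTES).
    escaped = html.escape(text, quote=True)
    if utf8_output:
        # UTF-8 output: U+FFFD can be written directly.
        return escaped
    # Non-UTF-8 output: fall back to the character reference. This runs after
    # escaping so the "&" of the reference is not itself escaped.
    return escaped.replace(REPLACEMENT, "&#xFFFD;")


print(escape_with_substitute(b"<caf\xe9>", utf8_output=False))  # &lt;caf&#xFFFD;&gt;
```

Here the lone byte 0xE9 is not followed by the continuation bytes UTF-8 requires, so it becomes a single U+FFFD, which is then written as a character reference.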
There is no reversible mapping. By definition, the original byte sequence was invalid, i.e. it did not encode any value in the source encoding, so there is no original value to map back to.
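A short Python illustration of why the substitution cannot be undone (again using Python's decoder as a stand-in, not PHP's implementation; decode_replace is a hypothetical helper): two different invalid inputs and one valid input that genuinely contains U+FFFD all decode to exactly the same string.

```python
def decode_replace(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for each invalid sequence."""
    return data.decode("utf-8", errors="replace")


inputs = [
    b"x\x80y",          # stray continuation byte (invalid)
    b"x\xffy",          # 0xFF never occurs in UTF-8 (invalid)
    b"x\xef\xbf\xbdy",  # valid UTF-8 encoding of U+FFFD itself
]
decoded = [decode_replace(raw) for raw in inputs]

# All three inputs collapse to the same text ...
assert len(set(decoded)) == 1
# ... so re-encoding yields the valid U+FFFD bytes, not the original input.
print(decoded[0].encode("utf-8"))  # b'x\xef\xbf\xbdy'
```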
What gets replaced are code unit sequences (bytes, not really "characters") that are not valid in the assumed source encoding. For example, if your source encoding were UTF-16 and the input contained a lone surrogate, that would be "invalid" (though technically a text processor is supposed to treat that as an error condition). A simpler example: if the source encoding is ASCII, then any byte value above 127 is invalid.
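That validity depends on the assumed source encoding can be checked with Python's codecs, used here only to illustrate the general principle (PHP's htmlspecialchars accepts its own list of charsets). The same byte 0xE9 is invalid in ASCII but a valid "é" in ISO-8859-1, and an unpaired UTF-16 high surrogate is likewise replaced.

```python
data = b"caf\xe9"

# In ASCII, any byte above 127 (here 0xE9) is invalid and gets replaced.
as_ascii = data.decode("ascii", errors="replace")
# The very same byte is a valid character in ISO-8859-1 (Latin-1).
as_latin1 = data.decode("latin-1")

# UTF-16LE: 0xD800 is a high surrogate. Followed by "A" instead of a
# low surrogate, it is unpaired and therefore invalid.
lone_surrogate = b"\x00\xd8A\x00".decode("utf-16-le", errors="replace")

assert as_ascii == "caf\ufffd"
assert as_latin1 == "caf\u00e9"
assert lone_surrogate == "\ufffdA"
```

Only the unpaired surrogate is replaced; decoding resumes at the next code unit, so the "A" survives.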