
Unicode Replacement Characters In The Php Htmlspecialchars Function

In the htmlspecialchars function, if you set the ENT_SUBSTITUTE flag, it is supposed to replace some invalid characters. What characters are replaced? And what is the mapping betw

Solution 1:

There is only one, universal replacement character: U+FFFD. If you are writing out UTF-8, this code point is emitted in its UTF-8 encoding (the bytes EF BF BD). If not, you get the corresponding character reference &#xFFFD; instead.
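As a minimal sketch of the UTF-8 case (the sample string is made up for illustration), the snippet below feeds htmlspecialchars a byte that can never occur in well-formed UTF-8 and compares the result with and without ENT_SUBSTITUTE. The flags are passed explicitly so the outcome does not depend on the PHP version's default flags.

```php
<?php
// "\xFF" can never appear in well-formed UTF-8, so it is an invalid byte here.
// The sample string is purely illustrative.
$input = "caf\xFF <b>";

// Without ENT_SUBSTITUTE (or ENT_IGNORE), invalid input yields an empty string.
$strict = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the invalid byte is replaced by U+FFFD.
$substituted = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');                              // bool(true)
var_dump($substituted === "caf\u{FFFD} &lt;b&gt;");    // bool(true)
var_dump(bin2hex("\u{FFFD}"));                         // string(6) "efbfbd"
```

The rest of the string is still escaped as usual; only the invalid byte is swapped out.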

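There is no reversible mapping. By definition, the original byte sequence was invalid, i.e. it does not have a value in that encoding (valid = has a value). Every invalid sequence is replaced by the same character, so the original bytes cannot be recovered from the output.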
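To illustrate why nothing can be mapped back, here is a small sketch (inputs chosen arbitrarily) in which two different invalid bytes produce identical output under UTF-8:

```php
<?php
// Two different inputs, each containing a byte that is invalid in UTF-8.
$a = htmlspecialchars("x\xFFy", ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$b = htmlspecialchars("x\xFEy", ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Both invalid bytes collapse to the same replacement character,
// so the output alone cannot tell you which byte was there originally.
var_dump($a === $b);              // bool(true)
var_dump($a === "x\u{FFFD}y");    // bool(true)
```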

The bytes (not really "characters") that get replaced are those that are not valid in the assumed source encoding. For example, if your source encoding were UTF-16 and you had a lone surrogate, that would be "invalid" (and a strict text processor may reject such input outright). As a simpler example, if the source encoding is ASCII, then any byte value above 127 is invalid.
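If you want to know in advance whether a string contains anything that would be replaced, one option for UTF-8 input is to validate it first. The helper below is hypothetical (is_valid_utf8 is not a built-in function); it relies on the fact that preg_match with the u modifier fails on malformed UTF-8 subjects. Treat it as a sketch, not as the check htmlspecialchars performs internally.

```php
<?php
// Hypothetical helper: returns true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false for malformed subjects,
// while the empty pattern matches any valid string (returning 1).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(is_valid_utf8("caf\xC3\xA9"));  // bool(true): C3 A9 encodes U+00E9
var_dump(is_valid_utf8("caf\xFF"));      // bool(false): FF never occurs in UTF-8
var_dump(is_valid_utf8("caf\xC3"));      // bool(false): lead byte without continuation
```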
