PHP – How to get the substitution character using mb_substitute_character()?

In PHP, we can use the function mb_substitute_character() to get or set the substitution character. The substitution character is what the mbstring functions output in place of a character when the input encoding is invalid or the character code does not exist in the output character encoding.

Note: Invalid characters can be removed from the output ("none"), replaced with a formatted string ("long" or "entity"), or replaced with a specific character given as an integer (Unicode character code value).

Syntax

string|int|bool mb_substitute_character(string|int|null $char = null)

Parameters

This function accepts a single optional parameter, $char.

  • $char − A Unicode code point given as an integer, or one of the strings given below:

    • "none" − Invalid characters are dropped, so nothing is output in their place.

    • "long" − The character code value is output in place of the invalid character. For example, "U+3000" or "JIS+7E7E".

    • "entity" − The character entity is output in place of the invalid character. For example, "&#x200;".
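
To make the options above concrete, here is a minimal sketch that converts one string to ASCII under each setting. It assumes the mbstring extension is loaded and PHP 7.0 or later (for the "\u{20AC}" escape). The sample text and variable names are illustrative only.

```php
<?php
    // "\u{20AC}" is the euro sign, which has no ASCII equivalent
    $text = "Price: 5\u{20AC}";

    // Remember the current setting so it can be restored afterwards
    $previous = mb_substitute_character();

    $results = [];
    foreach ([0x003F, "none", "long", "entity"] as $mode) {
        // Change the substitution setting, then convert UTF-8 to ASCII
        mb_substitute_character($mode);
        $results[(string) $mode] = mb_convert_encoding($text, "ASCII", "UTF-8");
    }

    // Put the original setting back
    mb_substitute_character($previous);

    foreach ($results as $mode => $converted) {
        echo $mode . " => " . $converted . PHP_EOL;
    }
?>
```

With a current PHP version this should print "Price: 5?" for 63, "Price: 5" for none, "Price: 5U+20AC" for long, and "Price: 5&#x20AC;" for entity. The exact text produced for bytes that are invalid in the input encoding (rather than merely unrepresentable in the output) can differ between PHP versions, so that case is left out here.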

Return Value

If $char is passed, the function returns true on success or false on failure. If $char is omitted, it returns the current setting: an integer Unicode code point, or one of the strings "none", "long", or "entity".

Note: As of PHP 8.0, passing an empty string is no longer supported. Use "none" instead.
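
The two kinds of return value can be checked with var_dump(). The following is a small sketch, assuming the mbstring extension is loaded. The variable names are illustrative.

```php
<?php
    // No argument: returns the current setting (an int or a mode string)
    $before = mb_substitute_character();

    // With an argument: returns true when the new setting is accepted
    $setOk = mb_substitute_character(0x3013);
    var_dump($setOk);     // bool(true)

    // Reading it back gives the code point as a decimal integer
    $current = mb_substitute_character();
    var_dump($current);   // int(12307), which is 0x3013

    // Restore whatever was configured before
    mb_substitute_character($before);
?>
```

Saving the old value first and restoring it afterwards keeps the change from leaking into unrelated code, since the setting applies to the whole script.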

Example

Here's how to use the function to set and retrieve the substitution character:

<?php
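    // Set the substitution character to U+3013 (GETA MARK)
    mb_substitute_character(0x3013);

    // Switch to "long" mode, which outputs the character code instead;
    // this replaces the U+3013 setting made above
    mb_substitute_character("long");

    // Display the current setting
    echo mb_substitute_character();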
?>
long

Different Substitution Types

Let's see how different substitution types work:

<?php
    // Set to "none" - no output for invalid characters
    mb_substitute_character("none");
    echo "Current setting: " . mb_substitute_character() . "<br>";

    // Set to "entity" - HTML entity for invalid characters
    mb_substitute_character("entity");
    echo "Current setting: " . mb_substitute_character() . "<br>";

    // Set to a specific Unicode character (question mark)
    mb_substitute_character(0x003F);
    echo "Current setting: " . mb_substitute_character() . "<br>";
?>
Current setting: none
Current setting: entity
Current setting: 63

Conclusion

The mb_substitute_character() function controls how mbstring handles characters that are invalid in the input or cannot be represented in the output encoding. Use "none" to drop them, "long" or "entity" to output their character codes, or an integer Unicode value to substitute a specific character.

Updated on: 2026-03-15T09:58:18+05:30
