PHP – How to detect character encoding using mb_detect_encoding()

In PHP, mb_detect_encoding() detects the character encoding of a string by testing it against an ordered list of candidate encodings. The function is available in PHP 4.0.6 and later.

mb_detect_encoding() is most useful with multibyte encodings, where not every sequence of bytes forms a valid string. If the input contains a byte sequence that is invalid in a candidate encoding, that candidate is rejected and the next one in the list is checked.

Syntax

mb_detect_encoding(string $string, array|string|null $encodings = null, bool $strict = false): string|false

Automatic detection of character encoding is not entirely reliable without additional information. It is similar to decoding an encrypted string without the key. Whenever possible, rely on an explicit indication of the encoding that is stored or transmitted with the data, such as a Content-Type HTTP header.
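The idea of preferring a declared encoding over detection can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names charsetFromContentType() and resolveEncoding(), the sample header value, and the candidate list are all hypothetical.

```php
<?php
// Hypothetical helper: extract the charset parameter from a Content-Type
// header value, or return null when no charset is declared.
function charsetFromContentType(string $header): ?string
{
    if (preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $header, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

// Hypothetical helper: use the declared charset when there is one,
// otherwise fall back to strict detection over an assumed candidate list.
function resolveEncoding(string $body, string $contentType)
{
    $declared = charsetFromContentType($contentType);
    if ($declared !== null) {
        return $declared;
    }
    return mb_detect_encoding($body, ['UTF-8', 'ISO-8859-1'], true);
}

$body = "caf\xE9"; // "café" encoded in ISO-8859-1
echo resolveEncoding($body, 'text/html; charset=iso-8859-1'), PHP_EOL;
echo resolveEncoding($body, 'text/html'), PHP_EOL;
```

In this sketch the header value is trusted as-is; real code would probably also confirm that the declared name is an encoding mbstring supports before using it.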

Parameters

The mb_detect_encoding function accepts three parameters −

  • $string − The string being examined.

  • $encodings − A list of character encodings to try, in order. The list may be given as an array of strings or as a single comma-separated string. If this parameter is omitted or null, the current detect_order is used, as set by the mbstring.detect_order configuration option or the mb_detect_order() function.

  • $strict − Controls the behavior when the string is not valid in any of the listed encodings. If strict is false, the closest matching encoding is returned; if strict is true, false is returned.
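As a small sketch of the parameters described above, the snippet below prints the detection order that applies when the encoding list is omitted, then passes the same candidate list once as a comma-separated string and once as an array. The sample text and variable names are assumptions chosen for illustration.

```php
<?php
// Inspect the detection order that applies when the encoding list
// is omitted or null. The result depends on the local configuration.
echo implode(', ', mb_detect_order()), PHP_EOL;

// The same candidate list, written as a comma-separated string and as an array.
$text = 'plain ASCII text';
$fromString = mb_detect_encoding($text, 'ASCII, UTF-8', true);
$fromArray  = mb_detect_encoding($text, ['ASCII', 'UTF-8'], true);

// Both forms describe the same candidates, so the results should agree.
var_dump($fromString === $fromArray);
```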

Return Values

The function returns the detected character encoding as a string, or false if the string is not valid in any of the listed encodings.
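Because the return value is either a string or false, callers usually need a strict comparison before using the result. The following is a minimal sketch of that pattern; the helper name toUtf8OrNull() and the candidate lists are hypothetical, and strict mode is assumed so that an unmatched input yields false.

```php
<?php
// Hypothetical helper: convert $input to UTF-8, or return null when its
// encoding cannot be determined from the candidate list.
function toUtf8OrNull(string $input, array $candidates): ?string
{
    $encoding = mb_detect_encoding($input, $candidates, true);

    // Compare with === so that false is not confused with a string result.
    if ($encoding === false) {
        return null;
    }
    return mb_convert_encoding($input, 'UTF-8', $encoding);
}

$latin1 = "\xE1\xE9\xF3\xFA"; // 'áéóú' encoded in ISO-8859-1

// Neither ASCII nor UTF-8 accepts these bytes, so detection fails.
var_dump(toUtf8OrNull($latin1, ['ASCII', 'UTF-8']));

// With ISO-8859-1 in the list, the bytes are converted to UTF-8.
echo bin2hex(toUtf8OrNull($latin1, ['UTF-8', 'ISO-8859-1'])), PHP_EOL;
```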

Example 1

Using mb_detect_encoding() without the strict parameter

<?php
   $string="";
   // It detect char encoding with current detect_order
   echo mb_detect_encoding($string);

   // auto is expanded according to mbstring.language
   echo mb_detect_encoding($string, "auto");

   // Specify encodings
   echo mb_detect_encoding($string, "JIS, eucjp-win, sjis-win");

   // Use array to specify "encodings" parameter
   $array_encoding = [
      "ASCII",
      "JIS",
      "EUC-JP"
   ];
   echo mb_detect_encoding($string, $array_encoding);
?>

Output

ASCIIASCIIJISASCII

Example 2

Using mb_detect_encoding() with the strict parameter

<?php
   // 'áéóú' encoded in ISO-8859-1
   $string = "\xE1\xE9\xF3\xFA";

   // UTF-8 is considered a closer match
   var_dump(mb_detect_encoding($string, ['ASCII', 'UTF-8'], false));
   var_dump(mb_detect_encoding($string, ['ASCII', 'UTF-8'], true));

   // The strict parameter does not change the result if a valid encoding is found
   var_dump(mb_detect_encoding($string, ['ASCII', 'UTF-8', 'ISO-8859-1'], false));
   var_dump(mb_detect_encoding($string, ['ASCII', 'UTF-8', 'ISO-8859-1'], true));
?>

Output

string(5) "UTF-8"
bool(false)
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
raja
Published on 11-Oct-2021 11:58:37
