Why does the PHP strlen() function give the wrong length for Unicode characters?
The PHP strlen() function counts bytes, not characters, which causes incorrect results with Unicode characters that use multiple bytes. For accurate character counting with Unicode strings, use mb_strlen() instead.
Why strlen() Fails with Unicode
Accented letters such as é, ñ, and í take 2 bytes each in UTF-8, and other Unicode characters take up to 4. The strlen() function counts these bytes rather than the visible characters:
<?php
// 'í' occupies two bytes in UTF-8, so the byte count exceeds the character count.
$unicodeString = 'JohnSmíth';
echo "String: " . $unicodeString . "\n";
echo "strlen() result: " . strlen($unicodeString) . "\n";
echo "mb_strlen() result: " . mb_strlen($unicodeString, 'UTF-8') . "\n";
?>
Output:
String: JohnSmíth
strlen() result: 10
mb_strlen() result: 9
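To see where the extra bytes come from, the sketch below splits a string into characters and prints the UTF-8 byte length of each one. It assumes PHP 7.4 or later with the mbstring extension (needed for mb_str_split()). The sample string and the helper name byteLengths() are made up for illustration.

```php
<?php
// Hypothetical helper: pairs each character of a UTF-8 string with its byte length.
function byteLengths(string $s): array
{
    $result = [];
    // mb_str_split() splits by character rather than by byte (PHP 7.4+).
    foreach (mb_str_split($s, 1, 'UTF-8') as $char) {
        $result[] = [$char, strlen($char)];
    }
    return $result;
}

// Unicode escapes keep the sample independent of the source file's encoding:
// U+0061 (a), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (emoji).
$sample = "a\u{E9}\u{20AC}\u{1F600}";

foreach (byteLengths($sample) as [$char, $bytes]) {
    echo $char . " => " . $bytes . " byte(s)\n";
}

echo "strlen(): " . strlen($sample) . "\n";                 // 1 + 2 + 3 + 4 = 10
echo "mb_strlen(): " . mb_strlen($sample, 'UTF-8') . "\n";  // 4
```

The four characters take 1, 2, 3, and 4 bytes respectively, which is why strlen() reports 10 for a string that a reader would count as 4 characters long.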
Comparison of Methods
| Function | What It Counts | Unicode Support | Result for 'JohnSmíth' |
|---|---|---|---|
| strlen() | Bytes | No | 10 |
| mb_strlen() | Characters | Yes | 9 |
Best Practice Example
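Always pass the encoding argument to mb_strlen() so the result does not depend on configuration. When it is omitted, mb_strlen() uses the internal encoding (see mb_internal_encoding()), which may differ between servers: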
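<?php
$text = 'Café résumé';
// Byte count: each 'é' takes two bytes in UTF-8
echo "strlen(): " . strlen($text) . "\n";
// Character count with an explicit encoding
echo "mb_strlen(): " . mb_strlen($text, 'UTF-8') . "\n";
// Without the encoding argument, mb_strlen() uses the internal encoding
// (UTF-8 by default in current PHP versions); it does not auto-detect
echo "mb_strlen() (default encoding): " . mb_strlen($text) . "\n";
?>
Output:
strlen(): 14
mb_strlen(): 11
mb_strlen() (default encoding): 11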
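The encoding argument matters most when a string is not UTF-8. As a rough illustration, the sketch below converts the same text to ISO-8859-1 (an encoding picked here only as an example), where each 'é' is a single byte, and counts it again. It assumes the mbstring extension is loaded.

```php
<?php
// "Café résumé", written with escapes so the bytes are UTF-8
// regardless of how this file is saved.
$utf8 = "Caf\u{E9} r\u{E9}sum\u{E9}";

// Re-encode as ISO-8859-1, where 'é' is the single byte 0xE9.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo "UTF-8 bytes: " . strlen($utf8) . "\n";                               // 14
echo "UTF-8 characters: " . mb_strlen($utf8, 'UTF-8') . "\n";              // 11
echo "ISO-8859-1 bytes: " . strlen($latin1) . "\n";                        // 11
echo "ISO-8859-1 characters: " . mb_strlen($latin1, 'ISO-8859-1') . "\n";  // 11

// Validating first helps avoid counting a string under the wrong encoding:
// 0xE9 followed by an ASCII byte is not a valid UTF-8 sequence.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
```

The character count stays at 11 in both encodings as long as mb_strlen() is told which encoding the bytes are actually in. Only the byte count changes.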
Conclusion
Use mb_strlen() with UTF-8 encoding for accurate character counting in Unicode strings. The strlen() function should only be used when you specifically need byte count rather than character count.
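As one possible illustration of when each count is the right one, the sketch below checks a string against a byte limit (such as a fixed-size storage field) and a character limit (such as a display length). The function names and the limit of 12 are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical check for a limit measured in bytes (e.g. storage size).
function fitsByteLimit(string $s, int $maxBytes): bool
{
    return strlen($s) <= $maxBytes;
}

// Hypothetical check for a limit measured in characters (e.g. display length).
function fitsCharLimit(string $s, int $maxChars): bool
{
    return mb_strlen($s, 'UTF-8') <= $maxChars;
}

// "Café résumé" as UTF-8: 11 characters, 14 bytes.
$text = "Caf\u{E9} r\u{E9}sum\u{E9}";

var_dump(fitsByteLimit($text, 12)); // bool(false): 14 bytes exceed 12
var_dump(fitsCharLimit($text, 12)); // bool(true): 11 characters fit in 12
```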
