How to deal with multi-byte UTF-8 strings in PHP and fix the empty delimiter/separator issue
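In PHP, explode() rejects an empty delimiter, and str_split() splits by byte, which breaks multi-byte UTF-8 characters apart. Using preg_split() with the '//u' pattern and the PREG_SPLIT_NO_EMPTY flag avoids both problems: it splits on the empty pattern and returns each UTF-8 character as a single element.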
Syntax
preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY)
Parameters
The key components of this approach −
- '//u' − Empty pattern with the Unicode (u) modifier for UTF-8 support
- $string − The input string to split
- -1 − No limit on the number of splits
- PREG_SPLIT_NO_EMPTY − Removes empty elements from the result
Example
Here's how to split UTF-8 strings into individual characters −
<?php
// Empty string test
$stringValues = "";
$result = preg_split('//u', $stringValues, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
echo "<br>";
// Regular ASCII string
$stringValues1 = "John Smith";
$result1 = preg_split('//u', $stringValues1, -1, PREG_SPLIT_NO_EMPTY);
print_r($result1);
echo "<br>";
// UTF-8 multi-byte characters
$stringValues2 = "Héllo Wörld";
$result2 = preg_split('//u', $stringValues2, -1, PREG_SPLIT_NO_EMPTY);
print_r($result2);
?>
Output
Array ( )
Array ( [0] => J [1] => o [2] => h [3] => n [4] => [5] => S [6] => m [7] => i [8] => t [9] => h )
Array ( [0] => H [1] => é [2] => l [3] => l [4] => o [5] => [6] => W [7] => ö [8] => r [9] => l [10] => d )
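The example above assumes the input is valid UTF-8. With the u modifier, preg_split() returns false instead of an array when the subject is not valid UTF-8, so a caller may want to check for that. The following is a minimal sketch of one way to do this; the function name splitUtf8Chars() is hypothetical and not part of PHP or the original example.

```php
<?php
// Hypothetical helper (an assumption for illustration): wraps the
// preg_split() call and converts its failure value into an exception.
function splitUtf8Chars(string $string): array
{
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    if ($chars === false) {
        // preg_split() with the u modifier fails on invalid UTF-8 input.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $chars;
}

// "\u{F6}" is the escape for "ö", so this string is "Wörld"
// regardless of how the source file is encoded.
$chars = splitUtf8Chars("W\u{F6}rld");
$count = count($chars); // 5 characters, although the string is 6 bytes long
```

Throwing an exception is only one design choice; returning an empty array or null may suit other code bases better.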
How It Works
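The empty pattern // matches at every position in the string: before the first character, between each pair of characters, and after the last one. The u modifier makes the regex engine treat the pattern and the subject as UTF-8, so those positions fall between whole characters (code points) rather than between individual bytes. The matches at the start and end of the string would otherwise produce empty strings as the first and last array elements; the PREG_SPLIT_NO_EMPTY flag drops them. For an empty input string, the result is an empty array.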
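To see what each part contributes, the short sketch below runs the same split three ways on a two-character string. The variable names are illustrative only.

```php
<?php
// "H\u{E9}" is "Hé": two characters, three bytes in UTF-8.
$text = "H\u{E9}";

// No flag: the empty pattern also matches at the very start and end,
// so the result begins and ends with an empty string.
$withEmpty = preg_split('//u', $text);
// ["", "H", "é", ""]

// With PREG_SPLIT_NO_EMPTY the empty strings are dropped.
$charsOnly = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
// ["H", "é"]

// Without the u modifier the split happens between bytes,
// so "é" is broken into its two UTF-8 bytes.
$bytes = preg_split('//', $text, -1, PREG_SPLIT_NO_EMPTY);
// 3 elements: "H", "\xC3", "\xA9"
```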
Conclusion
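Using preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) is a reliable way to split UTF-8 strings into individual characters in PHP without running into empty delimiter errors or empty array elements. On PHP 7.4 and later, the mbstring function mb_str_split() is a built-in alternative for the same task.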
