How to deal with multi-byte UTF-8 strings in PHP and fix the empty delimiter/separator issue

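In PHP, splitting a multi-byte UTF-8 string into individual characters is not as simple as it looks. explode() rejects an empty delimiter (PHP 8 throws a ValueError, and older versions emit an "Empty delimiter" warning and return false), and str_split() works on bytes, so it cuts multi-byte characters such as "é" apart. Calling preg_split() with the '//u' pattern and the PREG_SPLIT_NO_EMPTY flag avoids both problems and splits the string into whole UTF-8 characters.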
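A quick way to see why a dedicated approach is needed is to compare it with the built-in alternatives: str_split() counts bytes, and explode() refuses an empty delimiter. The sketch below assumes PHP 8 (for the ValueError thrown by explode()) and writes "é" as the escape \u{00E9} so the result does not depend on how the source file is saved; the variable names are only illustrative.

```php
<?php
// "Héllo" with "é" written as the precomposed U+00E9,
// which UTF-8 encodes as two bytes (0xC3 0xA9)
$word = "H\u{00E9}llo";

// str_split() works on bytes, so the two bytes of "é" land in separate elements
$bytes = str_split($word);

// preg_split() with the u modifier splits on UTF-8 character boundaries
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), PHP_EOL; // 6
echo count($chars), PHP_EOL; // 5

// explode() rejects an empty delimiter; in PHP 8 this throws a ValueError
try {
    explode('', $word);
} catch (ValueError $e) {
    echo 'explode() failed: ', $e->getMessage(), PHP_EOL;
}
```

The byte count is one higher than the character count because "é" occupies two bytes, and str_split() has no notion of UTF-8.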

Syntax

preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY)

Parameters

The key components of this approach −

'//u' − Empty pattern with Unicode modifier for UTF-8 support
$string − The input string to split
-1 − No limit on number of splits
PREG_SPLIT_NO_EMPTY − Removes empty elements from result
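To see what PREG_SPLIT_NO_EMPTY actually removes, the sketch below splits the same short string with and without the flag. The string "añb" (with "ñ" written as an escape) and the variable names are only illustrative.

```php
<?php
$text = "a\u{00F1}b"; // "añb"; "ñ" is two bytes in UTF-8

// Without flags, the empty pattern also matches at the very start and the
// very end of the string, so the result begins and ends with an empty string
$raw = preg_split('//u', $text);
print_r($raw);   // $raw holds '', 'a', 'ñ', 'b', ''

// With PREG_SPLIT_NO_EMPTY only the non-empty pieces are kept
$clean = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($clean); // $clean holds 'a', 'ñ', 'b'
```

The limit of -1 is passed only so that the flags argument can be given; it places no restriction on the number of pieces.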

Example

Here's how to split UTF-8 strings into individual characters −

<?php
// Empty string test
$stringValues = "";
$result = preg_split('//u', $stringValues, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
echo "<br>";

// Regular ASCII string
$stringValues1 = "John Smith";
$result1 = preg_split('//u', $stringValues1, -1, PREG_SPLIT_NO_EMPTY);
print_r($result1);
echo "<br>";

// UTF-8 multi-byte characters
$stringValues2 = "Héllo Wörld";
$result2 = preg_split('//u', $stringValues2, -1, PREG_SPLIT_NO_EMPTY);
print_r($result2);
?>

Output

Array ( )
Array ( [0] => J [1] => o [2] => h [3] => n [4] =>   [5] => S [6] => m [7] => i [8] => t [9] => h )
Array ( [0] => H [1] => é [2] => l [3] => l [4] => o [5] =>   [6] => W [7] => ö [8] => r [9] => l [10] => d )

How It Works

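The empty pattern // matches at every position in the string, so preg_split() cuts the string between every pair of adjacent characters. Unlike explode(), preg_split() accepts this empty pattern, which is how the approach sidesteps the empty delimiter issue. The u modifier makes PCRE treat the pattern and the subject as UTF-8, so the cut points fall between whole characters (Unicode code points) rather than between bytes, and a two-byte character such as "é" stays in one piece. Because the empty pattern also matches at the very start and the very end of the string, the raw result would begin and end with an empty string; the PREG_SPLIT_NO_EMPTY flag drops those empty pieces, so an empty input yields an empty array, as the first example shows. Two limits are worth knowing: the split is per code point, so a letter written as a base letter followed by a combining accent comes back as two elements, and if the input is not valid UTF-8, preg_split() returns false instead of an array.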
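If the same split is needed in several places, it can be wrapped in a small function that also handles the false return for invalid UTF-8. This is one possible sketch, assuming PHP 8 (it calls preg_last_error_msg(), added in PHP 8.0); the name utf8_split() is hypothetical and not a PHP built-in.

```php
<?php
/**
 * Split a UTF-8 string into an array of characters (Unicode code points).
 * utf8_split() is a hypothetical helper name, not a PHP built-in.
 *
 * @throws InvalidArgumentException if $text is not valid UTF-8
 */
function utf8_split(string $text): array
{
    $parts = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // With the u modifier, malformed UTF-8 makes preg_split() return false
    if ($parts === false) {
        throw new InvalidArgumentException(
            'Cannot split string: ' . preg_last_error_msg()
        );
    }

    return $parts;
}

print_r(utf8_split("W\u{00F6}rld")); // W, ö, r, l, d

try {
    utf8_split("\xC3"); // a lone lead byte, which is not valid UTF-8
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

Throwing an exception is a design choice here; returning an empty array instead would silently hide badly encoded input.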

Conclusion

Using preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) is a reliable way to split a UTF-8 string into individual characters in PHP without running into the empty delimiter problem, and it works even where the mbstring function mb_str_split() (PHP 7.4+) is not available.

AmitDiwan

Updated on: 2026-03-15T09:35:14+05:30