How to deal with multi-byte UTF-8 strings in PHP and fix the empty delimiter/separator issue

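In PHP, byte-oriented functions such as str_split() cut multi-byte UTF-8 characters into separate bytes, and explode() rejects an empty delimiter. Calling preg_split() with the empty pattern '//u' and the PREG_SPLIT_NO_EMPTY flag avoids both problems and splits a UTF-8 string into its individual characters.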
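To make the empty delimiter problem concrete, here is a minimal sketch. It assumes PHP 8.0 or later, where explode() throws a ValueError for an empty separator (older versions emit a warning and return false instead). The sample string is written with a "\u{...}" escape so the result does not depend on the file's encoding −

```php
<?php
// Assumption: PHP 8.0+, where an empty separator throws a ValueError.
$sample = "H\u{E9}llo"; // "Héllo"

$caught = false;
try {
    explode('', $sample);
} catch (ValueError $e) {
    $caught = true;
    echo 'explode() failed: ', $e->getMessage(), PHP_EOL;
}

// preg_split() accepts an empty pattern, so the same split succeeds.
$chars = preg_split('//u', $sample, -1, PREG_SPLIT_NO_EMPTY);
echo implode(' | ', $chars), PHP_EOL; // prints "H | é | l | l | o"
```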

Syntax

preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY)

Parameters

The key components of this approach −

  • '//u' − Empty pattern with Unicode modifier for UTF-8 support
  • $string − The input string to split
  • -1 − No limit on number of splits
  • PREG_SPLIT_NO_EMPTY − Removes empty elements from result
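The effect of the PREG_SPLIT_NO_EMPTY flag is easiest to see by running the same split with and without it. The following is a small sketch; the variable names are illustrative only −

```php
<?php
$string = "W\u{F6}rld"; // "Wörld", 5 characters

// Without the flag, the empty pattern also matches at the very start and
// the very end of the string, leaving an empty string at both ends.
$withEmpty = preg_split('//u', $string);

// With the flag, only the characters themselves remain.
$withoutEmpty = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

echo count($withEmpty), PHP_EOL;    // prints 7
echo count($withoutEmpty), PHP_EOL; // prints 5
```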

Example

Here's how to split UTF-8 strings into individual characters −

<?php
// Empty string test
$stringValues = "";
$result = preg_split('//u', $stringValues, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
echo "<br>";

// Regular ASCII string
$stringValues1 = "John Smith";
$result1 = preg_split('//u', $stringValues1, -1, PREG_SPLIT_NO_EMPTY);
print_r($result1);
echo "<br>";

// UTF-8 multi-byte characters
$stringValues2 = "Héllo Wörld";
$result2 = preg_split('//u', $stringValues2, -1, PREG_SPLIT_NO_EMPTY);
print_r($result2);
?>

Output

Viewed in a browser, the script prints −

Array ( )
Array ( [0] => J [1] => o [2] => h [3] => n [4] =>   [5] => S [6] => m [7] => i [8] => t [9] => h )
Array ( [0] => H [1] => é [2] => l [3] => l [4] => o [5] =>   [6] => W [7] => ö [8] => r [9] => l [10] => d )
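Since every call in the example passes the same arguments, the split could be wrapped in a small helper. The function below is a hypothetical sketch (splitUtf8Chars is not a built-in PHP function). It also covers one case the example does not show: with the u modifier, preg_split() returns false when the input is not valid UTF-8 −

```php
<?php
// Hypothetical helper, not part of PHP: returns the UTF-8 characters
// (code points) of $string as an array.
function splitUtf8Chars(string $string): array
{
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

    // With the u modifier, invalid UTF-8 input makes preg_split() return false.
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return $chars;
}

print_r(splitUtf8Chars("W\u{F6}rld")); // W, ö, r, l, d
print_r(splitUtf8Chars(''));           // empty array
```

Throwing an exception is only one possible design choice; returning an empty array or null would work as well, depending on how the caller wants to treat malformed input.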

How It Works

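The empty pattern in '//u' matches at every position in the string, so preg_split() cuts the string between each pair of adjacent characters. The u modifier makes the regex engine treat both the pattern and the subject as UTF-8, so the cuts fall between whole characters (code points) rather than between individual bytes. The empty pattern also matches at the very start and very end of the string, which would leave an empty string at both ends of the result; the PREG_SPLIT_NO_EMPTY flag drops them. Unlike explode(), preg_split() accepts an empty pattern, so the empty delimiter error never arises.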
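The difference the u modifier makes can be checked with a single two-byte character. This is a minimal sketch; the character is written as an escape sequence (available since PHP 7.0) so that the file's own encoding does not matter −

```php
<?php
// "\u{E9}" is é: one character, stored as two bytes in UTF-8.
$string = "\u{E9}";

$bytes      = str_split($string);                                   // byte-wise
$noModifier = preg_split('//', $string, -1, PREG_SPLIT_NO_EMPTY);   // byte-wise
$withU      = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);  // UTF-8 aware

printf(
    "str_split: %d, '//': %d, '//u': %d\n",
    count($bytes),
    count($noModifier),
    count($withU)
);
// prints "str_split: 2, '//': 2, '//u': 1"
```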

Conclusion

Using preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY) is a reliable way to split valid UTF-8 strings into individual characters in PHP while avoiding both byte-level splitting and empty delimiter errors.

Updated on: 2026-03-15T09:35:14+05:30
