Make PHP pathinfo() return the correct filename if the filename is UTF-8

PHP’s core string functions operate on bytes and are not encoding-aware. pathinfo(), however, is locale-aware: it consults the current locale when it splits a path, so it can be made to handle UTF-8 filenames correctly by setting an appropriate locale with setlocale() before calling it.

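PHP starts with the ‘C’ locale. Since PHP 8.0 this applies to every locale category; in earlier versions LC_CTYPE was inherited from the environment, so CLI scripts launched from a UTF-8 shell often already ran with a UTF-8 locale. To handle UTF-8 filenames properly, change the locale from ‘C’ to a UTF-8 locale such as ‘C.UTF-8’ or ‘en_US.UTF-8’ before calling pathinfo(). The locale must be installed on the system; setlocale() returns false if it is not.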
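
Before changing anything, it can help to see which locale the script is actually running under. The following is a minimal sketch: passing '0' as the locale makes setlocale() report the current setting without modifying it. The $isUtf8 check is a hypothetical heuristic based on the locale name, not an official API.

```php
<?php
// Passing '0' as the locale asks setlocale() to return the current
// setting for the category without changing it.
$current = setlocale(LC_CTYPE, '0');
echo "Current LC_CTYPE: " . $current . PHP_EOL;

// Heuristic (an assumption, not an official check): treat the locale as
// UTF-8-capable if its name mentions "UTF-8" or "utf8" in any letter case.
$isUtf8 = (bool) preg_match('/utf-?8/i', (string) $current);
echo "UTF-8 locale active: " . ($isUtf8 ? 'yes' : 'no') . PHP_EOL;
```

If the second line reports "no", set a UTF-8 locale before calling pathinfo().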

Example

Here’s how to set the locale and use pathinfo() with UTF-8 filenames:

<?php
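// Set locale to handle UTF-8 characters; setlocale() returns false
// if the requested locale is not installed
if (setlocale(LC_ALL, 'en_US.UTF-8') === false) {
    echo "Warning: en_US.UTF-8 is not available<br>";
}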

// UTF-8 filename example
$originalName = 'résumé_français.pdf';

// Get filename without extension
$filename = pathinfo($originalName, PATHINFO_FILENAME);
echo "Filename: " . $filename . "<br>";

// Get complete basename
$basename = pathinfo($originalName, PATHINFO_BASENAME);
echo "Basename: " . $basename . "<br>";

// Get extension
$extension = pathinfo($originalName, PATHINFO_EXTENSION);
echo "Extension: " . $extension . "<br>";

// Get directory path
$dirname = pathinfo('/path/to/' . $originalName, PATHINFO_DIRNAME);
echo "Directory: " . $dirname;
?>

Assuming the en_US.UTF-8 locale is installed, the page renders in a browser as:

Filename: résumé_français
Basename: résumé_français.pdf
Extension: pdf
Directory: /path/to
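
setlocale() changes the locale for the whole process, and LC_ALL also affects categories such as number formatting. One way to limit the side effects is to switch only LC_CTYPE (the category that governs multibyte character handling) and restore the previous value afterwards. The sketch below assumes a hypothetical helper named utf8_pathinfo() and a candidate list of locale names; neither is part of PHP itself.

```php
<?php
/**
 * Hypothetical helper: run pathinfo() under a UTF-8 LC_CTYPE locale,
 * then restore whatever locale was active before.
 */
function utf8_pathinfo(string $path, array $candidates = ['C.UTF-8', 'en_US.UTF-8']): array
{
    // Remember the current LC_CTYPE setting ('0' queries without changing it).
    $previous = setlocale(LC_CTYPE, '0');

    // setlocale() accepts an array and applies the first locale that is
    // installed; it returns false if none of them is available.
    $applied = setlocale(LC_CTYPE, $candidates);

    $info = pathinfo($path);

    // Restore the earlier locale only if it was actually changed.
    if ($applied !== false && $previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $info;
}

$info = utf8_pathinfo('/path/to/résumé_français.pdf');
echo "Filename: " . $info['filename'] . PHP_EOL;
echo "Extension: " . $info['extension'] . PHP_EOL;
```

If none of the candidate locales is installed, the helper falls back to calling pathinfo() under the existing locale rather than failing.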

Alternative Locale Settings

Locale names and availability vary between systems, so you can try several UTF-8 locales and use the first one that is installed:

<?php
// Try different UTF-8 locales
$locales = ['en_US.UTF-8', 'C.UTF-8', 'en_GB.UTF-8'];

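$applied = false;
foreach ($locales as $locale) {
    // setlocale() returns the new locale name on success, false on failure
    if (setlocale(LC_ALL, $locale) !== false) {
        echo "Successfully set locale to: " . $locale . "<br>";
        $applied = true;
        break;
    }
}
if (!$applied) {
    echo "No UTF-8 locale available; pathinfo() may misparse the name<br>";
}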

$utf8Filename = 'файл_тест.txt'; // Illustrative Cyrillic filename
$info = pathinfo($utf8Filename);

echo "Original: " . $utf8Filename . "<br>";
echo "Filename: " . $info['filename'] . "<br>";
echo "Extension: " . $info['extension'];
?>

Assuming en_US.UTF-8 is the first locale available, the page renders in a browser as:

Successfully set locale to: en_US.UTF-8
Original: файл_тест.txt
Filename: файл_тест
Extension: txt
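
If none of the locales above can be set, one possible fallback is to split the path with byte-level string functions, which ignore the locale entirely. This is safe for UTF-8 because the bytes for '/' and '.' never occur inside a multibyte sequence. The sketch below uses a hypothetical function name, pathinfo_bytes(), and assumes '/' is the only directory separator and that the path has no trailing slash; it is not a complete replacement for pathinfo().

```php
<?php
/**
 * Hypothetical fallback: split a path using byte-level functions only,
 * so the result does not depend on the active locale.
 * Assumes '/' separators and no trailing slash.
 */
function pathinfo_bytes(string $path): array
{
    // Locate the last directory separator, if any.
    $slash = strrpos($path, '/');
    if ($slash === false) {
        $dirname = '.';
        $basename = $path;
    } else {
        $dirname = $slash === 0 ? '/' : substr($path, 0, $slash);
        $basename = substr($path, $slash + 1);
    }

    $result = ['dirname' => $dirname, 'basename' => $basename];

    // Split the basename at its last dot, mirroring pathinfo()'s keys.
    $dot = strrpos($basename, '.');
    if ($dot !== false) {
        $result['extension'] = substr($basename, $dot + 1);
        $result['filename'] = substr($basename, 0, $dot);
    } else {
        $result['filename'] = $basename;
    }

    return $result;
}

$parts = pathinfo_bytes('/path/to/файл_тест.txt');
echo "Filename: " . $parts['filename'] . PHP_EOL;
echo "Extension: " . $parts['extension'] . PHP_EOL;
```

Like pathinfo(), the sketch omits the 'extension' key when the basename contains no dot, so callers should check for it with isset() before reading it.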

Conclusion

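Setting an appropriate UTF-8 locale with setlocale() before calling pathinfo() lets it handle international characters in filenames correctly. The approach depends on a UTF-8 locale being installed on the host, so check the return value of setlocale() instead of assuming the call succeeded. Keep in mind that setlocale() changes the locale for the whole process, which can affect other code running in the same process.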

Updated on: 2026-03-15T08:52:38+05:30
