Make PHP pathinfo() return the correct filename if the filename is UTF-8
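Most core PHP string functions operate on raw bytes and assume a single-byte character set such as Latin-1. pathinfo(), however, is locale-aware: it splits UTF-8 filenames correctly once a matching UTF-8 locale has been set with setlocale() before it is called.
Since PHP 8.0, every script, including CLI scripts, starts in the 'C' locale unless it calls setlocale() itself. Earlier versions inherited LC_CTYPE from the process environment, so CLI scripts started from a UTF-8 terminal often got a UTF-8 locale automatically, while web server processes usually ran in 'C'. To handle UTF-8 filenames properly, switch the locale from 'C' to a UTF-8 locale such as 'C.UTF-8' or 'en_US.UTF-8' before calling pathinfo(). Which locale names exist depends on the operating system, and the category that matters for pathinfo() is LC_CTYPE.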
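To see which locale a script is actually running with, and to switch to one of the UTF-8 locales mentioned above only if it is installed, something like the following sketch can be used. It assumes PHP 8 and changes only LC_CTYPE, the category that governs how multibyte characters are recognized, instead of LC_ALL, so number and date formatting are left untouched. The locale names and the '/uploads/файл_тест.txt' path are illustrative assumptions.

```php
<?php
// Query the current LC_CTYPE setting; passing "0" does not change it
$before = setlocale(LC_CTYPE, '0');
echo "LC_CTYPE before: " . $before . "<br>";

// Try each candidate in order; setlocale() returns the name it set,
// or false (leaving the locale unchanged) if none is installed
$after = setlocale(LC_CTYPE, ['C.UTF-8', 'en_US.UTF-8']);
if ($after === false) {
    echo "No UTF-8 locale found; LC_CTYPE is still " . $before . "<br>";
} else {
    echo "LC_CTYPE after: " . $after . "<br>";
}

// With a UTF-8 LC_CTYPE in effect, the Cyrillic basename is returned intact
echo pathinfo('/uploads/файл_тест.txt', PATHINFO_BASENAME);
```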
Example
Here’s how to set the locale and use pathinfo() with UTF-8 filenames −
<?php
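// Set the locale so that pathinfo() recognizes UTF-8 multibyte characters.
// setlocale() returns false if the locale is not installed on this system.
if (setlocale(LC_ALL, 'en_US.UTF-8') === false) {
   echo "Locale en_US.UTF-8 is not available<br>";
}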
// UTF-8 filename example
$originalName = 'résumé_français.pdf';
// Get filename without extension
$filename = pathinfo($originalName, PATHINFO_FILENAME);
echo "Filename: " . $filename . "<br>";
// Get complete basename
$basename = pathinfo($originalName, PATHINFO_BASENAME);
echo "Basename: " . $basename . "<br>";
// Get extension
$extension = pathinfo($originalName, PATHINFO_EXTENSION);
echo "Extension: " . $extension . "<br>";
// Get directory path
$dirname = pathinfo('/path/to/' . $originalName, PATHINFO_DIRNAME);
echo "Directory: " . $dirname;
?>
Output
Filename: résumé_français
Basename: résumé_français.pdf
Extension: pdf
Directory: /path/to
Alternative Locale Settings
You can use different UTF-8 locales depending on your system configuration −
<?php
// Try different UTF-8 locales; which ones exist depends on the system
$locales = ['en_US.UTF-8', 'C.UTF-8', 'en_GB.UTF-8'];
$selected = false;
foreach ($locales as $locale) {
   // setlocale() returns the new locale name on success and false on failure
   $selected = setlocale(LC_ALL, $locale);
   if ($selected !== false) {
      echo "Successfully set locale to: " . $selected . "<br>";
      break;
   }
}
if ($selected === false) {
   echo "No UTF-8 locale is available on this system<br>";
}
$utf8Filename = 'файл_тест.txt'; // Cyrillic characters
$info = pathinfo($utf8Filename);
echo "Original: " . $utf8Filename . "<br>";
echo "Filename: " . $info['filename'] . "<br>";
echo "Extension: " . $info['extension'];
?>
Output
Successfully set locale to: en_US.UTF-8
Original: файл_тест.txt
Filename: файл_тест
Extension: txt
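When several parts of an application parse user-supplied filenames, it can help to do the locale switch in one place and put the previous locale back afterwards, because setlocale() changes a per-process setting (the PHP manual notes it is per process, not per thread) that other locale-aware functions also read. The following is a minimal sketch under those assumptions: utf8_pathinfo() is a hypothetical helper, not a PHP function, the candidate locale list is a guess at what is installed, and the union return type requires PHP 8.0 or later.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): call pathinfo() with a UTF-8
 * LC_CTYPE in effect, then restore whatever locale the caller had.
 */
function utf8_pathinfo(string $path, ?int $flags = null): array|string
{
    // "0" queries the current LC_CTYPE setting without changing it
    $previous = setlocale(LC_CTYPE, '0');

    // Try candidate UTF-8 locales in order; nothing changes if none is installed
    setlocale(LC_CTYPE, ['C.UTF-8', 'en_US.UTF-8', 'en_GB.UTF-8']);

    try {
        // Without a flag, pathinfo() returns the full associative array
        return $flags === null ? pathinfo($path) : pathinfo($path, $flags);
    } finally {
        // Runs however the try block exits, so the caller's locale is restored
        if ($previous !== false) {
            setlocale(LC_CTYPE, $previous);
        }
    }
}

$info = utf8_pathinfo('/uploads/résumé_français.pdf');
echo "Filename: " . $info['filename'] . "<br>";   // résumé_français
echo "Extension: " . $info['extension'] . "<br>"; // pdf
echo "Directory: " . utf8_pathinfo('/uploads/résumé_français.pdf', PATHINFO_DIRNAME); // /uploads
```

Passing one of the PATHINFO_* constants returns a single string, as pathinfo() itself does; omitting it returns the full array.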
Conclusion
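Setting a UTF-8 locale with setlocale() before calling pathinfo() lets it handle international characters in filenames correctly. The locale must actually be installed on the system, so check the return value of setlocale() or try several candidate names, and remember that the locale is a per-process setting that also affects other locale-aware functions.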
