Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
How to read and write unicode (UTF-8) files in Python?
Python provides built-in support for reading and writing Unicode (UTF-8) files through the open() function. UTF-8 is the most widely used encoding for text files as it can represent any Unicode character.
Reading UTF-8 Files
To read a UTF-8 encoded file, specify the encoding parameter when opening the file ?
# Create a sample UTF-8 file first
with open('sample.txt', 'w', encoding='utf-8') as f:
f.write('Hello World! ???? ?')
# Read the UTF-8 file
with open('sample.txt', 'r', encoding='utf-8') as f:
content = f.read()
print(content)
Hello World! ???? ?
Writing UTF-8 Files
To write Unicode text to a UTF-8 file, use the same encoding parameter ?
# Unicode text with different characters
unicode_text = "Python supports Unicode: ñáéíóú ??? ?? ?"
# Write to UTF-8 file
with open('unicode_output.txt', 'w', encoding='utf-8') as f:
f.write(unicode_text)
# Read back to verify
with open('unicode_output.txt', 'r', encoding='utf-8') as f:
retrieved_text = f.read()
print("Written and read back:")
print(retrieved_text)
Written and read back: Python supports Unicode: ñáéíóú ??? ?? ?
Using the io Module (Alternative Method)
The io module provides an alternative approach that's compatible with older Python versions ?
import io
# Write using io module
text = "Unicode with io module: café résumé naïve"
with io.open('io_example.txt', 'w', encoding='utf-8') as f:
f.write(text)
# Read using io module
with io.open('io_example.txt', 'r', encoding='utf-8') as f:
content = f.read()
print(content)
Unicode with io module: café résumé naïve
Handling Encoding Errors
You can specify error handling behavior when reading files that might contain invalid characters ?
# Example of error handling
text_with_special_chars = "Mixed content: ASCII and Unicode ????"
# Write the file
with open('error_test.txt', 'w', encoding='utf-8') as f:
f.write(text_with_special_chars)
# Read with error handling
with open('error_test.txt', 'r', encoding='utf-8', errors='replace') as f:
safe_content = f.read()
print("Content with error handling:")
print(safe_content)
Content with error handling: Mixed content: ASCII and Unicode ????
Best Practices
| Approach | Python Version | Recommendation |
|---|---|---|
open() |
Python 3.x | Preferred method |
io.open() |
Python 2.x/3.x | Legacy compatibility |
codecs.open() |
Python 2.x | Older alternative |
Conclusion
Always specify encoding='utf-8' when working with Unicode files in Python. The built-in open() function is the recommended approach for Python 3, while io.open() provides backward compatibility.
