How to read and write Unicode (UTF-8) files in Python?

Python 3 supports reading and writing Unicode text in UTF-8 files through the built-in open() function and its encoding parameter. UTF-8 is the most widely used encoding for text files because it can represent every Unicode character while remaining byte-for-byte compatible with ASCII.
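UTF-8 is a variable-length encoding: ASCII characters take one byte, and other characters take two to four. The sketch below illustrates this with str.encode(). It also prints the default that open() falls back to when no encoding is given, obtained here through locale.getpreferredencoding(False). That value depends on the platform and locale.

```python
import locale

# UTF-8 uses 1 to 4 bytes per character
for ch in ["A", "é", "€", "🐍"]:
    encoded = ch.encode("utf-8")
    print(f"{ch!r}: {len(encoded)} byte(s) -> {encoded!r}")

# The encoding open() uses when none is passed; this varies by platform/locale
print("Default text encoding:", locale.getpreferredencoding(False))
```

The loop reports 1, 2, 3 and 4 bytes respectively. The last line differs from one system to another, which is why passing encoding explicitly matters.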

Reading UTF-8 Files

To read a UTF-8 encoded file, specify the encoding parameter when opening the file:

# Create a sample UTF-8 file first
with open('sample.txt', 'w', encoding='utf-8') as f:
    f.write('Hello World! 你好世界 🌍')

# Read the UTF-8 file
with open('sample.txt', 'r', encoding='utf-8') as f:
    content = f.read()
    print(content)
Hello World! 你好世界 🌍
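For larger files, you can iterate over the file object instead of calling read(). This reads one line at a time rather than loading the whole file into memory. Below is a minimal sketch; the file name multiline.txt and its contents are illustrative.

```python
# Write a small multi-line UTF-8 file (name and contents are illustrative)
with open('multiline.txt', 'w', encoding='utf-8') as f:
    f.write('first line: café\nsecond line: naïve\nthird line: 🐍\n')

# Iterate lazily; each line keeps its trailing '\n', so strip it
lines = []
with open('multiline.txt', 'r', encoding='utf-8') as f:
    for line_number, line in enumerate(f, start=1):
        text = line.rstrip('\n')
        lines.append(text)
        print(line_number, text)
```

Text mode translates line endings on both write and read, so rstrip('\n') works on Windows as well.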

Writing UTF-8 Files

To write Unicode text to a UTF-8 file, use the same encoding parameter:

# Unicode text with different characters
unicode_text = "Python supports Unicode: ñáéíóú 日本語 한국 🐍"

# Write to UTF-8 file
with open('unicode_output.txt', 'w', encoding='utf-8') as f:
    f.write(unicode_text)

# Read back to verify
with open('unicode_output.txt', 'r', encoding='utf-8') as f:
    retrieved_text = f.read()
    print("Written and read back:")
    print(retrieved_text)
Written and read back:
Python supports Unicode: ñáéíóú 日本語 한국 🐍
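To confirm what actually lands on disk, reopen the file in binary mode ('rb'). The sketch below also contrasts plain 'utf-8' with Python's 'utf-8-sig' codec. That codec writes a byte order mark (BOM, the bytes EF BB BF), which some Windows tools expect, and it removes the BOM again on reading. The file names are illustrative.

```python
text = "ñ 🐍"

# Plain UTF-8: no byte order mark is written
with open('plain_utf8.txt', 'w', encoding='utf-8') as f:
    f.write(text)

# 'utf-8-sig' prepends the BOM bytes EF BB BF when writing
with open('bom_utf8.txt', 'w', encoding='utf-8-sig') as f:
    f.write(text)

# Inspect the raw bytes on disk
with open('plain_utf8.txt', 'rb') as f:
    plain_bytes = f.read()
with open('bom_utf8.txt', 'rb') as f:
    bom_bytes = f.read()
print(plain_bytes)
print(bom_bytes)

# Reading with 'utf-8-sig' strips the BOM; plain 'utf-8' would keep it as '\ufeff'
with open('bom_utf8.txt', 'r', encoding='utf-8-sig') as f:
    print(f.read() == text)
```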

Using the io Module (Alternative Method)

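The io module provides io.open(), which in Python 3 is simply an alias for the built-in open(). It is mainly useful for code that must also run on Python 2.6/2.7, where the built-in open() does not accept an encoding parameter: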

import io

# Write using io module
text = "Unicode with io module: café résumé naïve"

with io.open('io_example.txt', 'w', encoding='utf-8') as f:
    f.write(text)

# Read using io module
with io.open('io_example.txt', 'r', encoding='utf-8') as f:
    content = f.read()
    print(content)
Unicode with io module: café résumé naïve
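In Python 3, io.open and the built-in open are the same object, and a one-line check confirms this. The io module is still useful when you already have a binary stream and need UTF-8 text from it, because io.TextIOWrapper adds decoding to any binary file-like object. In the sketch below, an in-memory io.BytesIO buffer stands in for a real binary source such as sys.stdin.buffer.

```python
import io

# In Python 3, io.open and the built-in open are the same object
print(io.open is open)

# Wrap a binary stream so that reading yields decoded str objects
raw = io.BytesIO("café résumé naïve\n".encode("utf-8"))
wrapper = io.TextIOWrapper(raw, encoding="utf-8")
decoded = wrapper.read()
print(decoded.strip())
```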

Handling Encoding Errors

You can specify error handling behavior when reading files that might contain invalid characters ?

# Example of error handling
text_with_special_chars = "Mixed content: ASCII and Unicode ????"

# Write the file
with open('error_test.txt', 'w', encoding='utf-8') as f:
    f.write(text_with_special_chars)

# Read with error handling
with open('error_test.txt', 'r', encoding='utf-8', errors='replace') as f:
    safe_content = f.read()
    print("Content with error handling:")
    print(safe_content)
Content with error handling:
Mixed content: ASCII and Unicode ????
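The same errors values also work with bytes.decode(), so you can compare them side by side without creating a file. The sketch below uses a byte string that holds valid UTF-8 for "café " followed by one invalid byte (0xff).

```python
data = b'caf\xc3\xa9 \xff'  # valid UTF-8 for "café ", then an invalid byte

results = {}
for mode in ('strict', 'replace', 'ignore', 'backslashreplace'):
    try:
        results[mode] = data.decode('utf-8', errors=mode)
    except UnicodeDecodeError as exc:
        # 'strict' refuses to decode and reports why
        results[mode] = f'UnicodeDecodeError: {exc.reason}'
    print(f'{mode:>16}: {results[mode]!r}')
```

'replace' keeps a visible marker, 'ignore' loses the byte entirely, and 'backslashreplace' shows the offending byte as an escape sequence, which helps when debugging.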

Best Practices

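| Approach | Python version | Recommendation |
| --- | --- | --- |
| open(..., encoding='utf-8') | Python 3.x | Preferred method |
| io.open() | Python 2.6+ and 3.x (alias of open() in 3.x) | Legacy compatibility with Python 2 |
| codecs.open() | Python 2.x and 3.x | Older alternative; superseded by open() in Python 3 |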
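Besides the approaches in the table, pathlib.Path (Python 3.4+) provides read_text() and write_text() shortcuts. They accept the same encoding argument and are convenient when the whole file fits in memory. Below is a sketch with an illustrative file name.

```python
from pathlib import Path

path = Path('pathlib_example.txt')  # illustrative file name

# write_text opens, writes, and closes the file in one call
path.write_text('Unicode via pathlib: ñ 日本語 🐍', encoding='utf-8')

# read_text returns the whole file decoded as a str
content = path.read_text(encoding='utf-8')
print(content)
```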

Conclusion

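Always pass encoding='utf-8' explicitly when reading or writing text files. If you omit it, open() uses a platform-dependent default (for example, a legacy code page such as cp1252 on many Windows systems), so the same script can produce different results on different machines. In Python 3 the built-in open() is the recommended approach; io.open() is only needed for code that must also run on Python 2.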

Updated on: 2026-03-24T19:36:12+05:30
