Character Encoding in Python

Character encoding is the process of converting text into bytes that computers can store and transmit. In Python 3, every str object holds Unicode text, and the language supports many encodings, with UTF-8 being the most common.

Understanding Character Encoding

In character encoding, each character is mapped to a numeric value called its code point. For example, in ASCII and Unicode:

  • C = 67

  • D = 68

  • E = 69

Character   Number   Binary
D           68       1000100
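The mapping in the table can be reproduced with Python's built-in ord(), chr(), and format() functions. This is a minimal sketch; the variable name rows is chosen only for illustration.

```python
# Build (character, code point, binary digits) rows for a few letters.
rows = []
for ch in "CDE":
    code_point = ord(ch)                 # character -> integer code point
    binary = format(code_point, "b")     # integer -> binary digits, no "0b" prefix
    rows.append((ch, code_point, binary))
    print(ch, code_point, binary)

# chr() is the inverse of ord(): integer code point -> character
print(chr(68))
```

The second row printed is D 68 1000100, matching the table above.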

UTF-8 Encoding in Python

UTF-8 is the default encoding used by str.encode() and bytes.decode() in Python 3. It has these characteristics:

  • ASCII characters (code points 0-127) use one byte

  • Non-ASCII characters use 2-4 bytes depending on the code point

  • Backward compatible with ASCII: any valid ASCII byte sequence is also valid UTF-8

  • Variable-length encoding, so common ASCII text stays compact
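The variable-length behaviour listed above can be checked by encoding a few characters and counting the resulting bytes. The sample characters are assumptions picked to cover the 1, 2, 3, and 4 byte cases; they are written as escape sequences so the sketch does not depend on how this file itself is saved.

```python
# Sample characters (illustrative choice): A, e-acute, euro sign, snake emoji
samples = ["A", "\u00e9", "\u20ac", "\U0001F40D"]

# Number of bytes each character occupies once encoded as UTF-8
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]
for ch, n in zip(samples, byte_lengths):
    print(f"U+{ord(ch):04X} -> {n} byte(s): {ch.encode('utf-8').hex()}")

# Pure ASCII text produces identical bytes under the ASCII and UTF-8 codecs
ascii_compatible = "Python".encode("utf-8") == "Python".encode("ascii")
print("ASCII compatible:", ascii_compatible)
```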

Method 1: Using binascii Module

The binascii module provides functions to convert between binary data and ASCII-encoded representations such as hexadecimal:

import binascii

# Sample text to encode
data = "Welcome to TutorialsPoint"
print("Original data type:", type(data))
print("Original data:", data)

# Encode string to UTF-8 bytes, then to hexadecimal
encoding = binascii.hexlify(data.encode('utf-8'))
print("Encoded data:", encoding)
print("Encoded data type:", type(encoding))

Output:

Original data type: <class 'str'>
Original data: Welcome to TutorialsPoint
Encoded data: b'57656c636f6d6520746f205475746f7269616c73506f696e74'
Encoded data type: <class 'bytes'>
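The conversion can also be reversed. As a short sketch, binascii.unhexlify() turns the hexadecimal bytes back into the raw UTF-8 bytes, which can then be decoded to a string. The shorter sample text is an assumption made to keep the output readable.

```python
import binascii

# Encode a short sample string to UTF-8, then to hexadecimal bytes
hex_bytes = binascii.hexlify("Welcome".encode("utf-8"))
print("Hex bytes:", hex_bytes)

# Reverse both steps: hex -> raw bytes -> str
raw_bytes = binascii.unhexlify(hex_bytes)
restored = raw_bytes.decode("utf-8")
print("Restored:", restored)
```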

Method 2: Using the bytes.hex() Method

The hex() method of bytes objects returns the hexadecimal representation directly as a string, with no extra import:

# Sample text to encode
data = "Welcome to TutorialsPoint"
print("Original data:", data)
print("Original data type:", type(data))

# Encode to UTF-8 bytes and convert to hex string
encoding = data.encode('utf-8').hex()
print("Encoded data:", encoding)
print("Encoded data type:", type(encoding))

Output:

Original data: Welcome to TutorialsPoint
Original data type: <class 'str'>
Encoded data: 57656c636f6d6520746f205475746f7269616c73506f696e74
Encoded data type: <class 'str'>
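Going the other way, bytes.fromhex() parses a hexadecimal string back into bytes. The sketch below also shows the optional separator argument of bytes.hex(), which assumes Python 3.8 or later; the sample text is again shortened for readability.

```python
# Encode a short sample string and get its hex digits as a str
hex_string = "Welcome".encode("utf-8").hex()
print("Hex string:", hex_string)

# bytes.fromhex() reverses bytes.hex()
restored = bytes.fromhex(hex_string).decode("utf-8")
print("Restored:", restored)

# Optional separator between bytes (assumes Python 3.8+)
spaced = "Welcome".encode("utf-8").hex(" ")
print("Spaced:", spaced)
```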

Encoding and Decoding Example

Here's a complete example showing both encoding and decoding:

# Original string
text = "Python encoding: 🐍"
print("Original:", text)

# Encode to bytes
encoded_bytes = text.encode('utf-8')
print("Encoded bytes:", encoded_bytes)

# Convert to hexadecimal representation
hex_representation = encoded_bytes.hex()
print("Hex representation:", hex_representation)

# Decode back to string
decoded_text = encoded_bytes.decode('utf-8')
print("Decoded back:", decoded_text)

Output:

Original: Python encoding: 🐍
Encoded bytes: b'Python encoding: \xf0\x9f\x90\x8d'
Hex representation: 507974686f6e20656e636f64696e673a20f09f908d
Decoded back: Python encoding: 🐍
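Encoding only succeeds when the target codec can represent every character. The following sketch, which goes slightly beyond the example above, shows what happens when the same kind of text is encoded as ASCII, and how the errors parameter of str.encode() changes the outcome. The variable names are illustrative.

```python
# Text containing one non-ASCII character (snake emoji, U+1F40D)
text = "Python encoding: \U0001F40D"

# Strict mode (the default) raises an error for unencodable characters
encode_failed = False
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    encode_failed = True
    print("ASCII cannot encode this text:", exc.reason)

# errors="replace" substitutes a question mark for each unencodable character
ascii_fallback = text.encode("ascii", errors="replace")
print("With replacement:", ascii_fallback)
```

The replacement is lossy: the original character cannot be recovered from the question mark, which is why UTF-8 is usually the safer choice for arbitrary text.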

Comparison of Methods

Method               Output Type   Use Case
binascii.hexlify()   bytes         When you need bytes output
.hex()               str           When you need string output
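The two methods differ only in the type they return, not in the hexadecimal digits. A quick check, using the same sample text as the earlier examples:

```python
import binascii

payload = "Welcome to TutorialsPoint".encode("utf-8")

as_bytes = binascii.hexlify(payload)   # bytes object of ASCII hex digits
as_str = payload.hex()                 # str of the same hex digits

print(type(as_bytes).__name__, type(as_str).__name__)

# Decoding the hexlify() result as ASCII gives the same text as .hex()
same_digits = as_bytes.decode("ascii") == as_str
print("Same digits:", same_digits)
```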

Conclusion

Character encoding converts text to bytes for storage and processing. UTF-8 can encode every Unicode character while staying byte-compatible with ASCII. Use str.encode() and bytes.decode() to convert between strings and bytes, and bytes.hex() or binascii.hexlify() to get a hexadecimal representation.

Updated on: 2026-03-27T13:38:32+05:30
