Character Encoding in Python
Character encoding is the process of converting text into bytes that computers can store and process. In Python 3, strings (str) are sequences of Unicode code points, and str.encode() and bytes.decode() use UTF-8 by default. Python supports many other encodings as well, but UTF-8 is the most common.
Understanding Character Encoding
In character encoding, each character is mapped to a numeric value. For example:
C = 67
D = 68
E = 69
| Character | Number | Binary |
|---|---|---|
| D | 68 | 1000100 |
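The mapping above can be checked with Python's built-in ord(), chr(), and format() functions. This is a minimal sketch; the helper name describe is hypothetical and used only for illustration.

```python
def describe(ch):
    """Return the code point of a single character and its binary digits."""
    code_point = ord(ch)               # character -> integer code point
    binary = format(code_point, "b")   # integer -> binary digits, no "0b" prefix
    return code_point, binary

for ch in "CDE":
    code_point, binary = describe(ch)
    print(ch, code_point, binary)

# chr() is the inverse of ord(): it maps a code point back to its character.
print(chr(68))
```

The loop prints one line per character, and the line for D matches the table row above (68, 1000100).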
UTF-8 Encoding in Python
UTF-8 is the default encoding in Python 3. It has these characteristics:
- ASCII characters (code points 0-127) use one byte
- Non-ASCII characters use 2-4 bytes, depending on the character
- Backward compatible with ASCII: ASCII text produces the same bytes in UTF-8
- Variable-length, so common characters take less space
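The variable-length behaviour described above can be observed by encoding a few characters and counting the resulting bytes. This is a small sketch; the sample characters and the helper name utf8_length are chosen here for illustration and do not come from the original article.

```python
def utf8_length(ch):
    """Return how many bytes a character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# Escapes are used so the example does not depend on the file's own encoding.
samples = [
    "A",           # U+0041, ASCII letter
    "\u00e9",      # U+00E9, Latin small letter e with acute
    "\u20ac",      # U+20AC, euro sign
    "\U0001F40D",  # U+1F40D, snake emoji
]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s)")

# ASCII compatibility: ASCII text yields identical bytes under both codecs.
print("abc".encode("ascii") == "abc".encode("utf-8"))
```

The four samples take 1, 2, 3, and 4 bytes respectively, and the final comparison prints True.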
Method 1: Using binascii Module
The binascii module converts between binary data and ASCII-encoded representations such as hexadecimal. Its hexlify() function returns the hex digits as a bytes object:
import binascii
# Sample text to encode
data = "Welcome to TutorialsPoint"
print("Original data type:", type(data))
print("Original data:", data)
# Encode string to UTF-8 bytes, then to hexadecimal
encoding = binascii.hexlify(data.encode('utf-8'))
print("Encoded data:", encoding)
print("Encoded data type:", type(encoding))
Output:
Original data type: <class 'str'>
Original data: Welcome to TutorialsPoint
Encoded data: b'57656c636f6d6520746f205475746f7269616c73506f696e74'
Encoded data type: <class 'bytes'>
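The conversion can also be reversed. The sketch below assumes the same sample string and uses binascii.unhexlify() to turn the hex digits back into the original UTF-8 bytes before decoding them; the variable names are illustrative.

```python
import binascii

data = "Welcome to TutorialsPoint"

# Forward direction: str -> UTF-8 bytes -> hex digits (as bytes)
hex_bytes = binascii.hexlify(data.encode("utf-8"))

# Reverse direction: hex digits -> UTF-8 bytes -> str
raw_bytes = binascii.unhexlify(hex_bytes)
restored = raw_bytes.decode("utf-8")

print("Restored data:", restored)
print("Round trip OK:", restored == data)
```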
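Method 2: Using the bytes.hex() Method
The bytes.hex() method is a more direct way to get the hexadecimal representation, and it returns a str rather than bytes. It is not the same as the built-in hex() function, which formats an integer: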
# Sample text to encode
data = "Welcome to TutorialsPoint"
print("Original data:", data)
print("Original data type:", type(data))
# Encode to UTF-8 bytes and convert to hex string
encoding = data.encode('utf-8').hex()
print("Encoded data:", encoding)
print("Encoded data type:", type(encoding))
Output:
Original data: Welcome to TutorialsPoint
Original data type: <class 'str'>
Encoded data: 57656c636f6d6520746f205475746f7269616c73506f696e74
Encoded data type: <class 'str'>
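To go back from a hex string to text, the bytes.fromhex() class method is the counterpart of bytes.hex(). This is a minimal sketch that reuses the sample string; the variable names are illustrative.

```python
data = "Welcome to TutorialsPoint"

# str -> UTF-8 bytes -> hex string
hex_string = data.encode("utf-8").hex()

# hex string -> UTF-8 bytes -> str
restored = bytes.fromhex(hex_string).decode("utf-8")

print("Hex string:", hex_string)
print("Restored data:", restored)
```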
Encoding and Decoding Example
Here's a complete example showing both encoding and decoding:
# Original string
text = "Python encoding: 🐍"
print("Original:", text)
# Encode to bytes
encoded_bytes = text.encode('utf-8')
print("Encoded bytes:", encoded_bytes)
# Convert to hexadecimal representation
hex_representation = encoded_bytes.hex()
print("Hex representation:", hex_representation)
# Decode back to string
decoded_text = encoded_bytes.decode('utf-8')
print("Decoded back:", decoded_text)
Output:
Original: Python encoding: 🐍
Encoded bytes: b'Python encoding: \xf0\x9f\x90\x8d'
Hex representation: 507974686f6e20656e636f64696e673a20f09f908d
Decoded back: Python encoding: 🐍
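The round trip works because UTF-8 can represent every Unicode character. As a contrast, the sketch below shows what happens when the target codec cannot represent a character: encoding the same text as ASCII raises UnicodeEncodeError unless an errors handler is supplied. The choice of "ascii" and errors="replace" is an assumption made for illustration, not something the original example covers.

```python
text = "Python encoding: \U0001F40D"  # ends with the snake emoji, U+1F40D

# Strict mode (the default) refuses characters the codec cannot represent.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("ASCII cannot encode the character at index", exc.start)

# errors="replace" substitutes "?" for each unencodable character instead.
replaced = text.encode("ascii", errors="replace")
print("With replacement:", replaced)
```

Note that replacement is lossy: decoding the replaced bytes does not recover the original emoji.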
Comparison of Methods
| Method | Output Type | Use Case |
|---|---|---|
| binascii.hexlify() | bytes | When you need bytes output |
| .hex() | str | When you need string output |
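Both methods produce the same hex digits and differ only in the type of the result, so one can be converted into the other. A brief sketch, assuming the same sample string as before:

```python
import binascii

raw = "Welcome to TutorialsPoint".encode("utf-8")

as_bytes = binascii.hexlify(raw)  # bytes result
as_str = raw.hex()                # str result

# Hex digits are plain ASCII, so decoding or encoding converts between the two.
print(as_bytes.decode("ascii") == as_str)
print(as_str.encode("ascii") == as_bytes)
```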
Conclusion
Character encoding converts text to bytes so that computers can store and process it. UTF-8, Python's default, encodes ASCII characters in a single byte and every other Unicode character in 2-4 bytes. Use str.encode() to convert a string to bytes and bytes.decode() to convert bytes back to a string, and use bytes.hex() or binascii.hexlify() when you need a hexadecimal representation.
