Character Encoding in Python


Introduction

Python is widely used for handling data in fields such as data science and machine learning. Organizations today work with large volumes of data, much of it text, and handling that text correctly depends on understanding how its characters are encoded.

Python 3 strings are Unicode, and UTF-8 (Unicode Transformation Format, 8-bit) is the default encoding used to convert them into bytes. For encoding purposes, characters fall into two groups: ASCII characters and non-ASCII characters. Each character maps to a number, and that number is stored in binary. For example, take three characters:

C = 67

D = 68

E = 69

Character    Number    Binary Number
D            68        1000100
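The mapping above can be checked with Python's built-in ord() and format() functions. This is a minimal sketch; the helper name describe_char is hypothetical and is introduced only for illustration.

```python
def describe_char(ch):
    """Return (character, number, 7-bit binary string) for one character.

    describe_char is a hypothetical helper, not part of any library.
    """
    number = ord(ch)                        # character -> number
    return ch, number, format(number, "07b")  # number -> binary digits


for ch in "CDE":
    print(describe_char(ch))
```

Going the other way, chr(68) returns the character for a given number.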

Character Encoding Methods

Many encoding techniques are available; of these, UTF-8 is the default encoding in Python 3.

  • UTF-8 uses one byte for each ASCII character.

  • A non-ASCII character takes two to four bytes, depending on the character.

  • Python 3 keeps text (str) and binary data (bytes) as separate types, and encoding converts a str into bytes.

  • In Python 2, by contrast, str was itself a byte string, and decoding it with an encoding produced a separate unicode object.

  • Encoding converts a string of characters into bytes; decoding converts the bytes back into a string.
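The points above can be sketched with str.encode() and bytes.decode(). The sample characters and the helper name utf8_length are assumptions chosen for illustration; they do not come from the original text.

```python
def utf8_length(text):
    """Number of bytes the text occupies once encoded as UTF-8 (hypothetical helper)."""
    return len(text.encode("utf-8"))


# One ASCII character and three non-ASCII characters (illustrative choices):
# "A", e with acute accent, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")       # str -> bytes
    decoded = encoded.decode("utf-8")  # bytes -> str
    # ascii() keeps the printed output ASCII-only on any terminal
    print(ascii(ch), utf8_length(ch), encoded, decoded == ch)
```

The ASCII character encodes to one byte, while the other three take two, three, and four bytes. In every case, decoding returns the original string.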

Approach

Approach 1 − Using the binascii module

Approach 2 − Using the hex() method

Approach 1: Python Program for character encoding using binascii module

The encode() method is called on “data” with the “UTF-8” encoding to convert the string into bytes, and binascii.hexlify() then returns the hexadecimal representation of those bytes as a bytes object.

Algorithm

  • Step 1 − Import the binascii module to use the binascii.hexlify() function.

  • Step 2 − Initialize the variable “data” with a string.

  • Step 3 − Use the type() function to get the data type of the variable; in this case, it is str.

  • Step 4 − Encode the string as UTF-8, pass the resulting bytes to binascii.hexlify(), and store the result in the variable “encoding”.

  • Step 5 − The print statements display the type of the string and the encoded data.

Example

# import the binascii module
import binascii

# initialize the data with a string
data = "Welcome to Tutorialpoint"
# print the type of the initialized data
print(type(data))

# encode the string as UTF-8 bytes, then convert the bytes to hexadecimal
encoding = binascii.hexlify(data.encode('utf-8'))
# print the hexadecimal representation (a bytes object)
print("Data after encoding is:", encoding)

Output

<class 'str'>
Data after encoding is: b'57656c636f6d6520746f205475746f7269616c706f696e74'
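The process can also be reversed. As a minimal sketch, binascii.unhexlify() turns the hexadecimal bytes back into the raw UTF-8 bytes, which decode() then converts into the original string. The variable names below are illustrative.

```python
import binascii

# Same encoding step as in the example above
hex_bytes = binascii.hexlify("Welcome to Tutorialpoint".encode("utf-8"))

# Reverse step 1: hexadecimal digits -> raw UTF-8 bytes
raw_bytes = binascii.unhexlify(hex_bytes)

# Reverse step 2: UTF-8 bytes -> str
text = raw_bytes.decode("utf-8")
print(text)
```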

Approach 2: Python Program for character encoding using the hex() method

The encode() method is called on “data” with the “UTF-8” encoding to convert the string into bytes, and the hex() method of the bytes object then returns those bytes as a string of hexadecimal digits.

Algorithm

  • Step 1 − Initialize the variable “data” with a string.

  • Step 2 − Use the type() function to get the data type of the variable; in this case, it is str.

  • Step 3 − Encode the string as UTF-8, call hex() on the resulting bytes, and assign the result to the variable “encoding”.

  • Step 4 − Finally, display the value of the encoding variable.

Example

# initialize the data with a string
data = "Welcome to Tutorialpoint"
# print the type of the initialized data
print(type(data))

# encode the string as UTF-8 bytes, then convert the bytes to a hexadecimal string
encoding = data.encode('utf-8').hex()
# print the hexadecimal representation (a str)
print("Data after encoding is:", encoding)

Output

<class 'str'>
Data after encoding is: 57656c636f6d6520746f205475746f7269616c706f696e74
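The two approaches produce the same hexadecimal digits; the difference is that binascii.hexlify() returns bytes while bytes.hex() returns a str. The sketch below compares them and restores the original string with bytes.fromhex(). The variable names are illustrative.

```python
import binascii

data = "Welcome to Tutorialpoint"

hex_text = data.encode("utf-8").hex()                # str of hex digits
hex_bytes = binascii.hexlify(data.encode("utf-8"))   # bytes of hex digits

# Both approaches yield the same digits, in different types
same_digits = hex_text == hex_bytes.decode("ascii")
print(type(hex_text), type(hex_bytes), same_digits)

# Reverse the hex() approach: hex digits -> bytes -> str
restored = bytes.fromhex(hex_text).decode("utf-8")
print(restored)
```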

Conclusion

Programmers need to encode characters whenever text leaves a Python program. Computers and other electronic devices store and transmit strings as bytes, so whenever data is exchanged between one source and another, its characters must be encoded. Python strings are made up of Unicode characters, and Unicode was designed to work across operating systems, platforms, and applications, so encoded text can be used globally.

Updated on: 25-Aug-2023
