Pandas Series.str.decode() Method
The Series.str.decode() method in Pandas allows you to convert byte strings into regular strings by using the specified encoding. This function is useful when working with encoded text data that needs to be decoded for analysis or processing.
This method is similar to the str.decode() method in Python 2 and the bytes.decode() method in Python 3, providing an easy way to handle encoded text data within a Pandas Series or Index.
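As a minimal sketch of that similarity, the snippet below compares the result of Series.str.decode() with calling bytes.decode() on each element in plain Python. The sample data and the names raw, decoded and expected are illustrative only.

```python
import pandas as pd

# Illustrative sample data: two UTF-8 encoded byte strings
raw = [b'pandas', b'caf\xc3\xa9']
ser = pd.Series(raw)

# Element-wise decoding through the .str accessor
decoded = ser.str.decode('utf-8')

# The same decoding done with plain Python on each bytes object
expected = [value.decode('utf-8') for value in raw]

print(decoded.tolist())
print(decoded.tolist() == expected)
```

Both approaches should yield the same strings; the accessor simply applies the decoding to every element of the Series.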
Syntax
Following is the syntax of the Pandas Series.str.decode() method −
Series.str.decode(encoding, errors='strict')
Parameters
The Series.str.decode() method accepts the following parameters −
encoding − A string representing the name of the encoding used to decode the bytes.
errors − An optional string specifying the error handling scheme. The default is 'strict', which raises a UnicodeDecodeError when the bytes cannot be decoded with the given encoding. Other options include 'ignore', 'replace', and 'backslashreplace'.
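The effect of the errors parameter can be sketched as follows. The byte string used here is an assumption chosen for illustration: it is Latin-1 encoded, so it is not valid UTF-8 and triggers the error handling.

```python
import pandas as pd

# b'caf\xe9' is Latin-1 encoded, so it is not valid UTF-8
ser = pd.Series([b'caf\xe9'])

# errors='strict' is the default and raises on invalid bytes
try:
    ser.str.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'ignore' drops the invalid byte
ignored = ser.str.decode('utf-8', errors='ignore')

# 'replace' substitutes the replacement character U+FFFD
replaced = ser.str.decode('utf-8', errors='replace')

# 'backslashreplace' keeps the byte as an escape sequence
escaped = ser.str.decode('utf-8', errors='backslashreplace')

print(strict_failed)
print(ignored[0], replaced[0], escaped[0])
```

Choosing 'replace' or 'backslashreplace' keeps a visible trace of the bad bytes, while 'ignore' silently discards them, which can hide data quality problems.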
Return Value
The Series.str.decode() method returns a Series or Index of the same type as the calling object, containing the decoded strings.
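The following short sketch illustrates the Index case, assuming an Index built from byte strings (the name idx and its contents are made up for this illustration). The result is expected to be an Index rather than a Series.

```python
import pandas as pd

# Illustrative Index of ASCII byte strings
idx = pd.Index([b'a', b'b', b'c'])

# Decoding through the .str accessor of the Index
decoded_idx = idx.str.decode('ascii')

print(isinstance(decoded_idx, pd.Index))
print(list(decoded_idx))
```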
Example
In this example, we demonstrate the basic usage of the Series.str.decode() method by decoding a Series of byte strings using the 'ascii' encoding.
import pandas as pd
# Create a Series of byte strings
ser = pd.Series([b'Tutorialspoint', b'123', b'$'])
# Decode byte strings using 'ascii' encoding
result = ser.str.decode('ascii')
print("Input Series:")
print(ser)
print("\nSeries after calling str.decode('ascii'):")
print(result)
When we run the above code, it produces the following output −
Input Series:
0 b'Tutorialspoint'
1 b'123'
2 b'$'
dtype: object
Series after calling str.decode('ascii'):
0 Tutorialspoint
1 123
2 $
dtype: object
Example
This example demonstrates how to use the Series.str.decode() method to decode a column of byte strings in a DataFrame using the 'utf-8' encoding.
import pandas as pd
# Create a DataFrame with a column of byte strings
df = pd.DataFrame({ 'COLUMN1': [b'\xc2\xa9', b'\xe2\x82\xac', b'\xf0\x9f\x87\x80'] })
# Decode byte strings using 'utf-8' encoding
result = df['COLUMN1'].str.decode("utf-8")
print("Input DataFrame:")
print(df)
print("\nDataFrame column after calling str.decode('utf-8'):")
print(result)
Following is the output of the above code −
Input DataFrame:
COLUMN1
0 b'\xc2\xa9'
1 b'\xe2\x82\xac'
2 b'\xf0\x9f\x87\x80'
DataFrame column after calling str.decode('utf-8'):
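0 ©
1 €
2
Name: COLUMN1, dtype: object
The first two values decode to the copyright sign and the euro sign. The third value decodes to the code point U+1F1C0, which is not assigned to a character in Unicode, so it typically displays as a blank or a placeholder glyph.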
Example
This example decodes a Series of UTF-8 encoded byte strings that represent check mark and cross symbols.
import pandas as pd
# Create a Series of UTF-8 encoded byte strings (check marks and a cross symbol)
ser = pd.Series([b'\xe2\x9c\x94', b'\xe2\x9c\x93', b'\xe2\x9c\x9c'])
# Decode byte strings using 'utf-8' encoding
result = ser.str.decode('utf-8')
print("Input Series:")
print(ser)
print("\nSeries after calling str.decode('utf-8'):")
print(result)
Following is the output of the above code −
Input Series:
0 b'\xe2\x9c\x94'
1 b'\xe2\x9c\x93'
2 b'\xe2\x9c\x9c'
dtype: object
Series after calling str.decode('utf-8'):
0 ✔
1 ✓
2 ✜
dtype: object