What is the basic operation of pandas Series.factorize() function?


The pandas Series.factorize() method encodes a series object as an enumerated type or categorical variable. Each distinct value is replaced by an integer code, which gives a numeric representation of the series data.

The method returns a tuple with two elements: codes, an integer array holding one code per element of the series, and uniques, the distinct values that those codes refer to.
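To make the relationship between codes and uniques concrete, here is a minimal sketch. The sample values are made up for illustration, and it assumes that looking the codes up in uniques with Index.take() is an acceptable way to recover the original data when there are no missing values.

```python
import pandas as pd

# Illustrative data with repeated values (not taken from the examples below)
s = pd.Series(["b", "a", "b", "c", "a"])

# factorize() returns a (codes, uniques) tuple
codes, uniques = s.factorize()

# Equal values share the same integer code
print(codes)
print(uniques)

# Looking each code up in uniques gives back the original values
restored = uniques.take(codes)
print(list(restored))
```

Because every code is a position in uniques, the pair carries the same information as the original values, just in a more compact form.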

Example 1

In the following example, we will see how the Series.factorize() method encodes the elements of a series object.

# importing pandas package
import pandas as pd

# create a series
s = pd.Series({"A": "aa", "B": "bb", "C": "cc"})
print(s)

result = s.factorize()
print(result)

Explanation

Here the series object is created from a Python dictionary: the keys become the index labels and the values become the data.

Output

The output is given below −

A    aa
B    bb
C    cc
dtype: object
(array([0, 1, 2], dtype=int32), Index(['aa', 'bb', 'cc'], dtype='object'))

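In the output, we can see that the Series.factorize() method has encoded the data of the series object “s”: the three distinct values 'aa', 'bb' and 'cc' received the codes 0, 1 and 2 in order of appearance, and the uniques are returned as an Index.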
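By default the codes follow the order in which values first appear. As a small sketch with made-up data, the sort parameter of factorize() can be used to number the values in sorted order instead.

```python
import pandas as pd

# Illustrative data whose first-appearance order differs from sorted order
s = pd.Series(["cc", "aa", "cc", "bb"])

# Default: codes follow the order of first appearance
codes, uniques = s.factorize()
print(codes, list(uniques))

# sort=True: uniques are sorted and the codes are adjusted to match
sorted_codes, sorted_uniques = s.factorize(sort=True)
print(sorted_codes, list(sorted_uniques))
```

In both cases the codes still point at the right entries of uniques; only the numbering changes.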

Example 2

In the following example, we will see how the Series.factorize() method encodes the elements of a series that contains missing values.

# importing pandas package
import pandas as pd

# create a series
s = pd.Series([70, 52, None, 79, 34], index=list('ijklm'))
print(s)

result = s.factorize()
print(result)

Output

The output is given below −

i    70.0
j    52.0
k     NaN
l    79.0
m    34.0
dtype: float64

(array([ 0, 1, -1, 2, 3], dtype=int32), Float64Index([70.0, 52.0, 79.0, 34.0], dtype='float64'))

As we can see in the above output block, the Series.factorize() method has encoded the data of the given series object as integer codes. We can also observe that the missing value (NaN) has been assigned the code -1 and does not appear in the uniques.
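Since missing values are marked with -1, the codes can also be used to locate them. The following is a minimal sketch that assumes the default settings of factorize() and uses a slightly extended version of the series above (the extra label 'n' is added for illustration).

```python
import pandas as pd

# Series with two missing values; the label 'n' is an illustrative addition
s = pd.Series([70, 52, None, 79, 34, None], index=list("ijklmn"))

codes, uniques = s.factorize()

# With default settings, missing values get the code -1
missing_mask = codes == -1

# Count the missing entries and list their index labels
print(missing_mask.sum())
print(s.index[missing_mask].tolist())
```

Note that -1 is not a valid position in uniques, so these entries should be filtered out before the codes are used to look values up.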

Updated on: 07-Mar-2022