 
What is the basic operation of pandas Series.factorize() function?
The pandas Series.factorize() method encodes a series object as an enumerated type or categorical variable. It produces a numeric representation of the series data in which each distinct value is replaced by an integer code.
The method returns a tuple with two elements: the first holds the codes (one integer per element of the series) and the second holds the uniques (the distinct values that the codes refer to).
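The relationship between the two elements can be illustrated with a small sketch. The series values and the variable names (codes, uniques, restored) are chosen here for illustration only; the point is that each code is a position in the uniques, so the original values can be rebuilt from the pair.

```python
import pandas as pd

# A series with repeated values (illustrative data)
s = pd.Series(["b", "a", "b", "c", "a"])

# Unpack the returned tuple into its two elements
codes, uniques = s.factorize()

# codes has one integer per element; equal values share the same code
print(codes)
# uniques lists each distinct value once, in order of first appearance
print(uniques)

# Each code is a position in uniques, so the values can be rebuilt
restored = uniques.take(codes)
print(list(restored))
```

Repeated values such as "b" and "a" receive the same code each time they occur, which is what makes the result usable as a categorical encoding.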
Example 1
In the following example, we will see how the Series.factorize() method encodes the elements of a series object.
# importing pandas package
import pandas as pd
# create a series
s = pd.Series({'A':"aa", 'B':"bb", "C":"cc"})
print(s)
result = s.factorize()
print(result)
Explanation
Here the series object is created from a Python dictionary; the dictionary keys become the index labels and the values become the data.
Output
The output is given below −
A    aa
B    bb
C    cc
dtype: object
(array([0, 1, 2], dtype=int32), Index(['aa', 'bb', 'cc'], dtype='object'))
In the output, the first element of the tuple holds the integer codes (0, 1, 2) and the second holds the unique values ('aa', 'bb', 'cc') of the series object "s". Each code is the position of the corresponding value in the uniques.
Example 2
In the following example, we will see how the Series.factorize() method encodes the elements of a series that contains missing values.
# importing pandas package
import pandas as pd
# create a series
s = pd.Series([70, 52, None, 79, 34], index=list('ijklm'))
print(s)
result = s.factorize()
print(result)
Output
The output is given below −
i    70.0
j    52.0
k     NaN
l    79.0
m    34.0
dtype: float64
(array([ 0, 1, -1, 2, 3], dtype=int32), Float64Index([70.0, 52.0, 79.0, 34.0], dtype='float64'))
As the output shows, the Series.factorize() method has encoded the values of the series as integer codes. The missing value (NaN) is assigned the code -1 and does not appear among the uniques. The exact dtype and index class names in the printout can vary with the platform and the pandas version.
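Because missing values are marked with the code -1, the codes can also be used to locate them. The following is a minimal sketch that reuses the series from Example 2; the name missing_mask is introduced here for illustration and is not part of pandas.

```python
import pandas as pd

# Same data as Example 2: one missing value at label 'k'
s = pd.Series([70, 52, None, 79, 34], index=list("ijklm"))

codes, uniques = s.factorize()

# Positions whose code is -1 correspond to missing values
missing_mask = codes == -1
print(missing_mask.tolist())

# The mask agrees with what Series.isna() reports
print(s.isna().tolist())

# The missing value is not included in the uniques
print(list(uniques))
```

Comparing the mask with Series.isna() is only a sanity check; in practice either one can be used to find the missing entries.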
