Write a program in Python to print numeric index array with sorted distinct values in a given series
When working with a pandas Series, you often need to convert categorical data into numeric indices. The pd.factorize() function assigns an integer code to each distinct value and can optionally sort the distinct values first. It accepts a list, a NumPy array, or a Series; the examples below use a plain Python list.
Understanding pd.factorize()
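The pd.factorize() function returns a tuple of two objects:
codes − an integer array giving, for each element, the position of its value in uniques
uniques − the distinct values (a NumPy array for list input, a pandas Index for Series input)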
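Because the title talks about a Series, here is a minimal sketch that passes a pandas Series instead of a list (the sample data is the fruit list used throughout this article). It also shows how codes and uniques relate: taking the uniques at the positions given by codes rebuilds the original values.

```python
import pandas as pd

# Sample data from the article, wrapped in a pandas Series
s = pd.Series(['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate'])

codes, uniques = pd.factorize(s)

# codes is a NumPy integer array; for Series input, uniques is a pandas Index
print(codes)          # [0 1 2 1 0 3 4]
print(list(uniques))  # ['mango', 'orange', 'apple', 'kiwi', 'pomegranate']

# Looking up each code in uniques rebuilds the original sequence of values
restored = uniques.take(codes)
print(list(restored) == list(s))  # True
```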
Without Sorting
By default, pd.factorize() assigns indices in the order in which each value first appears:
import pandas as pd
fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
index, unique_values = pd.factorize(fruits)
print("Without sorting of distinct values - numeric array index")
print(index)
print(unique_values)
Without sorting of distinct values - numeric array index
[0 1 2 1 0 3 4]
['mango' 'orange' 'apple' 'kiwi' 'pomegranate']
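To make the "order of first appearance" rule concrete, the following is a plain-Python sketch of the same idea. The helper name factorize_first_seen is hypothetical and is not part of pandas; it also ignores missing values, which pandas handles specially. It simply hands out the next unused integer whenever a new value is seen, then checks the result against pd.factorize().

```python
import pandas as pd


def factorize_first_seen(values):
    """Hypothetical helper: assign integer codes in order of first appearance."""
    mapping = {}  # value -> code
    codes = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)  # next unused integer
        codes.append(mapping[v])
    # dicts preserve insertion order (Python 3.7+), so keys are in first-seen order
    return codes, list(mapping)


fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
codes, uniques = factorize_first_seen(fruits)
print(codes)    # [0, 1, 2, 1, 0, 3, 4]
print(uniques)  # ['mango', 'orange', 'apple', 'kiwi', 'pomegranate']

# Compare with pandas
pd_codes, pd_uniques = pd.factorize(fruits)
print(codes == pd_codes.tolist() and uniques == pd_uniques.tolist())  # True
```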
With Sorting
Setting sort=True sorts the unique values (alphabetically, for strings) and reassigns the indices to match:
import pandas as pd
fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
sorted_index, unique_values = pd.factorize(fruits, sort=True)
print("Sorted distinct values - numeric array index")
print(sorted_index)
print(unique_values)
Sorted distinct values - numeric array index
[2 3 0 3 2 1 4]
['apple' 'kiwi' 'mango' 'orange' 'pomegranate']
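The sorted variant can be sketched the same way: sort the distinct values first, then map each element to its position in that sorted list. As before, factorize_sorted is a hypothetical illustration rather than how pandas implements it internally, and it does not handle missing values.

```python
import pandas as pd


def factorize_sorted(values):
    """Hypothetical helper: codes are positions in the sorted list of distinct values."""
    uniques = sorted(set(values))                      # distinct values, sorted
    position = {v: i for i, v in enumerate(uniques)}   # value -> sorted position
    codes = [position[v] for v in values]
    return codes, uniques


fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
codes, uniques = factorize_sorted(fruits)
print(codes)    # [2, 3, 0, 3, 2, 1, 4]
print(uniques)  # ['apple', 'kiwi', 'mango', 'orange', 'pomegranate']

# Compare with pandas
pd_codes, pd_uniques = pd.factorize(fruits, sort=True)
print(codes == pd_codes.tolist() and uniques == pd_uniques.tolist())  # True
```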
Complete Example
Here's a complete program demonstrating both approaches:
import pandas as pd
fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
# Without sorting
index, unique_values = pd.factorize(fruits)
print("Without sorting of distinct values - numeric array index")
print("Indices:", index)
print("Unique values:", unique_values)
print()
# With sorting
sorted_index, sorted_unique_values = pd.factorize(fruits, sort=True)
print("Sorted distinct values - numeric array index")
print("Indices:", sorted_index)
print("Unique values:", sorted_unique_values)
Without sorting of distinct values - numeric array index
Indices: [0 1 2 1 0 3 4]
Unique values: ['mango' 'orange' 'apple' 'kiwi' 'pomegranate']

Sorted distinct values - numeric array index
Indices: [2 3 0 3 2 1 4]
Unique values: ['apple' 'kiwi' 'mango' 'orange' 'pomegranate']
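The same operation is also available as a method on the Series itself. The sketch below uses a small made-up Series that contains a missing entry: by default, missing values are not added to uniques and receive the code -1, so any code that reads -1 should be treated as "no value".

```python
import pandas as pd

# Made-up sample Series; None is treated as a missing value
s = pd.Series(['mango', None, 'apple', 'mango'])

# Series.factorize works like pd.factorize(s, sort=True)
codes, uniques = s.factorize(sort=True)
print(codes)          # [ 1 -1  0  1]
print(list(uniques))  # ['apple', 'mango']
```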
Key Points
Default behavior assigns indices based on order of first appearance
sort=True sorts the unique values (alphabetically for strings) and reassigns indices to match
Useful for converting categorical data to numeric format
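A common use of this conversion is adding a numeric code column to a DataFrame. In this sketch the DataFrame, the qty column, and the column name fruit_code are all illustrative assumptions. The returned labels let you decode the codes back to the original strings.

```python
import pandas as pd

# Illustrative DataFrame; the qty values are made up
df = pd.DataFrame({
    'fruit': ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate'],
    'qty': [3, 5, 2, 1, 4, 6, 2],
})

# Encode the fruit column as integers that follow sorted label order
codes, labels = pd.factorize(df['fruit'], sort=True)
df['fruit_code'] = codes
print(df)

# Decode: look up each code in the labels Index
decoded = labels.take(df['fruit_code'].to_numpy())
print(list(decoded) == df['fruit'].tolist())  # True
```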
Conclusion
The pd.factorize() function converts categorical data into integer codes together with the array of distinct values those codes refer to. Use sort=True when you need the unique values in sorted order, with codes that follow that order.
