Write a program in Python to print a numeric index array with sorted distinct values in a given series

When working with a pandas Series or a list of labels, you often need to convert categorical data into numeric indices. The pd.factorize() function assigns an integer code to each distinct value and can optionally sort the distinct values (alphabetically, for strings) before assigning the codes.
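The title asks about a series, while the examples below pass a plain Python list. As a minimal sketch, assuming pandas is installed, the same call also works on a pandas Series. The difference is that uniques then comes back as a pandas Index rather than a NumPy array:

```python
import pandas as pd

# A pandas Series holding the same fruit names used throughout this article
fruits = pd.Series(['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate'])

# With a Series as input, codes is a NumPy array and uniques is a pandas Index
codes, uniques = pd.factorize(fruits, sort=True)

print(codes)          # [2 3 0 3 2 1 4]
print(list(uniques))  # ['apple', 'kiwi', 'mango', 'orange', 'pomegranate']
```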

Understanding pd.factorize()

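The pd.factorize() function returns a tuple of two objects:

  • codes − an integer array giving, for each input element, the position of its value in uniques

  • uniques − the distinct values (a NumPy array for list input, a pandas Index for Series input)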
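Because each code is a position in uniques, indexing uniques with codes should rebuild the original sequence. A small sketch to check that relationship, using the same fruit list as the examples that follow:

```python
import pandas as pd

fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
codes, uniques = pd.factorize(fruits)

# Each code is an index into uniques, so NumPy fancy indexing
# maps the codes back to the original labels
restored = uniques[codes]
print(restored.tolist())  # ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
```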

Without Sorting

By default, pd.factorize() assigns indices in the order in which each distinct value first appears:

import pandas as pd

fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
index, unique_values = pd.factorize(fruits)

print("Without sorting of distinct values - numeric array index")
print(index)
print(unique_values)
Output

Without sorting of distinct values - numeric array index
[0 1 2 1 0 3 4]
['mango' 'orange' 'apple' 'kiwi' 'pomegranate']
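One behavior worth knowing before relying on the codes: missing values are not treated as a distinct value. In the sketch below (the list containing None is a made-up input), the missing entry receives the code -1 and does not appear in uniques:

```python
import pandas as pd

# Hypothetical input where None marks a missing value
fruits = ['mango', None, 'apple', 'mango']
codes, uniques = pd.factorize(fruits)

print(codes)    # [ 0 -1  1  0]
print(uniques)  # ['mango' 'apple']
```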

With Sorting

Setting sort=True sorts the distinct values (alphabetically, for strings) and assigns indices according to that sorted order:

import pandas as pd

fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']
sorted_index, unique_values = pd.factorize(fruits, sort=True)

print("Sorted distinct values - numeric array index")
print(sorted_index)
print(unique_values)
Output

Sorted distinct values - numeric array index
[2 3 0 3 2 1 4]
['apple' 'kiwi' 'mango' 'orange' 'pomegranate']
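To see what sort=True computes, its result can be compared against NumPy's np.unique() with return_inverse=True, which also returns the sorted distinct values plus each element's position among them. This is only a cross-check sketch, not a description of how pandas implements the option:

```python
import numpy as np
import pandas as pd

fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']

codes, uniques = pd.factorize(fruits, sort=True)

# np.unique returns the sorted distinct values; return_inverse=True adds
# the position of each input element within that sorted array
np_uniques, np_codes = np.unique(fruits, return_inverse=True)

print(np.array_equal(codes, np_codes))        # True
print(list(uniques) == np_uniques.tolist())   # True
```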

Complete Example

Here's a complete program demonstrating both approaches:

import pandas as pd

fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']

# Without sorting
index, unique_values = pd.factorize(fruits)
print("Without sorting of distinct values - numeric array index")
print("Indices:", index)
print("Unique values:", unique_values)
print()

# With sorting
sorted_index, sorted_unique_values = pd.factorize(fruits, sort=True)
print("Sorted distinct values - numeric array index")
print("Indices:", sorted_index)
print("Unique values:", sorted_unique_values)
Output

Without sorting of distinct values - numeric array index
Indices: [0 1 2 1 0 3 4]
Unique values: ['mango' 'orange' 'apple' 'kiwi' 'pomegranate']

Sorted distinct values - numeric array index
Indices: [2 3 0 3 2 1 4]
Unique values: ['apple' 'kiwi' 'mango' 'orange' 'pomegranate']
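If the data already lives in a pandas Series, the method form Series.factorize() accepts the same sort argument. The sketch below, with a DataFrame built purely for display, stores the sorted codes next to the original labels:

```python
import pandas as pd

fruits = pd.Series(['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate'])

# Series.factorize() is the method form of pd.factorize()
codes, uniques = fruits.factorize(sort=True)

# Put each label next to its numeric code for inspection
df = pd.DataFrame({'fruit': fruits, 'code': codes})
print(df)
```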

Key Points

  • By default, indices follow the order in which values first appear

  • sort=True sorts the distinct values (alphabetically, for strings) and assigns indices in that sorted order

  • Useful for converting categorical labels into integer codes for numeric processing
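For the last point, pd.Categorical is a closely related tool. When no categories are passed, it infers them in sorted order, so in this sketch its codes should match pd.factorize(..., sort=True) on the same list:

```python
import pandas as pd

fruits = ['mango', 'orange', 'apple', 'orange', 'mango', 'kiwi', 'pomegranate']

# Categories are inferred (and sorted) because none are given explicitly
cat = pd.Categorical(fruits)
codes, uniques = pd.factorize(fruits, sort=True)

print(list(cat.categories))  # ['apple', 'kiwi', 'mango', 'orange', 'pomegranate']
print(cat.codes.tolist())    # [2, 3, 0, 3, 2, 1, 4]
```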

Conclusion

The pd.factorize() function converts categorical data into numeric indices. Use sort=True when you need the distinct values in sorted (alphabetical) order, with indices that follow that order.
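One caveat on "alphabetical" order: strings are compared by Unicode code point, so uppercase letters sort before lowercase ones. The sketch below uses a hypothetical mixed-case list and, as one possible workaround, lower-cases the values before factorizing:

```python
import pandas as pd

# Hypothetical mixed-case input
names = ['banana', 'Apple', 'apple']

codes, uniques = pd.factorize(names, sort=True)
print(uniques)  # ['Apple' 'apple' 'banana']
print(codes)    # [2 0 1]

# Normalise case first if 'Apple' and 'apple' should share one code
codes_ci, uniques_ci = pd.factorize([n.lower() for n in names], sort=True)
print(uniques_ci)  # ['apple' 'banana']
print(codes_ci)    # [1 0 0]
```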

Updated on: 2026-03-25T16:31:49+05:30
