Python Pandas - Compute indexer and mask for new index even for non-uniquely valued objects
To compute the indexer and mask for a new index when the existing index contains duplicate values, use the index.get_indexer_non_unique() method. It returns the positions of every match in the index, along with the positions in the target whose values were not found.
Syntax
The syntax for the get_indexer_non_unique() method is:
index.get_indexer_non_unique(target)
Parameters
- target − An array-like object containing the values to search for in the index
Return Value
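The method returns a tuple containing:
- indexer − Array of integer positions in the index for each match, in target order. A target value that occurs several times in the index contributes one entry per occurrence, and a value that is not found contributes a single -1
- missing − Array of positions in the target whose values were not found in the index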
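As a minimal sketch of the two return values, the snippet below uses a small made-up index (the values 5, 7, 7, 9 and the names found_mask and matched are illustrative assumptions, not part of the original example). It also shows one way to derive a boolean mask from the indexer, since the method itself returns positions rather than a mask.

```python
import pandas as pd

# Hypothetical small index with one duplicated value (7)
index = pd.Index([5, 7, 7, 9])

indexer, missing = index.get_indexer_non_unique([7, 1, 5])

# 7 matches positions 1 and 2, 1 is absent (-1), 5 matches position 0
print(indexer)   # [ 1  2 -1  0]
print(missing)   # [1]

# Derive a boolean mask of successful matches from the indexer
found_mask = indexer != -1

# Use the matched positions to pull the corresponding index values
matched = index.take(indexer[found_mask])
print(matched.tolist())   # [7, 7, 5]
```

Note that the indexer can be longer than the target, because each duplicate in the index produces its own entry.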
Example
Let's create a Pandas index with duplicate values and compute the indexer and the missing positions:
import pandas as pd
# Creating Pandas index with some non-unique values
index = pd.Index([10, 20, 30, 40, 40, 50, 60, 60, 60, 70])
# Display the Pandas index
print("Pandas Index...\n", index)
# Return the number of elements in the index
print("\nNumber of elements in the index...\n", index.size)
# Compute indexer and mask for target values
# Returns (-1) for values not found in index
# Handles non-unique values by returning all matching positions
target_values = [30, 40, 90, 100, 50, 60]
indexer, missing = index.get_indexer_non_unique(target_values)
print("\nTarget values:", target_values)
print("Indexer array:", indexer)
print("Missing indices:", missing)
Output
Pandas Index...
 Int64Index([10, 20, 30, 40, 40, 50, 60, 60, 60, 70], dtype='int64')

Number of elements in the index...
 10

Target values: [30, 40, 90, 100, 50, 60]
Indexer array: [ 2  3  4 -1 -1  5  6  7  8]
Missing indices: [2 3]
How It Works
The method processes each target value as follows:
- Value 30 − Found at index position 2
- Value 40 − Found at positions 3 and 4 (both occurrences returned)
- Value 90 − Not found, marked as -1
- Value 100 − Not found, marked as -1
- Value 50 − Found at index position 5
- Value 60 − Found at positions 6, 7, and 8 (all occurrences returned)
The missing array [2, 3] indicates that the 3rd and 4th elements in the target array (90 and 100) were not found.
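The walkthrough above can be reproduced by hand. The following sketch rebuilds the indexer with a plain loop over the target values and compares it with the method's result. The loop and the names manual and positions are illustrative assumptions meant to show the behaviour, not how pandas implements the method internally.

```python
import numpy as np
import pandas as pd

index = pd.Index([10, 20, 30, 40, 40, 50, 60, 60, 60, 70])
target_values = [30, 40, 90, 100, 50, 60]

indexer, missing = index.get_indexer_non_unique(target_values)

# Rebuild the indexer manually: for each target value, collect every
# position where it occurs in the index, or -1 if it does not occur
manual = []
for value in target_values:
    positions = np.flatnonzero(index == value)
    manual.extend(positions.tolist() if len(positions) else [-1])

print(manual)             # [2, 3, 4, -1, -1, 5, 6, 7, 8]
print(indexer.tolist())   # [2, 3, 4, -1, -1, 5, 6, 7, 8]
print(missing.tolist())   # [2, 3]
```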
Key Points
- Unlike get_indexer(), which raises an error when the index contains duplicates, this method works on non-unique indexes
- Returns all matching positions for duplicate values
- Missing values are marked with -1 in the indexer array
- The missing array contains indices of target values that weren't found
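To illustrate the contrast with get_indexer(), the sketch below calls both methods on the same non-unique index. It assumes a reasonably recent pandas release in which the exception class is importable as pandas.errors.InvalidIndexError. The variable names are illustrative.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

index = pd.Index([10, 20, 30, 40, 40, 50, 60, 60, 60, 70])
target_values = [30, 40, 90]

# get_indexer() requires a uniquely valued index
try:
    index.get_indexer(target_values)
    raised = False
except InvalidIndexError as exc:
    raised = True
    print("get_indexer failed:", exc)

# get_indexer_non_unique() accepts the same index and target
indexer, missing = index.get_indexer_non_unique(target_values)
print(indexer.tolist())   # [2, 3, 4, -1]
print(missing.tolist())   # [2]
```

If the index is known to be unique, get_indexer() is the simpler choice because it returns exactly one position per target value.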
Conclusion
The get_indexer_non_unique() method is essential for handling non-unique index values in Pandas. It returns both the positions of matches and identifies missing values, making it ideal for data alignment operations with duplicate entries.
