Python – Remove Columns of Duplicate Elements
When working with lists of lists, you may need to remove columns that contain duplicate elements within each row. Python provides an elegant solution using sets to track duplicates and list comprehension to filter columns.
Understanding the Problem
Given a list of lists, we want to remove every column (index position) at which some row contains an element that already appeared earlier in that same row.
Solution Using Set-Based Duplicate Detection
The approach uses a helper function that identifies duplicate positions within each row, then filters out those columns:
from itertools import chain

def find_duplicate_positions(row):
    seen = set()
    for i, elem in enumerate(row):
        if elem not in seen:
            seen.add(elem)
        else:
            yield i

# Sample data - list of lists
data = [[5, 1, 6, 7, 9], [6, 3, 1, 9, 1], [4, 2, 9, 8, 9], [5, 1, 6, 7, 3]]
print("Original list:")
print(data)

# Find all duplicate positions across all rows
duplicate_positions = set(chain.from_iterable(find_duplicate_positions(row) for row in data))

# Remove columns at duplicate positions
result = [[elem for i, elem in enumerate(row) if i not in duplicate_positions] for row in data]
print("After removing duplicate columns:")
print(result)
Original list:
[[5, 1, 6, 7, 9], [6, 3, 1, 9, 1], [4, 2, 9, 8, 9], [5, 1, 6, 7, 3]]
After removing duplicate columns:
[[5, 1, 6, 7], [6, 3, 1, 9], [4, 2, 9, 8], [5, 1, 6, 7]]
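The helper can also be exercised on a single row in isolation — a quick sketch, redefining the generator from the listing above:

```python
def find_duplicate_positions(row):
    # Yield the index of each element that repeats an earlier element of the row
    seen = set()
    for i, elem in enumerate(row):
        if elem not in seen:
            seen.add(elem)
        else:
            yield i

# The second sample row, [6, 3, 1, 9, 1], repeats the value 1 at index 4
print(list(find_duplicate_positions([6, 3, 1, 9, 1])))  # [4]
```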
How It Works
The algorithm works in three main steps:
- Duplicate Detection: for each row, the function tracks seen elements in a set and yields the positions where duplicates occur
- Position Aggregation: the duplicate positions from all rows are collected into a single set using chain.from_iterable()
- Column Filtering: a list comprehension builds new rows, excluding elements at the duplicate positions
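The three steps above can be cross-checked with a slicing-based sketch that tests each column directly (an alternative formulation, not the article's method, assuming all rows have equal length):

```python
data = [[5, 1, 6, 7, 9], [6, 3, 1, 9, 1], [4, 2, 9, 8, 9], [5, 1, 6, 7, 3]]

# A position is a duplicate position if, in some row, the element there
# already appeared earlier in that same row
def is_duplicate_position(i):
    return any(row[i] in row[:i] for row in data)

keep = [i for i in range(len(data[0])) if not is_duplicate_position(i)]
result = [[row[i] for i in keep] for row in data]
print(result)  # [[5, 1, 6, 7], [6, 3, 1, 9], [4, 2, 9, 8], [5, 1, 6, 7]]
```

This variant trades the O(1) set membership test for an O(n) slice scan per element, so it suits short rows where readability matters more than speed.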
Alternative Approach Using Pandas
For tabular data, pandas offers a DataFrame-based solution:
import pandas as pd

# Convert to DataFrame
data = [[5, 1, 6, 7, 9], [6, 3, 1, 9, 1], [4, 2, 9, 8, 9], [5, 1, 6, 7, 3]]
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Collect the columns where any row holds a repeated value
duplicate_cols = set()
for idx, row in df.iterrows():
    duplicate_cols.update(row[row.duplicated()].index)

# Keep only the remaining columns
clean_df = df.drop(columns=duplicate_cols)
print("\nAfter removing duplicate columns:")
print(clean_df)
Original DataFrame:
   0  1  2  3  4
0  5  1  6  7  9
1  6  3  1  9  1
2  4  2  9  8  9
3  5  1  6  7  3

After removing duplicate columns:
   0  1  2  3
0  5  1  6  7
1  6  3  1  9
2  4  2  9  8
3  5  1  6  7
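The iterrows() loop can also be replaced with a loop-free sketch that applies Series.duplicated along each row and masks the affected columns (a variant of the approach above, assuming pandas is installed):

```python
import pandas as pd

data = [[5, 1, 6, 7, 9], [6, 3, 1, 9, 1], [4, 2, 9, 8, 9], [5, 1, 6, 7, 3]]
df = pd.DataFrame(data)

# Per row, mark cells that repeat an earlier value in that row,
# then drop every column that contains such a cell
dup_mask = df.apply(pd.Series.duplicated, axis=1).any(axis=0)
clean_df = df.loc[:, ~dup_mask]
print(clean_df)
```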
Conclusion
Use set-based tracking with a list comprehension for simple duplicate-column removal on plain lists of lists. For tabular data, pandas expresses the same operation through DataFrame methods such as duplicated() and drop().
