Python - Remove elements of a list that are repeated fewer than k times
In data processing, we often need to filter out elements that don't meet a frequency threshold. This article shows how to remove elements from a list that appear fewer than k times using Python.
Problem Definition
Given a list of elements and a threshold value k, we need to remove all elements whose frequency is less than k. For example, if k=3, only elements appearing 3 or more times should remain.
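To make the definition concrete, here is a minimal sketch that applies it literally with the built-in `list.count`. The function name `keep_frequent_naive` is a hypothetical helper introduced for illustration, and this version rescans the list for every element (quadratic time), so it is only meant as a baseline for small inputs.

```python
def keep_frequent_naive(items, k):
    """Keep only elements that occur at least k times (naive O(n^2) baseline)."""
    # list.count rescans the whole list for every element
    return [item for item in items if items.count(item) >= k]


sample = ['a', 'b', 'a', 'c', 'a', 'b']
print(keep_frequent_naive(sample, 3))  # ['a', 'a', 'a']
print(keep_frequent_naive(sample, 2))  # ['a', 'b', 'a', 'a', 'b']
```

Counting each element once up front avoids the repeated scans.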
Algorithm
Step 1: Count the frequency of each element using a dictionary or Counter
Step 2: Create a new list containing only elements with frequency ≥ k
Step 3: Return the filtered list
Using Dictionary and List Comprehension
This approach manually counts element frequencies using a dictionary:
def remove_less_frequent(original_list, k):
    # Count frequency of each element
    element_count = {}
    for element in original_list:
        if element in element_count:
            element_count[element] += 1
        else:
            element_count[element] = 1
    # Filter elements with frequency >= k
    filtered_list = [element for element in original_list
                     if element_count[element] >= k]
    return filtered_list

# Example usage
numbers = [1, 0, 1, 1, 2, 3, 2, 2, 3, 3, 4, 5, 4, 4, 4, 5]
k_value = 3
result = remove_less_frequent(numbers, k_value)
print("Original list:", numbers)
print("Filtered list (k=3):", result)
Original list: [1, 0, 1, 1, 2, 3, 2, 2, 3, 3, 4, 5, 4, 4, 4, 5]
Filtered list (k=3): [1, 1, 1, 2, 3, 2, 2, 3, 3, 4, 4, 4, 4]
Using Counter from Collections Module
The Counter class provides a cleaner approach for counting element frequencies:
from collections import Counter

def remove_less_frequent_counter(original_list, k):
    # Count frequencies using Counter
    element_counter = Counter(original_list)
    # Filter elements with frequency >= k
    filtered_list = [element for element in original_list
                     if element_counter[element] >= k]
    return filtered_list

# Example usage
numbers = [8, 8, 6, 6, 8, 8, 4, 6, 4, 4, 33, 33, 1, 2, 1, 2, 2, 0]
k_value = 3
result = remove_less_frequent_counter(numbers, k_value)
print("Original list:", numbers)
print("Filtered list (k=3):", result)
print("Element frequencies:", dict(Counter(numbers)))
Original list: [8, 8, 6, 6, 8, 8, 4, 6, 4, 4, 33, 33, 1, 2, 1, 2, 2, 0]
Filtered list (k=3): [8, 8, 6, 6, 8, 8, 4, 6, 4, 4, 2, 2, 2]
Element frequencies: {8: 4, 6: 3, 4: 3, 33: 2, 1: 2, 2: 3, 0: 1}
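Both functions above return a new list and leave the input untouched. If the original list object itself should be modified, one possible variant uses slice assignment. This is a sketch under that assumption, and `remove_less_frequent_in_place` is a hypothetical name, not part of the original article.

```python
from collections import Counter


def remove_less_frequent_in_place(items, k):
    """Mutate items so that only elements occurring at least k times remain."""
    counts = Counter(items)
    # Slice assignment replaces the contents while keeping the same list object
    items[:] = [item for item in items if counts[item] >= k]


numbers = [8, 8, 6, 6, 8, 8, 4, 6, 4, 4, 33, 33, 1, 2, 1, 2, 2, 0]
alias = numbers  # a second reference to the same list
remove_less_frequent_in_place(numbers, 3)
print(alias)  # [8, 8, 6, 6, 8, 8, 4, 6, 4, 4, 2, 2, 2]
```

Because the list is changed in place, every reference to it (such as `alias` here) sees the filtered contents.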
Comparison
| Method | Code Length | Readability | Best For |
|---|---|---|---|
| Dictionary | Longer | Good | Understanding the counting logic |
| Counter | Shorter | Excellent | Clean, production code |
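The two methods differ only in how the counts are built, so they should always return the same result. The following sketch checks this on randomly generated data. The function names, the fixed seed, and the tested thresholds are illustrative choices rather than anything from the original article.

```python
import random
from collections import Counter


def filter_with_dict(items, k):
    """Dictionary-based counting, written compactly with dict.get."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return [item for item in items if counts[item] >= k]


def filter_with_counter(items, k):
    """Counter-based counting."""
    counts = Counter(items)
    return [item for item in items if counts[item] >= k]


random.seed(0)  # fixed seed so the check is repeatable
data = [random.randint(0, 9) for _ in range(1000)]
for k in (1, 50, 100, 200):
    assert filter_with_dict(data, k) == filter_with_counter(data, k)
print("Both approaches agree on all tested thresholds")
```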
Practical Example
Here's how you might use this in data preprocessing:
from collections import Counter

def filter_frequent_words(words, min_frequency=2):
    """Remove words that appear fewer than min_frequency times"""
    word_counts = Counter(words)
    return [word for word in words if word_counts[word] >= min_frequency]

# Example: filtering words in text processing
words = ['python', 'is', 'great', 'python', 'rocks', 'is', 'awesome', 'python']
frequent_words = filter_frequent_words(words, 2)
print("Original words:", words)
print("Frequent words (≥2):", frequent_words)
print("Word counts:", dict(Counter(words)))
Original words: ['python', 'is', 'great', 'python', 'rocks', 'is', 'awesome', 'python']
Frequent words (≥2): ['python', 'is', 'python', 'is', 'python']
Word counts: {'python': 3, 'is': 2, 'great': 1, 'rocks': 1, 'awesome': 1}
Conclusion
Use Counter from collections for clean, readable code when filtering elements by frequency. The dictionary approach helps in understanding the underlying counting mechanism. Both methods remove elements that appear fewer than k times while preserving the order of the remaining elements.
