Find top K frequent elements from a list of tuples in Python
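We have a list of (key, value) tuples, each wrapped in its own one-element list, and need the K tuples with the highest values. For example, if K is 3, we need the three tuples with the largest second elements. When a key appears more than once, its values can be summed first so that the key's total is ranked.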
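Because every element of the sample input is a one-item list, all three approaches below first flatten it into a plain list of tuples. A minimal sketch of that step, using `itertools.chain.from_iterable` and an equivalent list comprehension:

```python
from itertools import chain

# Each inner list wraps a single (day, value) tuple
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# chain.from_iterable yields the items of each inner list in turn; list() materialises them
flattened = list(chain.from_iterable(listA))

# The same flattening written as a nested list comprehension
flat_comp = [pair for inner in listA for pair in inner]

print(flattened)
# [('Mon', 126), ('Tue', 768), ('Wed', 512), ('Thu', 13), ('Fri', 341)]
```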
Using defaultdict and sorted
This approach uses a defaultdict to collect all values that share the same key, sums each group, and then sorts the totals in descending order to take the first K.
Example
```python
import collections
from operator import itemgetter
from itertools import chain

# Input list initialization
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Set K
K = 3

# Given list
print("Given list:")
print(listA)
print("K value:", K)

# Group the values by key, then sum each group
dict_ = collections.defaultdict(list)
new_list = list(chain.from_iterable(listA))
for elem in new_list:
    dict_[elem[0]].append(elem[1])
res = {k: sum(v) for k, v in dict_.items()}

# Sort by total value (descending) and keep the first K
res = sorted(res.items(), key=itemgetter(1), reverse=True)[:K]

# Output
print(f"Top {K} elements are:")
print(res)
```
Output
```
Given list:
[[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
K value: 3
Top 3 elements are:
[('Tue', 768), ('Wed', 512), ('Fri', 341)]
```
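When the goal is to sum values per key and keep the top K, `collections.Counter` can stand in for the defaultdict-plus-sum step, because `Counter.most_common(n)` already returns the n largest totals in descending order. This is a swapped-in alternative, not the method above, and it assumes the values are numbers. The second `'Mon'` entry in the data is hypothetical, added to show the summing:

```python
from collections import Counter
from itertools import chain

# Hypothetical input: 'Mon' appears twice so its values get summed
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)],
         [('Fri', 341)], [('Mon', 500)]]
K = 3

totals = Counter()
for day, value in chain.from_iterable(listA):
    totals[day] += value  # missing keys start at 0, so values accumulate per key

res = totals.most_common(K)
print(res)
# [('Tue', 768), ('Mon', 626), ('Wed', 512)]
```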
Using sorted and itemgetter Directly
This simpler approach flattens the nested list and sorts the tuples directly by their second element (the value) to get the top K results. It does not merge repeated keys, so it suits data in which each key appears only once.
Example
```python
from operator import itemgetter
from itertools import chain

# Input list initialization
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Set K
K = 3

# Given list
print("Given list:")
print(listA)
print("K value:", K)

# Flatten, sort by the second element (descending) and keep the first K
res = sorted(chain.from_iterable(listA),
             key=itemgetter(1), reverse=True)[:K]

# Output
print(f"Top {K} elements are:")
print(res)
```
Output
```
Given list:
[[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
K value: 3
Top 3 elements are:
[('Tue', 768), ('Wed', 512), ('Fri', 341)]
```
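Because this version ranks the raw tuples, a key that occurs more than once is not merged: each occurrence competes on its own value. A small sketch with a hypothetical duplicate `'Mon'` entry shows the effect. The grouping approach would rank `'Mon'` by its total of 626, whereas here only the single 500 entry makes the cut:

```python
from itertools import chain
from operator import itemgetter

# Hypothetical input with 'Mon' listed twice
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)],
         [('Fri', 341)], [('Mon', 500)]]
K = 3

# Plain sorting: every tuple is ranked independently, duplicates are not summed
res = sorted(chain.from_iterable(listA), key=itemgetter(1), reverse=True)[:K]
print(res)
# [('Tue', 768), ('Wed', 512), ('Mon', 500)]
```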
Using heapq for Better Performance
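When K is much smaller than the number of elements, heapq.nlargest() is usually faster than sorting the whole list: it keeps only the K best candidates in a heap, which takes O(n log K) time instead of O(n log n).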
Example
```python
import heapq
from itertools import chain

# Input list initialization
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Set K
K = 3

# Given list
print("Given list:")
print(listA)
print("K value:", K)

# Flatten the list and get top K using heapq
flattened = list(chain.from_iterable(listA))
res = heapq.nlargest(K, flattened, key=lambda x: x[1])

# Output
print(f"Top {K} elements are:")
print(res)
```
Output
```
Given list:
[[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
K value: 3
Top 3 elements are:
[('Tue', 768), ('Wed', 512), ('Fri', 341)]
```
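The idea behind the O(n log K) bound is a min-heap that never holds more than K items: each new element either displaces the smallest kept element or is discarded. The sketch below spells this out with `heapq.heappush` and `heapq.heapreplace`. `top_k_by_value` is a hypothetical helper written for illustration, not the library's own code; the negated index is an assumed tie-breaker that makes earlier elements win ties, matching a stable descending sort:

```python
import heapq
from itertools import chain

def top_k_by_value(pairs, k):
    """Hypothetical helper: the k pairs with the largest second element, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, -index, pair); heap[0] is the weakest kept entry
    for i, pair in enumerate(pairs):
        entry = (pair[1], -i, pair)  # -i is unique, so pairs themselves are never compared
        if len(heap) < k:
            heapq.heappush(heap, entry)      # fill the heap up to k entries
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)   # drop the smallest, add the new one: O(log k)
    # Order the k survivors from largest to smallest
    return [pair for _, _, pair in sorted(heap, reverse=True)]

listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
print(top_k_by_value(chain.from_iterable(listA), 3))
# [('Tue', 768), ('Wed', 512), ('Fri', 341)]
```

Only K entries are ever stored, so the extra memory is O(K) regardless of the input length.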
Comparison
| Method | Time Complexity | Best For |
|---|---|---|
| defaultdict + sorted | O(n log n) | When you need to group duplicate keys |
| sorted + itemgetter | O(n log n) | Simple cases with unique keys |
| heapq.nlargest | O(n log k) | Large datasets, small K |
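The complexities in the table are asymptotic, so real speed depends on n, K and the data. A rough way to compare the two single-pass approaches on your own machine is `timeit` on synthetic data. The dataset size, K and run count below are arbitrary assumptions, and no particular timings are claimed:

```python
import heapq
import random
import timeit
from operator import itemgetter

# Arbitrary synthetic data: 50,000 (label, value) tuples with unique labels
random.seed(0)
pairs = [(f"item{i}", random.randint(0, 1_000_000)) for i in range(50_000)]
K = 10

def with_sorted():
    return sorted(pairs, key=itemgetter(1), reverse=True)[:K]

def with_heapq():
    return heapq.nlargest(K, pairs, key=itemgetter(1))

# heapq.nlargest is documented as equivalent to sorted(..., reverse=True)[:n]
assert with_sorted() == with_heapq()

for fn in (with_sorted, with_heapq):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```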
Conclusion
Use heapq.nlargest() when K is much smaller than the dataset. The sorted() approach is simpler and works well for small lists, and the defaultdict approach is the one to use when values for repeated keys must be summed before ranking.
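These recommendations can be combined into one function. `top_k` below is a hypothetical helper, not part of any library: it optionally sums repeated keys with a defaultdict and then selects with `heapq.nlargest`. The duplicate `'Mon'` entry in the sample data is an assumption added to exercise the merge option:

```python
import heapq
from collections import defaultdict
from itertools import chain
from operator import itemgetter

def top_k(nested, k, merge_duplicates=False):
    """Hypothetical helper: top k (key, value) tuples from a list of lists of tuples."""
    pairs = chain.from_iterable(nested)  # flatten [[(key, value)], ...]
    if merge_duplicates:
        totals = defaultdict(int)
        for key, value in pairs:
            totals[key] += value  # sum the values that share a key
        pairs = totals.items()
    # nlargest returns at most k items, largest value first
    return heapq.nlargest(k, pairs, key=itemgetter(1))

listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)],
         [('Fri', 341)], [('Mon', 500)]]
print(top_k(listA, 3))                         # [('Tue', 768), ('Wed', 512), ('Mon', 500)]
print(top_k(listA, 3, merge_duplicates=True))  # [('Tue', 768), ('Mon', 626), ('Wed', 512)]
```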
