Find top K frequent elements from a list of tuples in Python

We have a nested list of (key, value) tuples and need to find the top K elements by value. Here "top K" means the K tuples whose second element is largest. For example, if K is 3, we want the three tuples with the largest second values, in descending order.

Using defaultdict and sorted

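This approach uses a defaultdict to collect the values for each key, sums them, and then sorts the (key, total) pairs in descending order to take the first K. The grouping step only matters when a key appears more than once; in the sample data below every key is unique, so the totals equal the original values.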

Example

import collections
from operator import itemgetter
from itertools import chain

# Input list initialization
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Set K
K = 3

# Given list
print("Given list:")
print(listA)
print("K value:", K)

# Using defaultdict
dict_ = collections.defaultdict(list)
new_list = list(chain.from_iterable(listA))

for elem in new_list:
    dict_[elem[0]].append(elem[1])

res = {k: sum(v) for k, v in dict_.items()}

# Using sorted to get top K elements
res = sorted(res.items(), key=itemgetter(1), reverse=True)[0:K]

# Output
print(f"Top {K} elements are:")
print(res)

Output

Given list:
[[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
K value: 3
Top 3 elements are:
[('Tue', 768), ('Wed', 512), ('Fri', 341)]
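
Because the sample data has one entry per day, the grouping step above never merges anything. The following is a minimal sketch with repeated keys to show where it makes a difference. The `records` data and the `top_k_by_total` helper are made up for illustration and are not part of the original example.

```python
from collections import defaultdict
from operator import itemgetter


def top_k_by_total(pairs, k):
    """Sum the values for each key, then return the k largest (key, total) pairs."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items(), key=itemgetter(1), reverse=True)[:k]


# Hypothetical data: 'Mon' and 'Tue' each appear twice
records = [('Mon', 100), ('Tue', 50), ('Mon', 30), ('Wed', 120), ('Tue', 90)]

print(top_k_by_total(records, 2))
# [('Tue', 140), ('Mon', 130)]
```

Without grouping, ('Wed', 120) would rank first as the largest single entry. After summing, 'Tue' (140) and 'Mon' (130) both overtake it.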

Using sorted and itemgetter Directly

This simpler approach flattens the nested list and directly sorts the tuples by their second element (value) to get the top K results.

Example

from operator import itemgetter
from itertools import chain

# Input list initialization
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Set K
K = 3

# Given list
print("Given list:")
print(listA)
print("K value:", K)

# Using sorted with itemgetter
res = sorted(list(chain.from_iterable(listA)), 
            key=itemgetter(1), reverse=True)[0:K]

# Output
print(f"Top {K} elements are:")
print(res)

Output

Given list:
[[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
K value: 3
Top 3 elements are:
[('Tue', 768), ('Wed', 512), ('Fri', 341)]
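
One detail worth knowing when values can repeat: sorted() is stable, even with reverse=True, so tuples with equal values keep their input order. The sketch below uses made-up data with a tie to show this, along with one possible way to break ties by key instead. The `nested` list is an assumption for illustration only.

```python
from itertools import chain
from operator import itemgetter

# Hypothetical data with a tie: 'Wed' and 'Tue' share the value 512
nested = [[('Wed', 512)], [('Mon', 126)], [('Tue', 512)], [('Thu', 13)]]
flat = list(chain.from_iterable(nested))

# Stable sort: tied items stay in input order ('Wed' came first)
by_value = sorted(flat, key=itemgetter(1), reverse=True)[:2]
print(by_value)
# [('Wed', 512), ('Tue', 512)]

# Break ties alphabetically by key: sort ascending on (-value, key)
by_value_then_key = sorted(flat, key=lambda t: (-t[1], t[0]))[:2]
print(by_value_then_key)
# [('Tue', 512), ('Wed', 512)]
```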

Using heapq for Better Performance

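For larger datasets where K is small relative to the number of elements, heapq.nlargest() avoids sorting the entire list. It maintains a heap of only K items while scanning the input, which takes O(n log k) time instead of O(n log n). When K is close to the size of the list, sorted() is usually the faster choice.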

Example

import heapq
from itertools import chain

# Input list initialization
listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Set K
K = 3

# Given list
print("Given list:")
print(listA)
print("K value:", K)

# Flatten the list and get top K using heapq
flattened = list(chain.from_iterable(listA))
res = heapq.nlargest(K, flattened, key=lambda x: x[1])

# Output
print(f"Top {K} elements are:")
print(res)

Output

Given list:
[[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]
K value: 3
Top 3 elements are:
[('Tue', 768), ('Wed', 512), ('Fri', 341)]
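
heapq.nlargest() accepts any iterable, so the intermediate list is optional. A minimal variant of the example above, passing the lazy chain iterator directly and using itemgetter in place of the lambda, might look like this:

```python
import heapq
from itertools import chain
from operator import itemgetter

listA = [[('Mon', 126)], [('Tue', 768)], [('Wed', 512)], [('Thu', 13)], [('Fri', 341)]]

# Pass the lazy iterator straight to nlargest; no flattened list is built
top3 = heapq.nlargest(3, chain.from_iterable(listA), key=itemgetter(1))
print(top3)
# [('Tue', 768), ('Wed', 512), ('Fri', 341)]

# Asking for more items than exist returns everything, sorted descending
everything = heapq.nlargest(10, chain.from_iterable(listA), key=itemgetter(1))
print(len(everything))
# 5
```

Skipping the list saves memory when the input is large or produced by a generator, since only K items are held in the heap at any time.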

Comparison

Method                  Time Complexity   Best For
defaultdict + sorted    O(n log n)        When you need to group duplicate keys
sorted + itemgetter     O(n log n)        Simple cases with unique keys
heapq.nlargest          O(n log k)        Large datasets, small K
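
To see how these complexities play out on your own machine, a rough timing sketch such as the following can help. The data is randomly generated for illustration, the sizes `n` and `k` are arbitrary assumptions, and the timings will vary by machine and Python version, so no figures are shown here.

```python
import heapq
import random
import timeit
from operator import itemgetter

random.seed(0)
n, k = 100_000, 10

# Hypothetical data: distinct values so the ranking is unambiguous
values = random.sample(range(10 * n), n)
data = [(f"item{i}", v) for i, v in enumerate(values)]


def with_sorted():
    return sorted(data, key=itemgetter(1), reverse=True)[:k]


def with_heapq():
    return heapq.nlargest(k, data, key=itemgetter(1))


# Both approaches should return the same top-k list
same = with_sorted() == with_heapq()
print("same result:", same)

# Total seconds for 5 runs of each approach
print("sorted:", timeit.timeit(with_sorted, number=5))
print("heapq: ", timeit.timeit(with_heapq, number=5))
```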

Conclusion

Use heapq.nlargest() for better performance when K is much smaller than the dataset size. The sorted() approach is simpler and works well for smaller lists.

Updated on: 2026-03-15T18:04:12+05:30
