Find k longest words in given list in Python
When working with lists of words, you often need to find the k longest words. Python offers several ways to do this efficiently, using the built-in sorted() and enumerate() functions and the standard-library heapq module.
Using sorted() with Length as Key
The simplest approach is to sort words by length in descending order and slice the first k elements:
def k_longest_words(words, k):
    # Sort by length, longest first, then keep the first k words
    return sorted(words, key=len, reverse=True)[:k]
words = ['Earth', 'Moonshine', 'Aurora', 'Snowflakes', 'Sunshine']
k = 3
result = k_longest_words(words, k)
print(f"Top {k} longest words: {result}")
Top 3 longest words: ['Snowflakes', 'Moonshine', 'Sunshine']
Using sorted() with Stable Ordering
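Python's sorted() is stable, even with reverse=True, so words of the same length already keep their original relative order. If you want that tie-breaking rule to be explicit, pair each word with its index using enumerate() and sort on (-length, index):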
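def k_longest_stable(words, k):
    # Pair each word with its original position
    indexed_words = list(enumerate(words))
    # Longest first; among equal lengths, the lower index wins
    sorted_words = sorted(indexed_words, key=lambda x: (-len(x[1]), x[0]))
    return [word for _, word in sorted_words[:k]]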
words = ['Earth', 'Moonshine', 'Aurora', 'Snowflakes', 'Sunshine']
k = 3
result = k_longest_stable(words, k)
print(f"Top {k} longest words (stable): {result}")
Top 3 longest words (stable): ['Snowflakes', 'Moonshine', 'Sunshine']
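To see how ties behave, here is a small check on a hypothetical word list in which several words share the same length (the list and the helper names are illustrative, not part of the example above). Python documents sorted() as stable, including with reverse=True, so the plain version is expected to agree with the enumerate() version:

```python
def k_longest_plain(words, k):
    # Stable sort: equal-length words keep their input order
    return sorted(words, key=len, reverse=True)[:k]


def k_longest_indexed(words, k):
    # Explicit tie-break: longer first, then lower original index
    ranked = sorted(enumerate(words), key=lambda pair: (-len(pair[1]), pair[0]))
    return [word for _, word in ranked[:k]]


# Hypothetical input: 'Mars', 'Moon', and 'Star' all have length 4
tied_words = ['Mars', 'Venus', 'Moon', 'Star', 'Jupiter']

plain = k_longest_plain(tied_words, 4)
indexed = k_longest_indexed(tied_words, 4)
print(plain)    # ['Jupiter', 'Venus', 'Mars', 'Moon']
print(indexed)  # ['Jupiter', 'Venus', 'Mars', 'Moon']
```

Both calls drop 'Star', the last of the three four-letter words, which suggests the explicit index is mainly useful when you want to document or customize the tie-breaking rule.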
Using heapq for Large Lists
For very large lists with a small k, heapq.nlargest() can be more efficient because it tracks only k candidates in a heap instead of sorting the entire list:
import heapq
def k_longest_heap(words, k):
    # Return the k longest words without sorting the whole list
    return heapq.nlargest(k, words, key=len)
words = ['Earth', 'Moonshine', 'Aurora', 'Snowflakes', 'Sunshine', 'Sky', 'Ocean']
k = 4
result = k_longest_heap(words, k)
print(f"Top {k} longest words using heap: {result}")
Top 4 longest words using heap: ['Snowflakes', 'Moonshine', 'Sunshine', 'Aurora']
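The idea behind the O(n log k) behavior can be sketched by hand with a min-heap that never holds more than k entries. This is only an illustration of the technique, not a description of how heapq.nlargest() is implemented internally; the function name k_longest_bounded_heap and the (length, -index, word) entry layout are assumptions made for this sketch:

```python
import heapq


def k_longest_bounded_heap(words, k):
    """Keep at most k candidates in a min-heap while scanning words once."""
    if k <= 0:
        return []
    heap = []  # entries are (length, -index, word); heap[0] is the weakest candidate
    for index, word in enumerate(words):
        entry = (len(word), -index, word)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new word beats the weakest candidate, so swap it in
            heapq.heapreplace(heap, entry)
    # Longest first; among equal lengths, the earlier word comes first
    return [word for _, _, word in sorted(heap, reverse=True)]


words = ['Earth', 'Moonshine', 'Aurora', 'Snowflakes', 'Sunshine', 'Sky', 'Ocean']
print(k_longest_bounded_heap(words, 4))
# ['Snowflakes', 'Moonshine', 'Sunshine', 'Aurora']
```

Each push or replace costs O(log k), and the list is scanned once, which is where the O(n log k) figure comes from. In practice, calling heapq.nlargest() directly is simpler.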
Comparison of Methods
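| Method | Time Complexity | Best For | Preserves Order of Equal-Length Words |
|---|---|---|---|
| sorted() | O(n log n) | Small to medium lists | Yes (sorted() is stable) |
| sorted() with enumerate() | O(n log n) | When the tie-breaking rule should be explicit | Yes |
| heapq.nlargest() | O(n log k) | Large lists, small k | Yes (documented as equivalent to sorted(..., reverse=True)[:k]) |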
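To check the complexity figures on your own data, a rough timing sketch with timeit can help. The random word generator below is hypothetical test data, and the list size, k, and repeat count are arbitrary choices; actual timings depend on the machine and Python version, so no figures are shown here:

```python
import heapq
import random
import string
import timeit


def make_words(count, seed=0):
    """Build a hypothetical list of random lowercase words, 1 to 20 letters long."""
    rng = random.Random(seed)
    return [''.join(rng.choices(string.ascii_lowercase, k=rng.randint(1, 20)))
            for _ in range(count)]


words = make_words(100_000)
k = 5

by_sort = sorted(words, key=len, reverse=True)[:k]
by_heap = heapq.nlargest(k, words, key=len)

# Time five runs of each approach on the same input
sort_time = timeit.timeit(lambda: sorted(words, key=len, reverse=True)[:k], number=5)
heap_time = timeit.timeit(lambda: heapq.nlargest(k, words, key=len), number=5)
print(f"sorted():         {sort_time:.3f} s")
print(f"heapq.nlargest(): {heap_time:.3f} s")
```

If k grows toward the size of the list, the advantage of the heap is likely to shrink, so it is worth measuring with values of k that match your real workload.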
Handling Edge Cases
Here's a robust function that handles common edge cases:
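def find_k_longest_words(words, k):
    # Nothing to return for an empty list or a non-positive k
    if not words or k <= 0:
        return []
    # k covers the whole list: return every word, longest first
    if k >= len(words):
        return sorted(words, key=len, reverse=True)
    return sorted(words, key=len, reverse=True)[:k]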
# Test with various inputs
test_cases = [
(['Python', 'Java', 'C++', 'JavaScript'], 2),
(['Hi'], 3), # k greater than list length
([], 2), # empty list
(['Same', 'Size'], 1)
]
for words, k in test_cases:
    result = find_k_longest_words(words, k)
    print(f"Words: {words}, k={k} -> {result}")
Words: ['Python', 'Java', 'C++', 'JavaScript'], k=2 -> ['JavaScript', 'Python']
Words: ['Hi'], k=3 -> ['Hi']
Words: [], k=2 -> []
Words: ['Same', 'Size'], k=1 -> ['Same']
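Because slicing past the end of a list simply returns the whole list, and sorting an empty list returns an empty list, the same edge cases can be covered with a single guard. The following is a compact sketch under the assumption that words is always an iterable of strings (never None); the name find_k_longest_compact is hypothetical:

```python
def find_k_longest_compact(words, k):
    # Guard only against non-positive k: a negative slice bound would
    # drop items from the end instead of returning an empty list
    if k <= 0:
        return []
    # Slicing past the end returns the whole list, and sorting an
    # empty list returns [], so no further branches are needed
    return sorted(words, key=len, reverse=True)[:k]


print(find_k_longest_compact(['Python', 'Java', 'C++', 'JavaScript'], 2))
# ['JavaScript', 'Python']
print(find_k_longest_compact(['Hi'], 3))
# ['Hi']
print(find_k_longest_compact([], 2))
# []
```

The explicit branches in the longer version can still be worth keeping if they make the intent clearer to readers of your code.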
Conclusion
Use sorted() with key=len for simple cases; because sorted() is stable, words of equal length keep their original order. For large datasets with small k values, heapq.nlargest() is more efficient. Combine sorted() with enumerate() when you want the tie-breaking rule to be explicit or need to customize it.
