Kth Largest Element in a Stream in Python
The Kth Largest Element in a Stream problem involves designing a class that efficiently finds the kth largest element as new elements are added to a data stream. This is particularly useful in real-time data processing scenarios.
The KthLargest class maintains a stream of numbers and returns the kth largest element each time a new number is added. Here "kth largest" means the kth element in descending sorted order, so duplicates count separately; it is not the kth distinct value.
Problem Understanding
Given k = 3 and initial elements [4, 5, 8, 2], when we add elements 3, 5, 10, 9, 4 sequentially, we should get 4, 5, 5, 8, 8 as the 3rd largest elements respectively.
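The expected values above can be checked directly from the definition. The following is a minimal brute-force sketch for reference only, not an efficient solution; the helper name `kth_largest_of` is hypothetical and introduced purely for illustration.

```python
def kth_largest_of(values, k):
    """Return the kth largest value, counting duplicates separately."""
    # Sort in descending order; position k - 1 holds the kth largest.
    return sorted(values, reverse=True)[k - 1]


stream = [4, 5, 8, 2]
results = []
for val in [3, 5, 10, 9, 4]:
    stream.append(val)
    results.append(kth_largest_of(stream, 3))

print(results)  # [4, 5, 5, 8, 8]

# Duplicates occupy separate positions: in [8, 5, 5, 4] the 3rd largest is 5.
print(kth_largest_of([4, 5, 5, 8], 3))  # 5
```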
Basic Implementation
The straightforward approach involves maintaining an array and sorting it after each addition:
```python
class KthLargest:
    def __init__(self, k, nums):
        self.k = k
        # Copy the input so the caller's list is not modified
        self.array = list(nums)

    def add(self, val):
        self.array.append(val)
        self.array.sort()
        # After an ascending sort, the kth largest is k positions from the end
        return self.array[-self.k]


# Example usage
kth_largest = KthLargest(3, [4, 5, 8, 2])
print(kth_largest.add(3))   # 3rd largest among [2,3,4,5,8]
print(kth_largest.add(5))   # 3rd largest among [2,3,4,5,5,8]
print(kth_largest.add(10))  # 3rd largest among [2,3,4,5,5,8,10]
print(kth_largest.add(9))   # 3rd largest among [2,3,4,5,5,8,9,10]
print(kth_largest.add(4))   # 3rd largest among [2,3,4,4,5,5,8,9,10]
```

Output:

```
4
5
5
8
8
```
Optimized Implementation Using Min-Heap
A more efficient approach keeps a min-heap holding at most k elements. Each addition then costs O(log k), compared with O(n log n) for re-sorting all n elements seen so far:
```python
import heapq


class KthLargestOptimized:
    def __init__(self, k, nums):
        self.k = k
        # Copy the input so heapify does not reorder the caller's list
        self.heap = list(nums)
        heapq.heapify(self.heap)
        # Keep only the k largest elements
        while len(self.heap) > k:
            heapq.heappop(self.heap)

    def add(self, val):
        heapq.heappush(self.heap, val)
        if len(self.heap) > self.k:
            heapq.heappop(self.heap)  # Drop the smallest of the k + 1 values
        return self.heap[0]  # Root of min-heap is kth largest


# Example usage
kth_largest_opt = KthLargestOptimized(3, [4, 5, 8, 2])
print(kth_largest_opt.add(3))
print(kth_largest_opt.add(5))
print(kth_largest_opt.add(10))
print(kth_largest_opt.add(9))
print(kth_largest_opt.add(4))
```

Output:

```
4
5
5
8
8
```
How the Min-Heap Approach Works
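The min-heap never holds more than k elements: the k largest values seen so far. Whenever a push makes the heap grow to k + 1 elements, the smallest one is popped, because at least k values are greater than or equal to it and it can therefore never be the kth largest again. Once the stream has produced at least k values, the root (the minimum element in the heap) is the kth largest element overall.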
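To make this concrete, the sketch below traces the heap contents for the example stream (k = 3, initial elements [4, 5, 8, 2]). It uses only the standard `heapq` functions; the variable names are illustrative.

```python
import heapq

k = 3
heap = [4, 5, 8, 2]
heapq.heapify(heap)
while len(heap) > k:
    heapq.heappop(heap)  # discard values that cannot be the kth largest

trace = []
for val in [3, 5, 10, 9, 4]:
    heapq.heappush(heap, val)
    if len(heap) > k:
        heapq.heappop(heap)
    # Record the added value, the kept values (sorted for display), and the root
    trace.append((val, sorted(heap), heap[0]))

for val, kept, kth in trace:
    print(f"add {val:>2}: kept {kept}, kth largest = {kth}")

# add  3: kept [4, 5, 8], kth largest = 4
# add  5: kept [5, 5, 8], kth largest = 5
# add 10: kept [5, 8, 10], kth largest = 5
# add  9: kept [8, 9, 10], kth largest = 8
# add  4: kept [8, 9, 10], kth largest = 8
```

Values smaller than the current root, such as 3 and the final 4, are pushed and popped straight away, so the heap is left unchanged. When the heap already holds k items, `heapq.heappushpop` performs the same push-then-pop in a single call.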
Comparison
| Approach | Time Complexity (add) | Space Complexity | Best For |
|---|---|---|---|
| Array + Sort | O(n log n) | O(n) | Simple implementation |
| Min-Heap | O(log k) | O(k) | Frequent additions, large streams |
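The space figures in the table can be illustrated with a small experiment. The sketch below is an assumption-laden demonstration rather than a benchmark: it feeds 1000 random integers (an arbitrary count and range, with a fixed seed) to both approaches, checks that they return the same answer at every step, and compares how many values each one ends up storing.

```python
import heapq
import random

random.seed(0)  # fixed seed so the run is repeatable
k = 3
initial = [4, 5, 8, 2]

sorted_store = list(initial)             # array + sort: keeps every value
heap_store = heapq.nlargest(k, initial)  # min-heap: keeps only the k largest
heapq.heapify(heap_store)

agree = True
for _ in range(1000):
    val = random.randint(-100, 100)

    # Array + sort approach
    sorted_store.append(val)
    sorted_store.sort()
    by_sort = sorted_store[-k]

    # Min-heap approach
    heapq.heappush(heap_store, val)
    if len(heap_store) > k:
        heapq.heappop(heap_store)
    by_heap = heap_store[0]

    agree = agree and (by_sort == by_heap)

print(agree)              # True
print(len(sorted_store))  # 1004
print(len(heap_store))    # 3
```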
Conclusion
For finding the kth largest element in a stream, use a min-heap of size k for optimal performance. The basic sorting approach works for small datasets, but the heap-based solution scales much better with larger streams and frequent additions.
