Design an Array Statistics Tracker - Problem

You need to design a smart data structure that can dynamically track statistical measures of numbers as they are added and removed. Think of it as a real-time analytics system that maintains running statistics!

Your StatisticsTracker class should support:

  • Adding numbers: Insert new values into the tracker
  • Removing numbers: Remove the earliest added number (FIFO - First In, First Out)
  • Computing statistics: Calculate mean, median, and mode on-demand

Statistical Definitions:

  • Mean: Sum of all values ÷ count of values (floored to integer)
  • Median: Middle value when sorted. If the count is even, take the larger of the two middle values
  • Mode: Most frequently occurring value. If multiple modes exist, return the smallest one
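
The three definitions above can be read as the following reference functions. This is a minimal sketch that recomputes everything from a plain list; the function names `mean`, `median`, and `mode` are illustrative and not part of the required API, and the inputs are assumed to be non-empty lists of positive integers.

```python
from collections import Counter


def mean(values):
    # Integer floor division matches "floored to integer".
    return sum(values) // len(values)


def median(values):
    ordered = sorted(values)
    # Index len // 2 is the middle element for an odd count and the
    # larger of the two middle elements for an even count.
    return ordered[len(ordered) // 2]


def mode(values):
    counts = Counter(values)
    highest = max(counts.values())
    # Among values tied for the highest frequency, pick the smallest.
    return min(v for v, c in counts.items() if c == highest)


print(mean([1, 3, 1]), median([1, 3, 1]), mode([1, 3, 1]))  # 1 1 1
print(mean([2, 1]), median([2, 1]), mode([2, 1]))            # 1 2 1
```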

The challenge is to efficiently maintain these statistics as the data changes dynamically through additions and removals.

Input & Output

example_1.py - Basic Operations
Input: ["StatisticsTracker", "addNumber", "addNumber", "addNumber", "getMean", "getMedian", "getMode"] [[], [1], [3], [1], [], [], []]
Output: [null, null, null, null, 1, 1, 1]
Note: Initialize the tracker, then add 1, 3, 1. Mean = floor((1+3+1)/3) = floor(5/3) = 1, Median of [1,1,3] = 1 (middle), Mode = 1 (appears twice)

example_2.py - With Removal
Input: ["StatisticsTracker", "addNumber", "addNumber", "addNumber", "removeFirstAddedNumber", "getMean", "getMedian", "getMode"] [[], [4], [2], [1], [], [], [], []]
Output: [null, null, null, null, null, 1, 2, 1]
Note: Add 4, 2, 1, then remove the first added number (4). Remaining: [2,1]. Mean = floor((2+1)/2) = 1, Median of [1,2] = 2 (larger middle), Mode = 1 (1 and 2 each appear once, so the smaller value wins)

example_3.py - Edge Case Single Element
Input: ["StatisticsTracker", "addNumber", "getMean", "getMedian", "getMode", "removeFirstAddedNumber", "getMean"] [[], [5], [], [], [], [], []]
Output: [null, null, 5, 5, 5, null, 0]
Note: Single element case: all statistics equal the element. After removal, the empty tracker returns 0 for the mean.
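
The examples pair a list of operation names with a list of argument lists, and report null for calls that return nothing. One way to replay that format is sketched below. `NaiveTracker` and `replay` are hypothetical names introduced here; the tracker deliberately recomputes each statistic from scratch, so it serves only as a correctness baseline, and returning 0 for the mean of an empty tracker is an assumption taken from the third example.

```python
from collections import Counter, deque


class NaiveTracker:
    """Hypothetical baseline: every query rescans the stored values."""

    def __init__(self):
        self.items = deque()

    def addNumber(self, number):
        self.items.append(number)

    def removeFirstAddedNumber(self):
        self.items.popleft()  # FIFO: drop the earliest added value

    def getMean(self):
        return sum(self.items) // len(self.items) if self.items else 0

    def getMedian(self):
        return sorted(self.items)[len(self.items) // 2]

    def getMode(self):
        counts = Counter(self.items)
        highest = max(counts.values())
        return min(v for v, c in counts.items() if c == highest)


def replay(ops, args):
    """Run each named operation with its arguments and collect the results."""
    tracker, out = None, []
    for op, arg in zip(ops, args):
        if op == "StatisticsTracker":
            tracker = NaiveTracker()
            out.append(None)  # the constructor is reported as null
        else:
            out.append(getattr(tracker, op)(*arg))
    return out


ops = ["StatisticsTracker", "addNumber", "addNumber", "addNumber",
       "removeFirstAddedNumber", "getMean", "getMedian", "getMode"]
args = [[], [4], [2], [1], [], [], [], []]
print(replay(ops, args))  # [None, None, None, None, None, 1, 2, 1]
```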

Visualization

[Figure: Real-Time Statistics Dashboard. A data stream (new: $105.50, processing: $104.20, expiring: $102.10) feeds four structures: a queue (order), a sorted structure (median), a frequency map (mode), and a running sum. Live statistics: mean $104.27, median $104.20, mode $104.00, updated in O(log n) time. Performance metrics: Add/Remove O(log n), Query Mean O(1), Query Median O(log n), Query Mode O(1), memory O(n) total; suitable for real-time trading systems.]

Understanding the Visualization

  1. Data Ingestion: New stock prices arrive continuously and must be stored in processing order
  2. Data Expiration: Old prices expire (FIFO) to maintain a sliding window of recent data
  3. Real-time Analytics: Traders need instant mean, median, and mode calculations for decision making
  4. Performance Optimization: Multiple data structures work together to provide sub-second response times

Key Takeaway

Key Insight: Instead of recalculating statistics from scratch each time, maintain multiple specialized data structures that can be updated incrementally. This transforms expensive O(n log n) operations into efficient O(log n) or O(1) operations.
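
The incremental idea can be sketched as follows. This is one possible arrangement under stated assumptions, not a reference solution: a deque keeps insertion order, a running total serves the mean, a sorted Python list serves the median, and a frequency map plus a heap with lazy deletion serves the mode. The attribute names are illustrative. Because a plain list shifts elements on insert and delete, those two steps cost O(n) here; reaching O(log n) would need a balanced tree or a similar indexable sorted container. Returning 0 for the mean of an empty tracker follows the third example, while `getMedian` and `getMode` assume a non-empty tracker.

```python
import heapq
from bisect import bisect_left, insort
from collections import Counter, deque


class StatisticsTracker:
    def __init__(self):
        self.queue = deque()    # insertion order, for FIFO removal
        self.total = 0          # running sum, for the mean
        self.ordered = []       # values kept sorted, for the median
        self.freq = Counter()   # value -> current count, for the mode
        self.heap = []          # (-count, value) snapshots; some may be stale

    def addNumber(self, number):
        self.queue.append(number)
        self.total += number
        # Binary search is O(log n), but the list shift makes this O(n).
        insort(self.ordered, number)
        self.freq[number] += 1
        heapq.heappush(self.heap, (-self.freq[number], number))

    def removeFirstAddedNumber(self):
        number = self.queue.popleft()
        self.total -= number
        del self.ordered[bisect_left(self.ordered, number)]
        self.freq[number] -= 1
        if self.freq[number] > 0:
            # Record the lowered count; older, higher snapshots become stale.
            heapq.heappush(self.heap, (-self.freq[number], number))

    def getMean(self):
        return self.total // len(self.queue) if self.queue else 0

    def getMedian(self):
        # Index len // 2 selects the larger middle value for even counts.
        return self.ordered[len(self.ordered) // 2]

    def getMode(self):
        # Discard snapshots whose count no longer matches the frequency map.
        while -self.heap[0][0] != self.freq[self.heap[0][1]]:
            heapq.heappop(self.heap)
        # Tuple ordering breaks ties on count by the smaller value.
        return self.heap[0][1]


tracker = StatisticsTracker()
for n in (4, 2, 1):
    tracker.addNumber(n)
tracker.removeFirstAddedNumber()  # drops 4, leaving 2 and 1
print(tracker.getMean(), tracker.getMedian(), tracker.getMode())  # 1 2 1
```

Each stale heap entry is popped at most once, which is where an amortized bound for the mode query comes from; the trade-off is that the heap can hold one entry per add or remove call rather than one per live value.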

Time & Space Complexity

Time Complexity
O(log n)

Add/Remove: O(log n) for sorted structure updates. Mean: O(1), Median: O(log n), Mode: O(1) amortized

Space Complexity
O(n)

Additional space for the frequency map and sorted structure, but still O(n) overall

Constraints

  • 1 ≤ number ≤ 10^5
  • At most 10^4 calls to addNumber and removeFirstAddedNumber
  • At most 10^4 calls to getMean, getMedian, and getMode
  • removeFirstAddedNumber is called only when the array is non-empty
  • Statistics methods are called only when the array is non-empty