Statistics from a Large Sample - Problem

Imagine you're analyzing a massive dataset of pixel intensities from millions of images, where each pixel value ranges from 0 (black) to 255 (white). The dataset is so large that instead of storing individual values, you have a frequency count array where count[k] represents how many times the value k appears.

Your task is to calculate five key statistical measures from this compressed representation:

  • Minimum: The smallest value that appears at least once
  • Maximum: The largest value that appears at least once
  • Mean: The average of all values (sum of all elements ÷ total count)
  • Median: The middle value when all elements are sorted (or average of two middle values for even counts)
  • Mode: The most frequently occurring value (guaranteed to be unique)

Return these statistics as an array of floating-point numbers: [minimum, maximum, mean, median, mode]

Note: Answers within 10^-5 of the actual answer will be accepted.
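The tolerance rule can be sketched as a small comparison helper. This assumes the 10^-5 bound is an absolute difference applied to each of the five values; the name within_tolerance is hypothetical and not part of the problem statement.

```python
def within_tolerance(actual, expected, tol=1e-5):
    """Return True if both lists have equal length and every pair
    of values differs by at most `tol` (absolute difference)."""
    return len(actual) == len(expected) and all(
        abs(a - e) <= tol for a, e in zip(actual, expected)
    )
```

A check like this is handy when comparing a computed mean such as 24/11 against a rounded expected value such as 2.18182.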

Input & Output

example_1.py - Basic Case
$ Input: count = [0,1,3,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
› Output: [1.00000, 3.00000, 2.37500, 2.50000, 3.00000]
Note: Dataset: [1,2,2,2,3,3,3,3]. Min=1, Max=3, Mean=(1+6+12)/8=2.375, Median=(2+3)/2=2.5, Mode=3 (appears 4 times)
example_2.py - Odd Total Count
$ Input: count = [0,4,3,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
› Output: [1.00000, 4.00000, 2.18182, 2.00000, 1.00000]
Note: Dataset: [1,1,1,1,2,2,2,3,3,4,4]. Min=1, Max=4, Mean=(4+6+6+8)/11=24/11≈2.18182, Mode=1 (appears 4 times). The total is 11 elements, so the median is the 6th element, which is 2.
example_3.py - Edge Case
$ Input: count = [1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1]
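› Output: [0.00000, 255.00000, 127.50000, 127.50000, 0.00000]
Note: Dataset: [0,255]. Min=0, Max=255, Mean=(0+255)/2=127.5, Median=(0+255)/2=127.5. Both values appear once, so this input has a tied mode and falls outside the unique-mode guarantee in the constraints; the output shown reports the smaller tied value, 0.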
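To reproduce the examples without typing 256 numbers by hand, the frequency array can be built from a small raw dataset. The sketch below is only an illustration; counts_from_values is a hypothetical helper name, and the size of 256 is taken from the constraints.

```python
from collections import Counter

def counts_from_values(values, size=256):
    """Build a frequency array where result[k] is how often k occurs."""
    freq = Counter(values)
    # Reject anything outside the 0..size-1 pixel range.
    if any(not 0 <= v < size for v in freq):
        raise ValueError("values must lie in the range 0..size-1")
    return [freq.get(k, 0) for k in range(size)]

# The dataset from example_1.py
count = counts_from_values([1, 2, 2, 2, 3, 3, 3, 3])
print(count[:5])  # [0, 1, 3, 4, 0]
```

Going the other way (expanding count back into a list) is exactly what the problem asks you to avoid, since the total can reach 10^9 elements.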

Constraints

  • count.length == 256
  • 0 ≤ count[i] ≤ 10^9
  • 1 ≤ sum(count) ≤ 10^9
  • It is guaranteed that mode is unique
  • Answers within 10^-5 of the actual answer will be accepted
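The constraints above can be checked mechanically before computing anything. The following is a rough sketch, not part of the original problem; check_constraints and its message strings are made up for illustration. It treats a tie for the highest frequency as a violation of the unique-mode guarantee.

```python
def check_constraints(count):
    """Return a list of constraint violations (empty if the input is valid)."""
    problems = []
    if len(count) != 256:
        problems.append("count.length must be 256")
    if any(c < 0 or c > 10**9 for c in count):
        problems.append("each count[i] must be in 0..10^9")
    total = sum(count)
    if not 1 <= total <= 10**9:
        problems.append("sum(count) must be in 1..10^9")
    # A unique mode means the highest frequency occurs exactly once.
    if count and max(count) > 0 and count.count(max(count)) > 1:
        problems.append("mode is not unique")
    return problems
```

Note that the weighted sum can reach 255 × 10^9, which does not fit in a 32-bit integer; Python integers grow as needed, but a fixed-width language would need a 64-bit accumulator.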

Visualization

Statistics from Frequency Data

Input: Frequency Array
  • Value 0: 3
  • Value 1: 5
  • Value 2: 0
  • Value 3: 2

Step 1: Single Pass Analysis (one pass through the frequency array)
  • Min = 0 (first non-zero)
  • Max = 3 (last non-zero)
  • Mode = 1 (highest frequency: 5)

Step 2: Calculate Mean
  • Weighted Sum = (0×3) + (1×5) + (3×2) = 11
  • Mean = 11 ÷ 10 = 1.1

Step 3: Find Median (Cumulative Frequency)
  • 0: freq=3, cum: 3
  • 1: freq=5, cum: 8
  • 3: freq=2, cum: 10
  • Middle positions: 4, 5 → Value 1

Key Advantages
  • Time: O(1) - Always 256 iterations
  • Space: O(1) - No extra data structures
  • No dataset reconstruction needed
  • Memory efficient for large datasets
  • Scales perfectly with data size
  • Mathematical elegance

Example Performance:
  • 1,000 elements: ~256 operations
  • 1,000,000 elements: ~256 operations
  • 1,000,000,000 elements: ~256 operations
Same performance regardless!
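The cumulative-frequency step for the median can be sketched on the small four-value table from the visualization. This is a minimal illustration that assumes 0-indexed positions in the sorted order; value_at_rank and median_from_counts are hypothetical names.

```python
from itertools import accumulate

def value_at_rank(count, rank):
    """Return the value at 0-indexed position `rank` of the sorted data."""
    for value, running in enumerate(accumulate(count)):
        # `running` is how many elements are <= value.
        if rank < running:
            return value
    raise IndexError("rank is beyond the number of elements")

def median_from_counts(count):
    """Average the two middle elements (the same element for odd totals)."""
    total = sum(count)
    lo = value_at_rank(count, (total - 1) // 2)
    hi = value_at_rank(count, total // 2)
    return (lo + hi) / 2

print(median_from_counts([3, 5, 0, 2]))  # 1.0
```

With 10 elements the two middle positions are 4 and 5; both fall inside the cumulative range of value 1 (positions 3 through 7), so the median is 1.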
Understanding the Visualization
  1. Frequency Table Analysis: Extract min, max, mode, and calculate the weighted sum in a single pass
  2. Mean Calculation: Use the formula Σ(value × frequency) ÷ total_count
  3. Median via Cumulative Frequency: Find the middle position(s) using running frequency totals
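Putting the three steps together, one possible Python sketch looks like this. It is not an official solution; the function name sample_stats is an assumption, and on a tied mode it keeps the smallest tied value, which matches the output shown for example_3.py.

```python
def sample_stats(count):
    """Return [minimum, maximum, mean, median, mode] from a frequency array."""
    total = 0
    weighted_sum = 0
    minimum = maximum = mode = None
    # Step 1 and 2: one pass for min, max, mode, total, and weighted sum.
    for value, freq in enumerate(count):
        if freq == 0:
            continue
        if minimum is None:
            minimum = value          # first non-zero bucket
        maximum = value              # last non-zero bucket seen so far
        if mode is None or freq > count[mode]:
            mode = value             # strict '>' keeps the first value on ties
        total += freq
        weighted_sum += value * freq
    if total == 0:
        raise ValueError("count must contain at least one sample")

    # Step 3: walk the cumulative frequencies to find the middle rank(s).
    lo_rank, hi_rank = (total - 1) // 2, total // 2   # 0-indexed
    lo = hi = None
    running = 0
    for value, freq in enumerate(count):
        running += freq
        if lo is None and lo_rank < running:
            lo = value
        if hi_rank < running:
            hi = value
            break

    return [float(minimum), float(maximum), weighted_sum / total,
            (lo + hi) / 2, float(mode)]

# example_1.py: dataset [1,2,2,2,3,3,3,3]
count = [0] * 256
count[1], count[2], count[3] = 1, 3, 4
print(sample_stats(count))  # [1.0, 3.0, 2.375, 2.5, 3.0]
```

Both loops visit at most 256 buckets, so the work does not depend on how many samples the array represents.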
Key Takeaway
Key Insight: Frequency data contains all the information needed for these statistics, so no reconstruction is required. Use weighted sums for the mean and cumulative frequencies for the median.