Performing Runs test of Randomness in Python
The Runs test of randomness is a non-parametric statistical test used to determine whether a sequence of data points is random or exhibits systematic patterns. The test analyzes "runs", consecutive sequences of values that are either above or below a certain threshold, to assess the randomness of the data.
Understanding the Runs Test
A run is defined as a consecutive sequence of values that are either above or below a specified threshold (typically the median). The Runs test examines whether the number of runs in a dataset significantly deviates from what would be expected in a truly random sequence.
The test assumes that in a random sequence, the number of runs follows a specific distribution. If the observed number of runs significantly differs from this expected distribution, it suggests the presence of patterns or bias in the data.
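To make the definition concrete, here is a minimal sketch that labels each value as above or below the median and groups consecutive equal labels into runs. The helper name `count_runs_about_median` and the sample values are illustrative assumptions, and values equal to the median are treated as "above", which is only one of several possible tie conventions.

```python
from itertools import groupby
from statistics import median


def count_runs_about_median(values):
    """Return (labels, runs) for runs above/below the median.

    Assumption: values equal to the median count as "above" ('A').
    """
    m = median(values)
    labels = ["A" if v >= m else "B" for v in values]
    # groupby collapses each stretch of identical consecutive labels into one group
    runs = ["".join(group) for _, group in groupby(labels)]
    return labels, runs


values = [3, 8, 9, 2, 1, 7, 6, 10, 4, 5]  # illustrative data, median is 5.5
labels, runs = count_runs_about_median(values)
print("".join(labels))
print(runs, "->", len(runs), "runs")
```

For this sequence the labels read BAABBAAABB, which splits into five runs.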
Z-Test Statistics Formula
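The Runs test uses a Z-test statistic to evaluate significance:
Z = (observed_runs - expected_runs) / standard_deviation
Where:
- expected_runs = (2n - 1) / 3
- standard_deviation = √((16n - 29) / 90)
- n = sample size
These expressions for the expected value and standard deviation apply when runs are counted as stretches of consecutive increases or decreases (runs up and down) in a sequence of n distinct values.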
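When runs are counted above and below a threshold, the standard Wald-Wolfowitz result expresses the expected number of runs and its variance in terms of the group sizes n1 and n2 (the counts above and below the threshold) rather than n alone. The sketch below evaluates those moments and the resulting Z statistic. The function names and the sample counts are assumptions made for illustration.

```python
import math


def threshold_run_moments(n1, n2):
    """Mean and standard deviation of the run count under randomness
    (Wald-Wolfowitz), given n1 values above and n2 below the threshold."""
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return mean, math.sqrt(variance)


def z_statistic(observed_runs, n1, n2):
    """Z = (observed - expected) / standard deviation."""
    mean, sd = threshold_run_moments(n1, n2)
    return (observed_runs - mean) / sd


# Illustrative numbers: 10 values above, 10 below, 6 observed runs
mean, sd = threshold_run_moments(10, 10)
z = z_statistic(6, 10, 10)
print(round(mean, 2), round(sd, 2), round(z, 2))
```

With ten values on each side the expected count is 11 runs, so six observed runs give a Z of about -2.3, which points to fewer runs than a random ordering would typically produce.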
Implementation in Python
Algorithm Steps
Step 1: Input the data sequence
Step 2: Count the number of runs by comparing consecutive elements
Step 3: Calculate expected runs using the formula
Step 4: Calculate standard deviation
Step 5: Compute Z-score and interpret results
Example
def runs_test(data):
    n = len(data)
    # Direction of each step: +1 for an increase, -1 for a decrease (ties are skipped)
    directions = [1 if b > a else -1 for a, b in zip(data, data[1:]) if a != b]
    # Count runs: a new run starts whenever the direction changes
    num_runs = 1
    for i in range(1, len(directions)):
        if directions[i] != directions[i - 1]:
            num_runs += 1
    # Calculate expected runs and standard deviation (runs up and down)
    expected_runs = (2 * n - 1) / 3
    std_deviation = ((16 * n - 29) / 90) ** 0.5
    # Calculate Z-score
    z_score = (num_runs - expected_runs) / std_deviation
    return num_runs, expected_runs, std_deviation, z_score

# Test data
data = [12, 10, 8, 9, 7, 5, 4, 6, 8, 10]
num_runs, expected_runs, std_deviation, z_score = runs_test(data)
print("Data:", data)
print("Number of Runs:", num_runs)
print("Expected Runs:", round(expected_runs, 2))
print("Standard Deviation:", round(std_deviation, 2))
print("Z-Score:", round(z_score, 2))
# Interpretation (using α = 0.05, critical value = ±1.96)
if abs(z_score) <= 1.96:
    print("Result: Data appears random (not statistically significant)")
else:
    print("Result: Data shows non-random pattern (statistically significant)")

Output:
Data: [12, 10, 8, 9, 7, 5, 4, 6, 8, 10]
Number of Runs: 4
Expected Runs: 6.33
Standard Deviation: 1.21
Z-Score: -1.93
Result: Data appears random (not statistically significant)
Using SciPy for Runs Test
For a more robust implementation, use SciPy's statistical functions:
import numpy as np
from scipy import stats

def runs_test_scipy(data, threshold=None):
    if threshold is None:
        threshold = np.median(data)
    # Convert to binary sequence (above/below threshold)
    binary_seq = [1 if x >= threshold else 0 for x in data]
    # Count runs
    runs = 1
    for i in range(1, len(binary_seq)):
        if binary_seq[i] != binary_seq[i - 1]:
            runs += 1
    # Expected runs and variance for runs about a threshold (Wald-Wolfowitz)
    n1 = sum(binary_seq)        # values at or above the threshold
    n2 = len(binary_seq) - n1   # values below the threshold
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z_score = (runs - expected) / np.sqrt(variance)
    # Two-tailed p-value
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))
    return runs, expected, z_score, p_value

# Example with different data
data = [1, 2, 1, 3, 2, 1, 4, 3, 2, 1, 5, 4]
runs, expected, z_score, p_value = runs_test_scipy(data)
print(f"Runs: {runs}")
print(f"Expected: {expected:.2f}")
print(f"Z-score: {z_score:.2f}")
print(f"P-value: {p_value:.4f}")
print(f"Random at α=0.05: {p_value > 0.05}")

Output:
Runs: 8
Expected: 6.33
Z-score: 1.15
P-value: 0.2502
Random at α=0.05: True
Limitations and Considerations
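Sample Size: The normal approximation behind the Z-score requires roughly 20 or more observations for reliable results. With smaller samples, such as the short illustrative sequences above, the test may not detect deviations from randomness accurately.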
Independence Assumption: Assumes observations are independent. Autocorrelated data may produce biased results.
Threshold Selection: The choice of threshold significantly affects results. The median or mean is typically used as the threshold.
Interpretation: The test indicates departure from randomness but doesn't specify the nature of the pattern.
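For short sequences, where the normal approximation behind the Z statistic is least trustworthy, one alternative is a Monte Carlo permutation check: shuffle the above/below labels many times and see how often a shuffled sequence has a run count at least as far from the average as the observed one. This is a swapped-in technique rather than part of the Z-based procedure described in this article, and the function names, the shuffle count, and the seed are assumptions for illustration.

```python
import random
from statistics import median


def count_runs(labels):
    """Number of maximal stretches of identical consecutive labels."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def permutation_runs_pvalue(data, n_shuffles=2000, seed=0):
    """Two-sided Monte Carlo p-value for the run count about the median.

    Assumptions: values equal to the median count as "above", and the
    reference distribution is built by shuffling the labels.
    """
    m = median(data)
    labels = [x >= m for x in data]
    observed = count_runs(labels)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = labels[:]
    simulated = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        simulated.append(count_runs(shuffled))

    center = sum(simulated) / n_shuffles
    extreme = sum(
        1 for r in simulated if abs(r - center) >= abs(observed - center)
    )
    # Add-one correction keeps the estimate strictly above zero
    return observed, (extreme + 1) / (n_shuffles + 1)


alternating = [1, 9, 2, 8, 1, 9, 2, 8, 1, 9, 2, 8]  # strictly alternating low/high
runs, p = permutation_runs_pvalue(alternating)
print("Runs:", runs, "Monte Carlo p-value:", round(p, 4))
```

The alternating sequence has the maximum possible number of runs (12), so very few shuffles are as extreme and the estimated p-value is small. The estimate carries simulation noise, which shrinks as `n_shuffles` grows.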
Conclusion
The Runs test is a valuable tool for assessing data randomness by analyzing consecutive sequences. In Python it can be implemented with a short custom function or with the help of SciPy. Remember that a non-significant result does not prove the data are random; it only means the test found no strong evidence against randomness.
