Performing Runs test of Randomness in Python

The Runs test of randomness is a non-parametric statistical test used to determine whether a sequence of data points is random or exhibits systematic patterns. The test analyzes "runs", which are consecutive sequences of values that are either above or below a certain threshold, to assess the randomness of the data.

Understanding the Runs Test

A run is defined as a consecutive sequence of values that are either above or below a specified threshold (typically the median). The Runs test examines whether the number of runs in a dataset significantly deviates from what would be expected in a truly random sequence.

The test assumes that in a random sequence, the number of runs follows a specific distribution. If the observed number of runs significantly differs from this expected distribution, it suggests the presence of patterns or bias in the data.
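To make the definition concrete, here is a minimal sketch of counting runs about the median. The helper name `count_runs` and the choice to treat values equal to the threshold as "above" are assumptions made for illustration, not part of the test's definition.

```python
from statistics import median

def count_runs(values, threshold=None):
    """Count runs of values at/above vs. below a threshold (median by default)."""
    if not values:
        return 0
    if threshold is None:
        threshold = median(values)
    # Label each value: True if at or above the threshold, False if below
    labels = [v >= threshold for v in values]
    # A new run starts at every position where the label changes
    return 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)

sample = [3, 7, 8, 2, 1, 9, 6, 4]   # median is 5
print(count_runs(sample))            # below | above, above | below, below | above, above | below -> 5
```

A sequence that alternates on every step gives the maximum number of runs, while a sorted sequence gives only two.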

Z-Test Statistics Formula

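The Runs test uses a Z-test statistic to evaluate significance:

Z = (observed_runs - expected_runs) / standard_deviation

Where:

expected_runs = (2n - 1) / 3
standard_deviation = √((16n - 29) / 90)
n = sample size

These expressions for the mean and standard deviation belong to the runs up and down variant of the test, in which a run is a maximal stretch of consecutive increases or consecutive decreases. When runs are instead counted above and below a threshold, with n1 values above and n2 values below, the Wald-Wolfowitz expressions apply: expected_runs = 2·n1·n2 / (n1 + n2) + 1 and variance = 2·n1·n2·(2·n1·n2 - n1 - n2) / ((n1 + n2)² · (n1 + n2 - 1)).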
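As a quick illustration of how these two quantities scale with sample size, the sketch below evaluates them for a few values of n and prints the band of run counts that would not be significant at the 5% level. The function name `runs_up_down_moments` is hypothetical, and the formulas are the standard ones for runs up and down in a sequence of distinct values.

```python
import math

def runs_up_down_moments(n):
    """Expected number of runs up and down, and its standard deviation, for n distinct values."""
    expected = (2 * n - 1) / 3
    std_dev = math.sqrt((16 * n - 29) / 90)
    return expected, std_dev

for n in (10, 20, 50):
    expected, std_dev = runs_up_down_moments(n)
    # Run counts outside expected ± 1.96 * std_dev are significant at the 5% level
    low, high = expected - 1.96 * std_dev, expected + 1.96 * std_dev
    print(f"n={n}: expected={expected:.2f}, sd={std_dev:.2f}, band=({low:.2f}, {high:.2f})")
```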

Implementation in Python

Algorithm Steps

Step 1: Input the data sequence

Step 2: Count the number of runs by comparing consecutive elements; a new run starts each time the direction of change (increase or decrease) reverses

Step 3: Calculate expected runs using the formula

Step 4: Calculate standard deviation

Step 5: Compute Z-score and interpret results

Example

def runs_test(data):
    n = len(data)

    # Direction of each step: +1 for an increase, -1 for a decrease.
    # Equal neighbours are skipped; the formulas assume distinct values.
    signs = [1 if b > a else -1 for a, b in zip(data, data[1:]) if a != b]

    # Count runs: a new run starts whenever the direction changes
    num_runs = 1 if signs else 0
    for i in range(1, len(signs)):
        if signs[i] != signs[i - 1]:
            num_runs += 1

    # Calculate expected runs and standard deviation (runs up and down)
    expected_runs = (2 * n - 1) / 3
    std_deviation = ((16 * n - 29) / 90) ** 0.5

    # Calculate Z-score
    z_score = (num_runs - expected_runs) / std_deviation

    return num_runs, expected_runs, std_deviation, z_score

# Test data
data = [12, 10, 8, 9, 7, 5, 4, 6, 8, 10]

num_runs, expected_runs, std_deviation, z_score = runs_test(data)

print("Data:", data)
print("Number of Runs:", num_runs)
print("Expected Runs:", round(expected_runs, 2))
print("Standard Deviation:", round(std_deviation, 2))
print("Z-Score:", round(z_score, 2))

# Interpretation (using α = 0.05, critical value = ±1.96)
if abs(z_score) <= 1.96:
    print("Result: Data appears random (not statistically significant)")
else:
    print("Result: Data shows non-random pattern (statistically significant)")

Data: [12, 10, 8, 9, 7, 5, 4, 6, 8, 10]
Number of Runs: 4
Expected Runs: 6.33
Standard Deviation: 1.21
Z-Score: -1.93
Result: Data appears random (not statistically significant)

Using SciPy for Runs Test

To report a p-value alongside the Z-score, use the normal distribution from SciPy's statistical functions:

import numpy as np
from scipy import stats

def runs_test_scipy(data, threshold=None):
    if threshold is None:
        threshold = np.median(data)

    # Convert to binary sequence (1 = at or above threshold, 0 = below)
    binary_seq = [1 if x >= threshold else 0 for x in data]

    # Count runs: a new run starts whenever the binary value changes
    runs = 1
    for i in range(1, len(binary_seq)):
        if binary_seq[i] != binary_seq[i - 1]:
            runs += 1

    # Wald-Wolfowitz mean and variance for n1 ones and n2 zeros
    n1 = sum(binary_seq)
    n2 = len(binary_seq) - n1
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z_score = (runs - expected) / np.sqrt(variance)

    # Two-tailed p-value from the standard normal distribution
    p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

    return runs, expected, z_score, p_value

# Example with different data
data = [1, 2, 1, 3, 2, 1, 4, 3, 2, 1, 5, 4]
runs, expected, z_score, p_value = runs_test_scipy(data)

print(f"Runs: {runs}")
print(f"Expected: {expected:.2f}")
print(f"Z-score: {z_score:.2f}")
print(f"P-value: {p_value:.4f}")
print(f"Random at α=0.05: {p_value > 0.05}")

Runs: 8
Expected: 6.33
Z-score: 1.15
P-value: 0.2502
Random at α=0.05: True
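If SciPy is not available, the same two-tailed p-value can be obtained from the standard library, because the complementary error function satisfies erfc(|z| / √2) = 2 · (1 - Φ(|z|)) for the standard normal CDF Φ. The helper name `two_sided_p_value` below is an illustrative assumption.

```python
import math

def two_sided_p_value(z_score):
    """Two-tailed p-value for a standard normal Z-score."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi
    return math.erfc(abs(z_score) / math.sqrt(2))

print(f"{two_sided_p_value(1.96):.3f}")  # prints 0.050
```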

Limitations and Considerations

Sample Size: The Z-test relies on a normal approximation, which is generally considered reliable only from about 20 observations upward. With smaller samples, such as the short examples above, treat the result as illustrative.
Independence Assumption: The null hypothesis is that observations are independent. Autocorrelated data will therefore tend to be flagged as non-random.
Threshold Selection: The choice of threshold significantly affects results. The median or the mean is typically used.
Interpretation: The test indicates departure from randomness but doesn't specify the nature of the pattern.
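For small samples, where the normal approximation is doubtful, one alternative is to estimate the p-value by simulation. The sketch below is a Monte Carlo permutation approach, not the Z-test described above: it shuffles the above/below labels many times and checks how often a shuffled sequence gives a run count at least as extreme as the observed one. The function names, the number of shuffles, and the fixed seed are all assumptions chosen for illustration.

```python
import random
from statistics import median

def count_label_runs(labels):
    """Number of maximal blocks of identical consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def permutation_runs_test(values, n_shuffles=5000, seed=0):
    """Two-sided Monte Carlo p-value for the run count about the median."""
    cutoff = median(values)
    labels = [v >= cutoff for v in values]
    observed = count_label_runs(labels)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = labels[:]
    simulated = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        simulated.append(count_label_runs(shuffled))

    # Two-sided: how often is a shuffled count at least as far from the
    # simulated mean as the observed count?
    centre = sum(simulated) / n_shuffles
    extreme = sum(1 for r in simulated if abs(r - centre) >= abs(observed - centre))
    return observed, extreme / n_shuffles

trend = list(range(1, 13))  # strictly increasing: only 2 runs about the median
observed, p_value = permutation_runs_test(trend)
print(f"Runs: {observed}, Monte Carlo p-value: {p_value:.4f}")
```

Because the p-value is estimated from random shuffles, it varies slightly with the seed and becomes more stable as `n_shuffles` grows.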

Conclusion

The Runs test is a valuable tool for assessing data randomness by analyzing consecutive sequences. In Python it can be implemented with a short custom function, with SciPy supplying the normal distribution for p-values. Remember that a non-significant result only means the data are consistent with randomness; it is not proof that the sequence is random.

Pranavnath

Updated on: 2026-03-27T10:33:30+05:30
