
Show the 68-95-99.7 rule in Statistics using Python


Statistics provides us with powerful tools to analyze and understand data. One of the fundamental concepts in statistics is the 68-95-99.7 rule, also known as the empirical rule or the three-sigma rule. For data that are approximately normally distributed, this rule lets us estimate how much of the data lies near the mean using only the mean and the standard deviation.

Overview of the 68-95-99.7 Rule

The 68-95-99.7 rule estimates the percentage of data in a normal distribution that falls within one, two, or three standard deviations of the mean. According to this rule:

Approximately 68% of the data falls within one standard deviation of the mean.
Approximately 95% of the data falls within two standard deviations of the mean.
Approximately 99.7% of the data falls within three standard deviations of the mean.

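These percentages are rounded values for an ideal normal distribution (the bell curve), and they hold approximately for data drawn from one. Knowing the rule lets us quickly gauge the spread of data and spot potential outliers: for example, a value more than three standard deviations from the mean should occur in only about 0.3% of normally distributed observations.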
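The three percentages are rounded values of a closed-form quantity. For a normal variable X with mean μ and standard deviation σ, the probability of landing within k standard deviations of the mean is P(|X - μ| ≤ kσ) = erf(k/√2), which does not depend on μ or σ. A minimal sketch using only Python's standard `math` module (the helper name `within_k_sigma` is hypothetical, chosen for illustration):

```python
import math


def within_k_sigma(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean.

    Uses P(|X - mu| <= k * sigma) = erf(k / sqrt(2)); the result does not depend on mu or sigma.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} SD: {within_k_sigma(k):.4%}")
```

This prints roughly 68.27%, 95.45%, and 99.73%, which is why the rule is sometimes quoted more precisely as 68.27-95.45-99.73.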

Implementing the 68-95-99.7 Rule in Python

To demonstrate the 68-95-99.7 rule in action, we will use Python and its popular data analysis libraries. Let's start by importing the required libraries and generating a random dataset of 10,000 values from a standard normal distribution (mean 0, standard deviation 1):

import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Generate a random dataset following normal distribution
data = np.random.normal(0, 1, 10000)

# Calculate mean and standard deviation (np.std uses the population formula, ddof=0, by default)
mean = np.mean(data)
std = np.std(data)

print(f"Mean: {mean:.4f}")
print(f"Standard Deviation: {std:.4f}")

Mean: 0.0027
Standard Deviation: 0.9973
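The rule is often easiest to apply after converting each value to a z-score, z = (x - mean) / std, which measures distance from the mean in units of standard deviation. A short sketch that reuses the article's seeded dataset (the variable name `z` is our own choice):

```python
import numpy as np

np.random.seed(42)
data = np.random.normal(0, 1, 10000)

# Standardize: distance from the mean measured in standard deviations
z = (data - np.mean(data)) / np.std(data)

# After standardization the mean is (numerically) 0 and the standard deviation is 1
print(f"z mean: {z.mean():.4f}, z std: {z.std():.4f}")

# A value lies within k standard deviations of the mean exactly when |z| <= k
print(f"Largest |z| in the sample: {np.abs(z).max():.2f}")
```

Counting `np.abs(z) <= k` gives the same fractions as the `np.logical_and` range checks used in the percentage calculations later in this article.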

Visualizing the Distribution

Let's create a histogram of the data, with dashed vertical lines marking the mean and the boundaries at ±1, ±2, and ±3 standard deviations:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(0, 1, 10000)
mean = np.mean(data)
std = np.std(data)

plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, density=True, alpha=0.7, color='lightblue', edgecolor='black')

# Plot the mean and standard deviations
plt.axvline(mean, color='red', linestyle='dashed', linewidth=2, label='Mean')
plt.axvline(mean - std, color='green', linestyle='dashed', linewidth=2, label='±1 STD')
plt.axvline(mean + std, color='green', linestyle='dashed', linewidth=2)
plt.axvline(mean - 2*std, color='blue', linestyle='dashed', linewidth=2, label='±2 STD')
plt.axvline(mean + 2*std, color='blue', linestyle='dashed', linewidth=2)
plt.axvline(mean - 3*std, color='magenta', linestyle='dashed', linewidth=2, label='±3 STD')
plt.axvline(mean + 3*std, color='magenta', linestyle='dashed', linewidth=2)

plt.legend()
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution with 68-95-99.7 Rule')
plt.grid(True, alpha=0.3)
plt.show()
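The histogram marks the band boundaries with lines. To see the areas themselves, one option is to shade them under the theoretical normal curve. This sketch assumes matplotlib's `fill_between` and the non-interactive `Agg` backend (so it also runs without a display); the output filename `empirical_rule_bands.png` is hypothetical:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the figure can be saved without a display
import matplotlib.pyplot as plt

mu, sigma = 0.0, 1.0  # assumed parameters of the theoretical curve
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 801)
# Normal probability density function
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, pdf, color="black", linewidth=2, label="Normal PDF")

# Shade the widest band first so the narrower bands are drawn on top of it
for k, color in [(3, "magenta"), (2, "blue"), (1, "green")]:
    band = (x >= mu - k * sigma) & (x <= mu + k * sigma)
    ax.fill_between(x, pdf, where=band, color=color, alpha=0.3, label=f"±{k} SD")

ax.set_xlabel("Value")
ax.set_ylabel("Density")
ax.set_title("Areas covered by the 68-95-99.7 rule")
ax.legend()
fig.savefig("empirical_rule_bands.png")  # hypothetical output filename
plt.close(fig)
```

Because the shaded bands are nested, the green region alone corresponds to about 68% of the area under the curve, green plus blue to about 95%, and all three to about 99.7%.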

Calculating the Percentages

Now let's calculate the actual percentages of data falling within each range and verify the 68-95-99.7 rule:

import numpy as np

np.random.seed(42)
data = np.random.normal(0, 1, 10000)
mean = np.mean(data)
std = np.std(data)

# Calculate the percentage within one standard deviation
pct_within_1_std = np.sum(np.logical_and(data >= mean - std, data <= mean + std)) / len(data)

# Calculate the percentage within two standard deviations
pct_within_2_std = np.sum(np.logical_and(data >= mean - 2*std, data <= mean + 2*std)) / len(data)

# Calculate the percentage within three standard deviations
pct_within_3_std = np.sum(np.logical_and(data >= mean - 3*std, data <= mean + 3*std)) / len(data)

print("68-95-99.7 Rule Verification:")
print(f"Percentage within 1 standard deviation: {pct_within_1_std:.2%}")
print(f"Percentage within 2 standard deviations: {pct_within_2_std:.2%}")
print(f"Percentage within 3 standard deviations: {pct_within_3_std:.2%}")

print("\nExpected vs Actual:")
print(f"1 STD - Expected: 68.0%, Actual: {pct_within_1_std:.1%}")
print(f"2 STD - Expected: 95.0%, Actual: {pct_within_2_std:.1%}")
print(f"3 STD - Expected: 99.7%, Actual: {pct_within_3_std:.1%}")

68-95-99.7 Rule Verification:
Percentage within 1 standard deviation: 68.27%
Percentage within 2 standard deviations: 95.61%
Percentage within 3 standard deviations: 99.70%

Expected vs Actual:
1 STD - Expected: 68.0%, Actual: 68.3%
2 STD - Expected: 95.0%, Actual: 95.6%
3 STD - Expected: 99.7%, Actual: 99.7%
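The three calculations above repeat the same pattern, so they can be folded into one helper that works on any array and also reports the exact normal-theory percentage for comparison. This is a sketch: the function name `empirical_rule_report` is hypothetical, it uses NumPy's `default_rng` generator instead of `np.random.seed`, and the sample parameters (mean 50, standard deviation 10) are chosen only for illustration.

```python
import math

import numpy as np


def empirical_rule_report(data, ks=(1, 2, 3)):
    """Return {k: (observed_fraction, normal_fraction)} for each k.

    observed_fraction: share of points with |x - mean| <= k * std (population std)
    normal_fraction:   erf(k / sqrt(2)), the exact value for a normal distribution
    """
    data = np.asarray(data, dtype=float)
    z = np.abs(data - data.mean()) / data.std()
    return {k: (float(np.mean(z <= k)), math.erf(k / math.sqrt(2))) for k in ks}


rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=100_000)
for k, (observed, expected) in empirical_rule_report(sample).items():
    print(f"{k} SD: observed {observed:.2%}, normal theory {expected:.2%}")
```

Comparing against erf(k/√2) rather than the rounded 68/95/99.7 makes small discrepancies easier to judge, especially at two standard deviations where the exact value is about 95.45%.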

Practical Applications

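The 68-95-99.7 rule is used in many fields:

Quality Control: Flagging measurements that fall outside the mean ± 3 standard deviations as possible defects or process problems
Financial Analysis: Roughly estimating how often returns fall within a given range, with care, since real returns often have heavier tails than a normal distribution
Healthcare Research: Interpreting patient measurements and test results relative to the typical spread in a population
Educational Testing: Interpreting standardized test scores; for example, on a scale with mean 100 and standard deviation 15, about 95% of scores fall between 70 and 130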
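The quality-control use can be sketched as a simplified control check: compute mean ± 3 standard deviation limits from a baseline sample taken while the process is known to be working correctly, then flag new measurements that fall outside those limits. The function names and the shaft-diameter values below are hypothetical, and real control charts (such as Shewhart charts) usually estimate σ from subgroup ranges or standard deviations rather than in this simple way.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Lower/upper limits at mean +/- k population standard deviations of an in-control baseline."""
    mu = statistics.fmean(baseline)
    sigma = statistics.pstdev(baseline)
    return mu - k * sigma, mu + k * sigma


def out_of_control(measurements, limits):
    """Indices of measurements that fall outside the (lower, upper) limits."""
    lower, upper = limits
    return [i for i, x in enumerate(measurements) if x < lower or x > upper]


# Hypothetical shaft diameters (mm) recorded while the process was running correctly
baseline = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.01, 9.99, 10.00]
limits = control_limits(baseline)

# Hypothetical new measurements to check against the limits
new_batch = [10.01, 9.96, 10.12, 10.00]
print(f"Limits: {limits[0]:.3f} to {limits[1]:.3f} mm")
print("Out-of-limit indices:", out_of_control(new_batch, limits))
```

Computing the limits from a trusted baseline rather than from the batch being checked keeps a single extreme measurement from inflating the standard deviation and hiding itself.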

Limitations and Considerations

While the 68-95-99.7 rule is valuable, it has important limitations:

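It holds exactly only for the normal distribution; for other distributions the percentages can differ substantially
Outliers inflate the standard deviation, which widens the bands and distorts the percentages
Skewed or heavy-tailed distributions require different approaches, such as percentile-based summaries
With small samples, the observed percentages can deviate noticeably from 68/95/99.7 through sampling variation alone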
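To make the first limitation concrete, the sketch below compares the normal percentages with two alternatives: an exponential distribution (a common right-skewed example, here with rate 1 so its mean and standard deviation are both 1), computed exactly from its CDF, and Chebyshev's inequality, which guarantees that at least 1 - 1/k² of any distribution with finite variance lies within k standard deviations of the mean. The function names are our own.

```python
import math


def exponential_within_k(k: float) -> float:
    """P(|X - mean| <= k * sd) for an exponential distribution with rate 1 (mean = sd = 1)."""
    lower = max(0.0, 1.0 - k)  # the distribution has no probability mass below 0
    upper = 1.0 + k
    # CDF(x) = 1 - exp(-x), so CDF(upper) - CDF(lower) = exp(-lower) - exp(-upper)
    return math.exp(-lower) - math.exp(-upper)


def chebyshev_bound(k: float) -> float:
    """Chebyshev's inequality: at least 1 - 1/k^2 of any finite-variance distribution lies within k SDs."""
    return 1.0 - 1.0 / k**2


for k in (1, 2, 3):
    normal = math.erf(k / math.sqrt(2))
    print(f"{k} SD: normal {normal:.2%}, exponential {exponential_within_k(k):.2%}, "
          f"Chebyshev minimum {chebyshev_bound(k):.2%}")
```

The skewed exponential case puts about 86% of its mass within one standard deviation but only about 98.2% within three, while Chebyshev's inequality guarantees just 75% within two and about 88.9% within three, so the 68-95-99.7 figures should not be assumed for non-normal data.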

Conclusion

The 68-95-99.7 rule is a powerful concept that helps us understand data distribution based on standard deviation. Using Python and NumPy, we can easily check this rule empirically on simulated data and apply it to real-world data analysis. This rule enables quick assessment of data spread and identification of potential outliers in normally distributed datasets.

Priya Sharma

Updated on: 2026-03-27T12:39:46+05:30
