Show the 68-95-99.7 rule in Statistics using Python

Statistics provides us with powerful tools to analyze and understand data. One of its fundamental concepts is the 68-95-99.7 rule, also known as the empirical rule or the three-sigma rule. The rule describes how much of a normally distributed dataset lies within one, two, and three standard deviations of the mean.

Overview of the 68-95-99.7 Rule

The 68-95-99.7 rule provides a way to estimate the percentage of data that falls within a certain number of standard deviations from the mean in a normal distribution. According to this rule:

  • Approximately 68% of the data falls within one standard deviation of the mean.

  • Approximately 95% of the data falls within two standard deviations of the mean.

  • Approximately 99.7% of the data falls within three standard deviations of the mean.

These percentages hold true for a dataset that follows a normal distribution, also known as a bell curve. Understanding this rule allows us to quickly assess the spread of data and identify outliers or unusual observations.
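
The three percentages are rounded values of an exact result: for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k / √2). The short sketch below computes them with the standard library only; the function name `normal_coverage` is chosen here for illustration and is not part of any library.

```python
import math

def normal_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Print the exact coverage for one, two, and three standard deviations
for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {normal_coverage(k):.4%}")
```

This gives roughly 68.27%, 95.45%, and 99.73%, which the rule rounds to 68, 95, and 99.7.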

Implementing the 68-95-99.7 Rule in Python

To demonstrate the 68-95-99.7 rule in action, we will use Python and its popular data analysis libraries. Let's start by importing the required libraries and generating a random dataset:

import numpy as np
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

# Generate a random dataset following normal distribution
data = np.random.normal(0, 1, 10000)

# Calculate mean and standard deviation
mean = np.mean(data)
std = np.std(data)

print(f"Mean: {mean:.4f}")
print(f"Standard Deviation: {std:.4f}")

Output:

Mean: 0.0027
Standard Deviation: 0.9973

Visualizing the Distribution

Let's create a histogram to visualize the data and the areas covered by the 68-95-99.7 rule:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.random.normal(0, 1, 10000)
mean = np.mean(data)
std = np.std(data)

plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, density=True, alpha=0.7, color='lightblue', edgecolor='black')

# Plot the mean and standard deviations
plt.axvline(mean, color='red', linestyle='dashed', linewidth=2, label='Mean')
plt.axvline(mean - std, color='green', linestyle='dashed', linewidth=2, label='±1 STD')
plt.axvline(mean + std, color='green', linestyle='dashed', linewidth=2)
plt.axvline(mean - 2*std, color='blue', linestyle='dashed', linewidth=2, label='±2 STD')
plt.axvline(mean + 2*std, color='blue', linestyle='dashed', linewidth=2)
plt.axvline(mean - 3*std, color='magenta', linestyle='dashed', linewidth=2, label='±3 STD')
plt.axvline(mean + 3*std, color='magenta', linestyle='dashed', linewidth=2)

plt.legend()
plt.xlabel('Value')
plt.ylabel('Density')
plt.title('Normal Distribution with 68-95-99.7 Rule')
plt.grid(True, alpha=0.3)
plt.show()

Calculating the Percentages

Now let's calculate the actual percentages of data falling within each range and verify the 68-95-99.7 rule:

import numpy as np

np.random.seed(42)
data = np.random.normal(0, 1, 10000)
mean = np.mean(data)
std = np.std(data)

# Calculate the percentage within one standard deviation
pct_within_1_std = np.sum(np.logical_and(data >= mean - std, data <= mean + std)) / len(data)

# Calculate the percentage within two standard deviations
pct_within_2_std = np.sum(np.logical_and(data >= mean - 2*std, data <= mean + 2*std)) / len(data)

# Calculate the percentage within three standard deviations
pct_within_3_std = np.sum(np.logical_and(data >= mean - 3*std, data <= mean + 3*std)) / len(data)

print("68-95-99.7 Rule Verification:")
print(f"Percentage within 1 standard deviation: {pct_within_1_std:.2%}")
print(f"Percentage within 2 standard deviations: {pct_within_2_std:.2%}")
print(f"Percentage within 3 standard deviations: {pct_within_3_std:.2%}")

print("\nExpected vs Actual:")
print(f"1 STD - Expected: 68.0%, Actual: {pct_within_1_std:.1%}")
print(f"2 STD - Expected: 95.0%, Actual: {pct_within_2_std:.1%}")
print(f"3 STD - Expected: 99.7%, Actual: {pct_within_3_std:.1%}")

Output:

68-95-99.7 Rule Verification:
Percentage within 1 standard deviation: 68.27%
Percentage within 2 standard deviations: 95.61%
Percentage within 3 standard deviations: 99.70%

Expected vs Actual:
1 STD - Expected: 68.0%, Actual: 68.3%
2 STD - Expected: 95.0%, Actual: 95.6%
3 STD - Expected: 99.7%, Actual: 99.7%
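
The measured percentages do not match the theoretical ones exactly because the dataset is a finite random sample. As a rough guide, the sketch below estimates the typical sampling error of each percentage for 10,000 points using the binomial standard error sqrt(p(1 - p) / n). This is an approximation: it treats the mean and standard deviation as known, whereas the code above estimates them from the same sample. The helper names are illustrative only.

```python
import math

n = 10_000  # sample size used in the examples above

def exact_coverage(k: float) -> float:
    """Exact normal probability of lying within k standard deviations."""
    return math.erf(k / math.sqrt(2))

def standard_error(p: float, n: int) -> float:
    """Approximate standard error of an observed proportion p over n points."""
    return math.sqrt(p * (1 - p) / n)

for k in (1, 2, 3):
    p = exact_coverage(k)
    se = standard_error(p, n)
    print(f"{k} STD - exact: {p:.2%}, typical sampling error: ±{se:.2%}")
```

For two standard deviations the typical error is about 0.2 percentage points, so an observed 95.61% against a theoretical 95.45% is ordinary sampling variation rather than a sign that the rule failed.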

Practical Applications

The 68-95-99.7 rule finds application in various fields:

  • Quality Control: Identifying defective products in manufacturing

  • Financial Analysis: Assessing risk and return on investments

  • Healthcare Research: Understanding patient characteristics and test results

  • Educational Testing: Interpreting standardized test scores
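
As an illustration of the quality control use case, the following sketch flags measurements that lie more than three standard deviations from the mean. The part lengths are hypothetical, simulated data with one defect injected by hand, and `flag_outliers` is a helper written for this example. It uses only the standard library so that it runs without extra dependencies.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the hypothetical data is reproducible

# Hypothetical part lengths in mm: 200 in-spec parts plus one injected defect
lengths = [rng.gauss(50.0, 0.2) for _ in range(200)]
lengths.append(52.0)

mean = statistics.fmean(lengths)
std = statistics.pstdev(lengths)  # population std, the same convention as np.std's default

def flag_outliers(values, mean, std, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    return [v for v in values if abs(v - mean) > k * std]

flagged = flag_outliers(lengths, mean, std)
print(f"Mean: {mean:.3f}, Standard Deviation: {std:.3f}")
print(f"Flagged {len(flagged)} value(s): {[round(v, 2) for v in flagged]}")
```

Under the rule, only about 0.3% of normally distributed values fall outside three standard deviations, so a flagged value deserves a closer look. Note that the threshold needs a reasonably large sample: in very small samples a single point cannot lie three standard deviations from the mean, because it inflates the standard deviation itself.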

Limitations and Considerations

While the 68-95-99.7 rule is valuable, it has important limitations:

  • Only applies to data that is approximately normally distributed

  • Outliers inflate the sample mean and standard deviation, which distorts the observed percentages

  • Skewed distributions require different statistical approaches

  • Small samples may deviate noticeably from the rule because of sampling variation
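
To see the first and third limitations in practice, the sketch below repeats the coverage calculation on a skewed dataset. The exponential distribution is chosen here purely as an example of non-normal data, and `coverage` is an illustrative helper. The code uses the standard library only.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility

# Skewed, non-normal data: 10,000 draws from an exponential distribution (rate 1)
skewed = [rng.expovariate(1.0) for _ in range(10_000)]
mean = statistics.fmean(skewed)
std = statistics.pstdev(skewed)

def coverage(values, mean, std, k):
    """Fraction of values lying within k standard deviations of the mean."""
    inside = sum(1 for v in values if abs(v - mean) <= k * std)
    return inside / len(values)

for k in (1, 2, 3):
    observed = coverage(skewed, mean, std, k)
    normal = math.erf(k / math.sqrt(2))
    print(f"{k} STD - exponential data: {observed:.2%}, normal rule: {normal:.2%}")
```

For an exponential distribution with rate 1 the theoretical coverages are about 86.5%, 95.0%, and 98.2%, so the one-sigma and three-sigma figures differ clearly from 68% and 99.7%. It is worth checking that data is roughly bell-shaped before relying on the rule.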

Conclusion

The 68-95-99.7 rule is a powerful concept that helps us understand data distribution based on standard deviation. Using Python and NumPy, we can easily verify this rule and apply it to real-world data analysis. This rule enables quick assessment of data spread and identification of potential outliers in normally distributed datasets.

Updated on: 2026-03-27T12:39:46+05:30
