Absolute Deviation and Absolute Mean Deviation using NumPy
In statistics, variability measures how dispersed the values in a sample are. Two related measures are the Absolute Deviation (the absolute difference between each value and the mean) and the Mean Absolute Deviation (MAD), which is the average of all absolute deviations. NumPy's mean and absolute functions are enough to calculate both efficiently.
Formulas
$$\mathrm{Absolute\:Deviation_i = |x_i - \bar{x}|}$$
$$\mathrm{MAD = \frac{1}{n}\sum_{i=1}^{n}|x_i - \bar{x}|}$$
Where xᵢ is each data point, x̄ is the sample mean, and n is the sample size.
Absolute Deviation
Calculate the absolute deviation for each element in a data sample:
from numpy import mean, absolute
data = [12, 42, 53, 13, 112]
# Find mean value
M = mean(data)
print("Sample Mean Value =", M)
# Calculate absolute deviation for each element
print("\nData - Mean = Deviation")
for value in data:
    dev = absolute(value - M)
    print(f" {value:3d} - {M} = {round(dev, 2)}")
Sample Mean Value = 46.4

Data - Mean = Deviation
  12 - 46.4 = 34.4
  42 - 46.4 = 4.4
  53 - 46.4 = 6.6
  13 - 46.4 = 33.4
 112 - 46.4 = 65.6
Values above the mean (53, 112) and below the mean (12, 42, 13) all produce positive deviations after taking the absolute value. Without absolute values, these deviations would sum to zero.
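The cancellation of signed deviations can be checked directly. The following is a minimal sketch that reuses the same sample; the variable names are illustrative only, and the comparison with zero uses a tolerance because floating-point sums are rarely exactly zero.

```python
import numpy as np

data = np.array([12, 42, 53, 13, 112])

# Signed deviations keep their sign, so positives and negatives cancel
signed = data - data.mean()
signed_sum = signed.sum()

# Absolute deviations are all non-negative, so they accumulate
abs_sum = np.abs(signed).sum()

print("Signed deviations:", signed)
print("Signed sum is (numerically) zero:", bool(np.isclose(signed_sum, 0.0)))
print("Sum of absolute deviations:", round(float(abs_sum), 2))
```

This is why the absolute value (or, for the standard deviation, the square) is needed before averaging.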
Mean Absolute Deviation (MAD)
MAD is the average of all absolute deviations, a single number summarizing the spread of the data:
from numpy import mean, absolute
data = [12, 42, 53, 13, 112]
# Find mean value
M = mean(data)
print("Sample Mean Value =", M)
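# Calculate MAD manually: sum the absolute deviations, then divide by n
total = 0
for value in data:
    total += absolute(value - M)
mad = total / len(data)
# Round only the final result so per-element rounding errors do not accumulate
print("Mean Absolute Deviation:", round(mad, 2))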
Sample Mean Value = 46.4
Mean Absolute Deviation: 28.88
Shorter NumPy Approach
NumPy can calculate MAD in a single line using vectorized operations:
from numpy import mean, absolute, array
data = array([12, 42, 53, 13, 112])
# One-line MAD calculation
mad = mean(absolute(data - mean(data)))
print("Mean Absolute Deviation:", round(mad, 2))
Mean Absolute Deviation: 28.88
data - mean(data) computes all deviations at once, absolute() makes them positive, and mean() averages them.
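If the calculation is needed in several places, the one-liner can be wrapped in a small helper. The sketch below is one possible way to do it, not a NumPy API: the function name mean_absolute_deviation and its axis parameter are assumptions made for illustration, as is the small two-row table used to show the per-row case.

```python
import numpy as np

def mean_absolute_deviation(values, axis=None):
    """Hypothetical helper: mean of |x - mean(x)| along the given axis."""
    arr = np.asarray(values, dtype=float)
    # keepdims=True lets the mean broadcast back against the original array
    center = arr.mean(axis=axis, keepdims=True)
    return np.abs(arr - center).mean(axis=axis)

# 1-D sample from the examples above
mad_1d = mean_absolute_deviation([12, 42, 53, 13, 112])
print("MAD of sample:", round(float(mad_1d), 2))

# Illustrative 2-D data: one MAD per row
table = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 60.0]])
mad_rows = mean_absolute_deviation(table, axis=1)
print("MAD per row:", mad_rows)
```

Passing axis=1 computes one value per row; leaving axis as None treats all elements as a single sample.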
Comparison: MAD vs Standard Deviation
| Measure | Formula | Sensitivity to Outliers |
|---|---|---|
| MAD | Mean of \|xᵢ − x̄\| | Less sensitive (linear) |
| Standard Deviation | Square root of mean of (xᵢ − x̄)² | More sensitive (squared) |
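The difference in sensitivity can be illustrated with the sample used earlier, computed once without and once with its largest value (112). This is a rough sketch rather than a formal robustness test: treating 112 as the outlier is an assumption made for the demonstration, and np.std is called with its default ddof=0 (population standard deviation).

```python
import numpy as np

def mad(values):
    """Mean absolute deviation around the mean."""
    arr = np.asarray(values, dtype=float)
    return np.abs(arr - arr.mean()).mean()

without_outlier = np.array([12, 42, 53, 13])
with_outlier = np.array([12, 42, 53, 13, 112])

mad_without, mad_with = mad(without_outlier), mad(with_outlier)
# np.std defaults to ddof=0, i.e. the population standard deviation
sd_without, sd_with = np.std(without_outlier), np.std(with_outlier)

print(f"Without 112: MAD = {mad_without:.2f}, SD = {sd_without:.2f}")
print(f"With 112:    MAD = {mad_with:.2f}, SD = {sd_with:.2f}")
# The SD-to-MAD ratio grows once the extreme value is included
print(f"SD/MAD ratio: {sd_without / mad_without:.3f} -> {sd_with / mad_with:.3f}")
```

Both measures increase when 112 is included, but the standard deviation increases proportionally more, because the large deviation is squared before averaging.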
Conclusion
Absolute Deviation measures each data point's distance from the mean, while MAD averages these distances into a single dispersion metric. Use NumPy's vectorized approach (mean(absolute(data - mean(data)))) for an efficient one-line calculation. MAD is often preferred over standard deviation when your data contains outliers, because each deviation contributes linearly rather than being squared.
