Write a Python program to find the mean absolute deviation of rows and columns in a dataframe
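Mean Absolute Deviation (MAD) measures the average distance between each data point and the mean of the dataset. In pandas versions before 2.0, you can calculate MAD for both rows and columns of a DataFrame using the mad() method. The method was deprecated in pandas 1.5 and removed in pandas 2.0, so the examples that call df.mad() need an older release to run.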
What is Mean Absolute Deviation?
MAD is calculated as the mean of absolute deviations from the arithmetic mean:
MAD = mean(|x - mean(x)|)
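To make the formula concrete, here is a small worked sketch that uses only the Python standard library. It applies the definition step by step to the Column2 values from the sample data used in this article; the variable names are illustrative.

```python
# Worked example of MAD = mean(|x - mean(x)|) without any third-party library.
values = [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]

# Step 1: arithmetic mean of the values.
mean_value = sum(values) / len(values)

# Step 2: absolute deviation of each value from that mean.
abs_deviations = [abs(x - mean_value) for x in values]

# Step 3: the mean of those absolute deviations is the MAD.
mad = sum(abs_deviations) / len(abs_deviations)

print("Mean:", round(mean_value, 6))  # approximately 6.4
print("MAD:", round(mad, 6))          # approximately 0.6
```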
Creating a Sample DataFrame
Let's start by creating a DataFrame with sample data:
import pandas as pd
data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
"Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)
print("DataFrame is:")
print(df)
DataFrame is:
   Column1  Column2
0     6.00      7.0
1     5.30      7.1
2     5.90      7.2
3     7.80      6.0
4     7.60      6.1
5     7.45      6.3
6     7.75      5.1
Mean Absolute Deviation of Columns
To calculate MAD for each column (default behavior), use df.mad() or df.mad(axis=0):
import pandas as pd
data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
"Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)
print("MAD of columns:")
print(df.mad())
MAD of columns:
Column1    0.938776
Column2    0.600000
dtype: float64
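On pandas 2.0 and later, where DataFrame.mad() is no longer available, the same column-wise figures can be obtained by applying the definition directly. The following is a minimal sketch, not an official replacement; the name column_mad is an illustrative choice.

```python
import pandas as pd

data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
        "Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)

# df.mean() returns one mean per column; subtracting it aligns on column labels.
# Taking absolute values and averaging again gives the MAD of each column.
column_mad = (df - df.mean()).abs().mean()

print("MAD of columns:")
print(column_mad)
```

This relies only on DataFrame.mean() and DataFrame.abs(), so it should behave the same on older pandas releases as well.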
Mean Absolute Deviation of Rows
To calculate MAD for each row, use df.mad(axis=1):
import pandas as pd
data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
"Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)
print("MAD of rows:")
print(df.mad(axis=1))
MAD of rows:
0    0.500
1    0.900
2    0.650
3    0.900
4    0.750
5    0.575
6    1.325
dtype: float64
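The row-wise calculation can also be written without mad(), which is useful on pandas 2.0 and later. This is a sketch under the assumption that every column is numeric; row_means and row_mad are illustrative names.

```python
import pandas as pd

data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
        "Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)

# One mean per row.
row_means = df.mean(axis=1)

# Subtract each row's mean from that row (axis=0 aligns on the row index),
# then average the absolute deviations across the columns.
row_mad = df.sub(row_means, axis=0).abs().mean(axis=1)

print("MAD of rows:")
print(row_mad)
```

The explicit df.sub(..., axis=0) is needed here because a plain df - row_means would try to align the row means with the column labels instead of the row index.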
Complete Example
Here's the complete program to calculate both column and row MAD:
import pandas as pd
data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
"Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)
print("DataFrame is:")
print(df)
print("\nMAD of columns:")
print(df.mad())
print("\nMAD of rows:")
print(df.mad(axis=1))
DataFrame is:
   Column1  Column2
0     6.00      7.0
1     5.30      7.1
2     5.90      7.2
3     7.80      6.0
4     7.60      6.1
5     7.45      6.3
6     7.75      5.1

MAD of columns:
Column1    0.938776
Column2    0.600000
dtype: float64

MAD of rows:
0    0.500
1    0.900
2    0.650
3    0.900
4    0.750
5    0.575
6    1.325
dtype: float64
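For code that has to run regardless of the installed pandas version, both directions can be wrapped in one small helper. The function mean_abs_dev below is a hypothetical name introduced for illustration, not part of pandas, and the sketch assumes an all-numeric DataFrame.

```python
import pandas as pd


def mean_abs_dev(frame: pd.DataFrame, axis: int = 0) -> pd.Series:
    """Mean absolute deviation along an axis (0 = per column, 1 = per row)."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    means = frame.mean(axis=axis)
    # Column means are indexed by column label, so they align with axis=1;
    # row means are indexed by row label, so they align with axis=0.
    deviations = frame.sub(means, axis=1 - axis)
    return deviations.abs().mean(axis=axis)


data = {"Column1": [6, 5.3, 5.9, 7.8, 7.6, 7.45, 7.75],
        "Column2": [7, 7.1, 7.2, 6, 6.1, 6.3, 5.1]}
df = pd.DataFrame(data)

print("MAD of columns:")
print(mean_abs_dev(df))
print("\nMAD of rows:")
print(mean_abs_dev(df, axis=1))
```

Like DataFrame.mean(), the helper skips missing values by default, so a NaN in the data is ignored rather than propagated.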
Key Points
- df.mad() or df.mad(axis=0) calculates MAD for columns
- df.mad(axis=1) calculates MAD for rows
- MAD measures the average distance from the mean, useful for understanding data spread
- Higher MAD values indicate greater variability in the data
Conclusion
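On pandas versions before 2.0, use df.mad() to calculate mean absolute deviation for columns and df.mad(axis=1) for rows. On pandas 2.0 and later, where the method was removed, compute the mean of the absolute deviations from the mean directly. Because it does not square the deviations, MAD is less sensitive to outliers than standard deviation, although a single extreme value still shifts the mean it is measured from.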
