Write a Python program to quantify the shape of a distribution in a dataframe
Distribution shape analysis helps you understand the characteristics of a dataset before you model it. Pandas provides built-in methods to calculate kurtosis (a measure of tail heaviness, often loosely described as peakedness) and skewness (a measure of asymmetry) for each column of a DataFrame.
What is Kurtosis and Skewness?
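Kurtosis measures how heavy the tails of a distribution are compared to a normal distribution; it is often described loosely as how peaked or flat the distribution is. Pandas reports excess kurtosis (Fisher's definition), so a normal distribution scores 0. Values above 0 indicate heavier tails and more extreme values, while negative values indicate lighter tails and a flatter shape.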
Skewness measures the asymmetry of a distribution. Positive skewness indicates a tail extending toward higher values, while negative skewness indicates a tail extending toward lower values.
Creating a Sample DataFrame
Let's start by creating a sample DataFrame with numerical data:
import pandas as pd
data = {"Column1": [12, 34, 56, 78, 90],
        "Column2": [23, 30, 45, 50, 90]}
df = pd.DataFrame(data)
print("DataFrame is:")
print(df)
DataFrame is:
   Column1  Column2
0       12       23
1       34       30
2       56       45
3       78       50
4       90       90
Calculating Kurtosis
Use the kurt() method to calculate kurtosis for each column:
import pandas as pd
data = {"Column1": [12, 34, 56, 78, 90],
        "Column2": [23, 30, 45, 50, 90]}
df = pd.DataFrame(data)
kurtosis = df.kurt(axis=0)
print("Kurtosis is:")
print(kurtosis)
Kurtosis is:
Column1   -1.526243
Column2    1.948382
dtype: float64
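To see what kurt() is doing, here is a minimal pure-Python sketch of the bias-corrected sample excess kurtosis formula. It assumes this is the estimator pandas applies; the function name sample_excess_kurtosis is hypothetical and not part of pandas.

```python
def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    mean = sum(values) / n
    # Sums of squared and fourth-power deviations from the mean
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_fourth = sum((x - mean) ** 4 for x in values)
    variance = sum_sq / (n - 1)  # sample variance (divides by n - 1)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum_fourth / variance ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scaled - correction


column1 = [12, 34, 56, 78, 90]
column2 = [23, 30, 45, 50, 90]
print("Column1:", round(sample_excess_kurtosis(column1), 6))
print("Column2:", round(sample_excess_kurtosis(column2), 6))
```

The printed values should agree with the kurt() output above. The sketch does not handle a constant column, where the variance is zero.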
Calculating Skewness
Use the skew() method to calculate skewness for each column:
import pandas as pd
data = {"Column1": [12, 34, 56, 78, 90],
        "Column2": [23, 30, 45, 50, 90]}
df = pd.DataFrame(data)
skewness = df.skew(axis=0)
print("Asymmetry distribution - skewness is:")
print(skewness)
Asymmetry distribution - skewness is:
Column1   -0.280389
Column2    1.309355
dtype: float64
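The same check can be done for skew(). The sketch below implements the bias-corrected sample skewness formula in plain Python, on the assumption that this is the estimator pandas uses; sample_skewness is a hypothetical helper name, not a pandas function.

```python
import math


def sample_skewness(values):
    """Bias-corrected sample skewness (0 for a symmetric sample)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 observations are required")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    # Sum of cubed standardized deviations; the sign follows the longer tail
    cubed = sum(((x - mean) / std) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


column1 = [12, 34, 56, 78, 90]
column2 = [23, 30, 45, 50, 90]
print("Column1:", round(sample_skewness(column1), 6))
print("Column2:", round(sample_skewness(column2), 6))
```

Cubing the deviations keeps their sign, which is why a long tail toward higher values gives a positive result and a long tail toward lower values gives a negative one.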
Complete Example
Here's the complete program that calculates both kurtosis and skewness:
import pandas as pd

# Sample data: two numeric columns of five values each
data = {"Column1": [12, 34, 56, 78, 90],
        "Column2": [23, 30, 45, 50, 90]}
df = pd.DataFrame(data)
print("DataFrame is:")
print(df)

# Excess kurtosis of each column (axis=0 works column-wise)
kurtosis = df.kurt(axis=0)
print("\nKurtosis is:")
print(kurtosis)

# Skewness of each column
skewness = df.skew(axis=0)
print("\nAsymmetry distribution - skewness is:")
print(skewness)
DataFrame is:
   Column1  Column2
0       12       23
1       34       30
2       56       45
3       78       50
4       90       90

Kurtosis is:
Column1   -1.526243
Column2    1.948382
dtype: float64

Asymmetry distribution - skewness is:
Column1   -0.280389
Column2    1.309355
dtype: float64
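As a variation on the complete program, both statistics can be collected into a single table with DataFrame.agg, which accepts a list of method names. This is only a sketch of one possible layout; the variable name shape is illustrative.

```python
import pandas as pd

data = {"Column1": [12, 34, 56, 78, 90],
        "Column2": [23, 30, 45, 50, 90]}
df = pd.DataFrame(data)

# agg() applies each named method to every column; transposing gives
# one row per original column and one column per statistic.
shape = df.agg(["skew", "kurt"]).T
print(shape.round(6))
```

Keeping the two measures side by side makes it easier to compare columns when the DataFrame has many of them.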
Interpreting the Results
| Column | Kurtosis | Skewness | Interpretation |
|---|---|---|---|
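| Column1 | -1.526 (negative) | -0.280 (slightly negative) | Lighter tails than a normal distribution (flatter), slightly left-skewed |
| Column2 | 1.948 (positive) | 1.309 (positive) | Heavier tails than a normal distribution (more peaked), right-skewed because 90 lies far above the other values |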
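The reading of the table can be turned into a small helper. The sketch below is illustrative only: describe_shape is a hypothetical function, and the cut-off of 0.5 is an assumed rule of thumb rather than a fixed statistical standard.

```python
def describe_shape(skewness, kurtosis, tolerance=0.5):
    """Return a rough verbal label for a skewness / excess kurtosis pair."""
    # tolerance is an assumed rule-of-thumb cut-off, not a pandas setting
    if skewness > tolerance:
        skew_label = "right-skewed"
    elif skewness < -tolerance:
        skew_label = "left-skewed"
    else:
        skew_label = "roughly symmetric"

    if kurtosis > tolerance:
        tail_label = "heavier tails than normal"
    elif kurtosis < -tolerance:
        tail_label = "lighter tails than normal"
    else:
        tail_label = "tails similar to normal"

    return f"{skew_label}, {tail_label}"


print("Column1:", describe_shape(-0.280389, -1.526243))
# Column1: roughly symmetric, lighter tails than normal
print("Column2:", describe_shape(1.309355, 1.948382))
# Column2: right-skewed, heavier tails than normal
```

With this cut-off, the slight negative skew of Column1 is treated as roughly symmetric; a smaller tolerance would label it left-skewed.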
Conclusion
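Use df.kurt(axis=0) to measure the excess kurtosis (tail heaviness) of each column and df.skew(axis=0) to measure its asymmetry. Together these metrics summarize how a column's distribution departs from a normal shape. Keep in mind that with only five values per column, as in this example, the estimates are rough.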
