Python - Find the Summary of Statistics of a Pandas DataFrame
The describe() method in Pandas returns a statistical summary of a DataFrame. By default it covers only the numeric columns and reports the count, mean, standard deviation, minimum, quartiles, and maximum in a single call.
Basic DataFrame Statistics
First, let's create a sample DataFrame and print it:
import pandas as pd
# Create sample data
data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Lamborghini'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad', 'Chandigarh', 'Mumbai', 'Pune', 'Delhi'],
    'UnitsSold': [80, 110, 100, 95, 80, 80, 100, 120, 100]
}
df = pd.DataFrame(data)
print("DataFrame:")
print(df)
DataFrame:
           Car       Place  UnitsSold
0         Audi   Bangalore         80
1      Porsche      Mumbai        110
2   RollsRoyce        Pune        100
3          BMW       Delhi         95
4     Mercedes   Hyderabad         80
5  Lamborghini  Chandigarh         80
6         Audi      Mumbai        100
7     Mercedes        Pune        120
8  Lamborghini       Delhi        100
Getting Statistical Summary
Use describe() to get summary statistics for the numeric columns:
import pandas as pd
data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Lamborghini'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad', 'Chandigarh', 'Mumbai', 'Pune', 'Delhi'],
    'UnitsSold': [80, 110, 100, 95, 80, 80, 100, 120, 100]
}
df = pd.DataFrame(data)
print("Statistical Summary:")
print(df.describe())
Statistical Summary:
        UnitsSold
count    9.000000
mean    96.111111
std     14.092945
min     80.000000
25%     80.000000
50%    100.000000
75%    100.000000
max    120.000000
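Because describe() returns an ordinary DataFrame, individual statistics can be read back by label rather than copied from the printed output. The following is a minimal sketch that reuses the sample data; the variable names summary, mean_units, median_units, and iqr are illustrative and not part of pandas.

```python
import pandas as pd

data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Lamborghini'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad', 'Chandigarh', 'Mumbai', 'Pune', 'Delhi'],
    'UnitsSold': [80, 110, 100, 95, 80, 80, 100, 120, 100]
}
df = pd.DataFrame(data)

# describe() returns a DataFrame indexed by statistic name
summary = df.describe()

# Look up single statistics by row label and column name
mean_units = summary.loc['mean', 'UnitsSold']
median_units = summary.loc['50%', 'UnitsSold']

# Interquartile range, taken from the 25% and 75% rows
iqr = summary.loc['75%', 'UnitsSold'] - summary.loc['25%', 'UnitsSold']

print("Mean:", round(mean_units, 2))
print("Median:", median_units)
print("IQR:", iqr)
```

Reading values by label keeps later calculations in step with the data if the DataFrame changes.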
Understanding the Statistics
The describe() method provides these key statistics:
- count: Number of non-null values
- mean: Average value
- std: Sample standard deviation (divides by n - 1)
- min: Minimum value
- 25%: First quartile (25th percentile)
- 50%: Median (50th percentile)
- 75%: Third quartile (75th percentile)
- max: Maximum value
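Each row of the summary corresponds to a Series method that can also be called on its own. As a rough sketch, the statistics listed above can be rebuilt by hand for the UnitsSold values and compared with what describe() reports; the names units and manual are chosen here for illustration only.

```python
import pandas as pd

units = pd.Series([80, 110, 100, 95, 80, 80, 100, 120, 100], name='UnitsSold')

# Rebuild each row of the summary with the matching Series method
manual = pd.Series({
    'count': units.count(),          # number of non-null values
    'mean': units.mean(),
    'std': units.std(),              # sample standard deviation (ddof=1 by default)
    'min': units.min(),
    '25%': units.quantile(0.25),
    '50%': units.quantile(0.50),     # same value as units.median()
    '75%': units.quantile(0.75),
    'max': units.max(),
})

print(manual)

# Largest absolute difference from describe(); expected to be zero or negligible
max_diff = (manual - units.describe()).abs().max()
print("Largest difference from describe():", max_diff)
```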
Including All Columns
To include non-numeric columns in the summary, use include='all':
import pandas as pd
data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Lamborghini'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad', 'Chandigarh', 'Mumbai', 'Pune', 'Delhi'],
    'UnitsSold': [80, 110, 100, 95, 80, 80, 100, 120, 100]
}
df = pd.DataFrame(data)
print("Complete Summary (All Columns):")
print(df.describe(include='all'))
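Complete Summary (All Columns):
         Car   Place   UnitsSold
count      9       9    9.000000
unique     6       6         NaN
top     Audi  Mumbai         NaN
freq       2       2         NaN
mean     NaN     NaN   96.111111
std      NaN     NaN   14.092945
min      NaN     NaN   80.000000
25%      NaN     NaN   80.000000
50%      NaN     NaN  100.000000
75%      NaN     NaN  100.000000
max      NaN     NaN  120.000000
For text columns, unique is the number of distinct values, top is the most frequent value, and freq is how often it occurs. In this data several values tie for the highest count (Audi, Mercedes, and Lamborghini in Car; Mumbai, Pune, and Delhi in Place), so the value shown as top may differ depending on the pandas version, while freq is 2 in both columns.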
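The unique, top, and freq rows for a text column can be cross-checked with nunique() and value_counts(). The sketch below does this for the Place values from the sample data; the names places, counts, and tied are illustrative.

```python
import pandas as pd

places = pd.Series(
    ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad',
     'Chandigarh', 'Mumbai', 'Pune', 'Delhi'],
    name='Place'
)

# Occurrences of each distinct value, most frequent first
counts = places.value_counts()
print(counts)

unique = places.nunique()      # corresponds to the 'unique' row
freq = counts.iloc[0]          # corresponds to the 'freq' row

# Every value that shares the highest count is a candidate for 'top'
tied = sorted(counts[counts == freq].index.tolist())

print("unique:", unique)
print("freq:", freq)
print("values tied for top:", tied)
```

Listing the tied values makes it clear why top alone should not be read as the single most common entry when several values share the same count.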
Additional DataFrame Information
Combine describe() with shape and head() for a quick overview of the data:
import pandas as pd
data = {
    'Car': ['Audi', 'Porsche', 'RollsRoyce', 'BMW', 'Mercedes', 'Lamborghini', 'Audi', 'Mercedes', 'Lamborghini'],
    'Place': ['Bangalore', 'Mumbai', 'Pune', 'Delhi', 'Hyderabad', 'Chandigarh', 'Mumbai', 'Pune', 'Delhi'],
    'UnitsSold': [80, 110, 100, 95, 80, 80, 100, 120, 100]
}
df = pd.DataFrame(data)
print("DataFrame Shape:", df.shape)
print("\nFirst 5 rows:")
print(df.head())
print("\nStatistical Summary:")
print(df.describe())
DataFrame Shape: (9, 3)

First 5 rows:
          Car      Place  UnitsSold
0        Audi  Bangalore         80
1     Porsche     Mumbai        110
2  RollsRoyce       Pune        100
3         BMW      Delhi         95
4    Mercedes  Hyderabad         80

Statistical Summary:
        UnitsSold
count    9.000000
mean    96.111111
std     14.092945
min     80.000000
25%     80.000000
50%    100.000000
75%    100.000000
max    120.000000
Conclusion
The describe() method summarizes the numeric columns of a DataFrame in one call, reporting the count, mean, standard deviation, minimum, quartiles, and maximum. Use include='all' to summarize text columns as well, and combine it with shape and head() for a quick first look at a dataset.
