How to get the mean of columns that contain numeric values in a pandas DataFrame in Python?
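Sometimes you need the mean of specific columns, or of every numeric column, in a pandas DataFrame. Passing numeric_only=True to mean() restricts the calculation to numeric columns. In pandas 2.0 and later this argument is required when the DataFrame also holds non-numeric columns such as strings: the default is numeric_only=False, and mean() raises a TypeError on such columns. Older releases dropped non-numeric columns silently.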
The mean (also called the arithmetic average) is the sum of all values divided by the number of values in the dataset.
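The definition can be checked by hand against the pandas result. This is a minimal sketch with made-up sample values; the variable names (ages, manual_mean, pandas_mean) are illustrative only.

```python
import pandas as pd

# Sample values chosen for illustration
ages = pd.Series([45, 67, 89, 12, 23])

# Arithmetic mean by definition: sum of values / number of values
manual_mean = ages.sum() / ages.count()

# The same quantity computed by pandas
pandas_mean = ages.mean()

print("Manual mean:", manual_mean)
print("pandas mean:", pandas_mean)
```

Both values should agree, since Series.mean() computes the same sum-over-count quantity.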
Basic Example
Let's create a DataFrame with mixed data types and calculate the mean of its numeric columns:
import pandas as pd
# Create a DataFrame with mixed data types
data = {
'Name': ['Tom', 'Jane', 'Vin', 'Eve', 'Will'],
'Age': [45, 67, 89, 12, 23],
'Salary': [8.79, 23.24, 31.98, 78.56, 90.20]
}
df = pd.DataFrame(data)
print("The DataFrame is:")
print(df)
print("\nThe mean of numeric columns:")
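# numeric_only=True skips the string column 'Name'; without it,
# pandas 2.0+ raises a TypeError on this DataFrame
print(df.mean(numeric_only=True))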
The DataFrame is:
   Name  Age  Salary
0   Tom   45    8.79
1  Jane   67   23.24
2   Vin   89   31.98
3   Eve   12   78.56
4  Will   23   90.20

The mean of numeric columns:
Age       47.200
Salary    46.554
dtype: float64
Getting Mean of Specific Columns
You can calculate the mean of specific numeric columns by selecting them first:
import pandas as pd
data = {
'Name': ['Tom', 'Jane', 'Vin', 'Eve', 'Will'],
'Age': [45, 67, 89, 12, 23],
'Salary': [8.79, 23.24, 31.98, 78.56, 90.20],
'Experience': [5, 10, 15, 2, 8]
}
df = pd.DataFrame(data)
# Mean of a single column
print("Mean age:", df['Age'].mean())
# Mean of multiple specific columns
print("\nMean of Age and Salary:")
print(df[['Age', 'Salary']].mean())
Mean age: 47.2

Mean of Age and Salary:
Age       47.200
Salary    46.554
dtype: float64
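Instead of naming columns by hand, numeric columns can also be picked by dtype before taking the mean. The following is a minimal sketch, assuming sample data like that used in this article; select_dtypes(include='number') is a standard DataFrame method, while the names numeric_df and means are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Jane', 'Vin', 'Eve', 'Will'],
    'Age': [45, 67, 89, 12, 23],
    'Salary': [8.79, 23.24, 31.98, 78.56, 90.20],
})

# Keep only columns with a numeric dtype (int, float, ...)
numeric_df = df.select_dtypes(include='number')

# No string columns remain, so a plain mean() call is safe here
means = numeric_df.mean()

print("Numeric columns:", list(numeric_df.columns))
print(means)
```

This makes the column selection explicit, which can help when the same numeric subset is reused for several statistics.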
Handling Missing Values
By default (skipna=True), the mean() function excludes NaN values from the calculation:
import pandas as pd
import numpy as np
data = {
'A': [1, 2, np.nan, 4, 5],
'B': [10, 20, 30, np.nan, 50],
'C': ['x', 'y', 'z', 'w', 'v']
}
df = pd.DataFrame(data)
print("DataFrame with missing values:")
print(df)
print("\nMean (excludes NaN):")
# numeric_only=True skips the string column 'C'
print(df.mean(numeric_only=True))
DataFrame with missing values:
A B C
0 1.0 10.0 x
1 2.0 20.0 y
2 NaN 30.0 z
3 4.0 NaN w
4 5.0 50.0 v
Mean (excludes NaN):
A 3.0
B 27.5
dtype: float64
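The NaN handling is controlled by the skipna parameter of mean(). As a small sketch with illustrative data (only the numeric columns are kept here), passing skipna=False makes any column that contains a NaN return NaN instead of a mean over the remaining values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [10, 20, 30, np.nan, 50],
})

# Default behaviour: NaN entries are ignored
default_means = df.mean()

# skipna=False: a single NaN makes the column's mean NaN
strict_means = df.mean(skipna=False)

print("skipna=True (default):")
print(default_means)
print("\nskipna=False:")
print(strict_means)
```

The strict variant is useful when a missing value should be treated as a signal that the mean is not trustworthy.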
Key Features
| Feature | Description |
|---|---|
| Numeric Selection | With numeric_only=True, processes only numeric columns (needed for mixed-type DataFrames in pandas 2.0+) |
| Missing Values | Excludes NaN values from the calculation by default (skipna=True) |
| Return Type | Returns a pandas Series with column names as the index |
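Because the result is a Series indexed by column name, individual means can be looked up by label. A brief sketch, assuming illustrative data and variable names (means, age_mean, as_dict):

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [45, 67, 89, 12, 23],
    'Salary': [8.79, 23.24, 31.98, 78.56, 90.20],
})

means = df.mean()

# The result is a Series whose index holds the column names
print(type(means).__name__)
print(list(means.index))

# Look up a single column's mean by its label
age_mean = means['Age']
print("Mean age:", age_mean)

# Convert to a plain dict if a Series is not needed downstream
as_dict = means.to_dict()
print(as_dict)
```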
Conclusion
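The mean() function in pandas calculates the arithmetic mean of each column and excludes NaN values by default. When a DataFrame contains non-numeric columns, pass numeric_only=True (required in pandas 2.0+) or select the numeric columns first. Use column selection to calculate means for specific columns when needed.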
