Python - How to fill NAN values with mean in Pandas?
When working with datasets in Pandas, missing values (NaN) are common. You can fill these NaN values with the mean of the column using the mean() and fillna() methods.
Creating a DataFrame with NaN Values
Let us first import the required libraries and create a sample DataFrame:
import pandas as pd
import numpy as np
# Create DataFrame with NaN values
# (np.nan is the supported spelling; the np.NaN alias was removed in NumPy 2.0)
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
       Car  Units
0      BMW  100.0
1    Lexus  150.0
2    Lexus    NaN
3  Mustang   80.0
4  Bentley    NaN
5  Mustang    NaN
Calculating the Mean
Calculate the mean of the column containing NaN values. By default, the mean() method ignores NaN values:
import pandas as pd
import numpy as np
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
# Calculate mean (ignores NaN values automatically)
meanVal = dataFrame['Units'].mean()
print(f"Mean of Units column: {meanVal}")
Mean of Units column: 110.0
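To make the NaN handling concrete, the following minimal sketch compares the default mean() with a manual calculation over the non-missing values, and with skipna=False. The variable names (units, default_mean, manual_mean, strict_mean) are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Same values as the "Units" column in the example above
units = pd.Series([100, 150, np.nan, 80, np.nan, np.nan], name="Units")

# Default behaviour: NaN entries are skipped, so only 100, 150 and 80 count
default_mean = units.mean()

# Equivalent manual calculation: sum() and count() both ignore NaN
manual_mean = units.sum() / units.count()

# With skipna=False, a single NaN makes the whole result NaN
strict_mean = units.mean(skipna=False)

print(f"Default mean: {default_mean}")
print(f"Manual mean: {manual_mean}")
print(f"Mean with skipna=False: {strict_mean}")
```

The divisor is the number of non-missing values (3), not the length of the column (6), which is why the result is 110.0.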
Filling NaN Values with Mean
Use fillna() to replace all NaN values in the column with the calculated mean:
import pandas as pd
import numpy as np
# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
print("Original DataFrame:")
print(dataFrame)
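# Calculate mean and fill NaN values, assigning the result back to the column
# (fillna(inplace=True) on a selected column is unreliable in recent pandas versions)
meanVal = dataFrame['Units'].mean()
dataFrame['Units'] = dataFrame['Units'].fillna(value=meanVal)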
print(f"\nDataFrame after filling NaN with mean ({meanVal}):")
print(dataFrame)
Original DataFrame:
       Car  Units
0      BMW  100.0
1    Lexus  150.0
2    Lexus    NaN
3  Mustang   80.0
4  Bentley    NaN
5  Mustang    NaN
DataFrame after filling NaN with mean (110.0):
       Car  Units
0      BMW  100.0
1    Lexus  150.0
2    Lexus  110.0
3  Mustang   80.0
4  Bentley  110.0
5  Mustang  110.0
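After filling, it can be useful to confirm that no missing values remain. The sketch below counts NaN entries before and after the fill using isna().sum(). The names df, missing_before and missing_after are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "Lexus", "Lexus", "Mustang", "Bentley", "Mustang"],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan],
})

# Count missing values before the fill
missing_before = int(df["Units"].isna().sum())

# Fill NaN with the column mean and assign the result back
mean_val = df["Units"].mean()
df["Units"] = df["Units"].fillna(mean_val)

# Count missing values after the fill
missing_after = int(df["Units"].isna().sum())

print(f"Missing before: {missing_before}, missing after: {missing_after}")
```

Note that if a column contains only NaN values, its mean is itself NaN, so this check would still report missing values after the fill.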
Alternative Methods
You can also fill NaN values directly without storing the mean in a variable:
import pandas as pd
import numpy as np
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
# Direct method - fill NaN with mean in one line
dataFrame['Units'] = dataFrame['Units'].fillna(dataFrame['Units'].mean())
print(dataFrame)
       Car  Units
0      BMW  100.0
1    Lexus  150.0
2    Lexus  110.0
3  Mustang   80.0
4  Bentley  110.0
5  Mustang  110.0
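When several numeric columns contain NaN values, one possible variation is to pass the per-column means to DataFrame.fillna() in a single call. This is a sketch under assumptions: the "Price" column and its values are hypothetical, added only to show a second numeric column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "Lexus", "Lexus", "Mustang", "Bentley", "Mustang"],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan],
    # Hypothetical second numeric column, for illustration only
    "Price": [50.0, np.nan, 70.0, 60.0, np.nan, 60.0],
})

# Per-column means of the numeric columns (NaN values are skipped)
column_means = df.mean(numeric_only=True)

# fillna() accepts a Series keyed by column name, so each column
# is filled with its own mean; non-numeric columns are left untouched
filled = df.fillna(column_means)

print(column_means)
print(filled)
```

Here numeric_only=True keeps the string column "Car" out of the mean calculation, and fillna() returns a new DataFrame rather than modifying df.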
Conclusion
Use dataFrame['column'].fillna(dataFrame['column'].mean()) and assign the result back to the column to replace NaN values with the column mean. Because mean() excludes NaN values by default, the fill value is computed only from the observed data.
