Python Pandas - Filling missing column values with median
The median is a statistical measure that separates the higher half from the lower half of a dataset. In Pandas, you can fill missing values (NaN) in a DataFrame column with the median using the fillna() method combined with median().
Importing Required Libraries
First, import Pandas and NumPy with their standard aliases:
import pandas as pd
import numpy as np
Creating DataFrame with Missing Values
Create a DataFrame containing NaN values using np.nan:
import pandas as pd
import numpy as np
# Create DataFrame with missing values
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
       Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi    NaN
3  Bentley   80.0
4  Mustang    NaN
5    Tesla    NaN
Filling Missing Values with Median
Calculate the median of the Units column and fill all NaN values with this median:
import pandas as pd
import numpy as np
# Create DataFrame with missing values
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
# Calculate median of Units column (ignoring NaN values)
median_value = dataFrame['Units'].median()
print(f"Median of Units column: {median_value}")
# Fill NaN values with the median (a scalar passed to fillna() applies to every column)
dataFrame.fillna(median_value, inplace=True)
print("\nDataFrame after filling NaN with median:")
print(dataFrame)
Median of Units column: 100.0

DataFrame after filling NaN with median:
       Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi  100.0
3  Bentley   80.0
4  Mustang  100.0
5    Tesla  100.0
How It Works
The median() method skips NaN values by default (skipna=True). The non-missing values in the Units column are 100, 150, and 80; sorted, they are 80, 100, 150, so the median is 100. The fillna() method then replaces every NaN with this value.
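The NaN-skipping behaviour described above can be checked with a minimal sketch. The variable names here (units, default_median, manual_median, strict_median) are illustrative and do not come from the original example; skipna is a standard parameter of Series.median().

```python
import numpy as np
import pandas as pd

# Same values as the Units column in the example
units = pd.Series([100, 150, np.nan, 80, np.nan, np.nan], name="Units")

# Default behaviour: NaN entries are skipped (skipna=True)
default_median = units.median()

# Equivalent computation after dropping the missing values explicitly
manual_median = units.dropna().median()

# With skipna=False, the NaN entries are not skipped and the result is NaN
strict_median = units.median(skipna=False)

print(default_median)  # 100.0
print(manual_median)   # 100.0
print(strict_median)   # nan
```

Because the default already skips NaN, there is no need to call dropna() before median() when computing the fill value.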
Alternative Approach
Calling fillna() on the whole DataFrame fills NaN values in every column. To fill missing values in a specific column only, call it on that column:
import pandas as pd
import numpy as np
# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
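# Fill only the Units column with its median and assign the result back
# (inplace=True on a selected column is unreliable in recent Pandas versions)
dataFrame['Units'] = dataFrame['Units'].fillna(dataFrame['Units'].median())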
print("DataFrame with Units column filled:")
print(dataFrame)
DataFrame with Units column filled:
       Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi  100.0
3  Bentley   80.0
4  Mustang  100.0
5    Tesla  100.0
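When a DataFrame has more than one numeric column, each column can be filled with its own median in a single call. The following is a minimal sketch under stated assumptions: the Price column and its values are hypothetical and added only for illustration, and the names df, medians, and filled are not from the original example.

```python
import numpy as np
import pandas as pd

# "Price" is a hypothetical second numeric column added for illustration
df = pd.DataFrame({
    "Car": ["Lexus", "BMW", "Audi", "Bentley"],
    "Units": [100, 150, np.nan, 80],
    "Price": [40.0, np.nan, 60.0, 20.0],
})

# Per-column medians, computed for numeric columns only
medians = df.median(numeric_only=True)

# Passing a Series to fillna() fills each column with the value
# stored under the matching column label; df itself is not modified
filled = df.fillna(medians)

print(filled)
```

Here the missing Units value becomes 100.0 and the missing Price value becomes 40.0, while the non-numeric Car column is left untouched because medians has no entry for it.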
Conclusion
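Use fillna() with median() to replace missing values with the median of the column. The inplace=True parameter modifies the original DataFrame directly when fillna() is called on the DataFrame itself; when filling a single column, assign the result back to that column instead.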
