Python Pandas - Filling missing column values with median

The median is a statistical measure that separates the higher half from the lower half of a dataset. In Pandas, you can fill missing values (NaN) in a DataFrame column with the median using the fillna() method combined with median().

Importing Required Libraries

First, import Pandas and NumPy with their standard aliases:

import pandas as pd
import numpy as np

Creating DataFrame with Missing Values

Create a DataFrame containing NaN values using np.nan:

import pandas as pd
import numpy as np

# Create DataFrame with missing values
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})

print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
       Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi    NaN
3  Bentley   80.0
4  Mustang    NaN
5    Tesla    NaN

Filling Missing Values with Median

Calculate the median of the Units column and fill all NaN values with this median:

import pandas as pd
import numpy as np

# Create DataFrame with missing values
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})

# Calculate median of Units column (ignoring NaN values)
median_value = dataFrame['Units'].median()
print(f"Median of Units column: {median_value}")

# Fill NaN values with the median computed above
# (called on the whole DataFrame, this fills NaN in every column)
dataFrame.fillna(median_value, inplace=True)

print("\nDataFrame after filling NaN with median:")
print(dataFrame)
Median of Units column: 100.0

DataFrame after filling NaN with median:
       Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi  100.0
3  Bentley   80.0
4  Mustang  100.0
5    Tesla  100.0
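
Because fillna() is called on the whole DataFrame here, the scalar is written into every column that has a NaN, not only Units. The sketch below illustrates that behaviour and one way to restrict the fill; the Price column is a hypothetical addition for illustration and is not part of the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Price" is an assumed second numeric column
df = pd.DataFrame({
    "Units": [100, 150, np.nan, 80],
    "Price": [40.0, np.nan, 35.0, 90.0],
})

units_median = df["Units"].median()  # median of 80, 100, 150

# A scalar passed to DataFrame.fillna() is applied to every column,
# so the missing Price is also replaced by the Units median
whole_frame = df.fillna(units_median)

# A dict restricts the fill to the named column; Price keeps its NaN
units_only = df.fillna({"Units": units_median})

print(whole_frame)
print(units_only)
```

Passing a dict keeps the fill explicit about which column receives which value, which may be safer when a DataFrame has several columns with missing data.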

How It Works

The median() method skips NaN values by default (skipna=True). The remaining Units values are 100, 150, and 80; sorted, they are 80, 100, 150, so the median is the middle value, 100. The fillna() method then replaces every NaN with this value.
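
The NaN-skipping behaviour can be checked directly on a Series. This is a minimal sketch using the same Units values; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

units = pd.Series([100, 150, np.nan, 80, np.nan, np.nan], name="Units")

# Default (skipna=True): NaN entries are ignored, leaving 80, 100, 150
median_skipping_nan = units.median()

# With skipna=False, any NaN in the data makes the result NaN
median_with_nan = units.median(skipna=False)

print(median_skipping_nan)  # 100.0
print(median_with_nan)      # nan

# fillna() returns a new Series with each NaN replaced by the median
filled = units.fillna(median_skipping_nan)
print(filled.tolist())      # [100.0, 150.0, 100.0, 80.0, 100.0, 100.0]
```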

Alternative Approach

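To fill missing values in one specific column only, call fillna() on that column and assign the result back:

import pandas as pd
import numpy as np

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan]
})

# Fill only the Units column with its median.
# Assigning the result back avoids chained inplace modification, which
# triggers a warning in pandas 2.x and has no effect under Copy-on-Write.
dataFrame['Units'] = dataFrame['Units'].fillna(dataFrame['Units'].median())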

print("DataFrame with Units column filled:")
print(dataFrame)
DataFrame with Units column filled:
       Car  Units
0    Lexus  100.0
1      BMW  150.0
2     Audi  100.0
3  Bentley   80.0
4  Mustang  100.0
5    Tesla  100.0
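
When several numeric columns have gaps, each one can be filled with its own median in a single call. The following is a sketch under assumptions: the Price column and the shortened row set are hypothetical and added only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Price" is an assumed extra column
df = pd.DataFrame({
    "Car": ["Lexus", "BMW", "Audi", "Bentley"],
    "Units": [100, 150, np.nan, 80],
    "Price": [40.0, np.nan, 35.0, 90.0],
})

# One median per numeric column, computed with NaN skipped
medians = df.median(numeric_only=True)

# Given a Series, fillna() matches its labels to column names,
# so each column is filled with its own median
filled = df.fillna(medians)
print(filled)
```

Non-numeric columns such as Car have no entry in medians, so they are left unchanged.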

Conclusion

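Use fillna() with median() to replace missing values with the median of the column. The inplace=True parameter modifies the DataFrame directly when fillna() is called on the DataFrame itself. When filling a single selected column, assign the result back to that column, because an inplace call on a selected column may not update the original DataFrame in recent pandas versions.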

Updated on: 2026-03-26T03:01:43+05:30
