Python Pandas - Filling missing column values with median


The median is the middle value of a sorted dataset: it separates the higher half from the lower half. To fill the missing values in a column with that column's median, compute the median with median() and pass it to fillna(). First, import the required libraries with their usual aliases −

import pandas as pd
import numpy as np

Create a DataFrame with 2 columns. The missing values are set with NumPy's np.nan (the older alias np.NaN was removed in NumPy 2.0) −

dataFrame = pd.DataFrame(
   {
      "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
      "Units": [100, 150, np.nan, 80, np.nan, np.nan]
   }
)

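Find the median of the column that contains NaN values, here the Units column, using median(). By default, median() skips NaN values. Then replace the NaNs in that column with the median. Passing a dictionary keyed by column name to fillna() restricts the fill to the Units column, so other columns are left untouched −

dataFrame.fillna({'Units': dataFrame['Units'].median()}, inplace = True)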
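As a quick check of the step above, the following minimal sketch works on the Units values alone. The variable names units, median_units and filled are chosen here for illustration and do not appear in the original example. It shows that median() ignores the NaN entries, so the median is taken over 80, 100 and 150 only:

```python
import numpy as np
import pandas as pd

# Same Units values as in the DataFrame above
units = pd.Series([100, 150, np.nan, 80, np.nan, np.nan], name="Units")

# median() skips NaN by default (skipna=True),
# so only 80, 100 and 150 are considered
median_units = units.median()
print(median_units)  # 100.0

# fillna() returns a new Series; the original is left unchanged
filled = units.fillna(median_units)
print(filled.tolist())  # [100.0, 150.0, 100.0, 80.0, 100.0, 100.0]
```

Because fillna() is called without inplace here, units still contains its NaN values afterwards. Assign the result back if the filled values should replace the original ones.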

Example

Following is the code −

import pandas as pd
import numpy as np

# Create DataFrame
dataFrame = pd.DataFrame(
   {
      "Car": ['Lexus', 'BMW', 'Audi', 'Bentley', 'Mustang', 'Tesla'],
      "Units": [100, 150, np.nan, 80, np.nan, np.nan]
   }
)

print("DataFrame ...")
print(dataFrame)

# Find the median of the Units column (NaN values are skipped)
# Replace the NaNs in the Units column with that median
dataFrame.fillna({'Units': dataFrame['Units'].median()}, inplace = True)

print("\nUpdated Dataframe after filling NaN values with median...")
print(dataFrame)

Output

This will produce the following output −

DataFrame ...
       Car   Units
0    Lexus   100.0
1      BMW   150.0
2     Audi     NaN
3  Bentley    80.0
4  Mustang     NaN
5    Tesla     NaN

Updated Dataframe after filling NaN values with median...
       Car   Units
0    Lexus   100.0
1      BMW   150.0
2     Audi   100.0
3  Bentley    80.0
4  Mustang   100.0
5    Tesla   100.0
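When more than one numeric column has missing values, the same idea can be applied to each column with its own median. The sketch below is an illustration under assumptions rather than part of the original example: the Price column and its values are hypothetical, and the names df, medians and filled are chosen here. It relies on fillna() accepting a Series whose labels are matched against the column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Lexus", "BMW", "Audi", "Bentley"],
        "Units": [100, 150, np.nan, 80],
        "Price": [np.nan, 40.0, 30.0, 50.0],  # hypothetical column for illustration
    }
)

# Median of every numeric column; the non-numeric Car column is skipped
medians = df.median(numeric_only=True)

# Each column's NaNs are filled with that column's own median
filled = df.fillna(medians)
print(filled)
```

Here the missing Units value is filled with the Units median and the missing Price value with the Price median, while the Car column is not modified.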

Updated on: 21-Sep-2021
