Python Pandas - Filling missing column values with mode
Mode is the value that appears most frequently in a dataset. In Pandas, you can fill missing values with the mode using the fillna() method combined with mode(). This is useful when you want to replace NaN values with the most common value in a column.
Syntax
dataframe.fillna(dataframe['column'].mode()[0], inplace=True)  # fills NaN in every column with this one value
Creating DataFrame with Missing Values
Let's start by importing the required libraries and creating a DataFrame with some missing values:
import pandas as pd
import numpy as np
# Create DataFrame with NaN values
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
"Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
print("Original DataFrame:")
print(dataFrame)
Original DataFrame:
Car Units
0 BMW 100.0
1 Lexus 150.0
2 Lexus NaN
3 Mustang 80.0
4 Bentley NaN
5 Mustang NaN
Finding the Mode
First, let's find the mode of the Units column to understand what value will be used for filling:
import pandas as pd
import numpy as np
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
"Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
# Find mode of Units column
mode_value = dataFrame['Units'].mode()[0]
print("Mode of Units column:", mode_value)
print("Value counts:")
print(dataFrame['Units'].value_counts())
Mode of Units column: 80.0
Value counts:
Units
150.0    1
100.0    1
80.0     1
Name: count, dtype: int64
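Every non-missing value in Units occurs exactly once, so the result above is a tie rather than a true "most frequent" value. The minimal sketch below rebuilds the same column as a standalone Series (the names units, modes and fill_value are illustrative, not from the article) to show that mode() returns every tied value in ascending order, which is why [0] gives 80.0 here:

```python
import numpy as np
import pandas as pd

# Same values as the Units column above
units = pd.Series([100, 150, np.nan, 80, np.nan, np.nan], name="Units")

# mode() ignores NaN by default and returns all tied values, sorted ascending
modes = units.mode()
print(modes.tolist())  # [80.0, 100.0, 150.0]

# Taking the first entry therefore picks the smallest tied value
fill_value = modes.iloc[0]
print(fill_value)  # 80.0
```

If the column had a genuine most frequent value, mode() would return a Series with that single entry.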
Filling Missing Values with Mode
Now let's fill the NaN values with the mode using fillna():
import pandas as pd
import numpy as np
# Create DataFrame
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Lexus', 'Lexus', 'Mustang', 'Bentley', 'Mustang'],
"Units": [100, 150, np.nan, 80, np.nan, np.nan]
})
print("DataFrame before filling:")
print(dataFrame)
# Fill every NaN in the DataFrame with the mode of the Units column
# (only Units has missing values here)
dataFrame.fillna(dataFrame['Units'].mode()[0], inplace=True)
print("\nDataFrame after filling NaN values with mode:")
print(dataFrame)
DataFrame before filling:
Car Units
0 BMW 100.0
1 Lexus 150.0
2 Lexus NaN
3 Mustang 80.0
4 Bentley NaN
5 Mustang NaN
DataFrame after filling NaN values with mode:
Car Units
0 BMW 100.0
1 Lexus 150.0
2 Lexus 80.0
3 Mustang 80.0
4 Bentley 80.0
5 Mustang 80.0
Filling Specific Columns
You can also fill missing values for specific columns only:
import pandas as pd
import numpy as np
# Create DataFrame with multiple columns having NaN
dataFrame = pd.DataFrame({
"Car": ['BMW', 'Lexus', np.nan, 'Mustang', 'Bentley', 'Mustang'],
"Units": [100, 150, np.nan, 80, np.nan, np.nan],
"Price": [50000, 60000, np.nan, 45000, np.nan, 45000]
})
print("Original DataFrame:")
print(dataFrame)
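# Fill only Units column with its mode; assign the result back, because
# calling fillna(inplace=True) on a selected column may not update the DataFrame
dataFrame['Units'] = dataFrame['Units'].fillna(dataFrame['Units'].mode()[0])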
print("\nAfter filling Units column with mode:")
print(dataFrame)
Original DataFrame:
Car Units Price
0 BMW 100.0 50000.0
1 Lexus 150.0 60000.0
2 NaN NaN NaN
3 Mustang 80.0 45000.0
4 Bentley NaN NaN
5 Mustang NaN 45000.0
After filling Units column with mode:
Car Units Price
0 BMW 100.0 50000.0
1 Lexus 150.0 60000.0
2 NaN 80.0 NaN
3 Mustang 80.0 45000.0
4 Bentley 80.0 NaN
5 Mustang 80.0 45000.0
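In the output above, Car and Price still contain NaN because only Units was filled. A possible extension, sketched here under the assumption that every column should receive its own mode, is to pass the first row of DataFrame.mode() to fillna(). The names df, column_modes and filled are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "Lexus", np.nan, "Mustang", "Bentley", "Mustang"],
    "Units": [100, 150, np.nan, 80, np.nan, np.nan],
    "Price": [50000, 60000, np.nan, 45000, np.nan, 45000],
})

# DataFrame.mode() returns one row per tied mode; the first row holds
# the first (smallest) mode of every column
column_modes = df.mode().iloc[0]

# fillna() with a Series matches the Series labels to column names,
# so each column is filled with its own mode
filled = df.fillna(column_modes)
print(filled)
```

This returns a new DataFrame and leaves df unchanged, which makes it easy to compare the data before and after filling.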
Key Points
- The mode() method returns a Series, so use [0] to get the first mode value
- If all values are unique or several values tie, mode() returns them in ascending order, so [0] picks the smallest one (numerically or alphabetically)
- Use inplace=True to modify the original DataFrame; to fill a single column, assign the result of fillna() back to that column
- Mode filling is useful for categorical and discrete numerical data
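One case the points above do not cover is a column with no non-missing values at all: mode() then returns an empty Series, and indexing it with [0] raises an error. The sketch below shows one way to guard against that. fill_with_mode is a hypothetical helper written for this illustration, not a pandas function:

```python
import numpy as np
import pandas as pd

def fill_with_mode(column: pd.Series) -> pd.Series:
    """Return a copy of `column` with NaN replaced by its first mode.

    Hypothetical helper for illustration; not part of pandas.
    """
    modes = column.mode()  # NaN is ignored by default
    if modes.empty:
        # No non-missing values, so there is nothing to fill with
        return column.copy()
    return column.fillna(modes.iloc[0])

cars = pd.Series(["BMW", "Lexus", np.nan, "Mustang", "Bentley", "Mustang"])
all_missing = pd.Series([np.nan, np.nan], dtype="float64")

print(fill_with_mode(cars).tolist())
print(fill_with_mode(all_missing).isna().all())
```

The helper works on categorical data such as car names as well as on numeric columns, and it leaves an all-NaN column untouched instead of failing.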
Conclusion
Filling missing values with mode is an effective technique for handling NaN values in datasets. Use fillna() with mode()[0] to replace missing values with the most frequent value in the column.
