Fillna in Multiple Columns in Place in Python Pandas
Python's Pandas library provides powerful tools for handling missing data in DataFrames. The fillna() method is specifically designed to fill NaN (Not a Number) or null values with specified replacement values or strategies.
Syntax
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Key Parameters
- value: Scalar value, dictionary, Series, or DataFrame to use for filling. A dictionary maps column names to per-column fill values
- inplace: If True, modifies the original DataFrame instead of returning a copy
- method: Fill strategy ('ffill' or 'bfill'). Deprecated since pandas 2.1; use DataFrame.ffill() or DataFrame.bfill() instead
- axis: Axis along which to fill missing values (0 or 'index', 1 or 'columns')
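To make the effect of inplace concrete, here is a minimal sketch. The DataFrame `scores` and its column names are hypothetical and used only for illustration. It compares the default call, which leaves the original untouched, with the in-place call, which changes it.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame used only for illustration
scores = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

# Default (inplace=False): a new DataFrame is returned,
# and the original still holds its NaN values
filled = scores.fillna(0)
nans_before = int(scores.isna().sum().sum())

# inplace=True: the original DataFrame itself is modified
scores.fillna(0, inplace=True)
nans_after = int(scores.isna().sum().sum())

print("NaNs in original after default call:", nans_before)
print("NaNs in original after in-place call:", nans_after)
```

With inplace=True the useful result is the modified DataFrame itself, so the sketch does not rely on the value returned by the call.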
Filling Multiple Columns with Different Values
You can fill different columns with specific values by passing a dictionary that maps column names to fill values. Columns not named in the dictionary are left unchanged:
import pandas as pd
import numpy as np

# Create a sample DataFrame with missing values
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', np.nan, 'David'],
    'Age': [25, np.nan, 30, 22],
    'Salary': [50000, 60000, np.nan, np.nan],
    'Department': ['HR', np.nan, 'IT', 'Finance']
})
print("Original DataFrame:")
print(df)
print("\n" + "="*40 + "\n")

# Fill multiple columns with different values in place
df.fillna(value={
    'Name': 'Unknown',
    'Age': df['Age'].mean(),
    'Salary': df['Salary'].median(),
    'Department': 'Not Specified'
}, inplace=True)
print("After filling missing values:")
print(df)
Original DataFrame:
    Name   Age   Salary Department
0  Alice  25.0  50000.0         HR
1    Bob   NaN  60000.0        NaN
2    NaN  30.0      NaN         IT
3  David  22.0      NaN    Finance

========================================

After filling missing values:
      Name        Age   Salary     Department
0    Alice  25.000000  50000.0             HR
1      Bob  25.666667  60000.0  Not Specified
2  Unknown  30.000000  55000.0             IT
3    David  22.000000  55000.0        Finance
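When only some columns should be filled, a common pitfall is calling fillna(..., inplace=True) on a column selection such as df[cols]. That selection is a temporary copy, so the original DataFrame may not change. The sketch below shows two alternatives that do work. The `inventory` data and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical inventory data; column names are illustrative only
inventory = pd.DataFrame({
    "item": ["pen", "ink", "pad"],
    "stock": [10.0, np.nan, 3.0],
    "price": [np.nan, 4.5, np.nan],
    "note": [None, "fragile", None],
})

# Option 1: a dictionary only touches the columns it names
by_dict = inventory.fillna({"stock": 0, "price": 0})

# Option 2: select the columns, fill them, and assign the result back
cols = ["stock", "price"]
by_assign = inventory.copy()
by_assign[cols] = by_assign[cols].fillna(0)

# Both approaches give the same frame, and "note" keeps its missing values
print(by_dict.equals(by_assign))
print(by_assign["note"].isna().sum())
```

The dictionary form is usually the shorter choice when each column needs its own value. The assign-back form is convenient when one value applies to a whole list of columns.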
Filling with Statistical Measures
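You can fill missing values in numerical columns with a statistic such as the mean or median (the mode is the usual choice for categorical columns). The example below fills each numerical column with its own mean: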
import pandas as pd
import numpy as np

# Create sample student data
students = pd.DataFrame({
    'StudentID': [1, 2, 3, 4, 5],
    'Math': [85, np.nan, 92, 78, np.nan],
    'Science': [90, 88, np.nan, 85, 92],
    'English': [np.nan, 82, 88, np.nan, 90]
})
print("Original DataFrame:")
print(students)
print()

# Fill missing values with column means
students_filled = students.copy()
students_filled.fillna(students.mean(numeric_only=True), inplace=True)
print("Filled with column means:")
print(students_filled)
Original DataFrame:
   StudentID  Math  Science  English
0          1  85.0     90.0      NaN
1          2   NaN     88.0     82.0
2          3  92.0      NaN     88.0
3          4  78.0     85.0      NaN
4          5   NaN     92.0     90.0

Filled with column means:
   StudentID  Math  Science    English
0          1  85.0    90.00  86.666667
1          2  85.0    88.00  82.000000
2          3  92.0    88.75  88.000000
3          4  78.0    85.00  86.666667
4          5  85.0    92.00  90.000000
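The example above covers the mean only. As a hedged sketch of the other two statistics, the snippet below fills a numerical column with its median and a text column with its mode. The `survey` data, including the deliberate outlier, is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical survey data; names and values are illustrative only
survey = pd.DataFrame({
    "income": [30.0, 32.0, np.nan, 31.0, 200.0],
    "city": ["Pune", np.nan, "Pune", "Delhi", np.nan],
})

# The median is less sensitive than the mean to the 200.0 outlier
income_median = survey["income"].median()

# mode() returns a Series because ties are possible; take the first entry
city_mode = survey["city"].mode().iloc[0]

# Fill each column with its own statistic in a single call
survey_filled = survey.fillna({"income": income_median, "city": city_mode})
print(survey_filled)
```

Computing the statistics before the fillna() call keeps them based on the observed values only, since median() and mode() skip missing entries by default.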
Using Forward Fill and Backward Fill
For time-series data, you can forward fill (ffill) or backward fill (bfill) missing values, which propagates the last or the next valid observation into each gap:
import pandas as pd
import numpy as np

# Create time series data
dates = pd.date_range('2024-01-01', periods=6, freq='D')
ts_data = pd.DataFrame({
    'Date': dates,
    'Temperature': [20, np.nan, np.nan, 25, np.nan, 28],
    'Humidity': [60, 65, np.nan, np.nan, 70, np.nan]
})
print("Original time series data:")
print(ts_data)
print()

# Forward fill: DataFrame.ffill() replaces the deprecated fillna(method='ffill')
ts_ffill = ts_data.copy()
ts_ffill.ffill(inplace=True)
print("After forward fill:")
print(ts_ffill)
Original time series data:
        Date  Temperature  Humidity
0 2024-01-01         20.0      60.0
1 2024-01-02          NaN      65.0
2 2024-01-03          NaN       NaN
3 2024-01-04         25.0       NaN
4 2024-01-05          NaN      70.0
5 2024-01-06         28.0       NaN

After forward fill:
        Date  Temperature  Humidity
0 2024-01-01         20.0      60.0
1 2024-01-02         20.0      65.0
2 2024-01-03         20.0      65.0
3 2024-01-04         25.0      65.0
4 2024-01-05         25.0      70.0
5 2024-01-06         28.0      70.0
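Backward fill works the same way in the opposite direction. The sketch below reuses the temperature and humidity values from this section in a hypothetical frame named `readings`. It also shows the limit parameter, which caps how many consecutive missing values are filled in each gap.

```python
import numpy as np
import pandas as pd

# Same values as the time series example; the name "readings" is illustrative
readings = pd.DataFrame({
    "Temperature": [20.0, np.nan, np.nan, 25.0, np.nan, 28.0],
    "Humidity": [60.0, 65.0, np.nan, np.nan, 70.0, np.nan],
})

# Backward fill: each NaN takes the next valid value that follows it.
# A trailing NaN stays missing because nothing comes after it.
back = readings.bfill()

# Forward fill with limit=1: at most one consecutive NaN per gap is filled
capped = readings.ffill(limit=1)

print("Backward fill:")
print(back)
print()
print("Forward fill, limit=1:")
print(capped)
```

A limit is useful when long gaps should stay visibly missing instead of being covered by a stale value.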
Comparison of Fill Methods
| Method | Use Case | Advantage |
|---|---|---|
| Specific Values | Different defaults per column | Complete control over fill values |
| Mean/Median | Numerical data | Keeps the column's mean or median unchanged (but reduces its variance) |
| Forward Fill | Time series data | Maintains temporal continuity |
| Backward Fill | Time series data | Uses future known values |
Conclusion
Calling fillna() with inplace=True fills missing data across multiple columns of a Pandas DataFrame by modifying it directly, so no reassignment is needed. Use a dictionary for column-specific values, statistical measures such as the mean or median for numerical data, and forward or backward fill (ffill() or bfill()) for time series data.
