Python Replace NaN values with average of columns
In this article, we will explore methods to replace NaN (Not a Number) values with the average of columns. When working with data analysis, handling NaN values is a crucial step. Here you will learn various approaches to replace NaN values with column averages using NumPy.
Using numpy.nanmean() and numpy.where()
The most straightforward approach uses nanmean() to calculate column averages and where() to replace NaN values:
import numpy as np
arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
arr_filled = np.where(np.isnan(arr), col_means, arr)
print("Column means:", col_means)
print("Final array:\n", arr_filled)
Output:
Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
Here, np.nanmean(arr, axis=0) calculates the mean for each column ignoring NaN values. The np.where() function replaces NaN values with corresponding column means.
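To see why np.where() can pair a 1-D array of means with a 2-D array, it may help to look at the shapes involved. The sketch below reuses the array from the example above and makes the broadcasting step explicit with np.broadcast_to(); the names mask and expanded are illustrative only.

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)   # shape (3,): one mean per column
mask = np.isnan(arr)                  # shape (3, 3): True where a value is missing

# np.where() broadcasts col_means across the rows, so every row sees the
# same per-column means. broadcast_to() shows that step explicitly.
expanded = np.broadcast_to(col_means, arr.shape)

arr_filled = np.where(mask, expanded, arr)
print(col_means.shape, mask.shape, expanded.shape)
print(arr_filled)
```

Passing col_means directly, as in the original example, gives the same result; the explicit broadcast is only there to show what happens behind the scenes.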
Using Loop Traversal
This method processes each column individually using a loop:
import numpy as np
arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])
for i in range(arr.shape[1]):
    column = arr[:, i]                       # view of column i, not a copy
    column_mean = np.nanmean(column)         # mean of the non-NaN entries
    column[np.isnan(column)] = column_mean   # writes through to arr
print("Final array:\n", arr)
Output:
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
This approach iterates through each column, calculates its mean, and replaces NaN values in-place.
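The in-place behavior relies on the fact that basic slicing such as arr[:, i] returns a view rather than a copy. A minimal check of that assumption, using np.shares_memory() on the first column only (the name shares is illustrative):

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

column = arr[:, 0]                       # basic slicing returns a view
shares = np.shares_memory(column, arr)   # True when both use the same buffer

# Assigning through the view changes the parent array as well.
column[np.isnan(column)] = np.nanmean(column)
print(shares)
print(arr[:, 0])
```

If the original data must be preserved, running the loop on arr.copy() instead keeps arr unchanged.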
Using numpy.nan_to_num()
The nan_to_num() function replaces every NaN with a single fill value per call, so to fill with column means it is applied one column at a time:
import numpy as np
arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
# Work on a copy so the original array is left untouched
arr_filled = arr.copy()
for j in range(arr.shape[1]):
    # Fill column j with its own mean
    arr_filled[:, j] = np.nan_to_num(arr[:, j], nan=col_means[j])
print("Column means:", col_means)
print("Final array:\n", arr_filled)
Output:
Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
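For contrast, the sketch below shows what nan_to_num() does with a single scalar fill value, and why a constant such as 0 is usually a poor stand-in for the column mean. It assumes NumPy 1.17 or later, where the nan keyword is available; zero_filled is an illustrative name.

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

# A scalar fill value is applied to every NaN, regardless of column.
zero_filled = np.nan_to_num(arr, nan=0.0)

# Filling with zeros pulls the column averages down compared with nanmean().
print(zero_filled)
print(zero_filled.mean(axis=0))
print(np.nanmean(arr, axis=0))
```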
Using Broadcasting for In-Place Replacement
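A boolean mask selects the NaN positions, and np.take() picks the matching column mean for each one, so the array is updated in place without allocating a second float array:
import numpy as np
arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])
col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
# np.where(mask)[1] holds the column index of every NaN
arr[mask] = np.take(col_means, np.where(mask)[1])
print("Column means:", col_means)
print("Final array:\n", arr)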
Output:
Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
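The key step is np.where(mask)[1], which returns the column index of each NaN in row-major order. The following sketch unpacks that expression; rows, cols, and fill_values are illustrative names, and plain indexing with col_means[cols] is shown as an equivalent to np.take().

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)

rows, cols = np.where(mask)      # coordinates of each NaN, in row-major order
fill_values = col_means[cols]    # same values as np.take(col_means, cols)

print(rows, cols)
print(fill_values)
```

Because arr[mask] also visits the masked positions in row-major order, each fill value lands in the cell it was computed for.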
Comparison of Methods
| Method | In-Place | Performance | Best For |
|---|---|---|---|
| np.where() | No | Good | Creating new arrays |
| Loop traversal | Yes | Slower | Learning/debugging |
| nan_to_num() | No | Good | Simple replacements |
| Broadcasting | Yes | Fast | Memory efficiency |
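As a quick sanity check on the table, the sketch below wraps three of the approaches in small helper functions and confirms that they produce the same filled array. The function names fill_where, fill_loop, and fill_indexing are hypothetical, introduced here only for illustration; each helper receives its own copy so the in-place variants do not affect the original.

```python
import numpy as np

def fill_where(a):
    # Returns a new array; the input is left untouched.
    return np.where(np.isnan(a), np.nanmean(a, axis=0), a)

def fill_loop(a):
    # Modifies the input in place, one column at a time.
    for i in range(a.shape[1]):
        column = a[:, i]
        column[np.isnan(column)] = np.nanmean(column)
    return a

def fill_indexing(a):
    # Modifies the input in place using a boolean mask.
    mask = np.isnan(a)
    a[mask] = np.take(np.nanmean(a, axis=0), np.where(mask)[1])
    return a

original = np.array([[1, 2, np.nan],
                     [4, np.nan, 6],
                     [np.nan, 8, 9]])

results = [f(original.copy()) for f in (fill_where, fill_loop, fill_indexing)]
all_equal = all(np.allclose(results[0], r) for r in results[1:])
print(all_equal)
```

The same helpers could be passed to timeit on a larger array to check the performance column on a given machine; no timings are claimed here.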
Conclusion
Use np.where() with nanmean() for clean, readable code. For memory efficiency, use broadcasting with boolean indexing. Choose the method that best fits your performance and memory requirements.
