Python: Replace NaN Values with the Average of Columns

In this article, we will explore methods to replace NaN (Not a Number) values with the average of their columns. Handling NaN values is a crucial step in data analysis. Here you will learn several approaches to replacing NaN values with column averages using NumPy.

Using numpy.nanmean() and numpy.where()

The most straightforward approach uses nanmean() to calculate column averages and where() to replace NaN values:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
arr_filled = np.where(np.isnan(arr), col_means, arr)

print("Column means:", col_means)
print("Final array:\n", arr_filled)

Output:

Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]

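Here, np.nanmean(arr, axis=0) computes the mean of each column while ignoring NaN values. np.where() then returns a new array in which every NaN is replaced by the mean of its column: col_means has shape (3,), so it broadcasts across the rows of arr and each position lines up with its own column's mean. The original arr is left unchanged.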
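
One case the example above does not cover is a column that contains only NaN values: np.nanmean() then returns NaN for that column (with a RuntimeWarning), so the NaN values would stay in place. The following is a minimal sketch of one way to handle this. The helper name fill_nan_with_column_mean and its fallback parameter are hypothetical and not part of NumPy.

```python
import warnings

import numpy as np


def fill_nan_with_column_mean(arr, fallback=0.0):
    """Return a new array with NaN replaced by column means.

    Hypothetical helper: columns that are entirely NaN have no mean,
    so they are filled with `fallback` instead (an assumption made
    for this sketch).
    """
    arr = np.asarray(arr, dtype=float)

    # nanmean() warns ("Mean of empty slice") for all-NaN columns
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_means = np.nanmean(arr, axis=0)

    # Swap the NaN means of all-NaN columns for the fallback value
    col_means = np.where(np.isnan(col_means), fallback, col_means)

    # col_means broadcasts across the rows, as in the example above
    return np.where(np.isnan(arr), col_means, arr)


data = np.array([[1.0, np.nan],
                 [3.0, np.nan]])
filled = fill_nan_with_column_mean(data)
print(filled)
```

Because np.where() builds a new array, data itself still contains its NaN values after the call.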

Using Loop Traversal

This method processes each column individually using a loop:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

for i in range(arr.shape[1]):
    column = arr[:, i]
    column_mean = np.nanmean(column)
    column[np.isnan(column)] = column_mean

print("Final array:\n", arr)

Output:

Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]

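This approach iterates over the columns, computes each column's mean, and replaces the NaN values in place. It works because arr[:, i] is a view of arr, so assigning through column modifies the original array.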
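
The in-place behaviour of the loop depends on how the column is selected. As a small illustrative sketch (the variable names are chosen for this example), basic slicing such as arr[:, 1] returns a view that shares memory with arr, while indexing with a list such as arr[:, [1]] returns a copy:

```python
import numpy as np

arr = np.array([[1.0, np.nan],
                [3.0, 4.0]])

# Basic slicing returns a view: it shares memory with arr
column = arr[:, 1]
shares_view = np.shares_memory(column, arr)

# Assigning through the view changes arr itself
column[np.isnan(column)] = np.nanmean(column)

# Indexing with a list (fancy indexing) returns a copy instead
column_copy = arr[:, [1]]
shares_copy = np.shares_memory(column_copy, arr)
column_copy[0, 0] = -1.0  # modifies only the copy

print("View shares memory:", shares_view)
print("Copy shares memory:", shares_copy)
print(arr)
```

If the loop selected columns with a list or a boolean array instead of a slice, the replacement would be applied to a temporary copy and arr would keep its NaN values.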

Using numpy.nan_to_num()

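The nan_to_num() function replaces NaN values with the fill value passed through its nan argument. With a scalar nan argument, one call applies a single fill value to everything it is given, so to fill with column means it is applied one column at a time:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)

# Fill each column's NaN values with that column's own mean
arr_filled = arr.copy()
for i in range(arr.shape[1]):
    arr_filled[:, i] = np.nan_to_num(arr[:, i], nan=col_means[i])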

print("Column means:", col_means)
print("Final array:\n", arr_filled)

Output:

Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
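
One thing to keep in mind with nan_to_num() is that, by default, it also replaces positive and negative infinity with very large finite numbers. The sketch below, using a small made-up column, shows the default behaviour and one way to leave infinities untouched through the posinf and neginf arguments:

```python
import numpy as np

col = np.array([1.0, np.nan, np.inf])

# Default: NaN becomes the given fill value, but +inf is also
# replaced, by the largest finite float64
default_fill = np.nan_to_num(col, nan=0.0)

# Passing posinf/neginf explicitly keeps infinities as they are
kept_inf = np.nan_to_num(col, nan=0.0, posinf=np.inf, neginf=-np.inf)

print(default_fill)
print(kept_inf)
```

If the data may contain infinities that should be preserved, np.where() with np.isnan() touches only the NaN values.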

Using Broadcasting for In-Place Replacement

Boolean-mask indexing combined with np.take() replaces NaN values in place:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
arr[mask] = np.take(col_means, np.where(mask)[1])

print("Column means:", col_means)
print("Final array:\n", arr)

Output:

Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
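
The expression np.take(col_means, np.where(mask)[1]) is compact, so it can help to look at the intermediate values. The following sketch splits it into steps using the same array; the names rows, cols and fill_values are introduced here only for illustration.

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)

# np.where(mask) returns the row and column indices of every NaN,
# listed in row-major order: (0, 2), (1, 1), (2, 0)
rows, cols = np.where(mask)

# Pick the mean of the column each NaN sits in;
# equivalent to col_means[cols]
fill_values = np.take(col_means, cols)

print("NaN rows:", rows)
print("NaN columns:", cols)
print("Fill values:", fill_values)

# arr[mask] also visits the NaN positions in row-major order,
# so the fill values line up one-to-one
arr[mask] = fill_values
print(arr)
```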

Comparison of Methods

Method           | In-Place | Performance | Best For
-----------------|----------|-------------|--------------------
np.where()       | No       | Good        | Creating new arrays
Loop traversal   | Yes      | Slower      | Learning/debugging
nan_to_num()     | No       | Good        | Simple replacements
Broadcasting     | Yes      | Fast        | Memory efficiency
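
The performance column is qualitative, and actual timings depend on array shape, the share of NaN values, and the NumPy version. A rough way to check on your own data is sketched below. The array size, the 10% NaN ratio, and the function names are assumptions made for this example, and the in-place variants work on a copy so that every run starts from the same input.

```python
import timeit

import numpy as np

# Synthetic test data: 1000 x 50 floats with roughly 10% NaN
rng = np.random.default_rng(0)
data = rng.random((1000, 50))
data[rng.random(data.shape) < 0.1] = np.nan


def fill_where(a):
    return np.where(np.isnan(a), np.nanmean(a, axis=0), a)


def fill_loop(a):
    out = a.copy()
    for i in range(out.shape[1]):
        col = out[:, i]  # view into out
        col[np.isnan(col)] = np.nanmean(col)
    return out


def fill_mask(a):
    out = a.copy()
    mask = np.isnan(out)
    col_means = np.nanmean(out, axis=0)
    out[mask] = np.take(col_means, np.where(mask)[1])
    return out


methods = (fill_where, fill_loop, fill_mask)

# All methods should agree before their timings are compared
results = [f(data) for f in methods]
all_agree = all(np.allclose(results[0], r) for r in results[1:])
print("Results agree:", all_agree)

for f in methods:
    seconds = timeit.timeit(lambda: f(data), number=20)
    print(f"{f.__name__}: {seconds:.4f} s for 20 runs")
```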

Conclusion

Use np.where() with nanmean() for clean, readable code. For memory efficiency, use broadcasting with boolean indexing. Choose the method that best fits your performance and memory requirements.

Updated on: 2026-03-27T14:38:40+05:30
