Python Replace NaN values with average of columns

In this article, we explore methods to replace NaN (Not a Number) values with the average of their column. Handling NaN values is a crucial step in data analysis, because most aggregate functions return NaN as soon as one input is NaN. Here you will learn several approaches to replacing NaN values with column averages using NumPy.

Using numpy.nanmean() and numpy.where()

The most straightforward approach uses nanmean() to calculate column averages and where() to replace NaN values:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
arr_filled = np.where(np.isnan(arr), col_means, arr)

print("Column means:", col_means)
print("Final array:\n", arr_filled)

Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]

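Here, np.nanmean(arr, axis=0) calculates the mean of each column while ignoring NaN values. np.where() then builds a new array, taking the column mean wherever np.isnan(arr) is True and the original value everywhere else. The one-dimensional col_means is broadcast across the rows, so each NaN receives the mean of its own column.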
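One edge case the example above does not cover is a column that is entirely NaN: np.nanmean() then returns NaN for that column and emits a RuntimeWarning, so the NaNs stay in place. A minimal sketch of a helper that handles this follows. The function name fill_nan_with_col_mean and its fallback parameter are hypothetical, chosen for illustration, and are not part of NumPy:

```python
import warnings
import numpy as np

def fill_nan_with_col_mean(arr, fallback=0.0):
    """Return a copy of a 2-D array with NaNs replaced by column means.

    Columns that contain only NaN have no mean; they are filled with
    `fallback` instead (an assumption made for this sketch).
    """
    arr = np.asarray(arr, dtype=float)
    with warnings.catch_warnings():
        # np.nanmean warns ("Mean of empty slice") for all-NaN columns
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_means = np.nanmean(arr, axis=0)
    # Swap the NaN means of all-NaN columns for the fallback value
    col_means = np.where(np.isnan(col_means), fallback, col_means)
    # col_means (one value per column) is broadcast across every row
    return np.where(np.isnan(arr), col_means, arr)

data = np.array([[1.0, np.nan, np.nan],
                 [3.0, np.nan, 6.0]])
filled = fill_nan_with_col_mean(data)
print(filled)
```

Whether 0.0 is a sensible fallback depends on the data; dropping the all-NaN column may be a better choice in practice.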

Using Loop Traversal

This method processes each column individually using a loop:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

for i in range(arr.shape[1]):
    column = arr[:, i]
    column_mean = np.nanmean(column)
    column[np.isnan(column)] = column_mean

print("Final array:\n", arr)

Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]

This approach iterates through each column, calculates its mean, and replaces NaN values in-place.
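The in-place update works because arr[:, i] is a view onto the original array rather than a copy, so assigning through column changes arr itself. The short check below illustrates that behaviour, along with a reminder that the array needs a float dtype to hold NaN at all (the variable names are only for illustration):

```python
import numpy as np

arr = np.array([[1.0, np.nan],
                [np.nan, 4.0]])

column = arr[:, 0]                      # basic slicing returns a view
print(np.shares_memory(column, arr))    # True: both use the same buffer

# Writing through the view updates the original array
column[np.isnan(column)] = np.nanmean(column)
print(arr[1, 0])                        # 1.0, the mean of column 0

# Integer arrays cannot represent NaN, so convert to float first
int_arr = np.array([[1, 2], [3, 4]])
float_arr = int_arr.astype(float)
float_arr[0, 1] = np.nan
print(float_arr.dtype)                  # float64
```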

Using numpy.nan_to_num()

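The nan_to_num() function replaces NaN values with the fill value passed as its nan argument (0.0 by default). One way to use it for column averages is to apply it column by column, passing each column's mean as the fill value. Note that nan_to_num() also replaces positive and negative infinity with large finite numbers by default:

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)

# Fill each column separately, using that column's mean as the NaN replacement
arr_filled = arr.copy()
for i, mean in enumerate(col_means):
    arr_filled[:, i] = np.nan_to_num(arr[:, i], nan=mean)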

print("Column means:", col_means)
print("Final array:\n", arr_filled)

Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]

Using Broadcasting for In-Place Replacement

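Assigning through a boolean mask replaces NaN values in-place without creating a second full-size array. np.where(mask)[1] gives the column index of every NaN, and np.take() uses those indices to pick the matching column mean: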

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
arr[mask] = np.take(col_means, np.where(mask)[1])

print("Column means:", col_means)
print("Final array:\n", arr)

Column means: [2.5 5.  7.5]
Final array:
 [[1.  2.  7.5]
 [4.  5.  6. ]
 [2.5 8.  9. ]]
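As an alternative sketch, np.copyto() with its where argument can perform the same in-place fill while letting NumPy broadcast col_means against the array, so no column-index lookup is needed. This is a swapped-in variant, not the method shown above:

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)

# Copy col_means into arr only where arr is NaN; the 1-D col_means
# is broadcast against the 2-D destination array
np.copyto(arr, col_means, where=np.isnan(arr))

print(arr)
```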

Comparison of Methods

Method	In-Place	Performance	Best For
`np.where()`	No	Good	Creating new arrays
Loop traversal	Yes	Slower	Learning/debugging
`nan_to_num()`	No	Good	Simple replacements
Broadcasting	Yes	Fast	Memory efficiency
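The performance column is qualitative, and actual timings depend on array shape, the share of missing values, and the machine. A rough way to measure it yourself is sketched below, under the assumption of a 1000 x 50 array with about 10% NaN values; the function names are hypothetical wrappers around the approaches in this article:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
base = rng.random((1000, 50))
base[rng.random(base.shape) < 0.1] = np.nan   # roughly 10% missing values

def with_where(a):
    return np.where(np.isnan(a), np.nanmean(a, axis=0), a)

def with_loop(a):
    a = a.copy()                     # copy so the benchmark input is untouched
    for i in range(a.shape[1]):
        col = a[:, i]
        col[np.isnan(col)] = np.nanmean(col)
    return a

def with_mask(a):
    a = a.copy()
    mask = np.isnan(a)
    a[mask] = np.take(np.nanmean(a, axis=0), np.where(mask)[1])
    return a

# All variants should agree before their timings are compared
assert np.allclose(with_where(base), with_loop(base))
assert np.allclose(with_where(base), with_mask(base))

for fn in (with_where, with_loop, with_mask):
    seconds = timeit.timeit(lambda: fn(base), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

The copies inside with_loop and with_mask keep the input unchanged between runs, which slightly penalizes the in-place variants compared with real in-place use.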

Conclusion

Use np.where() with np.nanmean() for clean, readable code that returns a new array. To save memory, assign through a boolean mask so the original array is modified in-place. Choose the method that best fits your performance and memory requirements.

Kalyan Mishra

Updated on: 2026-03-27T14:38:40+05:30
