Python Replace NaN values with average of columns


This article shows several ways to replace NaN (Not a Number) values in a NumPy array with the mean of their column. Handling missing values is an important step in data analysis, because a single NaN propagates through most arithmetic operations. Each method below uses the same 3x3 example array and produces the same result.

Method 1: Using numpy.nanmean() and numpy.where().

Example

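import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

# Mean of each column, ignoring NaN entries
col_means = np.nanmean(arr, axis=0)
# Take the column mean where arr is NaN, otherwise keep the original value
arr_filled = np.where(np.isnan(arr), col_means, arr)
print("Column mean:", col_means)
print("Final array:")
print(arr_filled)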

Output

Column mean: [2.5 5. 7.5]
Final array:
[[1. 2. 7.5]
 [4. 5. 6. ]
 [2.5 8. 9. ]]

Explanation

numpy.nanmean() computes the mean of each column while ignoring NaN entries; axis=0 means the mean is taken down the rows, giving one value per column. numpy.isnan() builds a boolean mask of the NaN positions, and numpy.where() picks the column mean where the mask is True and the original value elsewhere. Because col_means has one entry per column, it is broadcast across the rows. arr_filled is a new array; arr itself is not modified.
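One case the example does not cover is a column that contains only NaN values: numpy.nanmean() then returns NaN for that column (with a RuntimeWarning), so the NaN values would remain. The following is a minimal sketch of one way to handle this, not a definitive implementation. The function name fill_nan_with_col_mean and the fallback parameter are hypothetical names introduced for illustration.

```python
import warnings

import numpy as np


def fill_nan_with_col_mean(arr, fallback=0.0):
    """Return a copy of arr with NaNs replaced by their column mean.

    fallback (a hypothetical parameter) is used for columns that are all NaN.
    """
    arr = np.asarray(arr, dtype=float)
    # nanmean warns on all-NaN columns; silence that warning here
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_means = np.nanmean(arr, axis=0)
    # Columns with no valid entries have a NaN mean; use the fallback instead
    col_means = np.where(np.isnan(col_means), fallback, col_means)
    return np.where(np.isnan(arr), col_means, arr)


data = np.array([[1.0, np.nan],
                 [3.0, np.nan]])
filled = fill_nan_with_col_mean(data)
print(filled)
```

Whether 0.0 is a sensible fallback depends on the data; dropping the column may be a better choice in some analyses.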

Method 2: Using Traversal and Column Mean.

Example

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

for i in range(arr.shape[1]):
    column = arr[:, i]                      # view of column i, not a copy
    column_mean = np.nanmean(column)        # mean of the non-NaN entries
    column[np.isnan(column)] = column_mean  # writes through to arr
    print("Column mean:", column_mean)
print("Final array:")
print(arr)

Output

Column mean: 2.5 
Column mean: 5.0 
Column mean: 7.5

Final array: 
[[1. 2. 7.5] 
 [4. 5. 6. ] 
 [2.5 8. 9. ]]

Explanation

This example loops over the columns of the NumPy array. For each column, it computes the mean of the non-NaN entries with numpy.nanmean() and assigns that value to the NaN positions through column[np.isnan(column)]. Because arr[:, i] is a view of the original data, the assignment changes arr in place.
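Since the loop writes into the original array, it can be useful to work on a copy when the unfilled data is still needed. The sketch below, using a smaller hypothetical array, checks that a column slice shares memory with its parent and then fills a copy instead.

```python
import numpy as np

arr = np.array([[1.0, np.nan],
                [3.0, 4.0]])

# A basic column slice is a view onto the same memory as arr
column = arr[:, 1]
is_view = np.shares_memory(column, arr)
print("Slice is a view:", is_view)

# Fill a copy so that the original array keeps its NaN values
work = arr.copy()
for i in range(work.shape[1]):
    col = work[:, i]
    col[np.isnan(col)] = np.nanmean(col)

print(work)
print(arr)
```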

Method 3: Using numpy.nan_to_num() and numpy.nanmean().

Example

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

# Mean of each column, ignoring NaN entries
col_means = np.nanmean(arr, axis=0)
# Replace each NaN with the mean of its column (one fill value per column)
arr_filled = np.nan_to_num(arr, nan=col_means)
print("Column mean:", col_means)
print("Final array:")
print(arr_filled)

Output

Column mean: [2.5 5. 7.5] 
Final array: 
[[1. 2. 7.5] 
 [4. 5. 6. ] 
 [2.5 8. 9. ]]

Explanation

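numpy.nan_to_num() replaces NaN with the value given by its nan argument. Here the column means are passed, so each NaN is replaced by the mean of its column. The NumPy documentation describes nan as a scalar, so passing an array relies on broadcasting and is worth verifying on the NumPy version in use. By default, nan_to_num() also replaces positive and negative infinity with very large finite numbers. arr_filled is a new array with the NaN values filled in.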
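The default handling of infinities in numpy.nan_to_num() can be surprising when only NaN values are meant to change. The sketch below uses a single hypothetical column and, as an assumption made for illustration, takes the mean over the finite entries only. It compares the default behaviour with passing posinf and neginf explicitly.

```python
import numpy as np

col = np.array([2.0, np.nan, 4.0, np.inf])

# Mean of the finite entries only (an assumption made for this sketch)
finite_mean = col[np.isfinite(col)].mean()

# Default: NaN is filled, and +inf becomes the largest finite float
default = np.nan_to_num(col, nan=finite_mean)

# Passing posinf/neginf explicitly leaves infinities as they are
kept = np.nan_to_num(col, nan=finite_mean, posinf=np.inf, neginf=-np.inf)

print(default)
print(kept)
```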

Method 4: Using numpy.apply_along_axis() and Column Mean.

Example

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

# Computed before filling, only so the means can be printed
col_means = np.nanmean(arr, axis=0)

def replace_nan(column):
    # Fill the NaN entries of this 1-D column with the column's own mean
    column[np.isnan(column)] = np.nanmean(column)
    return column

# Call replace_nan() once per column (axis=0)
arr_filled = np.apply_along_axis(replace_nan, axis=0, arr=arr)
print("Column mean:", col_means)
print("Final array:")
print(arr_filled)

Output

Column mean: [2.5 5. 7.5] 
Final array: 
[[1. 2. 7.5] 
 [4. 5. 6. ] 
 [2.5 8. 9. ]]

Explanation

numpy.apply_along_axis() calls replace_nan() on every 1-D slice taken along the given axis; with axis=0 each slice is one column. replace_nan() fills the NaN entries of the column it receives with that column's mean and returns it. Because replace_nan() assigns into the slice it receives, the original arr may be modified as well.
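If the original array should stay untouched, the per-column function can return a new array instead of assigning into its argument. The following is a sketch under that assumption; replace_nan_copy is a hypothetical name and the array is a smaller example.

```python
import numpy as np


def replace_nan_copy(column):
    # np.where builds a new array, so the input column is left unchanged
    return np.where(np.isnan(column), np.nanmean(column), column)


arr = np.array([[1.0, np.nan],
                [3.0, 4.0]])

filled = np.apply_along_axis(replace_nan_copy, 0, arr)
print(filled)
print(arr)
```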

Method 5: Using numpy.nanmean() and Fancy Indexing.

Example

import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
# Repeat the row of column means once per row so it has arr's shape,
# then copy over only the entries at the NaN positions
arr[mask] = col_means[np.newaxis, :].repeat(arr.shape[0], axis=0)[mask]

print("Column mean:", col_means)
print("Final array:")
print(arr)

Output

Column mean: [2.5 5. 7.5] 
Final array: 
[[1. 2. 7.5] 
 [4. 5. 6. ] 
 [2.5 8. 9. ]]

Explanation

numpy.repeat() copies the row of column means once per row of arr, so the result has the same shape as the original array. Indexing both arrays with the same boolean mask then assigns each NaN the mean of its own column. The assignment modifies arr in place, but repeat() allocates a temporary array of the same size as arr, so the approach does use extra memory.
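A variation that avoids building a full-size array of means is to look up the mean by column index. The sketch below is one possible way to do this, using numpy.nonzero() to get the row and column indices of the NaN entries; the array is a hypothetical example with two NaN values in the same column.

```python
import numpy as np

arr = np.array([[np.nan, 2.0],
                [np.nan, 4.0],
                [3.0, np.nan]])

col_means = np.nanmean(arr, axis=0)

# Row and column indices of every NaN entry
rows, cols = np.nonzero(np.isnan(arr))

# For each NaN, pick the mean that belongs to its column
arr[rows, cols] = col_means[cols]

print(arr)
```

Only as many values as there are NaN entries are created on the right-hand side.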

Method 6: Using numpy.nanmean() and Broadcasting.

Example

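import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)
# Broadcast the column means to arr's shape (a read-only view, no data copied),
# then copy over only the entries at the NaN positions
arr[mask] = np.broadcast_to(col_means, arr.shape)[mask]

print("Column mean:", col_means)
print("Final array:")
print(arr)

Output

Column mean: [2.5 5. 7.5]
Final array:
[[1. 2. 7.5]
 [4. 5. 6. ]
 [2.5 8. 9. ]]

Explanation

Here the column means are broadcast to the shape of the array with numpy.broadcast_to(), which returns a read-only view instead of copying the data. The mask variable marks the NaN positions, and indexing the broadcast view with the same mask gives each NaN the mean of its own column. Assigning col_means to arr[mask] directly would not do this: the values would be paired with the NaN positions in row-major order, which for this array puts 2.5 in the last column and 7.5 in the first.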
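With boolean-mask assignment, it is worth checking that the right-hand side lines up with the masked positions. The sketch below fills two copies of the example array: one by masking a broadcast view of the column means, and one by assigning col_means to arr[mask] positionally. The names by_column and by_position are hypothetical and used only for this comparison.

```python
import numpy as np

arr = np.array([[1, 2, np.nan],
                [4, np.nan, 6],
                [np.nan, 8, 9]])

col_means = np.nanmean(arr, axis=0)
mask = np.isnan(arr)

# Each NaN receives the mean of its own column
by_column = arr.copy()
by_column[mask] = np.broadcast_to(col_means, arr.shape)[mask]

# Values are paired with NaN positions in row-major order instead
by_position = arr.copy()
by_position[mask] = col_means

print(by_column)
print(by_position)
```

The positional form only runs here because the number of NaN values happens to equal the number of columns, and even then it places the means in the wrong columns.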

These are several ways to replace NaN values with the column mean in a NumPy array. The vectorized approaches (numpy.where() and mask-based indexing) are usually the most concise, while the loop and numpy.apply_along_axis() versions make the per-column logic explicit. Choose the one that best fits your requirements.

Updated on: 03-Oct-2023
