Fillna in Multiple Columns in Place in Python Pandas


Pandas is an open-source, third-party Python library for data analysis and manipulation. Its central data structure is the DataFrame, which resembles a table. Pandas can also read and write data in many formats, such as CSV files, Excel files, and SQL databases.

fillna() is a method that fills missing (NaN/Null) values in a Pandas DataFrame or Series. The missing values are replaced either with a value you supply or according to a fill method specified in the call.

Syntax

object_name.fillna(value, method, limit, axis, inplace, downcast)

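By default, the fillna() method returns a new DataFrame or Series with the missing values filled and leaves the original object unchanged. With inplace=True, it modifies the original object instead.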
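A minimal sketch of how the default call and inplace=True differ, using a small made-up Series (the names s, filled, and original_still_has_nan are illustrative and not part of the article's examples):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Default call: a new Series comes back, the original keeps its NaN
filled = s.fillna(0)
original_still_has_nan = bool(s.isna().any())

# inplace=True: the original Series itself is modified
s.fillna(0, inplace=True)
```

After the last line, s holds the same values as filled. Assigning the result back (s = s.fillna(0)) achieves the same effect and is often easier to follow.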

Example 1

We use fillna() to fill missing values in a DataFrame created in code and in a DataFrame loaded from a CSV file. The same fillna() parameters work for both objects.

Note −

The data imported from the CSV file is attached for your reference here. sample_data.csv

Algorithm

  • Step 1 − Identify the missing values (NaN/Null) in the specified DataFrame or Series.

  • Step 2 − Fill the identified missing values according to the arguments passed to fillna(). If a scalar value is passed, it replaces every missing value. If a dictionary is passed, each column is filled with the value given for it. If a fill method is passed, it determines how the gaps are filled. The axis and downcast arguments, if given, control the direction of the fill and the resulting data type.

  • Step 3 − Return a new DataFrame or Series with the missing values filled, or modify the original object if inplace=True.

import numpy as np
import pandas as pd

# Create a sample DataFrame (np.nan marks the missing values)
df = pd.DataFrame({'C1': [5, 23, 33, np.nan], 'C2': [np.nan, 89, 7, 18], 'C3': [11, 30, np.nan, 112]})
print(df)

# Or read a dataset from a CSV or any other file
df1 = pd.read_csv("sample_data.csv")

# Fill NaN values in C1 and C2 with 0, and in C3 with 1, modifying df in place
df.fillna(value={'C1': 0, 'C2': 0, 'C3': 1}, inplace=True)

# Fill NaN values in df1 with an arbitrary integer; assign the result back,
# because fillna() without inplace=True does not modify df1
df1 = df1.fillna(111)

# Print the updated DataFrame to see the difference
print(df)

Output

# Before filling missing values
     C1    C2     C3
0   5.0   NaN   11.0
1  23.0  89.0   30.0
2  33.0   7.0    NaN
3   NaN  18.0  112.0

# After filling missing values
     C1    C2     C3
0   5.0   0.0   11.0
1  23.0  89.0   30.0
2  33.0   7.0    1.0
3   0.0  18.0  112.0
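When only some columns should be filled, two common patterns apply. The sketch below reuses the sample data from this example; the variable names cols and c3_missing_after_a are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'C1': [5, 23, 33, np.nan],
    'C2': [np.nan, 89, 7, 18],
    'C3': [11, 30, np.nan, 112],
})

# Option A: the dict names only the columns to fill, so C3 is left alone
df.fillna(value={'C1': 0, 'C2': 0}, inplace=True)
c3_missing_after_a = int(df['C3'].isna().sum())

# Option B: select the columns, fill them, and assign the result back
cols = ['C3']
df[cols] = df[cols].fillna(1)
```

Option B writes the filled columns back into df explicitly. This avoids calling fillna(inplace=True) on a column selection, which may act on a temporary copy rather than on df itself.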

Example 2

We'll work with a small dataset about school students and use the fillna() method to fill missing values with the mean of each column. Here the dataset is created directly in code rather than imported from a CSV file as in Example 1.

import numpy as np
import pandas as pd

# Create a sample DataFrame with missing values
data = {
   'RollNo': [1, 2, 3, 4, 5],
   'Age': [10, np.nan, 5, 8, 12],
   'Marks': [100, 200, np.nan, 150, np.nan]
}

data = pd.DataFrame(data)

# Original DataFrame with missing values
print(data)

# Fill missing values in each column with that column's mean
data1 = data.fillna(data.mean())
print(data1)

Output

   RollNo   Age  Marks
0       1  10.0  100.0
1       2   NaN  200.0
2       3   5.0    NaN
3       4   8.0  150.0
4       5  12.0    NaN
   RollNo    Age  Marks
0       1  10.00  100.0
1       2   8.75  200.0
2       3   5.00  150.0
3       4   8.00  150.0
4       5  12.00  150.0
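The call data.mean() works here because every column is numeric. As a hedged sketch of two variations, the code below adds a hypothetical Name column (not part of the article's dataset) and shows how to restrict the mean to numeric columns and how to pick a different statistic for each column:

```python
import numpy as np
import pandas as pd

students = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],  # hypothetical text column
    'Age': [10, np.nan, 5, 8, 12],
    'Marks': [100, 200, np.nan, 150, np.nan],
})

# Compute means for numeric columns only, so the text column is skipped
numeric_means = students.mean(numeric_only=True)
filled_mean = students.fillna(numeric_means)

# Or choose a statistic per column by passing a dict
filled_mixed = students.fillna({
    'Age': students['Age'].median(),
    'Marks': students['Marks'].mean(),
})
```

The median is less sensitive to outliers than the mean, so the choice of statistic depends on the data.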

Conclusion

You can use the fillna() method in Pandas to fill missing values in one or more columns of a DataFrame, or in a Series. Its arguments let you specify which value to use and how the missing values are filled.

Pandas has other methods, such as replace(), that can also substitute missing values with a mean, median, mode, or any other value you supply. The difference between the two is that fillna() is designed specifically for missing values, whereas replace() is more general and can replace any value in the object. This makes fillna() the clearer choice when you only need to deal with missing values in your data.
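A short sketch of the difference, assuming a made-up Series in which -1 is used as a placeholder for unknown values (that convention is an assumption for illustration only):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, -1.0, 4.0])

# fillna() touches only the missing entries; -1 stays as it is
via_fillna = s.fillna(0)

# replace() can target any value: here NaN and the placeholder -1 together
via_replace = s.replace([np.nan, -1.0], 0)
```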

Updated on: 23-Aug-2023
