How to Replace Values in Columns Based on Condition in Pandas


In Pandas, we can replace values in a column based on a condition using tools such as loc, where and mask, apply with a lambda, map, and numpy.where. Pandas is a Python library for manipulating and analyzing structured data. In this article, we will replace values in columns based on conditions in Pandas.

Method 1: Using loc

The loc indexer is used to access a group of rows and columns in a DataFrame by label or by boolean mask. We can use it to replace values in a column based on a condition.

Syntax

df.loc[row_labels, column_labels]

The loc indexer selects rows and columns from a DataFrame based on labels. Here, row_labels is a label, a list of labels, or a boolean array that selects rows, and column_labels is a label or a list of labels that selects columns.

Example

In the example below, we replace the gender of everyone aged 50 or older with 'M' in our DataFrame. We use df.loc[df['age'] >= 50, 'gender'] to select the 'gender' column of all rows where age is greater than or equal to 50, and then assign 'M' to those cells.

import pandas as pd

data = {
   'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
   'age': [25, 35, 45, 55, 65],
   'gender': ['F', 'M', 'M', 'F', 'F']
}

df = pd.DataFrame(data)
df.loc[df['age'] >= 50, 'gender'] = 'M'
print(df)

Output

      name  age gender
0    Alice   25      F
1      Bob   35      M
2  Charlie   45      M
3    David   55      M
4    Emily   65      M
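
Several conditions can be combined in the row selector. The following is a minimal sketch, assuming the same sample data; the replacement label 'Female' and the variable name older_women are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

# Combine two boolean conditions with & (and).
# Each condition needs its own parentheses.
older_women = (df['age'] >= 50) & (df['gender'] == 'F')

# Replace the value only in rows where both conditions hold.
# 'Female' is an illustrative replacement value.
df.loc[older_women, 'gender'] = 'Female'
print(df['gender'].tolist())
```

Use | instead of & when a row should match either condition.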

Method 2: Using where and mask

The where and mask functions are used to replace values based on a condition. The where function replaces values where the condition is False, and the mask function replaces values where the condition is True.

Syntax

df.where(cond, other=nan, inplace=False, axis=None, level=None)

df.mask(cond, other=nan, inplace=False, axis=None, level=None)

The where and mask methods are used to replace values in a DataFrame or Series based on a condition. Here, cond is a boolean array or a callable that specifies the condition for the replacement. other is the value to replace the existing values with; it defaults to NaN. If inplace is True, the original object is modified. axis and level control how other is aligned with the object when alignment is needed (level applies to a multi-level index). Older pandas versions also accepted an errors parameter, which was deprecated and is no longer part of the signature in pandas 2.x.

Example

In the example below, we replace the age with 0 for every person whose gender is 'M'. We use df['age'].where(df['gender'] != 'M', 0), which keeps the age where the gender is not 'M' and replaces it with 0 everywhere else.

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}

df = pd.DataFrame(data)

df['age'] = df['age'].where(df['gender'] != 'M', 0)
print(df)

Output

      name  age gender
0    Alice   25      F
1      Bob    0      M
2  Charlie    0      M
3    David   55      F
4    Emily   65      F

We can also perform the same operation using the mask method.

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}

df = pd.DataFrame(data)

df['age'] = df['age'].mask(df['gender'] == 'M', 0)
print(df)

Output

      name  age gender
0    Alice   25      F
1      Bob    0      M
2  Charlie    0      M
3    David   55      F
4    Emily   65      F
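
The replacement passed as other does not have to be a single value. The following is a minimal sketch, assuming the same sample data, in which other is a Series; the adjustment of adding 10 is arbitrary and only for illustration. It also checks that mask and where with the inverted condition give the same result.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

is_male = df['gender'] == 'M'

# `other` can be a Series; it is aligned with the column by index.
# Adding 10 is an arbitrary adjustment chosen for illustration.
with_mask = df['age'].mask(is_male, df['age'] + 10)

# where needs the inverted condition (~) to produce the same result.
with_where = df['age'].where(~is_male, df['age'] + 10)

print(with_mask.tolist())            # [25, 45, 55, 55, 65]
print(with_mask.equals(with_where))  # True
```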

Method 3: Using Apply and Lambda

We can also use the apply function along with a lambda function to replace values in a column based on some condition.

Syntax

df.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)

lambda arguments: expression

The apply method applies a function along an axis of a DataFrame. A lambda is an anonymous function that is often passed to apply. Here, func is the function to apply. axis specifies whether the function is applied to each column (0) or to each row (1). If raw is True, the function receives NumPy arrays instead of Series objects. result_type controls the type of the result and only acts when axis=1. args is a tuple of positional arguments to pass to the function, and **kwds are additional keyword arguments to pass to the function.
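
The effect of the axis argument is easy to mix up, so here is a minimal sketch on a small hypothetical DataFrame (the name nums and its values are made up for illustration). It sums along each axis to show what the function receives.

```python
import pandas as pd

# Small hypothetical frame used only to illustrate the axis argument.
nums = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis=0: the function receives each column as a Series.
col_sums = nums.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each row as a Series.
row_sums = nums.apply(lambda row: row.sum(), axis=1)

print(col_sums.tolist())  # [3, 30]
print(row_sums.tolist())  # [11, 22]
```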

Example

In the below example, we used df.apply(lambda x: 'F' if x['name'].startswith('A') else x['gender'], axis=1) to apply a lambda function to each row of the DataFrame. The lambda function replaces the gender with 'F' where the name starts with 'A'.

import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F']
}

df = pd.DataFrame(data)

df['gender'] = df.apply(lambda x: 'F' if x['name'].startswith('A') else x['gender'], axis=1)
print(df)

Output

      name  age gender
0    Alice   25      F
1      Bob   35      M
2  Charlie   45      M
3    David   55      F
4    Emily   65      F
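
When the condition depends on a single column only, apply can be called on that column directly. The following is a minimal sketch, assuming the same sample data; the cap of 50 is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
})

# Series.apply calls the lambda once per value in the column.
# Ages above 50 are replaced with 50; other ages are kept.
df['age'] = df['age'].apply(lambda age: 50 if age > 50 else age)
print(df['age'].tolist())  # [25, 35, 45, 50, 50]
```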

Method 4: Using map method

The map method is used to replace values in a DataFrame column based on a dictionary.

Syntax

df['column'] = df['column'].map(mapping)

Here, column is the column to replace values in and mapping is a dictionary that maps the old values to the new values.

Example

If we want to replace the gender codes 'F' and 'M' with the full words 'Female' and 'Male', we can use the map method like this:

import pandas as pd

data = {
   'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
   'age': [25, 35, 45, 55, 65],
   'gender': ['F', 'M', 'M', 'F', 'F']
}

df = pd.DataFrame(data)

# Map each old value to its replacement
df['gender'] = df['gender'].map({'F': 'Female', 'M': 'Male'})
print(df)

Output

      name  age  gender
0    Alice   25  Female
1      Bob   35    Male
2  Charlie   45    Male
3    David   55  Female
4    Emily   65  Female
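
One caveat with a dictionary passed to map: values that have no entry in the dictionary become NaN. The following is a minimal sketch, assuming the same sample data and a deliberately incomplete dictionary (the name partial is hypothetical), that falls back to the original value with fillna.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

# Hypothetical partial mapping: only 'M' has an entry.
partial = {'M': 'Male'}

# 'F' has no entry in the dictionary, so it becomes NaN.
mapped = df['gender'].map(partial)

# Fall back to the original value wherever the mapping produced NaN.
df['gender'] = mapped.fillna(df['gender'])
print(df['gender'].tolist())
```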

Method 5: Using numpy.where() method

The numpy.where() function chooses between two values element by element based on a condition, so it can be used to replace values in a DataFrame column.

Syntax

df['column'] = np.where(condition, x, y)

Here, condition is a boolean array that specifies the condition for the replacement. x is the value to use where the condition is True, and y is the value to use where the condition is False (typically the original column, so that those values are kept).

Example

If we want to replace the age of all people whose gender is 'M' with 0, we can use the numpy.where() function like this:

import pandas as pd
import numpy as np
data = {
   'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
   'age': [25, 35, 45, 55, 65],
   'gender': ['F', 'M', 'M', 'F', 'F']
}

df = pd.DataFrame(data)

df['age'] = np.where(df['gender'] == 'M', 0, df['age'])
print(df)

Output

      name  age gender
0    Alice   25      F
1      Bob    0      M
2  Charlie    0      M
3    David   55      F
4    Emily   65      F
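
Calls to numpy.where() can be nested when there are more than two outcomes. The following is a minimal sketch, assuming the same sample data; the age_group column, its labels, and the cut-offs of 30 and 60 are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
})

# The inner np.where is only consulted where the outer condition is False.
# 'age_group' and its labels are hypothetical names chosen for this sketch.
df['age_group'] = np.where(
    df['age'] < 30, 'young',
    np.where(df['age'] < 60, 'middle', 'senior'),
)
print(df['age_group'].tolist())
```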

Conclusion

In this article, we discussed how to replace values in columns based on conditions in Pandas using loc, where and mask, apply with a lambda, map, and numpy.where(). Depending on the scenario and the type of data, one method may be more suitable than the others. It's always good practice to choose a method that is efficient and easy to understand.

Updated on: 10-Jul-2023
