How to Replace Values in Columns Based on Condition in Pandas
In Python, you can replace column values based on a condition in Pandas using the loc indexer, the where and mask methods, apply with a lambda, or numpy.where. Pandas is a powerful library for manipulating structured data. This article demonstrates each of these five approaches.
Using loc
The loc indexer lets you select and modify specific rows and columns of a DataFrame using boolean conditions:
Syntax
df.loc[row_condition, column_labels] = new_value
Example
Let's replace the gender of people aged 50 or older with 'M':
import pandas as pd
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'age': [25, 35, 45, 55, 65],
'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Replace gender with 'M' where age >= 50
df.loc[df['age'] >= 50, 'gender'] = 'M'
print("\nAfter replacement:")
print(df)
Original DataFrame:
name age gender
0 Alice 25 F
1 Bob 35 M
2 Charlie 45 M
3 David 55 F
4 Emily 65 F
After replacement:
name age gender
0 Alice 25 F
1 Bob 35 M
2 Charlie 45 M
3 David 55 M
4 Emily 65 M
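The row condition passed to loc can also combine several tests. The following is a minimal sketch on the same sample data; the rule itself (aged 50 or older and not named Emily) is made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

# Combine conditions with & (and) or | (or).
# Each comparison needs its own parentheses because & binds tighter than >=.
condition = (df['age'] >= 50) & (df['name'] != 'Emily')

# Only David satisfies both tests, so only his row is changed.
df.loc[condition, 'gender'] = 'M'

print(df)
```

Storing the condition in a variable first keeps the assignment line short and makes the rule easy to reuse or inspect.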
Using where and mask
The where method keeps values where the condition is True and replaces the rest, while mask does the opposite and replaces values where the condition is True:
Using where()
import pandas as pd
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'age': [25, 35, 45, 55, 65],
'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
# Replace age with 0 where gender is 'M'
df['age'] = df['age'].where(df['gender'] != 'M', 0)
print(df)
name age gender
0 Alice 25 F
1 Bob 0 M
2 Charlie 0 M
3 David 55 F
4 Emily 65 F
Using mask()
import pandas as pd
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'age': [25, 35, 45, 55, 65],
'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
# Same result using mask - replace where condition is True
df['age'] = df['age'].mask(df['gender'] == 'M', 0)
print(df)
name age gender
0 Alice 25 F
1 Bob 0 M
2 Charlie 0 M
3 David 55 F
4 Emily 65 F
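To make the relationship between the two methods concrete, here is a small sketch showing that mask with a condition gives the same result as where with the negated condition. It also shows that the replacement does not have to be a scalar; the "age + 1" rule is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

is_male = df['gender'] == 'M'

# mask replaces where the condition is True;
# where replaces where the condition is False, so we negate it with ~.
via_mask = df['age'].mask(is_male, 0)
via_where = df['age'].where(~is_male, 0)
print(via_mask.equals(via_where))

# The replacement can be a Series aligned on the same index.
bumped = df['age'].mask(is_male, df['age'] + 1)
print(bumped.tolist())
```

Neither call modifies df; both return a new Series, which is why the earlier examples assign the result back to df['age'].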
Using apply and Lambda
The apply method with a lambda expression provides flexible row-wise (axis=1) or column-wise transformations. In the sample data below, Alice is already 'F', so the output matches the input:
import pandas as pd
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'age': [25, 35, 45, 55, 65],
'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
# Replace gender with 'F' where name starts with 'A'
df['gender'] = df.apply(lambda x: 'F' if x['name'].startswith('A') else x['gender'], axis=1)
print(df)
name age gender
0 Alice 25 F
1 Bob 35 M
2 Charlie 45 M
3 David 55 F
4 Emily 65 F
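Since the example above leaves the data unchanged, a variant that visibly alters some rows may be easier to follow. This sketch uses a hypothetical rule (label 'X' for names starting with 'A' or ages above 60) and a hypothetical helper named relabel, and compares the result with a vectorized version of the same rule.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

def relabel(row):
    # Hypothetical rule, chosen only so that some rows actually change.
    if row['name'].startswith('A') or row['age'] > 60:
        return 'X'
    return row['gender']

# Row-wise version: relabel is called once per row.
by_apply = df.apply(relabel, axis=1)

# Vectorized version of the same rule, built from column-level operations.
condition = df['name'].str.startswith('A') | (df['age'] > 60)
by_mask = df['gender'].mask(condition, 'X')

print(by_apply.tolist())
print(by_mask.tolist())
```

When a rule can be written with column-level operations like this, the vectorized form is usually preferable; apply remains useful for logic that cannot easily be expressed that way.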
Using numpy.where()
NumPy's where function provides a vectorized approach to conditional replacement:
import pandas as pd
import numpy as np
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'age': [25, 35, 45, 55, 65],
'gender': ['F', 'M', 'M', 'F', 'F']
}
df = pd.DataFrame(data)
# Replace age with 0 where gender is 'M', keep original age otherwise
df['age'] = np.where(df['gender'] == 'M', 0, df['age'])
print(df)
name age gender
0 Alice 25 F
1 Bob 0 M
2 Charlie 0 M
3 David 55 F
4 Emily 65 F
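Calls to numpy.where can be nested when there are more than two outcomes. The sketch below assumes a made-up age_group column with arbitrary cut-offs at 30 and 60; neither appears in the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'age': [25, 35, 45, 55, 65],
    'gender': ['F', 'M', 'M', 'F', 'F'],
})

# The inner np.where is only used for rows that fail the outer condition.
df['age_group'] = np.where(
    df['age'] < 30, 'young',
    np.where(df['age'] < 60, 'middle', 'senior'),
)

print(df)
```

Note that numpy.where returns a NumPy array rather than a Series; assigning it to a column works because its length matches the number of rows.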
Comparison
| Method | Best For | Performance |
|---|---|---|
| loc | Simple boolean indexing | Fast |
| where/mask | Single column operations | Fast |
| apply + lambda | Complex row-wise logic | Slower |
| numpy.where | Vectorized operations | Fastest |
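The performance ratings above depend on data size, dtypes, and library versions, so it may be worth measuring them on your own data. Here is a rough timing sketch using the standard timeit module; the synthetic DataFrame, its size, and the helper function names are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and value ranges are arbitrary.
rng = np.random.default_rng(0)
n = 10_000
big = pd.DataFrame({
    'age': rng.integers(18, 90, size=n),
    'gender': rng.choice(['F', 'M'], size=n),
})

def with_loc():
    out = big['age'].copy()
    out.loc[big['gender'] == 'M'] = 0
    return out

def with_mask():
    return big['age'].mask(big['gender'] == 'M', 0)

def with_apply():
    return big.apply(lambda r: 0 if r['gender'] == 'M' else r['age'], axis=1)

def with_np_where():
    values = np.where(big['gender'] == 'M', 0, big['age'])
    return pd.Series(values, index=big.index)

candidates = {
    'loc': with_loc,
    'mask': with_mask,
    'apply': with_apply,
    'numpy.where': with_np_where,
}

# Check that every method produces the same values before timing them.
results = {name: fn() for name, fn in candidates.items()}
reference = results['numpy.where'].to_numpy()
for name, result in results.items():
    assert (result.to_numpy() == reference).all(), name

# Time a few runs of each method; absolute numbers vary by machine.
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=3)
    print(f"{name:12s} {seconds:.4f} s for 3 runs")
```

The equality check runs before the timing loop so that the comparison is between methods that actually compute the same thing.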
Conclusion
Pandas offers multiple methods for conditional value replacement. Use loc for straightforward boolean indexing, where or mask for single-column replacements, numpy.where when vectorized speed matters, and apply with a lambda for logic that is hard to express with vectorized operations. Choose the method that best fits your use case and performance requirements.
