Python - Replace negative values with latest preceding positive value in Pandas DataFrame
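In Pandas, you can replace negative values with the latest preceding positive value by masking the negatives and then forward filling each column. If a negative value has no valid value before it in its column, it is set to 0. Note that the approach shown here treats 0 as a valid value, so strictly speaking it carries forward the latest preceding non-negative value.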
Understanding the Problem
When working with time series or sequential data, negative values might represent missing or invalid data that need to be replaced with the most recent valid (positive) observation.
Creating Sample DataFrame
Let's start by creating a sample DataFrame with negative values:
import pandas as pd
# Create pandas DataFrame
df = pd.DataFrame({'One': [-3, 7, 4, 0], 'two': [-6, -1, 2, -8]})
print("Original DataFrame:")
print(df)
Original DataFrame:
   One  two
0   -3   -6
1    7   -1
2    4    2
3    0   -8
Solution Using mask() and ffill()
The solution uses DataFrame masking to identify negative values, then applies forward fill to replace them:
import pandas as pd
# Create pandas DataFrame
df = pd.DataFrame({'One': [-3, 7, 4, 0], 'two': [-6, -1, 2, -8]})
print("Original DataFrame:")
print(df)
# Replace negative values using masking and forward fill
df_updated = df.mask(df.lt(0)).ffill().fillna(0).astype('int32')
print("\nUpdated DataFrame:")
print(df_updated)
Original DataFrame:
   One  two
0   -3   -6
1    7   -1
2    4    2
3    0   -8

Updated DataFrame:
   One  two
0    0    0
1    7    0
2    4    2
3    0    2
How It Works
The solution works in four steps:
- mask(df.lt(0)) − Creates NaN values where the condition (less than 0) is True
- ffill() − Forward fills NaN values with the last valid observation
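- fillna(0) − Replaces any remaining NaN values (at the top of a column, where there is no earlier valid value to carry forward) with 0
- astype('int32') − Converts back to integer type, since introducing NaN in the masking step turns the columns into floats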
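The four steps above can also be wrapped in a small reusable function. The following is a minimal sketch; the function name replace_negatives_with_previous is a hypothetical helper introduced here for illustration and is not part of Pandas.

```python
import pandas as pd

def replace_negatives_with_previous(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace each negative value with the last
    non-negative value above it in the same column, or 0 if none exists."""
    masked = df.mask(df.lt(0))     # negatives become NaN
    filled = masked.ffill()        # carry the last valid value down each column
    filled = filled.fillna(0)      # leading NaN have nothing to carry, so use 0
    return filled.astype('int32')  # masking produced floats; convert back

df = pd.DataFrame({'One': [-3, 7, 4, 0], 'two': [-6, -1, 2, -8]})
result = replace_negatives_with_previous(df)
print(result)
```

The function returns a new DataFrame and leaves the original untouched, which matches how the chained expression behaves.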
Step-by-Step Breakdown
Let's see each step of the transformation:
import pandas as pd
df = pd.DataFrame({'One': [-3, 7, 4, 0], 'two': [-6, -1, 2, -8]})
print("Step 1 - After masking:")
masked = df.mask(df.lt(0))
print(masked)
print("\nStep 2 - After forward fill:")
filled = masked.ffill()
print(filled)
print("\nStep 3 - After filling NaN with 0:")
final = filled.fillna(0)
print(final)
Step 1 - After masking:
   One  two
0  NaN  NaN
1  7.0  NaN
2  4.0  2.0
3  0.0  NaN

Step 2 - After forward fill:
   One  two
0  NaN  NaN
1  7.0  NaN
2  4.0  2.0
3  0.0  2.0

Step 3 - After filling NaN with 0:
   One  two
0  0.0  0.0
1  7.0  0.0
2  4.0  2.0
3  0.0  2.0
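Notice that the 0 in the last row of column One is left unchanged, because lt(0) only masks values strictly below zero. If zeros should also be replaced by the preceding positive value, one possible variation (an assumption about the requirement, not something the example above does) is to mask with le(0) instead. The Series below is made-up sample data used only to compare the two behaviours.

```python
import pandas as pd

# Made-up sample data containing both a zero and negative values
s = pd.Series([5, 0, -2, 3, -1])

# Zero counts as valid: mask only values < 0 (the approach used above)
keep_zero = s.mask(s.lt(0)).ffill().fillna(0).astype('int32')

# Zero counts as invalid too: mask values <= 0
strict = s.mask(s.le(0)).ffill().fillna(0).astype('int32')

print(keep_zero.tolist())  # [5, 0, 0, 3, 3]
print(strict.tolist())     # [5, 5, 5, 3, 3]
```

With lt(0) the zero is carried forward like any other valid value, while with le(0) both the zero and the negative that follows it take the earlier 5.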
Conclusion
Use mask() with ffill() to replace negative values with the latest preceding non-negative value in each column. fillna(0) then replaces any NaN values left at the top of a column with zeros, and astype('int32') restores an integer type.
