Create a Pipeline and remove a row from an already created DataFrame - Python Pandas

Use the ValDrop() pipeline stage from the pdpipe library to remove rows, selected by value, from an existing pandas DataFrame. pdpipe builds preprocessing pipelines out of stages, each of which takes a DataFrame and returns a transformed one.

Installing pdpipe

First, install the pdpipe library if you haven't already:

pip install pdpipe

Basic Setup

Import pdpipe and pandas under their usual aliases, then create a sample DataFrame:

import pdpipe as pdp
import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

print("Original DataFrame:")
print(dataFrame)

Output:

Original DataFrame:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

Removing a Row Using ValDrop()

Use ValDrop() to remove every row in which the given column holds one of the listed values:

import pdpipe as pdp
import pandas as pd

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

# Remove every row whose Car value equals 'Jaguar'
dataFrame = pdp.ValDrop(['Jaguar'], 'Car').apply(dataFrame)
print("DataFrame after removing Jaguar:")
print(dataFrame)

Output:

DataFrame after removing Jaguar:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
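
For comparison, the same filter can be written in plain pandas with a boolean mask. This is a minimal sketch of an equivalent result, not a description of how pdpipe implements the stage; the names mask and filtered are illustrative only.

```python
import pandas as pd

dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

# True for every row whose Car value appears in the list of values to drop
mask = dataFrame["Car"].isin(["Jaguar"])

# Keep only the rows that were NOT flagged; the original DataFrame is untouched
filtered = dataFrame[~mask]
print(filtered)
```

The pdpipe stage becomes more attractive than the mask once the removal has to be one reusable step among several.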

Complete Pipeline Example

Here's a complete example that adds a new column with plain pandas and then removes a row with a ValDrop() stage:

import pdpipe as pdp
import pandas as pd

# Function to check for excess units
def check_stock(x):
    if x >= 100:
        return "OverStock"
    else:
        return "UnderStock"

# Create DataFrame
dataFrame = pd.DataFrame({
    "Car": ['BMW', 'Lexus', 'Audi', 'Mustang', 'Bentley', 'Jaguar'],
    "Units": [100, 150, 110, 80, 110, 90]
})

print("Original DataFrame:")
print(dataFrame)

# Add a new column based on existing column
dataFrame['Stock'] = dataFrame['Units'].apply(check_stock)
print("\nDataFrame with Stock column:")
print(dataFrame)

# Remove row using pdpipe
dataFrame = pdp.ValDrop(['Jaguar'], 'Car').apply(dataFrame)
print("\nDataFrame after removing Jaguar row:")
print(dataFrame)

Output:

Original DataFrame:
       Car  Units
0      BMW    100
1    Lexus    150
2     Audi    110
3  Mustang     80
4  Bentley    110
5   Jaguar     90

DataFrame with Stock column:
       Car  Units       Stock
0      BMW    100   OverStock
1    Lexus    150   OverStock
2     Audi    110   OverStock
3  Mustang     80  UnderStock
4  Bentley    110   OverStock
5   Jaguar     90  UnderStock

DataFrame after removing Jaguar row:
       Car  Units       Stock
0      BMW    100   OverStock
1    Lexus    150   OverStock
2     Audi    110   OverStock
3  Mustang     80  UnderStock
4  Bentley    110   OverStock

ValDrop() Parameters

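ValDrop() accepts the following parameters:

  • values: List of values to drop; a row is removed when a checked column holds any of them
  • columns: Column name or list of column names to check for the values (optional; if omitted, all columns are checked)
  • **kwargs: Additional keyword arguments, passed on to the underlying pipeline stage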

Conclusion

ValDrop() in the pdpipe library removes rows from a DataFrame based on the values held in one or more columns. Because it is a pipeline stage, it can be chained with other stages, which is particularly useful in preprocessing workflows that apply several operations in sequence.

Updated on: 2026-03-26T13:39:46+05:30
