Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Drop rows from the dataframe based on certain condition applied on a column
In this article, we will discuss different methods to drop rows from a DataFrame based on conditions applied to columns. We will use pandas to create and manipulate DataFrames, demonstrating various filtering techniques.
Pandas is a powerful library that supports multiple file types including CSV, JSON, HTML, SQL, and Excel, making it an essential tool for data manipulation.
Creating a Pandas DataFrame
We will create a DataFrame consisting of player profiles with their ratings and salaries arranged in rows and columns ?
import pandas as pd
dataset = {
"Player name": ["Drake", "John", "Sergio", "Adam"],
"Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"],
"Rating": [8.2, 7.1, 9.0, 8.4],
"Salary": [250000, 450000, 900000, 500000]
}
dataframe = pd.DataFrame(dataset, index=[1, 2, 3, 4])
print("The original DataFrame:")
print(dataframe)
The original DataFrame: Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000
Dropping Rows Based on Single Column Condition
We can filter rows by applying conditions on a specific column. This uses boolean indexing to retain only rows that satisfy the condition ?
import pandas as pd
dataset = {
"Player name": ["Drake", "John", "Sergio", "Adam"],
"Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"],
"Rating": [8.2, 7.1, 9.0, 8.4],
"Salary": [250000, 450000, 900000, 500000]
}
dataframe = pd.DataFrame(dataset, index=[1, 2, 3, 4])
print("Original DataFrame:")
print(dataframe)
# Keep only players with salary greater than 400000
filtered_dataframe = dataframe[dataframe["Salary"] > 400000]
print("\nAfter applying condition (Salary > 400000):")
print(filtered_dataframe)
Original DataFrame: Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 After applying condition (Salary > 400000): Player name Position Rating Salary 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000
Dropping Rows Using Multiple Column Conditions
For complex filtering, we can combine multiple conditions using logical operators. The drop() method permanently removes rows when inplace=True ?
import pandas as pd
dataset = {
"Player name": ["Drake", "John", "Sergio", "Adam"],
"Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"],
"Rating": [8.2, 7.1, 9.0, 8.4],
"Salary": [250000, 450000, 900000, 500000]
}
dataframe = pd.DataFrame(dataset, index=[1, 2, 3, 4])
print("Original DataFrame:")
print(dataframe)
# Get index values of rows where Rating < 9 AND Salary > 400000
index_values = dataframe[(dataframe["Rating"] < 9) & (dataframe["Salary"] > 400000)].index
dataframe.drop(index_values, inplace=True)
print("\nAfter applying conditions on multiple columns:")
print(dataframe)
Original DataFrame: Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 After applying conditions on multiple columns: Player name Position Rating Salary 1 Drake Striker 8.2 250000 3 Sergio Midfielder 9.0 900000
Performance-Based Row Removal
Let's create a performance rating system where defenders with ratings below 8.0 are removed from the team ?
import pandas as pd
dataset = {
"Player name": ["Drake", "John", "Sergio", "Adam", "Nick"],
"Position": ["Striker", "Defender", "Midfielder", "Goal Keeper", "Defender"],
"Rating": [8.2, 7.1, 9.0, 8.4, 9.0],
"Salary": [250000, 450000, 900000, 500000, 560000]
}
dataframe = pd.DataFrame(dataset, index=[1, 2, 3, 4, 5])
print("Original DataFrame:")
print(dataframe)
# Find underperforming defenders
underperforming = dataframe.loc[(dataframe["Position"] == "Defender") & (dataframe["Rating"] < 8)]
print("\nUnderperforming player(s):")
print(underperforming)
# Remove underperforming defenders
index_to_drop = underperforming.index
dataframe.drop(index_to_drop, inplace=True)
print("\nDataFrame after removing underperforming defenders:")
print(dataframe)
Original DataFrame: Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 5 Nick Defender 9.0 560000 Underperforming player(s): Player name Position Rating Salary 2 John Defender 7.1 450000 DataFrame after removing underperforming defenders: Player name Position Rating Salary 1 Drake Striker 8.2 250000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 5 Nick Defender 9.0 560000
Key Methods Summary
| Method | Purpose | Modifies Original |
|---|---|---|
| Boolean Indexing | Filter rows based on conditions | No (creates new DataFrame) |
drop() with inplace=True
|
Remove specific rows permanently | Yes |
loc[] |
Select rows by label-based conditions | No (returns subset) |
Conclusion
Use boolean indexing for simple filtering and drop() with inplace=True for permanent row removal. Combine multiple conditions with & (and) or | (or) operators for complex filtering scenarios.
