Drop rows from the dataframe based on certain condition applied on a column


In this article, we will discuss the different methods to drop rows from a data frame base on a one or multiple conditions. These conditions will be applied on the columns and the rows will be dropped accordingly. We will use pandas to create a data frame as it offers multiple functions to manipulate the data frame.

We will also create a dataset which will act as a reference for the data frame although it is not mandatory to create one, we can also use a CSV file or any other document. Pandas support multiple file types including: “CSV”, “JSON”, “HTML”, “SQL”, “EXCEL” etc. This makes pandas a very powerful and flexible programming tool. Let’s create a pandas data frame.

Creating a Pandas Data Frame

We will create a data frame consisting of different player profiles and their rating. The data will be tabularly arranged (In rows and columns).

Example

  • In the following example we have imported the pandas library and created a dictionary dataset.

  • We passed the dataset as the argument for “.DataFrame” method to create a pandas data frame.

import pandas as pd

dataset = {"Player name": ["Drake", "John", "Sergio", "Adam"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"], "Rating": [8.2, 7.1, 9.0, 8.4], "Salary": [250000, 450000, 900000, 500000]}

dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4])
print("The original data frame is like: -")
print(dataframe)

Output

The original data frame is like: -
  Player name     Position  Rating  Salary
1       Drake      Striker     8.2  250000
2        John     Defender     7.1  450000
3      Sergio   Midfielder     9.0  900000
4        Adam  Goal Keeper     8.4  500000

Now that we have created the data frame, let’s discuss the different methods to drop rows based on conditions.

Applying Condition on a Single Column

We will apply a condition on a particular column and drop the rows accordingly. Data selection depends upon the kind of data frame we want to produce. Let’s see how the code works −

Example

  • After creating the data frame, we used the data vectorization technique to filter the data frame and retain selective information that satisfies the passed condition.

  • We selected only the rows where the “Salary” column is greater than “400000” and created a new data frame.

  • In this manner, we dropped the rows where the salary value is less than “400000”.

import pandas as pd

dataset = {"Player name": ["Drake", "John", "Sergio", "Adam"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"], "Rating": [8.2, 7.1, 9.0, 8.4], "Salary": [250000, 450000, 900000, 500000]}

dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4])
print("The original data frame is like: -")
print(dataframe)

#including player with salary more than 400000
new_dataframe = dataframe[dataframe["Salary"] > 400000]
print("The new data frame after applying condition")
print(new_dataframe)

Output

The original data frame is like: -
  Player name     Position  Rating  Salary
1       Drake      Striker     8.2  250000
2        John     Defender     7.1  450000
3      Sergio   Midfielder     9.0  900000
4        Adam  Goal Keeper     8.4  500000
The new data frame after applying condition
  Player name     Position  Rating  Salary
2        John     Defender     7.1  450000
3      Sergio   Midfielder     9.0  900000
4        Adam  Goal Keeper     8.4  500000 

Applying Condition on Multiple Columns to Drop the Rows

We can apply numerous conditions on multiple columns for advance filtration of data. In case of complex dataset, there can be a situation where we have to eliminate a range of values that doesn’t satisfies a particular condition.

Example

  • Here we used an extended approach of manipulating the data frame. We retrieved the index values of those rows where the “Rating” column value is less than 9 and “Salary” column value is greater than 400000.

  • We used the “drop()” function and passed the index values to drop the rows. We made changes to the original data frame by passing the “Inplace =True” clause.

  • Finally, we printed the data frame in which the 2nd and 4th row was dropped. In this example, we created a range an then the rows were dropped accordingly.

import pandas as pd

dataset = {"Player name": ["Drake", "John", "Sergio", "Adam"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"], "Rating": [8.2, 7.1, 9.0, 8.4], "Salary": [250000, 450000, 900000, 500000]}

dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4])
print("The original data frame is like: -")
print(dataframe)

#Obtaining index values and dropping rows
index_values = dataframe[(dataframe["Rating"] < 9) & (dataframe["Salary"] > 400000)].index
dataframe.drop(index_values, inplace= True)
print("After appling conditions on multiple columns we get: -")
print(dataframe)

Output

The original data frame is like: -
  Player name     Position  Rating  Salary
1       Drake      Striker     8.2  250000
2        John     Defender     7.1  450000
3      Sergio   Midfielder     9.0  900000
4        Adam  Goal Keeper     8.4  500000
After appling conditions on multiple columns we get: -
  Player name    Position  Rating  Salary
1       Drake     Striker     8.2  250000
3      Sergio  Midfielder     9.0  900000 

Creating Rating System and Dropping Rows

Now we will create a rating system in which a player below a “limiting” rating value will be dropped from the data frame.

Example

import pandas as pd

dataset = {"Player name": ["Drake", "John", "Sergio", "Adam", "Nick"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper", "Defender"], "Rating": [8.2, 7.1, 9.0, 8.4, 9], "Salary": [250000, 450000, 900000, 500000, 560000]}

dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4, 5])
print("The original data frame is like: -")
print(dataframe)
player_name = dataframe.loc[(dataframe["Position"] == "Defender") & (dataframe["Rating"] < 8)]
index_value = dataframe.loc[(dataframe["Position"] == "Defender") & (dataframe["Rating"] < 8)].index

print("This player's: ")
print(player_name)

print("Performance is poor, dropping it from the dataframe")
dataframe.drop(index_value, inplace= True)
print(dataframe)

Output

The original data frame is like: -
  Player name     Position  Rating  Salary
1       Drake      Striker     8.2  250000
2        John     Defender     7.1  450000
3      Sergio   Midfielder     9.0  900000
4        Adam  Goal Keeper     8.4  500000
5        Nick     Defender     9.0  560000
This player's: 
  Player name  Position  Rating  Salary
2        John  Defender     7.1  450000
Performance is poor, dropping it from the dataframe
  Player name     Position  Rating  Salary
1       Drake      Striker     8.2  250000
3      Sergio   Midfielder     9.0  900000
4        Adam  Goal Keeper     8.4  500000
5        Nick     Defender     9.0  560000

We used the “loc” and “.index” methods to select the rows where the “Rating” is less than 8 and dropped it from the data frame.

Conclusion

In this article, we dropped the rows by applying conditions on single and multiple columns in a data frame. We vectorized the data and also used the “drop()” and “loc()” functions to drop the rows. Finally, we created a rating system for the data frame.

Updated on: 05-May-2023

906 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements