- Trending Categories
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Drop rows from the dataframe based on certain condition applied on a column
In this article, we will discuss the different methods to drop rows from a data frame base on a one or multiple conditions. These conditions will be applied on the columns and the rows will be dropped accordingly. We will use pandas to create a data frame as it offers multiple functions to manipulate the data frame.
We will also create a dataset which will act as a reference for the data frame although it is not mandatory to create one, we can also use a CSV file or any other document. Pandas support multiple file types including: “CSV”, “JSON”, “HTML”, “SQL”, “EXCEL” etc. This makes pandas a very powerful and flexible programming tool. Let’s create a pandas data frame.
Creating a Pandas Data Frame
We will create a data frame consisting of different player profiles and their rating. The data will be tabularly arranged (In rows and columns).
Example
In the following example we have imported the pandas library and created a dictionary dataset.
We passed the dataset as the argument for “.DataFrame” method to create a pandas data frame.
import pandas as pd dataset = {"Player name": ["Drake", "John", "Sergio", "Adam"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"], "Rating": [8.2, 7.1, 9.0, 8.4], "Salary": [250000, 450000, 900000, 500000]} dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4]) print("The original data frame is like: -") print(dataframe)
Output
The original data frame is like: - Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000
Now that we have created the data frame, let’s discuss the different methods to drop rows based on conditions.
Applying Condition on a Single Column
We will apply a condition on a particular column and drop the rows accordingly. Data selection depends upon the kind of data frame we want to produce. Let’s see how the code works −
Example
After creating the data frame, we used the data vectorization technique to filter the data frame and retain selective information that satisfies the passed condition.
We selected only the rows where the “Salary” column is greater than “400000” and created a new data frame.
In this manner, we dropped the rows where the salary value is less than “400000”.
import pandas as pd dataset = {"Player name": ["Drake", "John", "Sergio", "Adam"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"], "Rating": [8.2, 7.1, 9.0, 8.4], "Salary": [250000, 450000, 900000, 500000]} dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4]) print("The original data frame is like: -") print(dataframe) #including player with salary more than 400000 new_dataframe = dataframe[dataframe["Salary"] > 400000] print("The new data frame after applying condition") print(new_dataframe)
Output
The original data frame is like: - Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 The new data frame after applying condition Player name Position Rating Salary 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000
Applying Condition on Multiple Columns to Drop the Rows
We can apply numerous conditions on multiple columns for advance filtration of data. In case of complex dataset, there can be a situation where we have to eliminate a range of values that doesn’t satisfies a particular condition.
Example
Here we used an extended approach of manipulating the data frame. We retrieved the index values of those rows where the “Rating” column value is less than 9 and “Salary” column value is greater than 400000.
We used the “drop()” function and passed the index values to drop the rows. We made changes to the original data frame by passing the “Inplace =True” clause.
Finally, we printed the data frame in which the 2nd and 4th row was dropped. In this example, we created a range an then the rows were dropped accordingly.
import pandas as pd dataset = {"Player name": ["Drake", "John", "Sergio", "Adam"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper"], "Rating": [8.2, 7.1, 9.0, 8.4], "Salary": [250000, 450000, 900000, 500000]} dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4]) print("The original data frame is like: -") print(dataframe) #Obtaining index values and dropping rows index_values = dataframe[(dataframe["Rating"] < 9) & (dataframe["Salary"] > 400000)].index dataframe.drop(index_values, inplace= True) print("After appling conditions on multiple columns we get: -") print(dataframe)
Output
The original data frame is like: - Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 After appling conditions on multiple columns we get: - Player name Position Rating Salary 1 Drake Striker 8.2 250000 3 Sergio Midfielder 9.0 900000
Creating Rating System and Dropping Rows
Now we will create a rating system in which a player below a “limiting” rating value will be dropped from the data frame.
Example
import pandas as pd dataset = {"Player name": ["Drake", "John", "Sergio", "Adam", "Nick"], "Position": ["Striker", "Defender", "Midfielder", "Goal Keeper", "Defender"], "Rating": [8.2, 7.1, 9.0, 8.4, 9], "Salary": [250000, 450000, 900000, 500000, 560000]} dataframe = pd.DataFrame(dataset, index= [1, 2, 3, 4, 5]) print("The original data frame is like: -") print(dataframe) player_name = dataframe.loc[(dataframe["Position"] == "Defender") & (dataframe["Rating"] < 8)] index_value = dataframe.loc[(dataframe["Position"] == "Defender") & (dataframe["Rating"] < 8)].index print("This player's: ") print(player_name) print("Performance is poor, dropping it from the dataframe") dataframe.drop(index_value, inplace= True) print(dataframe)
Output
The original data frame is like: - Player name Position Rating Salary 1 Drake Striker 8.2 250000 2 John Defender 7.1 450000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 5 Nick Defender 9.0 560000 This player's: Player name Position Rating Salary 2 John Defender 7.1 450000 Performance is poor, dropping it from the dataframe Player name Position Rating Salary 1 Drake Striker 8.2 250000 3 Sergio Midfielder 9.0 900000 4 Adam Goal Keeper 8.4 500000 5 Nick Defender 9.0 560000
We used the “loc” and “.index” methods to select the rows where the “Rating” is less than 8 and dropped it from the data frame.
Conclusion
In this article, we dropped the rows by applying conditions on single and multiple columns in a data frame. We vectorized the data and also used the “drop()” and “loc()” functions to drop the rows. Finally, we created a rating system for the data frame.