Filter the rows – Python Pandas
In pandas, filtering rows by specific criteria is a common data manipulation task. The str.contains() method is particularly useful for string columns: it checks whether each value contains a given substring or pattern.
Basic Row Filtering with contains()
The str.contains() method returns a boolean mask that can be used to filter DataFrame rows. The examples in this article use the following sample DataFrame:
import pandas as pd
# Create sample DataFrame
data = {
'Car': ['Lamborghini', 'Ferrari', 'Lamborghini', 'Porsche', 'BMW'],
'Model': ['Huracan', 'F8', 'Aventador', '911', 'M3'],
'Year': [2020, 2021, 2019, 2020, 2018],
'Price': [240000, 280000, 400000, 150000, 70000]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Original DataFrame:
Car Model Year Price
0 Lamborghini Huracan 2020 240000
1 Ferrari F8 2021 280000
2 Lamborghini Aventador 2019 400000
3 Porsche 911 2020 150000
4 BMW M3 2018 70000
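To make the "boolean mask" idea concrete, the short sketch below prints the mask itself before using it as a filter. It uses a trimmed-down version of the sample data; the variable name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Car': ['Lamborghini', 'Ferrari', 'Lamborghini', 'Porsche', 'BMW'],
    'Price': [240000, 280000, 400000, 150000, 70000],
})

# str.contains() returns a boolean Series aligned with the DataFrame index
mask = df['Car'].str.contains('Lamborghini')
print(mask.tolist())

# Indexing the DataFrame with the mask keeps only the rows where it is True
print(df[mask].index.tolist())
```

The first print shows one True/False value per row; the second shows the index labels of the rows that survive the filter.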
Filtering Rows with Specific Text
Filter rows where the 'Car' column contains 'Lamborghini':
import pandas as pd
# Create sample DataFrame
data = {
'Car': ['Lamborghini', 'Ferrari', 'Lamborghini', 'Porsche', 'BMW'],
'Model': ['Huracan', 'F8', 'Aventador', '911', 'M3'],
'Year': [2020, 2021, 2019, 2020, 2018],
'Price': [240000, 280000, 400000, 150000, 70000]
}
df = pd.DataFrame(data)
# Filter rows containing 'Lamborghini'
filtered_df = df[df['Car'].str.contains('Lamborghini')]
print("Filtered DataFrame (Lamborghini only):")
print(filtered_df)
Filtered DataFrame (Lamborghini only):
Car Model Year Price
0 Lamborghini Huracan 2020 240000
2 Lamborghini Aventador 2019 400000
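Real data often has missing values in string columns. Depending on the pandas version and column dtype, str.contains() can return a missing value for those rows, and boolean indexing may reject such a mask. A minimal sketch of one way to handle this, assuming missing entries should simply count as non-matches, is to pass na=False:

```python
import pandas as pd

# Sample data with one missing car name (illustrative only)
df = pd.DataFrame({
    'Car': ['Lamborghini', None, 'Ferrari', 'Lamborghini'],
    'Year': [2020, 2021, 2019, 2022],
})

# na=False treats missing entries as "does not contain"
mask = df['Car'].str.contains('Lamborghini', na=False)
filtered_df = df[mask]
print(filtered_df)
```

Whether a missing value should count as a non-match is a choice that depends on the data; pass na=True to keep those rows instead.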
Advanced Filtering Options
Case-Insensitive Filtering
Set case=False for case-insensitive matching:
import pandas as pd
data = {
'Car': ['lamborghini', 'Ferrari', 'LAMBORGHINI', 'Porsche'],
'Model': ['Huracan', 'F8', 'Aventador', '911']
}
df = pd.DataFrame(data)
# Case-insensitive filtering
filtered_df = df[df['Car'].str.contains('lambo', case=False)]
print("Case-insensitive filtering:")
print(filtered_df)
Case-insensitive filtering:
Car Model
0 lamborghini Huracan
2 LAMBORGHINI Aventador
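As an alternative sketch, the same result can be obtained by passing a flag from the re module through the flags parameter of str.contains(). This is mainly useful when other regex flags are needed as well; the variable names here are illustrative.

```python
import re
import pandas as pd

cars = pd.Series(['lamborghini', 'Ferrari', 'LAMBORGHINI', 'Porsche'])

# Two equivalent ways to ignore case
by_case = cars.str.contains('lambo', case=False)
by_flag = cars.str.contains('lambo', flags=re.IGNORECASE)

print(by_case.tolist())
print(by_flag.tolist())
```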
Using Regular Expressions
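str.contains() treats the pattern as a regular expression by default (regex=True), which allows more complex matching. Passing regex=True explicitly, as below, only makes that intent visible: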
import pandas as pd
data = {
'Car': ['Lamborghini Huracan', 'Ferrari F8', 'Lamborghini Aventador', 'Porsche 911'],
'Price': [240000, 280000, 400000, 150000]
}
df = pd.DataFrame(data)
# Filter using a regex pattern: '^' anchors the match to the start of the string
filtered_df = df[df['Car'].str.contains(r'^Lamborghini', regex=True)]
print("Regex filtering:")
print(filtered_df)
Regex filtering:
Car Price
0 Lamborghini Huracan 240000
2 Lamborghini Aventador 400000
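Two related cases come up often with patterns: matching any one of several alternatives, and searching for text that contains regex metacharacters. The sketch below uses made-up sample data; regex=False tells str.contains() to treat the pattern as literal text.

```python
import pandas as pd

df = pd.DataFrame({
    'Car': ['Lamborghini Huracan', 'Ferrari F8', 'Porsche 911', 'BMW M3 (2018)'],
})

# Alternation: rows whose name contains either brand
either = df[df['Car'].str.contains(r'Ferrari|Porsche')]
print(either)

# regex=False: parentheses are matched literally, not as a regex group
literal = df[df['Car'].str.contains('(2018)', regex=False)]
print(literal)
```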
Multiple Conditions
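Combine multiple filtering conditions using logical operators. Wrap each condition in parentheses, because & binds more tightly than comparison operators such as >=: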
import pandas as pd
data = {
'Car': ['Lamborghini', 'Ferrari', 'Lamborghini', 'Porsche', 'BMW'],
'Year': [2020, 2021, 2019, 2020, 2018],
'Price': [240000, 280000, 400000, 150000, 70000]
}
df = pd.DataFrame(data)
# Multiple conditions: Lamborghini cars from 2020 or later
filtered_df = df[(df['Car'].str.contains('Lamborghini')) & (df['Year'] >= 2020)]
print("Lamborghini cars from 2020+:")
print(filtered_df)
Lamborghini cars from 2020+:
Car Year Price
0 Lamborghini 2020 240000
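Besides & (and), boolean masks can be combined with | (or) and inverted with ~ (not). A brief sketch using the same sample data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Car': ['Lamborghini', 'Ferrari', 'Lamborghini', 'Porsche', 'BMW'],
    'Year': [2020, 2021, 2019, 2020, 2018],
    'Price': [240000, 280000, 400000, 150000, 70000],
})

is_lambo = df['Car'].str.contains('Lamborghini')

# OR: Lamborghini cars, or any car priced under 100000
lambo_or_cheap = df[is_lambo | (df['Price'] < 100000)]
print(lambo_or_cheap)

# NOT: every row that does not contain 'Lamborghini'
not_lambo = df[~is_lambo]
print(not_lambo)
```

Storing a mask in a variable such as is_lambo keeps longer conditions readable and lets the mask be reused.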
Common Use Cases
| Method | Use Case | Example |
|---|---|---|
| str.contains('text') | Basic substring matching | Find cars containing "BMW" |
| str.contains('text', case=False) | Case-insensitive matching | Find "bmw", "BMW", "Bmw" |
| str.contains(pattern, regex=True) | Pattern matching | Find cars starting with "L" |
Conclusion
The str.contains() method is a convenient way to filter DataFrame rows based on string patterns. Use case=False for case-insensitive searches. Patterns are treated as regular expressions by default (regex=True), so pass regex=False when the search text should be matched literally.
