How to filter rows in Pandas by regex?
A regular expression (regex) is a sequence of characters that defines a search pattern. Pandas provides several string methods that accept regex patterns: str.match() and str.contains() return boolean masks that can be used to filter DataFrame rows, while str.extract() pulls out the text captured by groups in the pattern.
Using str.match() Method
The str.match() method tests whether a regex pattern matches at the beginning of each string. First, create a sample DataFrame:
import pandas as pd
df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'marks': [89, 23, 100, 56, 90],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})
print("Input DataFrame:")
print(df)
Input DataFrame:
    name  marks   subjects
0   John     89       Math
1  Jacob     23    Physics
2    Tom    100  Chemistry
3    Tim     56    Biology
4   Ally     90    English
Filter Names Starting with 'J'
import pandas as pd
df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'marks': [89, 23, 100, 56, 90],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})
regex = 'J.*'
filtered_df = df[df.name.str.match(regex)]
print("Names starting with 'J':")
print(filtered_df)
Names starting with 'J':
    name  marks subjects
0   John     89     Math
1  Jacob     23  Physics
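Because str.match() only anchors at the start of the string, it helps to compare it with str.fullmatch() (whole string must match) and str.contains() (match anywhere). The following is a minimal sketch using the same names as above; the patterns and variable names are illustrative assumptions, and str.fullmatch() requires pandas 1.1 or later.

```python
import pandas as pd

names = pd.Series(['John', 'Jacob', 'Tom', 'Tim', 'Ally'])

# str.match() anchors only at the start: 'Jo' matches 'John' but not 'Jacob'
starts_with_jo = names.str.match('Jo')

# str.fullmatch() requires the entire string to match: 'T' plus exactly two characters
three_letter_t = names.str.fullmatch('T..')

# str.contains() searches anywhere in the string
has_o = names.str.contains('o')

print(names[starts_with_jo].tolist())
print(names[three_letter_t].tolist())
print(names[has_o].tolist())
```

Each call returns a boolean Series, so any of the three can be used directly as a row filter.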
Using str.contains() Method
The str.contains() method finds regex patterns anywhere in the string:
import pandas as pd
df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})
# Find subjects containing 'ics' (str.contains() searches anywhere, so no leading '.*' is needed)
pattern = 'ics'
filtered_df = df[df.subjects.str.contains(pattern, regex=True)]
print("Subjects containing 'ics':")
print(filtered_df)
Subjects containing 'ics':
    name subjects
1  Jacob  Physics
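Real data often contains missing values and inconsistent capitalization. The sketch below uses a modified version of the sample data (the None entry and the uppercase 'PHYSICS' are assumptions added for illustration) to show the case and na parameters of str.contains(). Passing na=False treats missing values as non-matches, so the mask stays purely boolean and can be used for indexing.

```python
import pandas as pd

# Hypothetical data with a missing value and mixed case
df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'subjects': ['Math', 'PHYSICS', None, 'Biology', 'Economics'],
})

# 'ics$' anchors the pattern to the end of the string;
# case=False ignores capitalization, na=False maps missing values to False
mask = df.subjects.str.contains(r'ics$', case=False, na=False)
filtered_df = df[mask]

print(filtered_df)
```

Here the `$` anchor is what restricts the match to subjects that end with 'ics'; without it, str.contains() would match the pattern anywhere in the string.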
Multiple Column Filtering
You can filter on multiple columns by building one boolean mask per column and combining them with the & operator:
import pandas as pd
df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com', 'tim@test.com', 'ally@outlook.com']
})
# Filter names starting with 'J' and emails containing 'gmail'
name_filter = df.name.str.match('J.*')
email_filter = df.email.str.contains('gmail', regex=True)  # searches anywhere, so surrounding '.*' is unnecessary
filtered_df = df[name_filter & email_filter]
print("Names starting with 'J' AND emails containing 'gmail':")
print(filtered_df)
Names starting with 'J' AND emails containing 'gmail':
    name            email
1  Jacob  jacob@gmail.com
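Boolean masks can also be combined with | (OR) and negated with ~ (NOT). The following sketch reuses the same sample data; the '@test\.com$' pattern and the variable names are illustrative assumptions. Parentheses matter here because these operators bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com',
              'tim@test.com', 'ally@outlook.com'],
})

starts_with_j = df.name.str.match('J')
# Escape the dot so it matches a literal '.', and anchor to the end with '$'
test_domain = df.email.str.contains(r'@test\.com$')

either = df[starts_with_j | test_domain]        # name starts with J OR test.com email
neither = df[~(starts_with_j | test_domain)]    # rows matching neither condition
j_not_test = df[starts_with_j & ~test_domain]   # name starts with J AND NOT test.com

print(either)
print(neither)
print(j_not_test)
```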
Comparison of Methods
| Method | Matches From | Best For |
|---|---|---|
| str.match() | Beginning of string | Prefix matching |
| str.contains() | Anywhere in string | General pattern matching |
| str.extract() | Capture groups | Extracting specific parts |
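Unlike the other two methods, str.extract() returns the captured text rather than a boolean mask, so filtering with it takes one extra step. The sketch below is one possible approach, using hypothetical email values (including one malformed entry) and an illustrative pattern: extract the domain name, then filter on the extracted values.

```python
import pandas as pd

# Hypothetical data; 'no-at-sign' is included to show a failed extraction
emails = pd.Series(['john@test.com', 'jacob@gmail.com', 'no-at-sign', 'tim@test.com'])

# With a single capture group and expand=False, str.extract() returns a Series;
# rows where the pattern does not match become missing values
domain = emails.str.extract(r'@([^.]+)\.', expand=False)

# Keep only rows where the extraction succeeded
valid = emails[domain.notna()]

# Keep only rows whose extracted domain equals 'test'
test_only = emails[domain == 'test']

print(valid.tolist())
print(test_only.tolist())
```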
Conclusion
Use str.match() for matching patterns at the beginning of strings and str.contains() for finding patterns anywhere in the text. Both methods support powerful regex patterns for flexible DataFrame filtering.
