How to filter rows in Pandas by regex?

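A regular expression (regex) is a sequence of characters that defines a search pattern. Pandas provides several string methods for filtering DataFrame rows with regex patterns. str.match() and str.contains() return a boolean Series that can be used directly as a row mask. str.extract() returns the text captured by the pattern's groups, which you can then compare or check for missing values.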

Using str.match() Method

The str.match() method checks whether the regex matches at the beginning of each string. The pattern is anchored at the start but not at the end. First, create a sample DataFrame:

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'marks': [89, 23, 100, 56, 90],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})

print("Input DataFrame:")
print(df)
Input DataFrame:
    name  marks   subjects
0   John     89       Math
1  Jacob     23    Physics
2    Tom    100  Chemistry
3    Tim     56    Biology
4   Ally     90    English

Filter Names Starting with 'J'

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'marks': [89, 23, 100, 56, 90],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})

# str.match() anchors the pattern at the start of each string
regex = 'J.*'
filtered_df = df[df.name.str.match(regex)]
print("Names starting with 'J':")
print(filtered_df)
Names starting with 'J':
    name  marks subjects
0   John     89     Math
1  Jacob     23  Physics
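
Because str.match() is anchored only at the start, a pattern that fits a prefix is enough. The sketch below uses a standalone Series of the same names to contrast it with str.fullmatch() (available since pandas 1.1), which must match the whole string. It also shows the case parameter for case-insensitive matching:

```python
import pandas as pd

names = pd.Series(['John', 'Jacob', 'Tom', 'Tim', 'Ally'])

# str.match() only requires the pattern to match at the start of the string
prefix = names[names.str.match('J[a-z]{3}')]

# str.fullmatch() requires the pattern to cover the entire string
whole = names[names.str.fullmatch('J[a-z]{3}')]

# Text that appears later in the string does not match with str.match()
middle = names[names.str.match('ohn')]

# case=False ignores letter case
ci = names[names.str.match('j', case=False)]

print(prefix.tolist())  # ['John', 'Jacob']
print(whole.tolist())   # ['John']
print(middle.tolist())  # []
print(ci.tolist())      # ['John', 'Jacob']
```

'Jacob' passes str.match() because its first four characters fit 'J[a-z]{3}'. It fails str.fullmatch() because one character is left over.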

Using str.contains() Method

The str.contains() method searches for the regex pattern anywhere in each string:

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})

# Find subjects containing 'ics' anywhere in the string
pattern = 'ics'
filtered_df = df[df.subjects.str.contains(pattern, regex=True)]
print("Subjects containing 'ics':")
print(filtered_df)
Subjects containing 'ics':
    name subjects
1  Jacob  Physics
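
A few str.contains() parameters matter when filtering. The sketch below uses a hypothetical subjects Series that includes one missing value:

- na=False treats missing entries as non-matches. A mask that still contains NaN cannot be used for boolean indexing.
- case=False ignores letter case.
- A trailing $ anchors the pattern at the end of the string.
- regex=False searches for literal text.

```python
import pandas as pd

# Hypothetical data with a missing value and mixed case
subjects = pd.Series(["Math", "Physics", None, "Biology", "english"])

# na=False turns the missing value into False; case=False ignores case
mask = subjects.str.contains('ics|eng', case=False, na=False)
print(subjects[mask].tolist())  # ['Physics', 'english']

# '$' anchors the pattern at the end, so this really means "ends with 'ics'"
ends = subjects[subjects.str.contains('ics$', na=False)]
print(ends.tolist())  # ['Physics']

# regex=False treats '.' as a literal dot instead of "any character"
codes = pd.Series(['a.b', 'axb'])
print(codes[codes.str.contains('.', regex=False)].tolist())  # ['a.b']
```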

Multiple Column Filtering

To filter on several columns at once, build one regex mask per column and combine the masks with & (and):

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com', 'tim@test.com', 'ally@outlook.com']
})

# Filter names starting with 'J' and emails containing 'gmail'
name_filter = df.name.str.match('J.*')
email_filter = df.email.str.contains('gmail', regex=True)
filtered_df = df[name_filter & email_filter]

print("Names starting with 'J' AND emails containing 'gmail':")
print(filtered_df)
Names starting with 'J' AND emails containing 'gmail':
    name            email
1  Jacob  jacob@gmail.com
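
Masks can also be combined with | (or) and negated with ~ (not). The sketch below rebuilds the same name/email DataFrame and keeps the rows that match either condition, and then the rows that do not match one of them:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com', 'tim@test.com', 'ally@outlook.com']
})

is_test = df.email.str.contains(r'@test\.com$')   # email on test.com
starts_t = df.name.str.match('T')                 # name starts with 'T'

# | keeps rows where at least one condition holds
either = df[is_test | starts_t]

# ~ inverts a mask: rows whose email is NOT on test.com
not_test = df[~is_test]

print(either['name'].tolist())    # ['John', 'Tom', 'Tim']
print(not_test['name'].tolist())  # ['Jacob', 'Tom', 'Ally']
```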

Comparison of Methods

Method          Pattern applied at       Returns                           Best for
str.match()     Start of each string     Boolean Series                    Prefix matching
str.contains()  Anywhere in each string  Boolean Series                    General pattern matching
str.extract()   Anywhere (first match)   Capture groups (NaN if no match)  Extracting specific parts
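
str.extract() does not return a boolean mask, so filtering with it takes one more step. You either compare the captured values or keep the rows where the capture is not NaN. The sketch below applies both approaches to the same name/email data. The domain-matching patterns are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com', 'tim@test.com', 'ally@outlook.com']
})

# Capture the domain between '@' and '.com'; with one group and
# expand=False, str.extract() returns a Series instead of a DataFrame
domain = df.email.str.extract(r'@(\w+)\.com$', expand=False)

# Keep rows whose captured domain equals 'test'
test_users = df[domain == 'test']
print(test_users['name'].tolist())  # ['John', 'Tim']

# Rows where the pattern did not match get NaN, so notna() acts as a mask
matched = df[df.email.str.extract(r'@(gmail|yahoo)\.', expand=False).notna()]
print(matched['name'].tolist())  # ['Jacob', 'Tom']
```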

Conclusion

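Use str.match() when the pattern must match at the beginning of a string, and str.contains() when it may appear anywhere in the text. Both return a boolean Series. You can pass it directly to df[...] or combine it with other masks using & (and), | (or), and ~ (not). Use str.extract() when you also need the matched text itself.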

Updated on: 2026-03-26T01:57:29+05:30
