How to filter rows in Pandas by regex?

A regular expression (regex) is a sequence of characters that defines a search pattern. Pandas provides several string methods that accept regex patterns and can be used to filter DataFrame rows, including str.match(), str.contains(), and str.extract().

Using str.match() Method

The str.match() method checks whether a regex pattern matches at the beginning of each string. The examples use the following DataFrame:

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'marks': [89, 23, 100, 56, 90],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})

print("Input DataFrame:")
print(df)

Input DataFrame:
    name  marks   subjects
0   John     89       Math
1  Jacob     23    Physics
2    Tom    100  Chemistry
3    Tim     56    Biology
4   Ally     90    English

Filter Names Starting with 'J'

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'marks': [89, 23, 100, 56, 90],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})

# str.match() anchors the pattern at the start of each string
regex = 'J.*'
filtered_df = df[df['name'].str.match(regex)]
print("Names starting with 'J':")
print(filtered_df)

Names starting with 'J':
    name  marks subjects
0   John     89     Math
1  Jacob     23  Physics
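
Because str.match() anchors only at the start, the pattern does not have to cover the whole string. The following is a minimal sketch of the difference between str.match() and str.fullmatch(); it assumes pandas 1.1 or later (where str.fullmatch() is available), and the Series and variable names are illustrative only:

```python
import pandas as pd

names = pd.Series(['John', 'Jacob', 'Tom', 'Tim', 'Ally'])

# str.match() anchors only at the start: 'J' matches any name beginning with J
starts_with_j = names[names.str.match('J')]

# str.fullmatch() must consume the entire string: 'J' followed by exactly three lowercase letters
j_plus_three = names[names.str.fullmatch('J[a-z]{3}')]

print(starts_with_j.tolist())
print(j_plus_three.tolist())
```

Here only 'John' satisfies the full-string pattern, while both 'John' and 'Jacob' satisfy the prefix match.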

Using str.contains() Method

The str.contains() method finds regex patterns anywhere in the string:

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'subjects': ["Math", "Physics", "Chemistry", "Biology", "English"]
})

# Find subjects ending with 'ics' ($ anchors the pattern at the end of the string)
pattern = 'ics$'
filtered_df = df[df['subjects'].str.contains(pattern, regex=True)]
print("Subjects ending with 'ics':")
print(filtered_df)

Subjects ending with 'ics':
    name subjects
1  Jacob  Physics
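
Real data often contains mixed case and missing values. As a hedged sketch, the case and na parameters of str.contains() can handle both; the sample Series below is made up for illustration and is not part of the original example:

```python
import pandas as pd

# Illustrative data with mixed case and one missing value
subjects = pd.Series(['Math', 'PHYSICS', None, 'Biology', 'Statistics'])

# case=False ignores letter case; na=False treats missing values as non-matches
mask = subjects.str.contains('ics$', case=False, na=False)
ends_with_ics = subjects[mask]

print(ends_with_ics.tolist())
```

Without na=False, the missing entry would produce a missing value in the mask, which cannot be used directly for Boolean indexing.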

Multiple Column Filtering

You can combine regex filters on several columns with the & operator:

import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com', 'tim@test.com', 'ally@outlook.com']
})

# Filter names starting with 'J' and emails containing 'gmail'
name_filter = df['name'].str.match('J.*')
# str.contains() searches anywhere in the string, so no surrounding '.*' is needed
email_filter = df['email'].str.contains('gmail', regex=True)
filtered_df = df[name_filter & email_filter]

print("Names starting with 'J' AND emails containing 'gmail':")
print(filtered_df)

Names starting with 'J' AND emails containing 'gmail':
    name            email
1  Jacob  jacob@gmail.com
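
The same Boolean masks can also be combined with | (OR) and negated with ~ (NOT). The following is a small sketch under the same sample data; the mask names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com',
              'tim@test.com', 'ally@outlook.com']
})

starts_with_t = df['name'].str.match('T')
# Escape the dot so it matches a literal '.' rather than any character
test_domain = df['email'].str.contains(r'@test\.com$')

# OR: the name starts with 'T' or the email is on test.com
either = df[starts_with_t | test_domain]

# NOT: every row whose email is not on test.com
not_test = df[~test_domain]

print(either['name'].tolist())
print(not_test['name'].tolist())
```

Each mask is an ordinary Boolean Series, so parentheses are only needed when a comparison is written inline next to & or |.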

Comparison of Methods

Method	Matches From	Best For
str.match()	Beginning of string	Prefix matching
str.contains()	Anywhere in string	General pattern matching
str.extract()	First match anywhere in string	Extracting captured groups (returns the captured text, not a Boolean mask)
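
Unlike the other two methods, str.extract() returns the captured text rather than True/False values, so filtering with it takes one extra comparison. A minimal sketch, assuming a single capture group for the email domain (the pattern and variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jacob', 'Tom'],
    'email': ['john@test.com', 'jacob@gmail.com', 'tom@yahoo.com']
})

# One capture group; expand=False returns a Series instead of a DataFrame
domains = df['email'].str.extract(r'@(\w+)\.', expand=False)

# Compare the extracted values to build a Boolean mask for filtering
gmail_users = df[domains == 'gmail']

print(domains.tolist())
print(gmail_users['name'].tolist())
```

Rows where the pattern does not match would yield a missing value in the extracted Series, and such rows are excluded by the equality comparison.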

Conclusion

Use str.match() for matching patterns at the beginning of strings and str.contains() for finding patterns anywhere in the text. Both methods support powerful regex patterns for flexible DataFrame filtering.

Rishikesh Kumar Rishi

Updated on: 2026-03-26T01:57:29+05:30
