How to filter rows in Pandas by regex?


A regular expression (regex) is a sequence of characters that define a search pattern. To filter rows in Pandas by regex, we can use the str.match() method.

Steps

  • Create a two-dimensional, size-mutable, potentially heterogeneous tabular data, df.
  • Print the input DataFrame, df.
  • Initialize a variable regex for the expression. Supply a string value as regex, for example, the string 'J.*' will filter all the entries that start with the letter 'J'.
  • Use df.column_name.str.match(regex) to filter all the entries in the given column name by the supplied regex.

Example

import pandas as pd

df = pd.DataFrame(
   dict(
      name=['John', 'Jacob', 'Tom', 'Tim', 'Ally'],
      marks=[89, 23, 100, 56, 90],
      subjects=["Math", "Physics", "Chemistry", "Biology", "English"]
   )
)

print "Input DataFrame is:\n", df

regex = 'J.*'
print "After applying ", regex, " DataFrame is:\n", df[df.name.str.match(regex)]

regex = 'A.*'
print "After applying ", regex, " DataFrame is:\n", df[df.name.str.match(regex)]

Output

Input DataFrame is:

     name    marks   subjects
0    John     89        Math
1   Jacob     23     Physics
2     Tom    100   Chemistry
3     Tim     56     Biology
4    Ally     90     English

After applying J.* DataFrame is:

    name   marks   subjects
0   John     89        Math
1  Jacob     23     Physics

After applying A.* DataFrame is:

    name   marks   subjects
4   Ally     90     English

Updated on: 14-Sep-2021

16K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements