How to use regular expressions (Regex) to filter valid emails in a Pandas series?


A regular expression (regex) is a sequence of characters that defines a search pattern. In this program, we will use a regular expression to separate valid emails from invalid ones.

We will define a Pandas series containing different email addresses and check which of them are valid. We will also use Python's built-in re module, which provides functions for working with regular expressions.
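As a quick illustration of how re.search() behaves, here is a minimal sketch that uses the same pattern as this article's example. It shows that the function returns a Match object when the pattern is found and None when it is not. The variable names are illustrative only.

```python
import re

# Same pattern as in the article's example, written as a raw string
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

# re.search() returns a Match object when the pattern is found...
match = re.search(regex, 'jimmyadams123@gmail.com')
print(match)          # a re.Match object (truthy in an if statement)
print(match.group())  # the text that matched: the whole string here

# ...and None (falsy) when it is not
no_match = re.search(regex, 'hellowolrd.com')
print(no_match)       # None
```

A Match object is truthy and None is falsy. That is why `if re.search(regex, email):` is enough to tell the two cases apart.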

Algorithm

Step 1: Define a Pandas series of different email addresses.
Step 2: Define a regex for checking the validity of an email.
Step 3: Use the re.search() function from the re module to check each email against the regex.
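The steps above loop over the series one email at a time. Pandas can also apply a regex to every element at once through the Series.str accessor, and the boolean result can then be used directly as a filter. The sketch below swaps in Series.str.match in place of the article's re.search() loop. The third address, 'john.doe@example.org', is a hypothetical extra input.

```python
import pandas as pd

# Hypothetical sample data: two valid addresses and one without an "@"
emails = pd.Series(['jimmyadams123@gmail.com', 'hellowolrd.com', 'john.doe@example.org'])

regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

# Series.str.match applies re.match to every element and returns a boolean Series
is_valid = emails.str.match(regex)

valid_emails = emails[is_valid]      # keep only the rows where the mask is True
invalid_emails = emails[~is_valid]   # ~ negates the boolean mask

print(valid_emails.tolist())
print(invalid_emails.tolist())
```

The pattern is already anchored with ^ and $, so str.match checks the whole string. On pandas 1.1 or later, Series.str.fullmatch is an alternative that makes the full-string requirement explicit.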

Example Code

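import pandas as pd
import re

# Sample data: one valid address and one with no "@" part
series = pd.Series(['jimmyadams123@gmail.com', 'hellowolrd.com'])

# Raw string (r'...') so backslashes such as \. and \w reach the regex engine unchanged
regex = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'

for email in series:
    # re.search() returns a Match object on success and None otherwise
    if re.search(regex, email):
        print("{}: Valid Email".format(email))
    else:
        print("{} : Invalid Email".format(email))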

Output

jimmyadams123@gmail.com: Valid Email
hellowolrd.com : Invalid Email

Explanation

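The pattern stored in the regex variable uses the following symbols:

  • ^: Anchor for the start of the string
  • [ ]: Opening and closing square brackets define a character class that matches a single character; for example, [a-z0-9] matches one lowercase letter or digit
  • \: Escape character; for example, \w matches a "word" character (a letter, digit, or underscore)
  • .: Outside square brackets, the dot matches any character except the newline symbol; inside them, as in [\._] and [.], it matches a literal dot
  • + and ?: Quantifiers; + means "one or more" and ? means "zero or one" of the preceding item
  • {} : The opening and closing curly brackets define a repetition range; \w{2,3} matches two or three word characters
  • $: The dollar sign is the anchor for the end of the string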
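To tie each symbol to its place in the pattern, the same regex can be rewritten with the re.VERBOSE flag, which ignores whitespace and allows # comments inside the pattern. The sample addresses below are hypothetical inputs. They show where this pattern accepts or rejects an address, and that it is a simplified check: it rejects some real addresses, such as ones with an uppercase local part, a longer top-level domain, or a subdomain.

```python
import re

# The article's pattern, annotated with re.VERBOSE (whitespace and # comments are ignored)
EMAIL_RE = re.compile(r"""
    ^            # start of string
    [a-z0-9]+    # one or more lowercase letters or digits
    [\._]?       # optionally one literal '.' or '_'
    [a-z0-9]+    # one or more lowercase letters or digits
    [@]          # a literal '@'
    \w+          # domain name: one or more word characters
    [.]          # a literal '.' (the dot is not special inside brackets)
    \w{2,3}      # top-level domain: two or three word characters
    $            # end of string
""", re.VERBOSE)

# Hypothetical inputs mapped to whether the pattern should accept them
samples = {
    'jimmyadams123@gmail.com': True,
    'john.doe@example.org': True,
    'hellowolrd.com': False,          # no '@'
    'a@gmail.com': False,             # local part needs at least two characters
    'Jimmy@gmail.com': False,         # uppercase is not allowed before '@'
    'jimmy@example.info': False,      # four-letter top-level domain
    'jimmy@mail.example.com': False,  # only one dot is allowed after '@'
}

for email, expected in samples.items():
    result = EMAIL_RE.search(email) is not None
    print(f"{email}: {result} (expected {expected})")
```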

Updated on: 16-Mar-2021
