How to match a word in Python using regular expressions?


Python's 're' module opens up powerful text-processing possibilities. Regular expressions, often called regex, let us identify, search for, and manipulate specific patterns within strings. One common task is matching a particular word in a text. In this article, we look at how to use regular expressions in Python to find and match words in strings, working through a few code examples, each accompanied by a step-by-step explanation.

Matching a Simple Word

Example

  • In the very first code example, we start by importing the 're' module; this module permits us to work with regular expressions in Python. Our goal is to match the word "fox" in the given text.

  • To create the regex pattern, we use the re.escape() function to ensure that any special characters in the word are treated as literal characters. This is essential to avoid unintended behavior if the word contains regex metacharacters.

  • The pattern r"\b" + re.escape(word_to_match) + r"\b" uses the \b word boundary anchors to match the word "fox" as a complete word. The \b anchors ensure that the word is not a part of a longer word and that it is surrounded by non-word characters or the beginning/end of the string.

  • Next, we utilize the re.search() function to find the first occurrence of the word in the text. If a match is found, we output the matched word using match.group(). Otherwise, we print "Word not found."

import re

# Sample text
text = "The quick brown fox jumps over the lazy dog."

# The word we want to match
word_to_match = "fox"

# Regular expression pattern to match the word
pattern = r"\b" + re.escape(word_to_match) + r"\b"

# Find the word in the text
match = re.search(pattern, text)

# Output the match
if match:
    print("Word found:", match.group())
else:
    print("Word not found.")

Output

Word found: fox
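
To see why re.escape() and the \b anchors matter, the following minimal sketch compares a few patterns. The word "node.js" and the sample strings are hypothetical, chosen only for illustration; they are not part of the example above.

```python
import re

# Hypothetical word containing a regex metacharacter (the dot)
word = "node.js"

escaped = r"\b" + re.escape(word) + r"\b"   # the dot is matched literally
unescaped = r"\b" + word + r"\b"            # the dot matches any character

literal_hit = re.search(escaped, "We use node.js here")
literal_miss = re.search(escaped, "We use nodeXjs here")
unintended_hit = re.search(unescaped, "We use nodeXjs here")

print(literal_hit is not None)     # True
print(literal_miss is not None)    # False
print(unintended_hit is not None)  # True: the unescaped dot matched "X"

# Word boundaries keep "fox" from matching inside "foxes"
inside_longer_word = re.search(r"\bfox\b", "Two foxes ran away")
print(inside_longer_word is not None)  # False
```

Without re.escape(), the dot in the word acts as a wildcard, which is the kind of unintended behavior the explanation above warns about.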

Case-Insensitive Word Matching

Example

  • In this code snippet, we have a sample text that mentions the Python programming language. Our objective is to match the word "Python" in a case-insensitive manner. This means that the regex should find "Python" regardless of whether it appears as "Python" or "python" in the text.

  • To achieve case-insensitivity, we pass the re.IGNORECASE flag as the third argument to the re.search() function. This flag instructs the regex engine to ignore case while searching for the word.

  • The rest of the code is similar to the previous example. We create the regex pattern with the word boundary anchors and use re.escape() to ensure safe matching of the word. Then, we perform the search and output the result accordingly.

import re

# Sample text
text = "The Python programming language is versatile and powerful."

# The word we want to match (case-insensitive)
word_to_match = "python"

# Regular expression pattern for case-insensitive word matching
pattern = r"\b" + re.escape(word_to_match) + r"\b"

# Find the word in the text (case-insensitive)
match = re.search(pattern, text, re.IGNORECASE)

# Output the match
if match:
    print("Word found:", match.group())
else:
    print("Word not found.")

Output

Word found: Python
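
If the same word is searched for in many strings, one option is to compile the pattern once with re.compile() and pass the flag there. The sketch below assumes a hypothetical list of sample strings and is only one way to organize the code.

```python
import re

word_to_match = "python"

# Compile once; the IGNORECASE flag is stored with the compiled pattern
word_re = re.compile(r"\b" + re.escape(word_to_match) + r"\b", re.IGNORECASE)

# Hypothetical sample strings
samples = [
    "PYTHON is fun.",
    "I keep a pet python.",
    "Pythonic code reads well.",
]

# True where the word occurs as a complete word, in any letter case
results = [bool(word_re.search(sample)) for sample in samples]
print(results)  # [True, True, False]
```

The last string yields False because "Pythonic" contains the word only as part of a longer word, so the trailing \b does not match.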

Matching Words with Variant Spellings

Example

  • In this example, the sample text contains two spellings of the same word, "color" and "colour." Our task is to match both spellings regardless of case.

  • To match variant spellings, we create a regex pattern using the | (pipe) symbol, which is the regex OR operator. This lets us list alternative spellings for the word. We also pass the re.IGNORECASE flag to ensure case-insensitive matching.

  • Each spelling is passed through re.escape() on its own, and the results are joined with |. Escaping the combined string "color|colour" would also escape the pipe, turning it into a literal character, and the pattern would then find nothing in this text.

  • The pattern r"\b(" + "|".join(...) + r")\b" wraps the alternatives in a group between the word boundary anchors, which ensures that we match the entire word, not part of it.

  • We use re.findall() to find all occurrences of the variant spellings in the text and store the matches in the matches variable. Because the pattern contains a single group, re.findall() returns the text matched by that group. Finally, we output the matched words, joining them with a comma and space.

import re

# Sample text with variant spellings of a word
text = "Color or colour, which one do you prefer?"

# The spellings we want to match
words_to_match = ["color", "colour"]

# Escape each spelling separately, then join with | so the pipe stays an OR operator
pattern = r"\b(" + "|".join(re.escape(word) for word in words_to_match) + r")\b"

# Find the words in the text
matches = re.findall(pattern, text, re.IGNORECASE)

# Output the matches
if matches:
    print("Words found:", ", ".join(matches))
else:
    print("Word not found.")

Output

Words found: Color, colour
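
When two spellings differ only by a single optional letter, the ? quantifier (zero or one of the preceding character) is a possible alternative to listing both spellings. The following is a minimal sketch of that idea; the hand-written pattern is an assumption for illustration and applies only to this pair of spellings.

```python
import re

text = "Color or colour, which one do you prefer?"

# "u?" makes the letter u optional, so both spellings fit one pattern
pattern = r"\bcolou?r\b"

matches = re.findall(pattern, text, re.IGNORECASE)
print(matches)  # ['Color', 'colour']
```

This form is more compact, but a list of alternatives joined with | is easier to extend when the spellings differ in more than one place.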

Matching Words with Prefixes or Suffixes

Example

  • In this example, we have a sample text containing a word with a suffix. Our goal is to match the word "uncomplete" even when it appears with extra characters before or after it, such as a prefix or a suffix.

  • To achieve this, we create a regex pattern using \w* (zero or more word characters) on both sides of the word we want to match. The re.IGNORECASE flag ensures case-insensitive matching.

  • The pattern r"\b\w*" + re.escape(word_to_match) + r"\w*\b" uses word boundary anchors along with \w* to match the entire word, even if it has characters before or after it.

  • We use re.findall() to find all occurrences of the word with prefixes or suffixes in the text and store the matches in the matches variable. Finally, we output the matched words, joining them with a comma and space.

import re

# Sample text with words having prefixes or suffixes
text = "The project is uncompleted, but they're working on it."

# The word with prefixes or suffixes we want to match
word_to_match = "uncomplete"

# Regular expression pattern to match word with prefixes or suffixes
pattern = r"\b\w*" + re.escape(word_to_match) + r"\w*\b"

# Find the word in the text
matches = re.findall(pattern, text, re.IGNORECASE)

# Output the matches
if matches:
    print("Words found:", ", ".join(matches))
else:
    print("Word not found.")

Output

Words found: uncompleted
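
Placing \w* on only one side of the word restricts the match to suffixes or to prefixes. The sketch below illustrates this with the hypothetical stem "complete" and a made-up sample string; neither comes from the example above.

```python
import re

# Hypothetical sample text and stem
text = "complete completed incomplete uncompleted"
stem = re.escape("complete")

# Suffixes only: the word must start with the stem
with_suffix = re.findall(r"\b" + stem + r"\w*\b", text)

# Prefixes only: the word must end with the stem
with_prefix = re.findall(r"\b\w*" + stem + r"\b", text)

# Both sides: the stem may appear anywhere inside the word
either = re.findall(r"\b\w*" + stem + r"\w*\b", text)

print(with_suffix)  # ['complete', 'completed']
print(with_prefix)  # ['complete', 'incomplete']
print(either)       # ['complete', 'completed', 'incomplete', 'uncompleted']
```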

Matching Words with Variable Lengths

Example

  • In the final example, the sample text contains the word "sun" among words of varying lengths. Our task is to find every occurrence of "sun" as a standalone word, wherever it appears in the text.

  • To achieve this, we create a regex pattern using the word boundary anchors \b to ensure that we match the entire word. As before, we use re.escape() to handle any special characters in the word safely, and re.IGNORECASE for case-insensitive matching.

  • The pattern r"\b" + re.escape(word_to_match) + r"\b" will match the word "sun" wherever it appears as a complete word, but not inside longer words such as "sunny" or "sunset."

  • We use re.findall() to find all occurrences of the word "sun" in the text, regardless of their positions. The matches are stored in the matches variable, and we output them, joining the words with a comma and space.

import re

# Sample text with words of varying lengths
text = "The sun sets early in summer, but late in winter."

# The word we want to match with variable lengths
word_to_match = "sun"

# Regular expression pattern to match word with variable lengths
pattern = r"\b" + re.escape(word_to_match) + r"\b"

# Find the word in the text
matches = re.findall(pattern, text, re.IGNORECASE)

# Output the matches
if matches:
    print("Words found:", ", ".join(matches))
else:
    print("Word not found.")

Output

Words found: sun
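
When the position of each occurrence matters, re.finditer() can be used in place of re.findall(), since it yields match objects that carry start and end offsets. The sample sentence below is a hypothetical one, chosen so that "sun" also appears inside longer words.

```python
import re

# Hypothetical text: "sun" appears alone and inside longer words
text = "Sun, sunny days, and the sun at sunset."
pattern = r"\b" + re.escape("sun") + r"\b"

# Collect each whole-word match together with its start offset
found = [(m.group(), m.start()) for m in re.finditer(pattern, text, re.IGNORECASE)]

print(found)  # [('Sun', 0), ('sun', 25)]
```

Neither "sunny" nor "sunset" is reported, because the trailing \b requires the word to end right after "sun".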

In this article, you have learned how to use regular expressions in Python to find and match words within strings. Regular expressions provide a flexible and efficient way to work with text, and they make complex searches and manipulations much easier to express.

We explored several practical code examples that showcase different aspects of word matching with regular expressions: matching simple words, case-insensitive matching, handling variant spellings, finding words with prefixes or suffixes, and matching whole words wherever they occur in a text.

As you continue to practice and experiment with regular expressions, you will gain a deeper understanding of their capabilities and become more skilled at crafting patterns for text-processing tasks. Regex is an invaluable tool in your Python toolkit, useful in fields such as data analysis, web scraping, natural language processing, and more.

Keep honing your skills and exploring new ways to use regular expressions in your projects.

Updated on: 08-Sep-2023
