How do we find the exact position of each match with Python regular expressions?


Introduction

In Python, regular expressions are handled by the re module. Regular expressions are used for text searches and for more complex text manipulation. Tools such as grep and sed, text editors such as vi and emacs, and programming languages such as Tcl, Perl, and Python all have built-in regular expression support.

The re module offers functions for matching strings against regular expressions.

A regular expression that describes the text we are looking for or modifying is called a pattern. It is a string made up of literal characters and metacharacters. The re.compile() function compiles a pattern string into a pattern object. Raw strings (written with an r prefix, as in r"\d+") are recommended because regular expressions often contain backslashes. In a raw string, Python does not treat backslashes as escape sequences, so they reach the regex engine unchanged.
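A small sketch of why raw strings matter. In regex syntax \b means "word boundary", but in an ordinary Python string literal "\b" is the backspace character. The sample text "a cat sat" is an arbitrary choice for illustration:

```python
import re

text = "a cat sat"

# In a normal string literal, "\b" becomes a backspace character (\x08),
# so this pattern looks for backspaces around "cat" and finds nothing.
plain = re.search("\bcat\b", text)

# In a raw string the backslash is kept, so the regex engine sees \b
# (a word boundary) and the pattern matches the word "cat".
raw = re.search(r"\bcat\b", text)

print(plain)                  # None
print(raw.group())            # cat
print(len("\b"), len(r"\b"))  # 1 2
```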

Once the pattern is compiled, you can apply it to a text string with one of the pattern object's methods. The available methods include match(), search(), findall(), and finditer().
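As a rough comparison, here is one compiled pattern applied with each of these four methods. The pattern r"\d+" and the sample text are made up for illustration:

```python
import re

p = re.compile(r"\d+")  # one or more digits
text = "order 66, room 101"

print(p.match(text))           # None: the text does not start with a digit
print(p.search(text).group())  # 66: the first match anywhere in the text
print(p.findall(text))         # ['66', '101']: a list of matched strings
# finditer() yields Match objects; group() extracts the text of each one
print([m.group() for m in p.finditer(text)])  # ['66', '101']
```

Only match() and search() return a single Match object; findall() returns plain strings, so it cannot tell you where each match occurred, while finditer() can.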

Syntax Used

The following regex functions and methods are used in this article to look for matches.

re.match(): Determines whether the RE matches at the beginning of the string. If zero or more characters at the beginning of the string match the regular expression pattern, match() returns a Match object; otherwise it returns None.

p.finditer(): Finds all non-overlapping matches of the pattern p in a string and returns an iterator that yields a Match object for each match, in the order they occur.
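To see what "non-overlapping" means in practice, here is a minimal sketch with an arbitrary pattern and string. After a match ends, scanning resumes at the end of that match:

```python
import re

p = re.compile("aa")

# "aaaa" contains "aa" starting at 0, 1 and 2, but once the match at
# 0-2 is found, the search continues from index 2, so only two matches
# are reported.
spans = [m.span() for m in p.finditer("aaaa")]
print(spans)  # [(0, 2), (2, 4)]
```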

re.compile(): Compiles a regular expression pattern into a regular expression (pattern) object, which can then be used for matching with its match(), search(), finditer(), and other methods. The pattern's behavior can be modified by passing flags, such as re.IGNORECASE or re.MULTILINE; several flags can be combined with bitwise OR (the | operator).
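A short sketch of combining flags with |. The two flags are real re flags chosen for illustration, and the three-line sample text is made up:

```python
import re

text = "Alpha\nbeta\nGamma"

# No flags: matching is case-sensitive and ^ matches only at the very
# start of the string, where "A" is not in [a-z].
p_plain = re.compile(r"^[a-z]+")

# IGNORECASE makes [a-z] match uppercase letters too; MULTILINE makes ^
# match at the start of every line.
p_flags = re.compile(r"^[a-z]+", re.IGNORECASE | re.MULTILINE)

print(p_plain.findall(text))  # []
print(p_flags.findall(text))  # ['Alpha', 'beta', 'Gamma']
```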

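m.start(): Returns the index in the string where the match starts. The related m.end() returns the index just past the last matched character, and m.span() returns both as a (start, end) tuple. Passing a group number, as in m.start(1), gives the position of that group instead of the whole match.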
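A minimal sketch of start(), end(), and span(), on a made-up email-like string. Indexes are 0-based and end() is exclusive, so slicing the text with them reproduces the matched text:

```python
import re

text = "contact: jane@example.com"
m = re.search(r"(\w+)@(\w+)\.com", text)

print(m.start(), m.end(), m.span())  # 9 25 (9, 25)
print(m.start(2), m.span(1))         # 14 (9, 13)

# Slicing with start() and end() gives back the matched text
print(text[m.start():m.end()] == m.group())  # True
```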

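m.group(): Returns the text matched by the whole pattern; m.group(n) returns the text matched by the nth parenthesized group. The related m.groups() returns a tuple with the text of every group, so you can use multiple assignment to put each value into its own variable, as in areaCode, mainNumber = mo.groups().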
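A short sketch of group(), group(n), and groups() with multiple assignment. The date pattern and the sample text are assumptions chosen for illustration:

```python
import re

m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Updated on 2022-09-20")

print(m.group())   # 2022-09-20 (the whole match, same as m.group(0))
print(m.group(1))  # 2022
print(m.groups())  # ('2022', '09', '20')

# groups() returns a tuple, so multiple assignment unpacks it directly
year, month, day = m.groups()
print(year, month, day)  # 2022 09 20
```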

re.search(): Similar to re.match(), but it does not require the match to be at the beginning of the string. search() can find the pattern anywhere in the string, but it returns only the first match (or None if there is no match).
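A quick sketch of the difference between match() and search(). The text with two phone numbers is made up to show that search() stops at the first one:

```python
import re

text = "Call 415-555-4242 or 212-555-0100."
pattern = r"\d{3}-\d{3}-\d{4}"

# match() only tries position 0, where the text starts with "Call"
print(re.match(pattern, text))  # None

# search() scans the whole string but returns only the first match
m = re.search(pattern, text)
print(m.group(), m.start())  # 415-555-4242 5
```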

Algorithm

  • Import the regex module with import re.

  • Create a Regex object with the re.compile() function. (Remember to use a raw string.)

  • Pass the string you want to search to the Regex object’s finditer() method. This returns an iterator that yields one Match object per match.

  • For each Match object, call its group() method to get the actual matched text as a string.

  • You can also get the start and end indexes together in a single tuple with the span() method, or separately with start() and end().
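The steps above can be sketched as follows. The pattern for capitalized words and the sample sentence are assumptions for illustration, not part of the original examples:

```python
import re  # Step 1: import the regex module

# Step 2: compile the pattern, written as a raw string
word_re = re.compile(r"[A-Z][a-z]+")

text = "Alice met Bob in Paris"

# Step 3: finditer() yields one Match object per match
for m in word_re.finditer(text):
    # Steps 4 and 5: group() gives the matched text, span() its (start, end)
    print(m.group(), m.span())
# Alice (0, 5)
# Bob (10, 13)
# Paris (17, 22)
```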

Example

```python
# import the re module
import re

# compile [A-Z0-9] (one uppercase letter or one digit) and store it in p
p = re.compile(r"[A-Z0-9]")

# loop over every Match object m produced by p.finditer()
for m in p.finditer('A5B6C7D8'):
    # print where the match starts and the text it matched
    print(m.start(), m.group())
```

Output

This gives the output:

0 A
1 5
2 B
3 6
4 C
5 7
6 D
7 8

Code Explanation

Import the regex module with import re. Compile the pattern [A-Z0-9], which matches a single uppercase letter or digit, with the re.compile() function and assign the resulting Regex object to the variable p. Pass the string you want to search to p.finditer(), which returns an iterator of Match objects, and loop over it with the variable m. For each Match object, m.start() returns the index where the match begins and m.group() returns the actual matched text.

Example

```python
# Python program to illustrate
# matching regex objects with groups
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')
print(mo.groups())
```

Output

This gives the output:

('415', '555-4242')

Code Explanation

Import the regex module with import re. Compile the pattern r'(\d\d\d)-(\d\d\d-\d\d\d\d)' with the re.compile() function and assign it to the variable phoneNumRegex. The two pairs of parentheses create two groups: the three-digit area code and the rest of the number. Pass the string you want to search to phoneNumRegex.search(), which returns a Match object for the first match, and store it in the variable mo. Calling mo.groups() returns a tuple containing the text matched by each group.
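Since the goal is exact positions, the same phone-number match can also report where each group sits in the string. This sketch reuses the example's pattern and text; group 0 is the whole match:

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')

# Loop over group 0 (the whole match) and every parenthesized group
for i in range(len(mo.groups()) + 1):
    print(i, mo.group(i), mo.span(i))
# 0 415-555-4242 (13, 25)
# 1 415 (13, 16)
# 2 555-4242 (17, 25)
```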

Conclusion

The match(), search(), and finditer() methods provided by the Python re module let us apply a regex pattern to a string. match() and search() return a Match object when a match is found (or None otherwise), and finditer() yields a Match object for every match. From a Match object, the start(), end(), and span() methods give the exact position of the matched text.

When there are many matches, findall() builds a list holding all of them at once, which can use a lot of memory. The finditer() method instead returns an iterator that produces Match objects one at a time, which is more memory-efficient.

This means finditer() does not load all the results into memory up front: each match is found only when the iterator is advanced, for example by a for loop or by next().
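A minimal sketch contrasting the eager list from findall() with the lazy iterator from finditer(). The pattern and text are arbitrary examples:

```python
import re

p = re.compile(r"\d+")
text = "1 22 333 4444"

# findall() scans the whole string and returns a complete list of strings
all_at_once = p.findall(text)
print(all_at_once)  # ['1', '22', '333', '4444']

# finditer() returns an iterator; each next() call searches only as far
# as the next match and returns a Match object with its position
lazy = p.finditer(text)
first = next(lazy)
print(first.group(), first.span())    # 1 (0, 1)
second = next(lazy)
print(second.group(), second.span())  # 22 (2, 4)
```

Because the iterator is lazy, a loop over finditer() can also stop early with break without the remaining matches ever being searched for.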

Updated on: 20-Sep-2022
