How do regular expression anchors work in Python?



Anchors are regex tokens that do not match any characters. Instead, they assert something about the current position in the string. An anchor succeeds only when the engine's current position is a particular location: for example, the beginning of the string or line, or the end of the string or line.

This type of assertion is useful for two reasons. First, it lets you specify that you want to match letters or digits at the beginning or end of a string or line, but not anywhere else. Second, when you tell the engine that a pattern must occur at a certain location, it does not need to look for that pattern at other locations, which can save work. For both reasons, it is worth using anchors wherever they apply.

^ and $ are two examples of anchor tokens in regex.

The following code shows the use of the anchors ^ and $:

import re

s = 'Princess Diana was a beauty icon'

# ^ anchors the match to the start of the string
result = re.search(r'^\w+', s)
print(result.group())

# $ anchors the match to the end of the string
result2 = re.search(r'\w+$', s)
print(result2.group())

Output
Princess
icon
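
To see that an anchor really restricts where a match may occur, the following sketch searches the same string for a word that is present but not at the start. The variable names unanchored and anchored are chosen here for illustration only.

```python
import re

s = 'Princess Diana was a beauty icon'

# Without an anchor, the engine may find the pattern anywhere in the string.
unanchored = re.search(r'Diana', s)

# With ^, the match must begin at the start of the string, so this fails.
anchored = re.search(r'^Diana', s)

print(unanchored.group())  # Diana
print(anchored)            # None
```

Because re.search() returns None when nothing matches, it is safer to check the result before calling group() on it.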

In Python, regular expression anchors are special characters that allow us to match specific positions within a string. Anchors are used to specify the start or end of a line or the start or end of a word. In this article, you will learn in detail how regular expression anchors work in Python, using several code examples.

1. Matching the start of a line with ^

In this example, we use the ^ anchor to match the word "Hello" at the start of the text string. By default, ^ matches only at the very beginning of the string; with the re.MULTILINE flag it also matches at the start of each line. The re.search() function searches for the pattern in the string, and the group() method returns the matched text.

import re
text = "Hello World\nWelcome to Python"
match = re.search(r'^Hello', text)
print(match.group())

Output:

Hello
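
The example above matches only because "Hello" starts the whole string. As a minimal sketch of how ^ behaves on later lines, the code below compares the default mode with re.MULTILINE. The variable names are illustrative.

```python
import re

text = "Hello World\nWelcome to Python"

# Default mode: ^ matches only at the very start of the string,
# so "Welcome" on the second line is not found.
default_match = re.search(r'^Welcome', text)

# re.MULTILINE: ^ also matches immediately after each newline.
multiline_match = re.search(r'^Welcome', text, re.MULTILINE)

# Collect the first word of every line.
first_words = re.findall(r'^\w+', text, re.MULTILINE)

print(default_match)            # None
print(multiline_match.group())  # Welcome
print(first_words)              # ['Hello', 'Welcome']
```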

2. Matching the end of a line with $

In this example, we use the $ anchor to match the word "Python" at the end of the text string.

The 'text' variable contains the string "Hello World\nWelcome to Python". The '\n' character is a line break, so the string contains two lines.

The regular expression pattern 'Python$' matches the word "Python" only where it is followed by the end of the string. By default, '$' matches at the end of the string (or just before a single trailing newline); with the re.MULTILINE flag it also matches at the end of each line. Here the second line is also the end of the string, so the pattern matches.

The 're.search()' function then searches the 'text' variable for the pattern. If a match is found, 'match.group()' returns the matched string "Python".

import re
text = "Hello World\nWelcome to Python"
match = re.search(r'Python$', text)
print(match.group())

Output:

Python
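
The following sketch illustrates two details of $ that the example above does not exercise: its behavior at the end of an inner line, and its behavior before a trailing newline. The string trailing and the variable names are made up for this illustration.

```python
import re

text = "Hello World\nWelcome to Python"

# Default mode: "World" ends the first line, not the string, so $ fails.
default_match = re.search(r'World$', text)

# re.MULTILINE: $ also matches just before each newline.
multiline_match = re.search(r'World$', text, re.MULTILINE)

# Collect the last word of every line.
last_words = re.findall(r'\w+$', text, re.MULTILINE)

# $ also matches just before a single trailing newline; \Z does not.
trailing = "Welcome to Python\n"
dollar_match = re.search(r'Python$', trailing)
z_match = re.search(r'Python\Z', trailing)

print(default_match)            # None
print(multiline_match.group())  # World
print(last_words)               # ['World', 'Python']
print(dollar_match.group())     # Python
print(z_match)                  # None
```

If a match must end at the very last character of the string, \Z is the stricter choice.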

3. Matching the start of a word with \b

In this example, we use the \b anchor to match "Hello" as a whole word at the start of the text string. The \b anchor matches a word boundary: the position between a word character (as defined by \w) and a non-word character (as defined by \W), or between a word character and the start or end of the string.

import re
text = "Hello World\nWelcome to Python"
match = re.search(r'\bHello\b', text)
print(match.group())

Output

Hello
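
The r prefix on the pattern matters for \b. As a small sketch, the code below runs the same search with and without a raw string; the variable names are illustrative.

```python
import re

text = "Hello World\nWelcome to Python"

# Raw string: the regex engine receives a backslash followed by b,
# which it interprets as a word boundary.
raw_match = re.search(r'\bHello\b', text)

# Plain string: Python first converts '\b' to a backspace character (\x08),
# so the pattern looks for literal backspaces and finds none.
plain_match = re.search('\bHello\b', text)

print(raw_match.group())  # Hello
print(plain_match)        # None
```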

4. Matching the end of a word with \b

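In this example, we use the \b anchor to match the word "Python" at the end of the second line of the text string. The trailing \b succeeds here because the end of the string, when it follows a word character, also counts as a word boundary.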

import re
text = "Hello World\nWelcome to Python"
match = re.search(r'\bPython\b', text)
print(match.group())

Output

Python
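
In the two \b examples the target word already stands alone, so the boundaries do not change the result. The sketch below uses a hypothetical string in which the letters "cat" also occur inside longer words, to show what \b excludes.

```python
import re

text = "cat catalog concat"

# No boundaries: "cat" is found inside every word.
anywhere = re.findall(r'cat', text)

# Boundaries on both sides: only the standalone word matches.
whole_word = re.findall(r'\bcat\b', text)

# Boundary at the start only: words that begin with "cat".
starts_with = re.findall(r'\bcat\w*', text)

# Boundary at the end only: words that end with "cat".
ends_with = re.findall(r'\w*cat\b', text)

print(anywhere)     # ['cat', 'cat', 'cat']
print(whole_word)   # ['cat']
print(starts_with)  # ['cat', 'catalog']
print(ends_with)    # ['cat', 'concat']
```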

5. Matching the start or end of a line with ^ and $

In this example, we use the ^ and $ anchors in two separate searches: the first matches "Hello" at the start of the first line, and the second matches "Python" at the end of the second line of the text string.

import re
text = "Hello World\nWelcome to Python"
match1 = re.search(r'^Hello', text)
match2 = re.search(r'Python$', text)
print(match1.group())
print(match2.group())

Output

Hello
Python
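
The two anchors can also appear in a single pattern, which then has to cover the whole string. The sketch below assumes a hypothetical task, checking that a string consists of exactly four digits, and compares the anchored pattern with re.fullmatch().

```python
import re

# ^ and $ together: the four digits must span the entire string.
pattern = r'^\d{4}$'

exact = bool(re.search(pattern, '2024'))
with_prefix = bool(re.search(pattern, 'year 2024'))
too_long = bool(re.search(pattern, '20245'))

# $ tolerates one trailing newline; re.fullmatch() does not.
newline_search = bool(re.search(pattern, '2024\n'))
newline_full = bool(re.fullmatch(r'\d{4}', '2024\n'))

print(exact)           # True
print(with_prefix)     # False
print(too_long)        # False
print(newline_search)  # True
print(newline_full)    # False
```

When input must match in its entirety, re.fullmatch() avoids the trailing-newline case without needing explicit anchors.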
