How do regular expression backreferences work in Python?
Backreferences in regular expressions allow us to reuse a previously captured group within the same regex pattern. This feature is extremely useful for matching repeated patterns, validating formats, and finding duplicates in strings.
What are Backreferences?
A backreference is a reference to a previously captured group in a regular expression. When parentheses "()" are used in a regex pattern, they create a capturing group. Each group is automatically assigned a number starting from 1 for the first group, 2 for the second, and so on.
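As a small illustration of the numbering rule, the sketch below uses a made-up input string ('ab-cd') and nested groups. Groups are numbered by the position of their opening parenthesis, so the outer group comes first.

```python
import re

# Hypothetical input chosen only to show how groups are numbered.
# The outer group opens first, so it is group 1; the inner groups are 2 and 3.
match = re.match(r'((\w+)-(\w+))', 'ab-cd')

numbered_groups = match.groups()
print(numbered_groups)   # ('ab-cd', 'ab', 'cd')
print(match.group(2))    # ab
```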
Syntax
Here's the basic syntax for using backreferences:
(\w+): Captures a word as the first group
\1: References the first captured group
\2: References the second captured group
\N: References the Nth captured group, where N is a group number from 1 to 99 (a literal \n would match a newline, not a group)
Backreferences help simplify patterns that involve repetition and are particularly useful for matching complex structures like paired tags, duplicate words, and validation patterns.
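The paired-tag case mentioned above could be sketched as follows. The sample markup and the variable names are made up for illustration, and a regex like this is only a rough check, not a replacement for a real HTML parser.

```python
import re

# Hypothetical sample markup; the middle element has mismatched tags
html = "<b>bold</b> <i>broken</b> <em>fine</em>"

# \1 forces the closing tag name to equal the captured opening tag name
tag_pattern = r'<(\w+)>(.*?)</\1>'

paired = [(m.group(1), m.group(2)) for m in re.finditer(tag_pattern, html)]
print(paired)  # [('b', 'bold'), ('em', 'fine')]
```

The mismatched `<i>...</b>` pair is skipped because `\1` must repeat the exact text captured by the first group.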
Finding Repeated Words
This example demonstrates how to find words that appear consecutively in a string:
import re
# Pattern to find repeated words
pattern = r'\b(\w+)\s+\1\b'
text = "This is is just a test test string"
# Find all matches
matches = re.findall(pattern, text)
print("Repeated Words:", matches)
Repeated Words: ['is', 'test']
The pattern \b(\w+)\s+\1\b works as follows:
\b: Word boundary
(\w+): Captures one or more word characters
\s+: Matches one or more whitespace characters
\1: References the first captured group
\b: Word boundary
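The same idea can be extended from finding repeated words to removing them. The following is a minimal sketch, assuming the goal is to keep one copy of each repeated word; it uses re.sub, where \1 in the replacement string also refers to the first captured group.

```python
import re

text = "This is is just a test test string"

# (\s+\1\b)+ matches one or more further copies of the captured word;
# the replacement r'\1' keeps a single copy of it
deduplicated = re.sub(r'\b(\w+)(\s+\1\b)+', r'\1', text)
print(deduplicated)  # This is just a test string
```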
Matching Duplicate Strings
Here's how to find pairs of identical strings separated by a space:
import re
# Pattern to find duplicate strings
pattern = r'([a-zA-Z]+) \1'
text = "hello hello world world example example"
# Find all matches
matches = re.findall(pattern, text)
print("Duplicated Strings:", matches)
Duplicated Strings: ['hello', 'world', 'example']
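One caveat with this pattern is that it has no word boundaries, so it can also match when the second word merely starts with the first. The sketch below uses a made-up input ("the then") to show the difference; the variable names are illustrative only.

```python
import re

sample = "the then"  # hypothetical input, not a true duplicate

# Without boundaries, \1 matches the "the" at the start of "then"
loose = re.findall(r'([a-zA-Z]+) \1', sample)

# With \b on both sides, the repeated text must be a whole word
strict = re.findall(r'\b([a-zA-Z]+) \1\b', sample)

print(loose)   # ['the']
print(strict)  # []
```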
Validating Repeated Hex Color Codes
This example shows how to find repeated hexadecimal color codes:
import re
# Pattern to validate repeated hex color codes
pattern = r'#([0-9A-Fa-f]{6})\s+#\1'
text = "#AFAFAF #AFAFAF this is not a color #123456 #123456"
# Find all matches
matches = re.findall(pattern, text)
print("Repeated Hex Colors:", matches)
Repeated Hex Colors: ['AFAFAF', '123456']
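A backreference matches the exact text that was captured, so the same color written in a different case is not treated as a repeat by default. As a sketch, assuming case-insensitive comparison is wanted, the re.IGNORECASE flag can be passed; the input string here is made up for illustration.

```python
import re

text = "#AFAFAF #afafaf"  # same color, different letter case

# Default: \1 must repeat the captured text character for character
case_sensitive = re.findall(r'#([0-9A-Fa-f]{6})\s+#\1', text)

# With IGNORECASE, the backreference is compared case-insensitively
case_insensitive = re.findall(r'#([0-9a-f]{6})\s+#\1', text, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['AFAFAF']
```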
Advanced Example: Matching Quoted Strings
Backreferences are useful for matching properly paired quotes:
import re
# Pattern to match strings with same opening and closing quotes
pattern = r'''(["'])(?:.*?)\1'''
# Triple quotes let the pattern and the text contain both quote characters
text = """He said "Hello world" and then 'Good bye'"""
# Find all matches
matches = re.findall(pattern, text)
print("Quoted Strings:", matches)
print("Full matches:")
for match in re.finditer(pattern, text):
    print(f" {match.group()}")
Quoted Strings: ['"', "'"]
Full matches:
 "Hello world"
 'Good bye'
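Because re.findall returns only the captured group (here, the quote character), it can help to capture the quoted content in a second group. The following is a minimal sketch of that variation; the second group and the names quoted and contents are additions for illustration.

```python
import re

text = """He said "Hello world" and then 'Good bye'"""

# Group 1 is the quote character, group 2 is the text between the quotes
quoted = re.compile(r'''(["'])(.*?)\1''')

contents = [m.group(2) for m in quoted.finditer(text)]
print(contents)  # ['Hello world', 'Good bye']
```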
Using Multiple Backreferences
You can use multiple capturing groups and reference them individually:
import re
# Pattern with multiple backreferences
pattern = r'(\w+)-(\w+)-\1-\2'
text = "hello-world-hello-world test-case-test-case"
# Find all matches
matches = re.findall(pattern, text)
print("Pattern matches:", matches)
# Get full matches
for match in re.finditer(pattern, text):
    print(f"Full match: {match.group()}")
Pattern matches: [('hello', 'world'), ('test', 'case')]
Full match: hello-world-hello-world
Full match: test-case-test-case
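When a pattern has several groups, numbered references can become hard to follow. Python's re module also supports named groups, written (?P<name>...), with (?P=name) as the matching backreference. The sketch below rewrites the pattern above in that style; the group names first and second are chosen only for illustration.

```python
import re

text = "hello-world-hello-world test-case-test-case"

# (?P<name>...) names a group; (?P=name) refers back to its captured text
named_pattern = r'(?P<first>\w+)-(?P<second>\w+)-(?P=first)-(?P=second)'

pairs = [(m.group('first'), m.group('second'))
         for m in re.finditer(named_pattern, text)]
print(pairs)  # [('hello', 'world'), ('test', 'case')]
```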
Conclusion
Backreferences let a pattern match text that repeats something it has already captured. Use \1, \2, and so on to reference previously captured groups by number; this keeps regex patterns concise and readable when dealing with duplicate or paired content.
