How regular expression back references works in Python?

Backreferences in regular expressions allow us to reuse a previously captured group within the same regex pattern. This feature is extremely useful for matching repeated patterns, validating formats, and finding duplicates in strings.

What are Backreferences?

A backreference is a reference to a previously captured group in a regular expression. When parentheses "()" are used in a regex pattern, they create a capturing group. Each group is automatically assigned a number starting from 1 for the first group, 2 for the second, and so on.

Syntax

Here's the basic syntax for using backreferences ?

  • (\w+): Captures a word as the first group

  • \1: References the first captured group

  • \2: References the second captured group

  • \n: References the nth captured group

Backreferences help simplify patterns that involve repetition and are particularly useful for matching complex structures like paired tags, duplicate words, and validation patterns.

Finding Repeated Words

This example demonstrates how to find words that appear consecutively in a string ?

import re

# Pattern to find repeated words
pattern = r'\b(\w+)\s+\1\b'
text = "This is is just a test test string"

# Find all matches
matches = re.findall(pattern, text)

print("Repeated Words:", matches)
Repeated Words: ['is', 'test']

The pattern \b(\w+)\s+\1\b works as follows:

  • \b - Word boundary

  • (\w+) - Captures one or more word characters

  • \s+ - Matches one or more whitespace characters

  • \1 - References the first captured group

  • \b - Word boundary

Matching Duplicate Strings

Here's how to find pairs of identical strings separated by a space ?

import re

# Pattern to find duplicate strings
pattern = r'([a-zA-Z]+) \1'
text = "hello hello world world example example"

# Find all matches
matches = re.findall(pattern, text)

print("Duplicated Strings:", matches)
Duplicated Strings: ['hello', 'world', 'example']

Validating Repeated Hex Color Codes

This example shows how to find repeated hexadecimal color codes ?

import re

# Pattern to validate repeated hex color codes
pattern = r'#([0-9A-Fa-f]{6})\s+#\1'
text = "#AFAFAF #AFAFAF this is not a color #123456 #123456"

# Find all matches
matches = re.findall(pattern, text)

print("Repeated Hex Colors:", matches)
Repeated Hex Colors: ['AFAFAF', '123456']

Advanced Example: Matching Quoted Strings

Backreferences are useful for matching properly paired quotes ?

import re

# Pattern to match strings with same opening and closing quotes
pattern = r'(["']).*?\1'
text = 'He said "Hello world" and then 'Good bye''

# Find all matches
matches = re.findall(pattern, text)

print("Quoted Strings:", matches)
print("Full matches:")
for match in re.finditer(pattern, text):
    print(f"  {match.group()}")
Quoted Strings: ['"', "'"]
Full matches:
  "Hello world"
  'Good bye'

Using Multiple Backreferences

You can use multiple capturing groups and reference them individually ?

import re

# Pattern with multiple backreferences
pattern = r'(\w+)-(\w+)-\1-\2'
text = "hello-world-hello-world test-case-test-case"

# Find all matches
matches = re.findall(pattern, text)

print("Pattern matches:", matches)

# Get full matches
for match in re.finditer(pattern, text):
    print(f"Full match: {match.group()}")
Pattern matches: [('hello', 'world'), ('test', 'case')]
Full match: hello-world-hello-world
Full match: test-case-test-case

Conclusion

Backreferences are powerful tools for matching repeated patterns in regular expressions. Use \1, \2, \n to reference previously captured groups, making your regex patterns more efficient and readable when dealing with duplicate or paired content.

Updated on: 2026-03-24T18:57:14+05:30

1K+ Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements