How do regular expression backreferences work in Python?


We group part of a regular expression by enclosing it in a pair of parentheses. This way we can apply an operator, such as a quantifier, to the whole group instead of to a single character.
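As a small illustration of what grouping changes, the sketch below compares a quantifier applied to one character with the same quantifier applied to a group. The patterns and sample strings are made up for this example.

```python
import re

# Without parentheses, + applies only to the preceding character 'b'.
ungrouped = re.search(r'ab+', 'abbb').group()

# With parentheses, + applies to the whole group 'ab'.
grouped = re.search(r'(ab)+', 'ababab').group()

print(ungrouped)  # abbb
print(grouped)    # ababab
```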

Capturing Groups and Backreferences

Parentheses not only group sub-expressions but also create capturing groups. The part of the string matched by the grouped part of the regular expression is stored in a numbered group, and a backreference lets us match that same text again later in the pattern.

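If a sub-expression is placed in parentheses, the text it matched can be referred to as \1 for the first group, \2 for the second, and so on. Python's re module uses this backslash form both in the pattern and in replacement strings; the $1 notation belongs to other regex flavors and tools.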
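A minimal sketch of the backslash notation in Python, assuming a made-up sample string: \1 appears once inside the pattern and once in the replacement passed to re.sub, and \g<1> is shown as the equivalent, more explicit spelling for replacements.

```python
import re

# Sample text is an assumption chosen for illustration.
text = 'hello hello world'

# Inside the pattern, \1 must match the same text that group 1 captured.
pattern = r'\b(\w+)\s+\1\b'

# In the replacement string, \1 inserts the captured text once.
deduped = re.sub(pattern, r'\1', text)

# \g<1> means the same thing and stays unambiguous when a digit follows.
deduped_g = re.sub(pattern, r'\g<1>', text)

print(deduped)    # hello world
print(deduped_g)  # hello world
```

Raw strings (the r prefix) keep Python from interpreting \1 as an escape sequence before the re module sees it.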

For example, the regex \b(\w+)\b\s+\1\b matches repeated words, such as tahiti tahiti: the parentheses in (\w+) capture a word into Group 1, and the backreference \1 then matches the same characters that Group 1 captured.


import re

s = 'Tahiti Tahiti Atoll'
# (\w+) captures a word; \1 requires the same word to appear again
result = re.findall(r'\b(\w+)\b\s+\1\b', s)
print(result)


This gives the output

['Tahiti']
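Because the pattern contains a capturing group, re.findall returns the text captured by that group rather than the full repeated phrase. If the whole match is wanted as well, one option is re.finditer, sketched here on the same string; the variable name matches is only illustrative.

```python
import re

s = 'Tahiti Tahiti Atoll'

# finditer yields match objects, so both the whole match (group 0)
# and the captured word (group 1) are available.
matches = [(m.group(0), m.group(1))
           for m in re.finditer(r'\b(\w+)\b\s+\1\b', s)]

print(matches)  # [('Tahiti Tahiti', 'Tahiti')]
```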


Updated on: 19-Feb-2020

