How do regular expression alternatives work in Python?


Alternations and their applications

In real-world applications, we often need a regular expression that matches any one of two or more alternatives, and we sometimes need a quantifier to apply to several expressions at once. Both are handled by grouping with parentheses and, for alternatives, by alternation with the vertical bar (|).
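As a minimal sketch of a quantifier applied to a group of alternatives, consider the following. The patterns and test strings here are illustrative assumptions, not taken from a real application.

```python
import re

# Illustrative pattern: one or more repetitions of "ab" or "cd".
# The parentheses make + apply to the whole alternation.
grouped = re.compile(r'(ab|cd)+')
print(grouped.fullmatch('abcdab') is not None)  # True

# Without parentheses, + binds only to "d", so this means "ab" or "cd+".
ungrouped = re.compile(r'ab|cd+')
print(ungrouped.fullmatch('abcdab') is not None)  # False
print(ungrouped.fullmatch('cddd') is not None)    # True
```

The vertical bar has the lowest precedence of the regex operators, which is why the parentheses are needed to limit its scope.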

Using the vertical bar (|)

Alternation is useful when we need to match any one of several different alternatives. For example, the regex airways|airplane|bomber will match any text that contains airways or airplane or bomber. The regex air(ways|plane)|bomber matches the same text, although its parentheses also create a capture group.
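A short sketch can confirm that the two forms match the same text. The sample string is chosen for illustration; re.finditer with group(0) is used so that the whole match is compared, regardless of any capture groups.

```python
import re

text = 'airways aircraft airplane bomber'  # sample string for illustration

# group(0) is always the entire match, whether or not the pattern has groups
flat = [m.group(0) for m in re.finditer(r'airways|airplane|bomber', text)]
factored = [m.group(0) for m in re.finditer(r'air(ways|plane)|bomber', text)]

print(flat)              # ['airways', 'airplane', 'bomber']
print(flat == factored)  # True
```

Note that aircraft is skipped by both patterns, because neither ways nor plane follows air there.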

If we used the regex (airways|airplane|bomber), it would match any of the three expressions and capture the match in a single group. Consider the regex (air(ways|plane)|bomber), which has two captures if the first alternative matches (airways or airplane as the first capture and ways or plane as the second capture), and one capture if the second alternative matches (bomber). We can switch off the capturing effect by following an opening parenthesis with ?: like this:

(air(?:ways|plane)|bomber)

This will have only one capture if it matches (airways or airplane or bomber).


The following code illustrates the points discussed above −

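import re

s = 'airways aircraft airplane bomber'

# One group: findall returns the text captured by that group
result = re.findall(r'(airways|airplane|bomber)', s)
print(result)

# Two groups: findall returns a tuple of both captures for each match
result2 = re.findall(r'(air(ways|plane)|bomber)', s)
print(result2)

# Inner group made non-capturing: only the outer capture remains
result3 = re.findall(r'(air(?:ways|plane)|bomber)', s)
print(result3)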


This gives the following output −

['airways', 'airplane', 'bomber']
[('airways', 'ways'), ('airplane', 'plane'), ('bomber', '')]
['airways', 'airplane', 'bomber']
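
The empty string paired with bomber in the second result is how re.findall reports a group that took no part in the match. As a small sketch (the variable names are illustrative), inspecting the match object directly shows the same group as None:

```python
import re

pattern = re.compile(r'(air(ways|plane)|bomber)')

m1 = pattern.search('airplane')
print(m1.groups())  # ('airplane', 'plane')

# The inner group is skipped when the second alternative matches
m2 = pattern.search('bomber')
print(m2.groups())  # ('bomber', None)

# findall substitutes '' for the group that did not participate
found = pattern.findall('bomber')
print(found)        # [('bomber', '')]
```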
Published on 07-Jan-2018 20:36:34