How are metacharacters used inside character classes in Python regular expressions?



Python's regular expressions provide various ways to search and manipulate strings. They are used to define search patterns that can be matched in text data. These patterns are defined using a set of characters known as metacharacters, which carry special meaning in regex.

Their behaviour can change when they appear inside a character class (also called a character set), which is written with square brackets []. Understanding how metacharacters are interpreted within character classes is crucial for writing accurate and effective regular expressions.

Understanding Character Classes

In Python regex, character classes are denoted by square brackets '[ ]', which define a set of characters or ranges of characters that we want to match. For example, '[aeiou]' will match any single lowercase vowel.
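As a quick illustration, here is a minimal sketch. The sample words "tutorials" and "cabbage" are chosen only for this example. It shows that a class matches one character at a time, and that a hyphen between two characters defines a range.

```python
import re

# A class matches exactly one character from the listed set.
vowels = re.findall(r"[aeiou]", "tutorials")
print(vowels)  # ['u', 'o', 'i', 'a']

# A hyphen between two characters defines a range: [a-c] is the same as [abc].
range_matches = re.findall(r"[a-c]", "cabbage")
print(range_matches)  # ['c', 'a', 'b', 'b', 'a']
```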

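Most metacharacters, such as ., *, +, ?, $, | and parentheses, lose their special meaning inside a character class and are treated as literal characters. A few keep or change their behaviour: ^ negates the class when it comes first, - defines a range when placed between two characters, \ escapes the next character, and ] closes the class.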
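The literal behaviour can be checked directly. In this small sketch (the input string is made up for the example), every character listed inside the brackets is matched as itself rather than acting as a wildcard, quantifier, anchor or grouping operator.

```python
import re

# Inside brackets, . * + ? $ | ( ) are ordinary characters.
text = "a.b*c+d?e$f|g(h)"
literals = re.findall(r"[.*+?$|()]", text)
print(literals)  # ['.', '*', '+', '?', '$', '|', '(', ')']
```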

Common Metacharacters

Some of the common metacharacters are as follows.

Metacharacter Description Example
. Matches any single character except a newline (inside [ ] it is a literal period). r'.' matches each character of "praveen_tutorialspoint@gmail.com".
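[ ] Matches any one character listed inside the brackets (or, with a leading ^, any character not listed). Because | is literal inside brackets, r'Char[mander|meleon|izard]' matches only "Charm" in "Charmander" and "Charmeleon" and "Chari" in "Charizard"; use the group r'Char(mander|meleon|izard)' to match the full names.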
^ Matches at the beginning of a string (as the first character inside [ ], it negates the class). r'^T' matches "Tutorialspoint".
$ Matches at the end of a string (inside [ ] it is a literal dollar sign). r'x$' matches "Tutorix".
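A character class always consumes exactly one character, and | inside it is literal. This is easy to confuse with alternation. The sketch below contrasts the Charmander-style class pattern with a non-capturing group. The group uses (?:...) so that re.findall() returns whole matches instead of only the group text. It also compares . outside and inside brackets. The sample strings are illustrative only.

```python
import re

names = "Charmander Charmeleon Charizard"

# A class matches one character; '|' inside it is just another literal.
class_matches = re.findall(r"Char[mander|meleon|izard]", names)
print(class_matches)  # ['Charm', 'Charm', 'Chari']

# Alternation needs a group; (?:...) keeps findall returning the full match.
group_matches = re.findall(r"Char(?:mander|meleon|izard)", names)
print(group_matches)  # ['Charmander', 'Charmeleon', 'Charizard']

# '.' outside brackets matches any character; inside it is a literal period.
dot_any = re.findall(r"a.c", "abc a.c")
dot_literal = re.findall(r"a[.]c", "abc a.c")
print(dot_any, dot_literal)  # ['abc', 'a.c'] ['a.c']
```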

Using ^ Inside Character Classes

The caret ^ behaves differently depending on where it appears in a character class. When it is the first character after the opening bracket, it negates the set, so the class matches any single character that is not listed. Anywhere else in the class, ^ is an ordinary literal caret. The following program demonstrates the negating form of the ^ metacharacter.

import re

pattern = r"[^aeiou]"
text = "hello"
result = re.findall(pattern, text)
print(result)

Following is the output of the above code:

['h', 'l', 'l']
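The position rule can be sketched as follows, using made-up input strings. A caret that is not first is matched literally. A caret can also be escaped, and a second caret after the negating one is treated as a literal.

```python
import re

# ^ not in first position: a literal caret.
caret_literal = re.findall(r"[a^]", "a^b")
print(caret_literal)  # ['a', '^']

# An escaped caret is always literal, wherever it appears.
caret_escaped = re.findall(r"[\^x]", "x^y")
print(caret_escaped)  # ['x', '^']

# First ^ negates, second ^ is literal: match anything except a caret.
not_caret = re.findall(r"[^^]", "2^3")
print(not_caret)  # ['2', '3']
```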

Quantifiers

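In Python regular expressions, quantifiers specify how many times a character, group, or character class should be repeated. The table below also lists the alternation operator |, which is not a quantifier. Inside a character class, all of these characters are treated as literals.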

Metacharacter Description Example
? Matches zero or one of the preceding character. r'neighbou?r' matches "neighbor" and "neighbour".
* Matches zero or more of the preceding character. r're*d' matches "red" and "reed".
+ Matches one or more of the preceding character. r'tw+o' matches "two" but not "to".
| Matches either the pattern before or after the | (alternation; inside [ ] it is a literal). r'true|false' matches "true" or "false".
{x} Matches if the preceding character occurs exactly x times in a row. r're{2}d' matches "reed" but not "red".
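Quantifiers combine naturally with character classes: a quantifier placed after the closing bracket repeats the whole class. A quantifier character placed inside the brackets is only a literal. The sample strings in this sketch are illustrative.

```python
import re

# + after a class repeats the class: any run of one or more vowels.
vowel_runs = re.findall(r"[aeiou]+", "queueing")
print(vowel_runs)  # ['ueuei']

# {3} after a class: exactly three characters from the digit range.
three_digits = re.findall(r"[0-9]{3}", "room 101, floor 7")
print(three_digits)  # ['101']

# Inside brackets, { and } are plain characters, not a repeat count.
braces = re.findall(r"[{}]", "{x}")
print(braces)  # ['{', '}']
```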

Matches zero or one occurrence of the preceding character

In the following program, the pattern Tutorials?Point uses the "?" quantifier to indicate that the character 's' is optional. The re.findall() function returns a list of all non-overlapping matches.

import re

pattern = r'Tutorials?Point'
text_1 = "TutorialsPoint"
text_2 = "TutorialPoint"
print(re.findall(pattern, text_1))
print(re.findall(pattern, text_2))

Following is the output of the above code:

['TutorialsPoint']
['TutorialPoint']
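Moving ? inside brackets changes its meaning completely. In this sketch (the third input string is added only for comparison), Tutorials[?]Point requires a literal question mark, while Tutorials?Point makes the 's' optional.

```python
import re

optional = r"Tutorials?Point"   # ? is a quantifier: 's' may be absent
literal = r"Tutorials[?]Point"  # [?] matches a literal question mark

texts = ["TutorialsPoint", "TutorialPoint", "Tutorials?Point"]
results = {t: (re.findall(optional, t), re.findall(literal, t)) for t in texts}
for text, (opt, lit) in results.items():
    print(text, opt, lit)
```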

Special Sequences

Special sequences in Python regular expressions are escape sequences (starting with a backslash \) that represent commonly used character sets (like digits, whitespace) or positions within a string (like the beginning or end of a word).

Metacharacter Description Example
\A Only matches the beginning of a string. r'\AT' matches "TutorialsPoint" but not "tutorialspoint".
\b Matches the boundary at the beginning or end of a word. r'\bTutorials\b' matches "Tutorials Point" but not "TutorialsPoint".
\B Matches a non-boundary position inside a word. r'Tutorials\B' matches "TutorialsPoint" but not "Tutorials Point".
\d Matches any decimal digit (0-9; for str patterns also other Unicode decimal digits unless re.ASCII is set). r'\d lessons' matches "10 lessons".
\D Matches any non-digit character. r'\D lessons' matches "ten lessons".
\s Matches any whitespace character (such as space, tab, or newline). r'Tutorials\sPoint' matches "Tutorials Point".
\S Matches any non-whitespace character. r'Tutorials Point\S' matches "Tutorials Points" but not "Tutorials Point".
\w Matches any alphanumeric character or underscore. r'\w' matches characters in "tutorials_point.pdf" except for the period.
\W Matches any non-word character. r'\W' matches the period in "tutorials_point.pdf".
\Z Only matches the end of a string. r'oint\Z' matches "TutorialsPoint".
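Several of these sequences also work inside a character class, where they can be mixed with literal characters. One exception is \b: inside brackets it stands for the backspace character, not a word boundary. The input strings in this sketch are illustrative.

```python
import re

# \w plus a literal period: match whole file names.
filename = re.findall(r"[\w.]+", "tutorials_point.pdf")
print(filename)  # ['tutorials_point.pdf']

# \d plus a trailing hyphen (literal at the end of the class).
phone = re.findall(r"[\d-]+", "call 555-0199 now")
print(phone)  # ['555-0199']

# Inside a class, \b is the backspace character (\x08).
backspace = re.findall(r"[\b]", "a\bb")
print(backspace)  # ['\x08']
```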

Using the Backslash \ for Escaping

The backslash \ is a metacharacter used to escape other metacharacters. In this program, the backslash escapes the hyphen, so [a\-z] matches the three characters a, - and z instead of the range a through z.

import re

pattern = r"[a\-z]"
text = "a-z"
result = re.findall(pattern, text)
print(result)

Following is the output of the above code:

['a', '-', 'z']
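Escaping is not the only way to get a literal hyphen. The sketch below compares an unescaped range with a hyphen placed first or last. It also escapes square brackets and uses re.escape() to build a class from an arbitrary string of characters. The input strings are illustrative.

```python
import re

text = "a-z"

# Unescaped between two characters, - defines a range (a through z).
range_only = re.findall(r"[a-z]", text)
print(range_only)  # ['a', 'z']

# Placed first or last, - is literal without a backslash.
hyphen_first = re.findall(r"[-az]", text)
hyphen_last = re.findall(r"[az-]", text)
print(hyphen_first, hyphen_last)  # ['a', '-', 'z'] ['a', '-', 'z']

# Square brackets need a backslash inside a class.
brackets = re.findall(r"[\[\]]", "[x]")
print(brackets)  # ['[', ']']

# re.escape() backslash-escapes ^, - and ] so they are safe inside a class.
escaped_set = re.findall("[" + re.escape("^-]") + "]", "a^b-c]d")
print(escaped_set)  # ['^', '-', ']']
```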
Updated on: 2025-08-28T11:42:12+05:30
