What are character class operations in Python?



Character class operations in Python's regular expressions let us define a set of characters to match. Instead of searching for one specific character, we can match any single character from that set. A character class is written using square brackets [], and any one character listed inside the brackets can match at that position in the string.

Commonly Used Character Classes in Python (re module)

Regular expressions can contain both ordinary and special characters. Ordinary characters such as 'A', 'a', or '0' simply match themselves, so the pattern "last" matches the string 'last'. Other characters, such as '|' or '(', are special: they either stand for classes of ordinary characters or affect how the parts of the regular expression around them are interpreted.
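The difference between ordinary and special characters can be seen in a short sketch. The sample string "first|last" is made up for illustration; the calls are the standard re.findall() and re.escape() functions. A backslash turns a special character back into an ordinary one, and re.escape() adds those backslashes for you.

```python
import re

text = "first|last"

# Ordinary characters match themselves: "last" finds the literal text.
literal = re.findall("last", text)

# "|" is special: it means "match either side", so this finds two matches.
alternation = re.findall("first|last", text)

# A backslash makes "|" match a literal vertical bar instead.
escaped = re.findall(r"t\|l", text)

# re.escape() escapes special characters when building a pattern from plain text.
auto_escaped = re.findall(re.escape("t|l"), text)

print(literal)       # ['last']
print(alternation)   # ['first', 'last']
print(escaped)       # ['t|l']
print(auto_escaped)  # ['t|l']
```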

Repetition operators, or quantifiers (*, +, ?, {m,n}, etc.), cannot be directly nested. This restriction avoids ambiguity with the non-greedy modifier suffix ?. To apply a repetition to a pattern that is already repeated, wrap the inner pattern in parentheses. For example, (a{6})* matches any number of groups of six 'a' characters, that is, any multiple of six 'a's.
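A minimal sketch of both cases follows. Compiling a pattern with two stacked quantifiers raises re.error (its message mentions "multiple repeat"), while the parenthesized form (a{6})* compiles and accepts only runs whose length is a multiple of six. The test strings are arbitrary examples.

```python
import re

# Stacking two quantifiers directly is rejected when the pattern is compiled.
try:
    re.compile("a**")
    nested_error = None
except re.error as exc:
    nested_error = str(exc)

# Grouping the inner repetition lets the outer * repeat the whole group.
pattern = re.compile(r"(a{6})*")

twelve_ok = pattern.fullmatch("a" * 12) is not None  # two groups of six
seven_ok = pattern.fullmatch("a" * 7) is not None    # 7 is not a multiple of 6
empty_ok = pattern.fullmatch("") is not None         # zero groups is allowed

print(nested_error)
print(twelve_ok, seven_ok, empty_ok)  # True False True
```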

The following are the most commonly used character classes in Python regular expressions.

| Character Class | Meaning | Description |
|---|---|---|
| `.` | Any character except a newline | Matches any single character except a newline (`\n`). With the `re.DOTALL` flag, it also matches a newline. |
| `\d` | Digit character | Matches any Unicode decimal digit, which includes 0 to 9. With the `re.ASCII` flag, it is equivalent to `[0-9]`. |
| `\D` | Non-digit character | Matches any character that is not a digit. |
| `\w` | Word character | Matches any alphanumeric character (letter or digit) and the underscore. With the `re.ASCII` flag, it is equivalent to `[a-zA-Z0-9_]`. |
| `\W` | Non-word character | Matches any character that is not a letter, digit, or underscore. |
| `\s` | Whitespace character | Matches any whitespace character, such as a space, tab, or newline. |
| `\S` | Non-whitespace character | Matches any character that is not a whitespace character. |
| `[abc]` | a, b, or c | Matches any one character from the set a, b, or c. |
| `[^abc]` | Not a, b, or c | Matches any single character except those listed inside the brackets. |
| `[a-z]` | Lowercase letters | Matches any lowercase letter from a to z. |
| `[A-Z]` | Uppercase letters | Matches any uppercase letter from A to Z. |
| `[0-9]` | Digits | Matches any digit from 0 to 9. |
| `[\[\]]` | Literal [ or ] | Matches a literal opening or closing square bracket. |
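The sketch below runs several of the classes from the table against short sample strings, which are made up for illustration. It also shows two standard flags of the re module: re.DOTALL, which lets . match a newline, and re.ASCII, which restricts \d (as well as \w and \s) to ASCII characters.

```python
import re

text = "Room 42, floor_3"

digits = re.findall(r"\d", text)              # each digit separately
words = re.findall(r"\w+", text)              # runs of word characters
spaces = re.findall(r"\s", text)              # whitespace characters
punctuation = re.findall(r"[^\w\s]", text)    # neither word nor whitespace

# "." skips newlines unless re.DOTALL is given.
dot_default = re.findall(r".", "a\nb")
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)

# Escaped brackets inside a class match literal "[" and "]".
brackets = re.findall(r"[\[\]]", "items[0]")

# \d also matches non-ASCII decimal digits (here ARABIC-INDIC DIGIT THREE)
# unless re.ASCII is used.
mixed = "\u0663" + "3"
unicode_digits = re.findall(r"\d", mixed)
ascii_digits = re.findall(r"\d", mixed, flags=re.ASCII)

print(digits)       # ['4', '2', '3']
print(words)        # ['Room', '42', 'floor_3']
print(spaces)       # [' ', ' ']
print(punctuation)  # [',']
print(dot_default)  # ['a', 'b']
print(brackets)     # ['[', ']']
print(len(unicode_digits), ascii_digits)  # 2 ['3']
```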

Basic Example: Matching Vowels

The following is a simple program that searches for vowels in a sentence using the re.findall() function. It returns a list of all the characters that match the given pattern.

import re
text = "The quick brown fox jumps over the lazy dog."
vowels = re.findall("[aeiou]", text)
print(vowels)

Following is the output of the above code:

['e', 'u', 'i', 'o', 'o', 'u', 'o', 'e', 'e', 'a', 'o']

Matching Non-Vowels (Negated Character Class)

We can negate a character class by placing ^ immediately after the opening square bracket. For example, [^aeiou] matches any character except the lowercase vowels. In this example, we also exclude the space character, so the result contains only consonants and punctuation.

import re
text = "The quick brown fox jumps over the lazy dog."
consonants = re.findall("[^aeiou ]", text)
print(consonants)

Following is the output of the above code:

['T', 'h', 'q', 'c', 'k', 'b', 'r', 'w', 'n', 'f', 'x', 'j', 'm', 'p', 's', 'v', 'r', 't', 'h', 'l', 'z', 'y', 'd', 'g', '.']
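The ^ only negates a class when it is the first character after [. Anywhere else inside the brackets it is an ordinary character. Unlike ., a negated class also matches a newline unless the newline is listed. The sample strings in this sketch are arbitrary.

```python
import re

text = "2^3 = 8"

# "^" right after "[" negates the class: everything except digits.
not_digits = re.findall(r"[^0-9]", text)

# Elsewhere inside the brackets, "^" is just a literal caret.
digit_or_caret = re.findall(r"[0-9^]", text)

# A negated class matches "\n" because "\n" is not in the excluded set.
across_lines = re.findall(r"[^a]", "a\nb")

print(not_digits)      # ['^', ' ', '=', ' ']
print(digit_or_caret)  # ['2', '^', '3', '8']
print(across_lines)    # ['\n', 'b']
```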

Using Ranges in Character Classes

Ranges in character classes let you specify a set of characters by their order instead of listing each character individually. The order used is the characters' Unicode code point order, which for ASCII letters and digits is the familiar alphabetical and numerical order.

A hyphen (-) between two characters inside a character class defines a range. For instance, [A-Z] matches any uppercase letter from A to Z. In this example, only the letter 'T' is matched because it is the only uppercase letter in the input text.

import re
text = "The quick brown fox jumps over the lazy dog."
capital_letters = re.findall("[A-Z]", text)
print(capital_letters)

Following is the output of the above code:

['T']
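One class can hold several ranges, and a hyphen that appears first, last, or escaped as \- is treated as a literal character rather than a range. Because ranges follow code point order, [A-z] also covers the punctuation that sits between 'Z' and 'a' (such as '^' and '_'). The sample strings below are made up for illustration.

```python
import re

text = "ID: ab-12, Zone-9"

# Several ranges in one class: letters of either case plus digits.
alnum = re.findall(r"[A-Za-z0-9]+", text)

# A trailing hyphen is literal, so hyphenated tokens stay whole.
with_hyphen = re.findall(r"[A-Za-z0-9-]+", text)

# [a-z] is a range; [a\-z] is just the three characters a, -, and z.
range_az = re.findall(r"[a-z]", "a-m-z")
escaped = re.findall(r"[a\-z]", "a-m-z")

# [A-z] spans code points 65 to 122, which includes "^" and "_".
wide = re.findall(r"[A-z]", "a_^")

print(alnum)        # ['ID', 'ab', '12', 'Zone', '9']
print(with_hyphen)  # ['ID', 'ab-12', 'Zone-9']
print(range_az)     # ['a', 'm', 'z']
print(escaped)      # ['a', '-', '-', 'z']
print(wide)         # ['a', '_', '^']
```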
Updated on: 2025-08-28T11:09:56+05:30