What are character classes or character sets in Python regular expressions?

Character classes, also called character sets, are one of the main components of regular expressions. This chapter explains how they work and walks through simple example programs that show their usage.

A character class defines a set of characters, any one of which may match at a single position in a string. It can list specific characters, a range of characters, or both.

Character Classes or Character Sets

A character class, also known as a character set, tells the regex engine to match exactly one character out of several candidates. Place the characters you want to match inside square brackets. To match either an 'a' or an 'e', write [ae]. For example, gr[ae]y matches both 'gray' and 'grey' − very helpful when dealing with American and British English spelling variations.
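
As a minimal sketch of the gr[ae]y pattern described above (the sample sentence and the variable names are made up for illustration) −

```python
import re

# [ae] matches exactly one character: either 'a' or 'e'.
pattern = re.compile(r"gr[ae]y")

# Hypothetical sample text containing both spellings and one non-match.
text = "The grey cat sat beside the gray wall; groy is not a word."

spellings = pattern.findall(text)
print(spellings)  # ['grey', 'gray']
```

Note that 'groy' is not returned, because 'o' is not one of the characters listed inside the brackets.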

A hyphen inside a character class represents a range of characters. [0-9] matches a single digit between 0 and 9. You can also use multiple ranges: [0-9a-fA-F] matches a single hexadecimal digit in either case. Ranges and individual characters can be combined − [0-9a-fxA-FX] matches a single hexadecimal digit or the letter X (in either case).
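
The two classes can be compared side by side. The snippet below is only an illustration; the sample string "0x1F" and the variable names are assumptions, not part of the original text −

```python
import re

# One hexadecimal digit, upper or lower case.
hex_digit = re.compile(r"[0-9a-fA-F]")

# One hexadecimal digit, or the letter x/X.
hex_digit_or_x = re.compile(r"[0-9a-fxA-FX]")

sample = "0x1F"  # hypothetical input

only_hex = hex_digit.findall(sample)
hex_or_x = hex_digit_or_x.findall(sample)

print(only_hex)  # ['0', '1', 'F']
print(hex_or_x)  # ['0', 'x', '1', 'F']
```

Each class still matches one character at a time, so findall() returns the characters individually.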

Character classes are one of the most commonly used features of regular expressions. You can find a word even when it is misspelled, with patterns such as sep[ae]r[ae]te or li[cs]en[cs]e. An identifier in many programming languages can be matched with [A-Za-z_][A-Za-z_0-9]*, and a C-style hexadecimal number with 0[xX][A-Fa-f0-9]+.
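
The identifier and hexadecimal patterns above can be tried out as follows. This is a sketch under stated assumptions: the candidate strings and the sample line of source text are invented for the demonstration −

```python
import re

# A letter or underscore, followed by any number of letters, digits, or underscores.
identifier = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

# "0x" or "0X" followed by one or more hexadecimal digits.
c_hex = re.compile(r"0[xX][A-Fa-f0-9]+")

# Hypothetical candidates; fullmatch() requires the whole string to fit the pattern.
candidates = ["total_count", "_tmp1", "2fast"]
results = [bool(identifier.fullmatch(name)) for name in candidates]
print(results)  # [True, True, False]

# Hypothetical source line containing two valid hex literals and one invalid one.
hex_numbers = c_hex.findall("mask = 0xFF | 0X1a; bad = 0xZZ")
print(hex_numbers)  # ['0xFF', '0X1a']
```

"2fast" is rejected because the first character class does not include digits, and "0xZZ" is skipped because 'Z' is not a hexadecimal digit.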

Predefined Character Sets

Here are some of the predefined character sets for your reference −

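  • \d matches any decimal digit (equivalent to [0-9] for ASCII text; with str patterns it also matches other Unicode decimal digits unless the re.ASCII flag is set)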

  • \D corresponds to any non-digit character

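  • \w matches any word character, that is, a letter, digit, or underscore (equivalent to [a-zA-Z0-9_] when the re.ASCII flag is set; otherwise it also matches Unicode letters and digits)

  • \W matches any character that is not a word character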

  • \s matches any whitespace character (space, tab, newline, carriage return, form feed, and vertical tab)

  • \S corresponds to any non-whitespace character
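
The predefined sets listed above can be explored with a short script. The sample string and variable names below are assumptions chosen for illustration; the last two lines show how the re.ASCII flag affects \d for str patterns −

```python
import re

text = "Room 42,\tfloor_3!"  # hypothetical sample containing a space and a tab

digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)
non_word = re.findall(r"\W", text)

print(digits)    # ['4', '2', '3']
print(spaces)    # [' ', '\t']
print(non_word)  # [' ', ',', '\t', '!']  (the underscore counts as a word character)

# U+0663 is the ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
unicode_digits = re.findall(r"\d", "\u0663")
ascii_digits = re.findall(r"\d", "\u0663", flags=re.ASCII)

print(len(unicode_digits), len(ascii_digits))  # 1 0
```

If only the characters 0 to 9 should be accepted, either write [0-9] explicitly or pass re.ASCII.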

The following examples show how to use character sets in Python regular expressions −

Match Specific Characters

In this example, we create a regex pattern that matches 'cat', 'rat', or 'mat'. The character class [crm] matches exactly one of 'c', 'r', or 'm' before 'at', so a word such as 'sat' is not matched −

import re

text = "The cat and rat sat on the mat."
pattern = r'[crm]at'  # one of c, r, or m, followed by "at"

matches = re.findall(pattern, text)
print(matches)

The output of the above code is −

['cat', 'rat', 'mat']

Matching Vowels

The program below searches for all vowels in the given string and returns a list of those vowels as output −

import re

text = "Hello World! Are you there?"
pattern = r'[aeiouAEIOU]'

matches = re.findall(pattern, text)
print(matches)

The output of the above code is −

['e', 'o', 'o', 'A', 'e', 'o', 'u', 'e', 'e']

Find Numeric Characters

In this program, we extract all digits from the given string. The \d pattern matches one digit at a time, so a multi-digit number such as 10 is returned as the separate matches '1' and '0' −

import re

text = "There are 4 apples and 10 oranges."
pattern = r'\d'  # a single digit

matches = re.findall(pattern, text)
print(matches)

The output of the above code is −

['4', '1', '0']
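
If whole numbers are wanted instead of individual digits, one possible variation (not part of the original program) is to follow \d with the + quantifier, which repeats the class one or more times −

```python
import re

text = "There are 4 apples and 10 oranges."

# \d+ matches a run of one or more consecutive digits.
numbers = re.findall(r"\d+", text)
print(numbers)  # ['4', '10']
```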

Match Username Pattern

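The program below checks whether a username contains only letters, digits, or underscores. The character set [a-zA-Z0-9_] defines the allowed characters, the + quantifier requires at least one of them, and the ^ and $ anchors tie the match to the start and end of the string −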

import re

username = "user_123"
pattern = r"^[a-zA-Z0-9_]+$"
match = re.match(pattern, username)

if match:
    print("Valid username")
else:
    print("Invalid username")

The output of the above code is −

Valid username
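
One caveat when validating input this way: in Python, $ also matches just before a newline at the very end of the string, so a username with a trailing newline passes the check. The sketch below illustrates this with a made-up input and shows re.fullmatch() as one possible stricter alternative −

```python
import re

pattern = r"^[a-zA-Z0-9_]+$"
candidate = "user_123\n"  # hypothetical input with a trailing newline

# $ matches before the final newline, so this check succeeds.
anchored = bool(re.match(pattern, candidate))

# fullmatch() requires the entire string, newline included, to fit the class.
strict = bool(re.fullmatch(r"[a-zA-Z0-9_]+", candidate))

print(anchored)  # True
print(strict)    # False
```

Whether the trailing newline matters depends on where the username comes from; stripping the input before validation is another option.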

Negated Character Classes

You can also create a negated character class by placing a caret ^ immediately after the opening bracket. The class then matches any single character that is NOT in the specified set −

import re

text = "abc123XYZ!@#"
pattern = r'[^a-zA-Z]'  # Match non-alphabetic characters

matches = re.findall(pattern, text)
print(matches)

The output of the above code is −

['1', '2', '3', '!', '@', '#']

Conclusion

Character classes in Python regular expressions provide a powerful way to match specific sets of characters. Use square brackets [...] to define custom character sets, leverage predefined classes like \d and \w for common patterns, and use negated classes [^...] to exclude specific characters.

Updated on: 2026-03-24T19:03:23+05:30
