What are metacharacters inside character classes in Python regular expressions?


Most letters and characters simply match themselves. However, some characters, called metacharacters, don’t match themselves. Instead, they signal that some pattern should be matched, or they repeat or change the meaning of other parts of the regular expression.

Here’s a complete list of the metacharacters:

. ^ $ * + ? { } [ ] \ | ( )

First we’ll look at [ and ]. They’re used to specify a character class, which is a set of characters that you want to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them with a '-'. For example, [xyz] will match any of the characters x, y, or z; this is the same as [x-z], which uses a range to express the same set of characters. If you wanted to match only lowercase letters, your regex would be [a-z].
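As a minimal sketch of the point above, the following uses `re.findall` to compare a listed class with the equivalent range. The sample strings and variable names are made up for illustration.

```python
import re

# [xyz] lists the characters one by one; [x-z] expresses the same set as a range.
listed = re.findall(r"[xyz]", "lazy fox")
ranged = re.findall(r"[x-z]", "lazy fox")
print(listed)            # ['z', 'y', 'x']
print(listed == ranged)  # True

# [a-z] matches lowercase ASCII letters only; uppercase letters,
# digits, and spaces are skipped.
lowercase = re.findall(r"[a-z]", "Hello World 42")
print(lowercase)  # ['e', 'l', 'l', 'o', 'o', 'r', 'l', 'd']
```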

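Most metacharacters are not active inside classes. For example, [abc$] will match any of the characters 'a', 'b', 'c', or '$'; '$' is usually a metacharacter, but inside a character class it is stripped of its special nature. The exceptions are the backslash, which still escapes the character after it, and ']', '-', and a leading '^', which keep a special meaning within a class.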
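A short sketch of this behaviour, assuming an arbitrary sample string: inside the class, '$' is matched as a literal dollar sign, while outside a class it anchors the match to the end of the string.

```python
import re

text = "abc$ costs $5"

# Inside a class, '$' is just another character in the set.
in_class = re.findall(r"[abc$]", text)
print(in_class)  # ['a', 'b', 'c', '$', 'c', '$']

# Outside a class, '$' is an anchor: "5$" means a '5' at the end of the string.
at_end = re.findall(r"5$", text)
print(at_end)  # ['5']
```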

If '^' is the first character of a class, the class is complemented: it matches any character that is NOT listed in the class. For example, [^8] will match any character except '8'.
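The complemented class can be sketched as follows; the input strings are illustrative only. The second pattern shows that '^' is treated as a literal character when it is not the first character of the class.

```python
import re

# A leading '^' negates the class: match every character except '8'.
not_eight = re.findall(r"[^8]", "1838")
print(not_eight)  # ['1', '3']

# When '^' is not first, it is an ordinary member of the set.
caret_or_eight = re.findall(r"[8^]", "2^8")
print(caret_or_eight)  # ['^', '8']
```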

Perhaps the most important metacharacter is the backslash, \. It introduces special sequences, and it’s also used to escape metacharacters so you can still match them in patterns; for example, if you need to match a ] or \, you can precede them with a backslash to remove their special meaning: \] or \\.
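A minimal sketch of escaping, assuming a made-up sample string. Raw string literals (the `r"..."` prefix) are used so that Python itself does not interpret the backslashes before the `re` module sees them.

```python
import re

# The sample text contains the five characters: a ] b \ c
text = r"a]b\c"

# \] matches a literal closing bracket; \\ matches a literal backslash.
bracket = re.findall(r"\]", text)
backslash = re.findall(r"\\", text)
print(bracket)    # [']']
print(backslash)  # ['\\']  (a single backslash, shown in repr form)

# The same escapes work inside a character class.
either = re.findall(r"[\]\\]", text)
print(either)  # [']', '\\']
```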

Updated on: 13-Jun-2020
