Regular Expression Patterns in Python
Regular expressions are patterns used to match character combinations in strings. In Python, most characters match themselves; the exceptions are the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \), which have special meanings. To match a metacharacter literally, escape it by preceding it with a backslash.
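As a small illustration of escaping, the sketch below compares an unescaped dot with an escaped one and then uses the standard `re.escape()` helper to escape a whole literal string. The sample strings and variable names are made up for the example.

```python
import re

# Unescaped, '.' is a metacharacter that matches any character except newline.
dot_any = re.findall(r'a.b', "a.b axb")
print(dot_any)        # ['a.b', 'axb']

# Escaped with a backslash, '\.' matches only a literal dot.
dot_literal = re.findall(r'a\.b', "a.b axb")
print(dot_literal)    # ['a.b']

# re.escape() backslash-escapes the metacharacters in a literal string.
price = "$5.00"       # hypothetical sample value
price_matches = re.findall(re.escape(price), "Total: $5.00 (was $5100)")
print(price_matches)  # ['$5.00']
```

Writing patterns as raw strings (the `r'...'` prefix) keeps Python from interpreting the backslashes before the `re` module sees them.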
The following table lists the regular expression syntax available in Python:
| Pattern | Description | Example |
|---|---|---|
| ^ | Matches the beginning of the string (or of each line with re.MULTILINE) | ^Hello matches "Hello world" |
| $ | Matches the end of the string (or of each line with re.MULTILINE) | world$ matches "Hello world" |
| . | Matches any single character except newline | h.t matches "hat", "hit", "hot" |
| [...] | Matches any single character in brackets | [aeiou] matches any lowercase vowel |
| [^...] | Matches any single character not in brackets | [^0-9] matches any non-digit |
| * | Matches 0 or more occurrences of the preceding expression | ab* matches "a", "ab", "abb" |
| + | Matches 1 or more occurrences of the preceding expression | ab+ matches "ab", "abb" but not "a" |
| ? | Matches 0 or 1 occurrence of the preceding expression | colou?r matches "color" or "colour" |
| {n} | Matches exactly n occurrences | a{3} matches "aaa" |
| {n,} | Matches n or more occurrences | a{2,} matches "aa", "aaa", etc. |
| {n,m} | Matches between n and m occurrences (inclusive) | a{2,4} matches "aa", "aaa", "aaaa" |
| \| | Matches either pattern (OR operator) | cat\|dog matches "cat" or "dog" |
| () | Groups expressions and captures matched text | (ab)+ matches "ab", "abab" |
| \w | Matches word characters (letters, digits, underscore) | \w+ matches "hello123" |
| \W | Matches non-word characters | \W matches spaces, punctuation |
| \d | Matches digits ([0-9] for ASCII text) | \d+ matches "123" |
| \D | Matches non-digits | \D+ matches "abc" |
| \s | Matches whitespace characters | \s+ matches spaces, tabs, newlines |
| \S | Matches non-whitespace characters | \S+ matches "hello" |
| \b | Matches at a word boundary | \bword\b matches "word" but not "sword" |
| \B | Matches where there is no word boundary | \Boo\B matches "oo" in "book" |
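The anchors and boundary assertions in the table match positions rather than characters, which is easier to see in a short sketch. The example below assumes a made-up three-line string and uses the standard `re.MULTILINE` flag, which makes `^` and `$` also match at the start and end of each line.

```python
import re

lines = "Hello world\nworld of regex\nHello again"

# By default '^' matches only at the start of the whole string.
start_default = re.findall(r'^Hello', lines)
print(start_default)    # ['Hello']

# With re.MULTILINE, '^' also matches at the start of every line.
start_multiline = re.findall(r'^Hello', lines, flags=re.MULTILINE)
print(start_multiline)  # ['Hello', 'Hello']

# With re.MULTILINE, '$' matches at the end of every line.
end_multiline = re.findall(r'world$', lines, flags=re.MULTILINE)
print(end_multiline)    # ['world']

# '\b' requires a word boundary; '\B' requires that there is none.
whole_word = re.findall(r'\bword\b', "word sword wordy")
print(whole_word)       # ['word']
inner_oo = re.findall(r'\Boo\B', "book oops too")
print(inner_oo)         # ['oo']
```

In the last call only the "oo" inside "book" qualifies, because the "oo" in "oops" starts at a word boundary and the one in "too" ends at one.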
Basic Examples
Here are some practical examples of regular expressions in Python:
import re
# Match digits
text = "I have 25 apples and 10 oranges"
numbers = re.findall(r'\d+', text)
print("Numbers found:", numbers)
# Match email pattern
email_text = "Contact us at info@example.com or support@test.org"
emails = re.findall(r'\w+@\w+\.\w+', email_text)
print("Emails found:", emails)
# Match word boundaries
sentence = "The cat sat on the mat"
cat_matches = re.findall(r'\bcat\b', sentence)
print("Word 'cat' found:", len(cat_matches), "times")
Numbers found: ['25', '10']
Emails found: ['info@example.com', 'support@test.org']
Word 'cat' found: 1 times
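The email pattern `\w+@\w+\.\w+` is deliberately simple, and `\w` does not match dots or hyphens. The sketch below shows what it returns for an address with a dotted local part and a subdomain, next to a somewhat more permissive pattern. Both the sample address and the alternative pattern are illustrative assumptions, not a complete email validator.

```python
import re

sample = "Write to first.last@mail.example.com today"

# The simple pattern stops at characters that \w does not cover.
simple = re.findall(r'\w+@\w+\.\w+', sample)
print(simple)      # ['last@mail.example']

# A more permissive sketch: allow '.', '+', '-' before the '@' and
# one or more dot-separated labels after it.
permissive = re.findall(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', sample)
print(permissive)  # ['first.last@mail.example.com']
```

For real validation, a dedicated library or a confirmation email is usually more reliable than any single pattern.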
Character Classes and Quantifiers
Character classes and quantifiers are essential for flexible pattern matching:
import re
text = "Phone numbers: 123-456-7890, (555) 123-4567, 555.123.4567"
# Match different phone number formats: optional parentheses around the
# area code, then '-', '.', or whitespace as the first separator
phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.]\d{4}'
phones = re.findall(phone_pattern, text)
print("Phone numbers:", phones)
# Match vowels using character class
words = "Hello World Python Programming"
vowels = re.findall(r'[aeiouAEIOU]', words)
print("Vowels found:", vowels)
# Match every character that is neither a vowel nor whitespace
consonants = re.findall(r'[^aeiouAEIOU\s]', words)
print("Consonants found:", consonants)
Phone numbers: ['123-456-7890', '(555) 123-4567', '555.123.4567']
Vowels found: ['e', 'o', 'o', 'o', 'o', 'a', 'i']
Consonants found: ['H', 'l', 'l', 'W', 'r', 'l', 'd', 'P', 'y', 't', 'h', 'n', 'P', 'r', 'g', 'r', 'm', 'm', 'n', 'g']
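The counted quantifiers `{n}`, `{n,}`, and `{n,m}` can be compared side by side. This is a minimal sketch on a made-up sample string; the `\b` anchors are added so that each run of letters is matched as a whole word rather than as part of a longer run.

```python
import re

sample = "a aa aaa aaaa aaaaa"

# {3}: exactly three occurrences
exactly_three = re.findall(r'\ba{3}\b', sample)
print(exactly_three)  # ['aaa']

# {2,}: two or more occurrences
two_or_more = re.findall(r'\ba{2,}\b', sample)
print(two_or_more)    # ['aa', 'aaa', 'aaaa', 'aaaaa']

# {2,4}: between two and four occurrences, inclusive
two_to_four = re.findall(r'\ba{2,4}\b', sample)
print(two_to_four)    # ['aa', 'aaa', 'aaaa']

# '?': the preceding 'u' may appear zero times or once
optional_u = re.findall(r'colou?r', "color colour colouur")
print(optional_u)     # ['color', 'colour']
```

Without the `\b` anchors, `a{2,4}` would also match part of the five-letter run, since a quantifier alone does not care what surrounds the match.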
Grouping and Alternatives
Use parentheses for grouping and the pipe symbol for alternatives:
import re
text = "I like cats and dogs, but not rats or bats"
# Match cats or dogs using alternation
pets = re.findall(r'cats|dogs', text)
print("Pets found:", pets)
# Match whole words ending in 'ats'; the non-capturing group (?:...) makes
# findall return the full match instead of only the grouped letter
ats_words = re.findall(r'(?:c|r|b)ats', text)
print("Words ending in 'ats':", ats_words)
# Extract parts using groups
dates = "Today is 2024-03-15 and tomorrow is 2024-03-16"
date_pattern = r'(\d{4})-(\d{2})-(\d{2})'
matches = re.findall(date_pattern, dates)
print("Date parts:", matches)
Pets found: ['cats', 'dogs']
Words ending in 'ats': ['cats', 'rats', 'bats']
Date parts: [('2024', '03', '15'), ('2024', '03', '16')]
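One detail worth knowing when combining groups with `re.findall`: if the pattern contains a capturing group, `findall` returns the captured text rather than the whole match. The sketch below contrasts a capturing group with a non-capturing `(?:...)` group and shows how a match object from `re.search` exposes both the full match and the individual groups. The variable names are illustrative.

```python
import re

text = "I like cats and dogs, but not rats or bats"

# One capturing group: findall returns only what the group captured.
captured = re.findall(r'(c|r|b)ats', text)
print(captured)    # ['c', 'r', 'b']

# Non-capturing group: findall returns the whole match.
whole = re.findall(r'(?:c|r|b)ats', text)
print(whole)       # ['cats', 'rats', 'bats']

# A match object gives access to the full match and to each group.
m = re.search(r'(\d{4})-(\d{2})-(\d{2})', "Today is 2024-03-15")
print(m.group(0))  # 2024-03-15
year, month, day = m.groups()
print(year, month, day)  # 2024 03 15
```

`re.search` returns `None` when nothing matches, so code that handles arbitrary input should check the result before calling `group()`.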
Conclusion
Regular expressions provide powerful pattern matching capabilities in Python. Master the basic metacharacters, character classes, and quantifiers to create flexible search patterns for text processing and data validation.
