Regular Expression Examples in Python
Regular expressions (regex) are powerful tools for pattern matching and text processing in Python. The re module provides functions to work with regular expressions, allowing you to search, match, and manipulate text based on specific patterns.
Literal Characters
Literal characters in regex match themselves exactly. Here's how to use basic literal matching:
import re
text = "python is awesome"
pattern = "python"
match = re.search(pattern, text)
if match:
    print(f"Found: '{match.group()}'")
else:
    print("Not found")
Output:
Found: 'python'
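Not every character is literal: characters such as . * ? + ( ) [ ] have special meaning in a pattern. The sketch below, which uses made-up sample strings, shows one way to match such text exactly by passing it through re.escape() first.

```python
import re

text = "Total: 3.14 (approx.)"  # sample string, chosen for illustration

# Unescaped, '.' matches any character, so the pattern "3.14" also matches "3x14".
loose = re.search("3.14", "3x14")

# re.escape() backslash-escapes metacharacters so the text is matched literally.
literal_pattern = re.escape("3.14")
strict = re.search(literal_pattern, "3x14")
found = re.search(literal_pattern, text)

print(loose is not None)   # True
print(strict is not None)  # False
print(found.group())       # 3.14
```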
Character Classes
Character classes allow you to match any character from a specified set. Use square brackets to define a character class:
import re
# Match Python or python
text1 = "Python programming"
text2 = "python programming"
pattern = "[Pp]ython"
for text in [text1, text2]:
    if re.search(pattern, text):
        print(f"Match found in: {text}")
Output:
Match found in: Python programming
Match found in: python programming
Common Character Classes
| Pattern | Description | Example |
|---|---|---|
| [aeiou] | Match any lowercase vowel | Matches 'a', 'e', 'i', 'o', 'u' |
| [0-9] | Match any digit | Same as [0123456789] |
| [a-z] | Match any lowercase letter | Matches 'a' through 'z' |
| [^aeiou] | Match any character except a lowercase vowel | Negated character class |
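The classes in the table can be tried out with re.findall(). This is a small sketch on a made-up sample string. Note that the negated class matches every character that is not a lowercase vowel, including uppercase letters, spaces, and digits.

```python
import re

sample = "Regex 101"  # sample string, chosen for illustration

vowels = re.findall(r"[aeiou]", sample)
digits = re.findall(r"[0-9]", sample)
lowercase = re.findall(r"[a-z]", sample)
non_vowels = re.findall(r"[^aeiou]", sample)

print(vowels)      # ['e', 'e']
print(digits)      # ['1', '0', '1']
print(lowercase)   # ['e', 'g', 'e', 'x']
print(non_vowels)  # ['R', 'g', 'x', ' ', '1', '0', '1']
```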
Special Character Classes
Python regex provides shorthand notations for common character classes:
import re
text = "Phone: 123-456-7890"
# \d matches digits
digits = re.findall(r'\d', text)
print(f"Digits found: {digits}")
# \w matches word characters
words = re.findall(r'\w+', text)
print(f"Words found: {words}")
# \s matches whitespace
spaces = re.findall(r'\s', text)
print(f"Spaces found: {len(spaces)} space(s)")
Output:
Digits found: ['1', '2', '3', '4', '5', '6', '7', '8', '9', '0']
Words found: ['Phone', '123', '456', '7890']
Spaces found: 1 space(s)
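Each shorthand also has an uppercase complement that matches the opposite set: \D (non-digits), \W (non-word characters), and \S (non-whitespace). The article does not cover these; the sketch below simply reuses the phone string to show what they return.

```python
import re

text = "Phone: 123-456-7890"

# \D+ matches runs of non-digit characters
non_digits = re.findall(r"\D+", text)

# \W matches single characters that are not letters, digits, or underscore
non_word = re.findall(r"\W", text)

# \S+ matches runs of non-whitespace characters
chunks = re.findall(r"\S+", text)

print(non_digits)  # ['Phone: ', '-', '-']
print(non_word)    # [':', ' ', '-', '-']
print(chunks)      # ['Phone:', '123-456-7890']
```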
Quantifiers and Repetition
Quantifiers specify how many times a character or group should be matched:
import re
texts = ["rub", "ruby", "rubyyy", "123", "12345"]
patterns = {
    r'ruby?': "Match 'rub' or 'ruby' (y is optional)",
    r'ruby*': "Match 'rub' plus 0 or more y's",
    r'ruby+': "Match 'rub' plus 1 or more y's",
    r'\d{3}': "Match exactly 3 digits",
    r'\d{3,5}': "Match 3 to 5 digits"
}
for pattern, description in patterns.items():
    print(f"\nPattern: {pattern} - {description}")
    for text in texts:
        # fullmatch() succeeds only if the whole string fits the pattern
        if re.fullmatch(pattern, text):
            print(f"  ✓ '{text}' matches")
        else:
            print(f"  ✗ '{text}' doesn't match")
Output:

Pattern: ruby? - Match 'rub' or 'ruby' (y is optional)
  ✓ 'rub' matches
  ✓ 'ruby' matches
  ✗ 'rubyyy' doesn't match
  ✗ '123' doesn't match
  ✗ '12345' doesn't match

Pattern: ruby* - Match 'rub' plus 0 or more y's
  ✓ 'rub' matches
  ✓ 'ruby' matches
  ✓ 'rubyyy' matches
  ✗ '123' doesn't match
  ✗ '12345' doesn't match

Pattern: ruby+ - Match 'rub' plus 1 or more y's
  ✗ 'rub' doesn't match
  ✓ 'ruby' matches
  ✓ 'rubyyy' matches
  ✗ '123' doesn't match
  ✗ '12345' doesn't match

Pattern: \d{3} - Match exactly 3 digits
  ✗ 'rub' doesn't match
  ✗ 'ruby' doesn't match
  ✗ 'rubyyy' doesn't match
  ✓ '123' matches
  ✗ '12345' doesn't match

Pattern: \d{3,5} - Match 3 to 5 digits
  ✗ 'rub' doesn't match
  ✗ 'ruby' doesn't match
  ✗ 'rubyyy' doesn't match
  ✓ '123' matches
  ✓ '12345' matches
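Quantifiers become more useful when combined in one pattern. The following is a minimal sketch, assuming a hypothetical NNN-NNN-NNNN phone format and made-up candidate strings; it is not meant as a complete phone validator.

```python
import re

# Hypothetical candidate strings, for illustration only
candidates = ["123-456-7890", "123-45-7890", "1234567890"]

# Exactly 3 digits, a hyphen, 3 digits, a hyphen, 4 digits
strict_pattern = r"\d{3}-\d{3}-\d{4}"
strict = [c for c in candidates if re.fullmatch(strict_pattern, c)]

# '-?' makes each hyphen optional
flexible_pattern = r"\d{3}-?\d{3}-?\d{4}"
flexible = [c for c in candidates if re.fullmatch(flexible_pattern, c)]

# {2,} leaves the upper bound open: two or more repetitions
long_runs = re.findall(r"o{2,}", "go goo gooo")

print(strict)     # ['123-456-7890']
print(flexible)   # ['123-456-7890', '1234567890']
print(long_runs)  # ['oo', 'ooo']
```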
Greedy vs Non-greedy Matching
By default, quantifiers are greedy and match as much text as possible. Add ? after a quantifier to make it non-greedy, so that it matches as little as possible:
import re
text = "<python>perl>"
# Greedy matching
greedy = re.search(r'<.*>', text)
print(f"Greedy match: {greedy.group()}")
# Non-greedy matching
non_greedy = re.search(r'<.*?>', text)
print(f"Non-greedy match: {non_greedy.group()}")
Output:
Greedy match: <python>perl>
Non-greedy match: <python>
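The difference is easier to see when a string contains several delimited pieces. Below is a small sketch on a made-up HTML-like string. The negated character class in the last pattern is a common alternative to the non-greedy form and gives the same result here.

```python
import re

text = "<b>bold</b> and <i>italic</i>"  # sample string, chosen for illustration

# Greedy: .* runs to the last '>' in the string, so there is one big match
greedy = re.findall(r"<.*>", text)

# Non-greedy: .*? stops at the first '>' it can, so each tag is matched separately
lazy = re.findall(r"<.*?>", text)

# Alternative: match anything except '>' between the brackets
negated = re.findall(r"<[^>]*>", text)

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(negated)  # ['<b>', '</b>', '<i>', '</i>']
```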
Grouping and Alternatives
Use parentheses to group patterns and the pipe | symbol for alternatives:
import re
texts = ["python", "perl", "ruby", "ruble"]
# Alternative matching
pattern = r'python|perl'
print("Matching 'python' or 'perl':")
for text in texts:
    if re.search(pattern, text):
        print(f"  ✓ '{text}' matches")
# Grouping example
pattern2 = r'rub(y|le)'
print("\nMatching 'ruby' or 'ruble':")
for text in texts:
    if re.fullmatch(pattern2, text):
        print(f"  ✓ '{text}' matches")
Output:
Matching 'python' or 'perl':
  ✓ 'python' matches
  ✓ 'perl' matches

Matching 'ruby' or 'ruble':
  ✓ 'ruby' matches
  ✓ 'ruble' matches
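Parentheses also capture the text they match, which affects what group() and findall() return. The sketch below reuses the rub(y|le) pattern. The non-capturing form (?:...) is not covered above and is shown only as a possible way to group without capturing.

```python
import re

match = re.search(r"rub(y|le)", "ruble")
whole = match.group(0)   # the entire match
suffix = match.group(1)  # only what the first pair of parentheses captured

# With a capturing group, findall() returns the captured part only
captured = re.findall(r"rub(y|le)", "ruby ruble")

# With a non-capturing group, findall() returns the whole match
uncaptured = re.findall(r"rub(?:y|le)", "ruby ruble")

print(whole, suffix)  # ruble le
print(captured)       # ['y', 'le']
print(uncaptured)     # ['ruby', 'ruble']
```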
Anchors and Boundaries
Anchors specify where in the text the pattern should match:
import re
texts = ["Python is great", "I love Python", "Python"]
# Start of string
start_pattern = r'^Python'
print("Matches starting with 'Python':")
for text in texts:
    if re.search(start_pattern, text):
        print(f"  ✓ '{text}'")
# End of string
end_pattern = r'Python$'
print("\nMatches ending with 'Python':")
for text in texts:
    if re.search(end_pattern, text):
        print(f"  ✓ '{text}'")
# Word boundary
boundary_text = "Python programming in python"
word_pattern = r'\bpython\b'
matches = re.findall(word_pattern, boundary_text, re.IGNORECASE)
print(f"\nWord boundary matches in '{boundary_text}': {matches}")
Output:
Matches starting with 'Python':
  ✓ 'Python is great'
  ✓ 'Python'

Matches ending with 'Python':
  ✓ 'I love Python'
  ✓ 'Python'

Word boundary matches in 'Python programming in python': ['Python', 'python']
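Two details of anchors are worth checking by hand. By default ^ matches only at the very start of the string, while the re.MULTILINE flag lets it match at the start of every line. And \b is what stops a pattern from matching inside a longer word. A minimal sketch with made-up input:

```python
import re

lines = "Python is great\nI love Python\nPython"  # three lines in one string

# Default: ^ anchors to the start of the whole string only
string_start = re.findall(r"^Python", lines)

# re.MULTILINE: ^ also anchors to the start of each line
line_starts = re.findall(r"^Python", lines, re.MULTILINE)

# Without \b the pattern also matches the beginning of 'pythonic'
unbounded = re.findall(r"python", "python pythonic")
bounded = re.findall(r"\bpython\b", "python pythonic")

print(string_start)  # ['Python']
print(line_starts)   # ['Python', 'Python']
print(unbounded)     # ['python', 'python']
print(bounded)       # ['python']
```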
Conclusion
Regular expressions provide powerful pattern matching capabilities in Python. Master character classes, quantifiers, and anchors to efficiently search and manipulate text. Choose the re function that fits the task: search() finds the first match anywhere in the string, match() matches only at the start, fullmatch() requires the whole string to match, and findall() returns every match.
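As a recap, the re functions differ mainly in where they are allowed to match. A small sketch, using a sample string chosen for illustration:

```python
import re

text = "I love Python"

at_start = re.match(r"Python", text)         # None: match() only tries position 0
anywhere = re.search(r"Python", text)        # finds 'Python' later in the string
entire = re.fullmatch(r"I love \w+", text)   # succeeds: pattern covers the whole string
all_words = re.findall(r"\w+", text)         # every non-overlapping match, as a list

print(at_start)            # None
print(anywhere.start())    # 7
print(entire is not None)  # True
print(all_words)           # ['I', 'love', 'Python']
```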
