Pattern matching in Python with Regex
Regular expressions (regex) are a powerful tool for pattern matching and string manipulation in Python. The re module provides comprehensive regex functionality for finding, matching, and replacing text patterns.
What is a Regular Expression?
A regular expression is a sequence of characters that defines a search pattern. In Python, the re module handles string parsing and pattern matching. Regular expressions can answer questions like:
Is this string a valid URL?
Which users in /etc/passwd are in a given group?
What is the date and time of all warning messages in a log file?
What username and document were requested by the URL a visitor typed?
A typical regular expression search follows this pattern:
import re
match = re.search(pattern, string)
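As a rough illustration of the log-file question above, the sketch below pulls the date and time out of warning lines. The log layout, the sample lines, and the names warning_pattern and warning_timestamps are assumptions made for this example; real log formats vary and would need a different pattern.

```python
import re

# Hypothetical log lines; the "YYYY-MM-DD HH:MM:SS LEVEL message" layout is assumed
log_lines = [
    "2024-01-15 08:30:01 INFO Service started",
    "2024-01-15 08:31:45 WARNING Disk usage at 91%",
    "2024-01-15 08:32:10 ERROR Connection refused",
    "2024-01-15 09:02:33 WARNING Retrying request",
]

# Group 1 captures the date, group 2 the time, only on lines marked WARNING
warning_pattern = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) WARNING\b")


def warning_timestamps(lines):
    """Return (date, time) tuples for every line that matches the pattern."""
    results = []
    for line in lines:
        match = warning_pattern.search(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results


for date, time in warning_timestamps(log_lines):
    print(date, time)
```

Compiling the pattern once with re.compile() is a convenience when the same pattern is applied to many lines; calling re.search(pattern, line) directly would behave the same way here.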
Basic Pattern Matching
Let's start with a simple example using literal characters:
import re

search_string = "TutorialsPoint"
pattern = "Tutorials"
# re.match() only succeeds if the pattern matches at the start of the string
match = re.match(pattern, search_string)
if match:
    print("regex matches:", match.group())
else:
    print("pattern not found")
regex matches: Tutorials
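Because re.match() only tries the pattern at the beginning of the string, the same kind of literal search can fail where re.search() succeeds. A minimal sketch of the difference, reusing the string from the example above (the variable names at_start and anywhere are illustrative):

```python
import re

search_string = "TutorialsPoint"

# "Point" is not at the start of the string, so re.match() returns None
at_start = re.match("Point", search_string)

# re.search() scans the whole string and finds "Point" further in
anywhere = re.search("Point", search_string)

print("re.match:", at_start)
print("re.search:", anywhere.group(), "at index", anywhere.start())
```

If a pattern may appear anywhere in the text, re.search() is usually the safer default.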
Using re.search() for Pattern Matching
The re.search() method finds the first occurrence of a pattern anywhere in the string:
Syntax
matchObject = re.search(pattern, input_string, flags=0)
Example with Groups
import re

# Regular expression to match a date string such as "Jan 2"
regex = r"([a-zA-Z]+) (\d+)"
text = "Jan 2"
# Search once and reuse the match object
match = re.search(regex, text)
if match:
    # Match position
    print("Match at index %s, %s" % (match.start(), match.end()))
    # Full match and groups
    print("Full match: %s" % (match.group(0)))
    print("Month: %s" % (match.group(1)))
    print("Day: %s" % (match.group(2)))
else:
    print("Pattern not Found!")
Match at index 0, 5
Full match: Jan 2
Month: Jan
Day: 2
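When all captured groups are needed at once, the match object's groups() and span() methods can stand in for separate group(n), start(), and end() calls. The following is a small sketch using the same date pattern; the sample sentence and variable names are made up for illustration.

```python
import re

regex = r"([a-zA-Z]+) (\d+)"
text = "Released on Jan 2"  # hypothetical input with the date in the middle

match = re.search(regex, text)
if match:
    # groups() returns every captured group as a tuple
    month, day = match.groups()
    # span() returns (start, end) of the full match
    start, end = match.span()
    print("Month:", month)
    print("Day:", day)
    print("Matched text:", text[start:end])
```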
Capturing Groups with findall()
When a pattern contains more than one capturing group (a pair of parentheses), findall() returns a list of tuples, with one tuple of captured groups per match:
import re
regex = r'([\w\.-]+)@([\w\.-]+)'
text = 'hello john@hotmail.com, hello@Tutorialspoint.com, hello python@gmail.com'
matches = re.findall(regex, text)
print("All matches:", matches)
for username, host in matches:
    print("Username:", username)
    print("Host:", host)
    print("---")
All matches: [('john', 'hotmail.com'), ('hello', 'Tutorialspoint.com'), ('python', 'gmail.com')]
Username: john
Host: hotmail.com
---
Username: hello
Host: Tutorialspoint.com
---
Username: python
Host: gmail.com
---
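The shape of the list returned by findall() depends on how many capturing groups the pattern has. The sketch below compares three variants of the email pattern on a shortened sample string; the variable names are illustrative only.

```python
import re

text = "hello john@hotmail.com, hello python@gmail.com"

# No groups: a list of the full matched strings
no_groups = re.findall(r"[\w.-]+@[\w.-]+", text)

# One group: a list of that group's text only
one_group = re.findall(r"([\w.-]+)@[\w.-]+", text)

# Two groups: a list of (username, host) tuples
two_groups = re.findall(r"([\w.-]+)@([\w.-]+)", text)

print(no_groups)
print(one_group)
print(two_groups)
```

Inside a character class the dot is already literal, so [\w.-] behaves the same as the [\w\.-] used in the example above.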
Finding and Replacing with re.sub()
Use re.sub() to find patterns and replace them with new text:
import re

text = 'hello john@hotmail.com, hello@Tutorialspoint.com, hello python@gmail.com, Hello World!'
pattern = r'([\w\.-]+)@([\w\.-]+)'
replacement = r'\1@XYZ.com'  # \1 refers to first group (username)
result = re.sub(pattern, replacement, text)
print(result)
hello john@XYZ.com, hello@XYZ.com, hello python@XYZ.com, Hello World!
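The replacement string can refer to any captured group, and the optional count argument limits how many matches are replaced. A brief sketch under those assumptions, with a shortened sample string and illustrative variable names:

```python
import re

text = "hello john@hotmail.com, hello python@gmail.com"
pattern = r"([\w.-]+)@([\w.-]+)"

# \1 is the username and \2 is the host; both can appear in the replacement
rewritten = re.sub(pattern, r"\1 (at \2)", text)

# count=1 replaces only the first match and leaves the rest untouched
first_only = re.sub(pattern, r"\1@XYZ.com", text, count=1)

print(rewritten)
print(first_only)
```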
Regular Expression Flags
Flags modify how patterns are matched. Common flags include:
re.IGNORECASE: Makes the pattern case-insensitive, so 'a' matches both 'a' and 'A'
re.DOTALL: Allows the dot (.) to match newline characters (\n)
re.MULTILINE: Enables ^ and $ to match at the start/end of each line, not just the whole string
Example with Flags
import re
text = "Python PROGRAMMING"
pattern = r"python"
# Without flag
match1 = re.search(pattern, text)
print("Without IGNORECASE:", match1)
# With IGNORECASE flag
match2 = re.search(pattern, text, re.IGNORECASE)
print("With IGNORECASE:", match2.group() if match2 else None)
Without IGNORECASE: None
With IGNORECASE: Python
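The other two flags listed above, re.DOTALL and re.MULTILINE, can be sketched in the same way. The two-line sample string and the variable names below are assumptions chosen for illustration.

```python
import re

text = "first line\nsecond line"

# Without re.DOTALL the dot stops at the newline, so this search fails
without_dotall = re.search(r"first.*second", text)
# With re.DOTALL the dot also matches "\n", so the search spans both lines
with_dotall = re.search(r"first.*second", text, re.DOTALL)

# Without re.MULTILINE, ^ matches only at the very start of the string
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)

print("Without DOTALL:", without_dotall)
print("With DOTALL:", repr(with_dotall.group()))
print("Without MULTILINE:", starts_default)
print("With MULTILINE:", starts_multiline)
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.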
Conclusion
Regular expressions in Python provide powerful pattern matching capabilities through the re module. Use re.search() for finding patterns, re.findall() for extracting all matches, and re.sub() for replacements. Flags like re.IGNORECASE modify matching behavior for more flexible pattern matching.
