How not to match a character after repetition in Python Regex?
Regular expressions (regex) are powerful tools in Python for pattern matching and text processing. Sometimes you need to match a pattern but exclude cases where specific characters follow after repetition. This is achieved using lookahead assertions that check what comes next without including it in the match.
Understanding re.findall() Function
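The re.findall() function scans a string from left to right and returns all non-overlapping matches of a pattern as a list. If the pattern contains a capturing group, the list holds the group's text instead of the full match. It takes two main parameters, the pattern to search for and the string to search in, plus an optional flags argument.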
Syntax
import re
re.findall(pattern, string)
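As a quick illustration of the call shape, here is a minimal sketch. The sample string and variable names are made up for this example and are not part of the original article.

```python
import re

text = "cat bat rat"  # sample input, chosen only for illustration

# No capturing group: each list item is the full matched substring.
words = re.findall(r"[a-z]at", text)
print(words)  # ['cat', 'bat', 'rat']

# One capturing group: each list item is the group's text, not the full match.
first_letters = re.findall(r"([a-z])at", text)
print(first_letters)  # ['c', 'b', 'r']

# No match at all: the result is an empty list, never None.
none_found = re.findall(r"\d+", text)
print(none_found)  # []
```

Because a lookahead is not a capturing group, adding one to a pattern does not change which of these result shapes re.findall() returns.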
Using Positive Lookahead (?=)
Positive lookahead (?=x) matches a position where 'x' follows, but doesn't include 'x' in the match:
import re
# Match digits followed by 'x' but don't include 'x'
pattern = r'\d+(?=x)'
text = '123x 456xx 789xxx'
matches = re.findall(pattern, text)
print(matches)
['123', '456', '789']
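To make the "checks but does not consume" behaviour concrete, the sketch below compares a pattern that consumes the 'x' with one that only looks ahead for it. The variable names are illustrative.

```python
import re

text = "123x 456xx 789xxx"

consuming = re.search(r"\d+x", text)      # 'x' is part of the match
lookahead = re.search(r"\d+(?=x)", text)  # 'x' is only checked, not consumed

print(consuming.group(), consuming.span())  # 123x (0, 4)
print(lookahead.group(), lookahead.span())  # 123 (0, 3)
```

The lookahead match ends at index 3, so the 'x' at that position is still available to whatever part of the pattern (or whatever later match) comes next.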
Using Negative Lookahead (?!)
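Negative lookahead (?!x) succeeds only at positions where 'x' does NOT follow, which helps exclude unwanted characters after a repetition. Keep in mind that a variable-length quantifier such as {3,4} can backtrack to a shorter repetition in order to satisfy the lookahead:
import re
# Match 3-4 'a's not followed by 'X'
text = "aaaaX aaaY aaaaZ aaaX"
pattern = r"a{3,4}(?!X)"
matches = re.findall(pattern, text)
print(matches)
['aaa', 'aaa', 'aaaa']
The first 'aaa' comes from 'aaaaX': four 'a's are followed by 'X', so the engine backs off to three 'a's, which are followed by 'a' rather than 'X'. The final 'aaaX' yields nothing because {3,4} cannot shrink below three.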
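If the intent is to reject the whole run of 'a's whenever 'X' follows it, one possible approach (a sketch, not the only option) is to also forbid 'a' in the lookahead and to forbid starting in the middle of a run. The lookbehind (?<!a) is an added assumption about what counts as "a run" here.

```python
import re

text = "aaaaX aaaY aaaaZ aaaX"

# Rejecting only 'X' lets the engine back off from 'aaaa' to 'aaa' in 'aaaaX'.
loose = re.findall(r"a{3,4}(?!X)", text)
print(loose)  # ['aaa', 'aaa', 'aaaa']

# Rejecting 'a' as well forces the run to end before the lookahead is tested;
# (?<!a) stops a match from starting partway through a longer run.
strict = re.findall(r"(?<!a)a{3,4}(?![aX])", text)
print(strict)  # ['aaa', 'aaaa']
```

On Python 3.11 or later, a possessive quantifier such as a{3,4}+ is another way to prevent the back-off, at the cost of not running on older interpreters.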
Excluding Specific Digits After Numbers
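You can match a fixed number of digits while rejecting a specific digit that follows. Anchor the start with \b; otherwise the scan can restart one digit later and return '775' from '7775':
import re
# Match three digits at the start of a token, not followed by '5'
text = "777A 7775 888C 8885"
pattern = r"\b\d{3}(?!5)"
matches = re.findall(pattern, text)
print(matches)
['777', '888']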
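The effect of the leading word boundary can be checked directly. The sketch below runs the same lookahead with and without \b on the sample string; the variable names are illustrative.

```python
import re

text = "777A 7775 888C 8885"

# Unanchored: the scan can restart one digit later, so '775' and '885' slip through.
unanchored = re.findall(r"\d{3}(?!5)", text)
print(unanchored)  # ['777', '775', '888', '885']

# Anchored with \b: a match may only start at the beginning of a token.
anchored = re.findall(r"\b\d{3}(?!5)", text)
print(anchored)  # ['777', '888']
```

A fixed-width quantifier like {3} cannot shrink, but the regex engine is still free to try a later starting position, which is why the anchor is needed.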
Avoiding Special Characters After Words
Match words of a certain length while excluding those followed by a punctuation mark. The trailing \b stops \w{4,} from giving back letters to dodge the lookahead (without it, 'Fantastic!' would yield 'Fantasti'):
import re
# Match whole words with 4+ letters not followed by '!'
text = "Wow! Amazing Fantastic! Excellent"
pattern = r"\b\w{4,}\b(?!\!)"
matches = re.findall(pattern, text)
print(matches)
['Amazing', 'Excellent']
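The following sketch compares the word pattern with and without a trailing \b, to show how an open-ended quantifier interacts with a negative lookahead. The variable names are illustrative.

```python
import re

text = "Wow! Amazing Fantastic! Excellent"

# Without a trailing \b, \w{4,} gives back the final 'c' so that the next
# character is 'c' rather than '!', and a truncated word is returned.
truncated = re.findall(r"\b\w{4,}(?!\!)", text)
print(truncated)  # ['Amazing', 'Fantasti', 'Excellent']

# A trailing \b pins the match to the end of the word before the lookahead runs.
whole_words = re.findall(r"\b\w{4,}\b(?!\!)", text)
print(whole_words)  # ['Amazing', 'Excellent']
```

'Wow' is skipped in both cases simply because it has fewer than four letters, not because of the lookahead.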
Complex Pattern Exclusion
For more complex exclusion rules, put a character class inside the lookahead (or chain several lookaheads) to reject any of several following characters:
import re
# Match three lowercase letters not followed by a digit or '!'
text = "aaa1 bbb! ccc ddd2 eee"
pattern = r"[a-z]{3}(?![0-9!])"
matches = re.findall(pattern, text)
print(matches)
['ccc', 'eee']
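As a sketch of what combining lookaheads looks like in practice, the example below writes the same rule two ways: once as a single lookahead containing a character class, and once as two lookaheads chained at the same position. The variable names are illustrative.

```python
import re

text = "aaa1 bbb! ccc ddd2 eee"

# One lookahead with a character class: reject a following digit or '!'.
with_class = re.findall(r"[a-z]{3}(?![0-9!])", text)

# Two chained lookaheads: both must succeed at the same position.
chained = re.findall(r"[a-z]{3}(?![0-9])(?!\!)", text)

print(with_class)  # ['ccc', 'eee']
print(chained)     # ['ccc', 'eee']
```

For single characters the character class is shorter. Chaining becomes more useful when each exclusion is a longer sub-pattern that cannot be expressed as one class.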
Summary Table
| Lookahead Type | Syntax | Purpose |
|---|---|---|
| Positive | (?=x) | Match if followed by 'x' |
| Negative | (?!x) | Match if NOT followed by 'x' |
Conclusion
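Lookahead assertions let you control what may follow a repetition without consuming it. Use positive lookahead (?=) to require that specific characters follow, and negative lookahead (?!) to reject matches followed by unwanted characters. Because quantifiers can backtrack and the engine can retry from a later position, pair a negative lookahead with a boundary such as \b, or exclude the repeated character itself, when the whole run must be rejected.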
