How to match a pattern over multiple lines in Python?
When working with regular expressions (regex) in Python, you often need to match text that spans multiple lines. This commonly occurs when reading data from files or scraping content from websites.
This article demonstrates how to match patterns across multiple lines using Python's re module with special flags like re.DOTALL and re.MULTILINE.
Understanding Multi-line Pattern Matching
By default, the dot metacharacter . in regular expressions matches any character except a newline. To match patterns that span multiple lines, Python's re module provides flags that modify this behavior:
- re.DOTALL: Makes . match any character, including newlines
- re.MULTILINE: Makes ^ and $ match at the start and end of each line, not just of the entire string
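To make the default behavior concrete, here is a minimal sketch that runs the same pattern with and without re.DOTALL. The sample string and variable names are illustrative only.

```python
import re

# Illustrative two-line sample string
text = "first line\nsecond line"

# Without a flag, . stops at the newline, so the pattern cannot reach "second"
default_match = re.search(r"first.*second", text)

# With re.DOTALL, . also matches "\n", so the pattern spans both lines
dotall_match = re.search(r"first.*second", text, re.DOTALL)

print(default_match)                 # None
print(repr(dotall_match.group()))    # 'first line\nsecond'
```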
Using re.DOTALL Flag
The re.DOTALL flag allows the dot . to match newline characters, enabling pattern matching across multiple lines:
import re
paragraph = '''
<p>
Tutorials point is a website.
It is a platform to enhance your skills.
</p>
'''
match = re.search(r'<p>.*</p>', paragraph, re.DOTALL)
if match:
    print(match.group(0))

<p>
Tutorials point is a website.
It is a platform to enhance your skills.
</p>
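The same flag can also be written as the short alias re.S or as the inline flag (?s) at the start of the pattern. The sketch below uses an illustrative sample string and adds a capture group to pull out only the text between the tags.

```python
import re

# Illustrative sample; not taken from a real page
paragraph = """<p>
Line one.
Line two.
</p>"""

# (?s) at the start of the pattern is the inline form of re.DOTALL
inline = re.search(r"(?s)<p>(.*)</p>", paragraph)

# re.S is the short alias for re.DOTALL
flagged = re.search(r"<p>(.*)</p>", paragraph, re.S)

# group(1) holds only the text between the tags; strip() drops the outer newlines
inner = inline.group(1).strip()
print(inner)
```

Both calls should produce the same match, so the choice between them is mostly a matter of style.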
Using re.MULTILINE Flag
The re.MULTILINE flag makes ^ and $ match the start and end of each line, not just the entire string:
import re
text = """Hello world.
It is a beautiful day.
It will work!"""
matches = re.findall(r"^It.*", text, re.MULTILINE)
print("Matches found:", matches)
Matches found: ['It is a beautiful day.', 'It will work!']
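The example above exercises only ^. As a small sketch of the $ side, the snippet below reuses the same text and looks for whole lines that end with a period, with and without the flag. The variable names are illustrative.

```python
import re

text = """Hello world.
It is a beautiful day.
It will work!"""

# With re.MULTILINE, $ matches at the end of every line
with_flag = re.findall(r"^.*\.$", text, re.MULTILINE)

# Without the flag, $ matches only at the end of the whole string,
# and the last line ends with "!", so nothing matches
without_flag = re.findall(r"^.*\.$", text)

print(with_flag)     # ['Hello world.', 'It is a beautiful day.']
print(without_flag)  # []
```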
Complex Multi-line Patterns
You can combine both flags with the | operator (re.DOTALL | re.MULTILINE) and build more specific patterns. The example below needs only re.MULTILINE: it finds lines that start with "It" and end with a period:
import re
multi_line_text = """Hello world.
It is a beautiful day.
This is a test.
It will work!"""
pattern = r'^It.*\.$'  # $ requires the period to be the last character on the line
matches = re.findall(pattern, multi_line_text, re.MULTILINE)
print("Matches found:", matches)
Matches found: ['It is a beautiful day.']
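For a case where both flags are needed at once, here is a hedged sketch that extracts a block delimited by marker lines. The BEGIN/END markers and the sample text are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical input: a block delimited by BEGIN and END marker lines
log = """header
BEGIN
alpha
beta
END
footer"""

# re.MULTILINE lets ^ and $ anchor the marker lines;
# re.DOTALL lets .*? run across the newlines between them
pattern = re.compile(r"^BEGIN$(.*?)^END$", re.MULTILINE | re.DOTALL)

m = pattern.search(log)
body = m.group(1).strip().splitlines()
print(body)  # ['alpha', 'beta']
```

The lazy quantifier .*? stops at the first END line, which matters if the input holds more than one block.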
Matching HTML Blocks
Use re.DOTALL to match HTML blocks that span multiple lines:
import re
html = """<div>
<p>Hello</p>
<p>Welcome</p>
</div>"""
pattern = r"<div>.*</div>"
match = re.search(pattern, html, re.DOTALL)
if match:
    print(match.group())

<div>
<p>Hello</p>
<p>Welcome</p>
</div>
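One caveat with this pattern is that .* is greedy. If the input holds several blocks, it runs from the first opening tag to the last closing tag. The sketch below, using an illustrative two-block string, contrasts that with the lazy form .*?.

```python
import re

# Illustrative input with two separate <div> blocks
html = """<div>
<p>First</p>
</div>
<div>
<p>Second</p>
</div>"""

# Greedy: .* runs to the LAST </div>, so both blocks come back as one match
greedy = re.findall(r"<div>.*</div>", html, re.DOTALL)

# Lazy: .*? stops at the FIRST </div>, giving one match per block
lazy = re.findall(r"<div>.*?</div>", html, re.DOTALL)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

Even the lazy form does not handle nested tags of the same name, so for real HTML a dedicated parser is usually the safer choice.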
Comparison of Flags
| Flag | Purpose | Best For |
|---|---|---|
| re.DOTALL | Makes . match newlines | Matching content across lines |
| re.MULTILINE | Makes ^ and $ match line boundaries | Line-by-line pattern matching |
Conclusion
Use re.DOTALL when . must match newlines so that a single pattern can span multiple lines. Use re.MULTILINE when ^ and $ should match at the beginning or end of individual lines within multi-line text.
