Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
How to match whitespace but not newlines using Python regular expressions?
In Python, regular expressions (regex) provide powerful tools to search and manipulate strings. When you need to match whitespace characters like spaces and tabs but exclude newline characters, you can use specific regex patterns with Python's re module.
The following methods demonstrate how to match whitespace but not newlines using Python regular expressions ?
Using re.sub() Method
The re.sub() method efficiently replaces whitespace characters (excluding newlines) within a string. The regex pattern [ \t]+ matches one or more occurrences of spaces or tabs, preserving newlines in the original text.
Example
The following example replaces multiple spaces and tabs with a single space while keeping newlines intact ?
import re
text = "This is a sample text.\nIt contains multiple spaces.\n\tAnd tabs too."
normalized_text = re.sub(r'[ \t]+', ' ', text)
print("Original text:")
print(repr(text))
print("\nNormalized text:")
print(repr(normalized_text))
print("\nFormatted output:")
print(normalized_text)
The output of the above code is ?
Original text: 'This is a sample text.\nIt contains multiple spaces.\n\tAnd tabs too.' Normalized text: 'This is a sample text.\nIt contains multiple spaces.\n And tabs too.' Formatted output: This is a sample text. It contains multiple spaces. And tabs too.
Using re.findall() Method
The re.findall() method returns a list of all whitespace sequences (excluding newlines) found in the string. This is useful for analyzing whitespace patterns in text.
Example
The following example finds all whitespace sequences using the [ \t]+ pattern ?
import re
text = "This is a sample text.\nIt contains multiple spaces."
whitespace_matches = re.findall(r'[ \t]+', text)
print("Text:", repr(text))
print("Whitespace matches:", whitespace_matches)
print("Number of matches:", len(whitespace_matches))
The output of the above code is ?
Text: 'This is a sample text.\nIt contains multiple spaces.' Whitespace matches: [' ', ' ', ' ', ' ', ' '] Number of matches: 5
Using Character Classes
Character classes provide more precise control over whitespace matching. The pattern [^\S\n] matches any whitespace character except newlines, offering an alternative to explicit space and tab matching.
Example
The following example demonstrates different character class patterns for matching whitespace ?
import re
text = "Hello\t\tworld with\nspaces and\ttabs."
# Method 1: Explicit space and tab
method1 = re.findall(r'[ \t]+', text)
# Method 2: Non-whitespace negation excluding newline
method2 = re.findall(r'[^\S\n]+', text)
print("Text:", repr(text))
print("Method 1 [ \t]+:", method1)
print("Method 2 [^\S\n]+:", method2)
print("Results match:", method1 == method2)
The output of the above code is ?
Text: 'Hello\t\tworld with\nspaces and\ttabs.' Method 1 [ \t]+: ['\t\t', ' ', ' ', '\t'] Method 2 [^\S\n]+: ['\t\t', ' ', ' ', '\t'] Results match: True
Using Positive Lookahead
Positive lookahead (?=pattern) checks what follows without including it in the match. This technique helps find whitespace followed by specific characters while excluding newlines from the match.
Example
The following example finds whitespace followed by non-whitespace characters using positive lookahead ?
import re
text = "This is a test.\nNew line here.\tAnother line."
# Find whitespace followed by non-whitespace characters
matches = re.findall(r'[ \t]+(?=\S)', text)
print("Whitespace before non-whitespace:", matches)
# Find whitespace at end of lines (before newline or end of string)
end_matches = re.findall(r'[ \t]+(?=\n|$)', text)
print("Whitespace at line ends:", end_matches)
# Replace whitespace sequences with single space using lookahead
cleaned = re.sub(r'[ \t]+(?=\S)', ' ', text)
print("Cleaned text:", repr(cleaned))
The output of the above code is ?
Whitespace before non-whitespace: [' ', ' ', ' ', ' ', '\t'] Whitespace at line ends: [] Cleaned text: 'This is a test.\nNew line here. Another line.'
Comparison of Methods
| Method | Pattern | Best For | Preserves Newlines? |
|---|---|---|---|
re.sub() |
[ \t]+ |
Replacing whitespace | Yes |
re.findall() |
[ \t]+ |
Finding whitespace patterns | Yes |
| Character classes | [^\S\n]+ |
Complex whitespace matching | Yes |
| Positive lookahead | [ \t]+(?=\S) |
Conditional matching | Yes |
Conclusion
Use the [ \t]+ pattern with re.sub() for simple whitespace replacement. For more complex scenarios, character classes like [^\S\n]+ or lookahead patterns provide greater control while preserving newline characters.
